Makefile-Based Blogging

涛叔 2023-05-30

In this article, I will build a simple static site generator using make, bash, and pandoc. All the code has been published on GitHub as makedown¹.

Quick Start

Hello, world!

If you are not interested in the details of how makedown works and just want to start blogging, follow the steps below.

Clone the makedown repository:

git clone https://github.com/taoso/makedown.git

Initialize a new blog named hello:

make -f path/to/makedown/Makefile new dir=hello

Generate HTML files:

cd hello
make -f path/to/makedown/Makefile build

Preview it in the browser:

python -m http.server

Then open http://localhost:8000 in your browser and you will see the list of posts.

makedown generates a post list for every directory.

The index.html in the root directory links to the HTML pages of all Markdown files, while the index.html in a subdirectory only lists the posts under that subdirectory.

Since a personal blog seldom hosts thousands of posts, makedown does not paginate post lists. Furthermore, makedown does not generate tag pages, because it assumes that a post should focus on a single topic and should not need many tags. These design limitations keep makedown very simple, yet powerful.

That said, you can still customize quite a lot.

Customization

Let $root denote the root directory of the blog.

You need to set some site data in the $root/env file:

site_title=makedown
site_url=https://example.com
author_name=Bob
author_email=bob@example.com

Most of this information is used to generate the Atom feed, but $site_title is also used as the title of every index.html.

The template $root/article.tpl is used to render each Markdown file to HTML, and the template $root/index.tpl is used to generate every index.html.

Both are standard pandoc templates. In the current implementation, they share $root/head.tpl and $root/footer.tpl, which you can modify freely.

Now let me explain how makedown works.

Implementation

A static site generator has only two jobs: converting the Markdown files into HTML, and generating indexes of the posts for every directory.

Converting Markdown into HTML

With pandoc, this job is very easy.

Given the following Markdown file, hello.md:

---
title: "Hello, World!"
date: 2023-05-30
---

This is a demo markdown page.

We can convert it into HTML using the following command:

pandoc -o hello.html hello.md

And the generated HTML content is:

<p>This is a demo markdown page.</p>

As you can see, this is not a complete HTML document. To generate a complete one, we need to supply a template, for example:

<!doctype html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>$title$</title>
    $if(description)$
    <meta name="description" content="$description$">
    $endif$
    <meta name="viewport" content="width=device-width,initial-scale=1">
    <link href="/feed.xml" type="application/atom+xml" rel="alternate" title="$site_title$">
    <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/water.css@2/out/water.css">
  </head>
  <body>
    <article>
      <h1>$title$</h1>
      <a rel="author" href="$author_url$">$author_name$</a>
      <date>$date$</date>
$body$
    </article>
    <footer>
      Powered by <a href="https://github.com/taoso/makedown/">Makedown</a>
    </footer>
  </body>
</html>

Pass the template to pandoc with the --template option:

pandoc -o hello.html --template article.tpl hello.md

And you will get the complete HTML file.

The template above uses several variables:

$title$: title of the article (required)
$date$: date of the article (required)
$description$: summary of the content (optional)
$author_url$: URL of the author's website (optional)
$author_name$: name of the author (optional)

All of the above variables can be set in the front matter of the Markdown file:

---
title: Hello, World!
date: 2023-05-30
description: This is a demo markdown page.
author_name: 涛叔
author_url: https://taoshu.in
---

This is a demo markdown page.

In my opinion, writing a description in the front matter is tedious, and a multi-line description in YAML is hard to maintain. So I write the description as the first paragraph of the post and use a Lua filter to extract it and inject it into the $description$ variable.

Here is the lua code:

local desc = ""

function get_desc(blocks)
  for _, block in ipairs(blocks) do
    if block.t == 'Para' then
      desc = pandoc.utils.stringify(block)
      break
    end
  end
  return nil
end

return {{
  Blocks = get_desc,
  Meta = function (meta)
    meta.description = desc
    return meta
  end,
}}

I define a variable, desc. When pandoc processes the Markdown file, it parses the content into lists of blocks and calls the Blocks handler, that is, the get_desc function, on each list.

In get_desc, I iterate over the blocks, take the first block of type Para, and store its text in desc. After the Blocks handlers, pandoc calls the Meta handler, where I store desc in meta.description so that the template can use it.

Save it as desc.lua and use the --lua-filter option to let pandoc run it:

pandoc -o hello.html \
  --template article.tpl \
  --lua-filter path/to/desc.lua \
  hello.md
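If you want to check which paragraph will end up as the description without running pandoc, the following is a rough approximation in awk. It is a swapped-in technique, not what makedown does: it assumes paragraphs are separated by blank lines, skips a leading YAML front matter block and any "#" headings before the paragraph, and does not understand lists, quotes, or code blocks the way pandoc's parser does. The function name first_paragraph is hypothetical.

```shell
#!/usr/bin/env bash
# first_paragraph FILE (hypothetical helper): print the first paragraph of a
# Markdown file as one line, roughly mimicking desc.lua without pandoc.
# Assumptions: paragraphs are separated by blank lines; a leading YAML front
# matter block and any "#" headings before the paragraph are skipped.
# Lists, block quotes and code blocks are NOT recognized as pandoc would.
first_paragraph() {
  awk '
    NR == 1 && $0 == "---" { in_fm = 1; next }    # opening front matter fence
    in_fm { if ($0 == "---") in_fm = 0; next }    # skip front matter lines
    !started && /^#/ { next }                     # skip headings before the paragraph
    /^[ \t]*$/ { if (started) exit; next }        # a blank line ends the paragraph
    { printf "%s%s", (started ? " " : ""), $0; started = 1 }  # join lines with spaces
    END { if (started) print "" }
  ' "$1"
}
```

For the hello.md shown earlier, first_paragraph hello.md prints "This is a demo markdown page.", the same text desc.lua puts into $description$.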

Since $author_name$ and $author_url$ are used in many places, I store such values in a single env file in dotenv format:

author_name=Bob
author_email=bob@example.com

Then there is no need to repeat them in the front matter of every post. But how does pandoc load these variables? With a Lua filter again: I just add a few lines to the Meta handler of the filter above:

return {{
  Blocks = get_desc,
  Meta = function (meta)
    meta.description = desc
    --- load envs
    local envs = pandoc.system.environment()
    for k,v in pairs(envs) do
      if meta[k] == nil then
        meta[k] = v
      end
    end
    return meta
  end,
}}

pandoc.system.environment() returns all environment variables, and the filter copies each one into the meta object unless the front matter already sets it, so templates can use them as variables.

So all we need to do is load and export the variables in env before running pandoc:

set -o allexport
source env
set +o allexport
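To see why allexport matters, here is a small sketch, assuming an env file of simple key=value lines: a plain source only sets shell variables, which child processes such as pandoc cannot see, while sourcing under allexport exports them. The helper names load_env and child_sees are hypothetical.

```shell
#!/usr/bin/env bash
# load_env FILE (hypothetical helper): source a dotenv-style file of simple
# key=value lines and export every variable it assigns, as makedown does
# with set -o allexport before running pandoc.
load_env() {
  set -o allexport
  # shellcheck disable=SC1090
  source "$1"
  set +o allexport
}

# child_sees VAR (hypothetical helper): print VAR as a child process such as
# pandoc would see it; unexported shell variables print nothing.
child_sees() {
  printenv "$1" || true
}
```

Without allexport, source env would still set site_title in the shell, but child_sees site_title (and therefore the Lua filter inside pandoc) would see nothing.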

Now the front matter only needs title and date:

---
title: "Hello, World!"
date: 2023-05-30
---

This is a demo markdown page.

If a variable is set both in the environment and in the front matter, the front matter value takes precedence, because the filter only fills in keys that are missing.

Now that we have generated HTML files for the Markdown files, the next step is to generate an index for every directory.

Generating index files

I assume the number of posts is small, so makedown does no pagination: links to all posts under a directory are listed in a single index.html.

But how should the list be ordered? Posts get edited from time to time, so sorting by last-modified time is a bad idea: the order would change every time we modify a post.

So I sort the lists by date, that is, the creation date of each post, which gives a stable order. I also need two other values: title and path. The title comes from the front matter, like the date, and the path can be obtained with bash or Lua.

I chose a Lua filter for better compatibility and maintainability, and designed a template, meta.tpl, for the metadata:

- { "date": "$date$", "path": "$path$", "title": "$title$", "desc": "$desc$", "updated": "$updated$" }

$date$ and $title$ come from the front matter, $desc$ and $path$ are extracted by a Lua filter, and $updated$ is computed by a bash script and passed to pandoc as metadata.

Here is meta.lua:

local desc = ""

function get_desc(blocks)
  for _, block in ipairs(blocks) do
    if block.t == 'Para' then
      desc = pandoc.utils.stringify(block)
      break
    end
  end
  return nil
end

return {{
  Blocks = get_desc,
  Meta = function (meta)
    meta.desc = desc
    meta.path = "/"..PANDOC_STATE.input_files[1]:sub(1,-3).."html"
    return meta
  end,
}}

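Apart from storing the description in meta.desc instead of meta.description, the only change is the assignment of meta.path: it takes the first input file, drops the trailing "md" and appends "html", so hello.md becomes /hello.html.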
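For comparison, the same path rule can be written with bash parameter expansion. This is a sketch, not part of makedown; md_to_path is a hypothetical name, and unlike the Lua one-liner it also strips a leading "./", which find . puts in front of every path it prints.

```shell
#!/usr/bin/env bash
# md_to_path FILE (hypothetical helper): bash version of the path rule in
# meta.lua, "/" .. file:sub(1, -3) .. "html", i.e. hello.md -> /hello.html.
# Unlike the Lua one-liner, it also strips a leading "./", which find .
# puts in front of every path it prints.
md_to_path() {
  local f=${1#./}                  # drop a leading "./" if present
  printf '/%s.html\n' "${f%.md}"   # replace the .md suffix with .html
}
```

For example, md_to_path ./posts/hello.md prints /posts/hello.html.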

You may wonder why the description is extracted here as well. It is because the same metadata is used to generate the Atom feed.

Finally, we need a bash script, meta.sh, to tie them together:

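#!/usr/bin/env bash
# Print one metadata line (see meta.tpl) for the Markdown file given as $1.

# Last modification time of the file, in UTC
if [[ $(uname -s) == "Darwin" ]]; then
  # BSD date: parse the epoch seconds from stat; -u prints UTC
  updated=$(date -u -jf "%s" "$(stat -f "%m" "$1")" "+%Y-%m-%dT%H:%M:%SZ")
else
  # GNU date: -r uses the modification time of the file
  updated=$(date -u -r "$1" +"%Y-%m-%dT%H:%M:%SZ")
fi

pandoc -f markdown \
  --template "$ROOT_DIR/meta.tpl" \
  --metadata=updated:"$updated" \
  --lua-filter "$ROOT_DIR/meta.lua" \
  -o - "$1"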

Running meta.sh hello.md prints a single YAML line like this:

- { "date": "2023-05-30", "path": "/hello.html", "title": "Hello, World!", "desc": "This is a demo markdown page.", "updated": "2023-05-30T17:08:38Z" }

Keep each entry on a single line: index.sh sorts these lines, and since every line starts with the date, sorting the lines sorts the posts by date.
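A quick way to convince yourself that sorting whole lines is enough: every entry begins with the same prefix followed by a YYYY-MM-DD date, so a reverse byte-wise sort puts the newest post first. The helper newest_first is hypothetical (index.sh simply pipes into sort -r), and the entries are made-up samples.

```shell
#!/usr/bin/env bash
# Each metadata entry is one line beginning with `- { "date": "YYYY-MM-DD"`,
# so a reverse byte-wise sort of whole lines is a newest-first sort by date.
# newest_first is a hypothetical helper; index.sh just pipes into `sort -r`.
newest_first() {
  LC_ALL=C sort -r
}

# Made-up sample entries, deliberately out of order
printf '%s\n' \
  '- { "date": "2023-01-15", "path": "/a.html", "title": "A" }' \
  '- { "date": "2023-05-30", "path": "/hello.html", "title": "Hello, World!" }' \
  '- { "date": "2022-12-01", "path": "/b.html", "title": "B" }' |
  newest_first
# prints the 2023-05-30 entry first and the 2022-12-01 entry last
```

If an entry were split across several lines, sort would scatter its pieces and index.yml would no longer be valid YAML.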

With these metadata files, pandoc can generate the index files. Here is index.sh:

#!/usr/bin/env bash
# Generate index.yml and index.html for every directory under $1

# find all directories and iterate over them
find "$1" -type d | while IFS= read -r dir; do

# load and export env
if [[ -f $dir/env ]]; then
        set -o allexport
        source "$dir/env"
        set +o allexport
fi

# generate index.yml
yml=$dir/index.yml
echo "title: $site_title" > "$yml"
echo "articles:" >> "$yml"

# collect all metadata *.yml files, excluding draft posts, index.yml and feed.yml,
# then sort them newest first and append them to index.yml
find "$dir" -name '*.yml' \
        ! -name "draft-*.yml" \
        ! -name "index.yml" \
        ! -name "feed.yml" \
        -exec cat {} + | sort -r >> "$yml"

# nothing was appended: there is no *.md file under this directory
if [[ "$(tail -n 1 "$yml")" == "articles:" ]]; then
        rm "$yml"
        continue
fi

# finally generate the corresponding index.html
# with another template
index=$dir/index.html
pandoc -f markdown \
        --template index.tpl \
        --metadata-file="$yml" \
        --lua-filter "$LUA_FILTER" \
        -o "$index" /dev/null

done

index.tpl looks like this:

<!doctype html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>$title$</title>
    $if(description)$
    <meta name="description" content="$description$">
    $endif$
    <meta name="viewport" content="width=device-width,initial-scale=1">
    <link href="/feed.xml" type="application/atom+xml" rel="alternate" title="$site_title$">
    <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/water.css@2/out/water.css">
  </head>
  <body>
    <header class="index">
     <h1><a href="/">$title$</a></h1>
    </header>
    <ol id="articles" reversed>
$for(articles)$
      <li><a href="$it.path$">$it.title$</a> <date>$it.date$</date></li>
$endfor$
    </ol>
    <footer>
      Powered by <a href="https://github.com/taoso/makedown/">Makedown</a>
    </footer>
  </body>
</html>

The key part of index.tpl is the $for(articles)$ loop. It iterates over the articles list defined in index.yml, which holds the metadata of all Markdown files under the directory. Inside the loop, $it.XXX$ refers to a field of the current item, for example $it.path$.

Generating the Atom feed is very similar; it just uses another template:

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>$site_title$</title>
  <id>$site_url$/</id>
  <author>
    <name>$author_name$</name>
    <email>$author_email$</email>
  </author>
  <link href="$site_url$"/>
  <link href="$site_url$/feed.xml" rel="self"/>
  <updated>$updated$</updated>
$for(articles)$
  <entry>
    <id>$site_url$$it.path$</id>
    <link href="$site_url$$it.path$"/>
    <title>$it.title$</title>
    <updated>$it.updated$</updated>
    <published>$it.date$T00:00:00Z</published>
    <summary type="html"><![CDATA[$it.desc$]]></summary>
  </entry>
$endfor$
</feed>

That is all the magic makedown uses. What remains is to organize it in a single Makefile:

# find all Markdown files
MDs := $(shell find . -name '*.md')
# list of all *.html targets
HTMLs := $(MDs:.md=.html)
# list of all *.yml (metadata) targets
METAs := $(MDs:.md=.yml)

# get the directory containing this Makefile
PWD := $(shell dirname $(realpath $(firstword $(MAKEFILE_LIST))))

# export some environment variables used by other scripts
export ROOT_DIR = $(PWD)
export LUA_FILTER = $(PWD)/desc.lua

# pattern rule for every metadata file
%.yml: %.md
	$(PWD)/meta.sh $< > $@

# pattern rule for every html file
%.html: %.md head.tpl footer.tpl article.tpl
	pandoc -s -p --wrap=none \
	        --toc \
	        --mathml \
	        --template article.tpl \
	        --highlight-style=pygments \
	        --lua-filter $(LUA_FILTER) \
	        --from markdown+east_asian_line_breaks \
	        $< -o $@

# target to generate index.html for all directories
# it depends on all metadata files
index: $(METAs)
	$(PWD)/index.sh .
	$(PWD)/feed.sh .

# target to generate all index.html and all post html
build: index $(HTMLs)

# target to create a new blog
# (skeleton/. copies the contents of skeleton with both GNU and BSD cp)
new:
	mkdir $(dir)
	cp -R $(PWD)/skeleton/. $(dir)

.PHONY: index build new

One of the most interesting parts of the Makefile above is that it uses find to locate all Markdown files and derives the metadata and HTML targets from them dynamically.

Every time you run make -f path/to/makedown/Makefile build, make only regenerates the metadata (*.yml) and HTML files of articles that are new or have been modified since the last build (HTML files are also rebuilt when their templates change). This makes the build much faster. Furthermore, the -j option lets make run independent jobs in parallel, which speeds up the build even more.
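Under the hood, make decides whether to rebuild each target by comparing modification times. The hypothetical needs_rebuild function below spells out that rule in bash for illustration only; it is a sketch of the idea, not how make is implemented. make already applies this check to every %.yml and %.html target, and with -j it runs the out-of-date ones in parallel.

```shell
#!/usr/bin/env bash
# needs_rebuild TARGET PREREQ... (hypothetical helper): succeed if TARGET is
# missing or any prerequisite is newer than it, the same rule make uses to
# decide whether to run a recipe such as the %.html: %.md rule.
needs_rebuild() {
  local target=$1; shift
  [[ -e $target ]] || return 0              # missing target: build it
  local prereq
  for prereq in "$@"; do
    [[ $prereq -nt $target ]] && return 0   # prerequisite changed since last build
  done
  return 1                                  # up to date: skip
}
```

For instance, after editing hello.md, needs_rebuild hello.html hello.md head.tpl footer.tpl article.tpl succeeds, while untouched posts are skipped.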

Conclusion

This post described makedown, a simple yet powerful static site generator based on make, bash, and pandoc. Thanks to make, makedown builds the blog incrementally and in parallel, which makes it very fast.

¹ https://github.com/taoso/makedown