A Template Expander

Static site generators create HTML pages from templates, directives, and data.
A static site generator has the same core features as a programming language.
Special-purpose mini-languages quickly become as complex as other languages.
Static methods are a convenient way to group functions together.

Terms defined: abstract class, abstract method, Application Programming Interface, Boolean expression, static site generator, truthy

Every program needs documentation, and the best place to put documentation is on the web. Writing and updating HTML pages by hand is time-consuming and error-prone, particularly when many parts are the same. Most modern websites therefore use some kind of static site generator (SSG) to create pages from templates.

Hundreds of SSGs have been written in every popular programming language, and languages like PHP have been invented primarily for this purpose. Most of these systems use one of three designs (Figure 12.1):

Mix commands in an existing language such as JavaScript with the HTML or Markdown using some kind of marker to indicate which parts are commands and which parts are to be taken as-is. This approach is taken by EJS.
Create a mini-language with its own commands like Jekyll. Mini-languages are appealing because they are smaller and safer than general-purpose languages, but eventually they acquire most of the features of a general-purpose language. Again, some kind of marker must be used to show which parts of the page are code and which are ordinary text.
Put directives in specially-named attributes in the HTML. This approach is the least popular, but it eliminates the need for a special parser.

Three options for page templates — Figure 12.1: Three different ways to implement page templating.

This chapter builds a simple page templating system using the third strategy. We will process each page independently by parsing the HTML and walking the DOM to find nodes with special attributes. Our program will execute the instructions in those nodes to implement loops and if/else statements; other nodes will be copied as-is to create text.

Syntax

Let’s start by deciding what “done” looks like. Suppose we want to turn an array of strings into an HTML list. Our template will look like this:

<html>
  <body>
    <ul z-loop="item:names">
      <li><span z-var="item"/></li>
    </ul>
  </body>
</html>

The attribute z-loop tells the tool to repeat the contents of that node; the loop variable and the collection being looped over are separated by a colon. The span with the attribute z-var tells the tool to fill in the node with the value of the variable. When our tool processes this page, the output will be standard HTML without any traces of how it was created:

<html>
<body>
<ul>
<li><span>Johnson</span></li>

<li><span>Vaughan</span></li>

<li><span>Jackson</span></li>
</ul>
</body>
</html>

Human-Readable vs. Machine-Readable

Putting the loop variable and target in a single attribute makes loops easy to type but hides information from standard HTML tools, which can’t know that this attribute contains multiple values separated by a colon. We should use two attributes like this:

<ul z-loop="names" z-loop-var="item">

but we decided to save ourselves a little typing. We should also call our attributes data-something instead of z-something to conform with the HTML5 specification, but again, decided to save ourselves a bit of typing.

The next step is to define the Application Programming Interface (API) for filling in templates. Our tool needs the template itself, somewhere to write its output, and the set of variables to use in the expansion. Those variables might come from a configuration file from a header in the file itself, or from somewhere else entirely, so we will assume the calling program has gotten them somehow and have it pass them into the expansion function as a dictionary (Figure 12.2):

data = {"names": ["Johnson", "Vaughan", "Jackson"]}

dom = read_html("template.html")
expander = Expander(dom, data)
expander.walk()
print(expander.result)

Template API — Figure 12.2: Combining text and data in templating.

Managing Variables

As soon as we have variables, we need a way to track their values. We also need to maintain multiple sets of variables so that (for example) variables used inside a loop don’t conflict with ones used outside of it. As in Chapter 7, we will use a stack of environments, each of which is a dictionary.

Our stack-handling class Env has methods to push and pop new stack frames and find a variable given its name. If the variable can’t be found, Env.find returns None instead of raising an exception:

class Env:
    def __init__(self, initial):
        self.stack = [initial.copy()]

    def push(self, frame):
        self.stack.append(frame)

    def pop(self):
        self.stack.pop()

    def find(self, name):
        for frame in reversed(self.stack):
            if name in frame:
                return frame[name]
        return None

Visiting Nodes

As Chapter 11 explained, HTML pages are usually stored in memory as trees and processed using the Visitor pattern. We therefore create a Visitor class whose constructor takes the root node of the DOM tree as an argument and saves it. Calling Visitor.walk without a value starts recursion from that saved root; when .walk is given a value (as it is during recursive calls), it uses that instead.

class Visitor:
    def __init__(self, root):
        self.root = root

    def walk(self, node=None):
        if node is None:
            node = self.root
        if self.open(node):
            for child in node.children:
                self.walk(child)
        self.close(node)

    def open(self, node):
        raise NotImplementedError("open")

    def close(self, node):
        raise NotImplementedError("close")

Visitor defines two abstract methods open and close that are called when we first arrive at a node and when we are finished with it. These methods are called “abstract” because we can’t actually use them: any attempt to do so will raise an exception, which means child classes must override them. (In object-oriented terminology, this means that Visitor is an abstract class.) This approach is different from that of the visitor in Chapter 11, where we defined do-nothing methods so that derived classes could override only the ones they needed.

The Expander class is specialization of Visitor that uses an Env to keep track of variables. It imports handlers for each type of special node—we will explore those in a moment—and saves them along with a newly-created environment and a list of strings making up the output:

class Expander(Visitor):
    def __init__(self, root, variables):
        super().__init__(root)
        self.env = Env(variables)
        self.handlers = HANDLERS
        self.result = []

When recursion encounters a new node, it calls open to do one of three things:

If the node is plain text, copy it to the output.
If there is a handler for the node, call the handler’s open or close method.
Otherwise, open a regular tag.

    def open(self, node):
        if isinstance(node, NavigableString):
            self.output(node.string)
            return False
        elif self.hasHandler(node):
            return self.getHandler(node).open(self, node)
        else:
            self.showTag(node, False)
            return True

Expander.close works much the same way. Both methods find handlers by comparing the DOM node’s attributes to the keys in the dictionary of handlers built during construction:

    def hasHandler(self, node):
        return any(
            name in self.handlers
            for name in node.attrs
        )

    def getHandler(self, node):
        possible = [
            name for name in node.attrs
            if name in self.handlers
        ]
        assert len(possible) == 1, "Should be exactly one handler"
        return self.handlers[possible[0]]

Finally, we need a few helper methods to show tags and generate output:

    def showTag(self, node, closing):
        if closing:
            self.output(f"</{node.name}>")
            return
        self.output(f"<{node.name}")
        for name in node.attrs:
            if not name.startswith("z-"):
                self.output(f' {name}="{node.attrs[name]}"')
        self.output(">")

    def output(self, text):
        self.result.append("UNDEF" if text is None else text)

    def getResult(self):
        return "".join(self.result)

Notice that Expander adds strings to an array and joins them all right at the end rather than concatenating strings repeatedly. Doing this is more efficient; it also helps with debugging, since each string in the array corresponds to a single method call.

Implementing Handlers

Our last task is to implement the handlers for filling in variables’ values, looping, and so on. We could define an abstract class with open and close methods, derive one class for each of the template expander’s capabilities, and then construct one instance of each class for Expander to use, but there’s a simpler way. When Python executes the statement import something it executes the file something.py, saves the result in a specialized dictionary-like object, and assigns that object to the variable something. That object can also be saved in data structures like lists and dictionaries or passed as an argument to a function just like numbers, functions, and classes—remember, programs are just data.

Let’s write a pair of functions that each take an expander and a node as inputs and expand a DOM node with a z-num attribute to insert a number into the output:

def open(expander, node):
    expander.showTag(node, False)
    expander.output(node.attrs["z-num"])

def close(expander, node):
    expander.showTag(node, True)

When we enter a node like <span z-num="123"/> this handler asks the expander to show an opening tag followed by the value of the z-num attribute. When we exit the node, the handler asks the expander to close the tag. The handler doesn’t know whether things are printed immediately, added to an output list, or something else; it just knows that whoever called it implements the low-level operations it needs.

Here’s how we connect this handler (and others we’re going to write in a second) to the expander:

import z_if
import z_loop
import z_num
import z_var

HANDLERS = {
    "z-if": z_if,
    "z-loop": z_loop,
    "z-num": z_num,
    "z-var": z_var
}

The HANDLERS dictionary maps the names of special attributes in the HTML to modules, each of which defines open and close functions for the expander to call. In other words, we are using modules to prevent name collision just as we would use classes or functions.

The handlers for variables are:

def open(expander, node):
    expander.showTag(node, False)
    expander.output(expander.env.find(node.attrs["z-var"]))

def close(expander, node):
    expander.showTag(node, True)

This code is almost the same as the previous example. The only difference is that instead of copying the attribute’s value directly to the output, we use it as a key to look up a value.

These two pairs of handlers look plausible, but do they work? To find out, we can build a program that loads variable definitions from a JSON file, reads an HTML template using the Beautiful Soup module, and does the expansion:

import json
import sys
from bs4 import BeautifulSoup
from expander import Expander

def main():
    with open(sys.argv[1], "r") as reader:
        variables = json.load(reader)

    with open(sys.argv[2], "r") as reader:
        doc = BeautifulSoup(reader.read(), "html.parser")
        template = doc.find("html")

    expander = Expander(template, variables)
    expander.walk()
    print(expander.getResult())

if __name__ == "__main__":
    main()

We added new variables for our test cases one by one as we were writing this chapter. To avoid repeating text repeatedly, here’s the entire set:

{
  "firstVar": "firstValue",
  "secondVar": "secondValue",
  "varName": "varValue",
  "yes": true,
  "no": false,
  "names": ["Johnson", "Vaughan", "Jackson"]
}

Our first test checks whether static text is copied over as-is:

<html>
  <body>
    <h1>Static Text</h1>
    <p>test</p>
  </body>
</html>

<html>
<body>
<h1>Static Text</h1>
<p>test</p>
</body>
</html>

Good. Now, does the expander handle constants?

<html>
  <body>
    <p><span z-num="123"/></p>
  </body>
</html>

<html>
<body>
<p><span>123</span></p>
</body>
</html>

What about a single variable?

<html>
  <body>
    <p><span z-var="varName"/></p>
  </body>
</html>

<html>
<body>
<p><span>varValue</span></p>
</body>
</html>

What about a page containing multiple variables? There’s no reason it should fail if the single-variable case works, but we should still check—again, software isn’t done until it has been tested.

<html>
  <body>
    <p><span z-var="firstVar" /></p>
    <p><span z-var="secondVar" /></p>
  </body>
</html>

<html>
<body>
<p><span>firstValue</span></p>
<p><span>secondValue</span></p>
</body>
</html>

Generating Element IDs

It’s often handy to have a unique identifier for every element in a page, so some templating engines automatically generate id attributes for elements that don’t specify IDs explicitly. If you do this, please do not generate random numbers, because then Git and other version control systems will think a regenerated page has changed when it actually hasn’t. Generating sequential IDs is equally problematic: if you add an item to a list at the top of the page, for example, that might change the IDs for all of the items in subsequent (unrelated) lists.

Control Flow

Our tool supports conditional expressions and loops. Since we’re not implementing Boolean expressions like and and or, all we have to do for a condition is look up a variable and then expand the node if Python thinks the variable’s value is truthy:

def open(expander, node):
    check = expander.env.find(node.attrs["z-if"])
    if check:
        expander.showTag(node, False)
    return check

def close(expander, node):
    if expander.env.find(node.attrs["z-if"]):
        expander.showTag(node, True)

Let’s test it:

<html>
  <body>
    <p z-if="yes">Should be shown.</p>
    <p z-if="no">Should <em>not</em> be shown.</p>
  </body>
</html>

<html>
<body>
<p>Should be shown.</p>

</body>
</html>

Spot the Bug

This implementation of if contains a subtle bug. open and close both check the value of the control variable. If something inside the body of the if changes that value, the result could be an opening tag without a matching closing tag or vice versa. We haven’t implemented an assignment operator, so right now there’s no way for that to happen, but it’s a plausible thing for us to add later, and tracking down a bug in old code that is revealed by new code is always a headache.

Finally we have loops. For these, we need to get the array we’re looping over from the environment and do the following for each item it contains:

Create a new stack frame holding the current value of the loop variable.
Expand all of the node’s children with that stack frame in place.
Pop the stack frame to get rid of the temporary variable.

def open(expander, node):
    index_name, target_name = node.attrs["z-loop"].split(":")
    expander.showTag(node, False)
    target = expander.env.find(target_name)
    for value in target:
        expander.env.push({index_name: value})
        for child in node.children:
            expander.walk(child)
        expander.env.pop()
    return False

def close(expander, node):
    expander.showTag(node, True)

Once again, it’s not done until we test it:

<html>
  <body>
    <ul z-loop="item:names">
      <li><span z-var="item"/></li>
    </ul>
  </body>
</html>

<html>
<body>
<ul>
<li><span>Johnson</span></li>

<li><span>Vaughan</span></li>

<li><span>Jackson</span></li>
</ul>
</body>
</html>

We have just implemented another simple programming language like the one in Chapter 7. It’s unlikely that anyone would want to use it as-is, but adding a new feature is now as simple as writing a matching pair of open and close functions.

Summary

Figure 12.3 summarizes the key ideas in this chapter, some of which we first encountered in Chapter 7. Please see Appendix B for extra material related to these ideas.

Concept map for HTML templating — Figure 12.3: HTML templating concept map.

Exercises

Tracing Execution

Add a directive <span z-trace="variable"/> that prints the current value of a variable for debugging.

Unit Tests

Write unit tests for template expansion using pytest.

Sub-keys

Modify the template expander so that a variable name like person.name looks up the "name" value in a dictionary called "person" in the current environment.

Literal Text

Add a directive <div z-literal="true">…</div> that copies the enclosed text as-is without interpreting or expanding any contained directives. (A directive like this would be needed when writing documentation for the template expander.)

Including Other Files

Add a directive <div z-include="filename.html"/> that includes another file in the file being processed.
Should included files be processed and the result copied into the including file, or should the text be copied in and then processed? What difference does it make to the way variables are evaluated?

HTML Snippets

Add a directive <div z-snippet="variable">…</div> that saves some text in a variable so that it can be displayed later. For example:

<html>
  <body>
    <div z-snippet="prefix"><strong>Important:</strong></div>
    <p>Expect three items</p>
    <ul>
      <li z-loop="item:names">
        <span z-var="prefix"><span z-var="item"/>
      </li>
    </ul>
  </body>
</html>

would print the word “Important:” in bold before each item in the list.

YAML Headers

Modify the template expander to handle variables defined in a YAML header in the page being processed. For example, if the page is:

---
name: "Dorothy Johnson Vaughan"
---
<html>
  <body>
    <p><span z-var="name"/></p>
  </body>
</html>

will create a paragraph containing the given name.

Expanding All Files

Write a program expand_all.py that takes two directory names as command-line arguments and builds a website in the second directory by expanding all of the HTML files found in the first or in sub-directories of the first.

Counting Loops

Add a directive <div z-index="indexName" z-limit="limitName">…</div> that loops from zero to the value in the variable limitName, putting the current iteration index in indexName.

Boolean Expression

Design and implement a way to express the Boolean operators and and or.

Element IDs

The callout earlier said that templating systems should not generate random or sequential IDs for elements. A colleague of yours has proposed generating the IDs by hashing the element’s content, since this will stay the same as long as the content does. What are the pros and cons of doing this?