Generating Documentation


Terms defined: unparsing

Many programmers believe they’re more likely to write documentation and keep it up to date if it is close to the code. Tools that extract specially-formatted comments from code and turn them into documentation have been around since at least the 1980s; both Sphinx and MkDocs are popular ones for Python.

Generating documentation isn’t the same as checking code style, but they share some tooling. Let’s start by building a NodeVisitor that extracts and saves docstrings:

import ast
from pathlib import Path


class Extract(ast.NodeVisitor):
    """Extraction class."""

    @staticmethod
    def extract(filenames):
        """Entry-level method."""
        extracter = Extract()
        for filename in filenames:
            with open(filename, "r") as reader:
                source = reader.read()
                tree = ast.parse(source)
                module_name = Path(filename).stem
                extracter.extract_from(module_name, tree)
        return extracter.seen

The code to create a stack, extract docstrings, and save them in a dictionary should look familiar by now:

    def __init__(self):
        """Constructor."""
        super().__init__()
        self.stack = []
        self.seen = {}

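    def visit_ClassDef(self, node):
        """Get docstring from class."""
        self.save("class", node.name, node)
        self.generic_visit(node)
        self.stack.pop()

    def visit_FunctionDef(self, node):
        """Get docstring from function or method."""
        self.save("function", node.name, node)
        self.generic_visit(node)
        self.stack.pop()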

    def extract_from(self, module_name, tree):
        """Start extraction for a module."""
        self.save("module", module_name, tree)
        self.visit(tree)
        self.stack.pop()

    def save(self, kind, name, node):
        """Save information about a docstring."""
        self.stack.append(name)
        docstring = ast.get_docstring(node)
        self.seen[".".join(self.stack)] = (kind, docstring)
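The stack-and-dotted-key idea can be tried in isolation. The following is a minimal sketch, not the chapter's class: the helper `collect`, the module name `demo`, and the `SOURCE` string are all made up for illustration, and the helper only looks at definitions that sit directly in a module or class body rather than visiting every node.

```python
import ast

SOURCE = '''"""Docstring for module."""

def function(param):
    """Docstring for function."""

class Sample:
    """Docstring for class."""

    def rename(self, new_name):
        """Docstring for method."""
'''


def collect(node, name, kind, seen, stack):
    """Record the docstring of `node` under its dotted name, then recurse."""
    stack.append(name)
    seen[".".join(stack)] = (kind, ast.get_docstring(node))
    for child in node.body:
        if isinstance(child, ast.ClassDef):
            collect(child, child.name, "class", seen, stack)
        elif isinstance(child, ast.FunctionDef):
            collect(child, child.name, "function", seen, stack)
    stack.pop()  # leave this scope so sibling names are not nested under it
    return seen


seen = collect(ast.parse(SOURCE), "demo", "module", {}, [])
for key, (kind, docstring) in sorted(seen.items()):
    print(f"{key}: {kind}: {docstring}")
```

Pushing a name before recording and popping it afterwards is what makes `rename` appear as `demo.Sample.rename` rather than as a top-level function.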

To format the docstrings, we build a Markdown page that uses module, class, and function names as headings, then convert that page to HTML:

import markdown

HEADING = {"module": "#", "class": "##", "function": "##"}

MISSING = "**No documentation.**"


def format(docstrings):
    """Convert dictionary of docstrings to HTML page."""
    result = []
    for key, (kind, docstring) in sorted(docstrings.items()):
        result.append(make_heading(kind, key))
        result.append(docstring if docstring is not None else MISSING)
    result = "\n\n".join(result)
    return markdown.markdown(result, extensions=["markdown.extensions.extra"])


def format_key(key):
    return key.replace(".", "-").replace("_", r"\_")


def make_heading(kind, key):
    return f"{HEADING[kind]} `{key}` {{: #{format_key(key)}}}"
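It can help to look at the intermediate Markdown before it is converted. The sketch below reuses the heading logic from the text but stops short of calling the `markdown` library, so it runs with the standard library alone; the function name `to_markdown` and the `sample` dictionary are illustrative assumptions.

```python
HEADING = {"module": "#", "class": "##", "function": "##"}
MISSING = "**No documentation.**"


def make_heading(kind, key):
    """Build a heading with an attribute-list anchor, as in the text."""
    anchor = key.replace(".", "-").replace("_", r"\_")
    return f"{HEADING[kind]} `{key}` {{: #{anchor}}}"


def to_markdown(docstrings):
    """Return the Markdown text that would be handed to the converter."""
    parts = []
    for key, (kind, docstring) in sorted(docstrings.items()):
        parts.append(make_heading(kind, key))
        parts.append(docstring if docstring is not None else MISSING)
    return "\n\n".join(parts)


# Hypothetical input: one documented module, one undocumented function.
sample = {
    "demo": ("module", "Docstring for module."),
    "demo.undocumented": ("function", None),
}
print(to_markdown(sample))
```

The `{: #...}` suffix on each heading is attribute-list syntax, which the "extra" extension understands; it is what produces the `id` attributes in the HTML output.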

If our input file looks like this:

"""Docstring for module."""


def function(param):
    """Docstring for function."""


def undocumented(param):
    pass


class Sample:
    """Docstring for class."""

    def __init__(self, name):
        """Docstring for constructor."""
        self.name = name

    def rename(self, new_name):
        """Docstring for method."""
        self.name = new_name

then our output is:

<h1 id="doc_sample"><code>doc_sample</code></h1>
<p>Docstring for module.</p>
<h2 id="doc_sample-Sample"><code>doc_sample.Sample</code></h2>
<p>Docstring for class.</p>
<h2 id="doc_sample-Sample-__init__"><code>doc_sample.Sample.__init__</code></h2>
<p>Docstring for constructor.</p>
<h2 id="doc_sample-Sample-rename"><code>doc_sample.Sample.rename</code></h2>
<p>Docstring for method.</p>
<h2 id="doc_sample-function"><code>doc_sample.function</code></h2>
<p>Docstring for function.</p>
<h2 id="doc_sample-undocumented"><code>doc_sample.undocumented</code></h2>
<p><strong>No documentation.</strong></p>

Modifying Code

An AST is a data structure like any other, which means we can modify it as well as inspecting it. Let’s start with this short program:

def double(x):
    return 2 * x

print(double(3))

Its AST has two top-level nodes: one for the function definition and one for the call to print. We can duplicate the second of these and then unparse the AST, i.e., convert it back into source code, to produce a new program:

code = ast.parse(original)  # `original` holds the source text shown above
print_stmt = code.body[1]
code.body.append(print_stmt)
modified = ast.unparse(code)
print(modified)

def double(x):
    return 2 * x
print(double(3))
print(double(3))

To run our machine-generated program, we have to compile the AST to bytecode and tell Python to evaluate the result:

bytecode = compile(code, filename="example", mode="exec")
exec(bytecode)

6
6
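The parse, modify, compile, and execute steps can be combined into one runnable sketch. Capturing standard output with `io.StringIO` and running in a fresh, empty namespace are choices made here for illustration, not something the text prescribes.

```python
import ast
import contextlib
import io

original = "def double(x):\n    return 2 * x\n\nprint(double(3))\n"

code = ast.parse(original)
code.body.append(code.body[1])  # the same node object now appears twice

bytecode = compile(code, filename="example", mode="exec")

buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    exec(bytecode, {})  # run the program in a fresh namespace
captured = buffer.getvalue()
print(captured, end="")
```

Note that the duplicated statement is the same object listed twice, not a copy, so a later change to that node would show up in both places.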

Duplicating a print statement isn’t particularly useful, but other applications of this technique let us do some powerful things. Let’s have another look at how Python represents a function call. Our example is:

count("name")

We parse it like this:

call_code = ast.parse(call)

and get this AST:

Module(
  body=[
    Expr(
      value=Call(
        func=Name(id='count', ctx=Load()),
        args=[
          Constant(value='name')],
        keywords=[]))],
  type_ignores=[])

But we don’t have to parse text to create an AST: it’s just a bunch of objects, so we can construct one by hand that mirrors the structure shown above:

def make_count(name):
    return ast.Expr(
        value=ast.Call(
            func=ast.Name(id="count", ctx=ast.Load()),
            args=[ast.Constant(value=name)],
            keywords=[]
        )
    )

constructed = make_count("test")
print(ast.dump(constructed, indent=2))

Expr(
  value=Call(
    func=Name(id='count', ctx=Load()),
    args=[
      Constant(value='test')],
    keywords=[]))
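A hand-built node carries no line or column information, which `compile` generally expects, so one way to run it directly is to wrap it in a module and call `ast.fix_missing_locations` first. This is a sketch under that assumption; the `recorded` list is a hypothetical stand-in for a real `count` function.

```python
import ast


def make_count(name):
    """Build the AST for `count(<name>)` by hand, as in the text."""
    return ast.Expr(
        value=ast.Call(
            func=ast.Name(id="count", ctx=ast.Load()),
            args=[ast.Constant(value=name)],
            keywords=[],
        )
    )


module = ast.Module(body=[make_count("test")], type_ignores=[])
ast.fix_missing_locations(module)  # fill in lineno/col_offset for new nodes

recorded = []  # hypothetical stand-in: "counting" just appends the name
bytecode = compile(module, filename="example", mode="exec")
exec(bytecode, {"count": recorded.append})
print(recorded)
```

Unparsing to text and compiling that text, as the rest of this section does, sidesteps the location problem because the parser assigns positions itself.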

Alternatively, we can find existing function definitions and modify them programmatically:

def modify(text):
    code = ast.parse(text)
    for node in ast.walk(code):
        if isinstance(node, ast.FunctionDef):
            node.body = [make_count(node.name), *node.body]
    return ast.unparse(code)

To try this out, here’s a program that adds and doubles numbers:

def add(left, right):
    return left + right

def double(x):
    return add(x, x)

add(1, 2)
double(3)

The modified version is:

def add(left, right):
    count('add')
    return left + right

def double(x):
    count('double')
    return add(x, x)
add(1, 2)
double(3)
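The same rewrite can also be expressed with `ast.NodeTransformer`, the standard library's visitor for tree modification. The sketch below is an alternative to the `ast.walk` loop, not the text's own method; the class name `AddCount` is made up, and like the original it inserts the call before any docstring, which would stop that string from being treated as the function's docstring.

```python
import ast


class AddCount(ast.NodeTransformer):
    """Insert `count('<name>')` at the top of every function definition."""

    def visit_FunctionDef(self, node):
        self.generic_visit(node)  # handle nested definitions first
        call = ast.parse(f"count({node.name!r})").body[0]
        node.body.insert(0, call)
        return node


text = "def add(left, right):\n    return left + right\n"
tree = AddCount().visit(ast.parse(text))
ast.fix_missing_locations(tree)
result = ast.unparse(tree)
print(result)
```

Parsing a small snippet to obtain the new node is a shortcut; building it by hand with `make_count` would work equally well here.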

So what exactly is count? We want a “function” that keeps track of how many times it has been passed each distinct string, so we define a class with a __call__ method so that its instances can be used like functions:

from collections import Counter


class CountCalls:
    def __init__(self):
        self.count = Counter()

    def __call__(self, name):
        self.count[name] += 1

Finally, when we’re evaluating the bytecode generated from our modified AST, we pass in a dictionary of variable names and values that we want to have in scope. The result is exactly what we would get if we had defined all of this in the usual way:

call_counter = CountCalls()
bytecode = compile(modified, filename="example", mode="exec")
exec(bytecode, {"count": call_counter})
print(call_counter.count)

Counter({'add': 2, 'double': 1})

There’s Such a Thing as “Too Clever”

Modifying code dynamically is the most powerful technique shown in this book. It is also the least comprehensible: as soon as the code you read and the code that’s run can differ in arbitrary ways, you have a maintenance headache and a security nightmare. Limited forms of program modification, such as Python’s metaclasses or decorators, give most of the power with only some of the headaches; please use those rather than the magic shown above.
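For comparison, here is how call counting might look with a decorator. This is a minimal sketch rather than a drop-in replacement for the AST approach: the names `counted` and `calls` are hypothetical, and each function has to opt in explicitly, which is exactly what keeps the behavior visible in the source.

```python
from collections import Counter
from functools import wraps

calls = Counter()


def counted(func):
    """Decorator that records each call to `func` in `calls`."""

    @wraps(func)
    def wrapper(*args, **kwargs):
        calls[func.__name__] += 1
        return func(*args, **kwargs)

    return wrapper


@counted
def add(left, right):
    return left + right


@counted
def double(x):
    return add(x, x)


add(1, 2)
double(3)
print(calls)
```

Because `double` calls the decorated `add`, the counts match those produced by the rewritten program above.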

Exercises

Name Conversion

Write a tool that finds functions with pothole_case names and replaces them with CamelCase names, then saves the resulting program as a legal Python file.