expLog

A Notebook Style Guide

This note is an attempt at writing the rules I try to follow for computational notebooks that are meant to be read.

Like any good style guide, it has inherent contradictions, best addressed with a liberal dose of context and a pinch of personal taste.

The Style Guide

Explicitly choose global state, and modify it with pure functions

This rule contradicts the later one about following best practices in software engineering, which generally discourage global state. Even so, carefully choose the global state that represents the central concept of the notebook.

One common example is query results that are then analyzed by the notebook.

Maintain pure functions that can transform this state throughout the notebook; avoid side effects because they can be painfully hard to debug in a REPL-like environment.

Writing transforms as pure functions also allows for quick and simple inline tests.
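As a minimal sketch of this rule, assume the notebook's global state is a list of query result rows. The names QUERY_RESULTS and add_duration_ms, and the row fields, are hypothetical and chosen only for illustration. The transform returns a new list instead of mutating its input, so an inline assertion can check it in the same cell.

```python
# Hypothetical global state: the query results this notebook analyzes.
QUERY_RESULTS = [
    {"job": "a", "start_s": 1.0, "end_s": 1.5},
    {"job": "b", "start_s": 2.0, "end_s": 4.0},
]


def add_duration_ms(rows):
    """Return new rows with a duration_ms field; the input is left untouched."""
    return [
        {**row, "duration_ms": (row["end_s"] - row["start_s"]) * 1000}
        for row in rows
    ]


# Quick inline test: the transform works and has no side effects.
sample = [{"job": "x", "start_s": 0.0, "end_s": 0.25}]
assert add_duration_ms(sample)[0]["duration_ms"] == 250.0
assert "duration_ms" not in sample[0]

# Rebind the global state explicitly, in one visible place.
QUERY_RESULTS = add_duration_ms(QUERY_RESULTS)
```

Rebinding the global on a single line keeps the state change easy to spot when reading or re-running the notebook.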

Each cell should be responsible for one thing.

A cell can define a function, a class, or a snippet of code to be executed. Alternatively, it can be one paragraph or section of text in the notebook.

Maintaining tight, one-idea cells makes for cleaner diffs and clearer histories for notebooks maintained in source control.

Liberally include assertions and tests through the notebook.

A quick assertion or simple unit test at the end of any function or class definition can prove invaluable in debugging and extending notebooks.

Assertions also allow for quick iteration: pressing Shift + Enter while working on a cell re-runs them and sanity checks its contents.
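A sketch of what such a cell might look like; the normalize function is a hypothetical example, not something from a particular notebook. The assertions sit directly under the definition, so they run every time the cell does.

```python
def normalize(values):
    """Scale values so they sum to 1; inputs that sum to zero are rejected."""
    total = sum(values)
    if total == 0:
        raise ValueError("cannot normalize values that sum to zero")
    return [v / total for v in values]


# Sanity checks that re-run with the cell on every Shift + Enter.
assert normalize([1, 1, 2]) == [0.25, 0.25, 0.5]
assert abs(sum(normalize([3, 5, 7])) - 1.0) < 1e-9
```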

Notebooks must be written like prose

As true as this statement is for code, it's even more true for a Notebook. A good notebook must be written keeping the audience in mind: emphasizing code and prose appropriately.

Accordingly, style guides for writing apply perfectly: I personally prefer On Writing Well.

Structure the notebook clearly with well-defined sections.

Use headings liberally to structure the notebook into digestible pieces.

Most reasonable renderers will also generate a Table of Contents to make headings even more valuable for quickly navigating the document and getting a quick overview.

Notebooks should follow best practices for programming.

Code within notebooks should be carefully structured to stand well by itself as a program.

The standards we've adopted for good design don't disappear because it's an interactive environment:

  • abstract well, and have consistent levels of abstraction
  • balance coupling and cohesion
  • trade off YAGNI and DRY as appropriate

An interactive environment gives even more opportunities to get it right and refactor quickly, although tooling for refactoring in notebooks tends to be somewhat broken.

Simple rules of software engineering also apply: stick to the PEPs, avoid lint errors and maintain conventions.

Notebooks should be reproducible.

Reproducibility depends on the nature of the notebook: it doesn't necessarily mean that re-running a notebook should produce exactly the same outputs, but the central thesis of the notebook should stand.

The underlying data, or the random values feeding a notebook, should be allowed to update without breaking the notebook.
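One way to let random inputs change without breaking the notebook is to keep the seed in a single, clearly named place and assert on properties rather than exact values. The sketch below uses the standard library's random.Random; SEED and sample_latencies are hypothetical names.

```python
import random

SEED = 42  # Hypothetical: change this and the notebook should still hold up.


def sample_latencies(n, seed):
    """Draw n synthetic latencies in [0, 1) from a locally seeded generator."""
    rng = random.Random(seed)  # local generator; global random state untouched
    return [rng.random() for _ in range(n)]


latencies = sample_latencies(1000, SEED)

# Assert on properties that hold for any seed, not on exact values.
assert len(latencies) == 1000
assert all(0.0 <= x < 1.0 for x in latencies)
# Same seed, same data: re-running the cell is repeatable.
assert latencies == sample_latencies(1000, SEED)
```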

While it may not be feasible to snapshot and include all the data used within a notebook, where and how to access it should be clearly documented.

Similarly, there should be a clear description of the packages, libraries and potentially even hardware required to re-run the notebook.
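Part of that description can be generated by the notebook itself. The following is a rough sketch using only the standard library; describe_environment is a hypothetical helper, and the package names passed to it are placeholders for whatever the notebook actually imports.

```python
import platform
import sys
from importlib import metadata


def describe_environment(packages):
    """Return a dict of interpreter, OS, and installed package versions."""
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info


# Hypothetical package list; replace with what the notebook imports.
for key, value in describe_environment(["numpy", "pandas"]).items():
    print(f"{key}: {value}")
```

Leaving this cell's output in the published notebook records the environment the results were produced in.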

Notebooks should be executable directly with a "run-all".

Few things signal a sloppy notebook more than one which fails to execute with "Run all cells".

Ensure that functions and variables are available in the right order.

One simple sanity check is to execute "Run All" successfully as a precursor to publishing a notebook.
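A complementary check can be run on the saved file. After a fresh top-to-bottom run, the code cells in an .ipynb file carry execution counts 1, 2, 3, and so on, so out-of-order counts hint at stale state. The sketch below assumes the notebook has already been loaded as a dict; ran_top_to_bottom and the sample notebooks are hypothetical.

```python
def ran_top_to_bottom(notebook):
    """True if non-empty code cells carry execution counts 1, 2, 3, ... in order."""
    counts = [
        cell.get("execution_count")
        for cell in notebook["cells"]
        if cell["cell_type"] == "code" and "".join(cell.get("source", "")).strip()
    ]
    return counts == list(range(1, len(counts) + 1))


# Hypothetical, hand-written notebook fragments in the .ipynb JSON layout.
clean = {"cells": [
    {"cell_type": "markdown", "source": ["# Title"]},
    {"cell_type": "code", "source": ["x = 1"], "execution_count": 1},
    {"cell_type": "code", "source": ["x + 1"], "execution_count": 2},
]}
stale = {"cells": [
    {"cell_type": "code", "source": ["y + 1"], "execution_count": 7},
    {"cell_type": "code", "source": ["y = 1"], "execution_count": 3},
]}

assert ran_top_to_bottom(clean)
assert not ran_top_to_bottom(stale)
```

A real notebook can be loaded with json.load before calling the function. This only shows that the saved outputs are consistent with a clean run; it does not replace actually running all cells.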

Minimize noise from unintentional output

Libraries and function calls can be noisy, generating query progress messages, incremental logs, progress bars, or other unnecessary output.

Eliminate these to minimize visual noise in the notebook.

At the same time, be very intentional about retaining all potentially useful information for anyone simply reading the notebook.

For example, %%capture in a Jupyter notebook can help suppress unnecessary output.
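%%capture is an IPython cell magic that goes on the first line of a cell. For finer control inside a cell, or in plain Python, the standard library's contextlib.redirect_stdout gives a similar effect. The sketch below silences one noisy call and keeps its output in a buffer in case it is needed later; noisy_query is a hypothetical stand-in for a chatty library call.

```python
import contextlib
import io


def noisy_query():
    """Hypothetical stand-in for a library call that prints progress."""
    for step in range(3):
        print(f"progress: {step + 1}/3")
    return [1, 2, 3]


# Swallow stdout for the noisy call only, but keep it around in case it matters.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    rows = noisy_query()

print(f"fetched {len(rows)} rows")  # the one line worth showing readers
```

Only output written to stdout is redirected this way; warnings and logs sent to stderr would need their own handling.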

References

Books, papers, etc.

  • Donald E. Knuth. 1984. Literate Programming. The Computer Journal. British Computer Society 27 (2): 97–111. 10.1093/comjnl/27.2.97