#+TITLE: A Notebook Style Guide

This note is an attempt at writing the rules I try to follow for
computational notebooks that are meant to be read.

Like any good style guide, it has inherent contradictions. These are
best addressed with a liberal dose of context and a pinch of personal
taste.

* The Style Guide

** Explicitly choose global state, and modify it with pure functions
This rule contradicts the later one about following best practices in
software engineering, but a notebook benefits from a small amount of
carefully chosen global state: the central concept of the notebook.

One common example is query results that are then analyzed by the
notebook.

Maintain pure functions that can transform this state throughout the
notebook; avoid side effects because they can be painfully hard to
debug in a REPL-like environment.

Writing transforms as pure functions also allows for quick and simple
inline tests.
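
A minimal sketch of this rule, assuming the query results are a list
of dictionaries. The names ~QUERY_RESULTS~ and ~add_total~, and the
sample rows, are hypothetical and exist only for illustration.

```python
# Hypothetical global state: the query results the notebook analyzes.
QUERY_RESULTS = [
    {"region": "north", "units": 3, "unit_price": 2.0},
    {"region": "south", "units": 5, "unit_price": 1.5},
]


def add_total(rows):
    """Return new rows with a 'total' field; the input is left untouched."""
    return [{**row, "total": row["units"] * row["unit_price"]} for row in rows]


# Inline test: the transform computes totals and does not mutate its input.
with_totals = add_total(QUERY_RESULTS)
assert [row["total"] for row in with_totals] == [6.0, 7.5]
assert "total" not in QUERY_RESULTS[0]
```

Because ~add_total~ returns new rows instead of editing the global in
place, re-running the cell any number of times gives the same result.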

** Each cell should be responsible for one thing.
A cell can define a function, a class, or a snippet of code to be
executed. Alternatively, it can be one paragraph or section of text in
the notebook.

Maintaining tight, one-idea cells makes for cleaner diffs and clearer
histories for notebooks maintained in source control.

** Liberally include assertions and tests through the notebook.
A quick assertion or simple unit test at the end of any function or
class definition can prove invaluable in debugging and extending
notebooks.

Assertions also allow for quick iteration: re-run a cell with Shift +
Enter while editing it, and the assertions sanity check its contents
immediately.
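
As a small illustration, a cell might define a function and check it
in the same breath. The function ~normalize~ here is a hypothetical
example, not taken from any particular notebook.

```python
def normalize(values):
    """Scale numbers so that they sum to 1."""
    total = sum(values)
    if total == 0:
        raise ValueError("cannot normalize values that sum to zero")
    return [value / total for value in values]


# Quick checks that run every time the cell is re-executed.
assert normalize([1, 1, 2]) == [0.25, 0.25, 0.5]
assert abs(sum(normalize([3, 4, 5])) - 1.0) < 1e-9
```

If an edit to the function breaks it, the cell fails right away
instead of surfacing as a confusing result several cells later.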

** Notebooks must be written like prose
As true as this statement is for code, it's even more true for a
notebook. A good notebook must be written with the audience in mind,
emphasizing code and prose appropriately.

Accordingly, style guides for writing apply perfectly: I personally
prefer On Writing Well.

** Structure the notebook clearly with well-defined sections.
Use headings liberally to structure the notebook into digestible
pieces. 

Most reasonable renderers will also generate a Table of Contents to
make headings even more valuable for quickly navigating the document
and getting a quick overview.

** Notebooks should follow best practices for programming.
Code within notebooks should be carefully structured to stand well by
itself as a program.

The standards we've adopted for good design don't disappear because
it's an interactive environment: 
- abstract well, and have consistent levels of abstraction;
- balance coupling and cohesion;
- trade off YAGNI and DRY as appropriate.

An interactive environment gives even more opportunities to get it
right and refactor quickly, although tooling for refactoring in
notebooks tends to be somewhat broken.

Simple rules of software engineering also apply: stick to the PEPs,
avoid lint errors and maintain conventions.

** Notebooks should be reproducible.
Reproducibility depends on the nature of the notebook: it doesn't
necessarily mean that re-running a notebook should produce exactly the
same outputs, but the central thesis of the notebook should stand.

The underlying data, or the random values used to generate the
notebook, should be allowed to change without breaking the notebook.

While it may not be feasible to snapshot and include all the data
used within a notebook, where and how to access it should be clearly
documented.

Similarly, there should be a clear description of the packages,
libraries, and potentially even hardware required to re-run the
notebook.
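
One way to make these points concrete, sketched with only the Python
standard library. The seed value and the helper names ~sample_mean~
and ~describe_environment~ are assumptions made for this example; a
real notebook would record whatever its own analysis depends on.

```python
import platform
import random

# Hypothetical seed; any fixed integer works, as long as it is written down.
SEED = 1234


def sample_mean(seed, n=1000):
    """Mean of n uniform draws from a generator seeded with `seed`."""
    rng = random.Random(seed)  # local generator, no hidden global state
    return sum(rng.random() for _ in range(n)) / n


def describe_environment():
    """Collect the basics a reader needs in order to re-run the notebook."""
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "machine": platform.machine(),
    }


# Same seed, same numbers. A different seed changes the numbers, but the
# claim being made (the mean is close to 0.5) should still stand.
assert sample_mean(SEED) == sample_mean(SEED)
assert abs(sample_mean(SEED + 1) - 0.5) < 0.05
print(describe_environment())
```

The second assertion is the interesting one: it checks the thesis
rather than an exact output, so it survives a change of seed.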

** Notebooks should be executable directly with a "run-all".
Few things signal a sloppy notebook more than one which fails to
execute with "Run all cells".

Ensure that functions and variables are available in the right order.

One simple sanity check is to execute "Run All" successfully as a
precursor to publishing a notebook.
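
A rough, automated companion to that check is sketched below. It
assumes the notebook is saved as a Jupyter .ipynb file (nbformat 4),
where each code cell stores an ~execution_count~. The function names
are hypothetical. It only inspects the saved counters and does not
re-run anything, so it is a heuristic for "was this last run top to
bottom on a fresh kernel" and not a substitute for actually running
all cells.

```python
import json


def executed_in_order(notebook):
    """True if every non-empty code cell ran exactly once, top to bottom.

    `notebook` is the parsed JSON of an .ipynb file (nbformat 4).
    """
    counts = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        if not "".join(cell.get("source", [])).strip():
            continue  # empty cells are never executed
        counts.append(cell.get("execution_count"))
    return counts == list(range(1, len(counts) + 1))


def file_executed_in_order(path):
    """Same check, reading the notebook from a file on disk."""
    with open(path, encoding="utf-8") as handle:
        return executed_in_order(json.load(handle))


# Two tiny in-memory notebooks, made up for illustration.
fresh = {"cells": [
    {"cell_type": "markdown", "source": ["# Title"]},
    {"cell_type": "code", "source": ["x = 1"], "execution_count": 1},
    {"cell_type": "code", "source": ["x + 1"], "execution_count": 2},
]}
stale = {"cells": [
    {"cell_type": "code", "source": ["y = x"], "execution_count": 5},
    {"cell_type": "code", "source": ["x = 1"], "execution_count": 3},
]}
assert executed_in_order(fresh)
assert not executed_in_order(stale)
```

The ~stale~ example is the classic failure: the first cell uses ~x~
before the cell that defines it, which only worked because the cells
were run out of order.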

** Minimize noise from unintentional output
Libraries and function calls can be noisy, generating query progress
messages, incremental logs, progress bars, or other unnecessary
output.

Eliminate these to minimize visual noise in the notebook. 

At the same time, be very intentional about retaining all potentially
useful information for anyone simply reading the notebook.

For example, the =%%capture= cell magic in a Jupyter notebook can help
suppress unnecessary output.
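
The cell magic only works inside IPython or Jupyter. As a plain-Python
sketch of the same idea, ~contextlib.redirect_stdout~ from the
standard library can swallow printed output while keeping it
available for later. The function ~noisy_query~ is a hypothetical
stand-in for a chatty library call.

```python
import contextlib
import io


def noisy_query():
    """Hypothetical stand-in for a library call that prints progress."""
    for step in range(3):
        print(f"fetching page {step + 1}/3 ...")
    return [1, 2, 3]


# Swallow the progress chatter but keep it around in case it matters.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    rows = noisy_query()

assert rows == [1, 2, 3]
assert buffer.getvalue().count("fetching") == 3
```

Note that this only redirects text written to standard output from
Python; output sent to standard error needs
~contextlib.redirect_stderr~ instead.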


* References
** Style guides around the web
- Space Telescope Science Institute Jupyter Notebook Style Guide
- Clean code in Jupyter Notebooks
- GCP Jupyter Notebooks Development Manifesto
- Coding Standards for your Jupyter Notebooks
- Jupyter Notebook best practices

** Books, papers, etc.
- Donald E. Knuth. 1984. Literate Programming. The Computer
  Journal. British Computer Society 27 (2):
  97–111. DOI:10.1093/comjnl/27.2.97