#+TITLE: A Notebook Style Guide

After writing and reading several python notebooks I've observed
patterns that make notebooks significantly easier to write, read and
iterate on. This style guide is an attempt at formalizing the patterns.

Like any good style guide there are inherent contradictions: best
addressed by a liberal dose of context and a pinch of taste.

* The Style Guide
  
** Carefully choose global state, and transform it with pure functions.
Handling this correctly automatically forces an excellent structure on
the notebook. 

The global, shared state identifies the core of the notebook. Ideally,
the state is never mutated in place: instead any transformations are
copied into a new variable. 

Pure -- side effect free -- functions can work on the data, and
automatically structure the notebook into stages. It also makes it
trivial to write a quick test for the functions right next to them
with toy inputs.

Retaining the original data and the lack of side effects makes it
significantly easier to re-run sections of the notebook confidently. 

One common example would the core dataset being analyzed in a
notebook: it can be very valuable to have the original dataframe
available, with any transformations creating copies.

** Each cell should be responsible for one thing.
A cell can define a function, a class, or a snippet of code to be
executed. Alternatively, it can be one paragraph or section of text in
the notebook.

Maintaining tight, one-idea cells makes for cleaner diffs and clearer
histories for notebooks maintained in source control.

** Liberally include assertions and tests through the notebook.
A quick assertion or simple unit test at the end of any function or
class definition can prove invaluable in debugging and extending
notebooks.

Assertions also allow for quick iteration using Ctrl + Enter while
iterating on the contents of a cell to quickly sanity check it's contents.

** Notebooks must be written like prose.
As true as this statement is for code, it's even more true for a
Notebook. A good notebook must be written keeping the audience in
mind: emphasizing code and prose appropriately.

Accordingly, style guides apply perfectly: I strongly recommend On
Writing Well.

** Structure the notebook clearly with well-defined headings.
Use headings liberally to structure the notebook into digestible
pieces. 

Most reasonable renderers will also generate a Table of Contents to
make headings even more valuable for quickly navigating the document
and getting a quick overview.

** Notebooks should follow best practices for programming.
Code within notebooks should be carefully structured to stand well by
itself as a program.

The standards we've adopted for good design don't disappear because
it's an interactive environment: 
- abstract well, and have consistent levels of abstraction.
- balance coupling and cohesion.
- trade-off YAGNI and DRY as appropriate.

An interactive environment gives even more opportunities to get it
right and refactor quickly; though tooling support for refactoring in
most notebook clients tends to be non-existent.

Simple rules of software engineering also apply: stick to the PEPs,
avoid lint errors and maintain conventions.

** Notebooks should be reproducible.
Reproducibility depends on the nature of the notebook: it doesn't
necessarily mean that re-running a notebook should produce exactly the
same outputs, but the central thesis of the notebook should stand.

The underlying data -- or random value generating a notebook should be
allowed to update without breaking the notebook. 

While it may not be feasible to snapshot and include all the data
used within a notebook, where and how to access it should be clearly
documented.

Similarly, there should be a clear description of the packages,
libraries  and potentially even hardware required to re-run the
notebook.

** Notebooks should be executable directly with a "run-all".
Few things signal a sloppy notebook more than one which fails to
execute with "Run all cells".

Ensure that functions and variables are available in the right order.

One simple sanity check is to execute "Run All" successfully as a
pre-cursor to publishing an notebook.

** Minimize noise from unintentional output
Libraries and function calls can be noisy, and generate outputs
indicating query progress, incremental logging with progress bars or
otherwise unnecessary output.

Eliminate these to minimize visual noise in the notebook. 

At the same time, be very intentional about retaining all potentially
useful information for anyone simply reading the notebook.

For example, %%capture in a Jupyter notebook can help suppressing
unnecessary output. 


* References
** Style guides around the web
- Space Telescope Science Institute Jupyter Notebook Style Guide
- Clean code in Jupyter Notebooks
- GCP Jupyter Notebooks Development Manifesto
- Coding Standards for your Jupyter Notebooks
- Jupyter Notebook best practices

** Books, papers, etc.
-  Donald E. Knuth. 1984. Literate Programming. The Computer
  Journal. British Computer Society 27 (2):
  97–111. DOI:10.1093/comjnl/27.2.97