This notebook explores how much of your team you should allocate to toolsmiths, and the corresponding tradeoffs. The assumptions are laid out explicitly and varied over different ranges to give an envelope of potentially useful values: jump to the end of this notebook for a visualization that summarizes the envelope. There are also several different ways to interpret this data, depending on what you consider the job of a toolsmith to be.
Before you spend too much time reading this notebook, or make extremely important decisions based on it, remember that everything is made up and the numbers don't matter. The point of this notebook is to help build intuition, not to make a precise recommendation. All the graphs are low-fidelity and XKCD-style to reinforce that fact.
There's also a tl;dr near the bottom of this notebook.
This is part of a series of notes on building developer tools (available here), which ended up being much longer than I ever anticipated.
First, a collection of functions that will be useful for building up the actual model, each with a few simple tests next to it. Following my style guide, this notebook should be executable from top to bottom. (If you'd like to run it locally, remember to install the Humor Sans font for the pretty XKCD-style fonts.)
```python
from functools import partial

import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns

# Wide, low-fidelity XKCD-style figures throughout the notebook.
mpl.rcParams['figure.figsize'] = [16.0, 5.0]
plt.xkcd(length=200, randomness=1)

# One consistent color per plotted quantity, reused across every chart.
palette = sns.color_palette("deep")
colors = {}
for i, label in enumerate(["productivity", "work", "toolsmiths", "time"]):
    colors[label] = palette[i]
```
Sigmoid curves are one way to model the effect of toolsmiths: if all the toolsmiths together put in $x$ person-days of work, the "productivity curve" returns the new productivity multiplier for engineers. (A "person-day" is the amount of work we expect the average engineer to complete in a day.)
The utility function is a heavily parameterized sigmoid curve, which makes it easy to test different shapes and assumptions about the effect toolsmiths can have.
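For reference, the `sigmoid` helper below computes the following function (naming it $S$ here, and keeping the keyword-argument names $a$ through $f$ as parameters). Assuming $b, c, d > 0$, a quick limit check shows what each parameter does: the curve rises from a floor of $f$ to a ceiling of $a/b + f$, $d$ sets how steeply it rises, and the halfway point sits where $c \exp(-d(x+e)) = b$.

$$
S(x) = \frac{a}{b + c \exp\bigl(-d(x + e)\bigr)} + f,
\qquad
\lim_{x \to -\infty} S(x) = f,
\qquad
\lim_{x \to +\infty} S(x) = \frac{a}{b} + f,
\qquad
x_{\text{mid}} = \frac{\ln(c / b)}{d} - e
$$

With the defaults ($a = b = c = d = 1$, $e = f = 0$) this is the standard logistic curve, rising from 0 to 1 and centered at $x = 0$.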
```python
import math

def sigmoid(x, a=1, b=1, c=1, d=1, e=0, f=0):
    # Rises from f to a / b + f; d sets the steepness and e shifts the curve left or right.
    return a / (b + c * math.exp(-d * (x + e))) + f

def display_curve(title, curve, xs=None, show_project_size=True):
    """Plot a productivity curve against person-days of toolsmith work."""
    if xs is None:
        xs = list(range(100))
    plt.title(title)
    plt.plot(xs, [curve(x) for x in xs], color=colors["productivity"])
    plt.xlabel("Person-days of work")
    plt.ylabel("Productivity multiplier")
    if show_project_size:
        # The main project is 10,000 (100 * 100) person-days.
        plt.axvline(x=100 * 100, color="black", linestyle="--", label="Project size")
        plt.legend()
    plt.show()

xs = [x / 10 for x in range(-100, 100)]
display_curve("The default shape of sigmoid", sigmoid, xs, show_project_size=False)
```
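As an illustration of bending the default sigmoid into a productivity curve, here is a minimal sketch. The parameter choices and the name `productivity_sigmoid` are hypothetical, not values used elsewhere in this notebook: the curve starts just above 1x, saturates at 2x, and crosses 1.5x once the toolsmiths have put in 5,000 person-days. The block repeats `sigmoid` so it runs on its own.

```python
import math
from functools import partial

def sigmoid(x, a=1, b=1, c=1, d=1, e=0, f=0):
    # Same parameterization as the notebook's sigmoid.
    return a / (b + c * math.exp(-d * (x + e))) + f

# Hypothetical curve: floor f=1 (1x), ceiling a/b + f = 2 (2x),
# midpoint at x = 5,000 (e=-5,000), spread over a few thousand person-days (d=1/1,000).
productivity_sigmoid = partial(sigmoid, a=1, d=1 / 1_000, e=-5_000, f=1)

for person_days in (0, 2_500, 5_000, 7_500, 10_000):
    print(person_days, round(productivity_sigmoid(person_days), 3))
```

`functools.partial` keeps the one-argument shape `curve(x)` that `display_curve` expects, so the sketch can be passed to it directly.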
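A confounding factor when dealing with large teams is the slowdown caused by coordination overhead, among other things. The assumption for this particular exploration: every engineer on a team lowers each team member's productivity multiplier by 0.01, and the total reduction is capped at 0.5, which is reached at 50 engineers.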
(Presumably, some sanity prevails at that point, and engineers are organized into teams to cap the communication overhead.)
```python
def brookes_law(current_productivity, number_of_engineers):
    # Coordination overhead: lose 0.01 per engineer on the team, capped at a 0.5 reduction.
    return current_productivity - min(.5, number_of_engineers * .01)

xs = list(range(0, 101))
plt.title("Work done per day per engineer with increasing engineers")
plt.xlabel("Number of engineers in the team")
plt.ylabel("Productivity multiplier")
plt.plot(xs, [brookes_law(1, x) for x in xs], color=colors["productivity"])
plt.ylim(bottom=0)
plt.show()
```
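The chart shows the per-engineer multiplier; a short sketch makes the effect on the whole team's daily output explicit. `team_output` is a hypothetical helper introduced only for this illustration, and the block repeats `brookes_law` so it runs on its own. Below 50 engineers each extra engineer adds less than the one before; past 50 the cap applies and each extra engineer adds a flat 0.5 person-days per day. For 100 engineers the team completes 50 person-days per day, which is where the 200-day baseline for a 10,000 person-day project comes from.

```python
def brookes_law(current_productivity, number_of_engineers):
    # Same model as above: lose 0.01 per engineer, capped at a 0.5 reduction.
    return current_productivity - min(.5, number_of_engineers * .01)

def team_output(number_of_engineers, productivity=1):
    # Hypothetical helper: person-days of work the whole team completes in one day.
    return number_of_engineers * brookes_law(productivity, number_of_engineers)

for n in (10, 25, 50, 75, 100):
    print(f"{n:>3} engineers -> {team_output(n):.1f} person-days per day")
```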
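For simple calculations, assume a team of 100 engineers working on a project of 10,000 person-days.
With these numbers, and without any toolsmiths, it takes 200 days to complete the project: each engineer works at .5 productivity, so 10,000 / (100 * .5) = 200.
To evaluate the effect of toolsmiths, it's interesting to determine how much work the engineers complete in those same 200 days, and how many days it takes to finish the 10,000 person-day project.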
This utility function simply walks through each day, adjusting the productivity multiplier and total work done along the way. It returns a tuple of the work done within the default 200 days, the number of days needed to complete the project, and the cumulative work done at the end of each day (for plotting).
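Written as a recurrence, each simulated day $t$ does the following. The symbols are introduced here for illustration: $P$ is the productivity curve, $B(p, n) = p - \min(0.5,\ 0.01\,n)$ is the `brookes_law` adjustment defined above, $s$ is the number of toolsmiths, $m$ the number of engineers, and $T_t$ and $W_t$ are the toolsmith and engineer work completed so far, starting from $T_0 = W_0 = 0$.

$$
\begin{aligned}
p_t &= P(T_t) \\
T_{t+1} &= T_t + s \, B(p_t, s) \\
W_{t+1} &= W_t + m \, B(p_t, m)
\end{aligned}
$$

The loop stops once $W_t$ has reached the project size and at least the default 200 days have been simulated.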
```python
def calculate_work(*, toolsmiths, engineers, productivity_curve, brookes_law=brookes_law,
                   project_size=10_000, default_days=200):
    engineer_work = 0
    toolsmith_work = 0
    productivity = 1
    days = 0
    days_to_complete = None
    work_in_default_days = None
    ys = []
    # Keep going until the project is done *and* at least default_days have been simulated.
    while engineer_work < project_size or days < default_days:
        # Toolsmith output so far sets today's productivity multiplier for everyone.
        productivity = productivity_curve(toolsmith_work)
        toolsmith_work += toolsmiths * brookes_law(productivity, toolsmiths)
        engineer_work += engineers * brookes_law(productivity, engineers)
        days += 1
        ys.append(engineer_work)
        if engineer_work >= project_size and days_to_complete is None:
            days_to_complete = days
        if days == default_days:
            work_in_default_days = engineer_work
    return (work_in_default_days, days_to_complete, ys)
```
```python
baseline_work, baseline_time, ys = calculate_work(toolsmiths=0, engineers=100, productivity_curve=lambda x: 1)
print(f"{baseline_work = }, {baseline_time = }")
# Output: baseline_work = 10000.0, baseline_time = 200
```
```python
def compare_work(productivity_curve, toolsmiths_counts, title):
    plt.title(title)
    for toolsmiths in toolsmiths_counts:
        _, _, ys = calculate_work(toolsmiths=toolsmiths,
                                  engineers=100 - toolsmiths,
                                  productivity_curve=productivity_curve)
        plt.plot(list(range(0, 200)), ys[:200], label=f"{toolsmiths} toolsmiths")
    plt.xlabel("Total days")
    plt.ylabel("Work done")
    plt.ylim(bottom=0)
    plt.xlim(left=0)
    plt.legend()
    plt.show()

compare_work(
    productivity_curve=lambda x: 1 + x / 10_000,
    toolsmiths_counts=list(range(0, 100, 20)),
    title="Linear productivity curve")
```
Finally, building up to something interesting: a function that figures out the optimal allocation between toolsmiths and engineers by simply brute-forcing every option.
```python
def optimum_work(productivity_curve, brookes_law=brookes_law, total_engineers=100,
                 project_size=10_000, default_days=200):
    """Try every split of the team.

    Returns (toolsmiths, work) for the split that maximizes work done in default_days,
    and (toolsmiths, days) for the split that minimizes time to finish project_size.
    """
    max_work = None
    min_duration = None
    for toolsmiths in range(total_engineers):
        work, duration, _ = calculate_work(toolsmiths=toolsmiths,
                                           engineers=total_engineers - toolsmiths,
                                           productivity_curve=productivity_curve,
                                           brookes_law=brookes_law,
                                           project_size=project_size,
                                           default_days=default_days)
        if max_work is None or max_work[1] < work:
            max_work = (toolsmiths, work)
        if min_duration is None or min_duration[1] > duration:
            min_duration = (toolsmiths, duration)
    return max_work, min_duration
```
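To show the brute-force idea in isolation, here is a stripped-down sketch. `work_in_days` is a hypothetical simplified model, not the notebook's `calculate_work`: it assumes a linear productivity curve and drops Brooks's law entirely, so its numbers won't match the results below. The search itself uses `max` with a `key`, which, like the strict `<` in `optimum_work`, keeps the first allocation among ties.

```python
def work_in_days(toolsmiths, team=100, days=200, gain=1 / 10_000):
    """Hypothetical simplified model: linear productivity curve, no Brooks's law.

    Productivity is 1 + gain * (toolsmith person-days so far), and both groups
    work at that productivity every day. Returns engineer work after `days` days.
    """
    toolsmith_work = 0.0
    engineer_work = 0.0
    for _ in range(days):
        productivity = 1 + gain * toolsmith_work
        toolsmith_work += toolsmiths * productivity
        engineer_work += (team - toolsmiths) * productivity
    return engineer_work

# Brute force: evaluate every split of a 100-person team and keep the best one.
for gain in (1 / 10_000, 4 / 10_000):
    best = max(range(100), key=lambda s: work_in_days(s, gain=gain))
    print(f"gain={gain}: {best} toolsmiths, {work_in_days(best, gain=gain):,.0f} person-days")
```

Trying all 100 splits is cheap here, which is why brute force is a reasonable choice over anything cleverer.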
The example checks the optimum allocation with a linear productivity curve in which 10,000 person-days of toolsmith work doubles productivity, both to maximize the total work completed and to minimize the time to finish the main project.
Then it double-checks these values by graphing every intermediate allocation.
```python
max_work, min_time = optimum_work(lambda x: 1 + x / 10_000, brookes_law)
print(f"{max_work = }, {min_time = }")
# Output: max_work = (19, 10923.214441128348), min_time = (14, 187)
```
```python
# Sweep every allocation of a 100-person team under the same linear curve.
xs = []
ys = []
zs = []
for toolsmiths in range(100):
    work, duration, _ = calculate_work(toolsmiths=toolsmiths,
                                       engineers=100 - toolsmiths,
                                       productivity_curve=lambda x: 1 + x / 10_000)
    xs.append(toolsmiths)
    ys.append(work)
    zs.append(duration)

fig, ax1 = plt.subplots()
ax1.set_title("Exploring toolsmith allocation")
ax2 = ax1.twinx()
ax1.plot(xs, ys, label="Total work done in 200 days (person-days)", color=colors["work"])
ax1.set_ylabel("Work done (person-days)")
ax1.set_xlabel("# of toolsmiths (out of a team of 100)")
ax2.plot(xs, zs, label="Time to complete 10,000 person-day project (days)", color=colors["time"])
ax2.set_ylabel("Time to completion (days)")
# Dotted guides mark the work-maximizing and time-minimizing allocations found above.
ax1.hlines([max_work[1]], 0, max_work[0], linestyle="dotted", color=colors["work"])
ax1.vlines([max_work[0]], 0, max_work[1], linestyle="dotted", color=colors["work"])
ax2.hlines([min_time[1]], min_time[0], 100, linestyle="dotted", color=colors["time"])
ax2.vlines([min_time[0]], 0, min_time[1], linestyle="dotted", color=colors["time"])
ax1.set_xlim(0, 100)
ax1.set_ylim(bottom=0, top=12000)
ax2.set_ylim(bottom=0, top=600)
ax1.legend(loc="lower right")
ax2.legend(loc="lower left")
plt.show()
```
Last but not least, the most interesting question is how the maximum work, the minimum time, and the optimal allocation change with different productivity curves. The plotting is abstracted into a helper to avoid duplicate code.
It's tested first with the same simple linear model of productivity used above.
```python
def explore_optimum_work(*, title, xlabel, xs, ys, zs, ws, vs):
    """
    xs: value along x-axis
    ys: work completed / baseline
    zs: toolsmith allocation that maximizes work completed
    ws: time to completion / baseline
    vs: toolsmith allocation that minimizes time to completion
    """
    fig, axs = plt.subplots(2, 1)
    fig.set_size_inches(16.0, 10.0)

    # Top panel: work completed, with the work-maximizing allocation on a second axis.
    axw = axs[0]
    axw.set_title(title)
    axw.plot(xs, ys, label="Total work done / baseline", color=colors["work"])
    axw.set_ylabel("Total work done / baseline")
    axw.set_xlabel(xlabel)
    axw.set_ylim(bottom=0)
    axw.legend(loc='upper left')
    axw2 = axw.twinx()
    axw2.plot(xs, zs, label="Toolsmiths", color=colors["toolsmiths"])
    axw2.set_ylabel("Toolsmiths")
    axw2.set_ylim(bottom=0, top=100)
    axw2.legend(loc='center right')

    # Bottom panel: time to completion, with the time-minimizing allocation (vs, not zs).
    axd = axs[1]
    axd.plot(xs, ws, label="Total time (days) / baseline", color=colors["time"])
    axd.set_ylabel("Total time (days) / baseline")
    axd.set_xlabel(xlabel)
    axd.set_ylim(bottom=0)
    axd.legend(loc='lower left')
    axd2 = axd.twinx()
    axd2.plot(xs, vs, label="Toolsmiths", color=colors["toolsmiths"])
    axd2.set_ylabel("Toolsmiths")
    axd2.set_ylim(bottom=0, top=100)
    axd2.legend(loc='center right')

    plt.tight_layout()
    plt.show()
```
Exploring with a linear productivity curve, while varying the maximum increase in productivity.
```python
# Productivity rises linearly, reaching 1 + multiplier after 10,000 person-days of toolsmith work.
linear_productivity = lambda x, multiplier=1: 1 + multiplier * x / 10_000

xs = []
ys = []
zs = []
ws = []
vs = []
# Baseline: no toolsmiths (so productivity stays at 1), and Brooks's law disabled
# by an identity adjustment that passes productivity through unchanged.
base_work, base_days, _ = calculate_work(toolsmiths=0, engineers=100,
                                         productivity_curve=linear_productivity,
                                         brookes_law=lambda x, _: x)
for multiplier10 in range(0, 41):
    multiplier = multiplier10 / 10
    xs.append(multiplier + 1)
    work, duration = optimum_work(partial(linear_productivity, multiplier=multiplier),
                                  brookes_law=lambda x, _: x)
    ys.append(work[1] / base_work)
    zs.append(work[0])
    ws.append(duration[1] / base_days)
    vs.append(duration[0])

explore_optimum_work(
    title="Linear productivity curve without Brooks's law",
    xlabel="Productivity after 10,000 person-days of toolsmith work",
    xs=xs,
    ys=ys,
    zs=zs,
    ws=ws,
    vs=vs)
```
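The sweep relies on `functools.partial` to pin the `multiplier` while leaving a one-argument curve that `optimum_work` can call as `productivity_curve(toolsmith_work)`. A minimal standalone sketch of that step follows; it redefines `linear_productivity` with `def` instead of a lambda, and the multipliers chosen are arbitrary examples. It also shows why the x-axis value is `multiplier + 1`: that is the productivity reached after 10,000 person-days.

```python
from functools import partial

def linear_productivity(x, multiplier=1):
    # Same shape as the notebook's lambda: 1x at zero toolsmith work,
    # (1 + multiplier)x after 10,000 person-days.
    return 1 + multiplier * x / 10_000

# partial() fixes the multiplier, leaving a curve of a single argument.
curves = {m: partial(linear_productivity, multiplier=m) for m in (0.5, 1.0, 2.0)}
for m, curve in curves.items():
    print(f"multiplier={m}: {curve(0)} at 0 person-days, {curve(10_000)} at 10,000")
```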