[Draft] Eating Elephants

As a software engineer, you'll almost certainly ramp up on large projects and code bases: anything from massive proprietary setups to popular open source projects.

The best way to get better at learning a new project is to do it repeatedly across different contexts, directly building the appropriate mental muscles. This post is for those of you who haven't had that luxury yet: strategies and tactics for managing complexity and delivering along the way. YMMV: these are approaches that work for the way my mind functions.

With several bad metaphors thrown in along the way.

How do you eat an elephant? One bite at a time.

elephant.png

Figure 1: No elephants (artificial or real) were harmed in the making of this post. (Generated with DALL-E 2.)

Be very clear on what you want to achieve

When faced with a large project, it's unfortunately easy to fixate on a specific rabbit hole and end up spending all your time there while forgetting why you were there in the first place. Always remember the broader context and constraints! The care with which you gain context and evaluate approaches will vary dramatically.

Let's make this concrete: if you're only ramping up to do a quick bug fix on an old, mostly dead project, you should be laser-focused on the desired behavior, the observed behavior, and the safest, smallest change you can make. A "patch" fix might even be acceptable: tackling the symptom without treating the underlying root cause.

On the other hand, if this is something you're going to be spending a significant amount of your time on, you want to build broader context on what the code is doing and how it got to this state, and to develop a significantly more detailed mental model.

Of course, these aren't as distinct as they may seem: it's entirely likely that you'll need to start by putting out fires as quickly as possible, and then take the time to build deeper and broader context.

Build and refine your mental model

I feel comfortable with a system once I have a reasonably accurate mental model of how it operates. The model should be high resolution and precise for areas that need to be actively changed soon, while it's generally acceptable to treat other parts as black boxes. Ideally you make the time to peek into those boxes every so often.

Think through how you would have built it

To reach a first approximation, imagine how you would build this system, given the constraints and behaviors you understand. The most interesting parts are going to be figuring out where reality differs from how you would have done it, and digging into why.

See what is, not what you expect to see

Having conducted more than 500 interviews, I can say one extremely common mistake programmers make is falling prey to confirmation bias: instead of carefully exploring what the program is actually doing, they assume it's doing what they wrote it to do. I'm bitter about this because I repeatedly make the same mistake in my daily life, never quite seeming to learn from my experience.

My favorite approach to counteract this is to actively look for any signals that suggest I'm wrong. This is hard, and has the added cost of making you appear less confident than you actually are – but it's worth the price.

Think of it like the classic sentence with 'the' repeated across a line break: sometimes it can be very hard to see the underlying issue.

Hack it up any which way

This is a recommendation to try and swallow the elephant whole: you're almost certainly going to fail, but you'll get a true sense of the magnitude of the task.

Counter-intuitive as it may sound, you can minimize that risk by simply hacking in the changes you'd like to make, without paying attention to design, maintainability, or code quality.

Simply getting your changes to work is very valuable: it surfaces any large unknown unknowns, significantly reduces the chance of unpleasant surprises weeks or months down the line, and genuinely validates your mental model.

I don't recommend actually deploying the changes of course, nor even committing them; at best you might use them for a temporary demo to make sure your team is aligned.

You'll learn a lot even if the attempt at hacking it up fails completely: at least you'll know the big questions you need to answer before you can meaningfully tackle this project.

Make small changes with confidence

This is the part where you nibble at the edges of the elephant to build up confidence that you'll be able to actually eat it one day.

At the other end of the spectrum, you want to gain confidence in your knowledge of – and ability to change – the system. Start by kicking the tires: add unit tests, fix lints and spelling, add examples to the documentation, and make very simple functional or non-functional changes that will still run in production.

Check that you can deploy the changes smoothly, and that they behave as you expect once deployed. Along the way, you'll gain goodwill from your team, have something to show for all the time you've spent ramping up on the system, and get that glow of satisfaction at actually landing something instead of rotating in place for months.

Move from a good working state to a good working state

This is a reminder to take small bites, and to not bite off more than you can chew.

Because you're dealing with so many unknowns, you need to move carefully: always go from a known good state that behaves exactly as you expect to a new good state, making one change at a time. This simple mechanical discipline can save you hours of debugging and working backwards to identify unintended consequences.

At times it takes a little humility to constrain yourself to small changes instead of giant leaps; you'll most likely learn this the same way I did – by failing often enough to train yourself to take small but confident steps.

Believe only in empirical evidence

There are no appropriate elephant analogies here. Carry on.

Reading code is one thing, running it is another entirely.

Anybody who thinks "Just read the code and think about it" – that's an insane statement – you can't even read all the code in a big system, you have to do experiments on the system. – John Carmack

Reading code can be fairly misleading: you can't really be certain that other abstractions won't interfere with execution – particularly if you're not aware of the context the system lives in.

Reaching for an example from something I've been working on recently: PyTorch supports several transforms that turn your easy-to-modify but slow-to-run model into something extremely fast. But if you didn't know the model was going to be traced, you'd have completely broken assumptions about what actually runs.

Lightly modifying the simple example in the torch.fx documentation, I added a small print statement to forward to always see the value of x.

import torch
class MyModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.param = torch.nn.Parameter(torch.rand(3, 4))
        self.linear = torch.nn.Linear(4, 5)

    def forward(self, x):
        print(f"A wild print appeared! {x=}")
        return self.linear(x + self.param).clamp(min=0.0, max=1.0)

Reading the code, I'd assume the print statement is always going to be executed – but if this module is FX-traced before execution (and it probably will be), it's going to get elided.

If you don't believe me, you can try running through this notebook. Actively running the code through FX and printing out the generated version makes it much more explicit:

def forward(self, x):
    param = self.param
    add = x + param;  x = param = None
    linear = self.linear(add);  add = None
    clamp = linear.clamp(min = 0.0, max = 1.0);  linear = None
    return clamp

For examples from further in the past, I'll point to optimizing compilers and other brilliant magic – which is why tools like the Godbolt compiler explorer are essential.

Use a debugger

If you have a good enough setup to easily attach a debugger and run the program fast enough, step through it – save your breakpoints, and try playing with the program as it runs.
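
If you happen to be working in Python, one small way to make breakpoints survive between sessions is a .pdbrc file, which pdb reads and executes on startup – the paths and condition below are hypothetical, just to show the shape:

```text
# ./.pdbrc -- commands pdb runs automatically on startup
break myproject/server.py:42
break myproject/db.py:17, user_id == 42
```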

Unfortunately, I've mostly worked with distributed systems, custom-built PHP debuggers, and Android: debuggers have been flaky, occasionally misleading, and have slowed execution down to the point of being unusable.

I dislike the fact that the output and state of the debugger are ephemeral – it's extremely hard to go back and compare against a previous state without being extremely disciplined about recording it.

Debuggers that can run in reverse solve this, but tend to be extremely expensive given the amount of data they need to record.

Use a REPL

Getting a live REPL into your code base can be an excellent way to see what's going on in small pieces – you can call deeply nested functions directly to see how they behave and validate behavior.

This works particularly well in Python, in notebooks, and in other similar systems for piece-wise sanity checking that things are working properly.

If you're lucky (and nearby security engineers maybe not so much) you might get a REPL into a live production system to poke around in it.

Sprinkle print statements liberally

If you're dealing with something much more distributed, or something that slows to a crawl when you try to attach a debugger, print-based debugging can work just as well – with the added advantage of persisting easily between sessions, and all the way into production if you want it to.

Occasionally you'll find that you don't own stderr or stdout: they've been redirected, are already overwhelmed with noisy print statements by inferior programmers, or otherwise used for the program's contracted behavior. Open up an extra file instead and write to it – sometimes several if you want to break up the logs into something easier to deal with.
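
A minimal sketch of that side channel in Python – the file name and JSON-lines format here are arbitrary choices of mine, not a standard:

```python
import json
import time

# Write debug output to a dedicated side file so it survives redirected
# or noisy stdout/stderr; one JSON object per line keeps it greppable.
DEBUG_LOG = "debug-sidechannel.log"

def debug(tag, **values):
    with open(DEBUG_LOG, "a") as f:
        f.write(json.dumps({"ts": time.time(), "tag": tag, **values}) + "\n")

debug("checkout", user_id=42, cart_size=3)
```

Opening one such file per subsystem is an easy way to break the logs up into something easier to deal with.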

Use tracebacks and exceptions

To see how certain functions are used, try explicitly logging a stack trace at interesting points with the arguments and return values. That's been the whole point of one of my open source projects, Panopticon.
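
As a stdlib-only sketch of the idea – this is an illustration of the technique, not Panopticon's actual interface:

```python
import functools
import traceback

def log_calls(fn):
    """Print arguments, return value, and the current stack on every call."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        result = fn(*args, **kwargs)
        stack = "".join(traceback.format_stack(limit=5))
        print(f"{fn.__name__}(args={args}, kwargs={kwargs}) -> {result!r}")
        print(f"called from:\n{stack}")
        return result
    return wrapper

@log_calls
def add(a, b):
    return a + b

add(2, 3)
```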

Make like Sherlock Holmes

This is getting somewhat repetitive, but being extremely conscientious about collecting and paying attention to every signal is incredibly valuable.

sherlock.png

Figure 2: If it looks like an elephant, talks like an elephant, walks like an elephant… (Generated with DALL-E 2.)

Look for fingerprints

Ideally you'll have good ways to quickly navigate and search through your code base: the more obvious jump-to-TAGS is always valuable for explicitly looking up class and function names, but do remember to also pay attention to other, unintentional fingerprints as you search through the code.

One of the most surprisingly effective ways I've found to navigate a code base is to search for the strings that show up all over the place, either in the UI or in the logs: elide the more obviously dynamic parts, and it becomes surprisingly easy to jump straight to the code you care about – or to a specific constant that you can then work backwards from.

Other fingerprints include CSS class names, ids, etc. – particularly those that haven't been minified yet.
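
For instance, eliding the dynamic parts of an observed string recovers the static fragments that appear verbatim in the source – the log line and candidate source lines below are invented for illustration:

```python
import re

# A string spotted in the UI or the logs, with run-specific values in it.
observed = "Processed 1523 items in 0.42s"

# Elide the obviously dynamic parts (numbers here) to recover the static
# fragments that will appear verbatim in the source.
static_fragments = [f for f in re.split(r"\d[\d.]*", observed) if f.strip()]

# Candidate lines from a text search over the code base:
source_lines = [
    'log.info(f"Processed {n} items in {dt:.2f}s")',
    'print("Processing started")',
]
hits = [line for line in source_lines
        if all(fragment in line for fragment in static_fragments)]
```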

Pay close attention to the logs

Always remember to carefully read all logging: it's too easy to miss something obvious right in front of you. Most of the logs correspond to something someone else thought was interesting about the program's behavior; you might as well pay attention as you learn about it.

There are several programs that can help you navigate logs more easily: while there are some excellent UIs and CLIs like lnav, the Logfile Navigator, I often find myself dropping into Vim with a copy of the logs. That gives me the ability to search for and delete anything I don't care to see, reformat some entries, and occasionally copy out a few specific instances and explicitly diff them.

In extreme scenarios, I'll also pull all the logs into a notebook and convert them to a dataframe for faster manipulation and analysis (particularly useful for extracting all logs for a specific thread by breaking them out by pid).
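
A stdlib-only sketch of that kind of wrangling – the log format here is invented, and with pandas available the same parse drops straight into a DataFrame:

```python
import re
from collections import defaultdict

# An invented log format: "<timestamp> pid=<pid> <message>".
raw_logs = """\
12:00:01 pid=101 starting request
12:00:01 pid=102 starting request
12:00:02 pid=101 fetching user
12:00:03 pid=102 timed out
"""

LINE = re.compile(r"(?P<ts>\S+) pid=(?P<pid>\d+) (?P<msg>.*)")

# Break the interleaved stream out into one timeline per pid.
by_pid = defaultdict(list)
for line in raw_logs.splitlines():
    m = LINE.match(line)
    if m:
        by_pid[m["pid"]].append((m["ts"], m["msg"]))
```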

Look out for any instrumentation

Any live instrumentation that captures the system's execution is the last set of data I'd recommend pulling up. Instrumentation should help you build empathy with how the system executes: do requests take milliseconds, seconds, or minutes? What does p99, p99.9, p99.99 latency look like? How many resources are consumed? Is this code CPU bound, I/O bound, or neither?
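
When you only have raw latency samples, the stdlib is enough for a quick percentile check – the numbers below are synthetic:

```python
import random
import statistics

random.seed(0)
# Synthetic latency samples in milliseconds: an exponential distribution
# gives a plausible long tail for illustration.
samples = [random.expovariate(1 / 20) for _ in range(10_000)]

q = statistics.quantiles(samples, n=100)       # 99 cut points
p50, p99 = q[49], q[98]
p999 = statistics.quantiles(samples, n=1000)[998]
print(f"p50={p50:.1f}ms  p99={p99:.1f}ms  p99.9={p999:.1f}ms")
```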

Don't trust instrumentation at first pass either: look up how it's generated, validate it with simple tests, and then you can assume that it's probably not broken.

Learn to speed read code

Even though relying solely on reading code is misleading, you still need to be able to skim it quickly. The same rules as for reading books well and fast apply: focus on the interfaces instead of digging deeply into the implementations, and spend your time on the critical pieces.

For this to work really well, you should figure out ways to quickly navigate your code base – and there are two mechanisms you should have.

Indexing based navigation

The ideal is a system that lets you jump to the precise function or class you care about – or even search by regular expression. A search engine or IDE that understands the code well is always amazing.

Old-school TAGS, shiny new LSPs, or GitHub's code search and inline links to source are all good examples.

String based navigation

At the same time (particularly to navigate using fingerprints) I find simple text (or regex) based search invaluable. Ideally you have a system that lets you grep through your code base quickly.

At its simplest, hopefully you have a checkout that you can find and grep your way through with ease; for particularly large systems, something like OpenGrok or Android Code Search is invaluable.

Working with other people

Happily enough, you'll generally work with people who've already consumed this particular elephant and can significantly speed you up along the way. There are several tactics for learning well from others – to the horror of the elephant community.

Communicate your progress!

Setting and updating expectations constantly as you ramp up can be one of the most valuable things you can do. Explain how far you've gotten, how long you think it'll take and why.

Writing down and sharing (much more on this in a minute) what you've found will also help validate that you're building a good understanding of the system and act as a bridge for whoever happens to come after you.

Regular communication on an automatic cadence – as opposed to at self-identified milestones – also significantly reduces the pressure of sharing updates: if you do it every week, people can rely on understanding your progress regularly as you push the information, instead of having to pull it when they get worried.

Ship small wins as soon as you can

Never underestimate the value of shipping several small wins quickly: you'll build momentum and confidence in yourself, and inspire confidence in others at the same time.

You're also actively fixing and improving the project you're working on, doing good work sooner along the way.

Respect Chesterton's fence

As satisfying as it is to rant about past design decisions – and they may definitely be horrifying – the engineers before you did the best they could with the constraints they were working under.

You need to figure out which of those constraints still apply and which ones are obsolete, and do this without running headlong into them. Try to understand the rationale behind questionable design decisions to make sure they weren't solving a problem you just haven't faced yet.

But don't be afraid to break it

Once you're sure you have context on past decisions, you should feel empowered to go and change them. It can be just as damaging to blindly accept past decisions as it is to change them without thinking through the consequences.

Ask questions well

While there are no stupid questions, you can get much more from your colleagues by asking good questions well. You want to show that you did your research, what worked and what didn't; you want to minimize the time they need to answer your question – but still get the most information you can.

If someone hands you a fish, also follow up on how to fish – "How did you solve this?" – so that you don't need to repeatedly ask for the same things.

Of course, there's also a cost to digging too long before asking questions. A rule of thumb along the lines of "one hour of digging before reaching out" can be very valuable.

Read everything

Read the documentation

Learning from the engineers who came before you isn't necessarily restricted to directly talking to people. There's a lot of information available in any reasonable code base.

Most obviously, you should quickly navigate and understand the documentation that's available: particularly valuable is documentation that explains the why. I generally recommend reading the code itself to understand the how; documentation describing the how also tends to bit-rot the fastest.

Read the commit history

Less obviously, see how the code evolved: go spelunking into the commit history and look for major decisions for the parts of the code base you're interested in. I like to jump to the commit that introduced that particular file/class/function in the first place to see the original intent for adding it, without the cruft that might have grown up around it over the years. Then you can skip along to any major refactors or structural changes that render it unrecognizable.

Look for design documents & discussions

Look for any old design documents or discussions that might be hanging around: these can also act as an excellent reference for finding more people to talk to. One of the most important pieces of context you should gather is the set of problems this piece of code was meant to solve – and then to determine which of those problems are still applicable, and whether there are new ones that must be addressed.

Pair program, or shadow engineers

It can be both inspiring and extremely informative to simply shadow others who're comfortable with the system. You can see what it's going to be like once you build mastery, and ask live questions along the way.

This may or may not be possible depending on your setup, but can be one of the fastest ways to get better at ramping up and solving problems.

Write everything

The single most valuable tool I've found for maintaining my orientation and sense of progress as I work through something complex is to have a written log of everything I'm trying, what's worked, what's not, links and screenshots.

writeeverything.png

Structure

There's a certain structure that works fairly well for me; I recommend adapting it to one that works well for you:

What's the goal?

Start by writing out the problem you're solving at the top: why are you dealing with this? What do you hope to achieve? Scrolling past the note every morning is extremely useful for avoiding rabbit holes and focusing on the most valuable ways to spend your time.

Open Questions

Second, I try to keep a list of major open questions I want answers to but haven't found yet: why a certain subsystem works the way it does, or where I should make a certain change, for example. This reminds me to look out for signals I would otherwise ignore.

References

Then I keep quick links to references related to the project that I need to access frequently; this list can grow extremely large, so I recommend keeping only the most important references here. The rest can live in the daily log.

Daily Log

Finally, a daily log to take notes on progress: I tend to write out how much time I expect to have today to work on this, then the actual work I hope to accomplish as headings. I fill this in as I go through the day, including stack traces, observations, other TODOs that pop up (particularly second-order tools I could build to make all of this painless, and why-oh-why has no one else implemented it yet).

The daily log tends to be incredibly valuable as an excellent replacement for my memory: I can easily return to the project and start right where I left off. It also helps maintain momentum and understand my own progress – without concrete results or code it can be hard to remember the sheer amount of work that goes into ramping up on something complex or brand new.

Diagrams!

A complementary skill to build is making mind maps of the code base: I strongly recommend using something electronic like Kinopio or Scapple. That lets you paste in links and code instead of writing them down, and can help you navigate the most confusingly named of code bases (Android, I'm looking at you!).

Be selective about how you structure your diagram: resist the urge to simply put every single class in; prioritize what's important and build a map into the code base that highlights the parts you actually care about.

I leaned on Scapple a lot while I was working on Android: as an experiment, I wrote a web-based Scapple renderer to be able to share these maps – you can also pan around the full Scapple.

anrscapple.png

Bon Appetit!

Once you're finished with everything, come back to your notes and diagrams and use them to build solid documentation for the work you did, as well as onboarding documentation for those who come after you – blaze a trail!

Consider fixing – or at least writing about – all the shortcomings you experienced while you still remember the pain: one of the superpowers of a new joiner is that you're not yet affected by "it's-always-been-that-way"itis, and you have a chance at fixing things before you become accustomed to the status quo as well.

I hope you find great success and build valuable things!

Comments

Add your comments to this Twitter thread, or feel free to drop me an email.

Updates

  • 2022-09-04: Marked as a draft as I significantly restructure this post.
  • 2022-08-21: Removing the "DRAFT" title, but I'd still like to continue editing and revising.