expLog

Tracing Asyncio

I must admit to a hidden motive: the reason I've been spending so much time learning about asyncio internals and familiarizing myself with different mechanisms to introspect asyncio is to build a tracing tool.

Ideally, I would love to be able to see work scheduled on and actually executed by the event loop, broken correctly across threads and clearly rendered in chrome://tracing.

North Star

A tool that can be attached to a running process and can generate a trace of all the asyncio work being run anywhere, including spawned sub-processes.

Initial Version

I'll start by building a module along the lines of Python's tracer that can be manually inserted into a process to build out the proof of concept, and then improve on it from there.

Design

Initial modules that I should need:

  • trace: to actually write out the trace file, strongly coupled to the systrace format described here.
  • __init__: start/stop, the API. Initial inclination is to create an object in start that can then be stopped and should enable the most flexibility for use and testing.
  • Something? for the implementation that can elegantly deal with asyncio internals and handle state correctly.

    Mapping:

  • Event loop life: Duration Event
  • Tasks actually running on the loop: Nested Duration Event (Attach a stack trace to the task)
  • Tasks on an executor: Flow event + Duration event

Solutions

Instrumenting asyncio tasks

For the first version I should be able to build something that works by instrumenting all the interesting frames in asyncio using sys.setprofile or sys.settrace (documentation).

Intercepting any async calls, as well as carefully working through the event loop implementation can hopefully help capture the main behavior of asyncio: unfortunately this will need to be maintained with every python version change, which doesn't sound very attractive.

For a proof of concept, it should work out.

Connecting async tasks from schedule to execution

Task creation is a good point to capture the future's creation and source. The actual run happens inside handle.run, which would be the point to connect the async task with.

The task tracer will need to intercept specific function calls – potentially saving an identifier corresponding to the callback at the time of task creation, and then connecting to it completing the visualization.

Actually intercepting the values

I had to StackOverflow for this but it seems trivial in retrospect: the frames contain all local values, which I can comfortably use to figure out what's being run where for connections.

Watching over coroutines

After realizing the immense power available with sys.settrace this became much more simple: I can retain the frame information of the coroutine, and explicitly mark whenever we enter/ext the coroutine with flow events. This makes it much simpler to watch how a specific coroutine gets a chance to execute.

TODO: Describe the current design much more simply now that I actually have a design that runs.

Rendering the visualization

Ideally I'd like to directly apply Catapult's tracing format which is well documented in a Google Doc.

Connecting flow events

Instrumenting asyncio tasks on a running program

Taking inspiration from both GDB and Pyspy it might be possible to externally get the same information; it's a tracer and not a sampling profiler so I can probably afford to pause the program, update state, and then let it continue.

This will take much more effort and research (and should also be much more fun); I'll revisit this when I have more proof that the idea is sound and I don't expect more landmines.

References:

view source