The unreasonable effectiveness of sys.settrace (and sys.setprofile)
CPython allows for incredible introspection: settrace
and setprofile
can observe and control just about everything from within the
program. The official documentation doesn't do enough justice to these
functions, and so I decided to take a stab at it.
sys.settrace
sets a trace function that is triggered by the VM
every time it enters or exits a function, processes a line of code,
runs into an exception and even when it runs a single
opcode. Each of these callbacks also give full access to the current
stack frame and code.
sys.setprofile
sets a trace function that's triggered every time the
VM enters or exits both Python and C functions.
There are equivalents available in the threading
module to allow
setting tracers for multi-threaded programs as well:
threading.settrace
and threading.setprofile
.
The frame
object contains the current state of computation: all
locals, globals, current location, etc.; the code
object has the
actual opcodes and function details independent of the current
stack.
Frighteningly enough, it's also possible to modify the state of the program using the frame: so it's possible to override locals, globals, or simply skip around by changing the offset into the bytecode.
Cheat Sheet
sys.settrace(fn)
andsys.setprofile(fn)
to setfn
as the callback.sys.settrace(None)
andsys.setprofile(None)
to clear it.- The trace function has the function signature
(frame, event, arg)
. settrace
events:call
,return
,line
,exception
,opcode
setprofile
events:call
,c_call
,return
,c_return
,c_exception
- The return value is used in all but
return
to specify the new local trace function forsettrace
, and is unused forsetprofile
. arg
depends on the eventc_call
,c_return
,c_exception
: C function objectreturn
: The return value,None
for exceptionsexception
: (exception
,value
,traceback)
tuple
frame
fieldsf_back:
previous framef_code
: code objectf_locals
: as titlef_globals
: as titlef_builtins
: built-in namesf_lasti
: instruction, index into bytecode stringf_trace
: writable, set the trace function for this framef_trace_lines
: writable, trigger trace for each linef_trace_opcodes
: writable, trigger trace for each opcodef_lineno
: writable, current line number
code
fields (immutable)co_name
: function nameco_argcount
: positional argumentsco_posonlargcount
: positional-only argumentsco_kwonlyargcount
: keyword-only argumentsco_nlocals
: # of local variables used by the function, including argsco_varnames
: tuple with names of local variablesco_cellvars
: tuple with local vars referred to by nested functionsco_freevars
: tuple with free variablesco_code
: bytecode as a stringco_consts
: tuple with literals used by bytecode- first item is docstring or None for functions
co_names
: tuple with names used by bytecodeco_filename
: origin filenameco_firstlineno
: first line number of functionco_lnotab
: maps bytecode offsets to line numbersco_flags
: flags for interpreter0x04
: uses *arguments0x08
: uses **keywords0x20
: generator0x2000
: from future import division
A full example of a settrace
A short Fibonacci loop serves as a good example to show the values inside a trace.
def fib(n): i, f1, f2 = 1, 1, 1 while i < n: f1, f2 = f2, f1 + f2 i += 1 return f1
show_trace
is a simple tracing function to give a live example. Note
that f_trace_opcodes
needs to be explicitly set to True
to trace
opcodes in the frame, and import opcode
gets the opcode number to name
mapping.
import opcode def show_trace(frame, event, arg): frame.f_trace_opcodes = True code = frame.f_code offset = frame.f_lasti print(f"| {event:10} | {str(arg):>4} |", end=' ') print(f"{frame.f_lineno:>4} | {frame.f_lasti:>6} |", end=' ') print(f"{opcode.opname[code.co_code[offset]]:<18} | {str(frame.f_locals):<35} |") return show_trace
Finally, actually triggering the trace – and pretty printing the
output. Here we see the bytecode actually being executed; in contrast,
dis.dis
shows the disassembled bytecode representing the function.
import sys header = f"| {'event':10} | {'arg':>4} | line | offset | {'opcode':^18} | {'locals':^35} |" print(header) sys.settrace(show_trace) fib(3) sys.settrace(None)
| event | arg | line | offset | opcode | locals | | call | None | 1 | 0 | RESUME | {'n': 3} | | line | None | 2 | 2 | LOAD_CONST | {'n': 3} | | line | None | 3 | 14 | LOAD_FAST | {'n': 3, 'i': 1, 'f1': 1, 'f2': 1} | | line | None | 4 | 24 | LOAD_FAST | {'n': 3, 'i': 1, 'f1': 1, 'f2': 1} | | line | None | 5 | 38 | LOAD_FAST | {'n': 3, 'i': 1, 'f1': 1, 'f2': 2} | | line | None | 3 | 48 | LOAD_FAST | {'n': 3, 'i': 2, 'f1': 1, 'f2': 2} | | line | None | 4 | 24 | LOAD_FAST | {'n': 3, 'i': 2, 'f1': 1, 'f2': 2} | | line | None | 5 | 38 | LOAD_FAST | {'n': 3, 'i': 2, 'f1': 2, 'f2': 3} | | line | None | 3 | 48 | LOAD_FAST | {'n': 3, 'i': 3, 'f1': 2, 'f2': 3} | | line | None | 6 | 60 | LOAD_FAST | {'n': 3, 'i': 3, 'f1': 2, 'f2': 3} | | return | 2 | 6 | 62 | RETURN_VALUE | {'n': 3, 'i': 3, 'f1': 2, 'f2': 3} |
The underlying code of a function is directly accessible with the
__code__
field, and is also the value stored in the frame's co_code
field.
import inspect code = fib.__code__ print(f"{'field':<20} | value") for name, val in inspect.getmembers(code): if name.startswith("co_"): print(f"{name:<20} | {val}")
field | value co_argcount | 1 co_cellvars | () co_code | b'\x97\x00d\x01\\\x03\x00\x00}\x01}\x02}\x03|\x01|\x00k\x02\x00\x00r\x12|\x03|\x02|\x03z\x00\x00\x00}\x03}\x02|\x01d\x02z\r\x00\x00}\x01|\x01|\x00k\x02\x00\x00r\x01\x8c\x12|\x02S\x00' co_consts | (None, (1, 1, 1), 1) co_exceptiontable | b'' co_filename | /tmp/python-Rpavws co_firstlineno | 1 co_flags | 3 co_freevars | () co_kwonlyargcount | 0 co_lines | <built-in method co_lines of code object at 0x7f2a54808a30> co_linetable | b'\x80\x00\xd8\x10\x17\x81I\x80A\x80r\x882\xd8\n\x0b\x88a\x8a%\xd8\x11\x13\x90R\x98"\x91W\x88B\x88\x02\xd8\x08\t\x88Q\x89\x06\x88\x01\xf0\x05\x00\x0b\x0c\x88a\x8b%\xf0\x06\x00\x0c\x0e\x80I' co_lnotab | b'\x02\x01\x0c\x01\n\x01\x0e\x01\n\xfe\x0c\x03' co_name | fib co_names | () co_nlocals | 4 co_positions | <built-in method co_positions of code object at 0x7f2a54808a30> co_posonlyargcount | 0 co_qualname | fib co_stacksize | 3 co_varnames | ('n', 'i', 'f1', 'f2')
Potentially useful patterns
A completely programmable ad-hoc debugger
I've found settrace
with a simple function to be extremely valuable in
understanding code.
Instead of learning even more arcane commands and manually stepping
through programs with pdb
, I can directly write my "pause" and "inspect" logic
into the trace function and keep iterating on the program – while
also filtering the output to files I care about.
The tracer
module also helps with directly tracing and dumping the
code, but I generally find it too unwieldy compared to directly using settrace
.
Building code execution visualizations
Working on mine is what inspired this note: it's not hard to imagine
beautiful ways to display code execution. My approach is to use
Catapult (chrome://tracing
) for rendering the execution, but it could
just as easily be on the source itself inside an IDE.
Figure 1: Visualizing the broken behavior described in nullprogram
Test coverage
settrace
makes it trivial to observe the lines of code executed in any
specific program run, and the underling C function is how coverage.py works.
They didn't stop to think if they should
Jump to arbitrary parts of a function
I vaguely remember seeing a Hacker News post about jump implemented
with settrace
and then directly writing to the frame offset.
Recreating that is left as an exercise to the reader.
Rewrite arguments on the fly
For bonus points, you can include this code and have it trigger randomly.
import sys def square_arguments(frame, event, arg): if event != "call": return code = frame.f_code if code.co_name != "add": return for var in frame.f_locals: frame.f_locals[var] = frame.f_locals[var] ** 2 def add(a, b): return a + b print(f"add(1, 2): {add(1, 2)}") sys.settrace(square_arguments) print(f"add(1, 2): {add(1, 2)}")
add(1, 2): 3 add(1, 2): 5
Not quite a deus-ex machina
Tracing doesn't capture coroutine creation, which would be
amazing. With sys.set_coroutine_origin_tracking
it's possible to get a
stack trace to where a specific coroutine was created, but I'd love
for that step to also pass through settrace
/ setprofile
.
It's also not directly possible to set this from outside an arbitrary running program that I know of.
Finally, setprofile
intercepts and shares exceptions thrown from C
code, but doesn't intercept Python exceptions. I'm not quite sure why.
References
sys.settrace
in the standard python documentation.- The Python Type Hierarchy, with definitions of frame and code objects.
- inspect, with very readable documentation that makes my cheat sheet irrelevant.
Comments?
Drop me an email or reach out on Twitter.
History
- 2020-06-13: Published.
- (TODO Explore
PyEval_SetTrace
.)