expLog

AsyncIO In Depth

Python's async / await syntax and asyncio in general can be a little magical and opaque; a few minor adjustments to otherwise synchronous and simple code, and it's possible to quickly achieve concurrency 1.

This article is a depth-first traversal of Python's event loop implementation. After several (incomplete) attempts at writing about asyncio, I believe this is the best way to peek behind the curtain and learn the fundamentals of the system.

This is not a way to quickly become productive with asyncio: that function is better fulfilled by numerous other articles and tutorials already published. Read this article once you are somewhat comfortable applying asyncio and want to understand why and how it works.

To begin, consider a minimalist and asynchronous "Hello, World!" using asyncio:

import asyncio

async def hello_world():
    await asyncio.sleep(1)
    print("Hello, world!")

asyncio.run(hello_world())
Hello, world!

There's a surprising amount of machinery behind this simple piece of code to say hello after waiting for a second. (The delay helps make sure really fast software doesn't spoil us, even as we keep improving the hardware.)
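As a rough sketch of the outermost layer of that machinery: asyncio.run creates a fresh event loop, runs the coroutine to completion on it, and closes the loop afterwards. The version below spells those steps out by hand. run_by_hand is a hypothetical helper name, the delay is shortened, and the real asyncio.run performs extra cleanup (cancelling leftover tasks, shutting down async generators) that is skipped here.

```python
import asyncio

async def hello_world():
    await asyncio.sleep(0.01)  # shortened delay, purely to keep the sketch fast
    return "Hello, world!"

def run_by_hand(coroutine):
    """Hypothetical, simplified stand-in for asyncio.run."""
    loop = asyncio.new_event_loop()
    try:
        # Drive the coroutine until it finishes and hand back its result.
        return loop.run_until_complete(coroutine)
    finally:
        loop.close()

greeting = run_by_hand(hello_world())
print(greeting)
```

The rest of this article is about what happens inside that run_until_complete call.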

The async keyword

The Python lexer has to deal with a new keyword to parse this variant of hello_world: the first way to inspect it within Python is to look at the AST generated.

(To keep the output focused, I'm eliding the call to await; async has the spotlight for the moment.)

import ast
print(ast.dump(ast.parse("""
async def hello_world():
    print("Hello, world!")
""")))
Module(body=[AsyncFunctionDef(name='hello_world', args=arguments(posonlyargs=[], args=[], vararg=None, kwonlyargs=[], kw_defaults=[], kwarg=None, defaults=[]), body=[Expr(value=Call(func=Name(id='print', ctx=Load()), args=[Constant(value='Hello, world!', kind=None)], keywords=[]))], decorator_list=[], returns=None, type_comment=None)], type_ignores=[])

The function is explicitly detected as an AsyncFunctionDef to mark the addition of the async tag. 2
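The same check can be done programmatically instead of reading the dump by eye. The sketch below walks the tree and records which node type each definition produced; it also counts Await nodes, since the await call was elided above. The name something is a placeholder that is only parsed, never executed.

```python
import ast

source = """
async def async_hello_world():
    await something()

def hello_world():
    print("Hello, world!")
"""

tree = ast.parse(source)

# Map each function name to the AST node type the parser chose for it.
definitions = {
    node.name: type(node).__name__
    for node in ast.walk(tree)
    if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
}

# await gets its own expression node type as well.
await_count = sum(isinstance(node, ast.Await) for node in ast.walk(tree))

print(definitions)
print(await_count)
```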

Next up, it's just as valuable to look at the disassembly of the function definition:

import dis

code = dis.Bytecode(compile("""
async def async_hello_world():
    print("Async Hello, world!")

def hello_world():
    print("Hello, world!")

""", filename='<string>', mode='exec'))

print(code.dis())
2           0 LOAD_CONST               0 (<code object async_hello_world at 0x7fce5600e2f0, file "<string>", line 2>)
            2 LOAD_CONST               1 ('async_hello_world')
            4 MAKE_FUNCTION            0
            6 STORE_NAME               0 (async_hello_world)

5           8 LOAD_CONST               2 (<code object hello_world at 0x7fce5600e450, file "<string>", line 5>)
           10 LOAD_CONST               3 ('hello_world')
           12 MAKE_FUNCTION            0
           14 STORE_NAME               1 (hello_world)
           16 LOAD_CONST               4 (None)
           18 RETURN_VALUE

I was a little surprised at this: the opcodes for defining the function seem to be exactly the same. Perhaps the magic comes inside the function definition?

import dis

async def async_hello_world():
    print("Hello, world!")

def hello_world():
    print("Hello, world!")

print("Async")
dis.dis(async_hello_world)

print("Function")
dis.dis(hello_world)
Async
  4           0 LOAD_GLOBAL              0 (print)
              2 LOAD_CONST               1 ('Hello, world!')
              4 CALL_FUNCTION            1
              6 POP_TOP
              8 LOAD_CONST               0 (None)
             10 RETURN_VALUE
Function
  7           0 LOAD_GLOBAL              0 (print)
              2 LOAD_CONST               1 ('Hello, world!')
              4 CALL_FUNCTION            1
              6 POP_TOP
              8 LOAD_CONST               0 (None)
             10 RETURN_VALUE

The opcodes are also exactly the same! That said, the main identifier for a coroutine is a flag set on the code object, CO_COROUTINE, which is what marks these as different. The actual compiler code is somewhere around here, but I'd prefer to see this explicitly as well.

Once more into the breach:

import dis

async def async_hello_world():
    print("Hello, world!")

def hello_world():
    print("Hello, world!")

async_flags = async_hello_world.__code__.co_flags
standard_flags = hello_world.__code__.co_flags

only_async_flags = async_flags & (~standard_flags)
print(f"{only_async_flags=}, aka {dis.COMPILER_FLAG_NAMES[only_async_flags]}")

only_standard_flags = standard_flags & (~async_flags)
print(f"{only_standard_flags=}")
only_async_flags=128, aka COROUTINE
only_standard_flags=0

And that's the trivial little difference that changes how a function is evaluated.
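For everyday code there is no need to do the bit arithmetic by hand. A minimal sketch, assuming only the standard inspect module: inspect.CO_COROUTINE names the same flag, and inspect.iscoroutinefunction gives the same answer for plain async def functions like these. has_coroutine_flag is a hypothetical helper written for this illustration.

```python
import inspect

async def async_hello_world():
    print("Hello, world!")

def hello_world():
    print("Hello, world!")

def has_coroutine_flag(function):
    """Hypothetical helper: test the CO_COROUTINE bit on the code object."""
    return bool(function.__code__.co_flags & inspect.CO_COROUTINE)

for function in (async_hello_world, hello_world):
    print(
        function.__name__,
        has_coroutine_flag(function),
        inspect.iscoroutinefunction(function),
    )
```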

Coroutines

The first thing to observe is that the async keyword before hello_world marks it as a coroutine function: calling hello_world() no longer runs the body directly, but instead returns a coroutine object.

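# hello_world must be the async version from the first example here; the plain
# function defined later would run immediately and print the greeting instead.
print(hello_world())
<coroutine object hello_world at 0x...>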
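To make the "doesn't run the body" part concrete, here is a small sketch that records a side effect instead of printing. The calls list exists only for this illustration.

```python
import inspect

calls = []  # records whether the body has executed

async def hello_world():
    calls.append("body ran")

coroutine = hello_world()

ran_on_call = bool(calls)                      # the body has not run yet
is_coroutine = inspect.iscoroutine(coroutine)  # we only got an object back

print(ran_on_call, is_coroutine)

# This coroutine will never be awaited, so discard it explicitly.
coroutine.close()
```

Nothing is appended to calls until something actually drives the coroutine, which is the event loop's job.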

There's a lot more that we can find out about it because it is a standard Python object after all:

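# Again using the async hello_world from the first example. The result includes
# send, throw and close, along with the cr_* attributes (cr_await, cr_code,
# cr_frame, cr_running); the exact list varies between Python versions.
[attribute for attribute in hello_world().__dir__() if not attribute.startswith("__")]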

Unlike a function, a coroutine carries its world around with it. A function simply has its own code, but a coroutine also maintains its execution frame, which is why it can continue execution in the future.

coroutine = hello_world()  # the async hello_world from the first example
for attribute in coroutine.__dir__():
    if attribute.startswith("cr"):
        # Prints cr_frame and cr_code, among the other cr_* attributes.
        print(f"{attribute}: {getattr(coroutine, attribute)}")

An instance of a coroutine has both a frame and the code being executed, which makes it possible to "pause" the coroutine, and simply restore it when required.
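The pause and restore can be observed without an event loop by driving a coroutine by hand with send(). This is a minimal sketch rather than how asyncio itself is written: Pause is a hypothetical awaitable that suspends exactly once, and greet is a made-up coroutine for the demonstration.

```python
class Pause:
    """Hypothetical awaitable that suspends the awaiting coroutine once."""

    def __await__(self):
        # The yield is the point where the coroutine is paused; whatever is
        # passed to send() on resumption becomes the value of this expression.
        resumed_with = yield "paused"
        return resumed_with

async def greet():
    name = await Pause()
    return f"Hello, {name}!"

coroutine = greet()
frame_before = coroutine.cr_frame is not None  # frame exists before any code runs

yielded = coroutine.send(None)  # run until the yield inside Pause.__await__

try:
    coroutine.send("world")     # resume exactly where it left off
except StopIteration as stop:
    result = stop.value         # the coroutine's return value rides on the exception

frame_after = coroutine.cr_frame  # the frame is released once the coroutine finishes

print(yielded, result, frame_before, frame_after)
```

Between the two send() calls the local state of greet lives in the coroutine's frame, which is what lets it pick up again mid-function.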

1

Concurrency allows multiple computations to make progress over overlapping periods of time; parallelism runs them at literally the same time, for example on multiple cores.

2

Just for contrast:

import ast
print(ast.dump(ast.parse("""
def hello_world():
    print("Hello, world!")
""")))
Module(body=[FunctionDef(name='hello_world', args=arguments(posonlyargs=[], args=[], vararg=None, kwonlyargs=[], kw_defaults=[], kwarg=None, defaults=[]), body=[Expr(value=Call(func=Name(id='print', ctx=Load()), args=[Constant(value='Hello, world!', kind=None)], keywords=[]))], decorator_list=[], returns=None, type_comment=None)], type_ignores=[])

Normal functions are parsed, unsurprisingly, as FunctionDef.
