vittorio romeo's website

C++20 coroutines have lovely syntax. They are also a terrible fit for game development.

If you’ve ever tried to use them for boss scripts, dialogue, or AI behaviors – anywhere you want straight-line code that pauses for a few frames – you’ve probably hit the same wall I did: opaque handles, heap allocations, hidden compiler lowering, and – most damning for games – no way to serialize a paused coroutine to disk.

In this article, I will present sfex::Coroutine: a ~200-line stackless macro-based coroutine library built around a variant of the classic switch + __LINE__ trick. Like my previously discussed sfex::Profiler, these coroutines are meant to be simple and lightweight.

what’s our goal?

Our goal is to implement coroutines that meet these criteria:

- Allocation-free: The entire coroutine state is one int plus whatever members you put on the struct.
- Trivially serializable: Save the integer state and the struct’s data, and that’s it. Mid-cutscene saving and loading just works™.
- Deterministic: No hidden compiler lowering; the generated control flow could be written by hand.
- Composable: Coroutines can await other coroutines, all in straight-line code.
- Tiny: The whole runtime is a small header you can read in one sitting.

The full implementation is part of my fork of SFML, in SfexCoroutine.hpp. Here are a few examples that showcase the feature, all of which you can play directly in your browser!

- Shoot-em-up boss fight
- Choreographed top-down stealth game
- Dialogue cutscene

the trouble with C++20 coroutines

Before we get to work, let’s see what’s wrong with co_await for game programming.

Unpredictable heap allocations: The frame of a C++20 coroutine is allocated dynamically. The standard provides “Heap Allocation eLision Optimization” (HALO) as an allowed optimization, but it is not guaranteed by the language. Compilers do it inconsistently, and any change to the coroutine body or to where its handle escapes can silently re-introduce the allocation. Unoptimized builds might not get HALO at all – so your debug build might silently be allocating per-coroutine on the hot path while your -O2 build might not. For a real-time game with tight latency deadlines, “might” is an uncomfortable term. As far as I know, there’s no attribute that you can apply to a coroutine to ensure that HALO is applied.

Opaque handles and hidden state: std::coroutine_handle is a pointer to compiler-managed state. There is no portable way to ask: “what’s in this coroutine’s frame? what local variables does it have? where in the source code is it suspended right now?” That makes serialization impossible without compiler hooks. For a game where the player can press F5 to quick-save at any moment, that’s a deal-breaker.

Serialization is not only useful for player-facing features such as quick saving, but is also very important during development. Being able to deterministically replay save states greatly helps with iteration speed and debugging, and being able to pause/introspect the state of the program at any time and interact with it is a massive productivity boost.

C++20 coroutines are well-shaped for transient work: an async HTTP request that flips a “loaded” flag when it completes, a generator that yields a finite sequence and is then discarded, a one-off scheduled task. The coroutine runs, eventually has some side effect on your real game state, and goes away. Its internal state during the suspension is scaffolding – you don’t reach into it, you just wait for the side effect.

Game logic is not transient. The boss’s behaviour is part of the boss’s data: “we’re paused on phase 3, with 0.3 seconds of wait remaining” must travel alongside the boss’s HP and position through saves, quick-loads, and network sync. C++20 coroutines push hard against this: the only state model you get is “internal compiler scaffolding, you don’t get to look at it”.

Cumbersome customization machinery: To get a working coroutine type you need to write a promise_type, an awaitable, deal with await_transform, final_suspend, initial_suspend, return-object construction, and so on. Even minimal implementations of generator<T> involve dozens of lines of plumbing.

For a game I want a coroutine that is part of an object’s data. When the object dies, the coroutine dies with it. When I serialize the object to a save buffer, the coroutine’s state goes with it. When the optimizer is off, there is no extra cost compared to the equivalent state machine. C++20 coroutines do not provide any of these guarantees out of the box.

Let’s build something that does.

a tiny cutscene

Imagine a simple scripted exchange between two characters:

1. “Hero: I finally got you.”
2. Wait one second.
3. “Villain: You’re too late!”
4. Wait half a second.
5. “Hero: We’ll see about that!”
6. Wait one second.
7. Hide the dialogue UI.

Here’s how this looks as an sfex::Coroutine:

    struct DialogueScene : sfex::Coroutine
    {
        Yield operator()(World& world)
        {
            SFEX_CO_BEGIN;

            world.showText("Hero", "I finally got you.");
            SFEX_CO_YIELD(Wait{1.0f});

            world.showText("Villain", "You're too late!");
            SFEX_CO_YIELD(Wait{0.5f});

            world.showText("Hero", "We'll see about that!");
            SFEX_CO_YIELD(Wait{1.0f});

            world.hideText();
            SFEX_CO_RETURN(Done{});

            SFEX_CO_END;
        }
    };

It reads nicely from top to bottom. Each SFEX_CO_YIELD(Wait{...}) means “pause this coroutine for this many seconds, then come back here”. You can save the game between any two yields and reload it: the coroutine resumes at the right line, with all its members intact. There is no compiler magic – the runtime cost is one integer state field that tracks the current resumption point, plus a switch.
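As a rough illustration of the save/reload claim, here is a minimal sketch that round-trips a coroutine-like struct through a byte buffer. MiniScene, save, and load are hypothetical names made up for this example, and the struct is hand-expanded rather than built with the SFEX macros; the only assumption is that the struct is trivially copyable.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for a coroutine struct: the resumption point plus
// the members that must survive a yield. Not the real sfex::Coroutine.
struct MiniScene
{
    int state = 0;      // resumption point
    int linesShown = 0; // persistent member

    // Hand-written equivalent of a body with two yields.
    // Returns false once the scene has finished.
    bool step()
    {
        switch (state)
        {
            case 0: ++linesShown; state = 1; return true;
            case 1: ++linesShown; state = 2; return true;
            case 2: ++linesShown; state = 0; return false;
        }
        return false;
    }
};

// A raw byte copy is only valid for trivially copyable types.
static_assert(std::is_trivially_copyable_v<MiniScene>);

std::vector<unsigned char> save(const MiniScene& s)
{
    std::vector<unsigned char> bytes(sizeof(MiniScene));
    std::memcpy(bytes.data(), &s, sizeof(MiniScene));
    return bytes;
}

MiniScene load(const std::vector<unsigned char>& bytes)
{
    MiniScene s;
    std::memcpy(&s, bytes.data(), sizeof(MiniScene));
    return s;
}

// Saves a paused scene, restores it, and checks both copies finish alike.
bool roundTripResumesAtSamePoint()
{
    MiniScene original;
    original.step(); // now paused with state == 1

    MiniScene restored = load(save(original));

    const bool a = original.step() && !original.step();
    const bool b = restored.step() && !restored.step();
    return a && b && original.linesShown == 3 && restored.linesShown == 3;
}
```

A real save system would write fields explicitly rather than copying raw bytes, but the point stands: everything that needs saving is visible in the struct definition.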

Driving it from the main loop is equally simple:

    DialogueScene scene;
    float waitTimer = 0.f;

    while (true)
    {
        const float dt = clock.restart().asSeconds();

        if (waitTimer > 0.f)
        {
            waitTimer -= dt;
        }
        else
        {
            scene(world).match(
                [&](NextFrame) { /* ...run again next frame... */ },
                [&](Wait w) { waitTimer = w.seconds; },
                [&](Done) { /* ...proceed to next scene... */ });
        }

        // ... draw, display ...
    }

Quick note on the types: Yield is a std::variant<NextFrame, Wait, Done>, and .match(...) is a thin wrapper around std::visit using overloaded lambdas. The point is that the three yield kinds are explicit data – the driver pattern-matches on them and decides what to do: ignore, sleep, or stop.
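For readers who have not seen the overloaded-lambda idiom, here is a minimal sketch of what such a Yield type could look like. The exact definition in SfexCoroutine.hpp may differ; Overloaded, the data member, and secondsToSleep are assumptions made for this example.

```cpp
#include <cassert>
#include <utility>
#include <variant>

// Stand-ins for the article's three yield kinds.
struct NextFrame {};
struct Wait { float seconds; };
struct Done {};

// Classic "overloaded" helper: inherits the call operator of every lambda.
template <typename... Fs>
struct Overloaded : Fs...
{
    using Fs::operator()...;
};

template <typename... Fs>
Overloaded(Fs...) -> Overloaded<Fs...>;

// Assumed shape of `Yield`: a variant plus a `match` convenience member.
struct Yield
{
    std::variant<NextFrame, Wait, Done> data;

    template <typename... Fs>
    decltype(auto) match(Fs&&... fs) const
    {
        return std::visit(Overloaded{std::forward<Fs>(fs)...}, data);
    }
};

// Example driver step: how long should the caller sleep?
// A negative result is used here (arbitrarily) to signal "finished".
float secondsToSleep(const Yield& y)
{
    return y.match(
        [](NextFrame) { return 0.f; },
        [](Wait w) { return w.seconds; },
        [](Done) { return -1.f; });
}
```

Because every alternative must be handled by some lambda, forgetting a yield kind in the driver is a compile error rather than a silent bug.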

the same thing as a state machine

It would be easy to make the coroutine look good by comparing it against a (deliberately) bloated FSM. To be fair, here’s the minimal state-machine equivalent: one step counter and one wait timer.

    struct DialogueSceneFSM
    {
        int step = 0;
        float waitTimer = 0.f;

        bool tick(World& world, float dt)
        {
            if (waitTimer > 0.f)
            {
                waitTimer -= dt;
                return true;
            }

            switch (step++)
            {
                case 0:
                    world.showText("Hero", "I finally got you.");
                    waitTimer = 1.0f;
                    return true;

                case 1:
                    world.showText("Villain", "You're too late!");
                    waitTimer = 0.5f;
                    return true;

                case 2:
                    world.showText("Hero", "We'll see about that!");
                    waitTimer = 1.0f;
                    return true;

                case 3:
                    world.hideText();
                    return false;
            }

            return false;
        }
    };

For this particular shape – a flat “do X, wait, do Y, wait” sequence with no loops or branches – the step-counter FSM is short, readable, and saves to the same number of bytes as the coroutine.

IMHO, the coroutine clearly wins the moment your script starts orchestrating visual state on top of the dialogue.

Let’s actually draw the two characters on screen and have something happen between the lines of dialogue: the villain takes three menacing paces toward the hero, with a brief pause between each step. The dialogue resumes once the approach is finished.

Here’s the part of the coroutine that handles the menacing approach:

    struct CutsceneScene : sfex::Coroutine
    {
        int paceIdx = 0;
        float t = 0.f;
        sf::Vec2f paceStartPos;

        Yield operator()(World& world)
        {
            SFEX_CO_BEGIN;

            // ...

            for (paceIdx = 0; paceIdx < 3; ++paceIdx) // take three steps forward
            {
                paceStartPos = world.villainPos;
                t = 0.f;

                while (t < 1.f) // smoothly move by 100px per step
                {
                    t += world.dt;
                    world.villainPos.x = paceStartPos.x - 100.f * t;

                    SFEX_CO_YIELD(NextFrame{});
                }

                SFEX_CO_YIELD(Wait{0.30f}); // menacing pause between paces
            }

            world.showText("Hero", "Don't come any closer!");
            SFEX_CO_YIELD(Wait{1.5f});

            SFEX_CO_RETURN(Done{});
            SFEX_CO_END;
        }
    };

Three nested control-flow constructs: an outer for over the three paces, an inner while that does the smooth interpolation, a Wait after each pace.

Three pieces of persistent state: paceIdx, t, paceStartPos.

Again, you can read it top to bottom and visually see the temporal flow of events.

Very importantly, those three values have to be struct members, not function-locals inside operator(). There’s a good reason – the rules and footguns section below covers it – but the rule of thumb is short: anything that needs to survive a yield goes on the struct.

Let’s now try to compare the above solution with an FSM-based one. A clever FSM author might create a Tween helper to clean up the per-frame interpolation:

    struct Tween
    {
        sf::Vec2f from, to;
        float duration = 0.f, t = 0.f;

        void start(sf::Vec2f f, sf::Vec2f tt, float d)
        {
            from = f;
            to = tt;
            duration = d;
            t = 0.f;
        }

        bool active() const
        {
            return t < 1.f;
        }

        sf::Vec2f tick(float dt)
        {
            t += dt;
            return from + (to - from) * t;
        }
    };

That handles a single smooth movement nicely. But the outer “do this three times with a menacing pause between each” doesn’t fit in Tween; it needs its own state. The painful region of the FSM looks roughly like this:

    case N: // outer loop entry
        paceIdx = 0;
        step = N + 1;
        continue;

    case N + 1: // start next pace
        if (paceIdx >= 3) { step = N + 3; continue; } // exit loop

        villainTween.start(world.villainPos,
                           world.villainPos + sf::Vec2f{-100.f, 0.f},
                           0.35f);
        step = N + 2;
        return true;

    case N + 2: // tick the tween + post-pause
        if (villainTween.active())
        {
            world.villainPos = villainTween.tick(dt);
            return true;
        }

        waitTimer = 0.30f;
        ++paceIdx;
        step = N + 1; // back to top of loop
        return true;

Three coordinated case branches plus a paceIdx member, just to express for (paceIdx = 0; paceIdx < 3; ++paceIdx). The Tween helper covered one level of “thing happening over time”; the outer loop is the second level, and the FSM has to grow another mini-state-machine on top.

In contrast, the coroutine just has one int paceIdx; and lets ordinary C++ control flow do its job.

The full cutscene discussed above is in CoroutineDialogue.cpp, and playable in the browser at the top of the article.

My claim is: coroutines aren’t dramatically better for trivial scripts, but they become so the moment we have more than one layer of “thing happening over time”.

how does it work?

Every SFEX coroutine expands to a switch-based state machine you’d be able to write by hand – the macros just hide the bookkeeping.

Here is the body of DialogueScene::operator() after preprocessing (lightly cleaned up to drop some noise we’ll talk about in a moment):

    Yield DialogueScene::operator()(World& world)
    {
        switch (state)                                       // SFEX_CO_BEGIN
        {
            case 0:
                world.showText("Hero", "I finally got you.");
                state = 1;                                   // SFEX_CO_YIELD(Wait{1.0f})
                return Wait{1.0f};

            case 1:
                world.showText("Villain", "You're too late!");
                state = 2;                                   // SFEX_CO_YIELD(Wait{0.5f})
                return Wait{0.5f};

            case 2:
                world.showText("Hero", "We'll see about that!");
                state = 3;                                   // SFEX_CO_YIELD(Wait{1.0f})
                return Wait{1.0f};

            case 3:
                world.hideText();
                state = 0;                                   // SFEX_CO_RETURN(Done{})
                return Done{};
        }                                                    // SFEX_CO_END

        std::unreachable();
    }

What to notice:

1. case labels can be nested anywhere inside the switch: This is legal C and C++. A case label can sit arbitrarily deep inside the switch’s body – between two normal statements, inside a block, inside a loop – as long as no other switch opens between the dispatch and the label. We use this to drop a fresh case label after every yield so the next switch (state) jumps right back to where we left off.

2. Each yield is two halves: The “going-to-sleep” half ( state = N; return Wait{...}; ) runs when the coroutine is currently executing. The “waking-up” half ( case N: ) runs when the driver re-enters after the wait expires. The two halves are placed adjacent to each other in source order, so the body reads top-to-bottom even though execution leaves and re-enters multiple times.

3. The resumption point is just an int state: That’s literally all that’s needed for the coroutine to know where to resume. The rest of the “coroutine frame” – locals, persistent counters, anything that has to survive a yield – is just regular member data on your struct. The compiler can’t change this representation; it’s an integer plus whatever you wrote.

4. std::unreachable() tells the compiler “control flow never reaches here”: Without it, GCC and Clang would warn about a missing return.
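The first point is the one that tends to surprise people, so here is a small hand-written sketch of a case label sitting inside a for loop. CountToThree is a hypothetical name; no SFEX macros are involved.

```cpp
#include <cassert>

// Hand-written resumable counter: yields 0, 1, 2, then -1 once finished.
// `state` and `i` are members because they must survive each `return`.
struct CountToThree
{
    int state = 0;
    int i = 0;

    int resume()
    {
        switch (state)
        {
            case 0:
                for (i = 0; i < 3; ++i)
                {
                    state = 1; // "going-to-sleep" half
                    return i;
                    case 1:;   // "waking-up" half: lands inside the loop body
                }
        }

        state = 0; // finished: reset so the sequence can start over
        return -1;
    }
};
```

On the second and later calls, switch (state) jumps straight to case 1 in the middle of the loop body, the loop increment runs, and the loop carries on as if it had never been left.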

In the real macro expansion, every yield is wrapped in a do { ... } while(0):

    // `SFEX_CO_YIELD` expansion:
    do
    {
        state = 1;
        return Wait{1.0f};
        case 1:;
    } while (0);

This is the standard macro-hygiene trick: it makes the macro expand to a single statement, so writing if (cond) SFEX_CO_YIELD(...); followed by an else does the right thing. The case label sitting inside the do block is still part of the enclosing switch (rule 1 above), so jumping to it from outside the do works fine.
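A small sketch of why the wrapper matters, using a hypothetical two-statement macro SET_AND_COUNT (not part of SFEX) inside an if/else:

```cpp
#include <cassert>

// Hypothetical two-statement macro, wrapped so it behaves as ONE statement.
// It expects a local variable named `writes` at the point of use.
#define SET_AND_COUNT(target, value) \
    do                               \
    {                                \
        (target) = (value);          \
        ++writes;                    \
    } while (0)

// Without the `do { ... } while (0)` wrapper, the `else` below would not
// compile: the `if` would end at the macro's first statement (or at the
// closing `}` of a bare block, which the trailing `;` then follows).
int pick(bool cond)
{
    int writes = 0;
    int result = 0;

    if (cond)
        SET_AND_COUNT(result, 10);
    else
        SET_AND_COUNT(result, 20);

    return result + writes; // exactly one branch ran, so `writes` is 1
}
```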

These are the actual macro definitions:

    // 1. Open the `switch` statement
    // 2. Record the initial value of `__COUNTER__`
    #define SFEX_CO_BEGIN                              \
        static constexpr int _sfex_base = __COUNTER__; \
        switch (state) { case 0:;

    // 1. Update the `state` to the next resumption point
    // 2. Return (yield) the specified `value`
    // 3. Open the `case` for the next resumption point
    #define SFEX_CO_YIELD(value)                    \
        do {                                        \
            state = (__COUNTER__ + 1) - _sfex_base; \
            return value;                           \
            case (__COUNTER__ - _sfex_base):;       \
        } while (0)

    // 1. Reset the coroutine state
    // 2. Return the specified `value`
    #define SFEX_CO_RETURN(value) \
        do                        \
        {                         \
            state = 0;            \
            return value;         \
        } while (0)

    // 1. Syntactically close the `switch` statement
    // 2. Add the "unreachable" sentinel
    #define SFEX_CO_END \
        }               \
                        \
        std::unreachable();

…counter?

You might be wondering: why __COUNTER__ and not __LINE__?

The Protothreads library by Adam Dunkels (providing “extremely lightweight stackless threads designed for severely memory constrained systems”) uses __LINE__ to generate unique state values, making the state values more readable – “we’re paused at line 42” is easy to audit. I specifically chose not to use __LINE__, and the reason is serialization.

Recall that the main pitch of this design is that the entire coroutine state is an int field. You save it to disk, you load it back, you resume right where you left off. That works only if the integer means the same thing across save and load.

With __LINE__, the state value is the source-code line number of the yield.

Any cosmetic edit to the source file shifts those numbers.

- Add a blank line above the function…
- Reformat with clang-format…
- Insert a // ... comment…

Now the integer ‘47’ in the save file – which used to mean “paused at the third yield” – means something completely different. Save files written before the edit resume mid-function at a random instruction. Not good.

__COUNTER__ does not have this problem. It increments per use, not per source line, so cosmetic edits don’t shift the values. As long as you don’t add or remove yield points, the numbering for every existing yield stays stable. Even when you do add a yield, only the yields after the new one shift – everything before keeps its identity.

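We relativize the counter to the start of each function thanks to _sfex_base, so the case labels follow the same small sequence in every coroutine and don’t depend on what comes before in the translation unit. With the macros above each yield consumes two counter values, so the actual labels are 2, 4, 6, ...; the cleaned-up expansion shown earlier renumbers them 1, 2, 3, ... for readability.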
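The relativization can be checked at compile time. The sketch below mimics the counter arithmetic of SFEX_CO_BEGIN and SFEX_CO_YIELD in two plain functions; YieldIds and both function names are made up for this example, and it assumes a compiler that provides __COUNTER__ (GCC, Clang, and MSVC do).

```cpp
#include <cassert>

struct YieldIds { int first; int second; };

// Mimics a coroutine body with two yields.
YieldIds idsInFirstFunction()
{
    static constexpr int base = __COUNTER__;         // like SFEX_CO_BEGIN
    constexpr int sleep1 = (__COUNTER__ + 1) - base; // value stored in `state`
    constexpr int wake1  = __COUNTER__ - base;       // value used in `case`
    constexpr int sleep2 = (__COUNTER__ + 1) - base;
    constexpr int wake2  = __COUNTER__ - base;
    static_assert(sleep1 == wake1 && sleep2 == wake2,
                  "both halves of a yield must agree");
    return {wake1, wake2};
}

// An unrelated use elsewhere in the translation unit shifts the absolute
// counter value seen by every function that follows...
constexpr int unrelatedUse = __COUNTER__;
static_assert(unrelatedUse >= 0, "the counter never goes negative");

// ...but the offsets relative to `base` come out the same.
YieldIds idsInSecondFunction()
{
    static constexpr int base = __COUNTER__;
    constexpr int sleep1 = (__COUNTER__ + 1) - base;
    constexpr int wake1  = __COUNTER__ - base;
    constexpr int sleep2 = (__COUNTER__ + 1) - base;
    constexpr int wake2  = __COUNTER__ - base;
    static_assert(sleep1 == wake1 && sleep2 == wake2,
                  "both halves of a yield must agree");
    return {wake1, wake2};
}
```

Both functions produce the same pair of ids. Because each yield expands __COUNTER__ twice, the ids advance in steps of two.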

The only price we pay versus __LINE__ is that state values are no longer human-readable. For my use case, save-file stability matters far more than peeking at an integer in the debugger.

However, while __COUNTER__ survives cosmetic edits to the function body, it does not survive meaningful edits. The moment you ship a patch that adds or removes a yield, every state value below the change shifts – saves written under the previous build will resume into the wrong yield site. There’s no easy way to avoid this in the design (the same hazard applies to FSMs, too).

Two practical workarounds:

Tag every save with a coroutine-revision number (a constant you bump whenever you touch a coroutine), and either refuse to load saves with a mismatched revision or keep an older version of the coroutine in the source code for backwards compatibility.

Save the high-level coroutine identity as data (e.g. “we’re at the start of phase 3 of fight 2”) alongside the integer state, and rebuild the integer from the data on load when revisions don’t match.
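Both workarounds fit in a few lines. This is only a sketch under assumed names: kCoroutineRevision, SavedCoroutine, stateForPhaseStart, and the phase-to-state table are all hypothetical, and a real game would generate or maintain that table alongside its coroutines.

```cpp
#include <cassert>
#include <optional>

// Hypothetical constant: bump it whenever a coroutine gains or loses a yield.
constexpr int kCoroutineRevision = 3;

// Hypothetical save record for one coroutine.
struct SavedCoroutine
{
    int revision = kCoroutineRevision; // build that wrote the save
    int state = 0;                     // raw resumption point
    int phase = 0;                     // high-level identity, e.g. "phase 3"
};

// Hypothetical table: the state value that begins each phase in the CURRENT
// build. Unknown phases restart the coroutine from the top (state 0).
constexpr int stateForPhaseStart(int phase)
{
    switch (phase)
    {
        case 1: return 0;
        case 2: return 4;
        case 3: return 10;
        default: return 0;
    }
}

// Workaround 1: refuse saves written by a different revision.
std::optional<int> loadStrict(const SavedCoroutine& s)
{
    if (s.revision != kCoroutineRevision)
        return std::nullopt;

    return s.state;
}

// Workaround 2: on a mismatch, rebuild the state from the saved phase.
int loadWithFallback(const SavedCoroutine& s)
{
    if (s.revision == kCoroutineRevision)
        return s.state;

    return stateForPhaseStart(s.phase);
}
```

The fallback loses a little progress (the player restarts the current phase), which is usually an acceptable price for keeping old saves loadable.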

composing coroutines: enemy AI

A scripted dialogue is a nice introductory example, but the real value shows up when you start composing coroutines. Here are two firing patterns for an enemy in a shoot-em-up (a la Touhou):

    struct RingFire : sfex::Coroutine
    {
        int rings = 4;
        int i = 0;

        Yield operator()(World& world, Enemy& self)
        {
            SFEX_CO_BEGIN;

            for (i = 0; i < rings; ++i)
            {
                world.spawnBulletRing(self.pos, /* count */ 8, /* speed */ 120.f);
                SFEX_CO_YIELD(Wait{0.4f});
            }

            SFEX_CO_RETURN(Done{});
            SFEX_CO_END;
        }
    };

    struct AimedBurst : sfex::Coroutine
    {
        int shots = 5;
        int i = 0;

        Yield operator()(World& world, Enemy& self)
        {
            SFEX_CO_BEGIN;

            for (i = 0; i < shots; ++i)
            {
                world.spawnBulletAimed(self.pos, world.player.pos, /* speed */ 220.f);
                SFEX_CO_YIELD(Wait{0.2f});
            }

            SFEX_CO_RETURN(Done{});
            SFEX_CO_END;
        }
    };

If I want my enemy to cycle forever between the two patterns, with a one-second pause between each, it’s as easy as using a while loop:

    struct EnemyAI : sfex::Coroutine
    {
        RingFire ring;
        AimedBurst aimed;

        Yield operator()(World& world, Enemy& self)
        {
            SFEX_CO_BEGIN;

            while (true)
            {
                ring = {};
                SFEX_CO_AWAIT(ring(world, self));
                SFEX_CO_YIELD(Wait{1.0f});

                aimed = {};
                SFEX_CO_AWAIT(aimed(world, self));
                SFEX_CO_YIELD(Wait{1.0f});
            }

            SFEX_CO_END;
        }
    };

SFEX_CO_AWAIT(child(...)) is the composition primitive: it runs the child to completion, propagating whatever the child yields up to the driver.

From the parent’s point of view, awaiting a sub-coroutine looks identical to executing the sub-coroutine’s sequence of yields inline. Yields propagate through arbitrarily many levels.

Each sub-coroutine is owned by its parent as a member – EnemyAI contains ring and aimed. The ring = {} line resets the sub-coroutine’s state before each cycle, which is just a normal copy-assign. There’s no allocation, nor any opaque handle. When EnemyAI is destroyed, ring and aimed go with it.

Internally, SFEX_CO_AWAIT expands to the same state = N; case N:; pattern from YIELD, with a check in the case body:

    do
    {
        state = 1; // set resumption point to `1`
        case 1:    // begin resumption point `1`
        {
            auto _sfex_res = ring(world, self);

            if (!isFinished(_sfex_res))
                return _sfex_res; // child still running -- propagate its yield
        }
    } while (0);

Each call: tick the child once, grab its yield.

If the child isn’t done, return the yield up to the parent’s caller (so a Wait{0.4f} from inside RingFire propagates all the way out to the driver, which then pauses the whole coroutine chain for 0.4 seconds).

If the child is done, fall through past the if, past the do { } while (0), and on to the next statement – which, in the loop above, is Wait{1.0f} between cycles.

That’s the whole composition mechanism: nested case labels and yields that bubble up by being returned.
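To watch the bubbling-up happen without any macros, here is a hand-expanded parent and child. It is a sketch only: Kind, the two-field Yield, Child, Parent, and collectWaits are simplified stand-ins invented for this example, not the SFEX types.

```cpp
#include <cassert>
#include <vector>

// Simplified yield type for this sketch (the article uses a variant).
enum class Kind { Wait, Done };
struct Yield { Kind kind; float seconds; };

bool isFinished(const Yield& y) { return y.kind == Kind::Done; }

// Child: yields Wait{0.4f} twice, then Done.
struct Child
{
    int state = 0;
    int i = 0;

    Yield operator()()
    {
        switch (state)
        {
            case 0:
                for (i = 0; i < 2; ++i)
                {
                    state = 1;
                    return {Kind::Wait, 0.4f};
                    case 1:;
                }
        }

        state = 0;
        return {Kind::Done, 0.f};
    }
};

// Parent: awaits the child, then yields Wait{1.0f}, then finishes.
struct Parent
{
    int state = 0;
    Child child;

    Yield operator()()
    {
        switch (state)
        {
            case 0:
                child = {};
                state = 1; // hand-expanded AWAIT: falls straight into case 1
                [[fallthrough]];
            case 1:
            {
                const Yield res = child();
                if (!isFinished(res))
                    return res; // child still running: propagate its yield
            }
                state = 2; // hand-expanded YIELD
                return {Kind::Wait, 1.0f};
            case 2:;
        }

        state = 0;
        return {Kind::Done, 0.f};
    }
};

// Drives the parent to completion and records every wait the driver sees.
std::vector<float> collectWaits()
{
    Parent p;
    std::vector<float> waits;

    for (Yield y = p(); !isFinished(y); y = p())
        waits.push_back(y.seconds);

    return waits;
}
```

The driver only ever talks to Parent, yet it sees the child’s two 0.4-second waits followed by the parent’s own 1-second wait.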

Now let’s compare to the FSM equivalent. To match the cycle behaviour, every for loop and every yield in the sub-coroutines becomes its own state. Even just the RingFire part:

    struct EnemyAIFSM
    {
        enum class Phase
        {
            // Embedded `RingFire`:
            Ring_Init, Ring_Fire, Ring_Wait, Ring_Done,

            // Pause:
            WaitAfterRing,

            // Embedded `AimedBurst`:
            Aimed_Init, Aimed_Fire, Aimed_Wait, Aimed_Done,

            // Pause:
            WaitAfterAimed
        };

        Phase phase = Phase::Ring_Init;
        int i = 0;
        float waitTimer = 0.f;

        void tick(World& world, Enemy& self, float dt)
        {
            while (true)
            {
                switch (phase)
                {
                    case Phase::Ring_Init:
                        i = 0;
                        phase = Phase::Ring_Fire;
                        continue;

                    case Phase::Ring_Fire:
                        if (i >= 4)
                        {
                            phase = Phase::Ring_Done;
                            continue;
                        }

                        world.spawnRing(self.pos, 8, 120.f);
                        waitTimer = 0.4f;
                        phase = Phase::Ring_Wait;
                        return;

                    case Phase::Ring_Wait:
                        waitTimer -= dt;
                        if (waitTimer > 0.f) return;

                        ++i;
                        phase = Phase::Ring_Fire;
                        continue;

                    case Phase::Ring_Done:
                        waitTimer = 1.0f;
                        phase = Phase::WaitAfterRing;
                        return;

                    case Phase::WaitAfterRing:
                        waitTimer -= dt;
                        if (waitTimer > 0.f) return;

                        phase = Phase::Aimed_Init;
                        continue;

                    // ... the `Aimed_*` cases follow the same pattern ...
                }
            }
        }
    };

Composition disappears in a non-hierarchical FSM: it has to flatten RingFire and AimedBurst into states alongside its own. Each child becomes inlined as a sub-region of the parent’s enum, and the parent has to know which child it’s currently running. Adding a third pattern – say, a sweeping laser between rings and aimed shots – means another set of states and another set of transitions to wire up, while the coroutine version is two new lines.

Hierarchical state machines and behaviour trees offer richer alternatives – structured composition on top of FSMs – but their composition primitives are explicit, while coroutines reuse C++’s existing control flow. The coroutine version has the smallest surface area and is, in my opinion, the simplest option (but, perhaps, not the most powerful).

rules and footguns

This macro trickery is small but has many sharp edges. In rough order of how often they bite:

1. Locals that must outlive a yield have to be struct members. When the switch jumps to a case label deep inside the function, it skips over any local-variable declarations between the function entry and the label, so a local declared after the function entry but before a yield is silently re-initialized every time the coroutine is called. If the local sits inside a nested scope (e.g. inside a for loop), the compiler will reject the code with a “jump into protected scope” error instead.

   The convention is short and absolute: anything that must persist across yields becomes a struct member. The compiler does not catch the silent-reset case; you have to remember.

   For locals that don’t need to survive the yield – per-iteration scratch values that the rest of the loop body wants to be const – there’s a less heavy-handed fix: wrap them in their own inner { ... } so the scope ends before the yield. The case label inside SFEX_CO_YIELD then lands at a point where the locals aren’t in scope, and the “jump into protected scope” rule simply doesn’t apply.

2. RAII guards do not span yields. A std::lock_guard, file handle, or unique_ptr declared as a local before a yield is destroyed at the yield, because the coroutine returns from the function. “Suspending” really means “returning, with state set so we know where to come back”. The stack unwinds; RAII fires. If you need a resource held across a yield, put it on the coroutine struct – its destructor runs when the coroutine struct itself is destroyed, not at every yield. Same workaround as locals. While unintuitive, I don’t think this is a big problem in practice, as you’d probably use C++20 coroutines for any sort of resource-management task that involves locks, files, or networking.

3. __COUNTER__ is shared with the rest of the translation unit. The relativization trick ( __COUNTER__ - _sfex_base ) handles __COUNTER__ uses outside the function: the base captures the counter at function entry, so external uses cancel out. But it does not protect against __COUNTER__ uses inside the same function body. Every coroutine yield uses two __COUNTER__ increments; if a macro between two yields also bumps the counter, the case labels for everything below shift. Few macros use __COUNTER__ in practice, but if you mix SFEX coroutines with another __COUNTER__-driven construct in the same function the failure mode is a runtime jump to the wrong yield site, not a compile error.

4. Adding or removing a yield invalidates older save files. The state integer is an offset into the coroutine’s switch dispatch – adding or removing a yield earlier in the function shifts every state value below it. The same hazard applies to any FSM whose enum you change. Discussed earlier, but here for completeness.

5. Debugging is awkward. After a yield, the program counter “zips” back into the switch dispatch on the next call rather than continuing in source order, so a stepping debugger doesn’t follow the logical control flow. The call stack shows the current C++ invocation, not the logical nesting of awaited sub-coroutines either – a deeply-nested AWAIT chain looks like one frame in the debugger. For non-trivial debugging, a logged state value plus a per-coroutine name ( "EnemyAI:state=3" ) is more informative than a stepping session. The state integer is small, stable, and hand-readable once you know which yield is which.
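The RAII behaviour is easy to observe with a guard that counts live instances. A minimal sketch, with hypothetical names (CountingGuard, Task, guardsAliveWhileSuspended) and a hand-expanded body instead of the SFEX macros:

```cpp
#include <cassert>

// Hypothetical guard that tracks how many instances are currently alive.
struct CountingGuard
{
    int& alive;

    explicit CountingGuard(int& counter) : alive(counter) { ++alive; }
    ~CountingGuard() { --alive; }

    CountingGuard(const CountingGuard&) = delete;
    CountingGuard& operator=(const CountingGuard&) = delete;
};

struct Task
{
    int state = 0;

    // Returns true while there is more work to do.
    bool resume(int& alive)
    {
        switch (state)
        {
            case 0:
            {
                CountingGuard guard{alive}; // local guard: acquired here...
                state = 1;
                return true;                // ...and released by this `return`
            }
            case 1:;
        }

        state = 0;
        return false;
    }
};

// Reports how many guards are still held while the task is "suspended".
int guardsAliveWhileSuspended()
{
    int alive = 0;
    Task task;
    task.resume(alive); // task is now paused at state 1
    return alive;
}
```

The count is back to zero as soon as the first resume returns, which is exactly why a lock or file that must stay open across a yield belongs on the struct.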

None of the above is fatal. The first rule is the most annoying and frequent one.
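Since the first rule bites most often, here is a minimal hand-expanded sketch of the silent-reset case. Counter is a hypothetical type for this example; local is declared before the switch, persistent is a member.

```cpp
#include <cassert>

struct Counter
{
    int state = 0;
    int persistent = 0; // member: survives every yield

    // Returns the value of `local` after this call's increment.
    int resume()
    {
        int local = 0; // declared before the switch: re-created on every call

        switch (state)
        {
            case 0:
                for (;;)
                {
                    ++local;
                    ++persistent;
                    state = 1;
                    return local;
                    case 1:; // resumption point inside the loop
                }
        }

        return -1; // not reached: the loop above never exits
    }
};
```

Every call reports local as 1, because it restarts from 0 each time, while persistent keeps climbing. Nothing in the compiler output hints at the difference.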

what’s next

We’ve seen:

Why C++20 coroutines aren’t a good fit for game logic.

A ~200-line header that gives us lightweight coroutines with allocation-free state and trivial save/load.

How the macros expand under the hood, and why __COUNTER__ beats __LINE__.

Composition via SFEX_CO_AWAIT, and how it crushes the FSM equivalent for branching scripts.

If there’s interest in the topic, in a follow-up post I could cover:

Parallel composition: AWAIT_ALL and AWAIT_ANY for “fire bullets while dashing” or “race a script against a timeout”.

Generic time and yield types: How the same primitives work for tick-counter games or non-real-time simulations.

If you want to read ahead, the full implementation of the shoot-em-up example lives in VRSFML, and uses parallel composition primitives.
