Coroutine demonstration with setjmp/longjmp (STM32)

Hi Reader,

Embedded programmers working without an RTOS, for whatever reason, frequently roll their own solutions for multitasking. Protothreads are a de facto default for ultra-lightweight threading, but since they are stackless they are extremely limited: local variables don’t survive a yield, and you can’t yield from inside a nested function call.

Here’s a demonstration of a technique to allow cooperative multitasking by abusing the ancient setjmp/longjmp routines in C. This implementation is for STM32 and compiled with GCC and is not portable (though it is pretty easy to port). I used a similar technique on the Mooshimeter for the thread which has to deal with the SD card and filesystem (it has too much state and too many blocking points to easily handle stackless…).

What are setjmp/longjmp? In the long-long ago, they were used for exception handling in C. setjmp allows you to store your execution context in a jmp_buf structure, and longjmp allows you to restore that context from a jmp_buf. Context here means your position in the code, your CPU registers and your stack position. The original idea was that you’d call setjmp in front of some error handling code, then do whatever your application needed to do, and if something went wrong deep within the code the offending function could call longjmp and pop right back out to the top level error handling code rather than unwinding the call stack in the conventional way, with every function checking and forwarding the return value of its callees.
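As a minimal sketch of that classic error-handling use (the function names and error codes below are made up for illustration and do not come from the coroutine library), a deeply nested function can bail out straight to the top-level handler:

```c
#include <assert.h>
#include <setjmp.h>

// Hypothetical error codes, for illustration only.
enum { ERR_NEGATIVE = 2, ERR_UNKNOWN = 3 };

static jmp_buf error_ctx;  // context of the top-level error handler

// Deepest function: on bad input it jumps straight back to the handler.
static void parse_field(int value) {
    if (value < 0) {
        longjmp(error_ctx, ERR_NEGATIVE);
    }
}

// Middle layer: note that it checks and forwards no return codes at all.
static void parse_record(const int *fields, int n) {
    for (int i = 0; i < n; i++) {
        parse_field(fields[i]);
    }
}

// Top level: returns 0 on success, or the code handed to longjmp.
int run_parser(const int *fields, int n) {
    switch (setjmp(error_ctx)) {
    case 0:              // direct call: context saved, do the real work
        parse_record(fields, n);
        return 0;
    case ERR_NEGATIVE:   // we landed here from longjmp
        return ERR_NEGATIVE;
    default:
        return ERR_UNKNOWN;
    }
}
```

Calling run_parser on an array that contains a negative value returns ERR_NEGATIVE, even though parse_record never looks at a return value.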

AFAIK almost nobody does this anymore, but the functions still exist in the C standard, so we can abuse them as a starting point for a coroutine library.

Goals of the library:

  • Make it easy to run a function as a coroutine with its own stack
  • Allow data passing between the coroutine and the main function
  • Facilitate a deferred-processing iterator pattern

First we generate typedefs to store all the information we need:
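The original listing is not reproduced here, so the following is only a guess at what such typedefs could look like. The type names coroutine_t and coroutine_func_t come from the text; every field name, the function signature and the coroutine_stack_top helper are assumptions made for this sketch.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct coroutine;

// Assumed prototype: the coroutine function receives its own control block.
typedef void (*coroutine_func_t)(struct coroutine *cr);

// Assumed layout: one saved context per side, plus bookkeeping.
typedef struct coroutine {
    jmp_buf caller_ctx;     // where the main routine resumes after a yield
    jmp_buf coroutine_ctx;  // where the coroutine resumes after a switch
    coroutine_func_t func;  // function running as the coroutine
    void *arg;              // data shared between main routine and coroutine
    uint32_t *stack;        // user-provided stack buffer
    size_t stack_words;     // size of that buffer in 32-bit words
    bool finished;          // set once the coroutine has run to completion
} coroutine_t;

// Hypothetical helper: on Cortex-M the stack grows downward, so a fresh
// stack starts at the end of the buffer.
uint32_t *coroutine_stack_top(const coroutine_t *cr) {
    return cr->stack + cr->stack_words;
}
```

Two jmp_bufs are kept because each side needs a place to save its own context before jumping into the other one.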

Now some basic glue functions for swapping between contexts. setjmp returns zero on the direct call that saves the current context into the jmp_buf. It returns non-zero when you are landing there from a remote longjmp.
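To make those return values concrete, here is a small host-side sketch. The function names are hypothetical, and everything stays on a single stack, so it shows only the save-then-jump idiom rather than a real context switch.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>

// Passes the setjmp call site twice: once directly, once via longjmp.
int count_passes(void) {
    jmp_buf ctx;
    // volatile because it is modified between setjmp and longjmp
    volatile int passes = 0;

    if (setjmp(ctx) == 0) {
        // Direct call: the context was saved and setjmp returned 0.
        passes = passes + 1;
        longjmp(ctx, 1);  // jump back; setjmp now appears to return 1
    }
    // Reached only after landing from longjmp.
    passes = passes + 1;
    return passes;
}

// The C standard guarantees a landing is never mistaken for a direct
// call: longjmp(ctx, 0) makes setjmp return 1 instead of 0.
bool zero_becomes_one(void) {
    jmp_buf ctx;
    switch (setjmp(ctx)) {
    case 0:
        longjmp(ctx, 0);
        return false;  // not reached
    case 1:
        return true;
    default:
        return false;
    }
}
```

A context-swapping glue function can use the same shape: save your own context with setjmp, and only when it returns zero longjmp into the other side's jmp_buf.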

Next we have macros that will allow us to write our coroutine functions. CR_START does the first call to setjmp from within the target coroutine and returns immediately. CR_YIELD yields control back to the main thread. CR_END just sets the finished flag and yields forever. The __is_base_cr boolean is there to allow calling coroutine functions from within other coroutines, and it prevents the sub-coroutine from overwriting the jmp_buf and also allows it to return normally at the end of execution.

There’s only one more function to the coroutine library but it’s the most dangerous. initCoroutine does the initial setup of a coroutine_t so that calling switchToCoroutine will properly jump to the desired function operating on its own stack. Basically what we’re doing is:

  • Call into the function we want to run as the coroutine (func) and let it hit its CR_START macro, which will set up a jmp_buf and return.
  • Examine that jmp_buf to figure out how much stack was used to get to the CR_START macro
  • Copy that portion of stack over to the user-provided stack buffer for the new coroutine
  • Redirect the stack pointer entry of the jmp_buf so that when the context is restored with longjmp, it will point at the user-provided stack buffer.

Note that the position of the stack pointer within the jmp_buf is specific to the C library you’re using. Also, this code has STM32-specific IRQ toggling built in.
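The copy-and-redirect arithmetic from the steps above can be sketched in isolation. This is not the initCoroutine code: relocate_stack and its parameters are hypothetical, it assumes a descending stack (as on Cortex-M), and it works on plain byte buffers so it can be tried on a desktop machine. Reading the saved stack pointer out of a real jmp_buf and writing the new one back is left out, since that offset depends on the C library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Copies the in-use part of a descending stack into a new buffer and
// returns the stack pointer value that should replace the saved one.
//   old_top   one past the highest in-use byte (where the call chain began)
//   saved_sp  stack pointer captured at the CR_START point (lowest in-use byte)
// Returns NULL if the used portion does not fit into the new buffer.
uint8_t *relocate_stack(const uint8_t *old_top, const uint8_t *saved_sp,
                        uint8_t *new_stack, size_t new_size) {
    size_t used = (size_t)(old_top - saved_sp);  // bytes consumed so far
    if (used > new_size) {
        return NULL;
    }
    // Keep the same distance from the top, just in the new buffer.
    // On a real target new_stack + new_size should be 8-byte aligned.
    uint8_t *new_sp = new_stack + new_size - used;
    memcpy(new_sp, saved_sp, used);
    return new_sp;
}
```

One caveat with this approach: any pointer into the old stack that was saved inside the copied region (a frame pointer, say) still points at the old stack after the copy.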

That’s it for the library itself, let’s write some tests. Let’s start with a really basic function that just interprets the shared argument as an int and prints it out. So the coroutine function is:

And the code to run it (from within main) is:

Note that debug_print behaves like printf on my setup and just pushes data out a serial port where I can see it. Running this test gives me:

Nice. We are swapping back and forth between two different stacks and sensibly exchanging data. Let’s try something a little more involved: an iterator pattern. Let’s make an iterator which is initialized with a list of numbers, and when you call next it will tell you whether the next number is prime or not. I’m keeping this demo all in C, but you can see that I’m basically encapsulating the coroutine_t data in an object which is almost a class (and would be really easy to implement as such).
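For comparison, here is roughly what that iterator interface looks like as a plain stackless C object. This is a stand-in to show the init/next shape, not the coroutine version: prime_iter_t and its functions are hypothetical names, and only isPrime is a name from the original code. The loop bound is written as i <= num / i, which is equivalent to i*i <= num but cannot overflow a uint32_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Simple brute force prime check.
bool isPrime(uint32_t num) {
    if (num < 2) {
        return false;
    }
    for (uint32_t i = 2; i <= num / i; i++) {
        if (num % i == 0) {
            return false;
        }
    }
    return true;
}

// Hypothetical iterator object: a list plus a cursor.
typedef struct {
    const uint32_t *list;
    size_t len;
    size_t pos;
} prime_iter_t;

void prime_iter_init(prime_iter_t *it, const uint32_t *list, size_t len) {
    it->list = list;
    it->len = len;
    it->pos = 0;
}

// Processes one more number per call. Returns false once the list is
// exhausted; otherwise reports the number and whether it is prime.
bool prime_iter_next(prime_iter_t *it, uint32_t *num, bool *is_prime) {
    if (it->pos >= it->len) {
        return false;
    }
    *num = it->list[it->pos++];
    *is_prime = isPrime(*num);
    return true;
}
```

Here the state to carry between calls is a single index, so the stackless form is easy. The coroutine version pays off when the work between yields has loops, locals and nested calls that would otherwise have to be flattened into a struct like this one.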

And the invocation in main:

Which yields:

Nice. The main routine and coroutine are exchanging data well and the coroutine is providing a friendly way to defer processing on a sequence, a common iterator use case. The last thing I want to demonstrate is nesting. If we’re writing functions adhering to the coroutine_func_t prototype, we want to be able to call each other and yield from more than one layer deep in the call stack. So as a trivial test, let’s build a coroutine which just runs the prime number checker twice:

And the corresponding code in main:

Which yields:

Great! We have a functional, if unpolished, stackful coroutine library for at least some flavor of STM32 setup in about 50 lines of actual code. Full file pasted below:

And the output from the full test:

This could all be made more elegant by writing some custom assembly, but that’s not really the point of this post. Maybe next time, since the setjmp and longjmp routines are only a few assembly statements long and I can use them as a basis. Hope you found this interesting or useful! Thanks for reading!

3 Responses to “Coroutine demonstration with setjmp/longjmp (STM32)”

  1. Richard Jesch September 6, 2021 at 7:26 am #

    Wow, you’re a great code warrior. I could follow your method sketchily but I’m not too good with C++.
    STM-32 is what my drones use too. I’m only at the level of being able to use Betaflight configurator.

    I’d ask for something that would be applicable to Mooshimeter but I don’t know how to phrase my request.
    Thanks, rich

  2. Nineways October 4, 2021 at 3:54 pm #

    Hi guy, in your code snippet for the primality test

    bool isPrime(uint32_t num) {
    // Simple brute force prime check
    for(uint32_t i = 2; i*i < num; i++) {..}


    The condition i*i < num should be i*i <= num, where the equal sign is needed.

    • admin April 9, 2022 at 6:05 pm #

      Ah! You’re right! Changed, thanks.
