Coroutine demonstration with setjmp/longjmp (STM32)
Hi Reader,
Embedded programmers working without an RTOS for whatever reason are frequently rolling their own solutions for multitasking. Protothreads are a de-facto default for ultra-lightweight threading but since they are stackless, they are extremely limited.
Here’s a demonstration of a technique to allow cooperative multitasking by abusing the ancient setjmp/longjmp routines in C. This implementation is for STM32 and compiled with GCC and is not portable (though it is pretty easy to port). I used a similar technique on the Mooshimeter for the thread which has to deal with the SD card and filesystem (it has too much state and too many blocking points to easily handle stackless…).
What are setjmp/longjmp? In the long-long ago, they were used for exception handling in C. setjmp
allows you to store your execution context in a jmp_buf
structure, and longjmp
allows you to restore that context from a jmp_buf
. Context here means your position in the code, your CPU registers and your stack position. The original idea was that you’d call setjmp
in front of some error handling code, then do whatever your application needed to do, and if something went wrong deep within the code the offending function could call longjmp
and pop right back out to the top level error handling code rather unwinding the call stack in the conventional way, with every function checking and forwarding the return value of its callees.
AFAIK almost nobody does this anymore but the functions still exist in the C standard, so we can abuse it as a starting point for a coroutine library.
Goals of the library:
- Make it easy to run a function as a coroutine with its own stack
- Allow data passing between the coroutine and the main function
- Facilitate an deferred-processing iterator pattern
First we generate typedefs to store all the information we need:
1 2 3 4 5 6 7 8 9 10 | typedef struct { bool init_done; // Has this coroutine been initialized? bool finished; // Has the base function of the coroutine hit CR_END? void* arg; // Used for data exchange between coroutine and main function jmp_buf invoking_env; // main context jmp_buf cr_env; // coroutine context } coroutine_t; // This is the prototype for the function we will run as a coroutine typedef void(*coroutine_func_t)(coroutine_t*); |
Now some basic glue functions for swapping between contexts. setjmp
returns zero when it loads the jmp_buf on its first call. It returns non-zero when you are landing there from a remote longjmp
.
1 2 3 4 5 6 7 8 9 10 11 12 13 | void swapJmpBufs(jmp_buf* from, jmp_buf* to) { if(0 == setjmp(*from)) { longjmp(*to, 1); } } void switchToCoroutine(coroutine_t* cr) { swapJmpBufs(&cr->invoking_env, &cr->cr_env); } void yieldFromCoroutine(coroutine_t* cr) { swapJmpBufs(&cr->cr_env, &cr->invoking_env); } |
Next we have macros that will allow us to write our coroutine functions. CR_START does the first call to setjmp
from within the target coroutine and returns immediately. CR_YIELD yields control back to the main thread. CR_END just sets the finished
flag and yields forever. The __is_base_cr
boolean is there to allow calling coroutine functions from within other coroutines, and it prevents the sub-coroutine from overwriting the jmp_buf
and also allows it to return normally at the end of execution.
1 2 3 4 | // CR_START does the initial jmp_buf setup only if this is the base coroutine (we check using the init_done member) #define CR_START(cr) bool __is_base_cr = !cr->init_done; do{if(!cr->init_done && (0 == setjmp(cr->cr_env))) { return; }}while(0) #define CR_YIELD(cr) yieldFromCoroutine(cr) #define CR_END(cr) while(__is_base_cr){cr->finished = true; CR_YIELD(cr);} |
There’s only one more function to the coroutine library but it’s the most dangerous. initCoroutine
does the initial setup of a coroutine_t
so that calling switchToCoroutine
will properly jump to the desired function operating on its own stack. Basically what we’re doing is:
- Calling in to the function we want to run as the coroutine (
func
) and letting it hit its CR_START macro, which will set up ajmp_buf
and return. - Examine that
jmp_buf
to figure out how much stack was used to get to the CR_START macro - Copy that portion of stack over to the user-provided stack buffer for the new coroutine
- Redirect the stack pointer entry of the
jmp_buf
so that when the context is restored withlongjmp
, it will point at the user-provided stack buffer.
Note that the position of the stack pointer within the jmp_buf
is specific to the C libraries you’re using. Also this code has STM32 specific IRQ toggling built in.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 | // Initialize a coroutine // After this function is run on a coroutine_t structure, feeding that structure to switchToCoroutine will invoke the // function provided to the "func" arg on the stack buffer provided by the "stack" arg. void initCoroutine(coroutine_t* cr_to_load, coroutine_func_t func, void* func_arg, uint8_t* stack, size_t stack_len) { // THIS IS NOT PORTABLE BETWEEN ARCHITECTURES register void* SP asm ("sp"); void* local_sp = NULL; void* remote_sp = NULL; size_t stack_used = 0; uint32_t i = 0; cr_to_load->init_done = false; cr_to_load->finished = false; cr_to_load->arg = func_arg; bool interrupts_masked = __get_PRIMASK(); // Call the coroutine function directly. It will yield immediately on the first run and load the cr_env jump_buf // BEGIN CRITICAL SECTION - WE ARE ABUSING THE STACK // If an IRQ were to strike here, we would hardfault down the road when restoring context __disable_irq(); func(cr_to_load); cr_to_load->init_done = true; // Manipulate the jmp_buf static const uint32_t jmp_buf_sp_i = 8; // First figure out how much stack the function used before its initial call to setjmp local_sp = (void*)cr_to_load->cr_env[jmp_buf_sp_i]; stack_used = (size_t)((uint32_t)SP - (uint32_t)local_sp); // Offsets within the jmp_buf for stack pointer and program counter derived from observation in debugger // Set the stack pointer to the user-provided stack buffer, expecting it to grow down remote_sp = (void*)(stack + stack_len - stack_used); cr_to_load->cr_env[jmp_buf_sp_i] = (int)remote_sp; // Copy the stack over to the new context for(i = 0; i < stack_used; i++) { ((uint8_t*)remote_sp)[i] = ((uint8_t*)local_sp)[i]; } // END CRITICAL SECTION if(!interrupts_masked) { __enable_irq(); } debug_print("Transposed %d bytes of stack\r\n", stack_used); return; } |
That’s it for the library itself, let’s write some tests. Let’s start with a really basic function that just interprets the shared argument as an int and prints it out. So the coroutine function is:
1 2 3 4 5 6 7 8 | void myCoroutine(coroutine_t* cr_arg) { CR_START(cr_arg); while(1) { debug_print("CO: %u\r\n", *(uint32_t*)cr_arg->arg); CR_YIELD(cr_arg); } CR_END(cr_arg); } |
And the code to run it (from within main) is:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | uint8_t cr_stack[1024]; // yo dawg, I heard you like stacks, so... // First example: Super basic demo, just transition back and forth to a coroutine a few times. debug_print("\n\rTest 1: Switch back and forth\r\n"); coroutine_t cr_test; uint32_t cr_arg = 0; initCoroutine(&cr_test, myCoroutine, (void*)&cr_arg, cr_stack, sizeof(cr_stack)); for(uint32_t i = 0; i < 4; i++) { // Feed arbitrary example data to the coroutine. debug_print("MAIN: Switching to coroutine\r\n"); cr_arg = i*i; switchToCoroutine(&cr_test); } |
Note that debug_print
behaves like printf on my setup and just pushes data out a serial port where I can see it. Running this test gives me:
1 2 3 4 5 6 7 8 9 10 | Test 1: Switch back and forth Transposed 24 bytes of stack MAIN: Switching to coroutine CO: 0 MAIN: Switching to coroutine CO: 1 MAIN: Switching to coroutine CO: 4 MAIN: Switching to coroutine CO: 9 |
Nice. We are swapping back and forth between two different stacks and sensibly exchanging data. Let’s try something a little more involved: an iterator pattern. Let’s make an iterator which is initialized with a list of numbers, and when you call next
it will tell you whether the next number is prime or not. I’m keeping this demo all in C, but you can see that I’m basically encapsulating the coroutine_t data in an object which is almost a class (and would be really easy to implement as such).
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 | bool isPrime(uint32_t num) { // Simple brute force prime check for(uint32_t i = 2; i*i <= num; i++) { if((num % i) == 0) { return false; } } return true; } typedef struct { bool is_prime_out; uint32_t* num_in; size_t n_num; coroutine_t cr; } isprime_iter_member_t; void isPrime_coroutine(coroutine_t* cr_arg) { CR_START(cr_arg); isprime_iter_member_t* arg = (isprime_iter_member_t*)cr_arg->arg; uint32_t i = 0; while(true) { uint32_t n = arg->num_in[i]; arg->is_prime_out = isPrime(n); debug_print("CO: Is %u prime?\r\n", n); if(++i == arg->n_num) { break; } else { CR_YIELD(cr_arg); } } CR_END(cr_arg); } void isPrime_init(isprime_iter_member_t* to_fill, uint32_t* num_to_check, size_t n_num_to_check, uint8_t* stack, size_t stack_len) { to_fill->num_in = num_to_check; to_fill->n_num = n_num_to_check; initCoroutine(&to_fill->cr, isPrime_coroutine, (void*)to_fill, stack, stack_len); } bool isPrime_next(isprime_iter_member_t* arg) { switchToCoroutine(&arg->cr); return arg->is_prime_out; } bool isPrime_empty(isprime_iter_member_t* arg) { return arg->cr.finished; } |
And the invocation in main:
1 2 3 4 5 6 7 8 | // Second example: Set up a coroutine to iterate over a sequence and determine whether elements are prime uint32_t nums_to_check[] = {500, 521, 2025, 2027}; isprime_iter_member_t isprime_iter_member; isPrime_init(&isprime_iter_member, nums_to_check, ARRAY_LEN(nums_to_check), cr_stack, sizeof(cr_stack)); while(!isPrime_empty(&isprime_iter_member)) { bool is_next_prime = isPrime_next(&isprime_iter_member); debug_print("MAIN: %s\r\n", is_next_prime?"YES!":"NO!"); } |
Which yields:
1 2 3 4 5 6 7 8 9 10 | Test 2: Deferred processing iterator Transposed 32 bytes of stack CO: Is 500 prime? MAIN: NO! CO: Is 521 prime? MAIN: YES! CO: Is 2025 prime? MAIN: NO! CO: Is 2027 prime? MAIN: YES! |
Nice. The main routine and coroutine are exchanging data well and the coroutine is providing a friendly way to defer processing on a sequence, a common iterator use case. The last thing I want to demonstrate is nesting. If we’re writing functions adhering to the coroutine_func_t
prototype, we want to be able to call each other and yield from more than one layer deep in the call stack. So as a trivial test, let’s build a coroutine which just runs the prime number checker twice:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 | void doPrimeCoroutineTwice(coroutine_t* cr_arg) { CR_START(cr_arg); debug_print("FIRST SUBCOROUTINE CALL\r\n"); isPrime_coroutine(cr_arg); // Extra yield necessary because isPrime_coroutine expects CR_END to yield final value CR_YIELD(cr_arg); debug_print("SECOND SUBCOROUTINE CALL\r\n"); isPrime_coroutine(cr_arg); CR_END(cr_arg); } void setupPrimeCoroutineTwice(isprime_iter_member_t* to_fill, uint32_t* num_to_check, size_t n_num_to_check, uint8_t* stack, size_t stack_len) { to_fill->num_in = num_to_check; to_fill->n_num = n_num_to_check; initCoroutine(&to_fill->cr, doPrimeCoroutineTwice, (void*)to_fill, stack, stack_len); } |
And the corresponding code in main:
1 2 3 4 5 6 | // Third example: Demonstrate yielding from within a nested coroutine setupPrimeCoroutineTwice(&isprime_iter_member, nums_to_check, ARRAY_LEN(nums_to_check), cr_stack, sizeof(cr_stack)); while(!isPrime_empty(&isprime_iter_member)) { bool is_next_prime = isPrime_next(&isprime_iter_member); debug_print("MAIN: %s\r\n", is_next_prime?"YES!":"NO!"); } |
Which yields:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 | Test 3: Sub-coroutine Transposed 24 bytes of stack FIRST SUBCOROUTINE CALL CO: Is 500 prime? MAIN: NO! CO: Is 521 prime? MAIN: YES! CO: Is 2025 prime? MAIN: NO! CO: Is 2027 prime? MAIN: YES! SECOND SUBCOROUTINE CALL CO: Is 500 prime? MAIN: NO! CO: Is 521 prime? MAIN: YES! CO: Is 2025 prime? MAIN: NO! CO: Is 2027 prime? MAIN: YES! |
Great! We have a functional, if unpolished, stackful coroutine library for at least some flavor of STM32 setup in about 50 lines of actual code. Full file pasted below:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 | /*************************** * It is the opinion of the author that nobody should use this code for anything. * The code is provided with no warranties of any kind and if you try to run it * you will probably end up mildly annoyed. * * That said, if you find it useful, I'd like to hear about it! * James Whong 2020/07/13 * james@moosh.im */ #include <setjmp.h> // for setjmp, longjmp #include <stddef.h> #include <inttypes.h> #include <stdbool.h> #include "cmsis_gcc.h" #include "debug.h" // Project specific for debug_print #define ARRAY_LEN(x) (sizeof(x)/sizeof(x[0])) typedef struct { bool init_done; // Has this coroutine been initialized? bool finished; // Has the base function of the coroutine hit CR_END? void* arg; // Used for data exchange between coroutine and main function jmp_buf invoking_env; // main context jmp_buf cr_env; // coroutine context } coroutine_t; // A coroutine function must start with CR_START in the body and end with CR_END. // It can call CR_YIELD as many times as it likes. It can also feed its couroutine_t* // arg to subroutines and yield from within subroutines. typedef void(*coroutine_func_t)(coroutine_t*); void swapJmpBufs(jmp_buf* from, jmp_buf* to) { if(0 == setjmp(*from)) { longjmp(*to, 1); } } void switchToCoroutine(coroutine_t* cr) { swapJmpBufs(&cr->invoking_env, &cr->cr_env); } void yieldFromCoroutine(coroutine_t* cr) { swapJmpBufs(&cr->cr_env, &cr->invoking_env); } // CR_START does the initial jmp_buf setup only if this is the base coroutine (we check using the init_done member) #define CR_START(cr) bool __is_base_cr = !cr->init_done; do{if(!cr->init_done && (0 == setjmp(cr->cr_env))) { return; }}while(0) #define CR_YIELD(cr) yieldFromCoroutine(cr) #define CR_END(cr) while(__is_base_cr){cr->finished = true; CR_YIELD(cr);} // Initialize a coroutine // After this function is run on a coroutine_t structure, feeding that structure to switchToCoroutine will invoke the // function provided to the "func" arg on the stack buffer provided by the "stack" arg. void initCoroutine(coroutine_t* cr_to_load, coroutine_func_t func, void* func_arg, uint8_t* stack, size_t stack_len) { // THIS IS NOT PORTABLE BETWEEN ARCHITECTURES register void* SP asm ("sp"); void* local_sp = NULL; void* remote_sp = NULL; size_t stack_used = 0; uint32_t i = 0; cr_to_load->init_done = false; cr_to_load->finished = false; cr_to_load->arg = func_arg; bool interrupts_masked = __get_PRIMASK(); // Call the coroutine function directly. It will yield immediately on the first run and load the cr_env jump_buf // BEGIN CRITICAL SECTION - WE ARE ABUSING THE STACK // If an IRQ were to strike here, we would hardfault down the road when restoring context __disable_irq(); func(cr_to_load); cr_to_load->init_done = true; // Manipulate the jmp_buf static const uint32_t jmp_buf_sp_i = 8; // First figure out how much stack the function used before its initial call to setjmp local_sp = (void*)cr_to_load->cr_env[jmp_buf_sp_i]; stack_used = (size_t)((uint32_t)SP - (uint32_t)local_sp); // Offsets within the jmp_buf for stack pointer and program counter derived from observation in debugger // Set the stack pointer to the user-provided stack buffer, expecting it to grow down remote_sp = (void*)(stack + stack_len - stack_used); cr_to_load->cr_env[jmp_buf_sp_i] = (int)remote_sp; // Copy the stack over to the new context for(i = 0; i < stack_used; i++) { ((uint8_t*)remote_sp)[i] = ((uint8_t*)local_sp)[i]; } // END CRITICAL SECTION if(!interrupts_masked) { __enable_irq(); } debug_print("Transposed %d bytes of stack\r\n", stack_used); return; } //////////////////////////////////////////////////////////////////// // BELOW HERE ARE ALL TEST FUNCTIONS FOR THE COROUTINE LIBRARY ABOVE //////////////////////////////////////////////////////////////////// void myCoroutine(coroutine_t* cr_arg) { CR_START(cr_arg); while(1) { debug_print("CO: %u\r\n", *(uint32_t*)cr_arg->arg); CR_YIELD(cr_arg); } CR_END(cr_arg); } bool isPrime(uint32_t num) { // Simple brute force prime check for(uint32_t i = 2; i*i < num; i++) { if((num % i) == 0) { return false; } } return true; } typedef struct { bool is_prime_out; uint32_t* num_in; size_t n_num; coroutine_t cr; } isprime_iter_member_t; void isPrime_coroutine(coroutine_t* cr_arg) { CR_START(cr_arg); isprime_iter_member_t* arg = (isprime_iter_member_t*)cr_arg->arg; uint32_t i = 0; while(true) { uint32_t n = arg->num_in[i]; arg->is_prime_out = isPrime(n); debug_print("CO: Is %u prime?\r\n", n); if(++i == arg->n_num) { break; } else { CR_YIELD(cr_arg); } } CR_END(cr_arg); } void isPrime_init(isprime_iter_member_t* to_fill, uint32_t* num_to_check, size_t n_num_to_check, uint8_t* stack, size_t stack_len) { to_fill->num_in = num_to_check; to_fill->n_num = n_num_to_check; initCoroutine(&to_fill->cr, isPrime_coroutine, (void*)to_fill, stack, stack_len); } bool isPrime_next(isprime_iter_member_t* arg) { switchToCoroutine(&arg->cr); return arg->is_prime_out; } bool isPrime_empty(isprime_iter_member_t* arg) { return arg->cr.finished; } void doPrimeCoroutineTwice(coroutine_t* cr_arg) { CR_START(cr_arg); debug_print("FIRST SUBCOROUTINE CALL\r\n"); isPrime_coroutine(cr_arg); // Extra yield necessary because isPrime_coroutine expects CR_END to yield final value CR_YIELD(cr_arg); debug_print("SECOND SUBCOROUTINE CALL\r\n"); isPrime_coroutine(cr_arg); CR_END(cr_arg); } void setupPrimeCoroutineTwice(isprime_iter_member_t* to_fill, uint32_t* num_to_check, size_t n_num_to_check, uint8_t* stack, size_t stack_len) { to_fill->num_in = num_to_check; to_fill->n_num = n_num_to_check; initCoroutine(&to_fill->cr, doPrimeCoroutineTwice, (void*)to_fill, stack, stack_len); } void runCoroutineDemo() { debug_print("\n\n\n\rHello World\r\n"); uint8_t cr_stack[1024]; // First example: Super basic demo, just transition back and forth to a coroutine a few times. debug_print("\n\rTest 1: Switch back and forth\r\n"); coroutine_t cr_test; uint32_t cr_arg = 0; initCoroutine(&cr_test, myCoroutine, (void*)&cr_arg, cr_stack, sizeof(cr_stack)); for(uint32_t i = 0; i < 4; i++) { // Feed arbitrary example data to the coroutine. debug_print("MAIN: Switching to coroutine\r\n"); cr_arg = i*i; switchToCoroutine(&cr_test); } debug_print("\n\rTest 2: Deferred processing iterator\r\n"); // Second example: Set up a coroutine to iterate over a sequence and determine whether elements are prime uint32_t nums_to_check[] = {500, 521, 2025, 2027}; isprime_iter_member_t isprime_iter_member; isPrime_init(&isprime_iter_member, nums_to_check, ARRAY_LEN(nums_to_check), cr_stack, sizeof(cr_stack)); while(!isPrime_empty(&isprime_iter_member)) { bool is_next_prime = isPrime_next(&isprime_iter_member); debug_print("MAIN: %s\r\n", is_next_prime?"YES!":"NO!"); } debug_print("\n\rTest 3: Sub-coroutine\r\n"); // Third example: Demonstrate yielding from within a nested coroutine setupPrimeCoroutineTwice(&isprime_iter_member, nums_to_check, ARRAY_LEN(nums_to_check), cr_stack, sizeof(cr_stack)); while(!isPrime_empty(&isprime_iter_member)) { bool is_next_prime = isPrime_next(&isprime_iter_member); debug_print("MAIN: %s\r\n", is_next_prime?"YES!":"NO!"); } debug_print("Done!"); for(;;) { vTaskDelay(10); } } |
And the output from the full test:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 | Hello World Test 1: Switch back and forth Transposed 24 bytes of stack MAIN: Switching to coroutine CO: 0 MAIN: Switching to coroutine CO: 1 MAIN: Switching to coroutine CO: 4 MAIN: Switching to coroutine CO: 9 Test 2: Deferred processing iterator Transposed 32 bytes of stack CO: Is 500 prime? MAIN: NO! CO: Is 521 prime? MAIN: YES! CO: Is 2025 prime? MAIN: NO! CO: Is 2027 prime? MAIN: YES! Test 3: Sub-coroutine Transposed 24 bytes of stack FIRST SUBCOROUTINE CALL CO: Is 500 prime? MAIN: NO! CO: Is 521 prime? MAIN: YES! CO: Is 2025 prime? MAIN: NO! CO: Is 2027 prime? MAIN: YES! SECOND SUBCOROUTINE CALL CO: Is 500 prime? MAIN: NO! CO: Is 521 prime? MAIN: YES! CO: Is 2025 prime? MAIN: NO! CO: Is 2027 prime? MAIN: YES! Done! |
This could all be made more elegant by writing some custom assembly, but that’s not really the point of this post. Maybe next time, since the setjmp
and longjmp
routines are only a few assembly statements long and I can use them as a basis. Hope you found this interesting or useful! Thanks for reading!
Wow, you’re a great code warrior. I could follow your method sketchily but I’m not too good with C++.
STM-32 is what my drones use too. I’m only at the level of being able to use Betaflight configurator.
I’d ask for something that would be applicable to Mooshimeter but I don’t know how to phrase my request.
Thanks, rich
Hi guy, in your code snippet for the primality test
bool isPrime(uint32_t num) {
// Simple brute force prime check
for(uint32_t i = 2; i*i < num; i++) {..}
…
}
The condition i*i < num" should be i*i <= num, where the equal sign is needed.
Ah! You’re right! Changed, thanks.