The devibot STM32G491 was crashing approximately once every 4 hours. HardFault exception. No reproducible trigger. It happened slightly more often under high CAN bus load, but not reliably enough to be useful as a debugging cue.
Intermittent HardFaults are the worst category of embedded firmware bug. They don’t happen in your test harness. They happen on the customer’s production floor, at 3am, when the robot is supposed to be running unattended.
Finding the fault
The first step was enabling the HardFault handler to capture the stack frame before reset. In Cortex-M4, the processor automatically saves registers to the stack on exception entry. A custom HardFault handler can read these out:
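// naked: no compiler prologue, so LR and SP are untouched when the asm runs
__attribute__((naked)) void HardFault_Handler(void) {
    __asm volatile (
        "tst lr, #4              \n"  // EXC_RETURN bit 2: 0 = MSP, 1 = PSP
        "ite eq                  \n"
        "mrseq r0, msp           \n"
        "mrsne r0, psp           \n"
        "b hard_fault_handler_c  \n"  // r0 = pointer to the stacked frame
    );
}

void hard_fault_handler_c(uint32_t *stack) {
    // Stacked frame order: r0, r1, r2, r3, r12, lr, pc, xPSR
    // stack[6] = PC at time of fault
    volatile uint32_t pc = stack[6];
    (void)pc;
    while (1); // Set breakpoint here
}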
The captured PC pointed into xSemaphoreTake() inside our FDCAN receive callback. That was the answer. We were calling a blocking FreeRTOS API from an interrupt service routine.
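The handler above reads only the PC, but the same stacked frame also holds R0-R3, R12, LR and xPSR, and having all eight words at hand usually makes it easier to map a fault back to a call site. The following is a minimal sketch, assuming the basic eight-word frame (no floating-point context stacked). The names fault_frame_t and decode_fault_frame are hypothetical and are not taken from the devibot firmware.

```c
#include <stdint.h>

/* Basic exception frame pushed by a Cortex-M core on exception entry,
 * lowest address first. Hypothetical type name, for illustration only. */
typedef struct {
    uint32_t r0;
    uint32_t r1;
    uint32_t r2;
    uint32_t r3;
    uint32_t r12;
    uint32_t lr;    /* link register at the time of the fault */
    uint32_t pc;    /* program counter at the time of the fault */
    uint32_t xpsr;  /* program status register at the time of the fault */
} fault_frame_t;

/* Copy the eight stacked words into a struct that is easy to inspect
 * in a debugger or to store somewhere that survives a reset. */
static fault_frame_t decode_fault_frame(const uint32_t *stack)
{
    fault_frame_t f;
    f.r0   = stack[0];
    f.r1   = stack[1];
    f.r2   = stack[2];
    f.r3   = stack[3];
    f.r12  = stack[4];
    f.lr   = stack[5];
    f.pc   = stack[6];
    f.xpsr = stack[7];
    return f;
}
```

Called as decode_fault_frame(stack) inside hard_fault_handler_c, this gives the debugger a single variable to inspect. The stacked LR is often as useful as the PC, because it points back toward the caller of the function that faulted.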
Why this is undefined behaviour
FreeRTOS maintains an internal scheduler state. Blocking API functions like xSemaphoreTake(), xQueueSend(), and vTaskDelay() interact directly with the scheduler — they may suspend the calling task, trigger a context switch, and modify scheduler data structures.
When called from an ISR, there is no “calling task” to suspend. The processor is in privileged interrupt context, the scheduler cannot perform a context switch safely, and modifying scheduler data structures from an ISR while a task may be doing the same thing causes a race condition with no guaranteed resolution.
Sometimes it works. Sometimes it corrupts the ready list. Sometimes it corrupts the blocked list. Eventually, the scheduler reaches an inconsistent state and the next context switch triggers a fault accessing an invalid address.
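One way to make this class of bug fail loudly instead of intermittently is to check the execution context before calling a blocking API. On Cortex-M, the low nine bits of the IPSR register hold the active exception number, and zero means thread mode (task context). The sketch below takes the IPSR value as a parameter so the logic can be exercised on a host; on target the value would typically come from the CMSIS intrinsic __get_IPSR(). The helper name is_isr_context is hypothetical. Recent FreeRTOS Cortex-M ports appear to ship a similar check as xPortIsInsideInterrupt(), so it is worth looking in your port's portmacro.h before writing your own.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits [8:0] of IPSR hold the number of the exception being serviced. */
#define IPSR_EXCEPTION_MASK 0x1FFu

/* Returns true when the given IPSR value indicates handler mode,
 * i.e. the code is running inside an exception or interrupt handler.
 * Hypothetical helper, for illustration only. */
static bool is_isr_context(uint32_t ipsr)
{
    return (ipsr & IPSR_EXCEPTION_MASK) != 0u;
}
```

A development build could wrap blocking calls in an assertion on this check, so that a misuse stops at the offending call rather than hours later in the scheduler.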
The correct pattern: ISR to queue, task from queue
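FreeRTOS provides FromISR variants of most API functions. These are designed for use in interrupt context: they never block and they cannot suspend the caller. Instead of switching context themselves, they report through a pxHigherPriorityTaskWoken flag whether the call unblocked a higher-priority task, and the ISR passes that flag to portYIELD_FROM_ISR() to request a context switch on exit.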
// One received CAN frame: header plus payload, so the data reaches the task too.
// can_frame_t is an illustrative name. 8 bytes assumes classic CAN; CAN FD needs up to 64.
typedef struct {
    FDCAN_RxHeaderTypeDef header;
    uint8_t data[8];
} can_frame_t;

// Define a queue for CAN frames
QueueHandle_t can_rx_queue;

// In your init function
can_rx_queue = xQueueCreate(32, sizeof(can_frame_t));

// In the ISR: non-blocking, safe
void HAL_FDCAN_RxFifo0Callback(FDCAN_HandleTypeDef *hfdcan, uint32_t RxFifo0ITs) {
    can_frame_t frame;
    BaseType_t xHigherPriorityTaskWoken = pdFALSE;

    if (HAL_FDCAN_GetRxMessage(hfdcan, FDCAN_RX_FIFO0, &frame.header, frame.data) == HAL_OK) {
        // Copies the frame into the queue; returns errQUEUE_FULL instead of blocking
        xQueueSendFromISR(can_rx_queue, &frame, &xHigherPriorityTaskWoken);
    }
    portYIELD_FROM_ISR(xHigherPriorityTaskWoken);
}

// In your CAN processing task: blocking is fine here
void can_process_task(void *pvParameters) {
    can_frame_t frame;
    for (;;) {
        if (xQueueReceive(can_rx_queue, &frame, portMAX_DELAY) == pdTRUE) {
            // Acquire mutex here: we are in task context
            xSemaphoreTake(data_mutex, portMAX_DELAY);
            process_can_frame(&frame);
            xSemaphoreGive(data_mutex);
        }
    }
}
Rule: In an ISR, only use FromISR variants. Never block. Never delay. Do the minimum work needed to capture the data, post it to a queue, and return. All processing happens in task context.
The ROS2 connection
In the devibot architecture, the STM32 communicates with the ROS2 layer via UART. The UART transmit path had the same problem. A shared ring buffer was being written in the UART TX complete ISR and read in a FreeRTOS task, with a mutex protecting the shared state, which meant the ISR was taking a mutex.
The fix was the same: the ISR writes to a queue without a mutex, and a task reads from the queue and handles the UART transmission. The UART layer now runs cleanly, and the STM32 has not had a HardFault in production since the fix was deployed.
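The reason no mutex is needed on this path is that there is exactly one writer and one reader. The sketch below is not the FreeRTOS queue used in the fix. It is a plain single-producer, single-consumer ring buffer, shown only to illustrate the ownership rule that makes lock-free hand-off safe: the producer (the ISR) is the only one that advances head, and the consumer (the task) is the only one that advances tail. All names here (spsc_ring_t, ring_init, ring_push, ring_pop, RING_SIZE) are hypothetical, and the use of C11 atomics is an assumption about the toolchain.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 16u  /* must be a power of two */

typedef struct {
    uint8_t buf[RING_SIZE];
    atomic_size_t head;  /* advanced only by the producer (ISR) */
    atomic_size_t tail;  /* advanced only by the consumer (task) */
} spsc_ring_t;

static void ring_init(spsc_ring_t *r)
{
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Producer side: never blocks, returns false when the buffer is full. */
static bool ring_push(spsc_ring_t *r, uint8_t byte)
{
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SIZE) {
        return false;  /* full: caller decides whether to drop or count */
    }
    r->buf[head & (RING_SIZE - 1u)] = byte;
    /* Publish the byte only after it has been written to the buffer. */
    atomic_store_explicit(&r->head, head + 1u, memory_order_release);
    return true;
}

/* Consumer side: never blocks, returns false when the buffer is empty. */
static bool ring_pop(spsc_ring_t *r, uint8_t *out)
{
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail) {
        return false;  /* empty */
    }
    *out = r->buf[tail & (RING_SIZE - 1u)];
    /* Release the slot only after the byte has been read out. */
    atomic_store_explicit(&r->tail, tail + 1u, memory_order_release);
    return true;
}
```

A FreeRTOS queue adds what this sketch lacks: the consuming task can sleep until data arrives instead of polling, which is why the queue is the better fit when a task is on the receiving end.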
How to check your own codebase
- Search for any call to xSemaphoreTake, xSemaphoreGive, xQueueSend, or vTaskDelay inside any function whose name starts with HAL_. HAL callbacks are typically invoked from ISR context.
- Check your NVIC priority configuration. Any interrupt that calls a FromISR function must run at a logical priority at or below configMAX_SYSCALL_INTERRUPT_PRIORITY (on Cortex-M, a numerically equal or higher value). Interrupts above that level must not call any FreeRTOS API, FromISR variants included.
- Enable configASSERT in FreeRTOSConfig.h during development. It traps many ISR API violations at runtime rather than letting them cause silent corruption.
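The priority check is easy to get backwards, because on Cortex-M a lower number means a more urgent interrupt. The FreeRTOS documentation requires that an ISR calling FromISR functions not run at a logical priority above configMAX_SYSCALL_INTERRUPT_PRIORITY. The sketch below expresses that rule as a host-testable function. The values are assumptions, not devibot settings: 4 implemented priority bits, and an unshifted threshold of 5 (the value often named configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY in Cortex-M FreeRTOSConfig.h files). The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define PRIO_BITS            4u  /* assumption: implemented NVIC priority bits */
#define LIB_MAX_SYSCALL_PRIO 5u  /* assumption: unshifted syscall threshold */

/* Convert a priority as passed to NVIC_SetPriority() into the value held
 * in the 8-bit hardware priority field, which is the form that
 * configMAX_SYSCALL_INTERRUPT_PRIORITY uses. */
static uint8_t shifted_priority(uint8_t nvic_prio)
{
    return (uint8_t)(nvic_prio << (8u - PRIO_BITS));
}

/* True when an interrupt at this priority is allowed to call FromISR
 * functions: numerically equal to or greater than the threshold,
 * i.e. logically at or below it. */
static bool may_call_from_isr_api(uint8_t nvic_prio)
{
    return shifted_priority(nvic_prio) >= shifted_priority(LIB_MAX_SYSCALL_PRIO);
}
```

With these example values, priorities 5 through 15 may use FromISR functions, while priorities 0 through 4 must stay away from the FreeRTOS API entirely.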
This bug was found and fixed during devibot firmware stabilisation at Peribott Dynamic LLP. If you’re hitting intermittent HardFaults in FreeRTOS, the ISR API pattern is one of the first places to look. Questions? Reach out.