Fixed race condition and message loss in Cortex-M GNU ports#523
Fixed race condition and message loss in Cortex-M GNU ports#523fdesbiens wants to merge 7 commits intoeclipse-threadx:devfrom
Conversation
…readx#516) - Added compiler memory barriers to BASEPRI management functions in tx_port.h. - Added architectural barriers (DSB/ISB) to scheduler return paths in tx_port.h and tx_thread_system_return.S to prevent fall-through before context switch. - These changes address spurious thread resumption and lost messages, especially when TX_NOT_INTERRUPTABLE is enabled. Assisted-by: Gemini (Gemini 2.0 Flash)
…readx#516) - Added compiler memory barriers to BASEPRI management functions in tx_port.h. - Added architectural barriers (DSB/ISB) to scheduler return paths in tx_port.h and tx_thread_system_return.S to prevent fall-through before context switch. - These changes address spurious thread resumption and lost messages, especially when TX_NOT_INTERRUPTABLE is enabled. Assisted-by: Gemini (Gemini 2.0 Flash)
…readx#516) - Added compiler memory barriers to BASEPRI management functions in tx_port.h. - Added architectural barriers (DSB/ISB) to scheduler return paths in tx_port.h and tx_thread_system_return.s to prevent fall-through before context switch. - These changes address spurious thread resumption and lost messages, especially when TX_NOT_INTERRUPTABLE is enabled. Assisted-by: Gemini (Gemini 2.0 Flash)
…clipse-threadx#516) - Added compiler memory barriers to BASEPRI management functions in tx_port.h (M7). - Added architectural barriers (DSB/ISB) to scheduler return paths in tx_port.h and tx_thread_system_return.S (M7, M23) to prevent fall-through before context switch. - These changes address spurious thread resumption and lost messages, especially when TX_NOT_INTERRUPTABLE is enabled. Assisted-by: Gemini (Gemini 2.0 Flash)
…ts (eclipse-threadx#516) - Added compiler memory barriers to BASEPRI management functions in tx_port.h (M3, M4, M7, M55, M85). - Added architectural barriers (DSB/ISB) to scheduler return paths in tx_port.h and tx_thread_system_return.s (all architectures) to prevent fall-through before context switch. - Added Gemini attribution and updated headers to follow project mandates. - These changes address spurious thread resumption and lost messages, especially when TX_NOT_INTERRUPTABLE is enabled. Assisted-by: Gemini (Gemini 2.0 Flash)
| #else | ||
| __enable_interrupts(); | ||
| #endif | ||
| __restore_interrupt(interrupt_save); |
There was a problem hiding this comment.
I would advise adding ISB immediately after restoring interrupts, so in the line following
| __restore_interrupt(interrupt_save); | |
| __asm__ volatile ("isb 0xF " : : : "memory"); |
as per Arm DAI 0321A PDF
“If it is necessary to ensure a pended interrupt is recognized before subsequent operations, the ISB instruction should be used after CPSIE I.”
(section 4.7, p.28; lines 1077-1080)
__enable_irq(); // CPSIE I : Enable interrupt __ISB(); // Allow pended interrupts to be recognized
(p.28; lines 1083-1087)
“you might want to insert an ISB instruction if the priority level change can result in the interrupt being accepted, and you want this interrupt to be executed immediately.”
(p.34; lines 1321-1324)
| MSR BASEPRI, r1 // Restore original interrupt posture | ||
| #else | ||
| MRS r1, PRIMASK // Thread context returning, pickup PRIMASK | ||
| CPSIE i // Enable interrupts | ||
| MSR PRIMASK, r1 // Restore original interrupt posture | ||
| #endif |
There was a problem hiding this comment.
Same thing here
| MSR BASEPRI, r1 // Restore original interrupt posture | |
| ISB #0xF // Flush pipeline | |
| #else | |
| MRS r1, PRIMASK // Thread context returning, pickup PRIMASK | |
| CPSIE i // Enable interrupts | |
| MSR PRIMASK, r1 // Restore original interrupt posture | |
| ISB #0xF // Flush pipeline | |
| #endif |
Pulling in same ARM citations:
as per Arm DAI 0321A PDF
“If it is necessary to ensure a pended interrupt is recognized before subsequent operations, the ISB instruction should be used after CPSIE I.”
(section 4.7, p.28; lines 1077-1080)
__enable_irq(); // CPSIE I : Enable interrupt __ISB(); // Allow pended interrupts to be recognized
(p.28; lines 1083-1087)
“you might want to insert an ISB instruction if the priority level change can result in the interrupt being accepted, and you want this interrupt to be executed immediately.”
(p.34; lines 1321-1324)
…clipse-threadx#516) - Added ISB (Instruction Synchronization Barrier) to _tx_thread_system_return_inline in tx_port.h for all Cortex-M GNU ports. - Added ISB instruction to _tx_thread_system_return in tx_thread_system_return.S for all Cortex-M GNU ports. - These changes ensure that pending interrupts (specifically PendSV) are recognised before subsequent instructions are executed, following Kairalite's feedback and ARM architectural guidelines. Assisted-by: Gemini (Gemini 2.0 Flash)
|
Thank you for the feedback, @Kairalite. I updated the PR accordingly. Do you feel other changes are needed before I merge? |
|
@fdesbiens Thank you! One more thing I want to bring to your attention: This might be out of scope, but another section Codex flagged when I was using it to help me pin down this bug was in tx_queue_cleanup.c Specifically in the non_interruptable path, it checks if the cleanup is still required and that the thread objects are still valid. This may be noise, as I am not familiar with the rest of the repo, but wanted to flag it for you guys just in case. I've attached the diff below |
In TX_NOT_INTERRUPTABLE mode the caller keeps interrupts disabled across the entire cleanup call, so the race window that makes the guards necessary in the interruptable path cannot occur. Add a comment explaining this, and noting that all paths that resume a suspended thread clear tx_thread_suspend_cleanup before calling _tx_thread_system_ni_resume, making double-cleanup impossible. This prevents future false-positive suggestions (e.g. from AI tools) to add redundant checks to the NI path. Relates to: eclipse-threadx#516 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
@Kairalite After careful analysis, I won't be applying the change from the diff file you shared. The The Additionally:
So the preconditions the checks guard against simply cannot arise in a correct NI execution. A small comment has been added to document the above. |
|
Understood, thank you for taking a look at this! I have no further comments, and the fixes work on my end. Thank you again! |
Aims to solve the problems raised in issue #516.