Skip to content

Fixed race condition and message loss in Cortex-M GNU ports#523

Open
fdesbiens wants to merge 7 commits intoeclipse-threadx:devfrom
fdesbiens:issue-516
Open

Fixed race condition and message loss in Cortex-M GNU ports#523
fdesbiens wants to merge 7 commits intoeclipse-threadx:devfrom
fdesbiens:issue-516

Conversation

@fdesbiens
Copy link
Copy Markdown
Contributor

Aims to solve the problems raised in issue #516.

…readx#516)

- Added compiler memory barriers to BASEPRI management functions in tx_port.h.
- Added architectural barriers (DSB/ISB) to scheduler return paths in tx_port.h and tx_thread_system_return.S to prevent fall-through before context switch.
- These changes address spurious thread resumption and lost messages, especially when TX_NOT_INTERRUPTABLE is enabled.

Assisted-by: Gemini (Gemini 2.0 Flash)
…readx#516)

- Added compiler memory barriers to BASEPRI management functions in tx_port.h.
- Added architectural barriers (DSB/ISB) to scheduler return paths in tx_port.h and tx_thread_system_return.S to prevent fall-through before context switch.
- These changes address spurious thread resumption and lost messages, especially when TX_NOT_INTERRUPTABLE is enabled.

Assisted-by: Gemini (Gemini 2.0 Flash)
…readx#516)

- Added compiler memory barriers to BASEPRI management functions in tx_port.h.
- Added architectural barriers (DSB/ISB) to scheduler return paths in tx_port.h and tx_thread_system_return.s to prevent fall-through before context switch.
- These changes address spurious thread resumption and lost messages, especially when TX_NOT_INTERRUPTABLE is enabled.

Assisted-by: Gemini (Gemini 2.0 Flash)
…clipse-threadx#516)

- Added compiler memory barriers to BASEPRI management functions in tx_port.h (M7).
- Added architectural barriers (DSB/ISB) to scheduler return paths in tx_port.h and tx_thread_system_return.S (M7, M23) to prevent fall-through before context switch.
- These changes address spurious thread resumption and lost messages, especially when TX_NOT_INTERRUPTABLE is enabled.

Assisted-by: Gemini (Gemini 2.0 Flash)
…ts (eclipse-threadx#516)

- Added compiler memory barriers to BASEPRI management functions in tx_port.h (M3, M4, M7, M55, M85).
- Added architectural barriers (DSB/ISB) to scheduler return paths in tx_port.h and tx_thread_system_return.s (all architectures) to prevent fall-through before context switch.
- Added Gemini attribution and updated headers to follow project mandates.
- These changes address spurious thread resumption and lost messages, especially when TX_NOT_INTERRUPTABLE is enabled.

Assisted-by: Gemini (Gemini 2.0 Flash)
#else
__enable_interrupts();
#endif
__restore_interrupt(interrupt_save);
Copy link
Copy Markdown

@Kairalite Kairalite Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would advise adding ISB immediately after restoring interrupts, so in the line following

Suggested change
__restore_interrupt(interrupt_save);
__asm__ volatile ("isb 0xF " : : : "memory");

as per Arm DAI 0321A PDF

“If it is necessary to ensure a pended interrupt is recognized before subsequent operations, the ISB instruction should be used after CPSIE I.”
(section 4.7, p.28; lines 1077-1080)

__enable_irq(); // CPSIE I : Enable interrupt __ISB(); // Allow pended interrupts to be recognized
(p.28; lines 1083-1087)

“you might want to insert an ISB instruction if the priority level change can result in the interrupt being accepted, and you want this interrupt to be executed immediately.”
(p.34; lines 1321-1324)

Comment on lines 88 to 93
MSR BASEPRI, r1 // Restore original interrupt posture
#else
MRS r1, PRIMASK // Thread context returning, pickup PRIMASK
CPSIE i // Enable interrupts
MSR PRIMASK, r1 // Restore original interrupt posture
#endif
Copy link
Copy Markdown

@Kairalite Kairalite Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing here

Suggested change
MSR BASEPRI, r1 // Restore original interrupt posture
ISB #0xF // Flush pipeline
#else
MRS r1, PRIMASK // Thread context returning, pickup PRIMASK
CPSIE i // Enable interrupts
MSR PRIMASK, r1 // Restore original interrupt posture
ISB #0xF // Flush pipeline
#endif

Pulling in same ARM citations:
as per Arm DAI 0321A PDF

“If it is necessary to ensure a pended interrupt is recognized before subsequent operations, the ISB instruction should be used after CPSIE I.”
(section 4.7, p.28; lines 1077-1080)

__enable_irq(); // CPSIE I : Enable interrupt __ISB(); // Allow pended interrupts to be recognized
(p.28; lines 1083-1087)

“you might want to insert an ISB instruction if the priority level change can result in the interrupt being accepted, and you want this interrupt to be executed immediately.”
(p.34; lines 1321-1324)

…clipse-threadx#516)

- Added ISB (Instruction Synchronization Barrier) to _tx_thread_system_return_inline in tx_port.h for all Cortex-M GNU ports.
- Added ISB instruction to _tx_thread_system_return in tx_thread_system_return.S for all Cortex-M GNU ports.
- These changes ensure that pending interrupts (specifically PendSV) are recognised before subsequent instructions are executed, following Kairalite's feedback and ARM architectural guidelines.

Assisted-by: Gemini (Gemini 2.0 Flash)
@fdesbiens
Copy link
Copy Markdown
Contributor Author

Thank you for the feedback, @Kairalite. I updated the PR accordingly. Do you feel other changes are needed before I merge?

@Kairalite
Copy link
Copy Markdown

@fdesbiens Thank you!

One more thing I want to bring to your attention:

This might be out of scope, but another section Codex flagged when I was using it to help me pin down this bug was in tx_queue_cleanup.c

Specifically in the non_interruptable path, it checks if the cleanup is still required and that the thread objects are still valid.

This may be noise, as I am not familiar with the rest of the repo, but wanted to flag it for you guys just in case.

I've attached the diff below
f28564481f2ce649be196af7d7cbbab69835b901.diff.txt

In TX_NOT_INTERRUPTABLE mode the caller keeps interrupts disabled
across the entire cleanup call, so the race window that makes the
guards necessary in the interruptable path cannot occur.  Add a
comment explaining this, and noting that all paths that resume a
suspended thread clear tx_thread_suspend_cleanup before calling
_tx_thread_system_ni_resume, making double-cleanup impossible.

This prevents future false-positive suggestions (e.g. from AI tools)
to add redundant checks to the NI path.

Relates to: eclipse-threadx#516

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@fdesbiens
Copy link
Copy Markdown
Contributor Author

@Kairalite After careful analysis, I won't be applying the change from the diff file you shared.

The #ifndef TX_NOT_INTERRUPTABLE path calls TX_RESTORE before the cleanup function runs, creating a race window where another context can resume/abort/delete the suspension. The five defensive checks (cleanup pointer, sequence, null, queue ID, suspended count) exist to handle that race.

The TX_NOT_INTERRUPTABLE path safely omits those checks. In NI mode, callers (tx_thread_timeout, tx_thread_terminate, tx_thread_wait_abort) keep interrupts disabled through the entire cleanup call — there is no reopened race window.

Additionally:

  • tx_queue_delete clears tx_thread_suspend_cleanup before resuming threads
  • tx_queue_send/receive clear the cleanup pointer before calling _tx_thread_system_ni_resume

So the preconditions the checks guard against simply cannot arise in a correct NI execution.

A small comment has been added to document the above.

@Kairalite
Copy link
Copy Markdown

@fdesbiens

Understood, thank you for taking a look at this!

I have no further comments, and the fixes work on my end.

Thank you again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Possible missing compiler memory barriers in Cortex-M BASEPRI port: TX_NOT_INTERRUPTABLE tx_queue_send can resume receiver without delivered message

2 participants