[asm] Async memory iter_arg tying safety in register allocator #978
Hardcode84 wants to merge 1 commit into iree-org:main
Conversation
I think this is fine as a starting point, but we would eventually want to make the tying dependence aware (in a subsequent PR).
```cpp
// destination register asynchronously. If the corresponding block arg
// is still read after the load issues, sharing a register would let the
// load clobber a value MFMAs are still consuming.
auto isUnsafeAsyncTie = [&](Value iterArg, BlockArgument blockArg) -> bool {
```
Rather than have this as a heuristic in the pass, could we add this as a verification hook on ConditionOp or LoopOp that flags when an async memory result is passed as an iter_arg with an overlapping block arg use. This would catch the hazard at IR validation time rather than relying on the liveness pass to silently handle it.
Not sure what IR validation will give us besides random compilation failures for users. Also, checking non-local properties (use-def chains) in a verifier is bad practice.
The idea with the verifier is that we always validate this as an invariant in the IR, rather than only evaluate it in this pass. Okay then maybe we should model it as a normal form but we can do that in a separate PR.
harsh-nod left a comment
Thanks for fixing the bug; minor comments, but overall looks good.
Also, this is overconservative: if we have a waitcount before the loop end this should be safe, but waitcounts are inserted in subsequent passes.
Couldn't we just move the ticketing pass before this one?
Maybe; we can also just check if the value is consumed before the loop end (which will generate a waitcount later).
Problem

In software-pipelined loops, `buffer_load`/`ds_read` results are passed as `condition` iter_args. The liveness analysis (Pass 3b) unconditionally tied ALL iter_args to their corresponding block args, making them share a physical register. This eliminates the back-edge `v_mov` copy but is unsafe when the block arg is still live after the async memory op writes: the VMEM/LDS write clobbers a value MFMAs are still reading.

A previous prototype by @panditsa put the filter in `LinearScanPass.cpp`, but liveness also added unconditional ties, leaving two competing sources of truth.
Fix

Move the safety check into `Liveness.cpp`, the single place where loop ties are decided:

- Detect async memory ops via the `MemoryOp` trait (`hasTrait<OpTrait::MemoryOp>() && getNumResults() > 0`) instead of enumerating op types. Covers `buffer_load_*`, `ds_read_*`, `global_load_*`, `flat_load_*`.
- Compare the async def point against the last use of the corresponding block arg in `opToIdx` order. If `lastBlockArgUse > defIdx`, the tie is unsafe and is skipped.
- `tiedPairs` insertions are guarded.

Additionally, the assembly emitter's back-edge copy logic is fixed to decompose multi-register values into individual `v_mov_b32` copies (previously it copied only one register out of N for wide types like `pvreg<N, 4>`).