shim: handle connection-closed errors during kill after live migration #2673
Closed
shreyanshjain7174 wants to merge 10 commits into microsoft:main
…dy into TransferSandbox
dd237d6 to 39d705b
…wn race

After HCS live migration completes, FinalizeSourceLM calls FinalizeSandbox(STOP), which calls LMKill to finalize the HCS system. This causes the VM to exit, which the waitContainer goroutines detect via c.Wait(). Those goroutines race to shut down the shim before containerd can call Kill/Delete via the task ttrpc service, resulting in 'ttrpc: closed' errors that surface as StopSourceVMFailure.

Fix: Cancel the waitContainer context before calling LMKill, following the same pattern already used in TransferSandbox. This prevents the goroutines from racing to shut down the shim.

Also keep s.sandbox alive (instead of nilling it) so that the subsequent Kill and Delete calls from containerd succeed: Kill calls Terminate, which returns nil for an already-stopped system, and Delete returns the cached exit state.

Fixes: AB#61773098

Signed-off-by: Shreyansh Sancheti <shsancheti@microsoft.com>
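The ordering described in the commit message (cancel the waiter's context, then kill) can be illustrated with a small standalone sketch. This is not the hcsshim code: `fakeSystem`, `waitContainer`, `finalize`, and the `shutdown` callback are hypothetical stand-ins, and the sketch only assumes that the real waiter reacts to system exit by shutting down the shim.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// fakeSystem is a hypothetical stand-in for an HCS compute system.
type fakeSystem struct {
	exited chan struct{}
	once   sync.Once
}

func newFakeSystem() *fakeSystem {
	return &fakeSystem{exited: make(chan struct{})}
}

// Kill makes the system exit. It is safe to call more than once.
func (s *fakeSystem) Kill() {
	s.once.Do(func() { close(s.exited) })
}

// waitContainer mimics a goroutine that shuts the shim down when the
// system exits, unless its context was cancelled first.
func waitContainer(ctx context.Context, sys *fakeSystem, shutdown func()) {
	select {
	case <-ctx.Done():
		return
	case <-sys.exited:
	}
	// select picks randomly when both cases are ready, so re-check the
	// context to make cancellation win deterministically.
	if ctx.Err() != nil {
		return
	}
	shutdown()
}

// finalize kills the system and reports whether the waiter triggered a
// shutdown. With cancelFirst set, the waiter is cancelled before the kill.
func finalize(cancelFirst bool) bool {
	sys := newFakeSystem()
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	shutdownCalled := false
	done := make(chan struct{})
	go func() {
		defer close(done)
		waitContainer(ctx, sys, func() { shutdownCalled = true })
	}()

	if cancelFirst {
		cancel() // stop the waiter before the system exits
	}
	sys.Kill()
	<-done // the waiter has returned; reading shutdownCalled is now safe
	return shutdownCalled
}

func main() {
	fmt.Println("cancel before kill, shutdown triggered:", finalize(true))
	fmt.Println("kill without cancel, shutdown triggered:", finalize(false))
}
```

In this sketch, cancelling first leaves the process alive after the kill, which is the property the commit relies on so that later Kill and Delete requests can still be served.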
39d705b to 8f23399
Closing: the fix was in the wrong layer (V1 shim instead of V2 taskserver). The correct fix is in rawahars#14 against the live_migration_poc_4 branch.