Guard old colltrace baseline path against null ctranComm_#2265
Open
minsii wants to merge 1 commit intometa-pytorch:mainfrom
Open
Guard old colltrace baseline path against null ctranComm_#2265minsii wants to merge 1 commit intometa-pytorch:mainfrom
minsii wants to merge 1 commit intometa-pytorch:mainfrom
Conversation
2b2a7c7 to
fa00d1b
Compare
Contributor
|
@facebook-github-bot has imported this pull request. If you are a Meta employee, you can view this in D102408648. (Because this pull request was imported automatically, there will not be any future comments.) |
8a2bae1 to
8b428ba
Compare
Summary: - D101586421 moved ctranComm_ creation inside `if (useCtran_)`, making ctranComm_ null when ctran is disabled. The old colltrace baseline enqueue path unconditionally dereferenced ctranComm_ in two places, causing SIGSEGV: - `collTraceBaselineGetHandle` accessed `plan->comm->ctranComm_->collTrace_` to check if old colltrace is active - `colltraceHandle->trigger()` in enqueue.cc was called on a nullptr handle returned by the above - Fix: return nullptr from `collTraceBaselineGetHandle` when NCCL_COLLTRACE is empty or ctranComm_ is null, and guard all `colltraceHandle->trigger()` calls with null checks in all three versions Reviewed By: YulunW Differential Revision: D102289281
8b428ba to
da21730
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary:
if (useCtran_), making ctranComm_ null when ctran is disabled. The old colltrace baseline enqueue path unconditionally dereferenced ctranComm_ in two places, causing SIGSEGV:collTraceBaselineGetHandleaccessedplan->comm->ctranComm_->collTrace_to check if old colltrace is activecolltraceHandle->trigger()in enqueue.cc was called on a nullptr handle returned by the abovecollTraceBaselineGetHandlewhen NCCL_COLLTRACE is empty or ctranComm_ is null, and guard allcolltraceHandle->trigger()calls with null checks in all three versionsDifferential Revision: D102289281