Skip to content

Conversation

@MathiasVP
Copy link
Contributor

@MathiasVP MathiasVP commented Jan 23, 2026

For a dataflow node n one can ask if n is the node corresponding to a StoreInstruction. This is useful when one wants to deal uniformly with all kinds of assignments/initializers/increments/decrements/etc.

The predicate also tells you whether an assignment overwrites an entire buffer or whether it may be a partial write (i.e., node.asDefinition(true) means the write may not overwrite the entire buffer). In order to do that it uses some of the SSA predicates (in particular, SsaImpl::defToNode) to find the corresponding Ssa::Definition belonging to the StoreInstruction.

However, if there's no corresponding SSA information for the StoreInstruction (for instance, if it's a store to a field or to a variable that is not live) then SsaImpl::defToNode won't hold. And so the entire node.asDefinition predicate did not have a result.

This PR fixes this so that, instead of the predicate not having a result, it will have a result with uncertain = true. This means that node.asDefinition can now be used to grab all assignments/initializers/increments/decrements/etc regardless of whether we have SSA information for it.

cc @bdrodes

@MathiasVP MathiasVP requested a review from a team as a code owner January 23, 2026 16:32
Copilot AI review requested due to automatic review settings January 23, 2026 16:32
@github-actions github-actions bot added the C++ label Jan 23, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes DataFlow::Node.asDefinition so it returns a result (marked uncertain = true) even when the underlying StoreInstruction has no corresponding SSA definition, preventing “missing result” cases.

Changes:

  • Adjusted Node.asDefinition/1 implementation to avoid depending on SsaImpl::defToNode for producing any result.
  • Added a new InlineExpectations-based regression test for asDefinition over different store forms (including a field store / dead variable store).

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 1 comment.

File Description
cpp/ql/lib/semmle/code/cpp/ir/dataflow/internal/DataFlowUtil.qll Updates asDefinition uncertainty determination so stores without SSA still yield a result.
cpp/ql/test/library-tests/dataflow/asDefinition/test.ql Adds an InlineExpectations test query for Node.asDefinition.
cpp/ql/test/library-tests/dataflow/asDefinition/test.cpp Adds C++ test cases exercising assignments/initializers including a field store.
cpp/ql/test/library-tests/dataflow/asDefinition/test.expected Empty InlineExpectations expected-output file for the new test.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@jketema
Copy link
Contributor

jketema commented Jan 23, 2026

Could you explain why uncertain = true is the right choice?

@MathiasVP
Copy link
Contributor Author

Could you explain why uncertain = true is the right choice?

It's basically the sound choice. Notice this part of the QLDoc (emphasis mine) from this predicate:

  • If uncertain = false then the definition is guaranteed to overwrite
  • the entire buffer pointed to by the destination address of the definition.
  • Otherwise, uncertain = true.

In order words, the uncertain = false is basically the case where we have proven that it will overwrite the entire buffer (essentially because we've managed to trace all paths getting here back to an &x for some local variable x), and uncertain = true is when this wasn't possible.

@jketema
Copy link
Contributor

jketema commented Jan 23, 2026

Thanks. In the case you're handling here there are effectively no paths, if I understand correctly, so wouldn't uncertain = false also be sound?

@MathiasVP
Copy link
Contributor Author

We're handling that we couldn't conclude whether we know about all the paths. Consider something like this (which is outside the scope of the C/C++ SSA since we don't have field-based SSA):

void foo(int**);

struct S {
  int* p;
};

S s;
foo(&s.p);
*s.p = 42;

When we reach *s.p = 42; the buffer pointed to by s.p may be a single integer, or it may be an array of integers. So consider this example:

S s;
foo(&s.p);
*s.p = source();
... // (1)
*s.p = 42;
... // (2)
sink(*s.p);

On main we don't have a result for asDefinition for *s.p = 42;. So now that we add a result we need to pick the value for uncertain. So let's consider the two situations:
When uncertain = false the write is assumed to totally overwrite the entire SSA variable, so that there's no flow from source() to sink().

When uncertain = true the we will flow the old value through the write. The thinking here is that // (1) may be something like ++s.p; and // (2) may be something like --s.p; so that the write to *s.p = 42 doesn't invalidate the initial write to *s.p = source();.

@jketema
Copy link
Contributor

jketema commented Jan 23, 2026

Thanks for the explanation. That makes sense.

Copy link
Contributor

@jketema jketema left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM if DCA is happy.

@MathiasVP
Copy link
Contributor Author

DCA was uneventful 🎉

@MathiasVP MathiasVP merged commit cabcb83 into github:main Jan 23, 2026
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants