-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Shared: Provenance-based filtering of flow summaries #21051
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Shared: Provenance-based filtering of flow summaries #21051
Conversation
a3e585d to
eb48820
Compare
1e946f8 to
30a0791
Compare
0fbea88 to
5a2881d
Compare
5a2881d to
a941f4a
Compare
bf632b3 to
c6383ff
Compare
9f81377 to
0057ae3
Compare
1933d1c to
72dfe9c
Compare
117690f to
4a90f84
Compare
@michaelnebel : I have pushed a revert change that appears to fix this: When I run |
5d74edd to
27c102a
Compare
| c.fromSource() and | ||
| not c.getFile().isStub() and | ||
| not ( | ||
| c.getFile().extractedQlTest() and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this deserves a comment (that ql test files where the body is just a throw are considered stub like and thus not a part of the source code).
michaelnebel
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really nice work @hvitved !
Only a couple of minor questions/remarks.
yoff
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Python 👍
|
Offline feedback recap: Some of the added Java models look wrong. Tom and I identified several issues: the code snippet to identify and generate the missing models lacked the signature, and notably the signature can be different from the overridden method. Also, some existing manual exact models were missing signatures, which caused them to wrongly apply to inherited overloads. |
Missing manual models were added using the following code added to `FlowSummaryImpl.qll`:
```ql
private predicate testsummaryElement(
Input::SummarizedCallableBase c, string namespace, string type, boolean subtypes, string name,
string signature, string ext, string originalInput, string originalOutput, string kind,
string provenance, string model, boolean isExact
) {
exists(string input, string output, Callable baseCallable |
summaryModel(namespace, type, subtypes, name, signature, ext, originalInput, originalOutput,
kind, provenance, model) and
baseCallable = interpretElement(namespace, type, subtypes, name, signature, ext, isExact) and
(
c.asCallable() = baseCallable and input = originalInput and output = originalOutput
or
correspondingKotlinParameterDefaultsArgSpec(baseCallable, c.asCallable(), originalInput,
input) and
correspondingKotlinParameterDefaultsArgSpec(baseCallable, c.asCallable(), originalOutput,
output)
)
)
}
private predicate testsummaryElement2(
string namespace, string type, boolean subtypes, string name, string signature, string ext,
string originalInput, string originalOutput, string kind, string provenance, string model,
string namespace2, string type2
) {
exists(Input::SummarizedCallableBase c |
testsummaryElement(c, namespace2, type2, _, _, _, ext, originalInput, originalOutput, kind,
provenance, model, false) and
testsummaryElement(c, namespace, type, subtypes, name, _, _, _, _, _, provenance, _, true) and
signature = paramsString(c.asCallable()) and
not testsummaryElement(c, _, _, _, _, _, _, originalInput, originalOutput, kind, provenance,
_, true)
)
}
private string getAMissingManualModel(string namespace2, string type2) {
exists(
string namespace, string type, boolean subtypes, string name, string signature, string ext,
string originalInput, string originalOutput, string kind, string provenance, string model
|
testsummaryElement2(namespace, type, subtypes, name, signature, ext, originalInput,
originalOutput, kind, provenance, model, namespace2, type2) and
result =
"- [\"" + namespace + "\", \"" + type + "\", True, \"" + name + "\", \"" + signature +
"\", \"\", \"" + originalInput + "\", \"" + originalOutput + "\", \"" + kind + "\", \"" +
provenance + "\"]"
)
}
```
27c102a to
eaaa4f2
Compare
| ) | ||
| } | ||
|
|
||
| private predicate testsummaryElement( |
Check warning
Code scanning / CodeQL
Dead code Warning
| ) | ||
| } | ||
|
|
||
| private predicate testsummaryElement2( |
Check warning
Code scanning / CodeQL
Dead code Warning
| ) | ||
| } | ||
|
|
||
| private string getAMissingManualModel(string namespace2, string type2) { |
Check warning
Code scanning / CodeQL
Dead code Warning
|
|
||
| private string getAMissingManualModel(string namespace2, string type2) { | ||
| exists( | ||
| string namespace, string type, boolean subtypes, string name, string signature, string ext, |
Check warning
Code scanning / CodeQL
Omittable 'exists' variable Warning
in this argument
|
|
||
| private string getAMissingManualModel(string namespace2, string type2) { | ||
| exists( | ||
| string namespace, string type, boolean subtypes, string name, string signature, string ext, |
Check warning
Code scanning / CodeQL
Omittable 'exists' variable Warning
in this argument
| private string getAMissingManualModel(string namespace2, string type2) { | ||
| exists( | ||
| string namespace, string type, boolean subtypes, string name, string signature, string ext, | ||
| string originalInput, string originalOutput, string kind, string provenance, string model |
Check warning
Code scanning / CodeQL
Omittable 'exists' variable Warning
This PR aligns the logic across languages for how flow summaries are prioritized based on provenance and exactness (that is, whether a model is defined directly for a function or for a function that is implemented/overridden).
A flow summary is considered relevant if:
Note that for dynamic languages we currently pretend that no source code is available for functions with flow summaries, so 3.(a) holds vacuously.
Points 2 and 3.c represent a change for e.g. Java, where we would previously union exact and inexact models, which meant that it was not possible to overrule inexact models. As a consequence, some inexact manual have been replicated. DCA for Java reports some lost
java/sensitive-logresults onapache_solr, but looking at those results, they all have flow paths of length > 150, so they are almost certainly false positives, and most likely a consequence of 3.c.In order for the logic to be defined in the shared flow summary library, I had to move provenance and exactness information into the
propagatesFlowpredicate, which is a breaking change.Lastly, I have applied the
::Rangepattern to theSummarizedCallableclass for all languages except C++, which currently does not expose this class. This means thatSummarizedCallable::Rangewill contain all flow summaries, whereasSummarizedCallablewill only contain relevant summaries.