Comments (18)
Hi! Yes, this is correct. In the PDG we generate, we overtaint severely, and we then use semantics at query-time to ignore defs. I'll be writing a detailed article about this soon, and since you're the third person to ask about this this week alone, I think a decent API to retrieve DDG edges according to the data flow semantics is in order.
from codepropertygraph.
Yes, by default, we assume that calls to methods taint their arguments. I know I've promised an article about this, which I haven't gotten to, but the intuition is that this makes sense for exploratory code analysis: by default, it returns longer, more detailed flows that contain calls to escaping routines, even if these escaping routines are external methods that have not been annotated with a semantic.
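The default described above can be modeled in a few lines. This is a minimal sketch, not the engine's actual code: the `Flows` type, the `definesArgument` helper, and the table layout are hypothetical names introduced for illustration, and it assumes that a rule `src->dst` with a positive `dst` means "the call writes to argument `dst`".

```scala
object OverTaintSketch {
  // Hypothetical model: each annotated method maps to a set of (src -> dst) flows.
  type Flows = Set[(Int, Int)]

  /** Returns true if a call to `method` is treated as (re)defining argument `argIndex`. */
  def definesArgument(semantics: Map[String, Flows], method: String, argIndex: Int): Boolean =
    semantics.get(method) match {
      // No annotation: over-taint, i.e. assume the call taints its argument.
      case None => true
      // Annotated: the argument is defined only if some flow ends in it.
      case Some(flows) => flows.exists { case (_, dst) => dst == argIndex }
    }
}
```

Under this model, an external escaping routine without an annotation still shows up in flows, while an annotated method such as `"foo" 1->-1` no longer counts as a definition of its first argument.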
For `ddgIn`/`ddgPathElem`, the unit tests in this PR show how it is used. I've also polished the API a bit.
from codepropertygraph.
Great, thanks for the clarification! If it makes sense to you, just adding "DDG" edges similar to the "CDG" edges would be pretty intuitive, at least in my usage of the library thus far, along with a brief explanation of the difference between REACHING_DEF and DDG edges. Thanks again!
from codepropertygraph.
Related: joernio/joern#333
from codepropertygraph.
Work-in-progress PR at #958
from codepropertygraph.
Update: one can now traverse the DDG backwards step by step using `ddgIn` and `ddgPathElement` (#958), taking into account the semantics. The plotting code also takes into account semantics now. I'll move to fixing the script for PDG export, as that seems to be a common use case.
from codepropertygraph.
@fabsx00 I revisited this today. My understanding is that I can now define `semantics` that contain a list of function names and the dataflow pattern for each argument, then use something like `dotDdg` to generate edges. I tried this for the example above. I added the semantics rule `"foo" 1->-1`. However, I still get the same edges reported in the first post. Is this still the expected behavior, or is there a path to getting the expected edges (i.e. some way to indicate that a parameter is not DEF'd in the call)?
Thanks!
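For readers unfamiliar with the rule format, here is a minimal sketch of how a line such as `"foo" 1->-1` breaks down into a method name and a list of source-to-destination flows. It is a standalone illustration, not the `Parser` from `semanticsloader`; `FlowRule` and `parseLine` are hypothetical names, and reading `-1` as the return value is an assumption based on how the rule is used in this thread.

```scala
object SemanticsLineSketch {
  // Hypothetical container for one parsed rule.
  final case class FlowRule(methodName: String, flows: List[(Int, Int)])

  // A quoted method name followed by zero or more "src->dst" pairs.
  private val Line = """^"([^"]+)"\s*(.*)$""".r
  private val Flow = """(-?\d+)->(-?\d+)""".r

  /** Parses one rule line; returns None if the line does not start with a quoted name. */
  def parseLine(line: String): Option[FlowRule] = line.trim match {
    case Line(name, rest) =>
      val flows = Flow.findAllMatchIn(rest)
        .map(m => (m.group(1).toInt, m.group(2).toInt))
        .toList
      Some(FlowRule(name, flows))
    case _ => None
  }
}
```

With this reading, `"foo" 1->-1` yields a single flow from argument 1 to index -1 and, notably, no flow that ends in argument 1.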
from codepropertygraph.
How are you retrieving edges? If you're looking at all `REACHING_DEF` edges, then that will not take into account semantics. `dotDdg`, on the other hand, should now return the edges, taking into account the semantics.
from codepropertygraph.
Here is what I did:
./joern-parse myfunc.c
importCpg("cpg.bin")
import io.shiftleft.dataflowengineoss.semanticsloader.{Parser, Semantics}
implicit val s = Semantics.fromList(new Parser().parseFile("../ddg.semantics"))
cpg.method.name("myfunc").dotDdg.l.head
Where `ddg.semantics` contains:
"<operator>.assignment" 2->1
"<operators>.assignmentAnd" 2->1 1->1
"<operators>.assignmentArithmeticShiftRight" 2->1 1->1
"<operator>.assignmentDivision" 2->1 1->1
"<operators>.assignmentExponentiation" 2->1 1->1
"<operators>.assignmentLogicalShiftRight" 2->1 1->1
"<operator>.assignmentMinus" 2->1 1->1
"<operators>.assignmentModulo" 2->1 1->1
"<operator>.assignmentMultiplication" 2->1 1->1
"<operators>.assignmentOr" 2->1 1->1
"<operator>.assignmentPlus" 2->1 1->1
"<operators>.assignmentShiftLeft" 2->1 1->1
"<operators>.assignmentXor" 2->1 1->1
"<operator>.postDecrement" 1->1
"<operator>.preDecrement" 1->1
"<operator>.postIncrement" 1->1
"<operator>.preIncrement" 1->1
"<operator>.memberAccess" 1->-1
"<operator>.indirectComputedMemberAccess" 1->-1
"<operator>.indirectMemberAccess" 1->-1
"<operator>.computedMemberAccess" 1->-1
"<operator>.indirection" 1->-1
"<operator>.addressOf" 1->-1
"<operator>.fieldAccess" 1->-1
"<operator>.indirectFieldAccess" 1->-1
"<operator>.indexAccess" 1->-1
"<operator>.indirectIndexAccess" 1->-1
"<operator>.pointerShift" 1->-1
"<operator>.getElementPtr" 1->-1
"<operator>.addition" 1->-1 2->-1
"<operator>.conditional" 2->-1 3->-1
"foo" 1->-1
And here is the output:
res3: String = """digraph myfunc {
"1000108" [label = "(<operator>.assignment,a = 42)" ]
"1000109" [label = "(IDENTIFIER,a,a = 42)" ]
"1000110" [label = "(LITERAL,42,a = 42)" ]
"1000111" [label = "(foo,foo(a))" ]
"1000112" [label = "(IDENTIFIER,a,foo(a))" ]
"1000113" [label = "(bar,bar(a))" ]
"1000114" [label = "(IDENTIFIER,a,bar(a))" ]
"1000105" [label = "(METHOD,myfunc)" ]
"1000115" [label = "(METHOD_RETURN,void)" ]
"1000109" -> "1000108" [ label = "a"]
"1000110" -> "1000108" [ label = "42"]
"1000110" -> "1000109" [ label = "42"]
"1000105" -> "1000109"
"1000105" -> "1000110"
"1000112" -> "1000111" [ label = "a"]
"1000109" -> "1000112" [ label = "a"]
"1000105" -> "1000112"
"1000114" -> "1000113" [ label = "a"]
"1000112" -> "1000114" [ label = "a"]
"1000105" -> "1000114"
"1000105" -> "1000115"
}
"""
I was expecting an edge from 1000109 -> 1000114.
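To make the expectation concrete, here is a minimal sketch that derives data-dependence edges for the three accesses to `a` in the output above (nodes 1000109, 1000112, and 1000114). It is a toy model for straight-line code only, not the engine's algorithm; `Access` and `ddgEdges` are hypothetical names, and whether an access counts as a definition is supplied by hand.

```scala
object ExpectedDdgSketch {
  // One access to a variable; `isDef` says whether the access (re)defines it.
  final case class Access(nodeId: Long, variable: String, isDef: Boolean)

  /** Links each access to the most recent definition of the same variable. */
  def ddgEdges(accesses: List[Access]): List[(Long, Long)] = {
    val start = (Map.empty[String, Long], List.empty[(Long, Long)])
    val (_, edges) = accesses.foldLeft(start) { case ((lastDef, acc), access) =>
      // Edge from the last definition (if any) to this access.
      val incoming = lastDef.get(access.variable).map(src => (src, access.nodeId)).toList
      // Only definitions replace the remembered "last definition".
      val updated =
        if (access.isDef) lastDef.updated(access.variable, access.nodeId) else lastDef
      (updated, acc ++ incoming)
    }
    edges
  }

  // Default over-tainting: foo(a) is treated as defining `a`.
  val overTainted: List[(Long, Long)] = ddgEdges(List(
    Access(1000109L, "a", isDef = true),
    Access(1000112L, "a", isDef = true),
    Access(1000114L, "a", isDef = true)))

  // With "foo" 1->-1: foo(a) only uses `a`.
  val withFooRule: List[(Long, Long)] = ddgEdges(List(
    Access(1000109L, "a", isDef = true),
    Access(1000112L, "a", isDef = false),
    Access(1000114L, "a", isDef = true)))
}
```

In this model, `overTainted` reproduces the chain 1000109 -> 1000112 -> 1000114 seen in the output, while `withFooRule` contains the edge 1000109 -> 1000114.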
from codepropertygraph.
What you're doing looks correct. Let me look into it.
from codepropertygraph.
Ok, I understand the problem now. Fix is coming up. That idea of just following incoming REACHING_DEF edges while honoring the semantics to then obtain a nice DDG is a bit flawed. I failed to take into account the `visible` field in the path elements. This shouldn't be hard to fix though.
from codepropertygraph.
This should fix it: #968
Bringing into joern. I'll close when it works in Joern.
from codepropertygraph.
Confirmed to work in joern in a local deployment; however, the `Semantics` implicit needs to be set by hand (as you do in your example). This needs to be fixed, but I'll do that in a separate PR.
from codepropertygraph.
@kzsnow the problem should be fixed on joern's current master. Could you take a look? If it works, we can close the ticket. I'm hoping to find time soon to clean up the generated DDGs a bit. In particular, it would probably be nicer to draw edges from calls to calls, now that edges are labeled with the variable that is propagated.
from codepropertygraph.
@fabsx00 It works, yay! Feel free to close the issue!
Is it the case that, by default, call arguments will be treated as `def`s unless they are specifically added to the `semantics` to indicate otherwise? I am envisioning some type propagation to try and give a best guess based on whether the argument type is a pointer.
Would it be possible to give a quick example of using `ddgIn` or `ddgPathElement` to get a list of edges similar to the `dotDdg` output? Those I could not figure out :-)
from codepropertygraph.
@kzsnow this may be of interest to you as well: joernio/joern#357
from codepropertygraph.