
Comments (18)

fabsx00 commented on September 15, 2024

Hi! Yes, this is correct. In the PDG we generate, we overtaint severely and then use semantics at query time to ignore defs. I'll be writing a detailed article about this soon, and since you're the third person this week alone to ask about this, I think a decent API to retrieve DDG edges according to the data flow semantics is in order.

from codepropertygraph.

fabsx00 commented on September 15, 2024

Yes, by default, we assume that calls to methods taint their arguments. I know I've promised an article about this, which I haven't gotten to, but the intuition is that this makes sense for exploratory code analysis: by default, it returns longer, more detailed flows that contain calls to escaping routines, even if these escaping routines are external methods that have not been annotated with a semantic.
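
A minimal sketch of that default, assuming a toy model in which a semantic simply lists the argument positions a call writes to. The names `DefaultTaintSketch`, `definesArgument`, `strlen` and `unknown_fn` are illustrative assumptions and not part of the Joern API:

```scala
object DefaultTaintSketch {
  // Hypothetical model: method name -> argument positions the call writes to.
  type Semantics = Map[String, Set[Int]]

  // A method without a semantic is assumed to define (taint) every argument.
  // A method with a semantic defines only the positions listed for it.
  def definesArgument(semantics: Semantics, method: String, argIndex: Int): Boolean =
    semantics.get(method) match {
      case Some(definedArgs) => definedArgs.contains(argIndex)
      case None              => true
    }
}

// Illustrative semantics: `strlen` is annotated as writing to none of its arguments.
val sketchSemantics: DefaultTaintSketch.Semantics = Map("strlen" -> Set.empty[Int])
```

With these semantics, `definesArgument(sketchSemantics, "strlen", 1)` is false, while the unannotated `unknown_fn` is treated as defining its first argument.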

For ddgIn/ddgPathElem, the unit tests in this PR show how it is used. I've also polished the API a bit.

#969


kzsnow commented on September 15, 2024

Great, thanks for the clarification! If it makes sense to you, just adding additional "DDG" edges similar to the "CDG" edges would be pretty intuitive, at least in my usage of the library thus far, along with a brief explanation of the difference between REACHING_DEF and DDG edges. Thanks again!


fabsx00 commented on September 15, 2024

Related: joernio/joern#333


fabsx00 commented on September 15, 2024

Work-in-progress PR at #958


fabsx00 commented on September 15, 2024

Update: one can now traverse the DDG backwards step by step using ddgIn and ddgPathElement (#958), taking the semantics into account. The plotting code now takes the semantics into account as well. I'll move on to fixing the script for PDG export, as that seems to be a common use case.


fabsx00 commented on September 15, 2024

#963


kzsnow commented on September 15, 2024

@fabsx00 I revisited this today. My understanding is that I can now define semantics that contain a list of function names and the dataflow pattern for each argument, then use something like dotDdg to generate edges. I tried this for the example above, adding the semantics rule: "foo" 1->-1. However, I still get the same edges reported in the first post. Is this still the expected behavior, or is there a path to getting the expected edges (i.e. some way to indicate that a parameter is not DEF'd in the call)?

Thanks!


fabsx00 commented on September 15, 2024

How are you retrieving edges? If you're looking at all REACHING_DEF edges, then that will not take into account semantics. dotDdg on the other hand should now return the edges, taking into account the semantics.


kzsnow commented on September 15, 2024

Here is what I did:

./joern-parse myfunc.c

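// Load the CPG that joern-parse wrote to cpg.bin
importCpg("cpg.bin")
import io.shiftleft.dataflowengineoss.semanticsloader.{Parser, Semantics}
// Parse the custom semantics file and set it as the implicit Semantics by hand
implicit val s = Semantics.fromList(new Parser().parseFile("../ddg.semantics"))
// Render the DDG of myfunc as a dot string
cpg.method.name("myfunc").dotDdg.l.head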

Where ddg.semantics contains:

"<operator>.assignment" 2->1
"<operators>.assignmentAnd" 2->1 1->1
"<operators>.assignmentArithmeticShiftRight" 2->1 1->1
"<operator>.assignmentDivision" 2->1 1->1
"<operators>.assignmentExponentiation" 2->1 1->1
"<operators>.assignmentLogicalShiftRight" 2->1 1->1
"<operator>.assignmentMinus" 2->1 1->1
"<operators>.assignmentModulo" 2->1 1->1
"<operator>.assignmentMultiplication" 2->1 1->1
"<operators>.assignmentOr" 2->1 1->1
"<operator>.assignmentPlus" 2->1 1->1
"<operators>.assignmentShiftLeft" 2->1 1->1
"<operators>.assignmentXor" 2->1 1->1
"<operator>.postDecrement" 1->1
"<operator>.preDecrement" 1->1
"<operator>.postIncrement" 1->1
"<operator>.preIncrement" 1->1
"<operator>.memberAccess" 1->-1
"<operator>.indirectComputedMemberAccess" 1->-1
"<operator>.indirectMemberAccess" 1->-1
"<operator>.computedMemberAccess" 1->-1
"<operator>.indirection" 1->-1
"<operator>.addressOf" 1->-1
"<operator>.fieldAccess" 1->-1
"<operator>.indirectFieldAccess" 1->-1
"<operator>.indexAccess" 1->-1
"<operator>.indirectIndexAccess" 1->-1
"<operator>.pointerShift" 1->-1
"<operator>.getElementPtr" 1->-1
"<operator>.addition" 1->-1 2->-1
"<operator>.conditional" 2->-1 3->-1
"foo" 1->-1
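
To make the line format above concrete, here is a small, self-contained sketch that reads one such line into a data structure. It is not Joern's `Parser`; the names `SemanticsLineSketch`, `Rule`, `parseLine` and `definedArgs` are hypothetical. It assumes that each `src->dst` pair maps a source argument to a destination, that -1 stands for the return value, and that any other destination is an argument position the call writes to:

```scala
object SemanticsLineSketch {
  // One rule: the quoted method name plus its (source, destination) pairs.
  final case class Rule(method: String, mappings: List[(Int, Int)])

  private val LinePattern    = "\"([^\"]+)\"\\s*(.*)".r
  private val MappingPattern = "(-?\\d+)->(-?\\d+)".r

  // Returns None for lines that do not start with a quoted method name.
  def parseLine(line: String): Option[Rule] = line.trim match {
    case LinePattern(name, rest) =>
      val mappings = MappingPattern
        .findAllMatchIn(rest)
        .map(m => (m.group(1).toInt, m.group(2).toInt))
        .toList
      Some(Rule(name, mappings))
    case _ => None
  }

  // Assumption: destinations other than -1 are arguments the call defines.
  def definedArgs(rule: Rule): Set[Int] =
    rule.mappings.map(_._2).filter(_ != -1).toSet
}
```

Under these assumptions, the rule `"foo" 1->-1` yields no defined arguments, whereas `"<operator>.assignment" 2->1` marks argument 1 as defined.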

And here is the output:

res3: String = """digraph myfunc {
"1000108" [label = "(<operator>.assignment,a = 42)" ]
"1000109" [label = "(IDENTIFIER,a,a = 42)" ]
"1000110" [label = "(LITERAL,42,a = 42)" ]
"1000111" [label = "(foo,foo(a))" ]
"1000112" [label = "(IDENTIFIER,a,foo(a))" ]
"1000113" [label = "(bar,bar(a))" ]
"1000114" [label = "(IDENTIFIER,a,bar(a))" ]
"1000105" [label = "(METHOD,myfunc)" ]
"1000115" [label = "(METHOD_RETURN,void)" ]
  "1000109" -> "1000108"  [ label = "a"]
  "1000110" -> "1000108"  [ label = "42"]
  "1000110" -> "1000109"  [ label = "42"]
  "1000105" -> "1000109"
  "1000105" -> "1000110"
  "1000112" -> "1000111"  [ label = "a"]
  "1000109" -> "1000112"  [ label = "a"]
  "1000105" -> "1000112"
  "1000114" -> "1000113"  [ label = "a"]
  "1000112" -> "1000114"  [ label = "a"]
  "1000105" -> "1000114"
  "1000105" -> "1000115"
}
"""

I was expecting an edge from 1000109 -> 1000114.


fabsx00 commented on September 15, 2024

What you're doing looks correct. Let me look into it.


fabsx00 commented on September 15, 2024

Ok, I understand the problem now. Fix is coming up. That idea of just following incoming REACHING_DEF edges while honoring the semantics to then obtain a nice DDG is a bit flawed. I failed to take into account the visible field in the path elements. This shouldn't be hard to fix though.
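
As a rough illustration of the intended behavior (not the actual fix, and not Joern's implementation), the sketch below walks backwards over raw, overtainted edges and passes through nodes that the semantics mark as non-defining. `DdgSketch`, `Edge` and `definingSources` are hypothetical names. The nodes in `nonDefining` play the role of path elements that are traversed but not shown; whether that matches the `visible` field exactly is an assumption:

```scala
object DdgSketch {
  // A raw REACHING_DEF-style edge between two node ids.
  final case class Edge(src: Long, dst: Long)

  // Collect the nearest defining predecessors of `node`. Predecessors in
  // `nonDefining` are skipped and their own predecessors are followed instead.
  def definingSources(
      node: Long,
      raw: Seq[Edge],
      nonDefining: Set[Long],
      seen: Set[Long] = Set.empty
  ): Set[Long] = {
    val predecessors = raw.filter(_.dst == node).map(_.src)
    predecessors.flatMap { src =>
      if (seen(src)) Set.empty[Long] // guard against cycles
      else if (nonDefining(src)) definingSources(src, raw, nonDefining, seen + src)
      else Set(src)
    }.toSet
  }
}

// Edges for variable `a` taken from the dot output earlier in this thread.
val rawEdges = Seq(
  DdgSketch.Edge(1000109L, 1000112L),
  DdgSketch.Edge(1000112L, 1000114L)
)
```

If node 1000112 (the argument of foo) is marked as non-defining, `definingSources(1000114L, rawEdges, Set(1000112L))` returns only 1000109, which corresponds to the edge 1000109 -> 1000114 that was expected above.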


fabsx00 commented on September 15, 2024

This should fix it: #968
I'm bringing it into Joern now and will close this issue once it works there.


fabsx00 commented on September 15, 2024

Confirmed to work in Joern in a local deployment. However, the Semantics implicit needs to be set by hand (as you do in your example). This needs to be fixed, but I'll do that in a separate PR.


fabsx00 commented on September 15, 2024

@kzsnow the problem should be fixed on joern's current master. Could you take a look? If it works, we can close the ticket. I'm hoping to find time soon to clean up the generated DDGs a bit. In particular, it would probably be nicer to draw edges from calls to calls, now that edges are labeled with the variable that is propagated.


kzsnow commented on September 15, 2024

@fabsx00 It works, yay! Feel free to close the issue!

Is it the case that, by default, call arguments will be treated as defs unless the call is specifically added to the semantics to indicate otherwise? I am envisioning some type propagation to try to give a best guess based on whether the argument type is a pointer.
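
To make the idea concrete, here is a rough sketch of such a type-based guess. It is purely hypothetical (`ArgDefHeuristic` and `mayBeDefined` are made-up names, and nothing like this is confirmed to exist in Joern). It only looks at simple C type spellings such as `int`, `char *` or `const char *` and ignores more involved declarators:

```scala
object ArgDefHeuristic {
  // Best guess: a callee can only write through a pointer or array argument,
  // and not when the pointee is declared const.
  def mayBeDefined(typeName: String): Boolean = {
    val t             = typeName.trim
    val isPointerLike = t.endsWith("*") || t.endsWith("]")
    val pointeeConst  = t.startsWith("const ")
    isPointerLike && !pointeeConst
  }
}
```

With this guess, an `int` or `const char *` argument would not be treated as a def, while a `char *` argument would.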

Would it be possible to give a quick example of using ddgIn or ddgPathElement to get a list of edges similar to the dotDdg output? Those I could not figure out :-)


fabsx00 commented on September 15, 2024

#973


fabsx00 commented on September 15, 2024

@kzsnow this may be of interest to you as well: joernio/joern#357

