Coder Social home page Coder Social logo

Erroneous CFA generation vol 2 about gazer HOT 13 CLOSED

ftsrg avatar ftsrg commented on July 18, 2024
Erroneous CFA generation vol 2

from gazer.

Comments (13)

as3810t avatar as3810t commented on July 18, 2024

I think this is a serious issue, so @sallaigy @radl97 can one of you take a look?

from gazer.

sallaigy avatar sallaigy commented on July 18, 2024

What is the output without optimizations (-no-optimize) and without variable elimination (-elim-vars=off)? What is the output if the source file only contains the code snippet which is translated erroneously?

from gazer.

as3810t avatar as3810t commented on July 18, 2024

With passing both -no-optimize and -elim-vars=off, the result is the same:

    loc22 -> loc23 {
        main_main_bb20_tmp24_inlined0 := (main_main_bb20__1_i_inlined0 bv_sign_extend bv[32])
        main_main_bb20_tmp25_inlined0 := (main_main_bb20__12_i_inlined0 bv_sign_extend bv[32])
        main_main_bb20_tmp26_inlined0 := (main_main_bb20_tmp24_inlined0 bvsrem main_main_bb20_tmp25_inlined0)
        main_main_bb20_tmp27_inlined0 := (main_main_bb20_tmp26_inlined0)[8:0]
    }

    loc23 -> loc24 {
    }

    loc24 -> loc18 {
        main_main_bb20__12_i_inlined0 := main_main_bb20_tmp27_inlined0
        main_main_bb20__1_i_inlined0 := main_main_bb20__12_i_inlined0
    }

from gazer.

as3810t avatar as3810t commented on July 18, 2024

For a smaller example, the generated code is correct (also when using if instead of while). I tried it with -no-optimize and -elim-vars=off and without them as well.

signed char x = __VERIFIER_nondet_char();
signed char y = __VERIFIER_nondet_char();
signed char g;

while(x != y) {
      g = y;
      y = x % y;
      x = g;

      __VERIFIER_assert(x != y);
}

The generated excerpt:

loc18 -> loc10 {
        main_main_bb3__01_inlined0 := main_main_bb3__0_inlined0
        main_main_bb3__0_inlined0 := main_main_bb3_tmp11_inlined0
}

from gazer.

sallaigy avatar sallaigy commented on July 18, 2024

This is a PHI assignment. Are you confident that the assignment on the other incoming branch does not make this code correct? Does the verification produce an incorrect result (either with theta or the BMC algorithm)?

from gazer.

as3810t avatar as3810t commented on July 18, 2024

The verification produces an incorrect result for the problem in the first comment with gazer-theta, but a correct result with gazer-bmc.

In the example the error location should not be reachable, however, gazer-theta produces a trace, which claims that the return value of the gcd function is 0, which can be traced back to that incorrectly handled swap:

Error trace:
------------
#0 in function main():
  call __VERIFIER_nondet_char() returned #5Dbv8          at 29:21
  x := 93       (0b01011101)     at 29:21
  call __VERIFIER_nondet_char() returned #3bv8           at 30:21
  y := 3        (0b00000011)     at 30:21
#1 in function gcd_test(#5Dbv8,#3bv8):
  a := 93       (0b01011101)     at 29:21
  b := 3        (0b00000011)     at 30:21
  a := 93       (0b01011101)     at 15:9
  b := 3        (0b00000011)     at 16:9
  a := 93       (0b01011101)
  b := 3        (0b00000011)
  t := 3        (0b00000011)
  b := 0        (0b00000000)     at 20:13
  a := 3        (0b00000011)
  a := 0        (0b00000000)
  b := 0        (0b00000000)
  a := 0        (0b00000000)
  a := 0        (0b00000000)
  return #0bv8
#2 in function main():
  g := 0        (0b00000000)
#3 in function __VERIFIER_assert(#0bv32):
  cond := 0     (0b00000000000000000000000000000000)     at 36:29

from gazer.

radl97 avatar radl97 commented on July 18, 2024

Can you run theta-cfa -pipeline test.c?
This should create a digraph for all the different loops and functions. I think only gcd and it's loop is important here. (it uses the CFA representation by Gazer, not the Theta one)

from gazer.

as3810t avatar as3810t commented on July 18, 2024

@radl97 The digraph can be found here

from gazer.

sallaigy avatar sallaigy commented on July 18, 2024

I looked at the generated automata and the original LLVM module. The generated code seems to be a faithful representation of the code coming from LLVM, which also looks correct (and BMC provides a correct result on it).

%.12.i contains the srem result from the previous iteration and the eventual return value %.1.lcssa is set to contain this value.

bb13:                                             ; preds = %bb15, %bb8
  %.12.i = phi i8 [ %.01.i, %bb8 ], [ %tmp19, %bb15 ], !dbg !28
  %.1.i = phi i8 [ %spec.select.i, %bb8 ], [ %.12.i, %bb15 ], !dbg !28
  call void @llvm.dbg.value(metadata i8 %.1.i, metadata !33, metadata !DIExpression()) #4, !dbg !28
  call void @llvm.dbg.value(metadata i8 %.12.i, metadata !34, metadata !DIExpression()) #4, !dbg !28
  %tmp14 = icmp eq i8 %.12.i, 0, !dbg !46                                                    ; b != 0
  br i1 %tmp14, label %gcd_test.exit, label %bb15, !dbg !45

bb15:                                             ; preds = %bb13
  call void @llvm.dbg.value(metadata i8 %.12.i, metadata !35, metadata !DIExpression()) #4, !dbg !28 ; t = b
  %tmp16 = sext i8 %.1.i to i32, !dbg !47
  %tmp17 = sext i8 %.12.i to i32, !dbg !49
  %tmp18 = srem i32 %tmp16, %tmp17, !dbg !50
  %tmp19 = trunc i32 %tmp18 to i8, !dbg !47
  call void @llvm.dbg.value(metadata i8 %tmp19, metadata !34, metadata !DIExpression()) #4, !dbg !28
  call void @llvm.dbg.value(metadata i8 %.12.i, metadata !33, metadata !DIExpression()) #4, !dbg !28
  br label %bb13, !dbg !45, !llvm.loop !51

gcd_test.exit:                                    ; preds = %bb13
  %.1.i.lcssa = phi i8 [ %.1.i, %bb13 ], !dbg !28

@as3810t Unless you are able to pinpoint the semantic difference yourself, I would recommend to look into the backend algorithm executed by theta.

from gazer.

radl97 avatar radl97 commented on July 18, 2024

@as3810t I think you sent the wrong file.
Note that there are multiple generated files, one for each loop and each function.

The interesting files should be .gcd.dot and .gcd.loop.dot in this context.

Oh, I think gazer-cfa does not exactly work like that... :/ Sorry

from gazer.

as3810t avatar as3810t commented on July 18, 2024

I found the following cases, where I think the outputted CFA is semantically different than the input C program. In each case, I generated the CFA with gazer-theta <file> --model-only -o <outfile>, without any additional parameters. The textual CFA is the exact output of gazer-cfa, while on the image I did some modification to make it more readable:

  • I text replaced (some of) the variable names to match their C source counterpart
  • I deleted nodes that had only one skip statement on their only outgoing edge.
Filename Source CFA (text) CFA (img) Notes
gcd_1.c .c .cfa .png The swap in lines 19-21 in gcd_1.c does not match the operations on transition between loc24 -> loc21. To verify let x = 93, y = 3. In this scenario, the assertion passes in the C file (g = 3), but fails in the CFA (g = 0).
num_conversion_2.c .c .cfa .png The cycles (lines 21-27 and loc15-loc18) do not match, as in the CFA, the order of the assignment on y and on c is swapped. To verify let x = 9. In this scenario, the assertion passes in the C file (y = 9), but fails in the CFA (y = 8).
sum02-2.c .c .cfa .png The cycles (lines 21-23 and loc12-loc18) do not match, as in the CFA, the order of the assignment on i and on sn is swapped. To verify let n = 0. In this scenario, the assertion passes in the C file (sn = 0), but fails in the CFA (sn = 1).

All cases have in common:

  • gazer-theta verifies incorrectly (with PRED_CART and NWT_IT_WP) and produces a counterexample that fits the input CFA (manually checked every one of them)
  • gazer-bmc verifies correctly (or bound reached)

from gazer.

radl97 avatar radl97 commented on July 18, 2024

Thanks for the detailed report.

Gazer's CFA assumes some kind of SSA-form, but uses more assumptions right now, I think. (A strict SSA would need Phi nodes. Now the only assumption is (should be) that one loop or one function assigns only one value per path for a given variable)

This is handled by "PhiInputs". At the start of the loop, when a variable is changed in the loop. There could be two assignments where the value could come from: from before the loop or inside the loop in a previous iteration.
The loop is transformed to a function with multiple inputs or outputs.

So gcd_loop(a,b) returns gcd_loop(b,a%b). I think the problem lies in the phi input assignments affecting each other (when building up these in Theta).

Input assignments are part of the inlining process when instead of "calling", one assigns the input variables with the given value.

This could be in some inlining mechanism or in RecursiveToCyclicCfa.cpp lines 192- and 257-

from gazer.

radl97 avatar radl97 commented on July 18, 2024

The solution could be a bit hard. Assigning in two stages (a' := b; b' := a%b; a:=a'; b:=b';) could be a solution.

from gazer.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.