Coder Social home page Coder Social logo

llvm-debuginfo-analyzer's Introduction

Introduction

LLVM supports multiple debug information formats (namely DWARF and CodeView) in different binary formats (e.g. ELF, PDB, Mach-O). Understanding the mappings between source code and debug information can be complex, and it is a problem we have commonly encountered when triaging debug information issues.

The output from tools such as llvm-dwarfdump, llvm-readobj or llvm-pdbutil use a close representation of the internal debug information format and in our experience, we have found that they require a good knowledge of those formats to understand the output, limiting who can triage and address such issues quickly. Even for the experts, it can sometimes take a lot of time and effort to triage issues due to the inherent complexity.

llvm-debuginfo-analyzer

At Sony, we have been developing an LLVM-based debug information analysis tool which we have called llvm-debuginfo-analyzer (short for LLVM debug information visual analyzer), designed to visualize these mappings. It’s based entirely on the existing LLVM libraries for debug info parsing, target support, etc. and at this stage we believe that its proven its worth internally to the point where we would like to propose upstreaming it as part of the mainline LLVM project alongside existing tools such as llvm-dwarfdump.

llvm-debuginfo-analyzer is a command line tool that process debug info contained in a binary file and produces a debug information format agnostic “Logical View”, which is a high-level semantic representation of the debug info, independent of the low-level format.

The logical view is composed of the tradition programming elements as: scopes, types, symbols, lines. These elements can display additional information, such as variable coverage factor, lexical block level, disassembly code, code ranges, etc.

The diversity of llvm-debuginfo-analyzer command line options enables the creation of very rich logical views to include more low-level debug information: disassembly code associated with the debug lines, variables runtime location and coverage, internal offsets for the elements within the binary file, etc.

With llvm-debuginfo-analyzer, we aim to address the following points:

  • Which variables are dropped due to optimization?
  • Why I cannot stop at a particular line?
  • Which lines are associated to a specific code range?
  • Does the debug information represent the original source?
  • What is the semantic difference between the debug info generated by different toolchain versions?

Printing Mode

In this mode llvm-diva prints the logical view or portions of it, based on criteria patterns (including regular expressions) to select the kind of logical elements to be included in the output.

The below example is used to show different output generated by llvm-diva. We then compiled it for an x86 ELF target with a recent version of clang (-O0 -g):

1  using INTPTR = const int *;
2  int foo(INTPTR ParamPtr, unsigned ParamUnsigned, bool ParamBool) {
3    if (ParamBool) {
4      typedef int INTEGER;
5      const INTEGER CONSTANT = 7;
6      return CONSTANT;
7    }
8    return ParamUnsigned;
9  }

Print basic details

The following command prints basic details for all logical elements sorted by the debug information internal offset; it includes its lexical level.

llvm-debuginfo-analyzer --attribute=level,format
                        --output-sort=offset
                        --print=scopes,symbols,types,lines,instructions
                        test-dwarf-clang.o

Each row represents an element that is present within the debug information. The first column represents the scope level, followed by the associated line number (if any), and finally the description of the element.

Logical View:
[000]           {File} 'test-dwarf-clang.o' -> elf64-x86-64

[001]             {CompileUnit} 'test.cpp'
[002]     2         {Function} extern not_inlined 'foo' -> 'int'
[003]     2           {Parameter} 'ParamPtr' -> 'INTPTR'
[003]     2           {Parameter} 'ParamUnsigned' -> 'unsigned int'
[003]     2           {Parameter} 'ParamBool' -> 'bool'
[003]                 {Block}
[004]     5             {Variable} 'CONSTANT' -> 'const INTEGER'
[004]     5             {Line}
[004]                   {Code} 'movl  $0x7, -0x1c(%rbp)'
[004]     6             {Line}
[004]                   {Code} 'movl  $0x7, -0x4(%rbp)'
[004]                   {Code} 'jmp   0x6'
[004]     8             {Line}
[004]                   {Code} 'movl  -0x14(%rbp), %eax'
[003]     4           {TypeAlias} 'INTEGER' -> 'int'
[003]     2           {Line}
[003]                 {Code} 'pushq   %rbp'
[003]                 {Code} 'movq    %rsp, %rbp'
[003]                 {Code} 'movb    %dl, %al'
[003]                 {Code} 'movq    %rdi, -0x10(%rbp)'
[003]                 {Code} 'movl    %esi, -0x14(%rbp)'
[003]                 {Code} 'andb    $0x1, %al'
[003]                 {Code} 'movb    %al, -0x15(%rbp)'
[003]     3           {Line}
[003]                 {Code} 'testb   $0x1, -0x15(%rbp)'
[003]                 {Code} 'je      0x13'
[003]     8           {Line}
[003]                 {Code} 'movl    %eax, -0x4(%rbp)'
[003]     9           {Line}
[003]                 {Code} 'movl    -0x4(%rbp), %eax'
[003]                 {Code} 'popq    %rbp'
[003]                 {Code} 'retq'
[003]     9           {Line}
[002]     1         {TypeAlias} 'INTPTR' -> '* const int'

On closer inspection, we can see what could be a potential debug issue:

[003]                 {Block}
[003]     4           {TypeAlias} 'INTEGER' -> 'int'

The ‘INTEGER’ definition is at level [003], the same lexical scope as the anonymous {Block} (‘true’ branch for the ‘if’ statement) whereas in the original source code the typedef statement is clearly inside that block, so the ‘INTEGER’ definition should also be at level [004] inside the block.

Select logical elements

This feature allows selecting specific logical elements; the patterns used as criteria can include regular expressions. The output layout is controlled by the ‘–report’ option to have a tabular report, a tree view showing the parents hierarchy for the logical element that matches the criteria, or just a summary with the number of occurrences.

The following prints all instructions, symbols and types that contain ‘inte’ or ‘movl’ in their names or types, using a tab layout and given the number of matches.

llvm-debuginfo-analyzer --attribute=level
                        --select-nocase --select-regex --select=INTe --select=movl
                        --report=list
                        --print=symbols,types,instructions,summary
                        test-dwarf-clang.o

Logical View:
[000]           {File} 'test-dwarf-clang.o'

[001]           {CompileUnit} 'test.cpp'
[003]           {Code} 'movl  $0x7, -0x1c(%rbp)'
[003]           {Code} 'movl  $0x7, -0x4(%rbp)'
[003]           {Code} 'movl  %eax, -0x4(%rbp)'
[003]           {Code} 'movl  %esi, -0x14(%rbp)'
[003]           {Code} 'movl  -0x14(%rbp), %eax'
[003]           {Code} 'movl  -0x4(%rbp), %eax'
[003]     4     {TypeAlias} 'INTEGER' -> 'int'
[004]     5     {Variable} 'CONSTANT' -> 'const INTEGER'

-----------------------------
Element      Total      Found
-----------------------------
Scopes           3          0
Symbols          4          1
Types            2          1
Lines           17          6
-----------------------------
Total           26          8

Comparison mode

In this mode llvm-debuginfo-analyzer compares logical views to produce a report with the logical elements that are missing or added. This a very powerful aid in finding semantic differences in the debug information produced by different toolchain versions or even completely different toolchains altogether (For example a compiler producing DWARF can be directly compared against a completely different compiler that produces CodeView).

Given the previous example we found the above debug information issue (related to the previous invalid scope location for the ‘typedef int INTEGER’) by comparing against another compiler.

Using GCC to generate test-dwarf-gcc.o, we can apply a selection pattern with the printing mode to obtain the following logical view output.

llvm-debuginfo-analyzer --attribute=level
                        --select-regex --select-nocase --select=INTe
                        --report=list
                        --print=symbols,types
                        test-dwarf-clang.o test-dwarf-gcc.o

Logical View:
[000]           {File} 'test-dwarf-clang.o'

[001]           {CompileUnit} 'test.cpp'
[003]     4     {TypeAlias} 'INTEGER' -> 'int'
[004]     5     {Variable} 'CONSTANT' -> 'const INTEGER'

Logical View:
[000]           {File} 'test-dwarf-gcc.o'

[001]           {CompileUnit} 'test.cpp'
[004]     4     {TypeAlias} 'INTEGER' -> 'int'
[004]     5     {Variable} 'CONSTANT' -> 'const INTEGER'

The output shows that both objects contain the same elements. But the ‘typedef INTEGER’ is located at different scope level. The GCC generated object, shows ‘4’, which is the correct value.

Note that there is no requirement that GCC must produce identical or similar DWARF to Clang to allow the comparison. We are only comparing the semantics. The same case when comparing CodeView debug information generated by MSVC and Clang.

There are 2 comparison methods: logical view and logical elements.

Logical View

It compares the logical view as a whole unit; for a match, each compared logical element must have the same parents and children.

Using the llvm-debuginfo-analyzer comparison functionality, that issue can be seen in a more global context, that can include the logical view.

The output shows in view form the missing (-), added (+) elements, giving more context by swapping the reference and target object files.

llvm-debuginfo-analyzer --attribute=level
                        --compare=types
                        --report=view
                        --print=symbols,types
                        test-dwarf-clang.o test-dwarf-gcc.o

Reference: 'test-dwarf-clang.o'
Target:    'test-dwarf-gcc.o'

Logical View:
 [000]           {File} 'test-dwarf-clang.o'

 [001]             {CompileUnit} 'test.cpp'
 [002]     1         {TypeAlias} 'INTPTR' -> '* const int'
 [002]     2         {Function} extern not_inlined 'foo' -> 'int'
 [003]                 {Block}
 [004]     5             {Variable} 'CONSTANT' -> 'const INTEGER'
+[004]     4             {TypeAlias} 'INTEGER' -> 'int'
 [003]     2           {Parameter} 'ParamBool' -> 'bool'
 [003]     2           {Parameter} 'ParamPtr' -> 'INTPTR'
 [003]     2           {Parameter} 'ParamUnsigned' -> 'unsigned int'
-[003]     4           {TypeAlias} 'INTEGER' -> 'int'

The output shows the merging view path (reference and target) with the missing and added elements.

Logical Elements

It compares individual logical elements without considering if their parents are the same. For both comparison methods, the equal criteria include the name, source code location, type, lexical scope level.

llvm-debuginfo-analyzer --attribute=level
                        --compare=types
                        --report=list
                        --print=symbols,types,summary
                        test-dwarf-clang.o test-dwarf-gcc.o

Reference: 'test-dwarf-clang.o'
Target:    'test-dwarf-gcc.o'

(1) Missing Types:
-[003]     4     {TypeAlias} 'INTEGER' -> 'int'

(1) Added Types:
+[004]     4     {TypeAlias} 'INTEGER' -> 'int'

----------------------------------------
Element   Expected    Missing      Added
----------------------------------------
Scopes           4          0          0
Symbols          0          0          0
Types            2          1          1
Lines            0          0          0
----------------------------------------
Total            6          1          1

Changing the Reference and Target order:

llvm-debuginfo-analyzer --attribute=level
                        --compare=types
                        --report=list
                        --print=symbols,types,summary
                        test-dwarf-gcc.o test-dwarf-clang.o

Reference: 'test-dwarf-gcc.o'
Target:    'test-dwarf-clang.o'

(1) Missing Types:
-[004]     4     {TypeAlias} 'INTEGER' -> 'int'

(1) Added Types:
+[003]     4     {TypeAlias} 'INTEGER' -> 'int'

----------------------------------------
Element   Expected    Missing      Added
----------------------------------------
Scopes           4          0          0
Symbols          0          0          0
Types            2          1          1
Lines            0          0          0
----------------------------------------
Total            6          1          1

As the Reference and Target are switched, the Added Types from the first case now are listed as Missing Types.

LLVM issues

LLVM debug issues found using llvm-debuginfo-analyzer, in DWARF and CodeView debug formats:

Conclusion

We have created the following incrementally depending patches with a very specific functionality:

  1. Interval tree
  2. Driver and documentation
  3. Logical elements
  4. Locations and ranges
  5. Select elements
  6. Warning and internal options
  7. Compare elements
  8. ELF Reader
  9. CodeView Reader

Getting in touch

Join discord chat.

llvm-debuginfo-analyzer's People

Contributors

lattner avatar topperc avatar rksimon avatar espindola avatar tkremenek avatar rotateright avatar ddunbar avatar douggregor avatar arsenm avatar d0k avatar rui314 avatar zygoloid avatar chandlerc avatar maskray avatar isanbard avatar echristo avatar nico avatar dwblaikie avatar rnk avatar chapuni avatar labath avatar nikic avatar akyrtzi avatar stoklund avatar jdevlieghere avatar eefriedman avatar tobiasgrosser avatar lhames avatar resistor avatar klausler avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.