bcdb-private's People

Contributors

andrewf29, dependabot[bot], theo25, yotann

bcdb-private's Issues

a few residual "Attributes 'byval' and 'inalloca' do not support unsized types!"

This is not serious, since it only happens a few times across a very large number of bitcode files, but I have run into some more instances of it:

regehr@john-home:~/tmp$ ~/bcdb/build/bin/bc-split -o foo foo.bc 
Attributes 'byval' and 'inalloca' do not support unsized types!
  %59 = call { i8*, i32 } %57(%0* %0, %1* nonnull %9, i32 %2, i8* %3, i32 %4, %5* byval nonnull align 8 %58, i64 %6) #4
Attributes 'byval' and 'inalloca' do not support unsized types!
  %85 = call { i8*, i32 } %84(%0* %0, %1* nonnull %10, i32 %2, i8* %3, i32 %4, %4* byval nonnull align 8 %12, i64 %6) #4
Attributes 'byval' and 'inalloca' do not support unsized types!
  %115 = call { i8*, i32 } %114(%0* %0, %1* nonnull %13, i32 5, i8* %96, i32 %22, %4* byval nonnull align 8 %15, i64 %6) #4
Attributes 'byval' and 'inalloca' do not support unsized types!
  %142 = call { i8*, i32 } %141(%0* %0, %1* nonnull %16, i32 %2, i8* %3, i32 %4, %4* byval nonnull align 8 %18, i64 %6) #4
bc-split: could not verify module part

foo.bc.gz

Rename struct types based on their contents

LLVM automatically combines equivalent structs into one. For example, all type { i8 } structs may be combined into %"class.cmsys::SystemToolsManager" = type { i8 }. There may also be multiple different structs with the same name, such as class.std::__2::vector and class.std::__2::vector.57.

These issues mean that slight changes to a program's source code may cause lots of its struct names to change, preventing deduplication. When splitting functions into their own modules, we should either remove struct names or give them names based on the hashes of their contents. That way, if two functions used the same struct in the original source code, they should use the same struct name after splitting.

Note: LLVM does not seem to combine multiple opaque structs into one.
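
As a rough illustration of the content-based naming proposed above, here is a minimal Python sketch that works on textual IR rather than through the LLVM API. It is not BCDB's implementation. The helper names (content_name, rename_structs) and the %struct.h<hash> naming scheme are made up for this example, and it assumes type names look like %struct.x, %class.x, %union.x or a quoted name and do not collide with value names. It hashes only the literal body text, so structs that refer to other named structs, or to themselves, would need extra handling.

```python
import hashlib
import re

# Assumption: type names look like %struct.x, %class.x, %union.x or %"quoted".
NAME = r'%(?:"[^"]+"|(?:struct|class|union)\.[-\w.$]+)'
# A non-opaque struct definition on one line, e.g. %struct.foo.1 = type { i32 }
TYPE_DEF = re.compile("^(" + NAME + r") = type (\{.*\})$", re.MULTILINE)


def content_name(body):
    """Return a struct name that depends only on the struct's body text."""
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()[:16]
    return "%struct.h" + digest


def rename_structs(ir):
    """Rename every defined struct in textual IR after a hash of its body."""
    mapping = {name: content_name(body) for name, body in TYPE_DEF.findall(ir)}
    if not mapping:
        return ir
    names = "|".join(re.escape(name) for name in mapping)
    # The lookahead keeps %struct.foo from matching inside %struct.foo.1.
    ref = re.compile("(?:" + names + r")(?![-\w.$])")
    renamed = ref.sub(lambda m: mapping[m.group(0)], ir)
    # Structs with identical bodies now share a name; keep one definition.
    seen, out = set(), []
    for line in renamed.splitlines():
        if TYPE_DEF.match(line):
            if line in seen:
                continue
            seen.add(line)
        out.append(line)
    return "\n".join(out)
```

With this scheme, a function that uses %struct.foo.1 in one build and %struct.foo.7 in another ends up with identical text, provided the struct body is the same in both.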

Can't build on Ubuntu 20.04

Tried with the following packages:

cmake
g++
llvm-11-dev
pkg-config
libsodium-dev
libsqlite3-dev
librocksdb-dev
zlib1g-dev
libncurses-dev

Ran into the following problems:

  • librocksdb-dev doesn't include cmake files
  • CMakeLists.txt doesn't check whether clang and clang++ can be run
  • Clang and LLVM executables have versioned names such as not-11 and clang++-11 (the unversioned names are only installed in /usr/lib/llvm-11/bin/).
  • SQLite is too old to support sqlite3_txn_state()
  • Python executable may be named python3
  • The official LLVM build for Ubuntu 20.04 is built with exceptions and RTTI disabled, which I can't support. I'm instantiating llvm::cl::opt with RTTI enabled, and the generated RTTI is referring to RTTI for LLVM classes, which is missing.

likely non-conformant C++

BCDB builds fine using g++, but using clang++ I get the following errors:

regehr@john-home:~/bcdb/build$ make VERBOSE=1
/usr/bin/cmake -H/home/regehr/bcdb -B/home/regehr/bcdb/build --check-build-system CMakeFiles/Makefile.cmake 0
/usr/bin/cmake -E cmake_progress_start /home/regehr/bcdb/build/CMakeFiles /home/regehr/bcdb/build/CMakeFiles/progress.marks
make -f CMakeFiles/Makefile2 all
make[1]: Entering directory '/home/regehr/bcdb/build'
make -f lib/Split/CMakeFiles/BCDBSplit.dir/build.make lib/Split/CMakeFiles/BCDBSplit.dir/depend
make[2]: Entering directory '/home/regehr/bcdb/build'
cd /home/regehr/bcdb/build && /usr/bin/cmake -E cmake_depends "Unix Makefiles" /home/regehr/bcdb /home/regehr/bcdb/lib/Split /home/regehr/bcdb/build /home/regehr/bcdb/build/lib/Split /home/regehr/bcdb/build/lib/Split/CMakeFiles/BCDBSplit.dir/DependInfo.cmake --color=
make[2]: Leaving directory '/home/regehr/bcdb/build'
make -f lib/Split/CMakeFiles/BCDBSplit.dir/build.make lib/Split/CMakeFiles/BCDBSplit.dir/build
make[2]: Entering directory '/home/regehr/bcdb/build'
[  5%] Building CXX object lib/Split/CMakeFiles/BCDBSplit.dir/Join.cpp.o
cd /home/regehr/bcdb/build/lib/Split && /home/regehr/souper-regehr/third_party/llvm/Release/bin/clang++  -DGTEST_HAS_RTTI=0 -D_DEBUG -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/regehr/souper-regehr/third_party/llvm/Release/include -I/home/regehr/bcdb/include  -fPIC -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -std=c++11 -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -Wcovered-switch-default -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wstring-conversion -ffunction-sections -fdata-sections   -Wall -UNDEBUG  -fno-exceptions -fno-rtti -std=c++14 -o CMakeFiles/BCDBSplit.dir/Join.cpp.o -c /home/regehr/bcdb/lib/Split/Join.cpp
In file included from /home/regehr/bcdb/lib/Split/Join.cpp:1:
/home/regehr/bcdb/include/bcdb/Split.h:14:7: warning: 'bcdb::SplitLoader' has virtual functions but non-virtual destructor
      [-Wnon-virtual-dtor]
class SplitLoader {
      ^
/home/regehr/bcdb/include/bcdb/Split.h:21:7: warning: 'bcdb::SplitSaver' has virtual functions but non-virtual destructor
      [-Wnon-virtual-dtor]
class SplitSaver {
      ^
/home/regehr/bcdb/lib/Split/Join.cpp:85:14: error: call to deleted constructor of 'llvm::Error'
      return Err;
             ^~~
/home/regehr/souper-regehr/third_party/llvm/Release/include/llvm/Support/Error.h:182:3: note: 'Error' has been explicitly marked
      deleted here
  Error(const Error &Other) = delete;
  ^
/home/regehr/souper-regehr/third_party/llvm/Release/include/llvm/Support/Error.h:449:18: note: passing argument to parameter
      'Err' here
  Expected(Error Err)
                 ^
2 warnings and 1 error generated.
lib/Split/CMakeFiles/BCDBSplit.dir/build.make:62: recipe for target 'lib/Split/CMakeFiles/BCDBSplit.dir/Join.cpp.o' failed
make[2]: *** [lib/Split/CMakeFiles/BCDBSplit.dir/Join.cpp.o] Error 1
make[2]: Leaving directory '/home/regehr/bcdb/build'
CMakeFiles/Makefile2:543: recipe for target 'lib/Split/CMakeFiles/BCDBSplit.dir/all' failed
make[1]: *** [lib/Split/CMakeFiles/BCDBSplit.dir/all] Error 2
make[1]: Leaving directory '/home/regehr/bcdb/build'
Makefile:129: recipe for target 'all' failed
make: *** [all] Error 2
regehr@john-home:~/bcdb/build$ 

Join gets struct types mixed up in certain unimportant cases

If the original module has duplicate struct names like %struct.foo and %struct.foo.1, and there's a function that uses %struct.foo.1 but doesn't depend on its actual contents, the joined module may incorrectly have that function use %struct.foo instead.

See LLVM's IRLinker::computeTypeMapping for the code that decides %struct.foo and %struct.foo.1 must actually be the same type. This only happens if the types are isomorphic, or if one of them has been replaced with an opaque type, so there's no actual functional change.

bc-split crash: Assertion `(Flags & RF_IgnoreMissingLocals) && "Referenced value not in value map!"' failed.

Attached bitcode (processed using "opt -strip-debug -strip-named-metadata -metarenamer") triggers a crash. BCDB is built against LLVM 7 in Ubuntu 18.04.

foo.bc.gz

$ ~/bcdb/build/bin/bc-split foo.bc -o bar
bc-split: ../lib/Transforms/Utils/ValueMapper.cpp:879: void (anonymous namespace)::Mapper::remapInstruction(llvm::Instruction *): Assertion `(Flags & RF_IgnoreMissingLocals) && "Referenced value not in value map!"' failed.
Stack dump:
0.	Program arguments: /home/regehr/bcdb/build/bin/bc-split foo.bc -o bar 
#0 0x000056183b051a34 PrintStackTraceSignalHandler(void*) (/home/regehr/bcdb/build/bin/bc-split+0x99a34)
#1 0x000056183b04f7be llvm::sys::RunSignalHandlers() (/home/regehr/bcdb/build/bin/bc-split+0x977be)
#2 0x000056183b051bf2 SignalHandler(int) (/home/regehr/bcdb/build/bin/bc-split+0x99bf2)
#3 0x00007fa13b1c4890 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x12890)
#4 0x00007fa13a079e97 gsignal /build/glibc-OTsEL5/glibc-2.27/signal/../sysdeps/unix/sysv/linux/raise.c:51:0
#5 0x00007fa13a07b801 abort /build/glibc-OTsEL5/glibc-2.27/stdlib/abort.c:81:0
#6 0x00007fa13a06b39a __assert_fail_base /build/glibc-OTsEL5/glibc-2.27/assert/assert.c:89:0
#7 0x00007fa13a06b412 (/lib/x86_64-linux-gnu/libc.so.6+0x30412)
#8 0x000056183b128c1e (anonymous namespace)::Mapper::remapInstruction(llvm::Instruction*) (/home/regehr/bcdb/build/bin/bc-split+0x170c1e)
#9 0x000056183b128f93 (anonymous namespace)::Mapper::remapFunction(llvm::Function&) (/home/regehr/bcdb/build/bin/bc-split+0x170f93)
#10 0x000056183b128d92 llvm::ValueMapper::remapFunction(llvm::Function&) (/home/regehr/bcdb/build/bin/bc-split+0x170d92)
#11 0x000056183b0950ef llvm::RemapFunction(llvm::Function&, llvm::ValueMap<llvm::Value const*, llvm::WeakTrackingVH, llvm::ValueMapConfig<llvm::Value const*, llvm::sys::SmartMutex<false> > >&, llvm::RemapFlags, llvm::ValueMapTypeRemapper*, llvm::ValueMaterializer*) /home/regehr/souper-regehr/third_party/llvm/Release/include/llvm/Transforms/Utils/ValueMapper.h:268:0
#12 0x000056183b092715 ExtractFunction(llvm::Module&, llvm::Function&, (anonymous namespace)::NeededTypeMap&) /home/regehr/bcdb/lib/Split/Split.cpp:386:0
#13 0x000056183b0929e7 bcdb::SplitModule(std::unique_ptr<llvm::Module, std::default_delete<llvm::Module> >, bcdb::SplitSaver&) /home/regehr/bcdb/lib/Split/Split.cpp:406:0
#14 0x000056183b00bb8f main /home/regehr/bcdb/tools/bc-split/bc-split.cpp:103:0
#15 0x00007fa13a05cb97 __libc_start_main /build/glibc-OTsEL5/glibc-2.27/csu/../csu/libc-start.c:344:0
#16 0x000056183b00b06a _start (/home/regehr/bcdb/build/bin/bc-split+0x5306a)
Aborted

bc-split internal linkage issue?

When I run "opt -O1" on a file produced by bc-split, the resulting module is empty. Can we (perhaps optionally) have bc-split mark its functions as having external linkage so this doesn't happen?

Rename anonymous strings like .str.491

Slight changes to source files can cause the string numbering to be different, which prevents deduplication of identical functions from working. If we rename strings like ".str.123" to use names based on a hash of the contents, this won't be a problem.
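
The idea can be sketched on textual IR with a few lines of Python. This is only an illustration, not how BCDB does it. The function name rename_strings and the @.str.h<hash> naming scheme are invented for the example, and it assumes each string global is defined on a single line of the form "@.str.N = ...". The hash covers the whole right-hand side of the definition (linkage, type, contents, alignment).

```python
import hashlib
import re

# A definition such as: @.str.491 = private unnamed_addr constant [3 x i8] c"hi\00", align 1
STR_DEF = re.compile(r'^(@\.str(?:\.\d+)?) = (.+)$', re.MULTILINE)
# Any use of @.str or @.str.N; the lookahead stops @.str.1 matching inside @.str.12.
STR_REF = re.compile(r'@\.str(?:\.\d+)?(?![-\w.$])')


def rename_strings(ir):
    """Rename numbered string globals after a hash of their definitions."""
    mapping = {}
    for name, init in STR_DEF.findall(ir):
        digest = hashlib.sha256(init.encode("utf-8")).hexdigest()[:16]
        mapping[name] = "@.str.h" + digest
    # References to strings defined elsewhere are left untouched.
    return STR_REF.sub(lambda m: mapping.get(m.group(0), m.group(0)), ir)
```

Two modules that differ only in the string's number then produce the same output. Two strings with identical definitions in one module would receive the same name, so a real implementation would also have to merge them.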

Replace types with opaque when possible

When splitting certain C++ bitcode files (like libQtWebKit), many of the resulting modules have more than 100 times as many struct definitions as actual lines of code. Very few of these struct definitions are actually needed by the module; they're pulled in because of other structs that point to them. This is a bottleneck for both splitting and joining. We should fix this by detecting which struct types are actually used by the module, and replacing all the others with opaque types.

It may be useful to examine llvm::ValueEnumerator, which knows how to list all types used by a function.
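
One way to picture the analysis is as a reachability problem: start from the structs whose layout the function actually depends on, follow only by-value fields, and treat everything else as a candidate for an opaque declaration. The Python sketch below illustrates that under simplifying assumptions. The input format (a dict from struct name to a list of field type strings) and the function name split_needed are hypothetical, and it is deliberately conservative: only a field whose type ends in "*" is treated as a pointer.

```python
import re

# Matches named type references such as %struct.foo inside a field type string.
TYPE_REF = re.compile(r'%[-\w.$]+')


def split_needed(defs, used_by_value):
    """Partition struct names into (needed, opaque).

    defs: dict mapping struct name -> list of field type strings.
    used_by_value: names whose layout the function depends on.
    """
    needed = set()
    stack = list(used_by_value)
    while stack:
        name = stack.pop()
        if name in needed or name not in defs:
            continue
        needed.add(name)
        for field in defs[name]:
            # A pointer field does not require the pointee's layout.
            if field.strip().endswith("*"):
                continue
            # By-value fields (including array elements) pull in their types.
            stack.extend(TYPE_REF.findall(field))
    opaque = set(defs) - needed
    return needed, opaque
```

For example, if %A holds a %B by value and a %C* pointer, and %C holds a %D, then a function that only touches %A needs the definitions of %A and %B, while %C and %D can become opaque.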

bc-split crash

I'm getting this crash when I try to split a bitcode file.

regehr@john-home:~/tmp$ ~/bcdb/build/bin/bc-split ~/bitcode/mvqFvS0_g7.bc -o foo
bc-split: /home/regehr/souper-regehr/third_party/llvm/Release/include/llvm/Support/Casting.h:106: static bool llvm::isa_impl_cl<To, const From*>::doit(const From*) [with To = llvm::ValueAsMetadata; From = llvm::Metadata]: Assertion `Val && "isa<> used on a null pointer"' failed.
Stack dump:
0.	Program arguments: /home/regehr/bcdb/build/bin/bc-split /home/regehr/bitcode/mvqFvS0_g7.bc -o foo 
#0 0x0000559f4b1bea34 PrintStackTraceSignalHandler(void*) (/home/regehr/bcdb/build/bin/bc-split+0x99a34)
#1 0x0000559f4b1bc7be llvm::sys::RunSignalHandlers() (/home/regehr/bcdb/build/bin/bc-split+0x977be)
#2 0x0000559f4b1bebf2 SignalHandler(int) (/home/regehr/bcdb/build/bin/bc-split+0x99bf2)
#3 0x00007fe444d6e890 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x12890)
#4 0x00007fe443c23e97 gsignal (/lib/x86_64-linux-gnu/libc.so.6+0x3ee97)
#5 0x00007fe443c25801 abort (/lib/x86_64-linux-gnu/libc.so.6+0x40801)
#6 0x00007fe443c1539a (/lib/x86_64-linux-gnu/libc.so.6+0x3039a)
#7 0x00007fe443c15412 (/lib/x86_64-linux-gnu/libc.so.6+0x30412)
#8 0x0000559f4b208090 llvm::isa_impl_cl<llvm::ValueAsMetadata, llvm::Metadata const*>::doit(llvm::Metadata const*) /home/regehr/souper-regehr/third_party/llvm/Release/include/llvm/Support/Casting.h:107:0
#9 0x0000559f4b207571 llvm::isa_impl_wrap<llvm::ValueAsMetadata, llvm::Metadata const*, llvm::Metadata const*>::doit(llvm::Metadata const* const&) /home/regehr/souper-regehr/third_party/llvm/Release/include/llvm/Support/Casting.h:134:0
#10 0x0000559f4b2062a1 llvm::isa_impl_wrap<llvm::ValueAsMetadata, llvm::Metadata const* const, llvm::Metadata const*>::doit(llvm::Metadata const* const&) /home/regehr/souper-regehr/third_party/llvm/Release/include/llvm/Support/Casting.h:126:0
#11 0x0000559f4b204d83 bool llvm::isa<llvm::ValueAsMetadata, llvm::Metadata const*>(llvm::Metadata const* const&) /home/regehr/souper-regehr/third_party/llvm/Release/include/llvm/Support/Casting.h:145:0
#12 0x0000559f4b2034a6 llvm::cast_retty<llvm::ValueAsMetadata, llvm::Metadata const*>::ret_type llvm::dyn_cast<llvm::ValueAsMetadata, llvm::Metadata const>(llvm::Metadata const*) /home/regehr/souper-regehr/third_party/llvm/Release/include/llvm/Support/Casting.h:334:0
#13 0x0000559f4b1fe6a3 (anonymous namespace)::NeededTypeMap::VisitMetadata(llvm::Metadata const*) /home/regehr/bcdb/lib/Split/Split.cpp:223:0
#14 0x0000559f4b1fe754 (anonymous namespace)::NeededTypeMap::VisitMetadata(llvm::Metadata const*) /home/regehr/bcdb/lib/Split/Split.cpp:226:0
#15 0x0000559f4b1fe754 (anonymous namespace)::NeededTypeMap::VisitMetadata(llvm::Metadata const*) /home/regehr/bcdb/lib/Split/Split.cpp:226:0
#16 0x0000559f4b1feba3 (anonymous namespace)::NeededTypeMap::VisitFunction(llvm::Function&) /home/regehr/bcdb/lib/Split/Split.cpp:258:0
#17 0x0000559f4b1ff452 ExtractFunction(llvm::Module&, llvm::Function&, (anonymous namespace)::NeededTypeMap&) /home/regehr/bcdb/lib/Split/Split.cpp:344:0
#18 0x0000559f4b1ff9e7 bcdb::SplitModule(std::unique_ptr<llvm::Module, std::default_delete<llvm::Module> >, bcdb::SplitSaver&) /home/regehr/bcdb/lib/Split/Split.cpp:406:0
#19 0x0000559f4b178b8f main /home/regehr/bcdb/tools/bc-split/bc-split.cpp:103:0
#20 0x00007fe443c06b97 __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b97)
#21 0x0000559f4b17806a _start (/home/regehr/bcdb/build/bin/bc-split+0x5306a)
Aborted
regehr@john-home:~/tmp$ 
