Comments (13)

mxmlnkn commented on July 30, 2024

Sorry, I don't quite understand the exact circumstances under which it is happening. Maybe you have different versions of indexed_bzip2 installed in Anaconda and in Ubuntu directly? Can you single out during which code line it is happening? Is it happening during the import statement or during the creation of an IndexedBzip2File object or when calling read/seek on that object?

The most helpful thing would be a backtrace of the segfault. It might be possible to simply call your non-MWE example with gdb -ex r -ex bt --args python3 large-project.py. The indexed_bzip2 shared library may have been built without debug symbols; I'm not sure. In that case, you would need to install indexed_bzip2 from source and add the -g option inside the setup.py file right beside the already existing -O3 option.

from indexed_bzip2.

ozancaglayan commented on July 30, 2024

Hi,

Sorry for the rushed comment. More info:

  • In both cases I use pip to install ratarmount which installs the same indexed_bzip2 package from PyPI
  • The issue does not occur on my dev machine, which has Ubuntu 20.04 installed, but it started happening in GitHub runners and Docker builds with 22.04 images. That's why my intuition was that this is due to some libs or toolchains in the OS. This is yet to be verified.
  • It's not happening if I just do import indexed_bzip2. The issue occurs if I run a script using functionality from my codebase, e.g. a rather deep import chain including imports of torch, numpy, scipy, matplotlib, sklearn, ratarmountcore. I can make the issue go away if I remove some of the imports. I'm not even sure it is the imports causing the issue or the bare existence of the indexed_bzip2 so file.

This is the error I'm getting if indexed_bzip2 is installed:

$ script
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Aborted

pip uninstall indexed-bzip2 fixes the issue. The version of the package is: indexed_bzip2-1.4.0-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl

GDB backtrace

It seems to be related to some string processing somewhere in indexed_bzip2.

#2  0x00007fcac0dc77ec in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007fcac0dd2966 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007fcac0dd29d1 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007fcac0dd2c65 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007fcac0dc742a in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x00007fcac0e607b2 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_assign(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#8  0x00007fc9d5eb2c65 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::assign (__str="", this=0x7ffce48d9540)
    at /opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/basic_string.h:1387
#9  std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::operator= (__str="", this=0x7ffce48d9540)
    at /opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/basic_string.h:681
#10 std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_match_token (token=std::__detail::_ScannerBase::_S_token_subexpr_begin, this=0x7ffce48d9430)
    at /opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/regex_compiler.tcc:593
#11 std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_atom (this=this@entry=0x7ffce48d9430) at /opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/regex_compiler.tcc:340
#12 0x00007fc9d5eb30b0 in std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_term (this=0x7ffce48d9430)
    at /opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/regex_compiler.tcc:136
#13 std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_term (this=0x7ffce48d9430) at /opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/regex_compiler.tcc:136
#14 std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative (this=0x7ffce48d9430) at /opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/regex_compiler.tcc:123
#15 0x00007fc9d5eb3389 in std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_disjunction (this=this@entry=0x7ffce48d9430)
    at /opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/regex_compiler.tcc:99
#16 0x00007fc9d5eb3c2e in std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_Compiler (this=0x7ffce48d9430, __b=<optimized out>, __e=<optimized out>, __loc=...,
    __flags=<optimized out>) at /opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/regex_compiler.tcc:84
#17 0x00007fc9d5eb430f in std::__detail::__compile_nfa<std::__cxx11::regex_traits<char>, char const*> (__first=__first@entry=0x7fc9d5ed1320 "(-)?(0x)?([0-9a-zA-Z]+)|((0x)?0)",
    __last=__last@entry=0x7fc9d5ed1340 "", __loc=..., __flags=<optimized out>) at /opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/regex_compiler.h:183
#18 0x00007fc9d5eb44a6 in std::__cxx11::basic_regex<char, std::__cxx11::regex_traits<char> >::basic_regex<char const*> (__f=<optimized out>, __loc=..., __last=0x7fc9d5ed1340 "",
    __first=0x7fc9d5ed1320 "(-)?(0x)?([0-9a-zA-Z]+)|((0x)?0)", this=0x7fc9d5ef61a0 <cxxopts::values::(anonymous namespace)::integer_pattern>)
    at /opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/move.h:104
#19 std::__cxx11::basic_regex<char, std::__cxx11::regex_traits<char> >::basic_regex<char const*> (__f=<optimized out>, __last=0x7fc9d5ed1340 "",
    __first=0x7fc9d5ed1320 "(-)?(0x)?([0-9a-zA-Z]+)|((0x)?0)", this=0x7fc9d5ef61a0 <cxxopts::values::(anonymous namespace)::integer_pattern>)
    at /opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/regex.h:508
#20 std::__cxx11::basic_regex<char, std::__cxx11::regex_traits<char> >::basic_regex (this=0x7fc9d5ef61a0 <cxxopts::values::(anonymous namespace)::integer_pattern>,
    __p=0x7fc9d5ed1320 "(-)?(0x)?([0-9a-zA-Z]+)|((0x)?0)", __f=<optimized out>) at /opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/regex.h:441
#21 0x00007fc9d5e5b8ee in __static_initialization_and_destruction_0 (__initialize_p=1, __priority=65535) at external/cxxopts/include/cxxopts.hpp:475
#22 _GLOBAL__sub_I_indexed_bzip2.cpp(void) () at indexed_bzip2.cpp:14247
#23 0x00007fcadfa28fe2 in call_init (l=<optimized out>, argc=argc@entry=3, argv=argv@entry=0x7ffce48e1e78, env=env@entry=0x556a6c0693f0) at dl-init.c:72
#24 0x00007fcadfa290e9 in call_init (env=0x556a6c0693f0, argv=0x7ffce48e1e78, argc=3, l=<optimized out>) at dl-init.c:30
#25 _dl_init (main_map=0x556a6d602950, argc=3, argv=0x7ffce48e1e78, env=0x556a6c0693f0) at dl-init.c:119
#26 0x00007fcadf646aed in __GI__dl_catch_exception (exception=<optimized out>, operate=<optimized out>, args=<optimized out>) at dl-error-skeleton.c:182
#27 0x00007fcadfa2d364 in dl_open_worker (a=a@entry=0x7ffce48d99b0) at dl-open.c:758
#28 0x00007fcadf646a90 in __GI__dl_catch_exception (exception=0x7ffce48d9990, operate=0x7fcadfa2cca0 <dl_open_worker>, args=0x7ffce48d99b0) at dl-error-skeleton.c:208
#29 0x00007fcadfa2c8fa in _dl_open (file=0x7fc9d5f65bd0 "/usr/local/lib/python3.10/site-packages/indexed_bzip2.cpython-310-x86_64-linux-gnu.so", mode=-2147483390,
    caller_dlopen=0x7fcadf8e5e12, nsid=-2, argc=3, argv=0x7ffce48d9990, env=0x556a6c0693f0) at dl-open.c:837
#30 0x00007fcadf4ea258 in dlopen_doit (a=a@entry=0x7ffce48d9bd0) at dlopen.c:66
#31 0x00007fcadf646a90 in __GI__dl_catch_exception (exception=exception@entry=0x7ffce48d9b70, operate=0x7fcadf4ea200 <dlopen_doit>, args=0x7ffce48d9bd0) at dl-error-skeleton.c:208
#32 0x00007fcadf646b4f in __GI__dl_catch_error (objname=0x556a66d36230, errstring=0x556a66d36238, mallocedp=0x556a66d36228, operate=<optimized out>, args=<optimized out>)
    at dl-error-skeleton.c:227
#33 0x00007fcadf4eaa65 in _dlerror_run (operate=operate@entry=0x7fcadf4ea200 <dlopen_doit>, args=args@entry=0x7ffce48d9bd0) at dlerror.c:170
#34 0x00007fcadf4ea2e4 in __dlopen (file=<optimized out>, mode=<optimized out>) at dlopen.c:87

mxmlnkn commented on July 30, 2024

First off, let me check: are you running out of memory? That would be the naive interpretation of the bad_alloc exception. Normally, however, it happens because the program tried to allocate a vector or string with length -1, which gets converted to an unsigned 64-bit value and therefore requests about 16 EiB (exbibytes).
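
The arithmetic behind that figure can be checked quickly. This is only a sketch of the integer conversion, assuming a 64-bit size_t; the helper name as_unsigned is made up for illustration and is not part of indexed_bzip2.

```python
# Assumption: size_t is 64 bits wide, as on x86_64 Linux.
SIZE_T_BITS = 64
EXBIBYTE = 1 << 60  # 2**60 bytes


def as_unsigned(value, bits=SIZE_T_BITS):
    """Reinterpret a signed integer as an unsigned integer of the given width."""
    return value % (1 << bits)


# A length of -1 becomes the largest representable size.
requested_bytes = as_unsigned(-1)
print(requested_bytes)             # 18446744073709551615
print(requested_bytes / EXBIBYTE)  # just under 16 EiB
```

No allocator can satisfy a request of that size, which is why such a conversion bug surfaces as std::bad_alloc even on a machine with plenty of free memory.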

> I'm not even sure it is the imports causing the issue or the bare existence of the indexed_bzip2 so file.

According to the backtrace, it is the import that is causing the issue. This is also why uninstalling indexed_bzip2 works. ratarmount always tries to import indexed_bzip2 but when the import fails, it simply works without it until of course, you try to open a bzip2 file.
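
The optional-import pattern described here looks roughly like the following. This is a sketch of the general idea, not the actual ratarmountcore code; try_import and open_bzip2 are hypothetical names, and only IndexedBzip2File is taken from this thread.

```python
import importlib


def try_import(name):
    """Return the imported module, or None if it is not available."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# Optional dependency: everything else keeps working without it.
indexed_bzip2 = try_import("indexed_bzip2")


def open_bzip2(path):
    """Hypothetical helper that only fails once bzip2 support is really needed."""
    if indexed_bzip2 is None:
        raise RuntimeError("indexed_bzip2 is not installed, cannot open bzip2 files")
    return indexed_bzip2.IndexedBzip2File(path)
```

Note that a try/except like this only handles Python exceptions. An uncaught C++ exception thrown while the shared library is being loaded calls std::terminate and aborts the whole process, so the fallback never gets a chance to run.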

This is what is happening: indexed_bzip2 has a command line interface so that you can decompress bzip2 files using the ibzip2 binary installed with the Python package. ibzip2 simply calls the cli function on the C++ side. Then, the C++ side uses cxxopts for argument parsing. This is also where the string processing comes from. In order to speed up parsing, cxxopts uses static initialization for the regexes it uses, i.e., those regexes are compiled as soon as the shared library is loaded. The parsing of the regexes itself happens in std::regex, inside the C++ standard library implementation. This seems to be the line:

std::basic_regex<char> integer_pattern
        ("(-)?(0x)?([0-9a-zA-Z]+)|((0x)?0)");

There even is an old issue for that same variable. Unfortunately, it isn't helpful at all because it basically says to update the compiler but the backtrace shows that the GCC 11 regex library is used.

During the regex processing something seems to go awry. I have no idea what. It looks to me like the regex implementation is broken. I guess I could try to understand it thanks to the line information in the backtrace but this seems hard.

As I cannot reproduce it, it is hard for me to test alternatives :/. What I can try and do is to update cxxopts and push a new 1.4.1 release and wait for you to test out the new version. Unfortunately, I cannot update the compiler much because it is inside the manylinux Docker container.

What I could try, even without being able to reproduce the bug, is to modify cxxopts so that it does not initialize those regexes when the shared library is opened. I feel like this would be the clean workaround because it also reduces startup time, but it might still result in the error when the command line interface is used because it does not fix the underlying issue.
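
The idea of deferring the compile until first use can be illustrated in Python. This is an analogy only: the real change would have to be made in the C++ code of cxxopts, and the function names below are invented for the example. The pattern string is the one shown in the backtrace.

```python
import functools
import re

# Pattern string as it appears in the backtrace above.
INTEGER_PATTERN = r"(-)?(0x)?([0-9a-zA-Z]+)|((0x)?0)"


@functools.lru_cache(maxsize=None)
def integer_pattern():
    """Compile the regex on the first call and reuse the cached object afterwards."""
    return re.compile(INTEGER_PATTERN)


def looks_like_integer(text):
    """Hypothetical caller: the regex is only compiled once this is first used."""
    return integer_pattern().fullmatch(text) is not None
```

Importing a module written this way does no regex work at all. The cost, and any failure in the regex engine, moves to the first call that actually needs the pattern.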

ozancaglayan commented on July 30, 2024

Thanks! If I can come up with an import chain that reproduces this, I'll let you know. I'm able to reproduce it on the ubuntu:22.04 Docker image. I don't think this is related to memory: the host has plenty of memory and everything works just fine without the indexed_bzip2 module.

ozancaglayan commented on July 30, 2024

This is really triggered by the import from ratarmountcore, but probably only in combination with other extensions that are installed into site-packages and loaded earlier in the import chain. Can't reproduce it by just importing ratarmount or indexed_bzip2

terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Fatal Python error: Aborted

Current thread 0x00007fb99e6da740 (most recent call first):
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 1176 in create_module
  File "<frozen importlib._bootstrap>", line 571 in module_from_spec
  File "<frozen importlib._bootstrap>", line 674 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1006 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1027 in _find_and_load
  File "/usr/local/lib/python3.10/site-packages/ratarmountcore/compressions.py", line 14 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 883 in exec_module
  File "<frozen importlib._bootstrap>", line 688 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1006 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1027 in _find_and_load
  File "/usr/local/lib/python3.10/site-packages/ratarmountcore/__init__.py", line 45 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 883 in exec_module
  File "<frozen importlib._bootstrap>", line 688 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1006 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1027 in _find_and_load

mxmlnkn commented on July 30, 2024

> Can't reproduce it by just importing ratarmount or indexed_bzip2

Can you try this:

python3 -c 'import indexed_bzip2 as ibz2; ibz2.cli()' --help

ozancaglayan commented on July 30, 2024

it works without an issue : )

mxmlnkn commented on July 30, 2024

> it works without an issue : )

I guess that's good even though I wanted to trigger the bug in a minimal example. It's really very weird that it only happens with a complex example. The only reasons coming to mind are:

  • Some internal state inside libstdc++ for regex parsing that gets changed by other stuff before being used by cxxopts/std::regex
  • Different indexed_bzip2 shared library being used maybe because something changes the PYTHONPATH
  • Paths to system libraries were changed somehow in the complex example before importing indexed_bzip2
  • Different Python version being used

These are mostly hypothetical.
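
Several of these hypotheses can be narrowed down by printing where the module would be loaded from and which interpreter and search paths are in effect. The following is a minimal sketch using only the standard library; describe_module is a made-up helper name.

```python
import importlib.util
import os
import sys


def describe_module(name):
    """Report where a top-level module would be loaded from, without importing it."""
    spec = importlib.util.find_spec(name)
    return {
        "module": name,
        "origin": spec.origin if spec is not None else None,
        "python": sys.version.split()[0],
        "executable": sys.executable,
        "PYTHONPATH": os.environ.get("PYTHONPATH"),
        "LD_LIBRARY_PATH": os.environ.get("LD_LIBRARY_PATH"),
    }


if __name__ == "__main__":
    for key, value in describe_module("indexed_bzip2").items():
        print(f"{key}: {value}")
```

Running it once in the minimal setup and once at the point in the large project just before the failing import, then comparing the two outputs, would show whether a different shared library, Python version, or library path is involved.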

ozancaglayan commented on July 30, 2024

By the way, gdb shows me lots of threads, and I wonder whether this could be related to some non-thread-safe code somewhere. I'm really obsessed with this. If I can reduce it to an MWE, I'll definitely share it.

mxmlnkn commented on July 30, 2024

That could be a reason. But who is starting those threads? A simple import shouldn't start threads inside indexed_bzip2. One other thought that occurred to me: Could it be that indexed_bzip2 is imported twice somehow? Maybe from different modules or different threads? Normally, it shouldn't be an issue if it is imported twice but still.
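
Both questions can be probed from Python right before the failing import. This is a rough sketch with invented helper names. threading.enumerate() only lists threads the interpreter knows about, so the native thread count is read from /proc/self/task, which is assumed to exist only on Linux.

```python
import os
import sys
import threading


def thread_report():
    """Return (names of Python-level threads, native thread count or None)."""
    python_threads = [thread.name for thread in threading.enumerate()]
    try:
        # Linux only: one directory entry per native thread of this process.
        native_count = len(os.listdir("/proc/self/task"))
    except OSError:
        native_count = None
    return python_threads, native_count


def already_imported(name):
    """True if the module is already cached in sys.modules."""
    return name in sys.modules
```

A native count far above the number of Python threads would suggest that an extension loaded earlier has started its own thread pool. Within one interpreter, a second import statement for a module that is already in sys.modules returns the cached module instead of loading the shared library again.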

mxmlnkn commented on July 30, 2024

I pushed a fix that tries to delay the std::regex initialization until the first use. This commit automatically closed the issue but as I can't test it out, feel free to reopen if it doesn't fix your issue. I definitely would like to hear back if it fixes your issue.

While looking around for other workarounds, I was thinking about statically linking libstdc++. While looking for possible downsides, I found some statements that this is generally a bad idea when your code is used via dlopen, because it might lead to some functions using the statically linked version and other parts using the shared version.

Could it be that in your complex example one other dependency is statically linked against libstdc++? I guess in order to test that, one would have to collect all imports somehow and then a simple test Python script would call all those imports and then indexed_bzip2 at the end. If it doesn't trigger the problem, then I guess the import order and amount of imported stuff before importing indexed_bzip2 might also have to be tested for. This could be done in a kind of bash loop. Then again, I'm completely spitballing ideas at this point.
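
Such a loop could also be written in Python instead of bash. The sketch below runs each import chain in a fresh interpreter so that a crash does not take down the test driver. The candidate list is taken from the import chain mentioned earlier in this thread; the function names are made up, and the detection relies on POSIX behaviour, where a process killed by a signal is reported with a negative return code.

```python
import subprocess
import sys

# Module names taken from the import chain mentioned in this thread.
CANDIDATES = ["torch", "numpy", "scipy", "matplotlib", "sklearn"]
TARGET = "indexed_bzip2"


def import_code(modules):
    """Build a one-line script that imports the modules in the given order."""
    return "; ".join(f"import {name}" for name in modules)


def died_from_signal(code):
    """Run code in a fresh interpreter; True if the process was killed by a signal.

    On POSIX, death by signal N is reported as return code -N, so an abort
    (SIGABRT) is negative. A plain ImportError exits with a positive code
    and is not counted as a crash.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode < 0


def find_crashing_chains(candidates=CANDIDATES, target=TARGET):
    """Try each candidate alone, then each growing prefix, followed by the target."""
    chains = [[name] for name in candidates]
    chains += [candidates[:n] for n in range(2, len(candidates) + 1)]
    return [chain for chain in chains if died_from_signal(import_code(chain + [target]))]
```

Calling find_crashing_chains() in the affected environment returns the import chains after which importing indexed_bzip2 aborts. A single-module chain in the result would point at one specific dependency, while a result containing only longer prefixes would hint that order or quantity matters.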

ozancaglayan commented on July 30, 2024

Okay, 1.5.0 seems to fix this, thank you!

mxmlnkn commented on July 30, 2024

Thanks for the feedback!
