Coder Social home page Coder Social logo

Comments (16)

BenWiederhake avatar BenWiederhake commented on July 29, 2024

tl;dr: What was the actual output?

"segfault", or more correctly "segmentation violation … fault" means that a program tried to access memory it isn't allowed to access. None of your previous errors were segfaults. It is confusing if you use such a specific term if it doesn't apply. It's like saying "Here, please hold my UMF silver-sterling coated luxurious spoon" while handing you a chair. I say this because it might improve your communication skills.

Signal: 6 (ABRT)

So this isn't a segfault either, because then it would have said:

Signal: 11 (SEGV)

It's an abort, so I guess it's a failing assert. When this happens, a failing program usually prints out (to stderr) some interesting information about where it failed and the condition that failed. Do you have this output, or can you reproduce it? I'm asking because I want to see the actual line and condition that failed. My copy of mpih-mul3.c doesn't contain any asserts, because we probably have slightly different versions.

I'll start writing a patch for TGLC_bn_is_prime (the "switch" layer that lets tgl switch between openssl and gcrypt at compile-time) that outputs all kind of information, in the hope that you can reproduce it more easily.

To your edit: thanks for the additional info! In this case, from the occurrence of tglmp_check_DH_params it's already clear where and when this error happens.

from tgl.

herndlm avatar herndlm commented on July 29, 2024

Thanks for pointing that out to me! Don't know why I had the segfault thing in my head and wasn't considering that it could be something else although it was stated right there as SIGABRT.. I'm really appreciating your help!

I looked through the spectrum2 logs again which had nothing in them except a failing PING to their backend process which was not running any more. I then tried to attach myself with strace to another backend process before it exited and ended up with just "+++ killed by SIGABRT (core dumped) +++" and also directly looking at the process descriptors in the file system revealed nothing, so I guess there is no output at all.

If we are talking about libcrypt than it's the one I have installed on my system, right? And independent from tgl? I looked into the source code (compiled it just earlier with debug flags) and also saw no asserts in "mpih-mul3.c". Right now I'm using the latest stable vanilla version 1.7.3

from tgl.

BenWiederhake avatar BenWiederhake commented on July 29, 2024

I really hope the spectrum-guys finally implement something like useful logging. Until then, please try to find some hack to get the stderr output of spectrum2_libpurple_backend, something like (as a superuser):

  • Hide the real executable (mv /usr/bin/spectrum2_libpurple_backend /usr/bin/spectrum2_libpurple_backend.real)

  • Write a new one that calls the old one like this:

    #!/bin/sh
    
    # None of the arguments contain spaces, which makes this a lot easier
    /usr/bin/spectrum2_libpurple_backend.real $@ > /tmp/spectrumbackend_stdout.log 2> /tmp/spectrumbackend_stderr.log
    

    (Don't forget to chmod +x /usr/bin/spectrum2_libpurple_backend afterwards!)

With this, the actual message of the abort should appear in the corresponding files in /tmp.

Why so complicated? Well, that's the easiest quick-and-dirty solution I can think of, given that spectrum ignores any and all logs. I'm only interested in the actual abort message (and, later on, the debug output of the path I'm writing), so go with whatever works.

from tgl.

herndlm avatar herndlm commented on July 29, 2024

I see, did something like that when my mail sending via a java exchange gateway did not work reliable at work.
Unfortunately I got nothing here:

# tail -f /tmp/spectrumbackend_std*
==> /tmp/spectrumbackend_stderr.log <==

==> /tmp/spectrumbackend_stdout.log <==

==> /tmp/spectrumbackend_stderr.log <==
/usr/bin/spectrum2_libpurple_backend: line 4:  8312 Aborted                 (core dumped) /usr/bin/spectrum2_libpurple_backend.real $@ > /tmp/spectrumbackend_stdout.log 2> /tmp/spectrumbackend_stderr.log

from tgl.

BenWiederhake avatar BenWiederhake commented on July 29, 2024

This is awful. And unnecessarily complicated.

Step 0: Do the previous hack

Outputting to stderr is a bit easier.
There's nothing to do for you, I just want to keep it comprehensive for the reader.

Step 1: Inject the logging code into tglc

  • You could either apply BenWiederhake@ab5ac54 directly
  • You could visit that page and make the modifications on your own
  • Or go the "official" way:
telegram-purple/tgl/ $ git remote add ben https://github.com/BenWiederhake/tgl.git
telegram-purple/tgl/ $ git fetch ben
telegram-purple/tgl/ $ git checkout ben/tmp/log-isprime

Step 2: Recreate

You have to build tgl explicitly, because this kind of fine-grained change isn't always detected properly by telegram-purple's Makefile.

telegram-purple/tgl/ $ make
telegram-purple/tgl/ $ cd ..
telegram-purple/ $ make

Step 3: Try again

This time, your /tmp/spectrumbackend_stderr.log should end "abruptly". A normal call would output something like this:

TGLC_bn_is_prime: About to call gcry_prime_check on a=0x1e77020, bytes=3
Argument is: 12D6FF
TGLC_bn_is_prime: Done.

Step 4: What.

  • If it really is a crash/bug in gcrypt (which I strongly doubt), then the last line will be missing, and we can easily reproduce this bug in a simple ten-line program.
  • If this is a bug in tglc/tgl/tgp (e.g., bad call that violates some precondition), then the Argument is line will be missing, and we can probably see what's wrong by the previous line.
  • If the abort actually happens somewhere else entirely (this is what I expect), then the output may be cut off at any point (but probably in a "neat" place, i.e., after a Done line.)

from tgl.

herndlm avatar herndlm commented on July 29, 2024

Well, thanks again! Did the new checkouts, re-compiled and re-started my tests. Got the following this time:

tail -f /tmp/spectrumbackend_stderr.log 

tail: /tmp/spectrumbackend_stderr.log: file truncated
TGLC_bn_is_prime: About to call gcry_prime_check on a=0x1d62d8
c71caeb9c6b1c9048e6c522f70f13f73980d40238e3e21c14934d037563d930f48198a0aa7c14058229493d22530f4dbfa336f6e0ac925139543aed44cce7c3720fd51f69458705ac68cd4fe6b6b13abdc9746512969328454f18faf8c595f642477fe96bb2a941d5bcd1d4ac8cc49880708fa9b378e3c4f3a9060bee67cf9a4a4a695811051907e162753b56b0f6b410dba74d8a84b2a14b3144e0ef1284754fd17ed950d5965b4b9dd46582db1178d169c6bc465b0d6ff9ca3928fef5b9ae4e418fc15e83ebea0f87fa9ff5eed70050ded2849f47bf959d956850ce929851f0d8115f635b105ee2e4e15d04b2454bf6f4fadf034b10403119cd8e3b92fcc5b
Done dumping.
tail: /tmp/spectrumbackend_stderr.log: file truncated
TGLC_bn_is_prime: About to call gcry_prime_check on a=0x1ff5c70
c71caeb9c6b1c9048e6c522f70f13f73980d40238e3e21c14934d037563d930f48198a0aa7c14058229493d22530f4dbfa336f6e0ac925139543aed44cce7c3720fd51f69458705ac68cd4fe6b6b13abdc9746512969328454f18faf8c595f642477fe96bb2a941d5bcd1d4ac8cc49880708fa9b378e3c4f3a9060bee67cf9a4a4a695811051907e162753b56b0f6b410dba74d8a84b2a14b3144e0ef1284754fd17ed950d5965b4b9dd46582db1178d169c6bc465b0d6ff9ca3928fef5b9ae4e418fc15e83ebea0f87fa9ff5eed70050ded2849f47bf959d956850ce929851f0d8115f635b105ee2e4e15d04b2454bf6f4fadf034b10403119cd8e3b92fcc5b
Done dumping.

You can see when the spectrum2_manager was starting a new backend process when the file was truncated. The first one was me manually, afterwards I registered my number with the transport in gajim. There was output immediately which ended with "Done dumping." The backend process was still running but using 100% CPU for a few seconds and then it ended (with no new output) and a new backend process started.
Is this of any help?

from tgl.

BenWiederhake avatar BenWiederhake commented on July 29, 2024

using 100% CPU for a few seconds

Sounds like the "factorization challenge", which is pretty much the only operation to take long.

Is it possible that it crashed differently this time?

then it ended

Did it abort or not? Why is there no line like /usr/bin/spectrum2_libpurple_backend: line 4: 8312 Aborted in blah_stderr.log this time?

So there's only one thing to be learned: something aborts, and the stacktrace from the first comment may be useless.

One of your outputs said "core dumped". Can you find the actual coredump file? (I'd like to avoid the coredump file from your initial comment.) Can you extract the stacktraces for all threads from it?

from tgl.

herndlm avatar herndlm commented on July 29, 2024

Is it possible that it crashed differently this time?

I don't think so, but I'm honestly not quite sure why I did not get that "Aborted" in my cat.. But I saw it with journalctl:

Oct 23 19:45:24 raspi2 systemd[1]: Started Process Core Dump (PID 651/UID 0).
-- Subject: Unit [email protected] has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit [email protected] has finished starting up.
-- 
-- The start-up result is done.
Oct 23 19:45:24 raspi2 systemd-coredump[652]: Removed old coredump core.php-fpm.33.07dddbf9f70a41e1a5aeb5caac926b37.1725.1477141407000000000000.lz4.
Oct 23 19:45:24 raspi2 systemd-coredump[652]: Removed old coredump core.php-fpm.33.07dddbf9f70a41e1a5aeb5caac926b37.1733.1477141408000000000000.lz4.
Oct 23 19:45:24 raspi2 prosody[13417]: c2s5c1568: Received[c2s]: <a xmlns='urn:xmpp:sm:2' h='415'>
Oct 23 19:45:24 raspi2 prosody[13417]: c2s5c1568: #queue = 1
/usr/bin/spectrum2_libpurple_backend: line 4:   612 Aborted                 (core dumped) /usr/bin/spectrum2_libpurple_backend.real $@ > /tmp/spectrumbackend_stdout.log 2> /tmp/spectrumbackend_stderr.log
Oct 23 19:45:29 raspi2 prosody[13417]: c2s570368: Received[c2s]: <a xmlns='urn:xmpp:sm:3' h='396'>
Oct 23 19:45:29 raspi2 prosody[13417]: c2s570368: #queue = 1
Oct 23 19:45:31 raspi2 systemd-coredump[652]: Process 612 (spectrum2_libpu) of user 0 dumped core.

                                              Stack trace of thread 612:
                                              #0  0x00000000755b58bc _gcry_mpih_submul_1 (libgcrypt.so.20)
                                              #1  0x00000000755b0430 _gcry_mpih_divrem (libgcrypt.so.20)
                                              #2  0x00000000755ad6a4 mul_mod (libgcrypt.so.20)
                                              #3  0x00000000755adec0 _gcry_mpi_powm (libgcrypt.so.20)
                                              #4  0x000000007555b034 is_prime (libgcrypt.so.20)
                                              #5  0x000000007555b7b4 check_prime (libgcrypt.so.20)
                                              #6  0x000000007555d0cc _gcry_prime_check (libgcrypt.so.20)
                                              #7  0x0000000075542048 gcry_prime_check (libgcrypt.so.20)
                                              #8  0x000000007016fe3c TGLC_bn_is_prime (telegram-purple.so)
                                              #9  0x000000007016f828 check_prime (telegram-purple.so)
                                              #10 0x00000000701449bc process_dh_answer (telegram-purple.so)
                                              #11 0x0000000070148128 rpc_execute (telegram-purple.so)
                                              #12 0x0000000070135828 try_rpc_read (telegram-purple.so)
                                              #13 0x0000000000052b14 _start (spectrum2_libpurple_backend.real)
-- Subject: Process 612 (spectrum2_libpu) dumped core
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- Documentation: man:core(5)
-- 
-- Process 612 (spectrum2_libpu) crashed and dumped core.
-- 
-- This usually indicates a programming error in the crashing program and
-- should be reported to its vendor as a bug.

And I made a full backtrace this time:

# coredumpctl gdb -1
           PID: 612 (spectrum2_libpu)
           UID: 0 (root)
           GID: 0 (root)
        Signal: 6 (ABRT)
     Timestamp: Sun 2016-10-23 19:45:24 CEST (53s ago)
  Command Line: /usr/bin/spectrum2_libpurple_backend.real --host 127.0.0.1 --port <PORT> --service.backend_id=1 -j telegram.<XMPP_DOMAIN> /etc/spectrum2/transports/telegram.cfg
    Executable: /usr/bin/spectrum2_libpurple_backend.real
 Control Group: /user.slice/user-1001.slice/session-c16.scope
          Unit: session-c16.scope
         Slice: user-1001.slice
       Session: c16
     Owner UID: 1001 (martin)
       Boot ID: 502c3937e5b54a71906d641ec5fd86bf
    Machine ID: 5890fdfecaaa435a8e17b789ddc2ea1a
      Hostname: raspi2
      Coredump: /var/lib/systemd/coredump/core.spectrum2_libpu.0.502c3937e5b54a71906d641ec5fd86bf.612.1477244724000000000000.lz4
       Message: Process 612 (spectrum2_libpu) of user 0 dumped core.

                Stack trace of thread 612:
                #0  0x00000000755b58bc _gcry_mpih_submul_1 (libgcrypt.so.20)
                #1  0x00000000755b0430 _gcry_mpih_divrem (libgcrypt.so.20)
                #2  0x00000000755ad6a4 mul_mod (libgcrypt.so.20)
                #3  0x00000000755adec0 _gcry_mpi_powm (libgcrypt.so.20)
                #4  0x000000007555b034 is_prime (libgcrypt.so.20)
                #5  0x000000007555b7b4 check_prime (libgcrypt.so.20)
                #6  0x000000007555d0cc _gcry_prime_check (libgcrypt.so.20)
                #7  0x0000000075542048 gcry_prime_check (libgcrypt.so.20)
                #8  0x000000007016fe3c TGLC_bn_is_prime (telegram-purple.so)
                #9  0x000000007016f828 check_prime (telegram-purple.so)
                #10 0x00000000701449bc process_dh_answer (telegram-purple.so)
                #11 0x0000000070148128 rpc_execute (telegram-purple.so)
                #12 0x0000000070135828 try_rpc_read (telegram-purple.so)
                #13 0x0000000000052b14 _start (spectrum2_libpurple_backend.real)

GNU gdb (GDB) 7.12
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "armv7l-unknown-linux-gnueabihf".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/bin/spectrum2_libpurple_backend.real...(no debugging symbols found)...done.

warning: core file may not match specified executable file.
[New LWP 612]
[New LWP 615]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
Core was generated by `/usr/bin/spectrum2_libpurple_backend.real --host 127.0.0.1 --port <PORT> --servi'.
Program terminated with signal SIGABRT, Aborted.
#0  _gcry_mpih_submul_1 (res_ptr=0xd6b78bec, res_ptr@entry=0x14c315c, s1_ptr=0xd6b78be8, s1_ptr@entry=0x14c3158, s1_size=s1_size@entry=1968887460, 
    s2_limb=s2_limb@entry=1666595665) at mpih-mul3.c:56
56        cy_limb = (prod_low < cy_limb?1:0) + prod_high;
[Current thread is 1 (Thread 0x75368190 (LWP 612))]
(gdb) bt full
#0  0x755b58bc in _gcry_mpih_submul_1 (res_ptr=0xd6b78bec, res_ptr@entry=0x14c315c, s1_ptr=0xd6b78be8, 
    s1_ptr@entry=0x14c3158, s1_size=s1_size@entry=1968887460, s2_limb=s2_limb@entry=1666595665) at mpih-mul3.c:56
        cy_limb = 662021149
        j = <optimized out>
        prod_high = 147200886
        prod_low = 1537337633
        x = 936725571
#1  0x755b0430 in _gcry_mpih_divrem (qp=qp@entry=0x14c3258, qextra_limbs=2128371752, qextra_limbs@entry=0, np=0x14c315c, 
    np@entry=0x14c3158, nsize=<optimized out>, dp=0x14c3050, dp@entry=0x14c46f0, dsize=dsize@entry=64) at mpih-div.c:366
        q = 1666595665
        n1 = <optimized out>
        n2 = 0
        cy_limb = <optimized out>
        i = 1
        dX = 3340545721
        d1 = 3333540100
        n0 = <optimized out>
        most_significant_q_limb = 774089351
#2  0x755ad6a4 in mul_mod (xp=xp@entry=0x14c3158, xsize_p=0x7edc6028, xsize_p@entry=0x14c2f48, rp=rp@entry=0x14c3360, rsize=64, sp=0x14c46f0, 
    sp@entry=0x7edc6054, ssize=ssize@entry=64, mp=0x14c3050, mp@entry=0x6, msize=64, msize@entry=2128371756, karactx_p=0x7edc6054, 
    karactx_p@entry=0x30) at mpi-pow.c:378
#3  0x755adec0 in _gcry_mpi_powm (res=0x14c3360, res@entry=0x14c1830, base=base@entry=0x14c2240, expo=expo@entry=0xf, mod=mod@entry=0x14c2198)
    at mpi-pow.c:683
        c0 = 402653184
        e0 = <optimized out>
        j = <optimized out>
        xp = 0x5
        k = 16
        xsize = 64
        e = 2128371888
        carry_limb = <optimized out>
        karactx = {next = 0x0, tspace = 0x14c3570, tspace_nlimbs = 128, tspace_size = 64, tp = 0x0, tp_nlimbs = 0, tp_size = 0}
        i = 1
        c = 64
        tp = <optimized out>
        rp = 0x14c3158
        ep = 0x7edc6054
        mp = <optimized out>
        bp = <optimized out>
---Type <return> to continue, or q <return> to quit---
        esize = <optimized out>
        msize = 2128371756
        bsize = <optimized out>
        rsize = 64
        msign = <optimized out>
        bsign = <optimized out>
        rsign = 0
        esec = <optimized out>
        msec = <optimized out>
        bsec = <optimized out>
        size = -38223784
        mod_shift_cnt = 0
        negative_result = 64
        mp_marker = <optimized out>
        bp_marker = 0x14c1830
        ep_marker = 0x0
        xp_marker = 0x7edc6028
        mp_nlimbs = <optimized out>
        bp_nlimbs = 0
        ep_nlimbs = 1
        xp_nlimbs = <optimized out>
        precomp = 
          {0x14c2c30, 0x14c3778, 0x14c3880, 0x14c3988, 0x14c3a90, 0x14c3b98, 0x14c3ca0, 0x14c3da8, 0x14c3eb0, 0x14c3fb8, 0x14c40c0, 0x14c41c8, 0x14c42d0, 0x14c43d8, 0x14c44e0, 0x14c45e8}
        precomp_size = {64 <repeats 16 times>}
        W = 21770320
        base_u = 0x7edc6054
        base_u_size = 64
        max_u_size = 0
        __FUNCTION__ = "_gcry_mpi_powm"
#4  0x7555b034 in is_prime (n=n@entry=0x14c2198, steps=steps@entry=64, count=0x7edc6174, count@entry=0x7edc616c) at primegen.c:953
        x = 0x14c2240
        y = 0x14c1830
        z = 0x14c2218
        nminus1 = 0x14bf200
        a2 = 0x755b8c7c
        q = <optimized out>
        i = <optimized out>
        j = <optimized out>
---Type <return> to continue, or q <return> to quit---
        k = 2
        rc = 0
        nbits = 64
        __FUNCTION__ = "is_prime"
#5  0x7555b7b4 in check_prime (prime=prime@entry=0x14c2198, val_2=0x14be008, rm_rounds=rm_rounds@entry=64, cb_func=cb_func@entry=0x0, cb_arg=cb_arg@entry=0x0) at primegen.c:891
        i = <optimized out>
        x = <optimized out>
        count = 33
#6  0x7555d0cc in _gcry_prime_check (x=x@entry=0x14c2198, flags=flags@entry=0) at primegen.c:1179
#7  0x75542048 in gcry_prime_check (x=x@entry=0x14c2198, flags=flags@entry=0) at visibility.c:1393
#8  0x7016fe3c in TGLC_bn_is_prime (a=a@entry=0x14c2198, checks=checks@entry=0, callback=callback@entry=
    0x0, ctx=<optimized out>, cb_arg=cb_arg@entry=0x0) at crypto/bn_altern.c:80
        __PRETTY_FUNCTION__ = "TGLC_bn_is_prime"
        err = <optimized out>
#9  0x7016f828 in check_prime (TLS=0x14bd898, p=0x14c2198) at mtproto-utils.c:13
        r = <optimized out>
        t = 0x14c2198
        dh_g = 0x14c1440
        x = <optimized out>
        __PRETTY_FUNCTION__ = "tglmp_check_DH_params"
        res = 0
        b = 0x14c1440
#10 0x7016f828 in tglmp_check_DH_params (TLS=TLS@entry=0x14bd898, p=p@entry=0x14c1a80, g=g@entry=3) at mtproto-utils.c:67
        t = 0x14c2198
        dh_g = 0x14c1440
        x = <optimized out>
        __PRETTY_FUNCTION__ = "tglmp_check_DH_params"
        res = 0
        b = 0x14c1440
#11 0x701449bc in process_dh_answer (TLS=TLS@entry=0x14bd898, c=c@entry=0x14c1998, len=len@entry=652, temp_key=temp_key@entry=0, packet=0x70621060 <Response.14104> "") at mtproto-client.c:496
        in_save = <optimized out>
        DC = 0x14c02d0
        op = <optimized out>
        tmp = {-383131149, 1077234515, -437752979, 937072745}
        l = <optimized out>
        dh_prime = 0x14c1a80
        g_a = 0x14c1f90
---Type <return> to continue, or q <return> to quit---
        sha1_buffer = '\000' <repeats 19 times>
#12 0x70148128 in rpc_execute (TLS=TLS@entry=0x14bd898, c=c@entry=0x14c1998, op=0, len=652) at mtproto-client.c:1171
        DC = 0x14c02d0
        Response_len = 652
Python Exception <class 'gdb.error'> value requires 16777216 bytes, which is more than max-value-size: 
        Response = #13 0x70135828 in try_rpc_read (c=0x14c1998) at tgp-net.c:431
        len = 652
        t = 0
        op = 0
        TLS = 0x14bd898
        x = <optimized out>
        c = 0x14c1998
#14 0x70135828 in try_read (c=0x14c1998) at tgp-net.c:476
        x = <optimized out>
        c = 0x14c1998
#15 0x70135828 in conn_try_read (arg=0x14c1998, source=<optimized out>, cond=<optimized out>) at tgp-net.c:227
        c = 0x14c1998
#16 0x00052b14 in _start ()
(gdb) 

But this is just one thread, I didn't see any other crashing notices and this is all coredumpctl is giving me I think.

from tgl.

BenWiederhake avatar BenWiederhake commented on July 29, 2024
[New LWP 612]
[New LWP 615]

Ah, so there's indeed two threads going about their business.

#0  _gcry_mpih_submul_1 (res_ptr=0xd6b78bec, res_ptr@entry=0x14c315c, s1_ptr=0xd6b78be8, s1_ptr@entry=0x14c3158, s1_size=s1_size@entry=1968887460, 
    s2_limb=s2_limb@entry=1666595665) at mpih-mul3.c:56
56        cy_limb = (prod_low < cy_limb?1:0) + prod_high;
[Current thread is 1 (Thread 0x75368190 (LWP 612))]
(gdb) bt full

This shows pretty explicitly that this is definitely not the cause of the abort, as the first few stack frames would usually include something like a call to _abort.

Also, this is the stacktrace of one of the two threads. I'm interested in the other one. I think something like thread 619 should switch to the correct LWP, and enable you to print a stacktrace of that thread. (Technically, LWPs and threads are different things, but this far away from the kernel.)

So, please post another stacktrace of the right thread :)

from tgl.

herndlm avatar herndlm commented on July 29, 2024

Oh, I'm embarrassed that I missed that..
Tried it again and got the backtrace of the other one also this time, but I guess it's (once again) not really enlightening (for me at least):

[New LWP 29397]
[New LWP 29400]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
Core was generated by `/usr/bin/spectrum2_libpurple_backend --host 127.0.0.1 --port <PORT> --service.ba'.
Program terminated with signal SIGABRT, Aborted.
#0  _gcry_mpih_submul_1 (res_ptr=0xd775dadc, res_ptr@entry=0x1f2404c, s1_ptr=0xd775dac0, s1_ptr@entry=0x1f24030, s1_size=s1_size@entry=1969284772, s2_limb=s2_limb@entry=3469356797)
    at mpih-mul3.c:56
56        cy_limb = (prod_low < cy_limb?1:0) + prod_high;
[Current thread is 1 (Thread 0x753c9190 (LWP 29397))]
(gdb) bt full
#0  0x756168bc in _gcry_mpih_submul_1 (res_ptr=0xd775dadc, res_ptr@entry=0x1f2404c, s1_ptr=0xd775dac0, 
    s1_ptr@entry=0x1f24030, s1_size=s1_size@entry=1969284772, s2_limb=s2_limb@entry=3469356797) at mpih-mul3.c:56
        cy_limb = 468639368
        j = <optimized out>
        prod_high = 2273443598
        prod_low = 887833984
        x = 1347279419
#1  0x75611430 in _gcry_mpih_divrem (qp=qp@entry=0x1f24130, qextra_limbs=2124185588, qextra_limbs@entry=0, np=0x1f2404c, np@entry=0x1f24030, nsize=<optimized out>, dp=0x1f23f28, 
    dp@entry=0x1f252b0, dsize=dsize@entry=64) at mpih-div.c:366
        q = 3469356797
        n1 = <optimized out>
        n2 = 0
        cy_limb = <optimized out>
        i = 7
        dX = 3340545721
        d1 = 3333540100
        n0 = <optimized out>
        most_significant_q_limb = 613216123
#2  0x7560e6a4 in mul_mod (xp=xp@entry=0x1f24030, xsize_p=xsize_p@entry=0x7e9c7ff4, rp=rp@entry=0x0, rsize=64, sp=0x1f252b0, ssize=64, mp=0x1f23f28, mp@entry=0x0, msize=64, 
    msize@entry=32651776, karactx_p=0x7e9c8024, karactx_p@entry=0x0) at mpi-pow.c:378
#3  0x7560eb88 in _gcry_mpi_powm (res=0x0, res@entry=0x1f23180, base=base@entry=0x1f22638, expo=expo@entry=0xf, mod=mod@entry=0x1f22620) at mpi-pow.c:564
        j = <optimized out>
        xp = 0x0
        k = <optimized out>
        xsize = 64
        e = <optimized out>
        carry_limb = <optimized out>
        karactx = {next = 0x0, tspace = 0x1f24448, tspace_nlimbs = 128, tspace_size = 64, tp = 0x0, tp_nlimbs = 0, tp_size = 0}
        i = 14
        c = <optimized out>
        tp = <optimized out>
        rp = 0x1f24030
        ep = 0x7e9c8024
        mp = <optimized out>
        bp = 0x40
        esize = 2124185636
        msize = 32651776
        bsize = 0
        rsize = 64
        msign = <optimized out>
        bsign = <optimized out>
        rsign = <optimized out>
        esec = <optimized out>
        msec = <optimized out>
        bsec = <optimized out>
        size = 65303552
        mod_shift_cnt = 0
        negative_result = 64
        mp_marker = <optimized out>
        bp_marker = 0x1f23180
        ep_marker = 0x0
        xp_marker = 0x0
        mp_nlimbs = <optimized out>
        bp_nlimbs = 0
---Type <return> to continue, or q <return> to quit---
        ep_nlimbs = 1
        xp_nlimbs = <optimized out>
        precomp = 
          {0x1f23b08, 0x1f24650, 0x1f24758, 0x1f24860, 0x1f24968, 0x1f24a70, 0x1f24b78, 0x1f24c80, 0x1f24d88, 0x1f24e90, 0x1f24f98, 0x1f250a0, 0x1f251a8, 0x1f252b0, 0x38, 0x40}
        precomp_size = {64 <repeats 14 times>, 5, 32646712}
        W = 32653096
        base_u = <optimized out>
        base_u_size = <optimized out>
        max_u_size = 64
        __FUNCTION__ = "_gcry_mpi_powm"
#4  0x755bc034 in is_prime (n=n@entry=0x1f22620, steps=steps@entry=64, count=0x7e9c8144, count@entry=0x7e9c813c) at primegen.c:953
        x = 0x1f22638
        y = 0x1f23180
        z = 0x1f06528
        nminus1 = 0x1f22390
        a2 = 0x75619c7c
        q = <optimized out>
        i = <optimized out>
        j = <optimized out>
        k = 2
        rc = 0
        nbits = 64
        __FUNCTION__ = "is_prime"
#5  0x755bc7b4 in check_prime (prime=prime@entry=0x1f22620, val_2=0x1f1f030, rm_rounds=rm_rounds@entry=64, cb_func=cb_func@entry=0x0, cb_arg=cb_arg@entry=0x0) at primegen.c:891
        i = <optimized out>
        x = <optimized out>
        count = 49
#6  0x755be0cc in _gcry_prime_check (x=x@entry=0x1f22620, flags=flags@entry=0) at primegen.c:1179
#7  0x755a3048 in gcry_prime_check (x=x@entry=0x1f22620, flags=flags@entry=0) at visibility.c:1393
#8  0x701d0e3c in TGLC_bn_is_prime (a=a@entry=0x1f22620, checks=checks@entry=0, callback=callback@entry=0x0, ctx=<optimized out>, cb_arg=cb_arg@entry=0x0) at crypto/bn_altern.c:80
        __PRETTY_FUNCTION__ = "TGLC_bn_is_prime"
        err = <optimized out>
#9  0x701d0828 in check_prime (TLS=0x1f1e8c0, p=0x1f22620) at mtproto-utils.c:13
        r = <optimized out>
        t = 0x1f22620
        dh_g = 0x1f22638
        x = <optimized out>
        __PRETTY_FUNCTION__ = "tglmp_check_DH_params"
        res = 0
        b = 0x1f23198
#10 0x701d0828 in tglmp_check_DH_params (TLS=TLS@entry=0x1f1e8c0, p=p@entry=0x1f1e120, g=g@entry=3) at mtproto-utils.c:67
        t = 0x1f22620
        dh_g = 0x1f22638
        x = <optimized out>
        __PRETTY_FUNCTION__ = "tglmp_check_DH_params"
        res = 0
        b = 0x1f23198
#11 0x701a59bc in process_dh_answer (TLS=TLS@entry=0x1f1e8c0, c=c@entry=0x1f22538, len=len@entry=652, temp_key=temp_key@entry=0, packet=0x70682060 <Response.14104> "")
    at mtproto-client.c:496
        in_save = <optimized out>
        DC = 0x1f207b0
        op = <optimized out>
        tmp = {1715300161, 851252908, 969377225, 1203365329}
        l = <optimized out>
---Type <return> to continue, or q <return> to quit---
        dh_prime = 0x1f1e120
        g_a = 0x1f224c0
        sha1_buffer = '\000' <repeats 19 times>
#12 0x701a9128 in rpc_execute (TLS=TLS@entry=0x1f1e8c0, c=c@entry=0x1f22538, op=0, len=652) at mtproto-client.c:1171
        DC = 0x1f207b0
        Response_len = 652
Python Exception <class 'gdb.error'> value requires 16777216 bytes, which is more than max-value-size: 
        Response = #13 0x70196828 in try_rpc_read (c=0x1f22538) at tgp-net.c:431
        len = 652
        t = 0
        op = 0
        TLS = 0x1f1e8c0
        x = <optimized out>
        c = 0x1f22538
#14 0x70196828 in try_read (c=0x1f22538) at tgp-net.c:476
        x = <optimized out>
        c = 0x1f22538
#15 0x70196828 in conn_try_read (arg=0x1f22538, source=<optimized out>, cond=<optimized out>) at tgp-net.c:227
        c = 0x1f22538
#16 0x00052b14 in _start ()
(gdb) thread 2
[Switching to thread 2 (Thread 0x6fde93b0 (LWP 29400))]
#0  0x7685f458 in select () from /usr/lib/libc.so.6
(gdb) bt full
#0  0x7685f458 in select () at /usr/lib/libc.so.6
#1  0x751ebed4 in NotifierThreadProc () at /usr/lib/libtcl8.6.so
#2  0x76c44fbc in start_thread () at /usr/lib/libpthread.so.0
#3  0x76865bc8 in  () at /usr/lib/libc.so.6

Looks like the selection of the first "thread" was not working or something like that?
edit: added also the other backtrace for the sake of completeness

from tgl.

BenWiederhake avatar BenWiederhake commented on July 29, 2024

2 new threads means there's a total of 3 threads; unless I misunderstood how this is counted, so … maybe the main prize is behind "door", err, thread 3? ;)

from tgl.

herndlm avatar herndlm commented on July 29, 2024

tried that later on :)
unfortunately there seems to be no third thread

(gdb) thread 3
Unknown thread 3.

I'm currently thinking about putting debug code into libgcrypt. I've learned a few things already, maybe it's time to learn even more..
But before I do that, maybe I should focus on spectrum2 - could it be their fault? The backend binary for libpurple is from spectrum2, so maybe it's somehow still swallowing the response/error/signal or smth like that

from tgl.

BenWiederhake avatar BenWiederhake commented on July 29, 2024

I have no idea. As far as I know there's only three ways to get a SIGABRT:

  • kill -s SIGABRT ${PID}, or issued by any other external source. Come to think of it, could you ask the spectrum guys whether they kill the backend if it doesn't respond for a while? That would explain why it gets killed to consistently during the most computationally expensive part.
  • Call to abort(), direct or indirect (e.g., failing assert()). I see nothing like that anywhere.
  • Explicitly and directly issuing the signal. Not the case.

So, I'm out of ideas.

from tgl.

herndlm avatar herndlm commented on July 29, 2024

I think I got it by just looking at the code of spectrum2. I was already mentioning their PING technique. And we have long and high CPU usage when doing the "factorization challenge" on my rather slow raspberry pi with 32bit arm. EDIT: thanks for coming up with that thought :)
Look what I found just here https://github.com/hanzz/spectrum2/blob/6d2f8c1927613d87b5872b75601d211fe68dadb3/libtransport/NetworkPluginServer.cpp#L1372
EDIT2: Looks like they are setting a 20s second timer further above. Will try it with a greater timeout and see if it works. I guess the best fix would be to make that timeout configureable in spectrum2?
EDIT3: I had to change another ping timeout in https://github.com/hanzz/spectrum2/blob/6d2f8c1927613d87b5872b75601d211fe68dadb3/backends/libpurple/main.cpp#L2284 because I was getting "PING request not received - exiting..." errors

And now I'm back to libpurple with those:

4889: 2016-10-24 16:36:55,083 INFO  libpurple: prpl-telegram:ping alarm
4889: 2016-10-24 16:36:55,083 INFO  libpurple: prpl-telegram:fail connection: reason: ping timeout
4889: 2016-10-24 16:36:55,084 INFO  libpurple: prpl-telegram:Lost connection to server ... 149.154.167.51:80
4889: 2016-10-24 16:36:55,084 INFO  libpurple: connection:Connection error on 0x1a5c860 (reason: 0 description: Lost connection to the server...)
4889: 2016-10-24 16:36:55,085 INFO  libpurple: prpl-telegram:ping alarm
4889: 2016-10-24 16:36:55,087 INFO  libpurple: account:Disconnecting account <PHONE_NUMBER> (0x1a5bb40)
4889: 2016-10-24 16:36:55,088 INFO  libpurple: connection:Disconnecting connection 0x1a5c860
4889: 2016-10-24 16:36:55,088 INFO  libpurple: prpl-telegram:tgprpl_close()
4889: 2016-10-24 16:36:55,088 ERROR backend: GLib Source ID 54 was not found when attempting to remove it
4889: 2016-10-24 16:36:55,089 INFO  libpurple: connection:Destroying connection 0x1a5c860
4889: 2016-10-24 16:36:55,090 ERROR backend: g_log purple_account_disconnect: assertion '!purple_account_is_disconnected(account)' failed
4889: 2016-10-24 16:36:55,091 INFO  libpurple: buddyicon:Deleted cache file: /var/lib/spectrum2/telegram.<XMPP_DOMAIN>/icons/3d3c354bde751af97c465fa0151a22dd316bb25f.icon
4889: 2016-10-24 16:36:55,091 INFO  libpurple: account:Destroying account 0x1a5bb40
4889: 2016-10-24 16:36:59,854 INFO  libpurple: util:Writing file accounts.xml to directory /var/lib/spectrum2/telegram.<XMPP_DOMAIN>
4889: 2016-10-24 16:36:59,854 INFO  libpurple: util:Writing file /var/lib/spectrum2/telegram.<XMPP_DOMAIN>/accounts.xml

I'll try to adapt the corresponding part of libpurple and see what happens, but it's strange that it's taking that long to respond and the process is using 100% CPU for the whole time

from tgl.

herndlm avatar herndlm commented on July 29, 2024

It finally worked!

But I also had to adapt the timeout in telegram-purple at https://github.com/majn/telegram-purple/blob/150f0c48cc535af318366deedb553bf5d8def283/tgp-net.c#L50
and it took about 6 minutes (with the same backend process that was using my CPU all the time) until I received the message about my Telegram code. From there on now everything seems to be working fast

from tgl.

BenWiederhake avatar BenWiederhake commented on July 29, 2024

6 minutes? Sigh. That's Telegram's fault for inserting a dumb and pointless "proof of work" which is implemented only slowly in tgl, and runs (for whatever reason) excessively slowly on 32 bit.

Regarding the changes: eh. On one hand, it should work out of the box, so I'm tempted to raise the timeouts (in tgp) to whatever value you observe. On the other hand, 6 minutes is abnormally long and should not occur during any other step, so … not sure what to do.

Anyway, this means there is no bug in tgl involved. This issue can be closed.

from tgl.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.