msort's People

Contributors

charles-randall, mayank-02

msort's Issues

Segfault with latest changes on large files

Input file attached...

$ /tmp/msort.new uuid.10k.txt > /dev/null
Segmentation fault (core dumped)

Backtrace,

$ gdb /tmp/msort.new core
...
Reading symbols from /tmp/msort.new...
[New LWP 2773816]
Core was generated by `/tmp/msort.new uuid.10k.txt'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000056143a4bedf6 in merge (f1=<error reading variable: Cannot access memory at address 0x7ffe1f355de8>, 
    f2=<error reading variable: Cannot access memory at address 0x7ffe1f355de0>, 
    f3=<error reading variable: Cannot access memory at address 0x7ffe1f355dd8>) at msort.c:751
751	void merge(FILE *f1, FILE *f2, FILE *f3) {
...
(gdb) bt
#0  0x000056143a4bedf6 in merge (f1=<error reading variable: Cannot access memory at address 0x7ffe1f355de8>, 
    f2=<error reading variable: Cannot access memory at address 0x7ffe1f355de0>, 
    f3=<error reading variable: Cannot access memory at address 0x7ffe1f355dd8>) at msort.c:751
#1  0x000056143a4bf789 in handleMerges (numFiles=3, x=114 'r') at msort.c:957
#2  0x000056143a4bffef in main (argc=2, argv=0x7ffe20356568) at msort.c:1182

uuid.10k.txt
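
One possible reading of this backtrace, offered as a guess because msort.c is not shown here: the fault is on the opening line of merge(), gdb cannot read any of its arguments, and the unreadable addresses sit roughly 16 MiB below argv (0x7ffe20356568 - 0x7ffe1f355de8 = 0x1000780). On Linux the default stack limit is commonly 8 MiB, so this pattern is typical of stack exhaustion caused by very large local arrays. A minimal sketch of moving such a buffer to the heap follows; the names alloc_line_buffer, sketch_line, SKETCH_LINES and SKETCH_STRSIZE are hypothetical and do not come from msort.c.

```c
#include <stdlib.h>

/* Hypothetical sizes; the real values used by msort.c are not shown here. */
#define SKETCH_LINES   10000
#define SKETCH_STRSIZE 1024

/* One fixed-width line record. */
typedef char sketch_line[SKETCH_STRSIZE];

/*
 * A local declaration such as
 *     char lines[SKETCH_LINES][SKETCH_STRSIZE];
 * needs about 10 MB of stack, more than a typical 8 MiB limit.
 * Allocating the same storage on the heap avoids that limit.
 * Returns NULL on failure; the caller must free() the result.
 */
sketch_line *alloc_line_buffer(size_t count) {
    return calloc(count, sizeof(sketch_line));
}
```

If this is the cause, running the failing command after "ulimit -s unlimited" in the same shell should make the crash disappear, which would be a quick way to confirm or rule out the theory.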

Valgrind memory access errors on trivial files

Sorting the attached 3-line file under valgrind gives the attached errors on my system. I don't think that there's anything special about the input file as it's trivial.

$ gcc --version
gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0

$ valgrind --version
valgrind-3.15.0

Msort compiled with "gcc -ggdb" or "gcc -g" as /tmp/msort-g,

$ valgrind --log-file=abc.valgrind.txt /tmp/msort-g abc.txt
a
b
c

This produces the correct output but please see the valgrind log.

Can you reproduce this? Is this real?

The first error is,

==3408920== Invalid read of size 4
==3408920==    at 0x4A337D7: fgets (iofgets.c:47)
==3408920==    by 0x10AB49: fillBuffer (msort.c:666)
==3408920==    by 0x10B32D: handleMerges (msort.c:846)
==3408920==    by 0x10BE19: main (msort.c:1131)
==3408920==  Address 0x4ba5b70 is 0 bytes inside a block of size 472 free'd
==3408920==    at 0x483CA3F: free (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==3408920==    by 0x4A33042: _IO_deallocate_file (libioP.h:863)
==3408920==    by 0x4A33042: fclose@@GLIBC_2.2.5 (iofclose.c:74)
==3408920==    by 0x10B391: handleMerges (msort.c:856)
==3408920==    by 0x10BE19: main (msort.c:1131)
==3408920==  Block was alloc'd at
==3408920==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==3408920==    by 0x4A33AAD: __fopen_internal (iofopen.c:65)
==3408920==    by 0x4A33AAD: fopen@@GLIBC_2.2.5 (iofopen.c:86)
==3408920==    by 0x10B28E: handleMerges (msort.c:827)
==3408920==    by 0x10BE19: main (msort.c:1131)

abc.txt
abc.valgrind.txt
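
The report above says that fgets() (reached via fillBuffer at msort.c:666) read a FILE object that had already been released by fclose() at msort.c:856, after being opened at msort.c:827. In other words, a stream is still being read after it was closed. A common defensive pattern is sketched below: clear the pointer when closing and treat a NULL stream as end of input. The helper names close_and_clear and read_line are hypothetical and are not taken from msort.c.

```c
#include <stdio.h>

/*
 * Close *fp if it is open and set the caller's pointer to NULL so that a
 * stale handle cannot be handed to fgets() later.
 * Returns the result of fclose(), or 0 if there was nothing to close.
 */
int close_and_clear(FILE **fp) {
    int rc = 0;
    if (fp != NULL && *fp != NULL) {
        rc = fclose(*fp);
        *fp = NULL;
    }
    return rc;
}

/* Read one line; a NULL stream is treated as end of input. */
char *read_line(char *buf, int size, FILE *fp) {
    if (fp == NULL)
        return NULL;
    return fgets(buf, size, fp);
}
```

This only hides the symptom if the loop logic still tries to read a finished run, so the call order around lines 846 and 856 would still need a look.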

scan-build of msort.c reports "garbage value" in conditional

Using scan-build,

https://clang-analyzer.llvm.org/scan-build.html

I didn't investigate, but look at the third warning,

$ scan-build clang msort.c -lm
scan-build: Using '/usr/lib/llvm-10/bin/clang' for static analysis
msort.c:38:5: warning: Value stored to 'endpos1' is never read
    endpos1 = length1;
    ^         ~~~~~~~
msort.c:39:5: warning: Value stored to 'endpos2' is never read
    endpos2 = length2;
    ^         ~~~~~~~
msort.c:189:15: warning: The left operand of '==' is a garbage value
    } while(x == y && i < endpos1 && j < endpos2);
            ~ ^
msort.c:791:9: warning: Value stored to 'k' is never read
        k = 0;
        ^   ~
4 warnings generated.
scan-build: 4 bugs found.

If you run this yourself, you can use "scan-view" to examine the report in detail.

Numeric sort order wrong with well-formatted numbers

Might be a duplicate bug but the other one was for decimals without a leading zero. These have leading zeros.

Please confirm that you can reproduce these test results.

$ cat test.txt
0.6095308
6.754819
0.1447246

GNU sort,
$ LC_ALL=C sort -n test.txt
0.1447246
0.6095308
6.754819

Msort,
$ LC_ALL=C /tmp/msort -n test.txt
0.6095308
0.1447246
6.754819

And this example is even more disturbing,

$ /tmp/msort -n test.txt
1.738268
1.148224

$ cat test.txt
1.738268
1.148224

GNU sort,
$ LC_ALL=C sort -n test.txt
1.148224
1.738268

Msort,
$ LC_ALL=C /tmp/msort -n test.txt
1.738268
1.148224
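
In both failing cases the misordered values differ by less than 1 and come out in input order, which would be consistent with a comparator that truncates to an integer somewhere (for example by returning a floating-point difference cast to int). That is only a guess, since numcompare itself is not shown here. A minimal sketch of a comparator that avoids the problem is below; it uses strtod(), which is not how GNU sort implements -n, and the name numcompare_sketch is hypothetical.

```c
#include <stdlib.h>

/*
 * qsort() comparator for records that each begin with a NUL-terminated
 * decimal number. strtod() parses according to the current locale; in the
 * C locale the decimal point is '.'.
 */
int numcompare_sketch(const void *a, const void *b) {
    double x = strtod((const char *)a, NULL);
    double y = strtod((const char *)b, NULL);
    /* Yields -1, 0 or 1 without converting a fractional difference to int. */
    return (x > y) - (x < y);
}
```

The comparator takes pointers to the start of each record, so it fits a buffer of fixed-width rows like the one the qsort() calls in msort.c appear to use.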

Magic 1024 arg to qsort()?

Should this be STRSIZE or something else? A constant doesn't make sense.

$ grep qsort msort.c

  • Store sorted runs of the file made using in memory buffer and inbuilt qsort
    qsort(buffer, lines, 1024, numcompare);
    qsort(buffer, lines, 1024, strcompare);
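
For context, the third argument to qsort() is the size in bytes of one array element, so a literal 1024 is only correct while every row of buffer is exactly 1024 bytes wide. Deriving the value from the array itself keeps the two in sync. The sketch below assumes buffer is an array of fixed-width rows; SKETCH_STRSIZE, strcompare_sketch and sort_records are hypothetical names, not the ones in msort.c.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical row width; stands in for whatever msort.c really uses. */
#define SKETCH_STRSIZE 64

/* Each element is a NUL-terminated string stored in a fixed-width row. */
static int strcompare_sketch(const void *a, const void *b) {
    return strcmp((const char *)a, (const char *)b);
}

/* Sort `lines` fixed-width records in place. */
void sort_records(char (*buffer)[SKETCH_STRSIZE], size_t lines) {
    /* sizeof buffer[0] is the row width, so no magic number is needed. */
    qsort(buffer, lines, sizeof buffer[0], strcompare_sketch);
}
```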

Bug with sort on multiple keys?

Using the attached 383 line file containing UUIDs, the output of msort doesn't match that of GNU sort or Busybox sort (which return the same result).

$ md5sum uuid.1234
27e48a1f7971314287a3d70c3a1f9923 uuid.1234

GNU sort,
$ LC_ALL=C sort -t- -k3,3 -k4,4 uuid.1234 | md5sum
3882c4e3465944a98eb35f7d0e9cea73 -

Busybox sort,
$ LC_ALL=C /tmp/sort -t- -k3,3 -k4,4 uuid.1234 | md5sum
3882c4e3465944a98eb35f7d0e9cea73 -

Msort,
$ LC_ALL=C /tmp/msort -t- -k3,3 -k4,4 uuid.1234 | md5sum
473e9fe8912e001ab79a2b41d3efca4f -

uuid.1234.txt
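
For reference, "-t- -k3,3 -k4,4" means: compare the third '-'-separated field, break ties on the fourth field, and, in GNU sort without -s, break any remaining ties by comparing the whole line. A minimal sketch of that ordering for LC_ALL=C is below. It assumes lines have no trailing newline, and the names field, cmp_field and cmp_k3_k4 are hypothetical; it is not taken from msort, GNU sort or Busybox.

```c
#include <string.h>

/*
 * Locate 1-based field n of s, with fields separated by sep.
 * Stores the field length in *len and returns a pointer to its start.
 * A missing field is reported as an empty string.
 */
static const char *field(const char *s, char sep, int n, size_t *len) {
    const char *end;
    while (--n > 0) {
        const char *p = strchr(s, sep);
        if (p == NULL) {
            *len = 0;
            return s + strlen(s);
        }
        s = p + 1;
    }
    end = strchr(s, sep);
    *len = end ? (size_t)(end - s) : strlen(s);
    return s;
}

/* Compare field n of two lines bytewise; a shorter field sorts first on a tie. */
static int cmp_field(const char *a, const char *b, char sep, int n) {
    size_t la, lb;
    const char *fa = field(a, sep, n, &la);
    const char *fb = field(b, sep, n, &lb);
    int c = memcmp(fa, fb, la < lb ? la : lb);
    if (c != 0)
        return c;
    return (la > lb) - (la < lb);
}

/* Ordering for -t- -k3,3 -k4,4 with a whole-line last-resort comparison. */
int cmp_k3_k4(const char *a, const char *b) {
    int c = cmp_field(a, b, '-', 3);
    if (c == 0)
        c = cmp_field(a, b, '-', 4);
    if (c == 0)
        c = strcmp(a, b);
    return c;
}
```

If msort skips the last-resort comparison, lines whose third and fourth fields are equal could legitimately come out in a different order, which would change the md5sum even though the keys are sorted correctly. Comparing against "sort -s" output would help separate that case from a real key-handling bug.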

Segfault sorting multiple files

Also probably the same as issue #7

Sorting multiple files also appears broken,


$ cat a.txt
a
$ cat b.txt
b
$ /tmp/msort a.txt
a
$ /tmp/msort b.txt
b
$ /tmp/msort a.txt b.txt 
Segmentation fault (core dumped)

Can you reproduce this?
