timolassmann / samstat

SAMStat displays various properties of next-generation sequencing reads stored in SAM/BAM format.

License: Other

Shell 0.25% C 97.26% CMake 1.69% HTML 0.81%
next-generation-sequencing bioinformatics quality-control

samstat's Introduction

SAMStat

SAMStat is an efficient C program that quickly displays statistics for large sequence files from next-generation sequencing projects. When applied to SAM/BAM files, all statistics are reported separately for unmapped, poorly mapped, and accurately mapped reads. This makes it possible to identify a variety of problems that cause poor mapping, such as remaining linker and adaptor sequences. SAMStat can also be used to verify individual processing steps in large analysis pipelines.

SAMStat reports the length distribution, base quality distribution, mapping statistics, and mismatch, insertion, and deletion error profiles. The output is a single HTML page:

https://user-images.githubusercontent.com/8110320/175869206-6edcb06d-1afc-42f6-bbb8-16a2a18146f0.png

How to install

SAMStat depends on the HDF5 library. To install it on Linux:

Ubuntu/Debian:

sudo apt-get install -y libhdf5-dev

On macOS via Homebrew:

brew install hdf5

To build SAMStat:

git clone https://github.com/TimoLassmann/samstat.git
cd samstat
mkdir build
cd build
cmake ..
make
make install 

Usage

samstat <file.sam>  <file.bam>  <file.fa>  <file.fq> ...  <options> 

For each input file, SAMStat creates a single HTML page named after the input file, with the suffix ".samstat.html" appended.
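
The naming rule can be sketched as a small shell helper. This is only an illustration: the function name report_path is hypothetical, and the assumption that an output directory keeps the input's base name is not something the README states.

```shell
# Hypothetical helper: predict where SAMStat will write the report for one input.
# Assumption: when an output directory is given, the input's base name is kept.
report_path() {
    input=$1
    outdir=${2:-}
    if [ -n "$outdir" ]; then
        printf '%s/%s.samstat.html\n' "${outdir%/}" "$(basename "$input")"
    else
        printf '%s.samstat.html\n' "$input"
    fi
}

# Example: report_path data/run1.bam   prints data/run1.bam.samstat.html
```

Checking for the predicted file after a run is a cheap way to confirm that a pipeline step actually produced its report.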

Available options:

-d/-dir            : Output directory. []
                     NOTE: by default SAMStat will place reports in the same directory as the input files. 
-p/-peek           : Report stats only on the first <n> sequences. [unlimited]
-t                 : Number of threads. [4]
                     NOTE: threads are only used when multiple input files are present.
--plotend          : Add base and quality plots relative to the read ends. []
--seed             : Random number seed. [0]
--verbose          : Enables verbose output. []

-h/-help           : Prints help message. []
-v/-version        : Prints version information. []

Please cite:

Lassmann (2023) “SAMStat 2: quality control for next generation sequencing data.” Bioinformatics, btad019. https://doi.org/10.1093/bioinformatics/btad019

Lassmann et al. (2011) “SAMStat: monitoring biases in next generation sequencing data.” Bioinformatics 27(1): 130-131. https://doi.org/10.1093/bioinformatics/btq614

samstat's Issues

samstat output is different than bowtie2 stats

The output stats from bowtie2 and samstat on the same bam file are different:

bowtie2:

18657869 reads; of these:
  18657869 (100.00%) were unpaired; of these:
    1995321 (10.69%) aligned 0 times
    12773937 (68.46%) aligned exactly 1 time
    3888611 (20.84%) aligned >1 times
89.31% overall alignment rate

samstat:

[screenshot of samstat output not preserved]

Any explanation?
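
When comparing the two tools, it helps to see how bowtie2's summary line is derived from its own counts. The sketch below recomputes the overall alignment rate from the numbers quoted above; the function name pct is made up for illustration. The README says SAMStat reports unmapped, poorly mapped, and accurately mapped reads separately, so its categories may not line up one-to-one with bowtie2's "aligned 0 / exactly 1 / >1 times" counts.

```shell
# Percentage of part in total, rounded to two decimals.
pct() {
    awk -v part="$1" -v total="$2" 'BEGIN { printf "%.2f\n", part / total * 100 }'
}

total=18657869   # all reads
once=12773937    # aligned exactly 1 time
multi=3888611    # aligned >1 times

# Overall alignment rate = (aligned once + aligned more than once) / total
pct "$((once + multi))" "$total"   # prints 89.31
```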

samstat not Running on Long File Names

I tried running samstat on a BAM file, and it terminated immediately without producing any results. The file had a very long name, so I renamed it to something shorter, after which samstat ran. Is samstat unable to handle very long file names?

Thanks.

samstat: error while loading shared libraries: libhts.so.3: cannot open shared object file: No such file or directory

After installing samstat and invoking it, I can't get past this error:

samstat: error while loading shared libraries: libhts.so.3: cannot open shared object file: No such file or directory

During install I see:

-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found

But I don't think this is the issue.

I also see

[  5%] No update step for 'htslib'
[  6%] No patch step for 'htslib'
[  8%] Performing configure step for 'htslib'

and later

[ 10%] Performing build step for 'htslib'
[ 11%] No install step for 'htslib'
[ 13%] Completed 'htslib'
[ 13%] Built target htslib

I get one warning:

/home/olin/software/samstat/src/tld/tests/utests/unit_timing.c: In function ‘main’:
/home/olin/software/samstat/src/tld/tests/utests/unit_timing.c:21:1: warning: label ‘ERROR’ defined but not used [-Wunused-label]
   21 | ERROR:
      | ^~~~~

But nothing else appears to go wrong (see below for full message).

(samtools) olin@agnes:/home/olin/software/samstat$ mkdir build && cd build && cmake ..
-- The C compiler identification is GNU 9.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Submodule update
Submodule 'src/tld' (https://github.com/timolassmann/tld.git) registered for path 'tld'
Cloning into '/home/olin/software/samstat/src/tld'...
Submodule path 'tld': checked out '6edfacaf69c31cbce89d24095335a2b4cd49dd68'
-- Found HDF5: /usr/lib/x86_64-linux-gnu/hdf5/serial/libhdf5.so;/usr/lib/x86_64-linux-gnu/libpthread.so;/usr/lib/x86_64-linux-gnu/libsz.so;/usr/lib/x86_64-linux-gnu/libz.so;/usr/lib/x86_64-linux-gnu/libdl.so;/usr/lib/x86_64-linux-gnu/libm.so (found version "1.10.4") found components: C HL
-- Found ZLIB: /usr/lib/x86_64-linux-gnu/libz.so (found version "1.2.11")
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- pthreads is enabled.
-- Configuring done
-- Generating done
-- Build files have been written to: /home/olin/software/samstat/build
(samtools) olin@agnes:/home/olin/software/samstat/build$ make
Scanning dependencies of target htslib
[  1%] Creating directories for 'htslib'
[  3%] Performing download step (verify and extract) for 'htslib'
-- verifying file...
     file='/home/olin/software/samstat/thirdparty/htslib-1.16.tar.bz2'
-- verifying file... done
-- extracting...
     src='/home/olin/software/samstat/thirdparty/htslib-1.16.tar.bz2'
     dst='/home/olin/software/samstat/build/htslib/src/htslib'
-- extracting... [tar xfz]
-- extracting... [analysis]
-- extracting... [rename]
-- extracting... [clean up]
-- extracting... done
[  5%] No update step for 'htslib'
[  6%] No patch step for 'htslib'
[  8%] Performing configure step for 'htslib'
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking for ranlib... ranlib
checking for grep that handles long lines and -e... /bin/grep
checking for C compiler warning flags... -Wall
checking whether C compiler accepts -mssse3 -mpopcnt -msse4.1... yes
checking whether C compiler accepts -mavx2... yes
checking whether C compiler accepts -mavx512f... yes
checking whether C compiler supports ARM Neon... no
checking for pkg-config... /usr/bin/pkg-config
checking pkg-config is at least version 0.9.0... yes
checking for special C compiler options needed for large files... no
checking for _FILE_OFFSET_BITS value needed for large files... no
checking shared library type for unknown-Linux... plain .so
checking whether the compiler accepts -fvisibility=hidden... yes
checking how to run the C preprocessor... gcc -E
checking for egrep... /bin/grep -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking for stdlib.h... (cached) yes
checking for unistd.h... (cached) yes
checking for sys/param.h... yes
checking for getpagesize... yes
checking for working mmap... yes
checking for gmtime_r... yes
checking for fsync... yes
checking for drand48... yes
checking for srand48_deterministic... no
checking whether fdatasync is declared... yes
checking for fdatasync... yes
checking for library containing log... -lm
checking for zlib.h... yes
checking for inflate in -lz... yes
checking for library containing recv... none required
checking whether htscodecs files are present... yes
checking for libdeflate.h... no
checking for libdeflate_deflate_compress in -ldeflate... no
configure: WARNING: GCS support not enabled: requires libcurl support
checking for library containing regcomp... none required
checking whether PTHREAD_MUTEX_RECURSIVE is declared... yes
configure: creating ./config.status
config.status: creating config.mk
config.status: creating htslib.pc.tmp
config.status: creating config.h
config.status: linking htscodecs_bundled.mk to htscodecs.mk
[ 10%] Performing build step for 'htslib'
[ 11%] No install step for 'htslib'
[ 13%] Completed 'htslib'
[ 13%] Built target htslib
Scanning dependencies of target boot
[ 15%] Building C object src/tld/src/CMakeFiles/boot.dir/tld.c.o
[ 16%] Linking C static library libboot.a
[ 16%] Built target boot
Scanning dependencies of target gentable
[ 18%] Building C object src/tld/src/CMakeFiles/gentable.dir/gentables.c.o
[ 20%] Linking C executable gentable
[ 20%] Built target gentable
[ 21%] Generating tld-seq-tables.h
Scanning dependencies of target tld-dev
[ 23%] Building C object src/tld/src/CMakeFiles/tld-dev.dir/tld.c.o
[ 25%] Linking C static library libtld-dev.a
[ 25%] Built target tld-dev
Scanning dependencies of target converttablegen
[ 26%] Building C object src/CMakeFiles/converttablegen.dir/convert_tables.c.o
[ 28%] Linking C executable converttablegen
[ 28%] Built target converttablegen
Scanning dependencies of target module_plot
[ 30%] Building C object src/plot/CMakeFiles/module_plot.dir/plot.c.o
[ 31%] Building C object src/plot/CMakeFiles/module_plot.dir/plot_interval.c.o
[ 33%] Building C object src/plot/CMakeFiles/module_plot.dir/plot_group.c.o
[ 35%] Linking C static library libmodule_plot.a
[ 35%] Built target module_plot
[ 36%] Generating convert_tables.h
Scanning dependencies of target samstat
[ 38%] Building C object src/CMakeFiles/samstat.dir/samstat.c.o
[ 40%] Building C object src/CMakeFiles/samstat.dir/htsinterface/htsglue.c.o
[ 41%] Building C object src/CMakeFiles/samstat.dir/sambamparse/sam_bam_parse.c.o
[ 43%] Building C object src/CMakeFiles/samstat.dir/param/param.c.o
[ 45%] Building C object src/CMakeFiles/samstat.dir/collect/collect.c.o
[ 46%] Building C object src/CMakeFiles/samstat.dir/report/stat_report.c.o
[ 48%] Building C object src/CMakeFiles/samstat.dir/report/lst.c.o
[ 50%] Building C object src/CMakeFiles/samstat.dir/tools/tools.c.o
[ 51%] Building C object src/CMakeFiles/samstat.dir/thread/thread_data.c.o
[ 53%] Linking C executable samstat
[ 53%] Built target samstat
Scanning dependencies of target unit-seq
[ 55%] Building C object src/tld/tests/CMakeFiles/unit-seq.dir/utests/unit_seq.c.o
[ 56%] Linking C executable unit-seq
[ 56%] Built target unit-seq
Scanning dependencies of target unit-logsum
[ 58%] Building C object src/tld/tests/CMakeFiles/unit-logsum.dir/utests/unit_logsum.c.o
[ 60%] Linking C executable unit-logsum
[ 60%] Built target unit-logsum
Scanning dependencies of target unit-timing
[ 61%] Building C object src/tld/tests/CMakeFiles/unit-timing.dir/utests/unit_timing.c.o
/home/olin/software/samstat/src/tld/tests/utests/unit_timing.c: In function ‘main’:
/home/olin/software/samstat/src/tld/tests/utests/unit_timing.c:21:1: warning: label ‘ERROR’ defined but not used [-Wunused-label]
   21 | ERROR:
      | ^~~~~
[ 63%] Linking C executable unit-timing
[ 63%] Built target unit-timing
Scanning dependencies of target unit-seq-rev
[ 65%] Building C object src/tld/tests/CMakeFiles/unit-seq-rev.dir/utests/unit_seq_rev.c.o
[ 66%] Linking C executable unit-seq-rev
[ 66%] Built target unit-seq-rev
Scanning dependencies of target int-noise
[ 68%] Building C object src/tld/tests/CMakeFiles/int-noise.dir/itests/noise.c.o
[ 70%] Linking C executable int-noise
[ 70%] Built target int-noise
Scanning dependencies of target unit-rng
[ 71%] Building C object src/tld/tests/CMakeFiles/unit-rng.dir/utests/unit_rng.c.o
[ 73%] Linking C executable unit-rng
[ 73%] Built target unit-rng
Scanning dependencies of target kde_itest
[ 75%] Building C object src/tld/tests/CMakeFiles/kde_itest.dir/itests/kde_itest.c.o
[ 76%] Linking C executable kde_itest
[ 76%] Built target kde_itest
Scanning dependencies of target seq_shannon
[ 78%] Building C object src/tld/tests/CMakeFiles/seq_shannon.dir/itests/seq_shannon.c.o
[ 80%] Linking C executable seq_shannon
[ 80%] Built target seq_shannon
Scanning dependencies of target unit-hdf5
[ 81%] Building C object src/tld/tests/CMakeFiles/unit-hdf5.dir/utests/unit_hdf5.c.o
[ 83%] Linking C executable unit-hdf5
[ 83%] Built target unit-hdf5
Scanning dependencies of target unit-str
[ 85%] Building C object src/tld/tests/CMakeFiles/unit-str.dir/utests/unit_str.c.o
[ 86%] Linking C executable unit-str
[ 86%] Built target unit-str
Scanning dependencies of target unit-stats
[ 88%] Building C object src/tld/tests/CMakeFiles/unit-stats.dir/utests/unit_stats.c.o
[ 90%] Linking C executable unit-stats
[ 90%] Built target unit-stats
Scanning dependencies of target unit-alpha
[ 91%] Building C object src/tld/tests/CMakeFiles/unit-alpha.dir/utests/unit_alphabet.c.o
[ 93%] Linking C executable unit-alpha
[ 93%] Built target unit-alpha
Scanning dependencies of target unit-test-suite
[ 95%] Building C object src/tld/tests/CMakeFiles/unit-test-suite.dir/utests/unit_tests.c.o
[ 96%] Linking C executable unit-test-suite
[ 96%] Built target unit-test-suite
Scanning dependencies of target unit-plot
[ 98%] Building C object src/plot/CMakeFiles/unit-plot.dir/plot_test.c.o
[100%] Linking C executable unit-plot
[100%] Built target unit-plot
(samtools) olin@agnes:/home/olin/software/samstat/build$ sudo make install
[sudo] password for olin:
[ 13%] Built target htslib
[ 16%] Built target boot
[ 20%] Built target gentable
[ 25%] Built target tld-dev
[ 28%] Built target converttablegen
[ 35%] Built target module_plot
[ 53%] Built target samstat
[ 56%] Built target unit-seq
[ 60%] Built target unit-logsum
[ 63%] Built target unit-timing
[ 66%] Built target unit-seq-rev
[ 70%] Built target int-noise
[ 73%] Built target unit-rng
[ 76%] Built target kde_itest
[ 80%] Built target seq_shannon
[ 83%] Built target unit-hdf5
[ 86%] Built target unit-str
[ 90%] Built target unit-stats
[ 93%] Built target unit-alpha
[ 96%] Built target unit-test-suite
[100%] Built target unit-plot
Install the project...
-- Install configuration: "Release"
-- Installing: /usr/local/bin/samstat
-- Set runtime path of "/usr/local/bin/samstat" to ""
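
The "cannot open shared object file" message comes from the dynamic loader, so one way to narrow it down is to list which shared libraries the installed binary cannot resolve. A minimal sketch, assuming a Linux system with ldd; missing_libs is a hypothetical helper that filters ldd-style output, and the library directory in the usage comment is a placeholder, not a path taken from this log.

```shell
# Hypothetical helper: read ldd-style output on stdin and print the names of
# the libraries the loader could not resolve.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Usage (not run here):
#   ldd "$(command -v samstat)" | missing_libs
# If libhts.so.3 is listed, the directory that holds it has to be made visible
# to the loader, for example:
#   export LD_LIBRARY_PATH=/path/to/htslib/lib:$LD_LIBRARY_PATH
```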

Samstat segfaults on some inputs

Hi,
I am a developer with https://www.bv-brc.org/, and we use samstat. We had some BAM files that produce segfaults with samstat version 1.5.1. I wonder if you could look into it and fix the problem. The reads come from RNA-Seq jobs with 454 sequences aligned against genome 234826.6 (https://www.bv-brc.org/view/Genome/234826.6) with bowtie2. We used SRA numbers SRR528324 and SRR528310.

Two BAM file test cases can be found here: https://zenodo.org/record/6609171

Thanks,
Jacob S. Porter, PhD

CMake

Hello, when I try to build samstat, cmake does not work. What can I do?

2.2.2 build issues

I am trying to build this on the HPC cluster I manage, which runs CentOS 7, and the instructions aren't working:

[root@cc-dclrilog61 samstat]# module list
Currently Loaded Modulefiles:
7) hdf5_18/1.8.20 8) gcc/8.2.0 9) cmake/3.9.1

[root@cc-dclrilog61 ~]# git clone https://github.com/TimoLassmann/samstat.git
[root@cc-dclrilog61 ~]# cd samstat
[root@cc-dclrilog61 samstat]# mkdir build
[root@cc-dclrilog61 build]# cd build
[root@cc-dclrilog61 build]# cmake -DCMAKE_INSTALL_PREFIX=/cm/shared/apps/samstats/2.2.2 ..
-- The C compiler identification is GNU 8.2.0
-- Check for working C compiler: /cm/local/apps/gcc/8.2.0/bin/gcc
-- Check for working C compiler: /cm/local/apps/gcc/8.2.0/bin/gcc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Submodule update
You need to run this command from the toplevel of the working tree.
CMake Error at src/CMakeLists.txt:12 (message):
git submodule update --init --recursive failed with 1, please checkout
submodules
-- Configuring incomplete, errors occurred!
See also "/root/temp/samstat/build/CMakeFiles/CMakeOutput.log".

No matter what I do, it fails with the same submodule error. Even if I cd .. and run that command myself, the build still fails.

Any thoughts?

./autogen.sh not found

Hi!
I have a problem installing SAMStat. When I run:
$ ./autogen.sh
the terminal says:
joel@joel-desktop:~/samstat$ ./autogen.sh
Submodule 'tldevel' (https://github.com/TimoLassmann/tldevel.git) registered for path 'tldevel'
Cloning into '/home/joel/samstat/tldevel'...
Submodule path 'tldevel': checked out '7541a28641752c9eb89d7926928af22471595c45'
./autogen.sh: 12: autoreconf: not found
Then, when I try to run make, I get:
make: *** No targets specified and no makefile found. Stop.

I need to install this software; it is essential for my research.

Thanks.
Joel
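
The line "./autogen.sh: 12: autoreconf: not found" means the autoreconf program is not installed; the later make error follows because no Makefile was ever generated. On Debian and Ubuntu, autoreconf ships in the autoconf package. A small sketch for checking the tools first; missing_tools is a hypothetical helper, and the exact set of tools autogen.sh needs is an assumption.

```shell
# Hypothetical helper: print each named tool that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Usage (not run here):
#   missing_tools autoreconf automake make
#   sudo apt-get install -y autoconf automake libtool
#   ./autogen.sh
```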

'make' fails when I run it on my fedora 32 machine

Hi, thank you for developing and maintaining a great package.

I previously had no problem installing samstat, but make now gives me an error when I run it for samstat-1.5.1 on my new machine.
Do you have any ideas why this could be?

Here's the output:

Making all in lib
make[1]: Entering directory ‘/mnt/data/src/samstat-1.5.1/lib’
make[1]: Nothing to be done for ‘all’.
make[1]: Leaving directory ‘/mnt/data/src/samstat-1.5.1/lib’
Making all in src
make[1]: Entering directory ‘/mnt/data/src/samstat-1.5.1/src’
make all-am
make[2]: Entering directory ‘/mnt/data/src/samstat-1.5.1/src’
gcc -O2 -funroll-loops  -Wall -std=gnu99  -o samstat interface.o nuc_code.o misc.o main.o io.o hmm.o viz.o -lm -lpthread /mnt/data/src/samstat-1.5.1/lib/libks.a 
/usr/bin/ld: misc.o:(.bss+0x0): multiple definition of `rev_nuc_code'; nuc_code.o:(.bss+0x0): first defined here
/usr/bin/ld: misc.o:(.bss+0x20): multiple definition of `nuc_code'; nuc_code.o:(.bss+0x20): first defined here
/usr/bin/ld: main.o:(.bss+0x20): multiple definition of `nuc_code'; nuc_code.o:(.bss+0x20): first defined here
/usr/bin/ld: main.o:(.bss+0x0): multiple definition of `rev_nuc_code'; nuc_code.o:(.bss+0x0): first defined here
/usr/bin/ld: io.o:(.bss+0x20): multiple definition of `nuc_code'; nuc_code.o:(.bss+0x20): first defined here
/usr/bin/ld: io.o:(.bss+0x0): multiple definition of `rev_nuc_code'; nuc_code.o:(.bss+0x0): first defined here
collect2: error: ld returned 1 exit status
make[2]: *** [Makefile:587: samstat] Error 1
make[2]: Leaving directory ‘/mnt/data/src/samstat-1.5.1/src’
make[1]: *** [Makefile:486: all] Error 2
make[1]: Leaving directory ‘/mnt/data/src/samstat-1.5.1/src’
make: *** [Makefile:334: all-recursive] Error 1

Thank you very much for your help.
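
These "multiple definition" errors are typical of older C code built with GCC 10 or newer, where -fno-common became the default, so a variable defined in a header that several source files include no longer links. A commonly used workaround is to pass -fcommon. The sketch below assumes that this is the cause here and that the 1.5.1 build honours CFLAGS; common_flag is a hypothetical helper.

```shell
# Hypothetical helper: given a GCC major version, print the extra flag that
# restores the pre-GCC-10 handling of tentative definitions (nothing if < 10).
common_flag() {
    if [ "$1" -ge 10 ]; then
        printf '%s\n' '-fcommon'
    fi
}

# Usage (not run here), from the samstat-1.5.1 source directory:
#   major=$(gcc -dumpversion | cut -d. -f1)
#   ./configure CFLAGS="-O2 $(common_flag "$major")"
#   make
```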

install in non-standard location

Hi

I want to deploy this on a shared server, so I need to install samstat outside the standard /usr/... location. Since there is no ./configure (which would let me set --prefix=/opt/biotools/samstat), I cannot set DESTDIR to where I need it.
Is there a way to do this with the current install code?

Having samstat in bioconda would also be a real plus for deployment in Docker containers and elsewhere.

thanks in advance
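
For the CMake-based build described in the README, the install location is normally chosen with the standard CMAKE_INSTALL_PREFIX variable. A minimal sketch, assuming the current CMake build and reusing the /opt/biotools/samstat path from the question; configure_cmd is a hypothetical helper that only prints the command so it can be reviewed before running it, and it does not quote prefixes that contain spaces.

```shell
# Hypothetical helper: print the cmake configure command for a given prefix.
configure_cmd() {
    printf 'cmake -DCMAKE_INSTALL_PREFIX=%s ..\n' "$1"
}

configure_cmd /opt/biotools/samstat
# prints: cmake -DCMAKE_INSTALL_PREFIX=/opt/biotools/samstat ..
```

The printed command would be run from inside the build directory, followed by make and make install.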

one more try -- make make install

Hi again, I hope I am not disturbing you. I left a comment on your answer to my other issue the other day, but I see you have since closed it, so maybe you didn't see it. As I said, I was able to get the cmake command to work after submitting it with sbatch. Would you mind telling me what the arguments or "targets" are for the last two commands,
make
make install

for installing? The errors I get are below. Thank you,
Selma

make: *** No targets specified and no makefile found. Stop.
make: *** No rule to make target `install'. Stop.
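
"No targets specified and no makefile found" means make was started in a directory that has no Makefile. With the CMake build, the Makefile is generated inside the build directory by cmake .., so make and make install need no extra arguments but must be run from there. A small sketch; has_makefile is a hypothetical helper.

```shell
# Hypothetical helper: succeed only if the given directory (default: the
# current one) contains a Makefile.
has_makefile() {
    [ -f "${1:-.}/Makefile" ]
}

# Usage (not run here), from the samstat checkout:
#   cd build
#   has_makefile . || cmake ..
#   make
#   make install
```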

Conda availability

Please add samstat to conda:

conda install -c bioconda samstat

Thanks.
