mpisort's People

Contributors

fredjarlier, njoly, phupe

mpisort's Issues

Remove assert

There is a problem with an old assert at line 1430 of write.c.
It should be removed.

Question of the memory usage mentioned in the documentation

The docs state, in the section Informatic resources / Memory:

  • The total memory used during the sorting is around one and a half the size of the SAM file.

  • The total memory needed in this case is around 2.5 the individual SAM size.

I think that it should be 2.5 times the size of the SAM file. Is that correct?

link with crypto in configure

So far we link with curl, lzma and bz2.
Add an option in configure.ac to enable linking with libcrypto,
and update the documentation.

Refactoring for MPMD

Refactor the code to allow MPMD with Slurm.

  • Split mpiSort.c in two:
    the first part does init_mpi + reading,
    the second part does the sorting.
  • Define an interface for the second part.
  • Modify the Makefile to build a shared library from the second part.
  • Call it from mpiBWA.

Question about slurm and PBS/Torque

In docs/README.md, an example with Slurm is provided:

mpirun $MPISORT $SAM $OUTPUTDIR -q 0

Can the -n or -np parameter be omitted when mpirun is invoked under Slurm? Or is it possible to add -np, like this: mpirun -np ${SLURM_NPROCS} mpiSORT

In the case of PBS/Torque, it seems to be required: mpirun -np ${PBS_NP}

@fredjarlier, can you provide more details about the -n / -np parameter when the code is launched with Slurm or PBS/Torque, please?

Write a grid explorer

Propose a simple script to explore the cluster architecture.

The idea is to get an overview of the cluster topology at the partition, node, network, etc. level, in order to help users build the right command line.

replace existing SO in header

Replace the existing @HD SO: value in the header when several sorts are applied consecutively (sort by query name, then sort by coordinate), for instance when marking duplicates with samtools.
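
As a rough illustration of the idea, here is a minimal C sketch that rewrites the SO: field of the @HD line in a SAM header held in memory. The function name replace_hd_sort_order is hypothetical and not taken from mpiSORT. The sketch assumes the header starts with the @HD line and that fields are tab-delimited, as in the SAM format.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not mpiSORT code): return a newly allocated copy of
 * `header` whose @HD line carries SO:<new_order>. Returns NULL if the header
 * does not start with @HD or if allocation fails. The caller frees the result. */
char *replace_hd_sort_order(const char *header, const char *new_order)
{
    if (header == NULL || new_order == NULL || strncmp(header, "@HD", 3) != 0)
        return NULL;

    /* The @HD line ends at the first newline (or at the end of the string). */
    const char *eol = strchr(header, '\n');
    if (eol == NULL)
        eol = header + strlen(header);

    /* Look for an existing tab-delimited SO: field inside the @HD line. */
    const char *so = NULL;
    for (const char *p = header; p + 4 <= eol; p++) {
        if (memcmp(p, "\tSO:", 4) == 0) {
            so = p;
            break;
        }
    }

    const char *so_end = eol;
    if (so != NULL) {
        /* Skip the old value up to the next tab or the end of the line. */
        so_end = so + 4;
        while (so_end < eol && *so_end != '\t')
            so_end++;
    } else {
        /* No SO: field yet: append one at the end of the @HD line. */
        so = eol;
    }

    size_t prefix = (size_t)(so - header);
    size_t total = prefix + 4 + strlen(new_order) + strlen(so_end) + 1;
    char *out = malloc(total);
    if (out == NULL)
        return NULL;
    snprintf(out, total, "%.*s\tSO:%s%s", (int)prefix, header, new_order, so_end);
    return out;
}
```

Called with a header whose @HD line says SO:queryname and the new order "coordinate", the function returns a copy in which only that field has changed. The remaining header lines are copied through untouched.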

mpiSORT running issue when compressing a small buffer

Hi,

I have the following Slurm script:

#!/bin/bash

#SBATCH -N 2
#SBATCH -n 4 
#SBATCH -c 1
#SBATCH --tasks-per-node=2
#SBATCH -t 12:0:0
#SBATCH -p normal
#SBATCH --output=sortmpi.out

module load 2020
module load OpenMPI/4.0.3-GCC-9.3.0
module load impi/2019.8.254-iccifort-2020.1.217

with the sbatch command:

mpirun -n 4 /home/tahmad/hawk/mpiSORT/src/mpiSORT /scratch-shared/tahmad/bio_data/ERR194147/out/ERR194147_mpibwa.sam.sam /scratch-shared/tahmad/bio_data/ERR194147/out  -p -q 0 2> /scratch-shared/tahmad/bio_data/ERR194147/out/mpiSORT.log

Every time, the mpiSORT program crashes with the following error:

Number of processes : 4
Reads' quality threshold : 0
Compression Level is : 3
SAM file to read : /scratch-shared/tahmad/bio_data/ERR194147/out/ERR194147_mpibwa.sam.sam
Output directory : /scratch-shared/tahmad/bio_data/ERR194147/out
The size of the file is 30898833423 bytes
Header has 195+2 references
rank 3 ::: counter = 3356333
rank 0 ::: counter = 3464114
rank 1 ::: counter = 3448595
rank 2 ::: counter = 3374532
rank 3 ::: counter = 3426006
rank 1 ::: counter = 3446897
rank 2 ::: counter = 3435650
rank 0 ::: counter = 3406663
rank 3 ::: counter = 3441393
rank 0 ::: counter = 3470229
rank 1 ::: counter = 3449038
rank 2 ::: counter = 3441602
rank 3 ::: counter = 3435720
rank 0 ::: counter = 3411970
rank 2 ::: counter = 3437440
rank 1 ::: counter = 3452366
rank 3 ::: counter = 3418915
rank 0 ::: counter = 3454414
rank 2 ::: counter = 3445559
rank 1 ::: counter = 3449876
rank 3 ::: counter = 3410467
rank 2 ::: counter = 3439039
rank 0 ::: counter = 3451221
rank 1 ::: counter = 3423934
rank 3 ::: counter = 3263389
rank 2 ::: counter = 3424659
rank 0 ::: counter = 3448015
rank 3 ::: counter = 636846
rank 2 ::: counter = 671044
3 (99.51)::::: *** FINISH PARSING FILE ***
2 (100.04)::::: *** FINISH PARSING FILE ***
rank 0 ::: counter = 667634
rank 1 ::: counter = 3460384
0 (101.26)::::: *** FINISH PARSING FILE ***
rank 1 ::: counter = 669584
1 (104.64)::::: *** FINISH PARSING FILE ***
rank 0 ::::[MPISORT] total reads parsed = 98633528
rank 0 ::::[MPISORT] total read to sort for unmapped = 127110
Rank 0 :::::[WRITE] Time for chromosome unmapped writing 0.100934 seconds
rank 0 :::::[MPISORT] Time to write chromosom unmapped ,  0.951701 seconds

rank 0 ::::[MPISORT] total read to sort for discordant = 662518
Rank 0 :::::[WRITE] Time for chromosome discordant writing 0.070292 seconds
rank 0 :::::[MPISORT] Time to write chromosom discordant ,  3.475659 seconds


rank 0 ::::[MPISORT] Elected rank = 0
rank 0 ::::[MPISORT] we don't split the rank
Rank 0 :::::[MPISORT] Dimensions for bitonic = 4
Rank 0 :::::[MPISORT] Split size                           = 4
rank 0 :::::[MPISORT][MALLOC 1] time spent = 0.003256 s
rank 0 :::::[MPISORT][LOCAL SORT] time spent = 0.022284 s
rank 0 :::::[MPISORT][BITONIC 2] time spent = 1.687243 s
rank 0 :::::[MPISORT][TRIMMING] time spent = 0.001534 s
rank 0 :::::[MPISORT][BRUCK 3] time spent = 0.556936 s
rank 0 :::::[MPISORT][FREE + MALLOC] time spent = 0.000414 s
rank 0 :::::[MPISORT] we call write SAM
Rank 0 :::::[WRITE][BITONIC 2] Time spent sorting sources offsets = 1.927455
Rank 0 :::::[WRITE][BRUCK 2] Time spent in bruck  = 0.395096
Rank 0 :::::[WRITE][LOCAL SORT] Time =  0.111249 seconds
Rank 0 :::::[WRITE][COMPUTE BUFFER SIZE] Time =  0.058617 seconds
Rank 0 :::: [WRITE][COMPUTE PACKs] Number of packs buffer = 3
Rank 0 :::::[WRITE][DATA PACK 0] : Time =  0.463420 seconds
mpiSORT: write.c:1395: writeSam: Assertion `number_of_reads_recieved == previous_local_readNum' failed.
mpiSORT: write.c:1395: writeSam: Assertion `number_of_reads_recieved == previous_local_readNum' failed.
mpiSORT: write.c:1395: writeSam: Assertion `number_of_reads_recieved == previous_local_readNum' failed.
Rank 0 :::::[WRITE][BRUCK PACK 0] :: Time   = 0.239733 s
Rank 0 :::::[WRITE][DATA PACK 1] : Time =  0.608196 seconds


===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 0 PID 26525 RUNNING AT tcn1028
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 1 PID 26526 RUNNING AT tcn1028
=   KILLED BY SIGNAL: 6 (Aborted)
===================================================================================

It only generates the following incomplete output files:

-rw-r--r-- 1 tahmad tahmad 700M Sep  9 19:28 chr1.gz
-rw-r--r-- 1 tahmad tahmad  79M Sep  9 19:39 discordant.gz
-rw-r--r-- 1 tahmad tahmad  29G Aug 27 08:28 ERR194147_mpibwa.sam.sam
-rw-r--r-- 1 tahmad tahmad 3.3K Sep  9 19:39 mpiSORT.log
-rw-r--r-- 1 tahmad tahmad  12M Sep  9 19:28 (null)_unmapped.gz
-rw-r--r-- 1 tahmad tahmad  12M Sep  9 19:39 unmapped.gz

Sorting by name before sorting by coordinates

Propose an option to sort reads by name before sorting by coordinates.
So far the name sorting is done independently, before the coordinate sorting; to gain speed we could do both in the same job.

Bug with mpiSORT launched with the "-n" option

How to reproduce the error:

git clone https://github.com/bioinfo-pf-curie/mpiSORT.git
cd mpiSORT
git checkout d4294c7

Then compile the program according to the installation procedure.

Launch the following command:
mpirun -n 4 src/mpiSORT examples/data/HCC1187C_70K_READS.sam ~/mpiSORTExample/ -n

I got the following error:


Number of processes : 4
Reads' quality threshold : 0
Compression Level is : 3
SAM file to read : examples/data/HCC1187C_70K_READS.sam
Output directory : /home/philippe/mpiSORTExample/
The size of the file is 24471289 bytes
Header has 25+2 references
rank 3 ::: counter = 18803 
rank 0 ::: counter = 18786 
0 (0.08)::::: *** FINISH PARSING FILE ***
3 (0.08)::::: *** FINISH PARSING FILE ***
rank 2 ::: counter = 18782 
rank 1 ::: counter = 18805 
2 (0.10)::::: *** FINISH PARSING FILE ***
1 (0.10)::::: *** FINISH PARSING FILE ***
rank 0 :::: total read to sort for unmapped = 910 
Rank 0 :::::[WRITE] Time for chromosome discordant writing 0.000046 seconds
rank 0 :::::[MPISORT] Time to write chromosom discordant ,  0.003063 seconds 


rank 0 :::: total read to sort for discordant = 194 
Rank 0 :::::[WRITE] Time for chromosome unmapped writing 0.000031 seconds
rank 0 :::::[MPISORT] Time to write chromosom unmapped ,  0.000917 seconds 


rank 0 :::: Elected rank = 0 
rank 0 ::::[MPISORT] we don't split the rank 
Rank 0 :::::[MPISORT] Dimensions for bitonic = 4 
Rank 0 :::::[MPISORT] Split size 			   = 4 
rank 0 :::::[MPISORT][MALLOC 1] time spent = 0.000054 s
rank 0 :::::[MPISORT][LOCAL SORT] time spent = 0.000451 s
rank 0 :::::[MPISORT][BITONIC 2] time spent = 0.000844 s
rank 0 :::::[MPISORT][TRIMMING] time spent = 0.000017 s
rank 0 :::::[MPISORT][BRUCK 3] time spent = 0.000238 s
rank 0 :::::[MPISORT][FREE + MALLOC] time spent = 0.000002 s
corrupted double-linked list
[06751] *** Process received signal ***
[06751] Signal: Aborted (6)
[06751] Signal code:  (-6)
[06751] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7f8f1beb0890]
[06751] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xc7)[0x7f8f1baebe97]
[06751] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x141)[0x7f8f1baed801]
[06751] [ 3] /lib/x86_64-linux-gnu/libc.so.6(+0x89897)[0x7f8f1bb36897]
[06751] [ 4] /lib/x86_64-linux-gnu/libc.so.6(+0x9090a)[0x7f8f1bb3d90a]
[06751] [ 5] /lib/x86_64-linux-gnu/libc.so.6(cfree+0x7ea)[0x7f8f1bb4513a]
[06751] [ 6] src/mpiSORT(+0x40fb)[0x556ce0c950fb]
[06751] [ 7] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7f8f1baceb97]
[06751] [ 8] src/mpiSORT(+0x533a)[0x556ce0c9633a]
[06751] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node hupe exited on signal 6 (Aborted).
--------------------------------------------------------------------------

Add an option to stabilize the sorting

So far the sorting is not stable. We only consider one key (the coordinate) when sorting, so for ties we do not preserve the order in which the reads appear in the original SAM file. It is not clear how important this is or whether it has an impact on subsequent analyses, but it could be a problem for reproducibility when files are compared by md5 checksum.

The trouble comes from the quicksort used before the bitonic sort. We could replace it with any stable sort: merge sort, Timsort, grail sort, ...
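
An alternative to swapping the algorithm, not proposed in the issue itself, is to keep qsort and add the read's original position as a second key. The C sketch below shows this under assumptions: the read_key_t structure and its field names are hypothetical, and mpiSORT's real data layout may differ. Because the secondary key is unique per read, the result is deterministic and matches what a stable sort on the coordinate alone would give.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record: the real mpiSORT read structure is not shown here. */
typedef struct {
    size_t coord;        /* primary key: mapping coordinate */
    size_t file_offset;  /* secondary key: offset of the read in the input SAM */
} read_key_t;

/* Order by coordinate first, then by original offset, so that reads with
 * equal coordinates keep the order they had in the input file. */
int cmp_read_key(const void *a, const void *b)
{
    const read_key_t *x = a;
    const read_key_t *y = b;
    if (x->coord != y->coord)
        return (x->coord > y->coord) - (x->coord < y->coord);
    return (x->file_offset > y->file_offset) - (x->file_offset < y->file_offset);
}

/* qsort gives no stability guarantee on its own; the tie-break in the
 * comparator is what makes the output order fully determined. */
void sort_reads_stable(read_key_t *keys, size_t n)
{
    qsort(keys, n, sizeof *keys, cmp_read_key);
}
```

The same two-key comparison would presumably have to be used in the distributed bitonic stage as well, otherwise ties could still be reordered across ranks.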

Segfault during strdup

When sorting a file that contains a single chromosome, a segfault can occur during strdup at line 286 of mpiSort.
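
The report does not say why strdup crashes. One common cause is a NULL source pointer, which strdup does not accept. The sketch below is a hypothetical defensive wrapper, not mpiSORT code: it reports a NULL source instead of dereferencing it, so the caller can fail cleanly. The name dup_or_null and the `what` label are assumptions made for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of mpiSORT): duplicate `s`, treating a NULL
 * source as a reportable error rather than passing it on to be dereferenced.
 * `what` is a short label used only in the diagnostic message. */
char *dup_or_null(const char *s, const char *what)
{
    if (s == NULL) {
        fprintf(stderr, "cannot duplicate %s: source string is NULL\n", what);
        return NULL;
    }
    size_t len = strlen(s) + 1;          /* include the terminating NUL */
    char *copy = malloc(len);
    if (copy == NULL) {
        fprintf(stderr, "cannot duplicate %s: out of memory\n", what);
        return NULL;
    }
    memcpy(copy, s, len);
    return copy;
}
```

A caller would check the return value and abort the job with a clear message, which at least turns a segfault into a diagnosable error.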
