isaac3's People

Contributors

cornhundred, rpetrovski

isaac3's Issues

isaac-sort-reference: still no (user) control over CPU usage

Hi,

Creating an index with isaac-sort-reference is still "problematic" in a shared computing environment.
Using all available CPU cores as needed is not a very "social" approach.

I'd like to have control over (at least) the CPU cores used for the whole index creation process.
That includes other programs as well (if they grab all CPUs).

Control over memory would be another feature request. Something like:

--maxCPU
--maxMEM

This applies to both isaac2 and isaac3.

best,
Sven
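
Until options like these exist, one possible workaround is to restrict the process from the outside. The following is a minimal sketch, assuming Linux and Python 3; `run_limited` is a hypothetical helper and not part of Isaac. It pins a command to a chosen set of cores and optionally caps its address space. Both settings are inherited by child processes, so programs started by the command are restricted too. How isaac-sort-reference behaves under a memory cap has not been tested here.

```python
import os
import resource
import subprocess


def run_limited(cmd, cpus, max_mem_bytes=None, **run_kwargs):
    """Run cmd restricted to the CPU ids in `cpus` (Linux only).

    CPU affinity and resource limits are inherited by child processes,
    so tools started by cmd (make, helper binaries) are restricted too.
    """
    def apply_limits():
        # Executed in the child process just before the command starts.
        os.sched_setaffinity(0, cpus)
        if max_mem_bytes is not None:
            # Caps virtual address space; allocations beyond it will fail.
            resource.setrlimit(resource.RLIMIT_AS,
                               (max_mem_bytes, max_mem_bytes))

    return subprocess.run(cmd, preexec_fn=apply_limits, **run_kwargs)
```

A call could look like `run_limited(["isaac-sort-reference", "--genome-file", "genome.fa", "--output-directory", "iSAACindex"], cpus={0, 1, 2, 3})`. The program may still start more threads than that, but they all share the four allowed cores.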

Error in thread

Hi, I'm getting various errors when running isaac. My command line is:

../../Isaac3/bin/isaac-align -r /strg02/users/osvaldo/hg38/human_hg38/sorted-reference.xml -b ../libs_isaac3/ -m 256 --base-calls-format fastq-gz --known-indels /data/users/osvaldo/tmo/known_indels/1000G_phase1.indels.hg38.vcf

I tried different libraries and still have the same problem:

Do you know what I'm doing wrong?

Thanks

Dynamic exception type: boost::exception_detail::clone_impl<isaac::common::ThreadingException>
std::exception::what: Terminating due to failures on other threads
: Terminating due to failures on other threads
2016-06-20 12:37:06 [7fae72360700] ERROR: Thread: 121 also caught an exception: 2016-Jun-20 12:37:06: Success: /data/users/osvaldo/tmo/Isaac3/src/c++/lib/build/Build.cpp(761): Throw in function void isaac::build::Build::waitForLoadSlot(boost::unique_lockboost::mutex&, std::vector<boost::reference_wrapper >::const_iterator, std::vector<boost::reference_wrapper >::const_iterator, std::vector<boost::reference_wrapper >::const_iterator&, isaac::common::ScopedMallocBlock&, std::size_t)
Dynamic exception type: boost::exception_detail::clone_impl<isaac::common::ThreadingException>
std::exception::what: Terminating due to failures on other threads
: Terminating due to failures on other threads
2016-06-20 12:37:18 [7fae4671a700] Serializing records done: 5137358 of them for bin BinMetadata(51id ReferencePosition(4:48388960:0f)bs 12097240bl 1578980154ds 0do 0se 2586255rs 2550203f /data/users/osvaldo/tmo/Tismoo-003/align_isaac3/./Temp/bin-00000005-00000051.dat) in 48seconds.
2016-06-20 12:37:18 [7fae4671a700] ERROR: Thread: 54 also caught an exception: 2016-Jun-20 12:37:18: Success: /data/users/osvaldo/tmo/Isaac3/src/c++/lib/build/Build.cpp(929): Throw in function void isaac::build::Build::preemptComputeSlot(boost::unique_lockboost::mutex&, std::size_t, std::size_t, OperationT, unsigned int) [with OperationT = isaac::build::Build::sortBinParallel(std::vector<boost::reference_wrapper >::const_iterator&, std::vector<boost::reference_wrapper >::const_iterator&, std::vector<boost::reference_wrapper >::const_iterator&, std::vector<boost::reference_wrapper >::const_iterator&, std::vector<boost::reference_wrapper >::const_iterator&, isaac::common::ScopedMallocBlock&, std::size_t)::__lambda11; std::size_t = long unsigned int]
Dynamic exception type: boost::exception_detail::clone_impl<isaac::common::ThreadingException>
std::exception::what: Terminating due to failures on other threads
: Terminating due to failures on other threads
2016-06-20 12:37:18 [7fae7916b700] ERROR: Thread: 132 also caught an exception: 2016-Jun-20 12:37:18: Success: /data/users/osvaldo/tmo/Isaac3/src/c++/lib/build/Build.cpp(761): Throw in function void isaac::build::Build::waitForLoadSlot(boost::unique_lockboost::mutex&, std::vector<boost::reference_wrapper >::const_iterator, std::vector<boost::reference_wrapper >::const_iterator, std::vector<boost::reference_wrapper >::const_iterator&, isaac::common::ScopedMallocBlock&, std::size_t)
Dynamic exception type: boost::exception_detail::clone_impl<isaac::common::ThreadingException>
std::exception::what: Terminating due to failures on other threads
: Terminating due to failures on other threads
2016-06-20 12:37:24 [7fae358ff700] Reading alignment records done from BinMetadata(53id ReferencePosition(4:72583440:0f)bs 12097240bl 1647923336ds 0do 0se 2694650rs 2667212f /data/users/osvaldo/tmo/Tismoo-003/align_isaac3/./Temp/bin-00000005-00000053.dat)
2016-06-20 12:37:24 [7fae358ff700] Collecting gaps.
2016-06-20 12:37:25 [7fae358ff700] Finalizing gaps.
2016-06-20 12:37:25 [7fae358ff700] Loading unsorted data done in 16970ms
2016-06-20 12:37:25 [7fae358ff700] ERROR: Thread: 27 also caught an exception: 2016-Jun-20 12:37:25: Success: /data/users/osvaldo/tmo/Isaac3/src/c++/lib/build/Build.cpp(929): Throw in function void isaac::build::Build::preemptComputeSlot(boost::unique_lockboost::mutex&, std::size_t, std::size_t, OperationT, unsigned int) [with OperationT = isaac::build::Build::sortBinParallel(std::vector<boost::reference_wrapper >::const_iterator&, std::vector<boost::reference_wrapper >::const_iterator&, std::vector<boost::reference_wrapper >::const_iterator&, std::vector<boost::reference_wrapper >::const_iterator&, std::vector<boost::reference_wrapper >::const_iterator&, isaac::common::ScopedMallocBlock&, std::size_t)::__lambda9; std::size_t = long unsigned int]
Dynamic exception type: boost::exception_detail::clone_impl<isaac::common::ThreadingException>
std::exception::what: Terminating due to failures on other threads
: Terminating due to failures on other threads
2016-06-20 12:37:25 [7fae57f36700] ERROR: Thread: 82 also caught an exception: 2016-Jun-20 12:37:25: Success: /data/users/osvaldo/tmo/Isaac3/src/c++/lib/build/Build.cpp(761): Throw in function void isaac::build::Build::waitForLoadSlot(boost::unique_lockboost::mutex&, std::vector<boost::reference_wrapper >::const_iterator, std::vector<boost::reference_wrapper >::const_iterator, std::vector<boost::reference_wrapper >::const_iterator&, isaac::common::ScopedMallocBlock&, std::size_t)
Dynamic exception type: boost::exception_detail::clone_impl<isaac::common::ThreadingException>
std::exception::what: Terminating due to failures on other threads
: Terminating due to failures on other threads
2016-06-20 12:37:25 [7fae6e75a700] ERROR: Thread: 115 also caught an exception: 2016-Jun-20 12:37:25: Success: /data/users/osvaldo/tmo/Isaac3/src/c++/lib/build/Build.cpp(761): Throw in function void isaac::build::Build::waitForLoadSlot(boost::unique_lockboost::mutex&, std::vector<boost::reference_wrapper >::const_iterator, std::vector<boost::reference_wrapper >::const_iterator, std::vector<boost::reference_wrapper >::const_iterator&, isaac::common::ScopedMallocBlock&, std::size_t)
Dynamic exception type: boost::exception_detail::clone_impl<isaac::common::ThreadingException>
std::exception::what: Terminating due to failures on other threads
: Terminating due to failures on other threads
2016-06-20 12:37:25 [7fae8317b700] ERROR: Thread: 148 also caught an exception: 2016-Jun-20 12:37:25: Success: /data/users/osvaldo/tmo/Isaac3/src/c++/lib/build/Build.cpp(761): Throw in function void isaac::build::Build::waitForLoadSlot(boost::unique_lockboost::mutex&, std::vector<boost::reference_wrapper >::const_iterator, std::vector<boost::reference_wrapper >::const_iterator, std::vector<boost::reference_wrapper >::const_iterator&, isaac::common::ScopedMallocBlock&, std::size_t)
Dynamic exception type: boost::exception_detail::clone_impl<isaac::common::ThreadingException>
std::exception::what: Terminating due to failures on other threads
: Terminating due to failures on other threads
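
Every error in the excerpt above is reported as "also caught an exception" or "Terminating due to failures on other threads", which suggests that the original failure was logged earlier than the part pasted here. The following minimal sketch, assuming the log format shown above, tallies the throw sites and lists ERROR lines that are not marked as secondary. The function names are hypothetical, and the assumption that a primary error lacks the "also caught" wording is a guess rather than documented Isaac behaviour.

```python
import re
from collections import Counter

# Matches e.g. ".../src/c++/lib/build/Build.cpp(761): Throw in function"
THROW_SITE = re.compile(r"([\w.+/-]+\.cpp)\((\d+)\): Throw in function")


def count_throw_sites(log_lines):
    """Count how often each (source file, line number) appears as a throw site."""
    counts = Counter()
    for line in log_lines:
        for path, lineno in THROW_SITE.findall(line):
            counts[(path.rsplit("/", 1)[-1], int(lineno))] += 1
    return counts


def candidate_primary_errors(log_lines):
    """Return ERROR lines that are not flagged as secondary exceptions."""
    return [line for line in log_lines
            if " ERROR: " in line and "also caught an exception" not in line]
```

Running `candidate_primary_errors` over the complete log file, not just the tail, would be the natural first step.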

My job scheduler kills any job after 24H00. Is there any way to recover indexing?

Hi illumina,
I'm trying to index a genome using isaac3.

/ccc/work/cont007/fg0019/lindenbp/packages/isaac/bin/isaac-sort-reference \
    -g  /ccc/work/cont007/fg/fg/biobank/by-taxonid/9606/38/grch38mt_all_chr.fasta \
    --quiet \
    --output-directory ${CCCSCRATCHDIR}/20160726-isaac-index-grch38mt

Unfortunately the job scheduler I'm using (a wrapper around slurm) kills any job after 24H00.

So my job was killed. Here is the stderr:

(...)
+ gridShell=
+ QRSH_CMD=
+ make -j 1 -f /ccc/work/cont007/fg0019/lindenbp/packages/isaac/bin/../share/iSAAC-03.16.04.27/makefiles/reference/SortReference.mk -C /ccc/scratch/cont007/fg0019/lindenbp/20160726-isaac-index-grch38mt GENOME_FILE:=/ccc/work/cont007/fg/fg/biobank/by-taxonid/9606/38/grch38mt_all_chr.fasta ANNOTATION_MASK_WIDTH:=0 iSAAC_LOG_LEVEL:=1 NEIGHBORHOOD_DISTANCE:=2 'ANNOTATION_SEED_LENGTHS:=16 20 24 28 32 36 40 44 48 52 56 60 64 68 72 76 80' all
slurmd[airain6097]: error: *** JOB 4039578 CANCELLED AT 2016-07-27T16:37:30 DUE TO TIME LIMIT ***

and stdout:

(....)
/ccc/work/cont007/fg0019/lindenbp/packages/isaac/bin/../share/iSAAC-03.16.04.27/makefiles/reference/../../../../share/iSAAC-03.16.04.27/makefiles/common/../../../../libexec/iSAAC-03.16.04.27/findNeighbors -r /ccc/scratch/cont007/fg0019/lindenbp/20160726-isaac-index-grch38mt/Temp/neighbor-positions-20.xml \
    --seed-length 20 \
    --neighborhood-distance 2 \
    --mask-width 0 \
    --mask 0 \
    --output-file /ccc/scratch/cont007/fg0019/lindenbp/20160726-isaac-index-grch38mt/Temp/neighbors-2-20.16bpb.gz.tmp && mv /ccc/scratch/cont007/fg0019/lindenbp/20160726-isaac-index-grch38mt/Temp/neighbors-2-20.16bpb.gz.tmp /ccc/scratch/cont007/fg0019/lindenbp/20160726-isaac-index-grch38mt/Temp/neighbors-2-20.16bpb.gz

As this process seems to use make, is there a safe way to relaunch it from the current state?
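
The stdout above shows findNeighbors writing to a `.tmp` file that is moved to its final name only after the command succeeds. With that pattern, a step killed halfway leaves nothing under the final name, so make would see the target as missing and rebuild it on the next run. Below is a minimal Python sketch of the same write-then-rename pattern. `write_then_rename` is a hypothetical helper, and the log does not confirm that every rule in SortReference.mk follows this pattern.

```python
import os


def write_then_rename(final_path, produce):
    """Write output to final_path + '.tmp' and rename it only on success.

    If produce() raises, the rename never happens, so final_path either
    does not exist or still holds a complete earlier result.
    """
    tmp_path = final_path + ".tmp"
    with open(tmp_path, "wb") as handle:
        produce(handle)
    # os.replace is atomic on POSIX when both paths are on one filesystem.
    os.replace(tmp_path, final_path)
```

If the makefile behaves this way throughout, rerunning the same isaac-sort-reference command with the same output directory would skip finished targets, but that remains an assumption until the maintainers confirm it.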

+ sign not found where expected

I'm using iSAAC-03.17.03.01. I have created simulated fastq.gz files using VarSim v0.7.8. My reference genome is hg38, and I built the index according to the instructions provided. Using the call

${isaac} --base-calls ${outdir} --base-calls-format fastq-gz --memory-limit 25 --reference-genome ${refXMLdescriptor} --reference-name ${refname} --sample-sheet none

I get the following error:

Failed to parse the options: /opt/src/iSAAC/iSAAC-03.17.03.01/src/c++/lib/io/FastqReader.cpp(242): Throw in function void isaac::io::FastqReader::findQScores() Dynamic exception type: boost::exception_detail::clone_impl<isaac::io::FastqFormatException> std::exception::what: + sign not found where expected: /data/lane1_read1.fastq.gz, offset 324

What does this error mean? I've checked the file, it looks like a perfectly normal fastq.
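
To narrow this down independently of Isaac, the file can be checked record by record. The following is a minimal sketch, assuming strict four-line FASTQ records (header, sequence, `+` line, qualities); `first_bad_record` is a hypothetical helper and not Isaac's parser. A file with sequences wrapped over several lines would fail this check even though some other tools accept it.

```python
import gzip


def first_bad_record(path):
    """Return (record_number, reason) for the first malformed record, else None."""
    with gzip.open(path, "rt") as handle:
        record = 0
        while True:
            header = handle.readline()
            if not header:
                return None  # clean end of file
            record += 1
            seq = handle.readline().rstrip("\r\n")
            plus = handle.readline()
            qual = handle.readline().rstrip("\r\n")
            if not header.startswith("@"):
                return record, "header does not start with '@'"
            if not plus.startswith("+"):
                return record, "third line does not start with '+'"
            if len(seq) != len(qual):
                return record, "sequence and quality lengths differ"
```

Calling `first_bad_record("/data/lane1_read1.fastq.gz")` returns `None` when every record passes these three checks.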

BC tags with "none" value

Hi!

We found a BAM file, produced by Isaac 03.16.02.19, that contains BC tags with "none" value, which is invalid according to the standard, section 1.3. The barcodes appear to have been read from a CSV file.

This issue might be present in some of your other aligners.
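
Affected files can be found by scanning the SAM text form of the alignments, for example the output of `samtools view -h`. The following is a minimal sketch; `suspicious_bc_values` is a hypothetical helper, and the pattern it uses (base letters, with multiple barcodes joined by hyphens) is a simplifying assumption about what a valid BC value looks like, not a quotation of the standard.

```python
import re

# Assumed shape of a valid value: ACGTN runs, optionally hyphen-separated.
BARCODE = re.compile(r"^[ACGTN]+(-[ACGTN]+)*$")


def suspicious_bc_values(sam_lines):
    """Return (read_name, value) pairs whose BC:Z tag is not barcode-like."""
    bad = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        # Optional tags start after the 11 mandatory SAM columns.
        for field in fields[11:]:
            if field.startswith("BC:Z:") and not BARCODE.match(field[5:]):
                bad.append((fields[0], field[5:]))
    return bad
```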

[help] the correct way to use paired fastq.gz as input.

Hi,

I started an alignment using Isaac3, but it keeps giving me errors:

Dynamic exception type: boost::exception_detail::clone_impl<isaac::common::InvalidOptionException>
std::exception::what:
*** Could not find any fastq lanes in: "/datadir" ***

I use Docker to run isaac3, and the command looks like:

docker run --rm \
    -v ..../test_illumina_pipeline/WGC087349D:/out_dir \
    -v .../ftp.broadinstitute.org/b37/issac3_index:/ref \
    -v .../WGC087349D:/datadir \
    -w /out_dir \
    isaac3:03.16.12.05 isaac-align \
    -b /datadir \
    -m 40 \
    --base-calls-format fastq-gz \
    -r /ref \
    --enable-numa \
    -j 4 \
    -o /out_dir

More specifically, this script contains the command I used: https://github.com/LuyiTian/NGS_docker/blob/master/pipeline/pipe_issac3.py

My fastq.gz files (XXX_combined_R1.fastq.gz, XXX_combined_R2.fastq.gz) do not follow the specified naming format, so I symlinked them to lane1_read1.fastq.gz and lane1_read2.fastq.gz in the same folder, but I still got the error. I am not sure if this is the right way, because my fastq files contain all lanes.

Another question is about the isaac3 index. It takes about 1 TB of space to store the index, and most of it is in the Temp folder. Can I safely delete that folder after the index is built? I also think it is worth mentioning in the Readme that building a human reference requires at least 1 TB of space. It took more than 2 days to build on our 32-core server (with -j=1).

Kind Regards,
Luyi
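
One thing that may be worth checking here (a possibility, not a confirmed diagnosis): a symlink whose target is an absolute host path dangles inside a container where only /datadir is mounted. The following minimal sketch creates relative links inside the data directory itself. The `lane1_read1.fastq.gz` naming is taken from the posts on this page; `link_as_lane` is a hypothetical helper.

```python
import os


def link_as_lane(data_dir, read1, read2, lane=1):
    """Create lane<N>_read<M>.fastq.gz symlinks for two files inside data_dir.

    read1 and read2 are file names that already exist in data_dir.
    """
    links = []
    for read_number, source in ((1, read1), (2, read2)):
        if not os.path.isfile(os.path.join(data_dir, source)):
            raise FileNotFoundError(source)
        link_name = os.path.join(
            data_dir, f"lane{lane}_read{read_number}.fastq.gz")
        if os.path.islink(link_name):
            os.remove(link_name)  # replace a stale link, never a real file
        # Relative target: stays valid when data_dir is mounted elsewhere.
        os.symlink(source, link_name)
        links.append(link_name)
    return links
```

Because the link targets are plain file names, they resolve the same way on the host and under the /datadir mount point.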

ERROR: ***** Internal Program Error - assertion ([...]) failed in isaac::build::GapRealigner::GapChoice

Hi,

I have two PE125 fastq files (originally from bcl2fastq conversion) and wanted to map them to grcm38 with 40 threads. After ~90 min isaac crashed (reproducibly) with:

2016-09-09 12:48:09  [7f88fc664700] Loading unsorted data
2016-09-09 12:48:09  [7f88fc664700] Reading alignment records from BinMetadata(3id ReferencePosition(0:13680624:0f)bs 6840312bl 492742718ds 0do 0se 888256rs 869166f 
/scratch/cluster/mx/test_mouse/20160909_111202.mapping.mx.cpu40.vig1/3.ext_L7255-3_SJL.grcm38/temp/bin-00000001-00000003.dat)
2016-09-09 12:48:09  [7f88f865c700] ERROR: ***** Internal Program Error - assertion (undoneAlignmentPos <= int64_t(undoPivotGap.getEndPos(false).getPosition())) failed 
in isaac::build::GapRealigner::GapChoice isaac::build::GapRealigner::findBetterGapsChoice(const isaac::build::gapRealigner::GapsRange&, const isaac::reference::ReferencePosition&, const isaac::reference::ReferencePosition&, const ContigList&, const isaac::io::FragmentAccessor&, const isaac::build::PackedFragmentBuffer::Index&, unsigned int&):/scratch/local2/build/illumina/Isaac3/src/c++/lib/build/GapRealigner.cpp(1092): 
undoPivotPos pos ReferencePosition(0:3309721:0f) overlapped by an existing deletion PackedFragmentBuffer::Index(ReferencePosition(0:3309721:0f),271963004do 271963291mdo, 22M25D102M)

/scratch/cluster/mx/test_mouse/20160909_111202.mapping.mx.cpu40.vig1/3.ext_L7255-3_SJL.grcm38/run.3.isaac.sh: line 11: 
118927 Segmentation fault      
/package/sequencer/illumina/isaac/current/bin/isaac-align 
--base-calls /scratch/cluster/mx/test_mouse/20160909_111202.mapping.mx.cpu40.vig1/3.ext_L7255-3_SJL.grcm38
--base-calls-format fastq-gz 
--default-adapters Standard 
--reference-genome /project/genomes/Mus_musculus/NCBI/GRCm38/Sequence/iSAACIndex/sorted-reference.xml 
--realign-gaps sample
--scatter-repeats 1
--single-library-samples 0 
--keep-duplicates 1 
--bam-header-tag "@RG\tID:ext_L7255-3_SJL\tLB:ext_L7255-3_SJL\tSM:ext_L7255-3_SJL\tPL:illumina\tCN:MPIMG\tDS:xxx, Mouse Whole Genome Sequencing (WGS)" 
--keep-unaligned back 
--lane-number-max 3 
--mark-duplicates 1 
--clip-overlapping 1 
--clip-semialigned 0 
--description "xxx, Mouse Whole Genome Sequencing (WGS)" --jobs 40 
--memory-limit 80 
--input-concurrent-load 40 
--temp-concurrent-load 8 
--output-concurrent-save 40 
--temp-concurrent-save 40 
--cleanup-intermediary 1 
--verbosity 3 
--output-directory /scratch/cluster/mx/test_mouse/20160909_111202.mapping.mx.cpu40.vig1/3.ext_L7255-3_SJL.grcm38 
--temp-directory /scratch/cluster/mx/test_mouse/20160909_111202.mapping.mx.cpu40.vig1/3.ext_L7255-3_SJL.grcm38/temp 
--realign-vigorously 1

Changing --realign-vigorously from 1 to 0 (which is the default) works like a charm.

This is version iSAAC-03.16.06.06.

best,
Sven

[Question]: Isaac index creation (grch38/m38)

Hi,

Creating indices for grch38 and grcm38 left me with some open questions.

I have run index creation as follows (mask-width 0 is the default, I just put it there as a "reminder" for future index creation runs):

isaac-sort-reference \
  --output-directory iSAACindex \
  --jobs 1 \
  --mask-width 0 \
  --genome-file genome.fa

That left me with exactly 3 files and a 1.1 TiB Temp folder:

-rw-rw-r-- 1 klages klages 618M 2016.08.26 01:05:08 2repeatness.8bpb.gz
-rw-rw-r-- 1 klages klages 678M 2016.08.25 22:19:13 2uniqueness.8bpb.gz
-rw-rw-r-- 1 klages klages 108K 2016.08.26 01:05:09 sorted-reference.xml
drwxrwxr-x 2 klages klages 8.0K 2016.08.26 01:05:09 Temp

make reported

[all]    INFO: All done!

At least it is "packable" by isaac-pack-reference.

Compare that to hg19-packed-reference.tar.gz from BaseSpace, which contains:

-rwxr-x--- rpetrovski/aladdin 644685308 2014-11-19 21:38 2uniqueness.16bpb.gz
-rw-r--r-- rpetrovski/aladdin 386961748 2014-11-20 13:03 neighbors-1or2-16.1bpb
-rw-r--r-- rpetrovski/aladdin 386961748 2014-11-20 13:06 neighbors-1or2-32.1bpb
-rwxr-xr-- rpetrovski/aladdin 3157608038 2014-11-20 12:53 genome.fa
-rw-r--r-- rpetrovski/aladdin      48044 2014-11-20 12:54 sorted-reference.xml

  • Is that a complete and valid index?
  • Do I still need Temp for any task after index creation?
  • What are the differences compared to isaac2 indices?
  • It would be nice to have some grch38/grcm38 indexes on BaseSpace.

isaac-sort-reference --version
iSAAC-03.16.06.06

best,
Sven

isaac: reduce I/O on himem machines

Hi,
I want to distribute different isaac (human/mouse) mappings across different servers, all writing to the same (high-performance) storage system during alignment. That is our computing setup.
We have a lot of servers with more than 500G RAM.

How do I choose the concurrency parameters to produce as little I/O as possible?
I see that multiple jobs on multiple servers dramatically degrade I/O performance on the shared filesystem.

I also see a lot of warnings like this in isaac's output:

2016-09-16 00:17:48  [7f866ebe5700] WARNING: Holding up processing of bin: BinMetadata(200id ReferencePosition(15:40562784:0f)bs 10140696bl 708874476ds 0do 0se 1277975rs 1250093f /path/to/temp/bin-00000016-00000200.dat) until std::bad_alloc clears. Error data: 982017348

2016-09-16 00:19:39  [7f88621da700] WARNING: Holding up processing of bin: BinMetadata(217id ReferencePosition(16:60844176:0f)bs 10140696bl 796554434ds 0do 0se 1434240rs 1408124f /path/to/temp/bin-00000017-00000217.dat) until std::bad_alloc clears. Error data: 1103634210

2016-09-16 00:25:44  [7f86703e8700] WARNING: Holding up processing of bin: /path/to/temp/bin-00000000-00000000.dat until a load slot is available

2016-09-16 00:26:03  [7f86713ea700] WARNING: Holding up processing of bin: /path/to/temp/bin-00000000-00000000.dat until a load slot is available

and runtimes of more than 10h (usually 2h at most).

This happened with:

--input-concurrent-load 10 
--temp-concurrent-load 10 
--output-concurrent-save 4 
--temp-concurrent-save 4

I tried different concurrency settings, with similar results.

So when I run isaac with one or two jobs, everything is fine. When I try to run 20 jobs on 10 servers writing to the same filesystem, performance decreases dramatically.
So I'd like to optimise the concurrency parameters to reduce file I/O as far as possible.

Any idea what to do? I could use /dev/shm, but I'd really like to avoid that.
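
When comparing concurrency settings, it may help to tally why bins are being held up, since the warnings above name two different reasons (a `std::bad_alloc` that has to clear, and a load slot that has to become available). The following is a minimal sketch, assuming the warning format shown above; `hold_reasons` is a hypothetical helper.

```python
import re
from collections import Counter

# Captures the text after "until", dropping a trailing ". Error data: N".
HOLD = re.compile(
    r"WARNING: Holding up processing of bin: .* until (.+?)"
    r"(?:\. Error data: \d+)?$")


def hold_reasons(log_lines):
    """Count 'Holding up processing of bin' warnings by stated reason."""
    reasons = Counter()
    for line in log_lines:
        match = HOLD.search(line.strip())
        if match:
            reasons[match.group(1)] += 1
    return reasons
```

Comparing these counts between a run with one or two jobs and a run with 20 jobs would show whether the slow runs are dominated by memory-related hold-ups or by waits for load slots.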
