chanzuckerberg / idseq-workflows
Portable WDL workflows for IDseq production pipelines
Home Page: https://idseq.net/
License: MIT License
I noticed that you're switching from a homebrew DAG processor to WDL and was wondering if you did an evaluation of the various workflow languages/processors as part of that process. If so, is that evaluation available anywhere?
I've got a project that will be tackling a similar evaluation soon.
Hey @rzlim08, I successfully installed the latest idseq-workflows. Now, when I run the following command, the primer, genome, and other files start downloading every time, and the run takes hours.
time miniwdl run --verbose idseq-workflows/consensus-genome/run.wdl docker_image_id=idseq-consensus-genome fastqs_0= SARSCoV2_firstBatch/S11_L001_R1_001.fastq.gz fastqs_1= SARSCoV2_firstBatch/S11_L001_R2_001.fastq.gz sample= S11 technology=Illumina ref_fasta=s3://idseq-public-references/consensus-genome/MN908947.3.fa -i idseq-workflows/consensus-genome/test/local_test.yml --debug
In fact, I already have all the files downloaded in /tmp/miniwdl_download_cache/files/s3/idseq-public-references/_consensus-genome, but the pipeline starts downloading them all again and aborts with an error (sometimes kraken_coronavirus_db_only.tar.gz is not found, sometimes hg38.fa.gz). I manually copied the essential files into place, but that did not help.
PS: I always run export MINIWDL__DOWNLOAD_CACHE__DIR=/tmp/miniwdl_download_cache before running the main command above.
Kindly help.
The previous version of idseq-workflows was working fine on my workstation, but I am having difficulties with the latest update.
I installed miniwdl and all other dependencies following the steps on the GitHub page. When I tried to run the consensus genome test example using this command:
miniwdl run --verbose consensus-genome/run.wdl docker_image_id=idseq-consensus-genome fastqs_0=idseq-workflows/consensus-genome/test/sample_sars-cov-2_paired_r1.fastq.gz fastqs_1=idseq-workflows/consensus-genome/test/sample_sars-cov-2_paired_r2.fastq.gz sample=sample_sars-cov-2_paired technology=Illumina -i idseq-workflows/consensus-genome/test/local_test.yml
I am getting the following error:
miniwdl-run docker task rejected, desired state shutdown: invalid bind mount source, must be an absolute path: /tmp/miniwdl_download_cache/files/s3/idseq-public-references/_consensus-genome/human_chr1.fa :: error: "RuntimeError", dir: "/home/samiahkanwar/Desktop/AKU_System/IDSeqPipeline_9Jul2021/idseq-workflows/20210713_122226_consensus_genome", from_dir: "/home/samiahkanwar/Desktop/AKU_System/IDSeqPipeline_9Jul2021/idseq-workflows/20210713_122226_consensus_genome/call-RemoveHost"
Kindly help me in this regard. I can provide further information if needed.
Hi @mlin, yesterday I cloned the updated idseq-workflows but got an error while running the following command:
docker build -t idseq-consensus-genome idseq-workflows/consensus-genome
The error is at step 12/17:
RUN apt-get install -y python3-cffi python3-h5py python3-intervaltree python3-edlib muscle git
---> Running in 819cc4b4d14a
Get:41 http://us-west-2.ec2.archive.ubuntu.com/ubuntu focal/universe amd64 python3-h5py amd64 2.10.0-2build2 [873 kB]
Get:42 http://us-west-2.ec2.archive.ubuntu.com/ubuntu focal/main amd64 python3-sortedcontainers all 2.1.0-2 [27.3 kB]
Get:43 http://us-west-2.ec2.archive.ubuntu.com/ubuntu focal/universe amd64 python3-intervaltree all 3.0.2-1.1 [22.4 kB]
Fetched 17.3 MB in 46s (377 kB/s)
E: Failed to fetch http://security.ubuntu.com/ubuntu/pool/main/g/git/git-man_2.25.1-1ubuntu3.1_all.deb 404 Not Found [IP: 34.210.25.51 80]
E: Failed to fetch http://security.ubuntu.com/ubuntu/pool/main/g/git/git_2.25.1-1ubuntu3.1_amd64.deb 404 Not Found [IP: 34.210.25.51 80]
E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?
The command '/bin/sh -c apt-get install -y python3-cffi python3-h5py python3-intervaltree python3-edlib muscle git' returned a non-zero code: 100
My workstation is fully updated and upgraded. I retried after installing the packages manually but still got the same error. I have attached the detailed error file.
My workstation's specs are:
OS Name: Ubuntu 21.04
OS Type: 64-bit
Thank you in advance.
To run the workflow on the full metagenomics databases used by IDseq, we recommend starting with an Amazon EC2 r5d.24xlarge - you may want to say that this EC2 instance size includes
Also which GCP instance type (and size) would be recommended for this test? - https://cloud.google.com/compute/docs/machine-types
Assertion: The maximum e-value for alignments in IDseq is 1.
Implementation Details:
The maximum e-value threshold filter is applied in two different locations within the code base: the iterate_m8() function in the .m8 utils, and PipelineStepBlastContigs.
We expect that there may be alignments with e-values > 1 in the initial alignment files (gsnap.m8, rapsearch2.m8, gsnap.blast.m8, rapsearch2.blast.m8).
The filter is then applied to the raw .m8 results when parsing for the top hits. There should never be e-values > 1 in the following files:
This was implemented as part of chanzuckerberg/czid-dag#309
Test Sample:
This was tested on staging using benchmark sample UnAmbiguouslyMapped_ds.gut. In particular, staging sample ID 19379 was run prior to the fix, and staging sample ID 19361 was run after the fix.
For example, in sample 19361:
gsnap.m8 has 32 rows with e-value > 1, but gsnap.deduped.m8 has zero.
rapsearch2.m8 has 45 rows with e-value > 1, but rapsearch2.deduped.m8 has zero.
rapsearch2.blast.m8 has 5172 rows with e-value > 1, but rapsearch2.blast.top.m8 has zero.
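The kind of check described above can be sketched in a few lines of Python. This is only an illustration, not the code from iterate_m8() or PipelineStepBlastContigs: the function names, the sample rows, and the assumption that the e-value sits in the 11th column of a 12-column BLAST-style tabular file are all mine.

```python
import csv

MAX_EVALUE = 1.0    # threshold stated in the assertion above
EVALUE_COLUMN = 10  # assumption: 0-based position of the e-value in 12-column tabular output


def _alignment_rows(lines):
    """Yield parsed tab-separated rows, skipping blank lines and '#' comments."""
    for row in csv.reader(lines, delimiter="\t"):
        if row and not row[0].startswith("#"):
            yield row


def count_rows_above_threshold(lines, max_evalue=MAX_EVALUE):
    """Count alignment rows whose e-value is strictly greater than max_evalue."""
    return sum(
        1 for row in _alignment_rows(lines) if float(row[EVALUE_COLUMN]) > max_evalue
    )


def filter_rows(lines, max_evalue=MAX_EVALUE):
    """Yield only rows whose e-value is at or below max_evalue."""
    for row in _alignment_rows(lines):
        if float(row[EVALUE_COLUMN]) <= max_evalue:
            yield row


# Made-up rows for illustration; not taken from sample 19361.
SAMPLE = "\n".join([
    "read1\tNC_1\t98.0\t100\t2\t0\t1\t100\t5\t104\t1e-30\t180",
    "read2\tNC_2\t80.0\t30\t6\t0\t1\t30\t9\t38\t3.2\t22",
    "read3\tNC_3\t85.0\t40\t6\t0\t1\t40\t9\t48\t1\t25",
])

if __name__ == "__main__":
    rows = SAMPLE.splitlines()
    print(count_rows_above_threshold(rows))
    print([r[0] for r in filter_rows(rows)])
```

Running count_rows_above_threshold on a raw file and on its filtered counterpart is one way to reproduce the row counts quoted above. A row with an e-value of exactly 1 is kept, since the assertion only excludes values greater than 1.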
Hi everybody,
I am working on packaging idseq-dag for Debian, but the tests are failing.
Beyond that, I am trying to run the tests locally, but I get this error:
sudo python3 -m unittest tests/test_samples_on_local_steps.py
E{"time": "2020-04-21T22:09:03.245", "data": {"event": "ctx_exec", "context_name": "command.make_dirs", "uid": "3f3533f1285b", "values": {"path": "/mnt/idseq/results/star_out/257549"}, "duration_ms": 13}, "thread": "MainThread", "pid": 257549, "level": "INFO"}
{"time": "2020-04-21T22:09:03.246", "data": {"event": "ctx_exec", "context_name": "command.make_dirs", "uid": "1441993ea123", "values": {"path": "/mnt/idseq/ref"}, "duration_ms": 0}, "thread": "MainThread", "pid": 257549, "level": "INFO"}
E
======================================================================
ERROR: test_all_local_steps (tests.test_samples_on_local_steps.TestSamplesOnLocalSteps)
----------------------------------------------------------------------
TypeError: test_all_local_steps() missing 3 required positional arguments: 'dag_file', 'test_bundle', and 'output_dir_s3'
======================================================================
ERROR: test_many_samples (tests.test_samples_on_local_steps.TestSamplesOnLocalSteps)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/eamanu/Debian/idseq-dag/github/idseq-dag/tests/test_samples_on_local_steps.py", line 37, in test_many_samples
self.test_all_local_steps(dag_file, bundle, output_dir_s3)
File "/home/eamanu/Debian/idseq-dag/github/idseq-dag/tests/test_samples_on_local_steps.py", line 55, in test_all_local_steps
step_class, step_name, dag_file, test_bundle, output_dir_s3)
File "/home/eamanu/Debian/idseq-dag/github/idseq-dag/tests/test_utils.py", line 84, in run_step_and_match_outputs
test_bundle, output_dir_s3)
File "/home/eamanu/Debian/idseq-dag/github/idseq-dag/tests/idseq_step_setup.py", line 95, in get_test_step_object
result_dir_local)
File "/home/eamanu/Debian/idseq-dag/github/idseq-dag/idseq_dag/engine/pipeline_flow.py", line 173, in fetch_input_files_from_s3
output_file = idseq_dag.util.s3.fetch_from_s3(s3_file, local_dir, allow_s3mi=True)
File "/home/eamanu/Debian/idseq-dag/github/idseq-dag/idseq_dag/util/s3.py", line 299, in fetch_from_s3
if is_reference or os.path.abspath(dst).startswith(config["REF_DIR"]):
TypeError: startswith first arg must be str or a tuple of str, not NoneType
----------------------------------------------------------------------
Ran 2 tests in 0.017s
FAILED (errors=2)
Looking at the code, it seems that PipelineFlow sets config['REF_DIR']:
idseq_dag.util.s3.config["REF_DIR"] = self.ref_dir_local
but that configuration would need to be stored on the PipelineFlow itself (or persisted some other way) to survive. For that reason, when the test runs fetch_from_s3, config still holds its default value.
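To make the failure mode concrete, here is a minimal sketch that reproduces the TypeError from the traceback and shows one possible guard. The local config dict is only a stand-in for idseq_dag.util.s3.config, and is_under_ref_dir is a hypothetical helper, not part of idseq-dag. It is not a proposed patch.

```python
import os

# Stand-in for the module-level dict in idseq_dag.util.s3 (assumption:
# REF_DIR is None until a PipelineFlow assigns it, as the traceback suggests).
config = {"REF_DIR": None}


def is_under_ref_dir(dst, ref_dir):
    """Return True if dst lies under ref_dir; an unset ref_dir counts as 'no'."""
    if ref_dir is None:
        return False
    return os.path.abspath(dst).startswith(os.path.abspath(ref_dir))


# Reproduce the failure: str.startswith rejects None as its first argument.
raised = False
try:
    os.path.abspath("/mnt/idseq/ref/x.fa").startswith(config["REF_DIR"])
except TypeError:
    raised = True

if __name__ == "__main__":
    print(raised)
    print(is_under_ref_dir("/mnt/idseq/ref/x.fa", config["REF_DIR"]))
    print(is_under_ref_dir("/mnt/idseq/ref/x.fa", "/mnt/idseq/ref"))
```

A guard like this only hides the symptom. Whether the tests should set REF_DIR themselves or PipelineFlow should carry it is a design question for the maintainers.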