Comments (7)
@MiWitt, can you please send the full command here for each step? It seems like you have 8 files and you are setting @1?
I do not run it step by step. I run "run_deepvariant". This is my command:
singularity run -B /usr/lib/locale/:/usr/lib/locale/ \
/my/path/software/deepVariant/deepvariant_${BIN_VERSION}.sif \
/opt/deepvariant/bin/run_deepvariant \
--model_type=PACBIO \
--ref=${THEREF} \
--reads="${ALIGNMENTNAME}.bam" \
--sample_name=${SAMPLENAME} \
--output_vcf="./${ALIGNMENTNAME}.deepVariant.vcf.gz" \
--output_gvcf="./${ALIGNMENTNAME}.deepVariant.g.vcf.gz" \
--intermediate_results_dir . \
--num_shards=8 \
--logging_dir=.
I have now added the following command as a workaround for the problem:
if ! [ -f "./${ALIGNMENTNAME}.deepVariant.vcf.gz" ]
then
singularity run -B /usr/lib/locale/:/usr/lib/locale/ \
/my/path/software/deepVariant/deepvariant_${BIN_VERSION}.sif \
/opt/deepvariant/bin/postprocess_variants \
--ref="${THEREF}" \
--infile "./call_variants_output@$(ls ./call_variants_output*.tfrecord.gz | wc -l).tfrecord.gz" \
--outfile "./${ALIGNMENTNAME}.deepVariant.vcf.gz" \
--cpus "8" \
--gvcf_outfile "./${ALIGNMENTNAME}.deepVariant.g.vcf.gz" \
--nonvariant_site_tfrecord_path "./gvcf.tfrecord@$(ls ./gvcf.tfrecord*.gz | wc -l).gz" \
--sample_name=${SAMPLENAME}
fi
Eventually this workaround sets --infile to "./call_variants_output@N.tfrecord.gz" and --nonvariant_site_tfrecord_path to "./gvcf.tfrecord@N.gz", where N is the number of files each ls matches (see directory listing above).
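Counting files with ls | wc -l only gives the right N if the directory holds exactly one set of shards and nothing else matching the glob. A minimal bash sketch of a stricter variant, assuming the shard files follow the name-XXXXX-of-YYYYY naming visible in the call_variants log below (for example make_examples.tfrecord-00000-of-00008.gz): it reads N from the -of- suffix and fails, rather than guessing, if more than one shard total is present. shard_spec is a hypothetical helper, not part of DeepVariant.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of DeepVariant): build a sharded spec such as
# "./gvcf.tfrecord@8.gz" from files named "gvcf.tfrecord-00000-of-00008.gz".
# N is read from the "-of-NNNNN" suffix instead of counting files with ls.
shard_spec() {
  local dir="$1" name="$2" ext="$3"
  local f base mid
  local -A totals=()
  local nullglob_was_set=0
  shopt -q nullglob && nullglob_was_set=1
  shopt -s nullglob
  for f in "$dir/$name"-*-of-*"$ext"; do
    base=${f##*/}
    mid=${base#"$name"-}   # e.g. "00000-of-00008.gz"
    mid=${mid%"$ext"}      # e.g. "00000-of-00008"
    if [[ $mid =~ ^[0-9]+-of-([0-9]+)$ ]]; then
      # Record each distinct shard total (10# avoids octal parsing of "00008").
      totals[$((10#${BASH_REMATCH[1]}))]=1
    fi
  done
  (( nullglob_was_set )) || shopt -u nullglob
  if (( ${#totals[@]} != 1 )); then
    echo "shard_spec: expected one shard total for $name, found ${#totals[@]}" >&2
    return 1
  fi
  local n=("${!totals[@]}")
  echo "$dir/$name@${n[0]}$ext"
}
```

For example, --nonvariant_site_tfrecord_path "$(shard_spec . gvcf.tfrecord .gz)" would resolve to ./gvcf.tfrecord@8.gz when exactly eight gvcf shards of one run are present, and would stop with an error if leftovers from another run with a different shard count were mixed in.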
I extracted the three commands (make_examples, call_variants and postprocess_variants) from the output. Here they are:
seq 0 7 | parallel -q --halt 2 --line-buffer /opt/deepvariant/bin/make_examples --mode calling --ref "stdchroms.hg38.fa" --reads "SAMPLENAME.bam" --examples "./make_examples.tfrecord@8.gz" --add_hp_channel --alt_aligned_pileup "diff_channels" --gvcf "./gvcf.tfrecord@8.gz" --max_reads_per_partition "600" --min_mapping_quality "1" --parse_sam_aux_fields --partition_size "25000" --phase_reads --pileup_image_width "199" --norealign_reads --sample_name "SAMPLENAME" --sort_by_haplotypes --track_ref_reads --vsc_min_fraction_indels "0.12" --task {}
/opt/deepvariant/bin/call_variants --outfile "./call_variants_output.tfrecord.gz" --examples "./make_examples.tfrecord@8.gz" --checkpoint "/opt/models/pacbio"
/opt/deepvariant/bin/postprocess_variants --ref "stdchroms.hg38.fa" --infile "./call_variants_output.tfrecord.gz" --outfile "./SAMPLENAME.deepVariant.vcf.gz" --cpus "8" --gvcf_outfile "./SAMPLENAME.deepVariant.g.vcf.gz" --nonvariant_site_tfrecord_path "./gvcf.tfrecord@8.gz" --sample_name "SAMPLENAME"
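In these commands, a path of the form name@N.ext is a sharded file spec: it stands for N files, and the call_variants log below refers to one of them as make_examples.tfrecord-00000-of-00008.gz. That is also why make_examples is run as seq 0 7 | parallel ... --task {}, one task per shard. A small bash sketch of the expansion, assuming the five-digit zero padding seen in that log; expand_spec is a hypothetical helper for checking which files a spec refers to, not a DeepVariant tool.

```bash
#!/usr/bin/env bash
# Hypothetical helper: expand "prefix@N.ext" into the N shard file names
# "prefix-00000-of-0000N.ext" ... (5-digit zero padding, as in the log).
expand_spec() {
  local spec="$1"
  # Greedy (.*) stops at the last "@" that is followed by digits.
  local re='^(.*)@([0-9]+)(\..*)?$'
  if [[ ! $spec =~ $re ]]; then
    echo "expand_spec: not a sharded spec: $spec" >&2
    return 1
  fi
  local prefix="${BASH_REMATCH[1]}"
  local n=$((10#${BASH_REMATCH[2]}))
  local ext="${BASH_REMATCH[3]}"
  local i
  for (( i = 0; i < n; i++ )); do
    printf '%s-%05d-of-%05d%s\n' "$prefix" "$i" "$n" "$ext"
  done
}
```

Running ls $(expand_spec ./make_examples.tfrecord@8.gz) in the intermediate directory is one quick way to confirm that every shard a command expects actually exists.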
And here are the last two commands with their stdout:
***** Running the command:*****
time /opt/deepvariant/bin/call_variants --outfile "./call_variants_output.tfrecord.gz" --examples "./make_examples.tfrecord@8.gz" --checkpoint "/opt/models/pacbio"
/usr/local/lib/python3.8/dist-packages/tensorflow_addons/utils/tfa_eol_msg.py:23: UserWarning:
TensorFlow Addons (TFA) has ended development and introduction of new features.
TFA has entered a minimal maintenance and release mode until a planned end of life in May 2024.
Please modify downstream libraries to take dependencies from other repositories in our TensorFlow community (e.g. Keras, Keras-CV, and Keras-NLP).
For more information see: https://github.com/tensorflow/addons/issues/2807
warnings.warn(
I0510 12:13:42.483308 47501039724352 call_variants.py:563] Total 1 writing processes started.
I0510 12:13:42.487790 47501039724352 dv_utils.py:370] From ./make_examples.tfrecord-00000-of-00008.gz.example_info.json: Shape of input examples: [100, 199, 9], Channels of input examples: [1, 2, 3, 4, 5, 6, 7, 9, 10].
I0510 12:13:42.487916 47501039724352 call_variants.py:588] Shape of input examples: [100, 199, 9]
I0510 12:13:42.488451 47501039724352 call_variants.py:592] Use saved model: True
I0510 12:13:52.162126 47501039724352 dv_utils.py:370] From /opt/models/pacbio/example_info.json: Shape of input examples: [100, 199, 9], Channels of input examples: [1, 2, 3, 4, 5, 6, 7, 9, 10].
I0510 12:13:52.163805 47501039724352 dv_utils.py:370] From ./make_examples.tfrecord-00000-of-00008.gz.example_info.json: Shape of input examples: [100, 199, 9], Channels of input examples: [1, 2, 3, 4, 5, 6, 7, 9, 10].
I0510 12:13:56.551032 47501039724352 call_variants.py:716] Predicted 982 examples in 1 batches [0.419 sec per 100].
I0510 12:13:57.403082 47501039724352 call_variants.py:779] Complete: call_variants.
real 0m21.581s
user 1m40.583s
sys 0m15.744s
***** Running the command:*****
time /opt/deepvariant/bin/postprocess_variants --ref "stdchroms.hg38.fa" --infile "./call_variants_output.tfrecord.gz" --outfile "./SAMPLENAME.deepVariant.vcf.gz" --cpus "8" --gvcf_outfile "./SAMPLENAME.deepVariant.g.vcf.gz" --nonvariant_site_tfrecord_path "./gvcf.tfrecord@8.gz" --sample_name "SAMPLENAME"
Traceback (most recent call last):
  File "/tmp/Bazel.runfiles_0t8uq2zt/runfiles/com_google_deepvariant/deepvariant/postprocess_variants.py", line 1419, in <module>
    app.run(main)
  File "/tmp/Bazel.runfiles_0t8uq2zt/runfiles/absl_py/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/tmp/Bazel.runfiles_0t8uq2zt/runfiles/absl_py/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/tmp/Bazel.runfiles_0t8uq2zt/runfiles/com_google_deepvariant/deepvariant/postprocess_variants.py", line 1300, in main
    sample_name = get_sample_name()
  File "/tmp/Bazel.runfiles_0t8uq2zt/runfiles/com_google_deepvariant/deepvariant/postprocess_variants.py", line 1203, in get_sample_name
    _, record = get_cvo_paths_and_first_record()
  File "/tmp/Bazel.runfiles_0t8uq2zt/runfiles/com_google_deepvariant/deepvariant/postprocess_variants.py", line 1179, in get_cvo_paths_and_first_record
    raise ValueError(
ValueError: ('Found multiple file patterns in input filename space: ', './call_variants_output.tfrecord.gz')
real 0m4.925s
user 0m8.815s
sys 0m7.379s
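The ValueError says postprocess_variants found more than one file pattern behind ./call_variants_output.tfrecord.gz. A rough bash sketch for checking what the directory actually contains, assuming the -XXXXX-of-YYYYY shard naming shown in the log above: it prints one line per distinct pattern, collapsing shards to name@N.ext and listing any other matching file as-is. It only approximates the situation the error describes; it does not reproduce DeepVariant's own globbing logic, and list_patterns is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical diagnostic: list distinct file patterns for one output prefix.
# Sharded files "name-00000-of-00008.ext" collapse to "name@8.ext";
# any other matching file (e.g. an unsharded leftover) is printed as-is.
list_patterns() {
  local dir="$1" name="$2" ext="$3"
  local f base mid
  for f in "$dir/$name"*"$ext"; do
    [[ -e $f ]] || continue   # the glob matched nothing
    base=${f##*/}
    mid=${base#"$name"}
    mid=${mid%"$ext"}
    if [[ $mid =~ ^-[0-9]+-of-([0-9]+)$ ]]; then
      echo "$name@$((10#${BASH_REMATCH[1]}))$ext"
    else
      echo "$base"
    fi
  done | sort -u
}
```

Running list_patterns . call_variants_output .tfrecord.gz in the scratch directory and getting more than one line back would point to files from different runs or naming schemes sitting side by side.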
@MiWitt,
Given that you are using --intermediate_results_dir ., which writes all intermediate files to your current directory, running the same command multiple times will create multiple file patterns there. Can you please create a clean intermediate directory and pass it via --intermediate_results_dir /path/to/intermediate_dir? That should resolve the issue.
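A minimal sketch of that suggestion, assuming a Slurm job where ${TMPDIR} points at job-local scratch: create a new, empty directory per run with mktemp -d, so files from an earlier run can never sit next to the current run's shards. The helper name and the intermediate_results_ prefix are illustrative, not DeepVariant conventions.

```bash
#!/usr/bin/env bash
# Hypothetical helper: create a fresh, empty intermediate directory for one run.
# mktemp -d always creates a new, uniquely named directory, so shards from
# earlier runs cannot end up next to the current run's files.
fresh_intermediate_dir() {
  local parent="${1:-${TMPDIR:-/tmp}}" label="$2"
  mkdir -p "$parent" || return 1
  mktemp -d "$parent/intermediate_results_${label}.XXXXXX"
}
```

Usage would be along the lines of INTERMEDIATE_DIR=$(fresh_intermediate_dir "${TMPDIR}" "${ALIGNMENTNAME}") followed by --intermediate_results_dir "${INTERMEDIATE_DIR}" in the run_deepvariant call; the directory must also be visible inside the Singularity container, which may require a bind mount depending on the site configuration.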
This cannot be the cause. I am working in a cluster environment using Slurm, and the directory "." is the job-specific scratch directory, located at "/scratch/SlurmTMP/JobSpecificFolder" (${TMPDIR}):
cd ${TMPDIR}
BIN_VERSION="1.6.1"
module load singularity/3.5.2
#####################################################################
# singularity pull docker://google/deepvariant:"${BIN_VERSION}"
ulimit -u 10000 # https://stackoverflow.com/questions/52026652/openblas-blas-thread-init-pthread-create-resource-temporarily-unavailable/54746150#54746150
# --model_type=PACBIO \ ##Replace this string with exactly one of the following [WGS,WES,PACBIO,HYBRID_PACBIO_ILLUMINA]**
# docker://google/deepvariant:"${BIN_VERSION}" \
if ! [ -f "${WORKINDIR}/${ALIGNMENTNAME}.deepVariant.vcf.gz" ]
then
cp "${THEREF}"* ./
cp "${WORKINDIR}/${ALIGNMENTNAME}.bam"* .
chmod 666 `basename "${THEREF}"`*
chmod 666 "${ALIGNMENTNAME}.bam"*
singularity run -B /usr/lib/locale/:/usr/lib/locale/ \
/my/path/software/deepVariant/deepvariant_${BIN_VERSION}.sif \
/opt/deepvariant/bin/run_deepvariant \
--model_type=PACBIO \
--ref=`basename "${THEREF}"` \
--reads="${ALIGNMENTNAME}.bam" \
--sample_name=${SAMPLENAME} \
--output_vcf="./${ALIGNMENTNAME}.deepVariant.vcf.gz" \
--output_gvcf="./${ALIGNMENTNAME}.deepVariant.g.vcf.gz" \
--intermediate_results_dir . \
--num_shards=8 \
--logging_dir=.
if ! [ -f "./${ALIGNMENTNAME}.deepVariant.vcf.gz" ]
then
singularity run -B /usr/lib/locale/:/usr/lib/locale/ \
/my/path/software/deepVariant/deepvariant_${BIN_VERSION}.sif \
/opt/deepvariant/bin/postprocess_variants \
--ref=`basename "${THEREF}"` \
--infile "./call_variants_output@$(ls ./call_variants_output*.tfrecord.gz | wc -l).tfrecord.gz" \
--outfile "./${ALIGNMENTNAME}.deepVariant.vcf.gz" \
--cpus "8" \
--gvcf_outfile "./${ALIGNMENTNAME}.deepVariant.g.vcf.gz" \
--nonvariant_site_tfrecord_path "./gvcf.tfrecord@$(ls ./gvcf.tfrecord*.gz | wc -l).gz" \
--sample_name=${SAMPLENAME}
fi
cp *.log ${WORKINDIR}/
cp "./${ALIGNMENTNAME}.deepVariant.vcf.gz"* ${WORKINDIR}/
else
cp "${WORKINDIR}/${ALIGNMENTNAME}.deepVariant.vcf.gz"* .
fi
@MiWitt ,
Can you use --intermediate_results_dir ./intermediate_results_${ALIGNMENTNAME}? I am unsure why you are running postprocess_variants separately, but something must be overwriting the files or generating multiple file patterns in the same directory where you are saving everything. One way to debug this is to set --dry_run=true for each command and check whether the file paths in the printed commands match each other. Unfortunately, I don't have access to an HPC to replicate this issue. I tried running your script, but it has many missing variables.
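One way to combine both suggestions, sketched under assumptions: collect the run_deepvariant flags in a bash array so that a path built from ${ALIGNMENTNAME} always stays a single argument, and add --dry_run=true so run_deepvariant prints the per-step commands instead of running them. The flags are the ones already used in this thread; deepvariant_args and the DV_ARGS variable in the comments are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the run_deepvariant arguments one per line.
# Building them in an array keeps each value a single argument, even if
# the alignment name contains spaces.
deepvariant_args() {
  local ref="$1" alignment="$2" sample="$3"
  # --dry_run=true: run_deepvariant only prints the commands it would run.
  local args=(
    --model_type=PACBIO
    --ref="$ref"
    --reads="${alignment}.bam"
    --sample_name="$sample"
    --output_vcf="./${alignment}.deepVariant.vcf.gz"
    --output_gvcf="./${alignment}.deepVariant.g.vcf.gz"
    --intermediate_results_dir="./intermediate_results_${alignment}"
    --num_shards=8
    --logging_dir=.
    --dry_run=true
  )
  printf '%s\n' "${args[@]}"
}

# Possible usage inside the job script:
#   mapfile -t DV_ARGS < <(deepvariant_args "$(basename "${THEREF}")" "${ALIGNMENTNAME}" "${SAMPLENAME}")
#   singularity run -B /usr/lib/locale/:/usr/lib/locale/ \
#     /my/path/software/deepVariant/deepvariant_${BIN_VERSION}.sif \
#     /opt/deepvariant/bin/run_deepvariant "${DV_ARGS[@]}"
```

Dropping the --dry_run=true line once the printed make_examples, call_variants and postprocess_variants paths look consistent would turn the same call into a real run.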
Hi, do you have any updates on this issue?