
Comments (7)

kishwarshafin commented on May 31, 2024

@MiWitt, can you please send the full command here for each step? It seems like you have 8 files and you are setting @1?

from deepvariant.

MiWitt commented on May 31, 2024

I do not run it step by step. I run "run_deepvariant". This is my command:

 singularity run -B /usr/lib/locale/:/usr/lib/locale/ \
    /my/path/software/deepVariant/deepvariant_${BIN_VERSION}.sif \
    /opt/deepvariant/bin/run_deepvariant \
    --model_type=PACBIO \
    --ref=${THEREF} \
    --reads="${ALIGNMENTNAME}.bam" \
    --sample_name=${SAMPLENAME} \
    --output_vcf="./${ALIGNMENTNAME}.deepVariant.vcf.gz" \
    --output_gvcf="./${ALIGNMENTNAME}.deepVariant.g.vcf.gz" \
    --intermediate_results_dir . \
    --num_shards=8 \
    --logging_dir=.

I have now added the following command, which is a workaround for the problem ...

    if ! [ -f "./${ALIGNMENTNAME}.deepVariant.vcf.gz" ]
    then
       singularity run -B /usr/lib/locale/:/usr/lib/locale/ \
         /my/path/software/deepVariant/deepvariant_${BIN_VERSION}.sif \
         /opt/deepvariant/bin/postprocess_variants \
         --ref="${THEREF}" \
         --infile "./call_variants_output@$(ls ./call_variants_output*.tfrecord.gz | wc -l).tfrecord.gz" \
         --outfile "./${ALIGNMENTNAME}.deepVariant.vcf.gz" \
         --cpus "8" \
         --gvcf_outfile "./${ALIGNMENTNAME}.deepVariant.g.vcf.gz" \
         --nonvariant_site_tfrecord_path "./gvcf.tfrecord@$(ls ./gvcf.tfrecord*.gz | wc -l).gz" \
         --sample_name=${SAMPLENAME}
    fi

In the end, this workaround expands --infile to "./call_variants_output@<N>.tfrecord.gz" and --nonvariant_site_tfrecord_path to "./gvcf.tfrecord@<N>.gz", where <N> is the number of matching files counted by ls (see directory listing above).
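For readers unfamiliar with the "@N" notation: a sharded file spec such as "prefix@N.gz" stands for the N files "prefix-00000-of-0000N.gz" and so on, which is the naming that shows up in the call_variants log further down (make_examples.tfrecord-00000-of-00008.gz). The following is only a minimal bash sketch of that naming convention. The function name expand_sharded_spec is hypothetical, and this is not DeepVariant's own implementation.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DeepVariant): expand a sharded spec
# "prefix@N<suffix>" into the N names "prefix-XXXXX-of-NNNNN<suffix>".
expand_sharded_spec() {
  local spec="$1"
  local re='^(.*)@([0-9]+)(.*)$'
  if [[ "$spec" =~ $re ]]; then
    local prefix="${BASH_REMATCH[1]}"
    # Force base 10 so a count such as "08" is not parsed as octal.
    local total=$((10#${BASH_REMATCH[2]}))
    local suffix="${BASH_REMATCH[3]}"
    local i
    for ((i = 0; i < total; i++)); do
      printf '%s-%05d-of-%05d%s\n' "$prefix" "$i" "$total" "$suffix"
    done
  else
    # No "@N" part: treat the argument as a single plain file name.
    printf '%s\n' "$spec"
  fi
}
```

Calling expand_sharded_spec "./gvcf.tfrecord@8.gz" prints eight names, from ./gvcf.tfrecord-00000-of-00008.gz to ./gvcf.tfrecord-00007-of-00008.gz. Comparing that list with the files actually present in the directory shows whether the count obtained from "ls ... | wc -l" matches the shard count in the file names.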


MiWitt commented on May 31, 2024

I was able to extract the three commands (make_examples, call_variants and postprocess_variants) from the output. Here they are:

seq 0 7 | parallel -q --halt 2 --line-buffer /opt/deepvariant/bin/make_examples --mode calling --ref "stdchroms.hg38.fa" --reads "SAMPLENAME.bam" --examples "./make_examples.tfrecord@8.gz" --add_hp_channel --alt_aligned_pileup "diff_channels" --gvcf "./gvcf.tfrecord@8.gz" --max_reads_per_partition "600" --min_mapping_quality "1" --parse_sam_aux_fields --partition_size "25000" --phase_reads --pileup_image_width "199" --norealign_reads --sample_name "SAMPLENAME" --sort_by_haplotypes --track_ref_reads --vsc_min_fraction_indels "0.12" --task {}

/opt/deepvariant/bin/call_variants --outfile "./call_variants_output.tfrecord.gz" --examples "./make_examples.tfrecord@8.gz" --checkpoint "/opt/models/pacbio"

/opt/deepvariant/bin/postprocess_variants --ref "stdchroms.hg38.fa" --infile "./call_variants_output.tfrecord.gz" --outfile "./SAMPLENAME.deepVariant.vcf.gz" --cpus "8" --gvcf_outfile "./SAMPLENAME.deepVariant.g.vcf.gz" --nonvariant_site_tfrecord_path "./gvcf.tfrecord@8.gz" --sample_name "SAMPLENAME"

And here are the last two commands with their stdout:

***** Running the command:*****
time /opt/deepvariant/bin/call_variants --outfile "./call_variants_output.tfrecord.gz" --examples "./make_examples.tfrecord@8.gz" --checkpoint "/opt/models/pacbio"

/usr/local/lib/python3.8/dist-packages/tensorflow_addons/utils/tfa_eol_msg.py:23: UserWarning: 

TensorFlow Addons (TFA) has ended development and introduction of new features.
TFA has entered a minimal maintenance and release mode until a planned end of life in May 2024.
Please modify downstream libraries to take dependencies from other repositories in our TensorFlow community (e.g. Keras, Keras-CV, and Keras-NLP). 

For more information see: https://github.com/tensorflow/addons/issues/2807 

  warnings.warn(
I0510 12:13:42.483308 47501039724352 call_variants.py:563] Total 1 writing processes started.
I0510 12:13:42.487790 47501039724352 dv_utils.py:370] From ./make_examples.tfrecord-00000-of-00008.gz.example_info.json: Shape of input examples: [100, 199, 9], Channels of input examples: [1, 2, 3, 4, 5, 6, 7, 9, 10].
I0510 12:13:42.487916 47501039724352 call_variants.py:588] Shape of input examples: [100, 199, 9]
I0510 12:13:42.488451 47501039724352 call_variants.py:592] Use saved model: True
I0510 12:13:52.162126 47501039724352 dv_utils.py:370] From /opt/models/pacbio/example_info.json: Shape of input examples: [100, 199, 9], Channels of input examples: [1, 2, 3, 4, 5, 6, 7, 9, 10].
I0510 12:13:52.163805 47501039724352 dv_utils.py:370] From ./make_examples.tfrecord-00000-of-00008.gz.example_info.json: Shape of input examples: [100, 199, 9], Channels of input examples: [1, 2, 3, 4, 5, 6, 7, 9, 10].
I0510 12:13:56.551032 47501039724352 call_variants.py:716] Predicted 982 examples in 1 batches [0.419 sec per 100].
I0510 12:13:57.403082 47501039724352 call_variants.py:779] Complete: call_variants.

real	0m21.581s
user	1m40.583s
sys	0m15.744s

***** Running the command:*****
time /opt/deepvariant/bin/postprocess_variants --ref "stdchroms.hg38.fa" --infile "./call_variants_output.tfrecord.gz" --outfile "./SAMPLENAME.deepVariant.vcf.gz" --cpus "8" --gvcf_outfile "./SAMPLENAME.deepVariant.g.vcf.gz" --nonvariant_site_tfrecord_path "./gvcf.tfrecord@8.gz" --sample_name "SAMPLENAME"

Traceback (most recent call last):
  File "/tmp/Bazel.runfiles_0t8uq2zt/runfiles/com_google_deepvariant/deepvariant/postprocess_variants.py", line 1419, in <module>
    app.run(main)
  File "/tmp/Bazel.runfiles_0t8uq2zt/runfiles/absl_py/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/tmp/Bazel.runfiles_0t8uq2zt/runfiles/absl_py/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/tmp/Bazel.runfiles_0t8uq2zt/runfiles/com_google_deepvariant/deepvariant/postprocess_variants.py", line 1300, in main
    sample_name = get_sample_name()
  File "/tmp/Bazel.runfiles_0t8uq2zt/runfiles/com_google_deepvariant/deepvariant/postprocess_variants.py", line 1203, in get_sample_name
    _, record = get_cvo_paths_and_first_record()
  File "/tmp/Bazel.runfiles_0t8uq2zt/runfiles/com_google_deepvariant/deepvariant/postprocess_variants.py", line 1179, in get_cvo_paths_and_first_record
    raise ValueError(
ValueError: ('Found multiple file patterns in input filename space: ', './call_variants_output.tfrecord.gz')

real	0m4.925s
user	0m8.815s
sys	0m7.379s


kishwarshafin commented on May 31, 2024

@MiWitt ,

Since you are using "--intermediate_results_dir .", all intermediate files are written to your current directory. If you run the same command multiple times there, it will create multiple file patterns. Can you please create a clean intermediate directory and pass it as --intermediate_results_dir /path/to/intermediate_dir? That should resolve the issue.
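One way to get such a clean directory is sketched below in bash. The function name make_clean_intermediate_dir and the "dv_intermediate" prefix are made up for illustration, and the sketch assumes mktemp is available on the cluster nodes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create a fresh, empty directory for intermediate
# files and print its path. The base directory defaults to $TMPDIR and
# falls back to /tmp when TMPDIR is unset.
make_clean_intermediate_dir() {
  local base="${1:-${TMPDIR:-/tmp}}"
  # mktemp -d creates a new, uniquely named directory, so nothing left
  # over from an earlier run can already be inside it.
  mktemp -d "${base%/}/dv_intermediate.XXXXXX"
}
```

The printed path can then be stored, for example with INTERMEDIATE_DIR="$(make_clean_intermediate_dir)", and passed as --intermediate_results_dir "${INTERMEDIATE_DIR}" in place of "." in the run_deepvariant call.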


MiWitt commented on May 31, 2024

That cannot be the cause. I am working in a cluster environment using Slurm, and the directory "." is the job-specific scratch directory, which is located at "/scratch/SlurmTMP/JobSpecificFolder" (${TMPDIR}). This is the script:


cd ${TMPDIR}
BIN_VERSION="1.6.1"
module load singularity/3.5.2


#####################################################################
# singularity pull docker://google/deepvariant:"${BIN_VERSION}"


ulimit -u 10000 # https://stackoverflow.com/questions/52026652/openblas-blas-thread-init-pthread-create-resource-temporarily-unavailable/54746150#54746150

#  --model_type=PACBIO \ ##Replace this string with exactly one of the following [WGS,WES,PACBIO,HYBRID_PACBIO_ILLUMINA]
#  docker://google/deepvariant:"${BIN_VERSION}" \

if ! [ -f "${WORKINDIR}/${ALIGNMENTNAME}.deepVariant.vcf.gz" ]
then
  cp "${THEREF}"* ./
  cp "${WORKINDIR}/${ALIGNMENTNAME}.bam"* .
  chmod 666 `basename "${THEREF}"`*
  chmod 666 "${ALIGNMENTNAME}.bam"*
  singularity run -B /usr/lib/locale/:/usr/lib/locale/ \
    /my/path/software/deepVariant/deepvariant_${BIN_VERSION}.sif \
    /opt/deepvariant/bin/run_deepvariant \
    --model_type=PACBIO \
    --ref=`basename "${THEREF}"` \
    --reads="${ALIGNMENTNAME}.bam" \
    --sample_name=${SAMPLENAME} \
    --output_vcf="./${ALIGNMENTNAME}.deepVariant.vcf.gz" \
    --output_gvcf="./${ALIGNMENTNAME}.deepVariant.g.vcf.gz" \
    --intermediate_results_dir . \
    --num_shards=8 \
    --logging_dir=.
    
    if ! [ -f "./${ALIGNMENTNAME}.deepVariant.vcf.gz" ]
    then
       singularity run -B /usr/lib/locale/:/usr/lib/locale/ \
         /my/path/software/deepVariant/deepvariant_${BIN_VERSION}.sif \
         /opt/deepvariant/bin/postprocess_variants \
         --ref=`basename "${THEREF}"` \
         --infile "./call_variants_output@$(ls ./call_variants_output*.tfrecord.gz | wc -l).tfrecord.gz" \
         --outfile "./${ALIGNMENTNAME}.deepVariant.vcf.gz" \
         --cpus "8" \
         --gvcf_outfile "./${ALIGNMENTNAME}.deepVariant.g.vcf.gz" \
         --nonvariant_site_tfrecord_path "./gvcf.tfrecord@$(ls ./gvcf.tfrecord*.gz | wc -l).gz" \
         --sample_name=${SAMPLENAME}
    fi
    cp *.log ${WORKINDIR}/
    cp "./${ALIGNMENTNAME}.deepVariant.vcf.gz"* ${WORKINDIR}/
else
 cp "${WORKINDIR}/${ALIGNMENTNAME}.deepVariant.vcf.gz"* .
fi


kishwarshafin commented on May 31, 2024

@MiWitt ,

Can you use --intermediate_results_dir ./intermediate_results_${ALIGNMENTNAME}? I am unsure why you are running postprocessing separately, but something must be overwriting the files or generating multiple file patterns in the directory where you are saving everything. One way to debug this is to set --dry_run=true for each command, look at the outputs, and see whether they match each other. Unfortunately, I don't have access to an HPC to replicate this issue. I tried running your script, but it has many missing variables.
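To check whether several shard layouts really coexist in the working directory, something like the following bash sketch could be run after call_variants finishes. The function name list_shard_patterns is hypothetical, and the sketch only inspects file names. It makes no claim about how postprocess_variants itself decides that there are "multiple file patterns".

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one line per distinct shard layout found for
# files in DIR whose names start with PREFIX. Names containing
# "-XXXXX-of-NNNNN" are grouped by NNNNN; anything else counts as
# "unsharded".
list_shard_patterns() {
  local dir="$1" prefix="$2"
  local re='-[0-9]{5}-of-([0-9]{5})'
  local f name
  for f in "$dir"/"$prefix"*; do
    [ -e "$f" ] || continue   # the glob matched nothing
    name="${f##*/}"
    if [[ "$name" =~ $re ]]; then
      printf 'of-%s\n' "${BASH_REMATCH[1]}"
    else
      printf 'unsharded\n'
    fi
  done | sort -u
}
```

For example, list_shard_patterns . call_variants_output prints a single line when only one layout is present. More than one line means files from different shard counts, or a mix of sharded and unsharded files, share the same prefix in that directory.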


kishwarshafin commented on May 31, 2024

@MiWitt

Hi, do you have any updates on this issue?

