sbg / mitty
Seven Bridges Genomics aligner/caller debugging and analysis tools
License: Apache License 2.0
Hello,
I am trying to simulate reads with a template length distribution such that some have a template length that is shorter than the read length, meaning not all reads in the file should reach the full read length of 75bp that I am using. However, all reads in these files are 75bp.
As an example of the issue, I have provided a link to a Dropbox folder containing two files: the simulated reads and the read model used to create them. The template length is set to a mean of 50 and a std of 0, meaning that the DNA fragments should all be 50bp, but all of the reads are 75bp (the read length set in the model).
https://www.dropbox.com/sh/uz2zjo2ze33978f/AAC8OXPwwnOtevohZ5dv_qjka?dl=0
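A quick way to confirm the observation above is to tabulate read lengths directly from the FASTQ. The following is a minimal sketch, not part of Mitty: the helper names are hypothetical, and `expected_read_length` encodes only the poster's expectation (a read clipped at the end of its template), not documented Mitty behaviour.

```python
from collections import Counter
import io


def read_length_histogram(fastq_lines):
    """Count read lengths in an iterable of FASTQ lines (4 lines per record)."""
    lengths = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # second line of each record holds the bases
            lengths[len(line.strip())] += 1
    return lengths


def expected_read_length(template_length, read_length):
    """Assumption: a read cannot extend past the end of its template."""
    return min(template_length, read_length)


# Tiny in-memory FASTQ with one 8bp read and one 4bp read
example = io.StringIO("@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACGT\n+\nIIII\n")
print(read_length_histogram(example))
print(expected_read_length(50, 75))
```

For a real file, the same function can be fed from `gzip.open(path, "rt")`. If every read comes back at 75bp while the model's template length is 50, the histogram makes the mismatch explicit.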
While running the command
pip install git+https://github.com/sbg/Mitty.git
I am getting the following error:
error:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert protocol version
Details in the following screenshot
When I tried to install the packages listed in setup.py, I got the following error for the pysam installation
Googling pysam indicated that it is not supported on Windows 10, which is what I am using. Is that the reason for the cloning error?
conda version is 4.3.30
Python 3.5.6
Thanks
Hello,
I am trying to generate simulated human reads. I cut my VCF using mitty filter-variants; however, after indexing it, mitty generate-reads complains about some variants within the VCF file without specifying why or which ones.
mitty -v2 generate-reads \
~/Projects/Refs/ucsc.hg19/ucsc.hg19.fasta \
IN.vcf.gz \
SAMPLE \
IN.bed \
Custom-model.pkl \
20 \
7 \
>(gzip > r1.fq.gz) \
lq.txt \
--fastq2 >(gzip > r2.fq.gz) \
--threads 2
ERROR:mitty.lib.vcfio:Unusable variants present in VCF. Please filter or refactor these.
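Since the error does not name the offending records, one way to narrow them down is to classify every ALT allele in the VCF and list anything that is not a plain SNP, insertion or deletion. The sketch below is an assumption-laden illustration: which variant classes Mitty actually treats as unusable is not stated in this thread, and the function names and categories are hypothetical.

```python
def classify_variant(ref, alt):
    """Classify one REF/ALT pair into a coarse category."""
    if alt.startswith("<") or "[" in alt or "]" in alt or alt in ("*", "."):
        return "symbolic"  # e.g. <DEL>, breakends, spanning deletions
    if len(ref) == 1 and len(alt) == 1:
        return "snp"
    if len(ref) == 1 and len(alt) > 1 and alt[0] == ref:
        return "insertion"
    if len(ref) > 1 and len(alt) == 1 and ref[0] == alt:
        return "deletion"
    return "complex"  # anything else, e.g. multi-base substitutions


def flag_suspect_records(vcf_lines, keep=("snp", "insertion", "deletion")):
    """Yield (line_no, chrom, pos, ref, alt, kind) for alleles outside `keep`.

    ASSUMPTION: `keep` is a guess at what a read simulator can use.
    """
    for line_no, line in enumerate(vcf_lines, 1):
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts = fields[0], fields[1], fields[3], fields[4]
        for alt in alts.split(","):
            kind = classify_variant(ref, alt)
            if kind not in keep:
                yield line_no, chrom, int(pos), ref, alt, kind
```

Running `flag_suspect_records(gzip.open("IN.vcf.gz", "rt"))` over the filtered VCF would print candidate records to inspect by hand. It does not reproduce Mitty's own check.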
Hey guys, is there a packaged version of Mitty available to download? I seem to be running into versioning problems because some packages, such as pysam==0.10.0, are not available for x86.
I saw a mitty3 image in Docker, but it didn't work for me. Could you help?
In your "Fast and accurate genomic analyses using genome graphs" paper you use Mitty to simulate reads, but only from regions without masked reference sequence (repetitive regions). I'm just wondering, is this possible to do directly with Mitty, e.g. by providing it with a hard-masked reference fasta file, or are there any ways to run Mitty telling it to not simulate reads from masked regions? Asking because I'm trying to reproduce the read simulations from your paper, and I don't find any details about how Mitty was used for the experiments shown in the paper.
In advance, thanks!
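One possible workaround, sketched below, is to derive a BED file of unmasked intervals from a hard-masked FASTA and pass it as the region BED that generate-reads takes in the commands elsewhere in this thread. This is only an assumption about how the restriction could be achieved, not a description of what was done for the paper; the function names are hypothetical.

```python
import re


def unmasked_intervals(sequence, min_length=1):
    """Yield 0-based, half-open (start, end) runs that contain no N or n."""
    for match in re.finditer(r"[^Nn]+", sequence):
        if match.end() - match.start() >= min_length:
            yield match.start(), match.end()


def bed_lines(chrom, sequence, min_length=1):
    """Format the unmasked runs of one chromosome as BED lines."""
    return [
        "{}\t{}\t{}".format(chrom, start, end)
        for start, end in unmasked_intervals(sequence, min_length)
    ]


# Toy sequence: two unmasked runs separated by hard-masked bases
print(bed_lines("chr1", "NNACGTNNNACNN"))
```

Setting `min_length` to at least the template length would drop unmasked islands that are too short to hold a whole fragment. Whether Mitty keeps every read strictly inside the BED intervals is not confirmed here.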
Hi Caner! The program variant-bearing-reads should check for the truth data (as found in the qname) to check what variants are present, not the aligned CIGAR. Thanks!
I would like to give Mitty a try. Can I use Mitty on Windows 10?
Thanks
Probably use Circle CI as that is what I know how to do.
Hello,
Do we know which Illumina machine the model represents?
thank you!
Hi,
I have been trying to use the GodAligner to recover my true alignment of my generated reads.
However, when I run the following code, I get the error "IndexError: list index out of range". After some digging, I found that the variable ap2 is an empty list ("ap2=[]") at some point in the loop (line 200 of god_aligner.py), but I cannot figure out why.
Here is the command I used to create the reads (and it worked perfectly)
#!/usr/bin/env bash
set -ex
FASTA=../data/human_g1k_v37.fa.gz
SAMPLEVCF=../data/1kg.20.22.vcf.gz
REGION_BED=ch20.bed
FILTVCF=Ch20filt.vcf.gz
SAMPLENAME=HG00119
COVERAGE=2
READ_GEN_SEED=7
FASTQ_PREFIX=Ch20-PEreads
READ_CORRUPT_SEED=7
READMODEL=1kg-pcr-free.pkl
mitty -v4 filter-variants \
${SAMPLEVCF} \
${SAMPLENAME} \
${REGION_BED} \
- \
2> vcf-filter.log | bgzip -c > ${FILTVCF}
tabix -p vcf ${FILTVCF}
mitty -v4 generate-reads \
${FASTA} \
${FILTVCF} \
${SAMPLENAME} \
${REGION_BED} \
${READMODEL} \
${COVERAGE} \
${READ_GEN_SEED} \
>(gzip > ${FASTQ_PREFIX}1.fq.gz) \
${FASTQ_PREFIX}-lq.txt \
--fastq2 >(gzip > ${FASTQ_PREFIX}2.fq.gz) \
--threads 2
mitty -v4 corrupt-reads \
${READMODEL} \
${FASTQ_PREFIX}1.fq.gz >(gzip > ${FASTQ_PREFIX}-corrupt1.fq.gz) \
${FASTQ_PREFIX}-lq.txt \
${FASTQ_PREFIX}-corrupt-lq.txt \
${READ_CORRUPT_SEED} \
--fastq2-in ${FASTQ_PREFIX}2.fq.gz \
--fastq2-out >(gzip > ${FASTQ_PREFIX}-corrupt2.fq.gz) \
--threads 2
and here is the command I used to run GodAligner:
#!/usr/bin/env bash
set -ex
FASTA=../data/human_g1k_v37.fa.gz
FASTQ_PREFIX=Ch20-PEreads
GODBAM=Ch20-god.bam
DO_NOT_INDEX=${1}
mitty -v4 god-aligner \
${FASTA} \
${FASTQ_PREFIX}-corrupt1.fq.gz \
${FASTQ_PREFIX}-corrupt-lq.txt \
${GODBAM} \
--fastq2 ${FASTQ_PREFIX}-corrupt2.fq.gz \
--threads 2
The full error is as follows:
Process Process-1:
Traceback (most recent call last):
File "/Users/anaconda3/envs/mymitty/lib/python3.5/multiprocessing/process.py", line 252, in _bootstrap
self.run()
File "/Users/anaconda3/envs/mymitty/lib/python3.5/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/Users/anaconda3/envs/mymitty/lib/python3.5/site-packages/mitty/benchmarking/god_aligner.py", line 140, in disciple
write_perfect_reads(qname, rg_id, long_qname_table, ref_dict, read_data, cigar_v2, fp)
File "/Users/anaconda3/envs/mymitty/lib/python3.5/site-packages/mitty/benchmarking/god_aligner.py", line 206, in write_perfect_reads
p10, p11, p20, p21 = ap1[0][1], ap1[-1][1], ap2[0][1], ap2[-1][1]
IndexError: list index out of range
I am working on macOS using Mitty version 2.28.3.
Thanks a lot,
Adrien
EDIT: I tried simulating reads from another chromosome (Chr22) using the same pipelines (changed the seeds and coverage) and this time, everything went smoothly for the new dataset. However the previous one (Chr20) is still bugged.
Hi,
I've created a read model with the following script:
mitty create-read-model synth-illumina \
100.pkl \
--read-length 100 \
--mean-template-length 250 \
--std-template-length 20 \
--bq0 30 \
--k 200 \
--sigma 5
And when I check the read model in Mitty with the following:
mitty describe-read-model 100.pkl 100.png
It looks as expected:
But when I generate reads using the model with the following code:
k=HG00632
i=100
mitty -v4 generate-reads GRCh38.p12.fa \
./final_vcfs/${k}all.vcf.gz \
${k} all_merged_sorted.bed \
${i}.pkl \
40 \
7 \
${k}${i}reads-test.1.fq \
${k}${i}-lq.txt \
--fastq2 ${k}_${i}reads-test.2.fq \
2> vcf-${i}${k}.log
The generated reads have a flat BQ of 9 when I check them with FastQC:
And when I run the god-aligner to create a bam file, I can see in IGV that the reads are a mess. I've tried running different individuals, different read lengths but get the same pattern.
Have I misunderstood something with the read model generation?
Thank you very much for any help you can provide on the matter.
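To cross-check the FastQC result independently, the mean base quality per read position can be computed straight from the FASTQ. This is a minimal sketch that assumes Phred+33 encoding and strict four-line records; the function name is hypothetical and nothing here is part of Mitty.

```python
def mean_quality_by_position(fastq_lines, offset=33):
    """Return the mean Phred score at each read position.

    ASSUMPTION: qualities are Phred+`offset` encoded, 4 lines per record.
    """
    totals, counts = [], []
    for i, line in enumerate(fastq_lines):
        if i % 4 != 3:  # only the fourth line of each record holds qualities
            continue
        for pos, char in enumerate(line.strip()):
            if pos == len(totals):  # first time a read reaches this position
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(char) - offset
            counts[pos] += 1
    return [total / count for total, count in zip(totals, counts)]


# 'I' encodes Q40 and '*' encodes Q9 under Phred+33
records = ["@r1", "ACGT", "+", "IIII", "@r2", "AC", "+", "**"]
print(mean_quality_by_position(records))
```

A flat profile at 9 from this check would suggest the qualities in the file really are constant, rather than FastQC misreading the encoding.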
When I try running the god-aligner on generated reads I get the error below:
(mymitty)$ mitty -v4 god-aligner ~/Refs/ucsc.hg19/ucsc.hg19.fasta r1c.fq.gz lqc.txt perfectc.bam --fastq2 r2c.fq.gz --threads 2
Traceback (most recent call last):
File "/Users/u1/anaconda/envs/mymitty/bin/mitty", line 11, in <module>
load_entry_point('mitty==2.9.1.dev0', 'console_scripts', 'mitty')()
File "/Users/u1/anaconda/envs/mymitty/lib/python3.5/site-packages/click/core.py", line 722, in __call__
return self.main(*args, **kwargs)
File "/Users/u1/anaconda/envs/mymitty/lib/python3.5/site-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/Users/u1/anaconda/envs/mymitty/lib/python3.5/site-packages/click/core.py", line 1066, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Users/u1/anaconda/envs/mymitty/lib/python3.5/site-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/Users/smiao/anaconda/envs/mymitty/lib/python3.5/site-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "/Users/u1/anaconda/envs/mymitty/lib/python3.5/site-packages/mitty/cli.py", line 255, in god_aligner
import mitty.benchmarking.god_aligner as god
ImportError: No module named 'mitty.benchmarking'
I am running the following version:
(mymitty)$ mitty --version
mitty, version 2.9.1.dev0
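The traceback shows that this installation has no mitty.benchmarking subpackage, whereas the 2.28.3 traceback earlier in this thread has god_aligner.py under mitty/benchmarking. A small check like the following sketch can tell whether a given environment ships a module before running a command. The helper name is hypothetical and it uses only the standard library.

```python
import importlib.util


def has_module(dotted_name):
    """Return True if `dotted_name` can be located in the current environment."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ImportError:
        # Raised when a parent package of the dotted name is itself missing
        return False


# Demonstrated on standard-library names; in the environment above one
# would call has_module("mitty.benchmarking") instead.
print(has_module("json.decoder"))
print(has_module("json.no_such_submodule"))
```

Note that looking up a dotted name imports its parent package as a side effect.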
Please add a command line switch that allows us to either keep just the reference reads, or just the variant bearing reads from the original BAM. Thanks!