sartorlab / mint
A pipeline for the integration of DNA methylation and hydroxymethylation data
This is a rather annoying problem: the chromosome size file downloaded directly from UCSC works with every other bedtools subcommand, but causes some problem with bedtools complement. Googling the problem doesn't return anything useful.
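The thread does not identify the cause. One plausible culprit, and this is an assumption rather than something confirmed here, is that `bedtools complement` is sensitive to the genome file and the input BED using the same chromosome order, and that UCSC chromosome tables can carry extra columns. A minimal sketch that trims the size file to two columns and sorts both files with the same byte-wise collation; `normalize_genome` and `sort_bed` are hypothetical helpers, not part of mint.

```shell
#!/usr/bin/env bash
# Sketch: normalize a UCSC chromosome size file and a BED file so that both
# use the same (byte-wise) chromosome order before `bedtools complement`.
# Helper names are hypothetical; the ordering requirement is an assumption.
set -euo pipefail

# Keep only the chromosome name and length columns, sorted by name.
normalize_genome() {
  local in="$1" out="$2"
  cut -f1,2 "$in" | LC_ALL=C sort -k1,1 > "$out"
}

# Sort a BED file by chromosome (same collation as above), then by start.
sort_bed() {
  local in="$1" out="$2"
  LC_ALL=C sort -k1,1 -k2,2n "$in" > "$out"
}
```

With both files normalized, a call such as `bedtools complement -i sorted.bed -g genome.txt` sees a consistent chromosome order; if the error persists, the cause lies elsewhere.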
Mint seems very awesome. How can I use it for paired-end WGBS data?
Thanks a lot
Z
Hi:
Our lab recently got back some whole-genome bisulfite sequencing data, and I've been testing the pipeline with it. Our run was paired-end 150 bp. I noticed that your test data set has only one FASTQ file per sample. Is it single-end only, or can I use just read 1 for the pipeline?
When I configure the run with init.R and have sample_1.fastq.gz and sample_2.fastq.gz in my data folder, the soft links generated in the project/data folder still have _1 and _2 in their names. But when I run the pipeline with make bisulfite align, the sample_mc_hmc_bisulfite.fastq.gz file in the project/bisulfite/raw-fastq folder loses the _1 and _2 and seems to have trouble linking to the data/raw-fastq folder.
Is there anything I can do to configure the run for PE read data?
Thank you!
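Bismark itself aligns paired-end data when the mates are passed with `-1` and `-2`, so the missing piece in the report above is keeping the `_1`/`_2` mates together under one sample name. The thread does not say whether mint's `init.R` and Makefiles support this. The sketch below covers only the pairing step and assumes the `<sample>_1.fastq.gz` / `<sample>_2.fastq.gz` naming described above; `pair_fastqs` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: group paired-end FASTQ files named <sample>_1.fastq.gz and
# <sample>_2.fastq.gz and print one tab-separated line per sample:
#   <sample>  <read 1 file>  <read 2 file>
# `pair_fastqs` is hypothetical; it is not part of mint.
set -euo pipefail

pair_fastqs() {
  local dir="$1" r1 r2 sample
  # nullglob: if nothing matches, the loop body never runs.
  shopt -s nullglob
  for r1 in "$dir"/*_1.fastq.gz; do
    sample=$(basename "$r1" _1.fastq.gz)
    r2="$dir/${sample}_2.fastq.gz"
    if [[ -e "$r2" ]]; then
      printf '%s\t%s\t%s\n' "$sample" "$r1" "$r2"
    else
      echo "warning: no mate found for $r1" >&2
    fi
  done
  shopt -u nullglob
}
```

Each output line could then drive a paired-end Bismark call of the form `bismark --genome <genome_dir> -1 <read1> -2 <read2>`.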
Currently it's set to 5, but this should be user-defined.
In the bisulfite and pulldown comparison preparation, the code is implemented to handle a sample belonging to at most groups.
The default aligner in bismark v0.14.x is now bowtie2, so we're going to migrate to that as the default.
There is currently something wrong with the following:
bedtools genomecov -bg -ibam $bowtie2Bam -g ~/latte/Homo_sapiens/chromInfo_hg19.txt \
| sort-bed --max-mem 16G --tmpdir $PWD > $pulldownBedgraph
bedGraphToBigWig $pulldownBedgraph ~/latte/Homo_sapiens/chromInfo_hg19.txt $pulldownBigwig
Need to change back to temp files, or do something else. Kind of annoying.
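A minimal sketch of going back to temporary files, assuming the pipe into `sort-bed` is what misbehaves (the thread does not say what goes wrong). It writes the `genomecov` bedGraph to a temp file, sorts it with `LC_ALL=C sort -k1,1 -k2,2n` (the ordering UCSC documents for `bedGraphToBigWig` input) in place of `sort-bed`, and deletes the temp directory when it finishes. `sort_bedgraph` and `bam_to_bigwig` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: BAM -> bedGraph -> bigWig using temporary files instead of a pipe.
# Assumes bedtools and bedGraphToBigWig are on PATH; function names are
# hypothetical and not taken from mint.
set -euo pipefail

# Sort a bedGraph by chromosome (byte order), then by start position.
sort_bedgraph() {
  LC_ALL=C sort -k1,1 -k2,2n "$1" > "$2"
}

# The body runs in a subshell "( ... )", so its EXIT trap removes the temp
# directory when the function finishes without touching the caller's traps.
bam_to_bigwig() (
  bam="$1" chrom_sizes="$2" bedgraph="$3" bigwig="$4"
  tmpdir=$(mktemp -d)
  trap 'rm -rf "$tmpdir"' EXIT
  bedtools genomecov -bg -ibam "$bam" -g "$chrom_sizes" > "$tmpdir/unsorted.bedGraph"
  sort_bedgraph "$tmpdir/unsorted.bedGraph" "$bedgraph"
  bedGraphToBigWig "$bedgraph" "$chrom_sizes" "$bigwig"
)
```

For the paths above this would be called as `bam_to_bigwig "$bowtie2Bam" ~/latte/Homo_sapiens/chromInfo_hg19.txt "$pulldownBedgraph" "$pulldownBigwig"`.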
mint needs small test data to run the pipeline variants from end to end. It is becoming cumbersome to test on GSE52945 and GSE63743 whenever pipeline changes are made. The problem with small test data is that there isn't much to work with: the reduced number of reads doesn't translate into methylation rates or peaks, and consequently not into differential methylation or classifications.
Hi,
I have been able to install and run this version of mint on both my BioLinux laptop running Ubuntu 14.04 and an HPC running CentOS, in both cases using your test_hybrid_small demo dataset. Now I am trying to get all of the Makefile modules to run on real patient data from our article published in Nature Genetics in August 2020. It would be quite useful if there were a simple way to create the "Table: complete annotations" shown in the GitHub documentation, where the relevant text says, "All annotation sessions output a table of all genomic annotations intersecting the input regions in test_hybrid_small/summary/tables." So far I have not been able to find this table, only the _annotation_counts.txt and _annotation_counts_by_category.txt tables.
I have searched through all of the mint code, as well as the annotatr package code, and could not find where this table is supposed to be generated. Is this feature still supported?
Thanks,
-- Marc
bismark_methylation_extractor .methylSig input file
annotatr input file
Transition from the old bedtools + ggplot2 visualization code to pure annotatr. In addition to having fewer moving parts, annotatr's annotations are much better than what I threw together for the previous implementation.
The pipeline creates many files, some 0-based and some 1-based. It would be a good idea to spell out which are which. This matters most for CpG-level data, where an off-by-one error moves a call to the neighboring base, as opposed to region-level data.
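To illustrate the CpG-level case: per the Bismark documentation, its coverage (`.cov`) output uses 1-based coordinates with start and end both set to the cytosine position, whereas BED is 0-based and half-open. A sketch converting such a file to BED-style coordinates, assuming the usual six `.cov` columns (chromosome, start, end, methylation %, methylated count, unmethylated count); `cov_to_bed` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: convert a 1-based, fully closed CpG file (e.g. a Bismark .cov file)
# to 0-based, half-open BED coordinates. Only the start column changes;
# the end coordinate is the same in both conventions.
# `cov_to_bed` is hypothetical, not part of mint.
set -euo pipefail

cov_to_bed() {
  # Rewriting $2 rebuilds the record, joined with tab (OFS).
  awk 'BEGIN { OFS = "\t" } { $2 = $2 - 1; print }' "$1"
}
```

For example, a CpG reported at 1-based position 10469 (start = end = 10469) becomes the BED interval [10468, 10469), still one base wide.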
Rather than using project_create_runs.R to build the scripts, use make. It handles prerequisites and reports errors at the appropriate steps.
This will allow adapter and quality trimming. We will need to change the way we convert from sample IDs to human IDs, because trim_galore doesn't have an output name parameter.
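Because trim_galore derives its output name from the input file, one workaround is to predict that name and rename the file afterwards. A sketch assuming trim_galore's single-end convention of replacing the `.fastq`/`.fq` extension (optionally with `.gz`) with `_trimmed.fq` (plus `.gz` for gzipped input), which is worth checking against the installed version. `trimmed_name`, `rename_trimmed`, and the example human ID are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: predict trim_galore's single-end output name and rename it to a
# human-readable ID. The naming convention is assumed; helper names are
# hypothetical and not part of mint.
set -euo pipefail

# Print the expected trim_galore output file name for an input FASTQ.
trimmed_name() {
  local base
  base=$(basename "$1")
  case "$base" in
    *.fastq.gz) echo "${base%.fastq.gz}_trimmed.fq.gz" ;;
    *.fq.gz)    echo "${base%.fq.gz}_trimmed.fq.gz" ;;
    *.fastq)    echo "${base%.fastq}_trimmed.fq" ;;
    *.fq)       echo "${base%.fq}_trimmed.fq" ;;
    *)          echo "unrecognized FASTQ extension: $base" >&2; return 1 ;;
  esac
}

# Rename the trimmed file in <outdir> from its sample-ID name to <human_id>,
# keeping whatever extension trim_galore used (fq or fq.gz).
rename_trimmed() {
  local fastq="$1" human_id="$2" outdir="$3" trimmed
  trimmed=$(trimmed_name "$fastq")
  mv "$outdir/$trimmed" "$outdir/${human_id}_trimmed.${trimmed##*_trimmed.}"
}
```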