sartorlab / mint

A pipeline for the integration of DNA methylation and hydroxymethylation data

R 88.44% Shell 9.80% ActionScript 0.55% Awk 0.78% Makefile 0.43%

dna-methylation dna-hydroxymethylation genome-browser

mint's Issues

bedtools complement reporting chromosome size file has no contents

This is a rather annoying problem, where the chromosome size file directly downloaded from UCSC works in every other case for bedtools subroutines, but has some problem with bedtools complement. Googling for the problem doesn't return anything useful.
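
The report does not identify the cause. Two things that are commonly worth ruling out are a genome file that is not strictly tab-delimited and a chromosome order that differs between the genome file and the sorted BED input. A minimal sketch of that check follows; `normalize_chrom_sizes` and all file names are hypothetical and not part of mint.

```shell
# Hypothetical helper (not part of mint): rewrite a chromosome sizes file as
# exactly two tab-delimited columns, sorted by chromosome name.
normalize_chrom_sizes() {
    # $1: input chrom sizes file; the cleaned file is written to stdout
    awk '{ sub(/\r$/, "") }                      # drop Windows line endings
         NF >= 2 && $2 ~ /^[0-9]+$/ { printf "%s\t%s\n", $1, $2 }' "$1" |
        LC_ALL=C sort -k1,1
}

# Usage sketch (paths are placeholders). The BED input is sorted the same
# way so its chromosome order matches the genome file:
#   normalize_chrom_sizes chromInfo_hg19.txt > hg19.sorted.genome
#   LC_ALL=C sort -k1,1 -k2,2n regions.bed > regions.sorted.bed
#   bedtools complement -i regions.sorted.bed -g hg19.sorted.genome
```

If the complement still fails on the normalized file, the cause lies elsewhere.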

paired read

Mint seems very awesome. May I know how to use it for paired-end WGBS data?

Thanks a lot
Z

Data sample type

Hi:

Our lab recently got back some whole-genome bisulfite sequencing data, and I've been testing it with the pipeline. Our run was paired-end 150 bp. I noticed that your test data set has only one FASTQ file per sample. Are those single-end only, or can I only use read 1 for the pipeline?

When I configure the run with init.R and have sample_1.fastq.gz and sample_2.fastq.gz in my data folder, the soft links generated in the project/data folder still have the _1 and _2 in their names. But when I run the pipeline with make bisulfite align, the sample_mc_hmc_bisulfite.fastq.gz file in the project/bisulfite/raw-fastq folder loses the _1 and _2 and seems to have trouble linking to the data/raw-fastq folder.

Is there anything I can do to configure the run for PE read data?

Thank you!
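
Nothing in this thread shows whether mint handles paired-end input. As a rough sketch of what pairing by file name would involve outside of mint, the helpers below match `_1`/`_2` mates, assuming the `<sample>_1.fastq.gz` / `<sample>_2.fastq.gz` naming used in the question. `mate_of` and `list_pairs` are hypothetical names.

```shell
# Hypothetical helper: given a read-1 file named <sample>_1.fastq.gz,
# print the expected read-2 file name; fail for any other name.
mate_of() {
    case "$1" in
        *_1.fastq.gz) printf '%s_2.fastq.gz\n' "${1%_1.fastq.gz}" ;;
        *) return 1 ;;
    esac
}

# List complete pairs in directory $1 as "read1<TAB>read2" lines.
list_pairs() {
    for r1 in "$1"/*_1.fastq.gz; do
        [ -e "$r1" ] || continue          # glob matched nothing
        r2=$(mate_of "$r1") || continue
        if [ -e "$r2" ]; then
            printf '%s\t%s\n' "$r1" "$r2"
        fi
    done
}

# Usage sketch: Bismark itself takes mates via -1 and -2, e.g.
#   bismark /path/to/genome -1 sample_1.fastq.gz -2 sample_2.fastq.gz
```

Read-1 files without a matching read-2 file are skipped without a message, which may or may not be the behavior you want.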

Make argument for cutoff parameter in methylation extractor

Currently it's set to 5, but this should be user-defined.
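
One possible shape for this, as a sketch only: read the value from an environment variable and fall back to the current 5. `CUTOFF` and `extractor_cutoff_args` are hypothetical names; `--cutoff` is the existing bismark_methylation_extractor option.

```shell
# Hypothetical helper: print the --cutoff argument for
# bismark_methylation_extractor, using $CUTOFF if set and 5 otherwise.
extractor_cutoff_args() {
    cutoff=${CUTOFF:-5}
    case "$cutoff" in
        ''|*[!0-9]*)
            echo "CUTOFF must be a non-negative integer" >&2
            return 1 ;;
    esac
    printf '%s %s\n' --cutoff "$cutoff"
}

# Usage sketch (other extractor options omitted; file name is a placeholder):
#   bismark_methylation_extractor --bedGraph $(extractor_cutoff_args) sample.bam
```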

Samples belonging to >2 comparison groups are mishandled

In the bisulfite and pulldown comparison preparation, the code is implemented to handle a sample belonging to at most two groups.

Migrate to using bowtie2 with bismark

The default aligner in bismark v0.14.x is now bowtie2, so we're going to migrate to that as the default.

Fix bigWig creation of pulldown coverage

There is currently something wrong with the following:

bedtools genomecov -bg -ibam $bowtie2Bam -g ~/latte/Homo_sapiens/chromInfo_hg19.txt \
| sort-bed --max-mem 16G --tmpdir $PWD > $pulldownBedgraph
bedGraphToBigWig $pulldownBedgraph ~/latte/Homo_sapiens/chromInfo_hg19.txt $pulldownBigwig

UCSC compression tools do not work with subcommands

Need to change back to temp files, or do something else. Kind of annoying.
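
A sketch of the temp-file route, assuming the goal is the same genomecov-to-bigWig conversion as in the bigWig issue above. The function names are hypothetical, and the sort uses plain `sort` under the C locale (the ordering UCSC documents for bedGraphToBigWig input) rather than `sort-bed`. Whether this fixes the bigWig problem is untested.

```shell
# Sort a bedGraph stream by chromosome, then numerically by start, into a file.
sort_bedgraph_to() {
    # stdin: bedGraph lines; $1: destination file
    LC_ALL=C sort -k1,1 -k2,2n > "$1"
}

# Hypothetical wrapper: $1 = BAM, $2 = chrom sizes file, $3 = output bigWig.
# bedGraphToBigWig is given a real file instead of a pipe.
pulldown_bigwig() {
    tmp=$(mktemp) || return 1
    # Note: without pipefail, a bedtools failure is not detected here.
    bedtools genomecov -bg -ibam "$1" -g "$2" | sort_bedgraph_to "$tmp" &&
        bedGraphToBigWig "$tmp" "$2" "$3"
    status=$?
    rm -f "$tmp"      # remove the temp file whether or not conversion succeeded
    return "$status"
}
```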

Develop a test suite

mint needs small test data to run the pipeline variants from end to end. It is becoming cumbersome to test on GSE52945 and GSE63743 whenever the pipeline changes. The problem with small test data is that there isn't a lot to work with: the reduced number of reads doesn't translate into methylation rates or peaks, and consequently into differential methylation and classifications.
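
For reference, the simplest way to cut a FASTQ down is to keep its first N reads, sketched below with a hypothetical `head_fastq` helper. It assumes the usual 4-line FASTQ record layout. It does nothing about the concern above: a handful of reads will still not yield usable methylation rates or peaks.

```shell
# Hypothetical helper: keep the first N reads of a gzipped FASTQ.
# Assumes each read occupies exactly 4 lines.
head_fastq() {
    # $1: input .fastq.gz, $2: number of reads to keep, $3: output .fastq.gz
    gzip -dc "$1" | head -n "$(( $2 * 4 ))" | gzip -c > "$3"
}

# Usage sketch (file names are placeholders):
#   head_fastq full_sample.fastq.gz 100000 small_sample.fastq.gz
```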

How can I obtain the table of complete annotations described in the README.md file?

Hi,
I have been able to install and run this version of mint on both my BioLinux laptop running Ubuntu 14.04 as well as an HPC running CentOS; in both cases using your test_hybrid_small demo dataset. Now I am trying to get all of the makefile modules to run on real live patient data from our article published in Nature Genetics in AUG-2020. It would potentially be quite useful if there was a simple way to create the "Table: complete annotations" that is shown in the GitHub documentation, the relevant text says, "All annotation sessions output a table of all genomic annotations intersecting the input regions in test_hybrid_small/summary/tables." Thus far I have not been able to find this table, just the _annotation_counts.txt tables and the _annotation_counts_by_category.txt tables.
I have searched through all of the mint code, as well as in the annotatr package code, and could not find how this is supposed to be generated. Is this particular feature still supported?
Thanks,
-- Marc

Compress output files where possible

CpG report output from bismark_methylation_extractor.
methylSig input file
annotatr input file
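
A minimal sketch of a post-processing step for this, assuming the files can simply be gzipped in place once they are written. `compress_outputs` is a hypothetical helper, and whether methylSig and annotatr accept gzipped input would still need checking.

```shell
# Hypothetical helper: gzip every regular file in a directory whose name
# ends in one of the given suffixes.
compress_outputs() {
    # $1: directory; remaining args: file name suffixes, e.g. .txt
    dir=$1
    shift
    for suffix in "$@"; do
        for f in "$dir"/*"$suffix"; do
            [ -f "$f" ] || continue   # skip when the glob matched nothing
            gzip -f "$f"              # replaces f with f.gz
        done
    done
}

# Usage sketch (directory and suffix are placeholders):
#   compress_outputs project/bisulfite/some_output_dir .txt
```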

quality cutoff
adapter sequence (or shortcut)
stringency
error
length
RRBS data or not

Switch from cutadapt to trim_galore

This will allow adapter trimming and quality trimming. Will need to change the way we convert from sample IDs to human IDs because trim_galore doesn't have an output name parameter.
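
One way the renaming could work, as a sketch: run trim_galore into an output directory, then move the result to the human-readable name. The helper names are hypothetical. The `<basename>_trimmed.fq.gz` output name is an assumption about trim_galore's single-end naming for gzipped input and should be checked against the installed version.

```shell
# Hypothetical helper: predict the trimmed file name for a gzipped FASTQ,
# ASSUMING trim_galore writes <basename>_trimmed.fq.gz for single-end input.
trimmed_name() {
    base=$(basename "$1")
    case "$base" in
        *.fastq.gz) base=${base%.fastq.gz} ;;
        *.fq.gz)    base=${base%.fq.gz} ;;
        *) return 1 ;;
    esac
    printf '%s_trimmed.fq.gz\n' "$base"
}

# Hypothetical wrapper: $1 = input FASTQ, $2 = output directory,
# $3 = human-readable ID to use in the final file name.
trim_and_rename() {
    out=$(trimmed_name "$1") || return 1
    trim_galore --output_dir "$2" "$1" &&
        mv "$2/$out" "$2/$3_trimmed.fq.gz"
}
```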