
sartorlab / mint

A pipeline for the integration of DNA methylation and hydroxymethylation data

R 88.44% Shell 9.80% ActionScript 0.55% Awk 0.78% Makefile 0.43%
dna-methylation dna-hydroxymethylation genome-browser

Contributors

rcavalcante

mint's Issues

paired read

Mint seems very awesome. May I know how to use it for paired-end WGBS data?

Thanks a lot
Z

Data sample type

Hi:

Our lab recently received some whole-genome bisulfite sequencing data, and I have been testing it with the pipeline. Our run was paired-end 150 bp. I noticed that your test data set has only one FASTQ file per sample. Are those single-end only, or can I only use read 1 with the pipeline?

When I configure the run with init.R and have sample_1.fastq.gz and sample_2.fastq.gz in my data folder, the soft links it generates in the project/data folder still have the _1 and _2 suffixes. But when I run the pipeline with make bisulfite align, the sample_mc_hmc_bisulfite.fastq.gz file in the project/bisulfite/raw-fastq folder loses the _1 and _2 suffixes and seems to have trouble linking to the data/raw-fastq folder.

Is there anything I can do to configure the run for paired-end read data?

Thank you!

Fix bigWig creation of pulldown coverage

There is currently something wrong with the following:

bedtools genomecov -bg -ibam $bowtie2Bam -g ~/latte/Homo_sapiens/chromInfo_hg19.txt \
| sort-bed --max-mem 16G --tmpdir $PWD > $pulldownBedgraph
bedGraphToBigWig $pulldownBedgraph ~/latte/Homo_sapiens/chromInfo_hg19.txt $pulldownBigwig

Develop a test suite

mint needs small test data to run the pipeline variants from end to end. It is becoming cumbersome to test on the GSE52945 and GSE63743 data sets whenever the pipeline changes. The problem with small test data is that there isn't a lot to work with: the reduced number of reads doesn't translate to methylation rates or peaks, and consequently not to differential methylation or classifications either.
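One possible way to derive a small, reproducible test set is to subsample reads with a fixed seed. The following is only a minimal Python sketch of reservoir sampling over FASTQ records; the function names are hypothetical and not part of mint, and it does not address the concern above that too few reads may fail to produce usable methylation rates or peaks.

```python
import random
from itertools import islice


def read_fastq_records(lines):
    """Yield FASTQ records as 4-line tuples from an iterable of lines."""
    it = iter(lines)
    while True:
        record = tuple(islice(it, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def subsample(records, k, seed=0):
    """Reservoir-sample up to k records uniformly in a single pass."""
    rng = random.Random(seed)  # fixed seed keeps the test data reproducible
    reservoir = []
    for i, record in enumerate(records):
        if i < k:
            reservoir.append(record)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = record
    return reservoir


# Tiny in-memory FASTQ with three made-up reads.
fastq = [
    "@r1\n", "ACGT\n", "+\n", "IIII\n",
    "@r2\n", "TTGA\n", "+\n", "IIII\n",
    "@r3\n", "CCGG\n", "+\n", "IIII\n",
]
sample = subsample(read_fastq_records(fastq), k=2, seed=1)
print(len(sample))  # 2
```

For real data the same functions could be fed lines from gzip.open(path, "rt") instead of the in-memory list.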

How can I obtain the table of complete annotations described in the README.md file?

Hi,
I have been able to install and run this version of mint on both my BioLinux laptop running Ubuntu 14.04 and an HPC running CentOS, in both cases using your test_hybrid_small demo dataset. Now I am trying to get all of the makefile modules to run on real patient data from our article published in Nature Genetics in August 2020. It would be quite useful if there were a simple way to create the "Table: complete annotations" that is shown in the GitHub documentation. The relevant text says, "All annotation sessions output a table of all genomic annotations intersecting the input regions in test_hybrid_small/summary/tables." Thus far I have not been able to find this table, only the _annotation_counts.txt tables and the _annotation_counts_by_category.txt tables.
I have searched through all of the mint code, as well as in the annotatr package code, and could not find how this is supposed to be generated. Is this particular feature still supported?
Thanks,
-- Marc

Refactor visualization code to use annotatr

Transition from the old bedtools + ggplot2 visualization code to pure annotatr. In addition to having "less moving parts", the annotations in annotatr are much better than what I threw together for the previous implementation.

Clarify 0-based and 1-based files

There are lots of files created by the pipeline, some of which are 0-based, and some of which are 1-based. It is a good idea to spell out which are which. This can have implications when dealing with CpG-level data, as opposed to region-level.
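To illustrate why the distinction matters at CpG resolution, here is a minimal Python sketch of converting between the two conventions. It assumes the usual definitions (0-based, half-open intervals as in BED and bedGraph; 1-based, closed intervals as in GFF-style formats). The function names are hypothetical and not part of mint, and the sketch says nothing about which convention any particular mint output file uses.

```python
def zero_to_one_based(start, end):
    """Convert a 0-based, half-open interval [start, end) to 1-based, closed [start, end]."""
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    return start + 1, end


def one_to_zero_based(start, end):
    """Convert a 1-based, closed interval [start, end] to 0-based, half-open [start, end)."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return start - 1, end


# A single cytosine at the 100th base of a chromosome:
# 1-based closed is (100, 100); 0-based half-open is (99, 100).
cpg_one_based = (100, 100)
cpg_zero_based = one_to_zero_based(*cpg_one_based)
print(cpg_zero_based)  # (99, 100)
```

For a region spanning thousands of bases an off-by-one start is often tolerable, but for a single CpG it points at the wrong base, which is why CpG-level files are the ones most worth labeling.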

Convert to make

Rather than use project_create_runs.R to build the scripts, use make. It handles prerequisites and reports errors at the appropriate steps.

Switch from cutadapt to trim_galore

This will allow adapter trimming and quality trimming. Will need to change the way we convert from sample IDs to human IDs because trim_galore doesn't have an output name parameter.
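One way the renaming step could look is sketched below in Python. It assumes trim_galore's default single-end output naming, where sample.fastq.gz becomes sample_trimmed.fq.gz. The sample-to-human ID mapping, the example IDs, and the helper names are hypothetical and not taken from mint.

```python
import os

# Assumption: gzipped single-end input; trim_galore replaces these suffixes
# with "_trimmed.fq.gz" by default.
GZ_SUFFIXES = (".fastq.gz", ".fq.gz")
TRIMMED_SUFFIX = "_trimmed.fq.gz"


def trimmed_name(raw_fastq):
    """Predict the trim_galore output file name for a gzipped single-end FASTQ."""
    base = os.path.basename(raw_fastq)
    for suffix in GZ_SUFFIXES:
        if base.endswith(suffix):
            return base[: -len(suffix)] + TRIMMED_SUFFIX
    raise ValueError(f"unrecognized FASTQ name: {raw_fastq}")


def rename_plan(raw_fastqs, sample_to_human):
    """Return (trimmed file, human-readable file) pairs; nothing is renamed on disk."""
    plan = []
    for raw in raw_fastqs:
        trimmed = trimmed_name(raw)
        sample_id = trimmed[: -len(TRIMMED_SUFFIX)]
        human_id = sample_to_human[sample_id]  # KeyError if the sample is not annotated
        plan.append((trimmed, human_id + TRIMMED_SUFFIX))
    return plan


# Hypothetical IDs for illustration only.
plan = rename_plan(
    ["raw/SRR0000001.fastq.gz"],
    {"SRR0000001": "sampleA_mc_hmc_bisulfite"},
)
print(plan)
# [('SRR0000001_trimmed.fq.gz', 'sampleA_mc_hmc_bisulfite_trimmed.fq.gz')]
```

Each pair could then be passed to os.rename (or used to create a symlink) once trimming has finished, so downstream steps see human-readable names.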
