
dammet's People

Contributors

khanghoj

Forkers

mikkelschubert

dammet's Issues

getSites' arguments-output-file name

Hi,

This is just a quick report that when running DamMet getSites, the log says the arguments file [prefix].getSites.args is created, but the file actually written is named [prefix].estDEAM.args instead.

DamMet v1.0.4

Thank you!
Best.

Deamination rate matrix for nonmethylated and methylated cytosines, input gargammel

Hi Kristian,

Me again 0:D I'm simulating aDNA sequencing data with gargammel to test DamMet, using the new feature for adding DNA methylation patterns that you and Gabriel Renaud added. Everything works fine, but my question today is about the deamination rate matrices for nonmethylated and methylated cytosines that gargammel needs as input (using the options --methyl -matfilenonmeth -matfilemeth).

How exactly do you build the tables for nonmethylated and for methylated cytosines? I know how these tables are supposed to look, as explained on gargammel's GitHub page:

A->C	A->G	A->T	C->A	C->G	C->T	G->A	G->C	G->T	T->A	T->C	T->G
0	1.853e-3 [1.726e-3..1.989e-3]	4.064e-3 [3.875e-3..4.263e-3]	3.269e-3 [3.099e-3..3.448e-3]	6.661e-3 [6.254e-3..7.094e-3]	3.057e-3 [2.785e-3..3.355e-3]	8.004e-2 [7.865e-2..8.145e-2]	1.236e-2 [1.183e-2..1.292e-2]	4.131e-3 [3.828e-3..4.459e-3]	6.703e-3 [6.314e-3..7.116e-3]	3.845e-3 [3.624e-3..4.079e-3]	4.581e-3 [4.339e-3..4.836e-3]	2.169e-3 [2.005e-3..2.347e-3]
1	1.986e-3 [1.849e-3..2.134e-3]	4.273e-3 [4.070e-3..4.487e-3]	3.030e-3 [2.859e-3..3.211e-3]	5.357e-3 [5.001e-3..5.738e-3]	3.188e-3 [2.916e-3..3.485e-3]	1.427e-2 [1.369e-2..1.488e-2]	9.514e-3 [9.075e-3..9.974e-3]	3.316e-3 [3.061e-3..3.593e-3]	5.061e-3 [4.743e-3..5.400e-3]	3.421e-3 [3.216e-3..3.639e-3]	4.865e-3 [4.620e-3..5.124e-3]	2.201e-3 [2.038e-3..2.377e-3]

I also know that you can get this information from a PMD profiling tool such as mapDamage. However, once I have it, to use the --methyl option in gargammel I would have to duplicate these tables and give them different rates in the "C->T" (5p matrix) and "G->A" (3p matrix) columns: one set for the nonmethylated cytosines (-matfilenonmeth) and another for the methylated cytosines (-matfilemeth). Am I right? If so, the only way I know of so far to obtain separate deamination rates for nonmethylated and methylated cytosines is DamMet. I would therefore have to combine the misincorporation rates for all possible substitutions from mapDamage with the "C->T" and "G->A" misincorporation rates for nonmethylated and methylated positions from DamMet to create these tables. Is this correct, or am I getting it all wrong? This is probably a stupid question and I'm missing something, but I can't be sure without asking you.
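To make the idea concrete, here is a rough sketch of the table-combining step I have in mind. It only illustrates my proposal and is not a confirmed way of preparing gargammel input. Assumptions: the tables are tab-separated as in the example above, the data rows carry a leading position index that the header may lack, and the replacement rates sit in a plain text file with one "rate [low..high]" entry per position. The function name and all file names are hypothetical.

```shell
#!/bin/sh
# override_column TABLE COLUMN RATES
# Prints TABLE with the substitution column named COLUMN replaced, row by row,
# by the entries listed one per line in RATES (hypothetical helper).
override_column() {
    awk -F'\t' -v OFS='\t' -v column="$2" '
        # First file (RATES): remember one replacement entry per position row.
        NR == FNR { rate[FNR] = $0; next }
        # Header line of TABLE: find the column and remember the header width.
        FNR == 1 {
            for (i = 1; i <= NF; i++) if ($i == column) hcol = i
            if (!hcol) exit 1          # unknown column name
            hnf = NF
            print
            next
        }
        # Data row k takes entry k. The offset NF - hnf accounts for the
        # leading position index that the header line does not have.
        NF > 0 && ((FNR - 1) in rate) { $(hcol + NF - hnf) = rate[FNR - 1] }
        { print }
    ' "$3" "$1"
}

# Hypothetical usage, with a 5p table from mapDamage and C->T rates from DamMet:
#   override_column profile_5p.dat 'C->T' ct_meth.txt    > meth_5p.dat
#   override_column profile_5p.dat 'C->T' ct_nonmeth.txt > nonmeth_5p.dat
```

The 3p tables would be handled the same way with 'G->A' as the column name.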

Thanks a lot in advance!

All the best,
Katterinne

Bug: Segmentation fault (core dumped)

Dear Hanghoj,

This is to report a bug in DamMet that is preventing the successful use of the tool on real aDNA data.

Briefly, DamMet estDEAM stops while reading the reads and prints the error message "Segmentation fault (core dumped)". This happens with different real aDNA sequencing datasets with average read depths ranging from ~3.8X to ~50X, on different computers (16 CPUs/32 GiB RAM and 32 CPUs/256 GiB RAM) and operating systems (Ubuntu Linux and macOS). Interestingly, DamMet works fine for some samples and fails for others.

If you're interested in the details, let me walk you through it...

I have run DamMet on 16 real aDNA sequencing datasets (input: a BAM file of the aDNA reads aligned against the human reference genome) with average read depths ranging from ~1.5X to ~50X. For 5 of these samples, with average read depths from ~1.5X to ~10.6X, DamMet worked fine in both steps (estDEAM and estF). For the other 11 samples, with average read depths from ~3.8X to ~50X, DamMet failed: estDEAM stops while reading the reads and prints "Segmentation fault (core dumped)". I reran these samples on different computers (16 CPUs/32 GiB RAM and 32 CPUs/256 GiB RAM; Ubuntu Linux and macOS), and the same thing happened every time. An example of the command line I was using:

DamMet estDEAM -b sample.bam -r human_ref_genome.fasta -cf chromosomes.list -O sample -R sample_RGs.list

Notes:

  • I used -cf with a list of all chromosomes, chromosomes.list (1 to 22, X, Y, and MT).
  • I tried with and without providing a read group list (-R). It didn't make a difference.

Additional information I have from experimenting with the tool, trying to understand why it wouldn't work...

In addition, although I know that DamMet processes one chromosome at a time, I tried using the -c option to run each chromosome separately. This way I could run DamMet on most chromosomes of the samples that had previously failed, but for several chromosomes DamMet again stopped while reading the reads (with or without a read group list) and printed "Segmentation fault (core dumped)". The number and identity of the failing chromosomes depended on the sample. An example of the command line I was using:

DamMet estDEAM -b sample.bam -r human_ref_genome.fasta -c 1 -O sample_chr1 -R sample_RGs.list
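The per-chromosome runs can be scripted roughly as below to list which chromosomes fail. This is a minimal sketch, assuming the chromosome list holds one name per line and that DamMet exits with status 0 on success (a segmentation fault gives a non-zero status in the shell). The function name is hypothetical; only the -c and -O options shown above are added to the command.

```shell
#!/bin/sh
# failing_chromosomes LIST PREFIX CMD [ARGS...]
# Runs "CMD ARGS -c <chrom> -O <PREFIX>_chr<chrom>" for every chromosome named
# in LIST and prints the chromosomes for which the command did not exit with 0.
failing_chromosomes() {
    list=$1
    prefix=$2
    shift 2
    while IFS= read -r chrom; do
        [ -n "$chrom" ] || continue    # skip empty lines
        # stdin is redirected so the command cannot consume the chromosome list
        "$@" -c "$chrom" -O "${prefix}_chr${chrom}" </dev/null >/dev/null 2>&1 \
            || echo "$chrom"
    done < "$list"
}

# Usage with the files from this report:
#   failing_chromosomes chromosomes.list sample \
#       DamMet estDEAM -b sample.bam -r human_ref_genome.fasta -R sample_RGs.list
```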

Next, for the chromosomes that failed, I tried again using the -L option to filter the reads by length. I found that, by trying increasing read-length values on each chromosome, DamMet eventually runs successfully. An example of the command line I was using:

DamMet estDEAM -b sample.bam -r human_ref_genome.fasta -c 6 -L 80 -O sample_chr6 -R sample_RGs.list

Notes:

  • I used the -L option starting from 35 with an increase of 5 (the minimum read length in all my samples is 30).
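The search over -L values can be sketched as a small loop. As before, this assumes DamMet exits with status 0 on success. The function name and the upper limit passed as MAX are hypothetical and not part of DamMet.

```shell
#!/bin/sh
# first_working_length START STEP MAX CMD [ARGS...]
# Runs "CMD ARGS -L <length>" for length = START, START+STEP, ... up to MAX
# and prints the first length for which the command exits with status 0.
# Returns 1 if no length up to MAX works.
first_working_length() {
    len=$1
    step=$2
    max=$3
    shift 3
    while [ "$len" -le "$max" ]; do
        if "$@" -L "$len" >/dev/null 2>&1; then
            echo "$len"
            return 0
        fi
        len=$((len + step))
    done
    return 1
}

# Usage matching the values above (start at 35, step 5; 150 is an arbitrary cap):
#   first_working_length 35 5 150 \
#       DamMet estDEAM -b sample.bam -r human_ref_genome.fasta -c 6 \
#       -O sample_chr6 -R sample_RGs.list
```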

Finally, for the chromosomes that I had to run separately and with a read-length filter in the first step, the second step (estF) also only worked when run the same way.

Example data for reproducing error...

Using this Dropbox link you can download a zip file with the necessary files to reproduce the error:

The files you will find:

  • Alignment file and its index: SF12_DS_05.bam, SF12_DS_05.bam.bai (this sample has an average read depth of ~5X)
  • Human reference fasta file and its index: hs37d5.fa, hs37d5.fa.fai
  • Read group list file: SF12DS05_RGs.list

When running DamMet one chromosome at a time using the -c option, the chromosome that fails is: 6.

DamMet estDEAM -b SF12_DS_05.bam -r hs37d5.fa -c 6 -O SF12DS05_chr6 -R SF12DS05_RGs.list

Note: It works when using -L 80.


I don't want to have to reduce the total number of reads in order to run DamMet. For some samples and chromosomes I had to use high read-length filters for DamMet to work, which wastes valuable information. Besides, I found no correlation between the bug and BAM file size, read depth, or computational resources.

I hope something can be done about this. I'm looking forward to using DamMet to infer methylation in my ancient genomes, which I need for further analysis. Please let me know if you need more details or if I can help in any way.

Thank you very much in advance. And thanks for your work in epiPALEOMIX, gargammel, and DamMet!

Regards,
Katterinne
