genid / yleaf

Yleaf software for human Y-chromosomal haplogroup inference from next generation sequencing data

License: GNU General Public License v3.0

Python 100.00%
y-chromosome haplogroup prediction-algorithm next-generation-sequencing python

yleaf's People

Contributors

6bass6, bramvanwersch, cascadingstyletrees, dionzand, dmontielg, stikus, teepean

yleaf's Issues

Updating position files

Hi,

I have been updating the hg19 position files and came across a problem I am not sure how to handle. For example, Z14 is represented in ISOGG as follows, and Yleaf cannot parse that line:

Z14 R1b1a1b1a1a1b1 17364720..17364737 15252840..15252857 CAGATAGATAGATAGATA->CAGATAGATAGATA

Thanks!
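
One possible way to deal with such rows when regenerating a position file is to detect them up front. The sketch below is not Yleaf code: it assumes whitespace-separated fields in the order seen in the Z14 row (marker, haplogroup, two coordinate fields, mutation), and the name `parse_isogg_row` is hypothetical. It only flags range coordinates and indels so they can be skipped or handled separately; it does not guess how Yleaf should represent them.

```python
def parse_isogg_row(line):
    """Split one ISOGG-style row and flag features a SNP-only parser may reject.

    Assumed layout (taken from the example row, not from Yleaf's code):
    marker, haplogroup, coordinate, coordinate, mutation.
    """
    marker, haplogroup, coord_a, coord_b, mutation = line.split()
    ancestral, derived = mutation.split("->")

    def to_span(coord):
        # "17364720..17364737" is a range; a plain number is a single position.
        start, _, end = coord.partition("..")
        return int(start), int(end or start)

    return {
        "marker": marker,
        "haplogroup": haplogroup,
        "spans": (to_span(coord_a), to_span(coord_b)),
        "is_range": ".." in coord_a or ".." in coord_b,
        # Alleles of different length mean an insertion or deletion, not a SNP.
        "is_indel": len(ancestral) != len(derived),
    }


row = parse_isogg_row(
    "Z14 R1b1a1b1a1a1b1 17364720..17364737 15252840..15252857 "
    "CAGATAGATAGATAGATA->CAGATAGATAGATA"
)
print(row["is_range"], row["is_indel"])
```

Rows with `is_range` or `is_indel` set could be written to a separate file for manual review instead of going into the position file.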

What if the input is paired-end FASTQ files?

Hi, my fastq files were generated by paired-end sequencing, so I have two fq files for each sample (e.g. sample.1.fq.gz and sample.2.fq.gz). Does Yleaf support two fq files as input, like -fastq sample.1.fastq -fastq sample.2.fastq?

Best wishes
Xb

Add T2T positions?

Hello!

Would it be possible to add T2T to Yleaf positions and support for T2T-CHM13v2.0 reference?

Thanks!

2.3 release

Hello!

Thank you for the tool! Do you plan to release the 2.3 version?

How to choose the position files?

Hi genid, I found that there are three versions of the 'Position File' (MCS_Ampliseq/Visage_Ampliseq/WGS). What are the differences between them, and how do I choose one? I use GRCh37 (100 genomes) as the reference and I don't know which one to use. Thanks!

Add Yleaf to PyPI/bioconda

Are there plans to add Yleaf to PyPI and/or bioconda? (I did not find it on either. Sorry if it is already there.)

The existing conda environment file is already great for standalone runs, but adding Yleaf to PyPI (and from there to bioconda) would get you a docker/singularity container for each release without any extra effort, via biocontainers. That in turn would make it possible to include Yleaf in reproducible workflows and pipelines.

Few queries

Hi. Thanks for the nice software. I have 2 queries.

  1. Is there a way to ignore C->T and G->A SNPs in the tool's prediction? That will be invaluable for ancient DNA.
  2. Is there a way to run this on multiple bam files in one command?

Thank you.
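
For the first query, one generic approach (not a Yleaf option, as far as this thread shows) is to drop markers whose ancestral-to-derived change is C->T or G->A before prediction, since those are the substitutions typically produced by post-mortem deamination. A minimal sketch, assuming each marker is available as a hypothetical `(name, ancestral, derived)` tuple; the marker names in the example are made up:

```python
# Substitutions named in the query: C->T and G->A.
DAMAGE_LIKE = {("C", "T"), ("G", "A")}


def drop_damage_like(markers):
    """Keep only markers whose ancestral->derived change is not C->T or G->A."""
    return [
        marker
        for marker in markers
        if (marker[1].upper(), marker[2].upper()) not in DAMAGE_LIKE
    ]


example = [("m1", "C", "T"), ("m2", "A", "G"), ("m3", "g", "a"), ("m4", "T", "C")]
print(drop_damage_like(example))
```

The trade-off is that every marker of these two types is discarded, including undamaged ones, so fewer markers remain for the prediction.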

Can the software be used for other species?

Dear developer:

The species we work with is the pig, a very good medical model animal.

We have noticed that this software is designed for humans. Can it be used for pigs at this stage?

Best

Dong

Error indicated when running Yleaf installed with conda on CentOS

After a conda install of Yleaf on CentOS 7, I receive the following error message. It does not prevent the run, but it indicates a problem:
Error processing line 1 of /home/grange/miniconda2/envs/yleaf/lib/python3.7/site-packages/distutils-precedence.pth:

Traceback (most recent call last):
  File "/home/grange/miniconda2/envs/yleaf/lib/python3.7/site.py", line 168, in addpackage
    exec(line)
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named '_distutils_hack'

Any suggestions of script modification or of why this module is not installed?

Thanks

Thierry
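
The `distutils-precedence.pth` file and the `_distutils_hack` module are both installed by setuptools, so a `.pth` file without the module points to an incomplete setuptools install in that environment. A small diagnostic sketch to run with the environment's own Python; the helper name `module_available` is made up for illustration:

```python
import importlib.util


def module_available(name):
    """Return True if the interpreter can locate a top-level module."""
    return importlib.util.find_spec(name) is not None


for name in ("setuptools", "_distutils_hack"):
    print(name, "found" if module_available(name) else "missing")
```

If `_distutils_hack` is reported missing, reinstalling setuptools inside the environment is a common remedy. That is an assumption about the cause, not a confirmed fix for this report.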

Difference in results between Yleaf v3.0.1 and v3.1 (master)

Hello, after migrating from Yleaf v3.0.1 to the latest version (we are not using the v3.1 release because of #15 and #16 in it), we see some differences:

Yleaf_version | Sample_name | Hg | Hg_marker | Total_reads | Valid_markers | QC-score | QC-1 | QC-2 | QC-3
latest | test | N1~ | N-CTS11499/etc*(xCTS10760,Z4963,B195,Y13851,F859,PF967.2,Y16325,M2118,Y9025,B187,Y24348,FGC10788,F1228,CTS5397) | 4124 | 64768 | 1.0 | 1.0 | 1.0 | 1.0
v3.0.1 | test | N1a1a1a1a2a1a1~ | N-Z1926*(xCTS1737,Y21699) | 5549131 | 64771 | 1.0 | 1.0 | 1.0 | 1.0

You can see two major differences: Hg (and Hg_marker), and Total_reads (and Valid_markers).

  • The first difference comes from the sorting order changed in bc03016.

Is it intended that we now get N1~ instead of N1a1a1a1a2a1a1~?


  • The second difference comes from code refactoring.

Total_reads and Valid_markers are read from the second and last lines of the log:
https://github.com/genid/Yleaf/blob/master/yleaf/old_predict_haplogroup.py#L255-L287

def process_log(log_file):
    log_file += "info"
    total_reads = "NA"
    valid_markers = "NA"

    try:
        df_log = pd.read_csv(log_file, sep=":", header=None)
        log_array = df_log[1].values
        total_reads = log_array[1]
        valid_markers = log_array[-1]
    except FileNotFoundError:
        print("Warning: log file not found!")
    return total_reads, valid_markers


def main():
    print("\tY-Haplogroup Prediction")

    args = get_arguments()

    path_samples = args.Input  # .out files are collected
    samples = check_if_folder(path_samples, '.out')
    out_file = args.Outputfile
    hg_intermediate = str(yleaf_constants.DATA_FOLDER / yleaf_constants.HG_PREDICTION_FOLDER)
    intermediate_tree_table = hg_intermediate + "/Intermediates.txt"
    h_flag = True
    log_output = []
    for sample_name in samples:
        putative_hg = "NA"
        out_name = str(sample_name.split("/")[-1])
        out_name = out_name.split(".")[0]

        total_reads, valid_markers = process_log(sample_name[:-3])

In v3.0.1 this was correct, but now it is not.

Log for our data:

Total of mapped reads: 5549131
Total of unmapped reads: 4124
Valid markers: 64797
Markers with zero reads: 0
Markers below the read threshold {1}: 0
Markers below the base majority threshold {90}: 28
Markers with discordant genotype: 1
Markers without haplogroup information: 29
Markers with haplogroup information: 64768

We now need the first and third lines, not the second and last.
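
One way to make this robust against reordered log lines is to look values up by label instead of by row index. A minimal sketch, assuming the log keeps the `label: value` layout shown above; `read_log_counts` is a hypothetical helper, not the function in the repository:

```python
def read_log_counts(log_text):
    """Return (total_reads, valid_markers) found by label, or "NA" when absent."""
    values = {}
    for line in log_text.splitlines():
        label, sep, value = line.partition(":")
        if sep:  # skip lines without a "label: value" separator
            values[label.strip()] = value.strip()
    total_reads = values.get("Total of mapped reads", "NA")
    valid_markers = values.get("Valid markers", "NA")
    return total_reads, valid_markers


example_log = """Total of mapped reads: 5549131
Total of unmapped reads: 4124
Valid markers: 64797
Markers with haplogroup information: 64768
"""
print(read_log_counts(example_log))
```

With this lookup, adding or reordering log lines no longer changes which values end up in the Total_reads and Valid_markers columns.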

Article link in the readme is broken

Hello!

The link in the readme points to: https://academic.oup.com/mbe/article/35/7/1820/4993044, which is:

This is a correction to:
Molecular Biology and Evolution, Volume 35, Issue 5, May 2018, Pages 1291–1294, https://doi.org/10.1093/molbev/msy032
This article published with a comment intended only to the editors of the journal. The comment has been removed. The author regrets the error.

So the correct link is the original one?
https://academic.oup.com/mbe/article/35/5/1291/4922696

hg19

Hi,
is hg19 the one from UCSC, or the equivalent of GRCh37, b37, h37, etc.?
I have my samples mapped against hs37d5 (1000 Genomes project phase II). Is that suitable for use with Yleaf?

Also, when I run yleaf 3.1 (conda installation as proposed in the manual), I get the following when the program starts:

Error processing line 1 of /home/psonisns/miniconda3/envs/yleaf/lib/python3.7/site-packages/distutils-precedence.pth:

Traceback (most recent call last):
  File "/home/psonisns/miniconda3/envs/yleaf/lib/python3.7/site.py", line 168, in addpackage
    exec(line)
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named '_distutils_hack'

Remainder of file ignored

But then it continues with the analysis without any issues (I think) ...

Old Prediction Option not using the correct files

I created #15 as the first of a few changes that are needed to fix the old prediction option.

The next part should be to fix the intermediates.txt file look up since currently it's missing a /, so you'd see an error that the file wasn't found at .../data/hg_prediction_tablesIntermediates.txt

One option I found was to change Intermediates.txt to /Intermediates.txt at

intermediate_tree_table = hg_intermediate + "Intermediates.txt"

This however hasn't yet fixed all the issues that I'm having with old predictions.
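
Another option, assuming `yleaf_constants.DATA_FOLDER` is a `pathlib` path (the `/` operator used to build `hg_intermediate` suggests so), is to keep joining with `/` and convert to a string only at the end, which avoids hand-written separators. A sketch with stand-in values; the two constants below are placeholders, not the real values in `yleaf_constants`:

```python
from pathlib import PurePosixPath

# Placeholder values for illustration only.
DATA_FOLDER = PurePosixPath("data")
HG_PREDICTION_FOLDER = "hg_prediction_tables"

# String concatenation silently drops the separator.
broken = str(DATA_FOLDER / HG_PREDICTION_FOLDER) + "Intermediates.txt"

# Joining every component with "/" inserts it automatically.
fixed = str(DATA_FOLDER / HG_PREDICTION_FOLDER / "Intermediates.txt")

print(broken)
print(fixed)
```

`PurePosixPath` is used here only so the printed result is the same on every platform; real code would normally use `pathlib.Path`.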

Yleaf.py: error: argument -r/--Reads_thresh: invalid int value: 'ef'

Hi, do you know what might be the problem?

##################
mikael@mikael-HP-Z600-Workstation[Yleaf] python Yleaf.py -bam /media/hd01/data/genome_mikael/cleanreads/md.chrY.bam -ref hg38 -pos /usr/local/bioinf/Yleaf/hg38.txt -out /media/hd01/data/genome_mikael/cleanreads/ydna_out -r 1 -q 20 -b 90 -t 1
Erasmus MC Department of Genetic Identification

Yleaf: software tool for human Y-chromosomal 
phylogenetic analysis and haplogroup inference v2.1



       |
      /|\          
     /\|/\    
    \\\|///   
     \\|//  
      |||   
      |||    
      |||    

usage: Yleaf.py [-h] [-fastq PATH] [-bam PATH] [-f PATH] -pos PATH -out STRING
[-r READS_THRESH] -q QUALITY_THRESH -b BASE_MAJORITY
[-t THREADS]
Yleaf.py: error: argument -r/--Reads_thresh: invalid int value: 'ef'

usage: Yleaf.py [-h] [-fastq PATH] [-bam PATH] [-f PATH] -pos PATH -out STRING
[-r READS_THRESH] -q QUALITY_THRESH -b BASE_MAJORITY
[-t THREADS]
Yleaf.py: error: argument -r/--Reads_thresh: invalid int value: 'ef'
##############
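
A likely explanation, judging from the usage line above: this version lists no `-ref` option, and argparse reads `-ref` as the short option `-r` with the value `ef` attached. The sketch below reproduces that behavior with a toy parser modeled on the usage line; it is not Yleaf's actual argument parser.

```python
import argparse


class RaisingParser(argparse.ArgumentParser):
    """ArgumentParser that raises instead of exiting, so the message can be inspected."""

    def error(self, message):
        raise ValueError(message)


def parse_error(argv):
    """Return argparse's error message for argv, or None if parsing succeeds."""
    parser = RaisingParser(prog="Yleaf.py")
    parser.add_argument("-r", "--Reads_thresh", type=int)
    try:
        parser.parse_args(argv)
    except ValueError as exc:
        return str(exc)
    return None


# "-ref" is split into "-r" plus the attached value "ef", which is not an int.
print(parse_error(["-ref", "hg38", "-r", "1"]))
print(parse_error(["-r", "1"]))
```

If that is the cause here, dropping `-ref hg38` from the command for this version should let the run proceed, since the reference build is already implied by the `-pos` file passed on the command line.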

output

Hi, I want to infer the Y-haplogroups of 100 samples whose BAM files have been merged into one. When I run the tool (using the command: Yleaf -bam input.bam -o output -rg hg19), the .out file in the output folder contains a haplogroup per position, which is confusing. I don't know where to find the haplogroup per sample. Can you please help me? Thanks.

No such file or directory: 'Hg_Prediction_tables/tree.json'

Hi, I had installed and was using a previous version.

I updated to 3.0 and now I get the following error:
[mpileup] 1 samples in 1 input files
--- 0.20 seconds in run PileUp ---
Extracting haplogroups...
Traceback (most recent call last):
  File "/home/psonis/software/Yleaf/Yleaf.py", line 587, in <module>
    main()
  File "/home/psonis/software/Yleaf/Yleaf.py", line 561, in main
    output_file = samtools(args.threads, folder, folder_name, bam_file, args.Quality_thresh,
  File "/home/psonis/software/Yleaf/Yleaf.py", line 454, in samtools
    extract_haplogroups(markerfile, args.Reads_thresh, args.Base_majority,
  File "/home/psonis/software/Yleaf/Yleaf.py", line 413, in extract_haplogroups
    tree = Tree("Hg_Prediction_tables/tree.json")
  File "/home/psonis/software/Yleaf/tree.py", line 33, in __init__
    self._construct_tree(file)
  File "/home/psonis/software/Yleaf/tree.py", line 41, in _construct_tree
    with open(file) as f:
FileNotFoundError: [Errno 2] No such file or directory: 'Hg_Prediction_tables/tree.json'

Where is this file?

Also, I updated to 3.01 and everything is different. I am not sure which file is the executable or which files are the proper ones to use.
When testing python yleaf/Yleaf.py I get:
Traceback (most recent call last):
  File "/home/psonis/software/Yleaf/yleaf/Yleaf.py", line 30, in <module>
    from yleaf import version
ModuleNotFoundError: No module named 'yleaf'
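
Both tracebacks are consistent with paths being resolved relative to where the command is run: `open("Hg_Prediction_tables/tree.json")` is looked up in the current working directory, and `python yleaf/Yleaf.py` puts the `yleaf/` directory itself, not its parent, at the front of the import path. That reading is an inference from the tracebacks, not something confirmed in this thread. A sketch of anchoring a data file to the script location instead; `resource_path` is a hypothetical helper and the example path is made up:

```python
from pathlib import PurePosixPath


def resource_path(script_file, relative):
    """Build a path next to the script instead of relying on the working directory."""
    # In a real script, script_file would be __file__ and pathlib.Path would be used.
    return PurePosixPath(script_file).parent / relative


print(resource_path("/home/user/Yleaf/Yleaf.py", "Hg_Prediction_tables/tree.json"))
```

Until the code does something like this, running the 3.0 script from inside its own directory may be enough to let the relative path resolve.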

No haplogroup result if I use the -r1 option

I'm using Yleaf with the options -r 2 and -r 1 to compare the two outputs, but I got stuck on a strange result: the haplogroup prediction for -r 1 doesn't show any result, even though the -r 2 run reported haplogroup R1b1a1b1a.
I didn't expect this, because the -r 1 option is less stringent, so it should give a finer result than the -r 2 option.
Why is this happening? What has gone wrong?
Below are the commands I used for both steps of the -r 2 analysis:

Yleafv2.2/Yleaf.py -bam C-58_picard.bam -pos /Yleafv2.2/WGS_hg19_noChr.txt -out mysample_Y_r2_DNA_Yleaf -r 2 -q 20 -b 90 -t 6

/Yleafv2.2/predict_haplogroup.py -input mysample_Y_r2_DNA_Yleaf -out 58_r2_y.hg 2> hg.err

the log file contains this information: Set max per-file depth to 8000
