
fengcong3 / happe

A tool to visualize haplotype patterns and associated information in Excel.

License: MIT License

Python 99.22% Shell 0.78%
haplotype population-genetics population-genomics read-depth tree

happe's Introduction

HAPPE

A tool to visualize haplotype patterns and associated information in Excel. Please cite this paper when using HAPPE in your publications:

Cong Feng, Xingwei Wang, Shishi Wu, Weidong Ning, Bo Song, Jianbin Yan, and Shifeng Cheng. 2022. “HAPPE: A Tool for Population Haplotype Analysis and Visualization in Editable Excel Tables.” Frontiers in Plant Science 13 (July): 927407. https://doi.org/10.3389/fpls.2022.927407.


Installing HAPPE

The easiest way to install HAPPE is with pip3:

pip3 install HAPPE

Alternatively, clone the project to a local directory and install it with:

python3 setup.py install --record log.txt
# To uninstall the package:
# cat log.txt | xargs rm -rf

The HAPPE command should then be available:

$ HAPPE -h

usage: HAPPE [-h] -g CONFIG -v GZVCF [-k KEEP] [-r REGION]
                          [-s SNPLIST] -i INF -c COLOR [-I SNPINF] [-R REF]
                          [-F FUNCANN] [-f | -x | -n] [-D DEPTH] [-d DEPTHBIN]
                          -o OUTPUT

show haplotype patterns in excel file.

optional arguments:
  -h, --help            show this help message and exit
  -g CONFIG, --config CONFIG
                        config file.[required]
  -v GZVCF, --gzvcf GZVCF
                        gzvcf, bcftools indexed.use to annotation and get
                        ref/alt basepair.[required]
  -k KEEP, --keep KEEP  keep sample, if u wana plot a subset of
                        --gzvcf.[optional]
  -r REGION, --region REGION
                        if u wana plot a subset of --gzvcf, u can use this
                        option. if u use this option , ucant use -s
                        option[optional]
  -s SNPLIST, --snplist SNPLIST
                        snp id list(format:chr_pos). if u use this option , u
                        cant use -r option.[optional]
  -w TREEWIDTH, --treewidth TREEWIDTH
                        How many columns do you want to occupy for this tree
                        topology.(default=1000)[optional]
  -i INF, --inf INF     the information of each sample.[required]
  -c COLOR, --color COLOR
                        the color of each sample.[required]
  -I SNPINF, --snpinf SNPINF
                        more information about SNP.[optional]
  -R REF, --Ref REF     change Reference and color system.[optional]
  -F FUNCANN, --FuncAnn FUNCANN
                        functional annotation file.[optional]
  -f, --functional      only functional SNP
  -x, --coding          only coding region SNP
  -n, --noncoding       only noncoding region SNP
  -D DEPTH, --Depth DEPTH
                        depth dir for each sample.[optional]
  -d DEPTHBIN, --Depthbin DEPTHBIN
                        Depth bin size.[optional,default:50]
  -o OUTPUT, --output OUTPUT
                        output prefix

Preparing the config file

List the paths to bgzip, bcftools and tabix in the [software] section:

[software]
bgzip=
bcftools=
tabix=
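The config file is INI-style, so its [software] section can be checked before a run. The sketch below is not HAPPE's own code: the helper names read_software_paths and invalid_tools are hypothetical, and it assumes each value is meant to be a full path to an executable. It uses only Python's standard configparser and os modules.

```python
import configparser
import os

# Keys expected in the [software] section of the config template.
REQUIRED_TOOLS = ("bgzip", "bcftools", "tabix")


def read_software_paths(config_text):
    """Return {tool: path} from the [software] section; missing keys map to ""."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    section = parser["software"]
    return {tool: section.get(tool, "").strip() for tool in REQUIRED_TOOLS}


def invalid_tools(paths):
    """List tools whose path is empty or does not point to an executable file."""
    return [
        tool
        for tool, path in paths.items()
        if not (path and os.path.isfile(path) and os.access(path, os.X_OK))
    ]


# Example (hypothetical): stop early if a path in config.ini is unusable.
# with open("config.ini") as handle:
#     bad = invalid_tools(read_software_paths(handle.read()))
# if bad:
#     raise SystemExit(f"fix these paths in config.ini: {bad}")
```

Checking the paths up front gives a clearer message than a failed subprocess call part-way through a run.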

Preparing the VCF file

  1. The SNP/INDEL ID must be in the format Chromosome_position.
  2. Only bi-allelic variants may remain in the VCF file.
  3. Compress the VCF file to vcf.gz using bgzip.
  4. Use bcftools index to create an index for the vcf.gz file.
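Steps 1 and 2 can be sketched in plain Python over uncompressed VCF text. This is a hypothetical helper (prepare_vcf_lines), not part of HAPPE; it assumes a standard tab-separated VCF in which column 3 is ID and column 5 is ALT, and treats any comma in ALT as a multi-allelic site. Steps 3 and 4 (bgzip and bcftools index) still follow. On the command line, bcftools view -m2 -M2 and bcftools annotate --set-id are the usual equivalents.

```python
def prepare_vcf_lines(lines):
    """Yield VCF lines with ID rewritten to Chromosome_position, dropping multi-allelic sites."""
    for line in lines:
        if line.startswith("#"):
            yield line  # meta-information and #CHROM header pass through unchanged
            continue
        if not line.strip():
            continue  # ignore blank lines
        fields = line.rstrip("\n").split("\t")
        if "," in fields[4]:
            continue  # more than one ALT allele: not bi-allelic, so skip it
        fields[2] = f"{fields[0]}_{fields[1]}"  # column 3 (ID) becomes CHROM_POS
        yield "\t".join(fields) + "\n"


# Example (hypothetical file names):
# with open("in.vcf") as src, open("out.vcf", "w") as dst:
#     dst.writelines(prepare_vcf_lines(src))
```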

Preparing the depth information

If you want to integrate depth information, prepare the depth files as follows:

  1. Create a directory for each sample, named after the sample.
  2. Use mosdepth to calculate the per-base depth for each sample.
# example
mosdepth -f ref.fa -Q 0 sample1/sample1.Q0 path/to/sample1.bam
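For many samples, the per-sample layout above can be scripted. The sketch below only builds the command lines, mirroring the example's -f ref.fa -Q 0 options and the sample/sample.Q0 prefix; the function name mosdepth_commands and its bam_template and depth_dir arguments are hypothetical. Nothing is executed, so the commands can be inspected before being passed to subprocess.run.

```python
from pathlib import Path


def mosdepth_commands(samples, bam_template, ref_fasta, depth_dir=".", create_dirs=False):
    """Return one mosdepth argument list per sample (nothing is executed).

    bam_template is a hypothetical format string such as "path/to/{sample}.bam".
    """
    commands = []
    for sample in samples:
        sample_dir = Path(depth_dir) / sample  # step 1: a directory named after the sample
        if create_dirs:
            sample_dir.mkdir(parents=True, exist_ok=True)
        prefix = sample_dir / f"{sample}.Q0"  # same prefix style as the example
        commands.append([
            "mosdepth", "-f", ref_fasta, "-Q", "0",
            str(prefix), bam_template.format(sample=sample),
        ])
    return commands


# Example: print the commands; run each one with subprocess.run(cmd, check=True).
for cmd in mosdepth_commands(["sample1", "sample2"], "path/to/{sample}.bam", "ref.fa"):
    print(" ".join(cmd))
```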

Usage

"-g CONFIG", required parameter, the config file giving the paths to bcftools, bgzip and tabix.

"-v GZVCF", required parameter, the input VCF file, compressed with bgzip and indexed with bcftools.

"-k SAMPLELIST", optional parameter, the list of samples to retain, one sample per line.

"-r REGION", optional parameter, the genomic region to display, in the format chromosome:start-end (e.g. 5:6847970-6850236). It cannot be combined with -s.

"-s VARIANTLIST", optional parameter, the list of variant IDs (format chr_pos) to keep. It cannot be combined with -r.

"-w TREEWIDTH", optional parameter, the number of columns occupied by the tree topology (default: 1000).

"-i INFORMATION", required parameter, additional sample information; the first column must be the sample ID.

"-c COLOR", required parameter, the color of each sample; the first column is the sample ID and the second column is the color hex code.

"-I VARINFORMATION", optional parameter, additional variant annotation information, such as GWAS p-values. The first column is the variant ID, and each further column holds one annotation under its own header.

"-F FUNCANN", optional parameter, a functional annotation file; the first column is the gene name and the fourth column is the functional annotation.

"-f", optional parameter, only variants that change the amino acid are retained (requires that the input VCF file has been annotated with SnpEff).

"-x", optional parameter, only variants in the CDS region are retained (requires that the input VCF file has been annotated with SnpEff).

"-n", optional parameter, only variants in non-coding regions are retained (requires that the input VCF file has been annotated with SnpEff).

Only one of -f, -x and -n can be used at a time.

"-D DIRECTORY", optional parameter, a directory containing one subdirectory per sample, each holding the depth information calculated with mosdepth.

"-d WINDOWSIZE", optional parameter, the window size used to calculate normalized depth (default: 50).

"-o PREFIX", required parameter, output prefix.

Example

The example data covered in the publication is in the example/ folder.

HAPPE \
-g config.ini \
-v ./data/00.annotated_vcf/SEVIR.592.SNP.ann.allele2.part.vcf.gz \
-r 5:6847970-6850236 \
-w 100 \
-d 20 \
-i ./data/02.sample_information//sample_inf.tsv \
-c ./data/02.sample_information//sample.color \
-D ./data/01.depth \
-k ./data/02.sample_information//small_example.list \
-o SEVIR_5G085400v2


## Format of the input file for each parameter
## -g config.ini
# [software]
# bgzip=path_to/bgzip
# bcftools=path_to/bcftools
# tabix=path_to/tabix

## -i ./data/02.sample_information//sample_inf.tsv
## Just make sure the first column is the sample name.
# Sample_ID	... ...
# sample1   ... ...

## -c ./data/02.sample_information//sample.color
## Just make sure the first column is the sample name and the second column is color code.
# Sample_ID	color
# sample1	FF0000
# ...       ...

## -F FunctionalAnnotation_v1__HCgenes_v1.0.TAB
## Make sure the first column is the gene name and the fourth column is the functional annotation.
# Gene_name	XXX XXX function ... ...
# gene1     XXX XXX func1    ... ...

## -D ./data/01.depth
## Make sure that each sample's subdirectory in this directory contains the files *mosdepth.summary.txt and *per-base.bed.gz.
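The -D requirement above can be verified before running HAPPE. A minimal sketch, assuming one subdirectory per sample named exactly like the sample ID; check_depth_dir is a hypothetical helper, not part of HAPPE, and it only checks that files matching the two documented patterns exist, not their contents.

```python
from pathlib import Path

# File name patterns the -D directory must contain for every sample.
REQUIRED_PATTERNS = ("*mosdepth.summary.txt", "*per-base.bed.gz")


def check_depth_dir(depth_dir, samples):
    """Return {sample: [missing patterns]} for samples whose depth files are incomplete."""
    problems = {}
    for sample in samples:
        sample_dir = Path(depth_dir) / sample
        if not sample_dir.is_dir():
            problems[sample] = list(REQUIRED_PATTERNS)  # no directory at all
            continue
        missing = [p for p in REQUIRED_PATTERNS if not any(sample_dir.glob(p))]
        if missing:
            problems[sample] = missing
    return problems


# Example (hypothetical sample names):
# print(check_depth_dir("./data/01.depth", ["sample1", "sample2"]))
```

An empty dictionary means every listed sample has both mosdepth outputs in place.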

happe's Issues

error 256 no such file or directory

HAPPE keeps complaining "no such file or directory", but oddly it cannot find the OUTPUT file that would be created when I provide the --output argument.

Here's the error using your example files. I'm on a Mac running macOS 13.5.2 and using zsh as my shell (although I tried in bash too).

% HAPPE \
-g happe.config \
-v SEVIR.592.SNP.ann.allele2.part.vcf.gz \
-i sample_inf.tsv \
-c sample.color.txt \
-k small_example.list \
-o SEVIR_5G085400v2

[2023-10-02 16:31:25,816] - HAPPE - INFO - deal args.
[2023-10-02 16:31:25,818] - HAPPE - INFO - Filter samples and variants.
ln: SEVIR_5G085400v2.vcf.gz: No such file or directory
[2023-10-02 16:31:25,842] - HAPPE - ERROR - Filter samples and variants. -- retrun code:256

Issue during the execution of pheat2excel_V5.py

Hello,

I tried to run the script on test data and I get an error linked to the execution of pheat2excel_V5.py.
You can find the error log down below.
I would also like to know if it is possible to have access to your publication data to be sure that any problem is not data related.

Best regards,
Abdel

Traceback (most recent call last):
  File "/home/s953680/apps/python/current/lib/python3.8/site-packages/openpyxl-3.2.0b1-py3.8.egg/openpyxl/utils/cell.py", line 110, in get_column_letter
    return _STRING_COL_CACHE[idx]
KeyError: 18279

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/s953680/apps/python/3.8/lib/python3.8/site-packages/HAPPE-0.1.4-py3.8.egg/HAPPE/pheat2excel_V5.py", line 225, in
    col_let = get_column_letter(start_col+index+2)
  File "/home/s953680/apps/python/current/lib/python3.8/site-packages/openpyxl-3.2.0b1-py3.8.egg/openpyxl/utils/cell.py", line 112, in get_column_letter
    raise ValueError("Invalid column index {0}".format(idx))
ValueError: Invalid column index 18279
[2022-09-06 16:29:36,947] - HAPPE - ERROR - add heat matrix -- retrun code:256
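In this traceback, pheat2excel_V5.py asks openpyxl for column index 18279 (start_col+index+2), which lies past column ZZZ (index 18278), the last entry in openpyxl's cached column letters. An .xlsx worksheet also has a hard limit of 16,384 columns (XFD). Assuming the heat matrix uses one column per variant, as that expression suggests, a region with too many variants cannot fit in one sheet, and narrowing the input with -r, -s or one of -f/-x/-n should keep it smaller. The helpers below are hypothetical (not HAPPE code): they reproduce the 1-based index-to-letter conversion and check a planned column range against the limit before anything is written.

```python
# Maximum number of columns in an .xlsx worksheet (the last column is XFD).
EXCEL_MAX_COLUMNS = 16384


def column_letter(index):
    """Convert a 1-based column index to Excel letters: 1 -> 'A', 27 -> 'AA'."""
    if index < 1:
        raise ValueError("column index must be >= 1")
    letters = []
    while index:
        # Bijective base-26: shift to 0-based before taking the remainder.
        index, remainder = divmod(index - 1, 26)
        letters.append(chr(ord("A") + remainder))
    return "".join(reversed(letters))


def fits_in_sheet(start_col, n_columns):
    """True if columns start_col .. start_col + n_columns - 1 stay within the limit."""
    return start_col + n_columns - 1 <= EXCEL_MAX_COLUMNS


# Example: a matrix starting at column 10 with 20,000 variant columns would not fit.
print(fits_in_sheet(10, 20000))          # False
print(column_letter(EXCEL_MAX_COLUMNS))  # XFD
```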

excel_haplotype.py file missing

Hello,
I am trying to install and use your program but I cannot find the file excel_haplotype.py.
It is referenced in the __init__.py script and I get an error when trying to install it.
Then, when I replace from .excel_haplotype import main_1, the program stops at a clustering step.
Would you please help me with that ?
Best regards,
Abdel

[ERROR] HAPPE - ERROR - write sample information to excel file

Dear HAPPE Team:

I got the error below. Could you give me some suggestions? Thank you so much.

(Python3_6) HAPPE -v 2023NKSC_F1reseq_allele2_maf0.05_DP20_missing0.9.recode.vcf.gz -g config.ini -c NKSCsample.color -i NKSCsample_inf.tsv -o ./HAPPE
[2023-03-06 21:11:02,529] - HAPPE - INFO - deal args.
[2023-03-06 21:11:02,529] - HAPPE - INFO - Filter samples and variants.
[2023-03-06 21:11:02,531] - HAPPE - INFO - convert file format.
[2023-03-06 21:11:04,243] - HAPPE - INFO - hierarchy clustering.
cluster min and max: 1 29
[2023-03-06 21:11:06,848] - HAPPE - INFO - read snp information file.
[2023-03-06 21:11:06,848] - HAPPE - INFO - calc out the snp_info_row_num = 7.
[2023-03-06 21:11:06,848] - HAPPE - INFO - plot tree to excel file.
[2023-03-06 21:11:09,489] - HAPPE - INFO - get sample information start col and get sample order.
[2023-03-06 21:11:09,496] - HAPPE - INFO - write sample information to excel file.
Traceback (most recent call last):
  File "/home/hsiang/miniconda3/envs/Python3_6/lib/python3.6/site-packages/HAPPE/sampleinf2excel_V2.py", line 97, in
    color_fill = PatternFill(fgColor=color_d[sample], fill_type="solid")
KeyError: '6_9'
[2023-03-06 21:11:10,120] - HAPPE - ERROR - write sample information to excel file -- retrun code:256
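The KeyError: '6_9' is raised while looking up a sample's color (color_d[sample]), which suggests that sample 6_9 has no entry in the -c color file. In this run -k was not given, so presumably every sample in the VCF was plotted. A hypothetical pre-flight check (not part of HAPPE) can compare the sample names from the VCF #CHROM header line with the color file; the six-digit hex test follows the FF0000 example of the -c format and is deliberately conservative.

```python
import csv
import re

# Six hex digits, as in the FF0000 example of the -c file format.
HEX_COLOR = re.compile(r"[0-9A-Fa-f]{6}")


def samples_from_header(chrom_line):
    """Sample names are the #CHROM header columns after FORMAT (10th onward)."""
    return chrom_line.rstrip("\n").split("\t")[9:]


def load_colors(path):
    """Read a tab-separated color file: column 1 = sample ID, column 2 = hex code."""
    colors = {}
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) >= 2:
                colors[row[0].strip()] = row[1].strip()
    return colors


def color_problems(samples, colors):
    """Return (samples with no color entry, samples whose color is not 6-digit hex)."""
    missing = [s for s in samples if s not in colors]
    bad_hex = [s for s in samples if s in colors and not HEX_COLOR.fullmatch(colors[s])]
    return missing, bad_hex
```

Any sample reported as missing needs a line in the color file (or must be left out with -k) before the "write sample information to excel file" step can succeed.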
