phyne is a pipeline for phylogenetic analysis that supports several kinds of input.
- server : wolf
- language : python 2.7
- MAFFT v7.123b
- MUSCLE v3.8.31
- Gblocks 0.91b
- FastTree 2.1.11
- R package (RColorBrewer, ggplot2, phylogram)
$python bin/phyne.py --help
usage: phyne.py [-h] [--mode {mlst,nssnp,ortholog}] [--config CONFIG]
[--outdir OUTDIR] [--prefix PREFIX]
optional arguments:
-h, --help show this help message and exit
--mode {mlst,nssnp,ortholog}
--config CONFIG
--outdir OUTDIR
--prefix PREFIX
$python bin/phyne.py --mode mlst --config [mlst.conf] --outdir [result] --prefix [testset]
- Input : Bacterial genome sequences (FASTA format)
- Output
- {outdir}/{prefix}.mlst_profile.xls : MLST profile (one tab-separated line per genome)
GCA_000439795.1 aphagocytophilum 64 pheS(42) glyA(32) fumC(28) mdh(18) sucA(40) dnaN(28) atpA(26)
GCA_000013125.1 aphagocytophilum 161 pheS(42) glyA(32) fumC(28) mdh(18) sucA(82) dnaN(28) atpA(26)
GCA_000439775.1 aphagocytophilum 64 pheS(42) glyA(32) fumC(28) mdh(18) sucA(40) dnaN(28) atpA(26)
GCA_000689655.1 aphagocytophilum 215 pheS(103) glyA(77) fumC(28) mdh(4) sucA(94) dnaN(28) atpA(60)
GCA_000689635.2 aphagocytophilum 82 pheS(3) glyA(33) fumC(29) mdh(3) sucA(2) dnaN(2) atpA(4)
GCA_000964685.1 aphagocytophilum 64 pheS(42) glyA(32) fumC(28) mdh(18) sucA(40) dnaN(28) atpA(26)
GCA_000689615.1 aphagocytophilum 217 pheS(104) glyA(78) fumC(70) mdh(53) sucA(95) dnaN(77) atpA(1)
GCA_000478425.1 aphagocytophilum 64 pheS(42) glyA(32) fumC(28) mdh(18) sucA(40) dnaN(28) atpA(26)
GCA_000968455.1 aphagocytophilum - pheS(42) glyA(32) fumC(28) mdh(18?) sucA(40) dnaN(28) atpA(26)
- col 1 : the genome name
- col 2 : the matching PubMLST scheme name
- col 3 : the ST (sequence type)
- col 4 onward : the allele ID of each locus, written as locus(allele ID)
- {outdir}/{prefix}.mlst_sequence.fa : allele sequences, arranged in a fixed order
- {outdir}/{prefix}.mlst_sequence.fa.order : the order in which the allele sequences are arranged
- {outdir}/{prefix}.mlst_align.fa : multiple sequence alignment (fasta)
- {outdir}/{prefix}.mlst_align.phylip : multiple sequence alignment (phylip)
- {outdir}/{prefix}.mlst_align.newick : tree (newick)
- {outdir}/{prefix}.mlst_align.dist : Sequence distance metrics
- {outdir}/{prefix}.PhylogeneticTree.png : very simple phylogenetic tree figure
- {outdir}/{prefix}.PCA.png : PCA plot
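The column layout of `mlst_profile.xls` described above can be read back with a few lines of Python. This is a minimal sketch, not part of phyne: the function name `parse_mlst_profile_line` is hypothetical, and it assumes that a `-` in column 3 means no ST was assigned and that allele entries always have the form `locus(allele)`. Allele IDs are kept as text because entries such as `mdh(18?)` are not plain integers.

```python
import re

# Matches an allele entry such as "pheS(42)" or "mdh(18?)".
ALLELE_RE = re.compile(r"^(?P<locus>[^()]+)\((?P<allele>[^()]*)\)$")


def parse_mlst_profile_line(line):
    """Split one tab-separated profile line into the documented columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("expected at least 3 columns, got %d" % len(fields))
    genome, scheme, st = fields[0], fields[1], fields[2]
    alleles = {}
    for entry in fields[3:]:
        match = ALLELE_RE.match(entry)
        if match is None:
            raise ValueError("unrecognised allele entry: %r" % entry)
        # Keep the allele ID as text: values like "18?" are not integers.
        alleles[match.group("locus")] = match.group("allele")
    return {
        "genome": genome,
        "scheme": scheme,
        # Assumption: "-" in column 3 means that no ST was assigned.
        "st": None if st == "-" else st,
        "alleles": alleles,
    }
```

Calling it on each line of the profile file gives one dictionary per genome, which is convenient for comparing allele IDs between genomes.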
- Without target scheme
$ cat bin/phyne_mlst.conf
{
"target_scheme" : "",
"input_genome" : {
"mygenome" : "denovogenome.fa",
"relativegenome1" : "Salmonella_enterica_subsp_enterica_serovar_Typhimurium_DT104_v1.fa",
"relativegenome2" : "Salmonella_enterica_subsp_enterica_serovar_Typhi_str_CT18_v1.fa",
"relativegenome3" : "Salmonella_enterica_subsp_enterica_serovar_Weltevreden_str_10259_v0.2.fa"
},
"phyne_mlst_exe" : "/BiO/BioPeople/siyoo/phyne/bin/phyne_mlst.py"
}
- With target scheme (see the scheme list)
$ cat bin/phyne_mlst.conf
{
"target_scheme" : "aphagocytophilum",
"input_genome" : {
"genome1" : "GCA_000013125.1_ASM1312v1_genomic.fna",
"genome2" : "GCA_000439775.1_ASM43977v1_genomic.fna",
"genome3" : "GCA_000439795.1_ASM43979v1_genomic.fna",
"genome4" : "GCA_000478425.1_ASM47842v1_genomic.fna",
"genome5" : "GCA_000689655.1_MRK1.0_genomic.fna"
},
"phyne_mlst_exe" : "/BiO/BioPeople/siyoo/phyne/bin/phyne_mlst.py"
}
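Before launching a run it can help to check the config file. The sketch below assumes the file is strict JSON, as the examples above suggest, and that the three keys shown are all required. The function names and the `base_dir` parameter are hypothetical and are not part of phyne.

```python
import json
import os

# Keys that appear in both example configs above (assumed to be required).
REQUIRED_KEYS = ("target_scheme", "input_genome", "phyne_mlst_exe")


def parse_mlst_config(text):
    """Parse the config text as JSON and check that the expected keys exist."""
    config = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError("missing config keys: %s" % ", ".join(missing))
    if not config["input_genome"]:
        raise ValueError("input_genome must list at least one genome")
    return config


def missing_genome_files(config, base_dir="."):
    """Return the labels whose FASTA file cannot be found under base_dir."""
    return sorted(
        label
        for label, fasta in config["input_genome"].items()
        if not os.path.isfile(os.path.join(base_dir, fasta))
    )
```

An empty `target_scheme` is accepted, which matches the "Without target scheme" example.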
Multisample VCF ==> Define and select non-synonymous SNPs ==> Phylogenetic analysis
$python bin/phyne.py --mode nssnp --config [nssnp.conf] --outdir [result] --prefix [testset]
- Input file : Multisample variant calling data (format : vcf)
- Output file
- {outdir}/{prefix}.nssnp.fasta : Concatenated nonsynonymous SNP data (format : fasta)
- {outdir}/{prefix}.nssnp.newick : tree (format : newick)
- {outdir}/distance.txt : Sequence distance metrics
- {outdir}/PhylogeneticTree.png : Phylogenetic tree figure
- {outdir}/PCA.png : PCA plot
### REQUIRED ARGUMENTS ###
INPUT=path/to/multisample.snpeff.test.vcf
SAMPLE_IDS=SAMPLE1,SAMPLE2,SAMPLE3,SAMPLE4,SAMPLE5
### OPTIONAL ARGUMENTS ###
GQ=60
DP=10
REF=90.0
HET=70.0,30.0
ALT=90.0
Mandatory arguments
- config (file_path) : Input configuration file for the pipeline
- outdir (dir_path) : Output directory
- prefix (STRING) : Prefix for the output file names
Filter optional arguments
- GQ (INT) : Genotype quality score (default=60)
- DP (INT) : Genotype total read depth (default=10)
- REF (FLOAT) : Reference allele ratio (default=90.0)
- HET (FLOAT,FLOAT) : Heterozygous allele ratio (default=70.0,30.0, i.e. a 70 : 30 ratio)
- ALT (FLOAT) : Alternative allele ratio (default=90.0)
! SNPs that do not meet the above conditions are removed.
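The required and optional arguments above can be read into a dictionary with the documented defaults filled in. This is a minimal sketch, not phyne's own parser: it assumes the config is a list of `KEY=VALUE` lines and that lines starting with `#` are comments, as in the example. The function name `parse_nssnp_config` is hypothetical.

```python
# Defaults taken from the "Filter optional arguments" list above.
DEFAULTS = {"GQ": 60, "DP": 10, "REF": 90.0, "HET": (70.0, 30.0), "ALT": 90.0}
REQUIRED = ("INPUT", "SAMPLE_IDS")


def parse_nssnp_config(text):
    """Parse KEY=VALUE lines, convert types, and apply default filter values."""
    settings = dict(DEFAULTS)
    for raw in text.splitlines():
        line = raw.strip()
        # Assumption: blank lines and lines starting with "#" carry no settings.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError("expected KEY=VALUE, got %r" % raw)
        key, value = key.strip(), value.strip()
        if key in ("GQ", "DP"):
            settings[key] = int(value)
        elif key in ("REF", "ALT"):
            settings[key] = float(value)
        elif key == "HET":
            settings[key] = tuple(float(part) for part in value.split(","))
        elif key == "SAMPLE_IDS":
            settings[key] = [s.strip() for s in value.split(",") if s.strip()]
        else:
            settings[key] = value
    missing = [key for key in REQUIRED if key not in settings]
    if missing:
        raise ValueError("missing required arguments: %s" % ", ".join(missing))
    return settings
```

Leaving an optional key out of the file keeps its default, so a config with only `INPUT` and `SAMPLE_IDS` is still valid under these assumptions.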
This mode draws a phylogenetic tree based on the evolutionary similarity of sequences, using only the single copy genes of two or more species/genomes.
- Step1. Cluster ortholog groups of protein sequences with orthoMCL
- Step2. Select single copy genes
- Step3. Multiple sequence alignment
- Step4. Remove ambiguously aligned regions
- Step5. Merge the cleaned alignments and build the tree
- Step6. Calculate the sequence distance
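The single copy gene selection in Step2 can be illustrated as follows. This is a sketch under stated assumptions, not the script phyne runs: it assumes the ortholog groups have already been parsed into a dictionary of `(species, gene)` pairs, and that a group counts as single copy only when every input species contributes exactly one gene. The function name and the data layout are hypothetical.

```python
from collections import Counter


def select_single_copy_groups(groups, species):
    """Keep groups in which each species appears exactly once.

    groups  : {group_id: [(species_name, gene_id), ...]}
    species : names of all species/genomes in the analysis
    """
    wanted = set(species)
    selected = {}
    for group_id, members in groups.items():
        counts = Counter(name for name, _ in members)
        # Every species must be present, with exactly one gene each.
        if set(counts) == wanted and all(n == 1 for n in counts.values()):
            selected[group_id] = dict(members)
    return selected
```

For example, a group with two genes from one species, or a group missing one of the species, is dropped, and only the remaining groups would go on to alignment.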
$python bin/phyne.py --mode ortholog --config [ortholog.conf] --outdir [result] --prefix [testset]
- Input : protein sequences (FASTA format)
- Output : Intermediate directory
- {outdir}/1.orthoMCL : orthoMCL-Run Result files
- {outdir}/2.mafft : multiple sequence alignment of each single copy gene cluster (fasta)
- {outdir}/3.Gblock : alignments with the low-quality regions removed (fasta)
- {outdir}/4.fasttree : newick file
- Output : Final files
- {outdir}/Report/ortholog.fa : merged multiple sequence alignment
- {outdir}/Report/newick.txt : tree
- {outdir}/Report/SinglecopyGene.xls : single copy gene lists between species
- {outdir}/Report/distance.txt : Sequence distance metrics
- {outdir}/Report/PhylogeneticTree.png : very simple phylogenetic tree figure
- {outdir}/Report/PCA.png : PCA plot
- phyne_ortholog.conf :
#orthoMCL
##########################################################################
orthoMCL = Y
ortho_fasta_dir = /BiO/BioPeople/boram/Projects/Phylogenetic/pepiteds
ortho_thread = 40
parser = Y
ortholog_fa = Y
##########################################################################
Tree = fasttree
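- orthoMCL : set to Y to run orthoMCL, or to N to run only the following steps
- ortho_fasta_dir : path to the directory containing the amino acid sequence (FASTA) files of the species
- ortho_thread : number of threads for orthoMCL (default 40)
- parser : runs the script that selects single copy genes; set to N if this has already been done
- ortholog_fa : generates a FASTA file collecting the single copy gene sequences of each species
! On the first run, set every option to Y. To start from a later step, set the steps that have already been run to N.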
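The Y/N switches described above can be turned into a list of steps to run. The sketch below is illustrative only: it assumes the config uses `key = value` lines with `#` for comments, as in the example, and it treats a missing switch as N, which is an assumption rather than documented behaviour. The function names are hypothetical.

```python
# Step switches documented above, in the order they appear in the config.
STEP_FLAGS = ("orthoMCL", "parser", "ortholog_fa")


def parse_ortholog_config(text):
    """Read 'key = value' lines, skipping blank lines and '#' comment lines."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def enabled_steps(settings):
    """Return the step switches that are set to Y."""
    steps = []
    for flag in STEP_FLAGS:
        # Assumption: a switch that is absent from the file counts as N.
        value = settings.get(flag, "N").upper()
        if value not in ("Y", "N"):
            raise ValueError("%s must be Y or N, got %r" % (flag, value))
        if value == "Y":
            steps.append(flag)
    return steps
```

With `orthoMCL = N` and the other two switches set to Y, `enabled_steps` returns only `parser` and `ortholog_fa`, which corresponds to resuming after a finished orthoMCL run.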
latest update : 2019-05-24