jlsteenwyk / phykit

a UNIX shell toolkit for processing and analyzing multiple sequence alignments and phylogenies

Home Page: https://jlsteenwyk.com/PhyKIT/

License: MIT License

Python 99.05% Makefile 0.95%
bioinformatics evolution evolutionary-biology genomics phylogenetics phylogenomics python multiple-sequence-alignments

phykit's Introduction

PhyKIT is a UNIX shell toolkit for processing and analyzing phylogenomic data.

If you find PhyKIT useful, please cite: "PhyKIT: a broadly applicable UNIX shell toolkit for processing and analyzing phylogenomic data." Bioinformatics. doi: 10.1093/bioinformatics/btab096.


This documentation covers downloading and installing PhyKIT. Details about each function as well as tutorials for using PhyKIT are available in the online documentation.


Quick Start

# install
pip install phykit
# run
phykit <function> <input file>

Installation

If you have trouble installing PhyKIT, please contact the lead developer, Jacob L. Steenwyk, via email or Twitter for help.

To install using pip, we strongly recommend creating a virtual environment to avoid software dependency issues. To do so, execute the following commands:

# create virtual environment
python -m venv .venv
# activate virtual environment
source .venv/bin/activate
# install phykit
pip install phykit

Note that the virtual environment must be activated to use phykit.

When you are done using PhyKIT, you can deactivate the virtual environment with the following command:

# deactivate virtual environment
deactivate

Similarly, to install from source, we strongly recommend using a virtual environment. To do so, use the following commands:

# download
git clone https://github.com/JLSteenwyk/PhyKIT.git
cd PhyKIT/
# create virtual environment
python -m venv .venv
# activate virtual environment
source .venv/bin/activate
# install
make install

To deactivate your virtual environment, use the following command:

# deactivate virtual environment
deactivate

Note that the virtual environment must be activated to use phykit.


To install via Anaconda, execute the following command:

conda install bioconda::phykit

Visit here for more information: https://anaconda.org/bioconda/phykit


To test the PhyKIT installation, display the help message:

phykit -h
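
If the help message does not appear, a quick check of the environment can narrow things down. The following is a minimal sketch, not part of PhyKIT: the function names are hypothetical, and it only reports whether a venv-style virtual environment is active and whether an executable with a given name is on the PATH. Note that conda environments are not detected by the virtual-environment check.

```python
import shutil
import sys
from typing import Optional


def in_virtual_environment() -> bool:
    """True when running inside a venv-style environment.

    Inside a venv, sys.prefix points at the environment while
    sys.base_prefix points at the base interpreter.
    """
    return sys.prefix != sys.base_prefix


def find_executable(name: str = "phykit") -> Optional[str]:
    """Return the full path of `name` if it is on the PATH, else None."""
    return shutil.which(name)


if __name__ == "__main__":
    print("virtual environment active:", in_virtual_environment())
    print("phykit found at:", find_executable())
```

If the second line prints None, the environment in which PhyKIT was installed is probably not the active one.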

phykit's People

Contributors

dependabot[bot], hyphaltip, jlsteenwyk, tjbiii

phykit's Issues

enhancement: faidx

Include functionality to extract multiple entries using the faidx function.

Consider having users supply a single-column text file, or delimit multiple entries with a separator character.
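
One way the requested behavior could look is sketched below. This is an illustration only, not PhyKIT's faidx implementation: the function names, the default comma separator, and the choice to match on the first token of each FASTA header are all assumptions.

```python
from typing import Dict, Iterable, List


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {entry_id: sequence}."""
    records: Dict[str, List[str]] = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the entry ID is the first token of the header.
            tokens = line[1:].split()
            current = tokens[0] if tokens else ""
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def parse_entry_names(text: str, separator: str = ",") -> List[str]:
    """Accept one name per line, or several names joined by `separator`."""
    names: List[str] = []
    for line in text.splitlines():
        names.extend(part.strip() for part in line.split(separator))
    return [name for name in names if name]


def extract_entries(records: Dict[str, str], names: List[str]) -> str:
    """Return the requested entries as FASTA text, in the requested order."""
    missing = [name for name in names if name not in records]
    if missing:
        raise KeyError(f"entries not found: {', '.join(missing)}")
    return "".join(f">{name}\n{records[name]}\n" for name in names)
```

With this shape, a single-column file and a delimited string both pass through parse_entry_names, so the two input styles proposed above share one code path.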

Can gene-gene covariation relationships be calculated for orthologous gene families?

Hi Jacob, thanks for providing this excellent toolkit.

I used OrthoFinder to infer orthologous gene families for five plants, yielding 81,624 orthogroups in total. I want to use PhyKIT to estimate whether different orthogroups have a coevolutionary relationship.
However, an orthogroup can contain multiple genes from the same species. In your example file (e.g., Shen_etal_SciAdv_2020_NDC80.treefile), I found only one gene per species. Can gene-gene covariation still be calculated for my project?

Thanks
Xiaoxu

ValueError: x and y must have length at least 2.

Hello,
I want to use phykit to evaluate gene-gene covariation. The command I ran is: "phykit cover treefiles/gene1.treefile.rooted treefiles/gene2.treefile.rooted -r tree.rooted.txt". However, it fails with this error: "ValueError: x and y must have length at least 2". Could you please tell me how to resolve this problem? Thanks.
Best,
Chun
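
The wording of this error appears to match the check that a Pearson correlation routine performs when it receives fewer than two paired values. One possible cause, which is an assumption and not a diagnosis of this report, is that the two gene trees and the reference tree share too few tips, so almost nothing is left to compare. The sketch below uses hypothetical helper names and plain Python to show both the shared-tip check and where such an error would arise.

```python
import math
from typing import Sequence, Set


def shared_tips(first: Set[str], *others: Set[str]) -> Set[str]:
    """Tip names present in every tree."""
    return first.intersection(*others)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation with explicit input checks."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if len(x) < 2:
        # A correlation is undefined for fewer than two paired values.
        raise ValueError("x and y must have length at least 2.")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

Comparing the tip names of the three trees before running the analysis is a cheap first check; an empty or near-empty intersection would explain the failure under the assumption above.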

create_concat: AttributeError: 'str' object has no attribute 'decode'

Hi!

I am trying to make a supermatrix of BUSCO genes.

I have made the txt file containing the full path to the files containing the alignments of each gene, and I have checked that the fasta headers are the same for each species in every fasta.

I have had lots of success using phykit in the past, but I am trying it on a new system and keep getting the following error:

Traceback (most recent call last):
  File "/home/administrator/anaconda3/envs/phykit/lib/python3.9/site-packages/phykit/services/alignment/create_concatenation_matrix.py", line 218, in fasta_file_write
    concatenated.append(s._data.decode("utf-8"))
AttributeError: 'str' object has no attribute '_data'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/administrator/anaconda3/envs/phykit/bin/phykit", line 10, in <module>
    sys.exit(main())
  File "/home/administrator/anaconda3/envs/phykit/lib/python3.9/site-packages/phykit/phykit.py", line 1956, in main
    Phykit()
  File "/home/administrator/anaconda3/envs/phykit/lib/python3.9/site-packages/phykit/phykit.py", line 229, in __init__
    self.run_alias(args.command, sys.argv[2:])
  File "/home/administrator/anaconda3/envs/phykit/lib/python3.9/site-packages/phykit/phykit.py", line 309, in run_alias
    return self.create_concatenation_matrix(argv)
  File "/home/administrator/anaconda3/envs/phykit/lib/python3.9/site-packages/phykit/phykit.py", line 1895, in create_concatenation_matrix
    CreateConcatenationMatrix(args).run()
  File "/home/administrator/anaconda3/envs/phykit/lib/python3.9/site-packages/phykit/services/alignment/create_concatenation_matrix.py", line 18, in run
    self.create_concatenation_matrix(self.alignment_list_path, self.prefix)
  File "/home/administrator/anaconda3/envs/phykit/lib/python3.9/site-packages/phykit/services/alignment/create_concatenation_matrix.py", line 330, in create_concatenation_matrix
    self.fasta_file_write(
  File "/home/administrator/anaconda3/envs/phykit/lib/python3.9/site-packages/phykit/services/alignment/create_concatenation_matrix.py", line 221, in fasta_file_write
    concatenated.append(s.decode("utf-8"))
AttributeError: 'str' object has no attribute 'decode'

I would be really grateful for any help in solving this,

Many thanks,

Toby
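
Both tracebacks end in a call to .decode on a value that is already a str, and in Python 3 only bytes-like objects have that method. This suggests the sequence data arrived as text where bytes were expected; the report does not say which library version produces which type. A minimal sketch of a conversion that tolerates either type follows. The helper name is hypothetical and this is not the fix applied in PhyKIT.

```python
def to_text(value) -> str:
    """Return `value` as str, decoding only when it is bytes-like."""
    if isinstance(value, (bytes, bytearray)):
        return bytes(value).decode("utf-8")
    # str passes through unchanged; other objects use their str() form.
    return str(value)
```

Used in place of a bare .decode("utf-8") call, this behaves the same regardless of whether the sequence is stored as bytes or as text.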

Saturation value exceeded the expected range

Hi Jacob,

Thank you for developing PhyKIT, a versatile tool for phylogenetic analyses. I encountered an issue where the saturation value exceeded the expected range, reaching a value greater than 1. I am using phykit version 1.19.6. I have sent example files demonstrating this issue to your inbox.

Best,
Murphy
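
To illustrate how a value above 1 can arise at all, assume (this is an assumption about the method, not something confirmed in this report) that saturation is estimated as the slope of a regression line through the origin, with patristic distances on the x-axis and uncorrected pairwise distances on the y-axis. Under that assumption the slope exceeds 1 whenever the uncorrected distances are, on balance, larger than the patristic distances, for example when branch lengths are underestimated. The function name and the numbers below are made up for illustration.

```python
from typing import Sequence


def slope_through_origin(x: Sequence[float], y: Sequence[float]) -> float:
    """Least-squares slope of y = b * x with no intercept term."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    denominator = sum(a * a for a in x)
    if denominator == 0:
        raise ValueError("all x values are zero")
    return sum(a * b for a, b in zip(x, y)) / denominator


# Illustrative values only: patristic distances shorter than the
# uncorrected distances for the same pairs of taxa.
patristic = [0.10, 0.20]
uncorrected = [0.15, 0.30]
slope = slope_through_origin(patristic, uncorrected)  # approximately 1.5
```

If this is what happens with the submitted files, comparing the branch lengths of the tree against the raw pairwise distances of the alignment would show it directly.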

concatenate issue

Hi Jacob, many thanks for providing this amazing toolkit.

I came across an issue when trying to concatenate alignments. I'm on Ubuntu (18.04.6) with phykit running in a conda environment installed according to the instructions given in the README.

I'm getting the error output below. I will send you a link to the input data via email.

I'm using phykit version 1.11.3 and biopython 1.78.

Thanks,
Heroen

--------------------
| General features |
--------------------
Total number of taxa: 48
Total number of alignments: 7301


----------------
| Output files |
----------------
Partition file output: concat.partition
Concatenated fasta output: concat.fa
Occupancy report: concat.occupancy

Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/phykit/bin/phykit", line 10, in <module>
    sys.exit(main())
  File "/home/ubuntu/miniconda3/envs/phykit/lib/python3.9/site-packages/phykit/phykit.py", line 2375, in main
    Phykit()
  File "/home/ubuntu/miniconda3/envs/phykit/lib/python3.9/site-packages/phykit/phykit.py", line 251, in __init__
    self.run_alias(args.command, sys.argv[2:])
  File "/home/ubuntu/miniconda3/envs/phykit/lib/python3.9/site-packages/phykit/phykit.py", line 345, in run_alias
    return self.create_concatenation_matrix(argv)
  File "/home/ubuntu/miniconda3/envs/phykit/lib/python3.9/site-packages/phykit/phykit.py", line 2297, in create_concatenation_matrix
    CreateConcatenationMatrix(args).run()
  File "/home/ubuntu/miniconda3/envs/phykit/lib/python3.9/site-packages/phykit/services/alignment/create_concatenation_matrix.py", line 18, in run
    self.create_concatenation_matrix(self.alignment_list_path, self.prefix)
  File "/home/ubuntu/miniconda3/envs/phykit/lib/python3.9/site-packages/phykit/services/alignment/create_concatenation_matrix.py", line 339, in create_concatenation_matrix
    self.fasta_file_write(
  File "/home/ubuntu/miniconda3/envs/phykit/lib/python3.9/site-packages/phykit/services/alignment/create_concatenation_matrix.py", line 231, in fasta_file_write
    entry = f">{x}\n{''.join(concat[x])}\n"
TypeError: sequence item 36: expected str instance, Seq found
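
The final TypeError comes from str.join, which accepts only str items; here at least one piece of the concatenated sequence is a Seq object. A minimal sketch of the general remedy, coercing every piece to str before joining, is shown below. StandInSeq is a hypothetical stand-in so that the example runs without Biopython, and the helper names are not PhyKIT's.

```python
from typing import Iterable


class StandInSeq:
    """Hypothetical stand-in for a sequence object that is not a plain str."""

    def __init__(self, text: str) -> None:
        self._text = text

    def __str__(self) -> str:
        return self._text


def join_pieces(pieces: Iterable[object]) -> str:
    """Join sequence pieces, converting each one to str first."""
    return "".join(str(piece) for piece in pieces)


def fasta_entry(name: str, pieces: Iterable[object]) -> str:
    """Format one concatenated FASTA record."""
    return f">{name}\n{join_pieces(pieces)}\n"
```

A list that mixes plain strings (for example, gap padding for a missing taxon) with sequence objects then concatenates without error.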

Saturation calculation error?

Hi Jacob, thanks for developing PhyKIT, I find it very useful in calculating various summary statistics. However, I'm encountering an error with the saturation function, and I am not sure what the problem is. The error is as follows:

  File "/home/zhangy/miniconda3/envs/phykit/bin/phykit", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/zhangy/miniconda3/envs/phykit/lib/python3.11/site-packages/phykit/phykit.py", line 2375, in main
    Phykit()
  File "/home/zhangy/miniconda3/envs/phykit/lib/python3.11/site-packages/phykit/phykit.py", line 251, in __init__
    self.run_alias(args.command, sys.argv[2:])
  File "/home/zhangy/miniconda3/envs/phykit/lib/python3.11/site-packages/phykit/phykit.py", line 340, in run_alias
    return self.saturation(argv)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/zhangy/miniconda3/envs/phykit/lib/python3.11/site-packages/phykit/phykit.py", line 2192, in saturation
    Saturation(args).run()
  File "/home/zhangy/miniconda3/envs/phykit/lib/python3.11/site-packages/phykit/services/tree/saturation.py", line 38, in run
    patristic_distances, pairwise_identities = self.loop_through_combos_and_calculate_pds_and_pis(
                                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zhangy/miniconda3/envs/phykit/lib/python3.11/site-packages/phykit/services/tree/saturation.py", line 80, in loop_through_combos_and_calculate_pds_and_pis
    pairwise_identities = self.calculate_pairwise_identities(
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zhangy/miniconda3/envs/phykit/lib/python3.11/site-packages/phykit/services/tree/saturation.py", line 104, in calculate_pairwise_identities
    if seq_one[idx] == seq_two[idx]:
                       ~~~~~~~^^^^^
IndexError: string index out of range

Thanks!

Miles
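
The IndexError is raised while indexing the second sequence, which is consistent with the two sequences having different lengths, for example if the input file is not an alignment. That reading is an assumption based only on the traceback. The sketch below, with a hypothetical function name, computes pairwise identity but checks lengths first so that the problem surfaces with a clearer message.

```python
def pairwise_identity(seq_one: str, seq_two: str) -> float:
    """Fraction of alignment columns at which two sequences are identical."""
    if len(seq_one) != len(seq_two):
        raise ValueError(
            f"sequences differ in length ({len(seq_one)} vs {len(seq_two)}); "
            "is the input an alignment?"
        )
    if not seq_one:
        raise ValueError("sequences are empty")
    matches = sum(1 for a, b in zip(seq_one, seq_two) if a == b)
    return matches / len(seq_one)
```

Checking that every sequence in the alignment file has the same length before running the saturation function would confirm or rule out this explanation.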

Phykit `thread_dna` with the log of Clipkit gives alignments of different lengths

Dear developers,

I am using ClipKIT and PhyKIT to build a phylogenomic tree of 26 isolates of a species (plus an outgroup) from single-copy orthologs.
After aligning my proteins and trimming the alignments with ClipKIT, I want to convert the amino acid alignments into DNA alignments using thread_dna. However, the output contains sequences of different lengths.

I am using Phykit v1.19.8, installed through conda. Here is my command:

phykit thread_dna -p 02_muscle/$SAMPLE*clipkit -c 02_muscle/$SAMPLE*.log -n 03_mrna_per_OG/$SAMPLE*fa > 04_pal2nal_mrna_log/$SAMPLE.pal2nal.afa

Here are the input/output files on one OG (I renamed all the files with a ".txt" so they can be uploaded in github):

Is this a bug, or am I using Phykit wrongly?

Thank you for your time and I wish you a nice day!
Marion
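
To pin down which entries are affected, the lengths in the output file can be tabulated. The following is a small sketch using only the standard library; the function names are hypothetical and it makes no claim about how thread_dna works internally.

```python
from collections import Counter
from typing import Dict, List


def sequence_lengths(fasta_text: str) -> Dict[str, int]:
    """Map each FASTA header to the total length of its sequence."""
    lengths: Dict[str, int] = {}
    current = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            current = line[1:]
            lengths[current] = 0
        elif line and current is not None:
            lengths[current] += len(line)
    return lengths


def odd_length_entries(lengths: Dict[str, int]) -> List[str]:
    """Entries whose length differs from the most common length."""
    if not lengths:
        return []
    expected = Counter(lengths.values()).most_common(1)[0][0]
    return [name for name, n in lengths.items() if n != expected]
```

Running the same check on the trimmed protein alignment and on the threaded output shows whether the mismatch is already present in the input or is introduced during threading.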
