hmartiniano / gnomad_python_api

This project forked from furkanmtorun/gnomad_python_api

🧬 gnomAD Python API is used to obtain data from gnomAD (genome aggregation database).

License: GNU General Public License v3.0

🧬 gnomAD Python API

#️⃣ What is gnomAD and the purpose of this script?

gnomAD (the Genome Aggregation Database) [1] is an aggregation of thousands of exomes and genomes from human sequencing studies. The gnomAD consortium also annotates the variants with their allele frequencies in genomes and exomes.

This tool, available in both CLI and GUI versions, searches for the genes or transcripts you are interested in and retrieves variant data from the database through the gnomAD backend API, which is based on the GraphQL query language.
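As a rough illustration of what a GraphQL request to the gnomAD backend might look like, the sketch below only builds the JSON request body and does not contact any server. The endpoint URL, the field names (gene, gene_symbol, reference_genome, variants, variant_id) and the helper name build_gene_variants_payload are assumptions made for this example, not taken from this repository, and should be checked against the live gnomAD schema.

```python
import json

# Assumed endpoint of the public gnomAD GraphQL API; verify before use.
GNOMAD_API_URL = "https://gnomad.broadinstitute.org/api"


def build_gene_variants_payload(gene_symbol, dataset="gnomad_r2_1", reference_genome="GRCh37"):
    """Return a JSON-serializable GraphQL request body for one gene (hypothetical helper)."""
    # json.dumps quotes and escapes the symbol so it forms a valid string literal.
    # The dataset and reference genome are written unquoted, as enum-like values.
    query = (
        "query {\n"
        f"  gene(gene_symbol: {json.dumps(gene_symbol)}, reference_genome: {reference_genome}) {{\n"
        f"    variants(dataset: {dataset}) {{\n"
        "      variant_id\n"
        "    }\n"
        "  }\n"
        "}"
    )
    return {"query": query}


payload = build_gene_variants_payload("BRCA1")
print(payload["query"])
```

A body like this would typically be sent as JSON in an HTTP POST request to the API endpoint; the script in this repository may build its queries differently.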


⭐ If you like it, please do not forget to give it a star!


#️⃣ Requirements and Installation

  • Create a directory and download the "gnomad_api_cli.py" and "requirements.txt" files, or clone the repository via Git using the following command:

    git clone https://github.com/furkanmtorun/gnomad_python_api.git

  • Install the required packages if you have not already done so:

    pip3 install -r requirements.txt

The requirements.txt file lists the libraries required for both the GUI (graphical user interface) and CLI (command-line interface) versions.

  • It's ready to use now!

If you have not installed pip yet, please follow the instructions here.

#️⃣ GUI | Usage

The GUI version of gnomAD Python API is built with Streamlit.

Note: The GUI version can generate plots from the retrieved data. This option is not available in the CLI version, as it is still under development.

For this reason, the GUI version is recommended.

  • To use the GUI version of gnomAD Python API:

    streamlit run gnomad_api_gui.py

  • Here are the screenshots for the GUI version:

    gnomAD Python API GUI - Main Screen

    gnomAD Python API GUI - Outputs

    gnomAD Python API GUI - Outputs and Plots

In the GUI version, the outputs are also saved to the outputs/ folder.

#️⃣ CLI | Usage & Options

| Options | Description | Parameters |
|---|---|---|
| -filter_by | It defines the input type. | gene_name, gene_id, transcript_id, or rs_id |
| -search_by | It defines the input. | A gene/transcript identifier (e.g. TP53, ENSG00000169174, ENST00000544455) or the name of a file containing your inputs (e.g. myGenes.txt) |
| -dataset | It defines the dataset. | exac, gnomad_r2_1, gnomad_r3, gnomad_r2_1_controls, gnomad_r2_1_non_neuro, gnomad_r2_1_non_cancer, or gnomad_r2_1_non_topmed |
| -sv_dataset | It defines the structural variants dataset. | gnomad_sv_r2_1, gnomad_sv_r2_1_controls, or gnomad_sv_r2_1_non_neuro |
| -reference_genome | It defines the reference genome build. | GRCh37 or GRCh38 |
| -h | It displays the parameters. | To get help via the script: python gnomad_api_cli.py -h |
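
For readers curious how options of this shape could be declared, here is a minimal sketch using Python's standard argparse module. It is not the parser from gnomad_api_cli.py: the function name build_parser is hypothetical, and the GRCh37 default for -reference_genome is an assumption (the documented defaults cover only -dataset and -sv_dataset).

```python
import argparse

DATASETS = [
    "exac", "gnomad_r2_1", "gnomad_r3", "gnomad_r2_1_controls",
    "gnomad_r2_1_non_neuro", "gnomad_r2_1_non_cancer", "gnomad_r2_1_non_topmed",
]
SV_DATASETS = ["gnomad_sv_r2_1", "gnomad_sv_r2_1_controls", "gnomad_sv_r2_1_non_neuro"]


def build_parser():
    """Declare the options from the table above (illustrative, not the project's code)."""
    parser = argparse.ArgumentParser(prog="gnomad_api_cli.py")
    parser.add_argument("-filter_by", required=True,
                        choices=["gene_name", "gene_id", "transcript_id", "rs_id"])
    parser.add_argument("-search_by", required=True)
    parser.add_argument("-dataset", default="gnomad_r2_1", choices=DATASETS)
    parser.add_argument("-sv_dataset", default="gnomad_sv_r2_1", choices=SV_DATASETS)
    # Assumed default: the README does not state one for the reference genome.
    parser.add_argument("-reference_genome", default="GRCh37",
                        choices=["GRCh37", "GRCh38"])
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["-filter_by=gene_name", "-search_by=BRCA1"])
print(args.filter_by, args.search_by, args.dataset, args.sv_dataset)
```

argparse adds the -h option automatically, and its choices lists reject values that are not in the table.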

❗ When retrieving variants, gnomad_r2_1 and gnomad_sv_r2_1 are the default values for the -dataset and -sv_dataset options, respectively.

❗ To retrieve variants from the gnomad_r3 dataset, you must choose GRCh38. Note that structural variants are not available for the GRCh38 build.
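
The two constraints above can be expressed as a small pre-flight check. This is only a sketch: the function name check_dataset_options and its messages are hypothetical, and the actual script may handle these cases differently.

```python
def check_dataset_options(dataset, reference_genome, sv_dataset=None):
    """Return a list of problems with the chosen combination (empty if there are none)."""
    problems = []
    # gnomad_r3 variants can only be retrieved with the GRCh38 build.
    if dataset == "gnomad_r3" and reference_genome != "GRCh38":
        problems.append("gnomad_r3 requires -reference_genome=GRCh38")
    # Structural variants are not available for the GRCh38 build.
    if reference_genome == "GRCh38" and sv_dataset is not None:
        problems.append("structural variants are not available for GRCh38")
    return problems


print(check_dataset_options("gnomad_r3", "GRCh37"))
print(check_dataset_options("gnomad_r3", "GRCh38"))
print(check_dataset_options("gnomad_r3", "GRCh38", sv_dataset="gnomad_sv_r2_1"))
```

Returning a list rather than raising on the first problem lets a caller report every conflicting option at once.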

#️⃣ CLI | Example Usages

  • How to list the variants by gene name or gene id?

    For gene name:

    python gnomad_api_cli.py -filter_by=gene_name -search_by="BRCA1" -dataset="gnomad_r2_1" -sv_dataset="gnomad_sv_r2_1"

    To get data from gnomad_r3:

    python gnomad_api_cli.py -filter_by=gene_name -search_by="BRCA1" -dataset="gnomad_r3" -reference_genome="GRCh38"

    For Ensembl gene ID:

    python gnomad_api_cli.py -filter_by=gene_id -search_by="ENSG00000169174" -dataset="gnomad_r2_1" -sv_dataset="gnomad_sv_r2_1"

  • How to list the variants by transcript ID?

    python gnomad_api_cli.py -filter_by=transcript_id -search_by="ENST00000407236" -dataset="gnomad_r2_1"

  • How to get variant info by RS ID (rsId)?

    python gnomad_api_cli.py -filter_by=rs_id -search_by="rs201857604" -dataset="gnomad_r2_1"

  • How to list the variants using a file containing genes/transcripts?

    • Prepare a file that contains gene names, Ensembl gene IDs, Ensembl transcript IDs, or RS IDs, one per line.

      ENSG00000169174
      ENSG00000171862
      ENSG00000170445

    • Then, run the following command:

      python gnomad_api_cli.py -filter_by="gene_id" -search_by="myFavoriteGenes.txt" -dataset="gnomad_r2_1" -sv_dataset="gnomad_sv_r2_1"

    Please use only one type of identifier per file.

  • The variants are then written to the "outputs" folder, in subfolders according to their identifier (gene name, gene id, transcript id or rsId).

  • That's all!
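
The file-based usage above implies that -search_by can hold either a single identifier or the name of a file with one identifier per line. How the script tells the two apart is not described here; the sketch below assumes it simply checks whether the value names an existing file, and the helper name resolve_search_terms is hypothetical.

```python
import os
import tempfile


def resolve_search_terms(search_by):
    """Return file lines if search_by is an existing file, else the value itself."""
    if os.path.isfile(search_by):
        with open(search_by, encoding="utf-8") as handle:
            # Strip whitespace and skip blank lines.
            return [line.strip() for line in handle if line.strip()]
    return [search_by]


# Demonstration with a temporary file holding the gene IDs from the example above.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "myFavoriteGenes.txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("ENSG00000169174\nENSG00000171862\n\nENSG00000170445\n")
    ids_from_file = resolve_search_terms(path)

single_id = resolve_search_terms("ENST00000407236")
print(ids_from_file)
print(single_id)
```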

#️⃣ Disclaimer

All the outputs provided by this tool are for informational purposes only.

The information is not intended to replace any consultation, diagnosis, and/or medical treatment offered by physicians or healthcare providers.

The author of the app will not be liable for any direct, indirect, consequential, special, exemplary, or other damages arising therefrom.

#️⃣ Contributing & Feedback

I would be very happy to see any feedback or contributions to the project.

For problems and enhancement requests, please open an issue.

#️⃣ Developer

Furkan M. Torun (@furkanmtorun) | Academia: Google Scholar Profile

#️⃣ References

  1. Karczewski, K.J., Francioli, L.C., Tiao, G. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020). https://doi.org/10.1038/s41586-020-2308-7
