clin-geneset-etl

Populate Redis with gene set data, using Scala.

Redis docker-compose

Installation

On a swarm cluster or in a local environment, create a network with:

docker network create -d overlay --attachable proxy

Then run:

docker-compose up

To access the Redis instance from another console, run:

docker exec -ti redis_redis_1 /bin/sh
redis-cli
> ping

Redis should answer PONG.
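The PONG check can also be scripted, for example before starting an ETL run. The following is a minimal sketch, not part of the project: `wait_for_redis` is a hypothetical helper name, the container name `redis_redis_1` is taken from the command above, and the retry count and one-second delay are arbitrary assumptions.

```shell
#!/bin/sh
# Hypothetical helper: poll the container until redis-cli answers PONG.
# $1 = container name (default redis_redis_1), $2 = number of tries (default 30).
wait_for_redis() {
  container="${1:-redis_redis_1}"
  tries="${2:-30}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    # redis-cli prints PONG once the server accepts commands
    if [ "$(docker exec "$container" redis-cli ping 2>/dev/null)" = "PONG" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "Redis in $container did not answer PONG after $tries tries" >&2
  return 1
}
```

Calling `wait_for_redis && echo "Redis is up"` after `docker-compose up -d` would block until the instance responds or the tries run out.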

To clear Redis, run the following command in redis-cli (it deletes every key in every database):

flushall

To look up an alias and get the Ensembl ID for that gene:

smembers gene:HYST2477

To get the data for one Ensembl ID:

smembers id:ENSG00000121410
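The two lookups can be chained: resolve an alias to its Ensembl ID, then fetch the data stored under that ID. The sketch below assumes that the members of a `gene:<alias>` set are plain Ensembl IDs, as the examples above suggest. `redis_cmd` and `lookup_alias` are hypothetical helper names, not part of the project.

```shell
#!/bin/sh
# Hypothetical wrapper: run redis-cli inside the container from the steps above.
redis_cmd() {
  docker exec redis_redis_1 redis-cli "$@"
}

# Print every Ensembl ID stored under gene:<alias>,
# each followed by the members of the matching id:<ensembl id> set.
lookup_alias() {
  alias_name="$1"
  redis_cmd smembers "gene:$alias_name" | while IFS= read -r ensembl_id; do
    [ -n "$ensembl_id" ] || continue
    echo "== $ensembl_id"
    # </dev/null keeps the inner call from consuming the loop input
    redis_cmd smembers "id:$ensembl_id" </dev/null
  done
}
```

For example, `lookup_alias HYST2477` would print the Ensembl ID found for that alias and then whatever the ETL stored for it.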

To run the ETL

To compile and build the runnable jar:

mvn clean install

Step 1 geneInfo

To run the ETL with geneInfo (populates Redis from the geneInfo file: gene aliases and gene info):

java -jar target/geneset-etl-1.0-SNAPSHOT-jar-with-dependencies.jar geneInfo Homo_sapiens.gene_info.txt
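To preview what an alias-to-Ensembl mapping from this file looks like, the input can be inspected with awk. This is a minimal sketch and not the project's Scala code. It assumes the standard NCBI gene_info tab-separated layout (column 3 symbol, column 5 synonyms separated by `|`, column 6 dbXrefs separated by `|`), and `gene_info_aliases` is a hypothetical name.

```shell
#!/bin/sh
# Print "alias<TAB>ensembl_id" for each symbol and synonym of every gene
# that carries an Ensembl cross-reference. Reads files given as arguments,
# or standard input when none are given.
gene_info_aliases() {
  awk -F '\t' '
    /^#/ { next }                      # skip the header line
    {
      ensembl = ""
      n = split($6, xrefs, "|")        # column 6: dbXrefs
      for (i = 1; i <= n; i++)
        if (xrefs[i] ~ /^Ensembl:/) ensembl = substr(xrefs[i], 9)
      if (ensembl == "") next          # no Ensembl ID: nothing to map
      print $3 "\t" ensembl            # column 3: official symbol
      if ($5 != "-") {                 # column 5: synonyms, "-" when empty
        m = split($5, syn, "|")
        for (j = 1; j <= m; j++) print syn[j] "\t" ensembl
      }
    }
  ' "$@"
}
```

Running `gene_info_aliases Homo_sapiens.gene_info.txt | head` lists the first pairs. When a gene has several Ensembl cross-references, this sketch keeps only the last one.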

Step 2 HPO

To run the ETL with hpo (populates Redis from the HPO file: gene to HPO panel):

java -jar target/geneset-etl-1.0-SNAPSHOT-jar-with-dependencies.jar hpo ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt

Step 3 Orphanet

To run the ETL with Orphanet (populates Redis from the Orphanet XML file: Ensembl to Orphanet panel):

java -jar target/geneset-etl-1.0-SNAPSHOT-jar-with-dependencies.jar orpha en_product6.xml

Step 4 Radboudumc

To run the ETL with Radboudumc (populates Redis from the Radboudumc gene panel PDF files: gene to Radboudumc panel):

java -jar target/geneset-etl-1.0-SNAPSHOT-jar-with-dependencies.jar rad RAD_Files _DG217.pdf

Step 5 Omim

To run the ETL with OMIM (populates Redis from the OMIM gene map text file: gene to OMIM phenotype plus transmission mode):

java -jar target/geneset-etl-1.0-SNAPSHOT-jar-with-dependencies.jar omim genemap2.txt
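The five steps can be chained in a small script. The sketch below assumes the steps are meant to run in their numbered order and that the input files sit in the current directory under the names used above. `run_etl_steps` and the `JAR` variable are hypothetical names, not part of the project.

```shell
#!/bin/sh
# Path of the jar produced by "mvn clean install"; override by exporting JAR.
JAR="${JAR:-target/geneset-etl-1.0-SNAPSHOT-jar-with-dependencies.jar}"

# Run every ETL step in order and stop at the first failure.
run_etl_steps() {
  # One "<mode> <arguments>" entry per step, copied from the commands above
  while IFS= read -r step; do
    [ -n "$step" ] || continue
    echo "Running: $step"
    # $step is left unquoted on purpose so it splits into mode and file arguments
    java -jar "$JAR" $step </dev/null || { echo "Step failed: $step" >&2; return 1; }
  done <<'EOF'
geneInfo Homo_sapiens.gene_info.txt
hpo ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt
orpha en_product6.xml
rad RAD_Files _DG217.pdf
omim genemap2.txt
EOF
}
```

Stopping at the first failing step avoids loading later panels on top of an incomplete gene index, if the later steps do depend on the earlier ones.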

Source of Data

NCBI

ftp://ftp.ncbi.nih.gov/gene/DATA/GENE_INFO/Mammalia/Homo_sapiens.gene_info.gz

Decompress the archive and save it as Homo_sapiens.gene_info.txt, the file name used in Step 1.

HPO - Human Phenotype Ontology

http://compbio.charite.de/jenkins/job/hpo.annotations.monthly/lastSuccessfulBuild/artifact/annotation/ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt

Orphanet

RARE DISEASES WITH THEIR ASSOCIATED GENES:

http://www.orphadata.org/cgi-bin/index.php

Radboudumc

note1: (Feb 2020) 3 new panels were added: Liver, SHHM, Hereditary Bone Marrow Failure
note2: on the Radboudumc website, clicking on Muscle disorders shows Intellectual disorders instead, so I'm running with a copy of 216.

Omim (May 2020)

https://omim.org

  • Download all 4 OMIM database files:
    • mim2gene.txt
    • mimTitles.txt
    • genemap2.txt
    • morbidmap.txt

Contributors

latch2112, dependabot[bot]
