
mrcieu / opengwas-api


API for MRC-IEU OpenGWAS platform (https://gwas.mrcieu.ac.uk)

Languages: Dockerfile 0.21%, Python 79.95%, Shell 0.39%, HTML 15.91%, WDL 2.55%, CSS 0.99%
Topics: gwas, population-genetics, bioinformatics, summary-statistics

opengwas-api's Introduction

MRC-IEU OpenGWAS API


Source code for the backend of https://gwas.mrcieu.ac.uk/.

Citation

Elsworth B, Lyon M, Alexander T, Liu Y, Matthews P, Hallett J, Bates P, Palmer T, Haberland V, Davey Smith G, Zheng J, Haycock P, Gaunt TR, Hemani G. The MRC IEU OpenGWAS data infrastructure. bioRxiv, p. 2020.08.10.244293, Aug. 2020. https://doi.org/10.1101/2020.08.10.244293

opengwas-api's People

Contributors

asset-web, elswob, explodecomputer, jasonqiu, t0mrg


Forkers

markmclaren

opengwas-api's Issues

upload error

ml18692@ieu-db-interface:~/igd/igd-elasticsearch$ docker run -it -v /data:/data igd-elasticsearch:5aba4e46ba3e78f77efc1a9cbed92e28700f22d7 python -m index_data -f /data/igd/ieu-b-testing_1637656951_67203/ieu-b-testing_1637656951_67203.vcf.gz -g ieu-b-testing_1637656951_67203 -i ieu-b -e 132.226.129.208 -p 9200 /data/igd/ieu-b-testing_1637656951_67203/clump.txt
/usr/local/bin/python: No module named index_data
ml18692@ieu-db-interface:~/igd/igd-elasticsearch$ docker run -it -v /data:/data igd-elasticsearch:5aba4e46ba3e78f77efc1a9cbed92e28700f22d7 python add-gwas.py -m index_data -f /data/igd/ieu-b-testing_1637656951_67203/ieu-b-testing_1637656951_67203.vcf.gz -g ieu-b-testing_1637656951_67203 -i ieu-b -e 132.226.129.208 -p 9200 /data/igd/ieu-b-testing_1637656951_67203/clump.txt
usage: add-gwas.py [-h] [-m,--method METHOD] [-i,--index_name INDEX_NAME]
                   [-g,--gwas_id GWAS_ID] [-f,--gwas_file GWAS_FILE]
                   [-t,--tophits TOPHITS_FILE] [-e,--ehost EHOST]
                   [-p,--port PORT]
add-gwas.py: error: unrecognized arguments: /data/igd/ieu-b-testing_1637656951_67203/clump.txt
ml18692@ieu-db-interface:~/igd/igd-elasticsearch$ docker run -it -v /data:/data igd-elasticsearch:5aba4e46ba3e78f77efc1a9cbed92e28700f22d7 python add-gwas.py -m index_data -f /data/igd/ieu-b-testing_1637656951_67203/ieu-b-testing_1637656951_67203.vcf.gz -g ieu-b-testing_1637656951_67203 -i ieu-b -e 132.226.129.208 -p 9200 -t /data/igd/ieu-b-testing_1637656951_67203/clump.txt
Namespace(ehost='132.226.129.208', gwas_file='/data/igd/ieu-b-testing_1637656951_67203/ieu-b-testing_1637656951_67203.vcf.gz', gwas_id='ieu-b-testing_1637656951_67203', index_name='ieu-b', method='index_data', port='9200', tophits_file='/data/igd/ieu-b-testing_1637656951_67203/clump.txt')
Indexing gwas data...
Checking ieu-b-testing_1637656951_67203 /data/igd/ieu-b-testing_1637656951_67203/ieu-b-testing_1637656951_67203.vcf.gz
Checking for previously indexed records...
/usr/local/lib/python3.6/site-packages/elasticsearch/connection/base.py:208: ElasticsearchWarning: Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.13/security-minimal-setup.html to enable security.
  warnings.warn(message, category=ElasticsearchWarning)
Number of existing records = 0
Processing vcf to /tmp/ieu-b-testing_1637656951_67203.88aaa5f3-57bc-4317-8eb7-dd643ab01837
Done
Index already exists, adding to that one then :)
Setting ieu-b to read/write
Found 125 tophits
Index already exists, adding to that one then :)
Setting ieu-b to read/write
/tmp/ieu-b-testing_1637656951_67203.88aaa5f3-57bc-4317-8eb7-dd643ab01837 0.0006 0
/usr/local/lib/python3.6/site-packages/elasticsearch/connection/base.py:208: ElasticsearchWarning: [types removal] Specifying types in bulk requests is deprecated.
  warnings.warn(message, category=ElasticsearchWarning)
/tmp/ieu-b-testing_1637656951_67203.88aaa5f3-57bc-4317-8eb7-dd643ab01837 125.9677 1000000
/tmp/ieu-b-testing_1637656951_67203.88aaa5f3-57bc-4317-8eb7-dd643ab01837 252.7362 2000000
/tmp/ieu-b-testing_1637656951_67203.88aaa5f3-57bc-4317-8eb7-dd643ab01837 378.6067 3000000
/tmp/ieu-b-testing_1637656951_67203.88aaa5f3-57bc-4317-8eb7-dd643ab01837 507.7257 4000000
/tmp/ieu-b-testing_1637656951_67203.88aaa5f3-57bc-4317-8eb7-dd643ab01837 635.6235 5000000
/tmp/ieu-b-testing_1637656951_67203.88aaa5f3-57bc-4317-8eb7-dd643ab01837 761.4044 6000000
/tmp/ieu-b-testing_1637656951_67203.88aaa5f3-57bc-4317-8eb7-dd643ab01837 890.6953 7000000
/tmp/ieu-b-testing_1637656951_67203.88aaa5f3-57bc-4317-8eb7-dd643ab01837 1029.5979 8000000
/tmp/ieu-b-testing_1637656951_67203.88aaa5f3-57bc-4317-8eb7-dd643ab01837 1164.4156 9000000
/tmp/ieu-b-testing_1637656951_67203.88aaa5f3-57bc-4317-8eb7-dd643ab01837 1292.6839 10000000
/tmp/ieu-b-testing_1637656951_67203.88aaa5f3-57bc-4317-8eb7-dd643ab01837 1420.9816 11000000
/tmp/ieu-b-testing_1637656951_67203.88aaa5f3-57bc-4317-8eb7-dd643ab01837 1550.2292 12000000
# Gwas id: testing_1637656951_67203
# Records in gwas: 12321875
# Records in index: 24643750
Error!, records indexed and records in file not the same
# Records in tophits file: 126
# Records in tophits index: 126
All tophit records indexed ok
Removing temporary txt.gz file
ml18692@ieu-db-interface:~/igd/igd-elasticsearch$ echo "$?"
0
ml18692@ieu-db-interface:~/igd/igd-elasticsearch$ ll
total 88
drwxrwxr-x 3 ml18692 ml18692  4096 Nov 24 16:15 ./
drwxrwxr-x 8 ml18692 ml18692  4096 Nov 24 17:05 ../
-rwxrwxr-x 1 ml18692 ml18692 14965 Nov 24 16:15 add-gwas.py*
-rw-rw-r-- 1 ml18692 ml18692   519 Nov 24 16:15 docker-compose-es.yml
-rw-rw-r-- 1 ml18692 ml18692   483 Nov 24 16:15 Dockerfile
-rw-rw-r-- 1 ml18692 ml18692   149 Nov 24 16:15 environment.yml
drwxrwxr-x 8 ml18692 ml18692  4096 Nov 24 16:43 .git/
-rw-rw-r-- 1 ml18692 ml18692 35149 Nov 24 16:15 LICENSE
-rw-rw-r-- 1 ml18692 ml18692  4929 Nov 24 16:15 README.md
-rw-rw-r-- 1 ml18692 ml18692    43 Nov 24 16:15 requirements.txt
ml18692@ieu-db-interface:~/igd/igd-elasticsearch$ docker run -it -v /data:/data igd-elasticsearch:5aba4e46ba3e78f77efc1a9cbed92e28700f22d7 python 
Python 3.6.10 (default, Mar 24 2020, 03:18:26) 
[GCC 9.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import random
>>> import time
>>> import json
>>> import os
>>> import gzip
>>> import ntpath
>>> import sys
>>> import argparse
>>> import logging
>>> import uuid
>>> from pysam import VariantFile
>>> from elasticsearch import Elasticsearch
>>> from elasticsearch import helpers
>>> from collections import deque
>>> from pathlib import Path
>>> import subprocess
>>> 
>>> #main function index_gwas_data requires one required, and one optional paramater
... #1. gwas_id (required)
... #2. index_name (optional)
... 
>>> TIMEOUT=500
>>> 
>>> def es_gwas_count(gwas_id, index_name):
...     res=es.count(
...         request_timeout=TIMEOUT,
...         index=index_name,
...         body={
...             "query": {
...                 "bool" : {
...                     "filter" : [
...                         {"term":{"gwas_id":gwas_id}},
...                     ]
...                 }
...             }
...         })
...     #total=res['hits']['total']
...     #print(res['count'])
...     return(res['count'])
... 
>>> 
>>> 
>>> ES_HOST="132.226.129.208"
>>> ES_PORT="9200"
>>>  es = Elasticsearch([f"{ES_HOST}:{ES_PORT}"])
  File "<stdin>", line 1
    es = Elasticsearch([f"{ES_HOST}:{ES_PORT}"])
    ^
IndentationError: unexpected indent
>>> 
>>> es = Elasticsearch([f"{ES_HOST}:{ES_PORT}"])
>>> es_gwas_count("testing_1637656951_67203", "ieu-b")
/usr/local/lib/python3.6/site-packages/elasticsearch/connection/base.py:208: ElasticsearchWarning: Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.13/security-minimal-setup.html to enable security.
  warnings.warn(message, category=ElasticsearchWarning)
24643750
>>> def es_gwas_count(gwas_id, index_name):
...     res=es.count(
...         request_timeout=TIMEOUT,
...         index=index_name,
...         body={
...             "query": {
...                 "bool" : {
...                     "filter" : [
...                         {"term":{"gwas_id":gwas_id}},
...                     ]
...                 }
...             }
...         })
...     #total=res['hits']['total']
...     #print(res['count'])
...     return(res)
... 
>>> es_gwas_count("testing_1637656951_67203", "ieu-b")
{'count': 24643750, '_shards': {'total': 5, 'successful': 5, 'skipped': 0, 'failed': 0}}
>>> es_gwas_count("testing_1637656951_67203", "ieu-b")['count']
24643750
>>> 
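The log shows two problems. The index holds 24643750 records for this GWAS, exactly twice the 12321875 records in the file, which is consistent with every record having been indexed twice. And although the script prints "Error!", it still exits with status 0 (`echo "$?"` prints 0), so a calling pipeline cannot detect the failure. Below is a minimal sketch, not the project's code, of a count check that returns a nonzero exit status on mismatch. The helper name `check_counts` is hypothetical; in practice the index count would come from a call like the `es_gwas_count` function above.

```python
import sys


def check_counts(records_in_file: int, records_in_index: int) -> int:
    """Compare file and index record counts and return a process exit status.

    Returns 0 when the counts match and 1 otherwise, printing a diagnostic
    to stderr. Hypothetical helper, not part of add-gwas.py.
    """
    if records_in_index == records_in_file:
        return 0
    if (records_in_file > 0
            and records_in_index > records_in_file
            and records_in_index % records_in_file == 0):
        # An exact multiple is consistent with the same file being indexed more than once.
        factor = records_in_index // records_in_file
        print(f"Error: index holds {factor}x the {records_in_file} records in the file",
              file=sys.stderr)
    else:
        print(f"Error: {records_in_index} records indexed, {records_in_file} in file",
              file=sys.stderr)
    return 1


# Usage with the counts from the log above; a script would pass this to sys.exit().
status = check_counts(12321875, 24643750)
```

Returning the status instead of calling `sys.exit()` inside the helper keeps it reusable for the tophits check as well (126 vs 126 in the log).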

Frequent 502 Bad Gateway

[Screenshot, 2023-12-15 at 23:18:51]

I'm frequently getting 502 errors from the API, and even loading the API page often fails. This has been happening on my machine for a few weeks. Is it just my setup?

Best
Nick
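A 502 Bad Gateway means a gateway or proxy in front of the service received an invalid response from the upstream server, so it is often transient. One common client-side mitigation is to retry 502/503/504 responses with exponential backoff. The sketch below uses only the Python standard library; the retry policy (4 attempts, 1 s delay doubling each time), the `with_retries` and `fetch_root` helpers, and the choice of URL are assumptions, not part of the OpenGWAS API or its clients.

```python
import time
import urllib.error
import urllib.request

# Gateway-style errors that are often transient (assumption: worth retrying).
RETRYABLE_STATUS = {502, 503, 504}


def with_retries(fetch, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch() and retry on retryable HTTP errors with exponential backoff.

    Waits base_delay * 2**i seconds after the i-th failed attempt. Any other
    error, or a retryable one on the final attempt, is re-raised.
    """
    for i in range(attempts):
        try:
            return fetch()
        except urllib.error.HTTPError as err:
            if err.code not in RETRYABLE_STATUS or i == attempts - 1:
                raise
            sleep(base_delay * 2 ** i)
    raise ValueError("attempts must be at least 1")


def fetch_root():
    # Hypothetical target: the API host used elsewhere on this page.
    with urllib.request.urlopen("http://gwas-api.mrcieu.ac.uk/", timeout=30) as resp:
        return resp.status, resp.read()


# Usage (makes a network request): status, body = with_retries(fetch_root)
```

The `sleep` parameter is injectable so the backoff schedule can be checked without actually waiting. Retries only help with intermittent failures; if every request fails, the problem is on the server side.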

update phewas func

Modify the phewas function to default to these batches:

  • ieu-a
  • ieu-b
  • ukb-b

but allow users to change the selection to include other batches.
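Here is a minimal sketch of the requested behaviour: results are restricted to the three default batches unless the caller passes a different list. It assumes each result carries its dataset ID under an `id` key and that the batch is the first two dash-separated fields of the ID (e.g. `ieu-b` in `ieu-b-testing_1637656951_67203`). The function names and the example IDs are hypothetical, not the API's code or real datasets.

```python
DEFAULT_PHEWAS_BATCHES = ("ieu-a", "ieu-b", "ukb-b")


def batch_of(gwas_id):
    """Return the batch prefix of an ID, assuming '<batch>-<rest>' with a two-part batch."""
    return "-".join(gwas_id.split("-")[:2])


def filter_phewas_hits(hits, batches=None):
    """Keep hits whose batch is in `batches`, defaulting to DEFAULT_PHEWAS_BATCHES."""
    wanted = set(batches) if batches else set(DEFAULT_PHEWAS_BATCHES)
    return [hit for hit in hits if batch_of(hit["id"]) in wanted]


# Illustrative hits with made-up IDs.
hits = [{"id": "ieu-a-1"}, {"id": "ukb-b-1"}, {"id": "finn-b-example"}]
default_ids = [h["id"] for h in filter_phewas_hits(hits)]
custom_ids = [h["id"] for h in filter_phewas_hits(hits, batches=["ieu-a", "finn-b"])]
```

In this sketch a supplied list replaces the defaults; a caller who wants the defaults plus extra batches would pass `list(DEFAULT_PHEWAS_BATCHES) + ["finn-b"]`.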

Updating neo4j hanging

Hey @mcgml, can you help me with this? I'm trying to update the neo4j database, but the map_from_csv.py script hangs when I run it:

gh13047@ieu-db-interface:~/mr-base-api/app/igd-metadata$ docker cp data/groups.tsv mr-base-api-v3_mr-base-api-v3-private_1:/tmp
gh13047@ieu-db-interface:~/mr-base-api/app/igd-metadata$ docker cp data/permissions.tsv mr-base-api-v3_mr-base-api-v3-private_1:/tmp
gh13047@ieu-db-interface:~/mr-base-api/app/igd-metadata$ docker cp data/memberships.tsv mr-base-api-v3_mr-base-api-v3-private_1:/tmp
gh13047@ieu-db-interface:~/mr-base-api/app/igd-metadata$ docker cp data/batches.tsv mr-base-api-v3_mr-base-api-v3-private_1:/tmp
gh13047@ieu-db-interface:~/mr-base-api/app/igd-metadata$ # import data to graph
gh13047@ieu-db-interface:~/mr-base-api/app/igd-metadata$ docker exec -it mr-base-api-v3_mr-base-api-v3-private_1 \
> python map_from_csv.py \
> --study /tmp/study.tsv \
> --groups /tmp/groups.tsv \
> --permissions_e /tmp/permissions.tsv \
> --memberships /tmp/memberships.tsv \
> --batches /tmp/batches.tsv
Production
Params: 

At this point it just hangs. I noticed a lot of analysis calls being made to this private repo. I'm not sure who is making them or who has access to do so, and I wondered whether they are slowing it down, but I've left it running for several minutes with no progress.
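One way to narrow down where the script is stuck is to check, from inside the container, whether the Neo4j Bolt port accepts connections within a short timeout. If it does not, the hang may be the script waiting on a database connection rather than doing work. Below is a minimal standard-library sketch. The host name is a placeholder, and port 7687 is Neo4j's default Bolt port; both are assumptions about this deployment, and `port_open` is a hypothetical helper.

```python
import socket


def port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unresolvable or unreachable hosts, and timeouts.
        return False


# Hypothetical usage inside the container; "neo4j" is a placeholder host name.
# print(port_open("neo4j", 7687))
```

A successful connection only shows the port is reachable; it does not rule out the database being slow under load, such as the analysis traffic mentioned above.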

Sex chromosome phewas variant queries fail

I have recently been using the OpenGWAS service (lovely work, by the way) to look at a gene on the X chromosome. I have a script that calls the API, which has worked well for other regions, both now and in the past, but it seems to fail on sex chromosome queries.

Here is an example curl command:
curl -X POST 'http://gwas-api.mrcieu.ac.uk/phewas?variant=X:15704453-15809034&pval=0.001' -H 'accept: application/json' -H 'X-Api-Token: null'
A similar query, generated from the endpoint documentation:
http://gwas-api.mrcieu.ac.uk/phewas/X%3A15688830-15788411/0.001

I have tested several 1-million-base-pair stretches on different autosomes (positions 10-11 million), and they all seem to work, but the same queries on the X and Y chromosomes fail instantly.

As another test, I found a variant in this region on the GTEx Portal (gtexportal.org), rs5936049, and queried it on OpenGWAS: http://gwas-api.mrcieu.ac.uk/phewas/rs5936049/0.001. The query succeeds, and the results list the chromosome as "X".

So either I am setting up the queries wrong (I couldn't find any documentation on sex chromosomes specifically), or there is an issue with how the API handles non-numeric chromosome names. Perhaps it's coercing the input string to a number and failing? I also tried 23 and 24 as the chromosome IDs, but that didn't work either.

Any thoughts?
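The hypothesis in this post is that the chromosome is being coerced to a number. A region parser that keeps the chromosome as a string avoids that failure mode. This is a minimal sketch under stated assumptions, not the API's actual parser. It accepts `chrom:start-end` (optionally URL-encoded, as in the `X%3A...` path form), allows autosomes 1-22 plus `X` and `Y`, and, as a hypothetical convenience, maps the PLINK-style codes 23 and 24 to X and Y. The name `parse_region` is illustrative.

```python
import re
from urllib.parse import unquote

# Accepted chromosome names (assumption: autosomes 1-22 plus X and Y).
CHROMOSOMES = {str(i) for i in range(1, 23)} | {"X", "Y"}
# Hypothetical aliases following PLINK's numeric codes for the sex chromosomes.
ALIASES = {"23": "X", "24": "Y"}

REGION_RE = re.compile(r"^([0-9]+|[XYxy]):([0-9]+)-([0-9]+)$")


def parse_region(text):
    """Parse 'chrom:start-end' into (chrom, start, end) without calling int() on chrom."""
    match = REGION_RE.match(unquote(text).strip())
    if match is None:
        raise ValueError(f"not a region: {text!r}")
    chrom = match.group(1).upper()
    chrom = ALIASES.get(chrom, chrom)
    start, end = int(match.group(2)), int(match.group(3))
    if chrom not in CHROMOSOMES:
        raise ValueError(f"unknown chromosome: {chrom!r}")
    if start > end:
        raise ValueError(f"start {start} is after end {end}")
    return chrom, start, end
```

Since the rsID query above returns its chromosome as "X", normalising inputs to that same string form would let region and rsID lookups agree.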
