ml18692@ieu-db-interface:~/igd/igd-elasticsearch$ docker run -it -v /data:/data igd-elasticsearch:5aba4e46ba3e78f77efc1a9cbed92e28700f22d7 python -m index_data -f /data/igd/ieu-b-testing_1637656951_67203/ieu-b-testing_1637656951_67203.vcf.gz -g ieu-b-testing_1637656951_67203 -i ieu-b -e 132.226.129.208 -p 9200 /data/igd/ieu-b-testing_1637656951_67203/clump.txt
/usr/local/bin/python: No module named index_data
ml18692@ieu-db-interface:~/igd/igd-elasticsearch$ docker run -it -v /data:/data igd-elasticsearch:5aba4e46ba3e78f77efc1a9cbed92e28700f22d7 python add-gwas.py -m index_data -f /data/igd/ieu-b-testing_1637656951_67203/ieu-b-testing_1637656951_67203.vcf.gz -g ieu-b-testing_1637656951_67203 -i ieu-b -e 132.226.129.208 -p 9200 /data/igd/ieu-b-testing_1637656951_67203/clump.txt
usage: add-gwas.py [-h] [-m,--method METHOD] [-i,--index_name INDEX_NAME]
                   [-g,--gwas_id GWAS_ID] [-f,--gwas_file GWAS_FILE]
                   [-t,--tophits TOPHITS_FILE] [-e,--ehost EHOST]
                   [-p,--port PORT]
add-gwas.py: error: unrecognized arguments: /data/igd/ieu-b-testing_1637656951_67203/clump.txt
ml18692@ieu-db-interface:~/igd/igd-elasticsearch$ docker run -it -v /data:/data igd-elasticsearch:5aba4e46ba3e78f77efc1a9cbed92e28700f22d7 python add-gwas.py -m index_data -f /data/igd/ieu-b-testing_1637656951_67203/ieu-b-testing_1637656951_67203.vcf.gz -g ieu-b-testing_1637656951_67203 -i ieu-b -e 132.226.129.208 -p 9200 -t /data/igd/ieu-b-testing_1637656951_67203/clump.txt
Namespace(ehost='132.226.129.208', gwas_file='/data/igd/ieu-b-testing_1637656951_67203/ieu-b-testing_1637656951_67203.vcf.gz', gwas_id='ieu-b-testing_1637656951_67203', index_name='ieu-b', method='index_data', port='9200', tophits_file='/data/igd/ieu-b-testing_1637656951_67203/clump.txt')
Indexing gwas data...
Checking ieu-b-testing_1637656951_67203 /data/igd/ieu-b-testing_1637656951_67203/ieu-b-testing_1637656951_67203.vcf.gz
Checking for previously indexed records...
/usr/local/lib/python3.6/site-packages/elasticsearch/connection/base.py:208: ElasticsearchWarning: Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.13/security-minimal-setup.html to enable security.
warnings.warn(message, category=ElasticsearchWarning)
Number of existing records = 0
Processing vcf to /tmp/ieu-b-testing_1637656951_67203.88aaa5f3-57bc-4317-8eb7-dd643ab01837
Done
Index already exists, adding to that one then :)
Setting ieu-b to read/write
Found 125 tophits
Index already exists, adding to that one then :)
Setting ieu-b to read/write
/tmp/ieu-b-testing_1637656951_67203.88aaa5f3-57bc-4317-8eb7-dd643ab01837 0.0006 0
/usr/local/lib/python3.6/site-packages/elasticsearch/connection/base.py:208: ElasticsearchWarning: [types removal] Specifying types in bulk requests is deprecated.
warnings.warn(message, category=ElasticsearchWarning)
/tmp/ieu-b-testing_1637656951_67203.88aaa5f3-57bc-4317-8eb7-dd643ab01837 125.9677 1000000
/tmp/ieu-b-testing_1637656951_67203.88aaa5f3-57bc-4317-8eb7-dd643ab01837 252.7362 2000000
/tmp/ieu-b-testing_1637656951_67203.88aaa5f3-57bc-4317-8eb7-dd643ab01837 378.6067 3000000
/tmp/ieu-b-testing_1637656951_67203.88aaa5f3-57bc-4317-8eb7-dd643ab01837 507.7257 4000000
/tmp/ieu-b-testing_1637656951_67203.88aaa5f3-57bc-4317-8eb7-dd643ab01837 635.6235 5000000
/tmp/ieu-b-testing_1637656951_67203.88aaa5f3-57bc-4317-8eb7-dd643ab01837 761.4044 6000000
/tmp/ieu-b-testing_1637656951_67203.88aaa5f3-57bc-4317-8eb7-dd643ab01837 890.6953 7000000
/tmp/ieu-b-testing_1637656951_67203.88aaa5f3-57bc-4317-8eb7-dd643ab01837 1029.5979 8000000
/tmp/ieu-b-testing_1637656951_67203.88aaa5f3-57bc-4317-8eb7-dd643ab01837 1164.4156 9000000
/tmp/ieu-b-testing_1637656951_67203.88aaa5f3-57bc-4317-8eb7-dd643ab01837 1292.6839 10000000
/tmp/ieu-b-testing_1637656951_67203.88aaa5f3-57bc-4317-8eb7-dd643ab01837 1420.9816 11000000
/tmp/ieu-b-testing_1637656951_67203.88aaa5f3-57bc-4317-8eb7-dd643ab01837 1550.2292 12000000
# Gwas id: testing_1637656951_67203
# Records in gwas: 12321875
# Records in index: 24643750
Error!, records indexed and records in file not the same
# Records in tophits file: 126
# Records in tophits index: 126
All tophit records indexed ok
Removing temporary txt.gz file
ml18692@ieu-db-interface:~/igd/igd-elasticsearch$ echo "$?"
0
ml18692@ieu-db-interface:~/igd/igd-elasticsearch$ ll
total 88
drwxrwxr-x 3 ml18692 ml18692 4096 Nov 24 16:15 ./
drwxrwxr-x 8 ml18692 ml18692 4096 Nov 24 17:05 ../
-rwxrwxr-x 1 ml18692 ml18692 14965 Nov 24 16:15 add-gwas.py*
-rw-rw-r-- 1 ml18692 ml18692 519 Nov 24 16:15 docker-compose-es.yml
-rw-rw-r-- 1 ml18692 ml18692 483 Nov 24 16:15 Dockerfile
-rw-rw-r-- 1 ml18692 ml18692 149 Nov 24 16:15 environment.yml
drwxrwxr-x 8 ml18692 ml18692 4096 Nov 24 16:43 .git/
-rw-rw-r-- 1 ml18692 ml18692 35149 Nov 24 16:15 LICENSE
-rw-rw-r-- 1 ml18692 ml18692 4929 Nov 24 16:15 README.md
-rw-rw-r-- 1 ml18692 ml18692 43 Nov 24 16:15 requirements.txt
ml18692@ieu-db-interface:~/igd/igd-elasticsearch$ docker run -it -v /data:/data igd-elasticsearch:5aba4e46ba3e78f77efc1a9cbed92e28700f22d7 python
Python 3.6.10 (default, Mar 24 2020, 03:18:26)
[GCC 9.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import random
>>> import time
>>> import json
>>> import os
>>> import gzip
>>> import ntpath
>>> import sys
>>> import argparse
>>> import logging
>>> import uuid
>>> from pysam import VariantFile
>>> from elasticsearch import Elasticsearch
>>> from elasticsearch import helpers
>>> from collections import deque
>>> from pathlib import Path
>>> import subprocess
>>>
>>> #main function index_gwas_data takes one required and one optional parameter
... #1. gwas_id (required)
... #2. index_name (optional)
...
>>> TIMEOUT=500
>>>
>>> def es_gwas_count(gwas_id, index_name):
...     res=es.count(
...         request_timeout=TIMEOUT,
...         index=index_name,
...         body={
...             "query": {
...                 "bool" : {
...                     "filter" : [
...                         {"term":{"gwas_id":gwas_id}},
...                     ]
...                 }
...             }
...         })
...     #total=res['hits']['total']
...     #print(res['count'])
...     return(res['count'])
...
>>>
>>>
>>> ES_HOST="132.226.129.208"
>>> ES_PORT="9200"
>>>     es = Elasticsearch([f"{ES_HOST}:{ES_PORT}"])
  File "<stdin>", line 1
    es = Elasticsearch([f"{ES_HOST}:{ES_PORT}"])
    ^
IndentationError: unexpected indent
>>>
>>> es = Elasticsearch([f"{ES_HOST}:{ES_PORT}"])
>>> es_gwas_count("testing_1637656951_67203", "ieu-b")
/usr/local/lib/python3.6/site-packages/elasticsearch/connection/base.py:208: ElasticsearchWarning: Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.13/security-minimal-setup.html to enable security.
warnings.warn(message, category=ElasticsearchWarning)
24643750
>>> def es_gwas_count(gwas_id, index_name):
...     res=es.count(
...         request_timeout=TIMEOUT,
...         index=index_name,
...         body={
...             "query": {
...                 "bool" : {
...                     "filter" : [
...                         {"term":{"gwas_id":gwas_id}},
...                     ]
...                 }
...             }
...         })
...     #total=res['hits']['total']
...     #print(res['count'])
...     return(res)
...
>>> es_gwas_count("testing_1637656951_67203", "ieu-b")
{'count': 24643750, '_shards': {'total': 5, 'successful': 5, 'skipped': 0, 'failed': 0}}
>>> es_gwas_count("testing_1637656951_67203", "ieu-b")['count']
24643750
>>>
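
The run above reports 24643750 records in the index against 12321875 in the file, exactly twice as many, yet `echo "$?"` prints 0. Below is a minimal sketch of a stricter check, assuming both counts are already available as integers. `check_counts` is a hypothetical helper, not a function from add-gwas.py, and reading a whole-number multiple as a sign of repeated indexing is an assumption that the log does not confirm.

```python
def check_counts(records_in_file, records_in_index):
    """Return a process exit status: 0 if the counts match, 1 otherwise.

    Hypothetical helper for illustration; not part of add-gwas.py.
    """
    if records_in_index == records_in_file:
        print("All records indexed ok")
        return 0

    # Assumption: an index count that is a whole-number multiple of the
    # file count hints that the same file was indexed more than once.
    if 0 < records_in_file < records_in_index and records_in_index % records_in_file == 0:
        factor = records_in_index // records_in_file
        print(f"Index holds {factor}x the records in the file; "
              "the file may have been indexed more than once")
    else:
        print(f"Count mismatch: file={records_in_file}, index={records_in_index}")
    return 1


# Counts copied from the session above.
status = check_counts(12321875, 24643750)
```

Passing `status` to `sys.exit()` at the end of a script would make a mismatch visible to the calling shell instead of leaving the exit status at 0.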