
biothings_client.py's Introduction

Intro

biothings_client is an easy-to-use Python wrapper for accessing any Biothings.api-based backend service. Currently, the following clients are available:

  • gene - The client for MyGene.Info, which provides access to gene objects.
  • variant - The client for MyVariant.Info, which provides access to genetic variant objects.
  • chem - The client for MyChem.Info, which provides access to chemical/drug objects.
  • disease - The client for MyDisease.Info, which provides access to disease objects.
  • geneset - The client for MyGeneset.Info, which provides access to geneset/pathway objects.
  • taxon - The client for t.biothings.io, which provides access to taxon objects.

Requirements

python >=2.7 (including all python3 versions)

(It may still work under python 2.6, but it's not supported any more.)

requests (install using "pip install requests")

Optional dependencies

  • pandas (install using "pip install pandas") is required for returning a list of variant objects as DataFrame.
  • requests_cache (install using "pip install requests_cache") is required for local caching of API requests.

Installation

Option 1

pip install biothings_client

Option 2

download/extract the source code and run:

python setup.py install

Option 3

install the latest code directly from the repository:

pip install -e git+https://github.com/biothings/biothings_client.py#egg=biothings_client

Version history

CHANGES.txt

Tutorial

See the quick start tutorial at the biothings_client doc page.

Documentation

https://biothings-clientpy.readthedocs.io

Usage

In [1]: from biothings_client import get_client

# get a client for variant objects

In [2]: mv = get_client("variant")

In [3]: mv.getvariant("chr7:g.140453134T>C")
Out[3]:  #output below is collapsed
{"_id": "chr7:g.140453134T>C",
 "_version": 1,
 "chrom": "7",
 "cadd": {...},
 "clinvar": {...},
 "cosmic": {...},
 "dbnsfp": {...},
 "dbsnp": {...},
 "docm": {...},
 "hg19": {'end': 140453134, 'start': 140453134},
 "mutdb": {...},
 "snpeff": {...},
 "vcf": {
    "alt": "C",
    "position": "140453134",
    "ref": "T"
 }}

# get a client for gene objects

In [7]: mg = get_client("gene")

In [8]: mg.getgene(1017, 'name,symbol,refseq')
Out[8]:
{'_id': '1017',
 '_score': 21.03413,
 'name': 'cyclin dependent kinase 2',
 'refseq': {'genomic': ['NC_000012.12', 'NC_018923.2', 'NG_034014.1'],
  'protein': ['NP_001277159.1',
   'NP_001789.2',
   'NP_439892.2',
   'XP_011536034.1'],
  'rna': ['NM_001290230.1', 'NM_001798.4', 'NM_052827.3', 'XM_011537732.1'],
  'translation': [{'protein': 'NP_001789.2', 'rna': 'NM_001798.4'},
   {'protein': 'NP_439892.2', 'rna': 'NM_052827.3'},
   {'protein': 'NP_001277159.1', 'rna': 'NM_001290230.1'},
   {'protein': 'XP_011536034.1', 'rna': 'XM_011537732.1'}]},
 'symbol': 'CDK2'}

# get a client for chems/drugs

In [9]: md = get_client("chem")

In [10]: md.getchem("ATBDZSAENDYQDW-UHFFFAOYSA-N", fields="pubchem")
Out[10]:
{'_id': 'ATBDZSAENDYQDW-UHFFFAOYSA-N',
 '_version': 1,
 'pubchem': {'chiral_atom_count': 0,
  'chiral_bond_count': 0,
  'cid': 'CID4080429',
  'complexity': 250,
  'covalently-bonded_unit_count': 1,
  'defined_atom_stereocenter_count': 0,
  'defined_bond_stereocenter_count': 0,
  'exact_mass': 184.019415,
  'formal_charge': 0,
  'heavy_atom_count': 12,
  'hydrogen_bond_acceptor_count': 3,
  'hydrogen_bond_donor_count': 1,
  'inchi': 'InChI=1S/C8H8O3S/c1-2-7-4-3-5-8(6-7)12(9,10)11/h2-6H,1H2,(H,9,10,11)',
  'inchi_key': 'ATBDZSAENDYQDW-UHFFFAOYSA-N',
  'isotope_atom_count': 0,
  'iupac': {'traditional': '3-vinylbesylic acid'},
  'molecular_formula': 'C8H8O3S',
  'molecular_weight': 184.21232,
  'monoisotopic_weight': 184.019415,
  'rotatable_bond_count': 2,
  'smiles': {'isomeric': 'C=CC1=CC(=CC=C1)S(=O)(=O)O'},
  'tautomers_count': 1,
  'topological_polar_surface_area': 62.8,
  'undefined_atom_stereocenter_count': 0,
  'undefined_bond_stereocenter_count': 0,
  'xlogp': 1.4}}

# get a client for taxa

In [11]: mt = get_client("taxon")

In [12]: mt.gettaxon(9606)
Out[12]:
{'_id': '9606',
 '_version': 1,
 'authority': ['homo sapiens linnaeus, 1758'],
 'common_name': 'man',
 'genbank_common_name': 'human',
 'has_gene': True,
 'lineage': [9606,
  9605,
  207598,
  9604,
  314295,
  9526,
  314293,
  376913,
  9443,
  314146,
  1437010,
  9347,
  32525,
  40674,
  32524,
  32523,
  1338369,
  8287,
  117571,
  117570,
  7776,
  7742,
  89593,
  7711,
  33511,
  33213,
  6072,
  33208,
  33154,
  2759,
  131567,
  1],
 'other_names': ['humans'],
 'parent_taxid': 9605,
 'rank': 'species',
 'scientific_name': 'homo sapiens',
 'taxid': 9606,
 'uniprot_name': 'homo sapiens'}

Contact

Drop us any feedback @biothingsapi

biothings_client.py's People

Contributors

amiteshksharma, ctrl-schaff, cyrus0824, erikyao, everaldorodrigo, newgene, ravila4, sdhutchins, simonvh, skumar951, tirkarthi, zcqian

biothings_client.py's Issues

Use standard logging stream

Many components use print statements regardless of the verbose value. Consider replacing them with a standard logging stream, or with custom exceptions, so that output appears only when a verbose parameter evaluates to true. Consider also configuring the logging stream automatically, if it is not already configured, to preserve the previous behavior. If an additional debug/details stream is desirable, this may be helpful: https://github.com/elastic/elasticsearch-py/blob/208269a532ac6d8cf5aaa901b16410ba84cb4513/elasticsearch/connection/base.py#L41
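A minimal sketch of the idea, using only the standard library. The logger name and both helper functions are hypothetical and are not part of biothings_client; they only illustrate how print calls could be routed through logging while honoring a verbose flag.

```python
import logging

# Hypothetical logger name, chosen for this sketch only.
logger = logging.getLogger("biothings_client_sketch")


def ensure_logging_configured(log):
    """Attach a plain stream handler only if no handler is reachable yet."""
    if not log.hasHandlers():
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter("%(message)s"))
        log.addHandler(handler)
        log.setLevel(logging.INFO)


def report_progress(message, verbose=True):
    """Emit a progress message through logging instead of print."""
    if verbose:
        logger.info(message)


ensure_logging_configured(logger)
report_progress("querying 1-2...", verbose=True)   # emitted
report_progress("querying 3-4...", verbose=False)  # suppressed
```

Because the handler is only attached when none exists, an application that has already configured logging keeps full control over where the messages go.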

fields parameter not working as intended

In this example:

mg = get_client("gene")
mg.getgene(1017, 'name,symbol,refseq')

The call returns an unfiltered dictionary (which I won't post here since it's big).

Outdated clinvar?

Hi,

First of all, thanks a lot for the great effort put into building this wonderful service! It really has everything you could wish for.
I do have a question. When I manually search clinvar for position chr13:48941720 ("13[chr] AND 48941720[chrpos37]") I get a hit (https://www.ncbi.nlm.nih.gov/clinvar/variation/1177569/?new_evidence=false) dated Jul 13, 2021.
However, I find no hits when I use biothings:

from biothings_client import get_client

my_variant = get_client("variant")
result = my_variant.query(q='clinvar.chrom:13 AND clinvar.hg19.start:48941720', fields="clinvar")
result['hits']

Could it be that the ClinVar database used by biothings is outdated?

Thanks!

Hylke

End positions of 'delins' variations

On biothings_client/mixins/variant.py#L116:

elif len(ref) > 1 and len(alt) > 1:
    if ref[0] == alt[0]:
        # if ref and alt overlap from the left, trim them first
        _chrom, _pos, _ref, _alt = self._normalized_vcf(chrom, pos, ref, alt)
        return self.format_hgvs(_chrom, _pos, _ref, _alt)
    else:
        end = int(pos) + len(alt) - 1
        hgvs = 'chr{0}:g.{1}_{2}delins{3}'.format(chrom, pos, end, alt)

It should be end = int(pos) + len(ref) - 1, because the replaced span on the reference is as long as ref, not alt. The current code is prone to ID conflicts.

See hgvs.py#L127.
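The conflict can be reproduced with a small standalone sketch. Both functions below are hypothetical stand-ins written for illustration (they are not the library's format_hgvs), and the coordinates and alleles are made-up inputs.

```python
def delins_id_current(chrom, pos, ref, alt):
    """End position computed from len(alt), as in the snippet quoted above."""
    end = int(pos) + len(alt) - 1
    return "chr{0}:g.{1}_{2}delins{3}".format(chrom, pos, end, alt)


def delins_id_proposed(chrom, pos, ref, alt):
    """End position computed from len(ref), as proposed in this issue."""
    end = int(pos) + len(ref) - 1
    return "chr{0}:g.{1}_{2}delins{3}".format(chrom, pos, end, alt)


# Two different variants at the same position: ref "ATG" vs ref "AT".
a = ("1", 100, "ATG", "CC")
b = ("1", 100, "AT", "CC")

print(delins_id_current(*a), delins_id_current(*b))    # identical IDs
print(delins_id_proposed(*a), delins_id_proposed(*b))  # distinct IDs
```

With the len(alt) rule both variants collapse to the same ID, while the len(ref) rule keeps them apart.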

update tests using drugbank ids

We should update the MyChem tests that rely on the DrugBank data source, since it was recently removed from the MyChem.info API.

get_client('chem') not working

In [41]: biothings_client.get_client('chem')
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-41-b2b228d0f801> in <module>()
----> 1 biothings_client.get_client('chem')

~/opt/devpy3/lib/python3.6/site-packages/biothings_client/__init__.py in get_client(biothing_type, instance, *args, **kwargs)
    164     biothing_type = biothing_type.lower()
    165     if (biothing_type not in CLIENT_SETTINGS and not kwargs.get('url', False)):
--> 166         raise Exception("No client named '{0}', currently available clients are: {1}".format(biothing_type, list(CLIENT_SETTINGS.keys())))
    167     _settings = CLIENT_SETTINGS[biothing_type] if biothing_type in CLIENT_SETTINGS else _generate_settings(biothing_type, kwargs.get('url'))
    168     _class = type(_settings["class_name"], tuple([_settings["base_class"]] + _settings["mixins"]),

Exception: No client named 'chem', currently available clients are: ['gene', 'variant', 'taxon', 'drug']

even though "chem" is registered here: https://github.com/biothings/biothings_client.py/blob/master/biothings_client/__init__.py#L145

requests.exceptions.HTTPError: 400 Client Error: Expect type list

The simple script

from biothings_client import get_client
gene_client = get_client('gene')
results = gene_client.querymany(['P24941', 'O14727'], scopes='uniprot', fields='symbol,name')
print(results)

Results in the following error:

from biothings_client import get_client
gene_client = get_client('gene')
results = gene_client.querymany(['P24941', 'O14727'], scopes='uniprot', fields='symbol,name')
querying 1-2...Traceback (most recent call last):
File "", line 1, in
File "/nobackup/bioinfo_share/software/anaconda2/envs/py27_dev/lib/python2.7/site-packages/biothings_client/base.py", line 542, in _querymany
for hits in self._repeated_query(query_fn, qterms, verbose=verbose):
File "/nobackup/bioinfo_share/software/anaconda2/envs/py27_dev/lib/python2.7/site-packages/biothings_client/base.py", line 223, in _repeated_query
from_cache, query_result = query_fn(batch, **fn_kwargs)
File "/nobackup/bioinfo_share/software/anaconda2/envs/py27_dev/lib/python2.7/site-packages/biothings_client/base.py", line 541, in query_fn
def query_fn(qterms): return self._querymany_inner(qterms, verbose=verbose, **kwargs)
File "/nobackup/bioinfo_share/software/anaconda2/envs/py27_dev/lib/python2.7/site-packages/biothings_client/base.py", line 488, in _querymany_inner
return self._post(_url, params=_kwargs, verbose=verbose)
File "/nobackup/bioinfo_share/software/anaconda2/envs/py27_dev/lib/python2.7/site-packages/biothings_client/base.py", line 176, in _post
res.raise_for_status()
File "/nobackup/bioinfo_share/software/anaconda2/envs/py27_dev/lib/python2.7/site-packages/requests/models.py", line 940, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Expect type list. for url: http://mygene.info/v3/query/
print(results)
[{u'query': u'"P24941","O14727"', u'notfound': True}]

I've used the mygene Python API for a number of years without a hitch; existing scripts that used to work now throw this error. Any help would be massively appreciated.

query terms split on space when passed to querymany as a list.

Reported from @mmayers12:

from biothings_client import get_client
mt = get_client('taxon')
result = mt.querymany(['Aspergillus parasiticus', 'Chlamydophila psittaci'], scopes=['scientific_name'])

print('\n\n',set([r['query'] for r in result]))

This results in:

querying 1-2...done.
Finished.
3 input query terms found dup hits:
    [('Aspergillus', 10), ('parasiticus', 10), ('psittaci', 10)]
Pass "returnall=True" to return complete lists of duplicate or missing query terms.


 {'Aspergillus', 'parasiticus', 'Chlamydophila', 'psittaci'}

The query terms should not be split on spaces.

Use doc_type key in metadata to generate client without having to specify biothings_type

Currently, when pointing biothings_client to a specific Biothings API URL, we need to specify:

client = get_client("variant","http://pending.biothings.io/denovodb")

Metadata have recently been updated to include a "doc_type" key containing "variant", that is, the actual biothings type returned by the API. See: http://pending.biothings.io/denovodb/metadata. Generating the client could then be:

client = get_client("http://pending.biothings.io/denovodb")

That is, make biothings_type optional. By default, /metadata is assumed to be appended to the URL.

This improvement would allow clients to be generated completely dynamically, without knowing anything but the URL.
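A rough sketch of the two steps this proposal implies: deriving the metadata URL and reading the type from the metadata document. Both helper names are hypothetical, the sketch performs no network request, and the metadata dict in the usage lines is a made-up example mirroring the description above rather than a real API response.

```python
def metadata_url(api_url):
    """Append /metadata to an API base URL, tolerating a trailing slash."""
    return api_url.rstrip("/") + "/metadata"


def doc_type_from_metadata(metadata):
    """Read the "doc_type" key from an already-fetched metadata dict."""
    doc_type = metadata.get("doc_type")
    if not isinstance(doc_type, str):
        raise ValueError("metadata has no usable 'doc_type' value")
    return doc_type.lower()


print(metadata_url("http://pending.biothings.io/denovodb"))
print(doc_type_from_metadata({"doc_type": "variant"}))
```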

Handle more gracefully when the biothing_type value from metadata endpoint is a list

Currently, an unhandled RuntimeError exception will be raised. We should handle it more gracefully.

In [1]: import biothings_client
In [2]: mgs = biothings_client.get_client(url='https://mygeneset.info/v1')
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
File ~/opt/devpy3/lib/python3.10/site-packages/biothings_client/__init__.py:235, in get_client(biothing_type, instance, *args, **kwargs)
    234     biothing_type = dic.get('biothing_type')
--> 235     assert isinstance(biothing_type, str)
    236 except requests.RequestException:

AssertionError:

During handling of the above exception, another exception occurred:

RuntimeError                              Traceback (most recent call last)
Input In [2], in <cell line: 1>()
----> 1 mgs = biothings_client.get_client(url='https://mygeneset.info/v1')

File ~/opt/devpy3/lib/python3.10/site-packages/biothings_client/__init__.py:239, in get_client(biothing_type, instance, *args, **kwargs)
    237         raise RuntimeError("Cannot access metadata url to determine biothing_type.")
    238     except AssertionError:
--> 239         raise RuntimeError("Biothing_type in metadata url is not a valid string.")
    240 else:
    241     biothing_type = biothing_type.lower()

RuntimeError: Biothing_type in metadata url is not a valid string.

In this case, the biothing_type value is a single-value list:

{
  "biothing_type": ["geneset"],
...
}

Although the MyGeneset.info API should change this to a single string value, biothings_client can still handle this kind of error more gracefully:

  • if the value is a single-value list, use that value as biothing_type
  • if the value is a multi-value list, raise a RuntimeError with a more specific error message
  • in all other cases, raise the same RuntimeError as today
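The three cases above could be sketched as a small helper. The function name resolve_biothing_type and the wording of the multi-value message are assumptions made for illustration; only the final error message is taken from the traceback shown earlier.

```python
def resolve_biothing_type(value):
    """Normalize the biothing_type value read from a metadata document."""
    if isinstance(value, str):
        return value.lower()
    if isinstance(value, (list, tuple)):
        # Single-value list: unwrap it and use the value directly.
        if len(value) == 1 and isinstance(value[0], str):
            return value[0].lower()
        # Multi-value list: ambiguous, so fail with a specific message.
        if len(value) > 1:
            raise RuntimeError(
                "Metadata lists multiple biothing_type values {0!r}; "
                "pass biothing_type explicitly.".format(list(value))
            )
    # Anything else: same error as the existing behavior.
    raise RuntimeError("Biothing_type in metadata url is not a valid string.")


print(resolve_biothing_type(["geneset"]))
```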

Specifying the fetch_all block size?

I'm interested in using MyGeneInfo.query(..., fetch_all=True). Here's the fetch_all docs:

:param fetch_all: if True, return a generator to all query results (unsorted). This can provide a very fast
return of all hits from a large query.
Server requests are done in blocks of 1000 and yielded individually. Each 1000 block of
results must be yielded within 1 minute, otherwise the request will expire at server side.

Regarding "server requests are done in blocks of 1000", is it possible to control that number? I.e. set a parameter like fetch_all_size?

There are two reasons why I'm interested in this:

  1. for testing, where I will only consume the first few results.

  2. to prevent timeout. If we have to do a somewhat slow processing of results as we iterate through them, it's possible the following timeout could occur:

    Each 1000 block of results must be yielded within 1 minute, otherwise the request will expire at server side.

    Reducing the block size would reduce the time it takes to process each block.
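For the testing use case, a partial workaround that needs no new parameter is to stop consuming the generator early with itertools.islice. The sketch below uses a hypothetical stand-in generator (fake_fetch_all) rather than a real client, so it runs offline; note that this only limits how many hits are consumed and does not change the size of the block requested from the server.

```python
from itertools import islice


def fake_fetch_all(total, block_size=1000):
    """Stand-in for a fetch_all generator: builds hits block by block."""
    for start in range(0, total, block_size):
        stop = min(start + block_size, total)
        block = [{"_id": str(i)} for i in range(start, stop)]
        for hit in block:
            yield hit


# Consume only the first five hits; later blocks are never built.
first_five = list(islice(fake_fetch_all(2500), 5))
print(len(first_five))
```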

Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated

The following import is deprecated and triggers a DeprecationWarning under Python 3.8:

from collections import Iterable

Got the following messages:

/usr/share/miniconda/lib/python3.8/site-packages/biothings_client/base.py:10
  /usr/share/miniconda/lib/python3.8/site-packages/biothings_client/base.py:10: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3, and in 3.9 it will stop working
    from collections import Iterable

Python 3.9 is scheduled for release on 2020-10-05.
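One common way to keep such an import working on both old and new interpreters is a guarded import. This is a generic sketch, not the change that was actually made in base.py, and the helper is_nonstring_iterable is a hypothetical example of how Iterable might then be used.

```python
try:
    # Preferred location since Python 3.3.
    from collections.abc import Iterable
except ImportError:
    # Fallback for Python 2.7, where collections.abc does not exist.
    from collections import Iterable


def is_nonstring_iterable(obj):
    """True for lists, tuples, generators and so on, but not for str/bytes."""
    return isinstance(obj, Iterable) and not isinstance(obj, (str, bytes))


print(is_nonstring_iterable(["P24941", "O14727"]))
print(is_nonstring_iterable("P24941"))
```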

MyGene client: .query with fields=list returns incomplete results

The documentation indicates that a list is accepted for biothings_client.MyGeneInfo's fields argument (and passing a list is certainly the most pythonic interface):

:param fields: fields to return, a list or a comma-separated string.
If **fields="all"**, all available fields are returned

But using a list seems to result in missing fields.

Reproducible example

import biothings_client
mg = biothings_client.get_client('gene')
biothings_client.__version__

Using version 0.2.1.

fields = [
    'entrezgene',
    'symbol',
    'name',
    'genomic_pos.chr',
]
results = mg.query(
    'type_of_gene:"protein-coding"',
    fields=", ".join(fields),  # does not work as list
    species="human",
    fetch_all=True,
    entrezonly=True,
)
next(results)

Outputs the expected result:

{'_id': '283450',
 '_score': 0.28663203,
 'entrezgene': '283450',
 'genomic_pos': {'chr': '12'},
 'name': 'HECT domain E3 ubiquitin protein ligase 4',
 'symbol': 'HECTD4'}

However when fields is a list rather than a string:

results = mg.query(
    'type_of_gene:"protein-coding"',
    fields=fields,  # this should work, but doesn't
    species="human",
    fetch_all=True,
    entrezonly=True,
)
next(results)
# all fields besides genomic_pos are missing
{'_id': '283450', '_score': 0.28663203, 'genomic_pos': {'chr': '12'}}
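Until list input behaves as documented, a caller-side workaround is to join the list into a string before passing it, as the working example above already does. The helper name normalize_fields is hypothetical; it uses a bare comma as the separator, while the example above shows that a comma followed by a space is accepted as well.

```python
def normalize_fields(fields):
    """Return fields as a comma-separated string, accepting str or list."""
    if isinstance(fields, str):
        return fields
    return ",".join(fields)


fields = ["entrezgene", "symbol", "name", "genomic_pos.chr"]
print(normalize_fields(fields))
```

The result can then be passed as the fields argument of query.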
