Client implementation for accessing GA4GH APIs

License: Apache License 2.0

ga4gh-client's Introduction

GA4GH Client

This is a client library for the Global Alliance for Genomics and Health (GA4GH) API. It provides a Python programming interface for accessing GA4GH-compliant servers such as the 1kgenomes.ga4gh.org server.

Installation

pip install ga4gh-client

To install the latest alpha release, use:

pip install --pre ga4gh_client

This installs both the client command line utility and the GA4GH client programming library.

To demonstrate the CLI try:

ga4gh_client datasets-search http://1kgenomes.ga4gh.org

To access the programming API you can use a Python console:

>>> from ga4gh.client import client
>>> c = client.HttpClient("http://1kgenomes.ga4gh.org")
>>> datasets = list(c.search_datasets())
>>> print datasets
[id: "WyIxa2dlbm9tZXMiXQ"
name: "1kgenomes"
description: "Variants from the 1000 Genomes project and GENCODE genes annotations"
]
>>>

REFERENCES

For more examples of using the GA4GH client visit this iPython notebook.
For more information about GA4GH see the GA4GH website.
Full documentation is available at read-the-docs.org.
For a quick start with the GA4GH API, please see our demo.
To configure and deploy the GA4GH server in production please see the installation page.
If you would like to contribute to the project, please see the development page.

ga4gh-client's Issues

Add config for client

ga4gh/ga4gh-server#537

Typing urls, etc. on the command line to launch the client is tedious. Ideally there should be some kind of optional client configuration file that can be read in and take e.g. the following options:

server url
minimalOutput flag
verbosity
logging options (log to file? etc.)

This is similar to how you might use the s3 client, where it gathers your key info from environment or configuration files.
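
A minimal sketch of what such an optional configuration loader might look like, assuming an INI file with a `[client]` section and an environment variable that overrides the server URL. The function name `load_client_config`, the option names, and the `GA4GH_SERVER_URL` variable are all hypothetical and are not part of ga4gh-client. The sketch is written for Python 3.

```python
import configparser
import os


def load_client_config(path, environ=None):
    """Return client settings from an optional INI file, overridden by the environment."""
    environ = os.environ if environ is None else environ
    # Defaults used when neither the file nor the environment sets a value.
    settings = {
        "server_url": None,
        "minimal_output": False,
        "verbosity": 0,
        "log_file": None,
    }
    parser = configparser.ConfigParser()
    # read() silently skips files that do not exist, so the file stays optional.
    parser.read(path)
    if parser.has_section("client"):
        section = parser["client"]
        settings["server_url"] = section.get("server_url", fallback=None)
        settings["minimal_output"] = section.getboolean("minimal_output", fallback=False)
        settings["verbosity"] = section.getint("verbosity", fallback=0)
        settings["log_file"] = section.get("log_file", fallback=None)
    # Hypothetical environment variable; it takes precedence over the file.
    if environ.get("GA4GH_SERVER_URL"):
        settings["server_url"] = environ["GA4GH_SERVER_URL"]
    return settings
```

Letting the environment override the file mirrors the s3-style lookup mentioned above, and a URL given on the command line could then override both.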

Version installed from pypi is 0.0.5

The release tags need to be changed so that the latest version installs by default. When I run pip install ga4gh-client, I get 0.0.5.

Ran into this flake error when running travis

Traceback (most recent call last):
  File "/home/travis/virtualenv/python2.7.9/bin/flake8", line 11, in <module>
    sys.exit(main())
  File "/home/travis/virtualenv/python2.7.9/lib/python2.7/site-packages/flake8/main/cli.py", line 16, in main
    app.run(argv)
  File "/home/travis/virtualenv/python2.7.9/lib/python2.7/site-packages/flake8/main/application.py", line 322, in run
    self._run(argv)
  File "/home/travis/virtualenv/python2.7.9/lib/python2.7/site-packages/flake8/main/application.py", line 306, in _run
    self.run_checks()
  File "/home/travis/virtualenv/python2.7.9/lib/python2.7/site-packages/flake8/main/application.py", line 243, in run_checks
    self.file_checker_manager.start(files)
  File "/home/travis/virtualenv/python2.7.9/lib/python2.7/site-packages/flake8/checker.py", line 371, in start
    self.make_checkers(paths)
  File "/home/travis/virtualenv/python2.7.9/lib/python2.7/site-packages/flake8/checker.py", line 285, in make_checkers
    if argument == filename or should_create_file_checker(filename)
  File "/home/travis/virtualenv/python2.7.9/lib/python2.7/site-packages/flake8/checker.py", line 412, in __init__
    self.display_name = self.processor.filename
AttributeError: 'NoneType' object has no attribute 'filename'

Error running against GA4GH 1000 Genomes Examples

I installed the client successfully with

pip install --pre ga4gh_client --ignore-installed --no-cache-dir

When I run the basic sample (https://github.com/BD2KGenomics/bioapi-examples/blob/master/python_notebooks/1kg.ipynb), I get the following error:

from ga4gh.client import client
c = client.HttpClient("http://1kgenomes.ga4gh.org")
datasets = list(c.search_datasets())
Traceback (most recent call last):
File "", line 1, in
File "/Library/Python/2.7/site-packages/ga4gh/client/client.py", line 89, in _run_search_request
protocol_request, object_name, protocol_response_class)
File "/Library/Python/2.7/site-packages/ga4gh/client/client.py", line 939, in _run_search_page_request
response.text, protocol_response_class)
File "/Library/Python/2.7/site-packages/ga4gh/client/client.py", line 37, in _deserialize_response
return protocol.fromJson(json_response_string, protocol_response_class)
File "/Library/Python/2.7/site-packages/ga4gh/schemas/protocol.py", line 154, in fromJson
return json_format.Parse(json, protoClass())
File "/Library/Python/2.7/site-packages/google/protobuf/json_format.py", line 369, in Parse
return ParseDict(js, message, ignore_unknown_fields)
File "/Library/Python/2.7/site-packages/google/protobuf/json_format.py", line 384, in ParseDict
parser.ConvertMessage(js_dict, message)
File "/Library/Python/2.7/site-packages/google/protobuf/json_format.py", line 415, in ConvertMessage
self._ConvertFieldValuePair(value, message)
File "/Library/Python/2.7/site-packages/google/protobuf/json_format.py", line 501, in _ConvertFieldValuePair
raise ParseError('Failed to parse {0} field: {1}'.format(name, e))
google.protobuf.json_format.ParseError: Failed to parse datasets field: Message type "ga4gh.schemas.ga4gh.Dataset" has no field named "info".

ga4gh_run_tests still not playing nice with flake8 wildcards

(of course is actually a problem in ga4gh_common)

Add option of silencing client output

It's hard to debug tests like the server's tests.end_to_end.test_client_json when there is so much client output. This should probably be part of the logging/config changes.

Update 'RnaQuantification' attribute names in 'SearchRnaQuantificationsRunner'

Similar to #51 but for SearchRnaQuantificationsRunner.

G2P Client search_phenotype missing `qualifiers` parameter

Bug: Searching by qualifier is missing from the client.
(The parameter and the functionality are already available on the server.)

Work in progress snippets below ...


$ git diff  ga4gh_client/client.py
diff --git a/ga4gh_client/client.py b/ga4gh_client/client.py
index e4aeb0e..668d6db 100644
--- a/ga4gh_client/client.py
+++ b/ga4gh_client/client.py
@@ -678,7 +678,7 @@ class AbstractClient(object):

     def search_phenotype(
             self, phenotype_association_set_id=None, phenotype_id=None,
-            description=None, type_=None, age_of_onset=None):
+            description=None, qualifiers=None, type_=None, age_of_onset=None):
         """
         Returns an iterator over the Phenotypes from the server
         """
@@ -690,6 +690,8 @@ class AbstractClient(object):
             request.description = description
         if type_:
             request.type.mergeFrom(type_)
+        if qualifiers:
+            request.qualifiers.extend(qualifiers)
         if age_of_onset:
             request.age_of_onset = age_of_onset
         request.page_size = pb.int(self._page_size)


$ git diff  tests/unit/test_client.py
diff --git a/tests/unit/test_client.py b/tests/unit/test_client.py
index f599efb..af11178 100644
--- a/tests/unit/test_client.py
+++ b/tests/unit/test_client.py
@@ -56,6 +56,9 @@ class TestSearchMethodsCallRunRequest(unittest.TestCase):
         self.rnaQuantificationId = "rnaQuantificationId"
         self.expressionLevelId = "expressionLevelId"
         self.threshold = 0.0
+        self.qualifiers = [protocol.OntologyTerm(), protocol.OntologyTerm()]
+        self.qualifiers[0].id = "q0"
+        self.qualifiers[1].id = "q1"

     def testSetPageSize(self):
         testClient = client.AbstractClient()
@@ -383,6 +386,19 @@ class TestSearchMethodsCallRunRequest(unittest.TestCase):
             request, "phenotypes",
             protocol.SearchPhenotypesResponse)

+    def testSearchPhenotypeQualifiers(self):
+        request = protocol.SearchPhenotypesRequest()
+        request.phenotype_association_set_id = \
+            self.phenotype_association_set_id
+        request.qualifiers.extend(self.qualifiers)
+        request.page_size = self.pageSize
+        self.httpClient.search_phenotype(
+            phenotype_association_set_id=self.phenotype_association_set_id,
+            qualifiers=self.qualifiers)
+        self.httpClient._run_search_request.assert_called_once_with(
+            request, "phenotypes",
+            protocol.SearchPhenotypesResponse)
+
     def testSearchPhenotypeAssociationSets(self):
         request = protocol.SearchPhenotypeAssociationSetsRequest()
         request.dataset_id = self.datasetId

Useful summaries needed for default client CLI output

ga4gh/ga4gh-server#293

The current output of the CLI isn't very helpful for development, as it only outputs a single attribute for each object we read from the server. We should have something that at least allows us to identify the object in question. We can make this easier by refactoring things a little.

For example, we can have something like

class SearchVariantSetsRunner(AbstractSearchRunner):
    def __init__(self, args):
        super(SearchVariantSetsRunner, self).__init__(args)
        request = RequestFactory(args).createGASearchVariantSetsRequest()
        self._setRequest(request, args)
        self._method = self._httpClient.searchVariantSets

    def printObject(self, variantSet):
        print(variantSet.datasetId, variantSet.id, sep="\t")

The idea here is that we just have to implement the printObject method in each of the runner classes, and all the iteration is taken care of in the superclass.

We should then define printObject for many of the runners, especially ReadGroupSets and Reads.
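
The division of labour between superclass and subclass can be sketched in a self-contained form as follows. This is only an illustration of the pattern: the `VariantSet` class is a hypothetical stand-in for the objects returned by the server, the runner takes a plain list instead of parsed command line arguments, and the per-object hook is written as `format_object` (returning a string) rather than `printObject` so that it is easy to test.

```python
class AbstractSearchRunner(object):
    """Iterates over search results; subclasses decide how one object is shown."""

    def __init__(self, objects):
        # Stand-in for the iterator that a real client search method would return.
        self._objects = objects

    def format_object(self, obj):
        raise NotImplementedError()

    def run(self):
        # All iteration lives here, so subclasses only implement format_object.
        for obj in self._objects:
            print(self.format_object(obj))


class VariantSet(object):
    """Hypothetical minimal stand-in for a variant set returned by the server."""

    def __init__(self, dataset_id, id_):
        self.datasetId = dataset_id
        self.id = id_


class SearchVariantSetsRunner(AbstractSearchRunner):
    def format_object(self, variant_set):
        # Tab-separated fields that identify the object in question.
        return "\t".join([variant_set.datasetId, variant_set.id])


runner = SearchVariantSetsRunner(
    [VariantSet("dataset1", "vs1"), VariantSet("dataset1", "vs2")])
runner.run()
```

Adding a summary for another object type then only requires a new subclass with its own `format_object`.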

Peers

Implement the client to go along with ga4gh/ga4gh-server#1507

Update Release notes for the v0.6.0a10 release

'module' object has no attribute 'HttpClient'

I created a new virtualenv and ran:
pip install ga4gh_client --no-cache-dir --pre ga4gh_client
pip install jupyter
jupyter notebook

I then ran 1kg_bio_metadata_service.ipynb
(can be found here: https://github.com/achave11/bioapi-examples/blob/notebook_examples/python_notebooks/1kg_bio_metadata_service.ipynb).
Stepping through the first code block gives the following error when trying to initialize the HttpClient:

code:

import ga4gh_client as client
c = client.HttpClient("http://1kgenomes.ga4gh.org")

Error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-4-b2f2f769182e> in <module>()
      1 import ga4gh_client as client
----> 2 c = client.HttpClient("http://1kgenomes.ga4gh.org")

AttributeError: 'module' object has no attribute 'HttpClient'

Also tried in console:

>>> import ga4gh_client as client
>>> c = client.HttpClient("http://1kgenomes.ga4gh.org") 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'HttpClient'

Update 'ExpressionLevel' attribute names in 'SearchExpressionLevelsRunner'

Installing ga4gh-client (0.6.0a9) via pip and running the following command:

ga4gh_client expressionlevels-search --rnaQuantificationId WyIxa2dlbm9tZXMiLCJFLUdFVVYtMSBSTkEgUXVhbnRpZmljYXRpb24iLCJIRzAwMTA0Il0 http://1kgenomes.ga4gh.org

gives the following error:

Traceback (most recent call last):
  File "env/bin/ga4gh_client", line 11, in <module>
    sys.exit(client_main())
  File "env/lib/python2.7/site-packages/ga4gh/client/cli.py", line 1678, in client_main
    runner.run()
  File "env/lib/python2.7/site-packages/ga4gh/client/cli.py", line 767, in run
    self._output(iterator)
  File "env/lib/python2.7/site-packages/ga4gh/client/cli.py", line 773, in _textOutput
    expression.isNormalized, expression.rawReadCount,
AttributeError: 'ExpressionLevel' object has no attribute 'isNormalized'

Implement updated ontology term

To go with ga4gh/ga4gh-schemas#694

Update pypi readme

It's pretty spartan.

Fix client logging

ga4gh/ga4gh-server#324

Client logging has our messages entangled with urllib3's messages. We should probably suppress all of urllib3's messages.

Also, we don't really care about printing the debugLevel for our messages.

$ python client_dev.py -vv -O variants-search http://localhost:8000/v0.5.1
INFO:ga4gh.client:POST http://localhost:8000/v0.5.1/variantsets/search
DEBUG:ga4gh.client:json request:
DEBUG:ga4gh.client:{
    "datasetIds": [],
    "pageSize": null,
    "pageToken": null
}
INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): localhost
DEBUG:requests.packages.urllib3.connectionpool:"POST /v0.5.1/variantsets/search HTTP/1.1" 200 4887
DEBUG:ga4gh.client:json response:
DEBUG:ga4gh.client:{
    "nextPageToken": null,
    "variantSets": [
...
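
One possible way to untangle the two sets of messages uses only the standard logging module, as sketched below. The function name `configure_client_logging` and the mapping from verbosity to level are assumptions for illustration, not the client's actual behaviour. The sketch raises the urllib3 loggers to WARNING rather than silencing them completely, and drops the level name from the client's own format.

```python
import logging


def configure_client_logging(verbosity=0):
    """Show client messages at the requested verbosity and quiet urllib3."""
    # Assumed mapping: no -v gives WARNING, -v gives INFO, -vv or more gives DEBUG.
    level = {0: logging.WARNING, 1: logging.INFO}.get(verbosity, logging.DEBUG)
    handler = logging.StreamHandler()
    # The format omits %(levelname)s, so the level is not printed.
    handler.setFormatter(logging.Formatter("%(name)s: %(message)s"))
    client_logger = logging.getLogger("ga4gh.client")
    client_logger.handlers = [handler]
    client_logger.setLevel(level)
    # Do not pass records up to the root logger, which avoids duplicate lines.
    client_logger.propagate = False
    # Cover both the vendored logger name that appears in the output above
    # and the standalone urllib3 package.
    for name in ("requests.packages.urllib3", "urllib3"):
        logging.getLogger(name).setLevel(logging.WARNING)
    return client_logger
```

Because child loggers inherit the level of their parent, the `requests.packages.urllib3.connectionpool` messages are filtered without naming that logger explicitly.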

Client callsets text output garbled

ga4gh/ga4gh-server#875

In the protobuf implementation, the output of calls is garbled in the client. Run

$ python client_dev.py variants-search http://localhost:8000 --pageSize=5 --callSetIds=MWtnLXAzLXN1YnNldDptdm5jYWxsOkhHMDA1MzM=

The output is no longer on a single line.

Update to protobuf 3.2

For dramatic performance improvements
https://gist.github.com/ljdursi/63780885455f5d983f4ceccd3d079150

Enable ReadTheDocs

Also:

Figure out if sphinx-argparse needs to be in requirements.txt
Update TODO.txt

Get rid of warnings in doc build

From e.g. https://travis-ci.org/ga4gh/ga4gh-client/builds/211071377

/home/travis/build/ga4gh/ga4gh-client/docs/status.rst:14: WARNING: Bullet list ends without a blank line; unexpected unindent.
/home/travis/build/ga4gh/ga4gh-client/docs/status.rst:21: WARNING: Bullet list ends without a blank line; unexpected unindent.
/home/travis/build/ga4gh/ga4gh-client/docs/status.rst:30: WARNING: Bullet list ends without a blank line; unexpected unindent.
/home/travis/build/ga4gh/ga4gh-client/docs/status.rst:34: WARNING: Bullet list ends without a blank line; unexpected unindent.
/home/travis/build/ga4gh/ga4gh-client/docs/status.rst:56: WARNING: Bullet list ends without a blank line; unexpected unindent.
/home/travis/build/ga4gh/ga4gh-client/docs/status.rst:60: WARNING: Bullet list ends without a blank line; unexpected unindent.

Add overview documentation

The client should be documented separately from the server and provide a nice overview without getting into databases, etc. The generated docs are a good start.

Implement attributes field

Implement the new attributes field to go with ga4gh/ga4gh-schemas#700.

Add documentation

The documentation is very bare-bones. Add:

more info about status/releases
the api documentation removed in ga4gh/ga4gh-server#1440

Add authentication via header

To go with

ga4gh/ga4gh-server#1470

Requests made by client with key=invalid argument

ga4gh/ga4gh-server#601

This command

python client_dev.py -O variants-search http://localhost:8000 --variantSetId ZGF0YXNldDE6MWtnLXBoYXNlMQ==

results in a request to

/variants/search?key=invalid

If the key=invalid part is not necessary (which I suspect it is not), we should not make requests with that argument.
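
As a rough illustration of the suggested behaviour, the query string could be built so that `key` appears only when the caller actually supplied one. The helper name `build_search_url` is hypothetical and is not taken from the client code.

```python
from urllib.parse import urlencode


def build_search_url(base_url, path, key=None):
    """Return the request URL, adding a 'key' parameter only when one was given."""
    params = {}
    if key is not None:
        params["key"] = key
    url = base_url.rstrip("/") + path
    if params:
        # urlencode escapes the value so it is safe to place in the query string.
        url += "?" + urlencode(params)
    return url
```

With no key, the request goes to /variants/search with no query string at all.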

Publish to pypi

Add FASTA output option for client list-reference-bases command

The server supports the list-reference-bases command to provide
an easy way to get reference sequence over a given interval. This
is implemented in the references-list-bases command in the command
line client. For example, we can run

$ python client_dev.py references-list-bases http://localhost:8000 R1JDaDM4LXN1YnNldDoz

which prints out the sequences to stdout. We want to add the option
to print the sequences to stdout in FASTA format. Open ga4gh/cli.py
and find ListReferenceBasesRunner.

First, fix a bug in the current implementation so it looks like

    def run(self):
        sequence = self._client.listReferenceBases(
            self._referenceId, self._start, self._end)
        print(sequence)

Then add the following to addReferencesBasesListParser

    parser.add_argument(
        "--outputFormat", "-O", choices=['text', 'fasta'], default="text",
        help=(
            "The format for sequence output. Currently supported are "
            "'text' (default), which prints the sequence out directly, and "
            "'fasta', which formats the sequence into fixed width FASTA"))

Update ListReferenceBasesRunner to look like

    def run(self):
        sequence = self._client.listReferenceBases(
            self._referenceId, self._start, self._end)
        if self._outputFormat == "text":
            print(sequence)
        else:
            print("FASTA", sequence)

Try this out with the -O fasta and -O text options.
Write the FASTA output; the textwrap module will be useful here.
Generalise the addOutputFormatArgument method to cover our needs here as well as the JSON output elsewhere.
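
The FASTA step could look roughly like the sketch below, which uses textwrap as suggested. The function name `format_fasta`, the use of a caller-supplied name for the header line, and the default width of 70 are assumptions made for illustration.

```python
import textwrap


def format_fasta(name, sequence, width=70):
    """Return the sequence as a FASTA record with fixed-width lines."""
    lines = [">" + name]
    # A sequence of bases contains no whitespace, so textwrap treats it as one
    # long word and breaks it every `width` characters.
    lines.extend(textwrap.wrap(sequence, width))
    return "\n".join(lines)


print(format_fasta("R1JDaDM4LXN1YnNldDoz", "ACGT" * 40))
```

In ListReferenceBasesRunner.run, the `else` branch would then print the result of such a function in place of the placeholder `print("FASTA", sequence)`.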

Search RNA Expressions by feature names rather than IDs

Remove the need to have the feature IDs looked up and placed into the expression records. Allow search by a list of names instead.

Add variant search example to readme

A variant search is a more substantial example than a dataset search.

Add better instructions to install/develop

I'm running into an issue that was previously addressed (#774) in ga4gh-schemas.

I'm trying to build a downstream app that uses ga4gh-client, but pip install fails on a protobuf dependency for both ga4gh-client and ga4gh-schemas.

I thought it was resolved in the other issue, but it still appears to be a problem for ga4gh-schemas installs too.

What's the recommended course of action?

(test_ga4gh)$ pip install ga4gh-client
Collecting ga4gh-client
  Using cached ga4gh_client-0.0.5.tar.gz
Collecting ga4gh_common==0.0.5 (from ga4gh-client)
Collecting ga4gh_schemas (from ga4gh-client)
  Using cached ga4gh_schemas-0.0.9.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/private/var/folders/n0/dyt85yg950n9mjtl5cnkbfz00000gn/T/pip-build-p7E4qc/ga4gh-schemas/setup.py", line 37, in <module>
        process_schemas.main([PROTOCOL_VERSION, 'python'])
      File "scripts/process_schemas.py", line 222, in main
        pb.run(parsedArgs)
      File "scripts/process_schemas.py", line 207, in run
        protoc = self._getProtoc(destination_path)
      File "scripts/process_schemas.py", line 169, in _getProtoc
        protocs))
    Exception: Can't find a good protoc. Tried [u'/private/var/folders/n0/dyt85yg950n9mjtl5cnkbfz00000gn/T/pip-build-p7E4qc/ga4gh-schemas/python/protobuf/src/protoc']
    
    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /private/var/folders/n0/dyt85yg950n9mjtl5cnkbfz00000gn/T/pip-build-p7E4qc/ga4gh-schemas/

Change genotype to list value

ga4gh/ga4gh-schemas#735

search_phenotype_association_sets automatically requests pageSize 0

The search_phenotype_association_sets method (and presumably other methods that use the ga4gh.client.protocol module) automatically requests a page size of 0. This can lead to bad requests on servers not configured to explicitly handle page size 0 requests (e.g. http://rest.ensembl.org:8080/ga4gh/datasets/search).

I suggest that we change this to a positive integer default, or omit it entirely and leave it to the server to specify the page size.
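
The second option (omitting the page size unless the caller asks for one) might be sketched as follows. This works on a plain JSON payload rather than on the client's protobuf request objects, and the helper name `build_search_payload` is hypothetical. How an unset protobuf field is serialized by the real client is not covered here.

```python
import json


def build_search_payload(dataset_id, page_size=None):
    """Return a JSON search request that includes pageSize only when it is set."""
    payload = {"datasetId": dataset_id}
    if page_size is not None:
        if page_size <= 0:
            raise ValueError("page_size must be a positive integer")
        payload["pageSize"] = page_size
    # With no pageSize key the server is free to apply its own default.
    return json.dumps(payload, sort_keys=True)
```

Rejecting zero and negative values on the client side means a bad page size fails early instead of producing a bad request on the server.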