
ga4gh-client's Introduction


GA4GH Client

This is a client library for the Global Alliance for Genomics and Health (GA4GH) API. It provides a Python programming interface for accessing GA4GH-compliant servers such as the 1kgenomes.ga4gh.org server.

Installation

pip install ga4gh-client

To install the latest alpha release, use:

pip install --pre ga4gh_client

This installs both the client command line utility and the GA4GH client programming library.

To demonstrate the CLI try:

ga4gh_client datasets-search http://1kgenomes.ga4gh.org

To access the programming API you can use a Python console:

>>> from ga4gh.client import client
>>> c = client.HttpClient("http://1kgenomes.ga4gh.org")
>>> datasets = list(c.search_datasets())
>>> print datasets
[id: "WyIxa2dlbm9tZXMiXQ"
name: "1kgenomes"
description: "Variants from the 1000 Genomes project and GENCODE genes annotations"
]
>>>

REFERENCES

  • For more examples of using the GA4GH client visit this iPython notebook.
  • For more information about GA4GH see the GA4GH website.
  • Full documentation is available at read-the-docs.org.
  • For a quick start with the GA4GH API, please see our demo.
  • To configure and deploy the GA4GH server in production please see the installation page.
  • If you would like to contribute to the project, please see the development page.

ga4gh-client's Issues

Add config for client

ga4gh/ga4gh-server#537

Typing URLs and other options on the command line every time the client is launched is tedious. Ideally there should be an optional client configuration file that is read at startup and accepts, for example, the following options:

  • server url
  • minimalOutput flag
  • verbosity
  • logging options (log to file? etc.)

This is similar to how you might use the s3 client, where it gathers your key info from environment or configuration files.
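
A minimal sketch of how such an optional configuration file might be read, assuming an INI-style file with a `[client]` section. The option names (`server_url`, `minimal_output`, `verbosity`, `log_file`), the function `load_client_config` and the environment variable `GA4GH_CLIENT_URL` are all hypothetical; none of them exist in ga4gh-client. It is written for Python 3 (`configparser`).

```python
import configparser
import os

# Hypothetical defaults; the option names mirror the list above.
DEFAULTS = {
    "server_url": "",
    "minimal_output": "false",
    "verbosity": "0",
    "log_file": "",
}


def load_client_config(path=None, environ=None):
    """Return client options from defaults, then file, then environment."""
    environ = os.environ if environ is None else environ
    # Disable interpolation so that '%' in URLs is taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser["client"] = dict(DEFAULTS)
    if path is not None:
        # read() silently skips missing files, so the file stays optional.
        parser.read(path)
    section = parser["client"]
    config = {
        "server_url": section.get("server_url"),
        "minimal_output": section.getboolean("minimal_output"),
        "verbosity": section.getint("verbosity"),
        "log_file": section.get("log_file") or None,
    }
    # The (hypothetical) environment variable overrides the file.
    if environ.get("GA4GH_CLIENT_URL"):
        config["server_url"] = environ["GA4GH_CLIENT_URL"]
    return config
```

Values given explicitly on the command line would presumably still take precedence over both the file and the environment.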

Ran into this flake8 error when running Travis

Traceback (most recent call last):
  File "/home/travis/virtualenv/python2.7.9/bin/flake8", line 11, in <module>
    sys.exit(main())
  File "/home/travis/virtualenv/python2.7.9/lib/python2.7/site-packages/flake8/main/cli.py", line 16, in main
    app.run(argv)
  File "/home/travis/virtualenv/python2.7.9/lib/python2.7/site-packages/flake8/main/application.py", line 322, in run
    self._run(argv)
  File "/home/travis/virtualenv/python2.7.9/lib/python2.7/site-packages/flake8/main/application.py", line 306, in _run
    self.run_checks()
  File "/home/travis/virtualenv/python2.7.9/lib/python2.7/site-packages/flake8/main/application.py", line 243, in run_checks
    self.file_checker_manager.start(files)
  File "/home/travis/virtualenv/python2.7.9/lib/python2.7/site-packages/flake8/checker.py", line 371, in start
    self.make_checkers(paths)
  File "/home/travis/virtualenv/python2.7.9/lib/python2.7/site-packages/flake8/checker.py", line 285, in make_checkers
    if argument == filename or should_create_file_checker(filename)
  File "/home/travis/virtualenv/python2.7.9/lib/python2.7/site-packages/flake8/checker.py", line 412, in __init__
    self.display_name = self.processor.filename
AttributeError: 'NoneType' object has no attribute 'filename'

Error running against GA4GH 1000 Genomes Examples

I installed the client successfully with

pip install --pre ga4gh_client --ignore-installed --no-cache-dir

When I try to run the basic sample (https://github.com/BD2KGenomics/bioapi-examples/blob/master/python_notebooks/1kg.ipynb), I encounter the following error:

from ga4gh.client import client
c = client.HttpClient("http://1kgenomes.ga4gh.org")
datasets = list(c.search_datasets())
Traceback (most recent call last):
File "", line 1, in
File "/Library/Python/2.7/site-packages/ga4gh/client/client.py", line 89, in _run_search_request
protocol_request, object_name, protocol_response_class)
File "/Library/Python/2.7/site-packages/ga4gh/client/client.py", line 939, in _run_search_page_request
response.text, protocol_response_class)
File "/Library/Python/2.7/site-packages/ga4gh/client/client.py", line 37, in _deserialize_response
return protocol.fromJson(json_response_string, protocol_response_class)
File "/Library/Python/2.7/site-packages/ga4gh/schemas/protocol.py", line 154, in fromJson
return json_format.Parse(json, protoClass())
File "/Library/Python/2.7/site-packages/google/protobuf/json_format.py", line 369, in Parse
return ParseDict(js, message, ignore_unknown_fields)
File "/Library/Python/2.7/site-packages/google/protobuf/json_format.py", line 384, in ParseDict
parser.ConvertMessage(js_dict, message)
File "/Library/Python/2.7/site-packages/google/protobuf/json_format.py", line 415, in ConvertMessage
self._ConvertFieldValuePair(value, message)
File "/Library/Python/2.7/site-packages/google/protobuf/json_format.py", line 501, in _ConvertFieldValuePair
raise ParseError('Failed to parse {0} field: {1}'.format(name, e))
google.protobuf.json_format.ParseError: Failed to parse datasets field: Message type "ga4gh.schemas.ga4gh.Dataset" has no field named "info".

Add option of silencing client output

It's hard to debug tests like the server's tests.end_to_end.test_client_json when there is so much client output... this should probably be part of the logging/config changes
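
A minimal sketch of one way to silence the client, assuming the noisy output goes through the `ga4gh.client` logger rather than bare `print` calls. The `--quiet` flag and both helper functions are hypothetical and are not part of the current CLI.

```python
import argparse
import logging


def add_quiet_argument(parser):
    """Add a hypothetical --quiet/-q flag to an argparse parser."""
    parser.add_argument(
        "--quiet", "-q", action="store_true", default=False,
        help="Suppress client log output below ERROR")


def configure_output(args, logger_name="ga4gh.client"):
    """Raise the logger threshold when --quiet was given."""
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.ERROR if args.quiet else logging.INFO)
    return logger
```

A test harness could then pass `--quiet` when it invokes the client, so that only errors reach the test log.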

G2P Client search_phenotype missing `qualifiers` parameter

Bug: Searching by qualifier is missing from the client.
(The parameter and the functionality are already there on the server.)

Work in progress snippets below ...


$ git diff  ga4gh_client/client.py
diff --git a/ga4gh_client/client.py b/ga4gh_client/client.py
index e4aeb0e..668d6db 100644
--- a/ga4gh_client/client.py
+++ b/ga4gh_client/client.py
@@ -678,7 +678,7 @@ class AbstractClient(object):

     def search_phenotype(
             self, phenotype_association_set_id=None, phenotype_id=None,
-            description=None, type_=None, age_of_onset=None):
+            description=None, qualifiers=None, type_=None, age_of_onset=None):
         """
         Returns an iterator over the Phenotypes from the server
         """
@@ -690,6 +690,8 @@ class AbstractClient(object):
             request.description = description
         if type_:
             request.type.mergeFrom(type_)
+        if qualifiers:
+            request.qualifiers.extend(qualifiers)
         if age_of_onset:
             request.age_of_onset = age_of_onset
         request.page_size = pb.int(self._page_size)


$ git diff  tests/unit/test_client.py
diff --git a/tests/unit/test_client.py b/tests/unit/test_client.py
index f599efb..af11178 100644
--- a/tests/unit/test_client.py
+++ b/tests/unit/test_client.py
@@ -56,6 +56,9 @@ class TestSearchMethodsCallRunRequest(unittest.TestCase):
         self.rnaQuantificationId = "rnaQuantificationId"
         self.expressionLevelId = "expressionLevelId"
         self.threshold = 0.0
+        self.qualifiers = [protocol.OntologyTerm(), protocol.OntologyTerm()]
+        self.qualifiers[0].id = "q0"
+        self.qualifiers[1].id = "q1"

     def testSetPageSize(self):
         testClient = client.AbstractClient()
@@ -383,6 +386,19 @@ class TestSearchMethodsCallRunRequest(unittest.TestCase):
             request, "phenotypes",
             protocol.SearchPhenotypesResponse)

+    def testSearchPhenotypeQualifiers(self):
+        request = protocol.SearchPhenotypesRequest()
+        request.phenotype_association_set_id = \
+            self.phenotype_association_set_id
+        request.qualifiers.extend(self.qualifiers)
+        request.page_size = self.pageSize
+        self.httpClient.search_phenotype(
+            phenotype_association_set_id=self.phenotype_association_set_id,
+            qualifiers=self.qualifiers)
+        self.httpClient._run_search_request.assert_called_once_with(
+            request, "phenotypes",
+            protocol.SearchPhenotypesResponse)
+
     def testSearchPhenotypeAssociationSets(self):
         request = protocol.SearchPhenotypeAssociationSetsRequest()
         request.dataset_id = self.datasetId

Useful summaries needed for default client CLI output

ga4gh/ga4gh-server#293

The current output of the CLI isn't very helpful for development, as it only outputs a single attribute for each object we read from the server. We should have something that at least allows us to identify the object in question. We can make this easier by refactoring things a little.

For example, we can have something like

class SearchVariantSetsRunner(AbstractSearchRunner):
    def __init__(self, args):
        super(SearchVariantSetsRunner, self).__init__(args)
        request = RequestFactory(args).createGASearchVariantSetsRequest()
        self._setRequest(request, args)
        self._method = self._httpClient.searchVariantSets

    def printObject(self, variantSet):
        print(variantSet.datasetId, variantSet.id, sep="\t")

The idea here is that we just have to implement the printObject method in each of the runner classes, and all the iteration is taken care of in the superclass.

We should then define printObject for many of the runners, especially ReadGroupSets and Reads.
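
The division of labour described here can be sketched in a self-contained form. This is only an illustration of the pattern: the superclass below takes a plain list instead of talking to a server, and `VariantSet` is a hypothetical stand-in for the objects the real client returns. It is written for Python 3; under Python 2 the `print` call would need `from __future__ import print_function`.

```python
import collections


class AbstractSearchRunner(object):
    """Superclass that owns the iteration; subclasses format one object."""

    def __init__(self, objects):
        # Stand-in for the iterator returned by the HTTP client.
        self._objects = objects

    def run(self):
        for obj in self._objects:
            self.printObject(obj)

    def printObject(self, obj):
        raise NotImplementedError()


# Hypothetical stand-in for a variant set returned by the server.
VariantSet = collections.namedtuple("VariantSet", ["datasetId", "id"])


class SearchVariantSetsRunner(AbstractSearchRunner):
    def printObject(self, variantSet):
        # One tab-separated line that is enough to identify the object.
        print(variantSet.datasetId, variantSet.id, sep="\t")
```

Each further runner then only needs its own `printObject`, choosing the attributes that identify its object type.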

'module' object has no attribute 'HttpClient'

Created a new virtualenv
pip install ga4gh_client --no-cache-dir --pre ga4gh_client
pip install jupyter
jupyter notebook

code:

import ga4gh_client as client
c = client.HttpClient("http://1kgenomes.ga4gh.org") 

Error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-4-b2f2f769182e> in <module>()
      1 import ga4gh_client as client
----> 2 c = client.HttpClient("http://1kgenomes.ga4gh.org")

AttributeError: 'module' object has no attribute 'HttpClient'

Also tried in console:

>>> import ga4gh_client as client
>>> c = client.HttpClient("http://1kgenomes.ga4gh.org") 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'HttpClient'

Update 'ExpressionLevel' attribute names in 'SearchExpressionLevelsRunner'

Installing ga4gh-client (0.6.0a9) via pip and running the following command:

ga4gh_client expressionlevels-search --rnaQuantificationId WyIxa2dlbm9tZXMiLCJFLUdFVVYtMSBSTkEgUXVhbnRpZmljYXRpb24iLCJIRzAwMTA0Il0 http://1kgenomes.ga4gh.org

gives the following error:

Traceback (most recent call last):
  File "env/bin/ga4gh_client", line 11, in <module>
    sys.exit(client_main())
  File "env/lib/python2.7/site-packages/ga4gh/client/cli.py", line 1678, in client_main
    runner.run()
  File "env/lib/python2.7/site-packages/ga4gh/client/cli.py", line 767, in run
    self._output(iterator)
  File "env/lib/python2.7/site-packages/ga4gh/client/cli.py", line 773, in _textOutput
    expression.isNormalized, expression.rawReadCount,
AttributeError: 'ExpressionLevel' object has no attribute 'isNormalized'

Fix client logging

ga4gh/ga4gh-server#324

Client logging has our messages entangled with urllib3's messages. We should probably suppress all of urllib3's messages.

Also, we don't really care about printing the debugLevel for our messages.

$ python client_dev.py -vv -O variants-search http://localhost:8000/v0.5.1
INFO:ga4gh.client:POST http://localhost:8000/v0.5.1/variantsets/search
DEBUG:ga4gh.client:json request:
DEBUG:ga4gh.client:{
    "datasetIds": [],
    "pageSize": null,
    "pageToken": null
}
INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): localhost
DEBUG:requests.packages.urllib3.connectionpool:"POST /v0.5.1/variantsets/search HTTP/1.1" 200 4887
DEBUG:ga4gh.client:json response:
DEBUG:ga4gh.client:{
    "nextPageToken": null,
    "variantSets": [
...
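
One possible way to untangle the two streams, sketched with the standard `logging` module. The logger names `ga4gh.client` and `requests.packages.urllib3` are taken from the output above; the function name, the `verbosity` mapping and the extra `urllib3` logger name are assumptions for illustration.

```python
import logging


def configure_client_logging(verbosity=0, stream=None):
    """Show bare client messages and silence urllib3 below WARNING."""
    level = {0: logging.WARNING, 1: logging.INFO}.get(
        verbosity, logging.DEBUG)
    handler = logging.StreamHandler(stream)
    # Message only: no "LEVEL:logger.name:" prefix.
    handler.setFormatter(logging.Formatter("%(message)s"))
    client_logger = logging.getLogger("ga4gh.client")
    client_logger.handlers = [handler]
    client_logger.setLevel(level)
    # Do not hand records to the root logger, which would print them again.
    client_logger.propagate = False
    # Child loggers such as ...urllib3.connectionpool inherit this level.
    for name in ("requests.packages.urllib3", "urllib3"):
        logging.getLogger(name).setLevel(logging.WARNING)
    return client_logger
```

With `-vv` mapped to `verbosity=2`, the client's own INFO and DEBUG lines would still appear, while the connection pool messages would be dropped.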

Client callsets text output garbled

ga4gh/ga4gh-server#875

In the protobuf implementation, the output of calls is garbled in the client. Run

$ python client_dev.py variants-search http://localhost:8000 --pageSize=5 --callSetIds=MWtnLXAzLXN1YnNldDptdm5jYWxsOkhHMDA1MzM=

The output is no longer on a single line.

Enable ReadTheDocs

Also:

  • Figure out if sphinx-argparse needs to be in requirements.txt
  • Update TODO.txt

Get rid of warnings in doc build

From e.g. https://travis-ci.org/ga4gh/ga4gh-client/builds/211071377

/home/travis/build/ga4gh/ga4gh-client/docs/status.rst:14: WARNING: Bullet list ends without a blank line; unexpected unindent.
/home/travis/build/ga4gh/ga4gh-client/docs/status.rst:21: WARNING: Bullet list ends without a blank line; unexpected unindent.
/home/travis/build/ga4gh/ga4gh-client/docs/status.rst:30: WARNING: Bullet list ends without a blank line; unexpected unindent.
/home/travis/build/ga4gh/ga4gh-client/docs/status.rst:34: WARNING: Bullet list ends without a blank line; unexpected unindent.
/home/travis/build/ga4gh/ga4gh-client/docs/status.rst:56: WARNING: Bullet list ends without a blank line; unexpected unindent.
/home/travis/build/ga4gh/ga4gh-client/docs/status.rst:60: WARNING: Bullet list ends without a blank line; unexpected unindent.

Add overview documentation

The client should be documented separately from the server and provide a nice overview without getting into databases, etc. The generated docs are a good start.

Requests made by client with key=invalid argument

ga4gh/ga4gh-server#601

This command

python client_dev.py -O variants-search http://localhost:8000 --variantSetId ZGF0YXNldDE6MWtnLXBoYXNlMQ==

results in a request to

/variants/search?key=invalid

If the key=invalid part is not necessary (which I suspect it is not), we should not make requests with that argument.
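
A minimal sketch of the suggested behaviour, assuming the key is optional: the query string is appended only when a key was actually supplied. `build_search_url` is a hypothetical helper, not the client's real URL-building code, and it uses the Python 3 `urllib.parse` module.

```python
from urllib.parse import urlencode


def build_search_url(base_url, path, key=None):
    """Join base URL and path; append ?key=... only when a key is set."""
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    if key is not None:
        url += "?" + urlencode({"key": key})
    return url
```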

Add FASTA output option for client list-reference-bases command

The server supports the list-reference-bases command to provide
an easy way to get reference sequence over a given interval. This
is implemented in the references-list-bases command in the command
line client. For example, we can run

$ python client_dev.py references-list-bases http://localhost:8000 R1JDaDM4LXN1YnNldDoz

which prints out the sequences to stdout. We want to add the option
to print the sequences to stdout in FASTA format. Open ga4gh/cli.py
and find ListReferenceBasesRunner.

  • First, fix a bug in the current implementation so it looks like
    def run(self):
        sequence = self._client.listReferenceBases(
            self._referenceId, self._start, self._end)
        print(sequence)
  • Then add the following to addReferencesBasesListParser
    parser.add_argument(
        "--outputFormat", "-O", choices=['text', 'fasta'], default="text",
        help=(
            "The format for sequence output. Currently supported are "
            "'text' (default), which prints the sequence out directly and "
            "'fasta', which formats the sequence into fixed width FASTA"))
  • Update ListReferenceBasesRunner to look like
    def run(self):
        sequence = self._client.listReferenceBases(
            self._referenceId, self._start, self._end)
        if self._outputFormat == "text":
            print(sequence)
        else:
            print("FASTA", sequence)
  • Try this out with the -O fasta and -O text options.
  • Write the FASTA output; the textwrap module will be useful
    here.
  • Generalise the addOutputFormatArgument method to cover
    our needs here as well as the JSON output elsewhere.
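
The FASTA-writing step could look roughly like the sketch below, which uses `textwrap.wrap` to break the sequence into fixed-width lines. The function name, the choice of the reference ID as the header and the default width of 70 are assumptions, not decisions made in this issue.

```python
import textwrap


def format_fasta(sequence_id, sequence, line_width=70):
    """Return a FASTA record: a '>' header line, then fixed-width lines."""
    lines = [">" + sequence_id]
    # A sequence has no whitespace, so wrap() cuts it every line_width bases.
    lines.extend(textwrap.wrap(sequence, line_width))
    return "\n".join(lines)
```

In `ListReferenceBasesRunner.run`, the `else` branch would then print `format_fasta(self._referenceId, sequence)` instead of the placeholder.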

Add better instructions to install/develop

I'm having an issue that has been addressed previously (#774) in ga4gh-schema.

I'm trying to build a downstream app that uses ga4gh-client, but I'm having an issue with a protobuf dependency for pip install on both ga4gh-client and ga4gh-schema.

I thought it was resolved in the other issue, but it appears to still be a problem for ga4gh-schema installs too.

What's the recommended course of action?

(test_ga4gh)$ pip install ga4gh-client
Collecting ga4gh-client
  Using cached ga4gh_client-0.0.5.tar.gz
Collecting ga4gh_common==0.0.5 (from ga4gh-client)
Collecting ga4gh_schemas (from ga4gh-client)
  Using cached ga4gh_schemas-0.0.9.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/private/var/folders/n0/dyt85yg950n9mjtl5cnkbfz00000gn/T/pip-build-p7E4qc/ga4gh-schemas/setup.py", line 37, in <module>
        process_schemas.main([PROTOCOL_VERSION, 'python'])
      File "scripts/process_schemas.py", line 222, in main
        pb.run(parsedArgs)
      File "scripts/process_schemas.py", line 207, in run
        protoc = self._getProtoc(destination_path)
      File "scripts/process_schemas.py", line 169, in _getProtoc
        protocs))
    Exception: Can't find a good protoc. Tried [u'/private/var/folders/n0/dyt85yg950n9mjtl5cnkbfz00000gn/T/pip-build-p7E4qc/ga4gh-schemas/python/protobuf/src/protoc']
    
    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /private/var/folders/n0/dyt85yg950n9mjtl5cnkbfz00000gn/T/pip-build-p7E4qc/ga4gh-schemas/

search_phenotype_association_sets automatically requests pageSize 0

The search_phenotype_association_sets method (and presumably other methods that use the ga4gh.client.protocol module) automatically requests a page size of 0. This can lead to bad requests on servers not configured to explicitly handle page size 0 requests (e.g. http://rest.ensembl.org:8080/ga4gh/datasets/search).

I suggest that we change this to a positive integer default, or omit it entirely and leave it to the server to specify the page size.
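
The second option (omitting the page size and letting the server decide) might be sketched as follows. This builds a plain dictionary rather than the client's protobuf request, and `build_search_request` is a hypothetical helper; only the JSON field names `pageSize` and `pageToken` come from the protocol.

```python
def build_search_request(page_size=None, page_token=None, **fields):
    """Build a JSON-ready request dict, omitting an unset page size."""
    if page_size is not None and page_size <= 0:
        raise ValueError("page_size must be a positive integer")
    request = dict(fields)
    if page_size is not None:
        request["pageSize"] = page_size
    if page_token is not None:
        request["pageToken"] = page_token
    return request
```

A request built without `page_size` carries no `pageSize` field at all, so a server that rejects a page size of 0 never sees one.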
