globus-sdk-python's Introduction

[Badges: build status, latest released version, supported Python versions, license, pre-commit.ci status, import style, code style]

Globus SDK for Python

This SDK provides a convenient Pythonic interface to Globus APIs.

Basic Usage

Install with pip install globus-sdk

You can then import Globus client classes and other helpers from globus_sdk. For example:
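
A minimal sketch (assuming token configuration is already in place; it mirrors the access pattern used in the issues below):

from globus_sdk import TransferClient

tc = TransferClient()
for endpoint in tc.endpoint_search(filter_scope='recently-used'):
    print(endpoint.data['display_name'])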

Testing, Development, and Contributing

Go to the CONTRIBUTING guide for details.

globus-sdk-python's People

Contributors

aaschaer, actions-user, ada-globus, bjmc, corpulentcoffee, dependabot[bot], derek-globus, jackkordas, jasonalt, jaswilli, jbryan, jimpruyne, joshbryan-globus, khk-globus, kurtmckee, mattias-lidman, mh-globus, mikkel14, nellev, nickolausds, nsoranzo, pn2200, pre-commit-ci[bot], rpwagner, rudyardrichter, sirosen, tylern4


globus-sdk-python's Issues

Improving response classes

I find myself constantly forgetting to add the .data, and it seems superfluous. It's especially confusing for the iterable responses. I propose that we add a __getitem__ to GlobusResponse that operates on .data, so you can write response['path'] instead of response.data['path'], and give iterable responses default iteration directly over response['DATA']. That makes usage more convenient, while still giving us a central response class that we can tweak in the future if needed.

With the current response formats, it feels like Java, not like Python.
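
A minimal sketch of the proposal (the GlobusResponse internals are simplified here; only the .data attribute described above is assumed):

class GlobusResponse(object):
    def __init__(self, data):
        self.data = data

    def __getitem__(self, key):
        # dispatch subscript access to the underlying data,
        # so response['path'] works in place of response.data['path']
        return self.data[key]

class GlobusIterableResponse(GlobusResponse):
    def __iter__(self):
        # default iteration exposes the documents in response['DATA'] directly
        return iter(self['DATA'])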

Add a helper for Task Wait

We started talking about the potential merit of having the SDK handle Task Wait itself.

I think the simplest approach is to have a method that times out and returns True/False depending on whether or not the task completed.

Basic signature and behavior I'd like to recommend:

def task_wait(task_id, timeout=10, polling_interval=1):
    """
    Wait until a Task is complete or fails, with a time limit. If the task is "ACTIVE" after time
    runs out, returns ``False``. Otherwise returns ``True``.

    :param uuid task_id: ID of the Task to wait for completion
    :param int timeout: Number of seconds to wait in total. Minimum 1
    :param int polling_interval: Number of seconds between queries to Globus
        about the Task status. Minimum 1
    """
    ...

I think it's important to support both a timeout and a polling interval, to let clients make their own decisions about how long to wait and what to do between polling periods.
By way of example, I could see a system working multiple transfers with lax latency requirements wanting to cycle between them in a single thread.

There's a viable approach, for supporting the CLI heartbeats, of adding a generic "polling callback" that gets the results of get_task(task_id) every time it's called and is assumed to take 0 seconds.
Consider:

def task_wait(task_id, timeout=10, polling_interval=1, polling_callback=None):
    """
    ...

    :param uuid task_id: ID of the Task to wait for completion
    ...
    :param callable polling_callback: A function which takes a Task document, as
        returned by ``TransferClient.get_task``, as its only argument. Called after
        each Task status check to perform an arbitrary action between checks.

    Note: we assume that ``polling_callback`` takes 0 seconds to run.

    For example, one could use the ``polling_callback`` to print a simple message
    between checks.

    >>> def update_ticker(task_doc):
    ...     print('.', end='')
    >>> client.task_wait(tid, polling_callback=update_ticker)

    This will print a dot every second, until the task completes or 10 seconds run out.
    """
    ...

That example replaces the following messy segment:

>>> retries = 10
>>> counter = 0
>>> done_stat = False
>>> while counter < retries and not done_stat:
...     print('.', end='')
...     done_stat = client.task_wait(tid, timeout=1)
...
......
>>> done_stat
True

To me, that ability to inject functionality between the polling calls simplifies things tremendously.
So long as we state that we assume it to take 0 seconds, it doesn't add any code complexity internally either.
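
A minimal sketch of the proposed behavior (written as a free function taking the client, rather than the eventual client method; it assumes Task documents carry a "status" field that is "ACTIVE" until completion):

import time

def task_wait(client, task_id, timeout=10, polling_interval=1, polling_callback=None):
    # clamp to the documented minimums
    timeout = max(1, timeout)
    polling_interval = max(1, polling_interval)
    deadline = time.time() + timeout
    while True:
        task_doc = client.get_task(task_id)
        if polling_callback is not None:
            polling_callback(task_doc)
        # assumes subscriptable responses; with the current classes this
        # would be task_doc.data['status']
        if task_doc['status'] != 'ACTIVE':
            return True
        if time.time() + polling_interval > deadline:
            return False
        time.sleep(polling_interval)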

Add support for 3-Legged OAuth for "Native App" clients to Globus Auth

Globus Auth is adding a concept of "Native App" clients which do a full 3-legged OAuth flow even though they do not have a client secret.
Doing this flow requires the use of a temporary PKCE secret to play the role of a client secret when getting an authorization code.

Although support for this flow in the CLI (globus/globus-cli#16) is the first target, we should make sure that our bindings are sufficiently generic that they can be used by applications employing other flows.
In particular, some apps will be using an embedded browser to implement this flow, and supporting that as well as CLI use will probably cover the majority of our use cases.

After a discussion with @jaswilli, we reached the conclusion that there's little use for the state parameter passed through the OAuth flow, so we don't need to contend with the possibility of multiple simultaneous authentication flows. Handling one at a time will be sufficient for the foreseeable future.

@bd4 I think it's desirable for the SDK to encapsulate parts of this flow, as it has to generate a secret, send it, and then send a related secret to exchange the auth code for a token.
It's nice to think of a client for a given service as your way of "talking to" that service, so I think this should be built as a helper hanging off of the AuthClient object.

I'm putting together a proposed structure for this, even though the support for it in Globus Auth has not yet been released.
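
For reference, a minimal sketch of the PKCE (RFC 7636, S256 method) verifier/challenge generation involved; this is the standard mechanism, not necessarily the shape of the eventual SDK helper:

import base64
import hashlib
import os

# the verifier is a high-entropy temporary secret held by the native app
code_verifier = base64.urlsafe_b64encode(os.urandom(32)).rstrip(b'=').decode('ascii')
# the challenge, sent with the authorize request, is the SHA-256 of the verifier
code_challenge = base64.urlsafe_b64encode(
    hashlib.sha256(code_verifier.encode('ascii')).digest()
).rstrip(b'=').decode('ascii')
# later, the verifier itself is sent when exchanging the auth code for tokens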

Add `autopep8` diff to linting validation?

@jimPruyne recently submitted a PR with what I considered a noisy diff, but it turns out that diff was generated by autopep8 in his workflow.

We can easily see what would change with an autopep8 run on the whole repo via autopep8 -r --diff globus_sdk.
It doesn't look like much to me, and none of the things it changes are things I feel strongly about (so long as we're consistent, really).

autopep8 is a good tool and I want to be friendly towards its usage.
Should we do a single run of autopep8 -r --in-place, and then add the diff command to Travis?
Or do a single run of it --in-place but not enforce it in Travis (which is more accepting, but could lead to noise when someone does run autopep8 on a module)?

@jaswilli, @corpulentcoffee, your feedback would be appreciated.

Allow UUIDs to be passed to AuthClient.get_identities

For uniformity with the TransferClient accepting a UUID type for endpoint IDs, single-identity calls to get_identities should be allowed to pass a UUID.

We can also expand the set of types it takes. For example, iterables of strings or of UUIDs are easy to handle if desired.
Mixed-type iterables are kind of nasty, but it doesn't hurt us much to support them (in fact, it might be harder to ensure that we aren't getting them).
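
A hypothetical sketch of the input normalization (the helper name is illustrative, and it assumes the API accepts a comma-separated ids parameter):

import uuid

def normalize_identity_ids(ids):
    # accept a single string or UUID, or any iterable of them (mixed included)
    # (py2 code would also want to accept unicode here)
    if isinstance(ids, (str, uuid.UUID)):
        ids = [ids]
    return ','.join(str(i) for i in ids)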

Add OAuth2 flow for client-credentials grant

Client Credentials is the typical way of getting an access token directly as a client, not on behalf of a user.
This might not fit into the existing "Flow Manager" pattern, and may be special-cased.

This work might also include a bit of a refactor to the actions that are considered generic for all flows (like get_authorize_url), since not all of them will apply.
Depending on whether this looks like an extreme exception or part of a class of flows, we'll have to make a decision there.
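
For context, a minimal sketch of the grant itself, which is standard OAuth2 rather than the SDK interface under discussion (the scope string and credential placeholders are illustrative):

import requests

CLIENT_ID = '...'      # placeholder: a confidential client's ID
CLIENT_SECRET = '...'  # placeholder: its secret

response = requests.post(
    'https://auth.globus.org/v2/oauth2/token',
    data={'grant_type': 'client_credentials',
          'scope': 'urn:globus:auth:scope:transfer.api.globus.org:all'},
    auth=(CLIENT_ID, CLIENT_SECRET),  # HTTP Basic auth with the client credentials
)
tokens = response.json()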

[Consider] Add automatic retries to request methods

We have been considering adding support for retry logic on all HTTP request methods, built into the SDK.
However, there are a few caveats and gotchas.

Replies May Change

  • Not all requests are safe/idempotent to retry.

    Some POST, PUT, and DELETE methods, like task submission, are carefully built to allow retries in the event of network failures.

    Not all APIs are so carefully put together -- and in some cases a retry may be safe, but differ in semantics from a successful first request. For example, an API may accept POST /some/path and return a 200 on the first request, but a 403 on subsequent POSTs. Follow-up requests may even have strange semantics like "restart processing of this request".

  • Even safely retry-able requests may have different responses on retry.

    Even if something is fine and safe, a second submission may give a 204 instead of a 200. It may even do nothing new, but return a 403 or 404, which we would translate into an error. This seems especially true of DELETEs, which should generally be safe to retry, but very reasonably can 404 after the first request arrives -- I would call that behavior convergent, rather than idempotent. Semantically, everything is fine, but we will "stupidly" translate the 404 into an error, and likely surprise someone.

  • Related things may be changing service-side.

    I'm not sure whether this opens up new issues or not, but server-side state may change the semantics of a request between retries. Consider task label updates, which are allowed on incomplete tasks but not on completed ones. The initial update may fail with a network error (while still arriving and succeeding server-side), followed by a retry which fails because the Task has completed.
    Even though clients should already be accounting for the error raised, they will likely (and incorrectly) assume that the request failed in its entirety.

The Caller Still Needs to be Ready for Failure

It's always possible for the retries to be exceeded, and then we're back in the same situation, demanding that the caller has good error handling.

Less Flexible than Client Retries

Whatever the client of the SDK is, it will be limited by whatever policy we pick.
For example, someone might have a network that makes exponential backoff for network failures highly desirable to reduce congestion, and someone else might want linear backoff for their switch which drops 1/2 of all packets. No matter what choice we make, it will likely be the wrong choice for someone down the line.
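
To illustrate, here is the sort of policy a caller can already implement for themselves: a minimal sketch of exponential backoff with jitter around any SDK call (NetworkError is the error family mentioned elsewhere in these issues; the wrapper itself is hypothetical):

import random
import time

from globus_sdk.exc import NetworkError

def with_retries(func, max_attempts=5, base_delay=0.5):
    for attempt in range(max_attempts):
        try:
            return func()
        except NetworkError:
            if attempt == max_attempts - 1:
                raise  # retries exceeded; the caller still needs error handling
            # exponential backoff with a little jitter to reduce congestion
            time.sleep(base_delay * (2 ** attempt) + random.random() * 0.1)

# usage: with_retries(lambda: tc.get_endpoint(endpoint_id))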

Easier to Add than Remove

Once we've added retries, it's hard to wean people off of our retry logic.
They will start relying on it for the stability of their own applications.
If we do this, it means we need to do it right.

Paginated Calls Don't Error Until Iteration Time

There's a somewhat unexpected/odd behavior that results from the way that Python handles generator definition.

Consider the following simple example:

>>> def err_iter():
...     raise ValueError()
...     for x in (1, 2):
...             yield x
... 
>>> xs = err_iter() # fine, but will error when we try to use it
>>> next(xs) # uh-oh!
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in err_iter
ValueError

It's a bit of a subtlety of the way that generator functions work -- the body doesn't run at all until the first call to next() happens. That holds even if it will error before ever reaching the yield.

For paginated API calls, the behavior is unintuitive and confusing.
Even an unauthenticated call to instantiate the iterator will "work". It will just fail at iteration time.

It would be better to raise these errors at iterator instantiation time.
I think we can do that by attaching a little bit of state to the class instance in the PaginatedResource decorator.
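
A rough sketch of the idea, simplified from the decorator to a plain class with hypothetical names: fetch the first page eagerly at construction, so bad credentials and similar errors raise immediately, then iterate lazily from the buffered page onward.

class EagerPaginator(object):
    def __init__(self, fetch_page):
        # fetch_page(offset) -> list of items for that page
        self._fetch_page = fetch_page
        # doing the first fetch here surfaces errors at instantiation time
        self._first_page = fetch_page(0)

    def __iter__(self):
        offset = 0
        page = self._first_page
        while page:
            for item in page:
                yield item
            offset += len(page)
            page = self._fetch_page(offset)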

Error in docs: Revoke Token is more finicky than indicated

The docstring for AuthClient.oauth2_revoke_token indicates, in its examples section, that you can do revocations with bad creds or wrong creds.
I think this gives a bad impression, since apparently the endpoint is more restrictive than that.

Although I maintain that this should be fixed to be highly permissive server-side, we should meanwhile update the docstring to be accurate.

I'm not bothering about this for 0.4.2, however.

PaginatedResource isn't iterable in Python 3

Reproduction:

~/Repos/globus-sdk$ python2
Python 2.7.11 (default, Mar 31 2016, 06:18:34) 
[GCC 5.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from globus_sdk import TransferClient
>>> for endpoint in TransferClient().endpoint_search(filter_scope='recently-used'):
...     print endpoint.data['display_name']
... 
blaqh
Globus Tutorial Endpoint 1
Globus Tutorial Endpoint 2
Globus Tutorial HTTPS Endpoint Server
MRDP Repository
>>> ^D

~/Repos/globus-sdk$ python3
Python 3.5.1 (default, Mar  3 2016, 09:29:07) 
[GCC 5.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from globus_sdk import TransferClient
>>> for endpoint in TransferClient().endpoint_search(filter_scope='recently-used'):
...     print(endpoint.data['display_name'])
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: iter() returned non-iterator of type 'PaginatedResource'
>>> ^D
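
The cause is that Python 3's iterator protocol calls __next__ where Python 2 calls next; a class that defines only the py2-style method produces exactly this TypeError. A minimal sketch of a 2/3-compatible fix (the class body is illustrative, not the real PaginatedResource):

class PaginatedResource(object):
    def __init__(self, items):
        self._iter = iter(items)

    def __iter__(self):
        return self

    def next(self):  # Python 2 protocol method
        return next(self._iter)

    __next__ = next  # alias satisfies the Python 3 protocol as well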

Add project-wide logging using typical python loggers

The entire SDK should be logging things using python logging tools.

That way, people can turn on log levels and attach handlers to the logger named globus_sdk, and get good log output.

Important sites where this can be of use:

  • DEBUG: loading tokens and config -- explain where a client got its token from at instantiation time
  • DEBUG: request sent/received -- show full path requested + query params, log headers (redact auth header content other than type), received headers &c.
  • ERROR: show captured requests exceptions that convert to NetworkErrors
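
With that in place, consumers get output using nothing but the standard library, along the lines of this sketch:

import logging

logger = logging.getLogger('globus_sdk')
logger.setLevel(logging.DEBUG)
handler = logging.StreamHandler()
handler.setFormatter(
    logging.Formatter('%(asctime)s %(name)s %(levelname)s %(message)s'))
logger.addHandler(handler)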

Add custom User Agent for SDK and SDK-based Applications

As a takeaway from our meeting this morning, we want SDK-based applications to identify themselves with a unique User Agent.
This is obviously spoofable/circumventable, but it should give us nice info on who is using the SDK.

As an additional field, we want an optional app_name that can be attached to the client, which will be appended to the Globus SDK user agent.
That will allow for the CLI to do Globus Python SDK / Globus CLI as its User Agent.
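
A hypothetical sketch of the composition (the exact strings and names are assumptions):

BASE_USER_AGENT = 'Globus Python SDK'

def build_user_agent(app_name=None):
    # append the optional application name to the SDK's own agent string
    if app_name:
        return '{} / {}'.format(BASE_USER_AGENT, app_name)
    return BASE_USER_AGENT

# e.g. the CLI would pass app_name='Globus CLI' and send
# 'Globus Python SDK / Globus CLI' as its User-Agent header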

Support Loading Token when Instantiating a Client

After some thought about this, I think this should be an SDK responsibility.
I think that we should have an option somewhere in ~/.globus.cfg for this, but also maybe an option in the client initializers.

Here's the issue: if I write code that talks to multiple APIs, I'll end up having a bunch of cases in which I instantiate the clients myself.
There are a couple of tempting options for how to pass an auth token to all of them uniformly, but they all seem worse than having this in the SDK itself.

People will write (with varying degrees of correctness) something like

def make_clients(token):
    ac, nc, tc = AuthClient(), NexusClient(), TransferClient()
    for c in (ac, nc, tc):
        c.set_auth_token(token)
    return ac, nc, tc

because so many calls must be authenticated.

I'd rather see set_auth_token be invoked if you pass auth_token=... to the constructor.
However, better yet, I'd like for the SDK itself to support loading the auth token from a file, like ~/.globus.cfg.

We could tell people to define something like

[default]
auth_token = ABC123

[environment beta]
auth_token = DEF456

That would make the simplest possible config file:

auth_token = GHI789

I think that's appealingly simple.

I'm fighting hard to make sure that the CLI isn't just a re-implementation of curl, and I think handling the token outside of the SDK pushes me to make some compromises, like adding --headers to handle this.

@bd4 What do you think of this?
I think the ideal is that you can put it in a config file or an environment variable and things just work.
It only gets tricky when you want to support multiple tokens for multiple environments (which, of course, is something of an internal must).

Creating a share fails

Use case:

I'm writing a script that will create a share when triggered by a request from a user through our web interface.

Using this script:

shared_ep_data = {
    "DATA_TYPE": "shared_endpoint",
    "host_endpoint": "e25a4bda-0636-11e6-a732-22000bf2d559",
    "host_path": "/data/bulktest/greatest",
    "display_name": "generated by test script",
    # optionally specify additional endpoint fields
    "description": "my test share",
}
tc.endpoint_autoactivate("e25a4bda-0636-11e6-a732-22000bf2d559", if_expires_in=3600)
create_result = tc.create_shared_endpoint(shared_ep_data)
endpoint_id = create_result["id"]
print("new endpoint id: ", endpoint_id)

I get the following error:

rw2@macbook2:~/dev/anl/endpoints$ python3 endpoint-test.py
Traceback (most recent call last):
File "endpoint-test.py", line 41, in
create_result = tc.create_shared_endpoint(shared_ep_data)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/globus_sdk/transfer/client.py", line 340, in create_shared_endpoint
return self.post('shared_endpoint', json_body=data)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/globus_sdk/base.py", line 156, in post
response_class=response_class)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/globus_sdk/base.py", line 241, in _request
raise self.error_class(r)
globus_sdk.exc.TransferAPIError: (409, 'Exists', 'That endpoint already exists', 'CSH1MYMqq')

Refactor Getting Started docs

The current Getting Started docs still refer users to tokens.globus.org, even though that method of loading tokens is slated for removal before v1.
I think the preferred method for developers playing with the SDK to get tokens should be some form of client registration (as opposed to the CLI).

Keeping this low friction is going to be hard.
Perhaps we need to support client credentials in the config, and do some implicit token fetching.
That can be future work. For the interim, just having a quick snippet that does a client credentials grant, which we tell people to include, isn't so bad.

Notably, if we make people register clients, the ConfidentialAppAuthClient class handles some of this. The real trick is getting a working Authorizer for TransferClients.

Require [general] section header in config, be standard INI

Although it originally seemed like a nice bit of user-friendliness to not strictly require a section heading for the [general] section, it's already causing grief in the CLI (globus/globus-cli#12).

Standard INI format requires that all sections have headers, so we should just move to that.
It does mean that any SDK users will get parsing errors if we do the upgrade harshly.
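
For the record, this is just stock configparser behavior, as this sketch shows:

import configparser  # six.moves.configparser in our py2/py3-compatible code

parser = configparser.ConfigParser()
try:
    # a bare key with no [section] header is not valid standard INI
    parser.read_string('auth_token = ABC123')
except configparser.MissingSectionHeaderError:
    pass

# with the header it parses as ordinary INI
parser.read_string('[general]\nauth_token = ABC123')
print(parser.get('general', 'auth_token'))  # ABC123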

We should update tokens.globus.org, then tell people on the developer listhost that we're changing this and they should update.
Since the current parser tolerates full INI files, no one who listens to us will have issues.

After that's done, we're still in beta, so I think it will be safe to just drop the special parsing code.

@bd4, thoughts?

Rename TimeoutError and ConnectionError

These are standard error classes in Python 3, so we should rename them to not redefine builtins.

Maybe GlobusTimeoutError and GlobusConnectionError, to keep it clear and simple.

How much error handling should the SDK do before requests are sent to the Transfer API?

@jimPruyne mentioned in #13 that he thinks (or at least thought at one time) that the SDK should catch label='' before a task is submitted to the Transfer API.
This is part of a more general class of feature, which is handling for invalid requests to the Transfer API prior to API interactions.

I think this only applies to Transfer, since it's our oldest and most mature API, so others should just rely on non-200 responses for now.
Even if you don't agree, please keep this particular issue focused only on our handling of the Transfer API.
Otherwise, there are many rabbit holes and we might not be able to drive this to resolution.

I actually created an error class for this exact type of error handling (perhaps poorly named as InvalidDocumentBodyError), but it's currently unused because I stripped out the code that was triggering it.

In the particular case of label='' for a Task, I don't think the SDK should do validation.
It's a fairly subtle and non-generalizable error specific to task submission, and I don't see that a single request to the API matters much.
However, it becomes more of a concern if we think that a lot of work might be done between assembling the Task document and task submission.
Causing an error when label='', or perhaps when paths do or do not have requisite trailing slashes, would, in that case, move the failure to be earlier.
I'm not really convinced that's a likely scenario and important use case on which to focus.

There are some potential areas for this type of handling that make a bit more sense client-side. That could be because the SDK has a better idea of what went wrong and how to fix it in an SDKish way, or because it's not the same kind of error (or even an error at all) when it arrives at the API.
Some examples, in no particular order,

  • POST body has wrong DATA_TYPE for the API endpoint, possibly recommend using a relevant make_X_data helper
  • Transfer Task is missing submission ID, also recommend using make_X_data helper
  • Non-UUID string supplied in a field that must be a UUID (e.g. endpoint ID). This often results in a 404 from the API because the requested resource doesn't exist, but the SDK could treat it as "invalid" and throw a different error.
  • Similarly, an attempt to use a canonical endpoint name for a call that only supports endpoint IDs

Although there will always be exceptions to our rules, we shouldn't be handling these decisions on a pure case-by-case basis.
There's a little bit of bleed-over into this issue from the fact that the Transfer API gives 404s for certain requests which are definitively "wrong", like GET /endpoint/nonuuidstring, whereas maybe those should result in 400 Bad Request.

One of the stronger cases for more SDK error handling is that of invalid documents for Transfer Tasks, which suggest that the user isn't relying on make_submit_transfer_data. The API can't possibly know this, since it doesn't know what the client is (unless we add logic to the API pertaining to a custom user agent and use that agent in the SDK).
Should we do more client-side error detection for this class of errors?
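
For concreteness, a sketch of what the UUID case might look like client-side (the error class is a stand-in for the unused InvalidDocumentBodyError mentioned above; nothing here is existing SDK behavior):

import uuid

class InvalidDocumentBodyError(ValueError):
    """Stand-in for the SDK's currently unused error class."""

def require_uuid(value, field_name):
    # reject non-UUID strings up front, instead of letting the API 404
    try:
        uuid.UUID(str(value))
    except ValueError:
        raise InvalidDocumentBodyError(
            '{} must be a UUID, got {!r}'.format(field_name, value))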

Test suite expansion

Conversation topics to get us going:

  • what additional nose tests need adding to globus_sdk/tests?
  • for testing methods that are actually supposed to call live services, do we want to use real test tokens, or do we want to fake the services, or both?
  • expand to AppVeyor testing for Windows? add OS X on Travis?

Add PendingDeprecationWarnings to Nexus Client (but where?)

We mentioned this in the comments on #1.

We want anyone who starts using globus_sdk.NexusClient to be aware that any and all features offered there are going away in the not-too-distant future.
This seems like an appropriate place for PendingDeprecationWarnings to come into play.

I think that we should have a single warning for the entire client in its initializer, with something along the lines of

import warnings

warnings.warn(
    ('Globus Nexus provides access to features of Globus which are '
     'being moved to new services. If you use Nexus, be aware that '
     'Globus plans to deprecate the API in 2016, and you should stop '
     'using it before 2017'),
    PendingDeprecationWarning)

How long should the warning be, and what should it say? Do we want to commit in the warning to a timeline?

I feel comfortable saying we'll be done with Nexus in 2017, given that we only need to kill the requirement for GOAuth tokens to do it.

Attempt to move off of `six`

Because six==1.4.1 is embedded in the OSX system python, we get a conflict with the version we require if someone does a sudo pip install globus-sdk on OSX. This is also described in #45.
The result is that our six is installed behind the system version, and isn't the one loaded at import time.

Solutions that have been batted around include just doing doc about virtualenv ( #45 ), trying to modify sys.path, and moving off of six altogether.
Although the virtualenv is a best practice, and ultimately the safest solution, it's desirable to let people do the global pip install if we can support it without too much effort.
The push to do that becomes somewhat stronger when you consider that these dependencies are inherited by the CLI.

The current usage sites for six are as follows:

  • six.add_metaclass on base Authorizer
  • six.moves.configparser in config loading
  • six.moves.urllib.parse in a couple of locations
  • six.Iterator as a parent of PaginatedResource

The use of six.moves is pretty easy to work around, since the things that are renamed generally haven't changed their signatures. Even if they have, we can wrap methods with our own helpers to handle this.
six.Iterator is a convenience like the six.moves module, so that's also not a big deal.

If we had meaningful things in six.add_metaclass, this could get really painful trying to define 2-to-3 compatible metaclasses, but we're just using it to make an abstract base class non-instantiable.
That's not really important -- having all of its methods raise NotImplementedError communicates the same thing.
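
A sketch of that replacement (the method name is illustrative):

class GlobusAuthorizer(object):
    # no six.add_metaclass / ABCMeta needed: raising NotImplementedError in
    # every method communicates "abstract" just as well on py2 and py3
    def set_authorization_header(self, header_dict):
        raise NotImplementedError()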

Unless there's a gotcha somewhere in this, we should just go ahead and do it.
This does not obsolete #45 -- that's still a best practice and we should encourage it.

Rename Service Clients to not conflict with OAuth2 usage of the word "Client"?

As I've started working on some more thorough tutorial documents, the word "Client" is a sticking point.

Often, when talking about Auth concepts, we end up focused on things like the client_id and having to involve the language of OAuth2 Clients.
That seems inherently confusing when we've been calling the service adapters TransferClient, AuthClient, etc.

Is there an alternative, suitable name?
One option is to simply name them by the services that they represent, so it becomes

TransferClient -> GlobusTransfer

and

AuthClient -> GlobusAuth

But we would presumably keep the OAuth2-oriented subclasses of GlobusAuth named by the type of "OAuth2 Client" that they represent.

Is this just a recipe for more confusion?

server resources don't allow integer server ids

If you try to round-trip a numeric server id received from JSON API responses, you get a confusing error: TypeError: quote_from_bytes() expected bytes. We probably just need to wrap the server id in str().

Add validate_token to AuthClient

As soon as this is added to the docs site, we should add this to either AuthClient or ConfidentialAppAuthClient.
We should wait for the docs to be updated so that we can put a proper External Documentation section into the docstring.

@mattias-lidman or @bjmc, can you confirm that this does or does not work for Native App clients?
I would assume that, like introspect, it requires an ID and secret.

If this is the superior call for basic validity checking, we should be sure to update the docstring on introspect to recommend using it instead where possible.

Replace all usage of sphinx :param: lists

In our docstrings, a number of things are still documented using the Sphinx parameter lists, as in

"""
:param name: Foo a bar with ``name``!
:param othervar: Baz a biz with ``othervar``?
"""

These were fine as long as we kept our descriptions extremely succinct, but more and more information has been creeping into these lists. Eventually, the rendering breaks down and we get weird strings like

Para-
meters: ...

To resolve, just switch to the newer style docstrings I've started using:

"""
**Parameters**

    ``param_name``
      Description text
"""

Missing Transfer API Helpers (other than management console)

Kyle mentioned to me that shared endpoint creation is missing.

Adding that is easy, but we should try to make sure there are no other misses.
Trying to build a list just now, this is what I came up with.

  • Create Shared Endpoint
  • Get Task pause info
  • Get effective endpoint pause rules

Although the pause related ones are tied to the management console, they're out in other parts of the API, so I think we should add them.

EDIT: removed the ones where I was mistaken.

Restructure docs to have a Tutorials and Examples section (or two sections, for those things)

I think the current structure of the docs is not conducive to learning and easy access.

We should have a tutorial as part of the getting started component, and a suite of examples as an appendix.
It doesn't make a lot of sense for the GlobusAuthorizer docs to have a large example section embedded in the middle, and the structured code documentation generally seems mixed into the high-level material.

This is a clear separation that I'd like for us to enforce -- I think it will clean things up significantly.

Expose underlying API error data more directly / easily

Right now, each GlobusAPIError contains an _underlying_response object.
If you want to catch API errors and do something with the full JSON or text document of the error without subclassing GlobusAPIError, you have to manually pull out the response object and inspect it.

Because our error handling can sometimes be incomplete in the fields it presents, and especially because complex document types are sometimes embedded in errors, I'd like us to expose this error information more directly.

Can we add GlobusAPIError.raw_json and GlobusAPIError.raw_text as direct ways of accessing the underlying parsed JSON and text body of the response?
I have a use-case for this that isn't against Auth or Transfer as I'm prototyping things, but it also seems like it would be useful for the more complex errors exposed by Auth.
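
A minimal sketch of the proposed properties, assuming _underlying_response is a requests.Response:

class GlobusAPIError(Exception):
    def __init__(self, r):
        self._underlying_response = r
        super(GlobusAPIError, self).__init__(r.status_code)

    @property
    def raw_text(self):
        # the full text body of the error response
        return self._underlying_response.text

    @property
    def raw_json(self):
        # the parsed JSON body, or None if the body was not JSON (e.g. HTML)
        try:
            return self._underlying_response.json()
        except ValueError:
            return None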

Token Introspect Doesn't Support "include" parameter

I ran into this while trying to use the introspect call.

Per the docs, you have to include the additional parameter {'include': 'identities_set'} to get a full identity set as part of the introspect response.

For now I've got my code using AuthClient.post directly, so this is easy enough to work around, but we should add support for that parameter out of the box.
Since that's the only supported value, and most introspect callers will want it, I would even argue that it should be set by default.
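
A hypothetical sketch of what out-of-the-box support might look like (the method name, the assumption that text_body sends form-encoded data, and the default are all assumptions, not existing SDK behavior):

def oauth2_token_introspect(self, token, include='identities_set'):
    body = {'token': token}
    if include is not None:
        body['include'] = include
    # the introspect endpoint expects a form-encoded POST
    return self.post('/v2/oauth2/token/introspect', text_body=body)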

Safe string conversions can be evaded by objects with `__str__` returning bytestrings

The safe string conversion we added for the TransferClient to handle UUID endpoint IDs is not bulletproof.

If it sees a bytestring directly, it will re-encode it as a string type.
However, if it sees an unknown type and relies on __str__ to produce a string type, it may be disappointed.
Well-behaved types should have __str__ returning a bytestring or raw string in py2 and returning unicode in py3, but apparently not even uuid.UUID meets that definition of "well-behaved".

If this turns out to be a problem in the future, we may have to more strictly sanitize things that we'd really like to be strings or unicodes by capturing the results of __str__ and trying our song-and-dance over again.
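
That stricter pass might look something like this sketch:

def safe_stringify(value):
    # coerce to the platform string type, then re-check: on py2, a type's
    # __str__ may hand back bytes, which we decode to get a true text type
    result = str(value)
    if isinstance(result, bytes):
        result = result.decode('utf-8')
    return result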

Add documentation for doing setup on OSX using virtualenv

Part of our setup guide needs to address virtualenvs for OSX users.
Because the system python on OSX is a battleground between the user's needs and the system's needs, it's not safe to rely upon.

More to the point: OSX ships with a stale version of six, which loads before any user-installed version.
To get around this, we should tell people to run in virtualenv.

Rename "start flow" methods

Currently all the Auth clients have a method to initialize the authentication flow, e.g., NativeAppAuthClient.oauth2_start_flow_native_app.

As discussed, let's drop the client-specific name in the method and rename them to be something like oauth2_start_flow.

TransferData and DeleteData should not call transfer_client.get_submission_id() in their __init__()s if kwargs already has a submission_id

If the caller provides a submission_id, this is a wasted call: it adds unnecessary delay and prevents dry-run testing without a valid token.

Additionally, if the caller explicitly passes a submission_id kwarg, but its value is None, then the behavior should be the same as if they passed no submission_id at all (i.e. a get_submission_id() call should indeed be made and the self.update(kwargs) call that happens at the end of initialization should not clobber it).

We should probably just explicitly name submission_id in the __init__() arguments list with a default of None, and explicitly do the right thing in the function body before the self.update() call happens (similar to sync_level and label).
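
A sketch of the proposed handling (it assumes get_submission_id() returns a subscriptable document with a "value" field, and treats TransferData as the dict subclass it behaves like):

class TransferData(dict):
    def __init__(self, transfer_client, submission_id=None, **kwargs):
        super(TransferData, self).__init__()
        if submission_id is None:
            # only hit the network when the caller didn't supply one
            submission_id = transfer_client.get_submission_id()['value']
        kwargs.pop('submission_id', None)  # never let kwargs clobber it
        self['submission_id'] = submission_id
        self.update(kwargs)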

Establish a consistent pattern of Data Dict vs. GlobusResponse for client return values

I've complicated our situation with regard to client methods by introducing the iterators.
Any iterable call yields dictionary objects consisting of individual documents.
That works out really tidily for the Transfer API, where you get an iterator of dicts with DATA_TYPE on each one of them, and everyone is happy.

However, things really start to fall apart when we go back to non-iterable calls.
Those calls used to return a GlobusResponse object, but now they can optionally return data dicts.

For example, TransferClient.get_endpoint() returns a GlobusResponse, but AuthClient.get_identities() returns a dict.

I'm not really concerned about NexusClient.get_goauth_token() because that's deprecated and slated for removal.
I just want us to be consistent among the calls that don't have a behavior that forbids them from returning GlobusResponse.

Since some calls won't have a valid json_body attribute (e.g. they return raw text or HTML), I think the best approach is that we have two forms of calls

  1. Is iterable, every item in the iterator is a dict
  2. Is not iterable, return type is GlobusResponse

Does this seem acceptable? Having a mix of GlobusResponse some places and GlobusResponse.json_body in others has confused me a couple of times, and will likely do the same for others.

An alternative is to actually wrap the results of iterable calls in artificial GlobusResponse objects -- I'm not at all opposed to this, if we want to do that, so long as we do it everywhere.

Figure out how to manage multiple access tokens + refresh tokens in Client objects

I had a bit of discussion with @jaswilli about this, and it's pretty clear that our thinking to-date about how to manage tokens isn't going to work for more sophisticated use cases, especially as refresh tokens come into play.
We need to design something more complete.
For now, I'm focusing on the model in memory, and assuming that we can figure something out to represent it in the config when we're done.

Consider the case of an application that wants to use TransferClient to perform a set of operations, and already has a Transfer Access Token + Transfer Refresh Token in hand.

We could do this:

tc = TransferClient(access_token=transfer_access_token)
...

However, if the interpreter instance is long-lived or if there's a long delay between getting the access token and this instantiation of the client, we risk having tc throw a TransferAPIError with code 401 at any time, during any operation, as the token expires.
Even without a long period of time passing, the access token could be invalidated in the middle of an operation.

The catastrophically bad case is that of a paginated call which expires part-way through:

for item in tc.operation_ls(...):
    call_func(item)

could throw a 401 any time the loop advances and triggers next() on the PaginatedResource.

Even if we tell people that they have to be ready to catch a 401 and refresh, the logic to resume pagination is fairly tricky, and we're losing a lot of the value of the iteration interface.


I'm going to enumerate the options I can think of as I think of them.
There are many more than what I'm listing here.

Option 1: Give TransferClient Refresh Tokens, Catch 401s Internally

My first thought was to give the TransferClient its refresh tokens directly, and have it handle things.
However, this starts to devolve almost immediately:

tc = TransferClient(access_token=transfer_access_token, refresh_token=transfer_refresh_token, auth_client_id=...)
tc.operation_ls(...)

TransferClient methods are then able to talk to auth.globus.org to refresh their tokens whenever they get a 401.
However, it's very weird that the client consumes an auth_client_id.
Without that ID, it's not possible to do the refresh token flow.
It's also a bit aberrant that TransferClient is talking to auth.globus.org -- that seems like a violation of the TransferClient / AuthClient separation.

Option 2: Give TransferClient an AuthClient to use, Catch 401s Internally

The logical next step is to give TransferClient a reference to an AuthClient in addition to its tokens.

ac = AuthClient(client_id=...)
tc = TransferClient(access_token=transfer_access_token, refresh_token=transfer_refresh_token, auth_client=ac)

Now, if we're inside of a method body, we'd do something like this:

def operation_mkdir(self, endpoint_id, path, **params):
    resource_path = self.qjoin_path("operation/endpoint", endpoint_id, "mkdir")
    json_body = {'DATA_TYPE': 'mkdir', 'path': path}
    try:
        return self.post(resource_path, json_body=json_body, params=params)
    except GlobusTransferAuthRequiredError:
        self.access_token = self.auth_client.refresh(self.refresh_token)
        return self.post(resource_path, json_body=json_body, params=params)

This is starting to look promising, but there are a couple of problems:

  • What if the request mutates data in addition to returning a 401?
  • Extra network IO when we should really know whether or not the token is about to expire (Auth gives us the expires_in field)
  • This will be handled very poorly by the PaginatedResource code, which must now see both the TransferClient and the AuthClient to update the access tokens

That first case seems like one we should rule out by just Not Letting APIs Do That.
If we can't make rules about APIs, this is going to get too painful.
A token could be invalidated at any time, so any call can produce errors -- we need to be able to reason about the retry logic in that case.

It would be nice to handle the extra IO load by checking expires_in, but that's not always possible.
If we refreshed recently, it should be.
I'd like to bundle that check with the error handling in some way, if possible.

Option 3: BaseClient accepts an AuthClient + Tokens, Check + Refresh Happens in BaseClient._request

This looks a little weird because BaseClient is a parent of AuthClient, but I think it works quite well.

Using Transfer as an example still,

ac = AuthClient(client_id=...)
tc = TransferClient(access_token=transfer_access_token, refresh_token=transfer_refresh_token, auth_client=ac)

This is the same as above, but then, rather than having tc.operation_mkdir handle the logic, it goes into BaseClient._request something like this:

# conditions necessary for a refresh to be at all possible
can_refresh = (self.auth_type == self.AUTHTYPE_TOKEN and
               self.auth_client is not None and self.refresh_token is not None)

if can_refresh:
    # looks up the access token + (nullable) expires_in for the given refresh token
    # refreshes the token if it's near expiry, return the access token
    self.set_token(self.auth_client.refresh(self.refresh_token, check_expiry=True))

r = self._session.request(method=method, url=url, headers=rheaders, params=params,
                          data=text_body, verify=self._verify)

# allow_refresh_and_retry is a kwarg to all request methods, defaults to True
# it is an explicit control on this behavior, for any endpoints where a retry after 401 is considered
# dangerous
if r.status_code == 401 and can_refresh and allow_refresh_and_retry:
    self.set_token(self.auth_client.refresh(self.refresh_token))
    r = self._session.request(method=method, url=url, headers=rheaders, params=params,
                              data=text_body, verify=self._verify)

This is not meant to be what we actually write in BaseClient -- I've just simplified it a bit to show the kind of thinking I'm engaged in.

The thing I dislike the most is the relationship between BaseClient and AuthClient that it creates.
There's a bidirectional dependency there, and although no circular imports are being created it's still more complex than I would like.

Another variant on this is to have an AuthTokenRefreshManager object, which can be easily synthesized from an AuthClient object, and use that in BaseClient as well.
It doesn't really solve the relationship between a BaseClient and an AuthClient, but it makes the object model maybe somewhat cleaner.

Test on Windows

We have Windows in the PyPI classifiers, but it's untested, and the config file handling may behave strangely on Windows because of the use of expanduser.
