
datadotworld / data.world-py


Python package for data.world

Home Page: https://data.world/integrations/python

License: Apache License 2.0

Python 99.96% Makefile 0.04%
datasets open-data reference-implementation api-client dwstruct-t01-dist


data.world-py's Issues

String literal 'NONE' gets mapped to Python None value in result sets

In a SQL or SPARQL response, the literal string value 'NONE' (case-insensitive) is converted into the Python value None:

>>> datadotworld.query('bryon/odin-2015-2016', 'SELECT ?value WHERE{ BIND("NONE" AS ?value)}', query_type='sparql').table
[OrderedDict([('value', None)])]
>>> datadotworld.query('bryon/odin-2015-2016', 'SELECT ?value WHERE{ BIND("ABCD" AS ?value)}', query_type='sparql').table
[OrderedDict([('value', 'ABCD')])]
>>> datadotworld.query('bryon/odin-2015-2016', 'SELECT "NONE" AS value').table
[OrderedDict([('value', None)])]
>>> datadotworld.query('bryon/odin-2015-2016', 'SELECT "ABCD" AS value').table
[OrderedDict([('value', 'ABCD')])]
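A hedged illustration of the hazard (the helpers below are hypothetical, not the SDK's actual code): a result-cell converter that compares strings case-insensitively against 'none' silently loses data, while one that only maps genuine nulls does not.

```python
def convert_cell(raw):
    """Only a genuine null (Python None) maps to None;
    the *string* 'NONE' is data and must be preserved."""
    if raw is None:
        return None
    return raw

def buggy_convert_cell(raw):
    """The behavior this issue describes: any case-insensitive
    match on 'none' is treated as a missing value."""
    if raw is None or str(raw).lower() == 'none':
        return None
    return raw

assert convert_cell('NONE') == 'NONE'       # string survives
assert buggy_convert_cell('NONE') is None   # data silently lost
```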

Filter "original/" from tables and dataframes dicts.

Datapackages are more complete now with the addition of original (untouched) files. As a result, LocalDataset.tables and LocalDataset.dataframes currently include two versions of tabular files (original + sanitized/normalized). The code should be modified so that only files in data/ are visible through LocalDataset.tables and LocalDataset.dataframes.
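A minimal sketch of the proposed filtering, assuming resources are keyed by their datapackage path (a hypothetical shape, for illustration only):

```python
def filter_data_dir(resources):
    """Keep only sanitized files under data/; hide original/ copies."""
    return {path: table for path, table in resources.items()
            if path.startswith('data/')}

resources = {'data/stats.csv': 'sanitized table',
             'original/stats.xlsx': 'untouched upload'}
assert filter_data_dir(resources) == {'data/stats.csv': 'sanitized table'}
```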

Server won't accept pre-signed AWS S3 URLs

When creating a dataset with a resource that uses a pre-signed AWS S3 URL, the server returns a Bad Request response, indicating that the URL is not valid.

Being able to submit pre-signed URLs would make it possible to link private data into a private data.world dataset.

The web application will accept the pre-signed URL without error, but doesn't actually load the file.

Here is the server's exception:

datadotworld.client.api.RestApiError: (400)
Reason: Bad Request
HTTP response headers: HTTPHeaderDict({'Content-Type': 'application/json', 'Date': 'Tue, 16 May 2017 23:51:19 GMT', 'Server': 'nginx/1.8.1', 'Content-Length': '1281', 'Connection': 'keep-alive'})
HTTP response body: {"code":400,"message":"Invalid DatasetCreateRequest . Violations = [ConstraintViolationImpl{interpolatedMessage='Source URL must be a valid URL', propertyPath=files[1].source.url, rootBeanClass=class world.data.api.dtos.pub.DatasetCreateRequest, messageTemplate='Source URL {org.hibernate.validator.constraints.URL.message}'}, ConstraintViolationImpl{interpolatedMessage='Source URL must be a valid URL', propertyPath=files[4].source.url, rootBeanClass=class world.data.api.dtos.pub.DatasetCreateRequest, messageTemplate='Source URL {org.hibernate.validator.constraints.URL.message}'}, ConstraintViolationImpl{interpolatedMessage='Source URL must be a valid URL', propertyPath=files[3].source.url, rootBeanClass=class world.data.api.dtos.pub.DatasetCreateRequest, messageTemplate='Source URL {org.hibernate.validator.constraints.URL.message}'}, ConstraintViolationImpl{interpolatedMessage='Source URL must be a valid URL', propertyPath=files[2].source.url, rootBeanClass=class world.data.api.dtos.pub.DatasetCreateRequest, messageTemplate='Source URL {org.hibernate.validator.constraints.URL.message}'}] :: Source URL must be a valid URL;Source URL must be a valid URL;Source URL must be a valid URL;Source URL must be a valid URL","details":"af9ff6d1-aa58-495a-92e9-7c3b4eb3fd80"}

New schema-import command

Something like:

dw schema-import --dataset shad/testing --csv schema.csv

The CSV would be the simplified format that we export from the UI (note: ignore the Type column).

Optionally, the matching export command might be nice (but not essential):

dw schema-export -d shad/testing --csv > schema.csv

Support all new API paths (up to API v0.9.0)

I got this list by running:

import requests
import json
api_swagger = requests.get('https://api.data.world/v0/swagger.json').text
api_swagger = api_swagger.encode('ascii', 'ignore').decode('ascii')
api_swagger = json.loads(api_swagger) 
python_swagger = requests.get('https://raw.githubusercontent.com/datadotworld/data.world-py/master/datadotworld/client/swagger-dwapi-def.json').text
python_swagger = python_swagger.encode('ascii', 'ignore').decode('ascii')
python_swagger = json.loads(python_swagger) 
print(set(api_swagger['paths'].keys()) - set(python_swagger['paths'].keys()))

Introduce flag for load_dataset to download new version iff it's been updated server-side.

The load_dataset function currently accepts two parameters: dataset_key and force_update. Currently, if the local version of the dataset is not the most recent, a warning is raised. Instead, I would like to add a third parameter to load_dataset (suggested name: auto_update) that is False by default; when True, instead of raising a warning it will simply update the local dataset to the latest version on data.world.
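The proposed decision logic can be sketched as follows (the helper name resolve_update is hypothetical; only dataset staleness and the two flags matter):

```python
import warnings

def resolve_update(is_stale, force_update=False, auto_update=False):
    """Return True when the dataset should be re-downloaded.

    force_update always re-downloads; auto_update re-downloads only
    when the local copy is stale; otherwise a stale copy merely warns.
    """
    if force_update:
        return True
    if is_stale:
        if auto_update:
            return True
        warnings.warn('Local copy of the dataset is out of date')
    return False
```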

Set and Report Data Dictionary

Allow the API to set and report the data dictionary attached to a resource.

It's an awesome feature! So I'd really like to be able to use it programmatically.

setters should not aggressively validate on reads

trying to do a simple GET of datasets through data.world-py 1.5.0

        datasets = dw.api_client.fetch_datasets()

getting the following:

...
  File "/Users/bryon/.virtualenvs/datadotworld-bryon-labs/lib/python3.6/site-packages/datadotworld/client/_swagger/models/file_summary_response.py", line 70, in __init__
    self.description = description
  File "/Users/bryon/.virtualenvs/datadotworld-bryon-labs/lib/python3.6/site-packages/datadotworld/client/_swagger/models/file_summary_response.py", line 153, in description
    raise ValueError("Invalid value for `description`, length must be greater than or equal to `1`")
ValueError: Invalid value for `description`, length must be greater than or equal to `1`

the offending code is:
https://github.com/datadotworld/data.world-py/blob/v1.5.0/datadotworld/client/_swagger/models/file_summary_response.py#L152-L153

Since the value is coming back from the server, this validation is incorrect: either there's a defect on the server allowing this value to sometimes be empty, or (more likely) the Python SDK should relax when deserializing server responses.

You should never get an "invalid data" error on a GET.

API Can't Handle Hierarchical Zip Contents

After uploading a ZIP file with hierarchical path names, calling get_dataset() fails, with:

  File "/Volumes/Storage/proj/virt/metatab3/lib/python3.5/site-packages/datadotworld/client/_swagger/models/file_summary_response.py", line 80, in name
    raise ValueError("Invalid value for `name`, must be a follow pattern or equal to `/^[^/]+$/`")
ValueError: Invalid value for `name`, must be a follow pattern or equal to `/^[^/]+$/`

The file name produced by the server has '/' in it, but the API does not allow these in names.

Specifically, the problem seems to be either (a) the server generates incorrect names or (b) the validation pattern is wrong at swagger-dwapi-def.json : definitions.FileSummaryResponse.properties.name.pattern
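The pattern in question can be checked directly; any name containing a path separator fails it:

```python
import re

# Pattern quoted from swagger-dwapi-def.json:
# definitions.FileSummaryResponse.properties.name.pattern
NAME_PATTERN = re.compile(r'^[^/]+$')

assert NAME_PATTERN.match('report.csv')              # flat name: accepted
assert not NAME_PATTERN.match('archive/report.csv')  # hierarchical: rejected
```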

Bust cache upon load_dataset()

Currently, cache data produced by calling raw_data, tables and dataframes is not invalidated when load_dataset() is called again. Instead, when load_dataset() is called, all cache entries for the given dataset_key should be invalidated.
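A minimal sketch of the intended behavior (class and method names are hypothetical, not the SDK's internals):

```python
class DatasetCache:
    """Caches raw_data/tables/dataframes values per dataset."""

    def __init__(self):
        self._entries = {}  # (dataset_key, kind) -> cached value

    def put(self, dataset_key, kind, value):
        self._entries[(dataset_key, kind)] = value

    def get(self, dataset_key, kind):
        return self._entries.get((dataset_key, kind))

    def invalidate(self, dataset_key):
        """Called from load_dataset(): drop every entry for the key."""
        for key in [k for k in self._entries if k[0] == dataset_key]:
            del self._entries[key]
```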

Got "SafeConfigParser instance has no attribute 'read_file'" when I tried to load a dataset

I was in a conference and got this error and couldn't move forward.

dw.load_dataset('data-society/the-simpsons-by-the-data')

AttributeError                            Traceback (most recent call last)
 in ()
----> 1 lds = dw.load_dataset('data-society/the-simpsons-by-the-data')  # , force_update=True)

/Users/plee/anaconda/lib/python2.7/site-packages/datadotworld/__init__.pyc in load_dataset(dataset_key, force_update, profile)
     87     ['changelog', 'datadotworldbballstats', 'datadotworldbballteam']
     88     """
---> 89     return _get_instance(profile).load_dataset(dataset_key,
     90                                                force_update=force_update)
     91 

/Users/plee/anaconda/lib/python2.7/site-packages/datadotworld/__init__.pyc in _get_instance(profile)
     42     if instance is None:
     43         config_param = (ChainedConfig()
---> 44                         if profile == 'default'
     45                         else FileConfig(profile=profile))
     46         instance = DataDotWorld(config=config_param)

/Users/plee/anaconda/lib/python2.7/site-packages/datadotworld/config.pyc in __init__(self, **kwargs)
    203         # Overrides (for testing)
    204         self._config_chain = kwargs.get('config_chain',
--> 205                                         [EnvConfig(), FileConfig()])
    206 
    207     def __getattribute__(self, item):

/Users/plee/anaconda/lib/python2.7/site-packages/datadotworld/config.pyc in __init__(self, profile, **kwargs)
    108 
    109         if path.isfile(self._config_file_path):
--> 110             self._config_parser.read_file(open(self._config_file_path))
    111             if self.__migrate_invalid_defaults(self._config_parser) > 0:
    112                 self.save()

AttributeError: SafeConfigParser instance has no attribute 'read_file'

Support date/datetime and RDF types generally as query parameters

Support for more possible types as query parameters should be added.

date and datetime should map naturally to their XSD counterparts:
https://docs.python.org/3/library/datetime.html

To give the developer complete capability to express values, add an RdfLiteralParam class with two parts, value and type, where value is a string and type is a URI, and which renders as "{value}"^^<{type}>. Existing code in datadotworld.convert_to_sparql_literal should be refactored to use this class.
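A sketch of the proposed wrapper, showing the rendering only (the real class would plug into the SDK's parameter-conversion code):

```python
class RdfLiteralParam:
    """Typed RDF literal: a string value plus a datatype URI."""

    def __init__(self, value, type):
        self.value = value
        self.type = type

    def to_sparql(self):
        # Renders as "{value}"^^<{type}>
        return '"{}"^^<{}>'.format(self.value, self.type)

param = RdfLiteralParam('2017-05-16', 'http://www.w3.org/2001/XMLSchema#date')
assert param.to_sparql() == '"2017-05-16"^^<http://www.w3.org/2001/XMLSchema#date>'
```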

Improving handling of RDF results and terms

For SPARQL queries:

  • Handle results of DESCRIBE and CONSTRUCT queries as pure RDF and skip table schema inference
  • Harden table schema inference for SELECT queries in cases where variables are not typed consistently
  • Handle uri and bnode terms in query results
  • Consider allowing users to explicitly request query results in RDF form

load_dataset() problem

lds = dw.load_dataset('data-society/the-simpsons-by-the-data')

gives me:

ValueError Traceback (most recent call last)
in ()
1 # load dataset
----> 2 lds = dw.load_dataset('data-society/the-simpsons-by-the-data')

C:\Users\LW130003\Anaconda3\lib\site-packages\datadotworld\__init__.py in load_dataset(dataset_key, force_update, auto_update, profile, **kwargs)
99 load_dataset(dataset_key,
100 force_update=force_update,
--> 101 auto_update=auto_update)
102
103

C:\Users\LW130003\Anaconda3\lib\site-packages\datadotworld\datadotworld.py in load_dataset(self, dataset_key, force_update, auto_update)
160 else:
161 try:
--> 162 dataset_info = self.api_client.get_dataset(dataset_key)
163 except RestApiError as e:
164 return LocalDataset(descriptor_file)

C:\Users\LW130003\Anaconda3\lib\site-packages\datadotworld\client\api.py in get_dataset(self, dataset_key)
96 try:
97 return self._datasets_api.get_dataset(
---> 98 *(parse_dataset_key(dataset_key))).to_dict()
99 except _swagger.rest.ApiException as e:
100 raise RestApiError(cause=e)

C:\Users\LW130003\Anaconda3\lib\site-packages\datadotworld\client\_swagger\apis\datasets_api.py in get_dataset(self, owner, id, **kwargs)
644 return self.get_dataset_with_http_info(owner, id, **kwargs)
645 else:
--> 646 (data) = self.get_dataset_with_http_info(owner, id, **kwargs)
647 return data
648

C:\Users\LW130003\Anaconda3\lib\site-packages\datadotworld\client\_swagger\apis\datasets_api.py in get_dataset_with_http_info(self, owner, id, **kwargs)
727 _preload_content=params.get('_preload_content', True),
728 _request_timeout=params.get('_request_timeout'),
--> 729 collection_formats=collection_formats)
730
731 def patch_dataset(self, owner, id, body, **kwargs):

C:\Users\LW130003\Anaconda3\lib\site-packages\datadotworld\client\_swagger\api_client.py in call_api(self, resource_path, method, path_params, query_params, header_params, body, post_params, files, response_type, auth_settings, callback, _return_http_data_only, collection_formats, _preload_content, _request_timeout)
324 body, post_params, files,
325 response_type, auth_settings, callback,
--> 326 _return_http_data_only, collection_formats, _preload_content, _request_timeout)
327 else:
328 thread = threading.Thread(target=self.__call_api,

C:\Users\LW130003\Anaconda3\lib\site-packages\datadotworld\client\_swagger\api_client.py in __call_api(self, resource_path, method, path_params, query_params, header_params, body, post_params, files, response_type, auth_settings, callback, _return_http_data_only, collection_formats, _preload_content, _request_timeout)
159 # deserialize response data
160 if response_type:
--> 161 return_data = self.deserialize(response_data, response_type)
162 else:
163 return_data = None

C:\Users\LW130003\Anaconda3\lib\site-packages\datadotworld\client\_swagger\api_client.py in deserialize(self, response, response_type)
237 data = response.data
238
--> 239 return self.__deserialize(data, response_type)
240
241 def __deserialize(self, data, klass):

C:\Users\LW130003\Anaconda3\lib\site-packages\datadotworld\client\_swagger\api_client.py in __deserialize(self, data, klass)
277 return self.__deserialize_datatime(data)
278 else:
--> 279 return self.__deserialize_model(data, klass)
280
281 def call_api(self, resource_path, method,

C:\Users\LW130003\Anaconda3\lib\site-packages\datadotworld\client\_swagger\api_client.py in __deserialize_model(self, data, klass)
627 and isinstance(data, (list, dict)):
628 value = data[klass.attribute_map[attr]]
--> 629 kwargs[attr] = self.__deserialize(value, attr_type)
630
631 instance = klass(**kwargs)

C:\Users\LW130003\Anaconda3\lib\site-packages\datadotworld\client\_swagger\api_client.py in __deserialize(self, data, klass)
255 sub_kls = re.match('list[(.*)]', klass).group(1)
256 return [self.__deserialize(sub_data, sub_kls)
--> 257 for sub_data in data]
258
259 if klass.startswith('dict('):

C:\Users\LW130003\Anaconda3\lib\site-packages\datadotworld\client\_swagger\api_client.py in <listcomp>(.0)
255 sub_kls = re.match('list[(.*)]', klass).group(1)
256 return [self.__deserialize(sub_data, sub_kls)
--> 257 for sub_data in data]
258
259 if klass.startswith('dict('):

C:\Users\LW130003\Anaconda3\lib\site-packages\datadotworld\client\_swagger\api_client.py in __deserialize(self, data, klass)
277 return self.__deserialize_datatime(data)
278 else:
--> 279 return self.__deserialize_model(data, klass)
280
281 def call_api(self, resource_path, method,

C:\Users\LW130003\Anaconda3\lib\site-packages\datadotworld\client\_swagger\api_client.py in __deserialize_model(self, data, klass)
629 kwargs[attr] = self.__deserialize(value, attr_type)
630
--> 631 instance = klass(**kwargs)
632
633 return instance

C:\Users\LW130003\Anaconda3\lib\site-packages\datadotworld\client\_swagger\models\file_summary_response.py in __init__(self, name, source, description, labels, size_in_bytes, created, updated)
68 self.source = source
69 if description is not None:
---> 70 self.description = description
71 if labels is not None:
72 self.labels = labels

C:\Users\LW130003\Anaconda3\lib\site-packages\datadotworld\client\_swagger\models\file_summary_response.py in description(self, description)
151 raise ValueError("Invalid value for `description`, length must be less than or equal to `240`")
152 if description is not None and len(description) < 1:
--> 153 raise ValueError("Invalid value for `description`, length must be greater than or equal to `1`")
154
155 self._description = description

ValueError: Invalid value for `description`, length must be greater than or equal to `1`

import datadotworld fails on virtualenv

Python 2.7.14

pip list:

Package Version


apipkg 1.4
attrs 18.1.0
awsebcli 3.12.1
backports.shutil-get-terminal-size 1.0.0
beautifulsoup4 4.6.0
boto3 1.5.22
botocore 1.8.50
CacheControl 0.12.4
cchardet 1.1.3
cement 2.8.2
certifi 2018.4.16
chardet 3.0.4
click 6.7
colorama 0.3.7
configparser 3.5.0
coverage 4.0.3
cssselect 1.0.3
datadotworld 1.6.0
datapackage 0.8.9
decorator 4.3.0
Django 1.11.13
django-debug-toolbar 1.9.1
django-extensions 1.9.9
django-rest-auth 0.9.3
django-storages 1.6.5
django-webtest 1.9.2
djangorestframework 3.7.7
docker-py 1.7.2
dockerpty 0.4.1
docopt 0.6.2
docutils 0.14
drf-extensions 0.3.1
enum34 1.1.6
et-xmlfile 1.0.1
execnet 1.5.0
flake8 3.5.0
flake8-polyfill 1.0.2
freezegun 0.3.9
funcsigs 1.0.2
functools32 3.2.3.post2
future 0.16.0
futures 3.2.0
idna 2.6
ijson 2.3
ipdb 0.11
ipython 5.5.0
ipython-genutils 0.2.0
isodate 0.6.0
jdcal 1.4
jmespath 0.9.3
jsonlines 1.2.0
jsonschema 2.6.0
jsontableschema 0.10.1
linear-tsv 1.1.0
lxml 4.2.1
mccabe 0.6.1
mock 2.0.0
model-mommy 1.5.1
msgpack-python 0.5.6
numpy 1.14.3
openpyxl 2.5.3
pandas 0.23.0
pathlib2 2.3.2
pathspec 0.5.0
pbr 4.0.3
pep8 1.7.1
pep8-naming 0.5.0
pickleshare 0.7.4
Pillow 5.0.0
pip 10.0.1
pluggy 0.6.0
prompt-toolkit 1.0.15
psycopg2 2.7.4
py 1.5.3
pycodestyle 2.3.1
pyflakes 1.5.0
Pygments 2.2.0
pyquery 1.4.0
pytest 3.3.2
pytest-cache 1.0
pytest-cov 2.5.1
pytest-django 3.1.2
pytest-flake8 0.9.1
pytest-pep8 1.0.6
python-coveralls 2.9.1
python-dateutil 2.7.3
pytz 2017.3
PyYAML 3.12
requests 2.8.0
rfc3986 0.4.1
s3transfer 0.1.13
scandir 1.7
semantic-version 2.5.0
setuptools 39.2.0
sh 1.12.14
simplegeneric 0.8.1
six 1.11.0
SQLAlchemy 1.2.8
sqlparse 0.2.4
tabulate 0.7.5
tabulator 1.4.1
termcolor 1.1.0
tqdm 4.19.5
traitlets 4.3.2
typing 3.6.4
unicodecsv 0.14.1
urllib3 1.22
waitress 1.1.0
wcwidth 0.1.7
WebOb 1.8.1
websocket-client 0.48.0
WebTest 2.0.29
wheel 0.31.1
whitenoise 3.3.1
win-unicode-console 0.5
xlrd 1.1.0

Issues with dw configure

I'm running into this error when I run dw configure:

  File "/usr/local/bin/dw", line 6, in <module>
    from pkg_resources import load_entry_point
  File "/Library/Python/2.7/site-packages/pkg_resources/__init__.py", line 2991, in <module>
    @_call_aside
  File "/Library/Python/2.7/site-packages/pkg_resources/__init__.py", line 2977, in _call_aside
    f(*args, **kwargs)
  File "/Library/Python/2.7/site-packages/pkg_resources/__init__.py", line 3004, in _initialize_master_working_set
    working_set = WorkingSet._build_master()
  File "/Library/Python/2.7/site-packages/pkg_resources/__init__.py", line 664, in _build_master
    return cls._build_from_requirements(__requires__)
  File "/Library/Python/2.7/site-packages/pkg_resources/__init__.py", line 677, in _build_from_requirements
    dists = ws.resolve(reqs, Environment())
  File "/Library/Python/2.7/site-packages/pkg_resources/__init__.py", line 861, in resolve
    raise VersionConflict(dist, req).with_context(dependent_req)
pkg_resources.ContextualVersionConflict: (requests 2.5.3 (/Library/Python/2.7/site-packages), Requirement.parse('requests<3.0a,>=2.8'), set(['datapackage', 'tabulator']))


Anyone know how to debug this?

Issue after inputting API key

After entering the API key at the dw configure prompt, I got back this Python error:

Traceback (most recent call last):
  File "/usr/local/bin/dw", line 11, in <module>
    load_entry_point('datadotworld==1.0.0b4', 'console_scripts', 'dw')()
  File "/Library/Python/2.7/site-packages/click/core.py", line 716, in __call__
    return self.main(*args, **kwargs)
  File "/Library/Python/2.7/site-packages/click/core.py", line 696, in main
    rv = self.invoke(ctx)
  File "/Library/Python/2.7/site-packages/click/core.py", line 1060, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Library/Python/2.7/site-packages/click/core.py", line 889, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Library/Python/2.7/site-packages/click/core.py", line 534, in invoke
    return callback(*args, **kwargs)
  File "/Library/Python/2.7/site-packages/click/decorators.py", line 27, in new_func
    return f(get_current_context().obj, *args, **kwargs)
  File "/Library/Python/2.7/site-packages/datadotworld/cli.py", line 55, in configure
    config.auth_token = token
  File "/Library/Python/2.7/site-packages/datadotworld/config.py", line 82, in auth_token
    self._config_parser.add_section(self._profile)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ConfigParser.py", line 261, in add_section
    raise ValueError, 'Invalid section name: %s' % section
ValueError: Invalid section name: default

What should I do about this?

Can't save any unicode in my project

Adding a test code:

# coding=utf-8
import csv
import datadotworld as dw
DW_USER_NAME = "my_username"
DW_PROJECT_NAME = "my_projectname"
csv_filename = 'test_saving_unicode_strings.csv'
dw_project_path = '{}/{}'.format(DW_USER_NAME, DW_PROJECT_NAME)

with dw.open_remote_file(dw_project_path, csv_filename) as w:
    csvw = csv.DictWriter(w, fieldnames=['foo', 'bar'])
    csvw.writeheader()
    csvw.writerow({'foo':42, 'bar':u"A"})
    csvw.writerow({'foo':13, 'bar':u"искам уникод запис"})

What I have tried:

  • adding encoding="utf-8" to the open_remote_file call
  • encode('utf-8') the 'bad' string
  • decode('utf-8') the 'bad' string

Each time I receive a UnicodeEncodeError or UnicodeDecodeError in a different place:

Traceback (most recent call last):
  File ".\a2.py", line 13, in <module>
    csvw.writerow({'foo':13, 'bar':u"искам уникод запис"})
  File "c:\python27\Lib\csv.py", line 152, in writerow
    return self.writer.writerow(self._dict_to_list(rowdict))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-4: ordinal not in range(128)

or

Traceback (most recent call last):
  File ".\a2.py", line 13, in <module>
    csvw.writerow({'foo':13, 'bar':u"искам уникод запис".encode("utf-8")})
  File "c:\python27\Lib\csv.py", line 152, in writerow
    return self.writer.writerow(self._dict_to_list(rowdict))
  File "C:\python27\Lib\site-packages\datadotworld\files.py", line 107, in write
    self._queue.put(value.encode('utf-8'))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 3: ordinal not in range(128)

New problem with `dw configure`

With the new update, I have this issue after putting my API key into dw configure:

Traceback (most recent call last):
  File "/usr/local/bin/dw", line 11, in <module>
    load_entry_point('datadotworld==1.0.0b5', 'console_scripts', 'dw')()
  File "/Library/Python/2.7/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/Library/Python/2.7/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/Library/Python/2.7/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Library/Python/2.7/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Library/Python/2.7/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/Library/Python/2.7/site-packages/click/decorators.py", line 27, in new_func
    return f(get_current_context().obj, *args, **kwargs)
  File "/Library/Python/2.7/site-packages/datadotworld/cli.py", line 54, in configure
    config = obj.get('config') or Config(obj['profile'])
  File "/Library/Python/2.7/site-packages/datadotworld/config.py", line 65, in __init__
    self._config_file_path)
  File "/Library/Python/2.7/site-packages/datadotworld/config.py", line 117, in __migrate_config
    config_parser[configparser.DEFAULTSECT] = {'auth_token': token}
  File "/Library/Python/2.7/site-packages/backports/configparser/__init__.py", line 992, in __setitem__
    self.read_dict({key: value})
  File "/Library/Python/2.7/site-packages/backports/configparser/__init__.py", line 760, in read_dict
    self.set(section, key, value)
  File "/Library/Python/2.7/site-packages/backports/configparser/__init__.py", line 1238, in set
    _, option, value = self._validate_value_types(option=option, value=value)
  File "/Library/Python/2.7/site-packages/backports/configparser/__init__.py", line 1221, in _validate_value_types
    raise TypeError("option values must be strings")
TypeError: option values must be strings

What should I do about this?

SPARQL queries cannot accept URI-valued parameters

When sending a SPARQL query with parameters, all Python values are converted into boolean, integer, or decimal literals, or assumed to be strings; it's not possible to pass a value that is meant to be treated as a URI.

Since there's no deterministic way to determine whether the string value 'http://something.com/whatever' is meant to be treated as a string or a URI, we need something in the Python type system to differentiate the two cases. Since the current behavior is to treat them as strings, we should make that the default for backwards compatibility; that's a reasonable default in any case.

I propose adding a simple "wrapper type", UriParam, that can be used to indicate that a parameter value is meant to be treated as a URI; the type mediation code can then do the right thing to convert the value into a URI parameter.
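A sketch of the proposal (to_sparql_term is a hypothetical stand-in for the type mediation code), with plain strings keeping their current, backwards-compatible literal treatment:

```python
class UriParam:
    """Marks a parameter value as a URI rather than a string literal."""

    def __init__(self, uri):
        self.uri = uri

def to_sparql_term(value):
    """Render a query parameter as a SPARQL term."""
    if isinstance(value, UriParam):
        return '<{}>'.format(value.uri)       # URI reference
    return '"{}"'.format(value)               # string literal (default)

assert to_sparql_term('http://something.com/whatever') == '"http://something.com/whatever"'
assert to_sparql_term(UriParam('http://something.com/whatever')) == '<http://something.com/whatever>'
```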

Separate API for Resources

Solving issues #41 and #42 would almost surely require resources to be set with a richer interface than a dict. It would be valuable to have a separate interface for getting and setting resources on a dataset, with parameters for:

  • URL
  • Description
  • Data dictionary

Use Unicode compatible CSV reader

Right now, getting results as CSV or via __repr__ in the IPython shell will throw a UnicodeEncodeError in Python 2.x if the results contain non-ASCII characters.

We should implement a UnicodeCSVReader (to solve the results case). Not sure of the best way to solve the IPython __repr__ case.
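The failure is specific to Python 2's csv module, which round-trips bytes through the ascii codec. The reader side can be sketched with Python 3 semantics: decode the byte stream once, up front, and hand csv only text (unicode_csv_rows is a hypothetical helper, not the proposed UnicodeCSVReader itself):

```python
import csv
import io

def unicode_csv_rows(byte_stream, encoding='utf-8'):
    """Decode bytes to text before parsing, so non-ASCII content
    never touches an implicit ascii codec."""
    text = io.TextIOWrapper(byte_stream, encoding=encoding, newline='')
    return csv.reader(text)

data = io.BytesIO('foo,bar\r\n13,искам уникод запис\r\n'.encode('utf-8'))
rows = list(unicode_csv_rows(data))
assert rows[1] == ['13', 'искам уникод запис']
```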

RestApiClient.create_dataset() should return dataset id

The RestApiClient.create_dataset() method currently returns None, which makes it difficult to automate further processing of the created dataset. It should return the dataset id of the newly created dataset, or the return value from RestApiClient.get_dataset().
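One possible approach, assuming the dataset-creation response carries a Location header whose URL ends in owner/id (an assumption about the API's response shape, not confirmed here):

```python
def dataset_key_from_location(location_url):
    """Derive an 'owner/id' dataset key from a Location URL (hypothetical)."""
    owner, dataset_id = location_url.rstrip('/').rsplit('/', 2)[-2:]
    return '{}/{}'.format(owner, dataset_id)

assert dataset_key_from_location(
    'https://api.data.world/v0/datasets/shad/testing') == 'shad/testing'
```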

Failed building wheel for cchardet

It won't complete my pip install of datadotworld, with or without pandas. I'm sort of new to this, so I have no idea what's going on with this error and couldn't find it online. I think the error is near the end (same as the title). Running macOS Sierra. Here's what the terminal is saying:

Collecting datadotworld
Using cached datadotworld-1.4.2-py2.py3-none-any.whl
Requirement already satisfied: python-dateutil<3.0a,>=2.6.0 in /Applications/anaconda/lib/python2.7/site-packages (from datadotworld)
Requirement already satisfied: requests<3.0a,>=2.0.0 in /Applications/anaconda/lib/python2.7/site-packages (from datadotworld)
Requirement already satisfied: certifi>=2017.04.17 in /Applications/anaconda/lib/python2.7/site-packages (from datadotworld)
Collecting datapackage<1.0a,>=0.8.8 (from datadotworld)
Using cached datapackage-0.8.9-py2.py3-none-any.whl
Collecting urllib3<2.0a,>=1.15 (from datadotworld)
Using cached urllib3-1.22-py2.py3-none-any.whl
Requirement already satisfied: configparser<4.0a,>=3.5.0 in /Applications/anaconda/lib/python2.7/site-packages (from datadotworld)
Collecting jsontableschema<1.0a,>=0.10.0 (from datadotworld)
Using cached jsontableschema-0.10.1-py2.py3-none-any.whl
Requirement already satisfied: click<7.0a,>=6.0 in /Applications/anaconda/lib/python2.7/site-packages (from datadotworld)
Requirement already satisfied: six<2.0a,>=1.5.0 in /Applications/anaconda/lib/python2.7/site-packages (from datadotworld)
Collecting tabulator<=1.4.1 (from datadotworld)
Using cached tabulator-1.4.1-py2.py3-none-any.whl
Requirement already satisfied: unicodecsv<1.0a,>=0.14 in /Applications/anaconda/lib/python2.7/site-packages (from datapackage<1.0a,>=0.8.8->datadotworld)
Requirement already satisfied: jsonschema<3.0a,>=2.5 in /Applications/anaconda/lib/python2.7/site-packages (from datapackage<1.0a,>=0.8.8->datadotworld)
Collecting rfc3986<1.0,>=0.4 (from jsontableschema<1.0a,>=0.10.0->datadotworld)
Using cached rfc3986-0.4.1-py2.py3-none-any.whl
Collecting isodate<1.0,>=0.5.4 (from jsontableschema<1.0a,>=0.10.0->datadotworld)
Using cached isodate-0.6.0-py2.py3-none-any.whl
Collecting future<1.0,>=0.15 (from jsontableschema<1.0a,>=0.10.0->datadotworld)
Requirement already satisfied: jsonlines<2.0,>=1.1 in /Applications/anaconda/lib/python2.7/site-packages (from tabulator<=1.4.1->datadotworld)
Requirement already satisfied: sqlalchemy<2.0,>=1.1 in /Applications/anaconda/lib/python2.7/site-packages (from tabulator<=1.4.1->datadotworld)
Collecting cchardet<2.0,>=1.0 (from tabulator<=1.4.1->datadotworld)
Using cached cchardet-1.1.3.tar.gz
Collecting ijson<3.0,>=2.0 (from tabulator<=1.4.1->datadotworld)
Using cached ijson-2.3-py2.py3-none-any.whl
Collecting linear-tsv<2.0,>=1.0 (from tabulator<=1.4.1->datadotworld)
Requirement already satisfied: xlrd<2.0,>=1.0 in /Applications/anaconda/lib/python2.7/site-packages (from tabulator<=1.4.1->datadotworld)
Requirement already satisfied: openpyxl<3.0,>=2.4 in /Applications/anaconda/lib/python2.7/site-packages (from tabulator<=1.4.1->datadotworld)
Requirement already satisfied: functools32 in /Applications/anaconda/lib/python2.7/site-packages (from jsonschema<3.0a,>=2.5->datapackage<1.0a,>=0.8.8->datadotworld)
Requirement already satisfied: jdcal in /Applications/anaconda/lib/python2.7/site-packages (from openpyxl<3.0,>=2.4->tabulator<=1.4.1->datadotworld)
Requirement already satisfied: et_xmlfile in /Applications/anaconda/lib/python2.7/site-packages (from openpyxl<3.0,>=2.4->tabulator<=1.4.1->datadotworld)
Building wheels for collected packages: cchardet
Running setup.py bdist_wheel for cchardet ... error
Complete output from command /Applications/anaconda/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/2z/pnm8d8f56dd8c3xd8qf6g8jr0000gn/T/pip-build-KxbsQY/cchardet/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /var/folders/2z/pnm8d8f56dd8c3xd8qf6g8jr0000gn/T/tmpIPYH1npip-wheel- --python-tag cp27:
cythonize: ['src/cchardet/_cchardet.pyx']
running bdist_wheel
running build
running build_py
creating build
creating build/lib.macosx-10.7-x86_64-2.7
creating build/lib.macosx-10.7-x86_64-2.7/cchardet
copying src/cchardet/__init__.py -> build/lib.macosx-10.7-x86_64-2.7/cchardet
copying src/cchardet/version.py -> build/lib.macosx-10.7-x86_64-2.7/cchardet
running build_ext
building 'cchardet._cchardet' extension
creating build/temp.macosx-10.7-x86_64-2.7
creating build/temp.macosx-10.7-x86_64-2.7/src
creating build/temp.macosx-10.7-x86_64-2.7/src/cchardet
creating build/temp.macosx-10.7-x86_64-2.7/src/ext
creating build/temp.macosx-10.7-x86_64-2.7/src/ext/libcharsetdetect
creating build/temp.macosx-10.7-x86_64-2.7/src/ext/libcharsetdetect/mozilla
creating build/temp.macosx-10.7-x86_64-2.7/src/ext/libcharsetdetect/mozilla/extensions
creating build/temp.macosx-10.7-x86_64-2.7/src/ext/libcharsetdetect/mozilla/extensions/universalchardet
creating build/temp.macosx-10.7-x86_64-2.7/src/ext/libcharsetdetect/mozilla/extensions/universalchardet/src
creating build/temp.macosx-10.7-x86_64-2.7/src/ext/libcharsetdetect/mozilla/extensions/universalchardet/src/base
gcc -fno-strict-aliasing -I/Applications/anaconda/include -arch x86_64 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -Isrc/ext/libcharsetdetect/mozilla/extensions/universalchardet/src/base/ -Isrc/ext/libcharsetdetect/nspr-emu/ -Isrc/ext/libcharsetdetect/ -I/Applications/anaconda/include/python2.7 -c src/cchardet/_cchardet.cpp -o build/temp.macosx-10.7-x86_64-2.7/src/cchardet/_cchardet.o

Agreeing to the Xcode/iOS license requires admin privileges, please run “sudo xcodebuild -license” and then retry this command.

error: command 'gcc' failed with exit status 69


Failed building wheel for cchardet
Running setup.py clean for cchardet
Failed to build cchardet
Installing collected packages: cchardet, ijson, linear-tsv, tabulator, rfc3986, isodate, future, jsontableschema, datapackage, urllib3, datadotworld
Running setup.py install for cchardet ... error
Complete output from command /Applications/anaconda/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/2z/pnm8d8f56dd8c3xd8qf6g8jr0000gn/T/pip-build-KxbsQY/cchardet/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /var/folders/2z/pnm8d8f56dd8c3xd8qf6g8jr0000gn/T/pip-BpLyfT-record/install-record.txt --single-version-externally-managed --compile:
cythonize: ['src/cchardet/_cchardet.pyx']
running install
running build
running build_py
creating build
creating build/lib.macosx-10.7-x86_64-2.7
creating build/lib.macosx-10.7-x86_64-2.7/cchardet
copying src/cchardet/__init__.py -> build/lib.macosx-10.7-x86_64-2.7/cchardet
copying src/cchardet/version.py -> build/lib.macosx-10.7-x86_64-2.7/cchardet
running build_ext
building 'cchardet._cchardet' extension
creating build/temp.macosx-10.7-x86_64-2.7
creating build/temp.macosx-10.7-x86_64-2.7/src
creating build/temp.macosx-10.7-x86_64-2.7/src/cchardet
creating build/temp.macosx-10.7-x86_64-2.7/src/ext
creating build/temp.macosx-10.7-x86_64-2.7/src/ext/libcharsetdetect
creating build/temp.macosx-10.7-x86_64-2.7/src/ext/libcharsetdetect/mozilla
creating build/temp.macosx-10.7-x86_64-2.7/src/ext/libcharsetdetect/mozilla/extensions
creating build/temp.macosx-10.7-x86_64-2.7/src/ext/libcharsetdetect/mozilla/extensions/universalchardet
creating build/temp.macosx-10.7-x86_64-2.7/src/ext/libcharsetdetect/mozilla/extensions/universalchardet/src
creating build/temp.macosx-10.7-x86_64-2.7/src/ext/libcharsetdetect/mozilla/extensions/universalchardet/src/base
gcc -fno-strict-aliasing -I/Applications/anaconda/include -arch x86_64 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -Isrc/ext/libcharsetdetect/mozilla/extensions/universalchardet/src/base/ -Isrc/ext/libcharsetdetect/nspr-emu/ -Isrc/ext/libcharsetdetect/ -I/Applications/anaconda/include/python2.7 -c src/cchardet/_cchardet.cpp -o build/temp.macosx-10.7-x86_64-2.7/src/cchardet/_cchardet.o

Agreeing to the Xcode/iOS license requires admin privileges, please run “sudo xcodebuild -license” and then retry this command.


error: command 'gcc' failed with exit status 69

----------------------------------------

Command "/Applications/anaconda/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/2z/pnm8d8f56dd8c3xd8qf6g8jr0000gn/T/pip-build-KxbsQY/cchardet/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /var/folders/2z/pnm8d8f56dd8c3xd8qf6g8jr0000gn/T/pip-BpLyfT-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /private/var/folders/2z/pnm8d8f56dd8c3xd8qf6g8jr0000gn/T/pip-build-KxbsQY/cchardet/

Re-implement API functions using requests

Retire the Swagger auto-generated client in favor of a simpler implementation based on requests.

Start by getting rid of:

  • .swagger-codegen
  • _swagger
  • .swagger-codegen-ignore
  • swagger-codegen-config.json
  • swagger-swapi-def.json
  • Makefile (update_swagger_codegen target)

The pattern I would like to encourage is one where:

  1. Each module (*.py file) represents a section of the data.world API (e.g. projects, datasets, insights, etc.). The ApiClient class should aggregate all modules and serve as a point of access to all of them (e.g. ApiClient.projects, ApiClient.datasets, etc.)
  2. Each function follows a naming convention (TBD) and maps positional args to endpoint path parameters and keyword args to body parameters (see: https://github.com/datadotworld/target-datadotworld/blob/master/target_datadotworld/api_client.py#L164)
    2.1. If Content-Type needs to be defined (e.g. streams): TBD
    2.2. If upload parts need to be defined (e.g. multi-part file upload): TBD
    2.3. If the Accept header needs to be defined (e.g. sql): TBD
  3. A request session object defines common headers such as Authorization and User-Agent (see: https://github.com/datadotworld/target-datadotworld/blob/master/target_datadotworld/api_client.py#L37)
  4. HTTP 429 responses are automatically handled (see: https://github.com/datadotworld/target-datadotworld/blob/master/target_datadotworld/api_client.py#L366)
  5. All validation and business rules are delegated to the server, and requests exceptions can be thrown as is (avoid anti-patterns like this: https://github.com/datadotworld/target-datadotworld/blob/master/target_datadotworld/api_client.py#L192)
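Items 2, 3 and 5 above could be sketched roughly as follows; the class name, method name, and base URL here are illustrative assumptions, not a final API:

```python
import requests


class RestApiClient(object):
    """Minimal sketch of the proposed pattern (all names illustrative)."""

    def __init__(self, token, base_url='https://api.data.world/v0'):
        self._base_url = base_url
        # Common headers defined once on a shared session (item 3)
        self._session = requests.Session()
        self._session.headers.update({
            'Authorization': 'Bearer {}'.format(token),
            'User-Agent': 'data.world-py',
        })

    def create_dataset(self, owner, **kwargs):
        # Positional args map to path parameters, keyword args map to
        # the request body (item 2); validation is left to the server,
        # and requests exceptions propagate as-is (item 5).
        response = self._session.post(
            '{}/datasets/{}'.format(self._base_url, owner), json=kwargs)
        response.raise_for_status()
        return response.json()
```

With this shape, each API section module only needs to know its paths; authentication and user-agent concerns live in one place on the session.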

Additionally:

  1. Every function must include detailed docstrings (**kwargs documentation should point to the appropriate model doc, for example, https://apidocs.data.world/api/models/datasetcreaterequest)
  2. Every function must be unit tested. Tests must assert that:
    2.1. HTTP request is composed with exactly the expected path, headers and body
    2.2. Function returns the correct values upon success
    2.3. Function returns the correct errors upon failure
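The automatic handling of HTTP 429 responses mentioned earlier could be sketched as a small wrapper, independent of any particular HTTP library; the helper name and defaults below are assumptions:

```python
import time


def with_rate_limit_retries(request_fn, max_attempts=5):
    """Wrap a request function so HTTP 429 responses are retried (sketch)."""
    def wrapped(*args, **kwargs):
        response = None
        for attempt in range(max_attempts):
            response = request_fn(*args, **kwargs)
            if response.status_code != 429:
                break
            # Honor Retry-After when the server sends it; otherwise
            # back off exponentially (1s, 2s, 4s, ...).
            delay = float(response.headers.get('Retry-After', 2 ** attempt))
            time.sleep(delay)
        return response
    return wrapped
```

The wrapper only inspects the status code and the Retry-After header, so it works with any response object exposing those attributes, which also makes it easy to unit test without a live server.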

IMPORTANT: To maximize the benefits of these improvements, I would suggest breaking compatibility (i.e. release as v2.0).

API Has Title Limit Not In Application

I'm using the API to create datasets with long titles (> 30 characters). Because titles are used in keys, they must be truncated to the 30-character key limit, and the key apparently can't end with an ellipsis ('...'), so the title is cut off abruptly.

The API throws an exception for long titles:

ValueError: Invalid value for title, length must be less than or equal to 30

However, the web application does not appear to enforce a 30-character limit, either for display titles or for the dataset key.

It would be preferable if the API either removed the 30-character limit or allowed a longer display title, with a separate value for the dataset key slug.
