
datadotworld / data.world-py


Python package for data.world

Home Page: https://data.world/integrations/python

License: Apache License 2.0

Python 99.96% Makefile 0.04%
datasets open-data reference-implementation api-client dwstruct-t01-dist


data.world-py's Issues

String literal 'NONE' gets mapped to Python None value in result sets

In a SQL or SPARQL response, the literal string value 'NONE' (case-insensitive) is converted into the Python value None:

>>> datadotworld.query('bryon/odin-2015-2016', 'SELECT ?value WHERE{ BIND("NONE" AS ?value)}', query_type='sparql').table
[OrderedDict([('value', None)])]
>>> datadotworld.query('bryon/odin-2015-2016', 'SELECT ?value WHERE{ BIND("ABCD" AS ?value)}', query_type='sparql').table
[OrderedDict([('value', 'ABCD')])]
>>> datadotworld.query('bryon/odin-2015-2016', 'SELECT "NONE" AS value').table
[OrderedDict([('value', None)])]
>>> datadotworld.query('bryon/odin-2015-2016', 'SELECT "ABCD" AS value').table
[OrderedDict([('value', 'ABCD')])]
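A hedged illustration of the hazard (the helpers below are hypothetical, not the SDK's actual code): a result-cell converter that compares strings case-insensitively against 'none' silently loses data, while one that only maps genuine nulls does not.

```python
def convert_cell(raw):
    """Only a genuine null (Python None) maps to None;
    the *string* 'NONE' is data and must be preserved."""
    if raw is None:
        return None
    return raw

def buggy_convert_cell(raw):
    """The behavior this issue describes: any case-insensitive
    match on 'none' is treated as a missing value."""
    if raw is None or str(raw).lower() == 'none':
        return None
    return raw

assert convert_cell('NONE') == 'NONE'       # string survives
assert buggy_convert_cell('NONE') is None   # data silently lost
```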

Filter "original/" from tables and dataframes dicts.

Datapackages are more complete now with the addition of original (untouched) files. As a result, LocalDataset.tables and LocalDataset.dataframes currently include two versions of tabular files (original + sanitized/normalized). The code should be modified so that only files in data/ are visible through LocalDataset.tables and LocalDataset.dataframes.
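A minimal sketch of the proposed filtering, assuming resources are keyed by their datapackage path (a hypothetical shape, for illustration only):

```python
def filter_data_dir(resources):
    """Keep only sanitized files under data/; hide original/ copies."""
    return {path: table for path, table in resources.items()
            if path.startswith('data/')}

resources = {'data/stats.csv': 'sanitized table',
             'original/stats.xlsx': 'untouched upload'}
assert filter_data_dir(resources) == {'data/stats.csv': 'sanitized table'}
```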

Server won't accept pre-signed AWS S3 URLs

When creating a dataset with a resource that uses a pre-signed AWS S3 URL, the server returns a Bad Request response, indicating that the URL is not valid.

Being able to submit pre-signed URLs would make it possible to link private data into a private data.world dataset.

The web application will accept the pre-signed URL without error, but doesn't actually load the file.

Here is the server's exception:

datadotworld.client.api.RestApiError: (400)
Reason: Bad Request
HTTP response headers: HTTPHeaderDict({'Content-Type': 'application/json', 'Date': 'Tue, 16 May 2017 23:51:19 GMT', 'Server': 'nginx/1.8.1', 'Content-Length': '1281', 'Connection': 'keep-alive'})
HTTP response body: {"code":400,"message":"Invalid DatasetCreateRequest . Violations = [ConstraintViolationImpl{interpolatedMessage='Source URL must be a valid URL', propertyPath=files[1].source.url, rootBeanClass=class world.data.api.dtos.pub.DatasetCreateRequest, messageTemplate='Source URL {org.hibernate.validator.constraints.URL.message}'}, ConstraintViolationImpl{interpolatedMessage='Source URL must be a valid URL', propertyPath=files[4].source.url, rootBeanClass=class world.data.api.dtos.pub.DatasetCreateRequest, messageTemplate='Source URL {org.hibernate.validator.constraints.URL.message}'}, ConstraintViolationImpl{interpolatedMessage='Source URL must be a valid URL', propertyPath=files[3].source.url, rootBeanClass=class world.data.api.dtos.pub.DatasetCreateRequest, messageTemplate='Source URL {org.hibernate.validator.constraints.URL.message}'}, ConstraintViolationImpl{interpolatedMessage='Source URL must be a valid URL', propertyPath=files[2].source.url, rootBeanClass=class world.data.api.dtos.pub.DatasetCreateRequest, messageTemplate='Source URL {org.hibernate.validator.constraints.URL.message}'}] :: Source URL must be a valid URL;Source URL must be a valid URL;Source URL must be a valid URL;Source URL must be a valid URL","details":"af9ff6d1-aa58-495a-92e9-7c3b4eb3fd80"}

New schema-import command

Something like:

dw schema-import --dataset shad/testing --csv schema.csv

The CSV would be the simplified format that we export from the UI (note: ignore the Type column).

Optionally, the matching export command might be nice (but not essential):

dw schema-export -d shad/testing --csv > schema.csv

Support all new API paths (up to API v0.9.0)

I got this list by running:

import requests
import json
api_swagger = requests.get('https://api.data.world/v0/swagger.json').text
api_swagger = api_swagger.encode('ascii', 'ignore').decode('ascii')
api_swagger = json.loads(api_swagger) 
python_swagger = requests.get('https://raw.githubusercontent.com/datadotworld/data.world-py/master/datadotworld/client/swagger-dwapi-def.json').text
python_swagger = python_swagger.encode('ascii', 'ignore').decode('ascii')
python_swagger = json.loads(python_swagger) 
print(set(api_swagger['paths'].keys()) - set(python_swagger['paths'].keys()))

Introduce flag for load_dataset to download new version iff it's been updated server-side.

The load_dataset function currently accepts two parameters: dataset_key and force_update. Currently, if the local version of the dataset is not the most recent, a warning is raised. Instead, I would like to add a third parameter to load_dataset (suggested name: auto_update) that is False by default; when True, instead of raising a warning it will simply update the local dataset to the latest version on data.world.
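The proposed decision logic can be sketched as follows (the helper name resolve_update is hypothetical; only dataset staleness and the two flags matter):

```python
import warnings

def resolve_update(is_stale, force_update=False, auto_update=False):
    """Return True when the dataset should be re-downloaded.

    force_update always re-downloads; auto_update re-downloads only
    when the local copy is stale; otherwise a stale copy merely warns.
    """
    if force_update:
        return True
    if is_stale:
        if auto_update:
            return True
        warnings.warn('Local copy of the dataset is out of date')
    return False
```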

Set and Report Data Dictionary

Allow the API to set and report the data dictionary attached to a resource.

It's an awesome feature! So I'd really like to be able to use it programmatically.

setters should not aggressively validate on reads

trying to do a simple GET of datasets through data.world-py 1.5.0

        datasets = dw.api_client.fetch_datasets()

getting the following:

...
  File "/Users/bryon/.virtualenvs/datadotworld-bryon-labs/lib/python3.6/site-packages/datadotworld/client/_swagger/models/file_summary_response.py", line 70, in __init__
    self.description = description
  File "/Users/bryon/.virtualenvs/datadotworld-bryon-labs/lib/python3.6/site-packages/datadotworld/client/_swagger/models/file_summary_response.py", line 153, in description
    raise ValueError("Invalid value for `description`, length must be greater than or equal to `1`")
ValueError: Invalid value for `description`, length must be greater than or equal to `1`

the offending code is:
https://github.com/datadotworld/data.world-py/blob/v1.5.0/datadotworld/client/_swagger/models/file_summary_response.py#L152-L153

Since the value is coming back from the server, this validation is incorrect: either there's a defect on the server allowing this value to sometimes be empty, or (more likely) the Python SDK should relax when deserializing server responses.

You should never get an "invalid data" error on a GET.

API Can't Handle Hierarchical Zip Contents

After uploading a ZIP file with hierarchical path names, calling get_dataset() fails, with:

  File "/Volumes/Storage/proj/virt/metatab3/lib/python3.5/site-packages/datadotworld/client/_swagger/models/file_summary_response.py", line 80, in name
    raise ValueError("Invalid value for `name`, must be a follow pattern or equal to `/^[^/]+$/`")
ValueError: Invalid value for `name`, must be a follow pattern or equal to `/^[^/]+$/`

The file name produced by the server has '/' in it, but the API does not allow these in names.

Specifically, the problem seems to be either (a) the server generates incorrect names or (b) the validation pattern is wrong at swagger-dwapi-def.json : definitions.FileSummaryResponse.properties.name.pattern
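The pattern in question can be checked directly; any name containing a path separator fails it:

```python
import re

# Pattern quoted from swagger-dwapi-def.json:
# definitions.FileSummaryResponse.properties.name.pattern
NAME_PATTERN = re.compile(r'^[^/]+$')

assert NAME_PATTERN.match('report.csv')              # flat name: accepted
assert not NAME_PATTERN.match('archive/report.csv')  # hierarchical: rejected
```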

Bust cache upon load_dataset()

Currently, cache data produced by calling raw_data, tables and dataframes is not invalidated when load_dataset() is called again. Instead, when load_dataset() is called, all cache entries for the given dataset_key should be invalidated.
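A minimal sketch of the intended behavior (class and method names are hypothetical, not the SDK's internals):

```python
class DatasetCache:
    """Caches raw_data/tables/dataframes values per dataset."""

    def __init__(self):
        self._entries = {}  # (dataset_key, kind) -> cached value

    def put(self, dataset_key, kind, value):
        self._entries[(dataset_key, kind)] = value

    def get(self, dataset_key, kind):
        return self._entries.get((dataset_key, kind))

    def invalidate(self, dataset_key):
        """Called from load_dataset(): drop every entry for the key."""
        for key in [k for k in self._entries if k[0] == dataset_key]:
            del self._entries[key]
```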

Got "SafeConfigParser instance has no attribute 'read_file'" when I tried to load a dataset

I was in a conference and got this error and couldn't move forward.

dw.load_dataset('data-society/the-simpsons-by-the-data')

AttributeError                            Traceback (most recent call last)
 in ()
----> 1 lds = dw.load_dataset('data-society/the-simpsons-by-the-data')  # , force_update=True)

/Users/plee/anaconda/lib/python2.7/site-packages/datadotworld/__init__.pyc in load_dataset(dataset_key, force_update, profile)
     87     ['changelog', 'datadotworldbballstats', 'datadotworldbballteam']
     88     """
---> 89     return _get_instance(profile).load_dataset(dataset_key,
     90                                                force_update=force_update)
     91 

/Users/plee/anaconda/lib/python2.7/site-packages/datadotworld/__init__.pyc in _get_instance(profile)
     42     if instance is None:
     43         config_param = (ChainedConfig()
---> 44                         if profile == 'default'
     45                         else FileConfig(profile=profile))
     46         instance = DataDotWorld(config=config_param)

/Users/plee/anaconda/lib/python2.7/site-packages/datadotworld/config.pyc in __init__(self, **kwargs)
    203         # Overrides (for testing)
    204         self._config_chain = kwargs.get('config_chain',
--> 205                                         [EnvConfig(), FileConfig()])
    206 
    207     def __getattribute__(self, item):

/Users/plee/anaconda/lib/python2.7/site-packages/datadotworld/config.pyc in __init__(self, profile, **kwargs)
    108 
    109         if path.isfile(self._config_file_path):
--> 110             self._config_parser.read_file(open(self._config_file_path))
    111             if self.__migrate_invalid_defaults(self._config_parser) > 0:
    112                 self.save()

AttributeError: SafeConfigParser instance has no attribute 'read_file'

Support date/datetime and RDF types generally as query parameters

Support for more possible types as query parameters should be added.

date and datetime should map naturally to their XSD counterparts:
https://docs.python.org/3/library/datetime.html

To give the developer complete capability to express values, add an RdfLiteralParam class with two parts, value and type, where value is a string and type is a URI, and which renders as "{value}"^^<{type}>. Existing code in datadotworld.convert_to_sparql_literal should be refactored to use this class.
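A sketch of the proposed wrapper, showing the rendering only (the real class would plug into the SDK's parameter-conversion code):

```python
class RdfLiteralParam:
    """Typed RDF literal: a string value plus a datatype URI."""

    def __init__(self, value, type):
        self.value = value
        self.type = type

    def to_sparql(self):
        # Renders as "{value}"^^<{type}>
        return '"{}"^^<{}>'.format(self.value, self.type)

param = RdfLiteralParam('2017-05-16', 'http://www.w3.org/2001/XMLSchema#date')
assert param.to_sparql() == '"2017-05-16"^^<http://www.w3.org/2001/XMLSchema#date>'
```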

Improving handling of RDF results and terms

For SPARQL queries:

  • Handle results of DESCRIBE and CONSTRUCT queries as pure RDF and skip table schema inference
  • Harden table schema inference for SELECT queries in cases where variables are not typed consistently
  • Handle uri and bnode terms in query results
  • Consider allowing users to explicitly request query results in RDF form

load_dataset() problem

lds = dw.load_dataset('data-society/the-simpsons-by-the-data')

gives me:

ValueError Traceback (most recent call last)
in ()
1 # load dataset
----> 2 lds = dw.load_dataset('data-society/the-simpsons-by-the-data')

C:\Users\LW130003\Anaconda3\lib\site-packages\datadotworld\__init__.py in load_dataset(dataset_key, force_update, auto_update, profile, **kwargs)
99 load_dataset(dataset_key,
100 force_update=force_update,
--> 101 auto_update=auto_update)
102
103

C:\Users\LW130003\Anaconda3\lib\site-packages\datadotworld\datadotworld.py in load_dataset(self, dataset_key, force_update, auto_update)
160 else:
161 try:
--> 162 dataset_info = self.api_client.get_dataset(dataset_key)
163 except RestApiError as e:
164 return LocalDataset(descriptor_file)

C:\Users\LW130003\Anaconda3\lib\site-packages\datadotworld\client\api.py in get_dataset(self, dataset_key)
96 try:
97 return self._datasets_api.get_dataset(
---> 98 *(parse_dataset_key(dataset_key))).to_dict()
99 except _swagger.rest.ApiException as e:
100 raise RestApiError(cause=e)

C:\Users\LW130003\Anaconda3\lib\site-packages\datadotworld\client\_swagger\apis\datasets_api.py in get_dataset(self, owner, id, **kwargs)
644 return self.get_dataset_with_http_info(owner, id, **kwargs)
645 else:
--> 646 (data) = self.get_dataset_with_http_info(owner, id, **kwargs)
647 return data
648

C:\Users\LW130003\Anaconda3\lib\site-packages\datadotworld\client\_swagger\apis\datasets_api.py in get_dataset_with_http_info(self, owner, id, **kwargs)
727 _preload_content=params.get('_preload_content', True),
728 _request_timeout=params.get('_request_timeout'),
--> 729 collection_formats=collection_formats)
730
731 def patch_dataset(self, owner, id, body, **kwargs):

C:\Users\LW130003\Anaconda3\lib\site-packages\datadotworld\client\_swagger\api_client.py in call_api(self, resource_path, method, path_params, query_params, header_params, body, post_params, files, response_type, auth_settings, callback, _return_http_data_only, collection_formats, _preload_content, _request_timeout)
324 body, post_params, files,
325 response_type, auth_settings, callback,
--> 326 _return_http_data_only, collection_formats, _preload_content, _request_timeout)
327 else:
328 thread = threading.Thread(target=self.__call_api,

C:\Users\LW130003\Anaconda3\lib\site-packages\datadotworld\client\_swagger\api_client.py in __call_api(self, resource_path, method, path_params, query_params, header_params, body, post_params, files, response_type, auth_settings, callback, _return_http_data_only, collection_formats, _preload_content, _request_timeout)
159 # deserialize response data
160 if response_type:
--> 161 return_data = self.deserialize(response_data, response_type)
162 else:
163 return_data = None

C:\Users\LW130003\Anaconda3\lib\site-packages\datadotworld\client\_swagger\api_client.py in deserialize(self, response, response_type)
237 data = response.data
238
--> 239 return self.__deserialize(data, response_type)
240
241 def __deserialize(self, data, klass):

C:\Users\LW130003\Anaconda3\lib\site-packages\datadotworld\client\_swagger\api_client.py in __deserialize(self, data, klass)
277 return self.__deserialize_datatime(data)
278 else:
--> 279 return self.__deserialize_model(data, klass)
280
281 def call_api(self, resource_path, method,

C:\Users\LW130003\Anaconda3\lib\site-packages\datadotworld\client\_swagger\api_client.py in __deserialize_model(self, data, klass)
627 and isinstance(data, (list, dict)):
628 value = data[klass.attribute_map[attr]]
--> 629 kwargs[attr] = self.__deserialize(value, attr_type)
630
631 instance = klass(**kwargs)

C:\Users\LW130003\Anaconda3\lib\site-packages\datadotworld\client\_swagger\api_client.py in __deserialize(self, data, klass)
255 sub_kls = re.match('list[(.*)]', klass).group(1)
256 return [self.__deserialize(sub_data, sub_kls)
--> 257 for sub_data in data]
258
259 if klass.startswith('dict('):

C:\Users\LW130003\Anaconda3\lib\site-packages\datadotworld\client\_swagger\api_client.py in <listcomp>(.0)
255 sub_kls = re.match('list[(.*)]', klass).group(1)
256 return [self.__deserialize(sub_data, sub_kls)
--> 257 for sub_data in data]
258
259 if klass.startswith('dict('):

C:\Users\LW130003\Anaconda3\lib\site-packages\datadotworld\client\_swagger\api_client.py in __deserialize(self, data, klass)
277 return self.__deserialize_datatime(data)
278 else:
--> 279 return self.__deserialize_model(data, klass)
280
281 def call_api(self, resource_path, method,

C:\Users\LW130003\Anaconda3\lib\site-packages\datadotworld\client\_swagger\api_client.py in __deserialize_model(self, data, klass)
629 kwargs[attr] = self.__deserialize(value, attr_type)
630
--> 631 instance = klass(**kwargs)
632
633 return instance

C:\Users\LW130003\Anaconda3\lib\site-packages\datadotworld\client\_swagger\models\file_summary_response.py in __init__(self, name, source, description, labels, size_in_bytes, created, updated)
68 self.source = source
69 if description is not None:
---> 70 self.description = description
71 if labels is not None:
72 self.labels = labels

C:\Users\LW130003\Anaconda3\lib\site-packages\datadotworld\client\_swagger\models\file_summary_response.py in description(self, description)
151 raise ValueError("Invalid value for `description`, length must be less than or equal to `240`")
152 if description is not None and len(description) < 1:
--> 153 raise ValueError("Invalid value for `description`, length must be greater than or equal to `1`")
154
155 self._description = description

ValueError: Invalid value for `description`, length must be greater than or equal to `1`

import datadotworld fails on virtualenv

Python 2.7.14

pip list:

Package Version


apipkg 1.4
attrs 18.1.0
awsebcli 3.12.1
backports.shutil-get-terminal-size 1.0.0
beautifulsoup4 4.6.0
boto3 1.5.22
botocore 1.8.50
CacheControl 0.12.4
cchardet 1.1.3
cement 2.8.2
certifi 2018.4.16
chardet 3.0.4
click 6.7
colorama 0.3.7
configparser 3.5.0
coverage 4.0.3
cssselect 1.0.3
datadotworld 1.6.0
datapackage 0.8.9
decorator 4.3.0
Django 1.11.13
django-debug-toolbar 1.9.1
django-extensions 1.9.9
django-rest-auth 0.9.3
django-storages 1.6.5
django-webtest 1.9.2
djangorestframework 3.7.7
docker-py 1.7.2
dockerpty 0.4.1
docopt 0.6.2
docutils 0.14
drf-extensions 0.3.1
enum34 1.1.6
et-xmlfile 1.0.1
execnet 1.5.0
flake8 3.5.0
flake8-polyfill 1.0.2
freezegun 0.3.9
funcsigs 1.0.2
functools32 3.2.3.post2
future 0.16.0
futures 3.2.0
idna 2.6
ijson 2.3
ipdb 0.11
ipython 5.5.0
ipython-genutils 0.2.0
isodate 0.6.0
jdcal 1.4
jmespath 0.9.3
jsonlines 1.2.0
jsonschema 2.6.0
jsontableschema 0.10.1
linear-tsv 1.1.0
lxml 4.2.1
mccabe 0.6.1
mock 2.0.0
model-mommy 1.5.1
msgpack-python 0.5.6
numpy 1.14.3
openpyxl 2.5.3
pandas 0.23.0
pathlib2 2.3.2
pathspec 0.5.0
pbr 4.0.3
pep8 1.7.1
pep8-naming 0.5.0
pickleshare 0.7.4
Pillow 5.0.0
pip 10.0.1
pluggy 0.6.0
prompt-toolkit 1.0.15
psycopg2 2.7.4
py 1.5.3
pycodestyle 2.3.1
pyflakes 1.5.0
Pygments 2.2.0
pyquery 1.4.0
pytest 3.3.2
pytest-cache 1.0
pytest-cov 2.5.1
pytest-django 3.1.2
pytest-flake8 0.9.1
pytest-pep8 1.0.6
python-coveralls 2.9.1
python-dateutil 2.7.3
pytz 2017.3
PyYAML 3.12
requests 2.8.0
rfc3986 0.4.1
s3transfer 0.1.13
scandir 1.7
semantic-version 2.5.0
setuptools 39.2.0
sh 1.12.14
simplegeneric 0.8.1
six 1.11.0
SQLAlchemy 1.2.8
sqlparse 0.2.4
tabulate 0.7.5
tabulator 1.4.1
termcolor 1.1.0
tqdm 4.19.5
traitlets 4.3.2
typing 3.6.4
unicodecsv 0.14.1
urllib3 1.22
waitress 1.1.0
wcwidth 0.1.7
WebOb 1.8.1
websocket-client 0.48.0
WebTest 2.0.29
wheel 0.31.1
whitenoise 3.3.1
win-unicode-console 0.5
xlrd 1.1.0

Issues with dw configure

I'm running into this error when I run dw configure:

  File "/usr/local/bin/dw", line 6, in <module>
    from pkg_resources import load_entry_point
  File "/Library/Python/2.7/site-packages/pkg_resources/__init__.py", line 2991, in <module>
    @_call_aside
  File "/Library/Python/2.7/site-packages/pkg_resources/__init__.py", line 2977, in _call_aside
    f(*args, **kwargs)
  File "/Library/Python/2.7/site-packages/pkg_resources/__init__.py", line 3004, in _initialize_master_working_set
    working_set = WorkingSet._build_master()
  File "/Library/Python/2.7/site-packages/pkg_resources/__init__.py", line 664, in _build_master
    return cls._build_from_requirements(__requires__)
  File "/Library/Python/2.7/site-packages/pkg_resources/__init__.py", line 677, in _build_from_requirements
    dists = ws.resolve(reqs, Environment())
  File "/Library/Python/2.7/site-packages/pkg_resources/__init__.py", line 861, in resolve
    raise VersionConflict(dist, req).with_context(dependent_req)
pkg_resources.ContextualVersionConflict: (requests 2.5.3 (/Library/Python/2.7/site-packages), Requirement.parse('requests<3.0a,>=2.8'), set(['datapackage', 'tabulator']))


Anyone know how to debug this?

Issue after inputting API key

After entering the API key at the dw configure prompt, I got back this Python error:

Traceback (most recent call last):
  File "/usr/local/bin/dw", line 11, in <module>
    load_entry_point('datadotworld==1.0.0b4', 'console_scripts', 'dw')()
  File "/Library/Python/2.7/site-packages/click/core.py", line 716, in __call__
    return self.main(*args, **kwargs)
  File "/Library/Python/2.7/site-packages/click/core.py", line 696, in main
    rv = self.invoke(ctx)
  File "/Library/Python/2.7/site-packages/click/core.py", line 1060, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Library/Python/2.7/site-packages/click/core.py", line 889, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Library/Python/2.7/site-packages/click/core.py", line 534, in invoke
    return callback(*args, **kwargs)
  File "/Library/Python/2.7/site-packages/click/decorators.py", line 27, in new_func
    return f(get_current_context().obj, *args, **kwargs)
  File "/Library/Python/2.7/site-packages/datadotworld/cli.py", line 55, in configure
    config.auth_token = token
  File "/Library/Python/2.7/site-packages/datadotworld/config.py", line 82, in auth_token
    self._config_parser.add_section(self._profile)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ConfigParser.py", line 261, in add_section
    raise ValueError, 'Invalid section name: %s' % section
ValueError: Invalid section name: default

What should I do about this?

Can't save any unicode in my project

Adding a test code:

# coding=utf-8
import csv
import datadotworld as dw
DW_USER_NAME = "my_username"
DW_PROJECT_NAME = "my_projectname"
csv_filename = 'test_saving_unicode_strings.csv'
dw_project_path = '{}/{}'.format(DW_USER_NAME, DW_PROJECT_NAME)

with dw.open_remote_file(dw_project_path, csv_filename) as w:
    csvw = csv.DictWriter(w, fieldnames=['foo', 'bar'])
    csvw.writeheader()
    csvw.writerow({'foo':42, 'bar':u"A"})
    csvw.writerow({'foo':13, 'bar':u"искам уникод запис"})

What I have tried:

  • adding encoding="utf-8" to the open_remote_file call
  • encode('utf-8') the 'bad' string
  • decode('utf-8') the 'bad' string

Each time I receive a UnicodeEncodeError or UnicodeDecodeError in a different place:

Traceback (most recent call last):
  File ".\a2.py", line 13, in <module>
    csvw.writerow({'foo':13, 'bar':u"искам уникод запис"})
  File "c:\python27\Lib\csv.py", line 152, in writerow
    return self.writer.writerow(self._dict_to_list(rowdict))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-4: ordinal not in range(128)

or

Traceback (most recent call last):
  File ".\a2.py", line 13, in <module>
    csvw.writerow({'foo':13, 'bar':u"искам уникод запис".encode("utf-8")})
  File "c:\python27\Lib\csv.py", line 152, in writerow
    return self.writer.writerow(self._dict_to_list(rowdict))
  File "C:\python27\Lib\site-packages\datadotworld\files.py", line 107, in write
    self._queue.put(value.encode('utf-8'))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 3: ordinal not in range(128)

New problem with `dw configure`

With the new update, I have this issue after putting my API key into dw configure:

Traceback (most recent call last):
  File "/usr/local/bin/dw", line 11, in <module>
    load_entry_point('datadotworld==1.0.0b5', 'console_scripts', 'dw')()
  File "/Library/Python/2.7/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/Library/Python/2.7/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/Library/Python/2.7/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Library/Python/2.7/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Library/Python/2.7/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/Library/Python/2.7/site-packages/click/decorators.py", line 27, in new_func
    return f(get_current_context().obj, *args, **kwargs)
  File "/Library/Python/2.7/site-packages/datadotworld/cli.py", line 54, in configure
    config = obj.get('config') or Config(obj['profile'])
  File "/Library/Python/2.7/site-packages/datadotworld/config.py", line 65, in __init__
    self._config_file_path)
  File "/Library/Python/2.7/site-packages/datadotworld/config.py", line 117, in __migrate_config
    config_parser[configparser.DEFAULTSECT] = {'auth_token': token}
  File "/Library/Python/2.7/site-packages/backports/configparser/__init__.py", line 992, in __setitem__
    self.read_dict({key: value})
  File "/Library/Python/2.7/site-packages/backports/configparser/__init__.py", line 760, in read_dict
    self.set(section, key, value)
  File "/Library/Python/2.7/site-packages/backports/configparser/__init__.py", line 1238, in set
    _, option, value = self._validate_value_types(option=option, value=value)
  File "/Library/Python/2.7/site-packages/backports/configparser/__init__.py", line 1221, in _validate_value_types
    raise TypeError("option values must be strings")
TypeError: option values must be strings

What should I do about this?

SPARQL queries cannot accept URI-valued parameters

When sending a SPARQL query with parameters, all Python values are converted into boolean, integer, or decimal literals, or assumed to be strings; it's not possible to pass a value that is meant to be treated as a URI.

Since there's no deterministic way to determine whether the string value 'http://something.com/whatever' is meant to be treated as a string or a URI, we need something in the Python type system to differentiate the two cases. Since the current behavior is to treat them as strings, we should make that the default for backwards compatibility; that's a reasonable default in any case.

I propose adding a simple "wrapper type", UriParam, that can be used to indicate that a parameter value is meant to be treated as a URI; the type mediation code can then do the right thing to convert the value into a URI parameter.
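A sketch of the proposal (to_sparql_term is a hypothetical stand-in for the type mediation code), with plain strings keeping their current, backwards-compatible literal treatment:

```python
class UriParam:
    """Marks a parameter value as a URI rather than a string literal."""

    def __init__(self, uri):
        self.uri = uri

def to_sparql_term(value):
    """Render a query parameter as a SPARQL term."""
    if isinstance(value, UriParam):
        return '<{}>'.format(value.uri)       # URI reference
    return '"{}"'.format(value)               # string literal (default)

assert to_sparql_term('http://something.com/whatever') == '"http://something.com/whatever"'
assert to_sparql_term(UriParam('http://something.com/whatever')) == '<http://something.com/whatever>'
```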

Separate API for Resources

Solving issues #41 and #42 would almost surely require resources to be set with a richer interface than a dict. It would be valuable to have a separate interface for getting and setting resources on a dataset, with parameters for:

  • URL
  • Description
  • Data dictionary

Use Unicode compatible CSV reader

Right now, getting results as CSV or via __repr__ in the IPython shell will throw a UnicodeEncodeError in Python 2.x if the results contain non-ASCII characters.

We should implement a UnicodeCSVReader (to solve the results case). Not sure of the best way to solve the IPython __repr__ case.
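The failure is specific to Python 2's csv module, which round-trips bytes through the ascii codec. The reader side can be sketched with Python 3 semantics: decode the byte stream once, up front, and hand csv only text (unicode_csv_rows is a hypothetical helper, not the proposed UnicodeCSVReader itself):

```python
import csv
import io

def unicode_csv_rows(byte_stream, encoding='utf-8'):
    """Decode bytes to text before parsing, so non-ASCII content
    never touches an implicit ascii codec."""
    text = io.TextIOWrapper(byte_stream, encoding=encoding, newline='')
    return csv.reader(text)

data = io.BytesIO('foo,bar\r\n13,искам уникод запис\r\n'.encode('utf-8'))
rows = list(unicode_csv_rows(data))
assert rows[1] == ['13', 'искам уникод запис']
```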

RestApiClient.create_dataset() should return dataset id

The RestApiClient.create_dataset() method currently returns None, which makes it difficult to automate further processing of the created dataset. It should return the dataset id of the newly created dataset, or the return value from RestApiClient.get_dataset().
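One possible approach, assuming the dataset-creation response carries a Location header whose URL ends in owner/id (an assumption about the API's response shape, not confirmed here):

```python
def dataset_key_from_location(location_url):
    """Derive an 'owner/id' dataset key from a Location URL (hypothetical)."""
    owner, dataset_id = location_url.rstrip('/').rsplit('/', 2)[-2:]
    return '{}/{}'.format(owner, dataset_id)

assert dataset_key_from_location(
    'https://api.data.world/v0/datasets/shad/testing') == 'shad/testing'
```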

Failed building wheel for cchardet

It won't complete my pip install of datadotworld, with or without pandas. I'm sort of new to this, so I have no idea what's going on with this error and couldn't find it online. I think the error is near the end (same as the title). Running macOS Sierra. Here's what the terminal is saying:

Collecting datadotworld
Using cached datadotworld-1.4.2-py2.py3-none-any.whl
Requirement already satisfied: python-dateutil<3.0a,>=2.6.0 in /Applications/anaconda/lib/python2.7/site-packages (from datadotworld)
Requirement already satisfied: requests<3.0a,>=2.0.0 in /Applications/anaconda/lib/python2.7/site-packages (from datadotworld)
Requirement already satisfied: certifi>=2017.04.17 in /Applications/anaconda/lib/python2.7/site-packages (from datadotworld)
Collecting datapackage<1.0a,>=0.8.8 (from datadotworld)
Using cached datapackage-0.8.9-py2.py3-none-any.whl
Collecting urllib3<2.0a,>=1.15 (from datadotworld)
Using cached urllib3-1.22-py2.py3-none-any.whl
Requirement already satisfied: configparser<4.0a,>=3.5.0 in /Applications/anaconda/lib/python2.7/site-packages (from datadotworld)
Collecting jsontableschema<1.0a,>=0.10.0 (from datadotworld)
Using cached jsontableschema-0.10.1-py2.py3-none-any.whl
Requirement already satisfied: click<7.0a,>=6.0 in /Applications/anaconda/lib/python2.7/site-packages (from datadotworld)
Requirement already satisfied: six<2.0a,>=1.5.0 in /Applications/anaconda/lib/python2.7/site-packages (from datadotworld)
Collecting tabulator<=1.4.1 (from datadotworld)
Using cached tabulator-1.4.1-py2.py3-none-any.whl
Requirement already satisfied: unicodecsv<1.0a,>=0.14 in /Applications/anaconda/lib/python2.7/site-packages (from datapackage<1.0a,>=0.8.8->datadotworld)
Requirement already satisfied: jsonschema<3.0a,>=2.5 in /Applications/anaconda/lib/python2.7/site-packages (from datapackage<1.0a,>=0.8.8->datadotworld)
Collecting rfc3986<1.0,>=0.4 (from jsontableschema<1.0a,>=0.10.0->datadotworld)
Using cached rfc3986-0.4.1-py2.py3-none-any.whl
Collecting isodate<1.0,>=0.5.4 (from jsontableschema<1.0a,>=0.10.0->datadotworld)
Using cached isodate-0.6.0-py2.py3-none-any.whl
Collecting future<1.0,>=0.15 (from jsontableschema<1.0a,>=0.10.0->datadotworld)
Requirement already satisfied: jsonlines<2.0,>=1.1 in /Applications/anaconda/lib/python2.7/site-packages (from tabulator<=1.4.1->datadotworld)
Requirement already satisfied: sqlalchemy<2.0,>=1.1 in /Applications/anaconda/lib/python2.7/site-packages (from tabulator<=1.4.1->datadotworld)
Collecting cchardet<2.0,>=1.0 (from tabulator<=1.4.1->datadotworld)
Using cached cchardet-1.1.3.tar.gz
Collecting ijson<3.0,>=2.0 (from tabulator<=1.4.1->datadotworld)
Using cached ijson-2.3-py2.py3-none-any.whl
Collecting linear-tsv<2.0,>=1.0 (from tabulator<=1.4.1->datadotworld)
Requirement already satisfied: xlrd<2.0,>=1.0 in /Applications/anaconda/lib/python2.7/site-packages (from tabulator<=1.4.1->datadotworld)
Requirement already satisfied: openpyxl<3.0,>=2.4 in /Applications/anaconda/lib/python2.7/site-packages (from tabulator<=1.4.1->datadotworld)
Requirement already satisfied: functools32 in /Applications/anaconda/lib/python2.7/site-packages (from jsonschema<3.0a,>=2.5->datapackage<1.0a,>=0.8.8->datadotworld)
Requirement already satisfied: jdcal in /Applications/anaconda/lib/python2.7/site-packages (from openpyxl<3.0,>=2.4->tabulator<=1.4.1->datadotworld)
Requirement already satisfied: et_xmlfile in /Applications/anaconda/lib/python2.7/site-packages (from openpyxl<3.0,>=2.4->tabulator<=1.4.1->datadotworld)
Building wheels for collected packages: cchardet
Running setup.py bdist_wheel for cchardet ... error
Complete output from command /Applications/anaconda/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/2z/pnm8d8f56dd8c3xd8qf6g8jr0000gn/T/pip-build-KxbsQY/cchardet/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /var/folders/2z/pnm8d8f56dd8c3xd8qf6g8jr0000gn/T/tmpIPYH1npip-wheel- --python-tag cp27:
cythonize: ['src/cchardet/_cchardet.pyx']
running bdist_wheel
running build
running build_py
creating build
creating build/lib.macosx-10.7-x86_64-2.7
creating build/lib.macosx-10.7-x86_64-2.7/cchardet
copying src/cchardet/__init__.py -> build/lib.macosx-10.7-x86_64-2.7/cchardet
copying src/cchardet/version.py -> build/lib.macosx-10.7-x86_64-2.7/cchardet
running build_ext
building 'cchardet._cchardet' extension
creating build/temp.macosx-10.7-x86_64-2.7
creating build/temp.macosx-10.7-x86_64-2.7/src
creating build/temp.macosx-10.7-x86_64-2.7/src/cchardet
creating build/temp.macosx-10.7-x86_64-2.7/src/ext
creating build/temp.macosx-10.7-x86_64-2.7/src/ext/libcharsetdetect
creating build/temp.macosx-10.7-x86_64-2.7/src/ext/libcharsetdetect/mozilla
creating build/temp.macosx-10.7-x86_64-2.7/src/ext/libcharsetdetect/mozilla/extensions
creating build/temp.macosx-10.7-x86_64-2.7/src/ext/libcharsetdetect/mozilla/extensions/universalchardet
creating build/temp.macosx-10.7-x86_64-2.7/src/ext/libcharsetdetect/mozilla/extensions/universalchardet/src
creating build/temp.macosx-10.7-x86_64-2.7/src/ext/libcharsetdetect/mozilla/extensions/universalchardet/src/base
gcc -fno-strict-aliasing -I/Applications/anaconda/include -arch x86_64 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -Isrc/ext/libcharsetdetect/mozilla/extensions/universalchardet/src/base/ -Isrc/ext/libcharsetdetect/nspr-emu/ -Isrc/ext/libcharsetdetect/ -I/Applications/anaconda/include/python2.7 -c src/cchardet/_cchardet.cpp -o build/temp.macosx-10.7-x86_64-2.7/src/cchardet/_cchardet.o

Agreeing to the Xcode/iOS license requires admin privileges, please run “sudo xcodebuild -license” and then retry this command.

error: command 'gcc' failed with exit status 69


Failed building wheel for cchardet
Running setup.py clean for cchardet
Failed to build cchardet
Installing collected packages: cchardet, ijson, linear-tsv, tabulator, rfc3986, isodate, future, jsontableschema, datapackage, urllib3, datadotworld
Running setup.py install for cchardet ... error
Complete output from command /Applications/anaconda/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/2z/pnm8d8f56dd8c3xd8qf6g8jr0000gn/T/pip-build-KxbsQY/cchardet/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /var/folders/2z/pnm8d8f56dd8c3xd8qf6g8jr0000gn/T/pip-BpLyfT-record/install-record.txt --single-version-externally-managed --compile:
cythonize: ['src/cchardet/_cchardet.pyx']
running install
running build
running build_py
creating build
creating build/lib.macosx-10.7-x86_64-2.7
creating build/lib.macosx-10.7-x86_64-2.7/cchardet
copying src/cchardet/__init__.py -> build/lib.macosx-10.7-x86_64-2.7/cchardet
copying src/cchardet/version.py -> build/lib.macosx-10.7-x86_64-2.7/cchardet
running build_ext
building 'cchardet._cchardet' extension
creating build/temp.macosx-10.7-x86_64-2.7
creating build/temp.macosx-10.7-x86_64-2.7/src
creating build/temp.macosx-10.7-x86_64-2.7/src/cchardet
creating build/temp.macosx-10.7-x86_64-2.7/src/ext
creating build/temp.macosx-10.7-x86_64-2.7/src/ext/libcharsetdetect
creating build/temp.macosx-10.7-x86_64-2.7/src/ext/libcharsetdetect/mozilla
creating build/temp.macosx-10.7-x86_64-2.7/src/ext/libcharsetdetect/mozilla/extensions
creating build/temp.macosx-10.7-x86_64-2.7/src/ext/libcharsetdetect/mozilla/extensions/universalchardet
creating build/temp.macosx-10.7-x86_64-2.7/src/ext/libcharsetdetect/mozilla/extensions/universalchardet/src
creating build/temp.macosx-10.7-x86_64-2.7/src/ext/libcharsetdetect/mozilla/extensions/universalchardet/src/base
gcc -fno-strict-aliasing -I/Applications/anaconda/include -arch x86_64 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -Isrc/ext/libcharsetdetect/mozilla/extensions/universalchardet/src/base/ -Isrc/ext/libcharsetdetect/nspr-emu/ -Isrc/ext/libcharsetdetect/ -I/Applications/anaconda/include/python2.7 -c src/cchardet/_cchardet.cpp -o build/temp.macosx-10.7-x86_64-2.7/src/cchardet/_cchardet.o

Agreeing to the Xcode/iOS license requires admin privileges, please run “sudo xcodebuild -license” and then retry this command.


error: command 'gcc' failed with exit status 69

----------------------------------------

Command "/Applications/anaconda/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/2z/pnm8d8f56dd8c3xd8qf6g8jr0000gn/T/pip-build-KxbsQY/cchardet/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /var/folders/2z/pnm8d8f56dd8c3xd8qf6g8jr0000gn/T/pip-BpLyfT-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /private/var/folders/2z/pnm8d8f56dd8c3xd8qf6g8jr0000gn/T/pip-build-KxbsQY/cchardet/

Re-implement API functions using requests

Retire the Swagger auto-generated client in favor of a simpler implementation based on requests.

Start by getting rid of:

  • .swagger-codegen
  • _swagger
  • .swagger-codegen-ignore
  • swagger-codegen-config.json
  • swagger-swapi-def.json
  • Makefile (update_swagger_codegen target)

The pattern I would like to encourage is one where:

  1. Each module (*.py file) represents a section of the data.world API (e.g. projects, datasets, insights, etc.). The ApiClient class should aggregate all modules and serve as a point of access to all of them (e.g. ApiClient.projects, ApiClient.datasets, etc.)
  2. Each function follows a naming convention (TBD) and maps positional args to endpoint path parameters and keyword args to body parameters (see: https://github.com/datadotworld/target-datadotworld/blob/master/target_datadotworld/api_client.py#L164)
    2.1. If Content-Type needs to be defined (e.g. streams): TBD
    2.2. If upload parts need to be defined (e.g. multi-part file upload): TBD
    2.3. If the Accept header needs to be defined (e.g. sql): TBD
  3. A request session object defines common headers such as Authorization and User-Agent (see: https://github.com/datadotworld/target-datadotworld/blob/master/target_datadotworld/api_client.py#L37)
  4. HTTP 429 responses are automatically handled (see: https://github.com/datadotworld/target-datadotworld/blob/master/target_datadotworld/api_client.py#L366)
  5. All validation and business rules are delegated to the server, and requests exceptions can be thrown as is (avoid anti-patterns like this: https://github.com/datadotworld/target-datadotworld/blob/master/target_datadotworld/api_client.py#L192)
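Items 2, 3 and 5 above could be sketched roughly as follows; the class name, method name, and base URL here are illustrative assumptions, not a final API:

```python
import requests


class RestApiClient(object):
    """Minimal sketch of the proposed pattern (all names illustrative)."""

    def __init__(self, token, base_url='https://api.data.world/v0'):
        self._base_url = base_url
        # Common headers defined once on a shared session (item 3)
        self._session = requests.Session()
        self._session.headers.update({
            'Authorization': 'Bearer {}'.format(token),
            'User-Agent': 'data.world-py',
        })

    def create_dataset(self, owner, **kwargs):
        # Positional args map to path parameters, keyword args map to
        # the request body (item 2); validation is left to the server,
        # and requests exceptions propagate as-is (item 5).
        response = self._session.post(
            '{}/datasets/{}'.format(self._base_url, owner), json=kwargs)
        response.raise_for_status()
        return response.json()
```

With this shape, each API section module only needs to know its paths; authentication and user-agent concerns live in one place on the session.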

Additionally:

  1. Every function must include detailed docstrings (**kwargs documentation should point to the appropriate model doc, for example, https://apidocs.data.world/api/models/datasetcreaterequest)
  2. Every function must be unit tested. Tests must assert that:
    2.1. HTTP request is composed with exactly the expected path, headers and body
    2.2. Function returns the correct values upon success
    2.3. Function returns the correct errors upon failure
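The automatic handling of HTTP 429 responses mentioned earlier could be sketched as a small wrapper, independent of any particular HTTP library; the helper name and defaults below are assumptions:

```python
import time


def with_rate_limit_retries(request_fn, max_attempts=5):
    """Wrap a request function so HTTP 429 responses are retried (sketch)."""
    def wrapped(*args, **kwargs):
        response = None
        for attempt in range(max_attempts):
            response = request_fn(*args, **kwargs)
            if response.status_code != 429:
                break
            # Honor Retry-After when the server sends it; otherwise
            # back off exponentially (1s, 2s, 4s, ...).
            delay = float(response.headers.get('Retry-After', 2 ** attempt))
            time.sleep(delay)
        return response
    return wrapped
```

The wrapper only inspects the status code and the Retry-After header, so it works with any response object exposing those attributes, which also makes it easy to unit test without a live server.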

IMPORTANT: To maximize the benefits of these improvements, I would suggest breaking compatibility (i.e. release as v2.0).

API Has Title Limit Not In Application

I'm using the API to create datasets with long titles (> 30 characters). Because titles are used in keys, they must be truncated to the 30-character key limit, and the key apparently can't end with an ellipsis ('...'), so the title is cut off abruptly.

The API throws an exception for long titles:

ValueError: Invalid value for title, length must be less than or equal to 30

However, the web application does not appear to enforce a 30-character limit, either for display titles or for the dataset key.

It would be preferable if the API either removed the 30-character limit or allowed a longer display title, with a separate value for the dataset key slug.
