datadotworld / data.world-py
Python package for data.world
Home Page: https://data.world/integrations/python
License: Apache License 2.0
In a SQL or SPARQL response, the literal string value 'NONE' (case-insensitive) is converted into the Python value None:
>>> datadotworld.query('bryon/odin-2015-2016', 'SELECT ?value WHERE{ BIND("NONE" AS ?value)}', query_type='sparql').table
[OrderedDict([('value', None)])]
>>> datadotworld.query('bryon/odin-2015-2016', 'SELECT ?value WHERE{ BIND("ABCD" AS ?value)}', query_type='sparql').table
[OrderedDict([('value', 'ABCD')])]
>>> datadotworld.query('bryon/odin-2015-2016', 'SELECT "NONE" AS value').table
[OrderedDict([('value', None)])]
>>> datadotworld.query('bryon/odin-2015-2016', 'SELECT "ABCD" AS value').table
[OrderedDict([('value', 'ABCD')])]
Datapackages are more complete now with the addition of original (untouched) files. As a result, LocalDataset.tables and LocalDataset.dataframes currently include two versions of tabular files (original + sanitized/normalized). The code should be modified so that only files in data/ are visible through LocalDataset.tables and LocalDataset.dataframes.
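The filtering could be as simple as checking each resource's path prefix. A minimal sketch, assuming a list of datapackage-style resource descriptors (the structure and field names here are illustrative, not the actual LocalDataset internals):

```python
def visible_table_names(resources):
    """Return names of resources whose path points inside data/."""
    return [r["name"] for r in resources
            if r.get("path", "").startswith("data/")]

resources = [
    {"name": "sales", "path": "data/sales.csv"},           # sanitized copy
    {"name": "sales_raw", "path": "original/sales.xlsx"},  # untouched upload
]
print(visible_table_names(resources))  # only the data/ copy is exposed
```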
When creating a dataset with a resource backed by a signed AWS S3 URL, the server returns a Bad Request response indicating that the URL is not valid.
Being able to submit signed URLs is valuable for linking private data into a private data.world dataset.
The web application accepts the pre-signed URL without error, but doesn't actually load the file.
Here is the server's exception:
datadotworld.client.api.RestApiError: (400)
Reason: Bad Request
HTTP response headers: HTTPHeaderDict({'Content-Type': 'application/json', 'Date': 'Tue, 16 May 2017 23:51:19 GMT', 'Server': 'nginx/1.8.1', 'Content-Length': '1281', 'Connection': 'keep-alive'})
HTTP response body: {"code":400,"message":"Invalid DatasetCreateRequest . Violations = [ConstraintViolationImpl{interpolatedMessage='Source URL must be a valid URL', propertyPath=files[1].source.url, rootBeanClass=class world.data.api.dtos.pub.DatasetCreateRequest, messageTemplate='Source URL {org.hibernate.validator.constraints.URL.message}'}, ConstraintViolationImpl{interpolatedMessage='Source URL must be a valid URL', propertyPath=files[4].source.url, rootBeanClass=class world.data.api.dtos.pub.DatasetCreateRequest, messageTemplate='Source URL {org.hibernate.validator.constraints.URL.message}'}, ConstraintViolationImpl{interpolatedMessage='Source URL must be a valid URL', propertyPath=files[3].source.url, rootBeanClass=class world.data.api.dtos.pub.DatasetCreateRequest, messageTemplate='Source URL {org.hibernate.validator.constraints.URL.message}'}, ConstraintViolationImpl{interpolatedMessage='Source URL must be a valid URL', propertyPath=files[2].source.url, rootBeanClass=class world.data.api.dtos.pub.DatasetCreateRequest, messageTemplate='Source URL {org.hibernate.validator.constraints.URL.message}'}] :: Source URL must be a valid URL;Source URL must be a valid URL;Source URL must be a valid URL;Source URL must be a valid URL","details":"af9ff6d1-aa58-495a-92e9-7c3b4eb3fd80"}
All our code is currently documented in accordance with NumPy/SciPy conventions (https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt), but that doesn't appear to work well in some places (e.g. PyCharm inline help).
All docstrings should be converted to reStructuredText.
Builds should fail if there are any lint errors flagged by flake8 or if test coverage is insufficient. Let's add those checks to .circleci/config.yml. For both, auto-generated code in _swagger should be excluded.
Potentially long-running operations, especially load_dataset() and query(), should show users a progress bar when used in interactive mode (e.g. Python shell / IPython).
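A sketch of the interactivity check and progress idea; the real implementation could delegate to a library such as tqdm (already in common environments) rather than writing to stderr directly:

```python
import sys

def is_interactive():
    # Show progress only when stdout is attached to a terminal
    # (python shell / IPython); stay silent in scripts and pipelines.
    return sys.stdout.isatty()

def with_progress(chunks, total):
    """Yield chunks unchanged, printing a simple counter in interactive mode."""
    show = is_interactive()
    for i, chunk in enumerate(chunks, start=1):
        if show:
            sys.stderr.write("\rDownloading chunk %d/%d" % (i, total))
        yield chunk
    if show:
        sys.stderr.write("\n")
```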
I got this list by running:
import json
import requests
# fetch the live API definition and strip non-ASCII characters before parsing
api_swagger = requests.get('https://api.data.world/v0/swagger.json').text
api_swagger = api_swagger.encode('ascii', 'ignore').decode('ascii')
api_swagger = json.loads(api_swagger)
# fetch the definition bundled with the Python client
python_swagger = requests.get('https://raw.githubusercontent.com/datadotworld/data.world-py/master/datadotworld/client/swagger-dwapi-def.json').text
python_swagger = python_swagger.encode('ascii', 'ignore').decode('ascii')
python_swagger = json.loads(python_swagger)
# endpoints present in the live API but missing from the Python client
print(set(api_swagger['paths'].keys()) - set(python_swagger['paths'].keys()))
It would be useful if a file could be extended without having to read its content first.
Currently, RestApiClient only supports uploading files from the local disk. It would be nice if it supported direct uploads of pandas.DataFrame objects too, and possibly other in-memory data structures.
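One way to support this without touching the disk is to serialize the dataframe into an in-memory buffer. A sketch only: `upload_file_stream` below is an assumed method name, not part of the current RestApiClient API.

```python
import io

def upload_dataframe(api_client, dataset_key, file_name, df):
    """Serialize an in-memory dataframe to CSV and upload it directly.

    `df` is anything with a pandas-style to_csv(buffer, index=...) method;
    `api_client.upload_file_stream` is hypothetical.
    """
    buffer = io.StringIO()
    df.to_csv(buffer, index=False)  # write CSV into memory, not to disk
    buffer.seek(0)
    api_client.upload_file_stream(dataset_key, file_name, buffer)
```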
The load_dataset function currently accepts two parameters: dataset_key and force_update. Currently, if the local version of the dataset is not the most recent, it raises a warning. Instead, I would like to add a 3rd parameter to load_dataset (suggested: auto_update) that is False by default, but when True, instead of raising a warning will simply update the local dataset to the latest version on data.world.
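The proposed decision logic can be sketched as follows (auto_update is the suggested new parameter, not yet part of load_dataset; the function and return values are illustrative):

```python
import warnings

def resolve_local_copy(is_stale, force_update=False, auto_update=False):
    """Decide whether to use the local dataset copy or download a fresh one."""
    if force_update:
        return "download"
    if is_stale:
        if auto_update:
            return "download"  # silently refresh instead of warning
        warnings.warn("Dataset is out of date; pass auto_update=True "
                      "to download the latest version.")
    return "use_local"
```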
Allow the API to set and report the data dictionary attached to a resource.
It's an awesome feature! So I'd really like to be able to use it programmatically.
Normal dicts are ordered starting in Python 3.6. The use of OrderedDict there could be avoided, which would result in a better user experience.
Trying to do a simple GET of datasets through data.world-py 1.5.0:
datasets = dw.api_client.fetch_datasets()
Getting the following:
...
File "/Users/bryon/.virtualenvs/datadotworld-bryon-labs/lib/python3.6/site-packages/datadotworld/client/_swagger/models/file_summary_response.py", line 70, in __init__
self.description = description
File "/Users/bryon/.virtualenvs/datadotworld-bryon-labs/lib/python3.6/site-packages/datadotworld/client/_swagger/models/file_summary_response.py", line 153, in description
raise ValueError("Invalid value for `description`, length must be greater than or equal to `1`")
ValueError: Invalid value for `description`, length must be greater than or equal to `1`
The offending code is:
https://github.com/datadotworld/data.world-py/blob/v1.5.0/datadotworld/client/_swagger/models/file_summary_response.py#L152-L153
Since the value is coming back from the server, this validation is incorrect: either there's a defect on the server allowing this value to sometimes be empty, or (more likely) the Python SDK should relax this validation when deserializing server responses.
You should never get an "invalid data" error on a GET.
After uploading a ZIP file with hierarchical path names, calling get_dataset() fails, with:
File "/Volumes/Storage/proj/virt/metatab3/lib/python3.5/site-packages/datadotworld/client/_swagger/models/file_summary_response.py", line 80, in name
raise ValueError("Invalid value for `name`, must be a follow pattern or equal to `/^[^/]+$/`")
ValueError: Invalid value for `name`, must be a follow pattern or equal to `/^[^/]+$/`
The file name produced by the server has '/' in it, but the API does not allow these in names.
Specifically, the problem seems to be either (a) the server generates incorrect names or (b) the validation pattern is wrong at swagger-dwapi-def.json : definitions.FileSummaryResponse.properties.name.pattern
Currently, cache data produced by calling raw_data, tables and dataframes is not invalidated when load_dataset() is called again. Instead, when load_dataset() is called, all cache entries for the given dataset_key should be invalidated.
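The invalidation step could look like this; the (dataset_key, resource_name) key layout is an assumption for illustration, not the actual internal cache structure:

```python
def invalidate_dataset_cache(cache, dataset_key):
    """Drop every cached entry belonging to dataset_key.

    Assumes cache keys are (dataset_key, resource_name) tuples.
    """
    # Materialize the key list first so we can delete while iterating safely
    for key in [k for k in cache if k[0] == dataset_key]:
        del cache[key]
```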
Currently, query results are downloaded eagerly, in spite of the use of stream=True. Instead, the query method should return an iterable that leverages requests' iter_content() to download the data on demand.
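A minimal sketch of the lazy approach, assuming a `requests` response obtained with stream=True (iter_lines() pulls from the socket only as rows are consumed):

```python
def iter_query_rows(response):
    """Turn a streamed requests response into a lazy row iterator."""
    for line in response.iter_lines(decode_unicode=True):
        if line:  # skip keep-alive blank lines
            yield line
```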
I was at a conference and got this error and couldn't move forward.

dw.load_dataset('data-society/the-simpsons-by-the-data')

AttributeError Traceback (most recent call last)
in ()
----> 1 lds = dw.load_dataset('data-society/the-simpsons-by-the-data') # , force_update=True)

/Users/plee/anaconda/lib/python2.7/site-packages/datadotworld/__init__.pyc in load_dataset(dataset_key, force_update, profile)
87 ['changelog', 'datadotworldbballstats', 'datadotworldbballteam']
88 """
---> 89 return _get_instance(profile).load_dataset(dataset_key,
90 force_update=force_update)
91

/Users/plee/anaconda/lib/python2.7/site-packages/datadotworld/__init__.pyc in _get_instance(profile)
42 if instance is None:
43 config_param = (ChainedConfig()
---> 44 if profile == 'default'
45 else FileConfig(profile=profile))
46 instance = DataDotWorld(config=config_param)

/Users/plee/anaconda/lib/python2.7/site-packages/datadotworld/config.pyc in __init__(self, **kwargs)
203 # Overrides (for testing)
204 self._config_chain = kwargs.get('config_chain',
--> 205 [EnvConfig(), FileConfig()])
206
207 def __getattribute__(self, item):

/Users/plee/anaconda/lib/python2.7/site-packages/datadotworld/config.pyc in __init__(self, profile, **kwargs)
108
109 if path.isfile(self._config_file_path):
--> 110 self._config_parser.read_file(open(self._config_file_path))
111 if self.__migrate_invalid_defaults(self._config_parser) > 0:
112 self.save()

AttributeError: SafeConfigParser instance has no attribute 'read_file'
Support for more possible types as query parameters should be added. date and datetime should map naturally to their XSD counterparts:
https://docs.python.org/3/library/datetime.html
To give the developer complete capability to express values, add RdfLiteralParam, which will have two parts, value and type, where value is a string and type is a URI - and which will render into "{value}"^^<{type}>. Existing code in datadotworld.convert_to_sparql_literal should be refactored to use this class.
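A minimal sketch of the proposed class, following the rendering rule described above:

```python
class RdfLiteralParam(object):
    """Proposed wrapper pairing a string value with a datatype URI."""

    def __init__(self, value, type):
        self.value = value
        self.type = type

    def __str__(self):
        # Render as a typed RDF literal: "{value}"^^<{type}>
        return '"{}"^^<{}>'.format(self.value, self.type)
```

For example, a date parameter would render with its XSD datatype URI attached, letting the SPARQL engine interpret it correctly.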
The API should report resource descriptions and allow them to be set.
For SPARQL queries, support:
DESCRIBE and CONSTRUCT queries as pure RDF, skipping table schema inferencing
SELECT queries in cases where variables are not typed consistently
uri and bnode terms in query results

lds = dw.load_dataset('data-society/the-simpsons-by-the-data')
gives me:

ValueError Traceback (most recent call last)
in ()
1 # load dataset
----> 2 lds = dw.load_dataset('data-society/the-simpsons-by-the-data')
C:\Users\LW130003\Anaconda3\lib\site-packages\datadotworld\__init__.py in load_dataset(dataset_key, force_update, auto_update, profile, **kwargs)
99 load_dataset(dataset_key,
100 force_update=force_update,
--> 101 auto_update=auto_update)
102
103

C:\Users\LW130003\Anaconda3\lib\site-packages\datadotworld\datadotworld.py in load_dataset(self, dataset_key, force_update, auto_update)
160 else:
161 try:
--> 162 dataset_info = self.api_client.get_dataset(dataset_key)
163 except RestApiError as e:
164 return LocalDataset(descriptor_file)

C:\Users\LW130003\Anaconda3\lib\site-packages\datadotworld\client\api.py in get_dataset(self, dataset_key)
96 try:
97 return self._datasets_api.get_dataset(
---> 98 *(parse_dataset_key(dataset_key))).to_dict()
99 except _swagger.rest.ApiException as e:
100 raise RestApiError(cause=e)

C:\Users\LW130003\Anaconda3\lib\site-packages\datadotworld\client\_swagger\apis\datasets_api.py in get_dataset(self, owner, id, **kwargs)
644 return self.get_dataset_with_http_info(owner, id, **kwargs)
645 else:
--> 646 (data) = self.get_dataset_with_http_info(owner, id, **kwargs)
647 return data
648

C:\Users\LW130003\Anaconda3\lib\site-packages\datadotworld\client\_swagger\apis\datasets_api.py in get_dataset_with_http_info(self, owner, id, **kwargs)
727 _preload_content=params.get('_preload_content', True),
728 _request_timeout=params.get('_request_timeout'),
--> 729 collection_formats=collection_formats)
730
731 def patch_dataset(self, owner, id, body, **kwargs):

C:\Users\LW130003\Anaconda3\lib\site-packages\datadotworld\client\_swagger\api_client.py in call_api(self, resource_path, method, path_params, query_params, header_params, body, post_params, files, response_type, auth_settings, callback, _return_http_data_only, collection_formats, _preload_content, _request_timeout)
324 body, post_params, files,
325 response_type, auth_settings, callback,
--> 326 _return_http_data_only, collection_formats, _preload_content, _request_timeout)
327 else:
328 thread = threading.Thread(target=self.__call_api,

C:\Users\LW130003\Anaconda3\lib\site-packages\datadotworld\client\_swagger\api_client.py in __call_api(self, resource_path, method, path_params, query_params, header_params, body, post_params, files, response_type, auth_settings, callback, _return_http_data_only, collection_formats, _preload_content, _request_timeout)
159 # deserialize response data
160 if response_type:
--> 161 return_data = self.deserialize(response_data, response_type)
162 else:
163 return_data = None

C:\Users\LW130003\Anaconda3\lib\site-packages\datadotworld\client\_swagger\api_client.py in deserialize(self, response, response_type)
237 data = response.data
238
--> 239 return self.__deserialize(data, response_type)
240
241 def __deserialize(self, data, klass):

C:\Users\LW130003\Anaconda3\lib\site-packages\datadotworld\client\_swagger\api_client.py in __deserialize(self, data, klass)
277 return self.__deserialize_datatime(data)
278 else:
--> 279 return self.__deserialize_model(data, klass)
280
281 def call_api(self, resource_path, method,

C:\Users\LW130003\Anaconda3\lib\site-packages\datadotworld\client\_swagger\api_client.py in __deserialize_model(self, data, klass)
627 and isinstance(data, (list, dict)):
628 value = data[klass.attribute_map[attr]]
--> 629 kwargs[attr] = self.__deserialize(value, attr_type)
630
631 instance = klass(**kwargs)

C:\Users\LW130003\Anaconda3\lib\site-packages\datadotworld\client\_swagger\api_client.py in __deserialize(self, data, klass)
255 sub_kls = re.match('list[(.*)]', klass).group(1)
256 return [self.__deserialize(sub_data, sub_kls)
--> 257 for sub_data in data]
258
259 if klass.startswith('dict('):

C:\Users\LW130003\Anaconda3\lib\site-packages\datadotworld\client\_swagger\api_client.py in <listcomp>(.0)
255 sub_kls = re.match('list[(.*)]', klass).group(1)
256 return [self.__deserialize(sub_data, sub_kls)
--> 257 for sub_data in data]
258
259 if klass.startswith('dict('):

C:\Users\LW130003\Anaconda3\lib\site-packages\datadotworld\client\_swagger\api_client.py in __deserialize(self, data, klass)
277 return self.__deserialize_datatime(data)
278 else:
--> 279 return self.__deserialize_model(data, klass)
280
281 def call_api(self, resource_path, method,

C:\Users\LW130003\Anaconda3\lib\site-packages\datadotworld\client\_swagger\api_client.py in __deserialize_model(self, data, klass)
629 kwargs[attr] = self.__deserialize(value, attr_type)
630
--> 631 instance = klass(**kwargs)
632
633 return instance

C:\Users\LW130003\Anaconda3\lib\site-packages\datadotworld\client\_swagger\models\file_summary_response.py in __init__(self, name, source, description, labels, size_in_bytes, created, updated)
68 self.source = source
69 if description is not None:
---> 70 self.description = description
71 if labels is not None:
72 self.labels = labels

C:\Users\LW130003\Anaconda3\lib\site-packages\datadotworld\client\_swagger\models\file_summary_response.py in description(self, description)
151 raise ValueError("Invalid value for `description`, length must be less than or equal to `240`")
152 if description is not None and len(description) < 1:
--> 153 raise ValueError("Invalid value for `description`, length must be greater than or equal to `1`")
154
155 self._description = description

ValueError: Invalid value for `description`, length must be greater than or equal to `1`
The test_tables test in test_dataset.py is broken - it APPEARS that the issue is that a newer version of libraries was pulled in via datapackage.
Python 2.7.14
pip list:
Package Version
apipkg 1.4
attrs 18.1.0
awsebcli 3.12.1
backports.shutil-get-terminal-size 1.0.0
beautifulsoup4 4.6.0
boto3 1.5.22
botocore 1.8.50
CacheControl 0.12.4
cchardet 1.1.3
cement 2.8.2
certifi 2018.4.16
chardet 3.0.4
click 6.7
colorama 0.3.7
configparser 3.5.0
coverage 4.0.3
cssselect 1.0.3
datadotworld 1.6.0
datapackage 0.8.9
decorator 4.3.0
Django 1.11.13
django-debug-toolbar 1.9.1
django-extensions 1.9.9
django-rest-auth 0.9.3
django-storages 1.6.5
django-webtest 1.9.2
djangorestframework 3.7.7
docker-py 1.7.2
dockerpty 0.4.1
docopt 0.6.2
docutils 0.14
drf-extensions 0.3.1
enum34 1.1.6
et-xmlfile 1.0.1
execnet 1.5.0
flake8 3.5.0
flake8-polyfill 1.0.2
freezegun 0.3.9
funcsigs 1.0.2
functools32 3.2.3.post2
future 0.16.0
futures 3.2.0
idna 2.6
ijson 2.3
ipdb 0.11
ipython 5.5.0
ipython-genutils 0.2.0
isodate 0.6.0
jdcal 1.4
jmespath 0.9.3
jsonlines 1.2.0
jsonschema 2.6.0
jsontableschema 0.10.1
linear-tsv 1.1.0
lxml 4.2.1
mccabe 0.6.1
mock 2.0.0
model-mommy 1.5.1
msgpack-python 0.5.6
numpy 1.14.3
openpyxl 2.5.3
pandas 0.23.0
pathlib2 2.3.2
pathspec 0.5.0
pbr 4.0.3
pep8 1.7.1
pep8-naming 0.5.0
pickleshare 0.7.4
Pillow 5.0.0
pip 10.0.1
pluggy 0.6.0
prompt-toolkit 1.0.15
psycopg2 2.7.4
py 1.5.3
pycodestyle 2.3.1
pyflakes 1.5.0
Pygments 2.2.0
pyquery 1.4.0
pytest 3.3.2
pytest-cache 1.0
pytest-cov 2.5.1
pytest-django 3.1.2
pytest-flake8 0.9.1
pytest-pep8 1.0.6
python-coveralls 2.9.1
python-dateutil 2.7.3
pytz 2017.3
PyYAML 3.12
requests 2.8.0
rfc3986 0.4.1
s3transfer 0.1.13
scandir 1.7
semantic-version 2.5.0
setuptools 39.2.0
sh 1.12.14
simplegeneric 0.8.1
six 1.11.0
SQLAlchemy 1.2.8
sqlparse 0.2.4
tabulate 0.7.5
tabulator 1.4.1
termcolor 1.1.0
tqdm 4.19.5
traitlets 4.3.2
typing 3.6.4
unicodecsv 0.14.1
urllib3 1.22
waitress 1.1.0
wcwidth 0.1.7
WebOb 1.8.1
websocket-client 0.48.0
WebTest 2.0.29
wheel 0.31.1
whitenoise 3.3.1
win-unicode-console 0.5
xlrd 1.1.0
I'm running into this error when I run dw configure:
File "/usr/local/bin/dw", line 6, in <module>
from pkg_resources import load_entry_point
File "/Library/Python/2.7/site-packages/pkg_resources/__init__.py", line 2991, in <module>
@_call_aside
File "/Library/Python/2.7/site-packages/pkg_resources/__init__.py", line 2977, in _call_aside
f(*args, **kwargs)
File "/Library/Python/2.7/site-packages/pkg_resources/__init__.py", line 3004, in _initialize_master_working_set
working_set = WorkingSet._build_master()
File "/Library/Python/2.7/site-packages/pkg_resources/__init__.py", line 664, in _build_master
return cls._build_from_requirements(__requires__)
File "/Library/Python/2.7/site-packages/pkg_resources/__init__.py", line 677, in _build_from_requirements
dists = ws.resolve(reqs, Environment())
File "/Library/Python/2.7/site-packages/pkg_resources/__init__.py", line 861, in resolve
raise VersionConflict(dist, req).with_context(dependent_req)
pkg_resources.ContextualVersionConflict: (requests 2.5.3 (/Library/Python/2.7/site-packages), Requirement.parse('requests<3.0a,>=2.8'), set(['datapackage', 'tabulator']))
Anyone know how to debug this?
After inputting the API key after dw configure, I got back this Python error:
Traceback (most recent call last):
File "/usr/local/bin/dw", line 11, in <module>
load_entry_point('datadotworld==1.0.0b4', 'console_scripts', 'dw')()
File "/Library/Python/2.7/site-packages/click/core.py", line 716, in __call__
return self.main(*args, **kwargs)
File "/Library/Python/2.7/site-packages/click/core.py", line 696, in main
rv = self.invoke(ctx)
File "/Library/Python/2.7/site-packages/click/core.py", line 1060, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Library/Python/2.7/site-packages/click/core.py", line 889, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/Library/Python/2.7/site-packages/click/core.py", line 534, in invoke
return callback(*args, **kwargs)
File "/Library/Python/2.7/site-packages/click/decorators.py", line 27, in new_func
return f(get_current_context().obj, *args, **kwargs)
File "/Library/Python/2.7/site-packages/datadotworld/cli.py", line 55, in configure
config.auth_token = token
File "/Library/Python/2.7/site-packages/datadotworld/config.py", line 82, in auth_token
self._config_parser.add_section(self._profile)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ConfigParser.py", line 261, in add_section
raise ValueError, 'Invalid section name: %s' % section
ValueError: Invalid section name: default
What should I do about this?
Please take a moment to manually test all new features introduced in 1.4.3. If possible, follow the steps in the current README and update it accordingly.
Adding test code:
# coding=utf-8
import csv
import datadotworld as dw
DW_USER_NAME = "my_username"
DW_PROJECT_NAME = "my_projectname"
csv_filename = 'test_saving_unicode_strings.csv'
dw_project_path = '{}/{}'.format(DW_USER_NAME, DW_PROJECT_NAME)
with dw.open_remote_file(dw_project_path, csv_filename) as w:
    csvw = csv.DictWriter(w, fieldnames=['foo', 'bar'])
    csvw.writeheader()
    csvw.writerow({'foo': 42, 'bar': u"A"})
    csvw.writerow({'foo': 13, 'bar': u"искам уникод запис"})
What I have tried: passing encoding="utf-8" to the open_remote_file call, and encoding the values to UTF-8 bytes before writing. Each time I receive a UnicodeEncodeError or UnicodeDecodeError in a different place:
Traceback (most recent call last):
File ".\a2.py", line 13, in <module>
csvw.writerow({'foo':13, 'bar':u"искам уникод запис"})
File "c:\python27\Lib\csv.py", line 152, in writerow
return self.writer.writerow(self._dict_to_list(rowdict))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-4: ordinal not in range(128)
or
Traceback (most recent call last):
File ".\a2.py", line 13, in <module>
csvw.writerow({'foo':13, 'bar':u"искам уникод запис".encode("utf-8")})
File "c:\python27\Lib\csv.py", line 152, in writerow
return self.writer.writerow(self._dict_to_list(rowdict))
File "C:\python27\Lib\site-packages\datadotworld\files.py", line 107, in write
self._queue.put(value.encode('utf-8'))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 3: ordinal not in range(128)
With the new update, I have this issue after putting my API key into dw configure:
Traceback (most recent call last):
File "/usr/local/bin/dw", line 11, in <module>
load_entry_point('datadotworld==1.0.0b5', 'console_scripts', 'dw')()
File "/Library/Python/2.7/site-packages/click/core.py", line 722, in __call__
return self.main(*args, **kwargs)
File "/Library/Python/2.7/site-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/Library/Python/2.7/site-packages/click/core.py", line 1066, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Library/Python/2.7/site-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/Library/Python/2.7/site-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "/Library/Python/2.7/site-packages/click/decorators.py", line 27, in new_func
return f(get_current_context().obj, *args, **kwargs)
File "/Library/Python/2.7/site-packages/datadotworld/cli.py", line 54, in configure
config = obj.get('config') or Config(obj['profile'])
File "/Library/Python/2.7/site-packages/datadotworld/config.py", line 65, in __init__
self._config_file_path)
File "/Library/Python/2.7/site-packages/datadotworld/config.py", line 117, in __migrate_config
config_parser[configparser.DEFAULTSECT] = {'auth_token': token}
File "/Library/Python/2.7/site-packages/backports/configparser/__init__.py", line 992, in __setitem__
self.read_dict({key: value})
File "/Library/Python/2.7/site-packages/backports/configparser/__init__.py", line 760, in read_dict
self.set(section, key, value)
File "/Library/Python/2.7/site-packages/backports/configparser/__init__.py", line 1238, in set
_, option, value = self._validate_value_types(option=option, value=value)
File "/Library/Python/2.7/site-packages/backports/configparser/__init__.py", line 1221, in _validate_value_types
raise TypeError("option values must be strings")
TypeError: option values must be strings
What should I do about this?
When sending a SPARQL query with parameters, all Python values are converted into boolean, integer, or decimal, or assumed to be strings - it's not possible to pass a value that is meant to be treated as a URI.
Since there's no deterministic way to determine whether the string value 'http://something.com/whatever' is meant to be treated as a string or a URI, we need something in the Python type system to differentiate the two cases. Since the current behavior is to treat them as strings, we should make that the default for backwards compatibility - and that's a reasonable default in any case.
I'm proposing to add a simple "wrapper type" UriParam that can be used to indicate that a parameter value is meant to be treated as a URI - then the type mediation code can do the right thing to convert the value into a URI parameter.
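The wrapper and the mediation step could look like this; `to_sparql_term` is an illustrative stand-in for the real conversion code, not its actual name:

```python
class UriParam(object):
    """Proposed wrapper marking a parameter value as a URI."""

    def __init__(self, uri):
        self.uri = uri

def to_sparql_term(value):
    """Render a parameter as a SPARQL term: URIs as <uri>, plain strings
    as literals (preserving the current default for backwards compatibility)."""
    if isinstance(value, UriParam):
        return "<{}>".format(value.uri)
    return '"{}"'.format(value)
```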
__repr__ should return something easier to read. If possible, it should return a dict with str keys and only lazily loaded values.
This way I can handle multiple users at once.
Request JSON when invoking query endpoints and use the type metadata in the response to apply appropriate Python and pandas types to values, as opposed to using str for every value.
Right now, getting results as CSV or via __repr__ in the IPython shell will throw a UnicodeEncodeError in Python 2.x if the results contain non-ASCII characters.
We should implement a UnicodeCSVReader (to solve the results case). Not sure of the best way to solve the IPython __repr__ case.
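For the results case, the core of the fix is decoding the raw response bytes explicitly before parsing, so nothing falls back to the implicit ascii codec. A minimal sketch of the idea (not the actual implementation):

```python
import csv
import io

def unicode_csv_rows(raw_bytes, encoding="utf-8"):
    """Parse CSV bytes into rows, decoding with an explicit encoding first."""
    return list(csv.reader(io.StringIO(raw_bytes.decode(encoding))))
```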
Update api_client.py and the auto-generated swagger classes to add support for the new /streams endpoint.
The RestApiClient.create_dataset() method currently returns None, which makes it difficult to automate later processing on the created dataset. It should return the dataset id of the newly created dataset, or the return value from RestApiClient.get_dataset()
Currently, LocalDataset consumes too much memory when loading dataframes, for example. Most likely because of how it uses datapackage-py in its underlying implementation.
It won't complete my pip install of datadotworld, with or without pandas. Sort of new to this so I have no idea what's going on with this error and couldn't find it online. I think the error is at the end where I made the text bold (same as the title). Running macOS Sierra. Here's what terminal is saying:
Collecting datadotworld
Using cached datadotworld-1.4.2-py2.py3-none-any.whl
Requirement already satisfied: python-dateutil<3.0a,>=2.6.0 in /Applications/anaconda/lib/python2.7/site-packages (from datadotworld)
Requirement already satisfied: requests<3.0a,>=2.0.0 in /Applications/anaconda/lib/python2.7/site-packages (from datadotworld)
Requirement already satisfied: certifi>=2017.04.17 in /Applications/anaconda/lib/python2.7/site-packages (from datadotworld)
Collecting datapackage<1.0a,>=0.8.8 (from datadotworld)
Using cached datapackage-0.8.9-py2.py3-none-any.whl
Collecting urllib3<2.0a,>=1.15 (from datadotworld)
Using cached urllib3-1.22-py2.py3-none-any.whl
Requirement already satisfied: configparser<4.0a,>=3.5.0 in /Applications/anaconda/lib/python2.7/site-packages (from datadotworld)
Collecting jsontableschema<1.0a,>=0.10.0 (from datadotworld)
Using cached jsontableschema-0.10.1-py2.py3-none-any.whl
Requirement already satisfied: click<7.0a,>=6.0 in /Applications/anaconda/lib/python2.7/site-packages (from datadotworld)
Requirement already satisfied: six<2.0a,>=1.5.0 in /Applications/anaconda/lib/python2.7/site-packages (from datadotworld)
Collecting tabulator<=1.4.1 (from datadotworld)
Using cached tabulator-1.4.1-py2.py3-none-any.whl
Requirement already satisfied: unicodecsv<1.0a,>=0.14 in /Applications/anaconda/lib/python2.7/site-packages (from datapackage<1.0a,>=0.8.8->datadotworld)
Requirement already satisfied: jsonschema<3.0a,>=2.5 in /Applications/anaconda/lib/python2.7/site-packages (from datapackage<1.0a,>=0.8.8->datadotworld)
Collecting rfc3986<1.0,>=0.4 (from jsontableschema<1.0a,>=0.10.0->datadotworld)
Using cached rfc3986-0.4.1-py2.py3-none-any.whl
Collecting isodate<1.0,>=0.5.4 (from jsontableschema<1.0a,>=0.10.0->datadotworld)
Using cached isodate-0.6.0-py2.py3-none-any.whl
Collecting future<1.0,>=0.15 (from jsontableschema<1.0a,>=0.10.0->datadotworld)
Requirement already satisfied: jsonlines<2.0,>=1.1 in /Applications/anaconda/lib/python2.7/site-packages (from tabulator<=1.4.1->datadotworld)
Requirement already satisfied: sqlalchemy<2.0,>=1.1 in /Applications/anaconda/lib/python2.7/site-packages (from tabulator<=1.4.1->datadotworld)
Collecting cchardet<2.0,>=1.0 (from tabulator<=1.4.1->datadotworld)
Using cached cchardet-1.1.3.tar.gz
Collecting ijson<3.0,>=2.0 (from tabulator<=1.4.1->datadotworld)
Using cached ijson-2.3-py2.py3-none-any.whl
Collecting linear-tsv<2.0,>=1.0 (from tabulator<=1.4.1->datadotworld)
Requirement already satisfied: xlrd<2.0,>=1.0 in /Applications/anaconda/lib/python2.7/site-packages (from tabulator<=1.4.1->datadotworld)
Requirement already satisfied: openpyxl<3.0,>=2.4 in /Applications/anaconda/lib/python2.7/site-packages (from tabulator<=1.4.1->datadotworld)
Requirement already satisfied: functools32 in /Applications/anaconda/lib/python2.7/site-packages (from jsonschema<3.0a,>=2.5->datapackage<1.0a,>=0.8.8->datadotworld)
Requirement already satisfied: jdcal in /Applications/anaconda/lib/python2.7/site-packages (from openpyxl<3.0,>=2.4->tabulator<=1.4.1->datadotworld)
Requirement already satisfied: et_xmlfile in /Applications/anaconda/lib/python2.7/site-packages (from openpyxl<3.0,>=2.4->tabulator<=1.4.1->datadotworld)
Building wheels for collected packages: cchardet
Running setup.py bdist_wheel for cchardet ... error
Complete output from command /Applications/anaconda/bin/python -u -c "import setuptools, tokenize;file='/private/var/folders/2z/pnm8d8f56dd8c3xd8qf6g8jr0000gn/T/pip-build-KxbsQY/cchardet/setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" bdist_wheel -d /var/folders/2z/pnm8d8f56dd8c3xd8qf6g8jr0000gn/T/tmpIPYH1npip-wheel- --python-tag cp27:
cythonize: ['src/cchardet/_cchardet.pyx']
running bdist_wheel
running build
running build_py
creating build
creating build/lib.macosx-10.7-x86_64-2.7
creating build/lib.macosx-10.7-x86_64-2.7/cchardet
copying src/cchardet/init.py -> build/lib.macosx-10.7-x86_64-2.7/cchardet
copying src/cchardet/version.py -> build/lib.macosx-10.7-x86_64-2.7/cchardet
running build_ext
building 'cchardet._cchardet' extension
creating build/temp.macosx-10.7-x86_64-2.7
creating build/temp.macosx-10.7-x86_64-2.7/src
creating build/temp.macosx-10.7-x86_64-2.7/src/cchardet
creating build/temp.macosx-10.7-x86_64-2.7/src/ext
creating build/temp.macosx-10.7-x86_64-2.7/src/ext/libcharsetdetect
creating build/temp.macosx-10.7-x86_64-2.7/src/ext/libcharsetdetect/mozilla
creating build/temp.macosx-10.7-x86_64-2.7/src/ext/libcharsetdetect/mozilla/extensions
creating build/temp.macosx-10.7-x86_64-2.7/src/ext/libcharsetdetect/mozilla/extensions/universalchardet
creating build/temp.macosx-10.7-x86_64-2.7/src/ext/libcharsetdetect/mozilla/extensions/universalchardet/src
creating build/temp.macosx-10.7-x86_64-2.7/src/ext/libcharsetdetect/mozilla/extensions/universalchardet/src/base
gcc -fno-strict-aliasing -I/Applications/anaconda/include -arch x86_64 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -Isrc/ext/libcharsetdetect/mozilla/extensions/universalchardet/src/base/ -Isrc/ext/libcharsetdetect/nspr-emu/ -Isrc/ext/libcharsetdetect/ -I/Applications/anaconda/include/python2.7 -c src/cchardet/_cchardet.cpp -o build/temp.macosx-10.7-x86_64-2.7/src/cchardet/_cchardet.o
Agreeing to the Xcode/iOS license requires admin privileges, please run “sudo xcodebuild -license” and then retry this command.
error: command 'gcc' failed with exit status 69
Failed building wheel for cchardet
Running setup.py clean for cchardet
Failed to build cchardet
Installing collected packages: cchardet, ijson, linear-tsv, tabulator, rfc3986, isodate, future, jsontableschema, datapackage, urllib3, datadotworld
Running setup.py install for cchardet ... error
Complete output from command /Applications/anaconda/bin/python -u -c "import setuptools, tokenize;file='/private/var/folders/2z/pnm8d8f56dd8c3xd8qf6g8jr0000gn/T/pip-build-KxbsQY/cchardet/setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" install --record /var/folders/2z/pnm8d8f56dd8c3xd8qf6g8jr0000gn/T/pip-BpLyfT-record/install-record.txt --single-version-externally-managed --compile:
cythonize: ['src/cchardet/_cchardet.pyx']
running install
running build
running build_py
creating build
creating build/lib.macosx-10.7-x86_64-2.7
creating build/lib.macosx-10.7-x86_64-2.7/cchardet
copying src/cchardet/__init__.py -> build/lib.macosx-10.7-x86_64-2.7/cchardet
copying src/cchardet/version.py -> build/lib.macosx-10.7-x86_64-2.7/cchardet
running build_ext
building 'cchardet._cchardet' extension
creating build/temp.macosx-10.7-x86_64-2.7
creating build/temp.macosx-10.7-x86_64-2.7/src
creating build/temp.macosx-10.7-x86_64-2.7/src/cchardet
creating build/temp.macosx-10.7-x86_64-2.7/src/ext
creating build/temp.macosx-10.7-x86_64-2.7/src/ext/libcharsetdetect
creating build/temp.macosx-10.7-x86_64-2.7/src/ext/libcharsetdetect/mozilla
creating build/temp.macosx-10.7-x86_64-2.7/src/ext/libcharsetdetect/mozilla/extensions
creating build/temp.macosx-10.7-x86_64-2.7/src/ext/libcharsetdetect/mozilla/extensions/universalchardet
creating build/temp.macosx-10.7-x86_64-2.7/src/ext/libcharsetdetect/mozilla/extensions/universalchardet/src
creating build/temp.macosx-10.7-x86_64-2.7/src/ext/libcharsetdetect/mozilla/extensions/universalchardet/src/base
gcc -fno-strict-aliasing -I/Applications/anaconda/include -arch x86_64 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -Isrc/ext/libcharsetdetect/mozilla/extensions/universalchardet/src/base/ -Isrc/ext/libcharsetdetect/nspr-emu/ -Isrc/ext/libcharsetdetect/ -I/Applications/anaconda/include/python2.7 -c src/cchardet/_cchardet.cpp -o build/temp.macosx-10.7-x86_64-2.7/src/cchardet/_cchardet.o
Agreeing to the Xcode/iOS license requires admin privileges, please run “sudo xcodebuild -license” and then retry this command.
error: command 'gcc' failed with exit status 69
----------------------------------------
Command "/Applications/anaconda/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/2z/pnm8d8f56dd8c3xd8qf6g8jr0000gn/T/pip-build-KxbsQY/cchardet/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /var/folders/2z/pnm8d8f56dd8c3xd8qf6g8jr0000gn/T/pip-BpLyfT-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /private/var/folders/2z/pnm8d8f56dd8c3xd8qf6g8jr0000gn/T/pip-build-KxbsQY/cchardet/
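The `exit status 69` above comes from clang/gcc refusing to run until the Xcode license has been accepted, exactly as the log message says. A possible workaround on macOS (assumes admin rights and that the Xcode command-line tools are installed; this is a sketch of the fix the error message suggests, not an officially documented procedure):

```shell
# Accept the Xcode/iOS license so the compiler can run again (needs admin rights)
sudo xcodebuild -license accept

# Retry the install; --no-cache-dir avoids reusing the failed cchardet build
pip install --no-cache-dir datadotworld
```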
Retire swagger auto-gen in favor of simpler implementation using requests.
Start by getting rid of:
The pattern I would like to encourage is one where:
- Content-Type needs to be defined (e.g. streams): TBD
- Accepts needs to be defined (e.g. sql): TBD
- HTTP 429 responses are automatically handled (see: https://github.com/datadotworld/target-datadotworld/blob/master/target_datadotworld/api_client.py#L366)
Additionally:
IMPORTANT: To maximize the benefits of these improvements, I would suggest breaking compatibility (i.e. release as v2.0).
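A minimal sketch of the requests-based pattern proposed above, including automatic handling of HTTP 429 responses via the Retry-After header. The class name, method, and base URL below are illustrative assumptions, not part of the current library:

```python
import time

import requests


class SimpleApiClient:
    """Hypothetical thin client illustrating the requests-based pattern."""

    def __init__(self, token, base_url='https://api.data.world/v0'):
        self._session = requests.Session()
        self._session.headers.update({'Authorization': 'Bearer ' + token})
        self._base_url = base_url

    def request(self, method, path, max_retries=3, **kwargs):
        for attempt in range(max_retries + 1):
            response = self._session.request(
                method, self._base_url + path, **kwargs)
            # HTTP 429 responses are handled automatically: honor
            # Retry-After if present, otherwise back off exponentially
            if response.status_code == 429 and attempt < max_retries:
                delay = int(response.headers.get('Retry-After', 2 ** attempt))
                time.sleep(delay)
                continue
            response.raise_for_status()
            return response
```

Headers such as Content-Type and Accept would then be passed per call only when a request actually needs them (e.g. streams or SQL), rather than configured globally as swagger-generated code tends to require.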
I'm using the API to create datasets with long titles (> 30 characters). Because the titles are used as keys, they must be truncated to the 30-character key limit, and the key apparently can't end with an ellipsis ('...'), so the title is truncated abruptly.
The API throws an exception for a long title:
ValueError: Invalid value for `title`, length must be less than or equal to `30`
However, the web application does not appear to enforce a 30-character limit, either for display titles or for the dataset key.
It would be preferable if the API either removed the 30-character limit or allowed a longer display title, with a separate value for the dataset key slug.
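Until the limit is lifted, one possible client-side workaround is to truncate titles at a word boundary before calling the API, so the cut is at least less abrupt. The 30-character limit is the constraint reported above; the helper below is illustrative and not part of the library:

```python
MAX_TITLE_LENGTH = 30  # limit currently enforced by the dataset API


def safe_title(title, limit=MAX_TITLE_LENGTH):
    """Truncate a dataset title at a word boundary to fit the API limit."""
    if len(title) <= limit:
        return title
    truncated = title[:limit]
    # Back up to the last full word so the cut lands between words
    if ' ' in truncated:
        truncated = truncated.rsplit(' ', 1)[0]
    return truncated
```

For example, `safe_title('An extremely long dataset title that exceeds the limit')` returns `'An extremely long dataset'`.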
Please add a contributing guide similar to https://github.com/datadotworld/dwapi-spec/blob/master/CONTRIBUTING.md.