druid-io / pydruid
A Python connector for Druid.
License: Other
Thanks for the nice package; it seems to be useful! 👍
It is possible to combine several data sources together, like so:
{
"type": "union",
"dataSources": ["<string_value1>", "<string_value2>", "<string_value3>", ... ]
}
Is there a way to consider this case when using pydruid?
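While waiting for native support, one hedged workaround is to assemble the union clause as a raw dict and splice it into a hand-built query, since pydruid (at the time of this issue) validates dataSource as a plain string. The helper name below is ours, not pydruid's:

```python
import json

# Hypothetical helper -- builds the raw Druid "union" dataSource clause.
def union_datasource(*names):
    return {"type": "union", "dataSources": list(names)}

# Hand-built query dict; datasource and interval names are made up for the demo.
query = {
    "queryType": "timeseries",
    "dataSource": union_datasource("ds1", "ds2", "ds3"),
    "granularity": "all",
    "intervals": ["2013-01-01/2014-01-01"],
    "aggregations": [{"type": "count", "name": "rows"}],
}
print(json.dumps(query["dataSource"], sort_keys=True))
```

A dict built this way can then be POSTed to the broker directly (e.g. with urllib2 or requests), bypassing pydruid's query builder.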
Dear Deep,
I was trying out a groupBy query on metrics. pyDruid told me I had a malformed query, but when I ran the generated query through curl, it worked.
import pydruid.client
import datetime
bard_url = 'http://x.x.x.x:8080/'
endpoint = 'druid/v2/?pretty'
query = pydruid.client.pyDruid(bard_url,endpoint)
dataSource = 'mmx_metrics'
filters = (pydruid.client.Dimension("metric") == "query/time") & (pydruid.client.Dimension("service") == "druid/prod/bard")
intervals = [datetime.datetime.utcnow().isoformat() + '/PT5M']
foo = query.groupBy(dataSource=dataSource, intervals=intervals, granularity="minute", dimensions=['host','service'], aggregations = {"count": pydruid.client.doubleSum("count")}, filter=filters)
Gives me:
---------------------------------------------------------------------------
IOError Traceback (most recent call last)
<ipython-input-11-9d231bdb8e44> in <module>()
----> 1 foo = query.groupBy(dataSource=dataSource, intervals=intervals, granularity="minute", dimensions=['host','service'], aggregations = {"count": pydruid.client.doubleSum("count")}, filter=filters)
/usr/lib/python2.7/site-packages/pyDruid-0.1.7-py2.7.egg/pydruid/client.pyc in groupBy(self, **args)
157 self.query_dict = query_dict
158 self.query_type = 'groupby'
--> 159 return self.post(query_dict)
160
161 def segmentMetadata(self, **args):
/usr/lib/python2.7/site-packages/pyDruid-0.1.7-py2.7.egg/pydruid/client.pyc in post(self, query)
47 res.close()
48 except urllib2.HTTPError, e:
---> 49 raise IOError('Malformed query: \n {0}'.format(json.dumps(self.query_dict, indent = 4)))
50 else:
51 self.result = self.parse()
IOError: Malformed query:
{
"dimensions": [
"host",
"service"
],
"aggregations": [
{
"type": "doubleSum",
"fieldName": "count",
"name": "count"
}
],
"filter": {
"fields": [
{
"type": "selector",
"dimension": "metric",
"value": "query/time"
},
{
"type": "selector",
"dimension": "service",
"value": "druid/prod/bard"
}
],
"type": "and"
},
"intervals": [
"2013-12-06T00:38:38.760172/PT5M"
],
"dataSource": "mmx_metrics",
"granularity": "minute",
"queryType": "groupBy"
}
I put the generated query into /tmp/query.druid and ran the following:
curl -X POST "http://x.x.x.x:8080/druid/v2/?pretty" -H 'content-type: application/json' -d @/tmp/query.druid
It returned the results I expected.
I saw this with both the pip installed version and the git version.
-Jeff
We need to package superset and pydruid along with their dependencies to support offline installations for our clients.
The issue is that requests brings in an LGPL-licensed dependency named 'chardet', which we cannot ship in the bundle. (https://github.com/requests/requests/blob/master/setup.py#L46)
At a quick glance at the usage of requests inside pydruid, it seems it could easily be replaced with urllib.
I'd like to write assertions to check that the filters my code generates are valid. Can Filter implement __eq__ to allow comparisons to work?
Eg.
>>> f1 = Filter(value=1, dimension="test")
>>> f2 = Filter(value=1, dimension="test")
>>> f1 == f2
False
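A minimal sketch of what the requested __eq__ could look like, assuming a Filter-like class that keeps its serialized form in a filter dict; the class body is a stand-in for illustration, not pydruid's actual implementation:

```python
class Filter:
    """Stand-in with a pydruid-like shape: serialized form lives in self.filter."""
    def __init__(self, **kwargs):
        self.filter = {"filter": {"type": "selector",
                                  "dimension": kwargs["dimension"],
                                  "value": kwargs["value"]}}

    def __eq__(self, other):
        # Compare the serialized dicts, not object identity.
        return isinstance(other, Filter) and self.filter == other.filter

    def __ne__(self, other):        # Python 2 needs the inverse spelled out
        return not self.__eq__(other)

    __hash__ = None                 # mutable objects: opt out of hashing

f1 = Filter(value=1, dimension="test")
f2 = Filter(value=1, dimension="test")
print(f1 == f2)  # True once equality compares the dicts
```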
I'm doing virtually the same thing as the last example on the GitHub page, where I try to create a filter with Dimension(something).in_([...]), but the error in the title showed up. Is this a problem with the Python version? I'm using 3.5.3. I've also checked that my pydruid version is 0.3.1.
This comes up after running pip install pydruid:
Running setup.py egg_info for package pydruid
Traceback (most recent call last):
File "", line 14, in ?
File "/home/ctsai/build/pydruid/setup.py", line 30, in ?
tests_require=['pytest', 'six', 'mock'],
File "/usr/lib64/python2.4/distutils/core.py", line 110, in setup
_setup_distribution = dist = klass(attrs)
File "/usr/lib/python2.4/site-packages/setuptools/dist.py", line 219, in __init__
self.fetch_build_eggs(attrs.pop('setup_requires'))
File "/usr/lib/python2.4/site-packages/setuptools/dist.py", line 242, in fetch_build_eggs
for dist in working_set.resolve(
File "/usr/lib/python2.4/site-packages/pkg_resources.py", line 481, in resolve
dist = best[req.key] = env.best_match(req, self, installer)
File "/usr/lib/python2.4/site-packages/pkg_resources.py", line 717, in best_match
return self.obtain(req, installer) # try and download/install
File "/usr/lib/python2.4/site-packages/pkg_resources.py", line 729, in obtain
return installer(requirement)
File "/usr/lib/python2.4/site-packages/setuptools/dist.py", line 286, in fetch_build_egg
return cmd.easy_install(req)
File "/usr/lib/python2.4/site-packages/setuptools/command/easy_install.py", line 446, in easy_install
return self.install_item(spec, dist.location, tmpdir, deps)
File "/usr/lib/python2.4/site-packages/setuptools/command/easy_install.py", line 471, in install_item
dists = self.install_eggs(spec, download, tmpdir)
File "/usr/lib/python2.4/site-packages/setuptools/command/easy_install.py", line 655, in install_eggs
return self.build_and_install(setup_script, setup_base)
File "/usr/lib/python2.4/site-packages/setuptools/command/easy_install.py", line 930, in build_and_install
self.run_setup(setup_script, setup_base, args)
File "/usr/lib/python2.4/site-packages/setuptools/command/easy_install.py", line 919, in run_setup
run_setup(setup_script, args)
File "/usr/lib/python2.4/site-packages/setuptools/sandbox.py", line 26, in run_setup
DirectorySandbox(setup_dir).run(
File "/usr/lib/python2.4/site-packages/setuptools/sandbox.py", line 63, in run
return func()
File "/usr/lib/python2.4/site-packages/setuptools/sandbox.py", line 29, in
{'__file__':setup_script, '__name__':'__main__'}
File "setup.py", line 9
with io.open('README.rst', encoding='utf-8') as readme:
^
SyntaxError: invalid syntax
Command python setup.py egg_info failed with error code 1
Please, I need help. I am new to Python and I am following a Udemy training video, "Build 10 Real World Applications".
I am building the dictionary app, but I am stuck with the errors below.
I don't know what else to do. I will appreciate any help concerning how to fix this error.
Thanks
parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
File "C:\Users\learn\AppData\Local\Programs\Python\Python37-32\lib\json\__init__.py", line 348, in loads
return _default_decoder.decode(s)
File "C:\Users\learn\AppData\Local\Programs\Python\Python37-32\lib\json\decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Users\learn\AppData\Local\Programs\Python\Python37-32\lib\json\decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Is it possible to bump a new version (maybe 0.3.1) including the "in" filter support?
Based on what I read in the docs, Druid supports custom granularities like this:
{"type": "duration", "duration": 7200000}
Is there support for that in PyDruid?
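If your pydruid version forwards the granularity argument into the query JSON verbatim (worth checking against your version), a raw Druid granularity spec may work where the enum strings do; this is an assumption, not a documented pydruid feature:

```python
# Raw Druid "duration" granularity spec: 2 hours expressed in milliseconds.
two_hours_ms = 2 * 60 * 60 * 1000
duration_granularity = {"type": "duration", "duration": two_hours_ms}

# Hypothetically passed straight through to a query method, e.g.:
#   query.timeseries(..., granularity=duration_granularity, ...)
print(duration_granularity)
```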
Some Druid instances run with Kerberos enabled; it would be nice for pydruid to work against these kerberized instances.
I just saw that you are using the requests library to call the Druid HTTP API.
I also saw that there is a requests-kerberos library that adds Kerberos auth.
Would it be possible to integrate it into pydruid?
The only requirement would be to add an argument to the requests calls, as in this example:
import requests
from requests_kerberos import HTTPKerberosAuth, REQUIRED
kerberos_auth = HTTPKerberosAuth(mutual_authentication=REQUIRED, sanitize_mutual_error_response=False)
r = requests.get("https://windows.example.org/wsman", auth=kerberos_auth)
How can we insert data into Druid using Python code?
I am trying to read a Druid database using Python 3.5 and the DB API of pydruid; however, whenever I run the execute statement I get the error below.
Code I am running:
from pydruid.db import connect
conn = connect(host='XXXXXXX', port=8082, path='/druid/v2', scheme='http')
curs = conn.cursor()
curs.execute("""SELECT * FROM wikipedia LIMIT 10""")
Error:
---------------------------------------------------------------------------
JSONDecodeError Traceback (most recent call last)
in ()
----> 1 curs.execute("""SELECT * FROM wikipedia LIMIT 10""")
/apps/cmor/anaconda3/lib/python3.5/site-packages/pydruid-0.3.1-py3.5.egg/pydruid/db/api.py in g(self, *args, **kwargs)
39 raise exceptions.Error(
40 '{klass} already closed'.format(klass=self.__class__.__name__))
---> 41 return f(self, *args, **kwargs)
42 return g
43
/apps/cmor/anaconda3/lib/python3.5/site-packages/pydruid-0.3.1-py3.5.egg/pydruid/db/api.py in execute(self, operation, parameters)
187 # let's consume it and insert it back.
188 results = self._stream_query(query)
--> 189 first_row = next(results)
190 self._results = itertools.chain([first_row], results)
191
/apps/cmor/anaconda3/lib/python3.5/site-packages/pydruid-0.3.1-py3.5.egg/pydruid/db/api.py in _stream_query(self, query)
267 # raise any error messages
268 if r.status_code != 200:
--> 269 payload = r.json()
270 msg = (
271 '{error} ({errorClass}): {errorMessage}'.format(**payload)
/apps/cmor/anaconda3/lib/python3.5/site-packages/requests/models.py in json(self, **kwargs)
892 # used.
893 pass
--> 894 return complexjson.loads(self.text, **kwargs)
895
896 @property
/apps/cmor/anaconda3/lib/python3.5/json/__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
317 parse_int is None and parse_float is None and
318 parse_constant is None and object_pairs_hook is None and not kw):
--> 319 return _default_decoder.decode(s)
320 if cls is None:
321 cls = JSONDecoder
/apps/cmor/anaconda3/lib/python3.5/json/decoder.py in decode(self, s, _w)
337
338 """
--> 339 obj, end = self.raw_decode(s, idx=_w(s, 0).end())
340 end = _w(s, end).end()
341 if end != len(s):
/apps/cmor/anaconda3/lib/python3.5/json/decoder.py in raw_decode(self, s, idx)
355 obj, end = self.scan_once(s, idx)
356 except StopIteration as err:
--> 357 raise JSONDecodeError("Expecting value", s, err.value) from None
358 return obj, end
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Some of the functions/constructors, like longsum and Field, are not included when you do from pydruid.client import *; you need to go into utils. So I think the README needs to be updated. I can do it myself if needed, but I didn't want to give incorrect information since I'm unfamiliar with pydruid.
Hi,
I am trying pydruid for the first time on Python 2.7, with the wikipedia data source.
However, I am getting the following error when executing the query below in Python.
top_langs = query.topn(
datasource = "wikipedia",
granularity = "all",
intervals = "2013-06-01T00:00/2020-01-01T00",
dimension = "channel",
filter = Dimension("namespace") == "article",
aggregations = {"edit_count": longsum("count")},
metric = "edit_count",
threshold = 4
)
print top_langs # Do this if you want to see the raw JSON
NameError Traceback (most recent call last)
in ()
4 intervals = "2013-06-01T00:00/2020-01-01T00",
5 dimension = "channel",
----> 6 filter = Dimension("namespace") == "article",
7 aggregations = {"edit_count": longsum("count")},
8 metric = "edit_count",
NameError: name 'Dimension' is not defined
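The NameError above is an import problem, not a query problem: from pydruid.client import * does not re-export the filter and aggregator helpers, which live in pydruid.utils. A self-contained illustration of the star-import behavior (the module object below is a stand-in for pydruid.client):

```python
import types

# Stand-in for pydruid.client: the module defines PyDruid, but Dimension
# and longsum live in sibling modules (pydruid.utils.filters / .aggregators).
client = types.ModuleType("client")
client.PyDruid = object

# A star-import binds only the module's own public names.
exported = [n for n in vars(client) if not n.startswith("_")]
print(exported)  # PyDruid is here; Dimension and longsum are not

# The fix for the traceback above is to import the helpers explicitly:
#   from pydruid.utils.filters import Dimension
#   from pydruid.utils.aggregators import longsum
```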
Hi, am I wrong or is there no support for "filtered" aggregation?
When building a query, if filter=None then an exception occurs:
File "/Users/se7entyse7en/Envs/viralize-web/lib/python2.7/site-packages/pydruid/utils/filters.py", line 61, in build_filter
return filter_obj.filter['filter']
AttributeError: 'NoneType' object has no attribute 'filter'
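A sketch of a None-safe build step, mirroring the shape of pydruid.utils.filters.Filter.build_filter; names and structure are assumptions for illustration, not pydruid's exact code:

```python
def build_filter(filter_obj):
    # Guard the None case instead of unconditionally dereferencing .filter.
    if filter_obj is None:
        return None
    return filter_obj.filter["filter"]

# Caller side: omit the "filter" key entirely when nothing was built.
query_dict = {"queryType": "groupBy"}
built = build_filter(None)
if built is not None:
    query_dict["filter"] = built
print(query_dict)  # no "filter" key, and no AttributeError
```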
Hello all!
I'm trying to create a datasource in superset, using pydruid. I can use pydruid cli successfully, but when I try to create a superset datasource, pointing to druid I'm getting this error: "Can't load plugin sqlalchemy.dialects:druid".
Any idea how to solve this?
Best regards.
Superset version
0.15.0 integrated with hadoop
Expected results
Integrate Druid with SQL lab
Actual results
Unable to integrate Druid with SQL lab
Steps to reproduce
Install pydruid and try to create a database using this plugin
How can I perform the following aggregation operation in pydruid: AVG(DISTINCT(col))?
Hello All,
How can I generate subqueries using pydruid, given that the dataSource field only takes either a str or a list?
""" ValueError: Datasource definition not valid. Must be string or list of strings """
Below is the sample query, in which I pass the output of the first query to another query as the datasource.
{
"queryType": "groupBy",
"dataSource":{
"type": "query",
"query": {
"queryType": "groupBy",
"dataSource": "druid_source",
"granularity": {"type": "period", "period": "P1M"},
"dimensions": ["source_dim"],
"aggregations": [
{ "type": "doubleMax", "name": "value", "fieldName": "stream_value" }
],
"intervals": [ "2012-01-01T00:00:00.000/2020-01-03T00:00:00.000" ]
}
},
"granularity": "hour",
"dimensions": ["source_dim"],
"aggregations": [
{ "type": "longSum", "name": "outerquerryvalue", "fieldName": "value" }
],
"intervals": [ "2012-01-01T00:00:00.000/2020-01-03T00:00:00.000" ]
}
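A hedged workaround while the client validates dataSource as str/list: build the nested "query" datasource dict by hand (the helper name is ours, not pydruid's) and POST the outer query yourself:

```python
# Hypothetical helper -- wraps an inner groupBy as a "query" datasource.
def query_datasource(inner_query):
    return {"type": "query", "query": inner_query}

inner = {
    "queryType": "groupBy",
    "dataSource": "druid_source",
    "granularity": {"type": "period", "period": "P1M"},
    "dimensions": ["source_dim"],
    "aggregations": [{"type": "doubleMax", "name": "value",
                      "fieldName": "stream_value"}],
    "intervals": ["2012-01-01T00:00:00.000/2020-01-03T00:00:00.000"],
}
outer_datasource = query_datasource(inner)
print(outer_datasource["type"])
```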
I installed pydruid with pip.
Doing Filter(type="selector", dimension="dim", value=val) returns:
'Filter type: {0} does not exist'.format(args['type']))
NotImplementedError: Filter type: selector does not exist
Why is a basic selector filter still not implemented?
When unit tests are run on Python 2.7, TestPostAggregators.test_build_thetapostaggregator fails with:
E AssertionError: assert [{'field': <p...tchEstimate'}] == [{'field': {'f...tchEstimate'}]
E At index 0 diff: {'field': <pydruid.utils.postaggregator.ThetaSketch instance at 0x1116e9ea8>, 'type': 'thetaSketchEstimate', 'name': 'pag1'} != {'field': {'fieldName': 'theta1', 'type': 'fieldAccess'}, 'type': 'thetaSketchEstimate', 'name': 'pag1'}
It looks like the object is being used in the dictionary rather than the object's post_aggregator. Will drop a PR to fix.
The 'alphaNumeric' parameter in the bound filter is useless when the dimension is numeric and has negative values. I found that the fix is to set 'ordering' to 'numeric' instead of 'alphaNumeric', but this is not supported in the latest version. I hope the pydruid API can be kept up to date with druid.io.
The current version only seems to support granularities that are defined as an enum in Druid. It does not support the more generic JSON object scheme for granularity.
Adding option to query Zookeeper for available brokers, for availability / load-balancing
http://druid.io/docs/latest/querying/filters.html
{
"filter": {
"type": "extraction",
"dimension": "product",
"value": "bar_1",
"extractionFn": {
"type": "lookup",
"lookup": {
"type": "map",
"map": {
"product_1": "bar_1",
"product_5": "bar_1",
"product_3": "bar_1"
}
}
}
}
}
I tried running the pydruid async client. It does not work. The issue is that it uses AsyncHTTPClient from tornado, which is the interface. Instead we should be using one of the implementations, SimpleAsyncHTTPClient or CurlAsyncHTTPClient. The latter is better, as described at http://www.tornadoweb.org/en/stable/httpclient.html#module-tornado.simple_httpclient. I have made this one-line fix and can create a patch if that is fine by everyone.
It seems like the github master branch corresponds to version 0.2 but on pypi you can find version 0.2.1. Is this intended?
It'd be handy to be able to use a 'like' Filter when querying -> http://druid.io/docs/latest/querying/filters.html
Right now there is no mapping for complex column types in SQLAlchemy and this causes a "Key error: other" when autoloading a Table.
With the merge of #72 we now have the ability to perform theta sketch operations. This functionality is needed in Superset, but we need a new PyDruid pip package to be pushed so that we can use this new feature.
Is there a release process / timeline to get a new package pushed? Is there anything I can do to help? My understanding is that the setup.py script needs to be modified to list 0.3.2 and a git tag added for that version.
Is there any reason why the PyDruid class does not extend object?
See client.py, having.py.
Hi,
As far as I can see, pydruid does not currently support the "new" query types, i.e. "select" and "search". It would be nice to have them :-).
Maybe it would also make sense to expose the __post method, so we are able to send queries directly? That might also be useful if somebody wants to use pydruid as a "middleware", with queries prepared somewhere else.
Eg.
from pydruid.utils.filters import Dimension
filter = (Dimension("test") != 1)
filter.show()
Results in:
TypeError: Object of type 'Filter' is not JSON serializable
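One generic way around the TypeError is to give json.dumps a default hook that unwraps nested filter objects into their dicts. The Filter class below is a stand-in with a pydruid-like shape, not the real implementation:

```python
import json

class Filter:
    """Stand-in: pydruid filters similarly keep their serialized dict in .filter."""
    def __init__(self, d):
        self.filter = d

# A "not" filter whose field is itself a Filter object, as show() produces.
f = Filter({"type": "not",
            "field": Filter({"type": "selector",
                             "dimension": "test",
                             "value": 1})})

def unwrap(obj):
    # json.dumps calls this for anything it cannot serialize natively.
    if isinstance(obj, Filter):
        return obj.filter
    raise TypeError(repr(obj))

print(json.dumps(f.filter, default=unwrap, indent=2))
```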
pip install pydruid
Collecting pydruid
Using cached https://files.pythonhosted.org/packages/32/91/4be6f902d50f22fc6b9e2eecffbef7d00989ba477e9c8e034074186cd10c/pydruid-0.4.2.tar.gz
Complete output from command python setup.py egg_info:
Download error on https://pypi.org/simple/pytest-runner/: [SSL: TLSV1_ALERT_PROTOCOL_VERSION] tlsv1 alert protocol version (_ssl.c:645) -- Some packages may not be found!
Couldn't find index page for 'pytest-runner' (maybe misspelled?)
Download error on https://pypi.org/simple/: [SSL: TLSV1_ALERT_PROTOCOL_VERSION] tlsv1 alert protocol version (_ssl.c:645) -- Some packages may not be found!
No local packages or working download links found for pytest-runner
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/private/var/folders/6g/xcjdd64j6clctps0_25h_n3h0000gp/T/pip-install-iatsyysf/pydruid/setup.py", line 44, in <module>
include_package_data=True,
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/setuptools/__init__.py", line 128, in setup
_install_setup_requires(attrs)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/setuptools/__init__.py", line 123, in _install_setup_requires
dist.fetch_build_eggs(dist.setup_requires)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/setuptools/dist.py", line 504, in fetch_build_eggs
replace_conflicting=True,
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pkg_resources/__init__.py", line 774, in resolve
replace_conflicting=replace_conflicting
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pkg_resources/__init__.py", line 1057, in best_match
return self.obtain(req, installer)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pkg_resources/__init__.py", line 1069, in obtain
return installer(requirement)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/setuptools/dist.py", line 571, in fetch_build_egg
return cmd.easy_install(req)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/setuptools/command/easy_install.py", line 667, in easy_install
raise DistutilsError(msg)
distutils.errors.DistutilsError: Could not find suitable distribution for Requirement.parse('pytest-runner')
----------------------------------------
Hitting this on macOS.
I'm trying to run the following query
selected_apps = ((dr.Dimension('appId') == 0) | (dr.Dimension('appId') == 1) | (dr.Dimension('appId') == 2))
query = druid.topn(datasource='sessions', granularity='all', intervals='2016-02-05/P7D', filter=selected_apps, aggregations={'sessions': dr.longsum('sessions')}, dimension='appId', metric='sessions')
Looking at the actual constructed query I get:
{
    "metric": "sessions",
    "aggregations": [
        {
            "fieldName": "sessions",
            "type": "longSum",
            "name": "sessions"
        }
    ],
    "dimension": "appId",
    "filter": {
        "fields": [
            {
                "fields": [
                    {
                        "type": "selector",
                        "dimension": "appId",
                        "value": 0
                    },
                    {
                        "type": "selector",
                        "dimension": "appId",
                        "value": 1
                    }
                ],
                "type": "or"
            },
            {
                "type": "selector",
                "dimension": "appId",
                "value": 2
            }
        ],
        "type": "or"
    },
    "intervals": "2016-02-05/P7D",
    "dataSource": "sessions",
    "granularity": "all",
    "queryType": "topN"
}
Inside the filter object I was expecting the fields array to contain all selectors at the same level, not nested.
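Until the builder flattens same-type operands, one hedged workaround is to construct the flat "or" dict by hand instead of chaining |; the helper name is ours, not pydruid's:

```python
# Build a flat "or" over one dimension, with every selector at the same level.
def flat_or(dimension, values):
    return {
        "type": "or",
        "fields": [{"type": "selector", "dimension": dimension, "value": v}
                   for v in values],
    }

selected_apps = flat_or("appId", [0, 1, 2])
print(len(selected_apps["fields"]))
```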
What is the syntax for OR conditions in the filter argument?
E.g. filter = (Dimension('A')=='val1') & ((Dimension('B')=='val2') | (Dimension('C')=='val3'))
This doesn't seem to work; it leads to an error:
File "./pydruid_query.py", line 150, in <module>
gfl2 = group_last2.export_tsv('group_last2_result.tsv')
File "/usr/lib/python2.6/site-packages/pydruid/query.py", line 99, in export_tsv
header = list(self.result[0]['event'].keys())
IndexError: list index out of range
Looks like passing a TopNMetricSpec for a topN query is currently not supported, since the metric passed into topn is a string.
From the topn docstring:
:param str metric: Metric over which to sort the specified dimension by
I have some string data which contains numerical values. I would like to convert this data from string to long so that I can perform aggregations such as min, sum, etc. As the link below suggests, Druid allows you to perform this conversion by specifying a dimensionSpec. Is this functionality supported in pydruid?
When combining filters using the & and | operators, the operands are modified, which can lead to unexpected results.
This only happens if the type of operation is the same as the type of the first filter.
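A toy reproduction of the mutation (not pydruid's actual code): if & appends to the left operand's own fields list whenever the types match, the original filter silently grows.

```python
class F:
    """Toy filter: combining reuses self.fields in the same-type branch."""
    def __init__(self, type_, fields):
        self.type, self.fields = type_, fields

    def __and__(self, other):
        if self.type == "and":           # the buggy "optimization"
            self.fields.append(other)    # mutates the left operand in place
            return self
        return F("and", [self, other])

a = F("and", [F("selector", [])])
b = F("selector", [])
combined = a & b
print(len(a.fields))  # 2 -- `a` grew as a side effect of `&`
```

The fix is to build a fresh list in that branch, e.g. return F("and", self.fields + [other]), so combining never writes back into an operand.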
It looks like, when releasing the new version, you removed 0.2.1 from PyPI.
I'd ask you to re-upload it, as we have a strict dependency upgrade policy and our requirements are locked against specific versions of packages, including pydruid.
Currently AsyncPyDruid uses the default 20s timeout value of tornado's HTTPRequest, which frequently produces timeout errors for me when running heavy queries. It would be great if we could customize this value.
First-time Druid user here: I started my Druid server and found that whenever I tried opening the CLI, or anything with pydruid, I kept getting:
File "/opt/anaconda3/bin/pydruid", line 11, in <module>
sys.exit(main())
File "/opt/anaconda3/lib/python3.6/site-packages/pydruid/console.py", line 170, in main
words = get_autocomplete(connection)
File "/opt/anaconda3/lib/python3.6/site-packages/pydruid/console.py", line 155, in get_autocomplete
get_tables(connection)
File "/opt/anaconda3/lib/python3.6/site-packages/pydruid/console.py", line 143, in get_tables
cursor.execute('SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES')
File "/opt/anaconda3/lib/python3.6/site-packages/pydruid/db/api.py", line 41, in g
return f(self, *args, **kwargs)
File "/opt/anaconda3/lib/python3.6/site-packages/pydruid/db/api.py", line 189, in execute
first_row = next(results)
File "/opt/anaconda3/lib/python3.6/site-packages/pydruid/db/api.py", line 269, in _stream_query
payload = r.json()
File "/opt/anaconda3/lib/python3.6/site-packages/requests/models.py", line 892, in json
return complexjson.loads(self.text, **kwargs)
File "/opt/anaconda3/lib/python3.6/json/__init__.py", line 354, in loads
return _default_decoder.decode(s)
File "/opt/anaconda3/lib/python3.6/json/decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/opt/anaconda3/lib/python3.6/json/decoder.py", line 357, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
I had to go through the source to realize that there is a .json() call to show the error message, and that call was failing. It would be a good idea to catch this and report it in a better way.
I realized my issue was that localhost:8082 wasn't running.
Is there any reason why py.test was chosen over the built-in unittest for testing? I would like to add some tests, and I honestly don't know py.test, but unittest seems compatible. Can I write them without using py.test?
SQLAlchemy throws an AttributeError when fetching from a ResultProxy whose underlying Druid query results in an empty result set (i.e. no rows), because the proxy is prematurely closed, i.e.,
>>> from sqlalchemy.engine import create_engine
>>> engine = create_engine('druid://localhost:8082/druid/v2/sql')
>>> result = engine.execute("SELECT SUM(x) FROM foo WHERE bar IS NULL")
>>> result.fetchall()
...
AttributeError: 'NoneType' object has no attribute 'fetchall'
The reason the ResultProxy is closed is that cursor.description is None. This results in result.cursor being None, which is why the exception is thrown.
It seems that other dialects have a description irrespective of whether there are rows and hence don't suffer from the same issue. Note I'm uncertain whether there's even a viable solution as the Druid SQL REST API doesn't provide this information for an empty result set,
> cat query.json
{"query": "SELECT SUM(x) FROM foo WHERE bar IS NULL"}
> curl -XPOST -H 'Content-Type: application/json' http://localhost:8082/druid/v2/sql/ -d @query.json
[]
Note I know one can circumvent this issue by using a raw connection, however this example illustrates the behavior that Pandas uses for reading SQL.
Is there an 'interval' filter available? I couldn't find "type": "interval" in the filter code.
Filtering on a set of ISO 8601 intervals:
{
"type" : "interval",
"dimension" : "__time",
"intervals" : [
"2014-10-01T00:00:00.000Z/2014-10-07T00:00:00.000Z",
"2014-11-15T00:00:00.000Z/2014-11-16T00:00:00.000Z"
]
}
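If the installed pydruid version has no interval filter class, the raw Druid JSON above can be produced with a small helper (the name is ours) and sent in a hand-built query:

```python
# Build the raw Druid interval filter shown above.
def interval_filter(dimension, intervals):
    return {"type": "interval",
            "dimension": dimension,
            "intervals": list(intervals)}

f = interval_filter("__time", [
    "2014-10-01T00:00:00.000Z/2014-10-07T00:00:00.000Z",
    "2014-11-15T00:00:00.000Z/2014-11-16T00:00:00.000Z",
])
print(f["type"])
```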
How can I apply multiple filters to a groupby query? The following does not work:
filters = (pydruid.utils.filters.Dimension("unit")=='000721') & (pydruid.utils.filters.Dimension("val") > 0)
query1 = query.groupby(
datasource=dataSource,
granularity='minute',
#intervals='2016-08-01/p12w',
#intervals='2016-08-01/pt24h',
intervals='2016-08-11/2016-08-15',
dimensions=["GPN"],
filter=filters,
aggregations={"val_sum": ag.doublesum("val"),"val_count": ag.count("val")},
post_aggregations={"Avg": (pag.Field("val_sum") / pag.Field("val_count"))},
context={"timeout": 600000}#,
# limit_spec={
#"type": "default",
# "limit": 50,
# "columns" : ["sensor_val_sum","sensor_val_count"]
# }
)
Gives the following error:
Traceback (most recent call last):
File "Test_Query_2Filters.py", line 41, in
context={"timeout": 600000}#,
File "/usr/local/lib/python2.7/dist-packages/pydruid/client.py", line 191, in groupby
query = self.query_builder.groupby(kwargs)
File "/usr/local/lib/python2.7/dist-packages/pydruid/query.py", line 316, in groupby
return self.build_query(query_type, args)
File "/usr/local/lib/python2.7/dist-packages/pydruid/query.py", line 250, in build_query
query_dict[key] = Filter.build_filter(val)
File "/usr/local/lib/python2.7/dist-packages/pydruid/utils/filters.py", line 90, in build_filter
filter['fields'] = [Filter.build_filter(f) for f in filter['fields']]
File "/usr/local/lib/python2.7/dist-packages/pydruid/utils/filters.py", line 87, in build_filter
filter = filter_obj.filter['filter']
AttributeError: 'bool' object has no attribute 'filter'
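The root cause is that Dimension("val") > 0 is not overloaded, so Python evaluates it to a plain bool, and build_filter later trips over it. Druid's bound filter expresses the comparison; here is a raw-dict sketch (the helper name is ours, and whether your pydruid version accepts raw dicts is an assumption to verify):

```python
# Raw Druid "bound" filter for `dimension > value` over numeric values.
def greater_than(dimension, value):
    return {"type": "bound",
            "dimension": dimension,
            "lower": str(value),   # bound filter takes string bounds
            "lowerStrict": True,   # strict -> ">" rather than ">="
            "ordering": "numeric"}

print(greater_than("val", 0))
```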