druid-io / pydruid
A Python connector for Druid.
License: Other
Thanks for the nice package; it seems to be useful! 👍
It is possible to combine several data sources together, like so:
{
"type": "union",
"dataSources": ["<string_value1>", "<string_value2>", "<string_value3>", ... ]
}
Is there a way to consider this case when using pydruid?
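While waiting for native support, one hedged workaround is to assemble the union clause as a raw dict and splice it into a hand-built query, since pydruid (at the time of this issue) validates dataSource as a plain string. The helper name below is ours, not pydruid's:

```python
import json

# Hypothetical helper -- builds the raw Druid "union" dataSource clause.
def union_datasource(*names):
    return {"type": "union", "dataSources": list(names)}

# Hand-built query dict; datasource and interval names are made up for the demo.
query = {
    "queryType": "timeseries",
    "dataSource": union_datasource("ds1", "ds2", "ds3"),
    "granularity": "all",
    "intervals": ["2013-01-01/2014-01-01"],
    "aggregations": [{"type": "count", "name": "rows"}],
}
print(json.dumps(query["dataSource"], sort_keys=True))
```

A dict built this way can then be POSTed to the broker directly (e.g. with urllib2 or requests), bypassing pydruid's query builder.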
Dear Deep,
I was trying out a groupBy query on metrics. pyDruid told me I had a malformed query, but when I ran the generated query through curl, it worked.
import pydruid.client
import datetime
bard_url = 'http://x.x.x.x:8080/'
endpoint = 'druid/v2/?pretty'
query = pydruid.client.pyDruid(bard_url,endpoint)
dataSource = 'mmx_metrics'
filters = (pydruid.client.Dimension("metric") == "query/time") & (pydruid.client.Dimension("service") == "druid/prod/bard")
intervals = [datetime.datetime.utcnow().isoformat() + '/PT5M']
foo = query.groupBy(dataSource=dataSource, intervals=intervals, granularity="minute", dimensions=['host','service'], aggregations = {"count": pydruid.client.doubleSum("count")}, filter=filters)
Gives me:
---------------------------------------------------------------------------
IOError Traceback (most recent call last)
<ipython-input-11-9d231bdb8e44> in <module>()
----> 1 foo = query.groupBy(dataSource=dataSource, intervals=intervals, granularity="minute", dimensions=['host','service'], aggregations = {"count": pydruid.client.doubleSum("count")}, filter=filters)
/usr/lib/python2.7/site-packages/pyDruid-0.1.7-py2.7.egg/pydruid/client.pyc in groupBy(self, **args)
157 self.query_dict = query_dict
158 self.query_type = 'groupby'
--> 159 return self.post(query_dict)
160
161 def segmentMetadata(self, **args):
/usr/lib/python2.7/site-packages/pyDruid-0.1.7-py2.7.egg/pydruid/client.pyc in post(self, query)
47 res.close()
48 except urllib2.HTTPError, e:
---> 49 raise IOError('Malformed query: \n {0}'.format(json.dumps(self.query_dict, indent = 4)))
50 else:
51 self.result = self.parse()
IOError: Malformed query:
{
"dimensions": [
"host",
"service"
],
"aggregations": [
{
"type": "doubleSum",
"fieldName": "count",
"name": "count"
}
],
"filter": {
"fields": [
{
"type": "selector",
"dimension": "metric",
"value": "query/time"
},
{
"type": "selector",
"dimension": "service",
"value": "druid/prod/bard"
}
],
"type": "and"
},
"intervals": [
"2013-12-06T00:38:38.760172/PT5M"
],
"dataSource": "mmx_metrics",
"granularity": "minute",
"queryType": "groupBy"
}
I put the generated query into /tmp/query.druid and ran the following:
curl -X POST "http://x.x.x.x:8080/druid/v2/?pretty" -H 'content-type: application/json' -d @/tmp/query.druid
It returned the results I expected.
I saw this with both the pip installed version and the git version.
-Jeff
We need to package superset and pydruid along with their dependencies to support offline installations for our clients.
The issue is that requests brings in an LGPL-licensed dependency named 'chardet', which we cannot ship in the bundle. (https://github.com/requests/requests/blob/master/setup.py#L46)
At a quick glance at the usage of requests inside pydruid, it seems it could easily be replaced with urllib.
I'd like to write assertions to check that the filters my code generates are valid. Can Filter implement __eq__ to allow comparisons to work?
Eg.
>>> f1 = Filter(value=1, dimension="test")
>>> f2 = Filter(value=1, dimension="test")
>>> f1 == f2
False
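A minimal sketch of what the requested __eq__ could look like, assuming a Filter-like class that keeps its serialized form in a filter dict; the class body is a stand-in for illustration, not pydruid's actual implementation:

```python
class Filter:
    """Stand-in with a pydruid-like shape: serialized form lives in self.filter."""
    def __init__(self, **kwargs):
        self.filter = {"filter": {"type": "selector",
                                  "dimension": kwargs["dimension"],
                                  "value": kwargs["value"]}}

    def __eq__(self, other):
        # Compare the serialized dicts, not object identity.
        return isinstance(other, Filter) and self.filter == other.filter

    def __ne__(self, other):        # Python 2 needs the inverse spelled out
        return not self.__eq__(other)

    __hash__ = None                 # mutable objects: opt out of hashing

f1 = Filter(value=1, dimension="test")
f2 = Filter(value=1, dimension="test")
print(f1 == f2)  # True once equality compares the dicts
```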
I'm doing virtually the same thing as the last example on the GitHub page, where I try to create a filter with Dimension(something).in_([...]), but the error in the title showed up. Is this a problem with the Python version? I'm using 3.5.3. I've also checked that my pydruid version is 0.3.1.
This comes up after running pip install pydruid:
Running setup.py egg_info for package pydruid
Traceback (most recent call last):
File "", line 14, in ?
File "/home/ctsai/build/pydruid/setup.py", line 30, in ?
tests_require=['pytest', 'six', 'mock'],
File "/usr/lib64/python2.4/distutils/core.py", line 110, in setup
_setup_distribution = dist = klass(attrs)
File "/usr/lib/python2.4/site-packages/setuptools/dist.py", line 219, in __init__
self.fetch_build_eggs(attrs.pop('setup_requires'))
File "/usr/lib/python2.4/site-packages/setuptools/dist.py", line 242, in fetch_build_eggs
for dist in working_set.resolve(
File "/usr/lib/python2.4/site-packages/pkg_resources.py", line 481, in resolve
dist = best[req.key] = env.best_match(req, self, installer)
File "/usr/lib/python2.4/site-packages/pkg_resources.py", line 717, in best_match
return self.obtain(req, installer) # try and download/install
File "/usr/lib/python2.4/site-packages/pkg_resources.py", line 729, in obtain
return installer(requirement)
File "/usr/lib/python2.4/site-packages/setuptools/dist.py", line 286, in fetch_build_egg
return cmd.easy_install(req)
File "/usr/lib/python2.4/site-packages/setuptools/command/easy_install.py", line 446, in easy_install
return self.install_item(spec, dist.location, tmpdir, deps)
File "/usr/lib/python2.4/site-packages/setuptools/command/easy_install.py", line 471, in install_item
dists = self.install_eggs(spec, download, tmpdir)
File "/usr/lib/python2.4/site-packages/setuptools/command/easy_install.py", line 655, in install_eggs
return self.build_and_install(setup_script, setup_base)
File "/usr/lib/python2.4/site-packages/setuptools/command/easy_install.py", line 930, in build_and_install
self.run_setup(setup_script, setup_base, args)
File "/usr/lib/python2.4/site-packages/setuptools/command/easy_install.py", line 919, in run_setup
run_setup(setup_script, args)
File "/usr/lib/python2.4/site-packages/setuptools/sandbox.py", line 26, in run_setup
DirectorySandbox(setup_dir).run(
File "/usr/lib/python2.4/site-packages/setuptools/sandbox.py", line 63, in run
return func()
File "/usr/lib/python2.4/site-packages/setuptools/sandbox.py", line 29, in
{'__file__':setup_script, '__name__':'__main__'}
File "setup.py", line 9
with io.open('README.rst', encoding='utf-8') as readme:
^
SyntaxError: invalid syntax
Command python setup.py egg_info failed with error code 1
Please, I need help. I am new to Python and I am following a Udemy training video, "Build 10 Real World Applications".
I am building the dictionary app, but I am stuck with the errors below.
I don't know what else to do. I will appreciate any help concerning how to fix this error.
Thanks
parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
File "C:\Users\learn\AppData\Local\Programs\Python\Python37-32\lib\json\__init__.py", line 348, in loads
return _default_decoder.decode(s)
File "C:\Users\learn\AppData\Local\Programs\Python\Python37-32\lib\json\decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Users\learn\AppData\Local\Programs\Python\Python37-32\lib\json\decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Is it possible to bump a new version (maybe 0.3.1) including the "in" filter support?
Based on what I read in the docs, Druid supports custom granularities like this:
{"type": "duration", "duration": 7200000}
Is there support for that in PyDruid?
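If your pydruid version forwards the granularity argument into the query JSON verbatim (worth checking against your version), a raw Druid granularity spec may work where the enum strings do; this is an assumption, not a documented pydruid feature:

```python
# Raw Druid "duration" granularity spec: 2 hours expressed in milliseconds.
two_hours_ms = 2 * 60 * 60 * 1000
duration_granularity = {"type": "duration", "duration": two_hours_ms}

# Hypothetically passed straight through to a query method, e.g.:
#   query.timeseries(..., granularity=duration_granularity, ...)
print(duration_granularity)
```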
Some Druid instances run with Kerberos enabled; it would be nice for pydruid to work against these kerberized instances.
I just saw that you are using the requests library to call the Druid HTTP API.
I also saw that there is a requests-kerberos library that adds Kerberos auth.
Would it be possible to integrate it into pydruid?
The only requirement would be to add an argument to the requests calls, as in this example:
import requests
from requests_kerberos import HTTPKerberosAuth, REQUIRED
kerberos_auth = HTTPKerberosAuth(mutual_authentication=REQUIRED, sanitize_mutual_error_response=False)
r = requests.get("https://windows.example.org/wsman", auth=kerberos_auth)
How can we insert data into Druid using Python code?
I am trying to read a Druid database using Python 3.5 and the DB API of pydruid; however, whenever I run the execute statement I get the error below.
Code I am running:
from pydruid.db import connect
conn = connect(host='XXXXXXX', port=8082, path='/druid/v2', scheme='http')
curs = conn.cursor()
curs.execute("""SELECT * FROM wikipedia LIMIT 10""")
Error:
---------------------------------------------------------------------------
JSONDecodeError Traceback (most recent call last)
in ()
----> 1 curs.execute("""SELECT * FROM wikipedia LIMIT 10""")
/apps/cmor/anaconda3/lib/python3.5/site-packages/pydruid-0.3.1-py3.5.egg/pydruid/db/api.py in g(self, *args, **kwargs)
39 raise exceptions.Error(
40 '{klass} already closed'.format(klass=self.__class__.__name__))
---> 41 return f(self, *args, **kwargs)
42 return g
43
/apps/cmor/anaconda3/lib/python3.5/site-packages/pydruid-0.3.1-py3.5.egg/pydruid/db/api.py in execute(self, operation, parameters)
187 # let's consume it and insert it back.
188 results = self._stream_query(query)
--> 189 first_row = next(results)
190 self._results = itertools.chain([first_row], results)
191
/apps/cmor/anaconda3/lib/python3.5/site-packages/pydruid-0.3.1-py3.5.egg/pydruid/db/api.py in _stream_query(self, query)
267 # raise any error messages
268 if r.status_code != 200:
--> 269 payload = r.json()
270 msg = (
271 '{error} ({errorClass}): {errorMessage}'.format(**payload)
/apps/cmor/anaconda3/lib/python3.5/site-packages/requests/models.py in json(self, **kwargs)
892 # used.
893 pass
--> 894 return complexjson.loads(self.text, **kwargs)
895
896 @property
/apps/cmor/anaconda3/lib/python3.5/json/__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
317 parse_int is None and parse_float is None and
318 parse_constant is None and object_pairs_hook is None and not kw):
--> 319 return _default_decoder.decode(s)
320 if cls is None:
321 cls = JSONDecoder
/apps/cmor/anaconda3/lib/python3.5/json/decoder.py in decode(self, s, _w)
337
338 """
--> 339 obj, end = self.raw_decode(s, idx=_w(s, 0).end())
340 end = _w(s, end).end()
341 if end != len(s):
/apps/cmor/anaconda3/lib/python3.5/json/decoder.py in raw_decode(self, s, idx)
355 obj, end = self.scan_once(s, idx)
356 except StopIteration as err:
--> 357 raise JSONDecodeError("Expecting value", s, err.value) from None
358 return obj, end
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Some of the functions/constructors, like longsum and Field, are not included when you do from pydruid.client import *; you need to go into utils. So I think the README needs to be updated. I can do it myself if needed, but I didn't want to give incorrect information since I'm unfamiliar with pydruid.
Hi,
I am trying pydruid for the first time on Python 2.7, with the wikipedia data source.
However, I am getting the following error when executing the query below in Python.
top_langs = query.topn(
datasource = "wikipedia",
granularity = "all",
intervals = "2013-06-01T00:00/2020-01-01T00",
dimension = "channel",
filter = Dimension("namespace") == "article",
aggregations = {"edit_count": longsum("count")},
metric = "edit_count",
threshold = 4
)
print top_langs # Do this if you want to see the raw JSON
NameError Traceback (most recent call last)
in ()
4 intervals = "2013-06-01T00:00/2020-01-01T00",
5 dimension = "channel",
----> 6 filter = Dimension("namespace") == "article",
7 aggregations = {"edit_count": longsum("count")},
8 metric = "edit_count",
NameError: name 'Dimension' is not defined
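The NameError above is an import problem, not a query problem: from pydruid.client import * does not re-export the filter and aggregator helpers, which live in pydruid.utils. A self-contained illustration of the star-import behavior (the module object below is a stand-in for pydruid.client):

```python
import types

# Stand-in for pydruid.client: the module defines PyDruid, but Dimension
# and longsum live in sibling modules (pydruid.utils.filters / .aggregators).
client = types.ModuleType("client")
client.PyDruid = object

# A star-import binds only the module's own public names.
exported = [n for n in vars(client) if not n.startswith("_")]
print(exported)  # PyDruid is here; Dimension and longsum are not

# The fix for the traceback above is to import the helpers explicitly:
#   from pydruid.utils.filters import Dimension
#   from pydruid.utils.aggregators import longsum
```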
Hi, am I wrong or is there no support for "filtered" aggregation?
When building a query, if filter=None then an exception occurs:
File "/Users/se7entyse7en/Envs/viralize-web/lib/python2.7/site-packages/pydruid/utils/filters.py", line 61, in build_filter
return filter_obj.filter['filter']
AttributeError: 'NoneType' object has no attribute 'filter'
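A sketch of a None-safe build step, mirroring the shape of pydruid.utils.filters.Filter.build_filter; names and structure are assumptions for illustration, not pydruid's exact code:

```python
def build_filter(filter_obj):
    # Guard the None case instead of unconditionally dereferencing .filter.
    if filter_obj is None:
        return None
    return filter_obj.filter["filter"]

# Caller side: omit the "filter" key entirely when nothing was built.
query_dict = {"queryType": "groupBy"}
built = build_filter(None)
if built is not None:
    query_dict["filter"] = built
print(query_dict)  # no "filter" key, and no AttributeError
```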
Hello all!
I'm trying to create a datasource in superset, using pydruid. I can use pydruid cli successfully, but when I try to create a superset datasource, pointing to druid I'm getting this error: "Can't load plugin sqlalchemy.dialects:druid".
Any idea how to solve this?
Best regards.
Superset version
0.15.0 integrated with hadoop
Expected results
Integrate Druid with SQL lab
Actual results
Unable to integrate Druid with SQL lab
Steps to reproduce
Install pydruid and try to create a database using this plugin
How can I perform the following aggregation operation in pydruid: AVG(DISTINCT(col))?
Hello All,
How can I generate subqueries using pydruid, given that the dataSource field only takes either a str or a list?
""" ValueError: Datasource definition not valid. Must be string or list of strings """
Below is the sample query, in which I pass the output of the first query to another query as the datasource.
{
"queryType": "groupBy",
"dataSource":{
"type": "query",
"query": {
"queryType": "groupBy",
"dataSource": "druid_source",
"granularity": {"type": "period", "period": "P1M"},
"dimensions": ["source_dim"],
"aggregations": [
{ "type": "doubleMax", "name": "value", "fieldName": "stream_value" }
],
"intervals": [ "2012-01-01T00:00:00.000/2020-01-03T00:00:00.000" ]
}
},
"granularity": "hour",
"dimensions": ["source_dim"],
"aggregations": [
{ "type": "longSum", "name": "outerquerryvalue", "fieldName": "value" }
],
"intervals": [ "2012-01-01T00:00:00.000/2020-01-03T00:00:00.000" ]
}
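A hedged workaround while the client validates dataSource as str/list: build the nested "query" datasource dict by hand (the helper name is ours, not pydruid's) and POST the outer query yourself:

```python
# Hypothetical helper -- wraps an inner groupBy as a "query" datasource.
def query_datasource(inner_query):
    return {"type": "query", "query": inner_query}

inner = {
    "queryType": "groupBy",
    "dataSource": "druid_source",
    "granularity": {"type": "period", "period": "P1M"},
    "dimensions": ["source_dim"],
    "aggregations": [{"type": "doubleMax", "name": "value",
                      "fieldName": "stream_value"}],
    "intervals": ["2012-01-01T00:00:00.000/2020-01-03T00:00:00.000"],
}
outer_datasource = query_datasource(inner)
print(outer_datasource["type"])
```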
I installed pydruid with pip.
Doing Filter(type="selector", dimension="dim", value=val) returns:
'Filter type: {0} does not exist'.format(args['type']))
NotImplementedError: Filter type: selector does not exist
Why is a basic selector filter still not implemented?
When unit tests are run on Python 2.7, TestPostAggregators.test_build_thetapostaggregator fails with:
E AssertionError: assert [{'field': <p...tchEstimate'}] == [{'field': {'f...tchEstimate'}]
E At index 0 diff: {'field': <pydruid.utils.postaggregator.ThetaSketch instance at 0x1116e9ea8>, 'type': 'thetaSketchEstimate', 'name': 'pag1'} != {'field': {'fieldName': 'theta1', 'type': 'fieldAccess'}, 'type': 'thetaSketchEstimate', 'name': 'pag1'}
It looks like the object is being used in the dictionary rather than the object's post_aggregator. Will drop a PR to fix.
The 'alphaNumeric' parameter in the bound filter is useless when the dimension is numeric and has negative values. I found that the fix is to set 'ordering' to 'numeric' instead of 'alphaNumeric', but this is not supported in the latest version. I hope the pydruid API can be kept up to date with druid.io.
The current version only seems to support granularities that are defined as an enum in Druid. It does not support the more generic JSON object scheme for granularity.
Adding option to query Zookeeper for available brokers, for availability / load-balancing
http://druid.io/docs/latest/querying/filters.html
{
"filter": {
"type": "extraction",
"dimension": "product",
"value": "bar_1",
"extractionFn": {
"type": "lookup",
"lookup": {
"type": "map",
"map": {
"product_1": "bar_1",
"product_5": "bar_1",
"product_3": "bar_1"
}
}
}
}
}
I tried running the pydruid async client. It does not work. The issue is that it uses AsyncHTTPClient from tornado, which is the interface. Instead we should be using one of the implementations, SimpleAsyncHTTPClient or CurlAsyncHTTPClient. The latter is better, as described at http://www.tornadoweb.org/en/stable/httpclient.html#module-tornado.simple_httpclient. I have made this one-line fix and can create a patch if that is fine by everyone.
It seems like the github master branch corresponds to version 0.2 but on pypi you can find version 0.2.1. Is this intended?
It'd be handy to be able to use a 'like' Filter when querying -> http://druid.io/docs/latest/querying/filters.html
Right now there is no mapping for complex column types in SQLAlchemy and this causes a "Key error: other" when autoloading a Table.
With the merge of #72 we now have the ability to perform theta sketch operations. This functionality is needed in Superset, but we need a new PyDruid pip package to be pushed so that we can use this new feature.
Is there a release process / timeline to get a new package pushed? Is there anything I can do to help? My understanding is that the setup.py script needs to be modified to list 0.3.2 and a git tag added for that version.
Is there any reason why the PyDruid class does not extend object?
See client.py, having.py.
Hi,
As far as I can see, pydruid does not currently support the "new" query types, i.e. "select" and "search". It would be nice to have them :-).
Maybe it would also make sense to expose the __post method, so we are able to send queries directly? That might also be useful if somebody wants to use pydruid as a "middleware", with queries prepared somewhere else.
Eg.
from pydruid.utils.filters import Dimension
filter = (Dimension("test") != 1)
filter.show()
Results in:
TypeError: Object of type 'Filter' is not JSON serializable
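One generic way around the TypeError is to give json.dumps a default hook that unwraps nested filter objects into their dicts. The Filter class below is a stand-in with a pydruid-like shape, not the real implementation:

```python
import json

class Filter:
    """Stand-in: pydruid filters similarly keep their serialized dict in .filter."""
    def __init__(self, d):
        self.filter = d

# A "not" filter whose field is itself a Filter object, as show() produces.
f = Filter({"type": "not",
            "field": Filter({"type": "selector",
                             "dimension": "test",
                             "value": 1})})

def unwrap(obj):
    # json.dumps calls this for anything it cannot serialize natively.
    if isinstance(obj, Filter):
        return obj.filter
    raise TypeError(repr(obj))

print(json.dumps(f.filter, default=unwrap, indent=2))
```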
pip install pydruid
Collecting pydruid
Using cached https://files.pythonhosted.org/packages/32/91/4be6f902d50f22fc6b9e2eecffbef7d00989ba477e9c8e034074186cd10c/pydruid-0.4.2.tar.gz
Complete output from command python setup.py egg_info:
Download error on https://pypi.org/simple/pytest-runner/: [SSL: TLSV1_ALERT_PROTOCOL_VERSION] tlsv1 alert protocol version (_ssl.c:645) -- Some packages may not be found!
Couldn't find index page for 'pytest-runner' (maybe misspelled?)
Download error on https://pypi.org/simple/: [SSL: TLSV1_ALERT_PROTOCOL_VERSION] tlsv1 alert protocol version (_ssl.c:645) -- Some packages may not be found!
No local packages or working download links found for pytest-runner
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/private/var/folders/6g/xcjdd64j6clctps0_25h_n3h0000gp/T/pip-install-iatsyysf/pydruid/setup.py", line 44, in <module>
include_package_data=True,
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/setuptools/__init__.py", line 128, in setup
_install_setup_requires(attrs)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/setuptools/__init__.py", line 123, in _install_setup_requires
dist.fetch_build_eggs(dist.setup_requires)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/setuptools/dist.py", line 504, in fetch_build_eggs
replace_conflicting=True,
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pkg_resources/__init__.py", line 774, in resolve
replace_conflicting=replace_conflicting
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pkg_resources/__init__.py", line 1057, in best_match
return self.obtain(req, installer)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pkg_resources/__init__.py", line 1069, in obtain
return installer(requirement)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/setuptools/dist.py", line 571, in fetch_build_egg
return cmd.easy_install(req)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/setuptools/command/easy_install.py", line 667, in easy_install
raise DistutilsError(msg)
distutils.errors.DistutilsError: Could not find suitable distribution for Requirement.parse('pytest-runner')
----------------------------------------
Hitting this on macOS.
I'm trying to run the following query
selected_apps = ((dr.Dimension('appId') == 0) | (dr.Dimension('appId') == 1) | (dr.Dimension('appId') == 2))
query = druid.topn(datasource='sessions', granularity='all', intervals='2016-02-05/P7D', filter=selected_apps, aggregations={'sessions': dr.longsum('sessions')}, dimension='appId', metric='sessions')
Looking at the actual constructed query I get:
{
    "metric": "sessions",
    "aggregations": [
        {
            "fieldName": "sessions",
            "type": "longSum",
            "name": "sessions"
        }
    ],
    "dimension": "appId",
    "filter": {
        "fields": [
            {
                "fields": [
                    {
                        "type": "selector",
                        "dimension": "appId",
                        "value": 0
                    },
                    {
                        "type": "selector",
                        "dimension": "appId",
                        "value": 1
                    }
                ],
                "type": "or"
            },
            {
                "type": "selector",
                "dimension": "appId",
                "value": 2
            }
        ],
        "type": "or"
    },
    "intervals": "2016-02-05/P7D",
    "dataSource": "sessions",
    "granularity": "all",
    "queryType": "topN"
}
Inside the filter object I was expecting the fields array to contain all selectors at the same level, not nested.
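Until the builder flattens same-type operands, one hedged workaround is to construct the flat "or" dict by hand instead of chaining |; the helper name is ours, not pydruid's:

```python
# Build a flat "or" over one dimension, with every selector at the same level.
def flat_or(dimension, values):
    return {
        "type": "or",
        "fields": [{"type": "selector", "dimension": dimension, "value": v}
                   for v in values],
    }

selected_apps = flat_or("appId", [0, 1, 2])
print(len(selected_apps["fields"]))
```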
What is the syntax for OR conditions in the filter argument?
E.g. filter = (Dimension('A')=='val1') & ((Dimension('B')=='val2') | (Dimension('C')=='val3'))
This doesn't seem to work; it leads to an error:
File "./pydruid_query.py", line 150, in <module>
gfl2 = group_last2.export_tsv('group_last2_result.tsv')
File "/usr/lib/python2.6/site-packages/pydruid/query.py", line 99, in export_tsv
header = list(self.result[0]['event'].keys())
IndexError: list index out of range
Looks like passing a TopNMetricSpec for a topN query is currently not supported, since the metric passed into topn is a string.
From the topn docstring:
:param str metric: Metric over which to sort the specified dimension by
I have some string data which contains numerical values. I would like to convert this data from string to long so that I can perform aggregations such as min, sum, etc. As the link below suggests, Druid allows you to perform this conversion by specifying a dimensionSpec. Is this functionality supported in pydruid?
When combining filters using the & and | operators, the operands are modified, which can lead to unexpected results.
This only happens if the type of operation is the same as the type of the first filter.
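A toy reproduction of the mutation (not pydruid's actual code): if & appends to the left operand's own fields list whenever the types match, the original filter silently grows.

```python
class F:
    """Toy filter: combining reuses self.fields in the same-type branch."""
    def __init__(self, type_, fields):
        self.type, self.fields = type_, fields

    def __and__(self, other):
        if self.type == "and":           # the buggy "optimization"
            self.fields.append(other)    # mutates the left operand in place
            return self
        return F("and", [self, other])

a = F("and", [F("selector", [])])
b = F("selector", [])
combined = a & b
print(len(a.fields))  # 2 -- `a` grew as a side effect of `&`
```

The fix is to build a fresh list in that branch, e.g. return F("and", self.fields + [other]), so combining never writes back into an operand.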
It looks like, when releasing the new version, you removed 0.2.1 from PyPI.
I'd ask you to re-upload it, as we have a strict dependency upgrade policy and our requirements are locked against specific versions of packages, including pydruid.
Currently AsyncPyDruid uses the default 20s timeout value of tornado's HTTPRequest, which frequently produces timeout errors for me when running heavy queries. It would be great if we could customize this value.
First-time Druid user here: I started my Druid server and found that whenever I tried opening the CLI, or anything with pydruid, I kept getting:
File "/opt/anaconda3/bin/pydruid", line 11, in <module>
sys.exit(main())
File "/opt/anaconda3/lib/python3.6/site-packages/pydruid/console.py", line 170, in main
words = get_autocomplete(connection)
File "/opt/anaconda3/lib/python3.6/site-packages/pydruid/console.py", line 155, in get_autocomplete
get_tables(connection)
File "/opt/anaconda3/lib/python3.6/site-packages/pydruid/console.py", line 143, in get_tables
cursor.execute('SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES')
File "/opt/anaconda3/lib/python3.6/site-packages/pydruid/db/api.py", line 41, in g
return f(self, *args, **kwargs)
File "/opt/anaconda3/lib/python3.6/site-packages/pydruid/db/api.py", line 189, in execute
first_row = next(results)
File "/opt/anaconda3/lib/python3.6/site-packages/pydruid/db/api.py", line 269, in _stream_query
payload = r.json()
File "/opt/anaconda3/lib/python3.6/site-packages/requests/models.py", line 892, in json
return complexjson.loads(self.text, **kwargs)
File "/opt/anaconda3/lib/python3.6/json/__init__.py", line 354, in loads
return _default_decoder.decode(s)
File "/opt/anaconda3/lib/python3.6/json/decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/opt/anaconda3/lib/python3.6/json/decoder.py", line 357, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
I had to go through the source to realize that there is a .json() call to show the error message, and that call was failing. It would be a good idea to catch this and report it in a better way.
I realized my issue was that localhost:8082 wasn't running.
Is there any reason why py.test was chosen over the built-in unittest for testing? I would like to add some tests, and I honestly don't know py.test, but unittest seems compatible. Can I write them without using py.test?
SQLAlchemy throws an AttributeError when fetching from a ResultProxy whose underlying Druid query results in an empty result set (i.e. no rows), because the proxy is prematurely closed, i.e.,
>>> from sqlalchemy.engine import create_engine
>>> engine = create_engine('druid://localhost:8082/druid/v2/sql')
>>> result = engine.execute("SELECT SUM(x) FROM foo WHERE bar IS NULL")
>>> result.fetchall()
...
AttributeError: 'NoneType' object has no attribute 'fetchall'
The reason the ResultProxy is closed is that cursor.description is None. This results in result.cursor being None, which is why the exception is thrown.
It seems that other dialects have a description irrespective of whether there are rows and hence don't suffer from the same issue. Note I'm uncertain whether there's even a viable solution as the Druid SQL REST API doesn't provide this information for an empty result set,
> cat query.json
{"query": "SELECT SUM(x) FROM foo WHERE bar IS NULL"}
> curl -XPOST -H 'Content-Type: application/json' http://localhost:8082/druid/v2/sql/ -d @query.json
[]
Note I know one can circumvent this issue by using a raw connection, however this example illustrates the behavior that Pandas uses for reading SQL.
Is there an 'interval' filter available? I couldn't find "type": "interval" in the filter code.
Filtering on a set of ISO 8601 intervals:
{
"type" : "interval",
"dimension" : "__time",
"intervals" : [
"2014-10-01T00:00:00.000Z/2014-10-07T00:00:00.000Z",
"2014-11-15T00:00:00.000Z/2014-11-16T00:00:00.000Z"
]
}
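If the installed pydruid version has no interval filter class, the raw Druid JSON above can be produced with a small helper (the name is ours) and sent in a hand-built query:

```python
# Build the raw Druid interval filter shown above.
def interval_filter(dimension, intervals):
    return {"type": "interval",
            "dimension": dimension,
            "intervals": list(intervals)}

f = interval_filter("__time", [
    "2014-10-01T00:00:00.000Z/2014-10-07T00:00:00.000Z",
    "2014-11-15T00:00:00.000Z/2014-11-16T00:00:00.000Z",
])
print(f["type"])
```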
How can I apply multiple filters to a groupby query? The following does not work:
filters = (pydruid.utils.filters.Dimension("unit")=='000721') & (pydruid.utils.filters.Dimension("val") > 0)
query1 = query.groupby(
datasource=dataSource,
granularity='minute',
#intervals='2016-08-01/p12w',
#intervals='2016-08-01/pt24h',
intervals='2016-08-11/2016-08-15',
dimensions=["GPN"],
filter=filters,
aggregations={"val_sum": ag.doublesum("val"),"val_count": ag.count("val")},
post_aggregations={"Avg": (pag.Field("val_sum") / pag.Field("val_count"))},
context={"timeout": 600000}#,
# limit_spec={
#"type": "default",
# "limit": 50,
# "columns" : ["sensor_val_sum","sensor_val_count"]
# }
)
Gives the following error:
Traceback (most recent call last):
File "Test_Query_2Filters.py", line 41, in
context={"timeout": 600000}#,
File "/usr/local/lib/python2.7/dist-packages/pydruid/client.py", line 191, in groupby
query = self.query_builder.groupby(kwargs)
File "/usr/local/lib/python2.7/dist-packages/pydruid/query.py", line 316, in groupby
return self.build_query(query_type, args)
File "/usr/local/lib/python2.7/dist-packages/pydruid/query.py", line 250, in build_query
query_dict[key] = Filter.build_filter(val)
File "/usr/local/lib/python2.7/dist-packages/pydruid/utils/filters.py", line 90, in build_filter
filter['fields'] = [Filter.build_filter(f) for f in filter['fields']]
File "/usr/local/lib/python2.7/dist-packages/pydruid/utils/filters.py", line 87, in build_filter
filter = filter_obj.filter['filter']
AttributeError: 'bool' object has no attribute 'filter'
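The root cause is that Dimension("val") > 0 is not overloaded, so Python evaluates it to a plain bool, and build_filter later trips over it. Druid's bound filter expresses the comparison; here is a raw-dict sketch (the helper name is ours, and whether your pydruid version accepts raw dicts is an assumption to verify):

```python
# Raw Druid "bound" filter for `dimension > value` over numeric values.
def greater_than(dimension, value):
    return {"type": "bound",
            "dimension": dimension,
            "lower": str(value),   # bound filter takes string bounds
            "lowerStrict": True,   # strict -> ">" rather than ">="
            "ordering": "numeric"}

print(greater_than("val", 0))
```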