
azure-kusto-python's Introduction

Microsoft Azure Kusto (Azure Data Explorer) SDK for Python

The azure-kusto-data package provides the capability to query Kusto clusters with Python.
The azure-kusto-ingest package allows sending data to the Kusto service, i.e. ingesting data.

Install

Option 1: Via PyPI

To install via the Python Package Index (PyPI), type:

  • pip install azure-kusto-data
  • pip install azure-kusto-ingest

Option 2: Source Via Git

To get the source code of the SDK via git just type:

git clone https://github.com/Azure/azure-kusto-python
cd ./azure-kusto-python/azure-kusto-data
python3 setup.py install
cd ../azure-kusto-ingest
python3 setup.py install

Option 3: Source Zip

Download a zip of the code via GitHub or PyPI. Then follow the same instructions as in option 2.

Optional dependencies:

  • Pandas: the packages provide extra functionality for use with pandas. Since pandas is an optional dependency, install it through the pandas extra:
    • pip install azure-kusto-data[pandas]
    • pip install azure-kusto-ingest[pandas]

Minimum Requirements

  • Python 3.5 and above
  • See setup.py for dependencies

Authentication methods:

  • AAD Username/password - Provide your AAD username and password to Kusto client (check the notice below).
  • AAD application - Provide app ID and app secret to Kusto client.
  • AAD code - Provide only your AAD username, and authenticate yourself using a code, generated by ADAL.
  • AZ CLI - For those already using azure-cli, provide the access token of the logged-in user.

<!> IMPORTANT NOTICE <!>: User authentication (using username and password) has a major caveat: users are sometimes required to use Multi-Factor Authentication, and in that case this flow won't work for them. This is a limitation of the AAD library we use under the hood, and several bugs have been reported about it.

There is also a feature request for the ADAL team to implement IWA (Integrated Windows Authentication) so that signed-in users won't have to authenticate again. Feel free to upvote it if it is relevant in your case.

Samples:

Best Practices

See the SDK best practices guide, which though written for the .NET SDK, applies similarly here.

Need Support?

  • Have a feature request for SDKs? Please post it on UserVoice to help us prioritize.
  • Have a technical question? Ask on Stack Overflow with the tag "azure-data-explorer".
  • Need support? Every customer with an active Azure subscription has access to support with guaranteed response time. Consider submitting a ticket to get assistance from the Microsoft support team.
  • Found a bug? Please help us fix it by thoroughly documenting it and filing an issue.

Looking for SDKs for other languages/platforms?

Contribute

We gladly accept community contributions.

  • Issues: Please report bugs using the Issues section of GitHub
  • Forums: Interact with the development teams on StackOverflow or the Microsoft Azure Forums
  • Source Code Contributions: If you would like to become an active contributor to this project please follow the instructions provided in Contributing.md.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

For general suggestions about Microsoft Azure please use our UserVoice forum.

azure-kusto-python's People

Contributors

alonadam, amshalev, arielyehezkely, asafmah, boazsha, chriscoe, creste, danield137, enmoed, jjsridharan, matei-oltean, microsoft-github-policy-service[bot], microsoftopensource, msftgits, ncbrown1, ohadbitt, ohbitton, ronmonetamicro, sbxmdr, scovetta, shira263, t-liyari, t-ronmoneta, tonybaloney, toshetah, uribarash-zz, y0nil, yaniv-ms, yihezkel, yogilad


azure-kusto-python's Issues

import library in pyspark

Hi! I read in this documentation that we can use this library in a Jupyter Notebook attached to a Spark cluster or Databricks.

I tried running from Pyspark3:
%config
!pip install azure-kusto-data==0.0.19

But received the error msg:
OSError: [Errno 13] Permission denied: '/usr/bin/anaconda/lib/python2.7/site-packages/dateutil/tzwin.py'

Do you know some possible reason?
Thank you,
Natalia

KustoStreamingIngestClient ValueError: Timeout value connect was 0:04:30, but it must be an int, float or None.

When attempting to use the streaming ingest client in Databricks, with either a dataframe or a stream, I hit "ValueError: Timeout value connect was 0:04:30, but it must be an int, float or None.", which I believe is being thrown by requests:

/databricks/python/lib/python3.5/site-packages/azure/kusto/ingest/_streaming_ingest_client.py in ingest_from_dataframe(self, df, ingestion_properties)
51
52 ingestion_properties.format = DataFormat.csv
---> 53 self._ingest(fd.zipped_stream, fd.size, ingestion_properties, content_encoding="gzip")
54
55 fd.delete_files()

Looking through the code, I don't see an obvious way to set this value, can you point me to some documentation?
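The message suggests that a datetime.timedelta reached requests as the connect timeout: str(timedelta(minutes=4, seconds=30)) renders as 0:04:30, while requests and urllib3 only accept a number, None, or a (connect, read) tuple of those. A minimal sketch of the conversion that would avoid the error, assuming the client holds its default timeout as a timedelta; to_requests_timeout is a hypothetical helper, not part of the SDK:

```python
from datetime import timedelta


def to_requests_timeout(timeout):
    """Convert a timedelta to the float seconds that requests expects (hypothetical helper)."""
    if isinstance(timeout, timedelta):
        return timeout.total_seconds()
    # Numbers and None are already valid requests timeouts; pass them through unchanged.
    return timeout


print(to_requests_timeout(timedelta(minutes=4, seconds=30)))  # 270.0
```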

Error when ingesting multiple files

I receive the following error when attempting to process multiple local CSV files:

Bad request: Request is invalid and cannot be executed. Entity ID '[DB NetDefaultDB v?.?]' of kind 'Database' was not found.

I'm passing my own DB name as the database I'm connecting to, so I searched the azure-kusto-ingest source and found that the first place NetDefaultDB is used in the process is the _get_temp_storage_objects function:

def _get_temp_storage_objects(self):
    response = self._kusto_client.execute_mgmt("NetDefaultDB", ".create tempstorage")
    storages = list()
    for row in response.iter_all():
        storages.append(_ConnectionString.parse(row["StorageRoot"]))
    return storages

The first argument of execute_mgmt is the database. Since "NetDefaultDB" is a hard-coded string, I'm not sure whether this is part of Kusto's inner workings for ingestion (so the error is permission related), or whether this value is supposed to be a variable holding the database actually being accessed.

AdalError "AADSTS50079" for MFA

When trying to authenticate with Kusto, I get the error below. Usually I go to a related site manually and force re-authentication with MFA, which resolves the issue, but that no longer works either. This error did not occur for months; it only started in the last month or two. Please suggest a workaround.

AdalError: Get Token request returned http error: 400 and server response: {"error":"interaction_required","error_description":"AADSTS50079: The user is required to use multi-factor authentication.\r\nTrace ID: 8d4704cf-5c83-42bb-860b-7a2180d1fa00\r\nCorrelation ID: 87014a28-6bc1-4413-8592-ec1aac63646e\r\nTimestamp: 2018-12-31 20:57:33Z","error_codes":[50079],"timestamp":"2018-12-31 20:57:33Z","trace_id":"8d4704cf-5c83-42bb-860b-7a2180d1fa00","correlation_id":"87014a28-6bc1-4413-8592-ec1aac63646e","suberror":"basic_action"}

ValueError: Cannot convert non-finite values (NA or inf) to integer

I have a query that runs in Kusto.Explorer. When I execute the same query and convert the result to a dataframe using dataframe = dataframe_from_result_table(response.primary_results[0]), I always get this error.

  File "c:/Users/yizhon/OneDrive - Microsoft/Azure IoT/Telemetry/Finance correlation/data-joining-scripts/data-merger.py", line 101, in ingestUsageDataFromKusto
    dataframe = dataframe_from_result_table(response.primary_results[0])
  File "C:\Python27\lib\site-packages\azure\kusto\data\helpers.py", line 56, in dataframe_from_result_table
    frame[col_name] = frame[col_name].astype(pandas_type, errors="raise" if raise_errors else "ignore")
  File "C:\Python27\lib\site-packages\pandas\util\_decorators.py", line 178, in wrapper
    return func(*args, **kwargs)
  File "C:\Python27\lib\site-packages\pandas\core\generic.py", line 5001, in astype
    **kwargs)
  File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 3714, in astype
    return self.apply('astype', dtype=dtype, **kwargs)
  File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 3581, in apply
    applied = getattr(b, f)(**kwargs)
  File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 575, in astype
    **kwargs)
  File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 664, in _astype
    values = astype_nansafe(values.ravel(), dtype, copy=True)
  File "C:\Python27\lib\site-packages\pandas\core\dtypes\cast.py", line 702, in astype_nansafe
    raise ValueError('Cannot convert non-finite values (NA or inf) to '
ValueError: Cannot convert non-finite values (NA or inf) to integer

The error arises when I join certain tables, although some joins succeed. Is there a way to stop the conversion to integer?
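The traceback shows the helper casting each column with astype, and a join that leaves some rows unmatched produces nulls in an integer column, which a plain int64 cast rejects. A minimal pandas-only sketch of the failure and of one workaround, pandas' nullable "Int64" extension dtype (available since pandas 0.24); the sample series is made up. Judging by line 56 of helpers.py in the traceback, dataframe_from_result_table may also take a raise_errors flag, but that is not confirmed here.

```python
import pandas as pd

# Made-up integer column where one row is null, e.g. the unmatched side of an outer join.
raw = pd.Series([1, 2, None], dtype="float64")

# A plain NumPy int64 cast cannot represent the missing value and raises ValueError.
try:
    raw.astype("int64")
    strict_cast_failed = False
except ValueError:
    strict_cast_failed = True

# The nullable "Int64" dtype (capital I) keeps the integers and stores the null as <NA>.
nullable = raw.astype("Int64")
```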

Failed to connect to a country cloud ADX endpoint

The azure-kusto-data SDK provides no way to set the login endpoint for AAD authentication. ADX has launched in Azure China, but it cannot yet be reached with the Python SDK.

TypeError("Cannot convert %r to Decimal" % value)

Repro

from azure.kusto.data.request import KustoClient, KustoConnectionStringBuilder
from azure.kusto.data.exceptions import KustoServiceError

kcsb = KustoConnectionStringBuilder.with_aad_application_key_authentication(
    connection_string='',
    aad_app_id='',
    app_key='',
    authority_id='')

client = KustoClient(kcsb)

db = 'db'
try:
    query = "T"
    response = client.execute(db, query)
    # print() works on both Python 2 and 3
    print(response.primary_results[0])

except KustoServiceError as error:
    print(error)

schema:

.create table T (a:string, b:decimal)
.ingest inline into table T
[,]

Problem description

Code fails with error TypeError: Cannot convert None to Decimal
It's because of:

  1. Decimal is a value type and is not nullable in Python
  2. It is a nullable type in Kusto
  3. Kusto sends us null
  4. We try to convert null to Decimal:
    typed_value = KustoResultRow.convertion_funcs[column_type](value)
  5. Adding decimal to this if condition fixes the problem:
    if column_type in ["datetime", "timespan"]:
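The fix described above boils down to checking for null before calling the type's conversion function. A minimal sketch of that null-safe conversion; CONVERSION_FUNCS and convert_cell are hypothetical stand-ins for KustoResultRow.convertion_funcs and the row-parsing code, not the SDK's actual implementation:

```python
from decimal import Decimal

# Hypothetical stand-in for KustoResultRow.convertion_funcs in the SDK.
CONVERSION_FUNCS = {"decimal": Decimal, "int": int, "string": str}


def convert_cell(column_type, value):
    """Convert one raw JSON cell to a Python value, letting Kusto nulls through."""
    # Kusto columns are nullable, but Decimal(None) raises TypeError,
    # so check for null before calling the conversion function.
    if value is None:
        return None
    return CONVERSION_FUNCS[column_type](value)
```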

full stacktrace:

File "test.py", line 17, in <module>
    response = client.execute(db, query)
  File "/usr/local/Cellar/python@2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/azure/kusto/data/request.py", line 396, in execute
    return self.execute_query(database, query, properties)
  File "/usr/local/Cellar/python@2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/azure/kusto/data/request.py", line 407, in execute_query
    self._query_endpoint, database, query, None, KustoClient._query_default_timeout, properties
  File "/usr/local/Cellar/python@2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/azure/kusto/data/request.py", line 462, in _execute
    return KustoResponseDataSetV2(response.json())
  File "/usr/local/Cellar/python@2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/azure/kusto/data/_response.py", line 134, in __init__
    super(KustoResponseDataSetV2, self).__init__([t for t in json_response if t["FrameType"] == "DataTable"])
  File "/usr/local/Cellar/python@2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/azure/kusto/data/_response.py", line 18, in __init__
    self.tables = [KustoResultTable(t) for t in json_response]
  File "/usr/local/Cellar/python@2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/azure/kusto/data/_models.py", line 137, in __init__
    self.rows = [KustoResultRow(self.columns, row) for row in json_table["Rows"]]
  File "/usr/local/Cellar/python@2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/azure/kusto/data/_models.py", line 71, in __init__
    typed_value = KustoResultRow.convertion_funcs[column_type](value)
  File "/usr/local/Cellar/python@2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/decimal.py", line 657, in __new__
    raise TypeError("Cannot convert %r to Decimal" % value)
TypeError: Cannot convert None to Decimal

Output of pip freeze

adal==1.2.2
asn1crypto==0.24.0
azure-kusto-data==0.0.31
azure-nspkg==3.0.2
certifi==2019.6.16
cffi==1.12.3
chardet==3.0.4
cryptography==2.7
enum34==1.1.6
idna==2.8
ipaddress==1.0.22
pbr==5.4.2
pycparser==2.19
PyJWT==1.7.1
python-dateutil==2.8.0
requests==2.22.0
six==1.12.0
stevedore==1.30.1
urllib3==1.25.3
virtualenv==16.7.2
virtualenv-clone==0.5.3
virtualenvwrapper==4.8.4

Error while calling str() on response containing datetime object

The __str__ method of KustoResultTable cannot handle datetime objects.

Proposed solution: pass a default serializer to json.dumps. This would eliminate the need for d["kind"] = d["kind"].value

    def __str__(self):
        d = self.to_dict()
        return json.dumps(d, default=str)
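json.dumps calls the default function for any object it cannot encode natively, which is why default=str avoids the TypeError. A minimal sketch of the behavior on a made-up row dict (not the SDK's actual to_dict() output):

```python
import json
from datetime import datetime

# Made-up row resembling a dict that contains a datetime value.
row = {"Timestamp": datetime(2019, 10, 2, 14, 50, 42), "Value": 1}

# Without a default hook, json.dumps raises the TypeError seen in the trace below.
try:
    json.dumps(row)
    plain_dumps_failed = False
except TypeError:
    plain_dumps_failed = True

# default=str is only called for objects json cannot encode, here the datetime.
text = json.dumps(row, default=str)
print(text)  # {"Timestamp": "2019-10-02 14:50:42", "Value": 1}
```

str() produces a space-separated timestamp; passing a function that calls isoformat() instead would yield "2019-10-02T14:50:42", which is easier to parse back.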

Error trace:

Traceback (most recent call last):
  File "util\kusto.py", line 126, in <module>
    cpu = kusto_client.get_compute_cpu_utilization(start_time, end_time, tenant)
  File "util\kusto.py", line 75, in get_compute_cpu_utilization
    results = self._poll_and_execute(query, end_time)
  File "util\kusto.py", line 51, in _poll_and_execute
    logging.info("Polling kusto result {} min: {}".format(i, result))
  File "C:\Users\shregup\AppData\Local\Programs\Python\Python37\lib\site-packages\azure\kusto\data\_models.py", line 170, in __str__
    return json.dumps(d)
  File "C:\Users\shregup\AppData\Local\Programs\Python\Python37\lib\json\__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "C:\Users\shregup\AppData\Local\Programs\Python\Python37\lib\json\encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "C:\Users\shregup\AppData\Local\Programs\Python\Python37\lib\json\encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "C:\Users\shregup\AppData\Local\Programs\Python\Python37\lib\json\encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type datetime is not JSON serializable

dataframe ingestion not honoring fields

Code Sample, a copy-pastable example if possible

import pandas

# ingestClient and ingestion_props are created in the full sample linked below.
fields = ["Name", "Metric", "Source"]
rows = [["p1", 23, "SDK"], ["p2", 25, "SDK"]]

df = pandas.DataFrame(data=rows, columns=fields)

ingestClient.ingest_from_dataframe(df, ingestion_properties=ingestion_props)

The full sample is available at the link below:

https://gist.github.com/prashanthmadi/738da1cf92f825eedf1451b985cee6e8

Problem description

In the ingested table, the fields weren't matched: I don't see the p1 and p2 values; instead, the metrics appear in the Name column.


If query related, does it happen on other platforms (Kusto Web UI, Kusto Explorer)?

[this step is to help pin point problems that are only specific to this platform.]

Output of pip freeze

azure-kusto-data 0.0.35
azure-kusto-ingest 0.0.35
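One plausible explanation, assuming ingest_from_dataframe serializes the DataFrame as headerless CSV (the streaming client code in an earlier issue sets DataFormat.csv for dataframes): CSV rows are mapped to table columns by position, so a DataFrame whose column order differs from the table's puts values in the wrong columns. A minimal workaround sketch that reorders the DataFrame to the table's declared order before ingesting; align_to_table and the (Source, Name, Metric) table order are hypothetical:

```python
import pandas as pd


def align_to_table(df, table_columns):
    """Return df with its columns reordered to match the target table (hypothetical helper)."""
    missing = [c for c in table_columns if c not in df.columns]
    if missing:
        raise ValueError(f"DataFrame is missing table columns: {missing}")
    # Selecting with a list both reorders and drops columns the table doesn't have.
    return df[list(table_columns)]


fields = ["Name", "Metric", "Source"]
rows = [["p1", 23, "SDK"], ["p2", 25, "SDK"]]
df = pd.DataFrame(data=rows, columns=fields)

# Hypothetical: suppose the table was created as (Source, Name, Metric).
aligned = align_to_table(df, ["Source", "Name", "Metric"])
```

The aligned frame would then be passed to ingest_from_dataframe in place of df. An explicit ingestion column mapping is the other common way to decouple file order from table order.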

Max retries exceeded with url: /v1/rest/mgmt

It had been working fine until the recent change to use a connection pool.

Error:
MaxRetryError: HTTPSConnectionPool(host='xxxx.uksouth.kusto.windows.net', port=443): Max retries exceeded with url: /v1/rest/mgmt (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f037fbb1b00>: Failed to establish a new connection: [Errno 111] Connection refused',))

I'm behind a corporate proxy and use with_aad_application_key_authentication for authentication.

Failure parsing datetime "1-01-01 00:00:00"

Running the ".show diagnostics" command sometimes returns the value "1-01-01 00:00:00" in the DataWarmingLastRunOn column. This value raises an exception when it is converted to a Python object:

Traceback (most recent call last):
  File "/XXX/kusto_monitor.py", line 192, in <module>
    main()
  File "/XXX/kusto_monitor.py", line 186, in main
    process_cluster(cluster, args, output_file)
  File "/XXX/kusto_monitor.py", line 141, in process_cluster
    log_command_results(client, cluster_name, NO_DATABASE, ".show diagnostics", "diagnostics", output_file)
  File "/XXX/kusto_monitor.py", line 93, in log_command_results
    query_result = client.execute(database_name, command)
  File "/XXX/venv_healthcheck/local/lib/python2.7/site-packages/azure/kusto/data/request.py", line 395, in execute
    return self.execute_mgmt(database, query, properties)
  File "/XXX/venv_healthcheck/local/lib/python2.7/site-packages/azure/kusto/data/request.py", line 418, in execute_mgmt
    return self._execute(self._mgmt_endpoint, database, query, None, KustoClient._mgmt_default_timeout, properties)
  File "/XXX/venv_healthcheck/local/lib/python2.7/site-packages/azure/kusto/data/request.py", line 463, in _execute
    return KustoResponseDataSetV1(response.json())
  File "/XXX/venv_healthcheck/local/lib/python2.7/site-packages/azure/kusto/data/_response.py", line 110, in __init__
    super(KustoResponseDataSetV1, self).__init__(json_response["Tables"])
  File "/XXX/venv_healthcheck/local/lib/python2.7/site-packages/azure/kusto/data/_response.py", line 18, in __init__
    self.tables = [KustoResultTable(t) for t in json_response]
  File "/XXX/venv_healthcheck/local/lib/python2.7/site-packages/azure/kusto/data/_models.py", line 137, in __init__
    self.rows = [KustoResultRow(self.columns, row) for row in json_table["Rows"]]
  File "/XXX/venv_healthcheck/local/lib/python2.7/site-packages/azure/kusto/data/_models.py", line 67, in __init__
    self._hidden_values.append(to_pandas_datetime(value))
  File "/XXX/venv_healthcheck/local/lib/python2.7/site-packages/azure/kusto/data/helpers.py", line 8, in to_pandas_datetime
    return pd.to_datetime(raw_value)
  File "/XXX/venv_healthcheck/local/lib/python2.7/site-packages/pandas/core/tools/datetimes.py", line 469, in to_datetime
    result = _convert_listlike(np.array([arg]), box, format)[0]
  File "/XXX/venv_healthcheck/local/lib/python2.7/site-packages/pandas/core/tools/datetimes.py", line 380, in _convert_listlike
    raise e
pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1-01-01 00:00:00
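pandas Timestamps default to nanosecond precision, which only covers roughly the years 1677 to 2262, so a year-1 value such as Kusto's minimum datetime cannot be represented. A minimal sketch of a bounds-checked conversion that maps out-of-range values to NaT instead of raising; to_pandas_datetime_safe is hypothetical, and it assumes the raw value is an ISO 8601 string such as 0001-01-01T00:00:00Z:

```python
from datetime import datetime

import pandas as pd

# Nanosecond Timestamps span about 1677-09-21 to 2262-04-11; these bounds
# are deliberately a little conservative.
_PANDAS_MIN = datetime(1677, 9, 22)
_PANDAS_MAX = datetime(2262, 4, 11)


def to_pandas_datetime_safe(raw_value):
    """Hypothetical variant of to_pandas_datetime that returns NaT for out-of-range values."""
    # Only the "YYYY-MM-DDTHH:MM:SS" prefix is parsed for the range check,
    # so fractional seconds and the trailing "Z" do not matter here.
    parsed = datetime.strptime(raw_value[:19], "%Y-%m-%dT%H:%M:%S")
    if not (_PANDAS_MIN <= parsed <= _PANDAS_MAX):
        return pd.NaT
    # In range: let pandas parse the full string, keeping sub-second precision.
    return pd.Timestamp(raw_value)
```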

ValueError: Cannot convert non-finite values (NA or inf) to integer

I wanted to map female to 1 and male to 0 as an int data type, but I get 'Cannot convert non-finite values (NA or inf) to integer'. I have checked for null values, and there are no NaN values in that column.

for dataset in combine:
    dataset['Sex'] = dataset['Sex'].map({'female': 1, 'male': 0}).astype(int)

train_df.head()


ValueError Traceback (most recent call last)
in
1 for dataset in combine:
----> 2 dataset['Sex']=dataset['Sex'].dropna(axis=0).map({'female':1, 'male':0}).astype(int)
3
4 train.head()

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py in astype(self, dtype, copy, errors, **kwargs)
5689 # else, only a single dtype is given
5690 new_data = self._data.astype(dtype=dtype, copy=copy, errors=errors,
-> 5691 **kwargs)
5692 return self._constructor(new_data).__finalize__(self)
5693

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals\managers.py in astype(self, dtype, **kwargs)
529
530 def astype(self, dtype, **kwargs):
--> 531 return self.apply('astype', dtype=dtype, **kwargs)
532
533 def convert(self, **kwargs):

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals\managers.py in apply(self, f, axes, filter, do_integrity_check, consolidate, **kwargs)
393 copy=align_copy)
394
--> 395 applied = getattr(b, f)(**kwargs)
396 result_blocks = _extend_blocks(applied, result_blocks)
397

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals\blocks.py in astype(self, dtype, copy, errors, values, **kwargs)
532 def astype(self, dtype, copy=False, errors='raise', values=None, **kwargs):
533 return self._astype(dtype, copy=copy, errors=errors, values=values,
--> 534 **kwargs)
535
536 def _astype(self, dtype, copy=False, errors='raise', values=None,

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals\blocks.py in _astype(self, dtype, copy, errors, values, **kwargs)
631
632 # _astype_nansafe works fine with 1-d only
--> 633 values = astype_nansafe(values.ravel(), dtype, copy=True)
634
635 # TODO(extension)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\dtypes\cast.py in astype_nansafe(arr, dtype, copy, skipna)
674
675 if not np.isfinite(arr).all():
--> 676 raise ValueError('Cannot convert non-finite values (NA or inf) to '
677 'integer')
678

ValueError: Cannot convert non-finite values (NA or inf) to integer
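Series.map with a dict returns NaN for every value the dict does not cover, so a column with no nulls can still produce NaN after mapping if some value differs, for example in capitalisation or surrounding whitespace. A minimal sketch on made-up data that finds the unmapped values and normalises the strings before casting:

```python
import pandas as pd

# Made-up data: "Male" differs only in case, so the dict does not cover it.
sex = pd.Series(["female", "male", "Male", "female"])
mapping = {"female": 1, "male": 0}

# map() yields NaN for values missing from the dict; that NaN is what makes
# astype(int) fail with "Cannot convert non-finite values (NA or inf) to integer".
mapped = sex.map(mapping)
unmapped = sorted(sex[mapped.isna()].unique())

# Normalise the strings first so every value has a mapping, then cast.
encoded = sex.str.strip().str.lower().map(mapping).astype(int)
```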

TypeError: string indices must be integers

When running the following, I get a cryptic TypeError: string indices must be integers error. Either this is a bug on the Kusto side, or Kusto should provide a better error message.

Note: The code below works if I change execute_query to execute or execute_mgmt.

from azure.kusto.data.request import KustoClient, KustoConnectionStringBuilder, ClientRequestProperties
from azure.kusto.data.exceptions import KustoServiceError
from azure.kusto.data.helpers import dataframe_from_result_table

cluster = "https://help.kusto.windows.net"

# In case you want to authenticate with AAD device code.
# Please note that if you choose this option, you'll need to authenticate for every new instance that is initialized.
# It is highly recommended to create one instance and use it for all of your queries.
kcsb = KustoConnectionStringBuilder.with_aad_device_authentication(cluster)

# The authentication method will be taken from the chosen KustoConnectionStringBuilder.
client = KustoClient(kcsb)

response = client.execute_query('Samples', '.show schema')

Why is there random.choice used in KustoIngestClient?

Error masked by a JSON decoder error

In KustoClient's _execute method, when response.status_code is not 200, the code raises an exception built from the response body parsed as JSON. However, when the body is not valid JSON, the original error is masked and a JSONDecodeError is raised instead.

This should be fixed in such a way that the original error is not masked so that the user can fix their problem.

I have encountered this problem when my application didn't have access to the Kusto cluster. The original response status code was 403.
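One way to keep the original error visible is to fall back to the raw body text when it is not JSON. A minimal sketch of that approach; KustoHttpError, error_from_response and check_response are hypothetical names, not the SDK's API, and with requests the arguments would be response.status_code and response.text:

```python
import json


class KustoHttpError(Exception):
    """Hypothetical error carrying the HTTP status and whatever body detail was available."""

    def __init__(self, status_code, detail):
        super().__init__(f"HTTP {status_code}: {detail}")
        self.status_code = status_code
        self.detail = detail


def error_from_response(status_code, body_text):
    """Build an error from a non-200 response without letting JSON parsing hide it."""
    try:
        detail = json.loads(body_text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        # The body is not JSON (e.g. an HTML 403 page): keep the raw text instead.
        detail = body_text
    return KustoHttpError(status_code, detail)


def check_response(status_code, body_text):
    if status_code != 200:
        raise error_from_response(status_code, body_text)
```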

Make Pandas dependency optional?

Installing Kusto client typically requires me to install or update Pandas or its upstream dependencies like NumPy even though only one method currently depends on it. Is it feasible to make Pandas an optional dependency?
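A common pattern for this is to import pandas lazily inside the one function that needs it, and to declare pandas under an extra (the README above already documents pip install azure-kusto-data[pandas]). A minimal sketch of the lazy import; dataframe_from_rows is a hypothetical function, not part of the SDK:

```python
def dataframe_from_rows(columns, rows):
    """Build a pandas DataFrame, importing pandas only when this function is called."""
    try:
        import pandas as pd  # optional dependency, imported lazily
    except ImportError as exc:
        raise ImportError(
            "pandas is required for this feature; install it with "
            "pip install azure-kusto-data[pandas]"
        ) from exc
    return pd.DataFrame(rows, columns=columns)
```

With this layout, importing the package never touches pandas, so users who only run queries avoid installing it.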

AAD authentication fails from v0.0.16 onwards

The error is raised before I am asked to authenticate with a device code.
The bug doesn't exist in version 0.0.15.

KustoServiceError: (KustoServiceError(...), [{'Message': 'Authorization has been denied for this request.'}])


cannot import name 'KustoClient'

Hi, I'm trying to import Kusto Client using Azure notebooks but I'm facing some issues.
First, I typed "!pip install azure-kusto-data" and it resulted:
Requirement already satisfied: azure-kusto-data in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (0.0.11)
Requirement already satisfied: adal>=1.0.0 in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from azure-kusto-data) (1.2.0)
Requirement already satisfied: six>=1.10.0 in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from azure-kusto-data) (1.11.0)
Requirement already satisfied: requests>=2.13.0 in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from azure-kusto-data) (2.20.1)
Requirement already satisfied: pandas>=0.15.0 in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from azure-kusto-data) (0.22.0)
Requirement already satisfied: azure-nspkg>=2.0.0 in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from azure-kusto-data) (3.0.2)
Requirement already satisfied: python-dateutil>=2.6.0 in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from azure-kusto-data) (2.7.5)
Requirement already satisfied: cryptography>=1.1.0 in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from adal>=1.0.0->azure-kusto-data) (2.3.1)
Requirement already satisfied: PyJWT>=1.0.0 in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from adal>=1.0.0->azure-kusto-data) (1.7.1)
Requirement already satisfied: certifi>=2017.4.17 in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from requests>=2.13.0->azure-kusto-data) (2018.10.15)
Requirement already satisfied: idna<2.8,>=2.5 in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from requests>=2.13.0->azure-kusto-data) (2.7)
Requirement already satisfied: urllib3<1.25,>=1.21.1 in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from requests>=2.13.0->azure-kusto-data) (1.23)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from requests>=2.13.0->azure-kusto-data) (3.0.4)
Requirement already satisfied: pytz>=2011k in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from pandas>=0.15.0->azure-kusto-data) (2018.7)
Requirement already satisfied: numpy>=1.9.0 in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from pandas>=0.15.0->azure-kusto-data) (1.14.6)
Requirement already satisfied: asn1crypto>=0.21.0 in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from cryptography>=1.1.0->adal>=1.0.0->azure-kusto-data) (0.24.0)
Requirement already satisfied: cffi!=1.11.3,>=1.7 in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from cryptography>=1.1.0->adal>=1.0.0->azure-kusto-data) (1.11.5)
Requirement already satisfied: pycparser in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from cffi!=1.11.3,>=1.7->cryptography>=1.1.0->adal>=1.0.0->azure-kusto-data) (2.19)

Later, I typed "from azure.kusto.data import KustoClient", but got the result:
"ImportError: cannot import name 'KustoClient'"

Have you seen this error before or have a clue about the reason?
Thank you,
Natalia

Does not install without pandas

On a Raspberry Pi Zero W running out-of-the-box Raspbian Stretch, running

pip install azure-kusto-data

Tries to install pandas (even though the [pandas] flag was not used) and then hangs:

  Downloading https://files.pythonhosted.org/packages/c5/db/e56e6b4bbac7c4a06de1c50de6fe1ef3810018ae11732a50f15f62c7d050/enum34-1.1.6-py2-none-any.whl
Collecting ipaddress (from cryptography>=1.1.0->adal>=1.0.0->azure-kusto-data)
  Downloading https://files.pythonhosted.org/packages/fc/d0/7fc3a811e011d4b388be48a0e381db8d990042df54aa4ef4599a31d39853/ipaddress-1.0.22-py2.py3-none-any.whl
Collecting pycparser (from cffi!=1.11.3,>=1.7->cryptography>=1.1.0->adal>=1.0.0->azure-kusto-data)
  Downloading https://www.piwheels.org/simple/pycparser/pycparser-2.18-py2.py3-none-any.whl (209kB)
    100% |████████████████████████████████| 215kB 253kB/s
Building wheels for collected packages: pandas, cryptography, numpy, cffi
  Running setup.py bdist_wheel for pandas ... |

Random "Retry policy did not allow for a retry" errors when calling ingestclient.ingest_from_dataframe

Code Sample, a copy-pastable example if possible

import json
import time
import traceback

import pandas as pd
from azure.kusto.ingest import DataFormat, IngestionProperties

def create_df(data):
    print('Constructing df: records = ' + str(len(data)))
    dt = []
    for idx, val in enumerate(data, 0):
        dt.append([idx, json.dumps(val)])

    fields = ['id', 'doc']
    df = pd.DataFrame(data=dt, columns=fields)

    return df

def send_data(data, db_name, table_name):
    df = create_df(data)

    df.to_csv(table_name + ".csv", index=False, encoding="utf-8", header=False)
    ingestion_props = IngestionProperties(
        database=db_name,
        table=table_name,
        dataFormat=DataFormat.csv,
        # in case status updates for successes are also required
        # reportLevel=ReportLevel.FailuresAndSuccesses,
    )
    failed = True  # assume it will fail
    print("------------------------------------------------------------------------")
    print('Sending to kusto...')
    while failed:
        try:
            ingestclient.ingest_from_dataframe(df, ingestion_properties=ingestion_props)
            failed = False
        except Exception:
            e = traceback.format_exc()
            print(e)

            failed = True
            time.sleep(10)

    print('Done sending')
    return

# ingestclient, clientName, dbname and the data-fetching functions below are defined elsewhere in the application.
r = QueryCosmosDbNoonsForClient(clientName)
send_data(r, dbname, 'noons')

r = GetAllPassagesFromPassageCache(clientName)
send_data(r, dbname, 'passages')

r = GetAlerts(clientName)
send_data(r, dbname, 'alerts')

r = GetVesselsWithModules(clientName)
send_data(r, dbname, 'vesselswithmodules')

r = GetClientList()
send_data(r, dbname, 'clients')

r = GetAllVesselGroupsForAllUsers(clientName)
send_data(r, dbname, 'vesselgroupsforusers')

r = GetAllVesselsForAClient(clientName)
send_data(r, dbname, 'vessels')

r in each case is just a list of JSON strings.

Problem description

I get random errors when calling ingestclient.ingest_from_dataframe()
The example makes multiple calls to the function above, which traps the exception and simply retries. The retries usually succeed on the 1st or 2nd attempt. Here is example output:

Constructing df: records = 4610
------------------------------------------------------------------------
Sending to kusto...
Done sending
Constructing df: records = 16
------------------------------------------------------------------------
Sending to kusto...
Done sending
Constructing df: records = 287
------------------------------------------------------------------------
Sending to kusto...
------------------------------------------------------------------------
Sending to kusto...
ERROR:azure.storage.common.storageclient:Client-Request-ID=025eb4ca-e524-11e9-8b7e-70886b83b7da Retry policy did not allow for a retry: Server-Timestamp=Wed, 02 Oct 2019 14:50:42 GMT, Server-Request-ID=887414da-8003-000b-2830-7984e0000000, HTTP status code=400, Exception=The value for one of the HTTP headers is not in the correct format.<?xml version="1.0" encoding="utf-8"?><Error><Code>InvalidHeaderValue</Code><Message>The value for one of the HTTP headers is not in the correct format.RequestId:887414da-8003-000b-2830-7984e0000000Time:2019-10-02T14:50:43.4555251Z</Message><HeaderName>x-ms-version</HeaderName><HeaderValue>2019-02-02</HeaderValue></Error>.
Traceback (most recent call last):
  File "c:\dev\src\i4\Services\i4ServicesPyParsing\KustoCache\__init__.py", line 108, in send_data
    ingestclient.ingest_from_dataframe(df, ingestion_properties=ingestion_props)
  File "c:\dev\src\i4\Services\i4ServicesPyParsing\.env\lib\site-packages\azure\kusto\ingest\_ingest_client.py", line 66, in ingest_from_dataframe
    self.ingest_from_blob(BlobDescriptor(url, fd.size), ingestion_properties=ingestion_properties)
  File "c:\dev\src\i4\Services\i4ServicesPyParsing\.env\lib\site-packages\azure\kusto\ingest\_ingest_client.py", line 121, in ingest_from_blob
    queue_service.put_message(queue_name=queue_details.object_name, content=encoded)
  File "c:\dev\src\i4\Services\i4ServicesPyParsing\.env\lib\site-packages\azure\storage\queue\queueservice.py", line 793, in put_message
    None, None, content])
  File "c:\dev\src\i4\Services\i4ServicesPyParsing\.env\lib\site-packages\azure\storage\common\storageclient.py", line 430, in _perform_request
    raise ex
  File "c:\dev\src\i4\Services\i4ServicesPyParsing\.env\lib\site-packages\azure\storage\common\storageclient.py", line 358, in _perform_request
    raise ex
  File "c:\dev\src\i4\Services\i4ServicesPyParsing\.env\lib\site-packages\azure\storage\common\storageclient.py", line 344, in _perform_request
    HTTPError(response.status, response.message, response.headers, response.body))
  File "c:\dev\src\i4\Services\i4ServicesPyParsing\.env\lib\site-packages\azure\storage\common\_error.py", line 115, in _http_error_handler
    raise ex
azure.common.AzureHttpError: The value for one of the HTTP headers is not in the correct format.
<?xml version="1.0" encoding="utf-8"?><Error><Code>InvalidHeaderValue</Code><Message>The value for one of the HTTP headers is not in the correct format.
RequestId:887414da-8003-000b-2830-7984e0000000
Time:2019-10-02T14:50:43.4555251Z</Message><HeaderName>x-ms-version</HeaderName><HeaderValue>2019-02-02</HeaderValue></Error>

Done sending
Constructing df: records = 4
------------------------------------------------------------------------
Sending to kusto...
Done sending
Constructing df: records = 20
------------------------------------------------------------------------
Sending to kusto...
Done sending
Constructing df: records = 16
------------------------------------------------------------------------
Sending to kusto...
Done sending
Constructing df: records = 2
------------------------------------------------------------------------
Sending to kusto...
ERROR:azure.storage.common.storageclient:Client-Request-ID=0d71a462-e524-11e9-8a6a-70886b83b7da Retry policy did not allow for a retry: Server-Timestamp=Wed, 02 Oct 2019 14:51:01 GMT, Server-Request-ID=467a7e1b-2003-0060-4b30-790314000000, HTTP status code=400, Exception=The value for one of the HTTP headers is not in the correct format.<?xml version="1.0" encoding="utf-8"?><Error><Code>InvalidHeaderValue</Code><Message>The value for one of the HTTP headers is not in the correct format.RequestId:467a7e1b-2003-0060-4b30-790314000000Time:2019-10-02T14:51:01.9837789Z</Message><HeaderName>x-ms-version</HeaderName><HeaderValue>2019-02-02</HeaderValue></Error>.
Traceback (most recent call last):
  File "c:\dev\src\i4\Services\i4ServicesPyParsing\KustoCache\__init__.py", line 108, in send_data
    ingestclient.ingest_from_dataframe(df, ingestion_properties=ingestion_props)
  File "c:\dev\src\i4\Services\i4ServicesPyParsing\.env\lib\site-packages\azure\kusto\ingest\_ingest_client.py", line 66, in ingest_from_dataframe
    self.ingest_from_blob(BlobDescriptor(url, fd.size), ingestion_properties=ingestion_properties)
  File "c:\dev\src\i4\Services\i4ServicesPyParsing\.env\lib\site-packages\azure\kusto\ingest\_ingest_client.py", line 121, in ingest_from_blob
    queue_service.put_message(queue_name=queue_details.object_name, content=encoded)
  File "c:\dev\src\i4\Services\i4ServicesPyParsing\.env\lib\site-packages\azure\storage\queue\queueservice.py", line 793, in put_message
    None, None, content])
  File "c:\dev\src\i4\Services\i4ServicesPyParsing\.env\lib\site-packages\azure\storage\common\storageclient.py", line 430, in _perform_request
    raise ex
  File "c:\dev\src\i4\Services\i4ServicesPyParsing\.env\lib\site-packages\azure\storage\common\storageclient.py", line 358, in _perform_request
    raise ex
  File "c:\dev\src\i4\Services\i4ServicesPyParsing\.env\lib\site-packages\azure\storage\common\storageclient.py", line 344, in _perform_request
    HTTPError(response.status, response.message, response.headers, response.body))
  File "c:\dev\src\i4\Services\i4ServicesPyParsing\.env\lib\site-packages\azure\storage\common\_error.py", line 115, in _http_error_handler
    raise ex
azure.common.AzureHttpError: The value for one of the HTTP headers is not in the correct format.
<?xml version="1.0" encoding="utf-8"?><Error><Code>InvalidHeaderValue</Code><Message>The value for one of the HTTP headers is not in the correct format.
RequestId:467a7e1b-2003-0060-4b30-790314000000
Time:2019-10-02T14:51:01.9837789Z</Message><HeaderName>x-ms-version</HeaderName><HeaderValue>2019-02-02</HeaderValue></Error>

Done sending
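The catch-and-retry workaround described above can be sketched as a small wrapper. This is a minimal sketch, not part of the SDK: `call_with_retries` and its parameters are hypothetical, and the exponential backoff between attempts is an added assumption (the original code simply retries).

```python
import logging
import time


def call_with_retries(func, *args, max_attempts=3, base_delay=1.0,
                      retry_on=(Exception,), sleep=time.sleep, **kwargs):
    """Call func(*args, **kwargs), retrying when it raises one of retry_on.

    The delay doubles after each failed attempt (1s, 2s, 4s, ... with the
    defaults). Once max_attempts is reached the last exception is re-raised.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func(*args, **kwargs)
        except retry_on as exc:
            if attempt == max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            logging.warning("Attempt %d/%d failed (%s); retrying in %.1fs",
                            attempt, max_attempts, exc, delay)
            sleep(delay)
```

Usage with the ingest client would look like `call_with_retries(ingestclient.ingest_from_dataframe, df, ingestion_properties=ingestion_props, retry_on=(AzureHttpError,))`, with `from azure.common import AzureHttpError` (the exception type shown in the traceback). Limiting `retry_on` to that type avoids retrying on programming errors.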

Output of pip freeze

adal==1.2.1
antlr4-python3-runtime==4.7.2
applicationinsights==0.11.7
argcomplete==1.10.0
asn1crypto==0.24.0
astroid==2.1.0
atomicwrites==1.3.0
attrs==19.1.0
azure-batch==6.0.0
azure-cli==2.0.67
azure-cli-acr==2.2.9
azure-cli-acs==2.4.4
azure-cli-advisor==2.0.1
azure-cli-ams==0.4.7
azure-cli-appservice==0.2.21
azure-cli-backup==1.2.5
azure-cli-batch==4.0.3
azure-cli-batchai==0.4.10
azure-cli-billing==0.2.2
azure-cli-botservice==0.2.2
azure-cli-cdn==0.2.4
azure-cli-cloud==2.1.1
azure-cli-cognitiveservices==0.2.6
azure-cli-command-modules-nspkg==2.0.2
azure-cli-configure==2.0.24
azure-cli-consumption==0.4.4
azure-cli-container==0.3.18
azure-cli-core==2.0.67
azure-cli-cosmosdb==0.2.11
azure-cli-deploymentmanager==0.1.1
azure-cli-dla==0.2.6
azure-cli-dls==0.1.10
azure-cli-dms==0.1.4
azure-cli-eventgrid==0.2.4
azure-cli-eventhubs==0.3.7
azure-cli-extension==0.2.5
azure-cli-feedback==2.2.1
azure-cli-find==0.3.4
azure-cli-hdinsight==0.3.5
azure-cli-interactive==0.4.5
azure-cli-iot==0.3.11
azure-cli-iotcentral==0.1.7
azure-cli-keyvault==2.2.16
azure-cli-kusto==0.2.3
azure-cli-lab==0.1.8
azure-cli-maps==0.3.5
azure-cli-monitor==0.2.15
azure-cli-natgateway==0.1.1
azure-cli-network==2.5.2
azure-cli-nspkg==3.0.3
azure-cli-policyinsights==0.1.4
azure-cli-privatedns==1.0.2
azure-cli-profile==2.1.5
azure-cli-rdbms==0.3.12
azure-cli-redis==0.4.4
azure-cli-relay==0.1.5
azure-cli-reservations==0.4.3
azure-cli-resource==2.1.16
azure-cli-role==2.6.4
azure-cli-search==0.1.2
azure-cli-security==0.1.2
azure-cli-servicebus==0.3.6
azure-cli-servicefabric==0.1.20
azure-cli-signalr==1.0.1
azure-cli-sql==2.2.5
azure-cli-sqlvm==0.2.0
azure-cli-storage==2.4.3
azure-cli-telemetry==1.0.2
azure-cli-vm==2.2.23
azure-common==1.1.23
azure-cosmos==3.0.2
azure-datalake-store==0.0.39
azure-functions==1.0.0b5
azure-functions-devops-build==0.0.22
azure-functions-worker==1.0.0b10
azure-graphrbac==0.60.0
azure-keyvault==1.1.0
azure-kusto-data==0.0.33
azure-kusto-ingest==0.0.33
azure-mgmt-advisor==2.0.1
azure-mgmt-applicationinsights==0.1.1
azure-mgmt-authorization==0.50.0
azure-mgmt-batch==6.0.0
azure-mgmt-batchai==2.0.0
azure-mgmt-billing==0.2.0
azure-mgmt-botservice==0.2.0
azure-mgmt-cdn==3.1.0
azure-mgmt-cognitiveservices==3.0.0
azure-mgmt-compute==5.0.0
azure-mgmt-consumption==2.0.0
azure-mgmt-containerinstance==1.4.0
azure-mgmt-containerregistry==2.8.0
azure-mgmt-containerservice==5.2.0
azure-mgmt-cosmosdb==0.6.1
azure-mgmt-datalake-analytics==0.2.1
azure-mgmt-datalake-nspkg==3.0.1
azure-mgmt-datalake-store==0.5.0
azure-mgmt-datamigration==0.1.0
azure-mgmt-deploymentmanager==0.1.0
azure-mgmt-devtestlabs==2.2.0
azure-mgmt-dns==2.1.0
azure-mgmt-eventgrid==2.2.0
azure-mgmt-eventhub==2.6.0
azure-mgmt-hdinsight==0.2.1
azure-mgmt-imagebuilder==0.2.1
azure-mgmt-iotcentral==1.0.0
azure-mgmt-iothub==0.8.2
azure-mgmt-iothubprovisioningservices==0.2.0
azure-mgmt-keyvault==1.1.0
azure-mgmt-kusto==0.3.0
azure-mgmt-loganalytics==0.2.0
azure-mgmt-managementgroups==0.1.0
azure-mgmt-maps==0.1.0
azure-mgmt-marketplaceordering==0.1.0
azure-mgmt-media==1.1.1
azure-mgmt-monitor==0.5.2
azure-mgmt-msi==0.2.0
azure-mgmt-network==3.0.0
azure-mgmt-nspkg==3.0.2
azure-mgmt-policyinsights==0.3.1
azure-mgmt-privatedns==0.1.0
azure-mgmt-rdbms==1.8.0
azure-mgmt-recoveryservices==0.1.1
azure-mgmt-recoveryservicesbackup==0.1.2
azure-mgmt-redis==6.0.0
azure-mgmt-relay==0.1.0
azure-mgmt-reservations==0.3.1
azure-mgmt-resource==2.1.0
azure-mgmt-search==2.0.0
azure-mgmt-security==0.1.0
azure-mgmt-servicebus==0.6.0
azure-mgmt-servicefabric==0.2.0
azure-mgmt-signalr==0.1.1
azure-mgmt-sql==0.12.0
azure-mgmt-sqlvirtualmachine==0.3.0
azure-mgmt-storage==3.3.0
azure-mgmt-trafficmanager==0.51.0
azure-mgmt-web==0.42.0
azure-multiapi-storage==0.2.3
azure-nspkg==3.0.2
azure-storage-blob==1.3.1
azure-storage-common==1.4.2
azure-storage-nspkg==3.1.0
azure-storage-queue==2.1.0
bcrypt==3.1.7
certifi==2018.11.29
cffi==1.12.3
cftime==1.0.3.4
chardet==3.0.4
colorama==0.4.1
cryptography==2.7
DateTimeRange==0.5.5
fabric==2.4.0
geographiclib==1.49
grpcio==1.14.2
grpcio-tools==1.14.2
html2text==2018.1.9
humanfriendly==4.18
idna==2.8
importlib-metadata==0.20
invoke==1.2.0
ipaddress==1.0.22
isodate==0.6.0
isort==4.3.4
Jinja2==2.10.1
jmespath==0.9.4
knack==0.6.2
lazy-object-proxy==1.3.1
mail-parser==3.8.1
MarkupSafe==1.1.1
mbstrdecoder==0.7.0
mccabe==0.6.1
mock==3.0.5
more-itertools==7.2.0
msrest==0.6.8
msrestazure==0.6.1
netCDF4==1.4.2
numpy==1.16.0
oauthlib==3.0.1
packaging==19.1
pandas==0.24.0
paramiko==2.6.0
pluggy==0.12.0
portalocker==1.2.1
prompt-toolkit==1.0.16
protobuf==3.6.1
psutil==5.6.3
ptvsd==4.2.2
py==1.8.0
pycparser==2.19
Pygments==2.4.2
PyJWT==1.7.1
pylint==2.2.2
PyNaCl==1.3.0
pyOpenSSL==19.0.0
pyparsing==2.4.2
pyperclip==1.7.0
pypiwin32==223
pyreadline==2.1
pytest==5.1.2
python-dateutil==2.8.0
pytz==2018.9
pywin32==224
PyYAML==5.1.1
requests==2.21.0
requests-oauthlib==1.2.0
scipy==1.2.0
scp==0.13.2
simplejson==3.16.0
six==1.12.0
sshtunnel==0.1.5
tabulate==0.8.3
typed-ast==1.2.0
typepy==0.4.0
urllib3==1.24.1
vsts==0.1.25
vsts-cd-manager==1.0.2
wcwidth==0.1.7
websocket-client==0.56.0
wrapt==1.11.1
xarray==0.11.3
xlrd==1.2.0
xmltodict==0.12.0
zipp==0.6.0

Tenant ID for AAD-Federated

I normally use AAD-Federated to log in to Kusto.Explorer. What is my tenant ID in this case? I'm trying to use this code:

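from azure.kusto.data.request import KustoConnectionStringBuilder

cluster = "<cluster URI>"
# In case you want to authenticate with AAD username and password
username = "<username>"
password = "<password>"
authority_id = "<tenant ID>"  # the value this question is asking about
kcsb = KustoConnectionStringBuilder.with_aad_user_password_authentication(cluster, username, password, authority_id)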
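One way to find the tenant ID, sketched under assumptions and not provided by the SDK: Azure AD publishes an OpenID configuration document at `https://login.microsoftonline.com/<domain>/.well-known/openid-configuration`, and its `token_endpoint` contains the tenant GUID. The helper names below are hypothetical, and the domain to pass is assumed to be the one in your sign-in address.

```python
import json
import re
from urllib.request import urlopen

# Matches a GUID such as 00000000-1111-2222-3333-444444444444
_GUID_RE = re.compile(r"[0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}")


def tenant_id_from_openid_config(config):
    """Extract the tenant GUID from an Azure AD OpenID configuration dict.

    Assumes token_endpoint looks like
    https://login.microsoftonline.com/<tenant-guid>/oauth2/token
    """
    match = _GUID_RE.search(config.get("token_endpoint", ""))
    if match is None:
        raise ValueError("no tenant GUID found in token_endpoint")
    return match.group(0)


def lookup_tenant_id(domain):
    """Download the OpenID configuration for a domain and return its tenant GUID."""
    url = "https://login.microsoftonline.com/{}/.well-known/openid-configuration".format(domain)
    with urlopen(url) as response:
        config = json.loads(response.read().decode("utf-8"))
    return tenant_id_from_openid_config(config)
```

For example, `authority_id = lookup_tenant_id("contoso.com")`, where `contoso.com` is a placeholder for your own domain.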

SDK for .purge operation

@thisisnish commented on Fri Aug 23 2019

Of the three Python SDKs listed at https://docs.microsoft.com/en-us/azure/kusto/api/python/kusto-python-client-library, none currently supports the .purge operation. Is this correct, or am I missing something? How can a .purge command be sent using the Python SDK? Will this be added in a future release of the azure-kusto-mgmt library?


@kaerm commented on Fri Aug 23 2019

Hi @thisisnish, thanks for letting us know. I'm tagging the relevant teams to help with this.


@thisisnish commented on Tue Aug 27 2019

@kaerm any update or information on this?


@kaerm commented on Tue Aug 27 2019

@thisisnish working on finding the right team
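Since `.purge` is a control command, one option is to send it as text through `KustoClient.execute_mgmt(database, query, properties=None)`. A minimal sketch under stated assumptions: the helper names are hypothetical, the command text follows the two-step `.purge table ... records in database ...` syntax from the Kusto documentation, and the client is assumed to be built for the cluster's data management (`ingest-`) endpoint, where purge commands run.

```python
def build_purge_command(database, table, predicate, verification_token=None):
    """Build a `.purge table ... records in database ...` control command.

    Without verification_token this is the first step of a two-step purge:
    Kusto reports what would be purged and returns a verification token.
    Passing that token back builds the second step, which performs the purge.
    """
    command = ".purge table {} records in database {}".format(table, database)
    if verification_token is not None:
        command += " with (verificationtoken='{}')".format(verification_token)
    return "{} <| {}".format(command, predicate)


def send_purge(dm_client, database, command):
    """Send the command as a management (control) command.

    dm_client is assumed to be a KustoClient created for the data
    management endpoint, e.g. https://ingest-<cluster>.<region>.kusto.windows.net
    """
    return dm_client.execute_mgmt(database, command)
```

Keeping the first step separate lets you inspect what would be deleted before issuing the irreversible second step.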

0.0.11 failed authentication via AAD application ID

The application ID is looked up in the windows.net directory, which makes query execution fail:

Get Token request returned http error: 400 and server response: {"error":"unauthorized_client","error_description":"AADSTS70001: Application with identifier '###' was not found in the directory windows.net

Conributing.md is missing

Under the "Contribute" heading in readme.md there is a hyperlink to a Conributing.md file, but this file is missing.

JSONDecodeError on error response

If query execution doesn't succeed, the Kusto library fails to parse the error response because it is not in JSON format, e.g.:

JSONDecodeError Traceback (most recent call last)
in
----> 1 drop_response = client.execute(db_aliases["Trouter Client PROD"], drop_command)
2 drop_response

~/anaconda3_501/lib/python3.6/site-packages/azure/kusto/data/request.py in execute(self, database, query, properties)
393 """
394 if query.startswith("."):
--> 395 return self.execute_mgmt(database, query, properties)
396 return self.execute_query(database, query, properties)
397

~/anaconda3_501/lib/python3.6/site-packages/azure/kusto/data/request.py in execute_mgmt(self, database, query, properties)
416 :rtype: azure.kusto.data._response.KustoResponseDataSet
417 """
--> 418 return self._execute(self._mgmt_endpoint, database, query, None, KustoClient._mgmt_default_timeout, properties)
419
420 def execute_streaming_ingest(self, database, table, stream, stream_format, properties=None, mapping_name=None):

~/anaconda3_501/lib/python3.6/site-packages/azure/kusto/data/request.py in _execute(self, endpoint, database, query, payload, timeout, properties)
463 return KustoResponseDataSetV1(response.json())
464
--> 465 raise KustoServiceError([response.json()], response)
466
467 def _get_timeout(self, properties, default):

~/anaconda3_501/lib/python3.6/site-packages/requests/models.py in json(self, **kwargs)
895 # used.
896 pass
--> 897 return complexjson.loads(self.text, **kwargs)
898
899 @property

~/anaconda3_501/lib/python3.6/json/__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
352 parse_int is None and parse_float is None and
353 parse_constant is None and object_pairs_hook is None and not kw):
--> 354 return _default_decoder.decode(s)
355 if cls is None:
356 cls = JSONDecoder
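A minimal sketch of the defensive parsing this report calls for: try `response.json()` and fall back to the raw body when it is not JSON. `error_payload_from_response` and the shape of its fallback dict are hypothetical; the SDK's actual fix may differ.

```python
def error_payload_from_response(response):
    """Return a failed response's error body, tolerating non-JSON bodies.

    requests' Response.json() raises a ValueError subclass when the body is
    not valid JSON (for example an HTML error page from a gateway). Falling
    back to the raw text keeps that from masking the real HTTP error.
    """
    try:
        return response.json()
    except ValueError:
        return {"status_code": response.status_code, "text": response.text}
```

In `_execute`, the last line would then read `raise KustoServiceError([error_payload_from_response(response)], response)`, so the caller sees a `KustoServiceError` with the server's text instead of a `JSONDecodeError`.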

The value for one of the HTTP headers is not in the correct format

Hi,
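Using the following simple code:

import json

import pandas as pd
from azure.kusto.data.request import KustoConnectionStringBuilder
from azure.kusto.ingest import KustoIngestClient, IngestionProperties, DataFormat

# cluster, username, password, db_name, table_name and r are defined elsewhere
KCSB_INGEST = KustoConnectionStringBuilder.with_aad_user_password_authentication(cluster, username, password)
ingestclient = KustoIngestClient(KCSB_INGEST)

# one row per item: [index, item serialized as JSON]
dt = []
for idx, val in enumerate(r, 0):
    dt.append([idx, json.dumps(val)])

fields = ['id', 'doc']
df = pd.DataFrame(data=dt, columns=fields)

ingestion_props = IngestionProperties(
    database=db_name,
    table=table_name,
    dataFormat=DataFormat.csv,
)
ingestclient.ingest_from_dataframe(df, ingestion_properties=ingestion_props)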

r is just a list of strings.

I always get the following error:
ERROR:azure.storage.common.storageclient:Client-Request-ID=6cfd5cac-d9d2-11e9-a960-70886b83b7da Retry policy did not allow for a retry: Server-Timestamp=Wed, 18 Sep 2019 05:09:00 GMT, Server-Request-ID=7383a49b-e003-0040-7fdf-6d78b3000000, HTTP status code=400, Exception=The value for one of the HTTP headers is not in the correct format.InvalidHeaderValueThe value for one of the HTTP headers is not in the correct format.RequestId:7383a49b-e003-0040-7fdf-6d78b3000000Time:2019-09-18T05:09:00.6931679Zx-ms-version2019-02-02.
Traceback (most recent call last):
  File "c:\dev\src\i4\Services\i4ServicesPyParsing\KustoCache\__init__.py", line 69, in ingestnoons
    ingestclient.ingest_from_dataframe(df, ingestion_properties=ingestion_props)
  File "c:\dev\src\i4\Services\i4ServicesPyParsing\.env\lib\site-packages\azure\kusto\ingest\_ingest_client.py", line 66, in ingest_from_dataframe
    self.ingest_from_blob(BlobDescriptor(url, fd.size), ingestion_properties=ingestion_properties)
  File "c:\dev\src\i4\Services\i4ServicesPyParsing\.env\lib\site-packages\azure\kusto\ingest\_ingest_client.py", line 121, in ingest_from_blob
    queue_service.put_message(queue_name=queue_details.object_name, content=encoded)
  File "c:\dev\src\i4\Services\i4ServicesPyParsing\.env\lib\site-packages\azure\storage\queue\queueservice.py", line 793, in put_message
    None, None, content])
  File "c:\dev\src\i4\Services\i4ServicesPyParsing\.env\lib\site-packages\azure\storage\common\storageclient.py", line 430, in _perform_request
    raise ex
  File "c:\dev\src\i4\Services\i4ServicesPyParsing\.env\lib\site-packages\azure\storage\common\storageclient.py", line 358, in _perform_request
    raise ex
  File "c:\dev\src\i4\Services\i4ServicesPyParsing\.env\lib\site-packages\azure\storage\common\storageclient.py", line 344, in _perform_request
    HTTPError(response.status, response.message, response.headers, response.body))
  File "c:\dev\src\i4\Services\i4ServicesPyParsing\.env\lib\site-packages\azure\storage\common\_error.py", line 115, in _http_error_handler

I used the following to write the data to a file so I could inspect the CSV, and it looks OK to me:
df.to_csv('foobar', index=False, encoding="utf-8", header=False, compression="gzip")

The database looks like this: [screenshot]

This looks like a permissions issue, or as if something (such as blob storage) is missing from the setup.
What is missing, or what causes the error?

"Syntax error: Query could not be parsed: . Query: '.get ingestion resources'" when running parquet ingestion

Code Sample, a copy-pastable example if possible

from azure.kusto.ingest import KustoIngestClient, IngestionProperties, BlobDescriptor, DataFormat

# kcsb and blob_list (a list of blob URIs) are defined elsewhere
client = KustoIngestClient(kcsb)
for i, BLOB_PATH in enumerate(blob_list):
    if i % 50 == 0:
        print(str(i) + ":" + BLOB_PATH)
    ingestion_props = IngestionProperties(
        database="HEB",
        table="Orders",
        dataFormat=DataFormat.PARQUET,
        mappingReference="ordersparquetmapping"
        # in case status updates for successes are also required:
        # reportLevel=ReportLevel.FailuresAndSuccesses,
    )

    # ingest the current blob (not blob_list[0]); 100000 is the raw size of the data in bytes
    blob_descriptor = BlobDescriptor(BLOB_PATH, 100000)
    client.ingest_from_blob(blob_descriptor, ingestion_properties=ingestion_props)
print("Done queueing all files")

Problem description

KustoServiceError: (KustoServiceError(...), [{'error': {'message': 'Request is invalid and cannot be executed.', '@type': 'Kusto.Data.Exceptions.SyntaxException', '@context': {'activityStack': '(Activity stack: CRID=KPC.execute;95c3c526-bc86-4103-957a-61bd9e71c72b ARID=1b4b19a9-9ea6-44f7-a31f-5c1fcbc5085f > DN.Admin.Client.ExecuteControlCommand/a2724130-abe8-46e7-90e7-501b58719343 > P.WCF.Service.ExecuteControlCommandInternal..IAdminClientServiceCommunicationContract/4d754776-c142-4f70-b327-f1eb2c19fb3c > DN.FE.ExecuteControlCommand/4986aaed-731c-4131-99f0-7905f36d4413)', 'appDomainName': 'Kusto.WinSvc.Svc.exe', 'processName': 'Kusto.WinSvc.Svc', 'processId': 3796, 'activityType': 'DN.FE.ExecuteControlCommand', 'timestamp': '2019-10-01T23:39:23.0328157Z', 'threadId': 6296, 'subActivityId': '4986aaed-731c-4131-99f0-7905f36d4413', 'clientRequestId': 'KPC.execute;95c3c526-bc86-4103-957a-61bd9e71c72b', 'machineName': 'KEngine000001', 'activityId': '1b4b19a9-9ea6-44f7-a31f-5c1fcbc5085f', 'parentActivityId': '4d754776-c142-4f70-b327-f1eb2c19fb3c', 'serviceAlias': 'ADXDEMO'}, '@message': "Syntax error: Query could not be parsed: . Query: '.get ingestion resources'", 'code': 'Bad request', '@permanent': True}}])
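A likely cause, offered as an assumption rather than something confirmed in this thread: `.get ingestion resources` is a data management command, so this syntax error appears when `KustoIngestClient` is given the engine URI instead of the data management URI. The activity stack above shows the request landing on an engine node (`KEngine000001`). By convention the data management host is the engine host with an `ingest-` prefix; the hypothetical helper below derives it.

```python
from urllib.parse import urlsplit, urlunsplit


def ingest_uri_from_engine_uri(engine_uri):
    """Derive the data management ("ingest-") URI from a cluster engine URI.

    e.g. https://mycluster.westus.kusto.windows.net
      -> https://ingest-mycluster.westus.kusto.windows.net
    URIs that already point at an ingest- host are returned unchanged.
    """
    parts = urlsplit(engine_uri)
    if parts.netloc.startswith("ingest-"):
        return engine_uri
    return urlunsplit((parts.scheme, "ingest-" + parts.netloc,
                       parts.path, parts.query, parts.fragment))
```

The ingest client's connection string would then be built from `ingest_uri_from_engine_uri(cluster)`, while queries keep using the engine URI.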


Output of pip freeze


absl-py==0.8.0
adal==1.2.2
asn1crypto==0.24.0
astor==0.8.0
azure-common==1.1.23
azure-kusto-data==0.0.35
azure-kusto-ingest==0.0.35
azure-storage-blob==2.1.0
azure-storage-common==2.1.0
azure-storage-queue==2.1.0
certifi==2018.11.29
cffi==1.11.5
chardet==3.0.4
conda==4.5.12
cryptography==2.4.2
gast==0.3.0
google-pasta==0.1.7
grpcio==1.23.0
h5py==2.10.0
idna==2.8
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.0
Markdown==3.1.1
menuinst==1.4.14
numpy==1.17.2
opt-einsum==3.0.1
protobuf==3.9.1
py4j==0.10.7
pycosat==0.6.3
pycparser==2.19
PyJWT==1.7.1
pyOpenSSL==18.0.0
PySocks==1.6.8
pyspark==2.4.1
python-dateutil==2.8.0
pywin32==223
requests==2.21.0
ruamel-yaml==0.15.46
six==1.12.0
tb-nightly==1.15.0a20190806
termcolor==1.1.0
tf-estimator-nightly==1.14.0.dev2019080601
urllib3==1.24.1
Werkzeug==0.15.6
win-inet-pton==1.0.1
wincertstore==0.2
wrapt==1.11.2

azure-kusto-data fails with dependency on Pandas

Code Sample, a copy-pastable example if possible

pip3 install setuptools --upgrade
pip3 install azure-kusto-data

TELEMETRY_ROOT=$SYSTEM_DEFAULTWORKINGDIRECTORY/telemetry

echo "Updating functions..."

for file in $TELEMETRY_ROOT/shared/*.csl; do
    python3 $TELEMETRY_ROOT/deploy/scripts/execute_query.py $file
done

Problem description

We're using the SDK without pandas. However, it still tries to use pandas and fails with the following error:

Traceback (most recent call last):
2019-08-26T17:39:05.6853772Z   File "/home/vsts/work/r1/a/telemetry/deploy/scripts/execute_query.py", line 6, in <module>
2019-08-26T17:39:05.6854664Z     from azure.kusto.data.request import KustoClient, KustoConnectionStringBuilder, ClientRequestProperties
2019-08-26T17:39:05.6856179Z   File "/home/vsts/.local/lib/python3.5/site-packages/azure/kusto/data/request.py", line 15, in <module>
2019-08-26T17:39:05.6856520Z     from ._response import KustoResponseDataSetV1, KustoResponseDataSetV2
2019-08-26T17:39:05.6856881Z   File "/home/vsts/.local/lib/python3.5/site-packages/azure/kusto/data/_response.py", line 10, in <module>
2019-08-26T17:39:05.6857175Z     from ._models import KustoResultColumn, KustoResultRow, KustoResultTable, WellKnownDataSet
2019-08-26T17:39:05.6857598Z   File "/home/vsts/.local/lib/python3.5/site-packages/azure/kusto/data/_models.py", line 30, in <module>
2019-08-26T17:39:05.6857655Z     class KustoResultRow(object):
2019-08-26T17:39:05.6858359Z   File "/home/vsts/.local/lib/python3.5/site-packages/azure/kusto/data/_models.py", line 35, in KustoResultRow
2019-08-26T17:39:05.6858558Z     pandas_funcs = {"datetime": to_pandas_datetime, "timespan": to_pandas_timedelta}
2019-08-26T17:39:05.6858990Z NameError: name 'to_pandas_datetime' is not defined
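The `NameError` comes from a class attribute that references `to_pandas_datetime` at import time, while that name only exists when pandas was imported successfully. A minimal sketch of one way to keep pandas optional: define the converter names unconditionally and check for pandas only when a conversion is requested. `ResultRow` and `convert` are hypothetical simplifications, not the SDK's `KustoResultRow`.

```python
try:
    import pandas
except ImportError:  # pandas is an optional extra: pip install azure-kusto-data[pandas]
    pandas = None


def to_pandas_datetime(value):
    # Only called when pandas is available; the name itself always exists.
    return pandas.to_datetime(value)


def to_pandas_timedelta(value):
    return pandas.to_timedelta(value)


class ResultRow(object):
    # Safe when the class body runs, because both functions above are
    # defined whether or not the pandas import succeeded.
    pandas_funcs = {"datetime": to_pandas_datetime, "timespan": to_pandas_timedelta}

    @staticmethod
    def convert(column_type, value, use_pandas=False):
        if not use_pandas or column_type not in ResultRow.pandas_funcs:
            return value
        if pandas is None:
            raise ImportError("pandas is required for this conversion; "
                              "install it with: pip install azure-kusto-data[pandas]")
        return ResultRow.pandas_funcs[column_type](value)
```

With this layout, importing the module never fails, and users who skip the pandas extra only see an error if they explicitly ask for pandas conversion.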

Output of pip freeze

I ran it from Azure DevOps, so I don't have the pip freeze output handy. Since it only started in the last 1-2 days, I'm assuming this is a regression in azure-kusto-data 0.0.32.


Consider moving KustoClient

Currently, KustoIngestClient can be imported from the top-level module, while KustoClient cannot (it lives in azure.kusto.data.request). Importing both from the top level would look like:

from azure.kusto.data import KustoClient, KustoConnectionStringBuilder
from azure.kusto.ingest import KustoIngestClient

Consider a design that is consistent between the two packages.

Empty results throw error instead of returning empty dataframe

When using pandas, dataframe_from_result_table throws an error if it is passed an empty result table.

#103 introduced this problem: lines 13 and 14 throw an error if the parameter evaluates to False. Since an empty result table evaluates to False, any empty result set raises a ValueError.

Working code:

response = client.execute("DB", "print 'a'")
dataframes = [dataframe_from_result_table(x) for x in response.primary_results]

Broken code:

response = client.execute("DB", "print 'a' | take 0")
dataframes = [dataframe_from_result_table(x) for x in response.primary_results]  # raises ValueError
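The pitfall is Python truthiness: an object whose `__len__` returns 0 is falsy, so `if not table:` rejects an empty table as if it were missing. A minimal sketch of the check using an explicit `is None` test; `ResultTable` and `check_result_table` are hypothetical stand-ins, not the helper's actual code.

```python
class ResultTable(object):
    """Stand-in for a result table that, like a list, is falsy when empty."""

    def __init__(self, rows):
        self.rows = list(rows)

    def __len__(self):
        return len(self.rows)


def check_result_table(table):
    # `if not table:` would also reject an empty table, because an object
    # with __len__() == 0 evaluates to False. Test for None explicitly.
    if table is None:
        raise ValueError("table must not be None")
    return table
```

An empty table then flows through to the dataframe conversion and produces an empty dataframe instead of an exception.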

AttributeError: module 'dateutil.parser' has no attribute 'isoparse'

Hey,

I'm running a query, and the method failed with an exception.

Code -

from azure.kusto.data.request import KustoClient, KustoConnectionStringBuilder, ClientRequestProperties
from azure.kusto.data.exceptions import KustoServiceError
from azure.kusto.data.helpers import dataframe_from_result_table

cluster = "https://clustername.kusto.windows.net"
db = "dbname"

kcsb = KustoConnectionStringBuilder.with_aad_device_authentication(cluster)
client = KustoClient(kcsb)

query = """
TableName
| where machine_id == "machine_id"
| where env_time > ago(1d)
| project env_time, message 
"""

client.execute(db, query)

Exception -

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
c:\python364-32\lib\site-packages\azure\kusto\data\_models.py in __init__(self, columns, row)
     63                         else:
---> 64                             typed_value = KustoResultRow.convertion_funcs[lower_column_type](value)
     65                     except (IndexError, AttributeError):

c:\python364-32\lib\site-packages\azure\kusto\data\_converters.py in to_datetime(value)
     18         return parser.parse(value)
---> 19     return parser.isoparse(value)
     20 

AttributeError: module 'dateutil.parser' has no attribute 'isoparse'

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
<ipython-input-12-bb7a6bdf3c0d> in <module>()
----> 1 client.execute(db, query)

c:\python364-32\lib\site-packages\azure\kusto\data\request.py in execute(self, database, query, properties)
    275         if query.startswith("."):
    276             return self.execute_mgmt(database, query, properties)
--> 277         return self.execute_query(database, query, properties)
    278 
    279     def execute_query(self, database, query, properties=None):

c:\python364-32\lib\site-packages\azure\kusto\data\request.py in execute_query(self, database, query, properties)
    285         :rtype: azure.kusto.data._response.KustoResponseDataSet
    286         """
--> 287         return self._execute(self._query_endpoint, database, query, KustoClient._query_default_timeout, properties)
    288 
    289     def execute_mgmt(self, database, query, properties=None):

c:\python364-32\lib\site-packages\azure\kusto\data\request.py in _execute(self, endpoint, database, query, default_timeout, properties)
    320         if response.status_code == 200:
    321             if endpoint.endswith("v2/rest/query"):
--> 322                 return KustoResponseDataSetV2(response.json())
    323             return KustoResponseDataSetV1(response.json())
    324 

c:\python364-32\lib\site-packages\azure\kusto\data\_response.py in __init__(self, json_response)
    132 
    133     def __init__(self, json_response):
--> 134         super(KustoResponseDataSetV2, self).__init__([t for t in json_response if t["FrameType"] == "DataTable"])

c:\python364-32\lib\site-packages\azure\kusto\data\_response.py in __init__(self, json_response)
     16 
     17     def __init__(self, json_response):
---> 18         self.tables = [KustoResultTable(t) for t in json_response]
     19         self.tables_count = len(self.tables)
     20         self.tables_names = [t.table_name for t in self.tables]

c:\python364-32\lib\site-packages\azure\kusto\data\_response.py in <listcomp>(.0)
     16 
     17     def __init__(self, json_response):
---> 18         self.tables = [KustoResultTable(t) for t in json_response]
     19         self.tables_count = len(self.tables)
     20         self.tables_names = [t.table_name for t in self.tables]

c:\python364-32\lib\site-packages\azure\kusto\data\_models.py in __init__(self, json_table)
    128             raise KustoServiceError(errors[0]["OneApiErrors"][0]["error"]["@message"], json_table)
    129 
--> 130         self.rows = [KustoResultRow(self.columns, row) for row in json_table["Rows"]]
    131 
    132     @property

c:\python364-32\lib\site-packages\azure\kusto\data\_models.py in <listcomp>(.0)
    128             raise KustoServiceError(errors[0]["OneApiErrors"][0]["error"]["@message"], json_table)
    129 
--> 130         self.rows = [KustoResultRow(self.columns, row) for row in json_table["Rows"]]
    131 
    132     @property

c:\python364-32\lib\site-packages\azure\kusto\data\_models.py in __init__(self, columns, row)
     64                             typed_value = KustoResultRow.convertion_funcs[lower_column_type](value)
     65                     except (IndexError, AttributeError):
---> 66                         typed_value = KustoResultRow.convertion_funcs[lower_column_type](value)
     67             elif lower_column_type in KustoResultRow.convertion_funcs:
     68                 typed_value = KustoResultRow.convertion_funcs[lower_column_type](value)

c:\python364-32\lib\site-packages\azure\kusto\data\_converters.py in to_datetime(value)
     17     if isinstance(value, six.integer_types):
     18         return parser.parse(value)
---> 19     return parser.isoparse(value)
     20 
     21 

AttributeError: module 'dateutil.parser' has no attribute 'isoparse'

Machine/Python details -

CPython 3.6.4
azure-kusto-data==0.0.27

compiler : MSC v.1900 32 bit (Intel)
system : Windows
release : 10
machine : AMD64
interpreter: 32bit

Thanks
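`dateutil.parser.isoparse` was added in python-dateutil 2.7.0, so this error usually means an older python-dateutil is installed; upgrading it (`pip install --upgrade python-dateutil`) is the direct fix. If upgrading is not possible, a fallback can be sketched as below. `parse_kusto_datetime` is hypothetical, and `parser.parse` is more lenient than `isoparse`, so results may differ on unusual inputs.

```python
def parse_kusto_datetime(value, parser_module=None):
    """Parse an ISO 8601 datetime string, preferring dateutil's isoparse.

    dateutil.parser.isoparse only exists in python-dateutil >= 2.7.0; on
    older versions fall back to the general-purpose parser.parse.
    parser_module can be injected for testing; by default dateutil.parser is used.
    """
    if parser_module is None:
        from dateutil import parser as parser_module
    isoparse = getattr(parser_module, "isoparse", None)
    if isoparse is not None:
        return isoparse(value)
    return parser_module.parse(value)
```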

Parquet support in python

Code Sample, a copy-pastable example if possible

from azure.kusto.ingest import KustoIngestClient, IngestionProperties, BlobDescriptor, DataFormat

# KCSB_INGEST, KUSTO_DATABASE, DESTINATION_TABLE, DESTINATION_TABLE_COLUMN_MAPPING and blob_list are defined elsewhere
INGESTION_CLIENT = KustoIngestClient(KCSB_INGEST)
FILE_SIZE = 64158321    # raw size of the data in bytes

# All ingestion properties are documented here: https://docs.microsoft.com/azure/kusto/management/data-ingest#ingestion-properties
INGESTION_PROPERTIES = IngestionProperties(database=KUSTO_DATABASE, table=DESTINATION_TABLE, dataFormat=DataFormat.parquet,
                                           mappingReference=DESTINATION_TABLE_COLUMN_MAPPING, additionalProperties={"creationTime": "2019-08-21"})
for BLOB_PATH in blob_list[0]:
    BLOB_DESCRIPTOR = BlobDescriptor(BLOB_PATH, FILE_SIZE)
    INGESTION_CLIENT.ingest_from_blob(
        BLOB_DESCRIPTOR, ingestion_properties=INGESTION_PROPERTIES)

print('Done queuing up ingestion with Azure Data Explorer')

Problem description

I set everything to parquet, but after running the ingestion the library still treated the data as CSV. This is the error from ADX:
"Details": Mapping reference 'PARQUET_Mapping' of type 'csv' in database 'db01' could not be found.,

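The error shows the mapping reference being looked up as a CSV mapping although the data format was parquet, so the step that appears to go wrong is choosing the mapping kind from the data format. A minimal sketch of that choice, assuming the format-to-kind table below (based on the Kusto ingestion documentation: delimited and text formats share the CSV mapping kind, the others have their own). The names are hypothetical, not the SDK's code.

```python
# Assumed table: delimited/text formats use CSV mappings; the rest have their own kind.
MAPPING_KIND_BY_FORMAT = {
    "csv": "csv", "tsv": "csv", "scsv": "csv", "sohsv": "csv", "psv": "csv", "txt": "csv",
    "json": "json", "multijson": "json",
    "avro": "avro",
    "parquet": "parquet",
    "orc": "orc",
}


def mapping_kind_for_format(data_format):
    """Return the ingestion mapping kind to use for a data format name."""
    try:
        return MAPPING_KIND_BY_FORMAT[data_format.lower()]
    except KeyError:
        raise ValueError("unsupported data format: {}".format(data_format))
```

With a lookup like this, a parquet ingestion would reference a parquet mapping rather than falling back to the CSV default.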

API CHANGE: add `to_dataframe` to each table

After investing work in #124 and some internal discussions, we agreed to hold off on this PR and reconsider changing the API, to give better performance for both the vanilla Python and pandas use cases and to avoid some difficult workarounds needed to parse Kusto types into a dataframe.

The final API would look like this:

# result is of type KustoResultDataSet
result = client.execute(db, query)
# raw json 
result.tables[0].json()
# iterator with lazy parsing of json
result.tables[0].rows()
# dataframe parsing from raw json
result.tables[0].to_dataframe()

This will cause some memory pressure, so a best practice would probably be:

# either explicitly access a specific table and drop the reference after conversion
df = client.execute(db, query).primary_results[0].to_dataframe()
# or, parse it all
dfs = client.execute(db, query).to_dataframes()

Feel free to add your thoughts, code will be implemented in next couple of weeks.
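The proposed `rows()` iterator can be sketched as a generator over the raw JSON table, so that only the rows actually consumed are converted. `LazyResultTable` is hypothetical and assumes the table JSON carries `Columns` (each with a `ColumnName`) and `Rows`, as in the v2 response frames; values are returned as-is, without Kusto type conversion.

```python
class LazyResultTable(object):
    """Hypothetical table that keeps the raw JSON and parses rows on demand."""

    def __init__(self, raw_table):
        self._raw = raw_table

    def json(self):
        # The raw table as received, with no parsing.
        return self._raw

    def rows(self):
        # Generator: each row dict is built only when the caller asks for it.
        names = [column["ColumnName"] for column in self._raw["Columns"]]
        for raw_row in self._raw["Rows"]:
            yield dict(zip(names, raw_row))

    def to_dataframe(self):
        import pandas  # optional dependency, imported only when needed
        names = [column["ColumnName"] for column in self._raw["Columns"]]
        return pandas.DataFrame(self._raw["Rows"], columns=names)
```

Keeping only the raw JSON is what makes the memory-pressure advice above relevant: the JSON stays alive as long as the table object does.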

timespan parsing is broken in dataframe_from_result_table when days and seconds co-exist

The following lambda parses a timespan incorrectly when days and seconds co-exist.
For example, the Kusto timespan 7.04:44:01.5115511 (7 days, 4 hours, 44 minutes, 1.5115511 seconds) results in the string "7 days 04:44:01 days 5115511" being passed to pd.to_timedelta, which incorrectly produces "67 days 09:42:31". Only the first dot should be replaced with "days".

if col_type.lower() == "timespan":
    frame[col_name] = pandas.to_timedelta(
        frame[col_name].apply(lambda t: t.replace(".", " days ") if t and "." in t.split(":")[0] else t)
    )
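A minimal sketch of the fix described above: `str.replace` accepts an optional count, so passing 1 replaces only the first `.`, which is the days separator. `kusto_timespan_to_pandas` is a hypothetical name; it keeps the original lambda's checks, including passing empty or missing values through unchanged.

```python
def kusto_timespan_to_pandas(t):
    """Rewrite a Kusto timespan such as '7.04:44:01.5115511' for pandas.to_timedelta.

    Only a '.' before the first ':' separates days from hours; the later '.'
    separates fractional seconds and must be left alone.
    """
    if t and "." in t.split(":")[0]:
        return t.replace(".", " days ", 1)
    return t
```

In the helper, the conversion would become `frame[col_name] = pandas.to_timedelta(frame[col_name].apply(kusto_timespan_to_pandas))`.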
