
es2csv's Introduction

es2csv

A CLI tool for exporting data from Elasticsearch into a CSV file

A command-line utility, written in Python, for querying Elasticsearch in Lucene query syntax or Query DSL syntax and exporting the resulting documents into a CSV file. The tool can query bulk docs across multiple indices and fetch only selected fields, which reduces query execution time.

Quick Look Demo

Requirements

This tool should be used with Elasticsearch 5.x; for older versions, please check the 2.x release.
You also need Python 2.7.x and pip.

Installation

From source:

$ pip install git+https://github.com/taraslayshchuk/es2csv.git

From pip:

$ pip install es2csv

Usage

$ es2csv [-h] -q QUERY [-u URL] [-a AUTH] [-i INDEX [INDEX ...]]
         [-D DOC_TYPE [DOC_TYPE ...]] [-t TAGS [TAGS ...]] -o FILE
         [-f FIELDS [FIELDS ...]] [-S FIELDS [FIELDS ...]] [-d DELIMITER]
         [-m INTEGER] [-s INTEGER] [-k] [-r] [-e] [--verify-certs]
         [--ca-certs CA_CERTS] [--client-cert CLIENT_CERT]
         [--client-key CLIENT_KEY] [-v] [--debug]

Arguments:
 -q, --query QUERY                        Query string in Lucene syntax.               [required]
 -o, --output-file FILE                   CSV file location.                           [required]
 -u, --url URL                            Elasticsearch host URL. Default is http://localhost:9200.
 -a, --auth                               Elasticsearch basic authentication in the form of username:password.
 -i, --index-prefixes INDEX [INDEX ...]   Index name prefix(es). Default is ['logstash-*'].
 -D, --doc-types DOC_TYPE [DOC_TYPE ...]  Document type(s).
 -t, --tags TAGS [TAGS ...]               Query tags.
 -f, --fields FIELDS [FIELDS ...]         List of selected fields in output. Default is ['_all'].
 -S, --sort FIELDS [FIELDS ...]           List of <field>:<direction> pairs to sort on. Default is [].
 -d, --delimiter DELIMITER                Delimiter to use in CSV file. Default is ",".
 -m, --max INTEGER                        Maximum number of results to return. Default is 0.
 -s, --scroll-size INTEGER                Scroll size for each batch of results. Default is 100.
 -k, --kibana-nested                      Format nested fields in Kibana style.
 -r, --raw-query                          Switch query format to the Query DSL.
 -e, --meta-fields                        Add meta-fields in output.
 --verify-certs                           Verify SSL certificates. Default is False.
 --ca-certs CA_CERTS                      Location of CA bundle.
 --client-cert CLIENT_CERT                Location of Client Auth cert.
 --client-key CLIENT_KEY                  Location of Client Cert Key.
 -v, --version                            Show version and exit.
 --debug                                  Debug mode on.
 -h, --help                               Show this help message and exit.

[ Usage Examples | Release Changelog ]

es2csv's People

Contributors

eduardofv, geekspeed, jing-code4value, pokab, taraslayshchuk


es2csv's Issues

Export data from aggregation

Is it possible to export data from aggregations? How?

I tried with a query like this:

{
  "size": 0,
  "aggs": {
    "unique": {
      "terms": {
        "field": "user.id",
        "order": {
          "_key": "asc"
        }
      }
    }
  }
}

And it just returns the data with all the fields.
I would like the key and the doc_count. Is there a way?
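es2csv exports search hits, so aggregation buckets are not written even when the query contains an aggs section. As a hedged workaround sketch (the function and field names below are illustrative, not part of es2csv), you can run the aggregation with any HTTP client and flatten the terms buckets to key/doc_count rows yourself:

```python
import csv

def buckets_to_rows(response, agg_name="unique"):
    # Pull (key, doc_count) pairs out of a terms-aggregation response body.
    buckets = response["aggregations"][agg_name]["buckets"]
    return [(b["key"], b["doc_count"]) for b in buckets]

def write_buckets_csv(response, path, agg_name="unique"):
    # Write the buckets as a two-column CSV with a header row.
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["key", "doc_count"])
        writer.writerows(buckets_to_rows(response, agg_name))
```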

Occasionally painfully slow performance over a network

I generally run es2csv from my local machine against a remote EC2 Linux instance over a VPN connection. Sometimes I get 600-800 docs/s and sometimes I get 10-20 docs/s. Everything that I control is the same. CPU utilization on the remote machine is very low. The same query run an hour later may run at 10x the speed.

Here is the query running local to remote:

es2csv -u http://x.x.x.x:9200 -o down.csv -i kaallc -q @'downsample.query' -r
Found 476521 results
Run query [ ] [5601/476521] [ 1%] [0:05:46] [ETA: 8:05:11] [ 16.2 docs/s]

Here is the query running directly on the remote Linux instance:

es2csv -u http://x.x.x.x:9200 -o down.csv -i kaallc -q @'downsample.query' -r
Found 476521 results
Run query [# ] [169001/476521] [ 35%] [0:02:48] [ETA: 0:05:07] [1000.9 docs/s]

Is there any possible network tuning or other actions I can take to affect the performance?

Thank you.

es5 ?

Does it work with es5 ?

Error on requirements file

Hi, with the last commit to requirements.txt, this error is raised on install:

  raise ValueError(msg, line, "at", line[p:])
ValueError: ("Expected ',' or end-of-list in", 'elasticsearch ==2.4.*', 'at', '*')

You can't use stars (*) in the requirements file. The correct form is, for example:
elasticsearch>=2.4.0,<5.0.0
progressbar2>=3.10.0

Can't install on Python 3.x

Hi - Love using this on Python 2.7.x. But when I tried installing it for 3.5.x, I could not. Errors below - complaining about a missing 'HISTORY.rst'.

C:\WinPython-64bit-3.5.2.2Qt5\scripts>pip install es2csv
Collecting es2csv
  Downloading es2csv-1.0.3.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Users\AllUsers\AppData\Local\Temp\pip-build-wqb1vo1a\es2csv\setup.py", line 26, in <module>
        with open('HISTORY.rst') as history_file:
    FileNotFoundError: [Errno 2] No such file or directory: 'HISTORY.rst'

    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in C:\Users\AllUsers\AppData\Local\Temp\pip-build-wqb1vo1a\es2csv\

pip install git+https://github.com/taraslayshchuk/es2csv.git yields a similar outcome:

C:\WinPython-64bit-3.5.2.2Qt5\scripts>pip install git+https://github.com/taraslayshchuk/es2csv.git
Collecting git+https://github.com/taraslayshchuk/es2csv.git
  Cloning https://github.com/taraslayshchuk/es2csv.git to c:\users\AllUsers\appdata\local\temp\pip-v5g9qzta-build
Collecting elasticsearch>=2.3.0 (from es2csv==1.0.3)
  Downloading elasticsearch-2.4.0-py2.py3-none-any.whl (54kB)
    100% |################################| 61kB 119kB/s
Collecting progressbar>=2.3 (from es2csv==1.0.3)
  Cache entry deserialization failed, entry ignored
  Downloading progressbar-2.3.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Users\AllUsers\AppData\Local\Temp\pip-build-fetxalge\progressbar\setup.py", line 5, in <module>
        import progressbar
      File "C:\Users\AllUsers\AppData\Local\Temp\pip-build-fetxalge\progressbar\progressbar\__init__.py", line 59, in <module>
        from progressbar.widgets import *
      File "C:\Users\AllUsers\AppData\Local\Temp\pip-build-fetxalge\progressbar\progressbar\widgets.py", line 121, in <module>
        class FileTransferSpeed(Widget):
      File "C:\WinPython-64bit-3.5.2.2Qt5\python-3.5.2.amd64\lib\abc.py", line 133, in __new__
        cls = super().__new__(mcls, name, bases, namespace)
    ValueError: 'format' in __slots__ conflicts with class variable

    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in C:\Users\AllUsers\AppData\Local\Temp\pip-build-fetxalge\progressbar\

Thanks,
E

Fix error when running pip install

Getting the following error on Ubuntu 16.04 with Python 2.7.12 and pip 8.1.1:

Traceback (most recent call last):
  File "/usr/bin/pip", line 11, in <module>
    sys.exit(main())
  File "/usr/lib/python2.7/dist-packages/pip/__init__.py", line 215, in main
    locale.setlocale(locale.LC_ALL, '')
  File "/usr/lib/python2.7/locale.py", line 581, in setlocale
    return _setlocale(category, locale)
locale.Error: unsupported locale setting

File too large json.dumps

Tried to dump an index which is more than 300+ MB, and run into

$ es2csv -q '*' -i 'metrics-2017.10.20' -o /tmp/test.csv -u http://172.18.0.2:9200
Found 679860 results
Run query [ ] [30701/679860] [ 4%] [0:00:09] [ETA: 0:03:26] [ 3.1 Kidocs/s]
Traceback (most recent call last):
  File "/usr/local/bin/es2csv", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python2.7/dist-packages/es2csv.py", line 283, in main
    es.search_query()
  File "/usr/local/lib/python2.7/dist-packages/es2csv.py", line 40, in f_retry
    return f(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/es2csv.py", line 170, in search_query
    self.flush_to_file(hit_list)
  File "/usr/local/lib/python2.7/dist-packages/es2csv.py", line 212, in flush_to_file
    tmp_file.write('%s\n' % json.dumps(out))
IOError: [Errno 27] File too large
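IOError: [Errno 27] File too large comes from the filesystem holding the temp file (often a size-limited /tmp mount), not from es2csv itself. One hedged workaround sketch, not an es2csv feature, is to roll the intermediate output across numbered part files:

```python
class ChunkedWriter:
    """Roll output across numbered part files once max_bytes is reached.

    Hypothetical workaround sketch; es2csv itself writes one temp file.
    """
    def __init__(self, base_path, max_bytes):
        self.base_path = base_path
        self.max_bytes = max_bytes
        self.part = 0
        self.written = 0
        self.fh = open("%s.part%d" % (base_path, self.part), "w")

    def write_line(self, line):
        data = line + "\n"
        # Roll over to the next part file before exceeding max_bytes.
        if self.written and self.written + len(data) > self.max_bytes:
            self.fh.close()
            self.part += 1
            self.written = 0
            self.fh = open("%s.part%d" % (self.base_path, self.part), "w")
        self.fh.write(data)
        self.written += len(data)

    def close(self):
        self.fh.close()
```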

Export _score and _id

When I add _score or _id to the list of export fields, the resulting file has neither column. Is this expected behavior? Getting the relevancy score with results is important to my use case.

Thank you!

Fields syntax error

Not a regular Python user, so LMK if I have a bad version or something easy.

Getting a syntax error no matter what options I try:

# es2csv -u localhost:9200 -i 'logstash-2016-1*' -r '{"query": {"range":{"timestamp":{"gte":"2016-12-02 00:00:00","lte":"now","time_zone":"-08:00"}}}}' -o out.csv -f '@timestamp,_type,host,vhost,method,urlpath,origparams'
Traceback (most recent call last):
  File "/usr/bin/es2csv", line 7, in <module>
    from es2csv import main
  File "/usr/lib/python2.6/dist-packages/es2csv.py", line 211
    out = {field: hit[field] for field in META_FIELDS} if self.opts.meta_fields else {}
                               ^
SyntaxError: invalid syntax

Python:

Python 2.6.9 (unknown, Sep  1 2016, 23:34:36)
[GCC 4.8.3 20140911 (Red Hat 4.8.3-9)] on linux2
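The SyntaxError happens because dict comprehensions ({k: v for ...}) require Python 2.7+, and this host runs Python 2.6.9. A sketch of a 2.6-compatible rewrite of the failing line, assuming META_FIELDS is the list of meta column names (the value shown here is an assumption for illustration):

```python
META_FIELDS = ['_id', '_index', '_score', '_type']  # assumed value for illustration

def build_meta(hit, include_meta):
    # dict() over a generator expression works on Python 2.6,
    # unlike the dict-comprehension form the traceback points at.
    if include_meta:
        return dict((field, hit[field]) for field in META_FIELDS)
    return {}
```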

change list to string in output

Is there a possibility to change the elastic list field into a string in the csv output?

eg:
this:
animals: ["cat","dog","lion"]

looks like this in the output (with header):
animals.0,animals.1,animals.2
"cat","dog","lion"

while i would need something like this
animals
"cat,dog,lion"

thanks
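es2csv flattens lists into numbered columns as shown above. A hedged post-processing sketch (not an existing es2csv option; the function name is illustrative) that collapses list values into one delimited string before a row is written:

```python
def join_list_fields(doc, sep=","):
    # Replace every list value with a single sep-joined string,
    # leaving scalar values untouched.
    return {k: sep.join(str(x) for x in v) if isinstance(v, list) else v
            for k, v in doc.items()}
```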

please delete open rst in setup.py

Collecting es2csv
Downloading https://files.pythonhosted.org/packages/3d/3a/6dedac602d4b10022d427a78e9b718a57a7254e02810906b99b41a6e6c72/es2csv-5.2.1.tar.gz
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/tmp/pip-build-c9kw1uju/es2csv/setup.py", line 26, in <module>
    with open('HISTORY.rst') as history_file:
FileNotFoundError: [Errno 2] No such file or directory: 'HISTORY.rst'

This error happens because PyPI cannot save the .rst file.
thank you

Feature: ignore SSL verification

Hello,

When ElasticSearch is behind a Reverse Proxy (Nginx) with self-signed certificates, es2csv is failing with this error:

Traceback (most recent call last):
  File "/usr/bin/es2csv", line 11, in <module>
    sys.exit(main())
  File "/usr/lib/python2.7/site-packages/es2csv.py", line 277, in main
    es.create_connection()
  File "/usr/lib/python2.7/site-packages/es2csv.py", line 40, in f_retry
    return f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/es2csv.py", line 74, in create_connection
    es = elasticsearch.Elasticsearch(self.opts.url, timeout=CONNECTION_TIMEOUT, http_auth=self.opts.auth)
  File "/usr/lib/python2.7/site-packages/elasticsearch/client/__init__.py", line 171, in __init__
    self.transport = transport_class(_normalize_hosts(hosts), **kwargs)
  File "/usr/lib/python2.7/site-packages/elasticsearch/transport.py", line 108, in __init__
    self.set_connections(hosts)
  File "/usr/lib/python2.7/site-packages/elasticsearch/transport.py", line 161, in set_connections
    connections = map(_create_connection, hosts)
  File "/usr/lib/python2.7/site-packages/elasticsearch/transport.py", line 160, in _create_connection
    return self.connection_class(**kwargs)
  File "/usr/lib/python2.7/site-packages/elasticsearch/connection/http_urllib3.py", line 78, in __init__
    raise ImproperlyConfigured("Root certificates are missing for certificate "
elasticsearch.exceptions.ImproperlyConfigured: Root certificates are missing for certificate validation. Either pass them in using the ca_certs parameter or install certifi to use it automatically.

Thai language becomes an unreadable string

I have some fields in ES that use the Thai language.

After using the tool to extract them, the output becomes unreadable.

ทเว้นตี้ช๊อป ภูเวียง
->
ทเว้นตี้ช๊อป ภูเวียง

Problem with UTF8 characters in geoip fields

We use geoip filter and for 85.237.234.8 we get this: geoip.city_name = 'Kysucké Nové Mesto'. The UTF8 character causes crash:

Run query [88/1000] [  8%] [0:00:00] [ETA:  0:00:00] [  2.68 kB/s]
Traceback (most recent call last):
  File "/usr/local/bin/es2csv", line 11, in <module>
    sys.exit(main())
  File "/Library/Python/2.7/site-packages/es2csv.py", line 248, in main
    es.write_to_csv()
  File "/Library/Python/2.7/site-packages/es2csv.py", line 212, in write_to_csv
    csv_writer.writerow(json.loads(line))
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/csv.py", line 152, in writerow
    return self.writer.writerow(self._dict_to_list(rowdict))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 6: ordinal not in range(128)

Would it be possible to fix the writer to work with UTF8?

Thanks for the script it is exactly what we were looking for!
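The crash happens because Python 2's csv module only accepts bytes, so unicode values must be UTF-8-encoded before writerow; on Python 3 it is enough that the underlying stream is text-mode UTF-8 (e.g. open(path, "w", encoding="utf-8")). A minimal Python 3 sketch of the safe path, with the function name being illustrative:

```python
import csv
import io

def rows_to_csv_text(rows):
    # Build CSV text in memory; with a text-mode UTF-8 stream on disk,
    # the same writer handles non-ASCII values without UnicodeEncodeError.
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerows(rows)
    return buf.getvalue()
```

On Python 2, the equivalent fix was encoding each unicode cell with .encode('utf8') before handing the row to the writer.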

Elasticsearch 6.0 support

Hello,

It seems that by upgrading elasticsearch-py to a newer version, the tool can extract records from Elasticsearch 6.0+ without any problem.
Are there any known issues?

Thanks,
George

Error during pip install

Hi, I cannot seem to install this tool using pip.

I'm running

pip install es2csv

And I get:

Collecting es2csv
  Using cached https://files.pythonhosted.org/packages/3d/3a/6dedac602d4b10022d427a78e9b718a57a7254e02810906b99b41a6e6c72/es2csv-5.2.1.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/private/var/folders/rl/s1n4s2gj6s1g6xbtcb5tbc552g46ml/T/pip-install-_lbj65jn/es2csv/setup.py", line 26, in <module>
        with open('HISTORY.rst') as history_file:
    FileNotFoundError: [Errno 2] No such file or directory: 'HISTORY.rst'

HISTORY.rst isn't included in the distribution

Don't think that the HISTORY.rst file is included in the distribution. Installing directly from source works just fine.

pip3 install es2csv --user
Collecting es2csv
  Using cached es2csv-5.2.1.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-build-w54tvjbj/es2csv/setup.py", line 26, in <module>
        with open('HISTORY.rst') as history_file:
    FileNotFoundError: [Errno 2] No such file or directory: 'HISTORY.rst'

can I export data of aggregation

I wanted to export aggregation data but got this error:

Traceback (most recent call last):
  File "/usr/local/bin/es2csv", line 9, in <module>
    load_entry_point('es2csv==1.0.2', 'console_scripts', 'es2csv')()
  File "/usr/local/lib/python2.7/dist-packages/es2csv.py", line 265, in main
    es.search_query()
  File "/usr/local/lib/python2.7/dist-packages/es2csv.py", line 39, in f_retry
    return f(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/es2csv.py", line 106, in search_query
    query = json.loads(self.opts.query)
  File "/usr/lib/python2.7/json/__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.7/json/decoder.py", line 366, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.7/json/decoder.py", line 382, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Expecting property name: line 76 column 29 (char 2482)

Thanks~

timestamp range

Thanks for this tool; it works for my needs. I'm having a problem trying to specify a range using Lucene syntax on a field called 'timestamp'; the format is Jan 20 11:33:40.
My query string is

'status: bounced AND timestamp: ["Jan 20 00:00:00" TO "Jan 20th 12:33:00"]'

and this returns 0 results, but I see results in kibana for this range.

Google searches on Lucene range syntax state:

field: [value1 TO value2]

What am I doing wrong?
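Two things likely break this query: "Jan 20th" is not a parseable date, and a range on a date field only matches when the bounds fit the field's mapped date format. A hedged alternative is a raw Query DSL range query (usable with the -r flag) carrying an explicit format; the format string below is an assumption matching the Jan 20 11:33:40 values and should be adjusted to the field's actual mapping:

```python
import json

def range_query(field, gte, lte, fmt="MMM dd HH:mm:ss"):
    # Build a Query DSL range query; "format" tells Elasticsearch how to
    # parse the gte/lte strings for this date field.
    return json.dumps({
        "query": {"range": {field: {"gte": gte, "lte": lte, "format": fmt}}}
    })
```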

unexpected error

hi

when running:
es2csv -i _all -q '*' -o database.csv

I get an error with this output; any hints?

Found 770052 results
Run query [ ] [0/770052] [ 0%] [0:00:00] [ETA: --:--:--] [ 0.0 s/docs]
GET http://localhost:9200/_search/scroll?scroll=30m [status:400 request:0.002s]
Traceback (most recent call last):
  File "/usr/local/bin/es2csv", line 11, in <module>
    load_entry_point('es2csv==5.2.1', 'console_scripts', 'es2csv')()
  File "/usr/local/lib/python3.5/dist-packages/es2csv.py", line 283, in main
    es.search_query()
  File "/usr/local/lib/python3.5/dist-packages/es2csv.py", line 40, in f_retry
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/es2csv.py", line 177, in search_query
    res = next_scroll(res['_scroll_id'])
  File "/usr/local/lib/python3.5/dist-packages/es2csv.py", line 40, in f_retry
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/es2csv.py", line 93, in next_scroll
    return self.es_conn.scroll(scroll=self.scroll_time, scroll_id=scroll_id)
  File "/usr/local/lib/python3.5/dist-packages/elasticsearch/client/utils.py", line 73, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/elasticsearch/client/__init__.py", line 955, in scroll
    params=params, body=body)
  File "/usr/local/lib/python3.5/dist-packages/elasticsearch/transport.py", line 318, in perform_request
    status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
  File "/usr/local/lib/python3.5/dist-packages/elasticsearch/connection/http_urllib3.py", line 128, in perform_request
    self._raise_error(response.status, raw_data)
  File "/usr/local/lib/python3.5/dist-packages/elasticsearch/connection/base.py", line 122, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.RequestError: <exception str() failed>

Encoding issue while writing into csv

Calling a field whose mapping is:


"name": {
  "type": "keyword"
},


Command that I ran:
es2csv -i index -D type -f name --verify-certs -u https://userwithurl -q '*' -o database.csv

And the error shown was:

Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/bin/es2csv", line 11, in <module>
    load_entry_point('es2csv==5.2.1', 'console_scripts', 'es2csv')()
  File "/home/ubuntu/anaconda3/lib/python3.5/site-packages/es2csv.py", line 284, in main
    es.write_to_csv()
  File "/home/ubuntu/anaconda3/lib/python3.5/site-packages/es2csv.py", line 237, in write_to_csv
    line_dict_utf8 = {k: v.encode('utf8') if isinstance(v, unicode) else v for k, v in line_as_dict.items()}
  File "/home/ubuntu/anaconda3/lib/python3.5/site-packages/es2csv.py", line 237, in <dictcomp>
    line_dict_utf8 = {k: v.encode('utf8') if isinstance(v, unicode) else v for k, v in line_as_dict.items()}
NameError: name 'unicode' is not defined

python error?

[a408186@masnussyslog python]$ sudo es2csv -v -q 'host.raw: eosn*' -o eosn.txt
Traceback (most recent call last):
File "/usr/bin/es2csv", line 7, in
from es2csv import main
File "/usr/lib/python2.6/site-packages/es2csv.py", line 197
out = {field: hit[field] for field in META_FIELDS} if self.opts.meta_fields else {}
^
SyntaxError: invalid syntax

Getting "TypeError: unsupported operand type(s) for -: 'dict' and 'int'"

I must be overlooking something, so sorry for the basic question!

After installing es2csv, I tried to run it against a public ES cluster:
es2csv --verify-certs -u https://myes.com -i logstash-* -D log -q 'spot' -o database.csv

I get

Found {u'relation': u'eq', u'value': 250} results.
Traceback (most recent call last):                                                                                                          
  File "/Users/birayaha/Library/Python/2.7/bin/es2csv", line 11, in <module>
    sys.exit(main())
  File "/Users/birayaha/Library/Python/2.7/lib/python/site-packages/es2csv_cli.py", line 53, in main
    es.search_query()
  File "/Users/birayaha/Library/Python/2.7/lib/python/site-packages/es2csv.py", line 26, in f_retry
    return f(*args, **kwargs)
  File "/Users/birayaha/Library/Python/2.7/lib/python/site-packages/es2csv.py", line 148, in search_query
    bar = progressbar.ProgressBar(widgets=widgets, maxval=self.num_results).start()
  File "/Users/birayaha/Library/Python/2.7/lib/python/site-packages/progressbar/bar.py", line 633, in start
    self.update(self.min_value, force=True)
  File "/Users/birayaha/Library/Python/2.7/lib/python/site-packages/progressbar/bar.py", line 579, in update
    StdRedirectMixin.update(self, value=value)
  File "/Users/birayaha/Library/Python/2.7/lib/python/site-packages/progressbar/bar.py", line 141, in update
    DefaultFdMixin.update(self, value=value)
  File "/Users/birayaha/Library/Python/2.7/lib/python/site-packages/progressbar/bar.py", line 68, in update
    line = converters.to_unicode('\r' + self._format_line())
  File "/Users/birayaha/Library/Python/2.7/lib/python/site-packages/progressbar/bar.py", line 508, in _format_line
    widgets = ''.join(self._to_unicode(self._format_widgets()))
  File "/Users/birayaha/Library/Python/2.7/lib/python/site-packages/progressbar/bar.py", line 473, in _format_widgets
    data = self.data()
  File "/Users/birayaha/Library/Python/2.7/lib/python/site-packages/progressbar/bar.py", line 401, in data
    percentage=self.percentage,
  File "/Users/birayaha/Library/Python/2.7/lib/python/site-packages/progressbar/bar.py", line 320, in percentage
    total = self.max_value - self.min_value
TypeError: unsupported operand type(s) for -: 'dict' and 'int'

Not sure what I need to add to avoid the cast issue.
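The Found {u'relation': u'eq', u'value': 250} results line gives the cause away: Elasticsearch 7 returns hits.total as an object, while this es2csv release expects a plain integer to drive the progress bar. A hedged normalization sketch (the function name is illustrative, not part of es2csv):

```python
def total_hits(res):
    # ES <7 returns hits.total as an int; ES 7+ wraps it in
    # {"value": ..., "relation": ...}. Return the int either way.
    total = res["hits"]["total"]
    return total["value"] if isinstance(total, dict) else total
```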

Installing from PyPI with pip is broken

The error is:

Collecting es2csv
  Downloading es2csv-5.2.1.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-build-5t9_3isy/es2csv/setup.py", line 26, in <module>
        with open('HISTORY.rst') as history_file:
    FileNotFoundError: [Errno 2] No such file or directory: 'HISTORY.rst'
    
    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-5t9_3isy/es2csv/

Improve CLI fields parsing

Hello @taraslayshchuk,

A bug exists in the generated CSV when specifying comma-separated fields on the CLI: the first column is the comma-joined list of fields.
To be clear, if you specify --fields a,b,c you will get the following CSV:

"a,b,c",a,b,c
empty,whatever,whatever,whatever

But when you specify --fields a b c the CSV will be good:

a,b,c
whatever,whatever,whatever

Could you then improve the parsing?

Cheers!
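A hedged sketch of the requested fix: after argparse collects the nargs='+' values, split any comma-joined items so --fields a,b,c and --fields a b c produce the same list (the function name is illustrative, not es2csv's actual code):

```python
def normalize_fields(values):
    # Split any comma-joined items so -f a,b,c == -f a b c,
    # dropping empty fragments from stray commas.
    fields = []
    for v in values:
        fields.extend(part for part in v.split(",") if part)
    return fields
```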

Range Queries

Hi,

this is not really a problem (unless it is :) ), but I wonder how to do range queries.

In the past (and I think that's what Kibana does) I expanded my indexes to the specific dates. That option does exist here manually, but I wonder if it is necessary?

This query for example works for me as well:

'{"query":{"range":{"@timestamp":{"gte":"2016-08-10T00:00:00.000Z","lt":"2016-08-10T00:00:00.000Z"}}}}'

Is this the intended way to query for ranges? And/or does that still query ALL indexes when I specify:

logstash-*

Essentially what I am trying to work out is if I need to expand my indexes and then apply these to your script?

Kind regards and thank you for this cool script!

Artur

unexpected keyword argument 'fields' with -f option

Hello-

I can run this command without the fields option successfully. Including the -f option fails with the following response; an example of the command follows.

Thank you for this tool. It seems like one of the more popular ES GUI tools should integrate with it.

Traceback error:

Traceback (most recent call last):
  File "/usr/local/bin/es2csv", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python2.7/dist-packages/es2csv.py", line 268, in main
    es.search_query()
  File "/usr/local/lib/python2.7/dist-packages/es2csv.py", line 40, in f_retry
    return f(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/es2csv.py", line 123, in search_query
    res = self.es_conn.search(**search_args)
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/utils.py", line 71, in _wrapped
    return func(*args, params=params, **kwargs)
TypeError: search() got an unexpected keyword argument 'fields'

Example of command:
es2csv -i se-account-base -q "(country:usa AND account_type:regular AND date_inception:{2012-01-01 TO 2017-01-01}) NOT (restriction_type_2:deceased)" -f account_number title first_name middle_name last_name -o /usr/share/csv/account_base_3rd_party.csv

Found 0 results

I'm trying to use the command-line utility es2csv to export data from Elasticsearch into a CSV file with the following syntax taken from the GitHub repo:

es2csv -i logstash-2015-07-07 -q 'host: localhost' -o database.csv

So in my case I run the following command:

es2csv -i enron_test -q 'http://localhost:9200' -o database.csv

The problem is that this command returns Found 0 results, but I have some documents indexed in the Elasticsearch database.

Can anyone solve this problem? Am I wrong about the syntax of the command? Thank you guys

Possible to provide alternate delimiter for kibana style?

Currently, when the Kibana style is flagged with -k, the delimiter used is a comma, e.g.:

History,Fiction

However, this presents a problem when other columns may contain commas as well; splitting on commas is then not viable, as you may split Kibana-style cells or strings that contain commas.

Curious, might it be possible to provide a delimiter character, like pipe |, that could be used instead?
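A hedged sketch of what such an option could do; es2csv's -k output currently joins nested values with commas, and this helper (not an existing es2csv feature; the name is illustrative) makes the join character configurable:

```python
def join_nested(values, sep="|"):
    # Join a list of nested-field values with a configurable separator,
    # so cells like "History|Fiction" survive a later split on ",".
    return sep.join(str(v) for v in values)
```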

HTTP_EXCEPTIONS.get(status_code, TransportError)

I'm trying to run es2csv, but I'm getting the following error:

Traceback (most recent call last):
  File "/usr/local/bin/es2csv", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python2.7/site-packages/es2csv.py", line 283, in main
    es.search_query()
  File "/usr/local/lib/python2.7/site-packages/es2csv.py", line 40, in f_retry
    return f(*args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/es2csv.py", line 177, in search_query
    res = next_scroll(res['_scroll_id'])
  File "/usr/local/lib/python2.7/site-packages/es2csv.py", line 40, in f_retry
    return f(*args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/es2csv.py", line 93, in next_scroll
    return self.es_conn.scroll(scroll=self.scroll_time, scroll_id=scroll_id)
  File "/usr/local/lib/python2.7/site-packages/elasticsearch/client/utils.py", line 73, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/elasticsearch/client/__init__.py", line 955, in scroll
    params=params, body=body)
  File "/usr/local/lib/python2.7/site-packages/elasticsearch/transport.py", line 318, in perform_request
    status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
  File "/usr/local/lib/python2.7/site-packages/elasticsearch/connection/http_urllib3.py", line 128, in perform_request
    self._raise_error(response.status, raw_data)
  File "/usr/local/lib/python2.7/site-packages/elasticsearch/connection/base.py", line 122, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.RequestError

Any idea why?

change delimiter

Could you please suggest what I am doing wrong? I cannot change the delimiter of the output file using the es2csv CLI tool.

es2csv -q '*' -i test_index -o test.csv -f id name -d \t

I also tried

es2csv -q '*' -i test_index -o test.csv -f id name -d '\t'
es2csv -q '*' -i test_index -o test.csv -f id name -d "\t"

and some other separators as well; it gives back a comma all the time.

Update: I have installed the latest version of the tool, but it is still the same (I have seen that there was already a bug fix for this).

'unicode' is not defined

I've just downloaded & run it on debian and it gives me this:

  File "/usr/local/lib/python3.4/dist-packages/es2csv.py", line 237, in <dictcomp>
    line_dict_utf8 = {k: v.encode('utf8') if isinstance(v, unicode) else v for k, v in line_as_dict.items()}
NameError: name 'unicode' is not defined

I've hacked the file and changed unicode -> str, and it worked.

Hope it helps..
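The reporter's unicode -> str hack works because on Python 3 str is the unicode type. A version-safe sketch of the failing line at es2csv.py:237 (the function name is illustrative):

```python
import sys

def encode_line(line_as_dict):
    # On Python 2 the csv module needs UTF-8 bytes, so unicode values are
    # encoded; on Python 3 csv accepts str directly, so pass values through.
    if sys.version_info[0] >= 3:
        return dict(line_as_dict)
    return {k: v.encode('utf8') if isinstance(v, unicode) else v  # noqa: F821
            for k, v in line_as_dict.items()}
```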

Sorting

I think I asked about this before, but I can't find any emails, and I didn't file an issue here.

Is there a means to sort the output based on a specified field?

Unable to get the whole result

I'm using
es2csv -q '*' -u ***.**.*.** :9200 -o database.csv
It should return at least 50K results, but I can only get 2 results.
Thanks for helping

Can't backup with query

I tried to back up to a CSV file with a query from JSON, but it shows an error.

Tried with command

es2csv -u http://localhost:9200 -i 'logstash-*' -q @'elk_query.json' -o backup.csv

and here is my elk_query.json

{
   "query" :{
      "range": {
        "@timestamp": {
          "gte": "2017-03-01T20:03:12.000",
          "lte": "2017-03-30T20:03:12.000"
        }
      }
   }
 }

It shows the error below:

Traceback (most recent call last):
  File "/usr/local/bin/es2csv", line 11, in <module>
    sys.exit(main())
  File "/Library/Python/2.7/site-packages/es2csv.py", line 279, in main
    es.search_query()
  File "/Library/Python/2.7/site-packages/es2csv.py", line 40, in f_retry
    return f(*args, **kwargs)
  File "/Library/Python/2.7/site-packages/es2csv.py", line 134, in search_query
    res = self.es_conn.search(**search_args)
  File "/Library/Python/2.7/site-packages/elasticsearch/client/utils.py", line 73, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/Library/Python/2.7/site-packages/elasticsearch/client/__init__.py", line 569, in search
    doc_type, '_search'), params=params, body=body)
  File "/Library/Python/2.7/site-packages/elasticsearch/transport.py", line 318, in perform_request
    status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
  File "/Library/Python/2.7/site-packages/elasticsearch/connection/http_urllib3.py", line 128, in perform_request
    self._raise_error(response.status, raw_data)
  File "/Library/Python/2.7/site-packages/elasticsearch/connection/base.py", line 122, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.RequestError: TransportError(400, u'search_phase_execution_exception', u'Failed to parse query [{\n   "query" :{\n      "range": {\n        "@timestamp": {\n          "gte": "2017-03-01T20:03:12.000",\n          "lte": "2017-03-30T20:03:12.000"\n        }\n      }\n   }\n }\n]')

I also tried this JSON query from Kibana and it shows results properly. Please help.
Thank you in advance.
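The "Failed to parse query [...]" error suggests the JSON body was sent as a Lucene query string rather than as Query DSL. The usage summary above lists a `-r` flag; assuming it switches `-q` into raw Query DSL mode (so the file body is passed to `_search` unchanged), a sketch of the fixed invocation would be:

```shell
# Assumption: -r tells es2csv to treat -q as raw Query DSL instead of a
# Lucene query string, so the JSON file is sent to _search as-is.
es2csv -r -u http://localhost:9200 -i 'logstash-*' -q @'elk_query.json' -o backup.csv
```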

expired(multiple reads?)

Hi,

While I'm trying to perform queries over more than 7 days, the script just freezes and stops, and I have to kill it. (es2csv was installed through pip install.)

I then installed es2csv from the sources on GitHub, and now I'm getting this error instead: "expired(multiple reads?)"

Here is the output:
"Scroll[c2NhbjswOzE7dG90YWxfaGl0czoxMzg4Nzg7] expired(multiple reads?). Saving loaded data.############################## ] [115189/138878] [ 82%] [0:00:10] [ETA: 0:00:02] [ 11.28 kdocs/s]"

And obviously not all the data ends up in the CSV.

When I perform the same query in Kibana, I'm able to retrieve all the data. Where does the issue come from?

Did Elasticsearch time out during the query? How can I solve this?

Regards,
Cédric
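The message indicates the scroll context expired on the Elasticsearch side before the next batch was fetched. One hedged thing to try, assuming the `-s INTEGER` option in the usage summary above sets the scroll batch size: a smaller batch keeps each scroll round-trip short, so the context is renewed before its keep-alive runs out.

```shell
# Assumption: -s sets the per-request scroll batch size. Smaller batches
# mean more, faster round-trips, each of which renews the scroll context.
es2csv -u http://localhost:9200 -i 'logstash-*' -q '*' -s 100 -o export.csv
```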

es2csv cannot handle encoding

Hi,

I use
es2csv -u http://127.0.0.1:9200 -i sample_index -q 'name: test' -o sample.csv
but the values in the CSV output file are not encoded correctly.

Thanks,
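One possible cause: es2csv runs on Python 2 and writes UTF-8, but spreadsheet tools often assume a legacy code page unless a byte-order mark is present. A hedged post-processing sketch (here `sample.csv` is created as a hypothetical stand-in for the exported file):

```shell
# Hypothetical stand-in for the exported file; with a real export,
# sample.csv would already exist.
printf 'name,value\ntest,1\n' > sample.csv
# Prepend a UTF-8 BOM so spreadsheet tools auto-detect the encoding.
printf '\xef\xbb\xbf' | cat - sample.csv > sample_bom.csv
```

If the file opens correctly in a UTF-8-aware editor but not in a spreadsheet, the encoding itself is fine and only the BOM is missing.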

SSL host connection throws warnings

es2csv -u https://xxxURLzzz -q '*' -o export.csv

output:

/Library/Python/2.7/site-packages/elasticsearch/connection/http_urllib3.py:70: UserWarning: Connecting to xxxURLzzz using SSL with verify_certs=False is insecure.
  'Connecting to %s using SSL with verify_certs=False is insecure.' % host)
/Library/Python/2.7/site-packages/urllib3/connectionpool.py:841: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  InsecureRequestWarning)
(the last two lines repeat several times)

The export works fine, though.
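The warnings come from connecting over HTTPS with certificate verification disabled. The usage summary above lists `--verify-certs` and `--ca-certs`, so enabling verification against a CA bundle should silence them (the bundle path below is an example, not a real file):

```shell
# Verify the server certificate instead of skipping verification;
# /path/to/ca.pem is a placeholder for your CA bundle.
es2csv -u https://xxxURLzzz -q '*' -o export.csv \
       --verify-certs --ca-certs /path/to/ca.pem
```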

Argument for handling null values

Please add CLI argument to specify replacement for null field values.

For example, when importing TSV-formatted data into Yandex ClickHouse, null values should be represented as \N.

It would also be great to be able to define the replacement per field or per field data type.
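Until such an option exists, empty fields can be rewritten in a post-processing step. A sketch for the ClickHouse TSV case, assuming a tab-delimited export (e.g. via `-d $'\t'`); `export.tsv` below is a hypothetical sample created inline:

```shell
# Hypothetical sample row with one empty field, standing in for the export.
printf 'host\t\t200\n' > export.tsv
# Replace empty TSV fields with \N, ClickHouse's null marker.
awk 'BEGIN { FS = OFS = "\t" }
     { for (i = 1; i <= NF; i++) if ($i == "") $i = "\\N"; print }' \
    export.tsv > export_null.tsv
```

Per-field or per-type replacement would need knowledge of the column order, which awk can express with a per-column lookup table.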

Script hangs while writing to a temporary file

Hello

I am using this library to fetch all the records from ES, and it works like a charm. Best tool available for ES-to-CSV export.

However, sometimes the script hangs during the search_query() phase: it stops writing to the temporary file but keeps running.

Could this be a script issue, or an issue on the ES side?

Any help or pointers in the right direction are appreciated.

Thanks
Amit
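A first diagnostic step, assuming the `--debug` flag from the usage summary above enables debug-level logging: re-run the export with it and see which scroll request is the last one logged before the hang.

```shell
# Assumption: --debug turns on debug logging, showing each request to ES,
# so the last logged request identifies where the scroll loop stalls.
es2csv --debug -u http://localhost:9200 -i 'logstash-*' -q '*' -o out.csv
```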
