fuyb1992 / es_pandas
Read, write and update large scale pandas DataFrame with Elasticsearch
License: MIT License
progressbar2 is not installed as a dependency when installing the package.
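For indices that share the same mapping, such as data stored in one index per date, it should be possible to load the data from several indices into a single DataFrame.
For example: index-2022-01, index-2022-02, index-2022-03 ...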
df = ep.to_pandas('index-2022*',...)
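Elasticsearch search requests accept wildcard index patterns such as index-2022*, so it is worth first checking whether to_pandas passes the pattern through unchanged. If it does not, a client-side sketch is to match the pattern against a list of index names and concatenate the per-index frames. `concat_matching_indices` and the injected `fetch_index` loader are hypothetical (for example, `fetch_index` could wrap a per-index to_pandas call); they are not part of es_pandas.

```python
from fnmatch import fnmatch
from typing import Callable, Iterable, List

import pandas as pd


def concat_matching_indices(
    pattern: str,
    index_names: Iterable[str],
    fetch_index: Callable[[str], pd.DataFrame],
) -> pd.DataFrame:
    """Load every index whose name matches `pattern` and stack the results.

    `fetch_index` is a hypothetical single-index loader, injected so the
    merging logic can be exercised without an Elasticsearch cluster.
    """
    matched: List[str] = sorted(n for n in index_names if fnmatch(n, pattern))
    frames = []
    for name in matched:
        # Tag each row with its source index so it stays traceable after merging.
        frames.append(fetch_index(name).assign(_index=name))
    if not frames:
        return pd.DataFrame()
    # The original row index (e.g. _id) of each frame is preserved.
    return pd.concat(frames)
```

Because every index shares the same mapping, the frames have the same columns and concatenate without introducing NaN-filled columns.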
Can I pass a username and password when connecting to ES?
Thanks
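Another report below constructs the client as es_pandas(es_url, verify_certs=False, ssl_show_warn=False), which suggests extra arguments are forwarded to the elasticsearch-py client. If so, the client's own authentication argument (http_auth=(user, password) in 7.x, basic_auth in 8.x) may work; that forwarding is an assumption worth checking against the installed version. A second option is to embed the credentials in the URL. The hypothetical helper `build_es_url` below percent-encodes them so characters such as @ or : in a password do not break URL parsing.

```python
from urllib.parse import quote


def build_es_url(
    user: str,
    password: str,
    host: str,
    port: int = 9200,
    scheme: str = "https",
) -> str:
    """Return an Elasticsearch URL carrying percent-encoded basic-auth credentials."""
    # safe="" makes quote() also encode '/', ':' and '@'.
    return f"{scheme}://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"
```

For example, build_es_url("elastic", "secret", "localhost") gives "https://elastic:secret@localhost:9200", which could then be passed as the first argument to es_pandas.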
The version of my Elasticsearch instance ends with -SNAPSHOT,
and that causes es_pandas to fail on init.
Version:
7.9.1-SNAPSHOT
The error I'm getting:
ValueError: invalid literal for int() with base 10: '1-SNAPSHOT'
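The message suggests that the version string (as reported in version.number by the cluster's info endpoint) is split on "." and each part is passed to int(). That is inferred from the error, not confirmed in the es_pandas source. A parser that reads only the leading numeric components tolerates suffixes such as -SNAPSHOT. `parse_es_version` is a hypothetical name.

```python
import re
from typing import Tuple

# Matches the leading "major.minor.patch" and ignores anything after it.
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)")


def parse_es_version(number: str) -> Tuple[int, int, int]:
    """Parse major, minor and patch from an Elasticsearch version string.

    Suffixes such as '-SNAPSHOT' are ignored instead of being fed to int().
    """
    match = _VERSION_RE.match(number)
    if match is None:
        raise ValueError(f"unrecognised Elasticsearch version: {number!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch
```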
Hi there,
which parameter should I pass to set the fetch size, as in this example:
https://www.elastic.co/guide/en/elasticsearch/reference/current/sql-translate.html
POST /_sql/translate
{
"query": "SELECT * FROM library ORDER BY page_count DESC",
"fetch_size": 10
}
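In the Elasticsearch SQL REST API, fetch_size sets how many rows each response returns. When more rows remain, the response carries a cursor, and the next page is requested by POSTing that cursor back to /_sql. The sketch below implements this loop against an injected `send` callable, a hypothetical stand-in for whatever issues the POST /_sql request, so it makes no assumption about how es_pandas exposes the SQL API.

```python
from typing import Any, Callable, Dict, Iterator, List

SqlResponse = Dict[str, Any]


def iter_sql_rows(
    send: Callable[[Dict[str, Any]], SqlResponse],
    query: str,
    fetch_size: int = 1000,
) -> Iterator[List[Any]]:
    """Yield the rows of an SQL query page by page, following the cursor.

    `send` is a hypothetical callable that POSTs a body to /_sql and
    returns the decoded JSON response.
    """
    response = send({"query": query, "fetch_size": fetch_size})
    while True:
        yield from response.get("rows", [])
        cursor = response.get("cursor")
        if not cursor:
            break
        # Follow-up pages are requested with the cursor alone.
        response = send({"cursor": cursor})
```

If iteration is abandoned before the last page, the server-side cursor can be released with POST /_sql/close and a {"cursor": ...} body.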
In a Python 3.6 virtual environment on Ubuntu 14.04.5 LTS, after installing es_pandas and progressbar2, I get the error "AttributeError: module 'progressbar' has no attribute '__version__'" when running:
from es_pandas import es_pandas
root@ns502245:~# source p36/bin/activate
(p36) root@ns502245:~# pip install progressbar2
Requirement already satisfied: progressbar2 in ./p36/lib/python3.6/site-packages (3.50.0)
Requirement already satisfied: six in ./p36/lib/python3.6/site-packages (from progressbar2) (1.13.0)
Requirement already satisfied: python-utils>=2.3.0 in ./p36/lib/python3.6/site-packages (from progressbar2) (2.4.0)
WARNING: You are using pip version 19.3.1; however, version 20.0.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
(p36) root@ns502245:~# python
Python 3.6.9 (default, Nov 19 2019, 14:10:59)
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from es_pandas import es_pandas
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/root/p36/lib/python3.6/site-packages/es_pandas/__init__.py", line 1, in <module>
from .es_pandas import es_pandas
File "/root/p36/lib/python3.6/site-packages/es_pandas/es_pandas.py", line 7, in <module>
if not progressbar.__version__.startswith('3.'):
AttributeError: module 'progressbar' has no attribute '__version__'
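The failing check, progressbar.__version__.startswith('3.'), assumes that the imported progressbar module has a __version__ attribute and that its major version is exactly 3, so it would also reject progressbar2 releases from 4.0 onward. Since pip reports progressbar2 3.50.0 as installed, the module actually imported may come from a different package. A more defensive check is sketched below; `progressbar_is_supported` is a hypothetical helper, not part of es_pandas.

```python
from types import ModuleType


def progressbar_is_supported(module: ModuleType, minimum_major: int = 3) -> bool:
    """Return True if `module` looks like progressbar2 at or above `minimum_major`.

    A missing or non-numeric __version__ (for example, another package that
    also installs a top-level `progressbar` module) counts as unsupported.
    """
    version = getattr(module, "__version__", None)
    if not isinstance(version, str):
        return False
    major = version.split(".", 1)[0]
    return major.isdigit() and int(major) >= minimum_major
```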
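If the progress bar is not displayed, the number of documents in the index should not be counted.
This saves overhead, especially when an index holds a very large number of documents or there are many indices.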
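One way to do this is to make the count conditional, so the count request is only issued when a progress display will actually use it. In the sketch below, `scan_documents` and `count_documents` are hypothetical callables standing in for the scroll and count requests, and a periodic print stands in for a real progress bar.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, Optional

Doc = Dict[str, Any]


def iter_documents(
    scan_documents: Callable[[], Iterable[Doc]],
    count_documents: Callable[[], int],
    show_progress: bool = True,
) -> Iterator[Doc]:
    """Yield documents, counting them up front only when progress is shown."""
    # The (possibly expensive) count runs only if a progress display needs it.
    total: Optional[int] = count_documents() if show_progress else None
    for i, doc in enumerate(scan_documents(), start=1):
        if total is not None and (i % 1000 == 0 or i == total):
            print(f"{i}/{total} documents")
        yield doc
```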
Using version 0.17, to_es raises an error with show_progress=False:
Traceback (most recent call last):
File "/opt/anaconda3/envs/algorithms/lib/python3.8/site-packages/elasticsearch/helpers/__init__.py", line 304, in parallel_bulk
for result in pool.imap(
File "/opt/anaconda3/envs/algorithms/lib/python3.8/multiprocessing/pool.py", line 868, in next
raise value
File "/opt/anaconda3/envs/algorithms/lib/python3.8/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/opt/anaconda3/envs/algorithms/lib/python3.8/multiprocessing/pool.py", line 144, in _helper_reraises_exception
raise ex
File "/opt/anaconda3/envs/algorithms/lib/python3.8/multiprocessing/pool.py", line 388, in _guarded_task_generation
for i, x in enumerate(iterable):
File "/opt/anaconda3/envs/algorithms/lib/python3.8/site-packages/elasticsearch/helpers/__init__.py", line 58, in _chunk_actions
for action, data in actions:
File "/opt/anaconda3/envs/algorithms/lib/python3.8/site-packages/es_pandas/es_pandas.py", line 136, in rec_to_actions
bar.update(i)
TypeError: update() takes 1 positional argument but 2 were given
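The last frame shows rec_to_actions calling bar.update(i) on an object whose update method accepts no positional argument. Presumably the placeholder used when show_progress=False defines update(self) only; that is an inference from the traceback, not checked against the 0.17 source. A no-op stand-in whose methods accept and ignore any arguments avoids the mismatch. `NullProgressBar` and `make_bar` are hypothetical names.

```python
from typing import Any


class NullProgressBar:
    """Progress bar replacement that does nothing.

    Every method accepts arbitrary arguments, so calls such as bar.update(i)
    behave the same whether or not progress is shown.
    """

    def start(self, *args: Any, **kwargs: Any) -> "NullProgressBar":
        return self

    def update(self, *args: Any, **kwargs: Any) -> None:
        pass

    def finish(self, *args: Any, **kwargs: Any) -> None:
        pass


def make_bar(show_progress: bool, max_value: int) -> Any:
    """Return a real progressbar2 bar when requested, otherwise a no-op bar."""
    if not show_progress:
        return NullProgressBar()
    import progressbar  # progressbar2 installs as the top-level `progressbar` module

    return progressbar.ProgressBar(max_value=max_value)
```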
Running the command below does not update the records in Elasticsearch:
ep.to_es(df.iloc[:1000, 1:], index, doc_type=doc_type, _op_type='update')
N/A% (0 of 1000) | | Elapsed Time: 0:00:00 ETA: --:--:--
1000
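With the elasticsearch-py bulk helpers, an update action names the target document through _id and carries the changed fields in a doc (or script) body; without an _id the update has no document to target. Which column es_pandas uses as _id, and how it builds update actions, is not shown here, so the sketch below only illustrates the expected action shape. `to_update_actions` and the `id_column` argument are hypothetical.

```python
from typing import Any, Dict, Iterator

import pandas as pd


def to_update_actions(
    df: pd.DataFrame, index: str, id_column: str
) -> Iterator[Dict[str, Any]]:
    """Turn each row into a partial-update action for elasticsearch.helpers.bulk."""
    for record in df.to_dict(orient="records"):
        doc_id = record.pop(id_column)
        yield {
            "_op_type": "update",
            "_index": index,
            "_id": doc_id,
            # Only the fields present in `doc` change on the stored document.
            "doc": record,
        }
```

These actions could be passed as the actions argument of elasticsearch.helpers.bulk(es, actions), where es is an Elasticsearch client.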
I'm getting the following failure when trying to upload a value that is an array:
>>> response = ep.to_es(df, index='myindex', _op_type='update')
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
It seems that the serialize function fails because pd.isna returns an array when its input is an array.
Could you please consider using np.all to wrap the pd.isna output, so that it always produces a single boolean and arrays can be processed?
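pd.isna works element-wise on arrays, so using its result directly in an if statement raises the ambiguous-truth-value error. Wrapping it in np.all, as suggested, yields one boolean, but it would also treat an empty array (np.all of an empty array is True) or an all-NaN array as a missing value. An alternative is to convert array-like values before the missing-value check. `serialize_value` is a hypothetical helper, not es_pandas' own serialize function.

```python
from typing import Any

import numpy as np
import pandas as pd


def serialize_value(value: Any) -> Any:
    """Convert one cell to a JSON-friendly value without ambiguous truth tests."""
    # Array-likes are converted first, so pd.isna below only ever sees scalars.
    # NaN elements inside arrays are left as they are.
    if isinstance(value, (np.ndarray, pd.Series)):
        return value.tolist()
    if isinstance(value, (list, tuple)):
        return list(value)
    if pd.isna(value):
        return None
    if isinstance(value, np.generic):
        # numpy scalars (e.g. np.int64) become plain Python scalars.
        return value.item()
    return value
```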
I just ran pip install es_pandas and installed the other packages, including progressbar2 (>3), but it doesn't work.
I get the following error message:
Incorrect version of progerssbar package, please do pip install progressbar2
but the version Python detects is that of the python_utils package. After I fixed that, the following error is raised:
TypeError: __init__() got an unexpected keyword argument 'max_value'
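progressbar2 installs itself under the top-level module name progressbar, the same name used by the older progressbar package, so whichever copy Python finds first wins. The max_value error is consistent with the older package being imported, since its ProgressBar takes maxval instead. Checking which file the import actually resolves to can confirm such shadowing. `module_origin` is a small hypothetical helper built on importlib.util.find_spec.

```python
import importlib.util
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    # Prints the path of the package that `import progressbar` would load.
    print(module_origin("progressbar"))
```

If the printed path does not belong to progressbar2, uninstalling the other package (pip uninstall progressbar) and reinstalling progressbar2 is the usual remedy.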
A second call to to_pandas that uses the default value of the query_rule argument raises an exception. The default value of query_rule should be None, and the method should fall back to the default query internally when the user does not pass a query_rule.
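This behaviour matches Python's mutable-default-argument pitfall: a default dict is created once, when the function is defined, so any in-place change made during the first call is still present on the second. The functions below are hypothetical illustrations of the pitfall and of the proposed None default, not es_pandas code; the "_source" list and the field name are illustrative only.

```python
import copy
from typing import Any, Dict, Optional


def build_query_buggy(query_rule: Dict[str, Any] = {"query": {"match_all": {}}}) -> Dict[str, Any]:
    # The default dict is shared by every call, so this mutation accumulates.
    query_rule.setdefault("_source", []).append("field")
    return query_rule


def build_query(query_rule: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    if query_rule is None:
        # A fresh default on every call, so calls cannot affect each other.
        query_rule = {"query": {"match_all": {}}}
    # Work on a copy so a dict passed in by the caller is not modified either.
    body = copy.deepcopy(query_rule)
    body.setdefault("_source", []).append("field")
    return body
```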
How can pagination be implemented? Specifically, how do I pass the from/size parameters, given that the query_rule parameter does not accept from and size?
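In the Elasticsearch search API, from and size are top-level keys of the request body, next to query, and from + size is capped by the index.max_result_window setting (10,000 by default); search_after is the usual route past that limit. Whether to_pandas forwards these keys is not established here, so the sketch below only builds the page bodies. `page_bodies` is a hypothetical helper.

```python
import copy
from typing import Any, Dict, Iterator

MAX_RESULT_WINDOW = 10_000  # Elasticsearch default for index.max_result_window


def page_bodies(query: Dict[str, Any], page_size: int, total: int) -> Iterator[Dict[str, Any]]:
    """Yield search bodies that page through `total` hits using from/size."""
    for start in range(0, min(total, MAX_RESULT_WINDOW), page_size):
        # Never request past the last hit or past the result window.
        size = min(page_size, total - start, MAX_RESULT_WINDOW - start)
        yield {"query": copy.deepcopy(query), "from": start, "size": size}
```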
Is there a way to disable the progress bar when writing a DataFrame to ES?
Thanks
Hi there,
I am running a very simple SQL query like this:
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk
from elasticsearch.helpers.errors import BulkIndexError
import time
import pandas as pd
from es_pandas import es_pandas
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
# create es_pandas instance
es = es_pandas(es_url,verify_certs=False,ssl_show_warn=False)
es.to_pandas(index='priam_unified_host-2021-05-03', query_sql='select top 10 * from day-2021-05-03 WHERE EventID=4688')
I get this error:
TypeError: search() got an unexpected keyword argument 'query_sql'
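The traceback indicates that, in the installed version, to_pandas forwards extra keyword arguments to the client's search() call, which has no query_sql parameter. One workaround is to send the statement to the SQL API directly and build the DataFrame from its JSON response, which holds a columns list (each entry with a name and a type) and a rows list. `sql_response_to_frame` is a hypothetical helper that performs only that final conversion.

```python
from typing import Any, Dict

import pandas as pd


def sql_response_to_frame(response: Dict[str, Any]) -> pd.DataFrame:
    """Build a DataFrame from an Elasticsearch SQL API response (format=json)."""
    columns = [column["name"] for column in response["columns"]]
    return pd.DataFrame(response.get("rows", []), columns=columns)
```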
Current code:
df = pd.DataFrame(self.get_source(anl, show_progress=show_progress, count=count)).set_index('_id')
if infer_dtype:
    dtype = self.infer_dtype(index, df.columns.values)
    if len(dtype):
        df = df.astype(dtype)
return df
Proposed change, calling set_index just before returning:
df = pd.DataFrame(self.get_source(anl, show_progress=show_progress, count=count))
if infer_dtype:
    dtype = self.infer_dtype(index, df.columns.values)
    if len(dtype):
        df = df.astype(dtype)
df = df.set_index('_id')  # <<< set_index before returning
return df