Comments (6)
Can you please share a minimal example that reproduces this issue?
from datacleaner.
Create a data.csv like this, for example:
a,b,c,d
b,b,b,b
c,c,c,c
a,a,a,a
and a test.py like this:
from datacleaner import autoclean
import pandas as pd
raw_data = pd.read_csv("data.csv")
clean_data = autoclean(raw_data)
clean_data.to_csv("new_data.csv", sep=',', index=False)
Executing it produces an error like this:
ly@ly-VirtualBox:/tmp$ python test.py
Traceback (most recent call last):
  File "/home/ly/anaconda3/lib/python3.5/site-packages/pandas/core/nanops.py", line 100, in f
    result = alt(values, axis=axis, skipna=skipna, **kwds)
  File "/home/ly/anaconda3/lib/python3.5/site-packages/pandas/core/nanops.py", line 319, in nanmedian
    values = values.astype('f8')
ValueError: could not convert string to float: 'a'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ly/anaconda3/lib/python3.5/site-packages/pandas/core/nanops.py", line 103, in f
    result = alt(values, axis=axis, skipna=skipna, **kwds)
  File "/home/ly/anaconda3/lib/python3.5/site-packages/pandas/core/nanops.py", line 319, in nanmedian
    values = values.astype('f8')
ValueError: could not convert string to float: 'a'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ly/anaconda3/lib/python3.5/site-packages/datacleaner/datacleaner.py", line 77, in autoclean
    input_dataframe[column].fillna(input_dataframe[column].median(), inplace=True)
  File "/home/ly/anaconda3/lib/python3.5/site-packages/pandas/core/generic.py", line 5310, in stat_func
    numeric_only=numeric_only)
  File "/home/ly/anaconda3/lib/python3.5/site-packages/pandas/core/series.py", line 2245, in _reduce
    return op(delegate, skipna=skipna, **kwds)
  File "/home/ly/anaconda3/lib/python3.5/site-packages/pandas/core/nanops.py", line 44, in _f
    return f(*args, **kwargs)
  File "/home/ly/anaconda3/lib/python3.5/site-packages/pandas/core/nanops.py", line 111, in f
    raise TypeError(e)
TypeError: could not convert string to float: 'a'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ly/anaconda3/lib/python3.5/site-packages/pandas/indexes/base.py", line 1980, in get_value
    tz=getattr(series.dtype, 'tz', None))
  File "pandas/index.pyx", line 103, in pandas.index.IndexEngine.get_value (pandas/index.c:3332)
  File "pandas/index.pyx", line 111, in pandas.index.IndexEngine.get_value (pandas/index.c:3035)
  File "pandas/index.pyx", line 159, in pandas.index.IndexEngine.get_loc (pandas/index.c:4018)
  File "pandas/hashtable.pyx", line 303, in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:6610)
  File "pandas/hashtable.pyx", line 309, in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:6554)
KeyError: 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test.py", line 6, in <module>
    clean_data = autoclean(raw_data)
  File "/home/ly/anaconda3/lib/python3.5/site-packages/datacleaner/datacleaner.py", line 85, in autoclean
    input_dataframe[column].fillna(input_dataframe[column].mode()[0], inplace=True)
  File "/home/ly/anaconda3/lib/python3.5/site-packages/pandas/core/series.py", line 583, in __getitem__
    result = self.index.get_value(self, key)
  File "/home/ly/anaconda3/lib/python3.5/site-packages/pandas/indexes/base.py", line 1986, in get_value
    return tslib.get_value_box(s, key)
  File "pandas/tslib.pyx", line 777, in pandas.tslib.get_value_box (pandas/tslib.c:17017)
  File "pandas/tslib.pyx", line 793, in pandas.tslib.get_value_box (pandas/tslib.c:16774)
IndexError: index out of bounds
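
Reading the traceback, autoclean first calls median() on a string column (the TypeError) and then falls back to mode()[0], which raises the IndexError. That is probably because every value in each column of data.csv is unique, and older pandas versions returned an empty result from mode() when no value occurred at least twice. Below is a minimal sketch of a guarded fill that avoids both failures. It is not the patch that was merged into datacleaner; the helper name fill_missing is hypothetical, and it assumes a pandas version that provides pandas.api.types.is_numeric_dtype.

```python
import io

import pandas as pd
from pandas.api.types import is_numeric_dtype


def fill_missing(input_dataframe):
    """Hypothetical helper (not datacleaner's actual code).

    Fills NaNs with the median for numeric columns and with the most
    frequent value for all other columns, skipping the fill when
    mode() returns nothing to index into.
    """
    output = input_dataframe.copy()
    for column in output.columns:
        if is_numeric_dtype(output[column]):
            # median() is only attempted on numeric data, so no TypeError
            output[column] = output[column].fillna(output[column].median())
        else:
            modes = output[column].mode()
            # Guard against an empty result instead of indexing blindly
            if len(modes) > 0:
                output[column] = output[column].fillna(modes.iloc[0])
    return output


# Same content as data.csv from the report, read from memory
raw_data = pd.read_csv(io.StringIO("a,b,c,d\nb,b,b,b\nc,c,c,c\na,a,a,a\n"))
clean_data = fill_missing(raw_data)

# A second frame that actually contains missing values
with_gaps = pd.DataFrame({"x": [1.0, None, 3.0], "y": ["u", None, "u"]})
filled = fill_missing(with_gaps)
```

Checking the dtype up front, rather than catching the exception from median(), keeps the traceback short if something else goes wrong later, and the length check makes the empty-mode case an explicit no-op.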
That does indeed seem like a bug, albeit a strange one! Can you send a PR with a patch to fix it?
Merged the PR - thanks for your help!
datacleaner v0.1.5 has your changes.
Hi, just checking in: has this issue been solved? I am hitting exactly the same bug. How did you address it in the end? Many thanks! @fndjjx