Comments (7)
Deps where this issue occurs: numpy-1.12.1, pandas-0.20.1 (latest).
Frozen versions with no issue: numpy-1.11.0, pandas-0.18.0.
from reportgen.
Pandas 0.19.0 breaks this code. Pandas 0.18.1 works. (Both numpy 1.11.0 and 1.12.1 work, leading me to believe this is an issue with Pandas usage?)
from reportgen.
Commenting out line 257 in report.py:
# Apply fixes to the data and diff the PIR movement
dfs = dh.clean_data(dfs)
... yields the expected packet numbers. Checking the dh.clean_data function...
from reportgen.
Packets are discarded at line 192 of diff_pir in datahandling.py.
dfs = {i: dfs[i].drop(dfs[i][dfs[i].PIRDiff > pir_threshold].index) for i in dfs}
A mitigation strategy would be to set the values to 0 instead of dropping them.
However, what is different about the DataFrame in 0.19.0 which triggers this?
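The mitigation mentioned above (zeroing values instead of dropping rows) could look roughly like the sketch below. The sample data, the key sensor_a, and the threshold value are hypothetical; only the dict-of-DataFrames shape, the PIRDiff column, and the pir_threshold name come from the line quoted above.

```python
import pandas as pd

# Hypothetical threshold and sample data, for illustration only.
pir_threshold = 10.0
dfs = {
    "sensor_a": pd.DataFrame(
        {"PIRDiff": [float("nan"), -0.12, 13.88, -14.0, 9.65]},
        index=pd.to_datetime(
            [
                "2016-02-02 17:58:31",
                "2016-02-08 13:55:25",
                "2016-02-08 14:00:16",
                "2016-02-09 07:33:33",
                "2016-02-09 08:31:46",
            ]
        ).rename("DateTime"),
    )
}

# Current behaviour: rows above the threshold are removed entirely.
dropped = {
    i: dfs[i].drop(dfs[i][dfs[i].PIRDiff > pir_threshold].index) for i in dfs
}

# Mitigation: keep every row (packet) and zero only the offending values.
zeroed = {}
for key, df in dfs.items():
    df = df.copy()  # leave the input frames untouched
    df.loc[df["PIRDiff"] > pir_threshold, "PIRDiff"] = 0.0
    zeroed[key] = df
```

With the zeroing variant the row count is preserved, so packet totals computed downstream no longer depend on how many PIRDiff values exceed the threshold. NaN values compare as False and are left as they are.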
from reportgen.
PIR Diff result on Pandas 0.19.0 (with error)
DateTime
2016-02-02 17:58:31 NaN
2016-02-04 07:35:28 NaN
2016-02-08 13:55:25 -1.225261e+08
2016-02-09 07:33:33 -1.399888e+10
2016-02-09 13:22:58 -1.177617e+10
2016-02-10 15:28:20 -1.275398e+09
2016-03-23 17:52:49 -5.419100e+08
2016-03-23 19:04:56 -1.390800e+09
2016-03-23 19:35:09 -8.563218e+08
2016-03-23 19:36:07 -3.448276e+08
Name: PIRDiff, dtype: float64
Expected result (0.18.1):
DateTime
2016-02-02 17:58:31 NaN
2016-02-04 07:35:28 NaN
2016-02-08 13:55:25 -0.122526
2016-02-08 14:00:16 13.877688
2016-02-09 07:33:33 -13.998882
2016-02-09 08:31:46 9.651322
2016-02-09 09:15:27 3.928234
2016-02-09 13:22:58 -11.776167
2016-02-10 15:28:20 -1.275398
2016-03-23 17:52:49 -0.541910
Name: PIRDiff, dtype: float64
Note that the incorrect result seems to be off by the scale factor which is applied on line 177 (1e9), and the error propagates to subsequent values.
from reportgen.
Suspiciously, commenting out the scale factor on line 177 generates the expected result. What exactly changed about the diff() function in 0.19.0 to cause this?
http://pandas.pydata.org/pandas-docs/version/0.19.0/whatsnew.html
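If the 1e9 factor was there to convert nanosecond time deltas into seconds (an assumption; nothing in this thread confirms what line 177 actually does), one way to stop depending on how a given Pandas version represents those deltas is to ask for seconds explicitly. A minimal sketch, with a hypothetical PIR column and made-up values:

```python
import pandas as pd

# Hypothetical readings indexed by timestamp; not taken from the real data.
index = pd.to_datetime(
    ["2016-02-08 13:55:25", "2016-02-08 14:00:16", "2016-02-09 07:33:33"]
).rename("DateTime")
df = pd.DataFrame({"PIR": [100.0, 682.0, 500.0]}, index=index)

# Elapsed time between consecutive rows, explicitly in seconds.
# total_seconds() avoids any manual nanosecond-to-second scale factor.
elapsed_s = df.index.to_series().diff().dt.total_seconds()

# Change in PIR per second; the first row has no predecessor and stays NaN.
df["PIRDiff"] = df["PIR"].diff() / elapsed_s
```

Because the unit conversion is stated in the code, the result should not shift by a factor of 1e9 if the implicit integer representation of a timedelta changes between versions.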
from reportgen.
Next push will fix this issue: just changed the line discarding packets to zero those values instead. The mystery remains as to why the scaling factor is no longer needed after Pandas 0.19.0 (and still a mystery as to why I needed it before: I must have thought it was weird, otherwise I wouldn't have used ಠ_ಠ as a variable name...).
from reportgen.