
ugentbiomath / wwdata

Python package to analyse, validate, fill and visualise data acquired in the context of (waste) water treatment

License: GNU General Public License v3.0

Jupyter Notebook 12.10% Python 86.88% Makefile 1.02%
data-analysis jupyter-notebook

wwdata's People

Contributors: cdemulde, pyup-bot

wwdata's Issues

[tag_period] Add a period-based tagging function

Write a function tag_period that allows the user to tag values in a given period. Possibly add the option to only tag values below, above, or between given thresholds within this period.
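A minimal sketch of what such a function could look like, assuming the data lives in a pandas DataFrame with a DatetimeIndex and tags live in a parallel meta_valid frame; the signature and the 'filtered' tag value are assumptions, not the package's API:

```python
import pandas as pd

def tag_period(meta_valid, data, column, start, end,
               below=None, above=None, tag='filtered'):
    """Tag values of `column` within [start, end]; optionally restrict the
    tagging to values below and/or above given thresholds (hypothetical)."""
    mask = (data.index >= start) & (data.index <= end)
    if below is not None:
        mask = mask & (data[column] < below).to_numpy()
    if above is not None:
        mask = mask & (data[column] > above).to_numpy()
    meta_valid.loc[mask, column] = tag
    return meta_valid
```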

Different tags for different filter functions

Create a different tag for each of the filter functions, so the user can always find out based on which filter algorithm a data point was dropped. Plotting could possibly be adjusted based on this as well.
This will require many small code adjustments across many functions.

[check_filling_error] Visualise the check

It would be nice to show the user visually how a reliability check is done, i.e. have a figure with the original data and the filled data, filling the artificial gaps. Can probably be done by composing such a figure based on the last iteration of the check_filling_error function.

Add fit_function functionality

In order to provide a better way of 'predicting' data, add a fit_function functionality, taking as arguments an independent data series, a dependent data series and a function that relates them to each other. See also the scipy.optimize.curve_fit function. Return values would be the fitted function parameters.

Additionally, a fill_missing_function function should be provided, making use of the determined parameters and the given function to replace data.

The workflow for this is the same as when replacing data by means of correlations.
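A sketch of the two proposed helpers around scipy.optimize.curve_fit; the names fit_function and fill_missing_function follow the issue text, but the exact signatures are assumptions:

```python
import numpy as np
from scipy.optimize import curve_fit

def fit_function(independent, dependent, func):
    """Fit `func` relating the two series; return the fitted parameters."""
    # Ignore points where either series is missing.
    mask = ~(np.isnan(independent) | np.isnan(dependent))
    params, _ = curve_fit(func, independent[mask], dependent[mask])
    return params

def fill_missing_function(independent, dependent, func, params):
    """Replace NaNs in `dependent` with `func(independent, *params)`."""
    filled = dependent.copy()
    gaps = np.isnan(filled)
    filled[gaps] = func(independent[gaps], *params)
    return filled
```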

[calc_daily_average] Add only_checked argument

Add the only_checked argument to the calc_daily_average function, to allow the user to only use validated data for the daily average calculation. This will require some checks and error mitigation (whether a self.meta_valid column exists for the data in question, whether NaN values are generated this way, ...).

[Showcase_OnlineSensorBased] Errors and warnings

When running dataset.calc_daily_average an error message is shown:
TypeError: float() argument must be a string or a number, not 'Timestamp'.

There are some minor issues in the same file:

  • when running dataset.fill_missing_model, a warning is shown: FutureWarning: 'argmin' is deprecated, use 'idxmin' instead. The behavior of 'argmin'.
  • when running dataset.get_correlation, another warning is shown: ImportWarning: can't resolve package from __spec__ or __package__, falling back on __name__ and __path__.
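For the FutureWarning, the fix is typically a mechanical rename: pandas deprecated Series.argmin() returning the index label in favour of idxmin(). A minimal illustration:

```python
import pandas as pd

# Deprecated: Series.argmin() (older pandas returned the index label).
# Current:    Series.idxmin() returns the index label of the minimum;
#             Series.values.argmin() returns the integer position.
s = pd.Series([3.0, 1.0, 2.0], index=['a', 'b', 'c'])
label = s.idxmin()            # index label of the minimum
position = s.values.argmin()  # integer position of the minimum
```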

[calc_ratio] improve error message

When setting the only_checked argument to True while one (or both) of the data series haven't been checked yet, this currently gives a KeyError. That is confusing to the user, who doesn't know it concerns the self.meta_valid DataFrame rather than the original DataFrame. Improve this to warn the user that only_checked cannot be fulfilled.
To be decided: proceed with the calculation for all data points and just give a warning, or throw an error and stop?
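A possible shape for the friendlier behaviour (warn and fall back rather than raise), assuming tags are stored as strings in a meta_valid DataFrame; checked_mask is a hypothetical helper, not existing package code:

```python
import warnings

def checked_mask(meta_valid, column):
    """Return a boolean mask of validated points for `column`, or None
    (with a warning) when the column was never checked."""
    if column not in meta_valid.columns:
        warnings.warn(
            "only_checked=True but '{}' has no entry in meta_valid; "
            "using all data points instead.".format(column))
        return None
    return meta_valid[column] == 'original'
```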

Initial Update

The initial setup worked, but all your packages are up to date. You can safely close this issue.

Data range for tagging of nan and double values

For the filling functions, one can currently choose a range in which to fill; this is not yet in place for the filtering functions, but would be very helpful, among other things to save time.

[fill_missing_daybefore] doesn't seem to work in for-loop

The fill_missing_daybefore function works in a for-loop of its own, but not in a for-loop where another filling function is applied as well. Example:
This works:

for data in Q_data:
    dataset.fill_missing_interpolation(data,50,[dt.datetime(2016,5,23),dt.datetime(2016,5,24)],clear=True)
for data in Q_data:
    dataset.fill_missing_daybefore(data,[dt.datetime(2016,5,1),dt.datetime(2016,6,1)],range_to_replace=[0,5],filtered_only=True,clear=False)

This doesn't:

for data in Q_data:
    dataset.fill_missing_interpolation(data,50,[dt.datetime(2016,5,23),dt.datetime(2016,5,24)],clear=True)
    dataset.fill_missing_daybefore(data,[dt.datetime(2016,5,1),dt.datetime(2016,6,1)],range_to_replace=[0,5],filtered_only=True,clear=False)

In the last example, no filling with the day before is done (fill_missing_interpolation does work properly).

[check_filling_error] show more than mean filling error

Instead of only showing the mean filling error, also provide information on the max and/or min error. In the case of e.g. a gap during wet weather (for WWTP data), some algorithms might not fill the gap appropriately at all, yet this will have only a small impact on the mean; adding the max to the output gives a more realistic image.
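A sketch of the extended output, assuming the check compares filled values against the original values over the artificial gaps (the function name and the relative-error measure are assumptions):

```python
import numpy as np

def filling_error_stats(original, filled, gap_mask):
    """Summarise the filling error over artificial gaps: mean, max and
    min absolute relative error (hypothetical extension of the check)."""
    err = np.abs((filled[gap_mask] - original[gap_mask]) / original[gap_mask])
    return {'mean': err.mean(), 'max': err.max(), 'min': err.min()}
```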

self.columns doesn't always work

When a column is added to the self.data dataframe, but not through a wwdata class function, the self.columns attribute is not updated. Example:

dataset.columns

Gives ['airflow1','airflow2']
Executing:

dataset.data['airflow_total'] = dataset.data['airflow1']+dataset_control.data['airflow2']
dataset.columns

Gives the same result, while

dataset.data.columns

Does give
['airflow1','airflow2','airflow_total']

[moving_average_filter] first x data points are automatically tagged

Due to the application of a window for the moving average, the first x data points in a dataset are automatically tagged, where x is the size of the window. This needs to be solved, maybe by just copying the original values without filtering them (this then also needs to be done for the meta_valid and meta_filled datasets!).
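One way to avoid the leading NaNs (and hence the automatic tags) is to let the window shrink at the edges; pandas' rolling supports this via min_periods. A sketch, not the package's actual implementation:

```python
import pandas as pd

def simple_moving_average(series, window):
    """Moving average that still yields values at the edges by shrinking
    the window there (min_periods=1), so the first points are not NaN
    and therefore not tagged automatically."""
    return series.rolling(window=window, min_periods=1, center=True).mean()
```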

Definition of a wet weather event

This is currently done in a very rudimentary way; options include:

  • Defining a dry weather period and marking everything that exceeds its 95th percentile as wet weather (currently the approach of Waterboard De Dommel)
  • Hurst exponent?
  • Calculating a moving average with a very large window; anything too different from that is wet weather.
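The first option can be sketched in a few lines, assuming the user supplies a boolean mask marking the dry weather period (the function name is hypothetical):

```python
import numpy as np

def tag_wet_weather(flow, dry_period_mask, percentile=95):
    """Mark points exceeding the given percentile of a user-defined dry
    weather period as wet weather (sketch of the first option above)."""
    threshold = np.percentile(flow[dry_period_mask], percentile)
    return flow > threshold
```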

Tag/detect drift

Write a function detecting drift in the data. The idea would be that the user gives:

  • a range in which to apply the function (similar to other tagging functions)
  • the maximum slope a signal is expected to have over a certain period (see below)
  • the period over which a certain slope is allowed

This function could calculate the slope of the data in a given period (for example by fitting a line through it) and compare it with the maximum expected slope. In the first instance, it would be interesting for the user to know whether drift is present; secondly, it would be good to be able to correct for it.
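A minimal sketch of the slope check, fitting a line per period with numpy.polyfit (the function name and return value are assumptions):

```python
import numpy as np

def detect_drift(values, period, max_slope):
    """Fit a line through each consecutive `period`-sized chunk of the
    signal and flag chunks whose slope exceeds `max_slope`."""
    flags = []
    for start in range(0, len(values) - period + 1, period):
        chunk = values[start:start + period]
        slope = np.polyfit(np.arange(period), chunk, 1)[0]
        flags.append(abs(slope) > max_slope)
    return flags
```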

Some additional sources of information/inspiration:

@jorasinghr could you have a look at this and let me know if things are unclear?

update tags when calculating proportional concentrations

When using calc_tot_prop, new columns are created, but these do not get a tag in the self.meta_valid and/or self.meta_filled dataframes. This is important, especially when the filled data is used further and it matters which values are real data and which are filled.

Include the addition of a new column to the self.meta_valid/meta_filled dataframes in the calc_tot_prop function. This should be a combination of the columns that the proportional concentration is calculated from, so that wherever any source tag is not 'original', something else is used.
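The proposed tag combination could look like this, assuming tags are plain strings and any non-'original' source tag should propagate to the derived column (the helper name and the 'filled' tag are hypothetical):

```python
import pandas as pd

def combine_tags(meta, source_columns, new_column):
    """Tag a derived column: 'original' only where every source column is
    'original', otherwise 'filled'."""
    all_original = (meta[source_columns] == 'original').all(axis=1)
    meta[new_column] = all_original.map({True: 'original', False: 'filled'})
    return meta
```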

Initial Update

Hi 👊

This is my first visit to this fine repo, but it seems you have been working hard to keep all dependencies updated so far.

Once you have closed this issue, I'll create separate pull requests for every update as soon as I find one.

That's it for now!

Happy merging! 🤖

[savgol] integrate better in package

Integrate the function using the Savitzky-Golay filter better in the package: use the same structure as the simple_moving_average function and include it as an option in moving_average_filter.

[filtering] change 'filtered' tag

The tag name 'filtered' can be a bit confusing: does it mean the data point is filtered out, or is it the data that is left after filtering? Update to a clearer tag!

Documentation 'clear' argument in filter functions

The clear argument present in most of the filter/tagging functions is not explained in the docstrings. This is important, as the argument also makes sure the relevant self.meta_valid dataframe column is added the first time a filtering function is executed on that column.

Testing for reliability of data imputation/gap filling

A function should be available for testing the reliability of gap filling. Possible code flow:

  • Create artificial gaps
  • Fill them
  • Compare with original
  • Come up with a measure to represent the reliability
  • Add function to filling functions
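The steps above can be sketched as a small test harness; the function name and the error measure (mean absolute error over the artificial gaps) are assumptions:

```python
import numpy as np

def check_filling_reliability(series, fill_func, n_gaps=5, gap_size=3, seed=0):
    """Punch random artificial gaps into a complete series, fill them with
    `fill_func`, and return the mean absolute error over the gaps."""
    rng = np.random.default_rng(seed)
    gapped = series.astype(float).copy()
    gap_idx = []
    for _ in range(n_gaps):
        # Keep the first and last point intact so filling stays bounded.
        start = rng.integers(1, len(series) - gap_size - 1)
        gap_idx.extend(range(start, start + gap_size))
    gap_idx = np.unique(gap_idx)
    gapped[gap_idx] = np.nan
    filled = fill_func(gapped)
    return np.mean(np.abs(filled[gap_idx] - series[gap_idx]))
```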
