lucianolorenti / ceruleo

29 stars · 3 watchers · 7 forks · 86.13 MB

CeRULEo: Comprehensive utilitiEs for Remaining Useful Life Estimation methOds

Home Page: https://lucianolorenti.github.io/ceruleo/

License: MIT License

Python 98.41% TeX 1.59%
predictive-maintenance time-series remaining-useful-life remaining-useful-life-prediction

ceruleo's Introduction

Hi there 👋

ceruleo's People

Contributors

frizzodavide, gianantonio, lucianolorenti, matthewfeickert, trellixvulnteam


ceruleo's Issues

Dataset enhancements

Failure information is provided in different formats:

  • A list of failures
  • An indication of the beginning of the measurement
  • Just a Time in production field

It would be handy to have methods for splitting the data into cycles and for computing the RUL column, based on either a timestamp column of the dataset or just a cycle column.

It would also be handy to incorporate into the dataset classes ways of automatically handling different devices based on some identifier column.
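
A minimal sketch of what such a helper could look like, assuming plain pandas DataFrames, a datetime timestamp column, and hypothetical column names timestamp and device_id (none of these names come from CeRULEo):

import pandas as pd

def add_rul_from_timestamp(df: pd.DataFrame,
                           timestamp_col: str = "timestamp",
                           device_col: str = "device_id") -> pd.DataFrame:
    """Hypothetical helper: per device, RUL is the time remaining until the
    last timestamp of that device's run-to-failure cycle."""
    def per_device(group: pd.DataFrame) -> pd.DataFrame:
        group = group.copy()
        group["RUL"] = (group[timestamp_col].max() - group[timestamp_col]).dt.total_seconds()
        return group
    return df.groupby(device_col, group_keys=False).apply(per_device)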

`RULInverseWeighted` does not work on `TimeSeriesWindowTransformer`

I am trying to use the sample_weight input parameter of a TimeSeriesWindowTransformer inside a CeruleoRegressor in order to weight the training samples with the inverse of the RUL in this way:

regressor = CeruleoRegressor(
    TimeSeriesWindowTransformer(
        transformer,
        window_size=1,
        sample_weight=RULInverseWeighted(),
        padding=True,
        step=1),
    Ridge(alpha=15))

regressor.fit(train_dataset)

Unfortunately, I get a KeyError. The main part of the traceback points to the definition of the RULInverseWeighted class:

class RULInverseWeighted(AbstractSampleWeights):
    """
    Weight each sample by the inverse of the RUL
    """

    def __call__(self, y, i: int, metadata):
        return 1 / (y[i, 0] + 1)

In particular, there is a KeyError: (0, 0), so it seems that the index [i, 0] it tries to access to compute the RUL of a given sample does not exist.

The same error is also raised if I create a WindowedDatasetIterator and try to inspect its elements with next():

iterator = WindowedDatasetIterator(
    transformed_dataset,
    window_size=150,
    step=15,
    horizon=5,
    sample_weight=RULInverseWeighted(),
    iteration_type=IterationType.FORECAST,
)

X, y, sw = next(iterator)
(X.shape, y.shape, sw.shape)
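
A KeyError with the key (0, 0) is what y[i, 0] raises when y is a pandas DataFrame rather than a NumPy array, so a possible workaround (a sketch under that assumption, not a confirmed fix) is a sample-weight callable that falls back to positional indexing:

import pandas as pd

# Illustrative workaround: use .iloc when y is a DataFrame, so y[i, 0] is
# never interpreted as a column lookup with the key (0, 0).
class PositionalRULInverseWeighted:
    """Weight each sample by the inverse of the RUL (sketch only)."""

    def __call__(self, y, i: int, metadata):
        value = y.iloc[i, 0] if isinstance(y, pd.DataFrame) else y[i, 0]
        return 1 / (value + 1)

Whether WindowedDatasetIterator accepts any callable here or requires subclassing AbstractSampleWeights is another assumption worth checking.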

Improve Scikit-learn compatibility

Additional testing is needed on setting the parameters of a CeRULEo pipeline.

Some basic functionality works, but it would be nice to have full integration with the sklearn stack.
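
As a starting point, a minimal sketch of the kind of check that could be added, assuming a regressor built as in the other issues and reusing the nested parameter name ts_window_transformer__window_size that appears in the GridSearchCV report below:

# Sketch of an sklearn-compatibility check: nested parameters should
# round-trip through get_params / set_params the way sklearn expects.
params = regressor.get_params(deep=True)
regressor.set_params(ts_window_transformer__window_size=10)
assert regressor.get_params(deep=True)["ts_window_transformer__window_size"] == 10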

Control Charts (?)

Does it make sense to add some graphics modules regarding control charts?
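
For reference, a minimal sketch of what such a module could offer (a Shewhart-style chart with a center line and ±3σ control limits; purely illustrative, not existing CeRULEo functionality):

import numpy as np
import matplotlib.pyplot as plt

def plot_control_chart(values: np.ndarray, title: str = "Control chart"):
    """Plot one signal with its mean and +/- 3 sigma control limits."""
    center, sigma = values.mean(), values.std()
    fig, ax = plt.subplots()
    ax.plot(values, marker=".", linewidth=0.8)
    ax.axhline(center, color="black", label="center line")
    ax.axhline(center + 3 * sigma, color="red", linestyle="--", label="UCL")
    ax.axhline(center - 3 * sigma, color="red", linestyle="--", label="LCL")
    ax.set_title(title)
    ax.legend()
    return fig, ax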

Installation failed and dependencies have no versions

Dependencies without version constraints

  • pyts

Current installation error

Command

pip install ceruleo   

Error

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow 2.12.0 requires numpy<1.24,>=1.22, but you have numpy 1.21.6 which is incompatible.
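
A possible local workaround (an assumption based on the conflict reported above, not a verified fix) is to upgrade numpy into the range tensorflow 2.12 accepts after installing ceruleo:

pip install "numpy>=1.22,<1.24"

The longer-term fix would be to declare version constraints for dependencies such as pyts and numpy in ceruleo's packaging metadata.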

Bug Report: `GridSearchCV` does not work with Scalers

The GridSearchCV class, which works together with CeruleoMetricWrapper, gives an error when the dataset's Transformer includes a Scaler.

Let's consider the following example:

  1. Load the CMAPSSDataset and define the FEATURES as the most relevant sensor measurement features:
# Load the dataset 
train_dataset = CMAPSSDataset(train=True, models='FD001')
test_dataset = CMAPSSDataset(train=False, models='FD001')[15:30]

# Define list of sensor measurement features
FEATURES = [train_dataset[0].columns[i] for i in sensor_indices]
  2. Define a simple Transformer
transformer = Transformer(
    pipelineX=make_pipeline(
        ByNameFeatureSelector(features=FEATURES),
        MinMaxScaler(range=(-1, 1))
    ),
    pipelineY=make_pipeline(
        ByNameFeatureSelector(features=['RUL']),
    )
)

Note that here I am using the MinMaxScaler to scale the data into the range (-1, 1).

  3. Define an instance of GridSearchCV to compare different regression models
regressor_gs = CeruleoRegressor(
    TimeSeriesWindowTransformer(
        transformer,
        window_size=32,
        padding=True,
        step=1),
    Ridge(alpha=15))

grid_search = GridSearchCV(
    estimator=regressor_gs,
    param_grid={
        'ts_window_transformer__window_size': [5, 10],
        'regressor': [Ridge(alpha=15), RandomForestRegressor(max_depth=5)]
    },
    scoring=CeruleoMetricWrapper('neg_mean_absolute_error')
)

grid_search.fit(train_dataset)

After running grid_search.fit(train_dataset), the output is:

There was an error when transforming with MinMaxScaler
There was an error when transforming with MinMaxScaler
There was an error when transforming with MinMaxScaler
... 

And then the final error message is:

TypeError: unsupported operand type(s) for -: 'str' and 'str'

So there are probably two operands that should be subtracted from one another, but since both are strings this results in an error.

Looking at additional details in the long error message, we find:

ValueError: 
All the 20 fits failed.
It is very likely that your model is misconfigured.
You can try to debug the error by setting error_score='raise'.

As suggested in the error message I added the error_score='raise' input argument to GridSearchCV to get a more detailed error explanation.

Looking at the new error message I think that the source of the error is in the transform method of the MinMaxScaler class contained in ceruleo.transformation.features.scalers:

def transform(self, X: pd.DataFrame) -> pd.DataFrame:
    try:
        divisor = self.data_max - self.data_min

The subtraction above is where the two string operands, self.data_max and self.data_min, are found, producing the TypeError: unsupported operand type(s) for -: 'str' and 'str' shown earlier.

I tried to run the code again after placing import ipdb; ipdb.set_trace() inside the transform function, but for some reason the code did not stop as it is supposed to when using ipdb.

I was also able to access the data_max and data_min attributes via transformer.pipelineX.final_step.data_max and transformer.pipelineX.final_step.data_min, and to compute:

transformer.pipelineX.final_step.data_max-transformer.pipelineX.final_step.data_min

without any error.

So I do not really have a clue why this bug appears.

Obviously, running the code without the MinMaxScaler, i.e. with:

transformer = Transformer(
    pipelineX=make_pipeline(
        ByNameFeatureSelector(features=FEATURES), 
    ), 
    pipelineY=make_pipeline(
        ByNameFeatureSelector(features=['RUL']),  
    )
)

it works without any errors.
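
One way to narrow this down: GridSearchCV refits a clone of the estimator for every parameter combination and fold, so the question is whether the MinMaxScaler's fitted statistics survive cloning and refitting. A sketch of that check follows; the attribute path is an assumption based on the ts_window_transformer__window_size parameter name, not a confirmed API.

from sklearn.base import clone

# Manually clone and refit, then inspect the types of the scaler's fitted
# statistics. If they come back as strings, the problem lies in clone/refit.
cloned = clone(regressor_gs)
cloned.fit(train_dataset)
scaler = cloned.ts_window_transformer.transformer.pipelineX.final_step
print(type(scaler.data_max), type(scaler.data_min))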

Take the number of samples into consideration when the train-test split is made

Right now we are relying on sklearn.model_selection.train_test_split for splitting the run-to-failure cycles into train and test sets.

If the variance in the duration of the cycles is high, this may cause an imbalance between the sets. It would be handy to split the cycles while taking each cycle's length into account.

For example, if we have the following cycles:

>>> lengths = [5, 10, 15, 25, 50, 60, 40, 30, 9, 8]

We can have the following situation:

>>> train_set, test_set = train_test_split(lengths, train_size=0.8);

[[15, 10, 8, 5, 9, 40, 25], [60, 50, 30]]

But in terms of total samples, the two sets end up comparable, and the test set is actually larger:

>>> sum(train_set), sum(test_set)

(112, 140)

It would be nice to handle this situation by making the split length-aware.
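
A minimal sketch of what a length-aware split could look like (not CeRULEo code; a greedy heuristic that assigns each cycle, longest first, to the set furthest below its target share of samples):

def split_cycles_by_length(lengths, train_size=0.8):
    """Greedy split: each cycle (longest first) goes to the set that is
    furthest below its target fraction of the total number of samples."""
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    total = sum(lengths)
    targets = {"train": train_size * total, "test": (1 - train_size) * total}
    sums = {"train": 0, "test": 0}
    sets = {"train": [], "test": []}
    for i in order:
        # Send the cycle to whichever set has the larger remaining deficit
        dest = max(("train", "test"), key=lambda s: targets[s] - sums[s])
        sets[dest].append(i)
        sums[dest] += lengths[i]
    return sets["train"], sets["test"]

lengths = [5, 10, 15, 25, 50, 60, 40, 30, 9, 8]
train_idx, test_idx = split_cycles_by_length(lengths)
print(sum(lengths[i] for i in train_idx), sum(lengths[i] for i in test_idx))  # 204 48

On these lengths, the greedy assignment yields a 204/48 sample split, close to the requested 80/20.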

Is this still active?

I just found this repo and wanted to know if it is still actively developed. I just published a similar package for RUL datasets. Do you want to have a talk about collaborating? I couldn't find your contact info anywhere else.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.