angus924 / minirocket

MINIROCKET: A Very Fast (Almost) Deterministic Transform for Time Series Classification

License: GNU General Public License v3.0

Python 100.00%
scalable time-series-classification convolution convolutional-kernel convolutional-neural-network

minirocket's People

Contributors

angus924, murtazajafferji

minirocket's Issues

how to use minirocket in production

Hi,
I would like to ask how to use minirocket in production (the implementation phase). Is there any way to save a minirocket transform that was fitted on the training data and use it on a new dataset?

thank you
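
A minimal sketch of one possible approach, under assumptions: a code comment quoted in another issue on this page indicates that the univariate `fit` returns a tuple of three arrays (dilations, num_features_per_dilation, biases). If that holds, the tuple could be written to disk with NumPy and reloaded before calling `transform` on new data. The helper names `save_parameters` and `load_parameters` are hypothetical, and the arrays below are placeholders rather than real fitted values.

```python
import os
import tempfile

import numpy as np


def save_parameters(path, parameters):
    """Write the (dilations, num_features_per_dilation, biases) tuple to a .npz file."""
    dilations, num_features_per_dilation, biases = parameters
    np.savez(
        path,
        dilations=dilations,
        num_features_per_dilation=num_features_per_dilation,
        biases=biases,
    )


def load_parameters(path):
    """Read the tuple back in the order that transform() is assumed to expect."""
    with np.load(path) as data:
        return (
            data["dilations"],
            data["num_features_per_dilation"],
            data["biases"],
        )


# Placeholder values standing in for the output of fit(X_training).
parameters = (
    np.array([1, 2, 4], dtype=np.int32),
    np.array([4, 4, 4], dtype=np.int32),
    np.linspace(-1.0, 1.0, 12).astype(np.float32),
)

path = os.path.join(tempfile.mkdtemp(), "minirocket_parameters.npz")
save_parameters(path, parameters)
restored = load_parameters(path)
```

The restored tuple keeps the original dtypes, which matters because the compiled functions appear to be strict about argument types. The downstream classifier would need to be saved separately.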

Feature Size

Thank you so much for making your work available! I have a quick question about the feature size. It looks like the minimum number of features is 84. Is there any harm in extracting 84 features and using only a subset of them?
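
For illustration only, the mechanics of keeping a subset of the transformed features might look like the sketch below. It does not answer whether a subset hurts accuracy. The feature matrices are random placeholders standing in for `transform` output, and `num_kept` is a hypothetical choice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder feature matrices standing in for transform() output (84 features).
X_training_transform = rng.random((20, 84)).astype(np.float32)
X_test_transform = rng.random((5, 84)).astype(np.float32)

# Choose the subset once and reuse the same column indices for every split.
num_kept = 30  # hypothetical subset size
kept = np.sort(rng.choice(84, size=num_kept, replace=False))

X_training_subset = X_training_transform[:, kept]
X_test_subset = X_test_transform[:, kept]
```

The important detail is that the same column indices are applied to the training and test features, so the classifier sees a consistent feature layout.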

X_validation not transformed properly?

Hi, in softmax.py, if the data is split into multiple chunks, X_validation is only transformed with the first chunk's biases. The biases differ between chunks, but the transform is only applied once.

if epoch == 0 and chunk_index == 0: # only run once <---

   parameters = fit(X_training, args["num_features"]) # returns: dilations, num_features_per_dilation, biases

   # transform validation data
   X_validation_transform = transform(X_validation, parameters)

Would transforming X_validation with each chunk's biases improve performance?

EDIT:

similarly for the latter part (where X_validation_transform is only normalised with mean and std values from the first chunk):

if epoch == 0 and chunk_index == 0:

    # per-feature mean and standard deviation
    f_mean = X_training_transform.mean(0)
    f_std = X_training_transform.std(0) + 1e-8

    # normalise validation features
    X_validation_transform = (X_validation_transform - f_mean) / f_std
    X_validation_transform = torch.FloatTensor(X_validation_transform)

Need example to use variable ROCKET

Hi, would you mind providing examples of how to use minirocket_variable and minirocket_multivariate_variable? I am not sure how to configure the required data input.

thank you

Unlabeled data

Hello, thanks for your excellent work.
I have a question. In your response to the issue 'starting with "wide" data', you say that the data can be unlabeled, depending on the task: "(You don't need labels necessarily, depending on your task.)"
When I read your article and the code README, I noticed that you mention the parameters are the same for different data. Is that right? (I am not sure I understood this correctly, and I can't find where that information is.)
So my question is: can I apply your work to my unlabeled data? If so, how should I set "Y_training" in the example code?
Thanks!

datatype

When I use my data with minirocket in PyCharm, I get a problem with the datatype:

Traceback (most recent call last):
  File "E:/PycharmProjects/minirocket-main/code/traintest.py", line 46, in
    parameters = fit(X_training)
  File "E:\PycharmProjects\minirocket-main\code\minirocket.py", line 130, in fit
    biases = _fit_biases(X, dilations, num_features_per_dilation, quantiles)
  File "E:\ProgramData\Anaconda3\envs\deepl\lib\site-packages\numba\core\dispatcher.py", line 703, in _explain_matching_error
    raise TypeError(msg)
TypeError: No matching definition for argument type(s) pyobject, array(int32, 1d, C), array(int32, 1d, C), array(float32, 1d, C)

How can I make it work?

Can't reproduce the accuracy on the 'EOGHorizontalSignal' dataset in UCR109 based on the original 'MINIROCKET' examples

Hi, as the title says. The code is as follows:

from minirocket import fit, transform
import pandas as pd
import numpy as np
from sklearn.linear_model import RidgeClassifierCV

csv_root = "/media/data1/ubuntu_env/data/TSC_datasets/Univariate_ts2csv/"
subdataset_name = "EOGHorizontalSignal"
train_csv_file_path = csv_root + subdataset_name + '/{}_TRAIN.csv'.format(subdataset_name)
test_csv_file_path = csv_root + subdataset_name + '/{}_TEST.csv'.format(subdataset_name)

train_csv_file = pd.read_csv(train_csv_file_path,
                             header = None,
                             sep = ",",
                             skiprows = 0,
                             engine = "c")
test_csv_file = pd.read_csv(test_csv_file_path,
                            header = None,
                            sep = ",",
                            skiprows = 0,
                            engine = "c")

# first column holds the label, the remaining columns hold the series
total_train_data = train_csv_file.values[:].copy()
X_training, Y_training = total_train_data[:, 1:].astype(np.float32), total_train_data[:, 0].astype(np.int32)
total_test_data = test_csv_file.values[:].copy()
X_test, Y_test = total_test_data[:, 1:].astype(np.float32), total_test_data[:, 0].astype(np.int32)

# note:
# * input time series do not need to be normalised
# * input data should be np.float32

parameters = fit(X_training)
X_training_transform = transform(X_training, parameters)
X_test_transform = transform(X_test, parameters)

classifier = RidgeClassifierCV(alphas = np.logspace(-3, 3, 10))
classifier.fit(X_training_transform, Y_training)
predictions = classifier.predict(X_test_transform)
total = len(Y_test)
correct = (predictions == Y_test).sum()
accuracy = correct / total
print("accuracy: {}".format(accuracy))

With the code above, the final accuracy is about 0.59~0.60, but the reported result, shown below, is 0.83:
[image]
I have validated the data loading on several other datasets and the accuracy appeared normal, so I think the data loading in my code is fine. There is a big difference between 0.58 and 0.80, and I really wonder what causes it. Thanks!

Extending Documentation of minirocket multivariate

Hello,

The implementations of minirocket multivariate (both here and in sktime) mention that it is a naive extension of the univariate version, but do not give a clearer explanation of what actually happens under the hood. Looking directly at the source code for this version does not help much either, as it is fairly hard to read.

Could you extend the documentation on the repository with a (coarse) description of how the algorithm was extended to handle multivariate data and/or add some comments to the source code in that regard?

Thanks!

Any help for minirocket on UEA multivariate time series classification

Hello, have any results been reported for minirocket on the UEA multivariate time series classification archive? @angus924
I used minirocket_multivariate on the PenDigits dataset from the UEA multivariate archive, but there are NaN values in X_training_transform.
Also, the results on UEA are poor compared to the results of minirocket_dv on UCRArchive_2018. Can you give me some suggestions?
Code:
parameters = fit(X_training,num_features = 10_000)
X_training_transform = transform(X_training, parameters)
print('X_training_transform:',X_training_transform)
print('type(X_training_transform):',type(X_training_transform))
print("X_training_transform.shape:", X_training_transform.shape)
print("np.isnan(X_training_transform).any():", np.isnan(X_training_transform).any())
classifier = RidgeClassifierCV(alphas = np.logspace(-3, 3, 10), normalize = True)
classifier.fit(X_training_transform, Y_training)
X_test_transform = transform(X_test, parameters)
predictions = classifier.predict(X_test_transform)
Report:
last_X_training.shape: (7494, 2, 8)
last_X_test.shape: (3498, 2, 8)
last_Y_training.shape: (7494,)
last_Y_test.shape: (3498,)
X_training_transform: [[0. 0. 0. ... 0.625 0.875 0.375]
[0. 0. 0. ... 0.625 1. 0.125]
[0. 0. 0. ... 0.375 0.625 0.25 ]
...
[0. 0. 0. ... 0.375 0.875 0.125]
[0. 0. 0. ... 0.25 1. 0.125]
[0. 0. 0. ... 0.5 0.875 0.125]]
type(X_training_transform): <class 'numpy.ndarray'>
X_training_transform.shape: (7494, 9996)
np.isnan(X_training_transform).any(): True
Traceback (most recent call last):
File "cc-test.py", line 68, in
classifier.fit(X_training_transform, Y_training)
File "/newhome/chenc/miniforge3/envs/AIcocahing/lib/python3.6/site-packages/sklearn/linear_model/_ridge.py", line 1943, in fit
multi_output=True, y_numeric=False)
File "/newhome/chenc/miniforge3/envs/AIcocahing/lib/python3.6/site-packages/sklearn/base.py", line 433, in _validate_data
X, y = check_X_y(X, y, **check_params)
File "/newhome/chenc/miniforge3/envs/AIcocahing/lib/python3.6/site-packages/sklearn/utils/validation.py", line 63, in inner_f
return f(*args, **kwargs)
File "/newhome/chenc/miniforge3/envs/AIcocahing/lib/python3.6/site-packages/sklearn/utils/validation.py", line 878, in check_X_y
estimator=estimator)
File "/newhome/chenc/miniforge3/envs/AIcocahing/lib/python3.6/site-packages/sklearn/utils/validation.py", line 63, in inner_f
return f(*args, **kwargs)
File "/newhome/chenc/miniforge3/envs/AIcocahing/lib/python3.6/site-packages/sklearn/utils/validation.py", line 721, in check_array
allow_nan=force_all_finite == 'allow-nan')
File "/newhome/chenc/miniforge3/envs/AIcocahing/lib/python3.6/site-packages/sklearn/utils/validation.py", line 106, in _assert_all_finite
msg_dtype if msg_dtype is not None else X.dtype)
ValueError: Input contains NaN, infinity or a value too large for dtype('float32').

Datatype Error

When I tried to run MiniRocket on my data in Python, I got the following datatype error: "TypeError: X must be in an sktime compatible format, of scitype Series, Panel or Hierarchical, for instance a pandas.DataFrame with sktime compatible time indices, or with MultiIndex and last(-1) level an sktime compatible time index. Allowed compatible mtype format specifications are: ['pd.Series', 'pd.DataFrame', 'np.ndarray', 'nested_univ', 'numpy3D', 'pd-multiindex', 'df-list', 'pd_multiindex_hier'] See the data format tutorial examples/AA_datatypes_and_datasets.ipynb, If you think the data is already in an sktime supported input format, run sktime.datatypes.check_raise(data, mtype) to diagnose the error, where mtype is the string of the type specification you want."

padding problem

For non-stationary sequences, padding solely with zeros might seem a bit odd. Would it be better to pad with the values of the start point and end point instead?

Feature Transformation

Hello,

I am trying to run MiniRocket on my dataset, which is a SCADA dataset containing data from multiple sensors over a period of time. It is a multivariate time series, so I am using the multivariate version of MiniRocket from sktime. However, the features are not being transformed the way they are supposed to be.

Initially, I ran the following chunk of code on my personal SCADA dataset:

minirocket_multi = MiniRocketMultivariate()
X_train_transform = minirocket_multi.fit_transform(X_train)
X_test_transform = minirocket_multi.transform(X_test)

This is the output that I am getting,

----------------------Before Transformation------------------------------
X_train: (34992, 25)
X_test: (17472, 25)
----------------------After Transformation------------------------------
X_train: (1, 9996)
X_test: (1, 9996)

However, I think that after transformation the shapes of X_train and X_test should be (34992, 9996) and (17472, 9996). Could you please help me with this? Why is it transforming just one single sample and not the rest?

I would also like to mention that I loaded the data from a pickle file, which contains the data as a pandas DataFrame.

import pickle

with open(train_file, "rb") as f:
    data_train = pickle.load(f)
X_train_wt = data_train.iloc[:, :-1]  # all columns except the last
y_train_wt = data_train.iloc[:, -1]  # last column
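
The reported output shape of (1, 9996) suggests that the whole 2D table was treated as a single multivariate series. A minimal sketch of one way to turn a long recording into many examples follows, assuming the data should be cut into fixed-length windows and arranged as (n_instances, n_channels, n_timepoints), the layout sktime calls `numpy3D`. The helper `make_windows` and the window length of 144 are hypothetical, and the recording is a placeholder with the same shape as X_train above.

```python
import numpy as np


def make_windows(recording, window_length):
    """Split a (n_timepoints, n_channels) array into non-overlapping windows.

    Returns an array of shape (n_windows, n_channels, window_length).
    Trailing rows that do not fill a complete window are dropped.
    """
    n_timepoints, n_channels = recording.shape
    n_windows = n_timepoints // window_length
    trimmed = recording[: n_windows * window_length]
    windows = trimmed.reshape(n_windows, window_length, n_channels)
    # Move the channel axis in front of the time axis.
    return np.ascontiguousarray(windows.transpose(0, 2, 1), dtype=np.float32)


# Placeholder with the same shape as X_train in the report above.
recording = np.arange(34992 * 25, dtype=np.float64).reshape(34992, 25)
X_train_3d = make_windows(recording, window_length=144)  # 144 is a hypothetical choice
```

Each window then needs a single label, so the label column would have to be reduced to one value per window in whatever way suits the task.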

TypeError: No matching definition for argument type(s) array(float64, 2d, C), array(int32, 1d, C), array(int32, 1d, C), array(float32, 1d, C)

Thank you very much, once again, for this great piece of software. Very much appreciated! I'm trying to use it with my data but unfortunately, I always get the following error if I attempt to fit my input with "parameters = fit(x_trainScaled)":

TypeError: No matching definition for argument type(s) array(float64, 2d, C), array(int32, 1d, C), array(int32, 1d, C), array(float32, 1d, C)

Here are some, probably, relevant characteristics of my input:

    print(x_trainScaled.shape)
    print(x_trainScaled.dtype)

returns:

(3000, 3000)
float64

// edit:

This is the whole traceback:

  File "minirocket\code\minirocket.py", line 130, in fit
    biases = _fit_biases(X, dilations, num_features_per_dilation, quantiles)
  File "\lib\site-packages\numba\dispatcher.py", line 500, in _explain_matching_error
    raise TypeError(msg)
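
The first argument type in the error is a float64 array, while a code comment in another issue on this page notes that input data should be np.float32. A minimal sketch of the cast, using a small placeholder array in place of the real `x_trainScaled`:

```python
import numpy as np

# Small placeholder with the dtype reported above.
x_trainScaled = np.zeros((3, 3000), dtype=np.float64)

# Cast to single precision; np.ascontiguousarray also guarantees C order.
x_train32 = np.ascontiguousarray(x_trainScaled, dtype=np.float32)
```

Passing `x_train32` to `fit` should then match the array(float32, 2d, C) signature that the compiled function appears to expect.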

Can't set random_state when doing a gridsearchCV

Dependencies

import numpy as np
from sklearn.linear_model import RidgeClassifier
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV

from sktime.datasets import load_basic_motions
from sktime.transformations.panel.rocket import MiniRocketMultivariate

Make train/test split and set up pipeline

X_train, y_train = load_basic_motions(split="train", return_X_y=True)

model = Pipeline([
    ('minirocket', MiniRocketMultivariate(random_state=42)), 
    ('ridge_clf', RidgeClassifier(random_state=42)),
])

Fit 1 model

model.fit(X_train, y_train)
Works fine

Now do a gridsearch for alpha value

parameters = {
  'ridge_clf__alpha': [0.1, 1, 10],
}

model_cv = GridSearchCV(model, parameters)

model_cv.fit(X_train, y_train)

"RuntimeError: Cannot clone object MiniRocketMultivariate(random_state=42), as the constructor either does not set or modifies parameter random_state"

starting with "wide" data

If I start with the wide data format, a 2d array of samples (rows) by sensor readings (columns), what is the right way to transform that to fit the requirements of this library?
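
A minimal sketch of the reshaping, assuming each row is one example and each column is one time step. The target shapes are taken from another issue on this page, which passes (32768, 99) to minirocket.py and (32768, 1, 99) to minirocket_multivariate.py. The array below is a placeholder.

```python
import numpy as np

# Placeholder "wide" table: 6 samples (rows) by 99 readings (columns).
X_wide = np.arange(6 * 99, dtype=np.float64).reshape(6, 99)

# Univariate layout: (num_examples, length), single precision.
X_univariate = X_wide.astype(np.float32)

# Multivariate layout with a single channel: (num_examples, 1, length).
X_multivariate = X_univariate[:, np.newaxis, :]
```

If the columns are instead readings from different sensors at one time step, the rows would first have to be grouped into windows before either layout applies.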

Channels get lost

Dear all,

I am applying Minirocket to a set of multivariate series with 7 channels and 8020 data points.

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from minirocket_multivariate import fit, transform
import numpy as np

# Assumes df_final_extendedVelocidadAnglarDerivadaRadio is already prepared with your data
X = df_final_extendedVelocidadAnglarDerivadaRadio.drop(columns=['etiqueta']).values
y = df_final_extendedVelocidadAnglarDerivadaRadio['etiqueta'].values

# Reshape X into the shape expected by MiniRocket
X_reshaped = X.reshape(-1, 7, 8020)
X_reshaped = X_reshaped.astype(np.float32)

# Fit the MiniRocket parameters
params = fit(X_reshaped, num_features=100, max_dilations_per_kernel=84)

# Transform the data using MiniRocket
X_transformed = transform(X_reshaped, params)

When I print X_reshaped.shape I get: (240, 7, 8020)

However, the transformation with minirocket_multivariate returns an X_transformed with dimensions (240, 84). I would have expected (240, 7, 84). Is my assumption correct? If so, am I doing anything wrong? Your help will be highly appreciated.

Best,
Luis
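
The reported shape is consistent with the multivariate transform producing one feature vector per example, with channels combined inside the transform rather than kept as a separate axis. If per-channel features are really wanted, one possible workaround is to apply a univariate transform to each channel separately and stack the results. The sketch below is only that, a sketch: `per_channel_features` is a hypothetical helper, and `dummy_fit` and `dummy_transform` are stand-ins so that it runs without the library.

```python
import numpy as np


def per_channel_features(X, fit, transform):
    """Fit and apply a univariate transform to each channel separately.

    X has shape (num_examples, num_channels, length); the result has shape
    (num_examples, num_channels, num_features).
    """
    per_channel = []
    for channel in range(X.shape[1]):
        X_channel = np.ascontiguousarray(X[:, channel, :])
        parameters = fit(X_channel)
        per_channel.append(transform(X_channel, parameters))
    return np.stack(per_channel, axis=1)


# Stand-ins so the sketch runs without the library; swap in the real
# `fit` and `transform` from minirocket.py.
def dummy_fit(X):
    return X.mean()


def dummy_transform(X, parameters):
    # Two toy features per series: fraction of values above the mean, and the maximum.
    return np.stack([(X > parameters).mean(axis=1), X.max(axis=1)], axis=1)


X = np.random.default_rng(0).random((10, 7, 50)).astype(np.float32)
features = per_channel_features(X, dummy_fit, dummy_transform)
```

Most classifiers expect a 2D feature matrix, so the result would typically be flattened to (num_examples, num_channels * num_features) before training.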

Port to R language

Hi,

I was wondering if you plan to port minirocket to the R language.

Many thanks,

minirocket_multivariate extremely slow

My setup is that I am using a large dataset (10,000+) and I pass the data to the model in batches. I do not cache the data, so I run transform every time I pass data to the model, on every epoch. I run this same setup for both

minirocket.py with input shape (32768,99) and

minirocket_multivariate.py with input shape (32768,1,99), so the number of channels is 1.

I find that the minirocket_multivariate.py version runs significantly more slowly on every transform() call than minirocket.py.

Is there a potential bug in the code?

Minimum length time series

Hi, what is the minimum length of a time series for MiniRocket? I have tried with time series of length 4, but it throws an error.
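
The MiniRocket paper describes kernels of length 9, so series shorter than that are a plausible cause of the error. One possible workaround, sketched below, is to pad short series with zeros up to that length. The minimum of 9 is an assumption based on the kernel length rather than a documented limit of this repository, and `pad_to_min_length` is a hypothetical helper.

```python
import numpy as np

MIN_LENGTH = 9  # assumed minimum, taken from the kernel length of 9


def pad_to_min_length(X, min_length=MIN_LENGTH):
    """Right-pad each row of a (num_examples, length) array with zeros."""
    length = X.shape[1]
    if length >= min_length:
        return X
    return np.pad(X, ((0, 0), (0, min_length - length)), mode="constant")


X_short = np.ones((3, 4), dtype=np.float32)
X_padded = pad_to_min_length(X_short)
```

Whether zero padding is appropriate depends on the data. With only 4 real values per series, most of each padded input would be padding.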

Some questions about the multivariate version.

Hello, I looked at the code for multivariate minirocket. The way the channels are combined does not make sense to me.
Conv(x), where x is channel 0
Conv(y), where y is channel 1
When the channels are combined, this just becomes:
Conv(x+y)
Why not change np.sum to np.prod, giving:
Conv(x*y)
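
The identity referred to above follows from convolution being linear: with a shared kernel, summing the per-channel outputs equals convolving the summed channels, whereas no such identity holds for products. A small numerical check, using `np.convolve` as a stand-in for the library's own convolution and an illustrative length-9 kernel with weights -1 and 2:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.integers(-5, 5, size=32).astype(np.float64)  # channel 0
y = rng.integers(-5, 5, size=32).astype(np.float64)  # channel 1
w = np.array([-1, -1, 2, -1, -1, 2, -1, 2, -1], dtype=np.float64)  # example kernel

# Linear case: the two quantities below agree.
sum_of_convolutions = np.convolve(x, w, mode="valid") + np.convolve(y, w, mode="valid")
convolution_of_sum = np.convolve(x + y, w, mode="valid")

# Product case: in general these two differ, so np.prod would not
# reduce to a single convolution of x * y.
product_of_convolutions = np.convolve(x, w, mode="valid") * np.convolve(y, w, mode="valid")
convolution_of_product = np.convolve(x * y, w, mode="valid")
```

Replacing the sum with a product would therefore change the transform into something that is no longer a convolution over a combined input.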

Example of CSV file reading

Hello, I'm trying to figure out what minirocket expects as data on input. I keep on getting
TypeError: No matching definition for argument type(s) pyobject, array(int32, 1d, C), array(int32, 1d, C), array(float32, 1d, C)

My data has following format:

timestamp,close
1619773130596,54559.47
1619773134938,54563.93
1619773139226,54554.23
1619773143564,54564.34

And I read it like this:

dataset = pd.read_csv(filename, usecols = [0, 1], header=0)
dataset = dataset.dropna()
dataset.columns = dataset.columns.to_series().apply(lambda x: x.strip())
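
The `pyobject` in the error suggests that the argument passed to `fit` was not a NumPy array, for example a pandas DataFrame. A minimal sketch of one way to get from this CSV layout to a 2D float32 array follows, assuming the single `close` series should be cut into fixed-length windows with one window per example. The window length of 2 is hypothetical and only chosen to fit the four sample rows.

```python
import csv
import io

import numpy as np

# The four rows shown above, inlined so the sketch is self-contained.
raw = """timestamp,close
1619773130596,54559.47
1619773134938,54563.93
1619773139226,54554.23
1619773143564,54564.34
"""

reader = csv.DictReader(io.StringIO(raw))
close = np.array([float(row["close"]) for row in reader], dtype=np.float32)

window_length = 2  # hypothetical; real windows would be much longer
num_windows = len(close) // window_length

# One row per example: shape (num_examples, length), dtype float32.
X = close[: num_windows * window_length].reshape(num_windows, window_length)
```

With pandas, `dataset["close"].to_numpy(dtype=np.float32)` would give the same 1D array before windowing.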
