angus924 / minirocket
MINIROCKET: A Very Fast (Almost) Deterministic Transform for Time Series Classification
License: GNU General Public License v3.0
Hi,
I would like to ask how to use MiniRocket in production. Is there a way to save a MiniRocket transform that was fitted on training data and reuse it on a new dataset?
thank you
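A possible approach, sketched under assumptions rather than taken from the repository: the comment in softmax.py quoted in another issue here says fit returns dilations, num_features_per_dilation and biases, so for the univariate version those arrays could be written to disk and passed to transform later. The helper names save_parameters and load_parameters and the stand-in arrays below are hypothetical.

```python
import os
import tempfile

import numpy as np


def save_parameters(path, parameters):
    """Write the (dilations, num_features_per_dilation, biases) tuple to a .npz file."""
    dilations, num_features_per_dilation, biases = parameters
    np.savez(
        path,
        dilations=dilations,
        num_features_per_dilation=num_features_per_dilation,
        biases=biases,
    )


def load_parameters(path):
    """Read the tuple back in the same order, ready to pass to transform()."""
    with np.load(path) as data:
        return (
            data["dilations"],
            data["num_features_per_dilation"],
            data["biases"],
        )


# Stand-in values for illustration only; in practice use parameters = fit(X_training).
parameters = (
    np.array([1, 2, 4], dtype=np.int32),
    np.array([1, 1, 1], dtype=np.int32),
    np.zeros(3 * 84, dtype=np.float32),
)

path = os.path.join(tempfile.mkdtemp(), "minirocket_parameters.npz")
save_parameters(path, parameters)
restored = load_parameters(path)
```

The restored tuple would then be passed as transform(X_new, restored). Other variants (for example the multivariate one) may return a different tuple, in which case the keys would need adjusting.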
Thank you so much for making your work available! I have a quick question about the number of features. It looks like the minimum number of features is 84. Is there any harm in extracting 84 features and using only a subset of them?
Hi, in softmax.py, if the data is split into multiple chunks, X_validation is transformed with only the first chunk's biases: the biases differ between chunks, but the transform is applied only once.
if epoch == 0 and chunk_index == 0: # only run once <---
    parameters = fit(X_training, args["num_features"]) # returns: dilations, num_features_per_dilation, biases
    # transform validation data
    X_validation_transform = transform(X_validation, parameters)
Would transforming X_validation with each chunk's biases improve performance?
EDIT:
Similarly for the later part, where X_validation_transform is normalised only with the mean and std values from the first chunk:
if epoch == 0 and chunk_index == 0:
    # per-feature mean and standard deviation
    f_mean = X_training_transform.mean(0)
    f_std = X_training_transform.std(0) + 1e-8
    # normalise validation features
    X_validation_transform = (X_validation_transform - f_mean) / f_std
    X_validation_transform = torch.FloatTensor(X_validation_transform)
Hi, would you mind providing examples of how to use minirocket_variable and minirocket_multivariate_variable? I am not sure how to format the required input data.
thank you
Hello, thanks for your excellent work.
I have a question. In your reply to the issue "starting with "wide" data", you say that the data can be unlabelled, depending on the task: "(You don't need labels necessarily, depending on your task.)"
When reading your article and the code README, I also noticed that you mention the parameters are the same across different datasets. Is that right? (I am not sure I understood this correctly, and I can't find where it was stated.)
So my question is: can I apply your work to my unlabelled data? If so, how should I set "Y_training" in the example code?
Thanks!
When I use my data with MiniRocket in PyCharm, I get a datatype error:
Traceback (most recent call last):
File "E:/PycharmProjects/minirocket-main/code/traintest.py", line 46, in
parameters = fit(X_training)
File "E:\PycharmProjects\minirocket-main\code\minirocket.py", line 130, in fit
biases = _fit_biases(X, dilations, num_features_per_dilation, quantiles)
File "E:\ProgramData\Anaconda3\envs\deepl\lib\site-packages\numba\core\dispatcher.py", line 703, in _explain_matching_error
raise TypeError(msg)
TypeError: No matching definition for argument type(s) pyobject, array(int32, 1d, C), array(int32, 1d, C), array(float32, 1d, C)
How can I fix this?
Hi, as the title suggests, my code is as follows:
`from minirocket import fit, transform
import pandas as pd
import numpy as np
from sklearn.linear_model import RidgeClassifierCV
csv_root = "/media/data1/ubuntu_env/data/TSC_datasets/Univariate_ts2csv/"
subdataset_name = "EOGHorizontalSignal"
train_csv_file_path = csv_root + subdataset_name + '/{}_TRAIN.csv'.format(subdataset_name)
test_csv_file_path = csv_root + subdataset_name + '/{}_TEST.csv'.format(subdataset_name)
train_csv_file = pd.read_csv(train_csv_file_path,
header = None,
sep = ",",
skiprows = 0,
engine = "c")
test_csv_file = pd.read_csv(test_csv_file_path,
header = None,
sep = ",",
skiprows = 0,
engine = "c")
total_train_data = train_csv_file.values[:].copy()
X_training, Y_training = total_train_data[:, 1:].astype(np.float32), total_train_data[:, 0].astype(np.int32)
total_test_data = test_csv_file.values[:].copy()
X_test, Y_test = total_test_data[:, 1:].astype(np.float32), total_test_data[:, 0].astype(np.int32)
parameters = fit(X_training)
X_training_transform = transform(X_training, parameters)
X_test_transform = transform(X_test, parameters)
classifier = RidgeClassifierCV(alphas = np.logspace(-3, 3, 10))
classifier.fit(X_training_transform, Y_training)
predictions = classifier.predict(X_test_transform)
total = len(Y_test)
correct = (predictions == Y_test).sum()
accuracy = correct / total
print("accuarcy: {}".format(accuracy))`
With the code above, the final accuracy is about 0.59 to 0.60, but the reported result is 0.83.
I have validated the data loading on several other datasets, where the accuracy appeared normal, so I think the data loading in my code is fine. There is a big difference between 0.58 and 0.80, and I really wonder what causes it. Thanks!
If the number of kernels is much smaller than the number of samples in the dataset, will that limit the model's performance?
I am trying to run MiniRocket. The accuracy I get is 85%, which differs from your reported accuracy by 2%. I used the code you provided, restricted to the 109 UCR datasets. You can see the relevant code in the link below. What is the reason for this 2% difference?
https://colab.research.google.com/drive/1YcrWTSF7oNqGeP-C0n-pdi2EzAqYo-_g?usp=sharing
Hello,
The implementations of MiniRocket multivariate (both here and in sktime) mention that it is a naive extension of the univariate version, but do not give any clearer explanation of what is actually happening under the hood. Looking directly at the source code for this version does not help much either, as it is fairly hard to read.
Could you extend the documentation on the repository with a (coarse) description of how the algorithm was extended to handle multivariate data and/or add some comments to the source code in that regard?
Thanks!
Hello, are there any reported results for MiniRocket on the UEA multivariate time series classification archive? @angus924
I used minirocket_multivariate on the PenDigits dataset from the UEA multivariate archive, but X_training_transform contains NaN.
The results on UEA are also poor compared with those of minirocket_dv on UCRArchive_2018. Could you give me some suggestions?
Code:
parameters = fit(X_training,num_features = 10_000)
X_training_transform = transform(X_training, parameters)
print('X_training_transform:',X_training_transform)
print('type(X_training_transform):',type(X_training_transform))
print("X_training_transform.shape:", X_training_transform.shape)
print("np.isnan(X_training_transform).any():", np.isnan(X_training_transform).any())
classifier = RidgeClassifierCV(alphas = np.logspace(-3, 3, 10), normalize = True)
classifier.fit(X_training_transform, Y_training)
X_test_transform = transform(X_test, parameters)
predictions = classifier.predict(X_test_transform)
Report:
last_X_training.shape: (7494, 2, 8)
last_X_test.shape: (3498, 2, 8)
last_Y_training.shape: (7494,)
last_Y_test.shape: (3498,)
X_training_transform: [[0. 0. 0. ... 0.625 0.875 0.375]
[0. 0. 0. ... 0.625 1. 0.125]
[0. 0. 0. ... 0.375 0.625 0.25 ]
...
[0. 0. 0. ... 0.375 0.875 0.125]
[0. 0. 0. ... 0.25 1. 0.125]
[0. 0. 0. ... 0.5 0.875 0.125]]
type(X_training_transform): <class 'numpy.ndarray'>
X_training_transform.shape: (7494, 9996)
np.isnan(X_training_transform).any(): True
Traceback (most recent call last):
File "cc-test.py", line 68, in
classifier.fit(X_training_transform, Y_training)
File "/newhome/chenc/miniforge3/envs/AIcocahing/lib/python3.6/site-packages/sklearn/linear_model/_ridge.py", line 1943, in fit
multi_output=True, y_numeric=False)
File "/newhome/chenc/miniforge3/envs/AIcocahing/lib/python3.6/site-packages/sklearn/base.py", line 433, in _validate_data
X, y = check_X_y(X, y, **check_params)
File "/newhome/chenc/miniforge3/envs/AIcocahing/lib/python3.6/site-packages/sklearn/utils/validation.py", line 63, in inner_f
return f(*args, **kwargs)
File "/newhome/chenc/miniforge3/envs/AIcocahing/lib/python3.6/site-packages/sklearn/utils/validation.py", line 878, in check_X_y
estimator=estimator)
File "/newhome/chenc/miniforge3/envs/AIcocahing/lib/python3.6/site-packages/sklearn/utils/validation.py", line 63, in inner_f
return f(*args, **kwargs)
File "/newhome/chenc/miniforge3/envs/AIcocahing/lib/python3.6/site-packages/sklearn/utils/validation.py", line 721, in check_array
allow_nan=force_all_finite == 'allow-nan')
File "/newhome/chenc/miniforge3/envs/AIcocahing/lib/python3.6/site-packages/sklearn/utils/validation.py", line 106, in _assert_all_finite
msg_dtype if msg_dtype is not None else X.dtype)
ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
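While the cause is being tracked down, a quick way to see which features are affected is to list the columns that contain non-finite values. This is only a diagnostic sketch on a toy matrix standing in for X_training_transform; non_finite_columns is a hypothetical helper, not part of minirocket.

```python
import numpy as np


def non_finite_columns(features):
    """Return indices of feature columns containing NaN or infinity."""
    bad = ~np.isfinite(features)
    return np.flatnonzero(bad.any(axis=0))


# Toy feature matrix standing in for X_training_transform.
features = np.array(
    [[0.0, 0.5, np.nan],
     [0.0, 0.25, 1.0],
     [0.0, np.inf, 0.125]],
    dtype=np.float32,
)
bad_columns = non_finite_columns(features)
print(bad_columns)  # [1 2]
```

Knowing whether the bad columns cluster at particular positions may help narrow down which kernels or dilations produce them.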
hi, where can I download the MosquitoSound, InsectSound, and FruitFlies datasets?
When I tried to run MiniRocket on my data in Python, I got the following datatype error: "TypeError: X must be in an sktime compatible format, of scitype Series, Panel or Hierarchical, for instance a pandas.DataFrame with sktime compatible time indices, or with MultiIndex and last(-1) level an sktime compatible time index. Allowed compatible mtype format specifications are: ['pd.Series', 'pd.DataFrame', 'np.ndarray', 'nested_univ', 'numpy3D', 'pd-multiindex', 'df-list', 'pd_multiindex_hier'] See the data format tutorial examples/AA_datatypes_and_datasets.ipynb, If you think the data is already in an sktime supported input format, run sktime.datatypes.check_raise(data, mtype) to diagnose the error, where mtype is the string of the type specification you want."
For non-stationary sequences, it seems a bit odd for padding to be done solely with zeros. Would it be better to pad with the values of the start and end points instead?
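To make the question concrete, here is a small sketch of the two padding styles on a toy series, using np.pad. It only illustrates the difference between zero padding and repeating the end points; the padding width is made up, and it does not change how the transform pads internally.

```python
import numpy as np

series = np.array([5.0, 6.0, 7.0, 8.0], dtype=np.float32)
pad = 2  # hypothetical padding width on each side

# Zero padding: values outside the series are treated as 0.
zero_padded = np.pad(series, pad, mode="constant", constant_values=0.0)

# Edge padding: the first and last values are repeated outwards.
edge_padded = np.pad(series, pad, mode="edge")

print(zero_padded)  # [0. 0. 5. 6. 7. 8. 0. 0.]
print(edge_padded)  # [5. 5. 5. 6. 7. 8. 8. 8.]
```

For a series that sits far from zero, the zero-padded version introduces a large step at both ends, which is the effect the question is about.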
Hello,
I am trying to run MiniRocket on my dataset, which is a SCADA dataset containing data from multiple sensors over a period of time. It is a multivariate time series, so I am using the multivariate version of MiniRocket from sktime. However, the features are not being transformed the way they are supposed to be.
Initially, I ran the following chunk of code on my personal SCADA dataset:
minirocket_multi = MiniRocketMultivariate()
X_train_transform = minirocket_multi.fit_transform(X_train)
X_test_transform = minirocket_multi.transform(X_test)
This is the output that I am getting,
However, I think that after transformation the shapes of X_train and X_test should be (34992, 9996) and (17472, 9996). Could you please help me with this? Why is it transforming just a single sample and not the rest?
Also, I would like to mention that I loaded the data from a pickle file that contains a pandas DataFrame.
with open(train_file, "rb") as f:
    data_train = pickle.load(f)
X_train_wt = data_train.iloc[:, :-1]
y_train_wt = data_train.iloc[:, -1] # Last column
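One thing that may be worth checking, as an assumption about this data: if each DataFrame row is a time step and each column a sensor, the transform probably sees the whole table as one long multivariate series. The sktime error message quoted in another issue above lists 'numpy3D' among the accepted formats, so a sketch of cutting the record into windows of shape (n_windows, n_sensors, window_length) might look like this. to_panel and the window length are hypothetical.

```python
import numpy as np


def to_panel(record, window_length):
    """Cut a (n_timesteps, n_sensors) record into non-overlapping windows.

    Returns an array of shape (n_windows, n_sensors, window_length).
    """
    record = np.asarray(record, dtype=np.float32)
    n_timesteps, n_sensors = record.shape
    n_windows = n_timesteps // window_length
    # Drop the incomplete window at the end, if any.
    trimmed = record[: n_windows * window_length]
    windows = trimmed.reshape(n_windows, window_length, n_sensors)
    # Move the sensor axis before the time axis.
    return np.ascontiguousarray(windows.transpose(0, 2, 1))


record = np.arange(20, dtype=np.float32).reshape(10, 2)  # 10 timesteps, 2 sensors
panel = to_panel(record, window_length=4)
print(panel.shape)  # (2, 2, 4)
```

The labels would need to be reduced to one per window as well, for example by taking the label at the end of each window.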
It would be handy to have minirocket as a Tensorflow layer (and potentially have it use the GPU if that's better in that scenario).
Thank you very much, once again, for this great piece of software. Very much appreciated! I'm trying to use it with my data but unfortunately, I always get the following error if I attempt to fit my input with "parameters = fit(x_trainScaled)":
TypeError: No matching definition for argument type(s) array(float64, 2d, C), array(int32, 1d, C), array(int32, 1d, C), array(float32, 1d, C)
Here are some, probably, relevant characteristics of my input:
print(x_trainScaled.shape)
print(x_trainScaled.dtype)
returns:
(3000, 3000)
float64
// edit:
This is the whole traceback:
File "minirocket\code\minirocket.py", line 130, in fit
biases = _fit_biases(X, dilations, num_features_per_dilation, quantiles)
File "\lib\site-packages\numba\dispatcher.py", line 500, in _explain_matching_error
raise TypeError(msg)
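A minimal sketch of the conversion that the error message seems to call for: the input reported here is float64, while other examples in these issues cast to np.float32 before calling fit. as_minirocket_input is a hypothetical helper name, and the random array only stands in for x_trainScaled.

```python
import numpy as np


def as_minirocket_input(X):
    """Convert array-like data to a C-contiguous 2-D float32 NumPy array."""
    X = np.ascontiguousarray(X, dtype=np.float32)
    if X.ndim != 2:
        raise ValueError(f"expected a 2-D array, got {X.ndim} dimension(s)")
    return X


x_trainScaled = np.random.default_rng(0).normal(size=(6, 20))  # float64 stand-in
X = as_minirocket_input(x_trainScaled)
print(X.dtype, X.shape, X.flags["C_CONTIGUOUS"])  # float32 (6, 20) True
```

The call would then become parameters = fit(X) rather than fit(x_trainScaled).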
import numpy as np
from sklearn.linear_model import RidgeClassifier
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV
from sktime.datasets import load_basic_motions
from sktime.transformations.panel.rocket import MiniRocketMultivariate
X_train, y_train = load_basic_motions(split="train", return_X_y=True)
model = Pipeline([
('minirocket', MiniRocketMultivariate(random_state=42)),
('ridge_clf', RidgeClassifier(random_state=42)),
])
model.fit(X_train, y_train)
Works fine
parameters = {
'ridge_clf__alpha': [0.1, 1, 10],
}
model_cv = GridSearchCV(model, parameters)
model_cv.fit(X_train, y_train)
"RuntimeError: Cannot clone object MiniRocketMultivariate(random_state=42), as the constructor either does not set or modifies parameter random_state"
If I start with the wide data format, a 2d array of samples (rows) by sensor readings (columns), what is the right way to transform that to fit the requirements of this library?
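As a sketch, assuming each row is one example and the columns are consecutive readings in time order: other issues here report input shapes of (32768, 99) for minirocket.py and (32768, 1, 99) for minirocket_multivariate.py, which suggests the layouts below. The wide array is random stand-in data.

```python
import numpy as np

# Hypothetical wide table: 5 samples (rows) x 12 readings over time (columns).
wide = np.random.default_rng(0).normal(size=(5, 12))

# Univariate layout: rows stay as examples, columns as time steps.
X_univariate = wide.astype(np.float32)

# Single-channel multivariate layout: add a channel axis in the middle.
X_multivariate = X_univariate[:, np.newaxis, :]

print(X_univariate.shape)    # (5, 12)
print(X_multivariate.shape)  # (5, 1, 12)
```

If the columns are instead different sensors at a single moment, the rows would first need to be grouped into series over time.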
Please help me understand how the code in this repo relates to sktime.transformers.series_as_features.rocket, as used in
https://towardsdatascience.com/minirocket-fast-er-and-accurate-time-series-classification-cdacca2dcbfa
from sktime.transformers.series_as_features.rocket import MiniRocket
Is it the same code?
For the input X, may the user add their own padding?
Dear all,
I am applying Minirocket to a set of multivariate series with 7 channels and 8020 data points.
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from minirocket_multivariate import fit, transform
import numpy as np
# Assume df_final_extendedVelocidadAnglarDerivadaRadio is already prepared with your data
X = df_final_extendedVelocidadAnglarDerivadaRadio.drop(columns=['etiqueta']).values
y = df_final_extendedVelocidadAnglarDerivadaRadio['etiqueta'].values
# Reshape X to the shape expected by MiniRocket
X_reshaped = X.reshape(-1, 7, 8020)
X_reshaped = X_reshaped.astype(np.float32)
# Fit the MiniRocket parameters
params = fit(X_reshaped, num_features=100, max_dilations_per_kernel=84)
# Transform the data using MiniRocket
X_transformed = transform(X_reshaped, params)
When I print X_reshaped.shape I get (240, 7, 8020).
However, the transformation using minirocket_multivariate.fit() returns an X_transformed with dimensions (240, 84). I would have expected (240, 7, 84). Is my assumption correct? If so, am I doing anything wrong? Your help will be highly appreciated.
Best,
Luis
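For what it is worth, the shapes reported in these issues (9996 features when 10_000 were requested, 84 when 100 were requested) are consistent with the feature count being rounded down to a multiple of the 84 kernels, and with the output being 2-D (examples by features) rather than having a channel axis. A small sketch of that arithmetic, where effective_num_features is a hypothetical helper and not the library's own code:

```python
NUM_KERNELS = 84  # MiniRocket uses a fixed set of 84 kernels


def effective_num_features(requested):
    """Largest multiple of 84 that does not exceed the requested feature count."""
    if requested < NUM_KERNELS:
        # Assumption of this sketch: fewer than 84 features is not handled.
        raise ValueError("this sketch assumes at least 84 requested features")
    return NUM_KERNELS * (requested // NUM_KERNELS)


print(effective_num_features(10_000))  # 9996
print(effective_num_features(100))     # 84
```

Under that reading, (240, 84) would be the expected result for num_features=100, with the 7 channels already combined inside each feature.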
Hi,
I was wondering if you plan to port minirocket to R language.
Many thanks,
My setup is that I am using a large dataset (10,000+) and I pass data to the model in batches. I do not cache the data, so I run transform every time I pass data into the model, on every epoch. I run this same setup for both minirocket.py with input shape (32768,99) and minirocket_multivariate.py with input shape (32768,1,99), so the number of channels is 1.
I find that the minirocket_multivariate.py version runs significantly slower on every transform() call than minirocket.py.
Is there a potential bug in the code?
Hi, what is the minimum length of a time series for MiniRocket? I have tried it with time series of length 4, but it throws an error.
Hello, I have been reading the multivariate MiniRocket code, and the way multiple channels are combined does not make sense to me.
Conv(x), where x is channel 0
Conv(y), where y is channel 1
Combining the channels just gives:
Conv(x+y)
Why not change np.sum to np.prod, giving:
Conv(x*y)
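The first observation can be checked numerically: convolution is linear, so summing the per-channel convolutions with the same kernel equals convolving the summed channels. The sketch below uses np.convolve with a made-up length-9 kernel and random channels; it illustrates the linearity only and is not the library's channel-combination code.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=32)  # channel 0
y = rng.normal(size=32)  # channel 1

# Toy length-9 kernel with weights -1 and 2 (chosen for illustration only).
kernel = np.array([-1.0, -1.0, 2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0])

# Convolve each channel separately, then add the results.
sum_of_convs = np.convolve(x, kernel, mode="same") + np.convolve(y, kernel, mode="same")

# Add the channels first, then convolve once.
conv_of_sum = np.convolve(x + y, kernel, mode="same")

# The proposed alternative: multiply the channels, then convolve.
conv_of_product = np.convolve(x * y, kernel, mode="same")

print(np.allclose(sum_of_convs, conv_of_sum))  # True
```

The product version is a different, non-linear feature, so whether it helps would have to be tested empirically.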
Hello, I'm trying to figure out what MiniRocket expects as input data. I keep getting
TypeError: No matching definition for argument type(s) pyobject, array(int32, 1d, C), array(int32, 1d, C), array(float32, 1d, C)
My data has the following format:
timestamp,close
1619773130596,54559.47
1619773134938,54563.93
1619773139226,54554.23
1619773143564,54564.34
And I read it like this:
dataset = pd.read_csv(filename, usecols = [0, 1], header=0)
dataset = dataset.dropna()
dataset.columns = dataset.columns.to_series().apply(lambda x: x.strip())
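The pyobject in the error suggests that something other than a NumPy array (a DataFrame, perhaps) reached fit. One possible way to turn a single long series into a 2-D array of examples is to cut it into sliding windows, sketched here with the four closing prices quoted above. The window length is hypothetical, and whether windows are the right examples depends on the task.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# The four closing prices from the sample rows; with pandas this could be
# dataset["close"].to_numpy() instead.
close = np.array([54559.47, 54563.93, 54554.23, 54564.34])

window_length = 3  # hypothetical; a real series would need much longer windows

# Each row is one window of consecutive prices, cast to contiguous float32.
X = np.ascontiguousarray(sliding_window_view(close, window_length), dtype=np.float32)
print(X.shape)  # (2, 3)
```

Each row of X is then one example, and a label per window (for instance, derived from the next price) would be needed for classification.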