I lead the Dynamics and Neural Systems Group in the School of Physics at the University of Sydney.
Please check out our team's open code in the organizational GitHub page for the Dynamics and Neural Systems Group.
Highly comparative time-series analysis
Home Page: https://hctsa-users.gitbook.io/hctsa-manual
License: Other
Many features rely on similar intermediate calculations (such as the first zero-crossing of the ACF, or the embedding dimension estimated using fnn, etc.)
Rather than repeating these time-consuming calculations again and again, it would be more efficient to compute them once and make that information accessible to the functions.
This would come at the cost of making some functions rely on a particular input structure, but this could take the form of an optional argument: a structure with fields containing frequently used quantities. An alternative is to make the first input to all operations take either a vector, as is currently the case, or a structure that contains the time-series data as well as some intermediate calculations that can be extracted (or recomputed if not available). This is potentially a more major change...
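The second alternative above can be sketched as follows. This is a minimal illustration in Python rather than MATLAB, and not hctsa code: TimeSeriesCache, as_cache, feature_a and feature_b are hypothetical names, and the zero-crossing helper is a plain textbook implementation used only to stand in for an expensive shared calculation.

```python
def first_zero_crossing_acf(y):
    """Return the smallest lag at which the sample autocorrelation is <= 0.

    Returns len(y) if no such lag exists (or if the series is constant).
    """
    n = len(y)
    mean = sum(y) / n
    centered = [v - mean for v in y]
    variance = sum(c * c for c in centered)
    if variance == 0:
        return n
    for lag in range(1, n):
        acf = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / variance
        if acf <= 0:
            return lag
    return n


class TimeSeriesCache:
    """Hypothetical container: raw data plus lazily computed shared quantities."""

    def __init__(self, data):
        self.data = list(data)
        self._cache = {}
        self.compute_counts = {}  # how often each quantity was actually computed

    def get(self, name, compute):
        # Compute the quantity on first request only; reuse it afterwards.
        if name not in self._cache:
            self._cache[name] = compute(self.data)
            self.compute_counts[name] = self.compute_counts.get(name, 0) + 1
        return self._cache[name]


def as_cache(y):
    """Accept either a plain sequence (current behaviour) or a TimeSeriesCache."""
    return y if isinstance(y, TimeSeriesCache) else TimeSeriesCache(y)


def feature_a(y):
    # Toy feature: zero-crossing lag relative to the series length.
    ts = as_cache(y)
    tau = ts.get("first_zero_acf", first_zero_crossing_acf)
    return tau / len(ts.data)


def feature_b(y):
    # Second toy feature that needs the same intermediate quantity.
    ts = as_cache(y)
    tau = ts.get("first_zero_acf", first_zero_crossing_acf)
    return 2 * tau
```

Passing the same TimeSeriesCache to both features computes the zero-crossing once, while passing a plain list still works and simply recomputes it on each call.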
When I try to compile the install.m file I get this error:
Error using compile_mex
An error occurred while compiling ML_Fastdfa_core C code.
It appears that mex is not set up to work on this system (cf. 'doc mex' and 'mex -setup').
Get 'mex ML_fastdfa_core.c' to work, and then re-run compile_mex.m
The code warns about this, but I do not know how to configure it:
'Please make sure that mex is set up with the right compilers for this system.'
Perhaps it's time to update the terminology, since Feature is more common usage than Operation.
Could consider changing the name of the Operations data object to Features...
Hi,
I am running the computations with the parallel option on (TS_Compute(true);), but I receive this message:
TS_Compute(true);
Loading data from HCTSA.mat... Done.
Computation will be performed across multiple cores using Matlab's Parallel Computing Toolbox.
[26-Mar-2023 11:53:47]: Extracting 7752 features from each of 24 time series.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
; ; ; : : : : ; ; ; 26-Mar-2023 11:53:47 ; ; ; : : : ; ; ;
---------------Time series 1 / 24 - - - - - - - - - - -
Computations will be performed serially without parallelization.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Preparing to calculate N1
ts_id = 1, N = 50000 samples
Computing 7752 features.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Why is the computation not performed with parallelization?
I am getting an error when trying to do multiple group assignment. I have assigned two sets of labels to each time series. I have a total of 160 time series, 96 of them labeled 'dir' and 64 labeled 'undir'. The same dataset also has 40 separate labels of 'a', 'b', 'c', 'd'. I want to extract the time series that have both 'dir' and 'a', but I get the following error:
Error using TS_LabelGroups (line 138)
24 time series have multiple group assignments:
I used the code groupindices_songtype = TS_LabelGroups('HCTSA_blkorng_new_syllable.mat',{'dir','a'});
and have uploaded the hctsa file here
Kindly suggest how to approach this issue.
Hi,
I tried to run this code from the documentation (https://hctsa-users.gitbook.io/hctsa-manual/setup/compiling_binaries) to ignore the TISEAN functions
TS_LocalClearRemove('ops',TS_GetIDs('tisean','raw','ops'),1,'raw');
However this produces an error:
TS_LocalClearRemove('ops',TS_GetIDs('tisean','raw','ops'),1,'raw');
Loading data from HCTSA.mat... Done.
Warning: No matches to 'tisean' found in HCTSA.mat
> In TS_GetIDs (line 105)
Error: File: TS_LocalClearRemove_2.m Line: 1 Column: 31
"@" dotted names can be used only within a CLASS block.
After editing the first line of the TS_LocalClearRemove.m file from:
function TS_LocalClearRemove.m(whatData,tsOrOps,idRange,doRemove)
to:
function TS_LocalClearRemove(tsOrOps,idRange,doRemove,whatData)
The error appears to be fixed and I am able to ignore the TISEAN functions now. I don't know whether this change would affect anything else, because I had just downloaded the library today, but at least it appears to solve this problem that I'm facing.
Thank you for all your hard work in providing us with this library!
JeiShian
Hello everyone,
I noticed an error while trying to reorder the features and time series using 'TS_Cluster()'. I get an error stating 'Error using BF_pdist: Unknown distance metric: 'corr_fast'', which is used in the 'TS_PairwiseDist' function.
Is there a solution to fix this? Thanks for help in advance!
Kind regards
Fabian
example: SY_TISEAN_nstat_z(y,4,{'ac',3})
Fix this now
The catch22 module contains 2017 versions of functions like DN_HistogramMode, but these functions have since been updated in the latest hctsa, so there are duplicate but different functions with identical names: rock-bottom bad practice for software dev, and a problem with poor namespace control in Matlab.
May need to change this in catch22 to disambiguate when sitting in this package, or do some object-oriented implementation to make, e.g., catch22.DN_HistogramMode clearly distinguished (using the directory with @ name convention?)
Hello,
This might be considered a feature request, or a request for advice on how to hack it.
The gist of the problem is that I do not have all the toolboxes available. Calling the functions and producing errors seemingly adds unnecessary overhead to my computation time, which I would like to minimize.
What I would like is the option to exclude operations that belong to certain toolboxes, to avoid even calling functions I know will fail.
The TS_ops.txt does not seem to contain information on which toolbox something belongs to.
So, if anyone has any ideas on how I could do this to speed up my computations, I would be very happy.
And thank you for a nice software suite! :)
PS: I have many more questions, and I have a feeling this is more of a bug-reporting place, so where would be a more appropriate place to ask those?
Great code, thanks.
Could you clarify:
1. Will it work for multivariate time series where all values are continuous?
2. Will it work for multivariate time series where the values are a mixture of continuous and categorical values?
For example, 2 dimensions have continuous values and 3 dimensions are categorical values:
   color   weight  gender  height  age
1  black   56      m       160     34
2  white   77      f       170     54
3  yellow  87      m       167     43
4  white   55      m       198     72
5  white   88      f       176     32
I am using Ethoscope velocity data to distinguish between genotypes. I don't have a license for the Econometrics Toolbox, so I get a few errors about that. Additionally, I get errors about functions not being supported. Are these issues with my installation, or is the data not amenable to that particular analysis?
Execution of script nn_prepare as a function is not supported:
/home/luca/Toolboxes/OpenTSTOOL/tstoolbox/mex/nn_prepare.m
Given public open source projects have free support for cloud ci services, consider leveraging one of them here. 🎉
https://blogs.mathworks.com/developer/2020/12/15/cloud-ci-services/
I get an error when using TS_SimSearch:
>> TS_SimSearch(100,'whatPlots',{'network'});
Loading data from HCTSA_N.mat... Done.
Loaded euclidean distances from HCTSA_N.mat
---TARGET: PrimClass_Jim.dat---
1. UnivDorm_Adriana.dat (d = 2.67)
2. PrimClass_Joel.dat (d = 2.90)
3. UnivClass_Ayanna.dat (d = 2.94)
4. UnivClass_Alexandria.dat (d = 3.01)
5. Office_Ashanti.dat (d = 3.09)
6. UnivClass_Axel.dat (d = 3.10)
7. Office_Ayden.dat (d = 3.15)
8. PrimClass_Janice.dat (d = 3.16)
9. PrimClass_Jacquelyn.dat (d = 3.24)
10. UnivLab_Annette.dat (d = 3.25)
11. UnivLab_Angie.dat (d = 3.26)
12. PrimClass_Julian.dat (d = 3.28)
13. UnivLab_Andre.dat (d = 3.32)
14. PrimClass_Jonathon.dat (d = 3.37)
15. PrimClass_Josue.dat (d = 3.39)
16. PrimClass_Janiya.dat (d = 3.40)
17. UnivLab_Anita.dat (d = 3.46)
18. PrimClass_Julio.dat (d = 3.46)
19. PrimClass_Jeffery.dat (d = 3.54)
20. UnivDorm_Alka.dat (d = 3.55)
Thresholding by proportion
Visualizing a network with 20 nodes
--------Generating a network visualization with k = 1.000000e-02--------
Computing network visualization over 2000 iterations and 1 repeats
Undefined function or variable 'g'.
Error in NetVis_netvis (line 249)
J = [(E*x-f);(E*y-g)]; % This is the full gradient (attraction - repulsion)
Error in TS_SimSearch (line 347)
NetVis_netvis(A,'k',0.01,'textLabels',{dataStruct.Name},...
Then, when I go into NetVis_netvis.m and set g to j, I get another error:
Error using gplot
Too many input arguments.
Error in NetVis_netvis (line 302)
[X,Y] = gplot(Ath{i},[x y]); %,'color',clinks{2}); %,'color',c{2}); % links
Error in TS_SimSearch (line 347)
NetVis_netvis(A,'k',0.01,'textLabels',{dataStruct.Name},...
mysql_dbopen.m throws an error even after including the appropriate connector via
javaaddpath('/home/philip/work/CompEngineMatlab/Database/mysql-connector-java-5.1.34-bin.jar')
% -- Error --
Error using mysql_dbopen (line 24)
Error with java database connector: Java exception occurred:
java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
at java.net.URLClassLoader$1.run(Unknown Source)
at java.net.URLClassLoader$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Unknown Source)
Hi,
I am trying to extract temporal features from a large number of BOLD resting-state fMRI signals using the hctsa toolbox. I have written a script that loads keyword matrices, labels, and time series in a loop and passes them sequentially to TS_Init to create HCTSA.mat files. However, on each iteration the algorithm requires the user input "y" to confirm the time series. I could not find the corresponding line in the "TS_Init.m" file to modify it and skip this step, and I was wondering if you could please guide me towards solving this issue.
Thank you for the great toolbox.
Regards,
Ali
For massive datasets, you may not want the data included in your TimeSeries structure... Add an option to remove?
This is not an issue, but I couldn't find a 'getting started' page with information on how to run TS_Init on a given dataset. Is there any information on how to create the INP_test_ts.mat file manually?
Thanks for the package and all the information, btw. Great and amazing work!
Cell arrays are super clunky compared to table objects for TimeSeries and Operations. Would be an ~easy fix to convert to tables
Hi,
I'm generating features for a forecasting application. I require "causal" features, i.e. features that at any point in time do not use future information. I noted that there is a subset of features with the keyword 'forecasting.' Are these all causal? Are any features in other subsets also causal? Is there a simple way to extract only those sorts of features?
Thanks for any help on this matter,
Gavin.
When using TS_Classify(mynormalizedfile) with two classes, the accuracy is reported in the command output, followed by:
Warning: No available confusion matrix plotting functions
> In TS_Classify (line 233)
And no classification plot is generated.
I installed Matlab 2020a and ensured I have the Stats/ML Toolbox installed and the Deep Learning Toolbox. Still get the same error.
Tested with latest version of all files.
On a side note, I think there is an unlikely but possible danger of temporary file name collisions when running in parallel (it has already happened to me once). We should probably create those files in the system temporary directory and make sure their names are unique.
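The suggestion about unique temporary files can be illustrated with a short sketch. It is written in Python with the standard tempfile module and is not hctsa code; make_unique_temp_file and the "hctsa_" prefix are hypothetical. MATLAB's built-in tempname function serves a similar purpose.

```python
import os
import tempfile


def make_unique_temp_file(prefix="hctsa_", suffix=".dat"):
    """Create an empty, uniquely named file in the system temp dir; return its path."""
    # mkstemp creates the file atomically, so two parallel workers
    # cannot end up with the same name.
    fd, path = tempfile.mkstemp(prefix=prefix, suffix=suffix)
    os.close(fd)  # close the OS-level handle; the caller reopens the file as needed
    return path
```

The caller is responsible for deleting the file (for example with os.remove) once the computation has finished.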
I think there is a typo in TS_Classify, line 194 states:
assert(cfnParms.computePerFold == false);
I think it should be:
assert(cfnParams.computePerFold == false); ???
I'm also trying to run an analysis without using the simpleNull (because my classes are imbalanced).
I was getting an error that:
"Output argument "inSampleAccStd" (and possibly others) not assigned a value in the execution with "ComputeCVAccuracies" function."
So I added the following to the end of the ComputeCVAccuracies function:
if ~cfnParams.computePerFold
inSampleAccStd = 0;
end
Then I got the following error:
Unrecognized function or variable 'inSampleAccStd'.
Error in ComputeCVAccuracies (line 49)
inSampleAccStd = mean(inSampleAccStd,1);
Error in TS_Classify (line 203)
nullStats(i) = ComputeCVAccuracies(TS_DataMat,shuffledLabels,cfnParams,true);
Which I think I have fixed by replacing line 49 in ComputeCVAccuracies with:
if cfnParams.computePerFold
inSampleAccStd = mean(inSampleAccStd,1);
end
Hopefully these are accurate solutions!
Kind regards,
Neil
Hi Ben
I'm just trying to save the classifier produced by TS_Classify, and I'm getting an error saying both Unrecognized function or variable 'foldLosses', and Unrecognized function or variable 'whatLoss'. It looks like these variables are no longer produced by TS_Classify (when I use the find function to look for the variables within TS_Classify, they are not produced as outputs on any line). When I comment out the lines that require these variables, the function works.
It looks like the aspect of the TS_Classify function that utilizes these variables might have been removed at some stage, because I've noticed the following in the output description (I was also interested in the doPCs option, but couldn't work out how to perform it):
%---OUTPUTS:
% Text output on classification rate using all features, and if doPCs = true, also
% shows how this varies as a function of reduced PCs (text and as an output plot)
% foldLosses, the performance metric across repeats of cross-validation
% nullStats, the performance metric across randomizations of the data labels
% jointClassifier, details of the saved all-features classifier
Kind regards,
Neil
Hi Ben,
I get this error when trying to get the IDs that match part of a name in the Operations.Name column.
When I leave the 'Name' flag out it searches the keywords field and it works fine.
>> OperationIDs = TS_GetIDs('mystring', myFile, 'ops', 'Name');
Loading data from....mat... Done.
Brace indexing is not supported for variables of this type.
Error in TS_GetIDs (line 114)
cmatch = find(contains(theDataTable.Name,theMatchString{i}));
However it works when I use 'contains' below (partially copying the method used in keywords). I'm not sure the reason for the loop in Name using cmatch?
case {'name','Name'}
    % The cell of comma-delimited keyword strings:
    theKeywordCell = theDataTable.Name;
    % Find objects with a keyword that matches the input string:
    matches = find(contains(theKeywordCell, theMatchString));
    % Return the IDs of the matches:
    IDs = theDataTable.ID(matches);
    % Check for empty:
    if isempty(IDs)
        warning('No matches to ''%s'' found in %s',theMatchString,theDataFile)
    end
It would be fantastic to have GPU support for all those functions.
How much work/time would be required to rewrite the Matlab code so that it could run on a GPU?
As per infodynamics docs, CO_FirstMin(y,'mi-kraskov2',3) needs to be CO_FirstMin(y,'mi-kraskov2','3').
>> CO_FirstMin(randn(100),'mi-kraskov2',3)
No method 'setProperty' with matching signature found for class 'infodynamics.measures.continuous.kraskov.MutualInfoCalculatorMultiVariateKraskov2'.
Error in IN_Initialize_MI (line 71)
miCalc.setProperty('k', extraParam); % 4th input specifies number of nearest neighbors for KSG estimator
Error in IN_AutoMutualInfo (line 74)
miCalc = IN_Initialize_MI(estMethod,extraParam);
Error in CO_FirstMin>@(x)IN_AutoMutualInfo(y,x,'kraskov2',extraParam) (line 71)
corrfn = @(x) IN_AutoMutualInfo(y,x,'kraskov2',extraParam);
Error in CO_FirstMin (line 90)
autoCorr(i) = corrfn(i);
>> CO_FirstMin(randn(100),'mi-kraskov2','3')
ans =
2
The function TS_InitiateParallel calls BF_CheckToolBox using the argument 'distrib_computing_toolbox' on line 40 but BF_CheckToolBox uses 'parallel_computing_toolbox'.
However, when I change the code in my local version of hctsa and try to use parallel computing, TS_Compute starts to throw errors for a number of functions
I don't know if this is a local install problem.
Dear Ben,
I am in the process of analyzing my 64-channel EEG recordings with HCTSA. However, I get the error that "each element of timeSeriesData must be univariate." I looked at the Bonn EEG dataset in the example and realized the data in that was univariate. What do you suggest for EEG signal analysis?
I used HCTSA on a dataset and was trying to analyze a particular feature, SP_Summaries_fft_mean. When I compare the value in TS_DataMat with the result of running the code separately on the same dataset, e.g., [out] = SP_Summaries(TimeSeries(1).Data,'fft',[],[],false);, the two values for out.mean are different. I am looking at column 4534 in TS_DataMat, which, according to the Operations variable, seems to be the corresponding feature. Kindly assist.
Hi Ben,
I have encountered an error when compiling MEX as part of the installation process.
When compiling the OpenTSTOOL, I get a series of warnings followed by an error. This is generated from the range_search.cpp file line 116 (see screenshot below):
I suspect that the error arises as the code is trying to compare a range to an integer, which is an invalid operation?
When justifying that part of the code, the MEX is compiled successfully (see screenshot below):
I just wanted to share this and I would be grateful for any suggestions.
FYI - I am using macOS Catalina v10.15.7 and Matlab R2021a.
Many thanks!
Irene
Hi Ben, first of all thanks for this amazing tool.
Could you please explain to me how to compute only the notLengthDependent features? Your tutorial only explains how to do so with the catch22 set.
Thank you.
Hi Ben,
I have been using your HCTSA toolbox for some time and have a question. HCTSA is outstanding work.
When I got started with the "hctsa_phenotypingWorm-master" and "hctsa_phenotypingFly-master" projects, I ran into the following error:
plotconfusion(realLabels,predictLabels);
plotconfusion (line 111) update_args = standard_args(args{:});
plotconfusion>standard_args (line 255) Value is not a matrix or cell array.
I did not understand how to solve this problem. Can you give me some suggestions?
Hello,
Thank you for such an amazing tool. I'm wondering about how to approach multivariate time series using hctsa. Is there a special way of assigning the keywords before using TS_LabelGroups?
Thank you in advance,
Konstantin
Are you aware of any R bindings for hctsa?
Best regards,
Sebastian
Dear Ben,
first of all thank you for all the work and effort you put into the hctsa package. It has been really helpful so far!
In the beginning I was dealing with a single sensor to monitor a pressure ts. The classification is mostly a 2-class but can also be a multiple class problem.
Now, I have multiple sensors available that monitor one and the same process. I would like to include possible relations and dependencies between the sampled variables in my analysis. Therefore, I would like to ask whether you have any experience with implementing a multivariate time-series analysis in a smart way, still using all the beautiful functions within this package.
Thank you in advance and best regards,
Alex
Analysis methods, like TS_TopFeatures, assume that there are no errors in the data matrix (i.e., that all bad values have been filtered out of the dataset, using TS_normalize). There should be better checks on this, to avoid the zeros in TS_DataMat being treated as actual zeros (rather than error symbols in TS_Quality). Best solution would be to use data in TS_Quality to restrict the computation to good values (where meaningful analysis is possible), e.g., in the case of TS_TopFeatures.
Dear Ben,
Thank you so much for developing such a great tool. I’m a pre-med student and I’m trying to analyse multi-channel EEG data with your program, but I’ve encountered a problem while doing so.
Since I’m trying to analyse multi-channel EEG data, I have more than one time series for each subject. However, when I read your journals and gitbook, it said that HCTSA can only analyse one time-series data for a subject. Could you kindly guide me on how to analyse multi-channel EEG data using HCTSA if it is possible?
Thank you.
Yours sincerely,
HJ
hi Ben,
the readme file has a broken link at the bottom section:
https://github.com/benfulcher/hctsa/#comp-engine-time-series
The problem I have is the following:
The code is:
set_false_flag()
_ = hctsa.prepare() # Here Matlab freezes
elevation_data = get_data()
features_data = get_features(elevation_data)
save_data(features_data)
set_true_flag()
print("Done")
This code is packed into a method and repeated in a loop ~20000 times.
The problem is that on the 14th iteration Matlab freezes for some reason.
These are the log messages:
Starting engine
Warming up
Configuring HCTSA
Setting up HCTSA operators (STUCK AT THIS PLACE)
Hooray, we can use HCTSA now...
How can I solve it?
Maybe running it in a loop so many times is a bad idea?
Is it possible to start Matlab once globally and then use its functions?
Hi Ben,
I have been using your HCTSA toolbox for some time and have a query. First of all, thank you for working on the package; it has been really useful for my work.
I often face the following problem: once I have performed feature extraction on a dataset, normalized the features, and trained a classifier on the normalized features, I no longer have the normalization parameters that were originally used when I later want to make predictions on new data after feature extraction.
So I have to combine the new and training data features, normalize them together, and then separate out the new data. I wish there was something similar to the Python implementation of fit_transform(). Please let me know if you have any provision in your code for dealing with this.
Best Regards,
Avi
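The fit-then-transform workflow asked about above can be sketched in a few lines. This is a minimal illustration in plain Python, not part of hctsa: fit_normalization and apply_normalization are hypothetical names, and z-scoring is used only as a stand-in (an assumption, not necessarily the transform hctsa applies when normalizing).

```python
import math


def fit_normalization(train_matrix):
    """Learn per-feature (mean, std) pairs from training rows (rows = time series)."""
    n_rows = len(train_matrix)
    n_cols = len(train_matrix[0])
    params = []
    for j in range(n_cols):
        column = [row[j] for row in train_matrix]
        mean = sum(column) / n_rows
        # Population standard deviation of the training column.
        std = math.sqrt(sum((v - mean) ** 2 for v in column) / n_rows)
        params.append((mean, std))
    return params


def apply_normalization(matrix, params):
    """Z-score each column with previously fitted parameters.

    Columns that were constant in the training data are mapped to 0.
    """
    return [
        [(v - mean) / std if std > 0 else 0.0 for v, (mean, std) in zip(row, params)]
        for row in matrix
    ]
```

The fitted parameters can be saved alongside the classifier and reused on new feature vectors, so the new and training data never need to be normalized together.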
Hello and thank you for sharing the great package,
Is it possible to call the package from python using the Matlab engine and if yes
do you have any instructions for that?
best
Some operations output results that depend on the random seed, and thus running the same operation on the same time series can produce different results if run multiple times.
A solution to this is required, and could be done by allowing a random seed input to each non-deterministic function, to allow reproducible results. If none is provided, a default could be rng('default') at the start of each function.
I should implement this as a priority going forward.
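A minimal sketch of the seed-input idea, written in Python for illustration only (the hctsa operations themselves are MATLAB functions). noisy_feature is a hypothetical toy feature and the default seed of 0 is an arbitrary assumption. A private generator is used so that the caller's global random state is left untouched.

```python
import random


def noisy_feature(y, seed=0):
    """Toy non-deterministic feature: mean of a random half-sized subsample of y.

    A generator seeded on every call makes repeated runs return the same value.
    """
    rng = random.Random(seed)  # private generator; global state is not modified
    k = max(1, len(y) // 2)
    sample = rng.sample(list(y), k)
    return sum(sample) / k
```

Calling the function twice with the same seed gives identical output, while passing a different seed allows variability across seeds to be explored deliberately.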
DK_timerev function seems to be broken.
The function call to DK_timeLagembed in DK_timerev has been renamed, but the function DK_lagembed has not been renamed to DK_timeLagembed.