
Selection pipeline for hctsa features

This is a pipeline for selecting small time-series feature sets from the comprehensive feature collection contained in the hctsa toolbox. Features are selected by their classification performance across a collection of time-series classification problems. The pipeline was used to generate the small feature set catch22 - CAnonical Time-series CHaracteristics based on the problems contained in the UEA/UCR time-series classification repository.

For information on the pipeline and the catch22 feature set, see our preprint:

For information on the full hctsa library of over 7000 features, see the following (open-access) publications:

Running the pipeline

Computing the hctsa-matrices

The selection process relies on computed and normalized feature-matrices from the hctsa toolbox.

👋👋👋 Computed data (using v0.97 of hctsa) that we used for our analysis can be downloaded from this figshare repository. 👋👋👋

See hctsa for instructions on how to construct hctsa files from your data, compute the features, and normalize the matrices (hctsa relies on Matlab). The computed, normalized HCTSA .mat files should be placed into a folder called input_data inside the op_importance folder, with file names HCTSA_<dataset name>_N.mat.
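A small helper like the following can sanity-check that the input_data folder follows this naming convention before launching the pipeline. This is a hypothetical sketch based on the convention described above, not code from the repository:

```python
import re
from pathlib import Path

# Expected naming convention for normalized hctsa files: HCTSA_<dataset name>_N.mat
HCTSA_PATTERN = re.compile(r"^HCTSA_(?P<dataset>.+)_N\.mat$")

def list_datasets(input_dir="input_data"):
    """Return dataset names of all correctly named HCTSA files in input_dir."""
    datasets = []
    for path in sorted(Path(input_dir).glob("*.mat")):
        match = HCTSA_PATTERN.match(path.name)
        if match:
            datasets.append(match.group("dataset"))
    return datasets
```

Files that do not match the pattern (for example, un-normalized HCTSA_<dataset name>.mat files) are silently skipped, so an empty result usually means the normalization step was not run.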

Running the op_importance pipeline

The pipeline can be launched from the op_importance directory as

python Workflow.py <runtype>

where <runtype> is a string composed of two or three parts delimited by underscores: <classifier>_<normalisation>(_null). Here, <classifier> selects the classifier type (svm, dectree, or linear) and <normalisation> is either scaledrobustsigmoid or maxmin. An appended _null in the <runtype> string means that distributions of classification accuracies for each feature are generated by a permutation-based procedure that shuffles the labels of the classification problems.
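The runtype convention can be illustrated with a small parser. This is a hypothetical helper mirroring the option lists in the text, not code from Workflow.py:

```python
# Options as described in the README; stand-ins, not imported from Workflow.py.
VALID_CLASSIFIERS = {"svm", "dectree", "linear"}
VALID_NORMALISATIONS = {"scaledrobustsigmoid", "maxmin"}

def parse_runtype(runtype):
    """Split a <classifier>_<normalisation>(_null) string into its parts."""
    parts = runtype.split("_")
    is_null = parts[-1] == "null"
    if is_null:
        parts = parts[:-1]
    if len(parts) != 2:
        raise ValueError("expected <classifier>_<normalisation>(_null)")
    classifier, normalisation = parts
    if classifier not in VALID_CLASSIFIERS:
        raise ValueError("unknown classifier: %s" % classifier)
    if normalisation not in VALID_NORMALISATIONS:
        raise ValueError("unknown normalisation: %s" % normalisation)
    return classifier, normalisation, is_null
```

For example, `parse_runtype("dectree_maxmin_null")` yields a decision-tree classifier with max-min normalisation in null (label-shuffling) mode.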

First step: compute null accuracy distributions

First, null distributions need to be generated, e.g.,

python Workflow.py dectree_maxmin_null

This can take a long time, as 1000 classification runs are performed on each dataset. It is preferable to do this computation on a cluster.

Make sure that compute_features = True is set in the main function of Workflow.py.
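The label-shuffling idea behind the null distributions can be sketched as follows. The nearest-mean classifier and plain accuracy score here are simplified stand-ins for the pipeline's actual classifiers and evaluation; only the permutation structure is taken from the text:

```python
import random

def accuracy(labels_true, labels_pred):
    """Fraction of correctly predicted labels."""
    correct = sum(t == p for t, p in zip(labels_true, labels_pred))
    return correct / len(labels_true)

def nearest_mean_classify(values, labels):
    """Toy classifier: assign each value to the class with the closest mean."""
    means = {}
    for lab in set(labels):
        pts = [v for v, l in zip(values, labels) if l == lab]
        means[lab] = sum(pts) / len(pts)
    return [min(means, key=lambda lab: abs(v - means[lab])) for v in values]

def null_accuracies(values, labels, classify, n_perms=1000, seed=0):
    """Accuracy distribution for one feature under shuffled class labels."""
    rng = random.Random(seed)
    shuffled = list(labels)
    accs = []
    for _ in range(n_perms):
        rng.shuffle(shuffled)  # destroy any real feature-label association
        preds = classify(values, shuffled)
        accs.append(accuracy(shuffled, preds))
    return accs
```

Comparing a feature's real classification accuracy against its null distribution indicates whether the feature performs better than chance on that problem.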

Second step: compute valid accuracy distributions

python Workflow.py dectree_maxmin

Third step: further analyses

Once the valid and null accuracies have been computed, all subsequent analyses can be run without re-classification by setting compute_features = False.

See below for an example output of the pipeline that plots correlations in performance across datasets of the 500 best features as well as the clusters they end up in.

Example output

Contributors

benfulcher, chlubba, philiphorst, sarabsethi, vp007-py

