
This project forked from andrew878/ml-project-radar-object-classification


A master's machine learning project that classifies objects/materials based on how they interact with a radar signal. This model achieved 1st (equal) place in St Andrew's ML competition (out of 80 participants).


RadarCATClassification

EXECUTIVE SUMMARY:

A master's machine learning project that classifies objects/materials based on how they interact with radar signals.

The provided Python scripts include:

◦ The training and calibration of four classification models:

▪ Logistic Regression;

▪ K-Nearest Neighbours;

▪ Gradient Boosted Trees;

▪ A single-layer neural network.

◦ Aggregation of the above models into an ensemble voting classifier that achieves 100% accuracy on unseen test data.
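A minimal sketch of such a voting ensemble, assuming scikit-learn (the scripts' actual models, hyper-parameters and data are not shown here; everything below is illustrative):

```python
# Hard-voting ensemble over the four model families named above.
# Synthetic data stands in for the radar features.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=300, n_features=10, n_classes=3,
                           n_informative=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 'hard' voting takes the majority class across the four base models.
ensemble = VotingClassifier(estimators=[
    ("logreg", LogisticRegression(max_iter=1000)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
    ("mlp", MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000,
                          random_state=0)),
    ("gbt", GradientBoostingClassifier(random_state=0)),
], voting="hard")

ensemble.fit(X_train, y_train)
accuracy = ensemble.score(X_test, y_test)
```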

Full credit for the original idea and data source:

https://sachi.cs.st-andrews.ac.uk/research/interaction/radarcat-exploits-googles-soli-radar-sensor-for-object-and-material-recognition/

OBJECTIVE:

Here, the ultimate desired outcome is not a better understanding of radar’s reflection and refraction properties to further physics, but rather the fast, accurate classification of objects. As such, when selecting models I did not restrict myself to interpretable ones, and instead focused on accuracy and classification speed.

An ensemble classifier was considered because:

• The small amount of data per category suggests it could be difficult to train a single accurate model. Aggregating the classifications of several different models with ‘OK’ accuracy might overcome this by diversifying their errors;

• Evidence from both academia (Kittler, Hatef, Duin, & Matas, 1998) and actual data-science competitions (Géron, 2017; Murray, 2018) suggests that ensemble methods can work well for classification problems.

GENERAL APPROACH:

• Step 1: Train a regularised logistic regression model and measure performance on the training and validation sets to obtain an initial accuracy benchmark. a) Adjust the L2 regularisation parameter as appropriate if under- or over-fitting occurs.

• Step 2: Train the following models until satisfactory accuracy is reached, using grid search with cross-validation to tune hyper-parameters: a) a gradient boosted tree (‘GBT’) classifier; b) a K-nearest neighbour (‘KNN’) classifier (note: KNN doesn’t require training, only hyper-parameter tuning); c) a feed-forward neural network classifier.

• Step 3: Combine all four models into a voting ensemble classifier and test performance on the training and validation sets.

• Step 4: Combine the training and validation data, retrain the model parameters and predict against the test set. This step indicates whether our ensemble approach is appropriate.

• Step 5: Using all data points (training, validation and test), retrain the model parameters and then classify the unlabelled ‘XToClassifyData’. This last step ensures the final predictions are made with as much data as possible.
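The grid-search-with-cross-validation tuning used in Steps 1 and 2 can be sketched as follows (assuming scikit-learn, which the report does not name; the parameter grid and data are illustrative):

```python
# Tune the L2 regularisation strength of a logistic regression model
# by 5-fold cross-validated grid search.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# In scikit-learn, C is the INVERSE of the L2 penalty strength:
# smaller C = stronger regularisation (counters over-fitting),
# larger C = weaker regularisation (counters under-fitting).
grid = GridSearchCV(
    LogisticRegression(penalty="l2", max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    cv=5,
)
grid.fit(X, y)
best_C = grid.best_params_["C"]
```

The same pattern applies to the GBT, KNN and neural-network hyper-parameters in Step 2, with a different `param_grid` per model.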

JUSTIFICATION FOR MODEL SELECTION:

Per guidance in Kittler et al., combinations of classifiers are “particularly useful if they are different” (Kittler et al., 1998). As such, each of the proposed models was chosen because it offers different qualities (strengths/weaknesses) to the ensemble voter:

Logistic regression:

• Provides linear decision boundaries and so for linearly separable data, this approach should work well.

• Takes the entire dataset into consideration when creating decision boundaries.

• Because “logistic regression wants to create a large margin between boundary and (all of) the data”, it can result in decision boundaries tilting at odd angles to accommodate outliers near the decision boundary (Minka, 2003).

Gradient Boosted Trees:

• The GBT model is itself an ensemble learner that gradually adds and trains multiple decision-tree models, where each incremental tree is trained on the residuals of the prior trees. It is somewhat similar to forward stage-wise linear regression (Hastie et al., Chapter 4) in the sense that the model’s next tree is determined by the residuals of the model’s prior iteration.

• Tree methods such as GBT are robust to outliers (Hastie, Tibshirani, & Friedman, 2008).

• The combination of these trees will produce linear decision boundaries that are orthogonal to the PCA axes. This will be different to the decision boundaries of the other models.

• Lastly, it is difficult to ignore that GBT approaches tend to feature in many winning data-science entries, such as Kaggle competitions and the Netflix Prize (Koren, 2009).
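The residual-fitting idea behind GBT can be illustrated in a few lines. This toy uses squared-error boosting on a regression problem (the project itself boosts a classifier); the data and settings are purely illustrative:

```python
# Stage-wise boosting by hand: each new tree fits what the current
# ensemble still gets wrong (the residuals).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

prediction = np.zeros_like(y)
learning_rate = 0.1
for _ in range(50):
    residuals = y - prediction          # errors of the ensemble so far
    tree = DecisionTreeRegressor(max_depth=2)
    tree.fit(X, residuals)              # the next tree targets those errors
    prediction += learning_rate * tree.predict(X)

initial_error = np.mean(y ** 2)                 # error of predicting zero
final_error = np.mean((y - prediction) ** 2)    # error after 50 trees
```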

K-Nearest Neighbour:

• KNN is a probabilistic classifier that formulates classifications based on the ‘K’ closest data points and nothing more (in this sense, the ‘K’ majority voting system shares similarities to an ensemble classifier).

• Per Hastie et al., KNN produces piece-wise linear decision boundaries which can be used to represent “irregular” and non-linear class boundaries (Hastie et al., 2008).

• Hastie et al. also detail how KNN is successful in many real-world applications, including electrocardiography signals. From my naive perspective, pictures of electrocardiography waves don’t look dissimilar to the waveforms in the RadarCat article. This leads me to believe KNN could be worth attempting.

• Russell and Norvig also note that “in low-dimensional spaces with plenty of data, nearest neighbours works very well” (Russell & Norvig, 2016). Due to the curse of dimensionality, KNN might have been infeasible on the original, larger feature set, but with the smaller PCA features, the data-to-feature ratio is much higher, and so there is a possibility it will perform well.

• Unfortunately, as the data set gets very large, KNN takes longer and longer to classify (it calculates the distance to every data point), but given the small data set here, it is sufficient.
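The K-majority voting described above, in a toy two-cluster example (assuming scikit-learn; the points below are illustrative, not the radar features):

```python
# KNN predicts the majority class among the K nearest stored points.
from sklearn.neighbors import KNeighborsClassifier

# Two clusters in 2-D: class 0 near the origin, class 1 near (5, 5).
X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = [0, 0, 0, 1, 1, 1]

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)   # "training" only stores the points, hence no real training step

# Each query point takes the majority vote of its 3 nearest neighbours.
pred = knn.predict([[0.5, 0.5], [5.5, 5.5]])   # → [0, 1]
```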

Neural network:

• NNs can represent complex non-linear decision boundaries between classes and work well where there are high signal-to-noise ratios (Hastie et al., 2008).

• NNs are well suited to classification tasks, particularly when “modelling real world complex relationships” (G. P. Zhang, 2000), which this problem is considered to be.
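A sketch of the single-layer feed-forward network from the summary, assuming scikit-learn's `MLPClassifier` (the hidden-layer size and data are assumptions, not the project's values):

```python
# One hidden layer of 32 units -> the "single layer neural network"
# listed in the executive summary.
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

mlp = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
mlp.fit(X, y)
train_accuracy = mlp.score(X, y)
```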

