
sean-galloway / machine-learning-challenge


Machine-Learning - Exoplanet Exploration

Over a period of nine years in deep space, the NASA Kepler space telescope has been out on a planet-hunting mission to discover hidden planets outside of our solar system. To help process this data, you will create machine learning models capable of classifying candidate exoplanets from the raw dataset.

Preprocess the Data

  • Preprocess the dataset prior to fitting the model.
  • Perform feature selection and remove unnecessary features.
  • Use MinMaxScaler to scale the numerical data.
  • Note: all of the cleaning and separating of the data is done once in the ETL.ipynb file, so the steps do not have to be repeated for each model.
  • Separate the data into training and testing data. This split is repeated in each model notebook, since there does not seem to be a way to do it in a separate file.
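
The scaling and splitting steps above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the synthetic data from make_classification is an assumption standing in for the cleaned Kepler table produced by ETL.ipynb, and the split ratio and random seed are arbitrary choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the cleaned Kepler features and labels
# (assumption: the real data comes from ETL.ipynb, not shown here).
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_classes=3, random_state=42)

# Hold out a test set; stratify keeps class proportions similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

# Fit the scaler on the training data only, then apply it to both splits,
# so no information from the test set leaks into the scaling.
scaler = MinMaxScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.shape, X_test_scaled.shape)
```

Fitting the scaler after the split is one reason the split tends to live next to each model: the scaler choice (MinMaxScaler or StandardScaler) differs per model.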

Tune Model Parameters

  • Use GridSearch (GridSearchCV in scikit-learn) to tune model parameters.
  • Tune and compare at least two different classifiers.

Analysis

Model-1 Logistic Regression

  • Using StandardScaler resulted in better scores than MinMaxScaler.
  • For feature selection, all of the supplied features are used. Accuracy drops off dramatically if any features are left out.
  • The results are:
    • Training: 0.548
    • Testing: 0.565
  • Tuning C and max_iter with GridSearch improved the score to 0.658.
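
A search over C and max_iter might look like the sketch below. The candidate values, the 5-fold cross-validation, and the synthetic data are all assumptions for illustration; the source does not list the grid that was actually searched.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the Kepler data (assumption).
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

# StandardScaler, since it scored better than MinMaxScaler for this model.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Hypothetical candidate values for the two tuned parameters.
param_grid = {"C": [0.01, 0.1, 1, 10, 100], "max_iter": [200, 500, 1000]}

grid = GridSearchCV(LogisticRegression(), param_grid, cv=5)
grid.fit(X_train_s, y_train)

test_score = grid.score(X_test_s, y_test)
print(grid.best_params_)
print(test_score)
```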

Model-2 Random Forest

  • Using MinMaxScaler resulted in slightly better results than StandardScaler.
  • For feature selection, all of the supplied features are used. Accuracy drops off dramatically if any features are left out.
  • The results are:
    • Training: 0.489
    • Testing: 0.481
  • Tuning n_estimators, max_features, max_depth, min_samples_split, min_samples_leaf, and bootstrap with GridSearch improved the score to 0.891.
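
As a rough sketch, a grid over the six Random Forest parameters named above could be set up like this. The candidate values are hypothetical and kept small so the search runs quickly; the synthetic data is likewise an assumption, so the printed score will not match the 0.891 reported here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the Kepler data (assumption).
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=1)

# Hypothetical candidate values; every combination is tried, so the number
# of fits grows multiplicatively with each added value.
param_grid = {
    "n_estimators": [25, 50],
    "max_features": ["sqrt", "log2"],
    "max_depth": [5, None],
    "min_samples_split": [2],
    "min_samples_leaf": [1, 2],
    "bootstrap": [True, False],
}

grid = GridSearchCV(RandomForestClassifier(random_state=1), param_grid, cv=3)
grid.fit(X_train, y_train)

test_score = grid.score(X_test, y_test)
print(grid.best_params_)
print(test_score)
```

This grid has 2 x 2 x 2 x 1 x 2 x 2 = 32 combinations, each fitted 3 times for cross-validation, which is why wide grids over many parameters get slow quickly.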

Model-3 SVM

  • Using MinMaxScaler resulted in slightly better results than StandardScaler.
  • For feature selection, all of the supplied features are used. Accuracy drops off dramatically if any features are left out.
  • The results are:
    • Training: 0.495
    • Testing: 0.520
  • Tuning C and gamma with GridSearch improved the score to 0.608.
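
One way to tune C and gamma is to wrap the scaler and the SVM in a Pipeline, so the scaler is refitted inside each cross-validation fold. This is a sketch under assumptions: the Pipeline, the step names "scale" and "svc", the candidate values, and the synthetic data are illustrative and not taken from the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Synthetic stand-in for the Kepler data (assumption).
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           random_state=2)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=2)

# Step names are hypothetical; they become prefixes in the parameter grid.
pipe = Pipeline([("scale", MinMaxScaler()), ("svc", SVC())])

# Hypothetical candidate values for the two tuned parameters.
param_grid = {"svc__C": [1, 10, 100], "svc__gamma": [0.01, 0.1, 1]}

grid = GridSearchCV(pipe, param_grid, cv=5)
grid.fit(X_train, y_train)

test_score = grid.score(X_test, y_test)
print(grid.best_params_)
print(test_score)
```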

Final analysis

  • All of the models, prior to GridSearch, started off at roughly 0.500 (0.481 to 0.565 on the test data), which means selecting the correct result about half of the time.
  • GridSearch helped all three models to a greater or lesser degree.
  • Random Forest shows the largest benefit from GridSearch, achieving a final score of 0.891 (~0.900). This score is well ahead of the other two models (0.658 and 0.608).
  • Random Forest might be a good enough model to predict new exoplanets. A concern is that Random Forests tend to be brittle and may need frequent re-training. Neural Nets should be the next line of analysis, preferably with two or more hidden layers.
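
For the suggested follow-up, a neural net with two hidden layers could be prototyped along these lines. This is only a sketch: the analysis above does not name a library, so scikit-learn's MLPClassifier is a swapped-in choice, and the layer sizes (32 and 16), max_iter, and the synthetic data are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the Kepler data (assumption).
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           random_state=3)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=3)

scaler = MinMaxScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Two hidden layers with hypothetical sizes of 32 and 16 units.
mlp = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=1000,
                    random_state=3)
mlp.fit(X_train_s, y_train)

test_score = mlp.score(X_test_s, y_test)
print(test_score)
```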
