sklearn_iris's Introduction

sklearn Iris Tutorial

Introduction to classification techniques with sklearn's Iris dataset.

The notebooks create and analyze the Iris classification data with sklearn.

The notebooks implement a logistic regression and a random forest classifier.

Repo Goals

By completing these notebooks, you should understand how to run classification with sklearn, visualize results, compare models, and select the best-performing one using a given metric.

The notebooks rely on the pandas, sklearn, joblib, seaborn, and matplotlib libraries (see the 'environment' dir).

Notebooks Overview

  1. 1_Generate_Data
    • Imports data from sklearn
    • Saves data to csv files in 'data' dir
    • Has notes on environment setup (i.e., package versions) to complement the yml in the 'environment' dir
  2. 2_Train_Test_Split
    • Imports data using pandas from csv files
    • Uses sklearn train-test split to create training and test datasets
    • Saves split data back into csvs in 'data' dir
  3. 3_EDA
    • Imports training data only to conduct exploratory data analysis (EDA)
    • Visualizes data with seaborn pairplot
    • Has a method to explore column-level data, including the number of unique values and value counts
  4. 4_Train_Models
    • Trains two models on training data
      • Logistic Regression
      • Random Forest Classifier
    • Saves models using joblib to file
    • Calculates performance on training data using confusion matrices and a classification report
  5. 5_Predict_and_Evaluate
    • Loads trained models and test data
    • Runs predictions on test data
    • Evaluates performance by:
      • Visualizing confusion matrices
      • Calculating accuracy
      • Generating other metrics with a classification report
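
The notebook steps above can be condensed into one script. The following is a minimal sketch, not the notebooks' actual code: the file names, the temporary working directory, the 80/20 stratified split, `random_state=42`, and the model hyperparameters are all assumptions chosen for illustration, and the seaborn pairplot from the EDA step is left out to keep the example short.

```python
from pathlib import Path
import tempfile

import joblib
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Hypothetical working directory (the repo itself writes to a 'data' dir).
work_dir = Path(tempfile.mkdtemp())

# 1. Generate data: load Iris as a DataFrame and save it to CSV.
iris = load_iris(as_frame=True)
iris.frame.to_csv(work_dir / "iris.csv", index=False)

# 2. Train-test split: reload from CSV, split, and save both parts.
df = pd.read_csv(work_dir / "iris.csv")
train_df, test_df = train_test_split(
    df, test_size=0.2, stratify=df["target"], random_state=42
)
train_df.to_csv(work_dir / "train.csv", index=False)
test_df.to_csv(work_dir / "test.csv", index=False)

# 3. EDA on the training data only: unique values and class counts.
print(train_df.nunique())
print(train_df["target"].value_counts())

# 4. Train both models on the training data and persist them with joblib.
X_train, y_train = train_df.drop(columns="target"), train_df["target"]
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    joblib.dump(model, work_dir / f"{name}.joblib")

# 5. Predict and evaluate: reload each model and score it on the test set.
X_test, y_test = test_df.drop(columns="target"), test_df["target"]
scores = {}
for name in models:
    loaded = joblib.load(work_dir / f"{name}.joblib")
    y_pred = loaded.predict(X_test)
    scores[name] = accuracy_score(y_test, y_pred)
    print(name)
    print(classification_report(y_test, y_pred, target_names=iris.target_names))

# Select the best model using the chosen metric (accuracy here).
best_name = max(scores, key=scores.get)
print(f"Best model by test accuracy: {best_name} ({scores[best_name]:.3f})")
```

Accuracy is used as the selection metric only because it is the one the overview names explicitly. Any other metric from the classification report could be used in its place in the `scores` dictionary.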

Example Results

The two plots below show predicted vs. actual classes for both models tested. The confusion matrices shown are computed on the test set.
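
A plot of this kind could be produced roughly as follows. This is a sketch under assumptions rather than the notebooks' own plotting code: it uses sklearn's `ConfusionMatrixDisplay` (the notebooks may use seaborn instead), and the split parameters, color map, and output file name are hypothetical.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix
from sklearn.model_selection import train_test_split

# Assumed split settings; the notebooks load pre-split CSV files instead.
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, stratify=iris.target, random_state=42
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are actual classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)

disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=iris.target_names)
disp.plot(cmap="Blues")
disp.ax_.set_title("Logistic Regression Confusion Matrix")

# Hypothetical output location for the rendered figure.
out_path = Path(tempfile.mkdtemp()) / "logistic_regression_cm.png"
disp.figure_.savefig(out_path)
```

Swapping `LogisticRegression` for `RandomForestClassifier` and adjusting the title would give the second plot.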

Logistic Regression Confusion Matrix

Random Forest Confusion Matrix
