intro_to_machine_learning's Introduction

Intro to Machine Learning

Machine Learning Algorithms from the Udacity Intro to Machine Learning Nanodegree

Data sets and Questions

The Enron fraud is a big, messy and totally fascinating story about corporate malfeasance of nearly every imaginable type. The Enron email and financial datasets are also big, messy treasure troves of information, which become much more useful once you know your way around them a bit. Find out about the Enron data set in the explore_enron_data Jupyter Notebook.

Enron story

  • One of the largest cases of corporate fraud in American history; the Enron Corpus contains real emails from the company
  • Use it to look for patterns in the emails of people who were persons of interest in the fraud case, and see whether those patterns can identify them
  • Regression: understand the relationship between the salaries of people at Enron and their bonuses
  • Clustering (a type of unsupervised learning): separate who within the organization was a member of the board of directors from who was a regular employee
    • Example: Netflix uses clustering to identify types of people by their movie choices (clusters of users)
  • Outlier detection and removal: find lines in the data set that are essentially bugs and clean them out manually

Person of Interest (POI) - Target

  • Indicted
  • Settled without admitting guilt
  • Testified in exchange for immunity

Regression

Model continuous data using linear regression and use regression to predict financial data for Enron employees and associates in the regressions_enron_data Jupyter Notebook.
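
A minimal sketch of the idea, assuming scikit-learn's LinearRegression and a handful of made-up salary and bonus figures (the numbers are invented for illustration and are not taken from the Enron data or the notebook):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical (salary, bonus) pairs; here bonus = 2 * salary - 50_000 exactly.
salary = np.array([[100_000.0], [200_000.0], [300_000.0], [400_000.0]])
bonus = np.array([150_000.0, 350_000.0, 550_000.0, 750_000.0])

# Fit bonus as a linear function of salary.
reg = LinearRegression().fit(salary, bonus)

slope = reg.coef_[0]            # change in bonus per unit of salary
intercept = reg.intercept_      # predicted bonus at salary 0
r_squared = reg.score(salary, bonus)

# Predict the bonus for a salary that was not in the training data.
predicted = reg.predict(np.array([[250_000.0]]))[0]
print(slope, intercept, r_squared, predicted)
```

On real data the fit will not be exact, so the score is worth checking on held-out points as well as on the training set.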

Outliers

Outlier detection and removal in the enron_outliers Jupyter Notebook.

#1 Outliers - Rejection Algorithm

  • Fit a regression and find the 10% of points with the largest residuals relative to that fit
  • Remove them
  • Re-train on the remaining points
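
The three steps above can be sketched in plain Python. This is an illustration under assumptions, not the notebook's code: the helper names fit_line and reject_outliers are hypothetical, the regression is a one-variable least-squares line, and the data is synthetic.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def reject_outliers(xs, ys, fraction=0.10):
    """Drop the given fraction of points with the largest absolute residuals."""
    slope, intercept = fit_line(xs, ys)
    residuals = [abs(y - (slope * x + intercept)) for x, y in zip(xs, ys)]
    n_drop = int(len(xs) * fraction)
    # Indices ordered by residual; keep all but the n_drop largest.
    by_residual = sorted(range(len(xs)), key=lambda i: residuals[i])
    keep = sorted(by_residual[: len(xs) - n_drop])
    return [xs[i] for i in keep], [ys[i] for i in keep]


# Synthetic data on the line y = 2x + 1, with one injected outlier.
xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
ys[5] = 100.0

clean_xs, clean_ys = reject_outliers(xs, ys)   # steps 1 and 2
slope, intercept = fit_line(clean_xs, clean_ys)  # step 3: re-train
print(len(clean_xs), slope, intercept)
```

The 10% figure is a fixed budget, so this removes points even when the data has no real outliers; inspecting what was dropped is a sensible follow-up.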

#2 Outliers in the Enron finance data

  • Get acquainted with some of the outliers in the Enron finance data
  • Learn whether and how to remove them

Clustering - Unsupervised Learning

Learn about what unsupervised learning is and find out how to use scikit-learn's k-means algorithm in the enron_clustering Jupyter Notebook.
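
As a rough sketch of k-means in scikit-learn, the example below clusters six made-up 2D points. The points, the choice of two clusters, and the random_state value are assumptions for illustration and do not come from the notebook.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two visibly separated groups of hypothetical 2D points.
points = np.array([
    [1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
    [8.0, 8.0], [8.2, 7.9], [7.9, 8.1],
])

# n_init and random_state are set explicitly so repeated runs agree.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)

labels = kmeans.labels_             # cluster index assigned to each point
centers = kmeans.cluster_centers_   # one centroid per cluster
print(labels, centers)
```

The cluster indices themselves are arbitrary; only which points share an index is meaningful.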

Feature Scaling

Apply MinMaxScaler to the salary and exercised_stock_options features from the Enron dataset in the previous enron_clustering Jupyter Notebook to make better predictions about POIs.
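
A small sketch of what MinMaxScaler does with its default settings: each column is rescaled independently to the range [0, 1] using (x - min) / (max - min). The salary and stock-option values below are invented for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical rows of (salary, exercised_stock_options).
features = np.array([
    [200_000.0, 1_000_000.0],
    [400_000.0, 3_000_000.0],
    [600_000.0, 5_000_000.0],
])

scaler = MinMaxScaler()
scaled = scaler.fit_transform(features)  # per-column min maps to 0, max to 1
print(scaled)
```

Scaling matters for k-means because distances are otherwise dominated by the feature with the larger numeric range.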

Text Learning

Find out how to use text data in your machine learning algorithm. Use sklearn TfidfVectorizer to convert a collection of raw documents to a matrix of TF-IDF features. Check it out in the enron_text_learning Jupyter Notebook.
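
A minimal sketch of TfidfVectorizer on three made-up sentences. The documents and the stop_words="english" setting are assumptions for illustration; the notebook may configure the vectorizer differently.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical documents standing in for email bodies.
docs = [
    "Please review the quarterly earnings report",
    "The meeting about earnings is moved to Friday",
    "Lunch on Friday?",
]

# Drop common English words, then weight the remaining terms by TF-IDF.
vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(docs)   # sparse matrix: documents x terms

vocabulary = vectorizer.vocabulary_      # maps each term to its column index
print(tfidf.shape, sorted(vocabulary))
```

Each row of the matrix is one document and each column is one term, so the result can be passed directly to a scikit-learn classifier.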

Feature Selection

Learn when and why to use feature selection, and use a scikit-learn classifier's feature_importances_ attribute to find outliers in text data, in the enron_feature_selection Jupyter Notebook.
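
One way to read feature_importances_ is sketched below, assuming a decision tree classifier and a tiny hand-built dataset in which one feature gives the label away. The data and the 0.2 cutoff are assumptions for illustration, not values confirmed by the notebook.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data: column 1 equals the label, columns 0 and 2 carry no signal.
X = np.array([
    [0, 0, 0], [1, 0, 0], [0, 0, 1], [1, 0, 1],
    [0, 1, 0], [1, 1, 0], [0, 1, 1], [1, 1, 1],
])
y = X[:, 1]

clf = DecisionTreeClassifier(random_state=0).fit(X, y)
importances = clf.feature_importances_   # one value per feature, summing to 1

# Flag features whose importance exceeds an assumed threshold.
threshold = 0.2
suspicious = [i for i, imp in enumerate(importances) if imp > threshold]
print(importances, suspicious)
```

In text data, a flagged column can be mapped back to its word through the vectorizer's vocabulary to check whether that word is a leak, such as a signature, that should be removed.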

Principal Component Analysis (PCA)

Learn about data dimensionality and how to reduce the number of dimensions with principal component analysis (PCA) in the eigenfaces Jupyter Notebook, an example that follows the scikit-learn example "Faces recognition using eigenfaces and SVMs".
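
A compact sketch of dimensionality reduction with scikit-learn's PCA, using synthetic 2D points instead of face images so that it runs without downloading a dataset. The data and the choice of one component are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic points lying close to the line y = 2x, plus a little noise.
rng = np.random.RandomState(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2.0 * t + 0.01 * rng.normal(size=200)])

# Keep only the direction of greatest variance.
pca = PCA(n_components=1).fit(X)
reduced = pca.transform(X)                        # shape (200, 1)

explained = pca.explained_variance_ratio_[0]      # share of variance retained
direction = pca.components_[0]                    # unit vector of the component
print(reduced.shape, explained, direction)
```

With face images the same calls apply, with each image flattened into one row and the retained components playing the role of eigenfaces.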

Cross-Validation

Learn more about training, testing, cross-validation and parameter grid searches in the enron_validation Jupyter Notebook.
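
The pieces named above can be combined as in the sketch below, assuming scikit-learn's train_test_split, GridSearchCV and an SVC classifier on synthetic data. The dataset, parameter grid and split sizes are assumptions for illustration and are not taken from the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic classification data standing in for the Enron features.
X, y = make_classification(
    n_samples=200, n_features=5, n_informative=3, n_redundant=0, random_state=0
)

# Hold out 30% of the data for a final test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Try every parameter combination with 5-fold cross-validation on the training set.
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=5).fit(X_train, y_train)

best_params = search.best_params_
test_accuracy = search.score(X_test, y_test)  # accuracy of the refit best model
print(best_params, test_accuracy)
```

Keeping the test set out of the grid search means the final accuracy is measured on data that played no part in choosing the parameters.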

intro_to_machine_learning's People

Contributors

anahristian
