
cordero-c-perez / machine-learning


This repository houses small projects (R and Python) exploring Machine Learning methods and algorithm description & implementation. Larger ML projects are housed in the Projects Repo.

R 0.60% Jupyter Notebook 99.40%
decision-trees knn machine-learning oner rpart rpartplot random-forest c50 naive-bayes clustering


ML Repo

Note: some data is not provided in the repository. Links are available in the README when the data is not included.

This repository houses an ongoing series of small projects exploring Machine Learning models and implementations. There will typically be three files for applications in R (.r, .rmd, .md) and two files for applications in Python (.ipynb, .md). The file descriptions below include the link to the markdown file and a short summary of the ML implementation.

Files

Applies kNN classification (R) to speech recognition data and aims to identify the “best k” via manual cross-validation over 17 values of k. Overall classification accuracy serves as the measure of performance, and the data used for this project can be found here.
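The search for a “best k” can be sketched as follows. This is a minimal illustration in plain Python, not the project's R code: the data is a synthetic two-class toy set, a single hold-out split stands in for the validation scheme, and the 17 candidate values (odd k from 1 to 33) are an assumption, since the description does not say which values were tried.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, query, k):
    """Majority vote among the k training points closest to `query`."""
    # Squared Euclidean distance gives the same neighbour ordering as Euclidean.
    neighbours = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), label)
        for x, label in zip(train_X, train_y)
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def accuracy_for_k(train_X, train_y, test_X, test_y, k):
    """Share of held-out points whose predicted label matches the true label."""
    hits = sum(knn_predict(train_X, train_y, x, k) == y
               for x, y in zip(test_X, test_y))
    return hits / len(test_y)


def make_toy_data(n_per_class=60, seed=0):
    """Two Gaussian blobs in 2D (hypothetical stand-in for the real features)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, (cx, cy) in enumerate([(0.0, 0.0), (3.0, 3.0)]):
        for _ in range(n_per_class):
            X.append((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)))
            y.append(label)
    return X, y


X, y = make_toy_data()

# Shuffle indices once, then hold out 25% of the points for evaluation.
idx = list(range(len(X)))
random.Random(1).shuffle(idx)
split = int(0.75 * len(idx))
train_idx, test_idx = idx[:split], idx[split:]
train_X = [X[i] for i in train_idx]
train_y = [y[i] for i in train_idx]
test_X = [X[i] for i in test_idx]
test_y = [y[i] for i in test_idx]

# Assumed candidate grid: the 17 odd values 1, 3, ..., 33.
candidate_ks = list(range(1, 34, 2))
results = {k: accuracy_for_k(train_X, train_y, test_X, test_y, k)
           for k in candidate_ks}
best_k = max(results, key=results.get)
print("best k:", best_k, "accuracy:", results[best_k])
```

Odd values of k avoid tied votes in the two-class case, which is one common reason to restrict the grid this way.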

Applies kNN classification (Python) to speech recognition data and aims to evaluate the performance of the out-of-the-box model against a tuned model. Overall classification accuracy, precision, and recall all serve as measures of performance, and the data used for this project can be found here.
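For reference, the three measures named above can be computed directly from true and predicted labels. The sketch below covers only the binary case, and the label vectors are made up for illustration; they are not results from the notebook.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def summarize(y_true, y_pred, positive=1):
    """Accuracy, precision, and recall; a ratio with an empty denominator is 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical labels: one set of predictions per model being compared.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
baseline_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
tuned_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

for name, pred in [("baseline", baseline_pred), ("tuned", tuned_pred)]:
    print(name, summarize(y_true, pred))
```

In this toy case both models have the same accuracy while their recall differs, which is why reporting all three measures gives a fuller comparison than accuracy alone.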

Applies 4 decision tree algorithms in R (C5.0, OneR, rpart, and randomForest) to a diabetes dataset offered in the UCI machine learning repository, with the aim of correctly classifying the presence of diabetes given the presence of other conditions. The goal is then to improve on the models with a business objective in mind, NOT to improve the overall accuracy, and to compare the results. The business objective is that false negatives, when predicting the presence of a condition, matter more than the overall accuracy of classification. Data used for this project can be found here.
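One way to make such a business objective concrete is to score models by a weighted misclassification cost rather than by accuracy. The sketch below is an assumption-laden illustration: the 5:1 cost ratio, the function names, and both prediction vectors are hypothetical and do not come from the project.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def misclassification_cost(y_true, y_pred, fn_cost=5.0, fp_cost=1.0, positive=1):
    """Total cost where a missed positive is charged more than a false alarm."""
    cost = 0.0
    for t, p in zip(y_true, y_pred):
        if t == positive and p != positive:
            cost += fn_cost  # false negative: condition present but missed
        elif t != positive and p == positive:
            cost += fp_cost  # false positive: condition flagged but absent
    return cost


# Hypothetical outcomes for two models on the same ten cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
model_a = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # conservative: misses two positives
model_b = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]  # aggressive: three false alarms

for name, pred in [("A", model_a), ("B", model_b)]:
    print(name, "accuracy:", accuracy(y_true, pred),
          "cost:", misclassification_cost(y_true, pred))
```

Under these assumed weights the model with the lower accuracy has the lower cost, which is the kind of comparison the description above has in mind.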

This can be considered a supervised learning application (R) that uses a common dataset (iris) to explore the distinctions between three clustering techniques: Hierarchical Clustering, k-means, and Density-Based Spatial Clustering of Applications with Noise (DBSCAN). Although clustering is not supervised learning in the traditional sense, this project applies the three methods in a supervised setting, because the data has the correct cluster classification available (species) to check the results against. This is important for identifying subtle differences between the methods and for understanding the built-in assumptions of these functions. The iris dataset is used because it contains only quantitative variables, which removes the trouble of creating a proper distance or dissimilarity matrix for mixed data types (explored in later projects with mixed data). The real takeaway from this application is that, in order to cluster effectively, the measurements chosen as features are far more important than the clustering algorithm itself.
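Checking cluster assignments against known classes can be sketched as below. This is a plain-Python illustration only: the points are a hand-made toy set (not iris), k-means is the only one of the three methods shown, the initial centroids are fixed by hand, and purity is just one possible agreement measure.

```python
from collections import Counter


def sqdist(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, init_centroids, n_iter=20):
    """Lloyd's algorithm with caller-supplied starting centroids."""
    centroids = [tuple(c) for c in init_centroids]
    assign = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        assign = [min(range(len(centroids)), key=lambda j: sqdist(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == centroids:
            break
        centroids = updated
    return assign, centroids


def purity(assign, labels):
    """Share of points that belong to the majority class of their cluster."""
    per_cluster = {}
    for a, label in zip(assign, labels):
        per_cluster.setdefault(a, Counter())[label] += 1
    return sum(c.most_common(1)[0][1] for c in per_cluster.values()) / len(labels)


# Toy data: three known groups, four points each.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (1.1, 1.3),
          (5.0, 5.0), (5.2, 4.9), (4.8, 5.1), (5.1, 5.3),
          (9.0, 1.0), (9.2, 0.8), (8.8, 1.1), (9.1, 1.2)]
labels = ["a"] * 4 + ["b"] * 4 + ["c"] * 4
seeds = [0, 4, 8]  # one hand-picked starting point per group

assign_2d, _ = kmeans(points, [points[i] for i in seeds])
purity_2d = purity(assign_2d, labels)

# Same algorithm, but using only the second measurement as the feature.
second_only = [(p[1],) for p in points]
assign_1d, _ = kmeans(second_only, [second_only[i] for i in seeds])
purity_1d = purity(assign_1d, labels)

print("both features:", purity_2d, "second feature only:", purity_1d)
```

Groups "a" and "c" overlap completely on the second measurement, so dropping the first one lowers purity even though the algorithm is unchanged. That mirrors the takeaway about feature choice.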

Applies Logistic Regression (Python) to the breast cancer dataset with the aim of identifying malignant tissue samples from the sample features. The model does very well, achieving 98% recall and 100% precision. Recall is probably more important from the business perspective here, so an iteration that trades precision for recall would be better suited for business implementation.
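Trading precision for recall is commonly done by lowering the decision threshold applied to the model's predicted probabilities. The sketch below assumes that approach, which the description does not confirm, and the scores and labels are invented for illustration rather than taken from the notebook.

```python
def precision_recall_at(scores, y_true, threshold):
    """Precision and recall when predicting positive for score >= threshold."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 1.0  # no positive calls made
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical predicted probabilities of the positive (malignant) class.
scores = [0.95, 0.90, 0.80, 0.70, 0.45, 0.40, 0.30, 0.20, 0.10, 0.05]
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

for threshold in (0.50, 0.35):
    precision, recall = precision_recall_at(scores, y_true, threshold)
    print(f"threshold={threshold:.2f} precision={precision:.3f} recall={recall:.3f}")
```

With these made-up scores, lowering the threshold catches the one positive case that the default cut-off misses, at the price of one false alarm.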

