research-about-ml-regularized-algorithms's Introduction

Research Project about Machine Learning

Brief Introduction

This project was conducted by Professor Amanda Montoya and me, and it was invited to the 2019 Symposium on Data Science & Statistics for a poster presentation. The poster and presentation slides are included in this repository. Please feel free to check them out.

Title

Comparing Performance of Lasso, Group Lasso, and Linear Regression with Categorical Predictors

Abstract

Machine learning is used frequently to train models and predict outcomes in many scientific areas. Lasso is a method that performs variable selection and regularization, and it is often regarded as an advanced version of linear regression. People often use lasso in the same way as linear regression, assuming the two share the same properties. For models with categorical predictors, group lasso has been suggested as an alternative to lasso that aligns more closely with the properties of linear regression. The goal of this project is to show that linear regression, lasso, and group lasso have distinct pros and cons and should be treated accordingly. By analyzing wage data with 6 variables and 20 categories in total, we determined that lasso predicts better than group lasso, which in turn predicts better than linear regression. We also analyzed the effect of different coding strategies on the predicted results. Linear regression is not affected by the choice of coding strategy. Lasso, however, builds models with different selected variables when different coding strategies are used for categorical predictors. Group lasso fixes the issue with coding strategy, but it can cause overfitting. Using Monte Carlo simulation, we created a categorical predictor with one dominant category and several non-predictive categories. When there are few non-predictive categories, group lasso is more likely than lasso to include the categorical variable with only one dominant category. As the number of non-predictive categories increases, group lasso becomes less likely than lasso to include this categorical variable. Researchers primarily focus on the similarity between linear regression and lasso, but pay little attention to their different properties, particularly those involving categorical predictors. This project demonstrates that when using lasso, the effect of the coding strategy should be considered, and that group lasso should be avoided when a dominant category is expected.
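The coding-strategy effect described in the abstract can be illustrated with a minimal sketch. This is not the project's analysis code: the six-row data set, the level names `A`/`B`/`C`, the penalty value `0.2`, and the helper functions `dummy_code`, `fit_lasso`, and `predict` are all hypothetical, and the lasso objective is assumed to be `(1/2n) * RSS + lam * sum(|b_j|)` with an unpenalized intercept. The sketch fits the same categorical predictor under two dummy (treatment) codings that differ only in the reference level, once with no penalty (ordinary least squares) and once with a lasso penalty.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the lasso update step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def dummy_code(levels, reference, all_levels):
    """Treatment coding: one 0/1 column per non-reference level."""
    kept = [lv for lv in all_levels if lv != reference]
    return [[1.0 if lv == k else 0.0 for k in kept] for lv in levels]


def predict(X, b0, b):
    """Fitted values for intercept b0 and coefficients b."""
    return [b0 + sum(xk * bk for xk, bk in zip(row, b)) for row in X]


def fit_lasso(X, y, lam, sweeps=500):
    """Coordinate descent for (1/2n)*RSS + lam*sum|b_j| (assumed objective).

    The intercept is not penalized. With lam=0 the iteration converges
    to the ordinary least-squares fit on this small full-rank design.
    """
    n, p = len(y), len(X[0])
    b0, b = 0.0, [0.0] * p
    for _ in range(sweeps):
        # Intercept update: mean of the residuals without the intercept.
        b0 = sum(yi - fi for yi, fi in zip(y, predict(X, 0.0, b))) / n
        for j in range(p):
            # Correlation of column j with the residual that excludes column j.
            rho = sum(
                X[i][j] * (y[i] - b0 - sum(X[i][k] * b[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            b[j] = soft_threshold(rho, lam) / z
    return b0, b


# Hypothetical toy data: one categorical predictor with three levels.
ALL_LEVELS = ["A", "B", "C"]
levels = ["A", "A", "B", "B", "C", "C"]
y = [0.9, 1.1, 1.9, 2.1, 4.9, 5.1]

results = {}
for lam in (0.0, 0.2):          # 0.0 = least squares, 0.2 = lasso
    for ref in ("A", "C"):      # two choices of reference level
        X = dummy_code(levels, ref, ALL_LEVELS)
        b0, b = fit_lasso(X, y, lam)
        results[(lam, ref)] = {
            "fitted": predict(X, b0, b),
            "n_selected": sum(coef != 0.0 for coef in b),
        }
        print(f"lam={lam} reference={ref} intercept={b0:.3f} "
              f"coefs={[round(c, 3) for c in b]}")
```

In this toy setting the unpenalized fits agree across the two codings, while the penalized fits do not: with reference level `A` the dummy for `B` is shrunk to zero, whereas with reference level `C` both dummies stay in the model. The reason is that the penalty shrinks each level toward the reference level, so changing the reference changes which differences are penalized.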

research-about-ml-regularized-algorithms's People

Contributors

charlotte0408
