Coder Social home page Coder Social logo

l03-svm's Introduction

L03 SVM

Clone or download the repository as a zip folder and begin an R project for this lab. The directory contains instructions (repeated below) and a template to get an .Rmd file started.

Overview

The main goal of this lab is to continue practicing the application of support vector machines (SVMs).

Dataset

We have split the wildfires.csv dataset into a training dataset (wildfires_train.csv) and test dataset (wildfires_test.csv). They are contained in the data subdirectory along with a codebook.

Exercises

Located in the northeast of the wilderness area is a wildlife protection zone. It is home to several rare and endangered species, and thus conservationists and park rangers are very interested in whether a given wildfire is likely to reach it. The following exercises involve fitting models to predict if a wildfire reaches this zone. Please complete the exercises; be sure your solutions are clearly indicated and that the document is neatly formatted.

Exercise 1

In the following exercise, you will be fitting and tuning several different candidate models. You will then be asked to compare the tuned models. The data furnished to you for this lab are already split into a training and test set. Describe how you will use these two sets of data to tune and compare these candidate models. Will you need to create additional splits? Where will you apply cross-validation (if at all)? Justify your approach.

Exercise 2

Our goal is to predict whether a wildfire will reach the wildlife protection zone, as denoted by the indicator variable wlf. Previously we utilized boosting, bagging, and random forests methods to build candidate models. We also benchmarked those methods against a logistic regression model that utilized all predictors. In this lab, you will use a support vector classifier and support vector machines.

A. Tune a support vector classifier with a linear kernel to select an optimal cost. Consider values in the range 0.01 to 10. Compute the training and test error rates using this new value for cost. (Hint: You can use the tune() function or crossv_kfold() to tune your model.)

B. Tune a support vector classifier with a radial kernel to select an optimal cost. Use the default value for gamma and try to determine your own range of values for cost to explore.

C. Tune a support vector classifier with a polynomial kernel to select an optimal cost. Use the default value for degree and try to determine your own range of values for cost to explore.

D. Calculate the test error for each of the 3 candidate classifiers selected in parts (A) - (C). Which classifier is the best?

E. Construct a table displaying the test error for these 3 candidate classifiers, the candidate models from L02 CART, and the multiple logistic regression fit in L02 CART.

Challenge 1 (Not Mandatory)

Use the ggroc() function from the pROC package to construct ROC curves for the SVM candidate classifiers from above. Do this for each classifier on both the training and testing sets.


Challenge 2 (Not Mandatory)

Expand your solution to Exercise 2(B) where you fit a radial kernel SVM, and tune your model to find an optimal value of gamma. You can either tune gamma fixing the cost to the optimal value you found in Exercise 2(B). Alternatively, you can jointly tune both gamma and cost.

Challenge 3 (Not Mandatory)

Expand your solution to Exercise 2(C) where you fit a polynomial kernel SVM, and tune your model to find an optimal value of degree. You can either tune degree fixing the cost to the optimal value you found in Exercise 2(B). Alternatively, you can jointly tune both degree and cost.

l03-svm's People

Contributors

j3schaue avatar akuyper avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.