Coder Social home page Coder Social logo

predict_drop_out's Introduction

Predict_drop_out

Prediction of student behavior has been a prominant area of research in learning analytics and a major concern for higher education institutions and ed tech companies alike. It is the bedrock of methodology within the world of cognitive tutors and these methods have been exported to other areas within the education technology landscape. The ability to predict what a student is likely to do in the future so that interventions can be tailored to them has seen major growth and investment, though implementation is non-trivial and expensive. Although some institutions, such as Purdue University, have seen success we are yet to see widespread adoption of these approaches as they tend to be highly institution specific and require very concrete outcomes to be useful.

In this project I will model student data using three flavors of tree algorithm. We will be using these algorithms to attempt to predict which students drop out of courses. Many universities have a problem with students over-enrolling in courses at the beginning of semester and then dropping most of them as the make decisions about which classes to attend. This makes it difficult to plan for the semester and allocate resources. However, schools don't want to restrict the choice of their students. One solution is to create predictions of which students are likley to drop out of which courses and use these predictions to inform semester planning.

We will be using the tree algorithms to build models of which students are likely to drop out of which classes.

Software

In order to generate our models we will need several packages. The first package we should install is caret.There are many prediction packages available and they all have slightly different syntax. caret is a package that brings all the different algorithms under one hood using the same syntax.

The other packages we install is party and C50.

Data

The data comes from a university registrar's office. The code book for the variables are available in the file code-book.txt. Examine the variables and their definitions. Upload the drop-out.csv data into R as a data frame.

The next step is to separate the data set into a training set and a test set. Randomly select 25% of the students to be the test data set and leave the remaining 75% for your training data set.

length(unique(dropout$student_id))*0.75 # the result shows that 75% of students if 511

inTrain<-data.frame(sample(unique(dropout$student_id), size=511))
names(inTrain)<-c("student_id")

training<-inner_join(dropout, inTrain, by="student_id")
testing<-anti_join(dropout, inTrain, by="student_id")

Here I will be predicting the student level variable "complete". Visualize the relationships between the chosen variables as a scatterplot matrix, and explore relationships with other visualizations.

scatter

eda1

eda2

CART Trees

Construct a classification tree that predicts complete using the caret package.

Check the results

cart

Plot ROC against complexity

roc

Now predict results from the test data. Generate model statistics

predict1

Conditional Inference Tree

Train a Conditional Inference Tree using the party package on the same training data and examine the results.

# define the prediction model
partyFit<-ctree(complete~., data=training2)
# plot the tree
plot(partyFit)

party

Now test the new Conditional Inference model by predicting the test data and generating model fit statistics.

predict2

C50

Install the C50 package, train and then test the C5.0 model on the same data.

# Train the model using "C5.0"
c50Fit<-C5.0(training2[,-4], training2[,4])

# predict with c50Fit model
c50Classes<-predict(c50Fit,newdata=testing2)

confusionMatrix(c50Classes, testing2$complete)

predict3

Relevant Materials

Brooks, C. & Thompson, C. (2017). Predictive modelling in teaching and learning. In The Handbook of Learning Analytics. SOLAR: Vancouver, BC

Jalayer Academy. (2015). R - Classification Trees (part 1 using C5.0)

The caret package

predict_drop_out's People

Contributors

ab4499 avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.