InstaCart Market Basket

Which products will an Instacart consumer purchase again?

This project is based on the data provided by InstaCart during the Instacart Market Basket Analysis Kaggle competition (https://www.kaggle.com/c/instacart-market-basket-analysis).

Setup

Download the data provided by InstaCart from the Data page of the Kaggle competition linked above.

Be sure to download all of the files except for sample_submission.csv.zip; this project does not make use of that file.

Once downloaded, make sure that all of the .csv files are extracted into a subdirectory called Source.

If you find that your machine of choice lacks the computing power to work with the datasets as they are, then we recommend you run /DataPrep/OriginalSamplingCode.R, which produces randomly sampled orders and order_products datasets covering 20% of users (sampled by user_id) in a reproducible manner. The percentage sampled can be configured in the file. These files can be found in the Source directory.

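As a rough illustration of the sampling idea (a hedged sketch, not the exact contents of OriginalSamplingCode.R; the output file names here are made up for the example):

```r
library(dplyr)

# Read the full datasets from the Source directory.
orders         <- read.csv("Source/orders.csv")
order_products <- read.csv("Source/order_products__prior.csv")

# Reproducibly sample 20% of users by user_id (the fraction is configurable).
set.seed(2017)
sample_frac   <- 0.20
all_users     <- unique(orders$user_id)
sampled_users <- sample(all_users, size = floor(sample_frac * length(all_users)))

# Keep only the orders (and their order_products rows) belonging to sampled users.
orders_sample         <- orders %>% filter(user_id %in% sampled_users)
order_products_sample <- order_products %>% semi_join(orders_sample, by = "order_id")

# Write the sampled datasets back to the Source directory.
write.csv(orders_sample,         "Source/orders_sample.csv",         row.names = FALSE)
write.csv(order_products_sample, "Source/order_products_sample.csv", row.names = FALSE)
```
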
If you would like to see the predictions made by our models, you will first need to run /DataPrep/OriginalSamplingCode.R on the desired data (sampled or not). This formats the data for the predictive models. The output will be written to /Source/trainingData.csv and /Source/testingData.csv.

After the training and testing datasets have been created, you can run the gradient boosting model on the data and analyze its results in /PredictiveAnalysis/PredictiveAnalysis_Orders.Rmd.

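If you just want to see the shape of such a model, a minimal gradient boosting sketch on the prepared training data might look like the following (this uses the xgboost package and a reordered target column purely as assumptions; the actual model and its tuning live in the Rmd):

```r
library(xgboost)

train <- read.csv("Source/trainingData.csv")
test  <- read.csv("Source/testingData.csv")

# Assumption: `reordered` is the 0/1 target column; everything else is a numeric feature.
x_train <- as.matrix(train[, setdiff(names(train), "reordered")])
x_test  <- as.matrix(test[,  setdiff(names(test),  "reordered")])

# Fit a binary-logistic gradient boosting model.
bst <- xgboost(data = x_train, label = train$reordered,
               nrounds = 100, max_depth = 6, eta = 0.1,
               objective = "binary:logistic", verbose = 0)

# Predicted reorder probabilities on the testing data.
probs <- predict(bst, x_test)
```
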
Run the deep learning model

Download Anaconda and Jupyter Notebook. The packages needed are sklearn, pandas, numpy, and keras. The data used are /Source/trainingData.csv and /Source/testingData.csv. The model takes about two hours to run. The training data is divided into an 80% train / 20% test split; after the model is built on the training portion, it is used to make predictions on the testing data.

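The notebook itself is written in Python; the following is only a hedged sketch of the same pipeline (load the prepared data, hold out 20%, fit a small network, predict on the testing data) using R's keras interface, with the reordered target column and the layer sizes as assumptions rather than the notebook's actual code:

```r
library(keras)

train <- read.csv("Source/trainingData.csv")
test  <- read.csv("Source/testingData.csv")

# Assumption: `reordered` is the 0/1 target; the remaining columns are numeric features.
x <- as.matrix(train[, setdiff(names(train), "reordered")])
y <- train$reordered

# 80% train / 20% held-out split, as described above.
set.seed(2017)
idx <- sample(nrow(x), size = floor(0.8 * nrow(x)))

model <- keras_model_sequential() %>%
  layer_dense(units = 32, activation = "relu", input_shape = ncol(x)) %>%
  layer_dense(units = 1, activation = "sigmoid")

model %>% compile(optimizer = "adam",
                  loss = "binary_crossentropy",
                  metrics = "accuracy")

model %>% fit(x[idx, ], y[idx],
              epochs = 10, batch_size = 512,
              validation_data = list(x[-idx, ], y[-idx]))

# Predictions on the separate testing data.
x_test <- as.matrix(test[, setdiff(names(test), "reordered")])
pred   <- predict(model, x_test)
```
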
Run the logistic regression model

The file titled Logit_originalDf.R runs a logistic regression model on the above-mentioned data frame. The rest of the logit files run logistic regression on wrangled data sets. The first wrangling of the original data is a data frame titled data_final, which is created in the file PredictiveDataPrep_alt in the DataPrep folder. Other subsets of data_final come from the file PredictiveDataPrep_sample. Among those subsets, the analysis uses df.ordProd, a filtered version of data_final containing users who have ordered 15 or more times and purchased at least 100 products, and df.ordProd900, a sample of 900 users from df.ordProd. Both PredictiveDataPrep_alt and PredictiveDataPrep_sample contain code that saves these data frames to the Source folder.

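As a hedged sketch of how df.ordProd and df.ordProd900 are described above (assuming data_final has user_id, order_id, and product_id columns; this is not the exact code in PredictiveDataPrep_sample):

```r
library(dplyr)

# Assumption: data_final has one row per (user_id, order_id, product_id).
qualifying_users <- data_final %>%
  group_by(user_id) %>%
  summarise(n_orders   = n_distinct(order_id),
            n_products = n_distinct(product_id)) %>%
  filter(n_orders >= 15, n_products >= 100)

# df.ordProd: users with 15 or more orders and at least 100 products purchased.
df.ordProd <- data_final %>% semi_join(qualifying_users, by = "user_id")

# df.ordProd900: a reproducible sample of 900 of those users.
set.seed(2017)
users900      <- sample(unique(df.ordProd$user_id), 900)
df.ordProd900 <- df.ordProd %>% filter(user_id %in% users900)

# Save to the Source folder, as the prep scripts do.
save(df.ordProd, df.ordProd900, file = "Source/ordProd.RData")
```
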
Some of the data frames under PredictiveDataPrep_sample come from clustering files, so run the clustering files first before any logit analysis that uses clustered user_id or product_name. The clustered data frames used by the logit analysis are clust3_Uid, clust4_Uid, and clust3_Prod.

The data is divided into 70% train and 30% test. As with the other models, the model is built on the training data and then used to make predictions on the testing data.

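A minimal sketch of that 70/30 split and logit fit (assuming a data frame df with a 0/1 reordered target; not the exact code in the Logit_* files):

```r
# Reproducible 70% train / 30% test split.
set.seed(2017)
train_idx <- sample(nrow(df), size = floor(0.70 * nrow(df)))
train <- df[train_idx, ]
test  <- df[-train_idx, ]

# Fit logistic regression on the training split.
fit <- glm(reordered ~ ., data = train, family = binomial)

# Predict probabilities on the 30% testing split and threshold at 0.5.
probs <- predict(fit, newdata = test, type = "response")
preds <- as.integer(probs > 0.5)
mean(preds == test$reordered)   # simple accuracy check
```
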
Run the clusters

All cluster files whose names begin with PredictiveAnalysis_Clusterby require df.ordProd900, which is created in the file PredictiveDataPrep_sample under the DataPrep folder. These files contain code that saves the resulting data to the Source folder.

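As a rough sketch of the clustering step (k-means on simple per-user features derived from df.ordProd900; the feature choice, the reordered column, and the output file name are assumptions, not the exact PredictiveAnalysis_Clusterby code):

```r
library(dplyr)

# Assumption: build simple per-user features from df.ordProd900.
user_features <- df.ordProd900 %>%
  group_by(user_id) %>%
  summarise(n_orders     = n_distinct(order_id),
            n_products   = n_distinct(product_id),
            reorder_rate = mean(reordered))

# k-means with 3 clusters on scaled features (clust4_Uid would use centers = 4).
set.seed(2017)
km <- kmeans(scale(user_features[, -1]), centers = 3, nstart = 25)

# Attach the cluster label to each user, analogous to clust3_Uid.
clust3_Uid <- user_features %>% mutate(cluster = km$cluster)

# Save to the Source folder, as the cluster files do.
save(clust3_Uid, file = "Source/clust3_Uid.RData")
```
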
Citation

The data used in this project was made publicly available courtesy of InstaCart.

"The Instacart Online Grocery Shopping Dataset 2017", Accessed from https://www.instacart.com/datasets/grocery-shopping-2017 on 11/08/2017.
