anktplwl91 / amazon_reviews

Multilabel classification of Amazon reviews based on the provided title and complete review text

Jupyter Notebook 100.00%
machine-learning data-science data-analysis python3 jupyter-notebook

Amazon_Reviews

Problem Statement

We are given Amazon reviews, with the review title and the complete review text as input features, and must use them to predict the labels for unseen reviews.

The train file has three columns: Review Title, Review Text, and Topic (the target variable). The test file has only Review Title and Review Text.

Data Cleaning

As in any data science project, data cleaning is an essential first step. The reviews are written by users on the Amazon website and may contain noisy words, expressions, numbers, or phrases that add nothing to model performance, or that even degrade it. I used regular expressions to strip out quotations, web links, and a few noisy expressions containing a Video ID.
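A minimal sketch of this kind of regex cleaning is shown here. The exact patterns used in the notebook are not given in this README, so everything in the block is an assumption for illustration: the function name clean_review, the [[VIDEOID:...]] marker format, and the choice to remove only double quotation marks.

```python
import re

# Hypothetical patterns; the notebook's actual expressions are not shown in the README.
VIDEO_ID_RE = re.compile(r"\[\[VIDEOID:[^\]]*\]\]", re.IGNORECASE)  # assumed marker format
URL_RE = re.compile(r"(?:https?://|www\.)\S+")                      # web links
QUOTE_RE = re.compile(r'["\u201c\u201d]')                           # straight and curly double quotes
WS_RE = re.compile(r"\s+")


def clean_review(text: str) -> str:
    """Remove video-ID markers, web links and double quotes, then tidy whitespace."""
    text = VIDEO_ID_RE.sub(" ", text)
    text = URL_RE.sub(" ", text)
    text = QUOTE_RE.sub("", text)
    return WS_RE.sub(" ", text).strip()


sample = '[[VIDEOID:abc123]] "Great" product, see https://example.com/x now'
cleaned = clean_review(sample)
print(cleaned)
```

Apostrophes are left in place so that contractions such as "don't" survive; whether that suits the downstream tokenizer is a design choice to revisit.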

Models Used

I experimented with three main kinds of model: XGBoost, NB-SVM, and deep learning (Bi-LSTM, LSTM-Conv1D). After training the initial models, I used feature_importances from the trained models to keep only the features that were affecting the target variable. With this small subset of features, I then searched for optimal hyperparameters using randomized search (RandomizedSearchCV in scikit-learn) on the reduced feature space.
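The select-then-tune workflow described above could look roughly like the following sketch. It is not the notebook's code: it swaps in scikit-learn's RandomForestClassifier for XGBoost to stay dependency-light, uses synthetic data in place of the vectorized reviews, and the mean-importance threshold and parameter ranges are assumptions.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for the vectorized review features (assumption).
X, y = make_classification(
    n_samples=300, n_features=40, n_informative=5, random_state=0
)

# Step 1: train an initial model and read its per-feature importances.
base = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
importances = base.feature_importances_

# Step 2: keep only features whose importance exceeds the mean (assumed threshold).
mask = importances > importances.mean()
X_small = X[:, mask]

# Step 3: randomized hyperparameter search on the reduced feature space.
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={
        "n_estimators": randint(20, 100),
        "max_depth": randint(2, 10),
    },
    n_iter=5,
    cv=3,
    random_state=0,
)
search.fit(X_small, y)

print(X.shape[1], "features reduced to", X_small.shape[1])
print("best parameters:", search.best_params_)
```

Searching on the reduced matrix keeps each fit cheap, which is what makes trying several random configurations practical.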
