ankit152 / stackoverflow-tag-prediction

A machine learning model that predicts tags for a given question and body.

License: MIT License

Jupyter Notebook 100.00%
stackoverflow tag-prediction machine-learning onevsrestclassifier hamming-loss micro-f1score nlp stemming text-mining text-data

Stack Overflow Tag Prediction 🏷️

Dataset Link: https://www.kaggle.com/imoore/60k-stack-overflow-questions-with-quality-rate

For developers, by developers 👨‍💻

Stack Overflow is an open community for anyone that codes. They help you get answers to your toughest coding questions, share knowledge with your coworkers in private, and find your next dream job.

For businesses, by developers 🕴️

Their mission is to help developers write the script of the future. This means helping you find and hire skilled developers for your business and providing them the tools they need to share knowledge and work effectively.

Problem Definition 🤔

Given the Title and the Body of a question, we have to predict the relevant tags so that the question is recommended to the right domain experts, who can then answer it correctly.
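
One common way to frame this as a multi-label problem is one-vs-rest classification over TF-IDF features (the repository topics mention `onevsrestclassifier`). The following is a minimal sketch, assuming scikit-learn is installed. The toy questions, the tag sets, and the choice of `LogisticRegression` as the base estimator are illustrative assumptions, not necessarily what the notebook does.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

# Toy data (hypothetical): title and body already joined into one string.
texts = [
    "How to merge two pandas dataframes in python",
    "Python list comprehension with condition",
    "Java NullPointerException when calling method",
    "Read csv file with pandas",
    "Java string comparison with equals",
    "Sort array of objects in javascript",
]
tags = [
    ["python", "pandas"],
    ["python"],
    ["java"],
    ["python", "pandas"],
    ["java"],
    ["javascript"],
]

# Encode each tag set as a binary indicator row (one column per tag).
mlb = MultiLabelBinarizer()
y = mlb.fit_transform(tags)

# Turn the text into TF-IDF features.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)

# Train one independent binary classifier per tag.
model = OneVsRestClassifier(LogisticRegression(max_iter=1000))
model.fit(X, y)

# Predict tags for an unseen question and map indicator rows back to tag names.
new_X = vectorizer.transform(["pandas dataframe groupby in python"])
predicted = mlb.inverse_transform(model.predict(new_X))
print(predicted)
```

Because each tag gets its own binary classifier, a question can receive zero, one, or several tags, which matches the multi-tag nature of Stack Overflow questions.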

Business Constraints ✔️

  • To predict as many tags as possible with very high precision and recall.
  • Incorrect tags could impact the customer experience on Stack Overflow.
  • No strict latency constraints. The model should be able to generate the relevant tags in a reasonable amount of time.
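
The precision and recall requirement can be made concrete with micro-averaged scores and Hamming loss (the repository topics list `micro-f1score` and `hamming-loss`). The sketch below computes both in plain Python from sets of tags. The function names and the example data are hypothetical, and the Hamming loss helper assumes every predicted tag belongs to `all_tags`.

```python
def micro_scores(true_tags, pred_tags):
    """Micro-averaged precision, recall and F1 over lists of tag sets."""
    tp = fp = fn = 0
    for truth, pred in zip(true_tags, pred_tags):
        truth, pred = set(truth), set(pred)
        tp += len(truth & pred)   # tags predicted and correct
        fp += len(pred - truth)   # tags predicted but wrong
        fn += len(truth - pred)   # tags missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def hamming_loss(true_tags, pred_tags, all_tags):
    """Fraction of (question, tag) slots where prediction and truth disagree."""
    wrong = sum(len(set(t) ^ set(p)) for t, p in zip(true_tags, pred_tags))
    return wrong / (len(true_tags) * len(all_tags))


# Hypothetical example: three questions, a vocabulary of four tags.
all_tags = ["java", "javascript", "pandas", "python"]
true_tags = [{"python", "pandas"}, {"java"}, {"javascript"}]
pred_tags = [{"python"}, {"java", "python"}, {"javascript"}]

precision, recall, f1 = micro_scores(true_tags, pred_tags)
loss = hamming_loss(true_tags, pred_tags, all_tags)
print(precision, recall, f1, loss)
```

Micro-averaging pools true positives, false positives, and false negatives across all tags before computing the scores, so frequent tags weigh more than rare ones.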

Data 🗄️

  • train.csv = 48 MB
  • test.csv = 16 MB

The data consists of 6 columns.

  1. Id: Represents the ID of the question
  2. Title: Represents the title of the question
  3. Body: Represents the body of the question where the question is explained properly
  4. Tags: The tags relevant for the question asked
  5. CreationDate: The date on which the question was asked
  6. Type: The quality rating of the question

The most important columns for this task are Title, Body and Tags.
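
As a rough illustration of how these three columns might be prepared, the sketch below reads rows with the standard `csv` module, joins Title and Body into one text field, and splits the Tags string into a list. It assumes tags are stored in the form `<python><pandas>` and that Body contains HTML. The two-row sample, its values, and the helper names are hypothetical.

```python
import csv
import io
import re

# Hypothetical two-row sample with the same six columns as train.csv.
SAMPLE_CSV = '''Id,Title,Body,Tags,CreationDate,Type
1,How to merge dataframes?,<p>I have two dataframes.</p>,<python><pandas>,2016-01-01 00:00:00,HQ
2,NullPointerException in loop,<p>My code crashes.</p>,<java>,2016-01-02 00:00:00,LQ_EDIT
'''


def parse_tags(raw):
    """Split a string such as '<python><pandas>' into ['python', 'pandas']."""
    return re.findall(r"<([^<>]+)>", raw)


def strip_html(text):
    """Crude removal of HTML tags; a real pipeline may want a proper parser."""
    return re.sub(r"<[^>]+>", " ", text).strip()


def load_rows(handle):
    """Yield (text, tags) pairs, joining Title and cleaned Body into one string."""
    for row in csv.DictReader(handle):
        text = f"{row['Title']} {strip_html(row['Body'])}"
        yield text, parse_tags(row["Tags"])


rows = list(load_rows(io.StringIO(SAMPLE_CSV)))
for text, tags in rows:
    print(tags, "|", text)
```

To read the real file instead of the in-memory sample, pass `open("train.csv", newline="", encoding="utf-8")` to `load_rows`.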

Plots for better understanding 📊

Countplot of Tags per question 📈

This is the count plot of the number of tags per question.

The key takeaway from the plot is that most questions have 2 or 3 tags.
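
The numbers behind such a count plot can be computed with `collections.Counter`. This is a small sketch on hypothetical tag lists, not the dataset itself.

```python
from collections import Counter

# Hypothetical parsed tag lists, one per question.
question_tags = [
    ["python", "pandas"],
    ["java"],
    ["javascript", "html", "css"],
    ["python", "numpy"],
    ["c#", "asp.net", "linq"],
]

# Map "number of tags" to "number of questions with that many tags".
tags_per_question = Counter(len(tags) for tags in question_tags)

for n_tags, n_questions in sorted(tags_per_question.items()):
    print(f"{n_tags} tag(s): {n_questions} question(s)")
```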

Distribution of Tags 📉

This is the distribution of the number of times each tag appears across questions.

The key takeaway from the plot is that a tag appears at most 5 times.
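
Tag frequencies of this kind can be derived by flattening the per-question tag lists and counting. The sketch below uses hypothetical data; the variable names are illustrative.

```python
from collections import Counter

# Hypothetical parsed tag lists, one per question.
question_tags = [
    ["python", "pandas"],
    ["python"],
    ["java"],
    ["python", "numpy"],
    ["java", "spring"],
]

# How many questions each tag appears in.
tag_counts = Counter(tag for tags in question_tags for tag in tags)

# How many distinct tags occur exactly n times (the shape of the distribution).
frequency_of_counts = Counter(tag_counts.values())

print(tag_counts.most_common(2))
print(sorted(frequency_of_counts.items()))
```

A mapping like `tag_counts` is also the kind of input a word cloud needs; for instance, the `wordcloud` package accepts a frequency dictionary through `WordCloud.generate_from_frequencies`.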

WordCloud ☁️

This is the word cloud generated from the tags and their counts.

More frequent tags appear larger in the word cloud, and less frequent tags appear smaller.
