rushi-the-neural-arch / openai-nlp-hackathon-symbiosis-university

This repository is a submission for problem statement 2 of the OPEN AI NLP hackathon. The objective is to classify voice recordings of call-center conversations, treated as consumer complaints, into the given categories of the automotive industry.

OpenAI-NLP-Hackathon-Symbiosis-University

Consumer-Complaint-Classification-OPEN-AI

Dataset

https://www.kaggle.com/dushyantv/consumer_complaints

Methodology

Pre-Training

  • Complaints from the common pool are batched in pairs, using a SQuAD-like format.
  • The evaluation set used for pre-training is SQuAD 2.0 dev.
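
The pairing step above could look roughly like the sketch below. The source does not say how a pair is mapped onto SQuAD fields, so this assumes the first complaint fills "context" and the second fills "question"; the function name make_squad_like_pairs and the sample complaints are hypothetical and only serve as illustration.

```python
import json
from itertools import combinations


def make_squad_like_pairs(complaints):
    """Pair up complaints and wrap each pair in a SQuAD-2.0-style record.

    Assumption: the first complaint of a pair acts as the "context"
    and the second acts as the "question".
    """
    paragraphs = []
    for idx, (first, second) in enumerate(combinations(complaints, 2)):
        paragraphs.append({
            "context": first,
            "qas": [{
                "id": f"pair-{idx}",
                "question": second,
                "answers": [],          # no answer span is defined for a pair
                "is_impossible": True,  # SQuAD 2.0 flag for unanswerable items
            }],
        })
    return {
        "version": "v2.0",
        "data": [{"title": "complaints", "paragraphs": paragraphs}],
    }


# Illustrative placeholder complaints, not taken from the dataset.
complaints = [
    "The brakes failed a week after servicing.",
    "The dealer overcharged me for an oil change.",
    "My warranty claim was rejected without explanation.",
]
dataset = make_squad_like_pairs(complaints)
print(json.dumps(dataset["data"][0]["paragraphs"][0], indent=2))
```

Using every unordered pair keeps the sketch simple; a real pipeline would likely sample pairs, since the number of pairs grows quadratically with the pool size.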

2D Feature Space Generation

  • For each pair from the pre-trained pool, a 2D vector is plotted in the feature space.
  • The magnitude of that vector is taken as the sentence embedding.
  • This embedding is validated against the gensim word model to create sentence-level representations.
  • These representations are then clustered with biased c-means to find the average number of classes.
  • Finally, the number of classes found is mapped against the existing consumer-complaint classes.
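
As a rough illustration of the magnitude and clustering steps above, the sketch below reduces each 2D vector to its Euclidean length and groups the resulting scalars. It uses standard fuzzy c-means as a stand-in, not the biased c-means variant named in the list, and it takes the number of clusters as an input rather than estimating it. The function names and the sample vectors are hypothetical.

```python
import math
import random


def magnitude(vec):
    """Euclidean length of a 2D vector (x, y)."""
    return math.hypot(vec[0], vec[1])


def fuzzy_c_means_1d(values, n_clusters, m=2.0, n_iter=100, seed=0):
    """Standard fuzzy c-means on scalars; returns (centers, memberships)."""
    rng = random.Random(seed)
    centers = rng.sample(values, n_clusters)  # start from distinct data points
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(n_iter):
        # Membership of each value in each cluster (each row sums to 1).
        memberships = []
        for v in values:
            dists = [max(abs(v - c), 1e-12) for c in centers]
            memberships.append([
                1.0 / sum((d_i / d_j) ** exponent for d_j in dists)
                for d_i in dists
            ])
        # Centers are membership-weighted means of the values.
        centers = [
            sum((u[k] ** m) * v for u, v in zip(memberships, values))
            / sum(u[k] ** m for u in memberships)
            for k in range(n_clusters)
        ]
    return centers, memberships


# Illustrative 2D vectors, one per complaint pair (placeholder values).
vectors = [(0.3, 0.4), (0.6, 0.8), (3.0, 4.0), (3.6, 4.8)]
embeddings = [magnitude(v) for v in vectors]
centers, memberships = fuzzy_c_means_1d(embeddings, n_clusters=2)
print(sorted(centers))
```

Each row of memberships gives the soft assignment of one embedding to the clusters, which is what distinguishes c-means from hard k-means assignment.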

Model Training

  • A 6-layer, 32-node-wide CNN is created, with the node weights initialized from the sentence-level representations.
  • The network is trained on end-member uniqueness for each class, which defines the boundary of each complaint class.

Model Evaluation / Testing

  • The results are evaluated by building a reverse TF-IDF matrix and mapping it against the original class matrix.
  • The dot product of these two matrices indicates how correct the model is:
    1. a dense product matrix indicates high variance and hence low accuracy;
    2. a sparse product matrix indicates low variance and hence high accuracy.
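
The dense-versus-sparse reading of the product can be illustrated with a small sketch. The source does not define the reverse TF-IDF matrix, so this example substitutes a one-hot prediction matrix for it as an assumption; the helper names and the toy labels are hypothetical. With one-hot rows, the product of the transposed prediction matrix and the class matrix is a confusion matrix, which is diagonal (sparse) when every prediction is correct.

```python
def transpose(matrix):
    """Swap rows and columns of a list-of-rows matrix."""
    return [list(col) for col in zip(*matrix)]


def matmul(a, b):
    """Plain-Python matrix product of two list-of-rows matrices."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def density(matrix, tol=1e-9):
    """Fraction of entries whose absolute value exceeds tol."""
    entries = [v for row in matrix for v in row]
    return sum(abs(v) > tol for v in entries) / len(entries)


# Toy one-hot class matrix: 4 samples, 3 classes (placeholder labels).
true_classes = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
# Assumed stand-ins for the model output: one perfect, one with errors.
perfect_pred = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
noisy_pred = [[1, 0, 0], [0, 0, 1], [0, 1, 0], [0, 1, 0]]

perfect_product = matmul(transpose(perfect_pred), true_classes)
noisy_product = matmul(transpose(noisy_pred), true_classes)

print(perfect_product, density(perfect_product))
print(noisy_product, density(noisy_product))
```

In this toy case the perfect predictions yield a diagonal product, while the misclassifications add off-diagonal entries and raise the density, matching the rule of thumb in the list above.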

References

  • SQuAD 2.0 pre-training was based on the follow-up SQuAD paper, which describes the SQuAD 2.0 dataset and training strategy:
    Rajpurkar, Pranav, Robin Jia, and Percy Liang. “Know What You Don’t Know: Unanswerable Questions for SQuAD.” Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2018. https://doi.org/10.18653/v1/p18-2124.
  • Also, a shoutout to supreetkt for providing a foundation for consumer complaint classification on text-based responses!
