raajanwankhade / bert-dialogue_classification

A dialogue classifier for the TV show The Office. This code uses the BERT (Base) model to classify dialogue lines as Jim's or Dwight's.

Dialogue Classification using Bidirectional Encoder Representations from Transformers (BERT)

Jim vs. Dwight Dialogue Speaker Classification

This project classifies the speaker of TV series dialogue lines using a fine-tuned BERT-based model. The model is trained on dialogues from the TV series "The Office" and predicts whether a line was spoken by Jim or Dwight with 84% accuracy.

Dataset

The dataset used in this project consists of dialogues from The Office. Note: the model was trained only on the dialogues in train.csv, which was provided for a Kaggle competition hosted by IEEE NITK's Computer Intelligence Society.

Requirements

  • TensorFlow
  • Transformers
  • Pandas
  • NumPy

Application

  1. About the dataset:

    • The training dataset was in CSV format with columns: "id", "line", "speaker".
    • The validation dataset was split from train.csv, so it has the same format.
    • The data was cleaned and preprocessed before training.
  2. Fine-tuning the BERT model:

    • Run the BERT-ieeekagglecup-2023.ipynb notebook to fine-tune the BERT model on the training dataset.
    • The notebook tokenizes the text, prepares input tensors, and trains the model.
  3. Evaluation:

    • The trained model is evaluated on the validation dataset using accuracy and a classification report (precision, recall, and F1-score for each speaker).
    • The evaluation results can be used to assess the model's performance.
  4. Making predictions:

    • Use the trained model to predict the speaker of each dialogue line.
    • The test dataset, test.csv, is in CSV format with the columns "id" and "line" (it has no speaker label).
  5. Generated the submission CSV for the Kaggle competition, my first one :)
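
The data preparation in step 1 can be sketched in plain Python as below. This is a minimal illustration, not the notebook's code: the cleaning rules (collapsing whitespace, dropping empty lines and exact duplicates), the `LABELS` mapping, the 10% validation fraction, and the helper names `read_records`, `prepare_rows`, and `stratified_split` are all hypothetical. The notebook itself likely does this with pandas.

```python
import csv
import random
from collections import defaultdict

# Hypothetical label encoding; the notebook may map speakers differently.
LABELS = {"Jim": 0, "Dwight": 1}


def read_records(path):
    """Read a CSV with columns id, line, speaker into a list of dicts."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def prepare_rows(records):
    """Illustrative cleaning: collapse whitespace, drop empty lines and exact duplicates."""
    rows, seen = [], set()
    for rec in records:
        line = " ".join(rec["line"].split())
        key = (line, rec["speaker"])
        if not line or key in seen:
            continue
        seen.add(key)
        # Raises KeyError for any speaker other than Jim or Dwight.
        rows.append({"id": rec["id"], "line": line, "label": LABELS[rec["speaker"]]})
    return rows


def stratified_split(rows, val_fraction=0.1, seed=42):
    """Hold out val_fraction of each speaker's lines so both sets keep the class balance."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row["label"]].append(row)
    rng = random.Random(seed)
    train, val = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        n_val = round(len(group) * val_fraction)
        val.extend(group[:n_val])
        train.extend(group[n_val:])
    return train, val


# Usage (assumes train.csv is in the working directory):
# train_rows, val_rows = stratified_split(prepare_rows(read_records("train.csv")))
```

Splitting per speaker keeps the Jim/Dwight ratio the same in both sets, so validation accuracy is not skewed by an unlucky split. The cleaned `line` strings would then go to the BERT tokenizer for step 2; with Hugging Face Transformers, calling a BERT tokenizer with `padding=True`, `truncation=True`, and `return_tensors="tf"` returns `input_ids`, `token_type_ids`, and `attention_mask` tensors for a TensorFlow model.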

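Steps 3 to 5 can likewise be sketched without any ML library once the fine-tuned model has produced logits. Again this is illustrative rather than the author's code: the `ID_TO_SPEAKER` mapping, the helper names, and the `id,speaker` header of the submission file are assumptions (the competition's sample submission defines the real format). If the notebook uses scikit-learn's `classification_report`, it reports the same per-class precision, recall, F1-score, and support computed here.

```python
import csv
from collections import Counter

# Hypothetical mapping; it must mirror the label encoding used during training.
ID_TO_SPEAKER = {0: "Jim", 1: "Dwight"}


def logits_to_labels(logits):
    """Pick the highest-scoring class for each example (argmax over each logit row)."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_report(y_true, y_pred, labels):
    """Precision, recall, F1-score and support for each class."""
    pairs = list(zip(y_true, y_pred))
    support = Counter(y_true)
    report = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[ID_TO_SPEAKER[c]] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": support[c],
        }
    return report


def write_submission(path, ids, predicted):
    """Write one row per test line; the 'id,speaker' header is an assumption."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "speaker"])
        for row_id, label in zip(ids, predicted):
            writer.writerow([row_id, ID_TO_SPEAKER[label]])
```

For example, logits `[[2.0, -1.0], [0.1, 0.9], [-0.5, 0.3], [1.2, 1.1]]` give predictions `[0, 1, 1, 0]`; against true labels `[0, 1, 0, 0]` that is an accuracy of 0.75. Looking at per-class recall alongside accuracy shows whether errors fall mostly on one speaker.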
Feel free to contribute, open issues, or submit pull requests to enhance the project!
