
data-augmentation-techniques's Introduction

Contributors

This is a group project implemented along with JayantSharma777 and sarthakdewan1601.

Methodology

Splash Screen

Data-Augmentation-Techniques

Machine learning now shapes many features across many products. Still, there are many low-resource task scenarios where the performance of ML-based models is not good enough. The aim of this project is to analyze and compare different data augmentation techniques on a benchmark dataset and to build the best-performing model on that dataset for text classification.

Techniques Used:

  1. Easy Data Augmentation (EDA): Studied and implemented the Easy Data Augmentation technique, which involves four operations: Synonym Replacement (SR), Random Insertion (RI), Random Deletion (RD), and Random Swap (RS).

  2. TF-IDF scores and word embeddings: Studied TF-IDF, BERT, and various pretrained embeddings such as Word2Vec, GloVe, FastText, and Sent2Vec. We then computed the TF-IDF score of each word and replaced words with a low TF-IDF score with their most similar word in the embedding space, which enlarged the dataset.

  3. Text augmentation using back translation via the MarianMT Transformer: We used a HuggingFace model to implement back translation through 3 different Romance languages: French, Spanish, and Latin. We had 3 different models and compared the accuracy of the augmented data with the original on a text classification task using Gaussian Naive Bayes.
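The four EDA operations from item 1 can be sketched as follows. This is a minimal illustration and not the project's code: the `SYNONYMS` table is a hypothetical stand-in for a real lexical resource, and the function names and the `n`, `p`, and `seed` parameters are assumptions made for the example.

```python
import random

# Hypothetical toy synonym table; a real run would more likely query a
# lexical resource such as WordNet.
SYNONYMS = {
    "quick": ["fast", "rapid"],
    "happy": ["glad", "joyful"],
    "movie": ["film"],
}


def synonym_replacement(words, n, rng):
    """SR: replace up to n words that have an entry in SYNONYMS."""
    out = list(words)
    candidates = [i for i, w in enumerate(out) if w in SYNONYMS]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        out[i] = rng.choice(SYNONYMS[out[i]])
    return out


def random_insertion(words, n, rng):
    """RI: n times, insert a synonym of a random known word at a random position."""
    out = list(words)
    for _ in range(n):
        known = [w for w in out if w in SYNONYMS]
        if not known:
            break
        synonym = rng.choice(SYNONYMS[rng.choice(known)])
        out.insert(rng.randint(0, len(out)), synonym)
    return out


def random_swap(words, n, rng):
    """RS: n times, swap the words at two distinct random positions."""
    out = list(words)
    if len(out) < 2:
        return out
    for _ in range(n):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(words, p, rng):
    """RD: drop each word with probability p, but never return an empty sentence."""
    if len(words) <= 1:
        return list(words)
    kept = [w for w in words if rng.random() >= p]
    return kept or [rng.choice(words)]


def eda(sentence, n=1, p=0.1, seed=0):
    """Return one augmented variant per operation, in the order SR, RI, RS, RD."""
    rng = random.Random(seed)
    words = sentence.split()
    variants = [
        synonym_replacement(words, n, rng),
        random_insertion(words, n, rng),
        random_swap(words, n, rng),
        random_deletion(words, p, rng),
    ]
    return [" ".join(v) for v in variants]
```

Calling `eda("the quick movie made me happy")` returns four strings, one per operation. Each variant would keep the label of the sentence it came from when it is added to the training set.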

Accuracy Score based on Back Translation MarianMT model:

Original vs Augmented

Using Target Language as Spanish

image

Using Target Language as Latin

image
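The back-translation step behind these comparisons could look roughly like the sketch below. It is an illustration under assumptions, not the project's code: the source does not name the checkpoints it used, so the `Helsinki-NLP` model names in the comments are assumed, and `make_marian_translator` and `back_translate` are hypothetical helper names. The `transformers` import is deferred so that `back_translate` can be exercised with any pair of translation functions.

```python
def make_marian_translator(model_name, prefix=""):
    """Build a function that translates a list of strings with a MarianMT checkpoint.

    transformers is imported lazily, so nothing is downloaded until this is called.
    """
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)

    def translate(texts):
        # Multilingual checkpoints expect a target-language token such as ">>fr<< ".
        batch = tokenizer(
            [prefix + t for t in texts],
            return_tensors="pt",
            padding=True,
            truncation=True,
        )
        generated = model.generate(**batch)
        return tokenizer.batch_decode(generated, skip_special_tokens=True)

    return translate


def back_translate(texts, to_pivot, to_source):
    """Translate to the pivot language and back; keep only new, non-empty paraphrases."""
    round_trip = to_source(to_pivot(texts))
    return [
        (original, paraphrase)
        for original, paraphrase in zip(texts, round_trip)
        if paraphrase.strip()
        and paraphrase.strip().lower() != original.strip().lower()
    ]


# Assumed usage (checkpoint names are not taken from the source):
# to_fr = make_marian_translator("Helsinki-NLP/opus-mt-en-ROMANCE", prefix=">>fr<< ")
# to_en = make_marian_translator("Helsinki-NLP/opus-mt-ROMANCE-en")
# pairs = back_translate(["The film was surprisingly good."], to_fr, to_en)
```

Dropping round trips that come back unchanged keeps the augmented set from filling up with exact duplicates of the original sentences.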

Accuracy score based on TF-IDF and Pretrained GloVe Word embeddings

image
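The TF-IDF replacement idea can be sketched in plain Python as follows. This is a simplified illustration, not the project's code: the 2-dimensional `EMBEDDINGS` table is a hypothetical stand-in for pretrained GloVe vectors, the TF-IDF formula is the basic unsmoothed variant, and the function names and `threshold` parameter are assumptions.

```python
import math
from collections import Counter

# Hypothetical 2-d vectors standing in for pretrained embeddings such as GloVe.
EMBEDDINGS = {
    "good": (0.9, 0.1),
    "great": (0.85, 0.2),
    "bad": (-0.9, 0.1),
    "film": (0.1, 0.9),
    "movie": (0.15, 0.88),
}


def tfidf_scores(docs):
    """Return one {word: tf-idf} dict per tokenised document."""
    n_docs = len(docs)
    doc_freq = Counter(word for doc in docs for word in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            w: (c / len(doc)) * math.log(n_docs / doc_freq[w])
            for w, c in counts.items()
        })
    return scores


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_word(word):
    """Most similar other word in EMBEDDINGS, or None if the word is unknown."""
    if word not in EMBEDDINGS:
        return None
    others = [w for w in EMBEDDINGS if w != word]
    return max(others, key=lambda w: cosine(EMBEDDINGS[word], EMBEDDINGS[w]))


def augment(doc, scores, threshold):
    """Swap words whose tf-idf is below threshold for their nearest neighbour."""
    return [
        (nearest_word(w) or w) if scores[w] < threshold else w
        for w in doc
    ]
```

Words with a low TF-IDF score carry little class-specific information, so swapping them for a close neighbour in the embedding space is less likely to change the label than replacing a high-scoring word. Words missing from the embedding table are left unchanged.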

Results

Accuracy increased with the augmented datasets, which supports the techniques we used. The next part of our project is to build a tool with these models trained on much larger datasets so that better augmentations can be generated. Given any input text, this tool will generate an augmented version that remains meaningful.

data-augmentation-techniques's People

Contributors

tush1810, jayantsharma777, sarthakdewan1601
