
eftekhar-hossain / cuet_nlp-eacl_2021


This repository contains the system description and the code that we implemented to participate in the EACL-2021 shared tasks.

Jupyter Notebook 100.00%
offensive-language hope-speech-detection transformers multilingual code-mixed text-classification

cuet_nlp-eacl_2021's Introduction

My NLP Projects:

  • Created a tool that detects the sentiment polarity (positive or negative) of book reviews written in Bengali.
  • Collected 1k book reviews from online book shops and social media groups. Of these, 528 reviews are labelled positive and 472 are labelled negative.
  • Extracted unigram, bigram and trigram features from the cleaned text, using the TF-IDF vectorizer as the feature extraction technique.
  • Employed several machine learning classifiers, including Logistic Regression, Decision Tree, Multinomial Naive Bayes and Support Vector Machine.
  • Evaluated classification performance for each n-gram feature, using accuracy, precision, recall, F1-score, the ROC curve and the precision-recall curve as evaluation metrics.
  • Finally, created a client-facing API using Flask. App link
  • Publication: Link
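
The n-gram and TF-IDF steps above can be sketched in plain Python. This is a minimal illustration, not the project's code: the function names, the whitespace tokenizer, the smoothed IDF formula and the toy English sentences are all assumptions made for the example. In practice a library vectorizer would normally do this work.

```python
import math
from collections import Counter


def extract_ngrams(tokens, n):
    """Return the n-grams of a token list, each joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf_vectors(documents, n):
    """Compute TF-IDF weights over n-grams of whitespace-tokenized documents.

    Uses raw term frequency and the smoothed IDF ln((1 + N) / (1 + df)) + 1.
    This weighting scheme is an assumption, not taken from the project.
    """
    grams_per_doc = [extract_ngrams(doc.split(), n) for doc in documents]
    num_docs = len(documents)

    # Document frequency: the number of documents containing each n-gram.
    doc_freq = Counter()
    for grams in grams_per_doc:
        doc_freq.update(set(grams))

    vectors = []
    for grams in grams_per_doc:
        counts = Counter(grams)
        vectors.append({
            gram: count * (math.log((1 + num_docs) / (1 + doc_freq[gram])) + 1)
            for gram, count in counts.items()
        })
    return vectors


# Toy English stand-ins for cleaned review text (hypothetical examples).
reviews = ["the story is good", "the story is boring"]
for n in (1, 2, 3):
    print(n, tfidf_vectors(reviews, n)[0])
```

N-grams that appear in every document (such as "the" here) receive the lowest weight, while n-grams unique to one review are weighted higher, which is what makes them useful to a classifier.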


  • Created a tool that categorizes Bengali news headlines into six categories (National, Politics, International, Sports, Amusement, IT) using a deep recurrent neural network.
  • Created a dataset of 0.13 million news headlines. A Chrome web scraper was used to scrape the headlines from Bengali online news portals such as Dainik Jugantor, Dainik Ittefaq and Dainik Kaler Kontho.
  • A word embedding feature representation technique is used to capture the semantic meaning of the words.
  • A deep learning model has been built using a bidirectional gated recurrent network.
  • Finally, the model performance is evaluated using measures such as the confusion matrix, accuracy, precision, recall and F1-score.
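
The evaluation measures listed above can be computed directly from true and predicted labels. The following is a minimal sketch with hypothetical function names and toy labels. It is not the project's evaluation code, and the label values are made up for illustration.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Build a matrix whose rows are true labels and columns predicted labels."""
    pair_counts = Counter(zip(y_true, y_pred))
    return [[pair_counts[(t, p)] for p in labels] for t in labels]


def per_class_scores(matrix, labels):
    """Return {label: (precision, recall, f1)} from a confusion matrix."""
    scores = {}
    for i, label in enumerate(labels):
        true_pos = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column sum
        actual = sum(matrix[i])                    # row sum
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Toy labels (hypothetical), using three of the six headline categories.
labels = ["Sports", "Politics", "IT"]
y_true = ["Sports", "Sports", "Politics", "IT", "IT", "Politics"]
y_pred = ["Sports", "Politics", "Politics", "IT", "Sports", "Politics"]

matrix = confusion_matrix(y_true, y_pred, labels)
print(matrix)
print(accuracy(matrix))
print(per_class_scores(matrix, labels))
```

Reporting precision, recall and F1 per class alongside accuracy shows which categories the model confuses, which a single accuracy figure hides.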


  • Created a tool that identifies the sentiment of a restaurant review written in Bengali, classifying it as positive or negative.
  • Collected 1.4k Bengali restaurant reviews from social media groups dedicated to food and restaurant reviews. Of these, 630 reviews are labelled positive and 790 are labelled negative.
  • Extracted unigram, bigram and trigram features from the cleaned text, using the TF-IDF vectorizer as the feature extraction technique.
  • Employed several machine learning classifiers, including Logistic Regression, Decision Tree, Multinomial Naive Bayes, Support Vector Machine and Stochastic Gradient Descent.
  • Evaluated classification performance for each n-gram feature, using accuracy, precision, recall, F1-score, the ROC curve and the precision-recall curve as evaluation metrics.
  • Finally, created a client-facing API using Flask and deployed it to the cloud using Heroku. App Link
  • Publication: Link
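
To show how one of the listed classifiers works, here is a minimal Multinomial Naive Bayes sketch over token counts with Laplace smoothing. The class name, the `alpha` value, the whitespace tokenizer and the toy English reviews are assumptions for the example. The project itself presumably relied on a library implementation and on TF-IDF features rather than raw counts.

```python
import math
from collections import Counter, defaultdict


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes over token counts with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength (assumed value)

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for doc, label in zip(documents, labels):
            self.token_counts[label].update(doc.split())
        self.vocabulary = {
            tok for counts in self.token_counts.values() for tok in counts
        }
        return self

    def predict(self, document):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.token_counts[label]
            total_tokens = sum(counts.values())
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            for tok in document.split():
                if tok not in self.vocabulary:
                    continue  # ignore tokens never seen in training
                score += math.log(
                    (counts[tok] + self.alpha)
                    / (total_tokens + self.alpha * vocab_size)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy English stand-ins for labelled restaurant reviews (hypothetical).
train_docs = [
    "food was tasty",
    "tasty and fresh food",
    "service was slow",
    "slow and cold food",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = MultinomialNaiveBayes().fit(train_docs, train_labels)
print(model.predict("tasty food"))
print(model.predict("slow service"))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the smoothing term keeps a token that was never seen in one class from zeroing out that class entirely.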


  • Developed a machine learning model that classifies the sentiment category (positive, negative or neutral) of a news comment written in Bangla.
  • Used a publicly available dataset of 12k news comments for the implementation.
  • Built the system using TF-IDF feature extraction with n-gram features.
  • Analysed the performance of different machine learning algorithms for each n-gram feature using evaluation metrics such as accuracy, precision, recall and F1-score.


  • Created a tool that categorizes Bengali news articles into 12 different categories (Art, Politics, International, Sports, Science, Economics, Crime, Accident, Education, Entertainment, Environment, Opinion) using deep learning.
  • A publicly available dataset of 0.1 million news articles, spanning the 12 categories, is used to develop the system.
  • A word embedding feature representation technique is used to capture the semantic meaning of the words.
  • A deep learning model has been built using a Convolutional Neural Network and Long Short-Term Memory.
  • The model performance is evaluated using measures such as the confusion matrix, accuracy, precision, recall and F1-score.
  • Finally, developed a client-facing API using Flask and Heroku.
  • Here is the developed Flask app: Document Categorizer App


  • Created a word embedding model for a Bangla text corpus.
  • Used the Word2Vec algorithm.
  • Used a publicly available dataset of 0.1 million Bangla news articles.
  • Visualized word similarity using a t-SNE plot.
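
Word similarity in an embedding space is commonly measured with cosine similarity. The sketch below assumes that measure and uses made-up 3-dimensional vectors with English words. A trained Word2Vec model would supply real, higher-dimensional vectors for Bangla words, and the function names here are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(word, embeddings, top_n=2):
    """Rank the other words by cosine similarity to `word`."""
    target = embeddings[word]
    ranked = sorted(
        ((other, cosine_similarity(target, vec))
         for other, vec in embeddings.items() if other != word),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:top_n]


# Hypothetical 3-dimensional vectors; a trained model would supply real ones.
toy_embeddings = {
    "cricket": [0.9, 0.1, 0.0],
    "football": [0.8, 0.2, 0.1],
    "election": [0.0, 0.9, 0.3],
}
print(most_similar("cricket", toy_embeddings))
```

Words used in similar contexts end up with vectors pointing in similar directions, so they score close to 1 here and tend to cluster together when the vectors are projected to two dimensions for plotting.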


cuet_nlp-eacl_2021's People

Contributors

eftekhar-hossain
