
This project forked from jitinkrishnan/diversity-based-generalization

Unsupervised Domain Adaptation in Text (ECML-PKDD '20)

License: MIT License

Diversity-Based Generalization for Unsupervised Text Classification under Domain Shift

Purpose of the model: Train a classifier on a source domain where plenty of data is available, then generalize it to a target domain for which no data, labeled or unlabeled, is available.

Paper/Cite

https://arxiv.org/abs/2002.10937 (To appear at ECML-PKDD 2020)

@article{krishnanDiversity,
  title={Diversity-Based Generalization for Unsupervised Text Classification under Domain Shift},
  author={Krishnan, Jitin and Purohit, Hemant and Rangwala, Huzefa},
  journal={In Proceedings of the 19th European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD)},
  year={2020}
}

Why use this method? (See paper for detailed performance comparison)

  • Unlike existing state-of-the-art unsupervised methods, this model needs no unlabeled target data for training, so it can be applied to any new domain out of the box.
  • It is computationally much cheaper (training time drops from days to a few hours on a CPU) because it does not use unlabeled target data, which means no gradient reversal and no manual pivot extraction, with no trade-off in performance.

Results on Blitzer Dataset

Task                   Accuracy (%)
books-dvd              87.46
books-electronics      86.08
books-kitchen          87.68
kitchen-books          84.23
kitchen-dvd            83.34
kitchen-electronics    89.22
electronics-books      84.33
electronics-kitchen    91.05
electronics-dvd        82.81
dvd-books              88.74
dvd-electronics        86.21
dvd-kitchen            87.37
Average                86.54
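
As a quick sanity check on the table above, the reported average can be recomputed from the twelve per-task accuracies. This is only a small arithmetic sketch; the dictionary and function names are illustrative and are not part of the repository.

```python
# Accuracies (%) copied from the Blitzer results table above.
blitzer_accuracy = {
    "books-dvd": 87.46,
    "books-electronics": 86.08,
    "books-kitchen": 87.68,
    "kitchen-books": 84.23,
    "kitchen-dvd": 83.34,
    "kitchen-electronics": 89.22,
    "electronics-books": 84.33,
    "electronics-kitchen": 91.05,
    "electronics-dvd": 82.81,
    "dvd-books": 88.74,
    "dvd-electronics": 86.21,
    "dvd-kitchen": 87.37,
}


def mean_accuracy(scores):
    """Return the unweighted mean of the per-task accuracies."""
    return sum(scores.values()) / len(scores)


average = mean_accuracy(blitzer_accuracy)
print(f"{len(blitzer_accuracy)} tasks, average accuracy {average:.2f}%")
```

The unweighted mean over the twelve tasks rounds to 86.54, which matches the "Average" row.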

Results on the newly labeled Crisis Dataset

Task               Accuracy (%)
Harvey-Florence    78.11
Harvey-Irma        64.38

Results on highly divergent datasets (Yelp and IMDb, in addition to Amazon Reviews), for 3 randomly selected combinations

Task                Accuracy (%)
Electronics-Yelp    89.15
Kitchen-IMDb        78.33
Yelp-IMDb           77.28

Requirements

Python 3.6, Keras, and TensorFlow. Alternatively, run pip install -r requirements.txt to install the necessary packages.

Additional Requirements

Download GoogleNews-vectors-negative300.bin
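
Because the embeddings file is large, it can help to confirm the download is intact before starting a training run. The sketch below reads only the header line of a word2vec binary file, which stores the vocabulary size and vector dimension as ASCII text. The function name and `EMBEDDINGS_PATH` are hypothetical: this README does not say where the scripts expect the file, so adjust the path to your setup.

```python
import os


def read_word2vec_header(path):
    """Return (vocab_size, vector_size) from a word2vec binary file header."""
    with open(path, "rb") as f:
        # The first line is ASCII text of the form b"<vocab_size> <vector_size>\n".
        header = f.readline()
    vocab_size, vector_size = (int(field) for field in header.split())
    return vocab_size, vector_size


# Hypothetical location; point this at wherever the file was downloaded.
EMBEDDINGS_PATH = "GoogleNews-vectors-negative300.bin"

if os.path.exists(EMBEDDINGS_PATH):
    vocab, dim = read_word2vec_header(EMBEDDINGS_PATH)
    print(f"{vocab} words, {dim} dimensions")
else:
    print(f"{EMBEDDINGS_PATH} not found; download it first")
```

For this file the reported dimension should be 300, as the file name indicates. A header that fails to parse usually points to a truncated download or a file that is still compressed.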

Data

All datasets are in the raw_data folder. @user mentions are anonymized in the Twitter data.

To add a new dataset:

Place your positive, negative, and unlabeled files in the raw_data folder (no preprocessing needed) and name the files accordingly.

Sample Runs of all models

BiLSTM

python bilstm.py 'electronics' 'kitchen'

BiLSTM + Attention

python bilstm_attention.py 'electronics' 'kitchen'

BiLSTM + MHA

python bilstm_mha.py 'electronics' 'kitchen'

BiLSTM + MHAD

python bilstm_mhad.py 'electronics' 'kitchen'

BiLSTM + MHAD + Tri-I

python bilstm_mhad_tri_I.py 'electronics' 'kitchen'

BiLSTM + MHAD + Tri-II

python bilstm_mhad_tri_II.py 'electronics' 'kitchen'
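
To compare all six variants on the same pair of domains, the sample runs above can be driven from a small wrapper. This is a minimal sketch and not part of the repository: `build_commands` and `run_all` are hypothetical helpers, and the two positional arguments are passed through in the same order as in the sample runs (assumed here to be source domain, then target domain).

```python
import subprocess
import sys

# Script names as listed in the sample runs above.
MODEL_SCRIPTS = [
    "bilstm.py",
    "bilstm_attention.py",
    "bilstm_mha.py",
    "bilstm_mhad.py",
    "bilstm_mhad_tri_I.py",
    "bilstm_mhad_tri_II.py",
]


def build_commands(first_domain, second_domain, scripts=MODEL_SCRIPTS):
    """Build one command line per model script for the given domain pair."""
    return [
        [sys.executable, script, first_domain, second_domain]
        for script in scripts
    ]


def run_all(first_domain, second_domain, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands(first_domain, second_domain):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the loop at the first script that fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all("electronics", "kitchen", dry_run=True)
```

With `dry_run=True` the wrapper only prints the commands, which is a cheap way to check the script names and arguments before committing to several hours of training. Run it from the repository root so the script paths resolve.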

Contact information

For help or issues, please submit a GitHub issue or contact Jitin Krishnan ([email protected]).
