Coder Social home page Coder Social logo

ultradensifier's Introduction

Densifier - (Orthogonal) Transformation for Word Embeddings

An implementation of the Densifier introduced by Rothe et al. 2016 which aims at grouping words based on any given separating signals such as sentiment, concreteness, or frequency, as long as embeddings encode them.

General Idea and Training Objective

The training objective is to group words in an ultradense space, e.g. dim=1, according to provided separating signals. Setting the ultradense space with dim==1 yields lexicons based on embeddings.

Further Information about the Codes

These codes are optimized such that they only work when dim==1. However, it should be straightforward enough to modify them to output ultradense spaces with dim>1, which can be subsequently feed to NNs.

These codes are written in NumPy. For implementations using autograd frameworks, one can refer to here (for TensorFlow users) and here (for Keras users). Note, running Densifier on GPU may not be ideal -- there are some overheads moving around from tensor to ndarrays (also GPU <-> CPU) for doing the expensive SVD, see this thread.

Requirements and Compatibility

  • Python 2.7
  • NumPy 1.14.3
  • SciPy 1.1.0

All codes are written in Python 2.7, yet should be compatible with Python 3.

Usage

python Densifier.py
--LR		learning rate
--alpha		hyperparameter balancing two sub-objectives
--EPC 		epochs
--OUT_DIM	ultradense dimension size
--BATCH_SIZE	batch size
--EMB_SPACE	input embedding space
--SAVE_EVERY	save every N steps
--SAVE_TO	output trained transformation matrix

Results

Using the same Twitter embedding space in Rothe et al. 2016, these codes perform roughly the same to the TensorFlow implementation -- 0.48 v.s. 0.47 of kendall's tau on the SemEval2015 10B sentiment analysis task, which are unfortunately both lower than 0.65 reported by the original author 0.63 which are close to the reported value 0.65. Note, need to carry out evaluations on the joint vocabulary.

To replicate results in other languages and domains, the original test sets are required but I did not find them.

Some Discussion

  • Efficiency: Rothe el al. 2016 reported that all experiments were finished in 5 mins. Unfortunately I am not able to achieve this speed as it takes me ~0.2s to compute an SVD of a 400 x 400 dense matrix using NumPy. Autograd frameworks can take more time.
  • Orthogonal Constraint: Buechel et al. 2018 reports that enforcing the orthogonal constraint introduces no difference on performance. Similar observations occur in this implementation. Releasing the orthogonal constraint means we now can touch the norm of $Q$ so probably we need to regularize the $l2$ norm of $PQ$ such that the loss does not go to $-\infty$. Again, no improvements are observed in this case either. The orthorgonal constraint regularizes optimization steps go along the surface of the cube, but probably is not significantly helpful when the evaluation metric is ranking based.

References

Papers referred in this implementation:

@InProceedings{N16-1091,
  author = 	"Rothe, Sascha
		and Ebert, Sebastian
		and Sch{\"u}tze, Hinrich",
  title = 	"Ultradense Word Embeddings by Orthogonal Transformation",
  booktitle = 	"Proceedings of the 2016 Conference of the North American Chapter of the      Association for Computational Linguistics: Human Language Technologies    ",
  year = 	"2016",
  publisher = 	"Association for Computational Linguistics",
  pages = 	"767--777",
  location = 	"San Diego, California",
  doi = 	"10.18653/v1/N16-1091",
  url = 	"http://aclweb.org/anthology/N16-1091"
}

@InProceedings{N18-1173,
  author = 	"Buechel, Sven
		and Hahn, Udo",
  title = 	"Word Emotion Induction for Multiple Languages as a Deep Multi-Task      Learning Problem    ",
  booktitle = 	"Proceedings of the 2018 Conference of the North American Chapter of the      Association for Computational Linguistics: Human Language Technologies,      Volume 1 (Long Papers)    ",
  year = 	"2018",
  publisher = 	"Association for Computational Linguistics",
  pages = 	"1907--1918",
  location = 	"New Orleans, Louisiana",
  doi = 	"10.18653/v1/N18-1173",
  url = 	"http://aclweb.org/anthology/N18-1173"
}

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.