

Neural Cross-Lingual Named Entity Recognition with Minimal Resources

This is the code we used in our paper

Neural Cross-Lingual Named Entity Recognition with Minimal Resources

Jiateng Xie, Zhilin Yang, Graham Neubig, Noah A. Smith, Jaime Carbonell

EMNLP 2018

Requirements

Python 2.7 or 3.6

PyTorch >= 0.3.0

Theano 1.0

Lasagne 0.2

The results reported in the paper were tuned and obtained with the NER model written in Theano/Lasagne; everything else is in PyTorch. We also provide a PyTorch implementation of the NER model, which may produce slightly worse results because of implementation differences between the libraries, such as different weight initialization schemes.

Train Bilingual Word Embeddings

To train bilingual word embeddings, we use MUSE.

After installing MUSE, to obtain a mapping (e.g., en-es, using identical character strings as the seed dictionary), first set VALIDATION_METRIC = 'mean_cosine-csls_knn_10-S2T-10000' in supervised.py, then run, for instance:

python supervised.py --src_lang en --tgt_lang es --src_emb data/wiki.en.vec --tgt_emb data/wiki.es.vec --n_refinement 3 --dico_train identical_char --max_vocab 100000

which will produce a mapping at a location such as /your_path/MUSE/dumped/debug/qbun3algl8/best_mapping.pth

To create a word-to-word translation file, run:

./run_load_muse.sh

Note: if your embedding file begins with a header line giving the vocabulary size and the embedding dimension, such as 2519370 300, remove that line before running this script (but keep it when running MUSE).
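One way to remove the header line is sketched below. This is a minimal illustration, not part of the repository: the function name `strip_embedding_header` is hypothetical, and it assumes a header is a first line consisting of exactly two integers.

```python
import io


def strip_embedding_header(src, dst):
    """Copy embedding lines from src to dst, skipping a leading
    '<vocab_size> <dim>' header if one is present.

    Returns True if a header line was skipped, False otherwise.
    """
    first = src.readline()
    parts = first.split()
    # Assumption: a header has exactly two integer fields, whereas a
    # vector line has a word followed by many floats.
    has_header = len(parts) == 2 and all(p.isdigit() for p in parts)
    if not has_header:
        dst.write(first)
    for line in src:
        dst.write(line)
    return has_header


# Small in-memory demonstration with a made-up 2-word, 3-dimensional file.
demo_in = io.StringIO("2 3\nhello 0.1 0.2 0.3\nworld 0.4 0.5 0.6\n")
demo_out = io.StringIO()
skipped = strip_embedding_header(demo_in, demo_out)
```

For real files, the same function can be called with two handles opened via `open(path, encoding="utf-8")`, writing to a separate output path so that the original file (with its header) remains available for MUSE.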

Data Format

We use the IOB2 tagging scheme, with NER data in the following format:

Peter B-PER
Blackburn I-PER
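To check that a data file follows this format, a small reader and validator along the following lines may help. It is a sketch under stated assumptions: each line holds a token and its tag separated by whitespace, blank lines separate sentences (as in CoNLL-style files), and the names `read_iob2` and `is_valid_iob2` are hypothetical helpers, not functions from this repository.

```python
def read_iob2(lines):
    """Group 'token tag' lines into sentences of (token, tag) pairs.

    Assumption: a blank line marks a sentence boundary.
    """
    sentences, current = [], []
    for raw in lines:
        fields = raw.split()
        if not fields:
            if current:
                sentences.append(current)
                current = []
            continue
        # First field is the token, last field is the tag.
        current.append((fields[0], fields[-1]))
    if current:
        sentences.append(current)
    return sentences


def is_valid_iob2(tags):
    """Return True if every I-X tag continues a B-X or I-X of the same type."""
    prev = "O"
    for tag in tags:
        if tag.startswith("I-"):
            entity = tag[2:]
            if prev not in ("B-" + entity, "I-" + entity):
                return False
        prev = tag
    return True


example = read_iob2(["Peter B-PER", "Blackburn I-PER"])
example_ok = is_valid_iob2([tag for _, tag in example[0]])
```

In IOB2, every entity starts with a `B-` tag, so a sequence such as `O I-PER` is rejected by the validator.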

Transfer Training Data

Simply run:

./run_transfer_training_data.sh
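The script's internals are not described here. As a rough illustration of the idea, assuming the transfer step replaces each source-language token with its entry in the word-to-word translation file while keeping the NER tag, a minimal sketch could look like this. The function names, the whitespace-separated `source target` dictionary layout, and the copy-unchanged fallback for words missing from the dictionary are all assumptions for illustration, not the repository's actual behavior.

```python
def load_dictionary(lines):
    """Parse 'source target' lines into a dict (hypothetical file layout).

    If a source word appears more than once, the first translation is kept.
    """
    dictionary = {}
    for raw in lines:
        fields = raw.split()
        if len(fields) >= 2:
            dictionary.setdefault(fields[0], fields[1])
    return dictionary


def translate_sentence(sentence, dictionary):
    """Translate tokens word by word, leaving the tags untouched.

    Assumption: tokens without a dictionary entry are copied unchanged.
    """
    return [(dictionary.get(token, token), tag) for token, tag in sentence]


# Toy en-es dictionary and sentence, made up for the demonstration.
toy_dict = load_dictionary(["president presidente", "said dijo"])
source = [("Peter", "B-PER"), ("Blackburn", "I-PER"), ("said", "O")]
translated = translate_sentence(source, toy_dict)
```

Because the translation is one token to one token, the tag sequence keeps its length and stays aligned with the translated words.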

Train Cross-Lingual NER Model

For the Lasagne/Theano implementation, to reproduce our results, run:

./run_lasagne_ncrf.sh

For the PyTorch implementation, run:

./run_pytorch_ncrf.sh
