
Fcm

FCM is a compositional embedding model for relation classification that combines unlexicalized linguistic context with word embeddings.

FCM paper (Gormley et al., 2015): http://www.cs.cmu.edu/~mgormley/papers/gormley+yu+dredze.emnlp.2015.pdf
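As I understand the paper, the model gives each word w_i a feature vector f_i (unlexicalized context, for instance whether the word lies on the dependency path between the two entities) and an embedding e_i. It scores a relation label y as the sum over words of T_y ⊙ (f_i ⊗ e_i), where ⊗ is the outer product, ⊙ the sum of element-wise products and T_y a learned parameter matrix, and a softmax over labels gives p(y | x). The pure-Python sketch below only illustrates that scoring step with toy, hypothetical values. It is not Mo Yu's C++ implementation and it leaves out training.

```python
import math
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def fcm_scores(features: Sequence[Sequence[float]],
               embeddings: Sequence[Sequence[float]],
               tensors: Dict[str, Matrix]) -> Dict[str, float]:
    """Score each label y as sum_i T_y . (f_i outer e_i).

    features[i]: feature vector f_i of word i (toy, hypothetical values)
    embeddings[i]: embedding e_i of word i
    tensors[y]: parameter matrix T_y with shape len(f_i) x len(e_i)
    """
    scores = {}
    for label, t in tensors.items():
        total = 0.0
        for f, e in zip(features, embeddings):
            # T_y . (f outer e) = sum over j, k of T_y[j][k] * f[j] * e[k]
            for j, f_j in enumerate(f):
                if f_j == 0:
                    continue  # sparse binary features: skip the zeros
                for k, e_k in enumerate(e):
                    total += t[j][k] * f_j * e_k
        scores[label] = total
    return scores


def softmax(scores: Dict[str, float]) -> Dict[str, float]:
    """Turn label scores into probabilities p(y | x)."""
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: v / z for y, v in exps.items()}
```

For example, `fcm_scores([[1, 0], [0, 1]], [[0.5, -1.0], [2.0, 0.0]], {"A": [[1, 0], [0, 1]], "B": [[0, 0], [0, 0]]})` gives label A a score of 0.5 and label B a score of 0.0. Skipping zero entries of f_i keeps the loop cheap when the features are sparse and binary.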

The main purpose of this repository is to run the FCM on the relation classification task for several corpora, using multiple word embeddings, and to compute results (micro-F1, macro-F1, weighted-F1, etc.).
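The main script lists sklearn as a dependency, and sklearn's `f1_score` supports `average="micro"`, `"macro"` and `"weighted"`. To make explicit what those three averages mean, here is a stdlib-only sketch (a hypothetical helper, not code from this repository). Note that the official Semeval 2010 scorer may compute its score differently, so its numbers are not guaranteed to match.

```python
from collections import Counter
from typing import Dict, Sequence


def f1_scores(gold: Sequence[str], pred: Sequence[str]) -> Dict[str, float]:
    """Micro-, macro- and support-weighted F1 for single-label classification."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")

    # Per-label true positives, false positives and false negatives.
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted p, which was wrong
            fn[g] += 1  # gold label g was missed

    def f1(t: int, false_pos: int, false_neg: int) -> float:
        # 2TP / (2TP + FP + FN): the harmonic mean of precision and recall
        # whenever that is defined, and 0.0 when there are no true positives.
        denom = 2 * t + false_pos + false_neg
        return 2 * t / denom if denom else 0.0

    labels = sorted(set(gold) | set(pred))
    per_label = {y: f1(tp[y], fp[y], fn[y]) for y in labels}
    support = Counter(gold)  # number of gold examples per label
    return {
        # micro: pool the counts of all labels before computing F1
        "micro": f1(sum(tp.values()), sum(fp.values()), sum(fn.values())),
        # macro: plain average of the per-label F1 scores
        "macro": sum(per_label.values()) / len(labels),
        # weighted: per-label F1 weighted by gold support
        "weighted": sum(per_label[y] * support[y] for y in labels) / len(gold),
    }
```

For instance, `f1_scores(["A", "A", "B", "C"], ["A", "B", "B", "C"])` gives micro 0.75, macro 7/9 (about 0.778) and weighted 0.75.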

This repository is made of several pieces, the core being Mo Yu's C++ implementation of the FCM.

I have built two Python scripts around it:

  • 1- The first (main) script runs the FCM on a chosen corpus, tuning the learning rate and the number of epochs, with one or several word embeddings, and writes the results to a file.

  • 2- The second script converts a corpus from the Semeval 2010 format to a format usable by the FCM (adding various tags, dependency path information, etc.). It is useful if you want to apply my work to another corpus that you can easily put into Semeval 2010 format.
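The conversion script lists spaCy and networkx among its dependencies, presumably for parsing and for finding dependency paths; its exact output encoding is described in the script's comments. As a rough, stdlib-only illustration of the underlying idea, the sketch below finds the shortest path between two tokens of a dependency tree. The input layout (a list of head indices where the root is its own head, mirroring spaCy's `token.head.i`) and the function name are assumptions for illustration; networkx's `shortest_path` on an undirected graph would give the same kind of result.

```python
from collections import deque
from typing import Dict, List, Optional, Sequence


def dependency_path(heads: Sequence[int], start: int, end: int) -> Optional[List[int]]:
    """Token indices on the shortest path from start to end in a dependency tree.

    heads[i] is the index of token i's head; the root is its own head
    (assumed layout, as with spaCy's token.head for the sentence root).
    """
    # Undirected adjacency list built from the head -> dependent arcs.
    graph: Dict[int, List[int]] = {i: [] for i in range(len(heads))}
    for child, head in enumerate(heads):
        if child != head:
            graph[child].append(head)
            graph[head].append(child)

    # Breadth-first search from start, remembering each node's predecessor.
    previous: Dict[int, Optional[int]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == end:
            path: List[int] = []
            current: Optional[int] = node
            while current is not None:  # walk back to start
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbour in graph[node]:
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None  # no path (should not happen in a well-formed tree)
```

With a plausible parse of "The company fabricates plastic chairs" (heads `[1, 2, 2, 4, 2]`), `dependency_path([1, 2, 2, 4, 2], 1, 4)` returns `[1, 2, 4]`, i.e. company, fabricates, chairs.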

These two scripts are independent: if you only want to use one of them, you do not need to install what the other requires.

I already provide the Semeval 2010, Semeval 2018 and reAce 2005 corpora, along with results for several word embeddings (see the results/macro_f1 folder), so you may not need the conversion script at all.

Installation

This repository is meant for Windows. A Linux version may come in the near future, and adapting it yourself should be relatively easy.

For the main script you need Python 3 and the following packages:

{ numpy, scikit-learn (imported as sklearn), scipy }

For the conversion script you need Python 3 and the following packages:

{ numpy, scipy, spacy, networkx }

  • To use the main script, you first need to compile the FCM code: open a terminal in the fcm folder and run make. Since this repo targets Windows, I recommend MinGW (don't forget to add it to your PATH environment variable).

Example: make with MinGW

mingw32-make
  • To use the conversion script, you need to compile the SST code (a tagger): open a terminal in the data/corpus/raw_to_formated_script/sst folder and run make (as before, I recommend MinGW and mingw32-make).

For this script to run you also need gzip (specifically gunzip, its decompression tool) installed for command-line use. You can get it here (don't forget to add it to your PATH environment variable). If gunzip is not recognized as a terminal command, please refer to my Stack Overflow answer.

In conclusion, the installation may look complicated, but the main script only requires you to "make" the FCM and install the few Python libraries listed, and the conversion script requires you to "make" the SST and have gunzip available as a terminal command.
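To check the setup before building or running anything, a small script like the one below can help (a hypothetical helper, not part of the repository). `shutil.which` looks for the command-line tools on the PATH (on Windows it also tries the extensions listed in PATHEXT, so `gunzip` matches `gunzip.exe`), and `importlib.util.find_spec` checks that the Python packages listed above can be imported.

```python
import importlib.util
import shutil
from typing import Dict, Iterable

# Tools and packages named in this README (hypothetical checklist, adjust as needed).
REQUIRED_TOOLS = ("mingw32-make", "gunzip")
REQUIRED_PACKAGES = ("numpy", "sklearn", "scipy", "spacy", "networkx")


def check_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> Dict[str, bool]:
    """Map each command-line tool to whether it is found on the PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}


def check_packages(packages: Iterable[str] = REQUIRED_PACKAGES) -> Dict[str, bool]:
    """Map each top-level package name to whether it can be imported."""
    return {pkg: importlib.util.find_spec(pkg) is not None for pkg in packages}


if __name__ == "__main__":
    for name, ok in {**check_tools(), **check_packages()}.items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

If you only use the main script, only mingw32-make, numpy, sklearn and scipy need to show up as ok.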

Usage main script

Open a terminal in the root folder and execute:

python fcm_global.py <train_data> <test_data> <epochs> <learning_rate> [word_embeddings]

Example:

python fcm_global.py semeval2018_train semeval2018_test 30 0.005

Results are written to the results/macro_f1 folder.

Notes:

  • If you do not pass a word embedding argument, the script runs on every word embedding available in the data/word_emb folder.
  • Train and test data files must be in the data/corpus/formated folder.
  • This repo only provides one small word embedding file (GitHub size restriction), but you can get bigger, better-performing ones on my drive.
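The usage line above takes one number of epochs and one learning rate per call. To compare several settings, one option is to call the script in a loop; the hypothetical helper below builds one command per (epochs, learning rate) pair. How the optional word embedding argument must be spelled is an assumption here (it is passed through unchanged), so check fcm_global.py before relying on it.

```python
import itertools
import subprocess
import sys
from typing import List, Optional, Sequence


def sweep_commands(train: str, test: str, epochs: Sequence[int],
                   learning_rates: Sequence[float],
                   embedding: Optional[str] = None) -> List[List[str]]:
    """Build one fcm_global.py command line per (epochs, learning rate) pair.

    Without `embedding`, each run uses every file in data/word_emb (as described
    above); otherwise the name is passed through unchanged (its format is assumed).
    """
    commands = []
    for n_epochs, lr in itertools.product(epochs, learning_rates):
        cmd = [sys.executable, "fcm_global.py", train, test, str(n_epochs), str(lr)]
        if embedding is not None:
            cmd.append(embedding)
        commands.append(cmd)
    return commands


def run_all(commands: Sequence[Sequence[str]]) -> None:
    """Run each command from the repository root, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(list(cmd), check=True)
```

For example, `run_all(sweep_commands("semeval2018_train", "semeval2018_test", [20, 30], [0.005, 0.01]))` would launch four runs from the root folder.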

Usage conversion script

To convert a corpus from Semeval 2010 format to a format usable by the FCM (see the comments in data/corpus/raw_to_formated_script.py for more details):

Open a terminal in the data/corpus/raw_to_formated_script folder and execute:

python raw_to_formated.py <file to convert>

Example:

python raw_to_formated.py semeval2018_train

Converted files are written to the data/corpus/formated folder.

Notes:

  • The file to convert must be in the data/corpus/raw folder and, of course, in Semeval 2010 format.
  • This script is also available as a Jupyter notebook (in French), in the ...notebook folder, for a more visual walkthrough.
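For reference, a Semeval 2010 Task 8 record is, as far as I know, a line with a numeric id, a tab and the quoted sentence with <e1>...</e1> and <e2>...</e2> markers, followed by the relation line (e.g. Message-Topic(e1,e2) or Other), a Comment: line and a blank line. The sketch below reads that layout into plain dictionaries, which can help check an input file before conversion. It is a hypothetical helper, not the parser used in raw_to_formated.py, whose output format is documented in the script's comments.

```python
import re
from typing import Dict, Iterator, Sequence

# Assumed Semeval 2010 Task 8 layout (one record):
#   8001<TAB>"The most common <e1>audits</e1> were about <e2>waste</e2> and recycling."
#   Message-Topic(e1,e2)
#   Comment: ...
#   <blank line>
SENTENCE_RE = re.compile(r'^(\d+)\t"(.*)"\s*$')
ENTITY_RE = re.compile(r"<(e[12])>(.*?)</\1>")


def parse_semeval2010(lines: Sequence[str]) -> Iterator[Dict[str, str]]:
    """Yield id, plain sentence, e1, e2 and relation for every record in `lines`."""
    for i, line in enumerate(lines):
        match = SENTENCE_RE.match(line)
        if not match:
            continue  # relation, comment and blank lines are read via their sentence
        sent_id, marked = match.groups()
        entities = dict(ENTITY_RE.findall(marked))
        # The relation label sits on the line right after the sentence, if present.
        nxt = lines[i + 1].strip() if i + 1 < len(lines) else ""
        yield {
            "id": sent_id,
            "sentence": ENTITY_RE.sub(r"\2", marked),  # drop the <e1>/<e2> tags
            "e1": entities.get("e1", ""),
            "e2": entities.get("e2", ""),
            "relation": "" if SENTENCE_RE.match(nxt) else nxt,
        }
```

Running it on a file from data/corpus/raw, e.g. `list(parse_semeval2010(open(path, encoding="utf-8").readlines()))` (the encoding is an assumption), should yield one dictionary per sentence if the file follows the expected layout.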

Notes

Do not hesitate to contact me if you need help.

I left the official Semeval 2010 scorer in the results folder in case you need it.

Meta

Valentin Macé (LinkedIn, YouTube, Twitter)

Distributed under the MIT license. See LICENSE for more information.
