ucleed's Introduction

UCL/UMass BioNLP Event Extractor

Installation

Get maven, untar/unzip and then run

$ mvn compile

If it looks like dependencies cannot be found, first try this.

ucleed stores preprocessed data in a MongoDB database, so you need to install MongoDB and start the server:

$ mongod

You should also have an installation of the BioNLP reranking parser by David McClosky on your machine.

You also need to configure a few directory locations. Copy the example in src/main/resources/props/example.prop and modify as needed.
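Before starting a long run it can help to confirm that your copy of the prop file sets the keys this README refers to (rerankparser, biomodel, outDir, weightsSrc). The following is a minimal sketch, assuming the prop file uses Java-style `key=value` lines; the function name `missing_keys` is made up for this example and is not part of ucleed.

```shell
#!/bin/sh
# Hypothetical helper (not part of ucleed): print every key that is not
# set in a Java-style properties file.
# Usage: missing_keys FILE KEY...
missing_keys() {
    file="$1"; shift
    for key in "$@"; do
        # A key counts as set if a non-comment line starts with "key=" or "key:".
        grep -Eq "^[[:space:]]*$key[[:space:]]*[=:]" "$file" || printf '%s\n' "$key"
    done
}
```

For example, `missing_keys src/main/resources/props/example.prop rerankparser biomodel outDir` prints nothing when all three keys are present.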

Setting up the syntactic parser

ucleed uses the reranking parser by David McClosky, in combination with his Improved self-trained biomedical parsing model. In the configuration file, set rerankparser to the main directory of the parser, and biomodel to the directory of the biomedical parsing model. Note that recent versions of the bllip reranker expect bzip2 files rather than the gzip files provided in the biomodel. You can fix this by calling

$ gunzip *.gz; bzip2 *

in the biomodel/reranker directory.
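If you prefer a version that can be rerun safely, the conversion can be sketched as a small shell function. This is only an illustration: the name `gz_to_bz2` is made up here, and unlike the one-liner it recompresses only the files that end in .gz and leaves everything else in the directory alone.

```shell
#!/bin/sh
# Hypothetical helper: recompress every .gz file in a directory as .bz2.
# Usage: gz_to_bz2 DIR
gz_to_bz2() {
    dir="$1"
    for f in "$dir"/*.gz; do
        # Skip the unexpanded pattern when the directory has no .gz files.
        [ -e "$f" ] || continue
        # gunzip replaces NAME.gz with NAME; bzip2 then replaces NAME with NAME.bz2.
        gunzip "$f" && bzip2 "${f%.gz}"
    done
}
```

Calling `gz_to_bz2 biomodel/reranker` a second time does nothing, because no .gz files remain.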

Preprocessing

Before we train, we need to go through two preprocessing steps that prepare the data.

Data preprocessing

First call

$ mvn exec:exec -Dexec.executable="java" -Dexec.args="-Xmx1g -Dprop=props/example.prop  -cp %classpath cc.refectorie.proj.bionlp2011.ClearRaw"

to clear the database (this is only necessary if you want to rerun experiments, but it doesn't hurt otherwise). Then do

$ mvn exec:exec -Dexec.executable="java" -Dexec.args="-Xmx1g -Dprop=props/example.prop  -cp %classpath cc.refectorie.proj.bionlp2011.LowLevelAnnotation dev train test"

This will tokenize, sentence-split, etc. the data specified in the prop file.

Feature preprocessing

Next we run

$ mvn exec:exec -Dexec.executable="java" -Dexec.args="-Xmx1g -Dprop=props/example.prop  -cp %classpath cc.refectorie.proj.bionlp2011.ClearAnnotated"

to initialize the feature preprocessing database. Then do:

$ mvn exec:exec -Dexec.executable="java" -Dexec.args="-Xmx1g -Dprop=props/example.prop  -cp %classpath cc.refectorie.proj.bionlp2011.App dev train test"

This will prepare some candidate structures that are used during inference/learning.

Learning

Now copy data with features to the learning KB:

$ mvn exec:exec -Dexec.executable="java" -Dexec.args="-Xmx1g -Dprop=props/example.prop  -cp %classpath cc.refectorie.proj.bionlp2011.ClearLearningKB"

Finally, you're ready to train the model:

$ mvn exec:exec -Dexec.executable="java" -Dexec.args="-Xmx8g -Dprop=props/example.prop -cp %classpath cc.refectorie.proj.bionlp2011.BioNLPLearner"

This will store the weights for each epoch in $UMASSDIR/weights/[epoch].

Learning also runs evaluation on test and development sets. The results will appear in the outDir specified in the prop file.
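The preprocessing and learning commands above differ only in heap size, class name, and trailing arguments, so they can be wrapped in a small script. The sketch below follows the order given in this README; the function names and the `PROP` and `DRY_RUN` variables are assumptions made for this example, not part of ucleed.

```shell
#!/bin/sh
# Hypothetical wrapper around the mvn commands in this README.
PROP="${PROP:-props/example.prop}"

# Build the -Dexec.args value for one step.
# Usage: exec_args HEAP CLASS [ARGS...]
exec_args() {
    heap="$1"; class="$2"; shift 2
    args="-Xmx$heap -Dprop=$PROP -cp %classpath cc.refectorie.proj.bionlp2011.$class"
    # Append trailing arguments such as "dev train test" when given.
    if [ "$#" -gt 0 ]; then
        args="$args $*"
    fi
    printf '%s\n' "$args"
}

# Run one step, or only print the command when DRY_RUN is non-empty.
run_step() {
    a="$(exec_args "$@")"
    if [ -n "${DRY_RUN:-}" ]; then
        printf 'mvn exec:exec -Dexec.executable="java" -Dexec.args="%s"\n' "$a"
    else
        mvn exec:exec -Dexec.executable=java "-Dexec.args=$a"
    fi
}

# Preprocessing and learning in README order; stops at the first failing step.
run_pipeline() {
    run_step 1g ClearRaw &&
    run_step 1g LowLevelAnnotation dev train test &&
    run_step 1g ClearAnnotated &&
    run_step 1g App dev train test &&
    run_step 1g ClearLearningKB &&
    run_step 8g BioNLPLearner
}
```

Setting `DRY_RUN=1` before calling `run_pipeline` prints the six commands without executing them, which is a cheap way to check the prop path before starting a long run.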

Testing

You can use the stored weights in a standalone tool that applies the complete preprocessing chain and the event extractor model to input files. To do so, first set weightsSrc=weights/[epoch of choice] in the prop file. Epoch 4 or 5 generally seems to give good results, but you can check what works best on the dev set.

Then run the standalone tool as follows:

$ mvn exec:exec -Dexec.executable="java" -Dexec.args="-Xmx80g -Dprop=props/example.prop -cp %classpath cc.refectorie.proj.bionlp2011.UMassBioEventExtractor [txt file] [a1file] [destfile]"
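To process many documents, the command above can be called in a loop. The sketch below rests on assumptions that are not stated in this README: each NAME.txt has a NAME.a1 file next to it, and the output is written to NAME.a2 in a separate directory. The function names and the `PROP` and `HEAP` variables are likewise made up for illustration.

```shell
#!/bin/sh
# Hypothetical batch wrapper for the standalone extractor.
PROP="${PROP:-props/example.prop}"
HEAP="${HEAP:-80g}"   # heap size taken from the command above

# Print the three extractor arguments for one text file.
# Usage: extractor_io TXT_FILE OUT_DIR
extractor_io() {
    txt="$1"; out_dir="$2"
    name="$(basename "$txt" .txt)"
    # Assumed layout: NAME.a1 sits next to NAME.txt, output goes to OUT_DIR/NAME.a2.
    printf '%s %s %s\n' "$txt" "${txt%.txt}.a1" "$out_dir/$name.a2"
}

# Run the extractor once per .txt file, skipping files without a .a1 partner.
# Usage: extract_all IN_DIR OUT_DIR
extract_all() {
    in_dir="$1"; out_dir="$2"
    mkdir -p "$out_dir"
    for txt in "$in_dir"/*.txt; do
        [ -f "${txt%.txt}.a1" ] || continue
        io="$(extractor_io "$txt" "$out_dir")"
        mvn exec:exec -Dexec.executable=java \
            "-Dexec.args=-Xmx$HEAP -Dprop=$PROP -cp %classpath cc.refectorie.proj.bionlp2011.UMassBioEventExtractor $io"
    done
}
```

Because all arguments travel inside a single exec.args string, paths containing spaces will not survive; keep input and output directories free of whitespace.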

Further Reading and Citations

The most relevant citation for this work is our EMNLP paper. Further details can be found in our BioNLP shared task papers on system combination and dual decomposition.

ucleed's Issues

Installation trouble

Firstly, Maven was unable to find certain packages such as factorie, mongodb, etc. I installed them as required by Maven. However, while compiling the sources, mvn comes up with the following error :-

[ERROR] ~/ucleed/src/main/scala/cc/refectorie/proj/bionlp2011/AntiTransitivityModule.scala:5: error: value factorie is not a member of package cc
[INFO] import cc.factorie._
[INFO]           ^

I am unable to find a solution to this. Could you throw some light on how to fix this issue?
