Coder Social home page Coder Social logo

crf's Introduction

About

This Java project is an implementation of linear-chain Conditional Random Fields (CRF) in pure Java, with no third party dependency (except for log4j).

The CRF is exemplified by being used for the task of Part-Of-Speech tagging.

This is a free open-source project, which can be used also for commercial purposes. See LICENSE file.

Advantages

  1. Self contained: all the algorithms are implemented in the package itself.
  2. Free open source, also for commercial use.
  3. Relatively efficient: while this is a "by the book" implementation, some techniques have been invented and implemented to significantly improve run time efficiency.
  4. Clear, readable and well documented code. Might be useful also for educational purposes.

Self contained

The only required thrid party is Log4j. See http://logging.apache.org/log4j/1.2/ (Note that it is automatically downloaded by Maven. No user action required).

The algorithms, including function optimization using LBFGS algorithm, forward-backward algorithm and Viterbi algorithm are fully implemented in the code.

Compile and run

The project can be compiled with J2SE version 8.

This is a Maven project, compiled by simply running mvn compile.

To use this project as a library in another Maven project, add the following to the other project's POM file:

<dependency>
    <groupId>com.github.asher-stern</groupId>
    <artifactId>CRF</artifactId>
    <version>1.2.0</version>
</dependency>

Entry points

For those who are interested only in the CRF but not the POS-tagger, an example entry point is com.asher_stern.crf.crf.run.ExampleMain.

Note that this entry point is only a skeleton, and the user should copy it, and implement the feature generator and other stuff required to run the CRF for the user's specific problem.

Those who are interested in the POS-tagging example can use the following two entry points: com.asher_stern.crf.postagging.demo.TrainAndEvaluate and com.asher_stern.crf.postagging.demo.UsePosTagger.

The first entry point is for training the POS-tagger and evaluating it, and the second is for running it on test examples.

Note that training requires the Penn Tree-Bank corpus, and it should be provided as a directory with no subdirectories, which contains ".mrg" files, where each files contains parse-trees.

A note about the POS-tagging example

Please note that this is not a state-of-the-art POS tagger, since it does not employ state-of-the-art features. Rather, this POS-tagger uses very simplistic features, and is intended to exemplify CRF.

Enjoy!

crf's People

Contributors

asher-stern avatar

Watchers

James Cloos avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.