Coder Social home page Coder Social logo

pcrf's Introduction

pcrf

Python Linear CRF

= Introduction =

PCRF is a open source implementation of (Linear) Conditional Random Fields (CRFs) for segmenting/labeling sequential data. PCRF is implemented in Python (2.7.5) and can be applied to a variety of NLP tasks, such as Named Entity Recognition, Information Extraction , chinese word segmentation and Event Extraction.

== Features ==

Implemented in python, numpy, scipy. 100% python.

Fast training based on LBFGS( a quasi-newton algorithm for large scale numerical optimization problem)

Less memory usage in training, can handel tens of millions features.

decoding in practical time

Available as an open source software

Train file format and Feature template are compatible with popular CRF implementations, like CRF++

Multiprocessing

optimized for less energy conssumption and CO2 emission. (can ran half the speed as CRF++)

== Versions ==

pcrf ver 0.1. released 2014-03-01. Initial Release

== Download (Check out) ==

Github:

https://github.com/huangzhengsjtu/pcrf/

= Train file format and Feature template =

Compatible with CRF++. http://crfpp.googlecode.com/svn/trunk/doc/index.html#format

= Examples =

== a simple example==

The training , testing and template file come with pcrf codes.

using pcrf to Learn the model:

c:> python pcrf-train.py trainsimple.data templatesimple.txt model

using pcrf to test:

c:> python pcrf-test.py trainsimple.data model result.txt

You are expecting to get a 100% correct rate.

== chunk example==

The training data is from CRF++.

using pcrf to Learn the model:

c:> python pcrf-train.py trainchunk.data templatechunk model

trainchunk.data is the training data. templatechunk is the template file to generate features. model is the learnt model file (which stores all the \theta. ). After the Train process finished, you will get the model file. On an intel core i3 machine, it takes around 400 seconds to finish.

Valid Template Line Number: 20 read 10000 lines. number of labels: 17 Linear CRF in Python.. ver 0.1 B features: 289 U features: 1075420 total num: 1075709 training sequence number: 823 RUNNING THE L-BFGS-B CODE At iterate 0 f= 5.92173D+05 |proj g|= 3.40110D+03 ..... At iterate 61 f= 1.14233D+03 |proj g|= 6.15748D-01 At iterate 62 f= 1.14213D+03 |proj g|= 9.20878D-01 N Tit Tnf Tnint Skip Nact Projg F ***** 62 72 1 0 0 9.209D-01 1.142D+03 F = 1142.13417137736

using pcrf to test:

c:> python pcrf-test.py testchunk.data model result.txt

number of labels: 14 Linear CRF in Python.. ver 0.1 B features: 289 U features: 1075420 total num: 1075709 Prediction sequence number: 77 Note: If Y is useless, correct rate is also useless. correct: 1757 error: 139 correct rate: 0.926687763713 Write max(y) to file: results.txt Test finished in 1.68699979782 seconds.

testchunk.data is the testing data. model is the learnt model file in training. After the test process finished, you will get the result file. Test program also output a very simple correct ratio to check whether the CRF model works.

== experiment 2 ==

= Notes and Bugs =

On Windows system, sometime the training program can't start. Check if there is a iterate.dat file that is locked by other python process. Kill the idle python process, it should work. iterate.dat is generated by LBFGS module from scipy.optimize.

= To do =

extended to general CRF.

using BP to train and learn CRF.

Learn the structure of CRF.

pcrf's People

Contributors

huangzhengsjtu avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.