wapiti's Introduction

Wapiti - A linear-chain CRF tool

Copyright (c) 2009-2013  CNRS
All rights reserved.

For more detailed information see the homepage.

Wapiti is a very fast toolkit for segmenting and labeling sequences with discriminative models. It is based on maxent models, maximum entropy Markov models, and linear-chain CRFs, and offers various optimization and regularization methods to improve both the computational complexity and the prediction performance of standard models. Wapiti has been ranked first on the sequence-tagging task of the MLcomp web site for more than a year.

Wapiti is developed by LIMSI-CNRS and was partially funded by ANR projects CroTaL (ANR-07-MDCO-003) and MGA (ANR-07-BLAN-0311-02).

For suggestions, comments, or patches, you can contact me at [email protected]

If you use Wapiti for research purposes, please use the following citation:

@inproceedings{lavergne2010practical,
    author    = {Lavergne, Thomas and Capp\'{e}, Olivier and Yvon,
                 Fran\c{c}ois},
    title     = {Practical Very Large Scale {CRFs}},
    booktitle = {Proceedings of the 48th Annual Meeting of the Association
                 for Computational Linguistics ({ACL})},
    month     = {July},
    year      = {2010},
    location  = {Uppsala, Sweden},
    publisher = {Association for Computational Linguistics},
    pages     = {504--513},
    url       = {http://www.aclweb.org/anthology/P10-1052}
}

wapiti's People

Contributors

arnsholt, jekub


wapiti's Issues

--model option

-m | --model
Specify a model file to load and to train again. This allows you
either to continue an interrupted training or to use an old
model as a starting point for a new training. Beware that no new
labels can be inserted in the model. As the training parameters
are not saved in the model file, you have to specify them again,
or specify new ones if, for example, you want to continue
training with another algorithm or a different penalty.

How do I continue an interrupted training? When I used "-m", it started from the beginning.
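As a sketch of the intended workflow, assuming the options documented in the manual (-i/--maxiter, -a/--algo; file names are illustrative): train once, then pass the saved model back with -m, repeating the training options, since they are not stored in the model file.

```shell
# First run: stop after 50 iterations (or interrupt it)
wapiti train -p patterns.txt -i 50 train.txt model.part
# Resume: reload the model with -m; the training options must be given
# again (and may be changed, e.g. to switch algorithm or penalty)
wapiti train -p patterns.txt -m model.part -a l-bfgs train.txt model.final
```

Since only the weights are saved, the optimizer's internal state presumably restarts, so the iteration counter begins again at 1 even though training continues from the saved parameters.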

Training stops early with l-bfgs

I am using Wapiti with a small data set. When training with the other three optimization algorithms, everything is fine: the resulting models tag well when tested.
Training with l-bfgs optimization, however, does not go through: it stops after the 5th iteration. The model is saved, but it cannot find any entities in the test file. There is no error message, nor could I find a way to switch Wapiti to a more verbose mode.
How do I find the source of the problem? Is there any way to debug this?

Here is what I get in the command line:

 % wapiti train -c -p patterns.txt train model
* Load patterns
* Load training data
   1000 sequences loaded
* Initialize the model
* Summary
    nb train:    1728
    nb labels:   5
    nb blocks:   111412
    nb features: 557060
* Train the model with l-bfgs
  [   1] obj=53109.96   act=240660   err= 3.71%/25.41% time=0.23s/0.23s
  [   2] obj=16052.89   act=201733   err= 3.71%/25.41% time=0.14s/0.37s
  [   3] obj=13922.99   act=125481   err= 3.71%/25.41% time=0.15s/0.52s
  [   4] obj=12309.84   act=88905    err= 3.71%/25.41% time=0.15s/0.67s
  [   5] obj=10402.72   act=64820    err= 3.71%/25.41% time=0.15s/0.82s
* Compacting the model
    - Scan the model
    - Compact it
       83385 observations removed
      416925 features removed
* Save the model
* Done

hard constraints in Wapiti

Is it possible to force a "hard constraint"?
That is, to write a rule so that whenever a given observation is seen, it always produces a given tag.

segfault when training with bcd

Hi,

Wapiti segfaults on my linux box when training on a moderately large input file with the bcd algorithm:

  [snip]
  31000 sequences loaded
  32000 sequences loaded
  33000 sequences loaded
  34000 sequences loaded
* Initialize the model
* Summary
    nb train:    323212
    nb devel:    34346
    nb labels:   7
    nb blocks:   15080528
    nb features: 105563745
* Train the model with bcd
    - Build the index
        1/2 -- scan the sequences
./train.sh: line 1: 16978 Segmentation fault      (core dumped) wapiti train --compact --algo bcd --pattern patterns/brownpattern-2-self.txt --devel data/no-wiki/no-wiki-more-doc-only-all-brown_ak-ak1kmin10kpos-devel data/no-wiki/no-wiki-more-doc-only-all-brown_ak-ak1kmin10kpos-train models/wapiti/no-wiki-more-doc-only-all-self-bcd-brown_ak-ak1kmin10kpos-train

Rebuilding wapiti with debugging info gives me this not-very-useful stack trace:

(gdb) bt
#0  0x000000000040284a in trn_bcd (mdl=<optimised out>) at src/bcd.c:298
#1  0x0000000000401ea6 in dotrain (mdl=0xab88d0) at src/wapiti.c:161
#2  main (argc=<optimised out>, argv=<optimised out>) at src/wapiti.c:401

Training with rprop works fine - I was just curious whether bcd could train faster or give a better model.

Incorrect classification of definite and indefinite articles in German

Thank you very much for sharing Wapiti. This is really awesome.

Before I describe a potential issue (my hypothesis) with the German model that can be downloaded from your homepage, let me say that I have little experience with NLP or linguistics, so I might be completely wrong.

After playing around with Wapiti (and now also RFTagger), it seems that Wapiti consistently swaps the classification of definite and indefinite articles. For example:

Der ART.Indef.Nom.Sg.Masc*
Mann    N.Reg.Nom.Sg.Masc
heiratet    VFIN.Full.3.Sg.Pres.Ind
die ART.Indef.Nom.Sg.Fem*
Schwester   N.Reg.Nom.Sg.Fem
des ART.Indef.Gen.Sg.Masc*
Freundes    N.Reg.Gen.Sg.Masc
.   SYM.Pun.Sent

Ein ART.Def.Nom.Sg.Masc*
Mann    N.Reg.Nom.Sg.Masc
heiratet    VFIN.Full.3.Sg.Pres.Ind
eine    ART.Def.Nom.Sg.Fem*
Frau    N.Reg.Nom.Sg.Fem
eines   ART.Def.Gen.Sg.Masc*
Freundes    N.Reg.Gen.Sg.Masc
.   SYM.Pun.Sent

In German, “der”, “die”, and “des” are definite articles, while “ein”, “eine”, and “eines” are indefinite.

The RFTagger homepage contains the following sample output of RFTagger (which seems to use the same tagset):

Das PRO.Dem.Subst.-3.Nom.Sg.Neut 
ist VFIN.Sein.3.Sg.Pres.Ind 
ein ART.Indef.Nom.Sg.Masc 
Testsatz    N.Reg.Nom.Sg.Masc 
.   SYM.Pun.Sent 

Having Wapiti tag the same sentence (using the German model I downloaded from your website) yields the following output:

Das PRO.Dem.Subst.-3.Nom.Sg.Neut
ist VFIN.Sein.3.Sg.Pres.Ind
ein ART.Def.Nom.Sg.Masc*
Testsatz    N.Reg.Nom.Sg.Masc
.   SYM.Pun.Sent

While RFTagger correctly classified “ein” as an indefinite article, Wapiti classifies it as a definite article.

Am I right in assuming this is an issue with the German model? What would be the best way to correct this?

Avoiding patterns and using my own features

Hi,

I have a dataset from which a 3rd-party tool extracted features (it works very well in CRFSuite). There are multiple string features extracted per word. Is there a way to train a model from such a file, or am I forced to reimplement the feature extraction using patterns?

When I try to train a model on my data, I get an error message:
error: invalid feature: U-hi-WORD|C

"U-hi-WORD|C" is the first feature of the first word in the first sequence.

Thanks

use labels of previous observations as features

Maybe I am missing something, but there is no way to use the label of the previous line as a feature, is there?

This would be useful for IOB2-style tags, to make sure an I-* always follows a N-*.
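For what it's worth, Wapiti's pattern files can reference the previous label through bigram features: pattern lines starting with b (or *) generate features over the pair (previous label, current label), and a lone b line produces the pure label-bigram feature. This lets the model learn tag-transition preferences, though as weighted soft constraints rather than hard rules. A minimal pattern-file sketch (the feature name is illustrative):

```
# Unigram feature on the current token (column 0 of the current row)
u:wrd=%x[0,0]
# Pure label-bigram feature: pairs the current label with the previous one
b
```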

Got out of memory error while working with large file

Hi Team,

When I run Wapiti on 36k training sequences with the following command, it returns

"out of memory error, train model with L-BFGS."

wapiti train -p ../template_7feats -1 5 --nthread 5 ../train_feats.txt 36kmodel_wapiti

Thanks,
Somnath A. Kadam

reader.c

The same goes for reader.c and file rewinding; predefined models don't work at all without this change:

int autouni = rdr->autouni;
fpos_t pos;

fgetpos(file, &pos);

if (fscanf(file, "#rdr#%"SCNu32"/%"SCNu32"/%d\n", &rdr->npats, &rdr->ntoks, &autouni) != 3) {
    // This is for compatibility with the previous file format
    fsetpos(file, &pos);
    if (fscanf(file, "#rdr#%"SCNu32"/%"SCNu32"\n", &rdr->npats, &rdr->ntoks) != 2)
        fatal(err);
}
...

Also, PRIu32 should be changed to SCNu32, since these are fscanf (input) format strings.

Is it possible to save the best model at the middle of iterations?

Generally, Wapiti stops iterating when the error stays the same for 5 iterations. I have set the maximum number of iterations to 500. After a few iterations the err value just fluctuates a little, and because of this all 500 iterations are run. Is it possible to save the best model in the middle of the iterations, so that the overfitted model can be avoided? Also, can you please explain the two error values shown in the output?

Train the model with l-bfgs
[ 1] obj=9504837.02 act=17598257 err=72.68%/100.00% time=309.97s/309.97s
[ 2] obj=8309605.51 act=18434826 err=78.66%/100.00% time=188.79s/498.76s
[ 3] obj=7833355.05 act=19350196 err=52.76%/100.00% time=192.01s/690.76s
[ 4] obj=7421683.03 act=18613435 err=50.94%/100.00% time=196.62s/887.39s
[ 5] obj=6223601.64 act=16987349 err=45.05%/100.00% time=205.37s/1092.75s
[ 6] obj=4345596.68 act=15355941 err=38.59%/100.00% time=211.78s/1304.53s
[ 7] obj=3966547.51 act=14167829 err=28.94%/100.00% time=200.68s/1505.21s
[ 8] obj=3438943.46 act=14846582 err=30.09%/100.00% time=202.00s/1707.21s
[ 9] obj=3140222.40 act=14071699 err=25.07%/99.77% time=213.22s/1920.43s
[ 10] obj=2694572.72 act=13441289 err=23.84%/99.31% time=213.42s/2133.85s
[ 11] obj=2041933.47 act=12366576 err=19.74%/93.69% time=211.67s/2345.52s
[ 12] obj=1934548.49 act=12342957 err=28.05%/97.33% time=197.66s/2543.19s
[ 13] obj=1615030.54 act=13123501 err=17.34%/86.99% time=198.78s/2741.97s
[ 14] obj=1414008.73 act=12491249 err=16.62%/85.09% time=211.34s/2953.31s
[ 15] obj=1312758.08 act=12120282 err=16.21%/81.37% time=195.42s/3148.73s
[ 16] obj=1224466.19 act=11753770 err=19.01%/85.63% time=214.96s/3363.69s
[ 17] obj=1129987.49 act=11963602 err=15.37%/78.10% time=213.48s/3577.17s
[ 18] obj=1019288.79 act=11452956 err=15.89%/79.81% time=210.29s/3787.46s
[ 19] obj=974715.93 act=11212340 err=14.52%/74.45% time=202.03s/3989.48s
[ 20] obj=882033.03 act=10910928 err=15.20%/75.95% time=210.80s/4200.29s
[ 21] obj=811396.64 act=11123553 err=13.52%/70.41% time=213.60s/4413.89s
[ 22] obj=753678.97 act=10564180 err=16.13%/73.23% time=213.27s/4627.16s
[ 23] obj=710544.62 act=10845717 err=13.02%/67.73% time=201.23s/4828.39s
[ 24] obj=616492.72 act=10447271 err=13.58%/68.22% time=211.21s/5039.60s
[ 25] obj=582678.89 act=10243108 err=12.00%/65.61% time=207.64s/5247.24s
[ 26] obj=530190.50 act=9811981 err=13.14%/67.75% time=212.66s/5459.90s
[ 27] obj=496651.38 act=9421423 err=11.41%/64.56% time=212.92s/5672.82s
[ 28] obj=452394.58 act=8772761 err=12.53%/66.26% time=212.04s/5884.86s
[ 29] obj=420383.35 act=8828721 err=10.64%/63.20% time=200.33s/6085.19s
[ 30] obj=395130.11 act=8418783 err=11.12%/63.49% time=217.29s/6302.48s
[ 31] obj=375584.10 act=8205296 err=10.20%/61.75% time=211.70s/6514.18s
[ 32] obj=352458.25 act=7746058 err=10.57%/61.92% time=211.38s/6725.55s
[ 33] obj=333859.37 act=7705143 err= 9.67%/60.13% time=200.16s/6925.71s
[ 34] obj=318630.17 act=7450323 err= 9.98%/59.27% time=201.46s/7127.17s
[ 35] obj=303192.07 act=7453026 err= 9.14%/57.41% time=200.82s/7328.00s
[ 36] obj=287865.72 act=7103018 err= 9.25%/56.62% time=205.34s/7533.34s
[ 37] obj=275530.99 act=6825932 err= 8.63%/54.62% time=215.64s/7748.98s
[ 38] obj=259368.07 act=6599516 err= 8.59%/53.45% time=201.14s/7950.12s
[ 39] obj=250167.79 act=6480462 err= 8.10%/52.14% time=214.53s/8164.65s
[ 40] obj=240767.84 act=6235782 err= 8.16%/51.73% time=213.35s/8378.00s
[ 41] obj=232991.61 act=6117356 err= 7.67%/50.29% time=214.97s/8592.97s
[ 42] obj=224626.79 act=5950459 err= 7.71%/49.89% time=201.88s/8794.86s
[ 43] obj=219172.02 act=5834386 err= 7.26%/48.37% time=203.92s/8998.78s
[ 44] obj=212178.66 act=5644960 err= 7.23%/47.73% time=202.13s/9200.91s
[ 45] obj=205715.07 act=5498345 err= 6.88%/46.41% time=201.44s/9402.35s
[ 46] obj=198432.70 act=5252235 err= 6.73%/46.02% time=201.76s/9604.10s
[ 47] obj=193679.54 act=5136155 err= 6.48%/44.83% time=202.65s/9806.75s
[ 48] obj=187715.23 act=4944298 err= 6.38%/44.50% time=202.78s/10009.53s
[ 49] obj=182470.75 act=4756958 err= 6.02%/43.33% time=214.84s/10224.37s
[ 50] obj=176751.93 act=4551726 err= 5.81%/42.89% time=201.74s/10426.11s
[ 51] obj=172190.42 act=4417936 err= 5.49%/42.08% time=211.58s/10637.69s
[ 52] obj=167616.41 act=4280777 err= 5.19%/41.38% time=208.59s/10846.28s
[ 53] obj=163867.61 act=4172906 err= 4.95%/40.44% time=198.22s/11044.50s
[ 54] obj=159748.64 act=4032747 err= 4.54%/39.25% time=199.63s/11244.13s
[ 55] obj=155807.48 act=3886890 err= 4.30%/38.13% time=216.44s/11460.57s
[ 56] obj=152160.24 act=3719728 err= 3.86%/36.81% time=210.84s/11671.40s
[ 57] obj=149100.10 act=3635846 err= 3.65%/35.78% time=203.73s/11875.13s
[ 58] obj=145472.48 act=3471274 err= 3.14%/34.20% time=211.57s/12086.70s
[ 59] obj=142659.17 act=3327628 err= 2.97%/33.15% time=211.62s/12298.32s
[ 60] obj=139945.56 act=3202662 err= 2.55%/31.76% time=215.05s/12513.37s
[ 61] obj=137272.79 act=3118458 err= 2.46%/30.84% time=213.86s/12727.23s
[ 62] obj=134643.44 act=2963366 err= 2.10%/29.30% time=215.57s/12942.80s
[ 63] obj=132438.96 act=2764993 err= 2.08%/28.47% time=216.08s/13158.88s
[ 64] obj=130257.20 act=2608048 err= 1.78%/27.13% time=199.92s/13358.80s
[ 65] obj=128466.12 act=2521360 err= 1.83%/26.44% time=197.73s/13556.54s
[ 66] obj=126589.93 act=2331027 err= 1.53%/25.12% time=201.77s/13758.31s
[ 67] obj=125174.44 act=2206901 err= 1.60%/24.74% time=213.82s/13972.13s
[ 68] obj=123684.76 act=2095818 err= 1.36%/23.57% time=214.86s/14186.99s
[ 69] obj=122571.95 act=2040240 err= 1.41%/23.21% time=200.76s/14387.76s
[ 70] obj=121323.64 act=1935896 err= 1.24%/22.26% time=202.36s/14590.12s
[ 71] obj=120453.35 act=1868316 err= 1.26%/21.83% time=198.98s/14789.09s
[ 72] obj=119536.62 act=1793591 err= 1.14%/21.34% time=217.33s/15006.42s
[ 73] obj=118859.54 act=1746003 err= 1.15%/21.07% time=217.94s/15224.36s
[ 74] obj=118165.17 act=1682432 err= 1.05%/20.47% time=216.27s/15440.63s
[ 75] obj=117596.89 act=1629987 err= 1.08%/20.41% time=216.57s/15657.20s
[ 76] obj=116906.44 act=1578243 err= 1.01%/20.35% time=213.31s/15870.51s
[ 77] obj=116304.74 act=1509592 err= 1.04%/20.41% time=202.47s/16072.98s
[ 78] obj=115719.31 act=1460546 err= 0.95%/19.93% time=199.40s/16272.38s
[ 79] obj=115175.15 act=1406044 err= 0.99%/19.76% time=211.51s/16483.89s
[ 80] obj=114659.48 act=1362768 err= 0.91%/19.49% time=198.51s/16682.40s
[ 81] obj=114171.95 act=1314568 err= 0.93%/19.18% time=198.52s/16880.93s
[ 82] obj=113672.25 act=1283479 err= 0.85%/18.68% time=189.19s/17070.12s
[ 83] obj=113251.67 act=1247596 err= 0.91%/18.59% time=189.35s/17259.47s
[ 84] obj=112771.66 act=1207217 err= 0.82%/18.31% time=190.47s/17449.94s
[ 85] obj=112376.92 act=1170637 err= 0.87%/17.77% time=188.63s/17638.56s
[ 86] obj=111943.75 act=1139529 err= 0.79%/17.56% time=191.37s/17829.94s
[ 87] obj=111595.10 act=1109029 err= 0.85%/17.52% time=186.42s/18016.36s
[ 88] obj=111228.05 act=1080666 err= 0.77%/17.29% time=187.65s/18204.01s
[ 89] obj=110930.41 act=1055096 err= 0.85%/17.45% time=186.70s/18390.70s
[ 90] obj=110595.83 act=1028565 err= 0.77%/17.51% time=186.41s/18577.11s
[ 91] obj=110325.97 act=1002468 err= 0.83%/17.13% time=186.08s/18763.19s
[ 92] obj=110045.08 act=980569 err= 0.76%/17.31% time=187.33s/18950.52s
[ 93] obj=109811.15 act=962188 err= 0.82%/17.27% time=191.53s/19142.05s
[ 94] obj=109552.69 act=945077 err= 0.75%/17.34% time=189.67s/19331.72s
[ 95] obj=109337.72 act=924094 err= 0.80%/17.01% time=192.58s/19524.31s
[ 96] obj=109081.11 act=907977 err= 0.74%/17.17% time=189.05s/19713.35s
[ 97] obj=108887.67 act=890653 err= 0.78%/16.95% time=188.32s/19901.68s
[ 98] obj=108664.58 act=874575 err= 0.72%/16.87% time=184.46s/20086.14s
[ 99] obj=108497.79 act=859353 err= 0.78%/16.80% time=184.99s/20271.13s
[ 100] obj=108286.33 act=845753 err= 0.70%/16.74% time=192.82s/20463.95s
[ 101] obj=108134.56 act=831615 err= 0.75%/16.32% time=187.14s/20651.09s
[ 102] obj=107942.17 act=817775 err= 0.69%/16.54% time=193.93s/20845.02s
[ 103] obj=107802.89 act=806867 err= 0.74%/15.86% time=188.06s/21033.08s
[ 104] obj=107614.82 act=795848 err= 0.67%/15.98% time=187.50s/21220.58s
[ 105] obj=107492.65 act=784901 err= 0.73%/15.80% time=188.91s/21409.50s
[ 106] obj=107322.01 act=775746 err= 0.67%/16.25% time=187.60s/21597.10s
[ 107] obj=107209.00 act=764014 err= 0.72%/15.61% time=189.13s/21786.23s
[ 108] obj=107052.94 act=756152 err= 0.67%/16.17% time=187.79s/21974.02s
[ 109] obj=106944.42 act=746412 err= 0.72%/15.74% time=188.81s/22162.83s
[ 110] obj=106794.95 act=739299 err= 0.66%/15.92% time=185.21s/22348.04s
[ 111] obj=106695.49 act=731362 err= 0.71%/15.51% time=190.19s/22538.23s
[ 112] obj=106555.37 act=723453 err= 0.67%/15.91% time=187.87s/22726.10s
[ 113] obj=106471.96 act=715790 err= 0.70%/15.51% time=191.75s/22917.85s
[ 114] obj=106327.68 act=708527 err= 0.66%/16.11% time=192.42s/23110.27s
[ 115] obj=106252.77 act=700616 err= 0.70%/15.52% time=191.48s/23301.75s
[ 116] obj=106124.19 act=694484 err= 0.66%/15.99% time=186.72s/23488.46s
[ 117] obj=106048.09 act=687502 err= 0.69%/15.64% time=191.85s/23680.32s
[ 118] obj=105928.77 act=680818 err= 0.66%/15.91% time=188.99s/23869.31s
[ 119] obj=105858.43 act=674683 err= 0.69%/15.66% time=193.26s/24062.57s
[ 120] obj=105745.15 act=668089 err= 0.66%/16.00% time=187.76s/24250.33s
[ 121] obj=105679.16 act=662379 err= 0.67%/15.24% time=187.59s/24437.92s
[ 122] obj=105575.81 act=655739 err= 0.65%/15.84% time=192.71s/24630.63s
[ 123] obj=105509.44 act=649720 err= 0.66%/15.18% time=187.39s/24818.02s
[ 124] obj=105410.52 act=644031 err= 0.64%/15.67% time=191.18s/25009.20s
[ 125] obj=105347.85 act=639068 err= 0.67%/15.40% time=191.76s/25200.96s
[ 126] obj=105254.20 act=633618 err= 0.65%/15.84% time=185.40s/25386.37s
[ 127] obj=105198.42 act=628642 err= 0.67%/15.44% time=185.47s/25571.84s
[ 128] obj=105103.37 act=622991 err= 0.65%/15.90% time=187.94s/25759.77s
[ 129] obj=105047.79 act=618674 err= 0.65%/14.88% time=186.22s/25945.99s
[ 130] obj=104957.76 act=613423 err= 0.64%/15.62% time=190.43s/26136.42s
[ 131] obj=104904.83 act=609578 err= 0.64%/14.79% time=191.34s/26327.76s
[ 132] obj=104819.64 act=604263 err= 0.62%/15.31% time=186.30s/26514.06s
[ 133] obj=104774.92 act=599702 err= 0.65%/15.18% time=188.00s/26702.06s
[ 134] obj=104690.60 act=595024 err= 0.63%/15.69% time=186.93s/26888.99s
[ 135] obj=104642.61 act=591258 err= 0.64%/15.00% time=192.50s/27081.49s
[ 136] obj=104565.44 act=586998 err= 0.62%/15.52% time=190.40s/27271.89s
[ 137] obj=104517.57 act=583343 err= 0.63%/14.84% time=186.35s/27458.25s
[ 138] obj=104446.61 act=578995 err= 0.61%/15.18% time=185.80s/27644.04s
[ 139] obj=104403.81 act=575003 err= 0.63%/14.58% time=186.48s/27830.52s
[ 140] obj=104331.11 act=571148 err= 0.61%/15.06% time=185.66s/28016.18s
[ 141] obj=104289.95 act=567841 err= 0.63%/14.79% time=187.76s/28203.95s
[ 142] obj=104220.20 act=564173 err= 0.62%/15.28% time=185.54s/28389.48s
[ 143] obj=104178.20 act=560759 err= 0.63%/14.70% time=185.47s/28574.96s
[ 144] obj=104113.04 act=556913 err= 0.61%/15.19% time=190.64s/28765.60s
[ 145] obj=104074.13 act=554441 err= 0.62%/14.55% time=186.03s/28951.63s
[ 146] obj=104010.11 act=550571 err= 0.61%/15.01% time=187.09s/29138.72s
[ 147] obj=103971.51 act=547869 err= 0.61%/14.41% time=186.30s/29325.01s
[ 148] obj=103906.20 act=544301 err= 0.60%/14.95% time=185.63s/29510.64s
[ 149] obj=103868.55 act=541054 err= 0.61%/14.42% time=192.96s/29703.60s
[ 150] obj=103805.95 act=537732 err= 0.60%/14.92% time=186.81s/29890.41s
[ 151] obj=103770.19 act=534958 err= 0.62%/14.81% time=190.80s/30081.21s
[ 152] obj=103709.94 act=532034 err= 0.61%/15.29% time=186.76s/30267.97s
[ 153] obj=103677.53 act=529659 err= 0.61%/14.70% time=188.12s/30456.09s
[ 154] obj=103616.72 act=526795 err= 0.61%/15.26% time=187.33s/30643.43s
[ 155] obj=103584.43 act=524065 err= 0.61%/14.62% time=187.62s/30831.05s
[ 156] obj=103526.38 act=521171 err= 0.60%/14.94% time=187.89s/31018.94s
[ 157] obj=103495.48 act=518745 err= 0.60%/14.36% time=185.55s/31204.49s
[ 158] obj=103437.57 act=516123 err= 0.60%/14.99% time=186.01s/31390.50s
[ 159] obj=103409.69 act=513683 err= 0.60%/14.36% time=186.28s/31576.78s
[ 160] obj=103350.91 act=510897 err= 0.60%/14.96% time=183.85s/31760.63s
[ 161] obj=103321.33 act=508491 err= 0.61%/14.64% time=190.84s/31951.47s
[ 162] obj=103267.75 act=505544 err= 0.60%/15.04% time=189.75s/32141.22s
[ 163] obj=103237.50 act=503039 err= 0.60%/14.43% time=187.06s/32328.28s
[ 164] obj=103187.65 act=500267 err= 0.60%/15.00% time=190.29s/32518.57s
[ 165] obj=103159.95 act=498281 err= 0.59%/14.16% time=189.94s/32708.51s
[ 166] obj=103108.29 act=495740 err= 0.59%/14.78% time=190.82s/32899.33s
[ 167] obj=103081.01 act=493616 err= 0.60%/14.52% time=190.88s/33090.21s
[ 168] obj=103029.62 act=491241 err= 0.60%/15.06% time=188.25s/33278.46s
[ 169] obj=103002.16 act=489487 err= 0.59%/14.26% time=191.59s/33470.05s
[ 170] obj=102953.82 act=487013 err= 0.60%/15.11% time=194.26s/33664.31s
[ 171] obj=102927.79 act=485238 err= 0.59%/14.33% time=189.03s/33853.34s
[ 172] obj=102882.32 act=482969 err= 0.59%/14.84% time=188.16s/34041.50s
[ 173] obj=102859.03 act=481137 err= 0.59%/14.27% time=187.57s/34229.07s
[ 174] obj=102810.28 act=479317 err= 0.59%/14.87% time=183.54s/34412.61s
[ 175] obj=102785.12 act=477529 err= 0.60%/14.50% time=186.26s/34598.87s
[ 176] obj=102740.49 act=475387 err= 0.59%/14.92% time=189.59s/34788.45s
[ 177] obj=102714.81 act=473846 err= 0.58%/14.26% time=185.69s/34974.15s
[ 178] obj=102673.85 act=471325 err= 0.59%/14.86% time=191.47s/35165.62s
[ 179] obj=102653.24 act=469426 err= 0.58%/14.13% time=187.94s/35353.56s
[ 180] obj=102604.73 act=467700 err= 0.59%/14.63% time=186.05s/35539.60s
[ 181] obj=102585.02 act=466307 err= 0.58%/14.37% time=189.27s/35728.87s
[ 182] obj=102539.52 act=464939 err= 0.60%/15.05% time=187.07s/35915.94s
[ 183] obj=102517.21 act=463249 err= 0.58%/14.22% time=184.73s/36100.67s
[ 184] obj=102478.30 act=461301 err= 0.59%/14.93% time=184.22s/36284.89s
[ 185] obj=102458.40 act=460002 err= 0.57%/14.13% time=184.29s/36469.18s
[ 186] obj=102419.61 act=457795 err= 0.59%/14.77% time=188.76s/36657.95s
[ 187] obj=102399.39 act=456576 err= 0.57%/14.05% time=185.22s/36843.16s
[ 188] obj=102360.00 act=454684 err= 0.59%/14.75% time=195.65s/37038.81s
[ 189] obj=102339.72 act=453552 err= 0.57%/14.01% time=190.15s/37228.96s
[ 190] obj=102303.26 act=451690 err= 0.58%/14.71% time=186.37s/37415.33s
[ 191] obj=102284.90 act=450088 err= 0.58%/14.17% time=190.25s/37605.58s
[ 192] obj=102250.11 act=448289 err= 0.58%/14.74% time=186.57s/37792.15s
[ 193] obj=102232.08 act=447065 err= 0.58%/14.05% time=186.47s/37978.62s
[ 194] obj=102195.45 act=445642 err= 0.59%/14.80% time=188.26s/38166.88s
[ 195] obj=102176.52 act=444624 err= 0.57%/14.03% time=191.65s/38358.53s
[ 196] obj=102141.58 act=443142 err= 0.58%/14.69% time=186.07s/38544.60s
[ 197] obj=102123.76 act=441762 err= 0.58%/14.25% time=188.47s/38733.07s
[ 198] obj=102091.13 act=439948 err= 0.58%/14.88% time=185.79s/38918.86s
[ 199] obj=102074.84 act=438820 err= 0.58%/14.20% time=190.14s/39108.99s
[ 200] obj=102041.59 act=437635 err= 0.59%/14.83% time=187.71s/39296.70s

train large corpus throws segmentation fault

I used wapiti to train a CRFs model.

When the size of the data file was 11 MB, everything was OK. But when the size reached 24 MB, training crashed with a segmentation fault.

I traced the bug and discovered that "uint32_t out[T]" is used to declare a variable-length array on the stack in the function "tag_evalsub" of "src/decoder.c".

I suggest using "xmalloc" to fix the problem.

Thank you for sharing.

Stopped process

Hello,

Sometimes, Wapiti just stops and I can't figure out what is wrong.

lanvin@Lanvin:~/wapiti/wapiti-1.5.0/dat# wapiti train -p patterns/p4c -t 16 -a rprop- training/training231116 models/modelp4c231116rprop-

  • Load patterns
  • Load training data
    1000 sequences loaded
    2000 sequences loaded
    3000 sequences loaded
    4000 sequences loaded
    5000 sequences loaded
    6000 sequences loaded
    7000 sequences loaded
    8000 sequences loaded
    9000 sequences loaded
    10000 sequences loaded
  • Initialize the model
  • Summary
    nb train: 10874
    nb labels: 295
    nb blocks: 299582
    nb features: 5625255290
  • Train the model with rprop-
    Processus arrêté ("Process killed")

Can you please help?

Provide documentation for update mode

What is the update mode and what can I use it for?

I was going over the documentation, but there doesn't seem to be any for update mode. Could you provide some examples of using it?

Retraining with --model causes a crash

Hi,

there was an earlier question about retraining, but it was not answered, so I will try again.
I have a working trained model, which assigns semantic tags to base forms like this:

lemma tag

I am trying to retrain this existing model with new training data, which consists of the same type of data: lemma tag.

My command is: srun ./wapiti train --me --model eduskunta_model subtitle_eka_kolmas.sem eduskunta_model_subtitle

It starts, loads sequences, and then crashes:
......
......
17847000 sequences loaded
17848000 sequences loaded

  • Resync the model
  • Summary
    nb train: 17848060
    nb labels: 1524
    nb blocks: 77421
    nb features: 117989604
  • Train the model with l-bfgs
    srun: error: r18c37: task 0: Segmentation fault
    srun: launch/slurm: _step_signal: Terminating StepId=10457036.0

Any idea what is going wrong?

K. Kettunen

Summary when training

What exactly are 'obj', 'act', 'nb train', 'nb blocks', and 'nb features', please?

Real-valued features

Thanks for wapiti, great tool!

How hard would it be to make it possible to use real-valued features (as opposed to only binary features)? My use case is exploiting word embeddings and soft word classes; currently they have to be discretized.

Leaks file handle of model in labeling mode

When using a model to label some text, valgrind gave me:

==5332== HEAP SUMMARY:
==5332==     in use at exit: 472 bytes in 1 blocks
==5332==   total heap usage: 602,017 allocs, 602,016 frees, 1,352,158,567 bytes allocated
==5332== 
==5332== 472 bytes in 1 blocks are still reachable in loss record 1 of 1
==5332==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==5332==    by 0x4A6AAAD: __fopen_internal (iofopen.c:65)
==5332==    by 0x4A6AAAD: fopen@@GLIBC_2.2.5 (iofopen.c:86)
==5332==    by 0x10A7C8: main (in /home/ans/Downloads/postagger/c/wapiti)
==5332== 
==5332== LEAK SUMMARY:
==5332==    definitely lost: 0 bytes in 0 blocks
==5332==    indirectly lost: 0 bytes in 0 blocks
==5332==      possibly lost: 0 bytes in 0 blocks
==5332==    still reachable: 472 bytes in 1 blocks
==5332==         suppressed: 0 bytes in 0 blocks

It seems like this file handle is not closed:

FILE *file = fopen(mdl->opt->model, "r");

A simple fclose(file); after the following line should suffice:

mdl_load(mdl, file);

Cheers!

model.c

There's a problem with the mdl_load() function: the code should rewind the file before falling back to the old format, like this:

if (fscanf(file, "#mdl#%d#%"SCNu64"\n", &type, &nact) == 2) {
  mdl->type = type;
} else {
  rewind(file);
  if (fscanf(file, "#mdl#%"SCNu64"\n", &nact) == 1) {
    mdl->type = 0;
  } else {
    fatal(err);
  }
}

Windows build

Hi,
I want to build this package for Windows.
I'm using Visual Studio 2013, but when I compile this package I get these errors:

    bcd.c
    c:\users\m-r\appdata\local\temp\pip-build-rbuqyz8g\libwapiti\cwapiti\src\model.h(33) : fatal error C1083: Cannot open include file: 'sys/time.h': No such file or directory
    error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio 12.0\\VC\\BIN\\cl.exe' failed with exit status 2

This answer said that 'sys/time.h' is not supported on Windows.
What can I do to fix this problem?
