wapiti's Introduction

Wapiti - A linear-chain CRF tool

Copyright (c) 2009-2013  CNRS
All rights reserved.

For more detailed information see the homepage.

Wapiti is a very fast toolkit for segmenting and labeling sequences with discriminative models. It is based on maxent models, maximum entropy Markov models, and linear-chain CRFs, and offers various optimization and regularization methods to improve both the computational complexity and the prediction performance of standard models. Wapiti has been ranked first on the sequence-tagging task of the MLcomp web site for more than a year.

Wapiti is developed by LIMSI-CNRS and was partially funded by ANR projects CroTaL (ANR-07-MDCO-003) and MGA (ANR-07-BLAN-0311-02).

For suggestions, comments, or patches, you can contact me at [email protected]

If you use Wapiti for research purposes, please use the following citation:

@inproceedings{lavergne2010practical,
    author    = {Lavergne, Thomas and Capp\'{e}, Olivier and Yvon,
                 Fran\c{c}ois},
    title     = {Practical Very Large Scale {CRFs}},
    booktitle = {Proceedings of the 48th Annual Meeting of the Association
                 for Computational Linguistics ({ACL})},
    month     = {July},
    year      = {2010},
    location  = {Uppsala, Sweden},
    publisher = {Association for Computational Linguistics},
    pages     = {504--513},
    url       = {http://www.aclweb.org/anthology/P10-1052}
}

wapiti's People

Contributors

arnsholt, jekub


wapiti's Issues

--model option

-m | --model
Specify a model file to load and to train again. This allows you
either to continue an interrupted training or to use an old
model as a starting point for a new training. Beware that no new
labels can be inserted in the model. As the training parameters
are not saved in the model file, you have to specify them again,
or specify new ones if, for example, you want to continue
training with another algorithm or a different penalty.

How do I continue an interrupted training? When I used "-m", it started from the beginning.
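As a sketch of the intended workflow, assuming the options documented in the manual (-i/--maxiter, -a/--algo; file names are illustrative): train once, then pass the saved model back with -m, repeating the training options, since they are not stored in the model file.

```shell
# First run: stop after 50 iterations (or interrupt it)
wapiti train -p patterns.txt -i 50 train.txt model.part
# Resume: reload the model with -m; the training options must be given
# again (and may be changed, e.g. to switch algorithm or penalty)
wapiti train -p patterns.txt -m model.part -a l-bfgs train.txt model.final
```

Since only the weights are saved, the optimizer's internal state presumably restarts, so the iteration counter begins again at 1 even though training continues from the saved parameters.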

Training stops early with l-bfgs

I am using Wapiti with a small data set. When training with the other three optimization algorithms, everything is fine: the resulting models tag well when tested.
Training with l-bfgs optimization, however, does not go through: it stops after the 5th iteration. The model is saved, but it cannot find any entities in the test file. There is no error message, nor could I find a way to switch Wapiti to a more verbose mode.
How do I find the source of the problem? Is there any way to debug this?

Here is what I get in the command line:

 % wapiti train -c -p patterns.txt train model
* Load patterns
* Load training data
   1000 sequences loaded
* Initialize the model
* Summary
    nb train:    1728
    nb labels:   5
    nb blocks:   111412
    nb features: 557060
* Train the model with l-bfgs
  [   1] obj=53109.96   act=240660   err= 3.71%/25.41% time=0.23s/0.23s
  [   2] obj=16052.89   act=201733   err= 3.71%/25.41% time=0.14s/0.37s
  [   3] obj=13922.99   act=125481   err= 3.71%/25.41% time=0.15s/0.52s
  [   4] obj=12309.84   act=88905    err= 3.71%/25.41% time=0.15s/0.67s
  [   5] obj=10402.72   act=64820    err= 3.71%/25.41% time=0.15s/0.82s
* Compacting the model
    - Scan the model
    - Compact it
       83385 observations removed
      416925 features removed
* Save the model
* Done

hard constraints in Wapiti

Is it possible to force a "hard constraint"?
That is, to write a rule so that whenever a given observation is seen, it always produces a given tag.

segfault when training with bcd

Hi,

Wapiti segfaults on my linux box when training on a moderately large input file with the bcd algorithm:

  [snip]
  31000 sequences loaded
  32000 sequences loaded
  33000 sequences loaded
  34000 sequences loaded
* Initialize the model
* Summary
    nb train:    323212
    nb devel:    34346
    nb labels:   7
    nb blocks:   15080528
    nb features: 105563745
* Train the model with bcd
    - Build the index
        1/2 -- scan the sequences
./train.sh: line 1: 16978 Segmentation fault      (core dumped) wapiti train --compact --algo bcd --pattern patterns/brownpattern-2-self.txt --devel data/no-wiki/no-wiki-more-doc-only-all-brown_ak-ak1kmin10kpos-devel data/no-wiki/no-wiki-more-doc-only-all-brown_ak-ak1kmin10kpos-train models/wapiti/no-wiki-more-doc-only-all-self-bcd-brown_ak-ak1kmin10kpos-train

Rebuilding wapiti with debugging info gives me this not-very-useful stack trace:

(gdb) bt
#0  0x000000000040284a in trn_bcd (mdl=<optimised out>) at src/bcd.c:298
#1  0x0000000000401ea6 in dotrain (mdl=0xab88d0) at src/wapiti.c:161
#2  main (argc=<optimised out>, argv=<optimised out>) at src/wapiti.c:401

Training with rprop works fine - I was just curious whether bcd could train faster or give a better model.

Incorrect classification of definite and indefinite articles in German

Thank you very much for sharing Wapiti. This is really awesome.

Before I describe a potential issue (my hypothesis) with the German model that can be downloaded from your homepage, let me say that I have little experience with NLP or linguistics, so I might be completely wrong.

After playing around with Wapiti (and now also RFTagger), it seems that Wapiti consistently swaps the classification of definite and indefinite articles. For example:

Der ART.Indef.Nom.Sg.Masc*
Mann    N.Reg.Nom.Sg.Masc
heiratet    VFIN.Full.3.Sg.Pres.Ind
die ART.Indef.Nom.Sg.Fem*
Schwester   N.Reg.Nom.Sg.Fem
des ART.Indef.Gen.Sg.Masc*
Freundes    N.Reg.Gen.Sg.Masc
.   SYM.Pun.Sent

Ein ART.Def.Nom.Sg.Masc*
Mann    N.Reg.Nom.Sg.Masc
heiratet    VFIN.Full.3.Sg.Pres.Ind
eine    ART.Def.Nom.Sg.Fem*
Frau    N.Reg.Nom.Sg.Fem
eines   ART.Def.Gen.Sg.Masc*
Freundes    N.Reg.Gen.Sg.Masc
.   SYM.Pun.Sent

In German, “der”, “die”, and “des” are definite articles, while “ein”, “eine”, and “eines” are indefinite.

The RFTagger homepage contains the following sample output of RFTagger (which seems to use the same tagset):

Das PRO.Dem.Subst.-3.Nom.Sg.Neut 
ist VFIN.Sein.3.Sg.Pres.Ind 
ein ART.Indef.Nom.Sg.Masc 
Testsatz    N.Reg.Nom.Sg.Masc 
.   SYM.Pun.Sent 

Having Wapiti tag the same sentence (using the German model I downloaded from your website) yields the following output:

Das PRO.Dem.Subst.-3.Nom.Sg.Neut
ist VFIN.Sein.3.Sg.Pres.Ind
ein ART.Def.Nom.Sg.Masc*
Testsatz    N.Reg.Nom.Sg.Masc
.   SYM.Pun.Sent

While RFTagger correctly classified “ein” as an indefinite article, Wapiti classifies it as a definite article.

Am I right in assuming this is an issue with the German model? What would be the best way to correct this?

Avoiding patterns and using my own features

Hi,

I have a dataset from which a 3rd-party tool extracted features (it works very well in CRFSuite). There are multiple string features extracted per word. Is there a way to train a model from such a file, or am I forced to reimplement the feature extraction using patterns?

When I try to train a model on my data, I get an error message:
error: invalid feature: U-hi-WORD|C

"U-hi-WORD|C" is the first feature of the first word in the first sequence.

Thanks

use labels of previous observations as features

Maybe I am missing something, but there is no way to use the label of the previous line as a feature, is there?

This would be useful for IOB2-style tags, to make sure an I-* always follows a N-*.
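For what it's worth, Wapiti's pattern files can reference the previous label through bigram features: pattern lines starting with b (or *) generate features over the pair (previous label, current label), and a lone b line produces the pure label-bigram feature. This lets the model learn tag-transition preferences, though as weighted soft constraints rather than hard rules. A minimal pattern-file sketch (the feature name is illustrative):

```
# Unigram feature on the current token (column 0 of the current row)
u:wrd=%x[0,0]
# Pure label-bigram feature: pairs the current label with the previous one
b
```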

Got out of memory error while working with large file

Hi Team,

When I run Wapiti on 36k training sequences with the following command, it returns

"out of memory error, train model with L-BFGS."

wapiti train -p ../template_7feats -1 5 --nthread 5 ../train_feats.txt 36kmodel_wapiti

Thanks,
Somnath A. Kadam

reader.c

The same goes for reader.c and file rewinding; predefined models don't work at all without this change:

int autouni = rdr->autouni;
fpos_t pos;

fgetpos(file, &pos);

if (fscanf(file, "#rdr#%"SCNu32"/%"SCNu32"/%d\n", &rdr->npats, &rdr->ntoks, &autouni) != 3) {
    // This is for compatibility with the previous file format
    fsetpos(file, &pos);
    if (fscanf(file, "#rdr#%"SCNu32"/%"SCNu32"\n", &rdr->npats, &rdr->ntoks) != 2)
        fatal(err);
}
...

Also, PRIu32 should be changed to SCNu32, since these are fscanf (input) format strings.

Is it possible to save the best model at the middle of iterations?

Generally, Wapiti stops iterating when the error stays the same for 5 iterations. I have set the maximum number of iterations to 500. After a few iterations the err value just fluctuates a little, and because of this all 500 iterations are run. Is it possible to save the best model in the middle of the iterations, so that the overfitted model can be avoided? Also, can you please explain the two error values shown in the output?

Train the model with l-bfgs
[ 1] obj=9504837.02 act=17598257 err=72.68%/100.00% time=309.97s/309.97s
[ 2] obj=8309605.51 act=18434826 err=78.66%/100.00% time=188.79s/498.76s
[ 3] obj=7833355.05 act=19350196 err=52.76%/100.00% time=192.01s/690.76s
[ 4] obj=7421683.03 act=18613435 err=50.94%/100.00% time=196.62s/887.39s
[ 5] obj=6223601.64 act=16987349 err=45.05%/100.00% time=205.37s/1092.75s
[ 6] obj=4345596.68 act=15355941 err=38.59%/100.00% time=211.78s/1304.53s
[ 7] obj=3966547.51 act=14167829 err=28.94%/100.00% time=200.68s/1505.21s
[ 8] obj=3438943.46 act=14846582 err=30.09%/100.00% time=202.00s/1707.21s
[ 9] obj=3140222.40 act=14071699 err=25.07%/99.77% time=213.22s/1920.43s
[ 10] obj=2694572.72 act=13441289 err=23.84%/99.31% time=213.42s/2133.85s
[ 11] obj=2041933.47 act=12366576 err=19.74%/93.69% time=211.67s/2345.52s
[ 12] obj=1934548.49 act=12342957 err=28.05%/97.33% time=197.66s/2543.19s
[ 13] obj=1615030.54 act=13123501 err=17.34%/86.99% time=198.78s/2741.97s
[ 14] obj=1414008.73 act=12491249 err=16.62%/85.09% time=211.34s/2953.31s
[ 15] obj=1312758.08 act=12120282 err=16.21%/81.37% time=195.42s/3148.73s
[ 16] obj=1224466.19 act=11753770 err=19.01%/85.63% time=214.96s/3363.69s
[ 17] obj=1129987.49 act=11963602 err=15.37%/78.10% time=213.48s/3577.17s
[ 18] obj=1019288.79 act=11452956 err=15.89%/79.81% time=210.29s/3787.46s
[ 19] obj=974715.93 act=11212340 err=14.52%/74.45% time=202.03s/3989.48s
[ 20] obj=882033.03 act=10910928 err=15.20%/75.95% time=210.80s/4200.29s
[ 21] obj=811396.64 act=11123553 err=13.52%/70.41% time=213.60s/4413.89s
[ 22] obj=753678.97 act=10564180 err=16.13%/73.23% time=213.27s/4627.16s
[ 23] obj=710544.62 act=10845717 err=13.02%/67.73% time=201.23s/4828.39s
[ 24] obj=616492.72 act=10447271 err=13.58%/68.22% time=211.21s/5039.60s
[ 25] obj=582678.89 act=10243108 err=12.00%/65.61% time=207.64s/5247.24s
[ 26] obj=530190.50 act=9811981 err=13.14%/67.75% time=212.66s/5459.90s
[ 27] obj=496651.38 act=9421423 err=11.41%/64.56% time=212.92s/5672.82s
[ 28] obj=452394.58 act=8772761 err=12.53%/66.26% time=212.04s/5884.86s
[ 29] obj=420383.35 act=8828721 err=10.64%/63.20% time=200.33s/6085.19s
[ 30] obj=395130.11 act=8418783 err=11.12%/63.49% time=217.29s/6302.48s
[ 31] obj=375584.10 act=8205296 err=10.20%/61.75% time=211.70s/6514.18s
[ 32] obj=352458.25 act=7746058 err=10.57%/61.92% time=211.38s/6725.55s
[ 33] obj=333859.37 act=7705143 err= 9.67%/60.13% time=200.16s/6925.71s
[ 34] obj=318630.17 act=7450323 err= 9.98%/59.27% time=201.46s/7127.17s
[ 35] obj=303192.07 act=7453026 err= 9.14%/57.41% time=200.82s/7328.00s
[ 36] obj=287865.72 act=7103018 err= 9.25%/56.62% time=205.34s/7533.34s
[ 37] obj=275530.99 act=6825932 err= 8.63%/54.62% time=215.64s/7748.98s
[ 38] obj=259368.07 act=6599516 err= 8.59%/53.45% time=201.14s/7950.12s
[ 39] obj=250167.79 act=6480462 err= 8.10%/52.14% time=214.53s/8164.65s
[ 40] obj=240767.84 act=6235782 err= 8.16%/51.73% time=213.35s/8378.00s
[ 41] obj=232991.61 act=6117356 err= 7.67%/50.29% time=214.97s/8592.97s
[ 42] obj=224626.79 act=5950459 err= 7.71%/49.89% time=201.88s/8794.86s
[ 43] obj=219172.02 act=5834386 err= 7.26%/48.37% time=203.92s/8998.78s
[ 44] obj=212178.66 act=5644960 err= 7.23%/47.73% time=202.13s/9200.91s
[ 45] obj=205715.07 act=5498345 err= 6.88%/46.41% time=201.44s/9402.35s
[ 46] obj=198432.70 act=5252235 err= 6.73%/46.02% time=201.76s/9604.10s
[ 47] obj=193679.54 act=5136155 err= 6.48%/44.83% time=202.65s/9806.75s
[ 48] obj=187715.23 act=4944298 err= 6.38%/44.50% time=202.78s/10009.53s
[ 49] obj=182470.75 act=4756958 err= 6.02%/43.33% time=214.84s/10224.37s
[ 50] obj=176751.93 act=4551726 err= 5.81%/42.89% time=201.74s/10426.11s
[ 51] obj=172190.42 act=4417936 err= 5.49%/42.08% time=211.58s/10637.69s
[ 52] obj=167616.41 act=4280777 err= 5.19%/41.38% time=208.59s/10846.28s
[ 53] obj=163867.61 act=4172906 err= 4.95%/40.44% time=198.22s/11044.50s
[ 54] obj=159748.64 act=4032747 err= 4.54%/39.25% time=199.63s/11244.13s
[ 55] obj=155807.48 act=3886890 err= 4.30%/38.13% time=216.44s/11460.57s
[ 56] obj=152160.24 act=3719728 err= 3.86%/36.81% time=210.84s/11671.40s
[ 57] obj=149100.10 act=3635846 err= 3.65%/35.78% time=203.73s/11875.13s
[ 58] obj=145472.48 act=3471274 err= 3.14%/34.20% time=211.57s/12086.70s
[ 59] obj=142659.17 act=3327628 err= 2.97%/33.15% time=211.62s/12298.32s
[ 60] obj=139945.56 act=3202662 err= 2.55%/31.76% time=215.05s/12513.37s
[ 61] obj=137272.79 act=3118458 err= 2.46%/30.84% time=213.86s/12727.23s
[ 62] obj=134643.44 act=2963366 err= 2.10%/29.30% time=215.57s/12942.80s
[ 63] obj=132438.96 act=2764993 err= 2.08%/28.47% time=216.08s/13158.88s
[ 64] obj=130257.20 act=2608048 err= 1.78%/27.13% time=199.92s/13358.80s
[ 65] obj=128466.12 act=2521360 err= 1.83%/26.44% time=197.73s/13556.54s
[ 66] obj=126589.93 act=2331027 err= 1.53%/25.12% time=201.77s/13758.31s
[ 67] obj=125174.44 act=2206901 err= 1.60%/24.74% time=213.82s/13972.13s
[ 68] obj=123684.76 act=2095818 err= 1.36%/23.57% time=214.86s/14186.99s
[ 69] obj=122571.95 act=2040240 err= 1.41%/23.21% time=200.76s/14387.76s
[ 70] obj=121323.64 act=1935896 err= 1.24%/22.26% time=202.36s/14590.12s
[ 71] obj=120453.35 act=1868316 err= 1.26%/21.83% time=198.98s/14789.09s
[ 72] obj=119536.62 act=1793591 err= 1.14%/21.34% time=217.33s/15006.42s
[ 73] obj=118859.54 act=1746003 err= 1.15%/21.07% time=217.94s/15224.36s
[ 74] obj=118165.17 act=1682432 err= 1.05%/20.47% time=216.27s/15440.63s
[ 75] obj=117596.89 act=1629987 err= 1.08%/20.41% time=216.57s/15657.20s
[ 76] obj=116906.44 act=1578243 err= 1.01%/20.35% time=213.31s/15870.51s
[ 77] obj=116304.74 act=1509592 err= 1.04%/20.41% time=202.47s/16072.98s
[ 78] obj=115719.31 act=1460546 err= 0.95%/19.93% time=199.40s/16272.38s
[ 79] obj=115175.15 act=1406044 err= 0.99%/19.76% time=211.51s/16483.89s
[ 80] obj=114659.48 act=1362768 err= 0.91%/19.49% time=198.51s/16682.40s
[ 81] obj=114171.95 act=1314568 err= 0.93%/19.18% time=198.52s/16880.93s
[ 82] obj=113672.25 act=1283479 err= 0.85%/18.68% time=189.19s/17070.12s
[ 83] obj=113251.67 act=1247596 err= 0.91%/18.59% time=189.35s/17259.47s
[ 84] obj=112771.66 act=1207217 err= 0.82%/18.31% time=190.47s/17449.94s
[ 85] obj=112376.92 act=1170637 err= 0.87%/17.77% time=188.63s/17638.56s
[ 86] obj=111943.75 act=1139529 err= 0.79%/17.56% time=191.37s/17829.94s
[ 87] obj=111595.10 act=1109029 err= 0.85%/17.52% time=186.42s/18016.36s
[ 88] obj=111228.05 act=1080666 err= 0.77%/17.29% time=187.65s/18204.01s
[ 89] obj=110930.41 act=1055096 err= 0.85%/17.45% time=186.70s/18390.70s
[ 90] obj=110595.83 act=1028565 err= 0.77%/17.51% time=186.41s/18577.11s
[ 91] obj=110325.97 act=1002468 err= 0.83%/17.13% time=186.08s/18763.19s
[ 92] obj=110045.08 act=980569 err= 0.76%/17.31% time=187.33s/18950.52s
[ 93] obj=109811.15 act=962188 err= 0.82%/17.27% time=191.53s/19142.05s
[ 94] obj=109552.69 act=945077 err= 0.75%/17.34% time=189.67s/19331.72s
[ 95] obj=109337.72 act=924094 err= 0.80%/17.01% time=192.58s/19524.31s
[ 96] obj=109081.11 act=907977 err= 0.74%/17.17% time=189.05s/19713.35s
[ 97] obj=108887.67 act=890653 err= 0.78%/16.95% time=188.32s/19901.68s
[ 98] obj=108664.58 act=874575 err= 0.72%/16.87% time=184.46s/20086.14s
[ 99] obj=108497.79 act=859353 err= 0.78%/16.80% time=184.99s/20271.13s
[ 100] obj=108286.33 act=845753 err= 0.70%/16.74% time=192.82s/20463.95s
[ 101] obj=108134.56 act=831615 err= 0.75%/16.32% time=187.14s/20651.09s
[ 102] obj=107942.17 act=817775 err= 0.69%/16.54% time=193.93s/20845.02s
[ 103] obj=107802.89 act=806867 err= 0.74%/15.86% time=188.06s/21033.08s
[ 104] obj=107614.82 act=795848 err= 0.67%/15.98% time=187.50s/21220.58s
[ 105] obj=107492.65 act=784901 err= 0.73%/15.80% time=188.91s/21409.50s
[ 106] obj=107322.01 act=775746 err= 0.67%/16.25% time=187.60s/21597.10s
[ 107] obj=107209.00 act=764014 err= 0.72%/15.61% time=189.13s/21786.23s
[ 108] obj=107052.94 act=756152 err= 0.67%/16.17% time=187.79s/21974.02s
[ 109] obj=106944.42 act=746412 err= 0.72%/15.74% time=188.81s/22162.83s
[ 110] obj=106794.95 act=739299 err= 0.66%/15.92% time=185.21s/22348.04s
[ 111] obj=106695.49 act=731362 err= 0.71%/15.51% time=190.19s/22538.23s
[ 112] obj=106555.37 act=723453 err= 0.67%/15.91% time=187.87s/22726.10s
[ 113] obj=106471.96 act=715790 err= 0.70%/15.51% time=191.75s/22917.85s
[ 114] obj=106327.68 act=708527 err= 0.66%/16.11% time=192.42s/23110.27s
[ 115] obj=106252.77 act=700616 err= 0.70%/15.52% time=191.48s/23301.75s
[ 116] obj=106124.19 act=694484 err= 0.66%/15.99% time=186.72s/23488.46s
[ 117] obj=106048.09 act=687502 err= 0.69%/15.64% time=191.85s/23680.32s
[ 118] obj=105928.77 act=680818 err= 0.66%/15.91% time=188.99s/23869.31s
[ 119] obj=105858.43 act=674683 err= 0.69%/15.66% time=193.26s/24062.57s
[ 120] obj=105745.15 act=668089 err= 0.66%/16.00% time=187.76s/24250.33s
[ 121] obj=105679.16 act=662379 err= 0.67%/15.24% time=187.59s/24437.92s
[ 122] obj=105575.81 act=655739 err= 0.65%/15.84% time=192.71s/24630.63s
[ 123] obj=105509.44 act=649720 err= 0.66%/15.18% time=187.39s/24818.02s
[ 124] obj=105410.52 act=644031 err= 0.64%/15.67% time=191.18s/25009.20s
[ 125] obj=105347.85 act=639068 err= 0.67%/15.40% time=191.76s/25200.96s
[ 126] obj=105254.20 act=633618 err= 0.65%/15.84% time=185.40s/25386.37s
[ 127] obj=105198.42 act=628642 err= 0.67%/15.44% time=185.47s/25571.84s
[ 128] obj=105103.37 act=622991 err= 0.65%/15.90% time=187.94s/25759.77s
[ 129] obj=105047.79 act=618674 err= 0.65%/14.88% time=186.22s/25945.99s
[ 130] obj=104957.76 act=613423 err= 0.64%/15.62% time=190.43s/26136.42s
[ 131] obj=104904.83 act=609578 err= 0.64%/14.79% time=191.34s/26327.76s
[ 132] obj=104819.64 act=604263 err= 0.62%/15.31% time=186.30s/26514.06s
[ 133] obj=104774.92 act=599702 err= 0.65%/15.18% time=188.00s/26702.06s
[ 134] obj=104690.60 act=595024 err= 0.63%/15.69% time=186.93s/26888.99s
[ 135] obj=104642.61 act=591258 err= 0.64%/15.00% time=192.50s/27081.49s
[ 136] obj=104565.44 act=586998 err= 0.62%/15.52% time=190.40s/27271.89s
[ 137] obj=104517.57 act=583343 err= 0.63%/14.84% time=186.35s/27458.25s
[ 138] obj=104446.61 act=578995 err= 0.61%/15.18% time=185.80s/27644.04s
[ 139] obj=104403.81 act=575003 err= 0.63%/14.58% time=186.48s/27830.52s
[ 140] obj=104331.11 act=571148 err= 0.61%/15.06% time=185.66s/28016.18s
[ 141] obj=104289.95 act=567841 err= 0.63%/14.79% time=187.76s/28203.95s
[ 142] obj=104220.20 act=564173 err= 0.62%/15.28% time=185.54s/28389.48s
[ 143] obj=104178.20 act=560759 err= 0.63%/14.70% time=185.47s/28574.96s
[ 144] obj=104113.04 act=556913 err= 0.61%/15.19% time=190.64s/28765.60s
[ 145] obj=104074.13 act=554441 err= 0.62%/14.55% time=186.03s/28951.63s
[ 146] obj=104010.11 act=550571 err= 0.61%/15.01% time=187.09s/29138.72s
[ 147] obj=103971.51 act=547869 err= 0.61%/14.41% time=186.30s/29325.01s
[ 148] obj=103906.20 act=544301 err= 0.60%/14.95% time=185.63s/29510.64s
[ 149] obj=103868.55 act=541054 err= 0.61%/14.42% time=192.96s/29703.60s
[ 150] obj=103805.95 act=537732 err= 0.60%/14.92% time=186.81s/29890.41s
[ 151] obj=103770.19 act=534958 err= 0.62%/14.81% time=190.80s/30081.21s
[ 152] obj=103709.94 act=532034 err= 0.61%/15.29% time=186.76s/30267.97s
[ 153] obj=103677.53 act=529659 err= 0.61%/14.70% time=188.12s/30456.09s
[ 154] obj=103616.72 act=526795 err= 0.61%/15.26% time=187.33s/30643.43s
[ 155] obj=103584.43 act=524065 err= 0.61%/14.62% time=187.62s/30831.05s
[ 156] obj=103526.38 act=521171 err= 0.60%/14.94% time=187.89s/31018.94s
[ 157] obj=103495.48 act=518745 err= 0.60%/14.36% time=185.55s/31204.49s
[ 158] obj=103437.57 act=516123 err= 0.60%/14.99% time=186.01s/31390.50s
[ 159] obj=103409.69 act=513683 err= 0.60%/14.36% time=186.28s/31576.78s
[ 160] obj=103350.91 act=510897 err= 0.60%/14.96% time=183.85s/31760.63s
[ 161] obj=103321.33 act=508491 err= 0.61%/14.64% time=190.84s/31951.47s
[ 162] obj=103267.75 act=505544 err= 0.60%/15.04% time=189.75s/32141.22s
[ 163] obj=103237.50 act=503039 err= 0.60%/14.43% time=187.06s/32328.28s
[ 164] obj=103187.65 act=500267 err= 0.60%/15.00% time=190.29s/32518.57s
[ 165] obj=103159.95 act=498281 err= 0.59%/14.16% time=189.94s/32708.51s
[ 166] obj=103108.29 act=495740 err= 0.59%/14.78% time=190.82s/32899.33s
[ 167] obj=103081.01 act=493616 err= 0.60%/14.52% time=190.88s/33090.21s
[ 168] obj=103029.62 act=491241 err= 0.60%/15.06% time=188.25s/33278.46s
[ 169] obj=103002.16 act=489487 err= 0.59%/14.26% time=191.59s/33470.05s
[ 170] obj=102953.82 act=487013 err= 0.60%/15.11% time=194.26s/33664.31s
[ 171] obj=102927.79 act=485238 err= 0.59%/14.33% time=189.03s/33853.34s
[ 172] obj=102882.32 act=482969 err= 0.59%/14.84% time=188.16s/34041.50s
[ 173] obj=102859.03 act=481137 err= 0.59%/14.27% time=187.57s/34229.07s
[ 174] obj=102810.28 act=479317 err= 0.59%/14.87% time=183.54s/34412.61s
[ 175] obj=102785.12 act=477529 err= 0.60%/14.50% time=186.26s/34598.87s
[ 176] obj=102740.49 act=475387 err= 0.59%/14.92% time=189.59s/34788.45s
[ 177] obj=102714.81 act=473846 err= 0.58%/14.26% time=185.69s/34974.15s
[ 178] obj=102673.85 act=471325 err= 0.59%/14.86% time=191.47s/35165.62s
[ 179] obj=102653.24 act=469426 err= 0.58%/14.13% time=187.94s/35353.56s
[ 180] obj=102604.73 act=467700 err= 0.59%/14.63% time=186.05s/35539.60s
[ 181] obj=102585.02 act=466307 err= 0.58%/14.37% time=189.27s/35728.87s
[ 182] obj=102539.52 act=464939 err= 0.60%/15.05% time=187.07s/35915.94s
[ 183] obj=102517.21 act=463249 err= 0.58%/14.22% time=184.73s/36100.67s
[ 184] obj=102478.30 act=461301 err= 0.59%/14.93% time=184.22s/36284.89s
[ 185] obj=102458.40 act=460002 err= 0.57%/14.13% time=184.29s/36469.18s
[ 186] obj=102419.61 act=457795 err= 0.59%/14.77% time=188.76s/36657.95s
[ 187] obj=102399.39 act=456576 err= 0.57%/14.05% time=185.22s/36843.16s
[ 188] obj=102360.00 act=454684 err= 0.59%/14.75% time=195.65s/37038.81s
[ 189] obj=102339.72 act=453552 err= 0.57%/14.01% time=190.15s/37228.96s
[ 190] obj=102303.26 act=451690 err= 0.58%/14.71% time=186.37s/37415.33s
[ 191] obj=102284.90 act=450088 err= 0.58%/14.17% time=190.25s/37605.58s
[ 192] obj=102250.11 act=448289 err= 0.58%/14.74% time=186.57s/37792.15s
[ 193] obj=102232.08 act=447065 err= 0.58%/14.05% time=186.47s/37978.62s
[ 194] obj=102195.45 act=445642 err= 0.59%/14.80% time=188.26s/38166.88s
[ 195] obj=102176.52 act=444624 err= 0.57%/14.03% time=191.65s/38358.53s
[ 196] obj=102141.58 act=443142 err= 0.58%/14.69% time=186.07s/38544.60s
[ 197] obj=102123.76 act=441762 err= 0.58%/14.25% time=188.47s/38733.07s
[ 198] obj=102091.13 act=439948 err= 0.58%/14.88% time=185.79s/38918.86s
[ 199] obj=102074.84 act=438820 err= 0.58%/14.20% time=190.14s/39108.99s
[ 200] obj=102041.59 act=437635 err= 0.59%/14.83% time=187.71s/39296.70s

train large corpus throws segmentation fault

I used wapiti to train a CRFs model.

When the size of the data file was 11 MB, everything was OK. But when the size reached 24 MB, training crashed with a segmentation fault.

I traced the bug and discovered that "uint32_t out[T]" is used to declare a variable-length array on the stack in the function "tag_evalsub" of "src/decoder.c".

I suggest using "xmalloc" to fix the problem.

Thank you for sharing.

Stopped process

Hello,

Sometimes, Wapiti just stops and I can't figure out what is wrong.

lanvin@Lanvin:~/wapiti/wapiti-1.5.0/dat# wapiti train -p patterns/p4c -t 16 -a rprop- training/training231116 models/modelp4c231116rprop-

  • Load patterns
  • Load training data
    1000 sequences loaded
    2000 sequences loaded
    3000 sequences loaded
    4000 sequences loaded
    5000 sequences loaded
    6000 sequences loaded
    7000 sequences loaded
    8000 sequences loaded
    9000 sequences loaded
    10000 sequences loaded
  • Initialize the model
  • Summary
    nb train: 10874
    nb labels: 295
    nb blocks: 299582
    nb features: 5625255290
  • Train the model with rprop-
    Processus arrêté ("Process killed")

Can you please help?

Provide documentation for update mode

What is the update mode and what can I use it for?

I was going over the documentation, but there doesn't seem to be any for update mode. Could you provide some examples of using it?

Retraining with --model causes a crash

Hi,

there was an earlier question about retraining, but it was not answered, so I will try again.
I have a working trained model, which assigns semantic tags to base forms like this:

lemma tag

I am trying to retrain this existing model with new training data, which consists of the same type of data: lemma tag.

My command is: srun ./wapiti train --me --model eduskunta_model subtitle_eka_kolmas.sem eduskunta_model_subtitle

It starts, loads sequences, and then crashes:
......
......
17847000 sequences loaded
17848000 sequences loaded

  • Resync the model
  • Summary
    nb train: 17848060
    nb labels: 1524
    nb blocks: 77421
    nb features: 117989604
  • Train the model with l-bfgs
    srun: error: r18c37: task 0: Segmentation fault
    srun: launch/slurm: _step_signal: Terminating StepId=10457036.0

Any idea what is going wrong?

K. Kettunen

Summary when training

What exactly are 'obj', 'act', 'nb train', 'nb blocks', and 'nb features', please?

Real-valued features

Thanks for wapiti, great tool!

How hard would it be to make it possible to use real-valued features (as opposed to only binary features)? My use case is exploiting word embeddings and soft word classes; currently they have to be discretized.

Leaks file handle of model in labeling mode

When using a model to label some text, valgrind gave me:

==5332== HEAP SUMMARY:
==5332==     in use at exit: 472 bytes in 1 blocks
==5332==   total heap usage: 602,017 allocs, 602,016 frees, 1,352,158,567 bytes allocated
==5332== 
==5332== 472 bytes in 1 blocks are still reachable in loss record 1 of 1
==5332==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==5332==    by 0x4A6AAAD: __fopen_internal (iofopen.c:65)
==5332==    by 0x4A6AAAD: fopen@@GLIBC_2.2.5 (iofopen.c:86)
==5332==    by 0x10A7C8: main (in /home/ans/Downloads/postagger/c/wapiti)
==5332== 
==5332== LEAK SUMMARY:
==5332==    definitely lost: 0 bytes in 0 blocks
==5332==    indirectly lost: 0 bytes in 0 blocks
==5332==      possibly lost: 0 bytes in 0 blocks
==5332==    still reachable: 472 bytes in 1 blocks
==5332==         suppressed: 0 bytes in 0 blocks

It seems like this file handle is not closed:

FILE *file = fopen(mdl->opt->model, "r");

A simple fclose(file); after the following line should suffice:

mdl_load(mdl, file);

Cheers!

model.c

There's a problem with the mdl_load() function: the code should rewind the file before falling back to the old format, like this:

if (fscanf(file, "#mdl#%d#%"SCNu64"\n", &type, &nact) == 2) {
  mdl->type = type;
} else {
  rewind(file);
  if (fscanf(file, "#mdl#%"SCNu64"\n", &nact) == 1) {
    mdl->type = 0;
  } else {
    fatal(err);
  }
}

Windows build

Hi,
I want to build this package for Windows.
I'm using Visual Studio 2013, but when I compile this package I get these errors:

    bcd.c
    c:\users\m-r\appdata\local\temp\pip-build-rbuqyz8g\libwapiti\cwapiti\src\model.h(33) : fatal error C1083: Cannot open include file: 'sys/time.h': No such file or directory
    error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio 12.0\\VC\\BIN\\cl.exe' failed with exit status 2

This answer said that 'sys/time.h' is not supported on Windows.
What can I do to fix this problem?
