microth / pathlstm

Neural SRL model

pathlstm's Introduction

News

August 2017: The FrameNet-based model has been updated to FrameNet 1.7 and now uses Stanford CoreNLP for preprocessing.

April 2017: The repository now also contains a compiled archive (pathlstm.jar) of the PathLSTM source code. Feel free to use this if you are unable or unwilling to compile the code yourself.

May 2017: The source code and pre-compiled jar file are updated with additional code to support the FrameNet-based SRL model described in Roth (ICCG 2016). Note that this model requires syntactic preprocessing using external tools.

PathLSTM

This repository contains code for the PathLSTM semantic role labeler introduced in Roth and Lapata, 2016. It is built on top of the mate-tools semantic role labeler. The PathLSTM model achieves state-of-the-art results on the in-domain (87.9) and out-of-domain (76.6) test sets of the CoNLL-2009 data set.

Dependencies

The following libraries and model files need to be downloaded in order to run the PathLSTM PropBank/NomBank model on English text:

Bernd Bohnet's dependency parser and model files (anna-3.3.jar and CoNLL2009-ST-English*.model)¹
The WSJ tokenizer from Stanford CoreNLP (stanford-corenlp-3.x.jar)
The most recent PathLSTM SRL model file (July 2016), available on Google Drive here

The SRL classes can easily be compiled using maven (mvn compile).

For Frame-Semantic Role Labeling, the following dependencies are required:

Stanford CoreNLP 3.8.0 (https://stanfordnlp.github.io/CoreNLP/, make sure to use -stanford!)
A copy of FrameNet version 1.7 (http://framenet.icsi.berkeley.edu/, make sure to use -framenet [FNDIR]!)
The most recent PathLSTM Frame-SRL model file (August 2017), available on Google Drive here

To replicate the results from the abstract published at ICCG 2016, please contact me personally.

Running PathLSTM

If copies of all required libraries and models are available in the subdirectories lib/ and models/, respectively, PathLSTM can be executed as a standalone application using the script scripts/parse.sh. The script runs the necessary preprocessing tools on a given input text file (assuming one sentence per line) and applies our state-of-the-art model to identify predicate-argument structures and label their semantic roles.

It is also possible to apply the PathLSTM model to already preprocessed text in the CoNLL 2009 format, using the Java class se.lth.cs.srl.Parse. However, since PathLSTM is trained on preprocessed input from specific pipelines, we strongly recommend using the complete pipeline to achieve the best performance.

References

If you are using the PathLSTM SRL model in your work--and we highly recommend you do!--please cite the following publication:

Michael Roth and Mirella Lapata (2016). Neural semantic role labelling with dependency path embeddings. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. Berlin, Germany, August, pp. 1192--1202.

For the Frame-SRL model, please cite the following publication:

Michael Roth (2016). Improving frame semantic parsing via dependency path embeddings. Book of Abstracts of the 9th International Conference on Construction Grammar, Juiz de Fora, Brazil, October, pp. 165--167.

For the built-in preprocessing pipeline, please also cite the following publication:

Bernd Bohnet (2010). Very high accuracy and fast dependency parsing is not a contradiction. The 23rd International Conference on Computational Linguistics (COLING), Beijing, China.

¹ To reproduce our evaluation results on the CoNLL-2009 data set, preprocessing components must be retrained on the training split only, using 10-fold jackknifing.

pathlstm's Issues

problems using "pathlstm.jar"

I tried using "pathlstm.jar" directly because I was unable to compile with "mvn compile", but I am getting an error. Could you please tell me whether I am doing something wrong?

java -Xmx40g -cp libs/anna-3.3.jar:target/pathlstm.jar se.lth.cs.srl.CompletePipeline eng -lemma models/CoNLL2009-ST-English-ALL.anna-3.3.lemmatizer.model -tagger models/CoNLL2009-ST-English-ALL.anna-3.3.postagger.model -parser models/CoNLL2009-ST-English-ALL.anna-3.3.parser.model -srl models/srl-ACL2016-eng.model -tokenize -reranker -externalNNs -test sample.txt

54.21.744 is2.data.ParametersFloat 121:read -> read parameters 134217727 not zero 296071
54.21.763 is2.data.Cluster 113:<init> -> Read cluster with 0 words
54.21.764 is2.lemmatizer.Lemmatizer 192:readModel -> Loading data finished.
54.21.764 is2.lemmatizer.Lemmatizer 194:readModel -> number of params 134217727
54.21.765 is2.lemmatizer.Lemmatizer 195:readModel -> number of classes 92
54.26.6 is2.data.ParametersFloat 121:read -> read parameters 134217727 not zero 1613201
54.26.6 is2.data.Cluster 113:<init> -> Read cluster with 0 words
54.26.7 is2.tag.Lexicon 103:<init> -> Read lexicon with 0 words
54.26.7 is2.tag.Tagger 141:readModel -> Loading data finished.
54.26.55 is2.parser.Parser 188:readModel -> Reading data started
54.26.102 is2.data.Cluster 113:<init> -> Read cluster with 0 words
54.31.336 is2.parser.ParametersFloat 101:read -> read parameters 134217727 not zero 19957525
54.31.336 is2.parser.Parser 201:readModel -> parsing -- li size 134217727
54.31.354 is2.parser.Parser 211:readModel -> Stacking false
54.31.355 is2.parser.Extractor 56:initStat -> mult (d4)
Used parser class is2.parser.Parser
Creation date 2012.11.02 14:33:53
Training data CoNLL2009-ST-English-ALL.txt.crossannotated
Iterations 10 Used sentences 10000000
Cluster null
54.31.361 is2.parser.Parser 240:readModel -> Reading data finnished
54.31.363 is2.parser.Extractor 56:initStat -> mult (d4)
Loading pipeline from models/srl-ACL2016-eng.model
Loading reranker from models/srl-ACL2016-eng.model
Writing corpus to out.txt...
Exception in thread "main" java.lang.Error: Unresolved compilation problems:
PTBTokenizer cannot be resolved to a type
Word cannot be resolved to a type
PTBTokenizer cannot be resolved
Word cannot be resolved to a type

at se.lth.cs.srl.preprocessor.tokenization.StanfordPTBTokenizer.tokenizeplus(StanfordPTBTokenizer.java:35)
at se.lth.cs.srl.preprocessor.Preprocessor.tokenizeplus(Preprocessor.java:37)
at se.lth.cs.srl.CompletePipeline.parse(CompletePipeline.java:73)
at se.lth.cs.srl.CompletePipeline.parseNonSegmentedLineByLine(CompletePipeline.java:165)
at se.lth.cs.srl.CompletePipeline.main(CompletePipeline.java:138)

Exception in thread "main" java.lang.NoClassDefFoundError: edu/stanford/nlp/process/PTBTokenizer (Linux, Ubuntu 16.04)

Hey microth,

After fixing the previous bugs, I had a new issue.

My command :
java -Xmx40g -cp "libs/anna-3.3.jar:target/classes" se.lth.cs.srl.CompletePipeline eng -lemma models/CoNLL2009-ST-English-ALL.anna-3.3.lemmatizer.model -tagger models/CoNLL2009-ST-English-ALL.anna-3.3.postagger.model -parser models/CoNLL2009-ST-English-ALL.anna-3.3.parser.model -srl models/srl-ACL2016-eng.model -tokenize -reranker -externalNNs -test models/text_to_parse.txt

The output :

9.40.461   is2.data.ParametersFloat 121:read ->        read parameters 134217727 not zero 296071
9.40.469   is2.data.Cluster 113:<init> ->              Read cluster with 0 words 
9.40.469   is2.lemmatizer.Lemmatizer 192:readModel ->  Loading data finished. 
9.40.469   is2.lemmatizer.Lemmatizer 194:readModel ->  number of params  134217727
9.40.469   is2.lemmatizer.Lemmatizer 195:readModel ->  number of classes 92
9.44.359   is2.data.ParametersFloat 121:read ->        read parameters 134217727 not zero 1613201
9.44.359   is2.data.Cluster 113:<init> ->              Read cluster with 0 words 
9.44.360   is2.tag.Lexicon 103:<init> ->               Read lexicon with 0 words 
9.44.360   is2.tag.Tagger 141:readModel ->             Loading data finished. 
9.44.462   is2.parser.Parser 188:readModel ->          Reading data started
9.44.518   is2.data.Cluster 113:<init> ->              Read cluster with 0 words 
9.49.103   is2.parser.ParametersFloat 101:read ->      read parameters 134217727 not zero 19957525
9.49.103   is2.parser.Parser 201:readModel ->          parsing -- li size 134217727
9.49.108   is2.parser.Parser 211:readModel ->          Stacking false
9.49.108   is2.parser.Extractor 56:initStat ->         mult  (d4) 
Used parser   class is2.parser.Parser
Creation date 2012.11.02 14:33:53
Training data CoNLL2009-ST-English-ALL.txt.crossannotated
Iterations    10 Used sentences 10000000
Cluster       null
9.49.110   is2.parser.Parser 240:readModel ->          Reading data finnished
9.49.110   is2.parser.Extractor 56:initStat ->         mult  (d4) 
Loading pipeline from models/srl-ACL2016-eng.model
Loading reranker from models/srl-ACL2016-eng.model
Writing corpus to out.txt...
Exception in thread "main" java.lang.NoClassDefFoundError: edu/stanford/nlp/process/PTBTokenizer
	at se.lth.cs.srl.preprocessor.tokenization.StanfordPTBTokenizer.tokenizeplus(StanfordPTBTokenizer.java:35)
	at se.lth.cs.srl.preprocessor.Preprocessor.tokenizeplus(Preprocessor.java:37)
	at se.lth.cs.srl.CompletePipeline.parse(CompletePipeline.java:73)
	at se.lth.cs.srl.CompletePipeline.parseNonSegmentedLineByLine(CompletePipeline.java:165)
	at se.lth.cs.srl.CompletePipeline.main(CompletePipeline.java:138)
Caused by: java.lang.ClassNotFoundException: edu.stanford.nlp.process.PTBTokenizer
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 5 more

Concerning the downloaded files, stanford-corenlp-full-2016-10-31 is unzipped into the libs/ sub-directory.

Best,
Julien
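
Read literally, the trace above says the JVM could not find edu.stanford.nlp.process.PTBTokenizer, and the -cp value in the command lists only libs/anna-3.3.jar and target/classes. One way to confirm what is actually visible is a small check run with the same -cp value. The following is a minimal sketch, not part of PathLSTM: the class name ClasspathCheck is hypothetical, and the two class names it looks up are taken from the stack trace above.

```java
class ClasspathCheck {
    /** Returns true if the named class can be located on the current classpath. */
    static boolean isAvailable(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class names copied from the stack trace in this issue.
        String[] required = {
            "se.lth.cs.srl.CompletePipeline",
            "edu.stanford.nlp.process.PTBTokenizer"
        };
        for (String name : required) {
            System.out.println((isAvailable(name) ? "found    " : "MISSING  ") + name);
        }
    }
}
```

If the Stanford class is reported as missing, the CoreNLP jar is probably not on the classpath; unzipping the distribution into libs/ does not by itself add its jars to -cp. The exact jar file name depends on the downloaded CoreNLP version and is not given here.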

ArrayIndexOutOfBoundsException with parse_fn.sh

I got the error above in two cases: when the input file contained empty lines (so I removed them), and again immediately after the message ERROR: sentence length mismatches token number in Stanford annotation. The second case may be related to one of the words in that sentence being "voila" with an accented letter "a".

Is there a flag I can pass so that the pipeline silently ignores such errors? On the same note, I have 23M sentences to label. Do you think it is better to split them into N files and run N parse_fn.sh processes, or should I stick to my current single file with 23M sentences?

Thanks!
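
Whether N parallel processes are preferable is not answered here; note that each process would presumably load its own copy of the models. Independent of that, removing blank lines and cutting the input into shards can be done before the pipeline runs. The sketch below is an illustration under assumptions: the class name InputSharder and the ".partN" output naming are hypothetical, and it reads the whole file into memory for brevity, which may be too much for 23M sentences (a streaming variant would follow the same logic).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class InputSharder {
    /** Drops blank lines, then cuts the remainder into at most n contiguous shards. */
    static List<List<String>> shard(List<String> lines, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                kept.add(line);
            }
        }
        List<List<String>> shards = new ArrayList<>();
        int size = (kept.size() + n - 1) / n; // ceiling division
        for (int start = 0; start < kept.size(); start += size) {
            int end = Math.min(start + size, kept.size());
            shards.add(new ArrayList<>(kept.subList(start, end)));
        }
        return shards;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java InputSharder <input.txt> <numShards>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        List<List<String>> shards = shard(lines, Integer.parseInt(args[1]));
        for (int i = 0; i < shards.size(); i++) {
            // Hypothetical naming scheme: input.txt.part0, input.txt.part1, ...
            Files.write(Paths.get(args[0] + ".part" + i), shards.get(i), StandardCharsets.UTF_8);
        }
    }
}
```

Contiguous shards keep the original sentence order, so the per-shard outputs can be concatenated back in shard order afterwards.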

Error running srl-ACL-2016-eng.model

Hello @microth ,
I have tried running PathLSTM, but some classes seem to be missing at run time. Could you advise me on how to proceed?

Loading pipeline from models\srl-ACL2016-eng.model
java.lang.ClassNotFoundException: uk.ac.ed.inf.srl.features.FeatureGenerator
        at java.net.URLClassLoader.findClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Unknown Source)
        at java.io.ObjectInputStream.resolveClass(Unknown Source)
        at java.io.ObjectInputStream.readNonProxyDesc(Unknown Source)
        at java.io.ObjectInputStream.readClassDesc(Unknown Source)
        at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
        at java.io.ObjectInputStream.readObject0(Unknown Source)
        at java.io.ObjectInputStream.readObject(Unknown Source)
        at se.lth.cs.srl.pipeline.Pipeline.fromZipFile(Pipeline.java:192)
        at se.lth.cs.srl.pipeline.Pipeline.fromZipFile(Pipeline.java:226)
        at se.lth.cs.srl.pipeline.Reranker.<init>(Reranker.java:63)
        at se.lth.cs.srl.CompletePipeline.getCompletePipeline(CompletePipeline.java:52)
        at se.lth.cs.srl.CompletePipeline.main(CompletePipeline.java:122)
Writing corpus to out.txt...
Exception in thread "main" java.lang.NullPointerException
        at se.lth.cs.srl.pipeline.Reranker.parse(Reranker.java:96)
        at se.lth.cs.srl.SemanticRoleLabeler.parseSentence(SemanticRoleLabeler.java:12)
        at se.lth.cs.srl.CompletePipeline.parseX(CompletePipeline.java:93)
        at se.lth.cs.srl.CompletePipeline.parse(CompletePipeline.java:73)
        at se.lth.cs.srl.CompletePipeline.parseNonSegmentedLineByLine(CompletePipeline.java:165)
        at se.lth.cs.srl.CompletePipeline.main(CompletePipeline.java:138)

I used the pre-illinois-built branch because the main branch gives me:

[INFO] 16 errors
[INFO] -------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 7.809s
[INFO] Finished at: Sat Apr 29 18:18:38 CEST 2017
[INFO] Final Memory: 22M/346M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.6.0:compile (default-compile) on project PathLSTM: Compilation failure: Compilation failure:
[ERROR] /C:/Users/Daniel/Desktop/PathLSTM-master/src/main/java/se/lth/cs/srl/pipeline/LBJavaArgumentClassifier.java:[6,44] package edu.illinois.cs.cogcomp.chunker.main does not exist
[ERROR] /C:/Users/Daniel/Desktop/PathLSTM-master/src/main/java/se/lth/cs/srl/pipeline/LBJavaArgumentClassifier.java:[12,40] package edu.illinois.cs.cogcomp.depparse does not exist
[ERROR] /C:/Users/Daniel/Desktop/PathLSTM-master/src/main/java/se/lth/cs/srl/pipeline/LBJavaArgumentClassifier.java:[16,46] package edu.illinois.cs.cogcomp.nlp.lemmatizer does not exist
[ERROR] /C:/Users/Daniel/Desktop/PathLSTM-master/src/main/java/se/lth/cs/srl/pipeline/LBJavaArgumentClassifier.java:[17,35] package edu.illinois.cs.cogcomp.pos does not exist
[ERROR] /C:/Users/Daniel/Desktop/PathLSTM-master/src/main/java/se/lth/cs/srl/preprocessor/IllinoisPreprocessor.java:[6,44] package edu.illinois.cs.cogcomp.chunker.main does not exist
[ERROR] /C:/Users/Daniel/Desktop/PathLSTM-master/src/main/java/se/lth/cs/srl/preprocessor/IllinoisPreprocessor.java:[11,40] package edu.illinois.cs.cogcomp.depparse does not exist
[ERROR] /C:/Users/Daniel/Desktop/PathLSTM-master/src/main/java/se/lth/cs/srl/preprocessor/IllinoisPreprocessor.java:[12,46] package edu.illinois.cs.cogcomp.nlp.lemmatizer does not exist
[ERROR] /C:/Users/Daniel/Desktop/PathLSTM-master/src/main/java/se/lth/cs/srl/preprocessor/IllinoisPreprocessor.java:[13,35] package edu.illinois.cs.cogcomp.pos does not exist
[ERROR] /C:/Users/Daniel/Desktop/PathLSTM-master/src/main/java/se/lth/cs/srl/pipeline/LBJavaArgumentClassifier.java:[44,37] cannot find symbol
[ERROR] symbol:   class POSAnnotator
[ERROR] location: class se.lth.cs.srl.pipeline.LBJavaArgumentClassifier
[ERROR] /C:/Users/Daniel/Desktop/PathLSTM-master/src/main/java/se/lth/cs/srl/pipeline/LBJavaArgumentClassifier.java:[45,39] cannot find symbol
[ERROR] symbol:   class IllinoisLemmatizer
[ERROR] location: class se.lth.cs.srl.pipeline.LBJavaArgumentClassifier
[ERROR] /C:/Users/Daniel/Desktop/PathLSTM-master/src/main/java/se/lth/cs/srl/pipeline/LBJavaArgumentClassifier.java:[46,39] cannot find symbol
[ERROR] symbol:   class ChunkerAnnotator
[ERROR] location: class se.lth.cs.srl.pipeline.LBJavaArgumentClassifier
[ERROR] /C:/Users/Daniel/Desktop/PathLSTM-master/src/main/java/se/lth/cs/srl/pipeline/LBJavaArgumentClassifier.java:[47,40] cannot find symbol
[ERROR] symbol:   class DepAnnotator
[ERROR] location: class se.lth.cs.srl.pipeline.LBJavaArgumentClassifier
[ERROR] /C:/Users/Daniel/Desktop/PathLSTM-master/src/main/java/se/lth/cs/srl/preprocessor/IllinoisPreprocessor.java:[28,30] cannot find symbol
[ERROR] symbol:   class DepAnnotator
[ERROR] location: class se.lth.cs.srl.preprocessor.IllinoisPreprocessor
[ERROR] /C:/Users/Daniel/Desktop/PathLSTM-master/src/main/java/se/lth/cs/srl/preprocessor/IllinoisPreprocessor.java:[29,27] cannot find symbol
[ERROR] symbol:   class POSAnnotator
[ERROR] location: class se.lth.cs.srl.preprocessor.IllinoisPreprocessor
[ERROR] /C:/Users/Daniel/Desktop/PathLSTM-master/src/main/java/se/lth/cs/srl/preprocessor/IllinoisPreprocessor.java:[30,29] cannot find symbol
[ERROR] symbol:   class IllinoisLemmatizer
[ERROR] location: class se.lth.cs.srl.preprocessor.IllinoisPreprocessor
[ERROR] /C:/Users/Daniel/Desktop/PathLSTM-master/src/main/java/se/lth/cs/srl/preprocessor/IllinoisPreprocessor.java:[31,29] cannot find symbol
[ERROR] symbol:   class ChunkerAnnotator
[ERROR] location: class se.lth.cs.srl.preprocessor.IllinoisPreprocessor
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException

Best regards,
Daniel

Could not find or load main class se.lth.cs.srl.CompletePipeline, on Linux

Hi,
I'm working on Ubuntu 16.04.

I created the two subdirectories libs/ and models/, where I put the libraries and models.
When I execute:
.../PathLSTM$ ./scripts/parse.sh
the following error appears:
Could not find or load main class se.lth.cs.srl.CompletePipeline

Any idea how to fix it?

Best wishes for the new year,
Julien

How can we find the end of an argument that consists of multiple words?

Hi, Michael.

We are wondering if you can tell us how to find the last word of an argument.
Take, for example, "My room contains a book, a dog and a cat."
On the demo web site, we can see that "a book, a dog and a cat" is the A1 of contain.01.
But from the table below it (and from out.txt) we cannot tell that "cat" is the last word of A1.
Do we need to change the code, or is there another way?

Thank you very much.
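
In the CoNLL-2009 output shown elsewhere on this page, a role label sits on a single token, and each token carries the id of its syntactic head. A common convention, assumed here rather than confirmed for the demo site, is to treat every token in the dependency subtree below the labeled token as part of the argument. The sketch below computes the first and last token of that subtree. The class name ArgumentSpan is hypothetical; the head ids are copied from the "We are going on vacation to Singapore." output in the verb sense issue on this page, where token 6 ("to") carries A4.

```java
import java.util.Arrays;

class ArgumentSpan {
    /**
     * heads[i] is the head id of token i+1 (token ids are 1-based, 0 is the root).
     * Returns {first, last}: the smallest and largest token id in the
     * dependency subtree rooted at argHead.
     */
    static int[] span(int[] heads, int argHead) {
        int first = argHead;
        int last = argHead;
        for (int tok = 1; tok <= heads.length; tok++) {
            // Climb from tok towards the root; tok is in the subtree if we meet argHead.
            int cur = tok;
            int steps = 0; // guards against malformed input that contains a cycle
            while (cur != 0 && cur != argHead && steps++ < heads.length) {
                cur = heads[cur - 1];
            }
            if (cur == argHead) {
                first = Math.min(first, tok);
                last = Math.max(last, tok);
            }
        }
        return new int[] {first, last};
    }

    public static void main(String[] args) {
        String[] forms = {"We", "are", "going", "on", "vacation", "to", "Singapore", "."};
        int[] heads = {2, 0, 2, 3, 4, 3, 6, 2};
        int[] s = span(heads, 6); // token 6 ("to") is labeled A4
        System.out.println(String.join(" ", Arrays.copyOfRange(forms, s[0] - 1, s[1])));
        // prints "to Singapore"
    }
}
```

For non-projective trees the subtree may not be contiguous, in which case the first-to-last range over-approximates the argument.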

Verb Sense Accuracy - Set-up Issues?

In a short test, I tried the sentences below to check the accuracy of the PathLSTM PropBank/NomBank model on verb/noun senses. The results for "find out" and "go" seem puzzling, and I wonder whether something is wrong with my setup, as these verbs should be find.03 and go.02, respectively:

1	The	the	the	DT	DT	_	_	2	2	NMOD	NMOD	_	_	_
2	waitress	waitress	waitress	NN	NN	_	_	3	3	SBJ	SBJ	_	_	A0
3	found	found	found	VBD	VBD	_	_	0	0	ROOT	ROOT	Y	find.01	_
4	out	out	out	RP	RP	_	_	3	3	PRT	PRT	_	_	_
5	that	that	that	IN	IN	_	_	3	3	OBJ	OBJ	_	_	A1
6	she	she	she	PRP	PRP	_	_	7	7	SBJ	SBJ	_	_	_
7	was	be	be	VBD	VBD	_	_	5	5	SUB	SUB	_	_	_
8	fat	fat	fat	JJ	JJ	_	_	7	7	PRD	PRD	_	_	_
9	.	.	.	.	.	_	_	3	3	P	P	_	_	_

1	We	we	we	PRP	PRP	_	_	2	2	SBJ	SBJ	_	_	A1
2	are	be	be	VBP	VBP	_	_	0	0	ROOT	ROOT	_	_	_
3	going	go	go	VBG	VBG	_	_	2	2	VC	VC	Y	go.01	_
4	on	on	on	IN	IN	_	_	3	3	ADV	ADV	_	_	_
5	vacation	vacation	vacation	NN	NN	_	_	4	4	PMOD	PMOD	_	_	_
6	to	to	to	TO	TO	_	_	3	3	DIR	DIR	_	_	A4
7	Singapore	singapore	singapore	NNP	NNP	_	_	6	6	PMOD	PMOD	_	_	_
8	.	.	.	.	.	_	_	2	2	P	P	_	_	_

I used the following models:

srl-ACL2016-eng
CoNLL2009-ST-English-ALL.anna-3.3.lemmatizer
CoNLL2009-ST-English-ALL.anna-3.3.parser
CoNLL2009-ST-English-ALL.anna-3.3.postagger
stanford-corenlp-3.7.0

Some nouns are considered as predicates incorrectly.

Hi, Michael

For example, the result for the sentence "My cat is sitting on my book." labels "cat" and "book" as cat.01 and book.01. But cat and book are nouns in this sentence (which is correctly identified in the POS column).

Shouldn't each predicate be a VB*? I am quite confused.

Bill
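
The README describes the model as a PropBank/NomBank model, and NomBank annotates nominal predicates, so noun predicates in the output are not surprising in themselves. If only verbal predicates are wanted, they can be filtered from the CoNLL-2009 output by predicted part of speech. The sketch below assumes the standard CoNLL-2009 column order (PPOS in column 6, FILLPRED in column 13, PRED in column 14, counting from 1). The class name VerbalPredicateFilter is hypothetical, the "found" row is copied from another issue on this page, and the "cat" row is made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class VerbalPredicateFilter {
    /** Returns the PRED values of rows that are marked as predicates and tagged VB*. */
    static List<String> verbalPredicates(List<String> conllRows) {
        List<String> result = new ArrayList<>();
        for (String row : conllRows) {
            String[] cols = row.split("\t");
            if (cols.length < 14) {
                continue; // blank or malformed line
            }
            boolean isPredicate = "Y".equals(cols[12]); // FILLPRED
            boolean isVerb = cols[5].startsWith("VB");  // PPOS
            if (isPredicate && isVerb) {
                result.add(cols[13]);                   // PRED, e.g. find.01
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
            // Row copied from the verb sense issue on this page.
            String.join("\t", "3", "found", "found", "found", "VBD", "VBD", "_", "_",
                        "0", "0", "ROOT", "ROOT", "Y", "find.01", "_"),
            // Hypothetical row illustrating a nominal predicate.
            String.join("\t", "2", "cat", "cat", "cat", "NN", "NN", "_", "_",
                        "4", "4", "SBJ", "SBJ", "Y", "cat.01", "_"));
        System.out.println(verbalPredicates(rows));
    }
}
```

Note that this only hides nominal predicates in post-processing; it does not change what the model predicts.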

Compilation error: "package se.lth.cs.srl.features does not exist"

Hello,

I downloaded all dependencies and tried to run mvn compile, but I am getting the above error.

A simple grep showed that 45 source files are importing this non-existing package. I tried switching to master branch, just to be sure, but no luck.

Thank you for your time :)

Error running FrameNet model: java.io.InvalidObjectException:

I tried running the scripts/parse.sh script using the FrameNet model (srl-ICCG16-eng.model), but I got the error: java.io.InvalidObjectException: enum constant PathEmbeddingacN_FNET_seed3 does not exist in class uk.ac.ed.inf.srl.features.FeatureName. The other model (srl-ACL2016-eng.model) works fine, but the FrameNet model does not. I have SRL_MODEL=models/srl-ICCG16-eng.model in scripts/parse.sh and the FrameNet data in models/fndata-1.5/. I'm not sure where I should specify where the FrameNet data is though, or if I need to do that at all. Could the problem be that I didn't add something to the classpath? I haven't added anything because the other model worked fine without it.

Here is the stacktrace:

scripts/parse.sh tests/testParse1In.txt
54.53.188 is2.data.ParametersFloat 121:read -> read parameters 134217727 not zero 296071
54.53.196 is2.data.Cluster 113:<init> -> Read cluster with 0 words
54.53.196 is2.lemmatizer.Lemmatizer 192:readModel -> Loading data finished.
54.53.197 is2.lemmatizer.Lemmatizer 194:readModel -> number of params 134217727
54.53.197 is2.lemmatizer.Lemmatizer 195:readModel -> number of classes 92
54.59.358 is2.data.ParametersFloat 121:read -> read parameters 134217727 not zero 1613201
54.59.359 is2.data.Cluster 113:<init> -> Read cluster with 0 words
54.59.362 is2.tag.Lexicon 103:<init> -> Read lexicon with 0 words
54.59.364 is2.tag.Tagger 141:readModel -> Loading data finished.
54.59.391 is2.parser.Parser 188:readModel -> Reading data started
54.59.431 is2.data.Cluster 113:<init> -> Read cluster with 0 words
55.6.812 is2.parser.ParametersFloat 101:read -> read parameters 134217727 not zero 19957525
55.6.814 is2.parser.Parser 201:readModel -> parsing -- li size 134217727
55.6.826 is2.parser.Parser 211:readModel -> Stacking false
55.6.827 is2.parser.Extractor 56:initStat -> mult (d4)
Used parser class is2.parser.Parser
Creation date 2012.11.02 14:33:53
Training data CoNLL2009-ST-English-ALL.txt.crossannotated
Iterations 10 Used sentences 10000000
Cluster null
55.6.830 is2.parser.Parser 240:readModel -> Reading data finnished
55.6.831 is2.parser.Extractor 56:initStat -> mult (d4)
Loading pipeline from models/srl-ICCG16-eng.model
java.io.InvalidObjectException: enum constant PathEmbeddingacN_FNET_seed3 does not exist in class uk.ac.ed.inf.srl.features.FeatureName
at java.io.ObjectInputStream.readEnum(ObjectInputStream.java:1746)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
at java.util.HashMap.readObject(HashMap.java:1394)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1058)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1900)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
at se.lth.cs.srl.pipeline.Pipeline.fromZipFile(Pipeline.java:192)
at se.lth.cs.srl.pipeline.Pipeline.fromZipFile(Pipeline.java:226)
at se.lth.cs.srl.pipeline.Reranker.<init>(Reranker.java:63)
at se.lth.cs.srl.CompletePipeline.getCompletePipeline(CompletePipeline.java:52)
at se.lth.cs.srl.CompletePipeline.main(CompletePipeline.java:122)
Caused by: java.lang.IllegalArgumentException: No enum constant uk.ac.ed.inf.srl.features.FeatureName.PathEmbeddingacN_FNET_seed3
at java.lang.Enum.valueOf(Enum.java:238)
at java.io.ObjectInputStream.readEnum(ObjectInputStream.java:1743)
... 21 more
Writing corpus to out.txt...
Exception in thread "main" java.lang.NullPointerException
at se.lth.cs.srl.pipeline.Reranker.parse(Reranker.java:96)
at se.lth.cs.srl.SemanticRoleLabeler.parseSentence(SemanticRoleLabeler.java:12)
at se.lth.cs.srl.CompletePipeline.parseX(CompletePipeline.java:93)
at se.lth.cs.srl.CompletePipeline.parse(CompletePipeline.java:73)
at se.lth.cs.srl.CompletePipeline.parseNonSegmentedLineByLine(CompletePipeline.java:165)
at se.lth.cs.srl.CompletePipeline.main(CompletePipeline.java:138)

Request for added detail in README.md

Could you please provide a download link for the other model files referenced in scripts/parse.sh? Specifically, LEMMA_MODEL, POS_MODEL, and PARSER_MODEL.

Generate Dependency Path Embedding

Hi,

This is a great contribution to the SRL task 👍 I am interested in obtaining the dependency path embeddings themselves rather than running the end-to-end SRL pipeline. Could you please share, or point me towards, sample code that illustrates how to generate dependency path embeddings for some sample text input (possibly including the intermediate steps)?

It would be great to learn more about dependency path embeddings from you. Looking forward to hearing from you!

FrameNet Based Semantic Role Annotation

Hey again, @microth :)
I'm sorry to keep raising issues for problems this trivial, but it's really hard to find help related to this package. It's just a gap in my understanding of certain subjects that I'm hoping you can help me with.

Uh, I was successful in using the complete pipeline on a sentence, and the result was this

I just wanted to know how I could get the FrameNet kind of semantic annotations, on the input text, like shown below. (Taken from the SEMAFOR Demo page.)

Like, 'born' has a Being_Born semantic frame to it, and I'd like to have my input text annotated as shown in the second picture. Is it possible? If you don't mind, could you guide me on how I could achieve this? Below are the arguments I passed

CompletePipelineCMDLineOptions options = new CompletePipelineCMDLineOptions();
String[] arss = {
    "eng",
    "-lemma", "/Users/vishnumohan/Desktop/LTh/PathLSTM-pre-illinois-built/src/main/java/se/lth/cs/srl/models/CoNLL2009-ST-English-ALL.anna-3.3.lemmatizer.model",
    "-tagger", "/Users/vishnumohan/Desktop/LTh/PathLSTM-pre-illinois-built/src/main/java/se/lth/cs/srl/models/CoNLL2009-ST-English-ALL.anna-3.3.postagger.model",
    "-parser", "/Users/vishnumohan/Desktop/LTh/PathLSTM-pre-illinois-built/src/main/java/se/lth/cs/srl/models/CoNLL2009-ST-English-ALL.anna-3.3.parser.model",
    "-srl", "/Users/vishnumohan/Desktop/LTh/PathLSTM-pre-illinois-built/srl-FN17.model",
    "-framenet", "/Users/vishnumohan/Desktop/LTh/fndata-1.7",
    "-tokenize",
    "-reranker",
    "-externalNNs",
    "-test", "/Users/vishnumohan/Desktop/LTh/PathLSTM-pre-illinois-built/src/main/java/se/lth/cs/srl/tesen.txt"
};
options.parseCmdLineArgs(arss);

After reading through the docs, I also saw people mentioning a srl-ICCG16-eng.model file. Could you provide a link to it?

Best Regards,
Vishnu

Where do I download the models from?

Where can I download the models listed below?

LEMMA_MODEL=models/lemma-eng.model
POS_MODEL=models/tagger-eng.model
PARSER_MODEL=models/parse-eng.model
SRL_MODEL=models/srl-ACL2016-eng.model

Strange heap space error keeps getting thrown

Hey, @microth
I've recently started using PathLSTM for an application that requires SRL, but loading the model file (srl-ACL2016-eng.model, 2.7G) fails with an out-of-memory (insufficient heap space) error:

Loading pipeline from C:\Users\Vyso\Downloads\NLP\SRL\SEMAFOR\absSemafor\LTH\wttv\PathLSTM-pre-illinois-built\srl-ACL2016-eng.model
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.lang.reflect.Array.newInstance(Array.java:75)
at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1883)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1529)
at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1919)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1529)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2231)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2155)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2013)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2231)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2155)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2013)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2231)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2155)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2013)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2231)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2155)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2013)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1919)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1529)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2231)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2155)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2013)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2231)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2155)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2013)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)

Process finished with exit code 1

So naturally, I changed the heap space of both the JVM on my system, and my IDE, where I could change the vmoptions, as follows. (It was -Xms128m and -Xmx512m, by default).

custom IntelliJ IDEA VM options

-Xms2048m
-Xmx4000m
-XX:ReservedCodeCacheSize=240m
-XX:+UseConcMarkSweepGC
-XX:SoftRefLRUPolicyMSPerMB=50
-ea
-Dsun.io.useCanonCaches=false
-Djava.net.preferIPv4Stack=true
-XX:+HeapDumpOnOutOfMemoryError
-XX:-OmitStackTraceInFastThrow

But even after assigning around 4G as Max Heap space, I get the error. Funny thing is, in the memory management toolbar of my IDE, I can see that the code uses a max of just 500m, during runtime, so I really don't know how this heap space error is still getting thrown.

Could you tell me whether this is an unusual error for this algorithm, or whether you've seen it before, too?
Also, should I add all the mentioned dependencies just for the parse class to function? Is it possible that this error is thrown because some dependencies are missing?

Maybe it's just a beginner-level mistake on my side, but I've been trying to get out of this problem for quite a few days now, and I'd really appreciate it if you could tell me how to get rid of this error.

Thank You,
Vishnu
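
One thing that may explain the mismatch: the custom IntelliJ IDEA VM options file configures the JVM that runs the IDE itself, while a program started from a run configuration gets its own JVM with its own options. For comparison, the commands in the first issues on this page pass -Xmx40g directly on the java command line for the same model. A quick way to see which limit the program actually runs under is to print it from inside that program. This is a minimal sketch; the class name HeapCheck is hypothetical and not part of PathLSTM.

```java
class HeapCheck {
    /** Maximum heap this JVM will try to use, in MiB (reflects -Xmx when it is set). */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Run this with exactly the same run configuration / java options as the pipeline.
        System.out.println("Max heap available to this JVM: " + maxHeapMiB() + " MiB");
    }
}
```

If the printed value is still close to the default rather than the intended limit, the -Xmx setting is not reaching the process that loads the model.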

Using Frame SRL

Hello,

I didn't really understand how to use the parse.sh script for FrameNet SRL. I've all the models and libraries, including BISTparser and NLP4J, and FrameNet 1.5 (downloaded using NLTK).

From here, I'm pretty lost. How do I retrain BISTparser and NLP4J using 10-fold jackknifing to recreate your results? Do I even have to do that?

Sorry if it's a newbie question, and thank you for your time!

Error running PathLSTM, LibLinear model throws null Exception

Hello Mike,
I have trouble with the parse method once the model is loaded: it throws a NullPointerException in LibLinearModel.java (line 48 according to the trace below). Here is my stack trace:

java.lang.NullPointerException
        at uk.ac.ed.inf.srl.ml.liblinear.LibLinearModel.classify(LibLinearModel.java:48)
        at se.lth.cs.srl.pipeline.ArgumentStep.classifyInstance(ArgumentStep.java:143)
        at se.lth.cs.srl.pipeline.ArgumentIdentifier.parse(ArgumentIdentifier.java:43)
        at se.lth.cs.srl.pipeline.Pipeline.parse(Pipeline.java:104)
        at se.lth.cs.srl.SemanticRoleLabeler.parseSentence(SemanticRoleLabeler.java:12)
        at se.lth.cs.srl.CompletePipeline.parse(CompletePipeline.java:79)
        at scroll.mate.MateAnnotations.parse(MateAnnotations.java:36)

Thanks in advance,
Daniel

Minor: would upgrading the Stanford dependency break anything?

Would anything break if I use version 3.7.0 for the Stanford dependency?