akb89 / pyfn Goto Github PK

A python module to process data for Frame Semantic Parsing

License: MIT License

Shell 26.02% Python 73.98%

coling2018 frame-semantic-parsing framenet framenet-xml-data open-sesame pipeline preprocessing semafor

pyfn's Issues

Is there some way to test/predict on new data?

Thanks for your hard work!

I have a question about how to test or predict on new data. If I have some new labeled data (other than official framenet) or just want to do frame semantic parsing on unlabeled sentences using trained semafor and open-sesame, should I prepare the data in the same style as fulltext.xml and then do unmarshalling?

Creating Directory instead of Converting from Semafor to Semeval

When I run the pyfn convert command to convert from SEMAFOR CoNLL format into SEMEVAL XML, I get IsADirectoryError: [Errno 21] Is a directory.

I found that the command now creates a directory at the target path, filename included: /home/zxy485/zxy485gallinahome/week1/pyfn/experiments/xp_001/output/test.predicted.xml/

$ pyfn convert \
>   --from semafor \
>   --to semeval \
>   --source /home/zxy485/zxy485gallinahome/week1/pyfn/experiments/xp_001/data/test.frame.elements \
>   --target /home/zxy485/zxy485gallinahome/week1/pyfn/experiments/xp_001/output/test.predicted.xml \
>   --sent /home/zxy485/zxy485gallinahome/week1/pyfn/experiments/xp_001/data/test.sentences
INFO - Marshalling pyfn.AnnotationSet objects to SEMEVAL XML...
INFO - Marshalling pyfn.AnnotationSet objects to SEMEVAL XML...
INFO - Marshalling pyfn.AnnotationSet objects to SEMEVAL XML...
INFO - Saving output to /home/zxy485/zxy485gallinahome/week1/pyfn/experiments/xp_001/output/test.predicted.xml
INFO - Saving output to /home/zxy485/zxy485gallinahome/week1/pyfn/experiments/xp_001/output/test.predicted.xml
INFO - Saving output to /home/zxy485/zxy485gallinahome/week1/pyfn/experiments/xp_001/output/test.predicted.xml
Traceback (most recent call last):
  File "/home/zxy485/.local/bin/pyfn", line 10, in <module>
    sys.exit(main())
  File "/home/zxy485/.local/lib/python3.6/site-packages/pyfn/main.py", line 198, in main
    args.func(args)
  File "/home/zxy485/.local/lib/python3.6/site-packages/pyfn/main.py", line 91, in _convert
    args.excluded_annosets)
  File "/home/zxy485/.local/lib/python3.6/site-packages/pyfn/marshalling/marshallers/semeval.py", line 128, in marshall_annosets
    excluded_sentences, excluded_annosets)
  File "/home/zxy485/.local/lib/python3.6/site-packages/pyfn/marshalling/marshallers/semeval.py", line 110, in _marshall_annosets
    pretty_print=True)
  File "src/lxml/etree.pyx", line 2048, in lxml.etree._ElementTree.write
  File "src/lxml/serializer.pxi", line 721, in lxml.etree._tofilelike
  File "src/lxml/serializer.pxi", line 780, in lxml.etree._create_output_buffer
  File "src/lxml/serializer.pxi", line 770, in lxml.etree._create_output_buffer
IsADirectoryError: [Errno 21] Is a directory
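
One reading of the traceback above is that a directory already exists at the path passed as --target, so the XML writer cannot open it as a file. The following is a minimal sketch of a pre-flight check under that assumption. It is not pyfn's code, and check_output_file is a hypothetical helper name.

```python
import os
import tempfile


def check_output_file(path):
    """Return path if it can be used as an output file, else raise ValueError.

    Hypothetical helper, not part of pyfn.
    """
    if os.path.isdir(path):
        raise ValueError(
            "{} is a directory; expected a file path".format(path))
    parent = os.path.dirname(os.path.abspath(path))
    if not os.path.isdir(parent):
        raise ValueError(
            "parent directory {} does not exist".format(parent))
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Simulate the situation from the issue: a directory that carries
        # the name of the expected output file.
        clash = os.path.join(tmp, "test.predicted.xml")
        os.mkdir(clash)
        try:
            check_output_file(clash)
        except ValueError as err:
            print(err)
```

Running a check like this before the conversion would report the clash directly instead of failing inside lxml.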

Document unmarshaller

In my application, I typically need to write parsing code to load FrameNet data. It would be nice if I could just use the data structures and code that are already in pyfn to load FrameNet into my application. As I assume the code for that is already there, it would only need documentation telling users how to load FrameNet data into Python objects.

Error with Embeddings

When I run ./frameid.sh -m train -x 101 after conversion and preprocessing with
./preprocess.sh -x 101 -t nlp4j -d bmst -p semafor
I encounter this error:

Preparing files for frame identification...
Converting to .flattened format for the SEMAFOR parser...
Processing file: /home/zxy485/zxy485gallinahome/week1/pyfn/scripts/../experiments/xp_101/frameid/data/corpora/test.sentences.conllx
Done
Training frame identification on all models...
Using TensorFlow backend.
['/home/zxy485/zxy485gallinahome/week1/pyfn/scripts/../lib/eacl2017-oodFrameNetSRL/simpleFrameId/main.py', 'train', '/home/zxy485/zxy485gallinahome/week1/pyfn/scripts/../experiments/xp_101/frameid']
Traceback (most recent call last):
  File "/home/zxy485/zxy485gallinahome/week1/pyfn/scripts/../lib/eacl2017-oodFrameNetSRL/simpleFrameId/main.py", line 186, in <module>
    EMBEDDINGS_NAME = sys.argv[3]
IndexError: list index out of range
Done

It seems the embedding file is missing. Where can I find it? Thank you!
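
The IndexError above is raised at EMBEDDINGS_NAME = sys.argv[3], and the argument list printed in the log holds only three entries (script path, 'train', and a directory), so index 3 does not exist. The sketch below shows how such a script could validate its argument count up front. It is not the code of simpleFrameId/main.py; parse_frameid_args is a hypothetical helper, and the meaning given to the first two positions is an assumption based on the printed list.

```python
def parse_frameid_args(argv):
    """Split an argv-style list into (mode, home, embeddings_name).

    Hypothetical helper. The roles of argv[1] and argv[2] are assumed from
    the argument list printed in the log; argv[3] is the embeddings name
    that the traceback shows being read.
    """
    if len(argv) < 4:
        raise SystemExit(
            "expected 3 arguments (mode, directory, embeddings name), "
            "got {}".format(len(argv) - 1))
    return argv[1], argv[2], argv[3]


if __name__ == "__main__":
    # Same shape as the list in the log: the embeddings name is absent.
    try:
        parse_frameid_args(["main.py", "train", "/path/to/frameid"])
    except SystemExit as err:
        print(err)
```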

Which formats can be converted to?

Hello,

The readme says:

"pyfn can be used to:

convert data to and from FRAMENET XML, SEMEVAL XML, SEMAFOR CoNLL, BIOS and CoNLL-X"

But after this, CONLL-X is not mentioned. I do not see an option for this in the code, either.

I need to convert to a standard CONLL format so that I can match it to another dataset in another CONLL format. CONLL-X would work for this purpose. Is there actually a way to convert to this format, or is the readme in error?

I don't think I can use SEMAFOR CONLL, because I need a standard CONLL format in order to use a standard CONLL converter to match them. So, unless SEMAFOR CONLL conforms exactly to a standard CONLL format (e.g., CONLL-05, CONLL-12, CONLL-X, CONLL-U), it will not work. If "SEMAFOR CONLL" is just another name for one of these, can someone kindly tell me which one?

Thank you!
Alan

Additional:

After trying a conversion to SEMAFOR CoNLL format, I get these files:

dev.frames dev.sentences test.frames test.sentences train.frame.elements train.sentences

The x.sentences files are just the raw text sentences, like so:

" 'The true voodoo-worshipper attempts nothing of importance without certain sacrifices which are intended to propitiate his unclean gods .
" A chaotic case , my dear Watson , " said Holmes over an evening pipe .
" A lamb , I should say , or a kid . "

The x.frames files contain only frame information, with no frame element info:

1       0.0     1       Stimulus_focus  nice.a  13      nice    1232
1       0.0     1       Buildings       pub.n   14      pubs    1232
1       0.0     1       Education_teaching      teach.v 3       taught  1233

For the dev and test sets, this is all the information these files contain. In other words, where is the frame element data for the test and dev splits? Am I confused? Is this info somewhere else?
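
For readers who want to inspect these files, here is a minimal sketch that splits one .frames line into named fields. The field names are my reading of the three sample lines above (frame name, lexical unit, target token index, target token, sentence index) and are assumptions, not a documented schema. The first three columns are kept unnamed because the samples do not reveal their meaning, and parse_frames_line is a hypothetical helper.

```python
def parse_frames_line(line):
    """Split one whitespace-separated .frames line into a dict.

    Field names are assumptions inferred from the sample lines, not taken
    from pyfn or SEMAFOR documentation.
    """
    fields = line.split()
    if len(fields) < 8:
        raise ValueError(
            "expected at least 8 fields, got {}".format(len(fields)))
    return {
        "leading": fields[:3],        # meaning not evident from the samples
        "frame": fields[3],
        "lexical_unit": fields[4],
        "target_index": fields[5],    # kept as a string on purpose
        "target": fields[6],
        "sentence_index": fields[7],
        "extra": fields[8:],          # anything after the eighth column
    }


if __name__ == "__main__":
    sample = "1\t0.0\t1\tStimulus_focus\tnice.a\t13\tnice\t1232"
    print(parse_frames_line(sample)["frame"])
```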

Thanks again,
Alan

pyfn convert crashes because it cannot find logging.yml

Steps to reproduce:

1. Create a virtual environment (I used Python 3.6 with no site packages).
2. Install pyfn via pip.
3. Run pyfn convert. I think the command line arguments do not matter, since it crashes before doing anything.

pyfn convert --from fnxml --to semeval --source XXX --target XXX --splits {train,dev,test}
Traceback (most recent call last):
  File "/home/jck/git/XXXY/venv/bin/pyfn", line 7, in <module>
    from pyfn.main import main
  File "/home/jck/git/XXXY/venv/lib/python3.6/site-packages/pyfn/main.py", line 27, in <module>
    os.path.join(os.path.dirname(__file__), 'logging', 'logging.yml')))
  File "/home/jck/git/XXXY/venv/lib/python3.6/site-packages/pyfn/utils/config.py", line 19, in load
    with open(config_file, 'r') as config_stream:
FileNotFoundError: [Errno 2] No such file or directory: '/home/jck/git/XXXY/venv/lib/python3.6/site-packages/pyfn/logging/logging.yml'
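
The traceback shows the import failing because logging.yml is absent from the installed package directory. As a hedged illustration only, the sketch below falls back to a default logging setup when the config file is missing instead of crashing at import time. It is not pyfn's code; configure_logging is a hypothetical helper, and the format string is merely chosen to resemble the "INFO - ..." lines seen in other logs in this listing.

```python
import logging
import os


def configure_logging(config_file):
    """Return True if config_file exists, else set up defaults and return False.

    Hypothetical helper. When the file exists, loading it is left to the
    caller so that this sketch needs no YAML library.
    """
    if os.path.isfile(config_file):
        return True
    logging.basicConfig(
        level=logging.INFO, format="%(levelname)s - %(message)s")
    logging.getLogger(__name__).warning(
        "logging config %s not found; using defaults", config_file)
    return False


if __name__ == "__main__":
    configure_logging("/no/such/dir/logging.yml")
```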

pyfn convert crashes when target folder does not exist

When I run convert with a target folder that does not exist, pyfn crashes with an unspecific error message.

I see two solutions:

1. Convert should create the target folder if it does not exist
2. Log an error

I would prefer option 1.

pyfn convert --from fnxml --to semeval --source /home/jck/git/fn/data/fndata-1.7-with-dev --target /home/jck/git/fn/data/fndata-1.7-converted --splits train
INFO - Marshalling pyfn.AnnotationSet objects to SEMEVAL XML...
INFO - Marshalling pyfn.AnnotationSet objects to SEMEVAL XML...
INFO - Marshalling pyfn.AnnotationSet objects to SEMEVAL XML...
INFO - Saving output to /home/jck/git/fn/data/fndata-1.7-converted/train.gold.xml
INFO - Saving output to /home/jck/git/fn/data/fndata-1.7-converted/train.gold.xml
INFO - Saving output to /home/jck/git/fn/data/fndata-1.7-converted/train.gold.xml
Traceback (most recent call last):
  File "/home/jck/git/fn/venv/bin/pyfn", line 11, in <module>
    sys.exit(main())
  File "/home/jck/git/fn/venv/lib/python3.6/site-packages/pyfn/main.py", line 197, in main
    args.func(args)
  File "/home/jck/git/fn/venv/lib/python3.6/site-packages/pyfn/main.py", line 90, in _convert
    args.excluded_annosets)
  File "/home/jck/git/fn/venv/lib/python3.6/site-packages/pyfn/marshalling/marshallers/semeval.py", line 121, in marshall_annosets
    excluded_sentences, excluded_annosets)
  File "/home/jck/git/fn/venv/lib/python3.6/site-packages/pyfn/marshalling/marshallers/semeval.py", line 110, in _marshall_annosets
    pretty_print=True)
  File "src/lxml/etree.pyx", line 2039, in lxml.etree._ElementTree.write
  File "src/lxml/serializer.pxi", line 721, in lxml.etree._tofilelike
  File "src/lxml/serializer.pxi", line 780, in lxml.etree._create_output_buffer
  File "src/lxml/serializer.pxi", line 770, in lxml.etree._create_output_buffer
FileNotFoundError: [Errno 2] No such file or directory
Makefile:4: recipe for target 'convert' failed
make: *** [convert] Error 1
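
The first solution proposed above can be sketched with the standard library: create the parent directory of the output file before writing to it. This is an illustration, not pyfn's implementation, and ensure_parent_dir is a hypothetical helper name.

```python
import os
import tempfile


def ensure_parent_dir(filepath):
    """Create the directory that will contain filepath and return it.

    Hypothetical helper. exist_ok=True makes the call a no-op when the
    directory is already there.
    """
    parent = os.path.dirname(os.path.abspath(filepath))
    os.makedirs(parent, exist_ok=True)
    return parent


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "fndata-converted", "train.gold.xml")
        ensure_parent_dir(target)
        # The file can now be opened for writing without FileNotFoundError.
        with open(target, "w") as handle:
            handle.write("<corpus/>\n")
        print(os.path.isfile(target))
```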

Unclear explanation on `splits` in README

--splits: specify which splits should be converted. Use --splits dev to only process dev and test splits and guarantee no overlap between dev and test. Use --splits train to process train dev and test splits and guarantee no overlap across splits. Default to --splits test.

This is really confusing. You say it defaults to test, which you do not explain, and the example uses more than one split ({test,dev}) while this explanation uses just one. When I pass {test,dev}, does it use only part of the data, or does it split all the data into two splits instead of three?
The example is:

pyfn convert \
  --from fnxml \
  --to bios \
  --source /abs/path/to/fndata-1.x \
  --target /abs/path/to/xp/data/output/dir \
  --splits train \
  --output_sentences \
  --filter overlap_fes
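
Taken literally, the README sentence quoted above describes a mapping from the --splits value to the splits that get processed. The sketch below only restates that sentence as a lookup table. It is not pyfn's code; processed_splits is a hypothetical name, and the entry for test is an assumption, since the README text does not say what the default processes.

```python
# Restatement of the quoted README text. The "test" entry is an assumption.
PROCESSED_SPLITS = {
    "train": ["train", "dev", "test"],
    "dev": ["dev", "test"],
    "test": ["test"],
}


def processed_splits(splits="test"):
    """Return the splits processed for a given --splits value."""
    try:
        return list(PROCESSED_SPLITS[splits])
    except KeyError:
        raise ValueError("unknown --splits value: {!r}".format(splits))


if __name__ == "__main__":
    for value in ("train", "dev", "test"):
        print(value, "->", processed_splits(value))
```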

Missing `test.predicted.xml` after running decoding with SEMAFOR

What should I expect after running decoding with SEMAFOR? I thought I would get test.predicted.xml, or some document storing the predicted arguments, as I understood from running score.sh, but I couldn't find such a file in /home/zxy485/zxy485gallinahome/week1/pyfn/experiments/xp_001/data

Commands that I ran:

$ ./semafor.sh -m decode -x 001 -s test
ROFAMES TRAIN MODE OPTIONS
  JAVA_HOME_BIN = /usr/bin/
  CLASSPATH = /home/zxy485/zxy485gallinahome/week1/pyfn/scripts/../lib/semafor/bin/../rofames-1.0.0.jar
  XP_DIR = /home/zxy485/zxy485gallinahome/week1/pyfn/scripts/../experiments/xp_001
  splits = test
  kbest = 1
  max_ram = 8g
  with_hierarchy = FALSE
  LOGS_DIR = /home/zxy485/zxy485gallinahome/week1/pyfn/scripts/../log

Decoding with ROFAMES...
[INFO] ScoreWithGoldFrames:44  - Initializing parser for scoring...
[INFO] ScoreWithGoldFrames:45  - Extracting dependency-parsed testing sentences...
[INFO] ScoreWithGoldFrames:46  - 	from: /home/zxy485/zxy485gallinahome/week1/pyfn/scripts/../experiments/xp_001/data/test.sentences.conllx
[INFO] ScoreWithGoldFrames:49  - Done extracting dependency-parsed testing sentences
[INFO] ScoreWithGoldFrames:50  - Extracting frames...
[INFO] ScoreWithGoldFrames:51  - 	from: /home/zxy485/zxy485gallinahome/week1/pyfn/scripts/../experiments/xp_001/data/test.frames
[INFO] ScoreWithGoldFrames:55  - Done extracting frames
[INFO] ScoreWithGoldFrames:56  - Extracting argument identification alphabet...
[INFO] ScoreWithGoldFrames:57  - 	from: /home/zxy485/zxy485gallinahome/week1/pyfn/scripts/../experiments/xp_001/model/parser.conf
0 100000 200000 300000 400000 500000 600000 700000 800000 900000 1000000 1100000 1200000 1300000 1400000 1500000 1600000 1700000 1800000 1900000 2000000 2100000 2200000 2300000 2400000 2500000 2600000 2700000
[INFO] ScoreWithGoldFrames:60  - Done extracting argument identification alphabet
[INFO] ScoreWithGoldFrames:61  - Extracting Frame2FrameElement dictionary...
[INFO] ScoreWithGoldFrames:62  - 	from: /home/zxy485/zxy485gallinahome/week1/pyfn/scripts/../experiments/xp_001/data/framenet.frame.element.map
[INFO] ScoreWithGoldFrames:64  - Done extracting Frame2FrameElement dictionary
[INFO] ScoreWithGoldFrames:65  - Initializing decoder...
[INFO] ScoreWithGoldFrames:66  - 	from: /home/zxy485/zxy485gallinahome/week1/pyfn/scripts/../experiments/xp_001/model/argmodel.dat
[INFO] ScoreWithGoldFrames:67  - 	and from: /home/zxy485/zxy485gallinahome/week1/pyfn/scripts/../experiments/xp_001/model/parser.conf
[INFO] ScoreWithGoldFrames:70  - Done initializing decoder
[INFO] ScoreWithGoldFrames:74  - Done initializing parser
[INFO] ScoreWithGoldFrames:86  - Scoring with gold frames...
[INFO] ScoreWithGoldFrames:87  - Predicting arguments...
[INFO] StaticSemafor:111 - sentences.size = 1247
[INFO] StaticSemafor:112 - frameSplitsMap.size = 1247
[INFO] StaticSemafor:129 - There are 0 sentences without frame annotation
[INFO] ScoreWithGoldFrames:91  - Done predicting arguments

Directories and files in /home/zxy485/zxy485gallinahome/week1/pyfn/experiments/xp_001

# data/
dev.frames
dev.sentences
framenet.frame.element.map
frames.xml
frRelations.xml
test.frame.elements
test.frames
test.gold.xml
test.sentences
test.sentences.conllx
train.frame.elements
train.sentences
train.sentences.conllx
train.sentences.conllx.flattened

# model/
argmodel.dat
featurecache.jobj
parser.conf
train.events.bin
train.sentences.frame.elements.spans

frameid.sh - ValueError: need at least one array to concatenate

Thank you so much for the reply in #13!
I now run into a new error, but I am unsure of its main cause. Is my data file corrupted, given that the error is ValueError: need at least one array to concatenate?

$ module load python/3.6.6
$ pyfn convert \
  --from fnxml \
  --to semafor \
  --source /path/to/fndata-1.7-with-dev \
  --target /path/to/experiments/xp_101/data \
  --splits train \
  --output_sentences
$ ./preprocess.sh -x 101 -t nlp4j -d bmst -p semafor  # no error
$ ./prepare.sh -x 101 -p semafor -s test -f /home/zxy485/zxy485gallinahome/week1-4/pyfn/data/fndata-1.7-with-dev  # no error
$ module load python2/2.7.13
$ ./frameid.sh -m train -x 101
Preparing files for frame identification...
Converting to .flattened format for the SEMAFOR parser...
Processing file: /home/zxy485/zxy485gallinahome/week1-4/pyfn/scripts/../experiments/xp_101/frameid/data/corpora/test.sentences.conllx
Done
Training frame identification on all models...
Using TensorFlow backend.
train
Starting resource manager
Initializing reporters
Running the experiments!
12 configurations,  1  train-test pairs ->  12  runs
Malformed parse data in sentence 0
Malformed parse data in sentence 1
Malformed parse data in sentence 2
Malformed parse data in sentence 3
Malformed parse data in sentence 4
Malformed parse data in sentence 5
Malformed parse data in sentence 6
Malformed parse data in sentence 7
Malformed parse data in sentence 8
Malformed parse data in sentence 9
Malformed parse data in sentence 10
Malformed parse data in sentence 11
Malformed parse data in sentence 12
Malformed parse data in sentence 13
Malformed parse data in sentence 14
Malformed parse data in sentence 15
Malformed parse data in sentence 16
Malformed parse data in sentence 17
Malformed parse data in sentence 18
Malformed parse data in sentence 19
Malformed parse data in sentence 20
Malformed parse data in sentence 21
Malformed parse data in sentence 22
Malformed parse data in sentence 23
Malformed parse data in sentence 24
Malformed parse data in sentence 25
Malformed parse data in sentence 26
Malformed parse data in sentence 27
Malformed parse data in sentence 28
Malformed parse data in sentence 29
Malformed parse data in sentence 30
Malformed parse data in sentence 31
Malformed parse data in sentence 32
Malformed parse data in sentence 33
Malformed parse data in sentence 34
Malformed parse data in sentence 35
Malformed parse data in sentence 36
Malformed parse data in sentence 37
Malformed parse data in sentence 38
Malformed parse data in sentence 39
Malformed parse data in sentence 40
Malformed parse data in sentence 41
Malformed parse data in sentence 42
Malformed parse data in sentence 43
Malformed parse data in sentence 44
Malformed parse data in sentence 45
Malformed parse data in sentence 46
Malformed parse data in sentence 47
Malformed parse data in sentence 48
Malformed parse data in sentence 49
Malformed parse data in sentence 50
Malformed parse data in sentence 51
Malformed parse data in sentence 52
Malformed parse data in sentence 53
Malformed parse data in sentence 54
Malformed parse data in sentence 55
Malformed parse data in sentence 56
Malformed parse data in sentence 57
Malformed parse data in sentence 58
Malformed parse data in sentence 59
Malformed parse data in sentence 60
Malformed parse data in sentence 61
Malformed parse data in sentence 62
Malformed parse data in sentence 63
Malformed parse data in sentence 64
Malformed parse data in sentence 65
Malformed parse data in sentence 66
Malformed parse data in sentence 67
Malformed parse data in sentence 68
Malformed parse data in sentence 69
Malformed parse data in sentence 70
Malformed parse data in sentence 71
Malformed parse data in sentence 72
Malformed parse data in sentence 73
Malformed parse data in sentence 74
train.sentences.conllx.flattened train.frame.elements labeled: 3362 parsed: 75 graphs: 0
Traceback (most recent call last):
  File "/home/zxy485/zxy485gallinahome/week1-4/pyfn/scripts/../lib/eacl2017-oodFrameNetSRL/simpleFrameId/main.py", line 188, in <module>
    _train_all(HOME, EMBEDDINGS_NAME)
  File "/home/zxy485/zxy485gallinahome/week1-4/pyfn/scripts/../lib/eacl2017-oodFrameNetSRL/simpleFrameId/main.py", line 138, in _train_all
    X_train, y_train, lemmapos_train, gid_train = mapper.get_matrix(g_train)
  File "/mnt/rds/redhen/gallina/home/zxy485/week1-4/pyfn/lib/eacl2017-oodFrameNetSRL/simpleFrameId/representation.py", line 29, in get_matrix
    X = np.vstack(X)
  File "/home/zxy485/.local/lib/python2.7/site-packages/numpy/core/shape_base.py", line 237, in vstack
    return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
ValueError: need at least one array to concatenate
Done
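
The summary line in the log above reports graphs: 0, which would be consistent with an empty list reaching np.vstack, a call that raises this ValueError when given no arrays. As a small diagnostic sketch, the helper below extracts the three counters from such a summary line so that an empty result can be flagged before training starts. The regular expression is based only on the one line shown in this log, and parse_summary is a hypothetical helper, not part of the frame identification code.

```python
import re

# Pattern inferred from the single summary line in the log above.
SUMMARY_RE = re.compile(r"labeled: (\d+) parsed: (\d+) graphs: (\d+)")


def parse_summary(line):
    """Return the counters from a summary line as a dict, or None."""
    match = SUMMARY_RE.search(line)
    if match is None:
        return None
    labeled, parsed, graphs = (int(group) for group in match.groups())
    return {"labeled": labeled, "parsed": parsed, "graphs": graphs}


if __name__ == "__main__":
    line = ("train.sentences.conllx.flattened train.frame.elements "
            "labeled: 3362 parsed: 75 graphs: 0")
    counts = parse_summary(line)
    if counts is not None and counts["graphs"] == 0:
        print("no graphs were built; nothing to train on")
```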
