aonotas / deep-crf
An implementation of Conditional Random Fields (CRFs) with Deep Learning Method
Home Page: http://deep-crf.com
License: MIT License
rnn_list in deepcrf/bi_lstm.py, line 52, may be a typo.
When we predict named entity tags with taggers such as CRF++, KyTea, and KNP, stdin and stdout are used to input the target data and output the prediction results.
It would therefore be good to make stdin and stdout available for the same purpose when predicting with deep-crf.
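A pipe-friendly interface could be sketched roughly like this; the `predict` callable here is a hypothetical stand-in for the model, not deep-crf's actual API:

```python
import sys

def tag_stream(lines, predict):
    """Tag whitespace-delimited sentences, one sentence per input line,
    yielding "token<TAB>tag" lines with a blank line after each sentence."""
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue
        for token, tag in zip(tokens, predict(tokens)):
            yield token + "\t" + tag
        yield ""

if __name__ == "__main__":
    # Dummy predictor that tags every token "O"; a real model call
    # would go here.
    for out_line in tag_stream(sys.stdin, lambda toks: ["O"] * len(toks)):
        print(out_line)
```

This shape lets the tagger sit in a shell pipeline the way CRF++ or KyTea do.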
I also think the options of the predict command should be imported from the train_config file as much as possible, because the options of the train command and the predict command overlap considerably.
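The merge could look roughly like this; the JSON train_config layout and the option names are assumptions for illustration, not deep-crf's actual file format:

```python
import json

def load_train_config(path):
    # Hypothetical: assumes the options used at training time were saved as JSON.
    with open(path) as f:
        return json.load(f)

def merge_options(train_config, cli_options):
    """Start from the options saved at training time and let any option
    explicitly given on the predict command line override them."""
    merged = dict(train_config)
    merged.update({k: v for k, v in cli_options.items() if v is not None})
    return merged
```

With this scheme a bare `deep-crf predict` reuses the training settings, and only explicitly passed flags differ.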
I have tried to install it as instructed in the README file, but I always get this error:
In file included from /tmp/easy_install-btbh8apr/h5py-2.7.1/h5py/defs.c:569:0:
/tmp/easy_install-btbh8apr/h5py-2.7.1/h5py/api_compat.h:27:18: fatal error: hdf5.h: No such file or directory
compilation terminated.
error: Setup script exited with error: command 'gcc' failed with exit status 1
Can anybody help me out with this?
After installing, I get this error:
C:\WINDOWS\system32>deep-crf train --help
Traceback (most recent call last):
File "C:\Users\UserName\AppData\Local\Programs\Python\Python37\Scripts\deep-crf-script.py", line 11, in <module>
load_entry_point('DeepCRF==1.0', 'console_scripts', 'deep-crf')()
File "C:\Users\UserName\AppData\Local\Programs\Python\Python37\lib\site-packages\pkg_resources\__init__.py", line 480, in load_entry_point
return get_distribution(dist).load_entry_point(group, name)
File "C:\Users\UserName\AppData\Local\Programs\Python\Python37\lib\site-packages\pkg_resources\__init__.py", line 2693, in load_entry_point
return ep.load()
File "C:\Users\UserName\AppData\Local\Programs\Python\Python37\lib\site-packages\pkg_resources\__init__.py", line 2324, in load
return self.resolve()
File "C:\Users\UserName\AppData\Local\Programs\Python\Python37\lib\site-packages\pkg_resources\__init__.py", line 2330, in resolve
module = __import__(self.module_name, fromlist=['__name__'], level=0)
ModuleNotFoundError: No module named 'deepcrf'
Installation output:
C:\WINDOWS\system32>python setup.py install
running install
running bdist_egg
running egg_info
writing DeepCRF.egg-info\PKG-INFO
writing dependency_links to DeepCRF.egg-info\dependency_links.txt
writing entry points to DeepCRF.egg-info\entry_points.txt
writing requirements to DeepCRF.egg-info\requires.txt
writing top-level names to DeepCRF.egg-info\top_level.txt
reading manifest file 'DeepCRF.egg-info\SOURCES.txt'
writing manifest file 'DeepCRF.egg-info\SOURCES.txt'
installing library code to build\bdist.win-amd64\egg
running install_lib
warning: install_lib: 'build\lib' does not exist -- no Python modules to install
creating build\bdist.win-amd64\egg
creating build\bdist.win-amd64\egg\EGG-INFO
copying DeepCRF.egg-info\PKG-INFO -> build\bdist.win-amd64\egg\EGG-INFO
copying DeepCRF.egg-info\SOURCES.txt -> build\bdist.win-amd64\egg\EGG-INFO
copying DeepCRF.egg-info\dependency_links.txt -> build\bdist.win-amd64\egg\EGG-INFO
copying DeepCRF.egg-info\entry_points.txt -> build\bdist.win-amd64\egg\EGG-INFO
copying DeepCRF.egg-info\requires.txt -> build\bdist.win-amd64\egg\EGG-INFO
copying DeepCRF.egg-info\top_level.txt -> build\bdist.win-amd64\egg\EGG-INFO
zip_safe flag not set; analyzing archive contents...
creating 'dist\DeepCRF-1.0-py3.7.egg' and adding 'build\bdist.win-amd64\egg' to it
removing 'build\bdist.win-amd64\egg' (and everything under it)
Processing DeepCRF-1.0-py3.7.egg
Removing c:\users\UserName\appdata\local\programs\python\python37\lib\site-packages\DeepCRF-1.0-py3.7.egg
Copying DeepCRF-1.0-py3.7.egg to c:\users\UserName\appdata\local\programs\python\python37\lib\site-packages
DeepCRF 1.0 is already the active version in easy-install.pth
Installing deep-crf-script.py script to C:\Users\UserName\AppData\Local\Programs\Python\Python37\Scripts
Installing deep-crf.exe script to C:\Users\UserName\AppData\Local\Programs\Python\Python37\Scripts
Installed c:\users\UserName\appdata\local\programs\python\python37\lib\site-packages\deepcrf-1.0-py3.7.egg
Processing dependencies for DeepCRF==1.0
Searching for h5py==2.9.0
Best match: h5py 2.9.0
Adding h5py 2.9.0 to easy-install.pth file
Using c:\users\UserName\appdata\local\programs\python\python37\lib\site-packages
Searching for Click==7.0
Best match: Click 7.0
Adding Click 7.0 to easy-install.pth file
Using c:\users\UserName\appdata\local\programs\python\python37\lib\site-packages
Searching for numpy==1.16.2
Best match: numpy 1.16.2
Adding numpy 1.16.2 to easy-install.pth file
Installing f2py-script.py script to C:\Users\UserName\AppData\Local\Programs\Python\Python37\Scripts
Installing f2py.exe script to C:\Users\UserName\AppData\Local\Programs\Python\Python37\Scripts
Using c:\users\UserName\appdata\local\programs\python\python37\lib\site-packages
Searching for six==1.12.0
Best match: six 1.12.0
Adding six 1.12.0 to easy-install.pth file
Using c:\users\UserName\appdata\local\programs\python\python37\lib\site-packages
Finished processing dependencies for DeepCRF==1.0
Chainer installation output:
C:\WINDOWS\system32>pip install chainer==2.1.0
Collecting chainer==2.1.0
Downloading https://files.pythonhosted.org/packages/3a/67/35f757014d733e0193a1f9b2b466750754723f22a13c0c546810bf137590/chainer-2.1.0.tar.gz (324kB)
100% |████████████████████████████████| 327kB 633kB/s
Requirement already satisfied: filelock in c:\users\UserName\appdata\local\programs\python\python37\lib\site-packages (from chainer==2.1.0) (3.0.12)
Requirement already satisfied: mock in c:\users\UserName\appdata\local\programs\python\python37\lib\site-packages (from chainer==2.1.0) (2.0.0)
Requirement already satisfied: nose in c:\users\UserName\appdata\local\programs\python\python37\lib\site-packages (from chainer==2.1.0) (1.3.7)
Requirement already satisfied: numpy>=1.9.0 in c:\users\UserName\appdata\local\programs\python\python37\lib\site-packages (from chainer==2.1.0) (1.16.2)
Requirement already satisfied: protobuf>=2.6.0 in c:\users\UserName\appdata\local\programs\python\python37\lib\site-packages (from chainer==2.1.0) (3.7.1)
Requirement already satisfied: six>=1.9.0 in c:\users\UserName\appdata\local\programs\python\python37\lib\site-packages (from chainer==2.1.0) (1.12.0)
Requirement already satisfied: pbr>=0.11 in c:\users\UserName\appdata\local\programs\python\python37\lib\site-packages (from mock->chainer==2.1.0) (5.1.3)
Requirement already satisfied: setuptools in c:\users\UserName\appdata\local\programs\python\python37\lib\site-packages (from protobuf>=2.6.0->chainer==2.1.0) (39.0.1)
Building wheels for collected packages: chainer
Running setup.py bdist_wheel for chainer ... done
Stored in directory: C:\Users\UserName\AppData\Local\pip\Cache\wheels\a0\23\6e\9db9f23a5317e5c75b5a4b82f8b3a8db7f9feedbeac2877542
Successfully built chainer
Installing collected packages: chainer
Found existing installation: chainer 1.24.0
Uninstalling chainer-1.24.0:
Successfully uninstalled chainer-1.24.0
Successfully installed chainer-2.1.0
I will add trained models for POS tagging, chunking, NER, and a Japanese sentence splitter.
In this tool's prediction results, only the predicted labels are displayed at the moment.
However, I think it would be more convenient to be able to display the input data as well.
For example (--enable_combined_input would be an option that combines the input data with the prediction result):
$ cat predict input_file_multi.txt
Barack NN
Hussein NN
Obama NN
is VBZ
a DT
man NN
. .
Yuji NN B-PERSON
Matsumoto NN E-PERSON
is VBZ O
a DT O
man NN O
. . O
$ deep-crf predict input_file_multi.txt --delimiter=' ' --input_idx 0,1 --output_idx 2 --model_filename ./save_model_dir/bilstm-cnn-crf_multi_epoch3.model --save_dir save_model_dir --save_name bilstm-cnn-crf_multi --predicted_output predicted.txt --enable_combined_input
$ cat predicted.txt
Barack NN B-PERSON
Hussein NN I-PERSON
Obama NN E-PERSON
is VBZ O
a DT O
man NN O
. . O
Yuji NN B-PERSON
Matsumoto NN E-PERSON
is VBZ O
a DT O
man NN O
. . O
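Until such an option exists, the same combined output can be produced outside the tool by pasting the predicted label column back onto the input lines; a sketch (the function name is mine):

```python
def combine_columns(input_lines, predicted_labels, delimiter=" "):
    """Append one predicted label to each non-blank input line; blank
    lines (sentence boundaries) are passed through unchanged."""
    labels = iter(predicted_labels)
    combined = []
    for line in input_lines:
        if line.strip():
            combined.append(line.rstrip("\n") + delimiter + next(labels))
        else:
            combined.append("")
    return combined
```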
What is the use of --dev?
What does early stopping mean?
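In general, a development (dev) set is a held-out file scored after each epoch, and early stopping means halting training once that score stops improving for a fixed number of epochs (the patience). The usual rule looks like this; a generic sketch, not deep-crf's own code:

```python
def should_stop(dev_scores, patience=3):
    """Early stopping: return True once the development score has failed
    to beat the previous best for `patience` consecutive epochs."""
    if len(dev_scores) <= patience:
        return False
    best_before = max(dev_scores[:-patience])
    return all(score <= best_before for score in dev_scores[-patience:])
```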
The example under Additional Feature Support in the README mentions the arguments --input_idx and --output_idx, which are not handled yet.
This would definitely be great functionality to have, since it would make it possible to ignore metadata fields in the raw data files.
What is the --dev_file option?
Does deep-crf support only the tag formats "[BI]-{Tag Name}" and "O"?
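The example outputs in this thread also use E- tags (and BIOES schemes add S-), so the supported set may be wider than B/I/O. For reference, a generic BIOES span decoder (not deep-crf's own implementation):

```python
def parse_spans(tags):
    """Decode BIOES-style tags (B-X, I-X, E-X, S-X, O) into
    (start, end, type) spans over token indices."""
    spans, start, cur = [], None, None
    for i, tag in enumerate(tags):
        prefix, _, label = tag.partition("-")
        if prefix in ("B", "S"):
            start, cur = i, label
        if prefix == "S":
            spans.append((i, i, label))
            start = None
        elif prefix == "E" and start is not None:
            spans.append((start, i, cur))
            start = None
        elif prefix == "O":
            start = None
    return spans
```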
Currently it seems to support only Python 2.x; will it also support Python 3.x?
I plan to use it for fine-tuning. However, L.NStepBiLSTM can only be fine-tuned in Chainer v3.0.0 or later, because the __init__ function of a network used for fine-tuning must take the two parameters initialW and initial_bias. So I would like to ask whether it supports Chainer v3.0.0.
When I predicted with English test file, it worked.
But when I predicted with Japanese test file (and set pre-trained Japanese word embeddings file), I got the following error.
File "build/bdist.linux-x86_64/egg/deepcrf/__init__.py", line 119, in predict
File "build/bdist.linux-x86_64/egg/deepcrf/main.py", line 149, in run
File "build/bdist.linux-x86_64/egg/deepcrf/util.py", line 65, in load_vocab
ValueError: need more than 1 value to unpack
My command is like:
deep-crf predict input_test_jp.txt --delimiter=" " --model_filename ./save_jpmodel_dir/bilstm-cnn-crf_adam_jp_epoch41.model --save_dir save_jpmodel_dir --save_name bilstm-cnn-crf_adam_jp --word_emb_file jp_word_emb300.txt --n_word_emb 300 --word_emb_vocab_type replace_only --predicted_output predicted41_jp.txt --gpu 0
Any ideas? Thank you.
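That ValueError in load_vocab usually means some line of the embedding file did not split into the expected number of fields; common causes are a word2vec-style header line, a token containing the delimiter, or a wrong encoding. A small checker, independent of deep-crf:

```python
def find_bad_embedding_lines(path, n_dim, encoding="utf-8"):
    """Return line numbers in a word2vec-style text embedding file whose
    lines do not consist of exactly one token plus n_dim fields."""
    bad = []
    with open(path, encoding=encoding) as f:
        for i, line in enumerate(f, 1):
            if len(line.rstrip("\n").split(" ")) != n_dim + 1:
                bad.append(i)
    return bad
```

Running this on jp_word_emb300.txt with n_dim=300 would point at the offending lines before deep-crf tries to load the file.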
There are issues running the current build due to code-breaking changes introduced in Chainer version 2.0. We should either pin the Chainer version to 1.24.0 during installation or change the existing code to support v2.
@aonotas, Let me know if you need me to send a pull request.
When I ran the following command to predict, I got an error.
$ deep-crf predict data/predict --delimiter=' ' --model_filename ./save_model_dir/bilstm-cnn-crf_adam_epoch43.model --save_dir save_model_dir/ --save_name bilstm-cnn-crf_adam --predicted_output predicted43.txt --gpu 0
data/predict contains data in the following format:
SOCCER - JAPAN GET LUCKY WIN , CHINA IN SURPRISE DEFEAT .
Nadim Ladki
AL-AIN , United Arab Emirates 1996-12-06
Japan began the defence of their Asian Cup title with a lucky 2-1 win against Syria in a Group C championship match on Friday .
But China saw their luck desert them in the second match of the group , crashing to a surprise 2-0 defeat to newcomers Uzbekistan .
I have no idea how to solve it.
Any ideas? Thank you.
When I trained with English train/dev files, it worked.
But when I trained with Japanese train/dev files (and set pre-trained Japanese word embeddings file), I got the following error.
File "build/bdist.linux-x86_64/egg/deepcrf/__init__.py", line 66, in train
File "build/bdist.linux-x86_64/egg/deepcrf/main.py", line 98, in run
File "build/bdist.linux-x86_64/egg/deepcrf/util.py", line 102, in read_conll_file
TypeError: object of type 'int' has no len()
I want to set a pre-trained Japanese character embeddings file, but it looks like there is no --char_emb_file option.
I am wondering if this is the cause of the error.
Does it support Japanese train/dev files (or a --char_emb_file option)?
Thank you.
Hello,
I have the following issue. I have looked through all the existing issues and it seems to be new. It happens when I try to train a model. The error occurs both with the dummy data from your README (the delimiter is one space character, https://www.dropbox.com/s/e7lflyuuahox2ym/dummy_training.txt?dl=0) and with my own data. Could you give me some insight into where the problem is?
deep-crf train dummy_training.txt --delimiter=' ' --dev_file input_file_dev.txt --save_dir . --save_name bilstm-cnn-crf_adam --optimizer adam
[2017-12-31 08:05:23,945] [INFO] start training... ([email protected]:417)
[2017-12-31 08:05:23,945] [INFO] epoch:0 ([email protected]:424)
[2017-12-31 08:05:23,945] [INFO] [train] ([email protected]:425)
[2017-12-31 08:05:24,165] [INFO] loss :8.92536354065 ([email protected]:462)
[2017-12-31 08:05:24,165] [INFO] accuracy :23.076923076923077 ([email protected]:463)
Traceback (most recent call last):
File "/Users/longpham/anaconda/bin/deep-crf", line 11, in <module>
load_entry_point('DeepCRF==1.0', 'console_scripts', 'deep-crf')()
File "/Users/longpham/anaconda/lib/python3.6/site-packages/click/core.py", line 722, in __call__
return self.main(*args, **kwargs)
File "/Users/longpham/anaconda/lib/python3.6/site-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/Users/longpham/anaconda/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Users/longpham/anaconda/lib/python3.6/site-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/Users/longpham/anaconda/lib/python3.6/site-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "/Users/longpham/anaconda/lib/python3.6/site-packages/DeepCRF-1.0-py3.6.egg/deepcrf/__init__.py", line 66, in train
File "/Users/longpham/anaconda/lib/python3.6/site-packages/DeepCRF-1.0-py3.6.egg/deepcrf/main.py", line 467, in run
File "/Users/longpham/anaconda/lib/python3.6/site-packages/DeepCRF-1.0-py3.6.egg/deepcrf/main.py", line 369, in eval_loop
ValueError: not enough values to unpack (expected 2, got 0)
How can I train it in batches?
What is the format of input_file.txt? Can you give me an example?
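As far as the examples in this thread show, input_file.txt is CoNLL-style: one token per line, columns separated by the delimiter (word first, gold tag last), and a blank line between sentences. A reader for that shape, for illustration only:

```python
def read_conll(path, delimiter=" ", encoding="utf-8"):
    """Read a CoNLL-style file into a list of sentences, each a list of
    per-token column lists; blank lines separate sentences."""
    sentences, current = [], []
    with open(path, encoding=encoding) as f:
        for line in f:
            line = line.rstrip("\n")
            if line:
                current.append(line.split(delimiter))
            elif current:
                sentences.append(current)
                current = []
    if current:
        sentences.append(current)
    return sentences
```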
I used the following command:
deep-crf predict input_file_multi.txt --delimiter=' ' --input_idx 0,1 --output_idx 2 --model_filename ./save_model_dir/bilstm-cnn-crf_multi_epoch3.model --save_dir save_model_dir --save_name bilstm-cnn-crf_multi --predicted_output predicted.txt
and I am getting the following error:
ValueError: Invalid input feature sizes: "7". Please check at line [3]
The training and dev data files are in CoNLL format. Can anyone help?
Thanks in advance.
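An "Invalid input feature sizes" error that reports a line number usually means that line has a different number of columns than the others (a stray delimiter, a missing tag, or trailing spaces). A quick consistency check to run before training, generic and not part of deep-crf:

```python
def find_ragged_lines(path, delimiter=" ", encoding="utf-8"):
    """Return (line_number, column_count) for every non-blank line whose
    column count differs from that of the first non-blank line."""
    expected, ragged = None, []
    with open(path, encoding=encoding) as f:
        for i, line in enumerate(f, 1):
            line = line.rstrip("\n")
            if not line:
                continue
            n = len(line.split(delimiter))
            if expected is None:
                expected = n
            elif n != expected:
                ragged.append((i, n))
    return ragged
```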
(Related to #7)
The following code uses methods that are deprecated when using Chainer v1.24.0:
deep-crf/deepcrf/util_chainer.py
Lines 44 to 49 in ee2880c
deep-crf/deepcrf/util_chainer.py
Lines 52 to 57 in ee2880c