crownpku / information-extraction-chinese Goto Github PK

Chinese Named Entity Recognition with IDCNN/biLSTM+CRF, and Relation Extraction with biGRU+2ATT 中文实体识别与关系提取

Python 100.00%

nlp chinese-nlp information-extraction relation-extraction named-entity-recognition

information-extraction-chinese's Introduction

Information-Extraction-Chinese

Chinese information extraction, including named entity recognition, relation extraction and more, focused on state-of-art deep learning methods.

To make clear, this project has several sub-tasks with detailed separate README.md.

Chinese Relation Extraction by biGRU with Character and Sentence Attentions

Details in folder RE_BGRU_2ATT/

Chinese Named Entity Recognition by IDCNN/biLSTM with CRF layer

Details in folder NER_IDCNN_CRF/

Chinese Word Segmentation by Iterated Dilated Convolutions

Details at https://github.com/hankcs/ID-CNN-CWS

Reference

information-extraction-chinese's People

Contributors

Stargazers

Watchers

Forkers

zxsted itgirls babyzpj ericxsun zhweizhang tedrepo dongfangyixi colinsongf ajoeajoe dt1219 cutecha mqrshiyan hycyc keaideii lyfree132 rubeeny fulquan pingoogle yaps lxj0276 hydercps allensmile chagge 10183308 benjamesbabala spongebbob mathshelly2014 yinmingjun feixuan090803 coder3344 zhengyu19921215 chinesektry leezqcst lu839684437 ryfan-rs wwf5067 xsongx mariobai iamsile pokbe chybot chuzig meccy hsd315 tgworld gonewithgt yanglijun960703 cyjack niu2niu2niu louis-xuy zjukongming mansteinliliang du-yang greengrass2015 fendaq runander befeng geeseek changfengfeng chenjun0210 wuwuyang juary88 sysulj abc3436645 cshong9 yangvict jane8816 zhianyang xitongdashi xtuyaowu aim-for-better zhyuxie generalzh hwaking berryhn tfnlp ufully topgunforone coopertian liweijin winnerineast speak2me jasonhoou wonderfulsuccess chenglongchen guojiangwei2 shenyong123 jackysnake june-cheer qenvelope guidachengong lium226 godsme fuyanzhe michaelfeng87 whumatrix frankslb lrxzhy hengqujushi fanfanfeng

information-extraction-chinese's Issues

Test_GRU.py 运行问题

我已经pull了最新的项目了，可是但我在运行test_GRU.py还是出现了下面的这个问题
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key model/GRU_BACKWARD/multi_rnn_cell/cell_0/gru_cell/candidate/biases not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key model/GRU_BACKWARD/multi_rnn_cell/cell_0/gru_cell/gates/biases not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key model/GRU_BACKWARD/multi_rnn_cell/cell_0/gru_cell/candidate/weights not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key model/GRU_BACKWARD/multi_rnn_cell/cell_0/gru_cell/gates/weights not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key model/GRU_FORWARD/multi_rnn_cell/cell_0/gru_cell/candidate/weights not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key model/GRU_FORWARD/multi_rnn_cell/cell_0/gru_cell/candidate/biases not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key model/GRU_FORWARD/multi_rnn_cell/cell_0/gru_cell/gates/biases not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key model/GRU_FORWARD/multi_rnn_cell/cell_0/gru_cell/gates/weights not found in checkpoint

tensorflow.python.framework.errors_impl.NotFoundError: Key model/GRU_BACKWARD/multi_rnn_cell/cell_0/gru_cell/candidate/biases not found in checkpoint
[[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

我的配置：
Python：ananconda python3.5
tensorflow: 1.0.0
这是我出错的问题。请问一下，这个是什么问题。
还有就是，问什么我自己训练不了模型，只能使用你的模型，我重新训练模型train_GRU.py并不能更新你之前的那个模型。

用过initial.py预处理问题

您好，我在用initial.py处理train.txt和test.txt时生成的train_q&a.txt和test__q&a.txt文件中的样本个数均小于原始的样本个数，而且数据量越大，少的越多，这是怎么回事啊？

'ascii' codec error

when I run python3 main.py --train=True --clean=True --model_type=idcnn in my GPU,the error is:
Traceback (most recent call last):
File "main.py", line 10, in
from model import Model
File "/root/wangys/NLP/Information-Extraction-Chinese-master/NER_IDCNN_CRF/model.py", line 8, in
from NER_IDCNN_CRF.utils import result_to_json
ImportError: No module named 'NER_IDCNN_CRF'

如果增加一个关系种类，那么如何改进已有代码

前提：用示例数据可以运行成功
在示例数据的12个关系后新增一个关系
我在network.py中将Settings类的num_classes由12改为13
在test_GRU,py中将main_for_evaluation()中的test_settings.num_classes和main(_)中的test_settings.num_classes更改为13
同时original_data中的train.txt/test.txt/relation2id.txt也作了相应更新，增加了新数据和关系
确保是第一次运行initial.py（即之前没有data中之前的npy）
但运行仍然报错如下

Caused by op 'save/Assign_10', defined at:
  File "test_GRU.py", line 339, in <module>
    tf.app.run()
  File "/home/wxw/.conda/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "test_GRU.py", line 177, in main
    saver = tf.train.Saver(names_to_vars)
  File "/home/wxw/.conda/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1218, in __init__
    self.build()
  File "/home/wxw/.conda/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1227, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/home/wxw/.conda/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1263, in _build
    build_save=build_save, build_restore=build_restore)
  File "/home/wxw/.conda/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 751, in _build_internal
    restore_sequentially, reshape)
  File "/home/wxw/.conda/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 439, in _AddRestoreOps
    assign_ops.append(saveable.restore(tensors, shapes))
  File "/home/wxw/.conda/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 160, in restore
    self.op.get_shape().is_fully_defined())
  File "/home/wxw/.conda/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/state_ops.py", line 276, in assign
    validate_shape=validate_shape)
  File "/home/wxw/.conda/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/gen_state_ops.py", line 57, in assign
    use_locking=use_locking, name=name)
  File "/home/wxw/.conda/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/wxw/.conda/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
    op_def=op_def)
  File "/home/wxw/.conda/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [13] rhs shape= [12]
	 [[Node: save/Assign_10 = Assign[T=DT_FLOAT, _class=["loc:@model/bias_d"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](model/bias_d, save/RestoreV2_10)]]

暂时不知道还有哪些需要更改的地方，希望指点一二

执行报错

1、下载最新的源码包
2、win7x64 python 3.6.1 pip install tensorflow jieba
3、按照readme中的指导，执行Information-Extraction-Chinese-master\NER_IDCNN_CRF\main.py
报错，操作命令如下：
D:\cnn\Information-Extraction-Chinese-master\NER_IDCNN_CRF>python main.py --train=True --clean=True --model_type=idcnn
Building prefix dict from the default dictionary ...
Dumping model to file cache C:\Users\ADMINI~1\AppData\Local\Temp\jieba.cache
Loading model cost 1.436 seconds.
Prefix dict has been built succesfully.
Traceback (most recent call last):
File "main.py", line 46, in
flags.DEFINE_string("result_path", "result", "Path for results")
NameError: name 'os' is not defined

然后使用Pyscripter IDE运行这个main程序，倒是可以执行。

执行结果发现有点不对，结巴分词似乎没有生效，出来的信息都是一个字一个字的。

有点迷惑。

先看看为啥jieb结巴分词没有体现效果。
再看看为啥python 解释器中不能够运行，而在pycripter ide中却能够执行。

演示训练语料咨询

你好！请教一个小白问题，origin_data里的训练及预测语料是用什么工具整理成那种格式的，能提供下代码吗？谢谢！

训练时出现编码问题

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa3 in position 0: invalid start byte
将train改成true会在codecs.py里出现这个问题

[NER_IDCNN_CRF] Get rid of dependency with Jieba?

Is it possible to use character embedding directly and get rid of dependency with Jieba segmentation tool?

Since using jieba introduces upstream error, it's hard to say how this idea will affect NER performance.

[NER_IDCNN_CRF] Training biLSTM+CRF: 'rnn' not defined问题

按照README.md介绍的方法执行：

#python3 main.py --train=True --clean=True --model_type=bilstm

出现了以下问题：

Traceback (most recent call last):
  File "main.py", line 228, in <module>
    tf.app.run(main)
  File "D:\Anaconda\envs\keras\lib\site-packages\tensorflow\python\platform\app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "main.py", line 222, in main
    train()
  File "main.py", line 174, in train
    model = create_model(sess, Model, FLAGS.ckpt_path, load_word2vec, config, id_to_char, logger)
  File "C:\Users\cloudy\Desktop\Information-Extraction-Chinese\NER_IDCNN_CRF\utils.py", line 172, in create_model
    model = Model_class(config)
  File "C:\Users\cloudy\Desktop\Information-Extraction-Chinese\NER_IDCNN_CRF\model.py", line 86, in __init__
    model_outputs = self.biLSTM_layer(model_inputs, self.lstm_dim, self.lengths)
  File "C:\Users\cloudy\Desktop\Information-Extraction-Chinese\NER_IDCNN_CRF\model.py", line 161, in biLSTM_layer
    lstm_cell[direction] = rnn.CoupledInputForgetGateLSTMCell(
NameError: name 'rnn' is not defined

查看源码，发现rnn确实没有定义，希望得到您的帮助，谢谢!

FileNotFoundError: [Errno 2] No such file or directory: 'maps.pkl'

Hello, I straightly run the main.py using pretrained model, but I get this error. I see the main.py and find that 'maps.pkl' need to be produced in training process. So if I don't want to train this model, how can i get this' maps.pkl' file?

支持tensorflow 1.5

您好，本人是初学者，希望通过您的项目学习和使用。
目前正在搭建运行环境，请问项目是否可以支持tensorflow 1.5。
谢谢。

关于vec.txt的生成

你好，我想请教下，你的仓库中的pretrained word embedding文件vec.txt,是怎么生成的啊？急求

run main.py and have import error

NER_IDCNN_CRF训练模型怎样修改迭代次数？max_epoch这个参数没用上？

引用咨询

您好，我目前在做英文文献生物信息方面知识图谱构建，需要以NER为基础工作，请问，我是否可以以您的模型为基础改进后引入本人工作，可否提供您的文献方便引用？

测试模型问题

[RE_BGRU_2ATT] 测试问题

您好，请问在运行test_GRU.py的时候有什么需要注意的么？直接运行提示文件找不到，谢谢

请教

我想请问一下，对于中文的实体关系抽取：
比如说我现在的句子为：**的首都是北京。这个句子中**和北京是实体，那么在计算每个词距离实体之间的相对位置的时候，是否要将两个实体看成一个整体比如：
** 的首都是北京。所有的词距离实体1的相对位置 ---> 0 0 1 2 3 4 5 6 .这种是将实体分开来计算距离实体1的相对距离。第二种情况是：** 的首都是北京。所有的词距离实体1的相对位置 0 1 2 3 4 5.这种方法是将实体看成一个整体。

我想请问一下，对于中文的每个词距离实体之间的相对位置，应该采用哪种方案？

模型训练时evaluate报错，错误在bath_paths

你好，我在进行模型训练时选择IDCNN，300M的训练语料，训练测试比9:1，在进行训练时，报错，原因是id_to_tag字典键缺失，我打印batch_paths，发现里边确实有9，而我只有9个标签，index最大应该是8，不知道是什么原因引起的，换了bi-lstm则没有出现任何问题，不知能否帮忙解答，谢谢

训练完模型，输入测试句子无法识别实体

大神，您好~
我用IOB方法标注的医学数据训练完模型后，输入训练集中有的句子进行测试，未能识别出任何实体。
输入任何医学相关的句子都无法识别出实体。
想请问，模型的效果是否与训练的数据量有关？是否与分词有关？
另外，您给的数据集为IOB标注方式，为何能在tags_schema为IOBES时跑通，而我的数据会报错？
期待您的回答

改写成提供API接口的方式

你好，我想把它改写成API接口的方式，使用flask 提供web服务，发现最后出来的结果非常差，下面试代码

def predict_line():
config = load_config(FLAGS.config_file)
logger = get_logger(FLAGS.log_file)
# limit GPU memory
tf_config = tf.ConfigProto()
tf_config.gpu_options.allow_growth = True
with open(FLAGS.map_file, "rb") as f:
char_to_id, id_to_char, tag_to_id, id_to_tag = pickle.load(f)
with tf.Session(config=tf_config) as sess:
model = create_model(sess, Model, FLAGS.ckpt_path, load_word2vec, config, id_to_char, logger)
#result = model.evaluate_line(sess, input_from_line(line, char_to_id), id_to_tag)
#return result
return model
先返回模型，再在flask中调用模型，原有的会每次都会create model ，

import tensorflow as tf
import numpy as np
from model import Model
from loader import load_sentences, update_tag_scheme
from loader import char_mapping, tag_mapping
from loader import augment_with_pretrained, prepare_dataset
from utils import get_logger, make_path, clean, create_model, save_model
from utils import print_config, save_config, load_config, test_ner
from data_utils import load_word2vec, create_input, input_from_line, BatchManager

#flags = tf.app.flags
#flags.DEFINE_string("map_file","maps.pkl","file for maps")

app = Flask(name)

predictmodel = predict_line()
print('model is loaded')

@app.route('/getNameModel', methods=['POST'])
def getNameModel():
title = request.json['title']
print(title)
tf_config = tf.ConfigProto()
tf_config.gpu_options.allow_growth = True
with open("maps.pkl", "rb") as f:
char_to_id, id_to_char, tag_to_id, id_to_tag = pickle.load(f)
result = ''
with tf.Session(config=tf_config) as sess:
sess.run(tf.global_variables_initializer())
result = predictmodel.evaluate_line(sess, input_from_line(title, char_to_id), id_to_tag)
print(result)
return json.dumps(result, ensure_ascii=False)

可是返回结果是：
{"entities": [{"end": 2, "start": 1, "type": "PER", "word": "想"}, {"end": 3, "start": 2, "type": "PER", "word": "集"}, {"end": 7, "start": 6, "type": "LOC", "word": "部"}, {"end": 8, "start": 7, "type": "ORG", "word": "位"}, {"end": 14, "start": 13, "type": "ORG", "word": "席"}, {"end": 15, "start": 12, "type": "LOC", "word": "联团的总于北京,首执"}, {"end": 16, "start": 15, "type": "PER", "word": "行"}], "string": "联想集团的总部位于北京,首席执行官是杨元庆先生"}

什么原因呢？谢谢

Dropout is a tensor which means that "if" doesn't work here. Try use tf.where of tf.cond.

Information-Extraction-Chinese/NER_IDCNN_CRF/model.py

Line 184 in 9d6fcfb

if self.dropout == 1.0:

NER任务，换了数据集，训练时NER loss出现负数

您好～我想用你的代码跑自己的数据，我的数据共有十三个实体类别。现在修改到能跑起来了，但是跑起来之后NER LOSS出现负数，如下图，我使用的是IDCNN, iob标注格式。

2018-03-22 18:32:19,216 - log/train.log - INFO - iteration:1 step:100/1449, NER loss:14548.164062
2018-03-22 18:32:24,091 - log/train.log - INFO - iteration:1 step:200/1449, NER loss:3423.436279
2018-03-22 18:32:28,313 - log/train.log - INFO - iteration:1 step:300/1449, NER loss:2515.652588
2018-03-22 18:32:31,800 - log/train.log - INFO - iteration:1 step:400/1449, NER loss:1957.449707
2018-03-22 18:32:36,219 - log/train.log - INFO - iteration:1 step:500/1449, NER loss:-4663.758301
2018-03-22 18:32:40,437 - log/train.log - INFO - iteration:1 step:600/1449, NER loss:-51438.355469
2018-03-22 18:32:44,024 - log/train.log - INFO - iteration:1 step:700/1449, NER loss:-87006806016.000000
2018-03-22 18:32:47,033 - log/train.log - INFO - iteration:1 step:800/1449, NER loss:-699415090167808.000000
2018-03-22 18:32:50,832 - log/train.log - INFO - iteration:1 step:900/1449, NER loss:-180303070683463680.000000
2018-03-22 18:32:53,791 - log/train.log - INFO - iteration:1 step:1000/1449, NER loss:-11487585336817614848.000000
2018-03-22 18:32:57,894 - log/train.log - INFO - iteration:1 step:1100/1449, NER loss:-426165464454536364032.000000

有关test文件中 main_for_evaluation() 的问题

您好，请问：在运行 main_for_evaluation() 时，程序并未报错而是一直卡在运行状态无变化。
经调试发现程序运行到：
with tf.variable_scope("model"):
mtest = network.GRU(is_training=False, word_embeddings=wordembedding, settings=test_settings)
上面代码中的network里的：
gru_cell_forward = tf.contrib.rnn.GRUCell(gru_size) #46行，调试到此句发现程序卡住不动。
gru_cell_backward = tf.contrib.rnn.GRUCell(gru_size)
于是找到了程序卡住的地方。但是并不知道如何纠错，希望得到您的指教，谢谢了！

运行代码RE_BGRU_2ATT中的initial.py，报错说找不到NA

File "F:/Information-Extraction-Chinese-master/RE_BGRU_2ATT/initial.py", line 85, in init
relation = relation2id['NA']
KeyError: 'NA'

另外，en1 = content[0]，en2 = content[1]
我觉得是否 en1 = content[1]， en2 = content[2]

我是小白一枚，请大神求教，Thanks♪(･ω･)ﾉ

的

-。-

TypeError: slice indices must be integers or None or have an index method

Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
Loading model cost 1.034 seconds.
Prefix dict has been built succesfully.
Found 4313 unique words (979180 in total)
Loading pretrained embeddings from data/vec.txt...
Found 13 unique named entity tags
20864 / 2318 / 4636 sentences in train / dev / test.
Traceback (most recent call last):
File "main.py", line 228, in
tf.app.run(main)
File "/data/app/Anaconda/envs/resume/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 124, in run
_sys.exit(main(argv))
File "main.py", line 222, in main
train()
File "main.py", line 153, in train
train_manager = BatchManager(train_data, FLAGS.batch_size)
File "/data/app/resumeSpider/Information-Extraction-Chinese/NER_IDCNN_CRF/data_utils.py", line 285, in init
self.batch_data = self.sort_and_pad(data, batch_size)
File "/data/app/resumeSpider/Information-Extraction-Chinese/NER_IDCNN_CRF/data_utils.py", line 293, in sort_and_pad
batch_data.append(self.pad_data(sorted_data[i*batch_size : (i+1)*batch_size]))
TypeError: slice indices must be integers or None or have an index method

[RE_BGRU_2ATT]Data directory missing?

Just confirm if this folder is not provided.

跑已有的模型，执行python test_GRU.py报错

您好，在跑 test_GRU.py时提示
NotFoundError (see above for traceback): Key model/GRU_BACKWARD/multi_rnn_cell/cell_0/gru_cell/candidate/biases not found in checkpoint
[[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

而且在pycharm里直接跑没有提示我输入实体1 实体2 句子就说没有test

[RE_BGRU_2ATT] entity location embedding issue

Current entity location embedding is the first character of the entity, the rest of the entity characters are calculated as part of the sentence.

An improvement is to rewrite the original sentence, and make all characters within the entity as one unit only so that only entity location is embedded, and characters within the entity will not be involved into embedding calculation.

我碰到了这样问题

Traceback (most recent call last):
File "main.py", line 229, in
tf.app.run(main)
File "D:\Anaconda3\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\platform\app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "main.py", line 223, in main
train()
File "main.py", line 180, in train
step, batch_loss = model.run_step(sess, True, batch)
File "C:\Users\weihao\Desktop\NER_IDCNN_CRF0\NER_IDCNN_CRF0\model.py", line 343, in run_step
feed_dict)
File "D:\Anaconda3\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\client\session.py", line 789, in run
run_metadata_ptr)
File "D:\Anaconda3\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\client\session.py", line 968, in _run
np_val = np.asarray(subfeed_val, dtype=subfeed_dtype)
File "D:\Anaconda3\envs\tensorflow-gpu\lib\site-packages\numpy\core\numeric.py", line 531, in asarray
return array(a, dtype, copy=False, order=order)
ValueError: setting an array element with a sequence.

训练数据获取

请问一下，你这个训练数据是怎么获取的。

NER_IDCNN_CRF中的tran.txt的内容怎么产生的

请问这个字标注是用CRF产生的吗，能提供代码吗？谢谢！

NameError: name 'os' is not defined

I try to run this code,but the main.py always to show this wrong .Although I have trying my best to solve this problem. it's still there. could you give me some advice? Other's code is OK!

关于test_GRU.py运行的问题

你好！我在训练后运行test_GRU.py进行测试，但是却报出如下错误：
PS D:\Information-Extraction-Chinese-master\Information-Extraction-Chinese-master\RE_BGRU_2ATT> python .\test_GRU.py
2018-02-12 11:31:16.730063: I C:\tf_jenkins\home\workspace\rel-win\M\windows\PY\36\tensorflow\core\platform\cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2018-02-12T11:48:17.277201
Evaluating all test data and save data for PR curve
saving all test result...
Traceback (most recent call last):
File ".\test_GRU.py", line 339, in
tf.app.run()
File "C:\Users\a8524\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\platform\app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File ".\test_GRU.py", line 122, in main
average_precision = average_precision_score(allans, allprob)
File "C:\Users\a8524\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\metrics\ranking.py", line 188, in average_precision_score
sample_weight=sample_weight)
File "C:\Users\a8524\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\metrics\base.py", line 75, in _average_binary_score
return binary_metric(y_true, y_score, sample_weight=sample_weight)
File "C:\Users\a8524\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\metrics\ranking.py", line 180, in _binary_uninterpolated_average_precision
y_true, y_score, sample_weight=sample_weight)
File "C:\Users\a8524\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\metrics\ranking.py", line 417, in precision_recall_curve
sample_weight=sample_weight)
File "C:\Users\a8524\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\metrics\ranking.py", line 302, in _binary_clf_curve
check_consistent_length(y_true, y_score)
File "C:\Users\a8524\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\utils\validation.py", line 173, in check_consistent_length
" samples: %r" % [int(l) for l in lengths])
ValueError: Found input variables with inconsistent numbers of samples: [1100, 0]
还望解答！

[RE_BGRU_2ATT] [NER_IDCNN_CRF] 仅仅支持人名吗

想问下仅支持人名吗

[RE_BGRU_2ATT] 执行test_GRU.py报错

报错信息:
tensorflow.python.framework.errors_impl.NotFoundError: Key model/GRU_FORWARD/multi_rnn_cell/cell_0/gru_cell/gates/biases not found in checkpoint

怀疑是离线存储的模型的问题,我自己重新训练了一版,没有这个问题,只是数据有点少,效果不大好,楼主有更多的数据可以共享下吗

请输入中文句子，格式为 "name1 name2 sentence":李晓华王大牛李晓华和她的高中同学王大牛两个人前日一起去英国旅行
实体1: 李晓华
实体2: 王大牛
李晓华和她的高中同学王大牛两个人前日一起去英国旅行
关系是:
No.1: 夫妻, Probability is 0.345401
No.2: 父母, Probability is 0.178258
No.3: unknown, Probability is 0.135266
请输入中文句子，格式为 "name1 name2 sentence":李晓华王大牛王大牛命令李晓华在周末前完成这份代码。
实体1: 李晓华
实体2: 王大牛
王大牛命令李晓华在周末前完成这份代码。
关系是:
No.1: 父母, Probability is 0.257889
No.2: 夫妻, Probability is 0.207047
No.3: unknown, Probability is 0.0861223

FileNotFoundError: [Errno 2] No such file or directory: './data/train_q&a.txt'

我将项目clone下来发现没有/data目录

一句话中有多对实体关系的情况您怎么考虑呢

一句话中有多对实体关系的情况您怎么考虑呢（如ABC三个实体中，存在A_B,A_C两对实体关系的情况），拆成两行？

另外，您当前模型的精度怎么样呢

还有就是您模型中前后窗口长度设置的是多少呢

可能问题比较naive，还请您耐心赐教！

window8 tensorflow (1.3.0) 报错，请问怎么解决呢？

2018-02-28 11:58:39,624 - log\train.log - INFO - evaluate:dev
Traceback (most recent call last):
File "main.py", line 228, in
tf.app.run(main)
File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\platform\app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "main.py", line 222, in main
train()
File "main.py", line 188, in train
best = evaluate(sess, model, "dev", dev_manager, id_to_tag, logger)
File "main.py", line 88, in evaluate
eval_lines = test_ner(ner_results, FLAGS.result_path)
File "E:\pyworkspace\Information-Extraction-Chinese\NER_IDCNN_CRF\utils.py", line 66, in test_ner
eval_lines = return_report(output_file)
File "E:\pyworkspace\Information-Extraction-Chinese\NER_IDCNN_CRF\conlleval.py", line 282, in return_report
counts = evaluate(f)
File "E:\pyworkspace\Information-Extraction-Chinese\NER_IDCNN_CRF\conlleval.py", line 91, in evaluate
raise FormatError('unexpected number of features in line %s' % line)
conlleval.FormatError: unexpected number of features in line O O

封装实体识别的evaluate_line(),多个语句循环调用报错NotFoundError (see above for traceback): Key Variable_3 not found in checkpoint

我把实体识别的evaluate_line()封装成一个evaluate_line2(sentenceInfo)，在外层调用的代码是：
def test():
sentenceInfos = ['李晓华和她的丈夫王大牛，还有同事李小明前日一起去英国旅行了', '王大牛命令李晓华在周末前完成这份代码。']
for sentenceInfo in sentenceInfos:
# print(sentenceInfo)
print('---------------', sentenceInfo)
entities = main.evaluate_line2(sentenceInfo)
print('*'*10)
print(entities)
在第二次循环时报错如下NotFoundError (see above for traceback): Key Variable_3 not found in checkpoint
[[Node: save_1/RestoreV2 = RestoreV2[dtypes=[DT_INT32, DT_FLOAT, DT_FLOAT, DT_INT32, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save_1/Const_0_0, save_1/RestoreV2/tensor_names, save_1/RestoreV2/shape_and_slices)]]
这是什么错误，我应该怎么修改啊，希望能解答，着急，谢谢了