lingluodlut / Att-ChemdNER
License: Apache License 2.0
Successfully installed theano-0.9.0
You are using pip version 9.0.3, however version 10.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
mldl@mldlUB1604:/ub16_prj/Att-ChemdNER/src$ python train.py --train trainfile --dev devfile --test testfile --pre_emb word_embedding.model
Using TensorFlow backend.
Traceback (most recent call last):
  File "train.py", line 12, in <module>
    from utils import create_input
  File "/home/mldl/ub16_prj/Att-ChemdNER/src/utils.py", line 187, in <module>
    import initializations;
  File "/home/mldl/ub16_prj/Att-ChemdNER/src/initializations.py", line 3, in <module>
    import backend as K
  File "/home/mldl/ub16_prj/Att-ChemdNER/src/backend/__init__.py", line 67, in <module>
    from .tensorflow_backend import *
ImportError: No module named tensorflow_backend
mldl@mldlUB1604:
Hi
The BiLSTM-CRF training code is provided and it generates the model, but how do we update the word2vec model to cover new words?
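One common workaround when fully retraining the word2vec model is impractical is to extend the embedding table with rows for unseen words, initialized near the mean of the existing vectors (gensim also supports true incremental training via `build_vocab(..., update=True)`). The sketch below is illustrative; the vocabulary and dimensions are made up:

```python
import numpy as np

# Hypothetical existing embedding table: vocab of 3 words, dim 4.
vocab = {"aspirin": 0, "induced": 1, "toxicity": 2}
emb = np.random.RandomState(0).randn(3, 4).astype("float32")

def add_new_words(vocab, emb, new_words):
    """Extend the embedding matrix with rows for unseen words.

    New rows are initialized to the mean of existing vectors plus small
    noise -- a common fallback when retraining word2vec is not an option."""
    rng = np.random.RandomState(1)
    rows = [emb]
    for w in new_words:
        if w in vocab:
            continue
        vocab[w] = len(vocab)
        rows.append(emb.mean(axis=0, keepdims=True)
                    + 0.01 * rng.randn(1, emb.shape[1]))
    return vocab, np.concatenate(rows, axis=0)

vocab, emb = add_new_words(vocab, emb, ["ibuprofen"])
```

This only gives new words a usable starting vector; their embeddings are then refined by the BiLSTM-CRF training itself if the embedding layer is not frozen.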
Hi
In the pre-trained model, if we pass the chunk and lemma, it is failing to interpret, was it trained without chunk and lemma? Kindly let know
Hello, thanks for sharing the code. I want to apply your model to my own dataset, so I need to preprocess my documents to fit your model's input format. Can you tell me how to preprocess them, or share the preprocessing code? Thank you a lot.
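A minimal sketch of the usual preprocessing, assuming the model expects CoNLL-style input: one token per line, whitespace-separated columns ending in a BIO tag, with a blank line between sentences. The exact column layout used by Att-ChemdNER should be checked against its sample data; the function and entity format below are illustrative:

```python
def to_bio(text, entities):
    """Convert raw text plus character-offset annotations to BIO lines.

    entities: list of (start, end, label) character offsets."""
    tokens, out = text.split(), []
    pos = 0
    for tok in tokens:
        start = text.index(tok, pos)   # character span of this token
        end = start + len(tok)
        pos = end
        tag = "O"
        for (s, e, label) in entities:
            if start >= s and end <= e:  # token inside the entity span
                tag = ("B-" if start == s else "I-") + label
                break
        out.append(f"{tok}\t{tag}")
    return "\n".join(out)

print(to_bio("Aspirin induced toxicity", [(0, 7, "CHEM")]))
# Aspirin	B-CHEM
# induced	O
# toxicity	O
```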
FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.
Traceback (most recent call last):
  File "train.py", line 176, in <module>
    assert os.path.isfile(opts.train)
AssertionError
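This AssertionError simply means the path passed as `--train` does not exist from the current working directory. A quick pre-flight check (the paths below are placeholders for the actual arguments) gives a clearer message than train.py's bare assert:

```python
import os

def missing_inputs(paths):
    """Return the CLI option names whose files do not exist."""
    return [name for name, p in paths.items() if not os.path.isfile(p)]

# Placeholders for the actual --train/--dev/--test arguments:
bad = missing_inputs({"train": "trainfile", "dev": "devfile", "test": "testfile"})
for name in bad:
    print(f"--{name} path does not exist; try passing an absolute path")
```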
Hello, I'd like to ask: after tokenizing the raw chemical corpus with the GENIA tagger, how do you obtain the corresponding BIO labels? The tokenized corpus no longer matches the entity positions given in the annotation file, and some words are even split into pieces, so it is hard to line them up with the original labels. How did you handle this? Thanks.
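A hedged sketch of one common way to recover BIO tags after a tokenizer such as the GENIA tagger re-splits the text: align each sub-token back to its character span in the raw text, then tag it by overlap with the gold entity offsets. The function name and overlap rule are illustrative, not the authors' actual procedure:

```python
def align_bio(text, tokens, entities):
    """Project gold (start, end, label) character spans onto sub-tokens."""
    tags, pos = [], 0
    for tok in tokens:
        start = text.index(tok, pos)   # locate sub-token in raw text
        end = start + len(tok)
        pos = end
        tag = "O"
        for (s, e, label) in entities:
            if start < e and end > s:  # any character overlap counts
                tag = ("B-" if start <= s else "I-") + label
                break
        tags.append(tag)
    return tags

tags = align_bio("2-acetylaminofluorene up-regulates X",
                 ["2-acetylaminofluorene", "up", "-", "regulates", "X"],
                 [(0, 21, "CHEM")])
```

Because alignment is done by character offsets rather than by matching token strings, it survives tokens being split or merged by the tagger.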
Thank you for your work.
My question is: does the code's tagging of new documents only support the CHEMDNER dataset?
I trained a model on the CDR dataset and used it for prediction, and it raises an out-of-range error, mainly at line 226 of loader.py:
ner=[dic_to_id[w[4]] for w in s];
It reports a list index out of range here; after debugging, it seems to be related to the dataset.
Could the author offer any help? Many thanks.
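The crash at `ner=[dic_to_id[w[4]] for w in s]` is consistent with the CDR files having fewer columns per token than the CHEMDNER files: `w[4]` assumes at least five whitespace-separated fields per line. A quick check over the loaded corpus can confirm this before training; the column index and data below are illustrative:

```python
def check_columns(sentences, col=4):
    """Return every token row with too few columns to index `col`."""
    return [w for s in sentences for w in s if len(w) <= col]

# A sentence whose rows carry only four columns would trigger the error:
sentences = [[["aspirin", "NN", "B-NP", "O"]]]
bad = check_columns(sentences)
```

If `bad` is non-empty, either the CDR files need to be reformatted to the CHEMDNER column layout, or the hard-coded index in loader.py needs to point at the actual tag column.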
The error is as above; the problem turns out to be in the link function in nn.py, and I am not sure what kind of input it expects.
Taking Figure 1 and my error as an example:
word_input = word_layer.link(word_ids)
Here the argument word_ids is a theano.tensor.ivector, i.e. a one-dimensional vector of ints.
The error is raised as soon as it enters the link function (the input parameter shows up as type 'TensorVariable').
----------------------------------------- Installed packages
absl-py 1.2.0 pypi_0 pypi
astor 0.8.1 pypi_0 pypi
backend 0.2.4.1 pypi_0 pypi
blas 1.0 mkl https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
ca-certificates 2022.07.19 haa95532_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
certifi 2022.9.14 py37haa95532_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
cython 0.29.28 pypi_0 pypi
gast 0.2.2 pypi_0 pypi
gensim 4.2.0 pypi_0 pypi
google-pasta 0.2.0 pypi_0 pypi
grpcio 1.48.1 pypi_0 pypi
h5py 3.7.0 pypi_0 pypi
importlib-metadata 4.12.0 pypi_0 pypi
intel-openmp 2021.4.0 haa95532_3556 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
keras 2.2.5 pypi_0 pypi
keras-applications 1.0.8 pypi_0 pypi
keras-preprocessing 1.1.2 pypi_0 pypi
markdown 3.4.1 pypi_0 pypi
markupsafe 2.1.1 pypi_0 pypi
mkl 2021.4.0 haa95532_640 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
mkl-service 2.4.0 py37h2bbff1b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
numpy 1.21.6 pypi_0 pypi
openssl 1.1.1q h2bbff1b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
opt-einsum 3.3.0 pypi_0 pypi
pip 22.1.2 py37haa95532_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
protobuf 3.20.1 pypi_0 pypi
python 3.7.13 h6244533_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
pyyaml 6.0 pypi_0 pypi
scipy 1.7.3 pypi_0 pypi
setuptools 65.3.0 pypi_0 pypi
six 1.16.0 pyhd3eb1b0_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
smart-open 6.2.0 pypi_0 pypi
sqlite 3.39.2 h2bbff1b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
tensorboard 1.15.0 pypi_0 pypi
tensorflow 1.15.0 pypi_0 pypi
tensorflow-estimator 1.15.1 pypi_0 pypi
termcolor 2.0.1 pypi_0 pypi
theano 1.0.5 pypi_0 pypi
typing-extensions 4.3.0 pypi_0 pypi
vc 14.2 h21ff451_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
vs2015_runtime 14.27.29016 h5e58377_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
werkzeug 2.2.2 pypi_0 pypi
wheel 0.37.1 pyhd3eb1b0_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
wincertstore 0.2 py37haa95532_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
wrapt 1.14.1 pypi_0 pypi
zipp 3.8.1 pypi_0 pypi
def step(self, state, attended, source):
    # Attention step: score the current state against the attended
    # sequence, normalize with softmax, and take the weighted sum
    # ("glimpse") of `source`.
    _energy = self.scoreFun(attended, state, self.W_A)
    energy = T.nnet.softmax(_energy)
    glimpsed = (energy.T * source).sum(axis=0)
    return energy.flatten(), glimpsed

def link(self, attended, state, source):
    step_function = self.step
    # Project the attended sequence before scoring.
    attended_ = T.tanh(T.dot(attended, self.W_A_X)) + self.b_A_X
    # scan iterates over attended_ row by row (the `state` argument of
    # step), while the full attended_ and source arrive as non_sequences.
    [energy, glimpsed], _ = theano.scan(fn=step_function,
                                        sequences=[attended_],
                                        outputs_info=None,
                                        non_sequences=[attended_, source])
    self.energy = energy
    # Combine the glimpse with the source representation.
    combine = T.concatenate([glimpsed, source], axis=-1)
    combined = T.tanh(T.dot(combine, self.W_A_combine)) + self.b_A_combine
    return combined
In model.py this link function is called with the arguments (final_output, final_c, final_output). Inside link there is a scan that passes attended_, attended_, source into step_function, i.e. the same attended_ twice. So wouldn't _energy=self.scoreFun(attended,state,self.W_A) come out as zero? If attended and state are equal, isn't a distance formula between them zero? A beginner's question here; an explanation would be greatly appreciated!
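Two things are worth noting about the concern above. First, scan feeds step one row of attended_ at a time (as state) while the full attended_ matrix arrives via non_sequences, so the two arguments are not identical. Second, a bilinear score of the form used with W_A is not a distance, so it does not vanish even for identical inputs. A small numpy check of the second point (the shapes and W_A values below are arbitrary):

```python
import numpy as np

rng = np.random.RandomState(0)
x = rng.randn(4)       # plays the role of both attended and state
W_A = rng.randn(4, 4)  # arbitrary bilinear weight matrix

# A bilinear score x^T W_A x is generally non-zero even when the two
# sides are the same vector, unlike a distance d(x, x) = 0.
score = x @ W_A @ x
```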
When your model reached its best performance, what was sentencesLevelLoss set to (True or False)? And what exactly is the difference between computing the loss at the sentence level versus at the document level? Many thanks.
Hi
It is mentioned that the Jochem dictionary is used as an additional feature with a 5-D input, but in the code I couldn't find where that lookup and matching is done. Could you please point it out?
Could you please also check the code for all the paths? I seem to be missing something.
Thanks
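For reference, a sketch of how such a 5-D dictionary feature is typically built for NER. The five dimensions here follow a common BIESO match scheme against dictionary entries, longest match first; this is an assumption about the feature's layout, not the repo's actual lookup code, and the function name and data are made up:

```python
def dict_feature(tokens, dictionary):
    """One 5-D indicator per token: [B, I, E, S, O] dictionary match."""
    feats = [[0, 0, 0, 0, 1] for _ in tokens]      # default: O (no match)
    n = len(tokens)
    for i in range(n):
        for j in range(n, i, -1):                  # longest match first
            if " ".join(tokens[i:j]).lower() in dictionary:
                if j - i == 1:
                    feats[i] = [0, 0, 0, 1, 0]     # S: single-token match
                else:
                    feats[i] = [1, 0, 0, 0, 0]     # B: match begins
                    for k in range(i + 1, j - 1):
                        feats[k] = [0, 1, 0, 0, 0] # I: inside the match
                    feats[j - 1] = [0, 0, 1, 0, 0] # E: match ends
                break
    return feats

feats = dict_feature(["acetyl", "salicylic", "acid", "binds"],
                     {"acetyl salicylic acid"})
```

These vectors would then be concatenated to the word embeddings as extra input dimensions.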
Hello, could you provide a new PyTorch version of this code?
I can't build it in my environment. Thank you.
Hi there,
I am having an issue running Att-ChemdNER.
I have installed the packages listed above.
After running the command line in the readme:
python train.py --train trainfile --dev devfile --test testfile --pre_emb word_embedding.model
I am getting this error:
Using TensorFlow backend.
Traceback (most recent call last):
File "train.py", line 12, in <module>
from utils import create_input
File "/home/fdaha/virt_env_att/Att-ChemdNER-master/src/utils.py", line 187, in <module>
import initializations;
File "/home/fdaha/virt_env_att/Att-ChemdNER-master/src/initializations.py", line 3, in <module>
import backend as K
File "/home/fdaha/virt_env_att/Att-ChemdNER-master/src/backend/__init__.py", line 67, in <module>
from .tensorflow_backend import *
ImportError: No module named tensorflow_backend