
deepnl's People

Contributors

attardi, avmb, isabelleaugenstein


deepnl's Issues

multithreading

Thanks for sharing this great implementation of Collobert11!
Your code is really nicely structured, commented, and pleasant to read!
You mention asynchronous SGD in some places, but when I run the code it is single-threaded.
Did I miss something, or do you have a multithreaded version of the code?
Thanks again!

Unable to install deepnl

I have tried several ways to install deepnl on my Windows device. A plain conda install doesn't work, since the package cannot be found. Then I tried pip install (pip install git+https://github.com/attardi/deepnl, which reports that it cannot find the command git) in the Anaconda cmd window. Finally I used the same command in the Git cmd window, which starts off promising, as it starts collecting and cloning, but then it takes forever, uses the full processor power, and doesn't do anything anymore. I don't expect an 8 MB package to take that long to download and install.

I am not that experienced with git, or with installing packages when 'conda install' or 'pip install' don't work, but I'd really like to install this deepnl package. Could anyone help me, please?

Compilation error on OS X Yosemite

/usr/bin/clang++ -bundle -undefined dynamic_lookup -arch i386 -arch x86_64 -g build/temp.macosx-10.6-x86_64-2.7/deepnl/hpca.o build/temp.macosx-10.6-x86_64-2.7/deepnl/HPCA.o -o build/lib.macosx-10.6-x86_64-2.7/deepnl/hpca.so -fopenmp
ld: library not found for -lgomp
clang: error: linker command failed with exit code 1 (use -v to see invocation)
error: command '/usr/bin/clang++' failed with exit status 1
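
Apple's clang does not ship libgomp (GCC's OpenMP runtime), which is why the -fopenmp link step cannot find -lgomp. One hedged workaround, an assumption rather than a project-endorsed fix, is to build with a Homebrew GCC instead; the versioned compiler names below depend on the GCC release Homebrew installs:

    brew install gcc
    # Homebrew installs versioned binaries, e.g. gcc-5/g++-5 at the time;
    # the link step may additionally need LDSHARED overridden to use g++
    export CC=gcc-5 CXX=g++-5
    python setup.py build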

Problem while giving Test data

Hi,
I have trained a model successfully, but when I try passing a test file as: python2.7 dl-pos.py pos.dnn < test_records.tsv
I get the following error:
Traceback (most recent call last):
File "dl-pos.py", line 338, in
main()
File "dl-pos.py", line 324, in main
print("Sent=",tagger.tag(sent))
File "deepnl/tagger.pyx", line 50, in deepnl.tagger.Tagger.tag (deepnl/tagger.cpp:3170)
File "deepnl/tagger.pyx", line 60, in deepnl.tagger.Tagger.tag (deepnl/tagger.cpp:2863)
File "deepnl/extractors.pyx", line 132, in deepnl.extractors.Converter.convert (deepnl/extractors.cpp:4799)
File "deepnl/extractors.pyx", line 263, in deepnl.extractors.Extractor.extract (deepnl/extractors.cpp:8305)
File "build/bdist.linux-x86_64/egg/deepnl/word_dictionary.py", line 209, in getitem
TypeError: unhashable type: 'list'

Index error while running dl-sentiwords.py code

Hi, I faced a run-time error while running the sentiment-specific word embedding code (dl-sentiwords.py).

I ran dl-sentiwords.py with the following arguments; the vocabulary and vector files are empty.

./dl-sentiwords.py --vocab VOCAB.txt --vectors Vector.txt data/train.tsv

The error appears as follows:

    Saving vocabulary in VOCAB.txt
    Creating new network...
    ... with the following parameters:

            Input layer size: 350
            Hidden layer size: 20
            Output size: 2

    Starting training
    Traceback (most recent call last):
      File "./dl-sentiwords.py", line 218, in <module>
        args.iterations, report_intervals)
      File "deepnl/sentiwords.pyx", line 301, in deepnl.sentiwords.SentimentTrainer.train (deepnl/sentiwords.cpp:6471)                                  
        File "deepnl/sentiwords.pyx", line 126, in deepnl.sentiwords.SentimentTrainer._train_pair_s  (deepnl/sentiwords.cpp:4235)
        File "deepnl/extractors.pyx", line 153, in deepnl.extractors.Converter.lookup  (deepnl/extractors.cpp:4809)
        File "deepnl/extractors.pyx", line 236, in deepnl.extractors.Extractor.__getitem__(deepnl/extractors.cpp:6880)
      IndexError: index 1209 is out of bounds for axis 0 with size 1209

If I change the number of rows in the data file (data/train.tsv), the error is the same as above except for the last line:

    IndexError: index 1210 is out of bounds for axis 0 with size 1210

I think the problem is that some code in the training part accesses the last element of a list or array with a wrong index.

Could you please explain this problem?

Many thanks.
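
A hedged diagnostic (my assumption about the cause, not a confirmed one): an off-by-one such as index 1209 with size 1209 usually means the converter produced an index one past the number of rows in the embedding matrix, e.g. a special symbol added to the vocabulary but not to the vector table. Comparing the saved vocabulary size with the failing index may confirm this:

    # hypothetical check after the crash: count the saved vocabulary entries
    # and compare with the index reported in the traceback (1209 here)
    with open('VOCAB.txt') as f:
        print(sum(1 for _ in f))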

Using dev set on train

Hello, thanks for the great library!

The README notes that dev data is used to train the model.

Is it necessary to use the dev data (eng.train + eng.testa) in the training stage to reproduce 89% F1?
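
If the dev set is indeed meant to be folded into training, one hedged way to build the combined file (assuming the standard CoNLL-2003 file names, and matching the train+dev file name used in other issues here) is simply:

    cat eng.train eng.testa > train+dev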

DeepNL installation problem in Windows 8.1

I am trying to install the deepnl library using the install command, but I am getting the following issue. Any help would be appreciated.
C:\Users\mauli_000\Documents\Python Scripts\deepnl-master>python setup.py install
running install
running bdist_egg
running egg_info
writing deepnl.egg-info\PKG-INFO
writing top-level names to deepnl.egg-info\top_level.txt
writing dependency_links to deepnl.egg-info\dependency_links.txt
reading manifest file 'deepnl.egg-info\SOURCES.txt'
writing manifest file 'deepnl.egg-info\SOURCES.txt'
installing library code to build\bdist.win-amd64\egg
running install_lib
running build_py
running build_ext
building 'deepnl/hpca' extension
C:\Users\mauli_000\AppData\Local\Programs\Common\Microsoft\Visual C++ for Python\9.0\VC\Bin\amd64\cl.exe /c /nologo /Ox /MD /W3 /GS- /DNDEBUG -ID:\Anaconda2\lib\site-packages\numpy\core\include -I/usr/include/eigen3 -ID:\Anaconda2\include -ID:\Anaconda2\PC /Tpdeepnl/hpca.cpp /Fobuild\temp.win-amd64-2.7\Release\deepnl/hpca.obj -std=c++11
cl : Command line warning D9002 : ignoring unknown option '-std=c++11'
hpca.cpp
d:\anaconda2\lib\site-packages\numpy\core\include\numpy\npy_1_7_deprecated_api.h(12) : Warning Msg: Using deprecated NumPy API, disable it by #defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION
C:\Users\mauli_000\AppData\Local\Programs\Common\Microsoft\Visual C++ for Python\9.0\VC\Include\xlocale(342) : warning C4530: C++ exception handler used, but unwind semantics are not enabled. Specify /EHsc
deepnl/hpca.cpp(1685) : warning C4244: '=' : conversion from 'npy_intp' to 'int', possible loss of data
deepnl/hpca.cpp(6349) : warning C4146: unary minus operator applied to unsigned type, result still unsigned
deepnl/hpca.cpp(6367) : warning C4146: unary minus operator applied to unsigned type, result still unsigned
deepnl/hpca.cpp(6385) : warning C4146: unary minus operator applied to unsigned type, result still unsigned
deepnl/hpca.cpp(6533) : warning C4146: unary minus operator applied to unsigned type, result still unsigned
deepnl/hpca.cpp(6551) : warning C4146: unary minus operator applied to unsigned type, result still unsigned
deepnl/hpca.cpp(6569) : warning C4146: unary minus operator applied to unsigned type, result still unsigned
deepnl/hpca.cpp(7035) : warning C4146: unary minus operator applied to unsigned type, result still unsigned
deepnl/hpca.cpp(7053) : warning C4146: unary minus operator applied to unsigned type, result still unsigned
deepnl/hpca.cpp(7071) : warning C4146: unary minus operator applied to unsigned type, result still unsigned
C:\Users\mauli_000\AppData\Local\Programs\Common\Microsoft\Visual C++ for Python\9.0\VC\Bin\amd64\cl.exe /c /nologo /Ox /MD /W3 /GS- /DNDEBUG -ID:\Anaconda2\lib\site-packages\numpy\core\include -I/usr/include/eigen3 -ID:\Anaconda2\include -ID:\Anaconda2\PC /Tpdeepnl/HPCA_impl.cpp /Fobuild\temp.win-amd64-2.7\Release\deepnl/HPCA_impl.obj -std=c++11
cl : Command line warning D9002 : ignoring unknown option '-std=c++11'
HPCA_impl.cpp
d:\anaconda2\lib\site-packages\numpy\core\include\numpy\npy_1_7_deprecated_api.h(12) : Warning Msg: Using deprecated NumPy API, disable it by #defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION
C:\Users\mauli_000\AppData\Local\Programs\Common\Microsoft\Visual C++ for Python\9.0\VC\Include\strings.h(5) : fatal error C1083: Cannot open include file: 'std/config.H': No such file or directory
error: command 'C:\Users\mauli_000\AppData\Local\Programs\Common\Microsoft\Visual C++ for Python\9.0\VC\Bin\amd64\cl.exe' failed with exit status 2

C:\Users\mauli_000\Documents\Python Scripts\deepnl-master>
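
One thing visible in the log that will need adjusting on Windows is the hard-coded Unix include path -I/usr/include/eigen3; deepnl needs Eigen, and that directory does not exist on Windows. A hedged sketch of the kind of change involved (the local Eigen path is a placeholder, and the exact variable in setup.py may differ):

    # hypothetical tweak to the include directories passed to the extensions
    include_dirs = [numpy.get_include(), r'C:\libs\eigen3']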

Build deepnl

Is this normal?

Please put "# distutils: language=c++" in your .pyx or .pxd file(s)
Warning: Extension name 'deepnl/words' does not match fully qualified name 'words' of 'deepnl/words.pyx'
deepnl/words.pyx: cannot find cimported module 'trainer'
deepnl/words.pyx: cannot find cimported module 'network'
deepnl/words.pyx: cannot find cimported module 'math'
deepnl/words.pyx: cannot find cimported module 'extractors'
deepnl/words.pxd: cannot find cimported module 'trainer'
deepnl/words.pxd: cannot find cimported module 'network'
Warning: Extension name 'deepnl/hpca' does not match fully qualified name 'hpca' of 'deepnl/hpca.pyx'
deepnl/hpca.pyx: cannot find cimported module 'network'
...

Train ner model error

Note: there is a missing closing parenthesis in line 15 of toIOB.py:
print('usage:', sys.argv[0], '[-hr] < inFile '
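
For reference, the corrected line would presumably just close the call:

    print('usage:', sys.argv[0], '[-hr] < inFile')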

vocab.txt & vectors.txt structure

Hi,
I built my model with gensim Word2Vec and I want to use it to train NER with deepnl.
Is there any document explaining the structure of vocab.txt and vectors.txt?

I used this script:
bin/dl-ner.py ner.dnn -t train+dev
--vocab vocab.txt --vectors vectors.txt
--caps --suffix --suffixes --gazetteer eng.list
-e 40 -l 0.01 -w 5 -n 300 -v
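
A hedged sketch of the layout such --vocab/--vectors options typically expect (an assumption on my part, not confirmed deepnl documentation): one word per line in vocab.txt, and the matching row of space-separated floats per line in vectors.txt, for example:

    vocab.txt:              vectors.txt:
    the                     0.12 -0.05 0.33 0.07
    bank                    -0.08 0.19 0.02 0.44
    money                   0.05 0.21 -0.13 0.09

(values are placeholders; the number of columns is the embedding dimension)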

dl-words.py

Hi, I am trying to execute the dl-words.py file but am facing the error given below.

Creating new network...
... with the following parameters:

    Input layer size: 56
    Hidden layer size: 200
    Output size: 1

Starting training
Estimating max number of pairs
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 810, in *bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 763, in run
self.__target(_self.__args, _self.__kwargs)
File "deepnl/words.pyx", line 248, in deepnl.words.LmTrainer.train.train_worker (deepnl/words.cpp:5112)
File "deepnl/words.pyx", line 443, in deepnl.words.LmWorker.__init
(deepnl/words.cpp:8889)
File "deepnl/words.pyx", line 118, in deepnl.words.LmTrainer.init (deepnl/words.cpp:3993)
TypeError: init() takes exactly 3 positional arguments (7 given)

DeepNL NER issue I-ORG appears with out B-ORG

Hi,

for the following sentence -

"Try One Maine, Two Maine When Williams flung the ball in to Stephenson, Cutler whistled the play dead and ruled five seconds, delighting the Knicks and their fans, shocking the Hornets and infuriating their coach, Steve Clifford."

Output:
sentence=[(u'Try', u'O'), (u'One', u'O'), (u'Maine,', u'I-PER'), (u'Two', u'I-ORG'), (u'Maine', u'I-ORG'), (u'When', u'O'), (u'Williams', u'B-PER'), (u'flung', u'O'), (u'the', u'O'), (u'ball', u'O'), (u'in', u'O'), (u'to', u'O'), (u'Stephenson,', u'B-PER'), (u'Cutler', u'I-PER'), (u'whistled', u'O'), (u'the', u'O'), (u'play', u'O'), (u'dead', u'O'), (u'and', u'O'), (u'ruled', u'O'), (u'five', u'O'), (u'seconds,', u'O'), (u'delighting', u'O'), (u'the', u'O'), (u'Knicks', u'B-ORG'), (u'and', u'O'), (u'their', u'O'), (u'fans,', u'O'), (u'shocking', u'O'), (u'the', u'O'), (u'Hornets', u'B-ORG'), (u'and', u'O'), (u'infuriating', u'O'), (u'their', u'O'), (u'coach,', u'O'), (u'Steve', u'B-PER'), (u'Clifford.', u'I-PER')]

Here the words "Two" and "Maine" are tagged I-ORG, but there is no B-ORG before them in the tagged sequence. Is this expected?
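
An I-ORG with no preceding B-ORG is legal in the IOB1 convention, where B- is only used to separate two adjacent entities of the same type; whether that is the convention intended here is an assumption on my part. A small sketch that normalizes such output to IOB2, where every entity starts with B-:

    # sketch: convert IOB1-style (token, tag) pairs to IOB2
    def iob1_to_iob2(tagged):
        out, prev = [], 'O'
        for tok, tag in tagged:
            if tag.startswith('I-') and prev[2:] != tag[2:]:
                tag = 'B-' + tag[2:]   # entity starts without a matching B- tag
            out.append((tok, tag))
            prev = tag
        return out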

Training ended after 3 epochs even though 5 epochs is given as parameter

[ec2-user@ip-172-31-54-168 deepnl-master]$ time python bin/dl-ner.py ner.dnn -t ~/data/wiki_conll2.iob --vocab ~/data/vocab.txt --vectors ~/data/vectors.txt --caps --suffix --suffixes ~/data/suffix.lst --gazetteer ~/data/eng.list -e 5 --variant senna -l 0.0003 -w 5 -n 300 -v
Creating capitalization features...
Generated 5 feature vectors with 5 features each.
Loading suffix list...
Generated 457 feature vectors with 5 features each.
Following is the issue:

Loading gazetteers
Generated 3 feature vectors with 5 features each.
Generated 3 feature vectors with 5 features each.
Generated 3 feature vectors with 5 features each.
Generated 3 feature vectors with 5 features each.
Creating new network...
... with the following parameters:

    Input layer size: 400
    Hidden layer size: 300
    Output size: 17

Starting training with 286490 sentences
Training for up to 5 epochs
.........+.........+.........+.........+.........+.........+.........+.........+.........+.........+.........+.........+.........+.........+.........+.........+.........+.........
+.........+.........+.........+.........+.........+.........+.........+.........+.........+.........+.........+.........+.........+.........+.........+.........+.........+........
.+.........+.........+.........+.........+.........+.........+.........+.........+.........+.......
....+.........+.........+.........+.........+.........+.........+.........+.........+.........+.........+.........+.......
3 epochs Examples: 20576589 Error: 0.193514 Accuracy: 0.953445 48600 corrections skipped
Saving trained model ...
... to ner.dnn

no file or directory

>>> from geniatagger import GeniaTagger
>>> tagger = GeniaTagger('geniatagger')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/rana/Downloads/geniatagger-python-0.1/geniatagger.py", line 21, in __init__
stdin=subprocess.PIPE, stdout=subprocess.PIPE)
File "/usr/local/lib/python2.7/subprocess.py", line 711, in __init__
errread, errwrite)
File "/usr/local/lib/python2.7/subprocess.py", line 1343, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory: ''
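
A hedged guess: GeniaTagger('geniatagger') only works if the tagger executable is reachable from the current directory; passing the absolute path to the binary (the path below is a placeholder) is the usual fix:

    from geniatagger import GeniaTagger
    # assumes the tagger was compiled inside its own distribution directory
    tagger = GeniaTagger('/path/to/geniatagger-3.0.2/geniatagger')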

In extractors.pyx: TypeError: long()

I have been trying to train dl-ner.py on a new dataset. Could you kindly help me trace the error shown below:

Generated 215 feature vectors with 100 features each.
Overriding vocabulary in deepnl_parameters/vocab
Creating capitalization features...
Generated 5 feature vectors with 5 features each.
Loading suffix list...
Generated 74 feature vectors with 5 features each.
Loading prefix list...
Generated 200 feature vectors with 5 features each.
Loading gazetteers
Generated 3 feature vectors with 5 features each.
Traceback (most recent call last):
File "../bin/dl-ner.py", line 358, in
main()
File "../bin/dl-ner.py", line 325, in main
sentences.append(converter.convert(sent))
File "deepnl/extractors.pyx", line 126, in deepnl.extractors.Converter.convert (deepnl/extractors.cpp:4571)
File "deepnl/extractors.pyx", line 132, in deepnl.extractors.Converter.convert (deepnl/extractors.cpp:4489)
TypeError: long() argument must be a string or a number, not 'NoneType'

Train Emotion Specific Word Embeddings

I am trying to generate emotion-specific word embeddings with an approach similar to SSWE. There the sentiment is a three-class label: positive, negative, and neutral.
My emotion set has eight classes. Can you help me with the changes I need to make?

In the reader I have changed default_polarities to a map of length 8. Are there any other changes needed? The code compiles, but I am not sure whether it is producing correct word embeddings.
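
As a hedged sketch only (the exact layout of the reader's label map is an assumption on my part), an eight-way map could look like the one below; presumably the network's output size would also have to match the number of classes, though I have not verified that:

    # hypothetical eight-class label map, mirroring the three-way default
    default_polarities = {'anger': 0, 'anticipation': 1, 'disgust': 2, 'fear': 3,
                          'joy': 4, 'sadness': 5, 'surprise': 6, 'trust': 7}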

unable to install deepnl in windows

Hi,
I want to install deepnl on Windows 10 x64. When I execute "python setup.py install" I get the following error.
I also have problems installing some of the dependencies, like Eigen.
Any ideas?

C:\Users\regularman>python setup.py install
Please put "# distutils: language=c++" in your .pyx or .pxd file(s)
Traceback (most recent call last):
  File "C:\Users\regularman\Desktop\New Folder (4)\deepnl-master\setup.py", line 62, in <module>
    nthreads=4),
  File "C:\Users\regularman\AppData\Local\Programs\Python\Python35-32\lib\site-packages\Cython\Build\Dependencies.py", line 818, in cythonize
    aliases=aliases)
  File "C:\Users\regularman\AppData\Local\Programs\Python\Python35-32\lib\site-packages\Cython\Build\Dependencies.py", line 704, in create_extension_list
    for file in nonempty(sorted(extended_iglob(filepattern)), "'%s' doesn't match any files" % filepattern):
  File "C:\Users\regularman\AppData\Local\Programs\Python\Python35-32\lib\site-packages\Cython\Build\Dependencies.py", line 108, in nonempty
    raise ValueError(error_msg)
ValueError: 'deepnl/words.pyx' doesn't match any files
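
One thing that stands out in the log: the command is issued from C:\Users\regularman, while the traceback shows setup.py living in the deepnl-master folder on the Desktop. Cython resolves the pattern 'deepnl/words.pyx' relative to the current directory, so running the build from the repository root may already fix this (a guess, not a confirmed solution):

    REM sketch: run the build from the directory that contains deepnl\words.pyx
    cd "C:\Users\regularman\Desktop\New Folder (4)\deepnl-master"
    python setup.py install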

deepnl ner error - SuffixExtractor has no attribute create

Hi @attardi,

I'm facing the following issue while training deepnl for NER.

Traceback (most recent call last):
File "bin/dl-ner.py", line 330, in
main()
File "bin/dl-ner.py", line 258, in main
extractor = SuffixExtractor.create(args.suffix, args.suffixes)
AttributeError: type object 'deepnl.extractors.SuffixExtractor' has no attribute 'create'

ValueError: Buffer has wrong number of dimensions (expected 2, got 1)

I'm facing the following error while testing:

[ec2-user@ip-172-31-54-168 deepnl-master]$ time python bin/dl-ner.py ner.dnn < ~/data/test.iob > ~/data/test.out.iob
Traceback (most recent call last):
File "bin/dl-ner.py", line 342, in
main()
File "bin/dl-ner.py", line 337, in main
ConllWriter.write(tagger.tag(sent))
File "/home/ec2-user/deepnl-master/bin/../build/lib.linux-x86_64-2.7/deepnl/ner_tagger.py", line 60, in tag
tags = self.toIOB(self.tag_sequence(sent))
File "deepnl/tagger.pyx", line 61, in deepnl.tagger.Tagger.tag_sequence (deepnl/tagger.cpp:2650)
cdef np.ndarray[INT_t,ndim=2] seq = self.converter.convert(tokens)
ValueError: Buffer has wrong number of dimensions (expected 2, got 1)

real 0m5.887s
user 0m5.552s
sys 0m0.332s

dl-words usage

Hi,
I am trying to generate vectors through dl-words.py but am having some issues with the vectors file. The docstring says that the vector file is either read, updated, or created:

    parser.add_argument('--vectors', required=True,
                        help='Embeddings file, either read and updated or created')

but the dl-words command gives an error if the vector file is not specified (obvious from required=True), and even if an empty file is specified. Can you clarify how one can generate word vectors from a text file?

FloatingPointError: underflow encountered in multiply

File "bin/dl-ner.py", line 342, in
main()
File "bin/dl-ner.py", line 325, in main
args.threads)
File "deepnl/trainer.pyx", line 126, in deepnl.trainer.Trainer.train (deepnl/trainer.cpp:3737)
self._train_epoch(examples, outcomes)
File "deepnl/trainer.pyx", line 340, in deepnl.trainer.TaggerTrainer._train_epoch (deepnl/trainer.cpp:8557)
self.update(grads, self.learning_rate, sent, ada)
File "deepnl/trainer.pyx", line 404, in deepnl.trainer.TaggerTrainer.update (deepnl/trainer.cpp:9623)
self.nn.update(grads, learning_rate, ada)
File "deepnl/network.pyx", line 295, in deepnl.network.Network.update (deepnl/network.cpp:7077)
self.p.update(grads, learning_rate, ada)
File "deepnl/networkseq.pyx", line 48, in deepnl.networkseq.SeqParameters.update (deepnl/networkseq.cpp:3130)
super(SeqParameters, self).update(grads, learning_rate, ada)
File "deepnl/network.pyx", line 80, in deepnl.network.Parameters.update (deepnl/network.cpp:4271)
cpdef update(self, Gradients grads, float learning_rate,
File "deepnl/network.pyx", line 96, in deepnl.network.Parameters.update (deepnl/network.cpp:3772)
ada.addSquare(grads)
File "deepnl/networkseq.pyx", line 111, in deepnl.networkseq.SeqGradients.addSquare (deepnl/networkseq.cpp:4540)
self.transitions += grads.transitions * grads.transitions
FloatingPointError: underflow encountered in multiply
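
The FloatingPointError suggests NumPy's error state is set to raise on underflow (presumably via np.seterr somewhere in the code), and the AdaGrad accumulation of squared gradients eventually underflows. A hedged workaround, not a project-endorsed fix, is to downgrade underflow back to being ignored before training starts, e.g. near the top of bin/dl-ner.py:

    import numpy as np
    np.seterr(under='ignore')   # treat floating-point underflow as ignorable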

How to use trained model

Good evening. I trained a model using the convolutional network script dl-conv.py.
My doubt is: how do I use the model as a classifier? No information is given in the documentation about using it as a classifier.
How do I provide the sentences or tweets for which we want the sentiment?
Any help would be appreciated.

hidden_weights dot function prototype incompatible

I am facing this issue with the latest code.

Creating new network...
... with the following parameters:

Input layer size: 400
Hidden layer size: 300
Output size: 17

Starting training with 22138 sentences
Training for up to 40 epochs
Traceback (most recent call last):
File "bin/dl-ner.py", line 331, in
main()
File "bin/dl-ner.py", line 314, in main
args.threads)
File "deepnl/trainer.pyx", line 123, in deepnl.trainer.Trainer.train (deepnl/trainer.cpp:3526)
self._train_epoch(examples, outcomes)
File "deepnl/trainer.pyx", line 341, in deepnl.trainer.TaggerTrainer._train_epoch (deepnl/trainer.cpp:8359)
scores = self.tagger._tag_sequence(sent, True)
File "deepnl/tagger.pyx", line 72, in deepnl.tagger.Tagger._tag_sequence (deepnl/tagger.cpp:3605)
cpdef np.ndarray[FLOAT_t,ndim=2] _tag_sequence(self,
File "deepnl/tagger.pyx", line 124, in deepnl.tagger.Tagger._tag_sequence (deepnl/tagger.cpp:3439)
nn.forward(vars)
File "deepnl/network.pyx", line 210, in deepnl.network.Network.forward (deepnl/network.cpp:6406)
cpdef forward(self, Variables vars):
File "deepnl/network.pyx", line 215, in deepnl.network.Network.forward (deepnl/network.cpp:6255)
self.p.hidden_weights.dot(vars.input, vars.hidden)
TypeError: function takes exactly 1 argument (2 given)
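
The failing line passes an output array as a second positional argument to .dot. If hidden_weights is a plain NumPy array, one hedged rewrite (which means editing network.pyx and rebuilding, and assumes numpy is imported there as np) is to make the output explicit:

    # sketch of an equivalent call with an explicit out= keyword
    np.dot(self.p.hidden_weights, vars.input, out=vars.hidden)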

dl-sentiwords throws error "IndexError: index 1 is out of bounds for axis 0 with size 1"

Hi all,
When I use dl-sentiwords.py trained1.tsv --vocab words.txt --vectors vectors.txt I get this error:

Saving vocabulary in words.txt
Creating new network...
... with the following parameters:

    Input layer size: 550
    Hidden layer size: 200
    Output size: 2
    Starting training

Traceback (most recent call last):
File "/usr/local/bin/dl-sentiwords.py", line 4, in
import('pkg_resources').run_script('deepnl==1.3.18', 'dl-sentiwords.py')
File "/usr/lib/python2.7/dist-packages/pkg_resources/init.py", line 742, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/usr/lib/python2.7/dist-packages/pkg_resources/init.py", line 1510, in run_script
exec(script_code, namespace, namespace)
File "/usr/local/lib/python2.7/dist-packages/deepnl-1.3.18-py2.7-linux-x86_64.egg/EGG-INFO/scripts/dl-sentiwords.py", line 218, in

File "deepnl/sentiwords.pyx", line 53, in itertrie
File "deepnl/sentiwords.pyx", line 126, in deepnl.sentiwords.SentimentTrainer._train_pair_s
File "deepnl/extractors.pyx", line 153, in deepnl.extractors.Converter.lookup
File "deepnl/extractors.pyx", line 236, in deepnl.extractors.Extractor.getitem
IndexError: index 1 is out of bounds for axis 0 with size 1

trained1.tsv is a file with the following format:

  <SID><tab><UID><tab><positive|negative|neutral|objective><tab><TWITTER_MESSAGE>

I obtained the tsv file by transforming a huge dataset of tweets into a tsv and making some transformations to the columns so that it fits the format mentioned above.
For further details, my conversion code is at https://github.com/AzizCode92/text_mining_project/blob/master/csv_tsv.py

Issue in running dl-sentiwords.py

Working on Windows 7 with Python 2.7, I just cloned the deepnl library and tried to run the script directly from bin without installing anything, but it shows the error "No module named deepnl" (screenshot omitted).
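
Running the scripts straight from bin/ requires the compiled deepnl package to be importable. A hedged way to do that without a full install is to build the Cython extensions in place and put the repository root on PYTHONPATH (paths and arguments below are placeholders):

    python setup.py build_ext --inplace
    set PYTHONPATH=C:\path\to\deepnl-master
    python bin\dl-sentiwords.py training.tsv --vocab vocab.txt --vectors vectors.txt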

Runtime error

While building this library (deepnl) I get a runtime error: "An attempt has been made to start a new process before the current process has finished its bootstrapping phase.
This means that you are not using fork to start your child processes and you have forgotten to use the proper idiom in the main module:
if __name__ == '__main__':
freeze_support()"
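
For reference, the idiom the message asks for looks like this (a generic sketch of the multiprocessing guard, not deepnl-specific code):

    from multiprocessing import freeze_support

    if __name__ == '__main__':
        freeze_support()
        main()   # whatever entry point the script defines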

Train POS tagger

From the limited documentation on dl-pos.py it is not clear exactly what should be in the vocabulary file. Is it the same text as the training data but in sentence format? Or the same as the vector file? And how large is the training data set typically?

Also, can the code deal with word embeddings trained by other programs, or do you have to create new ones with the provided deepnl code?

How to train a new SRL model?

Hi, deepnl is cool, but I cannot find a good tutorial on how to train my own Semantic Role Labeling model (for a language other than English).
I have read the presentation and the article: http://docslide.us/documents/the-tsunami-of-deep-learning-over-nlp-giuseppe-attardi-dipartimento-di-informatica.html , http://www.aclweb.org/anthology/W15-1515.

What I need: I want to pass a bunch of .txt files with text data to deepnl and get a pretrained model as the result. Then, as I guess, I can pass this model to tagger = SRLTagger.load(open(filename)) and it is ready to add semantic roles to each word in a sentence.

Then I want to use the semantic roles to identify facts and opinions about some objects, e.g.: "I don't like BankName because it doesn't supply customer service" - the output would be BankName - customer service. That means the problem with this bank is customer service.
Am I on the right path?

unable to import tagger

from deepnl.tagger import Tagger
gives the following error
ImportError: No module named tagger

Python3 compatibility

Hi,

There seem to be a few things breaking deepnl in Python3 during install (not build); for example:


Extracting deepnl-1.3.15-py3.4-linux-x86_64.egg to /usr/local/lib/python3.4/dist-packages
  File "/usr/local/lib/python3.4/dist-packages/deepnl-1.3.15-py3.4-linux-x86_64.egg/deepnl/corpus.py", line 73
    print '\t'.join([item.encode('utf-8') for item in token])
             ^
SyntaxError: invalid syntax

Sorry: TabError: inconsistent use of tabs and spaces in indentation (reader.py, line 181)
  File "/usr/local/lib/python3.4/dist-packages/deepnl-1.3.15-py3.4-linux-x86_64.egg/deepnl/pos_tagger.py", line 28
    print writer.write(self.tag_sequence(sent))
               ^
SyntaxError: invalid syntax

  File "/usr/local/lib/python3.4/dist-packages/deepnl-1.3.15-py3.4-linux-x86_64.egg/deepnl/utils.py", line 61
    text = re.sub(ur"(?u)(^|\W)[‘’′`']", r'\1"', text)
                                      ^
SyntaxError: invalid syntax
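
The SyntaxErrors come from Python 2 constructs: print statements and the ur"" string prefix, neither of which is valid in Python 3 (the TabError is a separate whitespace issue). A hedged sketch of what the corresponding Python 3 forms would look like, not a patch against the actual files:

    print('\t'.join(item for item in token))           # print is a function; encode() dropped, str is already Unicode
    text = re.sub(r"(?u)(^|\W)[‘’′`']", r'\1"', text)   # the ur"" prefix does not exist in Python 3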

Problem while running dl-sentiwords.py

Hi,

I have created a word embedding file using Gensim, and am using the word2vec option while running dl-sentiwords.py. I am doing:

python dl-sentiwords.py training.tsv --vocab vocab.txt --vectors wiki2.text.vector --variant word2vec

However, it shows a zero-division error while running. The message is as follows:

Adding 2 special symbols
Saving vocabulary in vocab.txt
Creating new network...
... with the following parameters:

    Input layer size: 4400
    Hidden layer size: 200
    Output size: 2

Starting training
Traceback (most recent call last):
File "dl-sentiwords.py", line 217, in
trainer.train(converted_sentences,reader.polarities,trie,args.iterations,report_intervals)
File "deepnl/sentiwords.pyx", line 288, in deepnl.sentiwords.SentimentTrainer.train (deepnl/sentiwords.cpp:6437)
ZeroDivisionError: float division

I tried to look at the corresponding line (line 288 in sentiwords.pyx), but I am not sure which parameter is getting a zero value and causing this error. Could you please point me towards possible reasons for this issue?

Thanks

ValueError: Buffer has wrong number of dimensions (expected 2, got 1)

On the example below, I'm getting the "Buffer has wrong number of dimensions" error. What could be the reason?
"He called five seconds , man , Im not going to say anything thats going to get me in trouble , said Hornets forward Marvin Williams , who was whistled for a five-second inbounds infraction with : 36.6 left in the game ."

setup.py problem and __future__ import problem when running dl-words.py

I believe I installed everything according to the README, except when I run

python setup.py build

I get

23 warnings generated.
c++ -bundle -undefined dynamic_lookup -arch x86_64 -arch i386 -Wl,-F. build/temp.macosx-10.9-intel-2.7/deepnl/extractors.o -o build/lib.macosx-10.9-intel-2.7/deepnl/extractors.so
running build_scripts

I'm not sure if this is normal or something is wrong. In any case, when I try to create word embeddings with dl-words.py I get the following issue:

Traceback (most recent call last):
File "dl-words.py", line 27, in
from deepnl.extractors import *
File "build/bdist.macosx-10.9-intel/egg/deepnl/extractors.py", line 7, in
File "build/bdist.macosx-10.9-intel/egg/deepnl/extractors.py", line 6, in bootstrap
File "deepnl/extractors.pyx", line 27, in init deepnl.extractors (deepnl/extractors.cpp:25639)
File "/Users/felicialiu/mscproject/mscenv/lib/python2.7/site-packages/deepnl-1.3.17-py2.7-macosx-10.9-intel.egg/deepnl/embeddings.py", line 11
SyntaxError: from __future__ imports must occur at the beginning of the file

I can only access and edit the embeddings.py that's in my folder; I can't edit the one that is in the virtual environment. How do I fix this problem?

Error using dl-sentiwords.py with SENNA

When I run

bin/dl-sentiwords.py tweets.txt --vocab vocab.txt --vectors vectors.txt --vocab-size 10 --textField 0 --tagField 1 --variant senna -v

I get this response:

Generated 13 feature vectors with 50 features each.
Saving vocabulary in vocab.txt
Creating new network...
Exception TypeError: 'an integer is required' in 'deepnl.extractors.Extractor.get_padding_left' ignored
Exception TypeError: 'an integer is required' in 'deepnl.extractors.Extractor.get_padding_right' ignored
... with the following parameters:

        Input layer size: 550
        Hidden layer size: 200
        Output size: 2
        
Starting training
Hello <deepnl.extractors.ConvertGenerator object at 0x7f0ea0cbe780> 100 2
Traceback (most recent call last):
  File "bin/dl-sentiwords.py", line 218, in <module>
    args.iterations, report_intervals)
  File "deepnl/sentiwords.pyx", line 266, in deepnl.sentiwords.SentimentTrainer.train (deepnl/sentiwords.cpp:6262)
    cdef float_t all_cases = float(sum([len(sen) for sen in sentences]) * epochs * self.ngram_size)
  File "deepnl/extractors.pyx", line 67, in __iter__ (deepnl/extractors.cpp:3470)
    c =  self.converter.convert(s)
  File "deepnl/extractors.pyx", line 133, in deepnl.extractors.Converter.convert (deepnl/extractors.cpp:4590)
    return INT(zip(*[(<Extractor>e).extract(sent, field) for e, field in zip(self.extractors, self.fields)]))
TypeError: long() argument must be a string or a number, not 'NoneType'

Where vocab.txt and vectors.txt are non-existing files, and tweets.txt looks like this:

HERE today, gone tomorrow- but still here! A short note on Nokia's patent deals with @Microsoft and @Alcatel_Lucent  http://t.co/y5wFgUygFD	neutral
@joebelfiore @GabeAul @Microsoft @Lumia @satyanadella plz add L1520 in the 1st wave of windows10 phones release.plz dont hurt ur diehardfans	neutral
If I make a game as a #windows10 Universal App. Will #xboxone owners be able to download and play it in November? @majornelson @Microsoft	neutral
@tomwarren @microsoft the lumia cityman looks terrible and its blue not Cyan?! Buttons may be too small. #Lumia 730/735 has a superior look.	negative
@microsoft using Office 2013's Bing dictionary. type in "bound." This is the 3rd picture they show me. WTF?... http://t.co/VzwGIoLxco	negative
@Microsoft - congratulations on the 20th Birth Anniversary of @Windows 95.  20 years since we've come to love you (&amp; backward compatibility)	positive
http://t.co/luX5VvBrmJ   Register 4 the NACR Skype for Business event with @Microsoft for Sept 16th Chevy Chase, MD #skype4B #contactcenter	positive
@vukosi @Microsoft Please have a look at this link and see if the errors mentioned may correspond to your errors. http://t.co/9Tj4Nhkoyo	neutral
Predictive Analytics with @Microsoft #Azure #MachineLearning 2nd ed. Now available. http://t.co/frOUbXXOzU @MSAdvAnalytics	neutral
@microsoft ur company will give me my 500 pounds plus the cost of the laptop on Monday for what u did to my laptop!	negative

Using the word2vec variant runs fine. Any idea what's wrong?

Undefined reference to `hpca::cooccurrence_matrix

Hello,

Compiling with Mingw I get this error:

build\temp.win32-2.7\Release\deepnl\hpca.o:HPCA.cpp:(.text+0x8d55): undefined reference to `hpca::cooccurrence_matrix(char*, char*, unsigned int, unsigned int)'

In fact, I can't find any definition of this function.

Zero accuracy while training the Sentiment Specific embedding

Hello!

I am training the Sentiment Specific embedding. At the end of each epoch, I have got a message like this:

23 epochs Examples: 7818897 Error: 326588146461.176880 Accuracy: 0.000000 23589 corrections skipped

The accuracy always remains zero, no matter the number of epochs.
Is that OK?

unable to load model with deepnl

I want to improve polyglot NER. I created a model with gensim Word2Vec. When I try to load the model with deepnl.ner_tagger I get the following error.
Any ideas?

File "//word2vec/LoadModel/mydeepnl.py", line 3, in <module>
    tagger = NerTagger.load(open('sen.model'))
  File "deepnl/tagger.pyx", line 149, in deepnl.tagger.Tagger.load (deepnl/tagger.cpp:3984)
  File "deepnl/networkseq.pyx", line 433, in deepnl.networkseq.SequenceNetwork.load (deepnl/networkseq.cpp:9036)
  File "deepnl/networkseq.pyx", line 87, in deepnl.networkseq.SeqParameters.load (deepnl/networkseq.cpp:4122)
  File "/usr/local/lib/python2.7/dist-packages/gensim/models/word2vec.py", line 1297, in __getitem__
    return self.wv.__getitem__(words)
AttributeError: 'Word2Vec' object has no attribute 'wv'
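
NerTagger.load expects a model previously saved by deepnl's own training scripts, not a pickled gensim Word2Vec (the traceback ends up inside gensim because unpickling the file reconstructs a Word2Vec object). A hedged alternative is to export the gensim vectors to word2vec text format and use them as the --vocab/--vectors inputs when training with dl-ner.py (where the export method lives depends on the gensim version, hence the getattr):

    from gensim.models import Word2Vec

    model = Word2Vec.load('sen.model')
    kv = getattr(model, 'wv', model)   # older gensim keeps the vectors on the model itself
    kv.save_word2vec_format('vectors.txt', fvocab='vocab.txt', binary=False)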

SENNA distribution missing file

Hi,

I cannot reproduce the results listed; instead I got this:
processed 46435 tokens with 5648 phrases; found: 5642 phrases; correct: 4576.
accuracy: 96.04%; precision: 81.11%; recall: 81.02%; FB1: 81.06
LOC: precision: 79.34%; recall: 88.19%; FB1: 83.53 1854
MISC: precision: 68.24%; recall: 67.95%; FB1: 68.09 699
ORG: precision: 77.88%; recall: 69.96%; FB1: 73.71 1492
PER: precision: 91.80%; recall: 90.66%; FB1: 91.23 1597

I have strictly followed your instructions, except that senna/hash/words.txt wasn't found, so I used senna/hash/words.lst instead. Is that OK?

TypeError: coercing to Unicode: need string or buffer, file found, while running dl-words.py

I am getting the following error while generating embeddings (vectors.txt). I gave vocab.txt, a list of words, as input and tried both ASCII and UTF-8. The training file text_corpus contains a list of sentences; I also experimented with ASCII and UTF-8 for this file. The error:

]$./dl-words.py --verbose --train text_corpus.ascii --vocab vocab.txt --vectors vectors.txt -o embed.dnn
Traceback (most recent call last):
File "../../deepnl/bin/dl-words.py", line 199, in
main()
File "../../deepnl/bin/dl-words.py", line 158, in main
variant=args.variant)
File "deepnl/extractors.pyx", line 292, in deepnl.extractors.Embeddings.init (deepnl/extractors.cpp:7876)
File "/usr/lib/python2.7/genericpath.py", line 18, in exists
os.stat(path)
TypeError: coercing to Unicode: need string or buffer, file found

Compilation Error on OS X El Capitan (Duplicate Symbol, Case-Insensitive File System)

Hi @attardi ,
I encountered the same problem as dungtn (#1)

How can I configure my system for case-sensitive file names? I searched Google and couldn't find a suitable tutorial showing how this can be done.

Moreover, most opinions out there are strongly against making the file system case-sensitive.

Is there any other way of building the files while keeping a case-insensitive file system?
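
One hedged option that avoids reformatting the system drive is to build inside a case-sensitive disk image (size, volume name, and paths below are placeholders):

    hdiutil create -size 2g -fs "Case-sensitive Journaled HFS+" -volname deepnl -type SPARSE deepnl.sparseimage
    hdiutil attach deepnl.sparseimage
    cd /Volumes/deepnl
    git clone https://github.com/attardi/deepnl && cd deepnl && python setup.py build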

ImportError: No module named 'cPickle'

Running this line of code (python3.4 and ubuntu 14.04):

from deepnl.tagger import Tagger

an error occurred:

Traceback (most recent call last):
File "/home/hassan/PycharmProjects/nlp/__deepnlp.py", line 1, in <module>
from deepnl.tagger import Tagger
File "deepnl/extractors.pxd", line 13, in init deepnl.tagger (deepnl/tagger.cpp:7442)
File "deepnl/extractors.pyx", line 19, in init deepnl.extractors (deepnl/extractors.cpp:25552)
ImportError: No module named 'cPickle'

and when I try to install it with pip3 I get this error:

Could not find any downloads that satisfy the requirement cpickle
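
cPickle only exists in Python 2, so there is nothing for pip3 to install; in Python 3 the module is simply pickle. The usual compatibility shim (a sketch of what the import would need to become) is:

    try:
        import cPickle as pickle   # Python 2
    except ImportError:
        import pickle              # Python 3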

ValueError: all the input arrays must have same number of dimensions

I am now facing the following issue:

... with the following parameters:

    Input layer size: 400
    Hidden layer size: 300
    Output size: 17

Starting training with 22138 sentences
Training for up to 40 epochs
Traceback (most recent call last):
File "bin/dl-ner.py", line 331, in
main()
File "bin/dl-ner.py", line 314, in main
args.threads)
File "deepnl/trainer.pyx", line 123, in deepnl.trainer.Trainer.train (deepnl/trainer.cpp:3526)
self._train_epoch(examples, outcomes)
File "deepnl/trainer.pyx", line 341, in deepnl.trainer.TaggerTrainer._train_epoch (deepnl/trainer.cpp:8359)
scores = self.tagger._tag_sequence(sent, True)
File "deepnl/tagger.pyx", line 72, in deepnl.tagger.Tagger._tag_sequence (deepnl/tagger.cpp:3605)
cpdef np.ndarray[FLOAT_t,ndim=2] _tag_sequence(self,
File "deepnl/tagger.pyx", line 101, in deepnl.tagger.Tagger._tag_sequence (deepnl/tagger.cpp:3213)
np.concatenate((self.pre_padding,
ValueError: all the input arrays must have same number of dimensions

'<' not supported between instances of 'dict' and 'dict'

I'm trying to install the library on Windows 10 with Anaconda 3 and I get an error. I run the command python setup.py build and I get this:

Traceback (most recent call last):
File "setup.py", line 62, in
nthreads=4),
File "C:\Users\DominikaSarkowicz\Anaconda3\envs\tensorflow\lib\site-packages\Cython\Build\Dependencies.py", line 1000, in cythonize
to_compile.sort()
TypeError: '<' not supported between instances of 'dict' and 'dict'

Error Rate and Accuracy Sentiment Specific Embeddings

I am using dl-sentiwords.py with the word2vec variant (the vector file was downloaded from GloVe).
The training runs successfully, but the error increases with every epoch and the accuracy remains at 0.00. Any input on what could be done to fix this?

maulik@maulik-VPCEH38FN:~/deepnl/bin$ python dl-sentiwords.py training.tsv --vectors vectors.txt --variant word2vec --vocab vocab.txt --model model1 -e 30 --hidden 20
Saving vocabulary in vocab.txt
Creating new network...
... with the following parameters:

    Input layer size: 550
    Hidden layer size: 20
    Output size: 2

Starting training
Epoch: 0, pairs: 10000, sent: 52656, avg. error: 4.176
1 epochs Examples: 10041 Error: 62460.347864 Accuracy: 0.000000 6925 corrections skipped
Epoch: 1, pairs: 20000, sent: 52337, avg. error: 8.991
2 epochs Examples: 20082 Error: 207734.402282 Accuracy: 0.000000 7105 corrections skipped
Epoch: 2, pairs: 30000, sent: 52020, avg. error: 14.748
3 epochs Examples: 30123 Error: 388989.344357 Accuracy: 0.000000 7052 corrections skipped
Epoch: 3, pairs: 40000, sent: 51791, avg. error: 20.649
4 epochs Examples: 40164 Error: 573208.304635 Accuracy: 0.000000 6956 corrections skipped
Epoch: 4, pairs: 50000, sent: 51598, avg. error: 26.750
5 epochs Examples: 50205 Error: 767234.504043 Accuracy: 0.000000 6906 corrections skipped
Epoch: 5, pairs: 60000, sent: 51239, avg. error: 32.770
6 epochs Examples: 60246 Error: 951057.397979 Accuracy: 0.000000 6981 corrections skipped
Epoch: 6, pairs: 70000, sent: 51166, avg. error: 38.861
7 epochs Examples: 70287 Error: 1129657.364728 Accuracy: 0.000000 7030 corrections skipped
Epoch: 7, pairs: 80000, sent: 50936, avg. error: 45.031
8 epochs Examples: 80328 Error: 1316129.222171 Accuracy: 0.000000 6920 corrections skipped
Epoch: 8, pairs: 90000, sent: 50857, avg. error: 51.284
9 epochs Examples: 90369 Error: 1530057.560660 Accuracy: 0.000000 6999 corrections skipped
Epoch: 9, pairs: 100000, sent: 50718, avg. error: 57.600
10 epochs Examples: 100410 Error: 1705694.047717 Accuracy: 0.000000 7016 corrections skipped
Epoch: 10, pairs: 110000, sent: 50425, avg. error: 63.825
11 epochs Examples: 110451 Error: 1882818.481616 Accuracy: 0.000000 7110 corrections skipped
Epoch: 11, pairs: 120000, sent: 50216, avg. error: 69.894
12 epochs Examples: 120492 Error: 2074675.285961 Accuracy: 0.000000 7015 corrections skipped
Epoch: 12, pairs: 130000, sent: 49730, avg. error: 76.021
13 epochs Examples: 130533 Error: 2265213.329245 Accuracy: 0.000000 7052 corrections skipped
Epoch: 13, pairs: 140000, sent: 49498, avg. error: 82.320
14 epochs Examples: 140574 Error: 2463388.976950 Accuracy: 0.000000 7122 corrections skipped
Epoch: 14, pairs: 150000, sent: 49306, avg. error: 88.331
15 epochs Examples: 150615 Error: 2622099.455269 Accuracy: 0.000000 7068 corrections skipped
Epoch: 15, pairs: 160000, sent: 49051, avg. error: 94.802
16 epochs Examples: 160656 Error: 2853079.937031 Accuracy: 0.000000 7071 corrections skipped
Epoch: 16, pairs: 170000, sent: 48561, avg. error: 101.018
17 epochs Examples: 170697 Error: 2995733.476832 Accuracy: 0.000000 7057 corrections skipped
Epoch: 17, pairs: 180000, sent: 48292, avg. error: 107.027
18 epochs Examples: 180738 Error: 3206937.575615 Accuracy: 0.000000 6999 corrections skipped
Epoch: 18, pairs: 190000, sent: 48281, avg. error: 113.053
19 epochs Examples: 190779 Error: 3404130.852323 Accuracy: 0.000000 7008 corrections skipped
Epoch: 19, pairs: 200000, sent: 47593, avg. error: 119.447
20 epochs Examples: 200820 Error: 3593241.777677 Accuracy: 0.000000 7039 corrections skipped
Epoch: 20, pairs: 210000, sent: 47402, avg. error: 126.057
21 epochs Examples: 210861 Error: 3837913.673458 Accuracy: 0.000000 7039 corrections skipped
Epoch: 21, pairs: 220000, sent: 47181, avg. error: 132.405
22 epochs Examples: 220902 Error: 3961487.942201 Accuracy: 0.000000 7049 corrections skipped
Epoch: 22, pairs: 230000, sent: 47081, avg. error: 138.319
23 epochs Examples: 230943 Error: 4089176.958714 Accuracy: 0.000000 7027 corrections skipped
Epoch: 23, pairs: 240000, sent: 46938, avg. error: 144.327
24 epochs Examples: 240984 Error: 4287874.864616 Accuracy: 0.000000 7095 corrections skipped
Epoch: 24, pairs: 250000, sent: 46900, avg. error: 150.805
25 epochs Examples: 251025 Error: 4578433.858292 Accuracy: 0.000000 7048 corrections skipped
Epoch: 25, pairs: 260000, sent: 46743, avg. error: 156.881
26 epochs Examples: 261066 Error: 4756786.779839 Accuracy: 0.000000 6964 corrections skipped
Epoch: 26, pairs: 270000, sent: 46546, avg. error: 162.995
27 epochs Examples: 271107 Error: 4879088.927508 Accuracy: 0.000000 6984 corrections skipped
Epoch: 27, pairs: 280000, sent: 46313, avg. error: 169.356
28 epochs Examples: 281148 Error: 5061753.620850 Accuracy: 0.000000 7013 corrections skipped
Epoch: 28, pairs: 290000, sent: 46215, avg. error: 175.712
29 epochs Examples: 291189 Error: 5285739.418424 Accuracy: 0.000000 7078 corrections skipped
Epoch: 29, pairs: 300000, sent: 45871, avg. error: 181.846
30 epochs Examples: 301230 Error: 5465347.502520 Accuracy: 0.000000 7121 corrections skipped
Overriding vectors to vectors.txt
Saving trained model to model1
