No such file or directory : pentree_char_and_word.npz about groundhog (11 comments, closed)

pascanur commented on August 18, 2024

No such file or directory : pentree_char_and_word.npz


Comments (11)

tomsbergmanis commented on August 18, 2024

What is pentree_char_and_word.npz?

That's the training file; you need to supply it yourself.
It is generated by a script similar to this one:
https://www.dropbox.com/s/kiewfm3s9mfh4u3/generate.py?dl=0
On 10 September 2014 17:27, Marco Ippolito wrote:

I would like to use your good GroundHog to implement a sentence
segmentation task.

In order to understand how the GroundHog library works, I tried to run
DT_RNN_Tut.py

But it says:
time python DT_RNN_Tut.py
Traceback (most recent call last):
  File "DT_RNN_Tut.py", line 431, in <module>
    jobman(state, None)
  File "DT_RNN_Tut.py", line 114, in jobman
    train_data, valid_data, test_data = get_text_data(state)
  File "DT_RNN_Tut.py", line 71, in get_text_data
    can_fit=True)
  File "/home/ubuntu/ggc/prove/DRNN/GroundHog-master/groundhog/datasets/LM_dataset.py", line 97, in __init__
    self.load_files()
  File "/home/ubuntu/ggc/prove/DRNN/GroundHog-master/groundhog/datasets/LM_dataset.py", line 105, in load_files
    penn_data = numpy.load(self.path, mmap_mode=mmap_mode)
  File "/usr/local/lib/python2.7/dist-packages/numpy/lib/npyio.py", line 370, in load
    fid = open(file, "rb")
IOError: [Errno 2] No such file or directory: '/data/lisa/data/PennTreebankCorpus/pentree_char_and_word.npz'

What is pentree_char_and_word.npz?
And how to make DT_RNN_Tut.py working?

Looking forward to receiving your kind, helpful hints.
Kind regards.
Marco


kyunghyuncho commented on August 18, 2024

Thanks for the script!

@tomsbergmanis If it's okay with you, can we (or you) add that script to tutorials/?


marcoippolito commented on August 18, 2024

Thanks for the script from me as well. Running it, however, gives:

time python generate.py
Constructing the vocabulary ..
Traceback (most recent call last):
  File "generate.py", line 198, in <module>
    main(get_parser())
  File "generate.py", line 75, in main
    vocab, freqs, freq_wd = construct_vocabulary(dataset, o.oov_rate, o.level)
  File "generate.py", line 21, in construct_vocabulary
    fd = open(filename, 'rt')
IOError: [Errno 2] No such file or directory: 'path to file/train'


tomsbergmanis commented on August 18, 2024

Sure. I think it was given by you or someone else from your group.


tomsbergmanis commented on August 18, 2024

For this script you also need to supply your own training data. Look at the code: the filename variable is filled with a dummy value.

marcoippolito commented on August 18, 2024

Sorry, maybe because I'm a bit tired, I didn't get the whole thing.
Do I have to download the training data from here? http://mattmahoney.net/dc/textdata.html

That's because I read this at the end of generate.py:

def get_parser():
    usage = """
This script parses the wikipedia dataset from
http://mattmahoney.net/dc/text.html, and generates more numpy friendly
format of the dataset. Please use this friendly formats as temporary forms
of the dataset (i.e. delete them after you're done).


kyunghyuncho commented on August 18, 2024

Right. The script assumes that you have three text files (train, valid and test). Given those files, it generates an npz file that can be used by groundhog/datasets/LM_dataset.py.
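If your data starts out as a single corpus file (for example, one of the dumps linked above), it first has to be cut into the three files the script expects. Below is a minimal sketch of that step, assuming a line-oriented corpus; the function name `split_corpus` and the 90/5/5 default split are illustrative choices, not part of generate.py. It splits by line and keeps the original order, so a corpus stored as one very long line would need to be split on words instead.

```python
import os


def split_corpus(corpus_path, out_dir, valid_frac=0.05, test_frac=0.05):
    """Split a line-oriented text corpus into out_dir/train, out_dir/valid, out_dir/test.

    Hypothetical helper: lines keep their original order, so the first part
    becomes train, the next part valid, and the remainder test.
    Returns the number of lines written to each file.
    """
    with open(corpus_path, "rt") as f:
        lines = f.readlines()
    n_lines = len(lines)
    n_test = int(n_lines * test_frac)
    n_valid = int(n_lines * valid_frac)
    n_train = n_lines - n_valid - n_test
    splits = {
        "train": lines[:n_train],
        "valid": lines[n_train:n_train + n_valid],
        "test": lines[n_train + n_valid:],
    }
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)
    for name, chunk in splits.items():
        # File names match what the script expects: {path}/train, {path}/valid, {path}/test
        with open(os.path.join(out_dir, name), "wt") as f:
            f.writelines(chunk)
    return {name: len(chunk) for name, chunk in splits.items()}
```

The output directory then holds the train, valid and test files, and its path is what you would hand to generate.py.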


marcoippolito commented on August 18, 2024

time python generate.py
Constructing the vocabulary ..
.. sorting words
.. shrinking the vocabulary size
Traceback (most recent call last):
  File "generate.py", line 199, in <module>
    main(get_parser())
  File "generate.py", line 78, in main
    oov_default = vocab[""]
KeyError: ''

real 0m15.787s
user 0m13.737s
sys 0m2.048s

Looking forward to your helpful hints.
Marco


tomsbergmanis commented on August 18, 2024

Comment out this line:
oov_default = vocab[""]

and un-comment these (i.e. remove the triple quotes around them):
"""if o.oov == '-1':
    oov_default = -1
else:
    oov_default = len(vocab)"""

Also, try to read the code and understand it; then you won't have as
many questions. The GH code will be more difficult to comprehend.
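The KeyError only means that `vocab` has no entry for the key being looked up; the replacement branch never indexes `vocab` at all, it just picks an integer id for out-of-vocabulary words. A minimal sketch of that branch follows, plus a hypothetical `encode` helper (not taken from generate.py) showing how such an id is typically used with `dict.get`, so unknown words map to the OOV id instead of raising. It assumes vocabulary ids run from 0 to `len(vocab) - 1`, which makes `len(vocab)` the first unused id.

```python
def choose_oov_index(vocab, oov_option):
    """Return the integer id reserved for out-of-vocabulary words.

    Mirrors the branch quoted above: the option string '-1' selects -1,
    any other value selects len(vocab) (the first unused id, assuming
    vocabulary ids are 0 .. len(vocab) - 1).
    """
    if oov_option == '-1':
        return -1
    return len(vocab)


def encode(words, vocab, oov_index):
    """Hypothetical helper: map words to ids, sending unknown words to oov_index."""
    # dict.get returns the default instead of raising KeyError
    return [vocab.get(word, oov_index) for word in words]


vocab = {"the": 0, "cat": 1, "sat": 2}
oov = choose_oov_index(vocab, "0")                # 3
print(encode(["the", "dog", "sat"], vocab, oov))  # [0, 3, 2]
```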


marcoippolito commented on August 18, 2024

Sorry for asking you to help me debug once again,
but I have hit a dead end that prevents me from using your library (a true pity for all of us... isn't it?)

Here is the error message I got:

time python generate.py
Constructing the vocabulary ..
.. sorting words
.. shrinking the vocabulary size
EOL 0
Constructing train set
o.n_chains= 1
Constructing valid set
Constructing test set
Saving data
Killed

real 0m49.369s
user 0m36.546s
sys 0m8.769s

These are the lines of generate.py that could be related to the problem:

print 'Saving data'

numpy.savez(o.dest,
            train_words=train,
            valid_words=valid,
            test_words=test,
            oov=oov_default,
            freqs=numpy.array(freqs),
            n_words=len(vocab),
            n_chars=0,  # I also ran generate.py with this line commented out, but saving is still killed
            vocabulary=vocab,
            freq_wd=freq_wd
           )
inv_map = {v: k for k, v in vocab.items()}

numpy.savez(o.dest + "_dict", unique_words=inv_map)
print '... Done'

A file, tmp_data.npz, is produced.
When running DT_RNN_Tut.py, the resulting error message is:

Traceback (most recent call last):
  File "DT_RNN_Tut.py", line 431, in <module>
    jobman(state, None)
  File "DT_RNN_Tut.py", line 114, in jobman
    train_data, valid_data, test_data = get_text_data(state)
  File "DT_RNN_Tut.py", line 71, in get_text_data
    can_fit=True)
  File "/home/ubuntu/ggc/prove/DRNN/GroundHog-master/groundhog/datasets/LM_dataset.py", line 102, in __init__
    self.load_files()
  File "/home/ubuntu/ggc/prove/DRNN/GroundHog-master/groundhog/datasets/LM_dataset.py", line 112, in load_files
    penn_data = numpy.load("tmp_data.npz", mmap_mode=mmap_mode)
  File "/usr/local/lib/python2.7/dist-packages/numpy/lib/npyio.py", line 388, in load
    return NpzFile(fid, own_fid=tmp)
  File "/usr/local/lib/python2.7/dist-packages/numpy/lib/npyio.py", line 192, in __init__
    _zip = zipfile_factory(fid)
  File "/usr/local/lib/python2.7/dist-packages/numpy/lib/npyio.py", line 131, in zipfile_factory
    return zipfile.ZipFile(*args, **kwargs)
  File "/usr/lib/python2.7/zipfile.py", line 770, in __init__
    self._RealGetContents()
  File "/usr/lib/python2.7/zipfile.py", line 811, in _RealGetContents
    raise BadZipfile, "File is not a zip file"
zipfile.BadZipfile: File is not a zip file
Exception AttributeError: "'NpzFile' object has no attribute 'zip'" in <bound method NpzFile.__del__ of <numpy.lib.npyio.NpzFile object at 0x7fa7c0602650>> ignored

Looking forward to your kind hints.
Kind regards.
Marco
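A bare `Killed` with no traceback usually means the operating system terminated the process (on Linux, often the out-of-memory killer), so `numpy.savez` most likely never finished writing tmp_data.npz. That fits the later "File is not a zip file" error, since an .npz file is a zip archive. The sketch below, assuming a recent NumPy, checks the file before loading and fails with a readable message; `load_npz_checked` is a hypothetical helper, and `allow_pickle=True` is there only because the `vocabulary` entry is a Python dict stored as an object array, so enable it only for files you generated yourself.

```python
import os
import zipfile

import numpy


def load_npz_checked(path):
    """Hypothetical helper: open an .npz archive, failing early with a readable message."""
    if not os.path.exists(path):
        raise IOError("%s does not exist; run generate.py first" % path)
    # An .npz file is a zip archive; a write that was killed part-way
    # through typically leaves a file that zipfile cannot open.
    if not zipfile.is_zipfile(path):
        raise IOError("%s is not a valid .npz archive "
                      "(was numpy.savez interrupted?)" % path)
    # allow_pickle is needed for object entries such as the vocabulary dict;
    # only enable it for files you created yourself.
    return numpy.load(path, allow_pickle=True)
```

Calling `load_npz_checked("tmp_data.npz")` on the truncated file would raise the "not a valid .npz archive" error instead of the BadZipfile traceback, pointing back at the interrupted save.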


kyunghyuncho commented on August 18, 2024

I have pushed generate.py to the LISA fork of GroundHog (https://github.com/lisa-groundhog/GroundHog). See the tutorials directory there.

In short, assuming that you have plain text files {path}/train, {path}/valid and {path}/test, you can generate the data files with

python generate.py --dest=data_chars --level=chars --oov-rate=5 --dtype=int64 {path}
python generate.py --dest=data_words --level=words --oov-rate=5 --dtype=int64 {path}

Afterward, of course, you need to set state['path'] and state['dictionary'] accordingly.
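As a sketch of that last step, assuming `state` is a plain dict as in DT_RNN_Tut.py and that `state['dictionary']` should point at the `_dict` file written by the `numpy.savez(o.dest + "_dict", ...)` call quoted earlier in the thread: `numpy.savez` appends `.npz` to a file name that lacks it, so both file names follow directly from `--dest`. The helper name `npz_paths` is hypothetical.

```python
def npz_paths(dest):
    """Hypothetical helper: return the (data file, dictionary file) names for a given --dest.

    Assumes generate.py saves with numpy.savez(dest, ...) and
    numpy.savez(dest + "_dict", ...); numpy.savez appends ".npz" to a
    file name that does not already end in ".npz".
    """
    data_file = dest if dest.endswith(".npz") else dest + ".npz"
    dict_file = dest + "_dict.npz"
    return data_file, dict_file


# Example: point the tutorial's state at the word-level files.
state = {}
state['path'], state['dictionary'] = npz_paths("data_words")
print(state)  # {'path': 'data_words.npz', 'dictionary': 'data_words_dict.npz'}
```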
