
Comments (11)

tomsbergmanis commented on August 18, 2024

What is pentree_char_and_word.npz?

That's the training file, which you need to supply yourself.
It is generated by a script similar to this one:
https://www.dropbox.com/s/kiewfm3s9mfh4u3/generate.py?dl=0
On 10 September 2014 17:27, Marco Ippolito [email protected] wrote:

I would like to use GroundHog to implement a sentence
segmentation task.

To understand how the GroundHog library works, I tried to run
DT_RNN_Tut.py

But it fails with:
time python DT_RNN_Tut.py
Traceback (most recent call last):
  File "DT_RNN_Tut.py", line 431, in <module>
    jobman(state, None)
  File "DT_RNN_Tut.py", line 114, in jobman
    train_data, valid_data, test_data = get_text_data(state)
  File "DT_RNN_Tut.py", line 71, in get_text_data
    can_fit=True)
  File "/home/ubuntu/ggc/prove/DRNN/GroundHog-master/groundhog/datasets/LM_dataset.py", line 97, in __init__
    self.load_files()
  File "/home/ubuntu/ggc/prove/DRNN/GroundHog-master/groundhog/datasets/LM_dataset.py", line 105, in load_files
    penn_data = numpy.load(self.path, mmap_mode=mmap_mode)
  File "/usr/local/lib/python2.7/dist-packages/numpy/lib/npyio.py", line 370, in load
    fid = open(file, "rb")
IOError: [Errno 2] No such file or directory: '/data/lisa/data/PennTreebankCorpus/pentree_char_and_word.npz'

What is pentree_char_and_word.npz?
And how can I get DT_RNN_Tut.py working?

Looking forward to receiving your kind, helpful hints.
Kind regards.
Marco



from groundhog.

kyunghyuncho commented on August 18, 2024

Thanks for the script!

@tomsbergmanis If it's okay with you, can we (or you) add that script to tutorials/?


marcoippolito commented on August 18, 2024

Thanks for the script from me as well.

time python generate.py
Constructing the vocabulary ..
Traceback (most recent call last):
  File "generate.py", line 198, in <module>
    main(get_parser())
  File "generate.py", line 75, in main
    vocab, freqs, freq_wd = construct_vocabulary(dataset, o.oov_rate, o.level)
  File "generate.py", line 21, in construct_vocabulary
    fd = open(filename, 'rt')
IOError: [Errno 2] No such file or directory: 'path to file/train'


tomsbergmanis commented on August 18, 2024

Sure. I think it was given by you or someone else from your group.


tomsbergmanis commented on August 18, 2024

For this script you also need to supply your own training data. Look at the code: the filename variable is filled with a dummy value.

marcoippolito commented on August 18, 2024

Sorry, maybe because I'm a bit tired, I didn't get the whole thing.
Do I have to download the training data from here? http://mattmahoney.net/dc/textdata.html

I ask because I read this at the end of generate.py:
"def get_parser():
usage = """
This script parses the wikipedia dataset from
http://mattmahoney.net/dc/text.html, and generates more numpy friendly
format of the dataset. Please use this friendly formats as temporary forms
of the dataset (i.e. delete them after you're done).
"


kyunghyuncho commented on August 18, 2024

Right. The script assumes that you have three text files (train, valid and test). Given those files, it will generate an npz file that can be used by groundhog/datasets/LM_dataset.py.
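
As a rough illustration of that precondition, here is a minimal sketch that checks for the three split files and builds a frequency-sorted word vocabulary from the training file. It is not the code of generate.py: the helper names `missing_inputs` and `build_vocab`, the whitespace tokenization and the tie-breaking order are all assumptions made for this example.

```python
import os
from collections import Counter

# The three plain-text files the script is said to expect in one directory.
SPLITS = ("train", "valid", "test")


def missing_inputs(data_dir):
    """Return the names of the expected split files that are absent from data_dir."""
    return [name for name in SPLITS
            if not os.path.isfile(os.path.join(data_dir, name))]


def build_vocab(train_path):
    """Map each whitespace-separated token to an id, most frequent token first."""
    counts = Counter()
    with open(train_path, "rt") as fd:
        for line in fd:
            counts.update(line.split())
    # Sort by descending count, then alphabetically so ties are deterministic.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return {word: idx for idx, (word, _count) in enumerate(ordered)}
```

Calling `missing_inputs(path)` before running the generator would report a placeholder directory such as 'path to file' immediately, instead of failing inside `open()`.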


marcoippolito commented on August 18, 2024

time python generate.py
Constructing the vocabulary ..
.. sorting words
.. shrinking the vocabulary size
Traceback (most recent call last):
  File "generate.py", line 199, in <module>
    main(get_parser())
  File "generate.py", line 78, in main
    oov_default = vocab[""]
KeyError: ''

real 0m15.787s
user 0m13.737s
sys 0m2.048s

Looking forward to your helpful hints.
Marco


tomsbergmanis commented on August 18, 2024

Comment out this line:
oov_default = vocab[""]

and uncomment these:
"""if o.oov == '-1':
    oov_default = -1
else:
    oov_default = len(vocab)"""

Also, try to read the code and understand it; then you won't have that
many questions. The GH code will be more difficult to comprehend.
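
To make the effect of that change concrete, here is a small sketch of the two branches and of one way such an index could be used when encoding text. Only the `-1` / `len(vocab)` choice comes from the lines quoted above; the helper names `pick_oov_id` and `encode_words` are hypothetical, and how generate.py actually consumes `oov_default` is not shown in this thread.

```python
def pick_oov_id(vocab, oov_option):
    """Mirror the two quoted branches: -1 if requested, else one past the last id."""
    if oov_option == '-1':
        return -1
    return len(vocab)


def encode_words(words, vocab, oov_id):
    """Replace each word by its id; words missing from vocab get oov_id.

    Using dict.get avoids the KeyError raised by a plain vocab[key] lookup
    when the key is not in the dictionary.
    """
    return [vocab.get(word, oov_id) for word in words]
```

With this choice the out-of-vocabulary id never collides with a real word id, because real ids run from 0 to `len(vocab) - 1`.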


marcoippolito commented on August 18, 2024

Sorry for asking you to help me debug once again.
But I have reached a dead end, which prevents me from using your library (a true pity for all of us, isn't it?).

Here is the error message I got:

time python generate.py
Constructing the vocabulary ..
.. sorting words
.. shrinking the vocabulary size
EOL 0
Constructing train set
o.n_chains= 1
Constructing valid set
Constructing test set
Saving data
Killed

real 0m49.369s
user 0m36.546s
sys 0m8.769s

These are the lines of generate.py that could be linked to the problem:

print 'Saving data'

numpy.savez(o.dest,
            train_words=train,
            valid_words=valid,
            test_words=test,
            oov=oov_default,
            freqs = numpy.array(freqs),
            n_words=len(vocab),
            n_chars=0,  # I ran generate.py also after commenting this line, but the saving is still killed
            vocabulary = vocab,
            freq_wd = freq_wd
           )
inv_map = {v:k for k, v in vocab.items()}

numpy.savez(o.dest+"_dict", unique_words=inv_map)
print '... Done'

A file, tmp_data.npz, is produced.
When I run DT_RNN_Tut.py, the resulting error message is:

Traceback (most recent call last):
  File "DT_RNN_Tut.py", line 431, in <module>
    jobman(state, None)
  File "DT_RNN_Tut.py", line 114, in jobman
    train_data, valid_data, test_data = get_text_data(state)
  File "DT_RNN_Tut.py", line 71, in get_text_data
    can_fit=True)
  File "/home/ubuntu/ggc/prove/DRNN/GroundHog-master/groundhog/datasets/LM_dataset.py", line 102, in __init__
    self.load_files()
  File "/home/ubuntu/ggc/prove/DRNN/GroundHog-master/groundhog/datasets/LM_dataset.py", line 112, in load_files
    penn_data = numpy.load("tmp_data.npz", mmap_mode=mmap_mode)
  File "/usr/local/lib/python2.7/dist-packages/numpy/lib/npyio.py", line 388, in load
    return NpzFile(fid, own_fid=tmp)
  File "/usr/local/lib/python2.7/dist-packages/numpy/lib/npyio.py", line 192, in __init__
    _zip = zipfile_factory(fid)
  File "/usr/local/lib/python2.7/dist-packages/numpy/lib/npyio.py", line 131, in zipfile_factory
    return zipfile.ZipFile(*args, **kwargs)
  File "/usr/lib/python2.7/zipfile.py", line 770, in __init__
    self._RealGetContents()
  File "/usr/lib/python2.7/zipfile.py", line 811, in _RealGetContents
    raise BadZipfile, "File is not a zip file"
zipfile.BadZipfile: File is not a zip file
Exception AttributeError: "'NpzFile' object has no attribute 'zip'" in <bound method NpzFile.__del__ of <numpy.lib.npyio.NpzFile object at 0x7fa7c0602650>> ignored

Looking forward to your kind hints.
Kind regards.
Marco
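
An .npz file is a zip archive, so a file left behind by an interrupted save can be detected before handing it to numpy.load. One plausible reading of the output above, which is an assumption and not something confirmed in this thread, is that the process was killed while writing and left tmp_data.npz truncated. The sketch below uses only the standard zipfile module; the helper name `npz_looks_complete` is hypothetical.

```python
import zipfile


def npz_looks_complete(path):
    """Return True if path is a readable zip archive whose members pass their CRC check."""
    # is_zipfile() returns False for missing files and for files
    # that lack the end-of-archive record, e.g. truncated ones.
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as archive:
        # testzip() returns the name of the first corrupt member, or None.
        return archive.testzip() is None
```

If this returns False for the generated file, the file has to be regenerated; loading it will keep failing with BadZipfile.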


kyunghyuncho commented on August 18, 2024

I have pushed generate.py to the LISA fork of GroundHog (https://github.com/lisa-groundhog/GroundHog). See the tutorials directory there.

Effectively, assuming that you have plain text files {path}/train, {path}/valid and {path}/test, you can generate a data file with

python generate.py --dest=data_chars --level=chars --oov-rate=5 --dtype=int64 {path}
python generate.py --dest=data_words --level=words --oov-rate=5 --dtype=int64 {path}

Obviously, afterward, you need to fix state['path'] and state['dictionary'] accordingly.
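
A minimal sketch of that last step follows. It assumes the output names follow the numpy.savez calls quoted earlier in this thread (numpy.savez appends ".npz" to a name that lacks it, and the dictionary is saved under dest + "_dict"), and that state['dictionary'] should point at the "_dict" file; the pushed version of generate.py may differ. The helper `data_paths` and the empty `state` dictionary are stand-ins for illustration only.

```python
import os


def data_paths(dest, directory="."):
    """Return the two file names the quoted numpy.savez calls would produce for dest."""
    return {
        "path": os.path.join(directory, dest + ".npz"),
        "dictionary": os.path.join(directory, dest + "_dict.npz"),
    }


state = {}  # stand-in for the state dictionary used in DT_RNN_Tut.py
state.update(data_paths("data_words"))  # matches --dest=data_words
```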
