Comments (3)

swethmandava commented on May 10, 2024

bookcorpus/download_files.py was cloned from this repository https://github.com/soskek/bookcorpus

from deeplearningexamples.

David-Levinthal commented on May 10, 2024

Yes, I found that, and I had more firewall/HTTP issues yesterday.
I tried again just now, and perhaps it now works intermittently,
but I get messages like:
Failed to open https://www.smashwords.com/books/download/490185/8/latest/0/0/existence.epub
HTTPError: HTTP Error 503: Service Temporarily Unavailable
Succeeded in opening https://www.smashwords.com/books/download/490185/8/latest/0/0/existence.epub

Trying just the BookCorpus download distributed here results in:
~/DeepLearningExamples/TensorFlow/LanguageModeling/BERT$ sudo bash scripts/data_download4.sh

================
== TensorFlow ==

NVIDIA Release 19.03 (build 5809531)
TensorFlow Version 1.13.1

Container image Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved.
Copyright 2017-2018 The TensorFlow Authors. All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION. All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying project or file.

ERROR: Detected MOFED driver 3.0-1, but this container has version 4.4-1.0.0.
Unable to automatically upgrade this container.
Use of RDMA for multi-node communication will be unreliable.

NOTE: MOFED driver was detected, but nv_peer_mem driver was not detected.
Multi-node communication performance may be reduced.

0 files had already been saved in /workspace/bert/data/bookcorpus/download.
Failed to open https://www.smashwords.com/books/download/246580/6/latest/0/0/silence.txt
URLError: <urlopen error [Errno -3] Temporary failure in name resolution>
Failed to open https://www.smashwords.com/books/download/246580/6/latest/0/0/silence.txt
URLError: <urlopen error [Errno -3] Temporary failure in name resolution>
Gave up to open https://www.smashwords.com/books/download/246580/6/latest/0/0/silence.txt
local variable 'response' referenced before assignment
Failed to open https://www.smashwords.com/books/download/88690/6/latest/0/0/how-to-be-free.txt
URLError: <urlopen error [Errno -3] Temporary failure in name resolution>
Failed to open https://www.smashwords.com/books/download/88690/6/latest/0/0/how-to-be-free.txt
URLError: <urlopen error [Errno -3] Temporary failure in name resolution>
Gave up to open https://www.smashwords.com/books/download/88690/6/latest/0/0/how-to-be-free.txt
local variable 'response' referenced before assignment

nvcforster commented on May 10, 2024

Hi David,

The 503 error is likely due to server overload or maintenance and is unfortunately outside of our control. In my experience, retrying a couple of hours later usually works. On the upside, the downloader script for BookCorpus skips items that have already been downloaded, so repeated attempts only need to fetch the books that are still missing.
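
For transient failures such as the 503 and the name-resolution errors in the log above, a retry with backoff can be sketched as follows. This is only an illustration and not the code in download_files.py: the function name open_with_retry and its parameters are hypothetical. It also initializes response before the loop, which is one plausible way to avoid the "local variable 'response' referenced before assignment" message that appears when every attempt fails.

```python
import time
import urllib.error
import urllib.request


def open_with_retry(url, opener=urllib.request.urlopen, attempts=4,
                    base_delay=1.0, sleep=time.sleep):
    """Open url, retrying transient errors; return None if every attempt fails.

    Hypothetical helper for illustration; not part of the BookCorpus downloader.
    """
    response = None  # defined up front so a total failure cannot hit an unbound name
    for attempt in range(attempts):
        try:
            response = opener(url)
            break  # success: stop retrying
        except urllib.error.HTTPError as err:
            # HTTPError subclasses URLError, so it must be caught first.
            if err.code < 500:
                raise  # 4xx errors are not transient; do not retry them
            print("Failed to open", url, "-", err)
        except urllib.error.URLError as err:
            # Covers DNS failures such as "Temporary failure in name resolution".
            print("Failed to open", url, "-", err.reason)
        if attempt < attempts - 1:
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
    return response
```

A caller would check the result for None before reading it, for example: response = open_with_retry(url), then skip the book if response is None.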

Docker can be configured to use the HTTP_PROXY and HTTPS_PROXY environment variables. You can pass them in manually by modifying the script here and adding '-e HTTP_PROXY=your.httpproxyserver.com:optionalport' to the docker run command. Repeat the same step with HTTPS_PROXY for https.
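
The URLError and urlopen messages in the log suggest that the downloader uses Python's urllib, which reads proxy settings from these environment variables by default. Assuming that is the case, a quick way to check from inside the container that the variables arrived is sketched below. The function name proxy_opener is hypothetical.

```python
import urllib.request


def proxy_opener():
    """Return the proxies urllib finds in the environment and an opener using them.

    Hypothetical helper for illustration; urllib.request.urlopen already
    applies the same environment lookup when no opener is installed.
    """
    # Collects variables named <scheme>_proxy (any case), keyed by lowercase scheme.
    proxies = urllib.request.getproxies_environment()
    handler = urllib.request.ProxyHandler(proxies)
    return proxies, urllib.request.build_opener(handler)
```

Printing the first return value should show 'http' and 'https' entries when the -e flags were passed to docker run; an empty dictionary means the variables did not reach the process.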

If you prefer to do this step outside of the container, clone the BookCorpus downloader repo on your host machine, then copy and modify this line of code. The resulting download directory can then be mounted in the docker run command referenced above. Hopefully this helps. Please let us know if you continue to experience problems.
