Comments (3)
bookcorpus/download_files.py was cloned from this repository https://github.com/soskek/bookcorpus
from deeplearningexamples.
Yes, I found that, and I had more firewall/HTTP issues yesterday.
I tried again just now, and perhaps this now works intermittently,
but I get messages like:
Failed to open https://www.smashwords.com/books/download/490185/8/latest/0/0/existence.epub
HTTPError: HTTP Error 503: Service Temporarily Unavailable
Succeeded in opening https://www.smashwords.com/books/download/490185/8/latest/0/0/existence.epub
Trying just the bookcorpus download distributed here results in:
~/DeepLearningExamples/TensorFlow/LanguageModeling/BERT$ sudo bash scripts/data_download4.sh
================
== TensorFlow ==
NVIDIA Release 19.03 (build 5809531)
TensorFlow Version 1.13.1
Container image Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved.
Copyright 2017-2018 The TensorFlow Authors. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION. All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying project or file.
ERROR: Detected MOFED driver 3.0-1, but this container has version 4.4-1.0.0.
Unable to automatically upgrade this container.
Use of RDMA for multi-node communication will be unreliable.
NOTE: MOFED driver was detected, but nv_peer_mem driver was not detected.
Multi-node communication performance may be reduced.
0 files had already been saved in /workspace/bert/data/bookcorpus/download.
Failed to open https://www.smashwords.com/books/download/246580/6/latest/0/0/silence.txt
URLError: <urlopen error [Errno -3] Temporary failure in name resolution>
Failed to open https://www.smashwords.com/books/download/246580/6/latest/0/0/silence.txt
URLError: <urlopen error [Errno -3] Temporary failure in name resolution>
Gave up to open https://www.smashwords.com/books/download/246580/6/latest/0/0/silence.txt
local variable 'response' referenced before assignment
Failed to open https://www.smashwords.com/books/download/88690/6/latest/0/0/how-to-be-free.txt
URLError: <urlopen error [Errno -3] Temporary failure in name resolution>
Failed to open https://www.smashwords.com/books/download/88690/6/latest/0/0/how-to-be-free.txt
URLError: <urlopen error [Errno -3] Temporary failure in name resolution>
Gave up to open https://www.smashwords.com/books/download/88690/6/latest/0/0/how-to-be-free.txt
local variable 'response' referenced before assignment
Hi David,
The 503 error is likely due to server overload or maintenance and is unfortunately outside of our control. In my experience, retrying a couple of hours later usually works. On the upside, the downloader script for BookCorpus skips items that have already been downloaded, so repeated attempts only need to fetch the books that are still missing.
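For anyone who wants to automate the retries, here is a minimal sketch of retrying with an increasing delay. It is not the code from the BookCorpus downloader: `fetch_with_retry` and its parameters are hypothetical names chosen for illustration. It also returns `None` when every attempt fails, which avoids the kind of "referenced before assignment" error that can occur when a variable is only assigned on success.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retry(url, opener=urllib.request.urlopen, max_tries=5,
                     base_delay=1.0, sleep=time.sleep):
    """Return the response body as bytes, or None if every attempt failed.

    Hypothetical helper for illustration; not part of the downloader script.
    """
    for attempt in range(max_tries):
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.URLError as err:
            # HTTPError (e.g. 503) is a subclass of URLError, so both
            # HTTP failures and name-resolution failures land here.
            print(f"Failed to open {url}: {err}")
            if attempt < max_tries - 1:
                # Wait longer after each failure: 1x, 2x, 4x, ... base_delay.
                sleep(base_delay * 2 ** attempt)
    print(f"Gave up on {url}")
    return None
```

The `opener` and `sleep` parameters exist only so the function can be exercised without network access or real waiting. In normal use the defaults are enough.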
Docker can be configured to use the HTTP_PROXY and HTTPS_PROXY environment variables. These can be passed in manually by modifying the script here: add '-e HTTP_PROXY=your.httpproxyserver.com:optionalport' to the docker run command. Repeat the same step for HTTPS_PROXY.
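As a rough illustration of where the `-e` flags go, the sketch below builds a `docker run` argument list in Python. The image name `bert`, the command, and the port `3128` are placeholders I made up. The actual `docker run` line in the repository's script will differ, so treat this only as a picture of the flag placement.

```python
import shlex


def docker_run_args(image, command, http_proxy=None, https_proxy=None):
    """Build a docker run argument list with optional proxy variables.

    Hypothetical helper; the image and command are supplied by the caller.
    """
    args = ["docker", "run"]
    if http_proxy:
        args += ["-e", f"HTTP_PROXY={http_proxy}"]
    if https_proxy:
        args += ["-e", f"HTTPS_PROXY={https_proxy}"]
    args.append(image)  # options must come before the image name
    args += list(command)
    return args


# Placeholder values; substitute your own proxy host, port, image and script.
example = docker_run_args(
    "bert",
    ["bash", "scripts/data_download.sh"],
    http_proxy="your.httpproxyserver.com:3128",
    https_proxy="your.httpproxyserver.com:3128",
)
print(shlex.join(example))
```

The list form can be passed to `subprocess.run` directly, which avoids shell quoting problems if the proxy URL contains credentials.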
If you prefer to do this step outside of the container, you can clone the BookCorpus downloader repo on your host machine, then copy and modify this line of code. The resulting download directory can then be mounted into the container with the docker run command referenced above. Hopefully this helps. Please let us know if you continue to experience problems.
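If the host is also behind a proxy, Python's `urllib` can be pointed at it explicitly. The sketch below is an assumption about how one might do that on the host and is not taken from the downloader repo. `build_proxy_opener` is a hypothetical name, and the proxy address is a placeholder.

```python
import urllib.request


def build_proxy_opener(http_proxy=None, https_proxy=None):
    """Return a urllib opener that sends requests through the given proxies.

    Hypothetical helper. With no arguments, ProxyHandler() falls back to the
    proxy settings found in the environment (e.g. HTTP_PROXY / HTTPS_PROXY).
    """
    proxies = {}
    if http_proxy:
        proxies["http"] = http_proxy
    if https_proxy:
        proxies["https"] = https_proxy
    if proxies:
        handler = urllib.request.ProxyHandler(proxies)
    else:
        handler = urllib.request.ProxyHandler()
    return urllib.request.build_opener(handler)


# Placeholder address; replace with your own proxy before use.
opener = build_proxy_opener(
    http_proxy="http://your.httpproxyserver.com:3128",
    https_proxy="http://your.httpproxyserver.com:3128",
)
# opener.open(url) would then route the request through the proxy.
```

Calling `urllib.request.install_opener(opener)` makes later `urllib.request.urlopen` calls in the same process use these settings as well.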