Comments (32)
EDIT: As of 2cc8795, this is the default method. There's no need to download the box explicitly; vagrant up --no-parallel will automatically perform all four steps.
(In the unlikely case you'd like to use the legacy method, you can do so with mv Vagrantfile.dockerhost.src Vagrantfile.dockerhost.)
Back to this for a second. I've set up an alternative way to run the examples of the book. It should be easier/faster, but it might not always have the latest and greatest versions of software, and it might stop working in the future (again due to software versioning)... or maybe not. So consider this version somewhat experimental.
Step 1. Download the image (2.8 GB) from http://scrapybook.com/scrapybook.box, e.g. wget http://scrapybook.com/scrapybook.box
Step 2. Install the Vagrant box: vagrant box add lookfwd/scrapybook scrapybook.box
Step 3. Start Vagrant with vagrant up --no-parallel
The huge benefit is that you download a single file, and you download it from the massively scalable Amazon S3. As soon as this is done (~20 min), you don't need anything else.
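Put together, the three steps above can be run as a single shell session (URLs and the box name are exactly as given in the post; adjust paths to taste):

```shell
# Step 1: download the prebuilt box (~2.8 GB) from S3
wget http://scrapybook.com/scrapybook.box

# Step 2: register the box with Vagrant under the name the Vagrantfile expects
vagrant box add lookfwd/scrapybook scrapybook.box

# Step 3: bring the machines up one at a time
vagrant up --no-parallel
```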
from scrapybook.
@lookfwd Awesome, as I said, you are the best! Well, I can write it again. After your work, I will keep it updated!
Tons of thanks!
How fast is your internet connection? It took about 10 minutes on my 100 Mbit/s connection.
This is described in detail in the "System setup and operations FAQ" chapter.
Thanks a lot @aiscy, that's my experience as well... most of the time :) @gfd6th I don't know if it's Docker Hub's, Vagrant's or Ubuntu's hosting, but in some places it might be quite a pain before you get a successful vagrant up. That's why I've written the very detailed "System setup and operations FAQ" chapter, but I would also like to try to get a Vagrant box hosted on the (much more reliable) S3. Unfortunately I can't do this before Monday, but I'll get back with an update soon.
@lookfwd @scalingexcellence
Thanks for all your replies. After a thousand tries, I fortunately managed to install Docker successfully.
I have trouble with vagrant up; it said:
The following ssh command responded with a non-zero exit status
apt-get update -qq -y
Stdout from the command:
Stderr from the command:
stdin: is not a tty
W: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/trusty-updates/main/source/Sources Hash sum mismatch
...
...
W: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/trusty-updates/main/i18n/Translation-en Hash sum mismatch
@MiaLiang, do you use the process described above with the http://scrapybook.com/scrapybook.box image, or the default one described in the book?
Hey Lookfwd (or Dimitrios),
first, really great work with your book.
I am learning scraping technologies (especially with Python) for a personal project.
I'm on Windows 7 and can't use Linux in the short term (I used it a few years ago). Unfortunately, my CPU does not support 64-bit virtualization (I used the software you suggest, and even downloaded the box from this page anyway, but of course when I run the environment it tells me that the CPU does not allow that). A solution would be to use the cloud you propose, but I did not understand how to access it, or whether something extra is needed. Please do not hesitate to tell me (I am not experienced with S3).
Thanks
Carmine
Hi,
I double-checked my CPU specs with Intel, and unfortunately it is not meant to support 64-bit virtualization. Could you please point me to a way to set up a similar system environment myself, but at 32-bit? That I can do on my hardware. As this is a study of Scrapy, the performance of the simulated systems doesn't really matter.
Thanks
Carmine
Hello @TCarmine. I see that you are a person passionate about computers, able to check the CPU specs and find all these system deficits. Don't you think that such a person deserves a better computer? Setting this book aside, how many more programs won't you be able to run, and how many hours will you have to spend working around your computer's deficits?
@TCarmine Hello, I think it's best you just rent a remote server from a web hosting company, where for a few dollars a month you can have your 64-bit shell. Just google a few of them and check them out. If you can't figure out how to log in to an SSH box you hire from a company, then I really think Scrapy isn't for you.
@lookfwd,
you are right. At the moment I wasn't thinking of changing machines. I don't know if I am passionate about computers, but I do sort stuff out (it pays more to be a dev than a medical engineer, which is what I studied a few years ago; for sure computers have more "standard behaviour" than human bodies). By the way, I completely understand what you mean.
Thanks a lot for your answer.
No, that's a solution I am considering. I don't know if scraping is for me; that's what I am trying to figure out, and whether it fits my "ideas". For sure anything can be learned. The main variable is always time, and then money, but the first is the more valuable.
Thanks for your suggestion.
I will show you a way to get started with your computer. You can easily start a 32-bit Ubuntu VM and run the examples, and everything will go fine for the first few chapters. I consciously chose to exclude this flow from the supported ones in the book, though, because it's a bit of a pain and limited, and for me what a book must provide is a system that is comfortable and easy... something that can be used as a starting point for further exploration.
Since you're in Germany, one of my favourite hosting companies is hetzner.de. They provide very affordable small VMs, they have great bandwidth, and they are very reliable. I have been hosting some stuff with them for about a decade now. Amazon and Rackspace are obviously highly recommended too.
I believe that opening a terminal might be easy, but installing software etc. might be non-trivial for a Linux beginner. So some extra guidance will be needed.
So, stay tuned and I will post something in a few hours.
@lookfwd,
that sounds awesome, I wasn't expecting so much; you are the best. Even if you end up having no time to do that, knowing it is feasible gives me another option for testing.
Thanks a lot
Carmine
@TCarmine - yes, it's great! I did some work on this yesterday. If you check out the updated git, there's a Vagrantfile.32 file. If you replace the default Vagrantfile with it (mv Vagrantfile.32 Vagrantfile) and then vagrant up, you will have a limited VM that is able to support all but the last chapter.
I guess you have limited memory as well, so Vagrant picks a small size (and number of cores) to begin with. You will likely want to go to the VM's settings and adjust them to as much as you have available on your system.
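For reference, the 32-bit flow just described might look like this as a shell session (assuming you are inside the already cloned scrapybook directory):

```shell
git pull                        # get the updated repo that contains Vagrantfile.32
mv Vagrantfile.32 Vagrantfile   # make the 32-bit Vagrantfile the active one
vagrant up                      # starts a limited 32-bit VM (all but the last chapter)
```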
Hey @TCarmine, did it work? :)
Hi,
Thanks for the book, I'm sure it will be good, but I'm having a bit of trouble getting started.
I haven't had any experience with vagrant before. I've cloned the code, but when I cd into scrapybook and run vagrant up --no-parallel I get:
Bringing machine 'web' up with 'docker' provider...
Bringing machine 'spark' up with 'docker' provider...
Bringing machine 'es' up with 'docker' provider...
Bringing machine 'redis' up with 'docker' provider...
Bringing machine 'mysql' up with 'docker' provider...
Bringing machine 'scrapyd1' up with 'docker' provider...
Bringing machine 'scrapyd2' up with 'docker' provider...
Bringing machine 'scrapyd3' up with 'docker' provider...
Bringing machine 'dev' up with 'docker' provider...
==> web: Creating the container...
web: Name: web
web: Image: scrapybook/web
web: Port: 9312:9312
This just hangs with no progress. My internet connection isn't the greatest, so maybe that's it. As I mentioned, I have no experience with this sort of thing, but I imagine that at this point the process is downloading images to create local copies on my machine? Is this correct, or am I off the mark? If so, is there a way to show progress or some output to see what's going on? I've tried the --provision flag but am getting no joy there. I'm using a 64-bit Ubuntu machine as the host.
Thanks in advance for your help, looking forward to working through the book.
Hello @sensorseverywhere, Ubuntu 64 - nice! So what happens there is that Vagrant cleverly uses the local Docker installation to run quickly. Indeed, it will try to download 2 GB of data, as shown in Appendix A, and this takes some time. If the internet connection is shaky, you might have to restart it a few times.
That said, a much easier and more stable way to go is to follow the "Some Reference Material" flow from here. This will bypass the local Docker installation and use VirtualBox (you will have to have VirtualBox installed on Linux, but that's an easy process). Essentially, all you have to do is these two steps:
export SCRAPYBOOK_FORCE_HOST_VM=TRUE
vagrant up --no-parallel
Alternatively, if you want to go with Docker, it will work, but you may have to restart a few times due to the shaky connection. To restart: Docker has two components, a) the client/Vagrant and b) the daemon. You stop the client easily with Ctrl+C, but in order to restart the daemon, you have to do sudo restart docker. This will shut it down and restart it. It's important to do this, because it's the daemon that does the downloading, and even if you restart vagrant up --no-parallel several times, the daemon might still be stuck trying to download partially loaded files from Docker Hub.
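On the Ubuntu 14.04 hosts of that era (Upstart init), the restart cycle described above would look roughly like this; on a systemd host the equivalent would be sudo systemctl restart docker:

```shell
# 1. Stop the Vagrant client in the foreground with Ctrl+C, then:
sudo restart docker        # restart the Docker daemon (Upstart syntax)

# 2. Resume; the daemon re-attempts any partially downloaded layers
vagrant up --no-parallel
```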
P.S. You can get Vagrant debug info like this: VAGRANT_LOG=info vagrant up --no-parallel. It might be too verbose or just OK. Overall, I hope you won't need it.
Hey lookfwd,
Thanks for that, I got it running on the VM with the info in the P.S. section above. It was good for me because I could see what was going on with the download. Now I'll be able to continue onward and upward, great stuff! Thanks again.
Fantastic, have fun!! :)
Excuse me, but I have a problem running the command "vagrant up --no-parallel".
The platform it was running on is Windows 10 (64-bit).
The error information is the following:
A Docker command executed by Vagrant didn't complete successfully!
The command run along with the output from the command is shown
below.
Command: ["docker", "ps", "-a", "-q", "--no-trunc", {:notify=>[:stdout, :stderr]}]
Stderr: An error occurred trying to connect: Get http://%2F%2F.%2Fpipe%2Fdocker_engine/v1.24/containers/json?all=1: open //./pipe/docker_engine: The system cannot find the file specified.
Hope to get your help @lookfwd
@ctenix The problem was that recent versions of Vagrant recognize the existence of Docker (Docker Engine) on Windows, and they try to use that instead of VirtualBox, i.e. they try to do exactly what they would do on a Linux host if they found Docker.
Now the real issue is that in order to start Docker on Windows you have to run 1-2 commands. But why go through that more complex flow? All I did was set the SCRAPYBOOK_FORCE_HOST_VM environment variable to bypass "docker detection" with SET SCRAPYBOOK_FORCE_HOST_VM=1, and then make the change permanent (see here). Thanks a lot for reporting this.
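For reference, on a Windows command prompt the session-only variant, and what I assume is meant by making it "permanent" (setx is my assumption; the linked instructions may differ), would be:

```shell
:: set for the current cmd session only
SET SCRAPYBOOK_FORCE_HOST_VM=1

:: persist for future sessions (setx takes effect in new windows, not the current one)
setx SCRAPYBOOK_FORCE_HOST_VM 1

vagrant up --no-parallel
```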
For everybody: if you also have this problem, please let me know with a mention etc., so that I can change the default behaviour of SCRAPYBOOK_FORCE_HOST_VM or something.
vagrant up --no-parallel does not work on Windows 10. In fact, trying to get Vagrant and VirtualBox to work has been nothing but a real pain.
Yes, it seems not to work. In fact, though, I'm not willing to support anything other than this flow up to 2025 or whenever, when the book is still going to be in print and those Scrapy versions won't even be available on the net apart from some archive or something. So no matter how much (temporary) pain it might be, it's the only reasonable way for me, and I believe for any author, to have only the "vagrant up --no-parallel doesn't work" problem, and not the thousands of "pip install scrapy doesn't find that dependency on that version of Ubuntu", "Scrapy doesn't install at all on Win 10" or "I can't install OpenSSL Win binaries because it's illegal in my country" problems. I hope this is clear to anyone joining this thread now and in the years to come. I trust Oracle (VirtualBox) and the quite large Vagrant community to provide a good and stable experience most of the time. Whenever something breaks, I will try to fix it to keep the book's code working. That's the big plan!
OK guys & girls, with commit af8040f those latest Win problems should be fixed. Please let me know if something doesn't work again.
I created this video that shows step-by-step how to install and run a few examples from the book.
https://www.youtube.com/watch?v=r84-dsIRFI8
Hello lookfwd,
Can you help me with this error?
==> spark: Creating the container...
spark: Name: spark
spark: Image: scrapybook/spark
spark: Volume: /var/lib/docker/docker_d0704418c2a02eab9405ec12acd5fbe6:/root/book
spark: Port: 68:68
spark: Port: 60000:60000
spark: Port: 60001:60001
spark: Port: 60002:60002
spark: Port: 60003:60003
spark: Port: 60004:60004
spark: Port: 60005:60005
spark: Port: 60006:60006
spark: Port: 60007:60007
spark: Port: 60008:60008
spark: Port: 60009:60009
A Docker command executed by Vagrant didn't complete successfully!
The command run along with the output from the command is shown
below.
Command: "docker" "run" "--name" "spark" "-d" "-p" "68:68" "-p" "60000:60000" "-p" "60001:60001" "-p" "60002:60002" "-p" "60003:60003" "-p" "60004:60004" "-p" "60005:60005" "-p" "60006:60006" "-p" "60007:60007" "-p" "60008:60008" "-p" "60009:60009" "-v" "/var/lib/docker/docker_d0704418c2a02eab9405ec12acd5fbe6:/root/book" "-h" "spark" "scrapybook/spark"
Stderr: docker: Error response from daemon: Conflict. The name "/spark" is already in use by container 9d8204b8afe84939e03400295688203b3d99a8f9e76c674e63fe8584e486ed64. You have to remove (or rename) that container to be able to reuse that name..
See 'docker run --help'.
Stdout:
I couldn't find the container above.
I used: docker container ls -a --filter status=exited --filter status=created
but nothing shows up.
Thank you very much
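A likely fix, assuming the stale container is the one named in the error message: remove it by name (or by the long ID from the message) and re-run Vagrant. Note that on Windows the containers live inside the Docker host VM that Vagrant starts, which is why they may not show up in a host-side docker listing:

```shell
# force-remove the container that holds the name "spark"
docker rm -f spark

# then bring the machines up again
vagrant up --no-parallel
```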
I stopped Docker and ran vagrant up --no-parallel again. It ran OK, but I can't run vagrant ssh; I think the container had been created before.
But that seems like the wrong way.
Tell me the right way.
Some trouble here:
C:\Users\Admin\scrapybook>vagrant ssh
The container hasn't been created yet.
C:\Users\Admin\scrapybook>vagrant reload
==> web: Docker host is required. One will be created if necessary...
web: Vagrant will now create or start a local VM to act as the Docker
web: host. You'll see the output of the vagrant up
for this VM below.
web:
docker-provider: Checking if box 'lookfwd/scrapybook' version '1.0.0' is up to date...
The Docker provider was able to bring up the host VM successfully
but the host VM is still reporting that SSH is unavailable. This
sometimes happens with certain providers due to bugs in the
underlying hypervisor, and can be fixed with a vagrant reload.
The ID for the host VM is shown below for convenience.
If this does not fix it, please verify that the host VM provider
is functional and properly configured.
Host VM ID: fa9284ce-ca40-40ca-8467-2459deba2d3d
C:\Users\Admin\scrapybook>ssh
usage: ssh [-46AaCfGgKkMNnqsTtVvXxYy] [-B bind_interface]
[-b bind_address] [-c cipher_spec] [-D [bind_address:]port]
[-E log_file] [-e escape_char] [-F configfile] [-I pkcs11]
[-i identity_file] [-J [user@]host[:port]] [-L address]
[-l login_name] [-m mac_spec] [-O ctl_cmd] [-o option] [-p port]
[-Q query_option] [-R address] [-S ctl_path] [-W host:port]
[-w local_tun[:remote_tun]] destination [command]
Oh sorry guys, I fixed all the issues, but one is left:
when I run vagrant ssh,
I don't understand what the password should be here.
I tried both the Docker and the admin password; everything fails here.
Can you explain it to me?
C:\Users\Admin\scrapybook>vagrant ssh
==> dev: SSH will be proxied through the Docker virtual machine since we're
==> dev: not running Docker natively. This is just a notice, and not an error.
[email protected]'s password:
OK guys, I have resolved all the issues above, but one is still left, about the password.
I tried both the admin and docker passwords, but what is wrong here?
Let me know what the password is:
C:\Users\Admin\scrapybook>vagrant ssh
==> dev: SSH will be proxied through the Docker virtual machine since we're
==> dev: not running Docker natively. This is just a notice, and not an error.
[email protected]'s password:
Hey, I answered my own question:
the default password is exactly "vagrant".
Thanks for your post.
I saw many nice things here.