Comments (32)

lookfwd avatar lookfwd commented on August 21, 2024 1

EDIT: As of 2cc8795, this is the default method. There's no need to download the box explicitly; vagrant up --no-parallel will automatically perform all of the steps below.

(In the unlikely case that you want to use the legacy method, you can do so with mv Vagrantfile.dockerhost.src Vagrantfile.dockerhost.)

Back to this for a second. I've set up an alternative way to run the book's examples. It should be easier and faster, but it might not always have the latest and greatest versions of the software, and it might stop working in the future (again due to software versioning)... or maybe not. So consider this version somewhat experimental.

Step 1. Download the image (2.8 GB) from http://scrapybook.com/scrapybook.box, e.g. wget http://scrapybook.com/scrapybook.box
Step 2. Install the Vagrant box: vagrant box add lookfwd/scrapybook scrapybook.box
Step 3. Start Vagrant with vagrant up --no-parallel

The huge benefit is that you download a single file, and you download it from the massively scalable Amazon S3. Once this is done (~20 min), you don't need anything else.
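
For convenience, here are the three steps above as a single terminal session (assuming wget and Vagrant are already installed):

wget http://scrapybook.com/scrapybook.box            # ~2.8 GB, single download from S3
vagrant box add lookfwd/scrapybook scrapybook.box    # register the box with Vagrant
vagrant up --no-parallel                             # start the VM from the local box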

from scrapybook.

TCarmine avatar TCarmine commented on August 21, 2024 1

@lookfwd Awesome!

Did I say you are the best? Well, I can say it again. I'll follow your work and keep myself updated!

Tons of Thanks!

from scrapybook.

aiscy avatar aiscy commented on August 21, 2024

How fast is your internet connection? It took about 10 minutes on my 100 Mbit/s connection.
This is described in detail in the "System setup and operations FAQ" chapter.

from scrapybook.

lookfwd avatar lookfwd commented on August 21, 2024

Thanks a lot @aiscy, that's my experience as well... most of the time :) @gfd6th I don't know if it's Docker Hub's, Vagrant's or Ubuntu's hosting, but in some places it might be quite a pain before you get a successful vagrant up. That's why I've written the very detailed "System setup and operations FAQ" chapter, but I would also like to try to get a Vagrant box hosted on the (much more reliable) S3. Unfortunately I can't do this before Monday, but I'll get back with an update soon.

from scrapybook.

0xuhe avatar 0xuhe commented on August 21, 2024

@lookfwd @scalingexcellence
Thanks for all your replies. After a thousand tries, I fortunately managed to install Docker successfully.

from scrapybook.

MiaLiang avatar MiaLiang commented on August 21, 2024

I'm having trouble with vagrant up; it said:
The following ssh command responded with a non-zero exit status
apt-get update -qq -y
Stdout from the command:

Stderr from the command:

stdin: is not a tty
W:failed to fetch http://archive.ubuntu.com/ubuntu/dists/trusty-updates/main/source/Sources Hash sum mismatch
...
...
W:failed to fetch http://archive.ubuntu.com/ubuntu/dists/trusty-updates/main/i18n/Translation-en Hash sum mismatch

from scrapybook.

lookfwd avatar lookfwd commented on August 21, 2024

@MiaLiang, are you using the process described above with the http://scrapybook.com/scrapybook.box image, or the default one described in the book?

from scrapybook.

TCarmine avatar TCarmine commented on August 21, 2024

Hey lookfwd (or Dimitrios),

first of all, really great work on your book.

I am learning scraping technologies (especially with Python) for a personal project.
I'm on Windows 7 and can't switch to Linux in the short term (I used it a few years ago). Unfortunately, my CPU does not support 64-bit virtualization (I used the software you suggest and even downloaded the box from this page anyway, but of course when I run the environment it tells me that the CPU does not allow it). A solution would be to use the cloud setup you propose, but I did not understand how to access it or whether I need something extra; please do not hesitate to tell me (I am not experienced with S3).

Thanks
Carmine

from scrapybook.

TCarmine avatar TCarmine commented on August 21, 2024

Hi,

I double-checked my CPU specs on Intel's site and unfortunately it really is not meant to support 64-bit virtualization. Could you please point me to a way to set up a similar system environment myself, but 32-bit, that I can run on my hardware? As this is a study of Scrapy, the performance of all the simulated systems does not really matter.

Thanks
Carmine

from scrapybook.

lookfwd avatar lookfwd commented on August 21, 2024

Hello @TCarmine. I see that you are a person passionate about computers, able to check CPU specs and find all these system deficits. Don't you think that such a person deserves a better computer? Setting this book aside, how many more programs won't you be able to run, and how many hours will you have to spend working around your computer's deficits?

from scrapybook.

yssoe avatar yssoe commented on August 21, 2024

@TCarmine Hello, I think it's best you just rent a remote server from a web hosting company, where for a few dollars a month you can have your 64-bit shell. Just google a few of them and check them out. If you can't figure out how to log in to an SSH box you hire from a company, then I really think Scrapy isn't for you.

from scrapybook.

TCarmine avatar TCarmine commented on August 21, 2024

@lookfwd ,

You are right. At the moment I wasn't thinking of changing machines. I don't know if I'm passionate about computers, but I do like sorting things out (it also pays off more to be a dev than a medical engineer, which is what I studied a few years ago, and computers certainly have more "standard behaviour" than the human body). Anyway, I completely understand what you mean.

Thanks a lot for your answer.

from scrapybook.

TCarmine avatar TCarmine commented on August 21, 2024

@yssoe,

that's a solution I am considering. I don't know if scraping is for me; that's what I'm trying to figure out, and whether it fits my "ideas". For sure anything can be learned. The main variable is always time, and then money, but the first is the more valuable.

Thanks for your suggestion.

from scrapybook.

lookfwd avatar lookfwd commented on August 21, 2024

I will show you a way to get started with your computer. You can easily start a 32-bit Ubuntu VM, run the examples, and everything will go fine for the first few chapters. I consciously chose to exclude this flow from the ones supported in the book, though, because it's a bit of a pain and limited; for me, what a book must provide is a system that is comfortable and easy... something that can be used as a starting point for further exploration.

Since you're in Germany, one of my favourite hosting companies is hetzner.de. They provide very affordable small VMs, they have great bandwidth, and they are very reliable. I've been hosting some stuff with them for about a decade now. Amazon and Rackspace are obviously highly recommended too.

I believe that opening a terminal might be easy, but installing software etc. might be non-trivial for a Linux beginner. So some extra guidance will be needed.

So, stay tuned and I will post something in a few hours.

from scrapybook.

TCarmine avatar TCarmine commented on August 21, 2024

@lookfwd,
that sounds awesome. I wasn't expecting so much; you are the best. Even if you end up having no time to do that, knowing it is feasible gives me another option for testing.

Thanks a lot
Carmine

from scrapybook.

lookfwd avatar lookfwd commented on August 21, 2024

@TCarmine - yes, it's great! I did some work on this yesterday. If you check out the updated git repo, there's a Vagrantfile.32 file. If you replace the default Vagrantfile with it (mv Vagrantfile.32 Vagrantfile) and then run vagrant up, you will have a limited VM that is able to support all but the last chapter.

I guess you have limited memory as well, so Vagrant picks a small memory size (and number of cores) to begin with. You will likely want to go to the VM's settings and adjust them to as much as you have available on your system.
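
As a rough sketch of that flow from inside the cloned scrapybook directory (the git pull line is my assumption for "checking out the updated git"):

git pull                       # fetch the updated repo that contains Vagrantfile.32 (assumed step)
mv Vagrantfile.32 Vagrantfile  # use the 32-bit Vagrantfile instead of the default one
vagrant up                     # brings up the limited 32-bit VM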

from scrapybook.

lookfwd avatar lookfwd commented on August 21, 2024

Hey @TCarmine, did it work? :)

from scrapybook.

sensorseverywhere avatar sensorseverywhere commented on August 21, 2024

Hi,
Thanks for the book, I'm sure it will be good, but I'm having a bit of trouble getting started.
I haven't had any experience with vagrant before. I've cloned the code, but when I cd into scrapybook and run vagrant up --no-parallel I get:

Bringing machine 'web' up with 'docker' provider...
Bringing machine 'spark' up with 'docker' provider...
Bringing machine 'es' up with 'docker' provider...
Bringing machine 'redis' up with 'docker' provider...
Bringing machine 'mysql' up with 'docker' provider...
Bringing machine 'scrapyd1' up with 'docker' provider...
Bringing machine 'scrapyd2' up with 'docker' provider...
Bringing machine 'scrapyd3' up with 'docker' provider...
Bringing machine 'dev' up with 'docker' provider...
==> web: Creating the container...
web: Name: web
web: Image: scrapybook/web
web: Port: 9312:9312

This just hangs with no progress. My internet connection isn't the greatest, so maybe that's it. As I mentioned, I have no experience with this sort of thing, but I imagine that at this point the process is downloading images to create local copies on my machine? Is this correct, or am I off the mark? If so, is there a way to show progress or some output to see what's going on? I've tried the --provision flag but I'm getting no joy there. I'm using an Ubuntu 64-bit machine as the host.
Thanks in advance for your help, looking forward to working through the book.

from scrapybook.

lookfwd avatar lookfwd commented on August 21, 2024

Hello @sensorseverywhere, Ubuntu 64 - nice! So what happens there is that Vagrant cleverly uses the local Docker installation to run quickly. Indeed, it will try to download 2 GB of data, as shown in Appendix A, and this takes some time. If the internet connection is shaky, you might have to restart it a few times.

That said, a much easier and more stable way to go is to follow the "Some Reference Material" flow from here. This will bypass the local Docker installation and use VirtualBox instead (you will have to have VirtualBox installed on Linux, but that's an easy process). Essentially all you have to do is these two steps:

export SCRAPYBOOK_FORCE_HOST_VM=TRUE
vagrant up --no-parallel

Alternatively, if you want to go with Docker, it will work, but you may have to restart a few times due to the shaky connection. To restart: Docker has two components, a) the client/Vagrant and b) the daemon. You stop the client easily with Ctrl+C, but in order to restart the daemon you have to do sudo restart docker. This will shut it down and restart it. It's important to do this because it's the daemon that does the downloading, and even if you rerun vagrant up --no-parallel several times, the daemon might still be stuck trying to download partially loaded files from Docker Hub.
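
To put that recovery loop in one place (a sketch for the Ubuntu host discussed here, where the daemon is restarted with Upstart's restart command as mentioned above):

# 1. Stop the client with Ctrl+C in the terminal running vagrant
# 2. Restart the Docker daemon so it drops any stuck, partially downloaded layers
sudo restart docker
# 3. Retry
vagrant up --no-parallel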

P.S. You can get vagrant debug info like this: VAGRANT_LOG=info vagrant up --no-parallel. It might be too verbose or just ok. Overall I hope you won't need it.

from scrapybook.

sensorseverywhere avatar sensorseverywhere commented on August 21, 2024

Hey lookfwd,
Thanks for that, I got it running on the VM with the info in the P.S. section above. It was good for me because I could see what was going on with regard to the download. Now I'll be able to continue onward and upward, great stuff! Thanks again.

from scrapybook.

lookfwd avatar lookfwd commented on August 21, 2024

Fantastic, have fun!! :)

from scrapybook.

ctenix avatar ctenix commented on August 21, 2024

Excuse me, but I have a problem running the command "vagrant up --no-parallel".
The platform it was running on is Windows 10 (64-bit).
The error information is as follows:

A Docker command executed by Vagrant didn't complete successfully!
The command run along with the output from the command is shown
below.

Command: ["docker", "ps", "-a", "-q", "--no-trunc", {:notify=>[:stdout, :stderr]}]

Stderr: An error occurred trying to connect: Get http://%2F%2F.%2Fpipe%2Fdocker_engine/v1.24/containers/json?all=1: open //./pipe/docker_engine: The system cannot find the file specified.

Hope to get your help @lookfwd

from scrapybook.

lookfwd avatar lookfwd commented on August 21, 2024

@ctenix The problem is that a recent version of Vagrant recognizes the existence of Docker (Docker Engine) on Windows and tries to use it instead of VirtualBox, i.e. it tries to do exactly the same thing it would do on a Linux host if it found Docker.

Now, the real issue is that in order to start Docker on Windows you have to run one or two commands. But why bother with that more complex flow? All I did was set the SCRAPYBOOK_FORCE_HOST_VM environment variable to bypass the "docker detection" with SET SCRAPYBOOK_FORCE_HOST_VM=1, and then make the change permanent (see here). Thanks a lot for reporting this.
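
As a minimal sketch on a Windows command prompt (the setx line is my assumption for what "made the change permanent" refers to; the rest comes from the comment above):

REM set the variable for the current session
SET SCRAPYBOOK_FORCE_HOST_VM=1
REM persist it for future sessions (assumed approach)
setx SCRAPYBOOK_FORCE_HOST_VM 1
vagrant up --no-parallel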

For everybody: if you also have this problem, please let me know with a mention etc., so that I can consider changing the default behaviour of SCRAPYBOOK_FORCE_HOST_VM or something similar.

from scrapybook.

 avatar commented on August 21, 2024

vagrant up --no-parallel does not work on Windows 10. In fact, trying to get Vagrant and VirtualBox to work has been nothing but a real pain.

from scrapybook.

lookfwd avatar lookfwd commented on August 21, 2024

Yes, it seems not to work. In fact, though, I'm not willing to support anything other than this flow up to 2025 or whatever, when the book will still be in print and those Scrapy versions won't even be available on the net apart from some archive or something. So no matter how much (temporary) pain it might be, it's the only reasonable way for me (and, I believe, any author) to have only the "vagrant up --no-parallel doesn't work" problem, and not the thousands of "pip install scrapy doesn't find that dependency on that version of Ubuntu", "Scrapy doesn't install at all on Win 10" or "I can't install the OpenSSL Win binaries because it's illegal in my country" problems. I hope this is clear to anyone joining this thread now and in the years to come. I trust Oracle (VirtualBox) and the quite large Vagrant community to provide a good and stable experience most of the time. Whenever something breaks, I will try to fix it to keep the book's code working. That's the big plan!

from scrapybook.

lookfwd avatar lookfwd commented on August 21, 2024

OK guys & girls, with commit af8040f those latest Windows problems should be fixed. Please let me know if something still doesn't work.

from scrapybook.

lookfwd avatar lookfwd commented on August 21, 2024

I created this video that shows step-by-step how to install and run a few examples from the book.

https://www.youtube.com/watch?v=r84-dsIRFI8

from scrapybook.

duydp avatar duydp commented on August 21, 2024

Hello lookfwd,
Can you help me with this error?

==> spark: Creating the container...
spark: Name: spark
spark: Image: scrapybook/spark
spark: Volume: /var/lib/docker/docker_d0704418c2a02eab9405ec12acd5fbe6:/root/book
spark: Port: 68:68
spark: Port: 60000:60000
spark: Port: 60001:60001
spark: Port: 60002:60002
spark: Port: 60003:60003
spark: Port: 60004:60004
spark: Port: 60005:60005
spark: Port: 60006:60006
spark: Port: 60007:60007
spark: Port: 60008:60008
spark: Port: 60009:60009
A Docker command executed by Vagrant didn't complete successfully!
The command run along with the output from the command is shown
below.

Command: "docker" "run" "--name" "spark" "-d" "-p" "68:68" "-p" "60000:60000" "-p" "60001:60001" "-p" "60002:60002" "-p" "60003:60003" "-p" "60004:60004" "-p" "60005:60005" "-p" "60006:60006" "-p" "60007:60007" "-p" "60008:60008" "-p" "60009:60009" "-v" "/var/lib/docker/docker_d0704418c2a02eab9405ec12acd5fbe6:/root/book" "-h" "spark" "scrapybook/spark"

Stderr: docker: Error response from daemon: Conflict. The name "/spark" is already in use by container 9d8204b8afe84939e03400295688203b3d99a8f9e76c674e63fe8584e486ed64. You have to remove (or rename) that container to be able to reuse that name..
See 'docker run --help'.

Stdout:

I couldn't find the container mentioned above.
I used: docker container ls -a --filter status=exited --filter status=created
but there was nothing there.

Thank you very much.
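
For anyone hitting the same conflict, a minimal sketch of what the error message itself suggests: remove the container by the name it complains about, then retry (adjust the name if yours differs):

docker rm -f spark          # remove the conflicting container so the name can be reused
vagrant up --no-parallel    # then bring the machines up again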

from scrapybook.

duydp avatar duydp commented on August 21, 2024

I stopped Docker and ran vagrant up --no-parallel again. It ran OK, but I can't run ssh; I think the container had already been created before.
But that seems like the wrong way.
Please tell me the right way.

Some trouble here:
C:\Users\Admin\scrapybook>vagrant ssh
The container hasn't been created yet.

C:\Users\Admin\scrapybook>vagrant reload
==> web: Docker host is required. One will be created if necessary...
web: Vagrant will now create or start a local VM to act as the Docker
web: host. You'll see the output of the vagrant up for this VM below.
web:
docker-provider: Checking if box 'lookfwd/scrapybook' version '1.0.0' is up to date...
The Docker provider was able to bring up the host VM successfully
but the host VM is still reporting that SSH is unavailable. This
sometimes happens with certain providers due to bugs in the
underlying hypervisor, and can be fixed with a vagrant reload.
The ID for the host VM is shown below for convenience.

If this does not fix it, please verify that the host VM provider
is functional and properly configured.

Host VM ID: fa9284ce-ca40-40ca-8467-2459deba2d3d

C:\Users\Admin\scrapybook>ssh
usage: ssh [-46AaCfGgKkMNnqsTtVvXxYy] [-B bind_interface]
[-b bind_address] [-c cipher_spec] [-D [bind_address:]port]
[-E log_file] [-e escape_char] [-F configfile] [-I pkcs11]
[-i identity_file] [-J [user@]host[:port]] [-L address]
[-l login_name] [-m mac_spec] [-O ctl_cmd] [-o option] [-p port]
[-Q query_option] [-R address] [-S ctl_path] [-W host:port]
[-w local_tun[:remote_tun]] destination [command]

from scrapybook.

duydp avatar duydp commented on August 21, 2024

Oh sorry guys, I fixed all the issues, but one is left:
when I run vagrant ssh,
I don't understand what password I need here.
The Docker password or the admin password? Everything I try is wrong.

Can you explain it to me?

C:\Users\Admin\scrapybook>vagrant ssh
==> dev: SSH will be proxied through the Docker virtual machine since we're
==> dev: not running Docker natively. This is just a notice, and not an error.
[email protected]'s password:

from scrapybook.

duydp avatar duydp commented on August 21, 2024

OK guys, I have resolved all the issues above, but one is still left, about the password.
I tried both the admin and Docker passwords, but something is wrong here.
Let me know: what's the password here?
C:\Users\Admin\scrapybook>vagrant ssh
==> dev: SSH will be proxied through the Docker virtual machine since we're
==> dev: not running Docker natively. This is just a notice, and not an error.
[email protected]'s password:

from scrapybook.

duydp avatar duydp commented on August 21, 2024

Hey, my own question, my own answer:
the default password is exactly "vagrant".
Thanks for your post.
I saw many nice things here.
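
In other words, for anyone hitting the same prompt, the session above completes with Vagrant's default credentials:

vagrant ssh
# at the password prompt, type: vagrant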

from scrapybook.
