scrapybook's Introduction

Learning Scrapy Book

This book covers the long-awaited Scrapy v1.0, which lets you extract useful data from virtually any source with very little effort. It starts by explaining the fundamentals of the Scrapy framework, followed by a thorough description of how to extract data from any source, clean it up, and shape it to your requirements using Python and third-party APIs. Next, you will become familiar with storing the scraped data in databases as well as search engines, and with performing real-time analytics on it with Spark Streaming. By the end of this book, you will be able to scrape data for your applications with ease.

This book is now available on Amazon and Packt.

What you will learn

  • Understand HTML pages and write XPath to extract the data you need
  • Write Scrapy spiders with simple Python and do web crawls
  • Push your data into any database, search engine or analytics system
  • Configure your spider to download files, images and use proxies
  • Create efficient pipelines that shape data in precisely the form you want
  • Use Twisted Asynchronous API to process hundreds of items concurrently
  • Make your crawler super-fast by learning how to tune Scrapy's performance
  • Perform large scale distributed crawls with scrapyd and scrapinghub

Tutorials

  • How to Set Up Software and Run Examples On A Windows Machine
  • Chapter 4 - Create Appery.io mobile application - Updated process
  • Chapter 3 & 9 on a 32-bit VM (for computers with limited memory/processing power)

To use Docker directly without installing Vagrant

A docker-compose.yml file is included, mainly for those who already have Docker installed. For completeness, here are the links to go about installing Docker.

Once you have Docker installed and started, change to the project directory and run:

  1. docker-compose pull - To check for updated images
  2. docker-compose up - Will scroll log messages as various containers (virtual machines) start up. To stop the containers, Ctrl-C in this window, or enter docker-compose down in another shell window.

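When you want to recover space, docker system prune deletes system-wide Docker data that is not in use, such as stopped containers and dangling images. On recent Docker versions, unused volumes are only removed when the --volumes flag is added.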

See also the official website

scrapybook's People

Contributors

lookfwd, normangilmore, polar9527, scalingexcellence

scrapybook's Issues

Learning Scrapy - How to get shared folders working between host (Mac OSX) and guest OS (Linux)?

Hey all, sorry for the newbie question. I am working through Dimitrios Kouzis-Loukas' book Learning Scrapy.

In that book, he outlines how to set up a dev environment using Vagrant and Docker. I am having trouble getting the Shared Folders working so that I can edit files in the dev environment using the text editor on my host machine.

I have found a few write-ups, but have not had success. I think that I need to install Guest Additions on the guest OS. How do I do this on the Linux guest OS?

http://helpdeskgeek.com/virtualization/virtualbox-share-folder-host-guest/

Thank you,

Matt

the host VM is still reporting that SSH is unavailable

Sorry for opening a new issue. I ran into some additional errors in the install, but I solved those. I am getting a new error now, and I am at a total loss with this. I did a Google search, but I could not find any useful information about it.

While solving a seemingly unrelated issue, I had to make some changes to the Vagrantfile.dockerhost file. I don't expect them to have caused any errors, especially not the one I am getting now, but I am including them in case they are relevant. The reason for the changes is that my CPU does not support virtualization technology. VirtualBox expects virtualization technology to be on by default, so it had to be explicitly turned off for the VM. My machine also has only 4GB of RAM. Here are the changes:

# Set the mem/cpu requirements
config.vm.provider :virtualbox do |vb|
    vb.memory = 1024
    vb.cpus = 1
            ...
    vb.customize ["modifyvm", :id, "--hwvirtex", "off"]

The terminal session showing the error:
"
~/$ sudo vagrant destroy
~/$ sudo vagrant up --no-parallel
....
docker-provider: SSH address: 127.0.0.1:2222
docker-provider: SSH username: vagrant
docker-provider: SSH auth method: private key
Timed out while waiting for the machine to boot. This means that
Vagrant was unable to communicate with the guest machine within
the configured ("config.vm.boot_timeout" value) time period.

If you look above, you should be able to see the error(s) that
Vagrant had when attempting to connect to the machine. These errors
are usually good hints as to what may be wrong.

If you're using a custom box, make sure that networking is properly
working and you're able to connect to the machine. It is a common
problem that networking isn't setup properly in these boxes.
Verify that authentication configurations are also setup properly,
as well.

If the box appears to be booting properly, you may want to increase
the timeout ("config.vm.boot_timeout") value.
~/$
~/$ sudo vagrant up --no-parallel
...
web:
docker-provider: Checking if box 'lookfwd/scrapybook' is up to date...
The Docker provider was able to bring up the host VM successfully
but the host VM is still reporting that SSH is unavailable. This
sometimes happens with certain providers due to bugs in the
underlying hypervisor, and can be fixed with a vagrant reload.
The ID for the host VM is shown below for convenience.

If this does not fix it, please verify that the host VM provider
is functional and properly configured.

Host VM ID: 581b3b42-9013-4e84-ae9f-6a8a48b67905
"

Oh, and I did attempt ~/$ vagrant reload , but it just results in the same timeout error. I am not sure how to verify that the host VM provider is functional and properly configured. Doesn't the vagrantfile handle that?
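One way to narrow this down is to check whether anything answers on the forwarded SSH address that Vagrant printed (127.0.0.1:2222 in the log above). The following is a minimal Python sketch using only the standard library. The function name port_accepts_connections is made up for illustration, and a successful TCP connection only suggests that the port forward is alive; it does not prove that SSH login works.

```python
import socket


def port_accepts_connections(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection raises an OSError subclass on refusal or timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling port_accepts_connections("127.0.0.1", 2222) while the host VM is supposedly running gives a quick yes/no answer. If it returns False, the VM probably never finished booting, which would be consistent with hardware virtualization being disabled.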

Can't access the /book files in the dev machine

Hi,

I'm currently on Windows 10 and I've set up the environment.

I run the vagrant up --no-parallel command and get this:

C:\scrapybook>vagrant up --no-parallel
Bringing machine 'web' up with 'docker' provider...
Bringing machine 'spark' up with 'docker' provider...
Bringing machine 'es' up with 'docker' provider...
Bringing machine 'redis' up with 'docker' provider...
Bringing machine 'mysql' up with 'docker' provider...
Bringing machine 'scrapyd1' up with 'docker' provider...
Bringing machine 'scrapyd2' up with 'docker' provider...
Bringing machine 'scrapyd3' up with 'docker' provider...
Bringing machine 'dev' up with 'docker' provider...
==> web: Docker host is required. One will be created if necessary...
web: Docker host VM is already ready.
==> web: Starting container...
==> web: Provisioners will not be run since container doesn't support SSH.
==> spark: Docker host is required. One will be created if necessary...
spark: Docker host VM is already ready.
==> spark: Vagrant has noticed that the synced folder definitions have changed.
==> spark: With Docker, these synced folder changes won't take effect until you
==> spark: destroy the container and recreate it.
==> spark: Starting container...
==> spark: Provisioners will not be run since container doesn't support SSH.
==> es: Docker host is required. One will be created if necessary...
es: Docker host VM is already ready.
==> es: Starting container...
==> es: Provisioners will not be run since container doesn't support SSH.
==> redis: Docker host is required. One will be created if necessary...
redis: Docker host VM is already ready.
==> redis: Starting container...
==> redis: Provisioners will not be run since container doesn't support SSH.
==> mysql: Docker host is required. One will be created if necessary...
mysql: Docker host VM is already ready.
==> mysql: Starting container...
==> mysql: Provisioners will not be run since container doesn't support SSH.
==> scrapyd1: Docker host is required. One will be created if necessary...
scrapyd1: Docker host VM is already ready.
==> scrapyd1: Starting container...
==> scrapyd1: Provisioners will not be run since container doesn't support SSH.
==> scrapyd2: Docker host is required. One will be created if necessary...
scrapyd2: Docker host VM is already ready.
==> scrapyd2: Starting container...
==> scrapyd2: Provisioners will not be run since container doesn't support SSH.
==> scrapyd3: Docker host is required. One will be created if necessary...
scrapyd3: Docker host VM is already ready.
==> scrapyd3: Starting container...
==> scrapyd3: Provisioners will not be run since container doesn't support SSH.
==> dev: Docker host is required. One will be created if necessary...
dev: Docker host VM is already ready.
==> dev: Vagrant has noticed that the synced folder definitions have changed.
==> dev: With Docker, these synced folder changes won't take effect until you
==> dev: destroy the container and recreate it.
==> dev: Starting container...
==> dev: Provisioners will not be run since container doesn't support SSH.

Then I run the vagrant ssh command and go to the book directory, and it's empty...

root@dev:~# ls
book
root@dev:~# cd book/
root@dev:~/book# ls

Here's a pic of the scrapybook directory

image

How can I solve this?

Question: Self hosted Scrapinghub?

I really like your book, but I have a question: is there any way to self-host Scrapinghub? A lot of people have their own infrastructure, so it would be nice to use it. Could you recommend a free and open-source Scrapy management tool with a web UI?

gpg: could not parse keyserver URL

I am trying to install Docker, but the command below doesn't work:

$ sudo apt-key adv --keyserver hkp://p80.pool.sks-keyservers.net:80—recv-keys 58118E89F3A912897C070ADBF76221572C52609D
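One thing worth checking: in the command as pasted above, the character before recv-keys appears to be a single typographic dash fused to the port number, rather than a space followed by two hyphens. If the shell received it that way, gpg would read it as part of the keyserver URL, which would match the "could not parse keyserver URL" message. The sketch below uses only the Python standard library to flag such characters in a pasted command. The helper names are hypothetical, and the repair rule (replace the dash with a space and two hyphens) is an assumption about what was originally intended.

```python
# The command as it appears in the report; "\u2014" is the typographic dash.
PASTED = (
    "sudo apt-key adv --keyserver hkp://p80.pool.sks-keyservers.net:80"
    "\u2014recv-keys 58118E89F3A912897C070ADBF76221572C52609D"
)

# Characters that word processors and web pages often substitute for "--".
TYPOGRAPHIC_DASHES = {"\u2013": "en dash", "\u2014": "em dash"}


def find_typographic_dashes(command):
    """Return (index, name) for every typographic dash in the command."""
    return [
        (index, TYPOGRAPHIC_DASHES[char])
        for index, char in enumerate(command)
        if char in TYPOGRAPHIC_DASHES
    ]


def repair_em_dashes(command):
    """Assumed repair: an em dash stood for a space plus two hyphens."""
    return command.replace("\u2014", " --")
```

If find_typographic_dashes reports a hit, retyping the option by hand in the terminal is likely safer than pasting it again.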

docker-provider: Booting VM error

After updating Windows 10 (Version 1703, x64) last October, my vagrant up --no-parallel gives the following error:
"...
docker-provider: Booting VM...
There was an error while executing VBoxManage, a CLI used by Vagrant
for controlling VirtualBox. The command and stderr is shown below.

Command: ["startvm", "cd8f2d95-2d3c-44d5-a57f-16c594a357f5", "--type", "headless"]

Stderr: VBoxManage.exe: error: The virtual machine 'docker-provider' has terminated unexpectedly during startup with exit code -1073741819 (0xc0000005)
VBoxManage.exe: error: Details: code E_FAIL (0x80004005), component MachineWrap, interface IMachine"

Could you please help me to fix that issue?

problem in example page 46 (populating an item)

Hi,
Could you please help with this?
I followed the example on page 46 step by step, but I got the following output, which is not the same as in the book's example:

root@dev:~/book/MasoudProject/properties# scrapy crawl basic
2018-02-04 14:40:25 [scrapy] INFO: Scrapy 1.0.3 started (bot: properties)
2018-02-04 14:40:25 [scrapy] INFO: Optional features available: ssl, http11, boto
2018-02-04 14:40:25 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'properties.spiders', 'SPIDER_MODULES': ['properties.spiders'], 'BOT_NAME': 'properties'}
2018-02-04 14:40:25 [scrapy] INFO: Enabled extensions: CloseSpider, TelnetConsole, LogStats, CoreStats, SpiderState
2018-02-04 14:40:25 [boto] DEBUG: Retrieving credentials from metadata server.
2018-02-04 14:40:25 [boto] ERROR: Caught exception reading instance data
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/boto/utils.py", line 210, in retry_url
r = opener.open(req, timeout=timeout)
File "/usr/lib/python2.7/urllib2.py", line 404, in open
response = self._open(req, data)
File "/usr/lib/python2.7/urllib2.py", line 422, in _open
'_open', req)
File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 1214, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/usr/lib/python2.7/urllib2.py", line 1184, in do_open
raise URLError(err)
URLError: <urlopen error [Errno 101] Network is unreachable>
2018-02-04 14:40:25 [boto] ERROR: Unable to read instance data, giving up
2018-02-04 14:40:25 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2018-02-04 14:40:25 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2018-02-04 14:40:25 [scrapy] INFO: Enabled item pipelines:
2018-02-04 14:40:25 [scrapy] INFO: Spider opened
2018-02-04 14:40:25 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-02-04 14:40:25 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2018-02-04 14:40:25 [scrapy] DEBUG: Crawled (200) <GET http://web:9312/properties/property_000000.html> (referer: None)
2018-02-04 14:40:25 [scrapy] ERROR: Spider error processing <GET http://web:9312/properties/property_000000.html> (referer: None)
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 588, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "/root/book/MasoudProject/properties/properties/spiders/basic.py", line 38, in parse
item['address'] = response.xpath('//*[@itemtype="http://schema.org/''Place"][1]/text()').extract()
File "/usr/local/lib/python2.7/dist-packages/scrapy/item.py", line 63, in __setitem__
(self.__class__.__name__, key))
KeyError: 'PropertiesItem does not support field: address'
2018-02-04 14:40:25 [scrapy] INFO: Closing spider (finished)
2018-02-04 14:40:25 [scrapy] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 232,
'downloader/request_count': 1,
'downloader/request_method_count/GET': 1,
'downloader/response_bytes': 792,
'downloader/response_count': 1,
'downloader/response_status_count/200': 1,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2018, 2, 4, 14, 40, 25, 736406),
'log_count/DEBUG': 3,
'log_count/ERROR': 3,
'log_count/INFO': 7,
'response_received_count': 1,
'scheduler/dequeued': 1,
'scheduler/dequeued/memory': 1,
'scheduler/enqueued': 1,
'scheduler/enqueued/memory': 1,
'spider_exceptions/KeyError': 1,
'start_time': datetime.datetime(2018, 2, 4, 14, 40, 25, 241964)}
2018-02-04 14:40:25 [scrapy] INFO: Spider closed (finished)

Could you please tell me how to fix this?
Thank you
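The KeyError in the log above says that the item class has no field called address, which usually means the field was never declared on the item. In a Scrapy project the likely fix is to add address = Field() to PropertiesItem in items.py. The sketch below is not Scrapy's implementation; it is a simplified, standard-library stand-in (all class and function names are hypothetical) that reproduces the same behavior, so the cause is easier to see: assignment fails for undeclared names and succeeds once the name is declared.

```python
class DeclaredFieldsItem(dict):
    """Simplified stand-in for an item class that only accepts declared fields."""

    fields = ()  # names that a subclass declares up front

    def __setitem__(self, key, value):
        if key not in self.fields:
            raise KeyError(
                "%s does not support field: %s" % (type(self).__name__, key)
            )
        super().__setitem__(key, value)


class ItemWithoutAddress(DeclaredFieldsItem):
    fields = ("title", "price")  # 'address' was never declared


class ItemWithAddress(DeclaredFieldsItem):
    fields = ("title", "price", "address")  # declaring it fixes the error


def try_populate(item_class):
    """Return the populated item, or the error message if the field is rejected."""
    item = item_class()
    try:
        item["address"] = ["Angel, London"]
    except KeyError as exc:
        return exc.args[0]
    return item
```

Here try_populate(ItemWithoutAddress) returns the error text, while try_populate(ItemWithAddress) returns the populated item.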

I don't understand the spark mentioned in the ch11 chapter

I don't understand the role of Spark in Chapter 11, or the following statement:

FEED_URI='ftp://anonymous@spark/%(batch)s_%(name)s_%(time)s.jl' -a batch=12

Does the distributed Scrapy crawl need Spark to support it? A more detailed description of this part would be helpful.
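The placeholders in that FEED_URI follow Python's printf-style named formatting, so each %(key)s is replaced by a value looked up by name. The sketch below shows only that substitution step with the standard library. The spider name "distr" and the timestamp are made-up sample values, and batch is assumed to come from the -a batch=12 argument.

```python
# The feed URI template, with named printf-style placeholders.
TEMPLATE = "ftp://anonymous@spark/%(batch)s_%(name)s_%(time)s.jl"

# Sample values only: 'distr' and the timestamp are hypothetical.
params = {
    "batch": "12",                  # assumed to come from "-a batch=12"
    "name": "distr",                # hypothetical spider name
    "time": "2016-01-01T00-00-00",  # hypothetical timestamp
}

# Named substitution: every %(key)s is looked up in the dict.
feed_uri = TEMPLATE % params
```

With these values the output file name encodes the batch number, the spider, and the time, and "spark" is simply the host name of the FTP server that receives the file.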

Windows10 and vbox or vagrant

Hello, I read the first part of the book, but chapters 3-5 are tough without running Vagrant. I have Windows 10 on a Lenovo laptop. All downloads went OK, but vagrant up is still not working. Was the Windows 10 problem ever fixed? Is there an easier way to run the book examples on my local machine without Vagrant? Thanks -John

Installation on MAC

Hi. I have successfully installed on a remote AWS Linux server, but I am experiencing an issue installing on my Mac.

I am getting the following error when running the "vagrant up --no-parallel" command.

The box 'lookfwd/scrapybook' could not be found or
could not be accessed in the remote catalog. If this is a private
box on HashiCorp's Atlas, please verify you're logged in via
vagrant login. Also, please double-check the name. The expanded
URL and error message are shown below:

URL: ["https://atlas.hashicorp.com/lookfwd/scrapybook"]
Error:

I have noticed that if I copy the URL into a browser it returns a different URL - not sure if this helps.

"https://atlas.hashicorp.com/lookfwd/boxes/scrapybook"

I have tried using --provider virtualbox too. I have grep'd my Vagrant files for this URL without success, so I thought I would ask you.

Vagrant not finding Docker provider

I have a clean AWS m4.large Ubuntu 14.04.5 and have followed the Appendix A to the letter.

I am getting this error.

ubuntu@xxxxxxxxxxxx:~$ git clone https://github.com/scalingexcellence/scrapybook.git
Cloning into 'scrapybook'...
remote: Counting objects: 1181, done.
remote: Total 1181 (delta 0), reused 0 (delta 0), pack-reused 1181
Receiving objects: 100% (1181/1181), 614.42 KiB | 0 bytes/s, done.
Resolving deltas: 100% (579/579), done.
Checking connectivity... done.
ubuntu@xxxxxxxxxxxx:~$ cd scrapybook/
ubuntu@xxxxxxxxxxxx:~/scrapybook$ ls
README.md Vagrantfile Vagrantfile.32 Vagrantfile.dockerhost Vagrantfile.dockerhost.src ch03 ch04 ch05 ch06 ch07 ch08 ch09 ch10 ch11 insecure_key lint
ubuntu@xxxxxxxxxxxx:~/scrapybook$ vagrant up --no-parallel
Bringing machine 'web' up with 'docker' provider...
Bringing machine 'spark' up with 'docker' provider...
Bringing machine 'es' up with 'docker' provider...
Bringing machine 'redis' up with 'docker' provider...
Bringing machine 'mysql' up with 'docker' provider...
Bringing machine 'scrapyd1' up with 'docker' provider...
Bringing machine 'scrapyd2' up with 'docker' provider...
Bringing machine 'scrapyd3' up with 'docker' provider...
Bringing machine 'dev' up with 'docker' provider...
==> web: Docker host is required. One will be created if necessary...
No usable default provider could be found for your system.

I also ran it with --provider=docker, but got the same result.

also ran with --debug on and noticed this:
INFO handle_box: Skipping HandleBox because no box is set
INFO warden: Calling IN action: #<Proc:0x00000000d43140@/opt/vagrant/embedded/gems/gems/vagrant-1.8.7/lib/vagrant/action/warden.rb:94 (lambda)>
INFO warden: Calling IN action: #<Vagrant::Action::Builtin::ConfigValidate:0x0000000184b6f0>
INFO warden: Calling IN action: #<VagrantPlugins::DockerProvider::Action::HostMachine:0x0000000184b6c8>
INFO interface: output: Docker host is required. One will be created if necessary...
INFO interface: output: ==> web: Docker host is required. One will be created if necessary...

The user's $PATH does not include /opt unless you add it yourself. I'm not sure whether this is relevant.

Both Docker and Vagrant work from the command line.

No usable default provider could be found for your system.

Hi. I followed the instructions from the book, and from https://docs.docker.com/engine/installation/linux/ubuntulinux/ to install vagrant and docker. First I attempted this on an Ubuntu 14.04 LTS install. After spending a day on trying to figure out how to get it to work I installed a fresh OS; this time Ubuntu 16.04 LTS. I have been trying for 3 days, and I am still having problems with getting this to work. The error is always identical in all of my attempted initializations:

~/scrapybook$ sudo vagrant up --no-parallel
Bringing machine 'web' up with 'docker' provider...
Bringing machine 'spark' up with 'docker' provider...
Bringing machine 'es' up with 'docker' provider...
Bringing machine 'redis' up with 'docker' provider...
Bringing machine 'mysql' up with 'docker' provider...
Bringing machine 'scrapyd1' up with 'docker' provider...
Bringing machine 'scrapyd2' up with 'docker' provider...
Bringing machine 'scrapyd3' up with 'docker' provider...
Bringing machine 'dev' up with 'docker' provider...
web: Docker host is required. One will be created if necessary...
No usable default provider could be found for your system.

Vagrant relies on interactions with 3rd party systems, known as
"providers", to provide Vagrant with resources to run development
environments. Examples are VirtualBox, VMware, Hyper-V.

The easiest solution to this message is to install VirtualBox, which
is available for free on all major platforms.

If you believe you already have a provider available, make sure it
is properly installed and configured. You can see more details about
why a particular provider isn't working by forcing usage with
vagrant up --provider=PROVIDER, which should give you a more specific
error message for that particular provider.

Can you please help me?

vagrant up --no-parallel error

Hi,
I have installed the newest Vagrant (1.9.5) and docker-ce on Ubuntu 16.04.1.
But when I run "vagrant up --no-parallel", it outputs an error like this:
vagrant up --provider=docker
Bringing machine 'web' up with 'docker' provider...
Bringing machine 'spark' up with 'docker' provider...
Bringing machine 'es' up with 'docker' provider...
Bringing machine 'redis' up with 'docker' provider...
Bringing machine 'mysql' up with 'docker' provider...
Bringing machine 'scrapyd1' up with 'docker' provider...
Bringing machine 'scrapyd2' up with 'docker' provider...
Bringing machine 'scrapyd3' up with 'docker' provider...
Bringing machine 'dev' up with 'docker' provider...
==> web: Docker host is required. One will be created if necessary...
==> es: Docker host is required. One will be created if necessary...
==> scrapyd3: Docker host is required. One will be created if necessary...
==> redis: Docker host is required. One will be created if necessary...
==> scrapyd1: Docker host is required. One will be created if necessary...
==> spark: Docker host is required. One will be created if necessary...
==> dev: Docker host is required. One will be created if necessary...
==> scrapyd1: An error occurred. The error will be shown after all tasks complete.
==> web: An error occurred. The error will be shown after all tasks complete.
==> es: An error occurred. The error will be shown after all tasks complete.
==> scrapyd2: Docker host is required. One will be created if necessary...
==> spark: An error occurred. The error will be shown after all tasks complete.
==> mysql: Docker host is required. One will be created if necessary...
==> scrapyd3: An error occurred. The error will be shown after all tasks complete.
==> redis: An error occurred. The error will be shown after all tasks complete.
==> dev: An error occurred. The error will be shown after all tasks complete.
==> scrapyd2: An error occurred. The error will be shown after all tasks complete.
==> mysql: An error occurred. The error will be shown after all tasks complete.
An error occurred while executing multiple actions in parallel.
Any errors that occurred are shown below.

An error occurred while executing the action on the 'web'
machine. Please handle this error then try again:

No usable default provider could be found for your system.

Vagrant relies on interactions with 3rd party systems, known as
"providers", to provide Vagrant with resources to run development
environments. Examples are VirtualBox, VMware, Hyper-V.

The easiest solution to this message is to install VirtualBox, which
is available for free on all major platforms.

If you believe you already have a provider available, make sure it
is properly installed and configured. You can see more details about
why a particular provider isn't working by forcing usage with
vagrant up --provider=PROVIDER, which should give you a more specific
error message for that particular provider.

Also, "docker run hello-world" runs fine on my machine.

Hope to get help.

can't boot VMMachine with vagrant up --no-parallel

I cloned the code with git and tried to boot the VM with vagrant up --no-parallel, but then I got stuck. The command output is below.
$ vagrant up --no-parallel
Bringing machine 'web' up with 'docker' provider...
Bringing machine 'spark' up with 'docker' provider...
Bringing machine 'es' up with 'docker' provider...
Bringing machine 'redis' up with 'docker' provider...
Bringing machine 'mysql' up with 'docker' provider...
Bringing machine 'scrapyd1' up with 'docker' provider...
Bringing machine 'scrapyd2' up with 'docker' provider...
Bringing machine 'scrapyd3' up with 'docker' provider...
Bringing machine 'dev' up with 'docker' provider...
==> web: Docker host is required. One will be created if necessary...

Your help is much appreciated.

Issue on chapter 3

This is related to Chapter 3. For the address item, the book instructs me to run the XPath //*[@itemtype="http://schema.org/Place"][1]/text().
However, I'm getting this:
In [27]: response.xpath('//*[@itemtype="http://schema.org/Place"][1]/text()').extract()
Out[27]:
[u'\n ',
u'\n ',
u'\n ',
u'\n ',
u'\n ',
u'\n ',
u'\n ',
u'\n ',
u'\n ',
u'\n ',
u'\n ',
u'\n ',
u'\n ',
u'\n ',
u'\n ',
u'\n ',
u'\n ',
u'\n ',
u'\n ',
u'\n ',
u'\n ',
u'\n ',
u'\n ',
u'\n ',
u'\n ',
u'\n ',
u'\n ',
u'\n ',
u'\n ',
u'\n ']

When I run it without text(), I get this:
[u'\n West Hampstead, London',
u'\n Angel, London',
u'\n Tower Bridge, London',
u'\n Canary Wharf, London',
u'\n Whitechapel, London',
u'\n Chelsea, London',
u'\n Hackney, London',
u'\n Stratford, London',
u'\n Canary Wharf, London',
u'\n Chiswick, London',
u'\n Highbury, London',
u'\n Notting Hill, London',
u'\n Brixton, London',
u'\n Greenwich, London',
u'\n Canary Wharf, London',
u'\n Battersea, London',
u'\n South Kensington, London',
u'\n Camden, London',
u'\n Wimbledon, London',
u'\n West Hampstead, London',
u'\n West Hampstead, London',
u'\n Elephant And Castle, London',
u'\n Angel, London',
u'\n Heathrow, London',
u'\n Bayswater, London',
u'\n Seven Sisters, London',
u'\n Angel, London',
u'\n Angel, London',
u'\n Battersea, London',
u'\n Bethnal Green, London']
I tried playing with it and came up with this:
In [32]: response.xpath('//*[@itemtype="http://schema.org/Place"][1]/span/text()').extract()
Out[32]:
[u'West Hampstead, London',
u'Angel, London',
u'Tower Bridge, London',
u'Canary Wharf, London',
u'Whitechapel, London',
u'Chelsea, London',
u'Hackney, London',
u'Stratford, London',
u'Canary Wharf, London',
u'Chiswick, London',
u'Highbury, London',
u'Notting Hill, London',
u'Brixton, London',
u'Greenwich, London',
u'Canary Wharf, London',
u'Battersea, London',
u'South Kensington, London',
u'Camden, London',
u'Wimbledon, London',
u'West Hampstead, London',
u'West Hampstead, London',
u'Elephant And Castle, London',
u'Angel, London',
u'Heathrow, London',
u'Bayswater, London',
u'Seven Sisters, London',
u'Angel, London',
u'Angel, London',
u'Battersea, London',
u'Bethnal Green, London']

My questions: which XPath expression is right? And why am I getting an array instead of single values?
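A small experiment may help with both questions. The sketch below uses Python's standard xml.etree.ElementTree instead of Scrapy, and the markup is a made-up fragment that assumes each Place element wraps its address in a span, as the working expression above suggests. Under that assumption, the direct text of the Place element is only the whitespace before the span, and the address is the text of the span. The result is a list because the expression matches one element per listing on the page, not one element overall.

```python
import xml.etree.ElementTree as ET

# Hypothetical listing markup; the real index page may be structured differently.
PAGE = """
<ul>
  <li>
    <div itemtype="http://schema.org/Place">
      <span>West Hampstead, London</span>
    </div>
  </li>
  <li>
    <div itemtype="http://schema.org/Place">
      <span>Angel, London</span>
    </div>
  </li>
</ul>
"""

root = ET.fromstring(PAGE)

# One match per listing, so the query yields a list of elements.
places = root.findall(".//*[@itemtype='http://schema.org/Place']")

# Direct text of each Place element: only the whitespace before <span>.
direct_text = [place.text for place in places]

# Text of the <span> child: the address itself.
addresses = [place.find("span").text for place in places]

# When a single value is wanted, pick one entry from the list.
first_address = addresses[0]
```

On a page with a single listing the same query would return a one-element list, which is presumably why the book's expression behaves differently on a property page than on an index page.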

Stderr:docker Error

I am relatively new to Vagrant and Docker.
I get the following error when I run this command:
vagrant up --no-parallel

A Docker command executed by Vagrant didn't complete successfully!
The command run along with the output from the command is shown
below.

Command: "docker" "run" "--name" "spark" "-d" "-p" "21:21" "-p" "30000:30000" "-p" "30001:30001" "-p" "30002:30002" "-p" "30003:30003" "-p" "30004:30004" "-p" "30005:30005" "-p" "30006:30006" "-p" "30007:30007" "-p" "30008:30008" "-p" "30009:30009" "-v" "var\lib\docker\docker_1495457854_50822:/root/book" "-h" "spark" "scrapybook/spark"

Stderr: docker: Error response from daemon: create var\lib\docker\docker_1495457854_50822: volume name invalid: "var\\lib\\docker\\docker_1495457854_50822" includes invalid characters for a local volume name, only "[a-zA-Z0-9][a-zA-Z0-9_.-]" are allowed.
See 'docker run --help'.
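The message suggests that Docker did not treat the -v source as a host path: it has no leading slash and uses backslashes, so it was validated as a volume name and rejected. The sketch below uses only the Python standard library to reproduce that check and to show one possible normalization. The function names are hypothetical, the pattern is the one quoted in the error (read here as one leading character followed by any number of the allowed characters), and the assumption that the intended source was the absolute path /var/lib/docker/... is a guess.

```python
import re
from pathlib import PureWindowsPath

# Pattern quoted in the Docker error, with "*" added for the repeated part
# (an assumption about how the daemon applies it).
VOLUME_NAME = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9_.-]*$")

# The -v source exactly as it appears in the failing command.
SOURCE = r"var\lib\docker\docker_1495457854_50822"


def is_valid_volume_name(name):
    """Return True if name uses only the characters the error message allows."""
    return bool(VOLUME_NAME.match(name))


def to_absolute_posix(path):
    """Convert backslash separators to '/' and make the path absolute."""
    return "/" + PureWindowsPath(path).as_posix().lstrip("/")
```

The backslashes fail the name check, while the converted form starts with a slash and would be read as a host path rather than a volume name. Why Vagrant produced the backslash form on this machine is not clear from the report.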

Provisioners will not be run since container doesn't support SSH

How can I see what's inside the box?
Following the book's instructions has been very frustrating. After running $ vagrant up --no-parallel, I try to run $ vagrant ssh and it does nothing: no errors, no output. The only clue I have is the message printed when running $ vagrant up:
$ vagrant up
Bringing machine 'web' up with 'docker' provider...
Bringing machine 'spark' up with 'docker' provider...
Bringing machine 'es' up with 'docker' provider...
Bringing machine 'redis' up with 'docker' provider...
Bringing machine 'mysql' up with 'docker' provider...
Bringing machine 'scrapyd1' up with 'docker' provider...
Bringing machine 'scrapyd2' up with 'docker' provider...
Bringing machine 'scrapyd3' up with 'docker' provider...
Bringing machine 'dev' up with 'docker' provider...
==> web: Provisioners will not be run since container doesn't support SSH.
==> mysql: Provisioners will not be run since container doesn't support SSH.
==> scrapyd2: Provisioners will not be run since container doesn't support SSH.
==> dev: Provisioners will not be run since container doesn't support SSH.
==> es: Provisioners will not be run since container doesn't support SSH.
==> spark: Provisioners will not be run since container doesn't support SSH.
==> scrapyd3: Provisioners will not be run since container doesn't support SSH.
==> scrapyd1: Provisioners will not be run since container doesn't support SSH.
==> redis: Provisioners will not be run since container doesn't support SSH.

“The host VM is still reporting that SSH is unavailable”

Hello,

When I was running "vagrant up --no-parallel", I encountered the following error message. I have no idea on how to fix it. Do you have any advice? Thanks a lot!

==> web: Docker host is required. One will be created if necessary...
web: Vagrant will now create or start a local VM to act as the Docker
web: host. You'll see the output of the vagrant up for this VM below.
web:
docker-provider: Checking if box 'lookfwd/scrapybook' is up to date...
The Docker provider was able to bring up the host VM successfully
but the host VM is still reporting that SSH is unavailable. This
sometimes happens with certain providers due to bugs in the
underlying hypervisor, and can be fixed with a vagrant reload.
The ID for the host VM is shown below for convenience.

If this does not fix it, please verify that the host VM provider
is functional and properly configured.

Host VM ID: d7219853-07ee-499e-8bf6-1fac9b3c3888

problem starting Vagrant

Hi, I have struggled with Vagrant, desperately need some help.

Here is my environment:

  1. Host PC: Windows 10 Pro
  2. Docker for Windows is installed
  3. Installed Bash on Ubuntu on Windows (for the Linux environment)
  4. Installed Docker in Ubuntu (Docker works fine in Ubuntu, I've tested it)
  5. Installed Vagrant, Git, etc.

When I start Vagrant, I first ran into the error "No usable default provider could be found for your system". I corrected it by changing force_host_vm from TRUE to FALSE, but now I am getting another error:

Command: ["docker", "run", "--name", "spark", "-d", "-p", "21:21", "-p", "30000:30000", "-p", "30001:30001", "-p", "30002:30002", "-p", "30003:30003", "-p", "30004:30004", "-p", "30005:30005", "-p", "30006:30006", "-p", "30007:30007", "-p", "30008:30008", "-p", "30009:30009", "-v", "home\myusername\scrapybook:/root/book", "-v", "home\myusername\scrapybook:/vagrant", "-h", "spark", "scrapybook/spark", {:notify=>[:stdout, :stderr]}]

I do not know where "home\myusername\scrapybook" comes from. I have searched through the Vagrant files and the docker-compose file, and replaced every place I could find with the absolute path "/home/myusername/scrapybook".

thanks!
javelinlz

unable to run vagrant up

(c) 2016 Microsoft Corporation. All rights reserved.

C:\Users\robert>cd desktop

C:\Users\robert\Desktop>cd scrapybook

C:\Users\robert\Desktop\scrapybook>vagrant up --no-parallel
Bringing machine 'web' up with 'docker' provider...
Bringing machine 'spark' up with 'docker' provider...
Bringing machine 'es' up with 'docker' provider...
Bringing machine 'redis' up with 'docker' provider...
Bringing machine 'mysql' up with 'docker' provider...
Bringing machine 'scrapyd1' up with 'docker' provider...
Bringing machine 'scrapyd2' up with 'docker' provider...
Bringing machine 'scrapyd3' up with 'docker' provider...
Bringing machine 'dev' up with 'docker' provider...
==> web: Docker host is required. One will be created if necessary...
web: Vagrant will now create or start a local VM to act as the Docker
web: host. You'll see the output of the vagrant up for this VM below.
web:
docker-provider: Importing base box 'lookfwd/scrapybook'...
docker-provider: Matching MAC address for NAT networking...
docker-provider: Checking if box 'lookfwd/scrapybook' is up to date...
docker-provider: Setting the name of the VM: docker-provider
C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/util/is_port_open.rb:21:in initialize': The requested address is not valid in its context. - connect(2) for "0.0.0.0" port 9200 (Errno::EADDRNOTAVAIL) from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/util/is_port_open.rb:21:in new'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/util/is_port_open.rb:21:in block in is_port_open?' from C:/HashiCorp/Vagrant/embedded/lib/ruby/2.2.0/timeout.rb:88:in block in timeout'
from C:/HashiCorp/Vagrant/embedded/lib/ruby/2.2.0/timeout.rb:32:in block in catch' from C:/HashiCorp/Vagrant/embedded/lib/ruby/2.2.0/timeout.rb:32:in catch'
from C:/HashiCorp/Vagrant/embedded/lib/ruby/2.2.0/timeout.rb:32:in catch' from C:/HashiCorp/Vagrant/embedded/lib/ruby/2.2.0/timeout.rb:103:in timeout'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/util/is_port_open.rb:19:in is_port_open?' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/builtin/handle_forwarded_port_collisions.rb:248:in port_check'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/builtin/handle_forwarded_port_collisions.rb:121:in []' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/builtin/handle_forwarded_port_collisions.rb:121:in block in handle'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/builtin/handle_forwarded_port_collisions.rb:257:in block in with_forwarded_ports' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/builtin/handle_forwarded_port_collisions.rb:253:in each'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/builtin/handle_forwarded_port_collisions.rb:253:in with_forwarded_ports' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/builtin/handle_forwarded_port_collisions.rb:98:in handle'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/builtin/handle_forwarded_port_collisions.rb:42:in block in call' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/environment.rb:567:in lock'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/builtin/handle_forwarded_port_collisions.rb:41:in call' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in call'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/plugins/providers/virtualbox/action/prepare_forwarded_port_collision_params.rb:30:in call' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in call'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/builtin/env_set.rb:19:in call' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in call'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/builtin/provision.rb:80:in call' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in call'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/plugins/providers/virtualbox/action/clear_forwarded_ports.rb:15:in call' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in call'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/plugins/providers/virtualbox/action/set_name.rb:50:in call' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in call'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/plugins/providers/virtualbox/action/clean_machine_folder.rb:17:in call' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in call'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/plugins/providers/virtualbox/action/check_accessible.rb:18:in call' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in call'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:95:in block in finalize_action' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in call'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in call' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/builder.rb:116:in call'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/runner.rb:66:in block in run' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/util/busy.rb:19:in busy'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/runner.rb:66:in run' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/builtin/call.rb:53:in call'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in call' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:95:in block in finalize_action'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in call' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in call'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/builder.rb:116:in call' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/runner.rb:66:in block in run'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/util/busy.rb:19:in busy' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/runner.rb:66:in run'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/builtin/call.rb:53:in call' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in call'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:95:in block in finalize_action' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in call'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in call' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/builder.rb:116:in call'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/runner.rb:66:in block in run' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/util/busy.rb:19:in busy'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/runner.rb:66:in run' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/builtin/call.rb:53:in call'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in call' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/builtin/box_check_outdated.rb:78:in call'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in call' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/builtin/config_validate.rb:25:in call'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in call' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/plugins/providers/virtualbox/action/check_virtualbox.rb:17:in call'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in call' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:95:in block in finalize_action'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in call' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in call'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/plugins/providers/virtualbox/action/match_mac_address.rb:19:in call' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in call'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/plugins/providers/virtualbox/action/discard_state.rb:15:in call' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in call'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/plugins/providers/virtualbox/action/import.rb:74:in import' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/plugins/providers/virtualbox/action/import.rb:13:in call'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in call' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/plugins/providers/virtualbox/action/prepare_clone_snapshot.rb:17:in call'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in call' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/builtin/prepare_clone.rb:15:in call'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in call' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/plugins/providers/virtualbox/action/customize.rb:40:in call'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in call' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/plugins/providers/virtualbox/action/check_accessible.rb:18:in call'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in call' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:95:in block in finalize_action'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in call' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in call'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/builder.rb:116:in call' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/runner.rb:66:in block in run'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/util/busy.rb:19:in busy' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/runner.rb:66:in run'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/builtin/call.rb:53:in call' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in call'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/builtin/config_validate.rb:25:in call' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in call'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:95:in block in finalize_action' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in call'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in call' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/builtin/handle_box.rb:56:in call'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in call' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:95:in block in finalize_action'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in call' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in call'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/builder.rb:116:in call' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/runner.rb:66:in block in run'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/util/busy.rb:19:in busy' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/runner.rb:66:in run'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/builtin/call.rb:53:in call' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in call'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/plugins/providers/virtualbox/action/check_virtualbox.rb:17:in call' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in call'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/builder.rb:116:in call' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/runner.rb:66:in block in run'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/util/busy.rb:19:in busy' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/runner.rb:66:in run'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/machine.rb:225:in action_raw' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/machine.rb:200:in block in action'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/environment.rb:567:in lock' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/machine.rb:186:in call'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/machine.rb:186:in action' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/plugins/providers/docker/action/host_machine.rb:59:in block in setup_host_machine'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/machine.rb:548:in block in with_ui' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/machine.rb:544:in synchronize'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/machine.rb:544:in with_ui' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/plugins/providers/docker/action/host_machine.rb:58:in setup_host_machine'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/plugins/providers/docker/action/host_machine.rb:28:in block in call' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/plugins/providers/docker/provider.rb:116:in block (2 levels) in host_vm_lock'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/environment.rb:567:in lock' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/plugins/providers/docker/provider.rb:115:in block in host_vm_lock'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/plugins/providers/docker/provider.rb:114:in synchronize' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/plugins/providers/docker/provider.rb:114:in host_vm_lock'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/plugins/providers/docker/action/host_machine.rb:27:in call' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in call'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/builtin/config_validate.rb:25:in call' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in call'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:95:in block in finalize_action' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in call'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in call' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/builtin/handle_box.rb:25:in call'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in call' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:95:in block in finalize_action'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in call' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in call'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/builder.rb:116:in call' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/runner.rb:66:in block in run'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/util/busy.rb:19:in busy' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/runner.rb:66:in run'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/builtin/call.rb:53:in call' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in call'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/builder.rb:116:in call' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/runner.rb:66:in block in run'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/util/busy.rb:19:in busy' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/runner.rb:66:in run'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/machine.rb:225:in action_raw' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/machine.rb:200:in block in action'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/environment.rb:567:in lock' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/machine.rb:186:in call'
from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/machine.rb:186:in action' from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/batch_action.rb:82:in block (2 levels) in run'

Amazon link needs updating

The Amazon link on the website scrapybook.com redirects to the old version of the book.

I ran into a problem in my item pipeline: MySQL raises OperationalError: (1241, 'Operand should contain 1 column(s)').

Here is the code in pipelines.py:
`def process_item(self, item, spider):
    db = MySQLdb.connect("localhost","root","2955112","properties",charset="utf8")
    cursor = db.cursor()
    #sql = """CREATE TABLE IF NOT EXISTS baidu_data2 (
    #title VARCHAR(250),
    #content varchar(550),
    #contenthref varchar(450)
    #) ENGINE=MyISAM DEFAULT CHARSET=utf8 """

    #cursor.execute(sql)
    sql2 = """INSERT INTO baidu_data2 (title,content,contenthref) VALUES (%s,%s,%s)"""

    args = (item["title"],item["content"],item["contenthref"])
    cursor.execute(sql2,args)`

Here is the code in the spider:
`def parse_item(self,response):
    #selec=response.xpath("")

    #l= ItemLoader(item=SosoItem(), response=response)
    #l.add_xpath('title',"//h3[@class='vrTitle']/a")
    #l.add_xpath('content',"//div[@class='str_info_div']/p")
    #l.add_xpath('contenthref',"//h3[@class='vrTitle']/a/@href")

    #return l.load_item()
    items=[]
    select=response.xpath("//div[@class='vrwrap']")
    for i in select:
        item = SosoItem()
        item["title"]=i.xpath("h3[@class='vrTitle']/a").extract()
        item["content"]=i.xpath("div[1]/p").extract()
        item['contenthref']=i.xpath("h3[@class='vrTitle']/a/@href").extract()
        items.append(item)

    return items`

ERROR information

ERROR: Error processing {'content': [u'

\n[\u56fe\u6587]ABC\u7ae5\u978b\u54c1\u724c\u4e3a\u5168\u56fd\u7684\u7ae5\u978b\u52a0\u76df\u4ee3\u7406\u6279\u53d1\u5546\u63d0\u4f9bABC\u7ae5\u978b2015\u65b0\u6b3e\uff0cABC\u7ae5\u978b\u54c1\u724c\u52a0\u76df\u8d39\uff0cABC\u7ae5\u978b\u4e13\u5356\u5e97\u52a0\u76df\u6761\u4ef6\uff0cABC\u7ae5\u978b\u52a0\u76df\u7535\u8bdd\u7b49\u4fe1\u606f\uff0c\u54a8\u8be2\u70ed\u7ebf\uff1a400\u20146929\u2014...

'],
'contenthref': [u'http://www.sogou.com/link?url=DSOYnZeCC_rNvR6aXaV4WJFzyG5FAO1LDz_NR33A47Q.&query=abc'],
'title': [u'\u7ae5\u978b\u52a0\u76df_',
u'\u7ae5\u978b\u52a0\u76df\u4ee3\u7406_',
u'\u7ae5\u978b\u5b98\u7f51 -\u4e2d\u56fd\u978b\u7f51']}
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 588, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "/usr/soso/soso/pipelines.py", line 28, in process_item
cursor.execute(sql2,args)
File "/usr/local/lib/python2.7/dist-packages/MySQLdb/cursors.py", line 205, in execute
self.errorhandler(self, exc, value)
File "/usr/local/lib/python2.7/dist-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
raise errorclass, errorvalue
OperationalError: (1241, 'Operand should contain 1 column(s)')

I think the problem is in the code above, but I don't know how to solve it.
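A likely cause, judging from the error dump above: every field in the item is a list, because `.extract()` returns a list of matches, and MySQLdb most likely renders a list parameter as a parenthesised group of values, which MySQL rejects with error 1241. The sketch below shows one way to collapse each field into a single string before the insert. `flatten_field` and the sample item are hypothetical names made up for illustration, and joining with a space is an assumption about the desired result.

```python
def flatten_field(values, separator=" "):
    """Collapse the list returned by .extract() into one string.

    Each %s placeholder in the INSERT should receive a single scalar value,
    so a list of matches is stripped and joined into one string here.
    """
    if isinstance(values, (list, tuple)):
        return separator.join(v.strip() for v in values)
    return values


# Hypothetical item, shaped like the one in the error dump above.
item = {
    "title": [u"first title", u"second title"],
    "content": [u"\nsome text\n"],
    "contenthref": [u"http://example.com/link"],
}

# Every element of args is now a plain string, one per column.
args = tuple(flatten_field(item[key]) for key in ("title", "content", "contenthref"))
# cursor.execute(sql2, args)  # same call as in the pipeline above
```

An alternative with the same effect would be to call `.extract_first()` in the spider where only one match per field is wanted.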

ls: cannot open directory /root/book/: Permission denied

I use only docker, and used docker-compose to start up the containers.

When I connect to scrapybook_dev_1 container, I get this message:
ls: cannot open directory /root/book/: Permission denied

Could be something related to SELinux contexts:

root@3eb13a1bf6aa:/# ls -laZ /root|grep book
drwxrwxr-x. 13 1000 1000 unconfined_u:object_r:user_home_t:s0                4096 Apr 19 15:01 book

vagrant up --no-parallel ?

PS D:\> cd .\scrapybook
PS D:\scrapybook> vagrant up --no-parallel
D:/scrapybook/Vagrantfile:4: warning: constant ::TRUE is deprecated
Bringing machine 'web' up with 'docker' provider...
Bringing machine 'spark' up with 'docker' provider...
Bringing machine 'es' up with 'docker' provider...
Bringing machine 'redis' up with 'docker' provider...
Bringing machine 'mysql' up with 'docker' provider...
Bringing machine 'scrapyd1' up with 'docker' provider...
Bringing machine 'scrapyd2' up with 'docker' provider...
Bringing machine 'scrapyd3' up with 'docker' provider...
Bringing machine 'dev' up with 'docker' provider...
==> web: Docker host is required. One will be created if necessary...
The version of powershell currently installed on this host is less than
the required minimum version. Please upgrade the installed version of
powershell to the minimum required version and run the command again.

Installed version: 2

Minimum required version: 3
PS D:\scrapybook>

git clone: error setting certificate verify locations

Getting this error while cloning the book:

C:\Users\me> git clone https://github.com/scalingexcellence/scrapybook.git
Cloning into 'scrapybook'...
fatal: unable to access 'https://github.com/scalingexcellence/scrapybook.git/':
error setting certificate verify locations:
  CAfile: C:\Program Files\Git\mingw64/bin/curl-ca-bundle.crt
  CApath: none

Timeout error trying scrapy shell against dockerized web site

Hi there,

I am having fun trying to set up the Vagrant/Docker network on OS X 10.11.3 (El Capitan).
First, a warning for all other OS X folks out there: don't use Vagrant 1.7.x, or you will be badly stuck with nonsensical errors in your console. Use 1.8.1 or newer.

Now, on to my problem. I can see all the Docker boxes. I can even ssh into them (vagrant ssh works like a charm). From there, I can see that the web box is running OK and also responding to HTTP queries on tcp/9312:

root@dev:~/book# telnet web 9312
Trying 172.17.0.2...
Connected to web.
Escape character is '^]'.
GET / HTTP/1.0

HTTP/1.0 200 OK
Date: Tue, 15 Mar 2016 08:50:03 GMT
Content-Length: 261
Content-Type: text/html
Server: TwistedWeb/15.5.0

Resource not found. Try: <a href="properties/index_00000.html">properties</a> <a href="images">images</a>, <a href="dynamic">dynamic</a>, <a href="benchmark/">benchmark</a> <a href="maps/api/geocode/json?sensor=false&address=Camden%20Town%2C%20London">maps</a>
Connection closed by foreign host.

But now, following the book (p. 113, section "The URL"), if I try to use the scrapy shell to connect to http://web:9312, there is a timeout error that I can't grok:

root@dev:~/book# scrapy shell http://web:9312
2016-03-15 08:51:17 [scrapy] INFO: Scrapy 1.0.3 started (bot: scrapybot)
2016-03-15 08:51:17 [scrapy] INFO: Optional features available: ssl, http11, boto
2016-03-15 08:51:17 [scrapy] INFO: Overridden settings: {'LOGSTATS_INTERVAL': 0}
2016-03-15 08:51:17 [scrapy] INFO: Enabled extensions: CloseSpider, TelnetConsole, CoreStats, SpiderState
2016-03-15 08:51:17 [boto] DEBUG: Retrieving credentials from metadata server.
2016-03-15 08:51:18 [boto] ERROR: Caught exception reading instance data
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/boto/utils.py", line 210, in retry_url
    r = opener.open(req, timeout=timeout)
  File "/usr/lib/python2.7/urllib2.py", line 404, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 422, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1214, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib/python2.7/urllib2.py", line 1184, in do_open
    raise URLError(err)
URLError: <urlopen error timed out>
2016-03-15 08:51:18 [boto] ERROR: Unable to read instance data, giving up
2016-03-15 08:51:18 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2016-03-15 08:51:18 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2016-03-15 08:51:18 [scrapy] INFO: Enabled item pipelines:
2016-03-15 08:51:18 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2016-03-15 08:51:18 [scrapy] INFO: Spider opened
2016-03-15 08:51:18 [scrapy] DEBUG: Crawled (200) <GET http://web:9312> (referer: None)
[s] Available Scrapy objects:
[s]   crawler    <scrapy.crawler.Crawler object at 0x7fd50cd79b10>
[s]   item       {}
[s]   request    <GET http://web:9312>
[s]   response   <200 http://web:9312>
[s]   settings   <scrapy.settings.Settings object at 0x7fd50cd79a90>
[s]   spider     <DefaultSpider 'default' at 0x7fd50bc80b50>
[s] Useful shortcuts:
[s]   shelp()           Shell help (print this help)
[s]   fetch(req_or_url) Fetch request (or URL) and update local objects
[s]   view(response)    View response in a browser

Any help will be much appreciated.

Greetings,

Juanan

urlopen error time out

I noticed an error message when using Scrapy inside the VM. I think it has no consequence for the examples.
The error comes from urllib2 timing out.
Is this normal behavior, and will it be fixed in a later version?

Feel free to close this if it is not relevant. Thanks !

> root@dev:~# scrapy shell http://web:9312/properties/property_000000.html
> 2016-09-23 11:04:19 [scrapy] INFO: Scrapy 1.0.3 started (bot: scrapybot)
> 2016-09-23 11:04:19 [scrapy] INFO: Optional features available: ssl, http11, boto
> 2016-09-23 11:04:19 [scrapy] INFO: Overridden settings: {'LOGSTATS_INTERVAL': 0}
> 2016-09-23 11:04:19 [scrapy] INFO: Enabled extensions: CloseSpider, TelnetConsole, CoreStats, SpiderState
> 2016-09-23 11:04:19 [boto] DEBUG: Retrieving credentials from metadata server.
> 2016-09-23 11:04:20 [boto] ERROR: Caught exception reading instance data
> Traceback (most recent call last):
>   File "/usr/local/lib/python2.7/dist-packages/boto/utils.py", line 210, in retry_url
>     r = opener.open(req, timeout=timeout)
>   File "/usr/lib/python2.7/urllib2.py", line 404, in open
>     response = self._open(req, data)
>   File "/usr/lib/python2.7/urllib2.py", line 422, in _open
>     '_open', req)
>   File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
>     result = func(*args)
>   File "/usr/lib/python2.7/urllib2.py", line 1214, in http_open
>     return self.do_open(httplib.HTTPConnection, req)
>   File "/usr/lib/python2.7/urllib2.py", line 1184, in do_open
>     raise URLError(err)
> URLError: <urlopen error timed out>
> 2016-09-23 11:04:20 [boto] ERROR: Unable to read instance data, giving up
> 2016-09-23 11:04:20 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
> 2016-09-23 11:04:20 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
> 2016-09-23 11:04:20 [scrapy] INFO: Enabled item pipelines: 
> 2016-09-23 11:04:20 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
> 2016-09-23 11:04:20 [scrapy] INFO: Spider opened
> 2016-09-23 11:04:20 [scrapy] DEBUG: Crawled (200) <GET http://web:9312/properties/property_000000.html> (referer: None)
> [s] Available Scrapy objects:
> [s]   crawler    <scrapy.crawler.Crawler object at 0x7fa05db51b10>
> [s]   item       {}
> [s]   request    <GET http://web:9312/properties/property_000000.html>
> [s]   response   <200 http://web:9312/properties/property_000000.html>
> [s]   settings   <scrapy.settings.Settings object at 0x7fa05db51a90>
> [s]   spider     <DefaultSpider 'default' at 0x7fa05ca57b50>
> [s] Useful shortcuts:
> [s]   shelp()           Shell help (print this help)
> [s]   fetch(req_or_url) Fetch request (or URL) and update local objects
> [s]   view(response)    View response in a browser
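In both logs the traceback is raised by boto while it looks for AWS credentials on a metadata server, and the request to web:9312 itself completes with a 200. A commonly suggested workaround for Scrapy 1.0.x is to disable the S3 download handler so that boto is never asked for credentials. The fragment below is a sketch for a project's settings.py, and it assumes the project never fetches `s3://` URLs.

```python
# settings.py (sketch): assumes the project never fetches s3:// URLs.
# Mapping a scheme to None disables its download handler, which should
# stop boto from probing the metadata server when Scrapy starts.
DOWNLOAD_HANDLERS = {
    's3': None,
}
```

This only takes effect when Scrapy is started from inside a project directory, since that is where settings.py is read from.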

vagrant ssh command got error

I am using Ubuntu 16.04.
I followed all the steps and they worked properly.
But when I run vagrant ssh, it shows:

==> dev: The container is not currently running.

How can I start this container so that vagrant ssh works?

The spider "easy" doesn't work in Chapter 8

Page 138
When I ran the command "scrapy crawl easy -s CLOSESPIDER_ITEMCOUNT=90", I got this:

root@dev:~/book/ch08/properties# scrapy crawl easy -s CLOSESPIDER_ITEMCOUNT=90
Traceback (most recent call last):
File "/usr/local/bin/scrapy", line 11, in <module>
sys.exit(execute())
File "/usr/local/lib/python2.7/dist-packages/scrapy/cmdline.py", line 109, in execute
settings = get_project_settings()
File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/project.py", line 60, in get_project_settings
settings.setmodule(settings_module_path, priority='project')
File "/usr/local/lib/python2.7/dist-packages/scrapy/settings/__init__.py", line 108, in setmodule
module = import_module(module)
File "/usr/lib/python2.7/importlib/__init__.py", line 37, in import_module
__import__(name)
ImportError: No module named properties.settings


The settings file for the properties module really is in the directory, so I don't know why it says the module can't be found.

Can you help me?
Thanks, John
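One way to narrow this down is to check that the files the import depends on are all present. Scrapy looks for scrapy.cfg in the current directory or its parents, and Python 2.7 only treats a directory as a package if it contains an `__init__.py`. The helper below is a diagnostic sketch, not part of the book's code: `check_scrapy_project` is a hypothetical name, and the layout it assumes (scrapy.cfg next to a `properties` package) is taken from the path in the traceback.

```python
import os


def check_scrapy_project(project_dir, package="properties"):
    """Return the files needed to import `<package>.settings` that are missing.

    Assumed layout (hypothetical, based on the path in the traceback):
        project_dir/scrapy.cfg
        project_dir/<package>/__init__.py
        project_dir/<package>/settings.py
    """
    required = [
        "scrapy.cfg",
        os.path.join(package, "__init__.py"),
        os.path.join(package, "settings.py"),
    ]
    return [
        path for path in required
        if not os.path.isfile(os.path.join(project_dir, path))
    ]


# Example: run from the directory where `scrapy crawl` was started.
missing = check_scrapy_project(os.getcwd())
print("missing files:", missing)
```

An empty list means the layout looks right, in which case the cause is probably elsewhere, for example file permissions on the shared folder.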

ERROR: C:/Users/USER/Desktop/scrapybook/Vagrantfile:4: warning: constant ::TRUE is deprecated

This is my problem. Sorry, my English is not good.
I followed the first video at http://scrapybook.com/, but something went wrong.
I am a beginner at this, and I would be grateful for your advice.

Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.

C:\Users\USER>Desktop
'Desktop' is not recognized as an internal or external command, operable program or batch file.

C:\Users\USER>cd Desktop

C:\Users\USER\Desktop>git clone https://github.com/scalingexcellence/scrapybook.git
Cloning into 'scrapybook'...
remote: Counting objects: 1218, done.
remote: Total 1218 (delta 0), reused 0 (delta 0), pack-reused 1218
Receiving objects: 100% (1218/1218), 621.32 KiB | 101.00 KiB/s, done.
Resolving deltas: 100% (599/599), done.
Checking connectivity... done.

C:\Users\USER\Desktop>cd scrapybook

C:\Users\USER\Desktop\scrapybook>vagrant up --no-parallel

C:/Users/USER/Desktop/scrapybook/Vagrantfile:4: warning: constant ::TRUE is deprecated (I don't know why this appeared)
Bringing machine 'web' up with 'docker' provider...
Bringing machine 'spark' up with 'docker' provider...
Bringing machine 'es' up with 'docker' provider...
Bringing machine 'redis' up with 'docker' provider...
Bringing machine 'mysql' up with 'docker' provider...
Bringing machine 'scrapyd1' up with 'docker' provider...
Bringing machine 'scrapyd2' up with 'docker' provider...
Bringing machine 'scrapyd3' up with 'docker' provider...
Bringing machine 'dev' up with 'docker' provider...
==> web: Docker host is required. One will be created if necessary...

It stops here.

vagrant ssh errors

I always get the error below.
I tried everything written in #25,
but that didn't solve it either.

C:\Users\m\Desktop\scrapybook>vagrant ssh
==> dev: SSH will be proxied through the Docker virtual machine since we're
==> dev: not running Docker natively. This is just a notice, and not an error.
C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.4/plugins/providers/docker/provider.rb:142:in `ssh_info': undefined method `first' for nil:NilClass (NoMethodError)
    from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.4/lib/vagrant/machine.rb:426:in `ssh_info'
    from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.4/plugins/providers/docker/communicator.rb:145:in `container_ssh_command'
    from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.4/plugins/providers/docker/action/prepare_ssh.rb:23:in `call'
    from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.4/lib/vagrant/action/warden.rb:34:in `call'
    from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.4/lib/vagrant/action/warden.rb:95:in `block in finalize_action'
    from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.4/lib/vagrant/action/warden.rb:34:in `call'
    from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.4/lib/vagrant/action/warden.rb:34:in `call'
    from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.4/lib/vagrant/action/builder.rb:116:in `call'
    from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.4/lib/vagrant/action/runner.rb:66:in `block in run'
    from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.4/lib/vagrant/util/busy.rb:19:in `busy'
    from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.4/lib/vagrant/action/runner.rb:66:in `run'
    from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.4/lib/vagrant/action/builtin/call.rb:53:in `call'
    from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.4/lib/vagrant/action/warden.rb:34:in `call'
    from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.4/lib/vagrant/action/warden.rb:95:in `block in finalize_action'
    from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.4/lib/vagrant/action/warden.rb:34:in `call'
    from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.4/lib/vagrant/action/warden.rb:34:in `call'
    from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.4/lib/vagrant/action/builder.rb:116:in `call'
    from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.4/lib/vagrant/action/runner.rb:66:in `block in run'
    from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.4/lib/vagrant/util/busy.rb:19:in `busy'
    from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.4/lib/vagrant/action/runner.rb:66:in `run'
    from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.4/lib/vagrant/action/builtin/call.rb:53:in `call'
    from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.4/lib/vagrant/action/warden.rb:34:in `call'
    from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.4/lib/vagrant/action/builder.rb:116:in `call'
    from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.4/lib/vagrant/action/runner.rb:66:in `block in run'
    from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.4/lib/vagrant/util/busy.rb:19:in `busy'
    from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.4/lib/vagrant/action/runner.rb:66:in `run'
    from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.4/lib/vagrant/machine.rb:225:in `action_raw'
    from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.4/lib/vagrant/machine.rb:200:in `block in action'
    from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.4/lib/vagrant/machine.rb:182:in `call'
    from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.4/lib/vagrant/machine.rb:182:in `block in action'
    from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.4/lib/vagrant/machine.rb:186:in `call'
    from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.4/lib/vagrant/machine.rb:186:in `action'
    from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.4/plugins/commands/ssh/command.rb:60:in `block in execute'
    from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.4/lib/vagrant/plugin/v2/command.rb:235:in `block in with_target_vms'
    from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.4/lib/vagrant/plugin/v2/command.rb:229:in `each'
    from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.4/lib/vagrant/plugin/v2/command.rb:229:in `with_target_vms'
    from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.4/plugins/commands/ssh/command.rb:41:in `execute'
    from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.4/lib/vagrant/cli.rb:42:in `execute'
    from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.4/lib/vagrant/environment.rb:308:in `cli'
    from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.4/bin/vagrant:127:in `<main>'

unknown host

I followed all the steps to set up Vagrant on my Mac. I was able to connect to the web server at first, but not any more. When I try to ping the server from inside the dev machine, it says "unknown host".

root@dev:/etc# whoami
root
root@dev:/etc# hostname
dev
root@dev:/etc# ping http://web:9312/properties/property_000000.html
ping: unknown host http://web:9312/properties/property_000000.html
root@dev:/etc#

Please let me know how I can resolve this.
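
`ping` expects a bare hostname rather than a full URL, which would explain the "unknown host" message in the session above. A minimal sketch, assuming Python 3 is available on the machine, that pulls the hostname and port out of the URL and tries a TCP connection instead of a ping (the helper name `can_connect` is hypothetical):

```python
import socket
from urllib.parse import urlsplit

url = "http://web:9312/properties/property_000000.html"
parts = urlsplit(url)
host, port = parts.hostname, parts.port  # "web" and 9312


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds (hypothetical helper)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers name resolution failures, refused connections and timeouts.
        return False


if __name__ == "__main__":
    print("hostname to ping:", host)
    print("reachable on port", port, ":", can_connect(host, port))
```

If the hostname alone still fails to resolve from inside dev, the problem is more likely with the container links than with the command.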

Chapter 3 example pg 36-40

Hello Dimitrios,

I was running some of the examples in the Scrapy shell and wanted to know: did the page at
web:9312/properties/property_000000.html change its content or format relative to what is in the textbook on pp. 36-37? When I run some of the response.xpath calls, I get different answers than the book shows. For example, >>> response.xpath('//*[@itemprop="price"] [1]/text()').extract now returns:

<bound method SelectorList.extract of [<Selector xpath='//*[@itemprop="price"] [1]/text()' data=u'\n\xa3715.0pw'>]>

but the book shows the same call returning just [u'\xa3334.39pw']

All of this is on page 37.

I don't mind if the web page changed, but I wondered whether there is an updated set of answers for the textbook.

Thanks, John

I appreciate the help with the Vagrant/Windows 10 bug.
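
The `<bound method SelectorList.extract ...>` output quoted above is what Python prints when a method is referenced without being called; with parentheses, as in `.extract()`, the list itself comes back, and the leading `\n` in the data can then be stripped. The sketch below illustrates this with a stand-in class rather than Scrapy itself: `FakeSelectorList` is hypothetical and only mimics that one method.

```python
class FakeSelectorList(list):
    """Hypothetical stand-in for Scrapy's SelectorList; holds plain strings."""

    def extract(self):
        # Return the matched text nodes as a plain list of strings.
        return list(self)


matches = FakeSelectorList(["\n\u00a3715.0pw"])

not_called = matches.extract   # a bound method object, not the data
called = matches.extract()     # the actual list of strings
cleaned = [text.strip() for text in called]  # drop surrounding whitespace

if __name__ == "__main__":
    print(callable(not_called))
    print(cleaned)
```

This only accounts for the bound-method output and the stray newline; the different price value would still come from the page data itself.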

'vagrant up --no-parallel' does not work!

D:\scrapybook>vagrant up --no-parallel
D:/scrapybook/Vagrantfile:4: warning: constant ::TRUE is deprecated
Bringing machine 'web' up with 'docker' provider...
Bringing machine 'spark' up with 'docker' provider...
Bringing machine 'es' up with 'docker' provider...
Bringing machine 'redis' up with 'docker' provider...
Bringing machine 'mysql' up with 'docker' provider...
Bringing machine 'scrapyd1' up with 'docker' provider...
Bringing machine 'scrapyd2' up with 'docker' provider...
Bringing machine 'scrapyd3' up with 'docker' provider...
Bringing machine 'dev' up with 'docker' provider...
==> web: Docker host is required. One will be created if necessary...
The version of powershell currently installed on this host is less than
the required minimum version. Please upgrade the installed version of
powershell to the minimum required version and run the command again.

Installed version: 2

Minimum required version: 3

Vagrant up error

I installed everything listed in the appendix, but when I run vagrant up --no-parallel I get the output below.
How can I solve this?

hathout@hathout-laptop:~$ cd scrapybook/
hathout@hathout-laptop:~/scrapybook$ vagrant up --no-parallel
Bringing machine 'web' up with 'docker' provider...
Bringing machine 'spark' up with 'docker' provider...
Bringing machine 'es' up with 'docker' provider...
Bringing machine 'redis' up with 'docker' provider...
Bringing machine 'mysql' up with 'docker' provider...
Bringing machine 'scrapyd1' up with 'docker' provider...
Bringing machine 'scrapyd2' up with 'docker' provider...
Bringing machine 'scrapyd3' up with 'docker' provider...
Bringing machine 'dev' up with 'docker' provider...
==> web: Docker host is required. One will be created if necessary...
    web: Vagrant will now create or start a local VM to act as the Docker
    web: host. You'll see the output of the `vagrant up` for this VM below.
    web:
    docker-provider: Checking if box 'lookfwd/scrapybook' is up to date...
    docker-provider: Clearing any previously set forwarded ports...
    docker-provider: Clearing any previously set network interfaces...
    docker-provider: Preparing network interfaces based on configuration...
    docker-provider: Adapter 1: nat
    docker-provider: You are trying to forward to privileged ports (ports <= 1024). Most
    docker-provider: operating systems restrict this to only privileged process (typically
    docker-provider: processes running as an administrative user). This is a warning in case
    docker-provider: the port forwarding doesn't work. If any problems occur, please try a
    docker-provider: port higher than 1024.
    docker-provider: Forwarding ports...
    docker-provider: 9200 (guest) => 9200 (host) (adapter 1)
    docker-provider: 6379 (guest) => 6379 (host) (adapter 1)
    docker-provider: 3306 (guest) => 3306 (host) (adapter 1)
    docker-provider: 9312 (guest) => 9312 (host) (adapter 1)
    docker-provider: 6800 (guest) => 6800 (host) (adapter 1)
    docker-provider: 6801 (guest) => 6801 (host) (adapter 1)
    docker-provider: 6802 (guest) => 6802 (host) (adapter 1)
    docker-provider: 6803 (guest) => 6803 (host) (adapter 1)
    docker-provider: 21 (guest) => 21 (host) (adapter 1)
    docker-provider: 30000 (guest) => 30000 (host) (adapter 1)
    docker-provider: 30001 (guest) => 30001 (host) (adapter 1)
    docker-provider: 30002 (guest) => 30002 (host) (adapter 1)
    docker-provider: 30003 (guest) => 30003 (host) (adapter 1)
    docker-provider: 30004 (guest) => 30004 (host) (adapter 1)
    docker-provider: 30005 (guest) => 30005 (host) (adapter 1)
    docker-provider: 30006 (guest) => 30006 (host) (adapter 1)
    docker-provider: 30007 (guest) => 30007 (host) (adapter 1)
    docker-provider: 30008 (guest) => 30008 (host) (adapter 1)
    docker-provider: 30009 (guest) => 30009 (host) (adapter 1)
    docker-provider: 22 (guest) => 2222 (host) (adapter 1)
    docker-provider: Running 'pre-boot' VM customizations...
    docker-provider: Booting VM...
    docker-provider: Waiting for machine to boot. This may take a few minutes...
    docker-provider: SSH address: 127.0.0.1:2222
    docker-provider: SSH username: vagrant
    docker-provider: SSH auth method: private key
    docker-provider: Warning: Remote connection disconnect. Retrying...
    docker-provider: Warning: Remote connection disconnect. Retrying...
    docker-provider: Warning: Remote connection disconnect. Retrying...
    docker-provider: Warning: Remote connection disconnect. Retrying...
    docker-provider: Warning: Remote connection disconnect. Retrying...
    docker-provider: Machine booted and ready!
    docker-provider: Mounting shared folders...
    docker-provider: /vagrant => /home/hathout/scrapybook
==> web: Starting container...
==> web: Provisioners will not be run since container doesn't support SSH.
==> spark: Docker host is required. One will be created if necessary...
    spark: Docker host VM is already ready.
==> spark: Syncing folders to the host VM...
    docker-provider: Mounting shared folders...
    docker-provider: /var/lib/docker/docker_1484741634_49199 => /home/hathout/scrapybook
==> spark: Vagrant has noticed that the synced folder definitions have changed.
==> spark: With Docker, these synced folder changes won't take effect until you
==> spark: destroy the container and recreate it.
==> spark: Starting container...
==> spark: Provisioners will not be run since container doesn't support SSH.
==> es: Docker host is required. One will be created if necessary...
    es: Docker host VM is already ready.
==> es: Starting container...
==> es: Provisioners will not be run since container doesn't support SSH.
==> redis: Docker host is required. One will be created if necessary...
    redis: Docker host VM is already ready.
==> redis: Starting container...
==> redis: Provisioners will not be run since container doesn't support SSH.
==> mysql: Docker host is required. One will be created if necessary...
    mysql: Docker host VM is already ready.
==> mysql: Starting container...
==> mysql: Provisioners will not be run since container doesn't support SSH.
==> scrapyd1: Docker host is required. One will be created if necessary...
    scrapyd1: Docker host VM is already ready.
==> scrapyd1: Starting container...
==> scrapyd1: Provisioners will not be run since container doesn't support SSH.
==> scrapyd2: Docker host is required. One will be created if necessary...
    scrapyd2: Docker host VM is already ready.
==> scrapyd2: Starting container...
==> scrapyd2: Provisioners will not be run since container doesn't support SSH.
==> scrapyd3: Docker host is required. One will be created if necessary...
    scrapyd3: Docker host VM is already ready.
==> scrapyd3: Starting container...
==> scrapyd3: Provisioners will not be run since container doesn't support SSH.
==> dev: Docker host is required. One will be created if necessary...
    dev: Docker host VM is already ready.
==> dev: Syncing folders to the host VM...
    docker-provider: Mounting shared folders...
    docker-provider: /var/lib/docker/docker_1484742051_99858 => /home/hathout/scrapybook
==> dev: Vagrant has noticed that the synced folder definitions have changed.
==> dev: With Docker, these synced folder changes won't take effect until you
==> dev: destroy the container and recreate it.
==> dev: Starting container...
==> dev: Provisioners will not be run since container doesn't support SSH.

Chapter 9 pipeline to mySQL not working/hangs/etc

Hello,
I made a few attempts at Chapter 9 and nothing seems to work. I was running the examples on pages 159-162, where a pipeline is set up to insert into a MySQL database. I was able to run MySQL in the VM dev environment, and setting up the tables was no problem.

When I run scrapy crawl easy -s CLOSESPIDER_ITEMCOUNT=1000, the spider kicks off but hangs forever. This forces me to drop the VM connection and start all over from scratch. I recall you altered a port for MySQL to help with the Windows 10/VM bug, but I'm not sure that is the problem. Please confirm that running the code from the ch09 folder writes successfully to MySQL. I am running the same file from the same folder and my spider hangs forever.

Thanks, John

PS: Many of the book's examples in both Chapter 3 and Chapter 4 have different source HTML on web:9312 from what is in the paperback textbook. Even the Appery.io website is different from the pictures in the textbook: there is no start screen tab and no data tab after setting up an account. I'm not sure what the issue is with that one.

Error when running scrapy shell commands examples as well as running spider code

I am getting an error when running the following shell command in the Docker scrapybook_dev_1 shell:
scrapy shell http://web:9312/properties/property_000000.html

The same happens when running the following spider from the "A Scrapy project" section of the book:
scrapy crawl basic

2017-11-03 02:47:59 [boto] DEBUG: Retrieving credentials from metadata server.
2017-11-03 02:48:00 [boto] ERROR: Caught exception reading instance data
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/boto/utils.py", line 210, in retry_url
    r = opener.open(req, timeout=timeout)
  File "/usr/lib/python2.7/urllib2.py", line 404, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 422, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1214, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib/python2.7/urllib2.py", line 1184, in do_open
    raise URLError(err)
URLError:
2017-11-03 02:48:00 [boto] ERROR: Unable to read instance data, giving up

can't login to Appery.io to do ch04

I tried to run the code in Chapter 4, but I can't write to the Appery.io database. I followed the instructions in the book for configuring settings.py.
scrapy crawl gives me the following error:

2017-09-20 15:02:55 [scrapy] DEBUG: Crawled (200) <GET http://scrapybook.s3.amazonaws.com/properties/property_000055.html> (referer: http://scrapybook.s3.amazonaws.com/properties/index_00001.html)
2017-09-20 15:02:55 [scrapy] DEBUG: Gave up retrying <GET https://api.appery.io/rest/1/db/login?username=root&password=**> (failed 3 times): 400 Bad Request
2017-09-20 15:02:55 [scrapy] DEBUG: Crawled (400) <GET https://api.appery.io/rest/1/db/login?username=root&password=pass> (referer: None)

Issue with VirtualBox 5.1.6 and Vagrant 1.8.5

On my Ubuntu 16.04 64-bit machine I noticed an error when using VirtualBox 5.1.6 with Vagrant 1.8.5:

There was an error while executing `VBoxManage`, a CLI used by Vagrant
for controlling VirtualBox. The command and stderr is shown below.

Command: ["startvm", "c2bd3fe0-48f7-463d-b31c-80720472de13", "--type", "headless"]

Stderr: VBoxManage: error: The virtual machine 'docker-provider' has terminated unexpectedly during startup with exit code 1 (0x1)
VBoxManage: error: Details: code NS_ERROR_FAILURE (0x80004005), component MachineWrap, interface IMachine

I simply downgraded to VirtualBox 5.0.0 and now it works perfectly.

I am logging this issue and closing it so that anyone who encounters it has a workaround.

vagrant up --no-parallel fails at "Bringing machine dev up" on Ubuntu 14.04.4 LTS

Hi - I've followed Learning Scrapy's instructions in the appendix for Ubuntu 14.04.4 LTS, without success.

  • docker is installed properly (confirmed with docker run hello-world and docker -ps)
  • sudo apt-get install vagrant ran successfully
  • vagrant up --no-parallel generates
    The provider 'docker' could not be found, but was requested to back the machine 'web'. Please use a provider that exists.
  • Some googling pointed me to #5, where I picked up the suggestion vagrant box add scrapybook scrapybook.box, and therefore searched for scrapybook.box
  • I've downloaded scrapybook.box
  • I ran vagrant box add scrapybook scrapybook.box with result:
    Successfully added box 'scrapybook' with provider 'virtualbox'! (VirtualBox is also installed on my computer)
  • I've run vagrant box list with result
    scrapybook (virtualbox)
  • But when I run vagrant up --no-parallel --provider=virtualbox I get:
    Bringing machine 'web' up with 'virtualbox' provider...
    Bringing machine 'spark' up with 'virtualbox' provider...
    Bringing machine 'es' up with 'virtualbox' provider...
    Bringing machine 'redis' up with 'virtualbox' provider...
    Bringing machine 'mysql' up with 'virtualbox' provider...
    Bringing machine 'scrapyd1' up with 'virtualbox' provider...
    Bringing machine 'scrapyd2' up with 'virtualbox' provider...
    Bringing machine 'scrapyd3' up with 'virtualbox' provider...
    Bringing machine 'dev' up with 'virtualbox' provider...
    There are errors in the configuration of this machine. Please fix
    the following errors and try again:

    vm:
    * A box must be specified.

My question is: how can I specify the scrapybook box?

Thanks!

Paul.

Windows Host error: volume name invalid (Solved)

I was facing the following issue:

Stderr: docker: Error response from daemon: create var\lib\docker\docker_1499653205_62199: volume name invalid: "var\\lib\\docker\\docker_1499653205_62199" includes invalid characters for a local volume name, only "[a-zA-Z0-9][a-zA-Z0-9_.-]" are allowed.

I changed lines 43 and 168 of the Vagrantfile, respectively:

from:
spark.vm.synced_folder ".", "/root/book"
to:
spark.vm.synced_folder ".", "/vagrant", disabled: true

from:
dev.vm.synced_folder ".", "/root/book"
to:
dev.vm.synced_folder ".", "/vagrant", disabled: true

Now it works smoothly.

BTW, I'm a Vagrant/Docker newbie and can't guarantee that this is the right solution. I hope an expert can explain whether I made a mistake.
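
The error message above quotes the character set Docker accepts for a local volume name. A small sketch, assuming that pattern means "one alphanumeric character followed by alphanumerics, underscores, dots, or hyphens", shows why the backslash-separated Windows path is rejected (the function name `is_valid_volume_name` is hypothetical, and the trailing `+` in the pattern is an assumption):

```python
import re

# Character set copied from the error message; the trailing "+" is an assumption.
VOLUME_NAME = re.compile(r"[a-zA-Z0-9][a-zA-Z0-9_.-]+")


def is_valid_volume_name(name):
    """Return True if the whole name matches the quoted character set."""
    return VOLUME_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    # The path from the error message contains backslashes, which are not allowed.
    print(is_valid_volume_name(r"var\lib\docker\docker_1499653205_62199"))
    # The last path component alone uses only allowed characters.
    print(is_valid_volume_name("docker_1499653205_62199"))
```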
