
corners_stats's Issues

SSL certificate error when running `make start_scraping`

I am fairly new to Python, so I'm not sure whether this is a general question or something that needs to be configured in the Makefile/project settings.

$ python --version
Python 3.6.0 :: Anaconda custom (x86_64)

$ docker version
Client:
 Version:      17.06.0-ce
 API version:  1.30
 Go version:   go1.8.3
 Git commit:   02c1d87
 Built:        Fri Jun 23 21:31:53 2017
 OS/Arch:      darwin/amd64

Server:
 Version:      17.06.0-ce
 API version:  1.30 (minimum version 1.12)
 Go version:   go1.8.3
 Git commit:   02c1d87
 Built:        Fri Jun 23 21:51:55 2017
 OS/Arch:      linux/amd64
 Experimental: true

$ make start_scraping
docker build \
		--file=Dockerfile \
		-t corners/bash:dev \
		.
Sending build context to Docker daemon  64.43MB
Step 1/8 : FROM python:3.6
 ---> 955d0c3b1bb2
Step 2/8 : RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends     python-pip
 ---> Using cache
 ---> f8c677b2cb2a
Step 3/8 : COPY requirements.txt /tmp/
 ---> Using cache
 ---> b5aca624550d
Step 4/8 : ENV MPLBACKEND "agg"
 ---> Using cache
 ---> af003d820c5e
Step 5/8 : RUN pip install -r /tmp/requirements.txt
 ---> Running in 116e869d3685
Collecting matplotlib==2.0.2 (from -r /tmp/requirements.txt (line 1))
  Could not fetch URL https://pypi.python.org/simple/matplotlib/: There was a problem confirming the ssl certificate: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:749) - skipping
  Could not find a version that satisfies the requirement matplotlib==2.0.2 (from -r /tmp/requirements.txt (line 1)) (from versions: )
No matching distribution found for matplotlib==2.0.2 (from -r /tmp/requirements.txt (line 1))
The command '/bin/sh -c pip install -r /tmp/requirements.txt' returned a non-zero code: 1
make: *** [bash-build] Error 1
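
For reference, this error means pip inside the build container cannot validate PyPI's certificate, commonly because a corporate proxy or antivirus intercepts TLS on the host network, or because the base image's CA bundle is stale. One hedged workaround (assuming TLS interception is the cause) is to mark the PyPI hosts as trusted in the Dockerfile's install step; the cleaner fix is to install the intercepting proxy's CA certificate into the image instead.

```dockerfile
# Sketch only: skips certificate verification for the PyPI hosts. This is
# acceptable for local debugging but weakens security; prefer adding the
# intercepting proxy's CA certificate to the image if one is in play.
RUN pip install \
        --trusted-host pypi.python.org \
        --trusted-host files.pythonhosted.org \
        -r /tmp/requirements.txt
```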

No results from "sudo make start_scraping"

Okay, the additional sleep time fixed the connection issue. The application now appears to run without errors, but I don't believe any results are returned; it doesn't seem to be scraping the pages at all. Below is the command-line log from running `sudo make start_scraping`. Would you be able to review it and let me know whether this is just user error on my part?

docker build \
		--file=Dockerfile \
		-t corners/bash:dev \
		.
Sending build context to Docker daemon 64.43MB
Step 1/8 : FROM python:3.6
---> c5700ee6fe7b
Step 2/8 : RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends python-pip
---> Using cache
---> 6cbb5353ee2d
Step 3/8 : COPY requirements.txt /tmp/
---> Using cache
---> 866ef44e2a1d
Step 4/8 : ENV MPLBACKEND "agg"
---> Using cache
---> f42f5740388f
Step 5/8 : RUN pip install -r /tmp/requirements.txt
---> Using cache
---> 6b4e3881ce3a
Step 6/8 : COPY . /opt/corners
---> Using cache
---> e8ab01dbaa96
Step 7/8 : WORKDIR /opt/corners
---> Using cache
---> 5ec54d6c9858
Step 8/8 : CMD bash
---> Using cache
---> 6e4afe087133
Successfully built 6e4afe087133
Successfully tagged corners/bash:dev
docker run -d \
		-e POSTGRES_PASSWORD=corners \
		-e POSTGRES_USER=corners \
		-e POSTGRES_DB=corners \
		-p 8432:5432 \
		--name corners-postgres \
		postgres:9.5
0f2a214ff19d561f69725586d510323e323be704745eb8162199492990f18b1a
sleep 20
docker run --rm -i \
		--link corners-postgres \
		-e DEV_PSQL_URI=postgresql://corners:corners@corners-postgres:5432/corners \
		corners/bash:dev \
		./start.sh
2017-07-19 18:15:06 [scrapy.utils.log] INFO: Scrapy 1.3.3 started (bot: corners442)
2017-07-19 18:15:06 [scrapy.utils.log] INFO: Overridden settings: {'BOT_NAME': 'corners442', 'CONCURRENT_REQUESTS': 4, 'CONCURRENT_REQUESTS_PER_DOMAIN': 2, 'CONCURRENT_REQUESTS_PER_IP': 2, 'DOWNLOAD_DELAY': 5, 'NEWSPIDER_MODULE': 'corners442.spiders', 'ROBOTSTXT_OBEY': True, 'SPIDER_MODULES': ['corners442.spiders'], 'USER_AGENT': 'Mozilla/5.0 (iPad; U; CPU OS 3_2_1 like Mac OS X; en-us) AppleWebKit/531.21.10 (KHTML, like Gecko) Mobile/7B405'}
2017-07-19 18:15:07 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.logstats.LogStats']
2017-07-19 18:15:07 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2017-07-19 18:15:07 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2017-07-19 18:15:07 [scrapy.middleware] INFO: Enabled item pipelines:
['corners442.pipelines.LeaguePipeline']
2017-07-19 18:15:07 [scrapy.core.engine] INFO: Spider opened
2017-07-19 18:15:07 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-07-19 18:15:07 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2017-07-19 18:15:07 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.fourfourtwo.com/robots.txt> (referer: None)
2017-07-19 18:15:08 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.fourfourtwo.com/> (referer: None)
2017-07-19 18:15:08 [scrapy.core.engine] INFO: Closing spider (finished)
2017-07-19 18:15:08 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 628,
'downloader/request_count': 2,
'downloader/request_method_count/GET': 2,
'downloader/response_bytes': 17119,
'downloader/response_count': 2,
'downloader/response_status_count/200': 2,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2017, 7, 19, 18, 15, 8, 263953),
'log_count/DEBUG': 3,
'log_count/INFO': 7,
'response_received_count': 2,
'scheduler/dequeued': 1,
'scheduler/dequeued/memory': 1,
'scheduler/enqueued': 1,
'scheduler/enqueued/memory': 1,
'start_time': datetime.datetime(2017, 7, 19, 18, 15, 7, 329204)}
2017-07-19 18:15:08 [scrapy.core.engine] INFO: Spider closed (finished)
2017-07-19 18:15:19 [scrapy.utils.log] INFO: Scrapy 1.3.3 started (bot: corners442)
2017-07-19 18:15:19 [scrapy.utils.log] INFO: Overridden settings: {'BOT_NAME': 'corners442', 'CONCURRENT_REQUESTS': 4, 'CONCURRENT_REQUESTS_PER_DOMAIN': 2, 'CONCURRENT_REQUESTS_PER_IP': 2, 'DOWNLOAD_DELAY': 5, 'NEWSPIDER_MODULE': 'corners442.spiders', 'ROBOTSTXT_OBEY': True, 'SPIDER_MODULES': ['corners442.spiders'], 'USER_AGENT': 'Mozilla/5.0 (iPad; U; CPU OS 3_2_1 like Mac OS X; en-us) AppleWebKit/531.21.10 (KHTML, like Gecko) Mobile/7B405'}
2017-07-19 18:15:19 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.logstats.LogStats']
2017-07-19 18:15:19 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2017-07-19 18:15:19 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2017-07-19 18:15:19 [scrapy.middleware] INFO: Enabled item pipelines:
['corners442.pipelines.Corners442Pipeline']
2017-07-19 18:15:19 [scrapy.core.engine] INFO: Spider opened
2017-07-19 18:15:19 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-07-19 18:15:19 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2017-07-19 18:15:19 [scrapy.core.engine] INFO: Closing spider (finished)
2017-07-19 18:15:19 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'finish_reason': 'finished',
'finish_time': datetime.datetime(2017, 7, 19, 18, 15, 19, 517179),
'log_count/DEBUG': 1,
'log_count/INFO': 7,
'start_time': datetime.datetime(2017, 7, 19, 18, 15, 19, 500297)}
2017-07-19 18:15:19 [scrapy.core.engine] INFO: Spider closed (finished)
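
For what it's worth, the stats above (`scheduler/enqueued: 1`, zero items scraped) suggest the spider fetched the start page and then found nothing to follow or parse. That pattern typically appears when the site's markup has changed and the spider's selectors no longer match anything, in which case Scrapy finishes cleanly and silently. A minimal stdlib-only sketch of that failure mode (the `league-link` class name is made up for illustration and is not taken from the project):

```python
# Standalone illustration of the "finished with nothing scraped" symptom:
# if the link-extraction step matches zero elements, the crawl ends after
# the start page, exactly as the stats above show.
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect hrefs of <a> tags carrying a given CSS class."""
    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and self.wanted_class in (attrs.get("class") or "").split():
            self.links.append(attrs.get("href"))

def extract_links(html, wanted_class="league-link"):
    # "league-link" is a hypothetical class name for illustration.
    parser = LinkCollector(wanted_class)
    parser.feed(html)
    return parser.links

# If the site's markup changed to something the selector doesn't match,
# there is nothing to follow and the spider closes without error:
page = '<a class="nav" href="/home">home</a>'
print(extract_links(page))  # -> []
```

Inspecting the live page with `scrapy shell` and re-checking the spider's selectors against the current markup would confirm or rule this out.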

Connection Refused

Observed the following error, which caused the shell to exit:

[screenshot of the connection-refused error]

I am somewhat new to web scraping with Python. The OS I am running is Ubuntu with Anaconda (Python 3.6). Any assistance with this would be greatly appreciated!
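
If the connection refusal comes from the scraper starting before Postgres has finished initialising, a fixed `sleep 20` in the Makefile is fragile on slower machines. A small sketch of an actual readiness check instead (the host and port here assume the `-p 8432:5432` mapping shown in the log above; adjust to your setup):

```python
# Poll the published Postgres port until it accepts TCP connections,
# instead of sleeping for a fixed number of seconds.
import socket
import time

def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Return True once a TCP connection to (host, port) succeeds,
    or False if the timeout elapses first."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out; wait and retry.
            time.sleep(interval)
    return False

# Example: block until the container's published port is reachable.
# wait_for_port("localhost", 8432)
```

Note that the port accepting connections only shows the server is listening; `pg_isready` inside the container is a stricter check if you need one.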
