sample-projects: Introduction

Scrapinghub command line client


shub is the Scrapinghub command line client. It allows you to deploy projects or dependencies, schedule spiders, and retrieve scraped data or logs without leaving the command line.
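A typical session looks like this (the command names are from shub's CLI; the project, spider, and job IDs are placeholders):

```shell
shub login                    # store your Scrapinghub API key locally
shub deploy 12345             # deploy the project in the current directory
shub schedule 12345/myspider  # schedule a spider run
shub items 12345/1/1          # retrieve scraped items from job 12345/1/1
shub log 12345/1/1            # retrieve that job's log
```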

Requirements

  • Python >= 3.6

Installation

If you have pip installed on your system, you can install shub from the Python Package Index:

pip install shub

Please note that if you are using Python < 3.6, you should pin shub to 2.13.0 or lower.
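On an older interpreter, the pin can be expressed directly in the install command (2.13.0 being the last release supporting Python < 3.6, per the note above):

```shell
# Python < 3.6: stay on the last compatible release
pip install "shub<=2.13.0"

# Python >= 3.6: install or upgrade normally
pip install --upgrade shub
```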

We also supply stand-alone binaries. You can find them in our latest GitHub release.

Documentation

Documentation is available online via Read the Docs: https://shub.readthedocs.io/, or in the docs directory.

sample-projects: People

Contributors

eliasdorneles, mukthy, stummjr, thrivenipatil


sample-projects: Issues

Unauthorized Crawlera Header: "x-crawlera-session" Error

Trying to run the example with my API key returns the following response:

<html><head></head>
    <body><pre style="word-wrap: break-word; white-space: pre-wrap;">
       Unauthorized Crawlera Header: "x-crawlera-session"
    </pre>
  </body>
</html>
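"Unauthorized Crawlera Header" means the request carried a Crawlera control header ("x-crawlera-session" here) that the account's plan is not entitled to use. One workaround is to strip such headers before sending; a minimal sketch (the helper name and the blocked-header set are illustrative, not part of the Crawlera API):

```python
# Headers reported as unauthorized for this account (illustrative set)
UNAUTHORIZED = {"x-crawlera-session"}

def strip_unauthorized(headers):
    """Return a copy of `headers` without the blocked Crawlera control headers."""
    return {k: v for k, v in headers.items() if k.lower() not in UNAUTHORIZED}

# The session header is dropped; ordinary headers pass through untouched.
clean = strip_unauthorized({"X-Crawlera-Session": "create", "User-Agent": "bot"})
```

Alternatively, upgrading to a plan that includes sessions makes the header acceptable as-is.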

How to use Splash on my side?

I want to use my own Splash instance on my server together with the Crawlera service (I am on the C10 plan).
When I try this example, the response is "Website crawl ban", which means Crawlera is not handling it. But if I use only Crawlera, everything works well.
I also tried this example after deleting request:set_header("X-Crawlera-UA", "desktop").
The result is the same. Has something changed in the Crawlera API? Should I use the C50 plan, or is something wrong in my code?
Thanks!

requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://storage.scrapinghub.com/collections/233792/s/nikoncoolpix

Following the Scrapy Price Monitor tutorial, I encountered an error after successfully deploying the project to Scrapy Cloud. Running, for example, the amazon.com spider job, it completes with 0 items and 5 errors (one for each product name). In the job log I get (for 'product_name': 'nikoncoolpix'):
[scrapy.core.scraper] Error processing {'retailer': 'amazon.com', 'product_name': 'nikoncoolpix', 'when': '2017/09/07 03:57:21', 'price': 256.95, 'title': 'Nikon COOLPIX B500 Digital Camera (Red)', 'url': 'https://www.amazon.com/Nikon-COOLPIX-B500-Digital-Camera/dp/B01C3LEE9G'}

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/site-packages/twisted/internet/defer.py", line 588, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/app/__main__.egg/price_monitor/pipelines.py", line 20, in process_item
  File "/usr/local/lib/python3.5/site-packages/scrapinghub/hubstorage/collectionsrt.py", line 152, in set
    return self._collections.set(self.coltype, self.colname, *args, **kwargs)
  File "/usr/local/lib/python3.5/site-packages/scrapinghub/hubstorage/collectionsrt.py", line 56, in set
    return self.apipost((_type, _name), is_idempotent=True, jl=_values)
  File "/usr/local/lib/python3.5/site-packages/scrapinghub/hubstorage/resourcetype.py", line 74, in apipost
    return self.apirequest(_path, method='POST', **kwargs)
  File "/usr/local/lib/python3.5/site-packages/scrapinghub/hubstorage/resourcetype.py", line 71, in apirequest
    return jldecode(self._iter_lines(_path, **kwargs))
  File "/usr/local/lib/python3.5/site-packages/scrapinghub/hubstorage/resourcetype.py", line 60, in _iter_lines
    r = self.client.request(**kwargs)
  File "/usr/local/lib/python3.5/site-packages/scrapinghub/hubstorage/client.py", line 107, in request
    return self.retrier.call(invoke_request)
  File "/usr/local/lib/python3.5/site-packages/retrying.py", line 206, in call
    return attempt.get(self._wrap_exception)
  File "/usr/local/lib/python3.5/site-packages/retrying.py", line 247, in get
    six.reraise(self.value[0], self.value[1], self.value[2])
  File "/usr/local/lib/python3.5/site-packages/six.py", line 686, in reraise
    raise value
  File "/usr/local/lib/python3.5/site-packages/retrying.py", line 200, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
  File "/usr/local/lib/python3.5/site-packages/scrapinghub/hubstorage/client.py", line 100, in invoke_request
    r.raise_for_status()
  File "/usr/local/lib/python3.5/site-packages/requests/models.py", line 844, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://storage.scrapinghub.com/collections/233792/s/nikoncoolpix

The same error occurs when running the spider in a local environment. I would really appreciate any help.

System specifications:

  • OS: Windows 10
  • Python 3.6.1
  • Scrapy 1.4.0
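A 401 from storage.scrapinghub.com almost always means the request reached the API without valid credentials: the storage endpoints authenticate with HTTP Basic auth, using the API key as the username and an empty password. A sketch of the failing condition (the helper is hypothetical; the URL pattern is taken from the traceback):

```python
import requests

def build_collection_request(project_id, store, api_key=None):
    # Prepare (but do not send) a GET to the collections endpoint,
    # attaching Basic auth only when an API key is supplied.
    url = f"https://storage.scrapinghub.com/collections/{project_id}/s/{store}"
    auth = (api_key, "") if api_key else None
    return requests.Request("GET", url, auth=auth).prepare()

# Without a key there is no Authorization header -- exactly the condition
# that produces "401 Client Error: Unauthorized" in the traceback above.
anonymous = build_collection_request(233792, "nikoncoolpix")
authed = build_collection_request(233792, "nikoncoolpix", api_key="YOUR_API_KEY")
```

If the spider fails both locally and on Scrapy Cloud, check that the key available to the collections pipeline is valid and belongs to the project in the URL (233792 here).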

scrapy_price_monitor (requirements_error)

I followed the tutorial to run the "scrapy_price_monitor" project from this link:
https://github.com/scrapinghub/sample-projects/tree/master/scrapy_price_monitor#installing-and-running
When I got to step 7, "shub deploy <your_project_id_here>", Scrapy Cloud threw the error below:

##############START################

shub deploy 482071
Packing version 23fadd3-master
Deploying to Scrapy Cloud project "482071"
Deploy log last 30 lines:
Removing intermediate container 8a32420b04ae
---> bc794867e1a9
Step 12/12 : ENV PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
---> [Warning] Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
---> Running in f21786580c1f
Removing intermediate container f21786580c1f
---> 673538660e3a
Successfully built 673538660e3a
Successfully tagged i.scrapinghub.com/kumo_project/482071:1
Step 1/3 : FROM alpine:3.5
---> f80194ae2e0c
Step 2/3 : ADD kumo-entrypoint /kumo-entrypoint
---> Using cache
---> b6085fc56e21
Step 3/3 : RUN chmod +x /kumo-entrypoint
---> Using cache
---> 1bbe2a121e2b
Successfully built 1bbe2a121e2b
Successfully tagged kumo-entrypoint:latest
Entrypoint container is created successfully

Checking python dependencies
Collecting pip<20.0,>=9.0.3
Downloading https://files.pythonhosted.org/packages/00/b6/9cfa56b4081ad13874b0c6f96af8ce16cfbc1cb06bedf8e9164ce5551ec1/pip-19.3.1-py2.py3-none-any.whl (1.4MB)
Installing collected packages: pip
Successfully installed pip-19.3.1
requests 2.25.0 has requirement idna<3,>=2.5, but you have idna 2.1.
Warning: Pip checks failed, please fix the conflicts.
{"message": "Dependencies check exit code: 1", "details": "Pip checks failed, please fix the conflicts", "error": "requirements_error"}

{"status": "error", "message": "Requirements error"}
Deploy log location: /tmp/shub_deploy_3xl5dnaq.log

##############END################

Could you please help me fix it?

Thanks a lot!

Harry
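For the record, the deploy log already names the fix: requests 2.25.0 declares `idna<3,>=2.5`, while the project pins idna 2.1. The check pip performs can be reproduced with the `packaging` library (the specifier string is taken from the log):

```python
from packaging.specifiers import SpecifierSet
from packaging.version import Version

# requests 2.25.0's constraint on idna, as reported in the deploy log
requests_idna_spec = SpecifierSet(">=2.5,<3")

print(Version("2.1") in requests_idna_spec)   # the pinned version fails the check
print(Version("2.10") in requests_idna_spec)  # any 2.5-2.x release passes
```

Loosening the pin in requirements.txt to `idna>=2.5,<3` (or any concrete version in that range) should clear the requirements_error.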
