Comments (7)
Hi, just run it on Amazon AWS or any other cloud service.
Cheers,
from scrapybook.
You misunderstood me. I am already using my own VPS. But I don't want to use the terminal all the time to set up virtualenvs, crontabs etc. This gets quite messy, especially if you have to install and manage a lot of scrapers. So I am looking for a nice GUI to install, manage, configure and monitor my scrapers. A self-hosted Scrapinghub would be perfect, but I was not able to find such a tool.
Hi, did you try scrapyd?
It comes with a web interface:
https://scrapyd.readthedocs.io/en/latest/overview.html#web-interface
I run a fair number of spiders and I scripted their deployment in Ansible; I only need to run one command and it's done.
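For readers who have not used scrapyd: besides the web interface, it exposes a JSON API over HTTP, so starting a spider does not require a terminal session on the server. The following is a minimal sketch, assuming scrapyd listens on its default port 6800 on the local host; the project name `myproject` and the helper names are hypothetical, not taken from the thread.

```python
import json
import urllib.parse
import urllib.request

# Assumption: scrapyd runs locally on its default port.
SCRAPYD_URL = "http://localhost:6800"


def build_schedule_request(project, spider, base_url=SCRAPYD_URL, **spider_args):
    """Build the POST request for scrapyd's schedule.json endpoint."""
    # Extra keyword arguments are sent as additional form fields.
    fields = {"project": project, "spider": spider, **spider_args}
    data = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(f"{base_url}/schedule.json", data=data, method="POST")


def schedule_spider(project, spider, **spider_args):
    """Send the request and return scrapyd's decoded JSON reply."""
    request = build_schedule_request(project, spider, **spider_args)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

Calling `schedule_spider("myproject", "properties")` against a running scrapyd instance should return a JSON object that includes a job id; the request builder is kept separate so it can be checked without a server.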
cheers
@inkrement - thank you so much! I'm so glad you like the book :)
One thing I would recommend is talking directly to @pablohoffman. Scrapinghub might be able to provide you with a licence, code or just the right direction to have exactly the system you need.
"install, manage, configure and monitor my scrapers"
Everything on this list except monitoring is actually very close to what scrapyd (as @yssoe says) and/or generic infrastructure tools like Chef, Vagrant or Docker provide (relevant tools: 1, 2, 3). For monitoring, indeed, I'm not aware of anything strong. The section named "Creating our custom monitoring command" in Chapter 11 gives some clues on how easy it is to implement such functionality. It's all REST + JSON, and it should be easy and cost-effective to contract someone on Upwork to develop something that would fit your needs exactly and potentially open-source it as well. There is indeed a gap.
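To illustrate the "REST + JSON" point, here is a rough sketch of a monitoring helper built on scrapyd's `listjobs.json` endpoint. This is not the command from Chapter 11; it assumes scrapyd on its default port 6800, and the function names are made up for this example.

```python
import json
import urllib.parse
import urllib.request

# Assumption: scrapyd runs locally on its default port.
SCRAPYD_URL = "http://localhost:6800"


def fetch_jobs(project, base_url=SCRAPYD_URL):
    """Fetch the job lists scrapyd keeps for one project."""
    query = urllib.parse.urlencode({"project": project})
    with urllib.request.urlopen(f"{base_url}/listjobs.json?{query}", timeout=10) as response:
        return json.load(response)


def summarize_jobs(payload):
    """Count jobs per state and list the spiders that are currently running."""
    if payload.get("status") != "ok":
        raise RuntimeError(f"scrapyd reported an error: {payload.get('message', payload)}")
    running = payload.get("running", [])
    return {
        "pending": len(payload.get("pending", [])),
        "running": len(running),
        "finished": len(payload.get("finished", [])),
        "running_spiders": sorted({job["spider"] for job in running}),
    }
```

Running `summarize_jobs(fetch_jobs("myproject"))` from a cron job or a small web page would already give a basic dashboard; a real monitor would probably also track failures and run times, which this sketch leaves out.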
Hi @inkrement, we have no plans to provide a self-hosted version of Scrapinghub simply because it's too much work to maintain a separate appliance version of our platform (we're a small team!) and we've yet to find: 1. a customer our infrastructure can't accommodate and 2. a customer that is willing to sponsor its development (we're talking north of a couple hundred grand)
I'm curious to understand what your concerns are with regard to running your spiders in Scrapinghub. Would you have the same concerns regarding, say, hosting your web app in Heroku or your code in GitHub? Thanks in advance for your insights!
@yssoe Thanks for your input. Scrapyd looks very promising, I'll take a look at it!
@lookfwd Oh, nice - I skipped that chapter back then, but I will read it. Maybe I will code something too. I studied Software Engineering, so that should not be a problem, but I had hoped there were already some existing tools.
@pablohoffman I have no concerns and I would love to use Scrapinghub, but I work for a university and we have our own servers. If I pay for external infrastructure or services, I have to justify why I am not using our own hardware, and that's the only reason against it. That is not easy to do, especially because usability is not really a good enough reason for them.
@inkrement Thanks for clarifying, I would love to continue the chat offline. You can reach me at pablo in scrapinghub.com