My application requires basically only one spider, but I would like to run many instan

Hey <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url=

Running many instances of one spider about crawly HOT 3 OPEN

serpent213 commented on June 2, 2024

Running many instances of one spider

from crawly.

Comments (3)

oltarasenko commented on June 2, 2024 1

Hey @serpent213,

Indeed, Crawly is built around the spider names, and there is no easy way to switch to something else right now.

However, it may be the case that you don't need it. Let me try to explain my points here:

Spider - is not a stateful process; it's just a set of callbacks. So there is no such thing as running 100 instances of the spider.
When we talk about the stateful part of it, it's all about Manager, RequestsStorage (scheduled requests), ItemsStorage (extracted items)

When I think about your case, as I understand it, you want a broad crawl that goes to multiple websites and extracts all the information from them. Probably there is some scheduler outside Crawly that just does something like:

Crawly.Engine.start_spider(BroadCrawler, start_urls: "google.com")
Crawly.Engine.start_spider(BroadCrawler, start_urls: "openai.com")

May it be the case, that you need an API to add extra requests to an already running spider?

from crawly.

serpent213 commented on June 2, 2024

When I think about your case, as I understand it, you want a broad crawl that goes to multiple websites and extracts all the information from them. Probably there is some scheduler outside Crawly that just does something like:
Crawly.Engine.start_spider(BroadCrawler, start_urls: "google.com")
Crawly.Engine.start_spider(BroadCrawler, start_urls: "openai.com")

Exactly, that was my first attempt. Thank you for the inspiration, will look into it!

from crawly.

Recommend Projects