
zcrawl


zcrawl is an open source platform for deploying and orchestrating web crawlers and crawling tasks in general. It's written in Go, and one of its goals is to be as flexible as possible, allowing integrations with different languages and third-party services.

To avoid language lock-in, zcrawl will provide tools that make it easier to create and deploy a web crawler in your favorite language, so it's not Go-specific.

We're still in the planning phase and the roadmap is subject to change. A prototype is in progress, which we're using to test some of our ideas in a minimal way.

How to use it?

No instructions are provided at this time. If you're interested, feel free to pull the code, build it and see what happens :)

Is it for me?

The project is targeted at users who want an easy way to deploy web crawlers without messing with crontab (in case you need to schedule recurring crawling jobs), plain CSV files (in case you do this straight from the command line), multi-worker environments (when you need to orchestrate a distributed crawling task), or more complex pipelines that combine all of these.
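
To make the multi-worker case concrete, here is a minimal sketch of the kind of hand-rolled worker pool the project aims to spare you from writing. It is not zcrawl's API: the names Job, Result and runPool are hypothetical, and the fetch function is injected so the example runs without network access.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Job and Result are hypothetical types for this sketch; they are not part of zcrawl.
type Job struct{ URL string }

type Result struct {
	URL    string
	Status string
}

// runPool fans jobs out to n workers and collects one Result per job.
// fetch is supplied by the caller, so no real HTTP request is made here.
func runPool(n int, jobs []Job, fetch func(Job) Result) []Result {
	if n < 1 {
		n = 1 // at least one worker, otherwise nothing would drain the queue
	}
	in := make(chan Job)
	out := make(chan Result)

	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range in {
				out <- fetch(j)
			}
		}()
	}

	// Feed the queue, then close the output once every worker has finished.
	go func() {
		for _, j := range jobs {
			in <- j
		}
		close(in)
		wg.Wait()
		close(out)
	}()

	var results []Result
	for r := range out {
		results = append(results, r)
	}
	// Workers finish in arbitrary order; sort by URL for a stable result.
	sort.Slice(results, func(a, b int) bool { return results[a].URL < results[b].URL })
	return results
}

func main() {
	jobs := []Job{{URL: "https://example.com/a"}, {URL: "https://example.com/b"}}
	fake := func(j Job) Result { return Result{URL: j.URL, Status: "ok"} }
	for _, r := range runPool(2, jobs, fake) {
		fmt.Println(r.URL, r.Status)
	}
}
```

Even this small version has to deal with channel shutdown and result ordering. Scheduling, retries and distribution across machines would add more on top, which is the kind of plumbing an orchestration platform is meant to absorb.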

Think of this as a Heroku-like solution where you can deploy text crawlers and orchestrate them to re-train your machine learning models with fresh data, all on your own infrastructure. These are the kinds of scenarios we're interested in.

Roadmap

TBA

Contact

hello AT zcrawl DOT org
