
go-crawler

Just an awesome crawler in Go.

Configurable, concurrent.

Quick Glance

package main

import (
    "fmt"
    "github.com/ddo/go-crawler"
)

func main() {
    // counter, only used to number the log lines
    no := 0

    /*
        default limit:  10
        default client: timeout 10s
        default filter: http(s), no duplicates
        default scope:  http(s), no duplicates, same host only
    */
    c, err := crawler.New(&crawler.Config{
        Url: "http://facebook.com/",
    })

    // the URL is invalid
    if err != nil {
        panic(err)
    }

    // URL handler: called with each URL the crawler reports
    receiverURL := func(url string) {
        no++
        fmt.Println(no, "\t ", url)
    }

    // error handler
    receiverErr := func(err error) {
        fmt.Println("error\t", err)
    }

    // start the crawl
    c.Start(receiverURL, receiverErr)

    fmt.Println("done")
}

Output

1     https://www.facebook.com/recover/initiate
2     http://facebook.com/legal/terms
3     http://facebook.com/about/privacy
4     http://facebook.com/help/cookies
5     http://facebook.com/pages/create/?ref_type=registration_form
6     https://vi-vn.facebook.com/
7     https://www.facebook.com/
8     https://zh-tw.facebook.com/
9     https://ko-kr.facebook.com/
10    https://ja-jp.facebook.com/
done
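The listing stops after exactly 10 URLs, matching the default limit of 10. The example's plain `no++` counter is only safe if the handler is never called from two goroutines at once, which this README does not state. A minimal sketch of a counter that both numbers results and enforces a cap under concurrent calls, using a `sync.Mutex`, is below; `limiter` and `Next` are hypothetical names, not part of go-crawler.

```go
package main

import (
	"fmt"
	"sync"
)

// limiter (hypothetical) hands out 1-based sequence numbers until max is
// reached. It is safe for concurrent use.
type limiter struct {
	mu    sync.Mutex
	count int
	max   int
}

// Next returns the next sequence number and true, or 0 and false once the
// limit has been reached.
func (l *limiter) Next() (int, bool) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.count >= l.max {
		return 0, false
	}
	l.count++
	return l.count, true
}

func main() {
	lim := &limiter{max: 10}
	var wg sync.WaitGroup
	// 25 concurrent "results" compete for 10 slots.
	for i := 0; i < 25; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			lim.Next()
		}()
	}
	wg.Wait()
	// Reading count after Wait is safe: all goroutines have finished.
	fmt.Println("accepted:", lim.count) // accepted: 10
}
```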

Todo

  • init with Filter
  • init with http.Client
  • crawler testing
  • travis-ci
  • coveralls.io
  • non-UTF-8 encoding issue
  • init with Fetcher
  • mutex/chan limit/worker counter
  • delay
  • README advanced doc
