A web crawler in Go.
evilscott / gocrawler
Add command line flags for quiet/verbose modes
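A minimal sketch of how quiet and verbose modes could be wired up with the standard `flag` package. The flag names `-quiet` and `-verbose`, the `options` type, and the `parseOptions` helper are assumptions for illustration, not names taken from gocrawler.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
	"log"
	"os"
)

// options holds the verbosity settings. The field and flag names are
// hypothetical; the project may choose different ones.
type options struct {
	quiet   bool
	verbose bool
}

// parseOptions parses the given arguments and rejects contradictory flags.
func parseOptions(args []string) (options, error) {
	var opts options
	fs := flag.NewFlagSet("gocrawler", flag.ContinueOnError)
	fs.BoolVar(&opts.quiet, "quiet", false, "suppress progress output")
	fs.BoolVar(&opts.verbose, "verbose", false, "print detailed progress output")
	if err := fs.Parse(args); err != nil {
		return options{}, err
	}
	if opts.quiet && opts.verbose {
		return options{}, errors.New("-quiet and -verbose are mutually exclusive")
	}
	return opts, nil
}

func main() {
	opts, err := parseOptions(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}

	// Progress messages go to stderr so stdout stays free for results.
	logger := log.New(os.Stderr, "", log.LstdFlags)
	if opts.quiet {
		logger.SetOutput(io.Discard)
	}
	logger.Println("starting crawl")
	if opts.verbose {
		logger.Println("verbose output enabled")
	}
}
```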
Shrink the TODOs
Set the buffer size to 100 by default and expose it as a configuration option for flexibility
Need examples written in Go, and documentation in the standard Go format (Markdown as well?)
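One way to sketch the configurable buffer, assuming the buffer in question is a buffered channel of URLs. The `Config` type, its `BufferSize` field, and `newURLQueue` are hypothetical names.

```go
package main

import "fmt"

// defaultBufferSize is the default named in the issue.
const defaultBufferSize = 100

// Config is a hypothetical settings struct for the crawler.
type Config struct {
	BufferSize int // capacity of the URL queue; values <= 0 mean "use the default"
}

// newURLQueue returns a buffered channel sized from the configuration.
func newURLQueue(cfg Config) chan string {
	size := cfg.BufferSize
	if size <= 0 {
		size = defaultBufferSize
	}
	return make(chan string, size)
}

func main() {
	fmt.Println(cap(newURLQueue(Config{})))              // 100
	fmt.Println(cap(newURLQueue(Config{BufferSize: 8}))) // 8
}
```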
Don't crawl nofollow links, and obey robots.txt
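The nofollow half of this could look roughly like the sketch below, which checks the space-separated tokens of a link's `rel` attribute. The `isNoFollow` helper is hypothetical, and robots.txt handling is not covered here.

```go
package main

import (
	"fmt"
	"strings"
)

// isNoFollow reports whether a rel attribute value contains the "nofollow"
// token. rel is a space-separated, case-insensitive token list.
func isNoFollow(rel string) bool {
	for _, token := range strings.Fields(rel) {
		if strings.EqualFold(token, "nofollow") {
			return true
		}
	}
	return false
}

func main() {
	for _, rel := range []string{"nofollow", "noopener NoFollow", "noopener", ""} {
		fmt.Printf("%q -> skip link: %v\n", rel, isNoFollow(rel))
	}
}
```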
Follow redirects (default) but don't crawl if landing outside of domain. Option to not follow redirects or only follow a few levels.
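A sketch of this redirect policy using the `CheckRedirect` hook on `http.Client`. The `allowRedirect` and `newClient` helpers, the error value, and the exact host comparison are assumptions; a real crawler might compare registered domains instead of exact hostnames.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"strings"
)

// errOffDomain is a hypothetical sentinel for redirects that leave the start host.
var errOffDomain = errors.New("redirect leaves the start domain")

// allowRedirect decides whether a redirect may be followed. hops is the
// number of requests already made, which is what len(via) reports.
func allowRedirect(startHost, targetHost string, hops, maxRedirects int) error {
	if hops > maxRedirects {
		// Stop following and hand back the last response as-is.
		return http.ErrUseLastResponse
	}
	if !strings.EqualFold(targetHost, startHost) {
		return errOffDomain
	}
	return nil
}

// newClient builds a client that follows at most maxRedirects redirects and
// refuses to land outside startHost. maxRedirects == 0 disables following.
func newClient(startHost string, maxRedirects int) *http.Client {
	return &http.Client{
		CheckRedirect: func(req *http.Request, via []*http.Request) error {
			return allowRedirect(startHost, req.URL.Hostname(), len(via), maxRedirects)
		},
	}
}

func main() {
	client := newClient("example.com", 3)
	_ = client // hand this client to the crawler's fetch code

	fmt.Println(allowRedirect("example.com", "example.com", 1, 3))
	fmt.Println(allowRedirect("example.com", "other.org", 1, 3))
}
```

When the hook returns an error other than `http.ErrUseLastResponse`, the client's `Get` call fails with that error wrapped in a `*url.Error`, so the caller can detect an off-domain redirect with `errors.Is`.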
Results are non-deterministic; possible race condition.
Add a custom user-agent (and allow custom setting via CLI)
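A short sketch of setting the header per request, with the value coming from a flag. The `-user-agent` flag name, the default string, and the `newRequest` helper are placeholders, not gocrawler's actual choices.

```go
package main

import (
	"flag"
	"fmt"
	"net/http"
	"os"
)

// defaultUserAgent is a placeholder value for illustration.
const defaultUserAgent = "gocrawler/0.1"

// newRequest builds a GET request carrying the given User-Agent header,
// falling back to the default when none is supplied.
func newRequest(target, userAgent string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, target, nil)
	if err != nil {
		return nil, err
	}
	if userAgent == "" {
		userAgent = defaultUserAgent
	}
	req.Header.Set("User-Agent", userAgent)
	return req, nil
}

func main() {
	ua := flag.String("user-agent", defaultUserAgent, "User-Agent header sent with every request")
	flag.Parse()

	req, err := newRequest("https://example.com/", *ua)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(req.Header.Get("User-Agent"))
}
```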
Code is getting ugly; last step before MVP should be refactoring code for readability and maintainability.
Output should be JSON, with the option to output to file instead of stdout.
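A possible shape for this, writing to stdout unless a path is given. The `Page` struct and its fields, and the `writeJSON` and `writeOutput` helpers, are assumptions made for the sketch.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
)

// Page is a hypothetical result record; the real field set may differ.
type Page struct {
	URL    string   `json:"url"`
	Status int      `json:"status"`
	Links  []string `json:"links"`
}

// writeJSON encodes the results as a single JSON array.
func writeJSON(w io.Writer, pages []Page) error {
	return json.NewEncoder(w).Encode(pages)
}

// writeOutput writes to stdout when path is empty, otherwise to the named file.
func writeOutput(path string, pages []Page) error {
	if path == "" {
		return writeJSON(os.Stdout, pages)
	}
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	if err := writeJSON(f, pages); err != nil {
		f.Close()
		return err
	}
	// Close errors matter for files: they can signal a failed flush.
	return f.Close()
}

func main() {
	pages := []Page{{URL: "https://example.com/", Status: 200, Links: []string{"https://example.com/about"}}}
	if err := writeOutput("", pages); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```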
Collect data on links like internal/external, anchor, query string, etc
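A sketch of per-link metadata built on `net/url`. The `LinkInfo` struct and `describeLink` helper are hypothetical, "internal" is taken here to mean "same hostname as the page", and "anchor" is interpreted as the link's anchor text, which the caller is assumed to have extracted from the HTML.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// LinkInfo is a hypothetical record of what was learned about one link.
type LinkInfo struct {
	URL        string // absolute URL after resolving against the page
	Internal   bool   // true when the link stays on the page's host
	AnchorText string
	Query      string // raw query string, without the leading "?"
	Fragment   string // fragment, without the leading "#"
}

// describeLink resolves href against pageURL and classifies the result.
func describeLink(pageURL, href, anchorText string) (LinkInfo, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return LinkInfo{}, err
	}
	ref, err := url.Parse(href)
	if err != nil {
		return LinkInfo{}, err
	}
	abs := base.ResolveReference(ref)
	return LinkInfo{
		URL:        abs.String(),
		Internal:   strings.EqualFold(abs.Hostname(), base.Hostname()),
		AnchorText: strings.TrimSpace(anchorText),
		Query:      abs.RawQuery,
		Fragment:   abs.Fragment,
	}, nil
}

func main() {
	info, err := describeLink("https://example.com/docs/", "../search?q=go#top", " Search ")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", info)
}
```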
4xx/5xx responses need to be surfaced, especially if the initial URL comes back with one.
File configuration should be available (either checking a default path or passing in via command line) since the number of configuration options is growing.
Create an integration test by spinning up a test server and hitting it with the main crawler.
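The general approach can be sketched with `net/http/httptest`, which starts a real HTTP server on a loopback port. Here `fetchStatus` is a stand-in for the crawler's entry point and `newTestSite` is a hypothetical fixture; in the project itself this would more naturally live in a `_test.go` file and report failures through `testing.T`.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// fetchStatus is a stand-in for the crawler: it requests one URL and
// returns the HTTP status code.
func fetchStatus(target string) (int, error) {
	resp, err := http.Get(target)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

// newTestSite starts a throwaway server with two pages; every other path is a 404.
func newTestSite() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != "/" {
			http.NotFound(w, r)
			return
		}
		fmt.Fprint(w, `<a href="/about">About</a>`)
	})
	mux.HandleFunc("/about", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, `<a href="/">Home</a>`)
	})
	return httptest.NewServer(mux)
}

func main() {
	srv := newTestSite()
	defer srv.Close()

	for _, path := range []string{"/", "/about", "/missing"} {
		status, err := fetchStatus(srv.URL + path)
		fmt.Println(path, status, err)
	}
}
```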
The Go standard is tabs. Use them, and keep the code formatted with gofmt.
Via command line flag. This will need to be in place to support the robots.txt Crawl-delay directive anyway.
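Assuming the setting in question is a pause between requests, a flag-driven version might look like the sketch below. The `-delay` flag name and the `crawlWithDelay` helper are hypothetical.

```go
package main

import (
	"flag"
	"fmt"
	"time"
)

// crawlWithDelay visits each URL in order, pausing for delay between
// consecutive visits. A delay of zero or less disables the pause.
func crawlWithDelay(urls []string, delay time.Duration, visit func(string)) {
	for i, u := range urls {
		if i > 0 && delay > 0 {
			time.Sleep(delay)
		}
		visit(u)
	}
}

func main() {
	delay := flag.Duration("delay", 500*time.Millisecond, "pause between requests, e.g. 250ms or 2s")
	flag.Parse()

	urls := []string{"https://example.com/", "https://example.com/about"}
	crawlWithDelay(urls, *delay, func(u string) {
		fmt.Println(time.Now().Format(time.RFC3339Nano), u)
	})
}
```

A value parsed from a robots.txt Crawl-delay line could later be passed to the same function in place of the flag value.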
Need to call wg.Done() before each continue to avoid deadlock on error during crawl
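The failure mode can be sketched as follows: if a worker hits an error and uses `continue` without calling `wg.Done()`, the counter never reaches zero and `wg.Wait()` blocks forever. The `worker` and `crawlAll` functions here are illustrative and not gocrawler's actual code.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
)

// errFetch is a hypothetical error used to simulate a failed request.
var errFetch = errors.New("fetch failed")

// worker drains the queue. Every item taken from the queue must be matched
// by exactly one wg.Done(), including on the error path.
func worker(queue <-chan string, wg *sync.WaitGroup, fetch func(string) error, failed *int64) {
	for u := range queue {
		if err := fetch(u); err != nil {
			atomic.AddInt64(failed, 1)
			wg.Done() // without this call, wg.Wait() below never returns
			continue
		}
		// ... process the fetched page here ...
		wg.Done()
	}
}

// crawlAll fetches every URL with two workers and returns the failure count.
func crawlAll(urls []string, fetch func(string) error) int {
	var wg sync.WaitGroup
	var failed int64

	queue := make(chan string, len(urls))
	for _, u := range urls {
		wg.Add(1)
		queue <- u
	}
	close(queue)

	for i := 0; i < 2; i++ {
		go worker(queue, &wg, fetch, &failed)
	}
	wg.Wait()
	return int(atomic.LoadInt64(&failed))
}

func main() {
	failed := crawlAll([]string{"a", "bad", "c"}, func(u string) error {
		if u == "bad" {
			return errFetch
		}
		return nil
	})
	fmt.Println("failed:", failed)
}
```

Moving the per-URL work into its own function with `defer wg.Done()` as the first statement is a common way to make the early-exit paths safe without repeating the call.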