slord6 / crawler
A web crawler which stores found content to a database
License: MIT License
Improved argument parsing would allow for more functionality and more understandable input arguments. It might be required for #2.
May require #3 to have it as optional
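As a rough illustration of what clearer argument parsing could look like, here is a minimal sketch using Python's standard argparse module. The project's own language and its real arguments are not shown here, so every argument name below (name, --seed, --max-pages, --resume) is a hypothetical placeholder.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for a hypothetical crawler command line."""
    parser = argparse.ArgumentParser(
        description="Crawl pages and store found content in a database."
    )
    # Positional argument: the crawl name used to label output files (assumed).
    parser.add_argument("name", help="name of the crawl")
    # Repeatable option: each use appends one seed URL to the list.
    parser.add_argument("--seed", action="append", default=[],
                        help="seed URL; may be given more than once")
    # Optional integer with a default, so it can be omitted entirely.
    parser.add_argument("--max-pages", type=int, default=100,
                        help="stop after this many pages")
    # Optional flag: off unless explicitly passed.
    parser.add_argument("--resume", action="store_true",
                        help="continue a previous crawl with the same name")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["demo", "--seed", "https://example.com", "--resume"])
```

Named options with defaults keep optional features optional: a caller who never passes --resume or --max-pages gets the default behaviour.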
The (crawl name)_candidates file is overwritten if continuing a run using the same name and loading in the (crawl name)_frontier file. To allow properly continuing, we can load in the pre-existing candidates file so that its entries are included rather than overwritten.
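The load-then-merge idea could be sketched as follows. This assumes the candidates file is plain text with one entry per line, which the source does not confirm. The function names load_candidates and save_candidates are hypothetical.

```python
from pathlib import Path


def load_candidates(path: Path) -> list[str]:
    """Return entries already saved in a candidates file, or [] if it is missing."""
    if not path.exists():
        return []
    lines = path.read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def save_candidates(path: Path, new_candidates: list[str]) -> list[str]:
    """Merge new candidates with the saved ones instead of overwriting them.

    Existing entries come first; duplicates are dropped while keeping
    first-seen order (dict keys preserve insertion order).
    """
    merged = list(dict.fromkeys(load_candidates(path) + new_candidates))
    path.write_text("\n".join(merged) + "\n", encoding="utf-8")
    return merged
```

Calling save_candidates twice with the same path keeps the entries from the first call, so a continued run adds to the earlier results rather than replacing them.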
Enable running IStringComparisonScorers against a saved database
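One possible shape for this, sketched with Python's built-in sqlite3 module. The database engine, the pages table, and its url and content columns are all assumptions made for illustration. The scorer is modelled as a plain function from text to a number, which only approximates whatever IStringComparisonScorer actually defines.

```python
import sqlite3
from typing import Callable


def score_saved_pages(
    conn: sqlite3.Connection, scorer: Callable[[str], float]
) -> list[tuple[str, float]]:
    """Score every stored page and return (url, score) pairs, best first."""
    # Hypothetical schema: a "pages" table with "url" and "content" columns.
    rows = conn.execute("SELECT url, content FROM pages").fetchall()
    scored = [(url, scorer(content)) for url, content in rows]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def keyword_scorer(text: str) -> float:
    """Toy scorer: count occurrences of the word 'crawler'."""
    return float(text.lower().count("crawler"))


# An in-memory database stands in for a previously saved crawl.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (url TEXT PRIMARY KEY, content TEXT)")
conn.executemany(
    "INSERT INTO pages VALUES (?, ?)",
    [
        ("https://example.com/a", "A crawler stores crawler output"),
        ("https://example.com/b", "Unrelated text"),
    ],
)
ranked = score_saved_pages(conn, keyword_scorer)
```

Keeping the scoring step separate from the crawl means a new scorer can be tried on stored content without fetching any pages again.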
Could use NLP techniques to better compare crawled pages against the provided corpus in a new IStringComparisonScorer.
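As one simple example of such a technique, a scorer could compare bag-of-words vectors with cosine similarity. This is a swapped-in illustration, not the project's method. The class name BagOfWordsScorer and its score method are hypothetical and do not mirror the real IStringComparisonScorer interface.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class BagOfWordsScorer:
    """Scores a page by its term overlap with a reference corpus."""

    def __init__(self, corpus: list[str]):
        # Combine all corpus documents into one term-count vector.
        self.corpus_vector = Counter(
            token for document in corpus for token in tokenize(document)
        )

    def score(self, page_text: str) -> float:
        """Return a similarity in [0, 1]; higher means closer to the corpus."""
        return cosine_similarity(Counter(tokenize(page_text)), self.corpus_vector)


scorer = BagOfWordsScorer(["web crawler database"])
similar = scorer.score("web crawler tutorial")
unrelated = scorer.score("cooking recipes")
```

Raw term counts treat common words the same as distinctive ones. Weighting terms, for example with TF-IDF, is a usual next step.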
Related links:
https://github.com/zenogantner/MyMediaLite
https://github.com/nreco/recommender
NLP in Python video, also covers recommendations
https://en.wikipedia.org/wiki/Recommender_system#Content-based_filtering