gabrielcc2 / ir14assignment2
Desktop application that crawls the Web for code snippets, beginning with a set of seed URLs and complying mostly with the robots.txt standard. It uses Lucene to index the web pages and to support queries over the crawled data. Different indexes can be created, stored, and loaded, allowing for specialized searches. This was developed as a mini-project for an Information Retrieval course, WiSe2014-2015@OvGU.
License: Apache License 2.0
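
As a rough illustration of what "complying mostly with robots.txt" can mean for a crawler, here is a minimal sketch using only the Java standard library. It is not taken from this repository: the class `RobotsSketch` and its methods are hypothetical names. It deliberately handles only `Disallow` prefixes in the `User-agent: *` group and ignores `Allow` rules, wildcards, and crawl delays.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of the project: a partial robots.txt check.
class RobotsSketch {

    // Collects the Disallow path prefixes that apply to "User-agent: *".
    static List<String> disallowedPrefixes(String robotsTxt) {
        List<String> prefixes = new ArrayList<>();
        boolean inWildcardGroup = false;
        boolean lastWasAgent = false;
        for (String raw : robotsTxt.split("\\r?\\n")) {
            // Drop trailing comments and surrounding whitespace.
            int hash = raw.indexOf('#');
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank or malformed line
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                // Consecutive User-agent lines belong to the same group.
                if (!lastWasAgent) {
                    inWildcardGroup = false;
                }
                if (value.equals("*")) {
                    inWildcardGroup = true;
                }
                lastWasAgent = true;
            } else {
                lastWasAgent = false;
                // An empty Disallow value means "nothing is disallowed".
                if (field.equals("disallow") && inWildcardGroup && !value.isEmpty()) {
                    prefixes.add(value);
                }
            }
        }
        return prefixes;
    }

    // A path may be fetched if it matches none of the disallowed prefixes.
    static boolean isAllowed(String path, List<String> prefixes) {
        for (String prefix : prefixes) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }
}
```

A crawler built along these lines would fetch `/robots.txt` once per host, cache the result of `disallowedPrefixes`, and call `isAllowed` on each URL path before adding it to the crawl frontier.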