This project contains data crawled from the Polar Dataset in three categories, along with a collection of programs for interpreting the crawled data and for enhancing the filtering and duplicate identification algorithms of Nutch 1.10 and Tika 1.7.
The categories of the crawled data are: