A simple tool for Data Collection from the Web
This code mimics the Google search engine: it accepts user input for the search terms, the number of URLs to include in the search output, and a filename for the output.
The output is saved in the current directory and comprises three files: a text file listing all the PDF URLs from the search results, a second text file listing the HTML URLs, and a consolidated CSV file containing every URL from the search. Keeping PDF and HTML URLs separate simplifies later content extraction, since each type needs a different parser.
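
The separation step described above can be sketched as follows. This is only an illustration, not the tool's actual code: the function names (is_pdf, save_results), the output file names, and the CSV columns are all assumptions, and a URL is treated as a PDF simply because its path ends in ".pdf". The search itself is left out; the sketch starts from a list of URLs that has already been collected.

```python
import csv
from pathlib import Path
from urllib.parse import urlparse


def is_pdf(url):
    """Heuristic: a URL counts as a PDF when its path ends in .pdf."""
    return urlparse(url).path.lower().endswith(".pdf")


def save_results(urls, filename, out_dir="."):
    """Write the PDF list, the HTML list, and a consolidated CSV.

    Returns the (pdf_urls, html_urls) lists so callers can reuse them.
    """
    out = Path(out_dir)
    pdf_urls = [u for u in urls if is_pdf(u)]
    html_urls = [u for u in urls if not is_pdf(u)]

    # Hypothetical file names; the real tool may name its outputs differently.
    pdf_file = out / f"{filename}_pdf.txt"
    html_file = out / f"{filename}_html.txt"
    csv_file = out / f"{filename}.csv"

    # One URL per line in each text file.
    pdf_file.write_text("".join(u + "\n" for u in pdf_urls), encoding="utf-8")
    html_file.write_text("".join(u + "\n" for u in html_urls), encoding="utf-8")

    # The consolidated CSV keeps every URL in its original order.
    with open(csv_file, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["url", "type"])
        for u in urls:
            writer.writerow([u, "pdf" if is_pdf(u) else "html"])

    return pdf_urls, html_urls
```

For example, save_results(urls, "climate") would write climate_pdf.txt, climate_html.txt and climate.csv to the current directory. Checking the file extension is cheap but imperfect: a PDF served from a URL without a ".pdf" suffix would land in the HTML list, so a stricter version might inspect the response's Content-Type header instead.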