InvertedIndexCreator

See Introduction.pdf for more details.

Overview:

This generator has two main functions: generating an inverted index list from the Common Crawl data set, and executing query searches against that inverted index list. The development environment is OS X or Linux.
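
To make the term "inverted index list" concrete, here is a minimal in-memory sketch. It is not the project's actual implementation: the class name InvertedIndexSketch, the build method, the naive tokenizer and the use of list positions as document IDs are all assumptions made for illustration. The real generator works on Common Crawl data and writes its results to files.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration only; not taken from the InvertedIndexCreator sources.
class InvertedIndexSketch {

    // Maps each term to the sorted set of document IDs that contain it.
    // Assumption: a document's ID is simply its position in the input list.
    static Map<String, TreeSet<Integer>> build(List<String> documents) {
        Map<String, TreeSet<Integer>> index = new TreeMap<String, TreeSet<Integer>>();
        for (int docId = 0; docId < documents.size(); docId++) {
            // Naive tokenization: lowercase, then split on anything that is
            // not an ASCII letter or digit.
            String text = documents.get(docId).toLowerCase(Locale.ROOT);
            for (String token : text.split("[^a-z0-9]+")) {
                if (token.isEmpty()) {
                    continue; // split() can yield a leading empty token
                }
                TreeSet<Integer> postings = index.get(token);
                if (postings == null) {
                    postings = new TreeSet<Integer>();
                    index.put(token, postings);
                }
                postings.add(docId);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<String>();
        docs.add("The dog chased the cat");
        docs.add("A bird watched the dog");
        docs.add("Cat and bird");
        // Prints each term followed by the IDs of the documents containing it.
        System.out.println(build(docs));
    }
}
```

Keeping each posting list sorted is a common design choice because it makes intersecting and merging lists at query time straightforward.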

Instructions:

Prerequisites: Java 7, MongoDB, Node.js and Maven. MongoDB must be started before running this program.

First, go to the package directory and build the project with Maven. In terminal:

cd path/to/your/InvertedIndexCreator
mvn install

Then prepare the dataset using the shell script “initialize.sh”, which downloads and decompresses the Common Crawl data automatically. In terminal:

mvn exec:java

Input E. After it finishes, you will see the Common Crawl data in the “input” directory.

To start generating the inverted index list, again in terminal:

mvn exec:java

Type A. It will take some time to generate the index list; run speed details are discussed in another section. After it finishes, you will see three files: lexicon.txt, pageUrlTable.txt and invertedIndexList.txt.

Here we come to the query processing part. In terminal:

mvn exec:java

Type B

Now it’s time to query. Go to the “frontEnd” directory. In terminal:

cd frontEnd
npm install
npm run dev

Type in the words you would like to search for. The rule is that “&” represents conjunctive search (every word must match) and “@” represents disjunctive search (any word may match). Click the search button or press Enter.

You can also open http://localhost:8888/query?dog&cat&bird directly in your browser to get the result.
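
The difference between “&” and “@” can be sketched as set intersection versus set union over posting lists. The code below is a hedged illustration, not the project's query processor: the QuerySketch class, the evaluate method, the tiny hard-coded index and the decision to reject queries that mix both operators are assumptions, since the README does not say how mixed queries are handled.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical illustration only; not taken from the InvertedIndexCreator sources.
class QuerySketch {

    // Evaluates "dog&cat" (conjunctive) or "dog@cat" (disjunctive) against an
    // in-memory index that maps terms to sorted document IDs.
    // Assumption: a query uses only one kind of operator.
    static TreeSet<Integer> evaluate(String query, Map<String, TreeSet<Integer>> index) {
        boolean hasAnd = query.indexOf('&') >= 0;
        boolean hasOr = query.indexOf('@') >= 0;
        if (hasAnd && hasOr) {
            throw new IllegalArgumentException("mixed operators are not handled in this sketch");
        }
        TreeSet<Integer> result = null;
        for (String raw : query.toLowerCase(Locale.ROOT).split("[&@]")) {
            String term = raw.trim();
            if (term.isEmpty()) {
                continue;
            }
            TreeSet<Integer> postings = index.get(term);
            if (postings == null) {
                postings = new TreeSet<Integer>(); // unknown term: empty list
            }
            if (result == null) {
                result = new TreeSet<Integer>(postings); // first term seeds the result
            } else if (hasOr) {
                result.addAll(postings);    // disjunctive: union
            } else {
                result.retainAll(postings); // conjunctive: intersection
            }
        }
        return result == null ? new TreeSet<Integer>() : result;
    }

    public static void main(String[] args) {
        Map<String, TreeSet<Integer>> index = new HashMap<String, TreeSet<Integer>>();
        index.put("dog", new TreeSet<Integer>(Arrays.asList(0, 1)));
        index.put("cat", new TreeSet<Integer>(Arrays.asList(0, 2)));
        index.put("bird", new TreeSet<Integer>(Arrays.asList(1, 2)));

        System.out.println(evaluate("dog&cat", index)); // prints [0]
        System.out.println(evaluate("dog@cat", index)); // prints [0, 1, 2]

        // The part after '?' in the URL is the raw query string.
        String fromUrl = URI.create("http://localhost:8888/query?dog&cat&bird").getQuery();
        System.out.println(evaluate(fromUrl, index));   // prints []
    }
}
```

In this toy index no single document contains all three words, so the conjunctive query taken from the URL returns an empty result, while a disjunctive query over the same words would return every document.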
