document-ranking

About

This repository is our submission for Assignment 2 of the course Information Retrieval (CS F469), offered in the second semester of 2019-2020 at BITS Pilani, Pilani Campus.

It is a TF-IDF vector space model that ranks documents against queries, with two improvements: spelling correction on queries, and a bigram index to better answer phrasal queries.
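As a rough illustration of TF-IDF ranking in a vector space model, the sketch below builds a small in-memory index and scores documents by cosine similarity. It is not the code in util.py or test_queries.py: the function names (tokenize, build_index, rank), the whitespace tokenizer, and the log-weighted lnc.ltc scheme are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase + whitespace split; the real tokenizer may differ.
    return text.lower().split()


def build_index(docs):
    """docs: dict of doc_id -> text. Returns (inverted index, document vector lengths)."""
    inv_index = defaultdict(dict)  # term -> {doc_id: raw term frequency}
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            inv_index[term][doc_id] = tf

    # Euclidean length of each document vector under log-tf weighting.
    squared = defaultdict(float)
    for postings in inv_index.values():
        for doc_id, tf in postings.items():
            squared[doc_id] += (1 + math.log10(tf)) ** 2
    doc_lengths = {doc_id: math.sqrt(v) for doc_id, v in squared.items()}
    return dict(inv_index), doc_lengths


def rank(query, inv_index, doc_lengths, k=10):
    """Return up to k (doc_id, score) pairs, best first."""
    n_docs = len(doc_lengths)
    scores = defaultdict(float)
    for term, qtf in Counter(tokenize(query)).items():
        postings = inv_index.get(term)
        if not postings:
            continue  # term absent from the corpus
        idf = math.log10(n_docs / len(postings))
        query_weight = (1 + math.log10(qtf)) * idf
        for doc_id, tf in postings.items():
            scores[doc_id] += query_weight * (1 + math.log10(tf))
    # Normalise by document length so long documents are not favoured.
    ranked = sorted(((s / doc_lengths[d], d) for d, s in scores.items()), reverse=True)
    return [(d, s) for s, d in ranked[:k]]


if __name__ == "__main__":
    docs = {1: "cat sat mat", 2: "dog sat log", 3: "cat cat dog"}
    inv_index, doc_lengths = build_index(docs)
    print(rank("cat", inv_index, doc_lengths))
```

Dividing by the document length here is what makes the score a cosine similarity up to a constant factor that depends only on the query, so it does not change the ordering.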

Use

To create the inverted index and the other data structures, run python3 util.py

  1. Enter the path to the corpus file (for example, the wiki_02 file in this repository).
  2. Enter 1 to build the index used by part 1 and by part 2, improvement 1 (spelling correction); both use the same index.
  3. Enter 2 to build the new index required by part 2, improvement 2 (phrasal queries via a bigram index).
  4. All output files are stored in the current directory.
  5. For option 1, the files stored are inv_index.pkl, doc_lengths.pkl, and doc_id_2_title.pkl.
  6. For option 2, the files stored are inv_index.pkl, doc_lengths.pkl, doc_id_2_title.pkl, and doc_bi_lengths.pkl.
  7. Note that the file names are the same in both cases, so files from one option will overwrite those from the other if both are built in the same directory.
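To make the idea of a bigram index and its pickled output more concrete, here is a minimal sketch. It assumes bigrams are pairs of adjacent lowercase tokens and that each index is a plain dictionary; the helper names (bigrams, build_bigram_index, save_pickle, load_pickle) are hypothetical, and the actual layout of the .pkl files written by util.py may differ.

```python
import pickle
from collections import Counter, defaultdict


def bigrams(tokens):
    # Adjacent token pairs joined by a space, e.g. ["new", "york"] -> ["new york"].
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


def build_bigram_index(docs):
    """docs: dict of doc_id -> text. Returns bigram -> {doc_id: frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        tokens = text.lower().split()  # assumption: whitespace tokenisation
        for gram, tf in Counter(bigrams(tokens)).items():
            index[gram][doc_id] = tf
    return dict(index)


def save_pickle(obj, path):
    # Writes to the given path, silently replacing any existing file.
    with open(path, "wb") as fh:
        pickle.dump(obj, fh)


def load_pickle(path):
    with open(path, "rb") as fh:
        return pickle.load(fh)


if __name__ == "__main__":
    docs = {1: "new york city", 2: "york new city"}
    print(build_bigram_index(docs))
```

Because a bigram only matches when both words appear next to each other in that order, a phrasal query such as "new york" retrieves document 1 but not document 2 in this example.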

To query the index, run python3 test_queries.py

  1. Enter the query.
  2. To query against the original index, enter 1 (all the files named above for option 1 must be in the current directory).
  3. To query against the original index with spelling correction (improvement 1), enter 2 (the same files are required).
  4. To query against the combined index, enter 3 (all the files produced by index construction option 2 must be in the current directory).
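One common way to correct spelling in a query is to replace each term that is missing from the index vocabulary with the closest vocabulary term by edit distance. The sketch below shows that approach; whether this repository uses edit distance, and the names edit_distance, correct_query, and max_distance, are assumptions for illustration only.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = curr
    return prev[-1]


def correct_query(query, vocabulary, max_distance=2):
    """Replace out-of-vocabulary terms with the nearest vocabulary term, if close enough."""
    corrected = []
    for term in query.lower().split():
        if term in vocabulary:
            corrected.append(term)
            continue
        # Ties are broken alphabetically so the result is deterministic.
        best = min(vocabulary, key=lambda v: (edit_distance(term, v), v), default=term)
        corrected.append(best if edit_distance(term, best) <= max_distance else term)
    return " ".join(corrected)


if __name__ == "__main__":
    vocab = {"information", "retrieval", "ranking"}
    print(correct_query("informaton retreival", vocab))
```

Scanning the whole vocabulary for every unknown term is linear in the vocabulary size, which is acceptable for a sketch but would likely need a candidate filter (for example, character n-gram overlap) on a large corpus.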

Note

  • In test_queries.py, the names of the files to load are specified in the load_files() function.
  • The structure of the corpus file is:
<doc>...</doc>
<doc>...</doc>
...
<doc>...</doc>
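A corpus in this shape can be split into documents with a regular expression, as in the sketch below. The id and title attributes on the opening tag are an assumption (the structure above shows only a bare <doc> tag), and iter_docs is a hypothetical helper, not a function from this repository.

```python
import re

# Matches one <doc ...>...</doc> block; DOTALL lets the body span multiple lines.
DOC_RE = re.compile(r"<doc\b([^>]*)>(.*?)</doc>", re.DOTALL)
# Matches key="value" attribute pairs inside the opening tag (assumed format).
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def iter_docs(corpus_text):
    """Yield one dict per <doc> block with its id, title, and body text."""
    for position, match in enumerate(DOC_RE.finditer(corpus_text)):
        attrs = dict(ATTR_RE.findall(match.group(1)))
        yield {
            # Fall back to the block's position when no id attribute is present.
            "id": attrs.get("id", str(position)),
            "title": attrs.get("title", ""),
            "text": match.group(2).strip(),
        }


if __name__ == "__main__":
    sample = '<doc id="7" title="A">first\nbody</doc>\n<doc>second</doc>'
    for doc in iter_docs(sample):
        print(doc)
```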

Contributors

kumar-tarun
