
rerank_docs's Introduction

Rerank

The goal is to develop a “telescoping” model, here Rocchio's method, that improves the precision of retrieved results through pseudo-relevance feedback. We use the document collection and queries from the TREC-COVID track. Terms are weighted by their TF-IDF values; further optimizations and details are described in /algorithmic_details.pdf. For the problem statement, see /assignment2.pdf. For the observations, see report.pdf.
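As a rough illustration of the idea, the sketch below expands a query with the centroid of the top-k documents (assumed relevant, which is the pseudo-relevance feedback step) and reranks by cosine similarity. It is not the code in this repository: the function names, the TF-IDF variant, and the defaults k=10, alpha=1.0, beta=0.75 are assumptions chosen for the example, and the negative-feedback term of Rocchio's formula is left out.

```python
import math
from collections import Counter


def tfidf_vectors(docs, query):
    """Return sparse TF-IDF vectors (dict term -> weight) for docs and query.

    docs is a list of token lists; query is a token list.
    The weighting (1 + log tf) * log(1 + N/df) is one common variant,
    picked here for illustration only.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log(1 + n / df[t]) for t in df}

    def vec(tokens):
        tf = Counter(tokens)
        return {t: (1 + math.log(c)) * idf[t] for t, c in tf.items() if t in idf}

    return [vec(d) for d in docs], vec(query)


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rocchio_rerank(query, docs, k=10, alpha=1.0, beta=0.75):
    """Rerank docs (given in initial rank order) with a Rocchio-expanded query.

    Returns the document indices sorted by descending similarity.
    """
    doc_vecs, q_vec = tfidf_vectors(docs, query)
    top = doc_vecs[:k]  # pseudo-relevant set: the first k results
    new_q = {t: alpha * w for t, w in q_vec.items()}
    for d in top:
        for t, w in d.items():
            new_q[t] = new_q.get(t, 0.0) + beta * w / len(top)
    scores = [cosine(new_q, d) for d in doc_vecs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

Calling `rocchio_rerank(["covid"], docs, k=2)` on a small list of tokenized documents returns a permutation of their indices; documents that share no terms with the expanded query score zero and fall to the end.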

Libraries Needed

  • BeautifulSoup4, install via pip install beautifulsoup4
  • Python 3.7 is recommended

Instructions

1) rocchio_rerank.sh [query-file] [top-100-file] [collection-dir] [output-file] reranks the results, where:

  • query-file: file containing the queries in the same XML format as the released training queries, for example /covid19-topics.xml
  • top-100-file: file containing the top 100 documents to be reranked, in the same format as the provided train and dev top-100 files, for example /t40-top-100.txt
  • collection-dir: directory containing the full document collection. Specifically, it contains metadata.csv and a subdirectory named document_parses, which in turn contains the subdirectories pdf_json and pmc_json. See link for more information.
  • output-file: name of the file to which the results are written.
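For orientation, here is a minimal sketch of how the two input files might be read. It assumes the TREC-COVID topic layout (`<topic number="...">` elements with a `<query>` child) and a whitespace-separated top-100 file whose first and third columns are the query id and document id, as in the standard TREC run format. Both layouts are assumptions to check against the actual files, and `parse_topics` and `parse_top100` are hypothetical helper names, not functions from this repository.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def parse_topics(xml_text):
    """Map topic number -> query text (assumed TREC-COVID topic layout)."""
    root = ET.fromstring(xml_text)
    return {
        topic.get("number"): (topic.findtext("query") or "").strip()
        for topic in root.iter("topic")
    }


def parse_top100(lines):
    """Map query id -> list of document ids in file order.

    Assumes lines shaped like: qid Q0 docid rank score run-tag
    """
    ranked = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        qid, docid = parts[0], parts[2]
        ranked[qid].append(docid)
    return dict(ranked)
```

With real files, one would pass `open(path).read()` to `parse_topics` and an open file handle to `parse_top100`.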

2) python evaluate.py [ground-truth-file] [output-file] evaluates the nDCG and MAP metrics, where:

  • ground-truth-file: file containing the actual relevance judgments of some documents for each query, for example t40-qrels.txt
  • output-file: file containing the results produced by the previous command
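The two metrics can be sketched as follows. This is an illustrative implementation of the textbook definitions, not the contents of evaluate.py, which may use a different gain function or cutoff. Here `rels` is assumed to map document ids to graded relevance (0 meaning not relevant), and nDCG uses the linear-gain form rel / log2(rank + 1).

```python
import math


def average_precision(ranked, rels):
    """Average precision of one ranked list of document ids."""
    relevant = {d for d, g in rels.items() if g > 0}
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision at this relevant document
    return total / len(relevant)


def ndcg(ranked, rels, k=None):
    """Normalized DCG of one ranked list, optionally cut off at rank k."""
    ranked = ranked[:k] if k else ranked
    dcg = sum(rels.get(d, 0) / math.log2(i + 1)
              for i, d in enumerate(ranked, start=1))
    ideal = sorted(rels.values(), reverse=True)[:len(ranked)]
    idcg = sum(g / math.log2(i + 1) for i, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0


def mean_average_precision(runs, qrels):
    """MAP over queries; runs and qrels are dicts keyed by query id."""
    if not runs:
        return 0.0
    return sum(average_precision(r, qrels.get(q, {}))
               for q, r in runs.items()) / len(runs)
```

MAP is the mean of the per-query average precision, so a query with no judged relevant documents contributes 0 in this sketch; some evaluation tools skip such queries instead.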

rerank_docs's People

Contributors

agsidharth
