foteinipapadopoulou / cord-19-ir-project

Exploring First-Stage IR Approaches on the CORD-19 dataset

Leonidas Kaldanis, Foteini Papadopoulou, Büsra Yilmaz
Radboud University, Nijmegen, Netherlands

This project is part of the Information Retrieval master course for AI and Data Science.

Description

Our project is motivated by the Information Retrieval community's interest in aiding healthcare professionals and focuses on building a first-stage retrieval system on the CORD-19 dataset. We apply traditional retrieval models (TF-IDF, BM25, and a language model with Dirichlet smoothing) and a neural IR model (Deep Impact), and experiment with different variants of the topics as queries. Additionally, we explore the Doc2Query-- approach during indexing, measuring its impact on the evaluation metrics and on query search runtime. The results show that the traditional retrieval models with standard indexing remain competitive in the TREC-COVID challenge, performing almost the same as the more advanced neural approaches in both evaluation metrics and execution time, with TF-IDF performing best. Moreover, the findings suggest that the choice of query variant plays a crucial role, with the description being the best choice in this context.
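To make the traditional scoring functions named above concrete, here is a minimal, self-contained sketch of BM25 and of query likelihood with Dirichlet smoothing. It is not the code used in this project: the toy corpus, the function names, and the parameter defaults (k1=1.2, b=0.75, mu=2500, all commonly used values) are assumptions chosen for illustration, and the BM25 IDF shown is one of several variants in use.

```python
import math
from collections import Counter

# Toy corpus invented for illustration only (not CORD-19 data).
DOCS = {
    "d1": "covid transmission in hospital settings".split(),
    "d2": "vaccine trial results for covid".split(),
    "d3": "hospital staffing and mask supply".split(),
}


def bm25_score(query, doc_id, docs, k1=1.2, b=0.75):
    """Okapi BM25 score, using the log(1 + ...) IDF variant."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs.values()) / n_docs
    doc = docs[doc_id]
    tf = Counter(doc)
    score = 0.0
    for term in query:
        df = sum(1 for d in docs.values() if term in d)
        if df == 0 or tf[term] == 0:
            continue  # a term missing from the document adds nothing
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        length_norm = k1 * (1 - b + b * len(doc) / avg_len)
        score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
    return score


def dirichlet_lm_score(query, doc_id, docs, mu=2500):
    """Query log-likelihood under a Dirichlet-smoothed document model."""
    collection = Counter(t for d in docs.values() for t in d)
    coll_len = sum(collection.values())
    doc = docs[doc_id]
    tf = Counter(doc)
    score = 0.0
    for term in query:
        p_coll = collection[term] / coll_len
        if p_coll == 0:
            continue  # skip terms never seen in the collection
        score += math.log((tf[term] + mu * p_coll) / (len(doc) + mu))
    return score


def rank(query, docs, scorer):
    """Return document ids sorted by descending score."""
    return sorted(docs, key=lambda d: scorer(query, d, docs), reverse=True)


if __name__ == "__main__":
    q = "covid hospital".split()
    print(rank(q, DOCS, bm25_score))
    print(rank(q, DOCS, dirichlet_lm_score))
```

Both scorers take the same arguments, so they can be swapped in `rank` to compare rankings for the same query, which mirrors the kind of model comparison described above on a much smaller scale.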

Directories

  • avg_query_time_csv_img contains the generated images and the CSV file with the calculated average query execution times.
  • code/python contains the implementation in plain Python.
    • indexing.py and retrieval.py hold the main code used in our experiments.
  • code/notebook contains the implementation as Jupyter notebooks.
  • docker_images contains the Docker images needed to run the experiments.
  • indexes contains all the indexes generated during the experiments.
  • retrieval_query_time_logs contains the logs from all the retrieval methods, from which the average execution time is extracted.
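As a rough illustration of how an average query execution time can be measured, the sketch below times a search callable over a list of queries. The format of the logs in retrieval_query_time_logs is not described here, so this does not parse them; `average_query_time` and `toy_search` are hypothetical names, and `toy_search` is only a stand-in for a real retrieval call.

```python
import time
from statistics import mean


def average_query_time(search, queries, repeats=3):
    """Return the mean wall-clock time in seconds per query.

    Every query is run `repeats` times to smooth out timing noise.
    """
    timings = []
    for _ in range(repeats):
        for query in queries:
            start = time.perf_counter()
            search(query)
            timings.append(time.perf_counter() - start)
    return mean(timings)


def toy_search(query):
    # Hypothetical stand-in for a real retrieval call.
    return sorted(query.split())


if __name__ == "__main__":
    avg = average_query_time(toy_search, ["covid origin", "mask efficacy"])
    print(f"average query time: {avg:.6f} s")
```

Timing each query separately, rather than the whole batch at once, keeps the per-query values available in case a median or a per-method breakdown is wanted later.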

Run it locally

To run the project locally, a Docker container with the required libraries installed is needed. Run the following command in the docker_images folder to build and start the container:

docker-compose up --build -d

If you want to access the container:

docker ps  # get the container id
docker exec -it <container-id> bash
