
Comments (10)

bratseth commented on June 26, 2024

We didn't notice this was filed on cord-19, not Vespa. Reopening.
We can solve this by going back to sorting by raw timestamp, as this number isn't used for anything other than sorting anyway (afaik): vespa-engine/sample-apps#333

from cord-19.

tracy55 commented on June 26, 2024

Hi @johans1 @frodelu, was there a fix for this? I have the same issue of seeing future-dated articles even though the original publish date is several months earlier, e.g.:
API timestamp (from epoch): May 31, 2020
while the article shows an original publish date of 3/13/2020:
https://wwwnc.cdc.gov/eid/article/26/6/20-0320_article


jobergum commented on June 26, 2024

Hello @tracy55, we use the CORD-19 dataset, and some of its publish_time dates are in the future. This is from the metadata.csv of the 2020-04-03 version for id 43463:

cord_uid                                                                stka064f
sha                                                                          NaN
source_x                                                                     WHO
title                          Case-Fatality Risk Estimates for COVID-19 Calc...
doi                                                       10.3201/eid2606.200320
pmcid                                                                        NaN
pubmed_id                                                                    NaN
license                                                                      unk
abstract                       We estimated the case-fatality risk for 2019 n...
publish_time                                                          2020-06-01
authors                        Wilson, Nick; Kvalsvig, Amanda; Barnard, Lucy ...
journal                                             Emerging Infectious Diseases
Microsoft Academic Paper ID                                                  NaN
WHO #Covidence                                                             #8521
has_pdf_parse                                                              False
has_pmc_xml_parse                                                          False
full_text_file                                                               NaN
url                                       https://doi.org/10.3201/eid2606.200320

I've asked the developers producing the CORD-19 dataset about this; you can see their response here: https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge/discussion/137474


jobergum commented on June 26, 2024

But the original issue is not fixed: the ordering of articles when using the sort-by-date option on cord19.vespa.ai is still undefined for articles with a publish_time in the future, as they are all assigned the same freshness score of 1.
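To illustrate why ties at a freshness score of 1 make the order undefined, here is a minimal sketch. It uses a simplified, hypothetical freshness model (linear decay from 1 to 0 over `max_age`, with future-dated documents clamped to 1); this is an assumption for illustration, not Vespa's exact freshness formula. The document ids, dates and `max_age` are made up, except that one date mirrors the stka064f record above.

```python
from datetime import datetime, timezone

# Hypothetical, simplified freshness model (NOT Vespa's exact formula):
# decays linearly from 1 to 0 over max_age seconds; a document dated in
# the future has a negative age and is clamped to 1.
def freshness(timestamp: int, now: int, max_age: int) -> float:
    age = now - timestamp
    return min(1.0, max(0.0, 1.0 - age / max_age))

def epoch(date: str) -> int:
    """Seconds since the Unix epoch for an ISO date, taken as UTC midnight."""
    return int(datetime.fromisoformat(date).replace(tzinfo=timezone.utc).timestamp())

# Hypothetical query time and documents; "b" and "c" are dated after "now".
now = epoch("2020-04-03")
docs = {
    "a": epoch("2020-03-13"),
    "b": epoch("2020-06-01"),  # future-dated, like the stka064f record
    "c": epoch("2020-05-15"),  # also future-dated
}
max_age = 365 * 86400

scores = {doc_id: freshness(ts, now, max_age) for doc_id, ts in docs.items()}

# Both future-dated documents tie at 1.0, so ranking on freshness alone
# cannot say which of them comes first.
by_freshness = sorted(docs, key=lambda d: scores[d], reverse=True)

# Sorting on the raw timestamp, descending, gives a strict order.
by_timestamp = sorted(docs, key=lambda d: docs[d], reverse=True)

print(scores)
print(by_timestamp)
```

Under this model the tie is inherent: any score that saturates at 1 for future dates discards the information needed to order them, which is why sorting on the raw timestamp resolves it.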


tracy55 commented on June 26, 2024

I see, thank you @jobergum, much appreciated!


jobergum commented on June 26, 2024

Thank you for using the API, @tracy55. You can use https://api.cord19.vespa.ai/search/?query=sars&summary=default&sorting=-timestamp if you want strict ordering for articles with dates in the future.
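If you call the API from code, the URL in the comment above can be built with the standard library rather than by string concatenation. A minimal sketch; the helper name `search_url` is hypothetical, and only the `query`, `summary` and `sorting` parameters from the example URL are used. In Vespa's sorting syntax a leading `-` means descending and `+` means ascending.

```python
from urllib.parse import urlencode

# Base URL and parameter names taken from the example URL in the comment above.
BASE = "https://api.cord19.vespa.ai/search/"

def search_url(query: str, sorting: str = "-timestamp", summary: str = "default") -> str:
    """Build a search URL ordered by the raw timestamp (hypothetical helper)."""
    params = {"query": query, "summary": summary, "sorting": sorting}
    # urlencode keeps insertion order and percent-escapes unsafe characters.
    return BASE + "?" + urlencode(params)

url = search_url("sars")
print(url)
```

One reason to let `urlencode` do the escaping: an ascending sort written by hand as `sorting=+timestamp` would have its `+` decoded as a space, while `urlencode` emits `%2Btimestamp`.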


jobergum commented on June 26, 2024

Thanks @bratseth !


tracy55 commented on June 26, 2024

@jobergum @bratseth Since there appear to have been some updates to this ticket, I just want to check whether ranking=freshness will still return future-dated articles. Thanks!


bratseth commented on June 26, 2024

It will still return future-dated articles, but they will now be sorted correctly.
If you want to limit results to certain date ranges, you can do that in a query.
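A minimal sketch of such a date-range restriction, under several assumptions not confirmed in this thread: that the endpoint accepts Vespa's standard `yql` request parameter, that `timestamp` (the field used by `sorting=-timestamp` above) holds epoch seconds, and that dates are interpreted as UTC midnight. The helper names `epoch_seconds` and `date_range_params` are hypothetical. `range(field, low, high)` and `userQuery()` are standard Vespa YQL.

```python
from datetime import date, datetime, timezone
from urllib.parse import urlencode

def epoch_seconds(d: date) -> int:
    """Seconds since the Unix epoch for UTC midnight on the given date."""
    return int(datetime(d.year, d.month, d.day, tzinfo=timezone.utc).timestamp())

def date_range_params(query: str, start: date, end: date) -> dict:
    # Assumptions: `timestamp` stores epoch seconds, and the endpoint accepts
    # Vespa's `yql` parameter; userQuery() pulls in the `query` text.
    yql = (
        "select * from sources * where userQuery() and "
        f"range(timestamp, {epoch_seconds(start)}, {epoch_seconds(end)})"
    )
    return {"yql": yql, "query": query, "sorting": "-timestamp"}

params = date_range_params("sars", date(2020, 3, 13), date(2020, 6, 1))
url = "https://api.cord19.vespa.ai/search/?" + urlencode(params)
print(url)
```

Filtering in the query, rather than on the client after fetching results, keeps out-of-range articles from using up the hits you asked for.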


tracy55 commented on June 26, 2024

great, thank you @bratseth!
