URL invalid for some datasets (crossfit, CLOSED)

shmsw25 commented on June 12, 2024
URL invalid for some datasets:

from crossfit.

Comments (6)

cherry979988 commented on June 12, 2024

Thank you for raising this! Bill (@yuchenlin) and I will try to find a workaround.

cherry979988 commented on June 12, 2024

Hi Sewon,

I tried to reproduce this issue, but my scripts work as expected. Could you please provide some extra information? Thank you.

  1. What are the error messages you're getting?
  2. Could you double-check that your Hugging Face datasets version is 1.4.0, and try the scripts again after clearing the cache?
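For question 2, one way to confirm which datasets release is installed, without importing the library itself, is the standard-library importlib.metadata module (Python 3.8+). This is a minimal sketch: the helper name installed_version is hypothetical, and it assumes "datasets" is the PyPI distribution name of HF datasets.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

EXPECTED = "1.4.0"  # the HF datasets version asked about in this thread


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if missing (hypothetical helper)."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("datasets")
if found is None:
    print("datasets is not installed")
elif found != EXPECTED:
    print(f"datasets {found} is installed, not {EXPECTED}")
else:
    print(f"datasets {EXPECTED} is installed")
```

If the versions differ, pip install datasets==1.4.0 pins the release discussed here.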

Attaching my logs for reference.
[Screenshot: Screen Shot 2021-08-23 at 11 51 56 AM]

shmsw25 commented on June 12, 2024

Hi @cherry979988, thank you for your help. Yes, I double-checked that the HF datasets version is 1.4.0, and the error keeps occurring after clearing the cache. Error messages are saved here.

P.S. I think that once the data has been downloaded, it is saved in the cache. Perhaps that is why you were not able to reproduce the error?
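The caching effect described here can be illustrated with a small stand-in for a download cache: once a URL has been fetched, later calls read the local copy and never contact the host, so an upstream change (or a broken URL) goes unnoticed until the cache is cleared. This is a hypothetical sketch, not the HF datasets cache implementation; cached_fetch, fake_fetch, and the example.com URL are placeholders.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable, Dict, List


def cached_fetch(url: str, cache_dir: Path, fetch: Callable[[str], bytes]) -> bytes:
    """Return the bytes for url, calling fetch only on a cache miss (hypothetical helper)."""
    # One cache file per URL, named after a hash of the URL.
    path = cache_dir / hashlib.sha256(url.encode("utf-8")).hexdigest()
    if path.exists():
        return path.read_bytes()  # cache hit: the remote host is never contacted
    data = fetch(url)
    path.write_bytes(data)
    return data


url = "https://example.com/train.jsonl"  # placeholder URL
remote: Dict[str, bytes] = {url: b"old contents\n"}  # simulated remote host
calls: List[str] = []  # records every real "download"


def fake_fetch(u: str) -> bytes:
    calls.append(u)
    return remote[u]


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    first = cached_fetch(url, cache, fake_fetch)   # miss: downloads the old contents
    remote[url] = b"new contents\n"                # the owner updates the file upstream
    second = cached_fetch(url, cache, fake_fetch)  # hit: still returns the old contents
    for entry in cache.iterdir():                  # "clear the cache"
        entry.unlink()
    third = cached_fetch(url, cache, fake_fetch)   # miss again: now sees the update

print(first == second, third == second, len(calls))  # True False 2
```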

cherry979988 commented on June 12, 2024

Hi @shmsw25,

Thank you for providing the logs. I am able to reproduce the errors.

My guess is that the dataset owners updated their files, but the checksums recorded in HF datasets have not been updated yet, so we're getting this checksum error.
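The checksum check behind this error can be sketched with the standard library: an expected size and SHA-256 digest are recorded for each download URL when the dataset script is written, and a freshly downloaded file that no longer matches is rejected. HF datasets 1.x keeps comparable size/checksum records in each dataset's dataset_infos.json; the names below (RecordedChecksum, record, verify_download, ChecksumMismatchError) are hypothetical and are not the library's API.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordedChecksum:
    """Expected size and SHA-256 digest of one downloaded file (hypothetical)."""
    num_bytes: int
    sha256: str


class ChecksumMismatchError(Exception):
    """Raised when a download no longer matches its recorded checksum (hypothetical)."""


def record(data: bytes) -> RecordedChecksum:
    # Snapshot size and digest, as done once when the dataset script is published.
    return RecordedChecksum(len(data), hashlib.sha256(data).hexdigest())


def verify_download(url: str, data: bytes, expected: RecordedChecksum, ignore: bool = False) -> None:
    # ignore=True accepts the file unchecked, similar in spirit to ignore_verifications=True.
    if ignore:
        return
    actual = record(data)
    if actual != expected:
        raise ChecksumMismatchError(
            f"{url}: expected {expected.num_bytes} bytes / sha256 {expected.sha256[:12]}..., "
            f"got {actual.num_bytes} bytes / sha256 {actual.sha256[:12]}..."
        )


url = "https://example.com/data/train.jsonl"      # placeholder URL
original_file = b'{"q": "a", "a": "b"}\n'         # made-up contents at recording time
updated_file = b'{"q": "a", "a": "b (fixed)"}\n'  # made-up contents after an upstream edit

expected = record(original_file)
verify_download(url, original_file, expected)  # unchanged file: passes silently

try:
    verify_download(url, updated_file, expected)
except ChecksumMismatchError as err:
    print("rejected:", err)

verify_download(url, updated_file, expected, ignore=True)  # accepted without any check
```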

A temporary workaround is to pass ignore_verifications=True when loading datasets (e.g., dataset = load_dataset("kilt_tasks", "wow", ignore_verifications=True)). However, this will probably lead to differences in few-shot sampling. I'll discuss with Bill and see if there is a better solution...
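Why skipping verification can change few-shot sampling: a seeded sampler reproduces the same split only when its input is identical. The sketch below assumes, hypothetically, that k-shot examples are drawn with Python's random.Random(seed).sample; the actual CrossFit sampling code may differ, and sample_k_shot and the rows are made-up placeholders.

```python
import random
from typing import List, Sequence


def sample_k_shot(examples: Sequence[str], k: int, seed: int) -> List[str]:
    """Draw k examples with a fixed seed (hypothetical stand-in for a few-shot sampler)."""
    rng = random.Random(seed)  # local RNG, so the global random state is untouched
    return rng.sample(list(examples), k)


# Rows the original few-shot splits were built from (made-up placeholders).
original = [f"question {i} -> answer {i}" for i in range(100)]
# Hypothetical upstream update: same number of rows, but every row's text changed.
updated = [f"question {i} -> answer {i} (revised)" for i in range(100)]

seed, k = 42, 16
before = sample_k_shot(original, k, seed)
again = sample_k_shot(original, k, seed)
after = sample_k_shot(updated, k, seed)

print(before == again)  # True: same data and same seed give the same split
print(before == after)  # False: same positions are picked, but their contents changed
```

With the same number of rows, the same positions are picked but their contents differ; if rows are added or removed, random.sample can pick different positions as well, since its choices depend on the population size.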

shmsw25 commented on June 12, 2024

Got it, thank you for taking a look at this!

slyviacassell commented on June 12, 2024

@cherry979988 Would you mind sharing your cache of the following datasets, since they cannot be downloaded over the network?

  • jeopardy
  • kilt_wow
  • definite_pronoun_resolution
  • wiki_auto
