Comments (19)

jchodera commented on May 16, 2024

Oh noooo.

We've had this happen before, and I thought we had impressed upon NIST TRC the importance of engaging users through a gradual community process about major changes.

@mrshirts Can you put us in touch to sort out what can be done here?

from openff-evaluator.

mrshirts commented on May 16, 2024

I'd be happy to - @mattwthompson or @SimonBoothroyd could you write up a sentence or two with the exact details for me to send to them so we can figure this out? In the meantime, I think we're mostly using a local copy, correct?

mattwthompson commented on May 16, 2024

Sorry, I only have a surface-level knowledge here. Simon (or John, or somebody else who has used it before) would be better suited to provide direction.

ocmadin commented on May 16, 2024

@mrshirts The issue seems to be that the entry point where we download the ThermoML tarballs has been removed and replaced. There used to be an individual .tgz for each journal, for example: https://trc.nist.gov/ThermoML/JCED.tgz . Now there is a single tarball at a different URL (https://data.nist.gov/od/ds/mds2-2422/ThermoML.v2020-09-30.tgz, landing page: https://data.nist.gov/od/id/mds2-2422). I'm not sure whether any of the data has changed (my assumption would be no), but the landing page shows that they added .json files, so other changes are possible.

Here's an example traceback of how it's failing:

Traceback (most recent call last):
  File "/home/owenmadin/Documents/python/binary-mixture-publication/data-set-curation/curate_boron_phosphorus_silicon_data.py", line 614, in <module>
    main()
  File "/home/owenmadin/Documents/python/binary-mixture-publication/data-set-curation/curate_boron_phosphorus_silicon_data.py", line 609, in main
    initial_data = prepare_initial_data()
  File "/home/owenmadin/Documents/python/binary-mixture-publication/data-set-curation/curate_boron_phosphorus_silicon_data.py", line 57, in prepare_initial_data
    initial_data = CurationWorkflow.apply(
  File "/home/owenmadin/anaconda3/envs/binary-mixture-publication/lib/python3.9/site-packages/openff/evaluator/datasets/curation/workflow.py", line 112, in apply
    data_frame = component_class.apply(
  File "/home/owenmadin/anaconda3/envs/binary-mixture-publication/lib/python3.9/site-packages/openff/evaluator/datasets/curation/components/components.py", line 90, in apply
    modified_data_frame = cls._apply(data_frame, schema, n_processes)
  File "/home/owenmadin/anaconda3/envs/binary-mixture-publication/lib/python3.9/site-packages/openff/evaluator/datasets/curation/components/thermoml.py", line 124, in _apply
    cls._download_data(schema)
  File "/home/owenmadin/anaconda3/envs/binary-mixture-publication/lib/python3.9/site-packages/openff/evaluator/datasets/curation/components/thermoml.py", line 71, in _download_data
    request.raise_for_status()
  File "/home/owenmadin/anaconda3/envs/binary-mixture-publication/lib/python3.9/site-packages/requests/models.py", line 953, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://trc.nist.gov/ThermoML/JCED.tgz

So essentially I think we'd need to change the place where we're getting the tarballs from, but it will also probably break some data collation tools that expect a series of tarballs rather than just one.
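
As a rough illustration of what reading from one combined tarball could look like, here is a minimal sketch. It is not the evaluator implementation: the helper names `list_xml_members` and `extract_xml_members` are hypothetical, and it assumes only that the archive is gzipped and that the records of interest are `.xml` files somewhere inside it (the internal layout is not described in this thread).

```python
import tarfile
from pathlib import Path


def list_xml_members(tarball_path):
    """Return the sorted names of all regular .xml files in a gzipped tarball."""
    with tarfile.open(tarball_path, "r:gz") as archive:
        return sorted(
            member.name
            for member in archive.getmembers()
            if member.isfile() and member.name.lower().endswith(".xml")
        )


def extract_xml_members(tarball_path, output_directory):
    """Copy only the .xml files into output_directory, dropping sub-directories.

    Assumption: file names are unique across the archive. If they are not,
    later files overwrite earlier ones with the same name.
    """
    output_directory = Path(output_directory)
    output_directory.mkdir(parents=True, exist_ok=True)
    extracted = []
    with tarfile.open(tarball_path, "r:gz") as archive:
        for member in archive.getmembers():
            if not (member.isfile() and member.name.lower().endswith(".xml")):
                continue
            target = output_directory / Path(member.name).name
            with archive.extractfile(member) as source:
                target.write_bytes(source.read())
            extracted.append(target)
    return sorted(extracted)
```

Working from the member list rather than from a fixed set of per-journal archives means the same code would not care whether the files arrive in one tarball or several.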

ocmadin commented on May 16, 2024

@mattwthompson Let me know how I can help out with fixing this (I am probably the main user of this tool currently)

mattwthompson commented on May 16, 2024

Is this still broken? I forget if this has been resolved on another platform

ocmadin commented on May 16, 2024

#402

Looks like it has been resolved.

ocmadin commented on May 16, 2024

Unfortunately this is broken again, this time on NIST's end. It looks like there's an issue with their tarball. I get this message trying to download with evaluator:

Traceback (most recent call last):
  File "/home/owenmadin/Documents/python/binary-mixture-publication/data-set-curation/vapor_pressure_search.py", line 93, in <module>
    main()
  File "/home/owenmadin/Documents/python/binary-mixture-publication/data-set-curation/vapor_pressure_search.py", line 88, in main
    initial_data = prepare_initial_data()
  File "/home/owenmadin/Documents/python/binary-mixture-publication/data-set-curation/vapor_pressure_search.py", line 57, in prepare_initial_data
    initial_data = CurationWorkflow.apply(
  File "/home/owenmadin/anaconda3/envs/openff-force-fields/lib/python3.9/site-packages/openff/evaluator/datasets/curation/workflow.py", line 112, in apply
    data_frame = component_class.apply(
  File "/home/owenmadin/anaconda3/envs/openff-force-fields/lib/python3.9/site-packages/openff/evaluator/datasets/curation/components/components.py", line 90, in apply
    modified_data_frame = cls._apply(data_frame, schema, n_processes)
  File "/home/owenmadin/anaconda3/envs/openff-force-fields/lib/python3.9/site-packages/openff/evaluator/datasets/curation/components/thermoml.py", line 113, in _apply
    cls._download_data(schema)
  File "/home/owenmadin/anaconda3/envs/openff-force-fields/lib/python3.9/site-packages/openff/evaluator/datasets/curation/components/thermoml.py", line 60, in _download_data
    request.raise_for_status()
  File "/home/owenmadin/anaconda3/envs/openff-force-fields/lib/python3.9/site-packages/requests/models.py", line 960, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error:  for url: https://data.nist.gov/od/ds/mds2-2422/ThermoML.v2020-09-30.tgz
{
  "requestURL" : "/od/ds/mds2-2422/ThermoML.v2020-09-30.tgz",
  "method" : "GET",
  "status" : 500,
  "message" : "Unexpected Server Error"
}

Process finished with exit code 1

And if I try to download manually through their download manager I get the same thing:

Information about requested bundle/package is given below.

 Following files are not included in the bundle because of errors: 

 https://data.nist.gov/od/ds/mds2-2422/ThermoML.v2020-09-30.tgz?requestId=5c75c307-4328-46dd-baf2-068675b89c47 There is an Error accessing this file, Server returned status with response code  500 and message:There is an error accessing this file/URL from server.

@mrshirts can you contact someone at NIST to figure out why this is happening?

mrshirts commented on May 16, 2024

So, this link seems to work for me now using a manual download - can you check if that works for you, and if it might be transient?

https://data.nist.gov/od/ds/mds2-2422/ThermoML.v2020-09-30.tgz

ocmadin commented on May 16, 2024

I'm still unable to download manually, on either RHEL or Ubuntu.

mrshirts commented on May 16, 2024

Are other NIST downloads down, or just this one?

mrshirts commented on May 16, 2024

It was working manually for me for a couple min, but now is not.

ocmadin commented on May 16, 2024

I tried to download something else from the NIST website and it also failed. Maybe their servers are just struggling today?

mrshirts commented on May 16, 2024

Yeah, sounds like an overall NIST problem.

ocmadin commented on May 16, 2024

It would be good to have a "load from local tarball" option in openff.evaluator.datasets.curation.components.thermoml.ImportThermoMLData in case this happens again in the future.
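
A minimal sketch of what such an option might do, under stated assumptions: `resolve_tarball` and `download_to` are hypothetical helpers and not part of the evaluator API, and the standard library is used for the download so the example has no extra dependencies. The idea is simply to prefer a local copy and only go to the network when none exists.

```python
import shutil
import urllib.request
from pathlib import Path

# URL quoted earlier in this thread.
THERMOML_URL = "https://data.nist.gov/od/ds/mds2-2422/ThermoML.v2020-09-30.tgz"


def download_to(url, destination):
    """Stream the contents of url into the file at destination."""
    with urllib.request.urlopen(url) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)


def resolve_tarball(local_path, url=THERMOML_URL, fetch=download_to):
    """Return a path to the tarball, downloading only if no local copy exists."""
    local_path = Path(local_path)
    if local_path.is_file():
        # A local copy is present, so the remote server is never contacted.
        return local_path
    local_path.parent.mkdir(parents=True, exist_ok=True)
    # Download to a temporary name first so that an interrupted download
    # is not mistaken for a complete local copy on the next run.
    partial = local_path.with_name(local_path.name + ".part")
    fetch(url, partial)
    partial.replace(local_path)
    return local_path
```

With something like this in place, a user who already has the archive on disk could keep curating data during a server outage by pointing `local_path` at their copy.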

mrshirts commented on May 16, 2024

> It would be good to have a "load from local tarball" option

Good idea, file an issue?

mrshirts commented on May 16, 2024

Email from Damien Riccardi at NIST:

"I added a few links to the web app and data.nist.gov page this morning, and, before reaching out here, I reviewed your issue linked below. It appeared as though Thermoml issues on the Open FF end were resolved until the data.nist.gov download link to the .tgz file broke (as of yesterday). An email has been sent to admins of data.nist.gov and I hope it will be fixed soon. In clicking through the related openff thermoml issues I noticed the annoyance with historical movement in the data resource. The https://data.nist.gov/od/ds/mds2-2422/ThermoML.v2020-09-30.tgz file should now (technical difficulties with data.nist.gov servers aside) never change or be deleted."

Also, there is a software note on ThermoML in JCC: https://onlinelibrary.wiley.com/share/author/WKPMRWMYRCFW79RXEQPW?target=10.1002/jcc.26842

mattwthompson commented on May 16, 2024

@ocmadin posted this in Slack; I don't have time to look at it now but this might be a path forward:

https://onlinelibrary.wiley.com/doi/epdf/10.1002/jcc.26842 [...] TL;DR, don't think we need to change anything, but they are now offering a RESTful API to access the data which may be useful in the future.

GregorySchwing commented on May 16, 2024

Working URL:
https://nist-oar-cache.s3.amazonaws.com/prd/gen0/mds2-2422/ThermoML.v2020-09-30.tgz
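
Since two download locations have now been mentioned in this thread, one possible way to use both is to try them in order. This is only a sketch: `fetch_first_available` and `fetch_bytes` are hypothetical names, and whether the two URLs always serve the same file is an assumption that has not been verified here.

```python
import urllib.request

# Both URLs are taken from this thread; the order is a preference, not a guarantee.
CANDIDATE_URLS = [
    "https://data.nist.gov/od/ds/mds2-2422/ThermoML.v2020-09-30.tgz",
    "https://nist-oar-cache.s3.amazonaws.com/prd/gen0/mds2-2422/ThermoML.v2020-09-30.tgz",
]


def fetch_bytes(url, timeout=60):
    """Download url and return its body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_first_available(urls=CANDIDATE_URLS, fetch=fetch_bytes):
    """Try each URL in order and return (url, payload) for the first success."""
    errors = {}
    for url in urls:
        try:
            return url, fetch(url)
        except OSError as error:
            # urllib's URLError and HTTPError, and socket timeouts, are all
            # OSError subclasses, so a 404 or 500 falls through to the next URL.
            errors[url] = error
    raise RuntimeError(f"All download locations failed: {errors}")
```

Returning the URL alongside the payload makes it easy to log which location actually served the file.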
