
Comments (6)

johnaohara commented on July 21, 2024

@rvansa I discussed having versioned tests with @willr3, so that we capture a history of schema changes for a test. We still need to store the historic schemas to validate older test results, or to ensure any tooling that uses the schema still has access to the correct schema. We will need to implement versioning.

from horreum.

willr3 commented on July 21, 2024

@rvansa Do you have an example of the schema evolution that was painful with PerfRepo?
@johnaohara When would we need to validate older runs? I think I am missing a use case. Are you talking about the case where the test changes (e.g. specjEnterprise2010 vs. 2018) and we want to store it as a single specjEnterprise test definition that allows runs from both benchmarks to be uploaded?

I think of the test's schema as a garbage filter that prevents json uploads which do not contain the expected data. The json-schema does not prevent additional data and is only there to make sure the next upload contains the required information (someone could make a strict json-schema that rejects additional data, but that would be limiting). Schema changes would set requirements for the next run upload but not alter previous uploads.
Previous runs are still accessible to the usual jsonb_path_query_* functions in psql, so all runs can be queried and anything missing the data would just return an empty value (e.g. select id,start,stop,jsonb_path_query_first(data,'$.this.is.missing') from run where testid = 1).
I can imagine a scenario where data moves, e.g. dstat moves from $.benchserver4.dstat to $.dstat.benchserver4, and tracking old schemas would make it easier to discover what is available when exploring a test's data without having to fetch all the runs for that test.
In this case, would it be better to have all the old json-schema versions, or to track a merged json structure that represents all the potential data from runs for the test?
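
To make the "merged json structure" option concrete, here is a minimal sketch of one way it could be built: collect every leaf path seen in any run and remember which runs contained it. This is not how Horreum does it; the function names and the `cpu` field are hypothetical, and only the two dstat locations come from the example above.

```python
from typing import Any


def leaf_paths(doc: Any, prefix: str = "$") -> set[str]:
    """Return the JSONPath-like location of every scalar value in doc."""
    if isinstance(doc, dict):
        paths: set[str] = set()
        for key, value in doc.items():
            paths |= leaf_paths(value, f"{prefix}.{key}")
        return paths
    if isinstance(doc, list):
        paths = set()
        for item in doc:
            # All array elements share one wildcard path.
            paths |= leaf_paths(item, f"{prefix}[*]")
        return paths
    # Scalars (numbers, strings, booleans, null) are leaves.
    return {prefix}


def merged_paths(runs: dict[int, Any]) -> dict[str, list[int]]:
    """Map every path seen in any run to the ids of the runs containing it."""
    merged: dict[str, list[int]] = {}
    for run_id, doc in runs.items():
        for path in leaf_paths(doc):
            merged.setdefault(path, []).append(run_id)
    return merged


# Hypothetical runs: dstat moved between run 1 and run 2.
runs = {
    1: {"benchserver4": {"dstat": {"cpu": 0.4}}},
    2: {"dstat": {"benchserver4": {"cpu": 0.5}}},
}
structure = merged_paths(runs)
```

With these two runs, `structure` has one entry per location, so someone exploring the test can see both the old and the new dstat path without fetching the runs themselves. Unlike a set of versioned schemas, this view does not say which layout was considered valid at which time.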


rvansa commented on July 21, 2024

@willr3 The problem is that with PerfRepo all the metrics are set in the test definition; reportedly you couldn't add another metric to a test and let new runs use that as well; you had to create another test. And runs from different tests could not be compared.

I have a different view of schemas: while a schema works as a garbage filter, I see it as a single URI string that describes the structure of the JSON. A schema should be limiting (additionalProperties: false everywhere) and describe the contents completely.
It does not limit the users of the repo, except for the need to cover the changes in a new schema that would be marked as valid for that test.
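
The difference between the two views can be illustrated with a small sketch. The checker below is a hand-rolled stand-in that understands only `required`, `properties`, and `additionalProperties: false` on objects; it is not a JSON Schema validator and not Horreum code. The schema contents, version names, and the `accepted_by` helper are all hypothetical.

```python
from typing import Any


def violations(doc: Any, schema: dict, path: str = "$") -> list[str]:
    """List the problems found in doc for the tiny schema subset we support."""
    problems: list[str] = []
    if not isinstance(doc, dict):
        return problems  # this sketch only checks objects
    props = schema.get("properties", {})
    for key in schema.get("required", []):
        if key not in doc:
            problems.append(f"{path}.{key}: required property is missing")
    if schema.get("additionalProperties", True) is False:
        for key in doc:
            if key not in props:
                problems.append(f"{path}.{key}: additional property not allowed")
    for key, sub_schema in props.items():
        if key in doc:
            problems.extend(violations(doc[key], sub_schema, f"{path}.{key}"))
    return problems


def accepted_by(doc: Any, schemas: dict[str, dict]) -> list[str]:
    """Return the names of the schema versions that doc satisfies."""
    return [name for name, schema in schemas.items() if not violations(doc, schema)]


# "Garbage filter": only requires some data, extra data is fine.
permissive = {"required": ["start", "stop"]}

# Strict schemas: describe the contents completely, one per version.
strict_v1 = {
    "required": ["start", "stop"],
    "properties": {"start": {}, "stop": {}},
    "additionalProperties": False,
}
strict_v2 = {
    "required": ["start", "stop"],
    "properties": {"start": {}, "stop": {}, "dstat": {}},
    "additionalProperties": False,
}

run = {"start": 1, "stop": 2, "dstat": {"cpu": 0.5}}
permissive_problems = violations(run, permissive)
strict_v1_problems = violations(run, strict_v1)
valid_versions = accepted_by(run, {"v1": strict_v1, "v2": strict_v2})
```

The permissive schema accepts the run as is. Under the strict approach, the new `dstat` data is rejected by the first version and has to be covered by a second schema that is also marked as valid for the test, which is the extra step described above.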


willr3 commented on July 21, 2024

Using a limiting schema (additionalProperties: false) would theoretically make the data in each run more reliable but would absolutely make it more difficult to use horreum. Each test creator has the option to define the schema as they see fit, so why take away the option? Anyone who agrees with your view would add additionalProperties: false and anyone who was not ready to write a complete json-schema would not.


rvansa commented on July 21, 2024

Sure, there's nothing forcing you to use additionalProperties: false. As you say, test creators have that option and I intend to keep it.
The 'should' above is just describing my view of how things should be organized. The Horreum implementation should support such a strict workflow (and the current implementation does).


rvansa commented on July 21, 2024

I am closing this for now since it has been implemented; feel free to comment further and/or reopen.

