Verification <input t

Thanks for opening this <a class="user-mention notranslate" data-hovercard-type="user"

Only store the Ruby cask file <p dir="auto"

Good investigation, thanks <a class="user-mention notranslate" data-hovercard-type="us

How should we store installed cask information?,about homebrew/brew

Comments (18)

apainintheneck commented on May 23, 2024

@Homebrew/cask It'd be great to hear your opinion on this topic.

from brew.

MikeMcQuaid commented on May 23, 2024

Thanks for opening this @apainintheneck! ❤️

Only store the Ruby cask file

This would make things much simpler on our end but would require an extra network request to fetch the Ruby cask file every time a new cask gets installed. We already do this for casks with flight or language blocks.

This feels like we lose a lot of benefits of the API and make offline installation impossible.

Only store the JSON cask file

This would require figuring out how to serialize the fight and language blocks in the cask DSL. That's why we currently download the Ruby cask file before installing casks with those blocks. They cannot be expressed in the current iteration of the cask API JSON.

This feels like the best option to me and worth tackling in JSON v3.

Create a new format to represent an installed cask

This might actually end up being more code to support but would be more consistent because we'd be able to specify the minimum amount of information necessary to work with an installed cask.

If this format is a e.g. superset/subset of JSON v3 this seems desirable. Otherwise, it feels like it will have all of the disadvantages of one or both of the above.

from brew.

bevanjkay commented on May 23, 2024

Great discussion. I think there a benefits to storing it as a JSON file because it allows us to store additional information, such as the tap that the cask was installed from.
The cask "install receipt" currently only stores the dir settings that were set at the time of installation (to be able to remove the files from the correct location on uninstall).

There's inconsistency between what information is available between the ruby source, and the JSON. Apart from the flight blocks, the JSON should give us more usable information.

from brew.

apainintheneck commented on May 23, 2024

Only store the Ruby cask file

This would make things much simpler on our end but would require an extra network request to fetch the Ruby cask file every time a new cask gets installed. We already do this for casks with flight or language blocks.

This feels like we lose a lot of benefits of the API and make offline installation impossible.

The only downsides I know of are the extra network request and the fact that the tap info is not included in the Ruby cask file.

Only store the JSON cask file

This would require figuring out how to serialize the fight and language blocks in the cask DSL. That's why we currently download the Ruby cask file before installing casks with those blocks. They cannot be expressed in the current iteration of the cask API JSON.

This feels like the best option to me and worth tackling in JSON v3.

@Rylan12 worked a bit on this over a year ago before we decided to to download the source files for casks with flight or language blocks. The approach at the time was to store the flight blocks as strings in the API JSON and then evaluate that code when loading the cask.

Loader code: #14304
- Changed in final commit since we decided not to use that approach
Removed in: #14611

We never really attempted to load the language tabs from the API. I'll have to look into that a bit more. I think the firefox cask is the most complex example.

from brew.

apainintheneck commented on May 23, 2024

Great discussion. I think there a benefits to storing it as a JSON file because it allows us to store additional information, such as the tap that the cask was installed from. The cask "install receipt" currently only stores the dir settings that were set at the time of installation (to be able to remove the files from the correct location on uninstall).

There's inconsistency between what information is available between the ruby source, and the JSON. Apart from the flight blocks, the JSON should give us more usable information.

Good point! We currently store the tap when installing from the API since the JSON cask already includes that information. We were also discussing whether or not it should be included when installing with internal JSON v3. Are there any other bits of information that would be good to store during the installation process?

from brew.

MikeMcQuaid commented on May 23, 2024

The approach at the time was to store the flight blocks as strings in the API JSON and then evaluate that code when loading the cask.

Yeh. I don't love it but: don't really see any alternatives, sadly.

We never really attempted to load the language tabs from the API. I'll have to look into that a bit more. I think the firefox cask is the most complex example.

Thanks ❤️

We currently store the tap when installing from the API since the JSON cask already includes that information. We were also discussing whether or not it should be included when installing with internal JSON v3.

Note: I think the JSON should be structured such that we're able to use the same schema (or some sort of schema inheritance/inclusion if that's possible?) validation eventually for the public API schema, internal API schema and JSON that's written to disk.

from brew.

apainintheneck commented on May 23, 2024

So the method_source gem only stores the code for the block/proc itself but not any surrounding variables that it closes over. This should be fine for internal use with the main cask repo but could cause problems for third-party taps. We could try some workarounds where we capture all local variables when defining the proc with a custom proc class but that would up the difficulty dramatically.

There have been some gems that have tried to solve this sort of problem in the past but it's not obvious to me that any of them were ever production-ready and all of them have been abandoned for about a decade. We could potentially use them for inspiration but I don't think it's worth it.

from brew.

MikeMcQuaid commented on May 23, 2024

Good investigation, thanks @apainintheneck!

This should be fine for internal use with the main cask repo but could cause problems for third-party taps.

We should optimise for our own use cases here and we can fix up issues with 3rd party taps and e.g. provide a longer deprecation cycle.

from brew.

apainintheneck commented on May 23, 2024

This should be fine for internal use with the main cask repo but could cause problems for third-party taps.

We should optimise for our own use cases here and we can fix up issues with 3rd party taps and e.g. provide a longer deprecation cycle.

I don't think this problem will just go away if we tell people not to do it and have a longer deprecation cycle. Inevitably it will be misused because valid Ruby syntax just cannot be described in JSON assuming we don't want to get rid of the Ruby DSL entirely. It also means that subtle errors could be introduced that wouldn't be clear until trying to load an installed cask from JSON.

It seems like a mistake to choose a flawed option which is trying to serialize and deserialize arbitrary blocks of Ruby code as strings in JSON when the option of just evaluating Ruby code like we have been for the life of the Homebrew project is on the table. Doing this research has reminded me why I was against the Ruby strings in JSON approach from the beginning.

from brew.

MikeMcQuaid commented on May 23, 2024

I don't think this problem will just go away if we tell people not to do it and have a longer deprecation cycle. Inevitably it will be misused because valid Ruby syntax just cannot be described in JSON assuming we don't want to get rid of the Ruby DSL entirely. It also means that subtle errors could be introduced that wouldn't be clear until trying to load an installed cask from JSON.

This is something we can audit for, adjust the way we evaluate the DSL (i.e. what scope we use and what variables are captured), warn/deprecate/error for at installation time, etc.

deserialize arbitrary blocks of Ruby code as strings in JSON when the option of just evaluating Ruby code like we have been for the life of the Homebrew project is on the table.

The logical conclusion of this argument is we shouldn't have used the JSON API for casks at all.

Here's the requirements we have here:

whatever we introduce: it must be the same data written to the Caskroom whether installed from API or tap and whether a core or third-party cask
it must be possible to do offline cask installation such that brew fetch <cask> and then brew install <cask> works with no internet connection
if we're writing JSON to disk: it should be a subset of a public (or what will eventually aim to become public) JSON API
we have had many problems with deprecating DSLs in casks causing issues on uninstall. whatever the solution is here: it should not make the problem worse and should ideally make it better.

and, as it relates to the API:

any internal v3 API should be designed to be similar to and evolve into a public v3 API
human readability trumps uncompressed or compressed file size
a JSON schema should be eventually possible

Hopefully that helps!

from brew.

apainintheneck commented on May 23, 2024

This is something we can audit for, adjust the way we evaluate the DSL (i.e. what scope we use and what variables are captured), warn/deprecate/error for at installation time, etc.

I didn't know that it was possible to reduce the scope of closures in Ruby. Can you point me to some references on that because I've been unable to find anything with a quick search?

The logical conclusion of this argument is we shouldn't have used the JSON API for casks at all.

There were problems initially for this very reason but they got ironed out with time. What I'm saying is that we can't generalize what we've made work reasonably well with the core cask repos to third-party repos.

Here's the requirements we have here:

whatever we introduce: it must be the same data written to the Caskroom whether installed from API or tap and whether a core or third-party cask

I disagree. It's not obvious to me that this requirement will result in a more maintainable codebase or a better experience for users.

it must be possible to do offline cask installation such that brew fetch <cask> and then brew install <cask> works with no internet connection

Good point, I hadn't thought of that but it seems ideal and achievable.

if we're writing JSON to disk: it should be a subset of a public (or what will eventually aim to become public) JSON API

This seems ideal but there might be a few fields in the internal JSON that are really no useful or relevant for the publish JSON. I agree in general though.

we have had many problems with deprecating DSLs in casks causing issues on uninstall. whatever the solution is here: it should not make the problem worse and should ideally make it better.

Could you point me to some threads or information about this problem? It's not obvious to me how pervasive this is or when it was most acute.

and, as it relates to the API:

any internal v3 API should be designed to be similar to and evolve into a public v3 API

👍

human readability trumps uncompressed or compressed file size

50/50 on this one.

a JSON schema should be eventually possible

I think the question is how painful would it be to have a JSON schema for all casks and will it be worth that effort.

from brew.

Bo98 commented on May 23, 2024

Ruby within JSON sucks and there's a reason we have already tried that before and declared it not a good idea.

With formulae, there's already a two-level system:

JSON API is used where possible
If postinstall_defined is declared then it will load the Ruby file from the bottle.

Formulae store the Ruby file always in the install directory (though not usually necessary to use given there's no postuninstall).

With casks, it's actually quite similar except it's two downloads rather than one since we don't repackage:

JSON API is used where possible
If a pre/postflight is declared then it will download and use the Ruby file

The reason casks don't always store the Ruby file because we don't always fetch the Ruby file.

However, in the end we will always support installing from third-party taps and they use Ruby files, so we will always support both JSON and Ruby install scenarios, unless the plan was to add extra code to generate JSON on-the-fly during install?

Ruby files are downloaded when pre/postflights are used, so we should already know when a Ruby file is needed and when it isn't for uninstall.

Remember that reading the installed-cask information is already quite simple: it just loads it through the CaskLoader. If we're introducing yet another format here or are starting to transform formats from Ruby->JSON or vice versa that is not a simplification, that's extra code and we need to be aware of that and its extra maintenance.

Critically: installation information should not be broken ever, and that includes whatever is already there right now. So if that install information switches from JSON API v2 to v3, then that means supporting both v2 and v3 indefinitely unless we do some similarly indefinite migration behaviour. So while I agree that we should do better here, please don't expect it to simplify anything as a lot of the existing code will probably stay.

Ultimately, when it comes down to it: uninstall_pre/postflight kinda sucks. Is there anything super complex there that cannot ever be represented by a DSL? If we ban uninstall_pre/postflight then I think we don't need to store any Ruby?

That would be the ideal option here as Ruby can be volatile and more at risk of breaking in the future.

from brew.

MikeMcQuaid commented on May 23, 2024

I didn't know that it was possible to reduce the scope of closures in Ruby.

@apainintheneck Blocks can be evaluated with instance_eval. You can decide which object you evaluate it on.

What I'm saying is that we can't generalize what we've made work reasonably well with the core cask repos to third-party repos.

We don't need to. These repos always install from Cask Ruby files in taps rather than a JSON API. The way these repos do things can be audited, deprecated, disabled and removed.

Here's the requirements we have here:

whatever we introduce: it must be the same data written to the Caskroom whether installed from API or tap and whether a core or third-party cask

I disagree. It's not obvious to me that this requirement will result in a more maintainable codebase or a better experience for users.
...

human readability trumps uncompressed or compressed file size

50/50 on this one.

I am the technical lead on this project (https://docs.brew.sh/Homebrew-Governance#6-project-leader) and I am stating that these are a hard requirements. I have explained my reasoning here enough times. Let's continue discussion on other points, please.

Could you point me to some threads or information about this problem? It's not obvious to me how pervasive this is or when it was most acute.

Please use GitHub's search to find these or just trust me.

a JSON schema should be eventually possible

I think the question is how painful would it be to have a JSON schema for all casks and will it be worth that effort.

You do not have to implement this schema. Others have expressed a desire to do so. I am stating that others want to do this so it's important to not prevent it.

from brew.

MikeMcQuaid commented on May 23, 2024

However, in the end we will always support installing from third-party taps and they use Ruby files, so we will always support both JSON and Ruby install scenarios, unless the plan was to add extra code to generate JSON on-the-fly during install?

@Bo98 Yes, but this does not mean that we should vary the file that is written to the Caskroom depending on the installation method here (which is my main argument).

Having an INSTALL_RECEIPT.json for formulae that's written sometimes and not others is not useful for users.

Critically: installation information should not be broken ever, and that includes whatever is already there right now.

Firstly, "not be broken ever" is not a realistic API contract to have as an open source project.

Secondly, we have broken this many times before on purpose and by accident and we will likely break this again in future.

So if that install information switches from JSON API v2 to v3, then that means supporting both v2 and v3 indefinitely unless we do some similarly indefinite migration behaviour.

This doesn't make a lot of sense to me because: we're not proposing a switch from JSON API v2 to v3, I'm proposing a switch from "sometimes Ruby, sometimes JSON (which is not the same as JSON v3) to "the same data is consistently written regardless of the way the cask was installed" and "we want this to somewhat follow the new v3 API design" and "we should have a longer-than-usual deprecation/migration window for handling this".

I am game for e.g. Ruby files to be written only when needed (and, note, they will break in future when we make deprecations) but a JSON file is written every single time (like the INSTALL_RECEIPT.json for formulae, perhaps even with a similar name/format to that for formulae).

Ultimately, when it comes down to it: uninstall_pre/postflight kinda sucks. Is there anything super complex there that cannot ever be represented by a DSL? If we ban uninstall_pre/postflight then I think we don't need to store any Ruby?

This is a really great idea and the direction I would like things to go in 👍🏻

from brew.

Bo98 commented on May 23, 2024

Firstly, "not be broken ever" is not a realistic API contract to have as an open source project.

We've never (to my knowledge) intentionally broken brew uninstall in formulae so breaking brew uninstall on casks sounds like really bad user experience to be treating as normal/expected behaviour and I don't think we should be accepting of that.

(To be clear: I'm not saying the current formats are great for that - but the next format really should be)

This is a really great idea and the direction I would like things to go in

Cool - we can likely eliminate large parts of this problem if we can do that.

from brew.

apainintheneck commented on May 23, 2024

I'll add another vote in favor of flight DSLs. I also think that any additional cask metadata that we want to include at install time could probably be added separately from the discussion of cask file formats and it would likely be pretty uncontroversial. I don't see any reason why for instance tap couldn't get added to the config.json metadata file or something similar.

from brew.

Bo98 commented on May 23, 2024

Yes there doesn't need to be any overlap at all between the formats and I suggest making them separate, like how the very-stable formula tab format is. We just need to store any interesting metadata + the uninstall instructions - we don't need to store preinstall data or download URLs etc.

Flight Ruby code is OK at install-time but is not really OK for an uninstall stage executed potentially many months/years in the future.

from brew.

MikeMcQuaid commented on May 23, 2024

We've never (to my knowledge) intentionally broken brew uninstall in formulae so breaking brew uninstall on casks sounds like really bad user experience to be treating as normal/expected behaviour and I don't think we should be accepting of that.

I think it's worth clarifying what "intentionally broken" means here. It means: after a long period of deprecation/warnings/perhaps migration: we will not use the uninstall hooks from the Cask JSON/Ruby file in the Caskroom but instead use those from the Cask file, if available. The absolute worst case scenario here is you have software you installed years ago, haven't upgraded it at all since, ignored all the warnings and then when you brew uninstall it it does not remove all the files you might expect it to.

(To be clear: I'm not saying the current formats are great for that - but the next format really should be)
Cool - we can likely eliminate large parts of this problem if we can do that.
Flight Ruby code is OK at install-time but is not really OK for an uninstall stage executed potentially many months/years in the future.

I agree very strongly here with all of this. Running arbitrary Ruby with context to a Caskfile/Homebrew internals is terrible for long-term backwards compatibility.

My desire to use JSON here was specifically to limit how much dynamism is possible here so that we can maintain support for (the new format) pretty much indefinitely and make any future migration to another format dramatically easier.

Yes there doesn't need to be any overlap at all between the formats and I suggest making them separate, like how the very-stable formula tab format is.

I think some overlap would be nice if possible but I also think the INSTALL_RECEIPT.json/Tab for formulae is a good example of a very stable format that's fairly easy for third-parties to parse (and we should really probably document somewhere).

That's exactly the sort of model I'm hoping for with the same for Casks.

from brew.

How should we store installed cask information? about brew HOT 18 OPEN

Comments (18)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent