
Comments (14)

JonasKruckenberg commented on July 17, 2024

So basically you want to use the local version of a module during development and a CDN for production?
Because I don't think this should be a feature for this plugin. Maybe I misunderstood your point, but here are my reasons:

  1. This feature should be implemented at the module-resolution level, i.e. when module names get resolved.
    This would also enable things like this:
import $ from 'jquery'

getting turned into:

import $ from 'https://code.jquery.com/jquery-3.5.1.slim.min.js'

The rollup-plugin-sri is not really set up to handle this, as it gets invoked after the bundle is built.

  2. This module is about subresource integrity; resolving CDNs is not really in the scope, I'd say.
  3. You've implied that automatic resolving of CDN URLs should be a thing, and while I agree and might even try this as another npm module, it should not be part of this one, as it involves actively rewriting HTML while this plugin only adds two attributes.

To answer your question, here is a way one could achieve something similar to your suggestion:

  • use @rollup/plugin-alias to resolve certain modules, e.g. jquery, to a CDN URL such as https://code.jquery.com/jquery-3.5.1.slim.min.js
  • use @rollup/plugin-html with a custom template that injects more script tags than just the entry point (as module-preload or prefetch)
  • then use this plugin to generate integrity attributes for all link and script tags in the HTML.
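
A minimal rollup.config.js sketch of those three steps (the plugin names are the ones suggested above; the entry point, output options, and alias mapping are illustrative assumptions):

```javascript
// rollup.config.js -- illustrative sketch only, not a tested configuration.
import alias from '@rollup/plugin-alias';
import html from '@rollup/plugin-html';
import sri from 'rollup-plugin-sri';

export default {
  input: 'src/index.js', // assumed entry point
  output: { dir: 'dist', format: 'es' },
  plugins: [
    // 1. Resolve the 'jquery' specifier to a CDN URL instead of node_modules.
    alias({
      entries: [
        { find: 'jquery', replacement: 'https://code.jquery.com/jquery-3.5.1.slim.min.js' }
      ]
    }),
    // 2. Emit an HTML file; a custom template function could inject extra
    //    script/link tags (module-preload or prefetch) beyond the entry point.
    html(),
    // 3. Add integrity/crossorigin attributes to the emitted tags.
    sri()
  ]
};
```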

I know this is a bit convoluted, and I'm actually working on a (yet unreleased) HTML generation plugin for Rollup that should make a lot of this easier, but for the time being this is the best I can come up with.
Cheers! Jonas

from rollup-plugin-sri.

brettz9 commented on July 17, 2024

Re: imports, while I'd love to work modularly with just imports, the problem is that SRI won't work currently with imports: WICG/import-maps#174 .

Sounds good re: other ideas. FWIW, since I'd like this feature for SSR also, I've started work on a project which I hope to get to handle (programmatically or by CLI output), integrity/name/version checking and CDN checking/resolving. Your project or another could then use this to turn a JSON config file into script or links in the desired formats.

Thanks for the reply!


brettz9 commented on July 17, 2024

And with this approach, one could keep the local path in source (probably the easiest and cleanest option, since the source files wouldn't need to introduce diffs on each version update), but have a switch on whether that should serve:

  1. Local only
  2. CDN only
  3. CDN with fallback to local

Thus one could, while having stable source, generate HTML, or for SSR, a required JSON config file, which included the version-specific CDN URLs.


JonasKruckenberg commented on July 17, 2024

Okay so just to make sure I understood you correctly, the workflow you're proposing is to

  1. Install a dependency ( like jquery ) via npm or yarn
  2. Include the module in a script tag, since sri and imports are still at odds as you mentioned
  3. Have rollup bundle the whole website
  4. And have this plugin resolve script tag sources to either the filesystem or a CDN

If I understood you correctly, this functionality would be much better off as its own plugin. I have thought about this recently too; a plugin that resolves imports to a CDN would be really handy, especially when building an existing project for Deno.

The resolution algorithm would be the biggest problem though: how would the plugin know that some/path/to/jquery.min.js should resolve to a CDN? How does it know which version to use based on the path? Should it use the latest version? What about semver ranges?

I really think that it's way easier to just use the CDN script tags directly than building and configuring a plugin that desperately tries to keep local files and CDN paths in sync.

That said, some sort of cdn resolver plugin would be really nice. I'm going to close this issue for now, but we can still discuss the cdn resolving thingy more. Maybe we can sketch it out enough so it can become its own thing.


brettz9 commented on July 17, 2024

Okay so just to make sure I understood you correctly, the workflow you're proposing is to

1. Install a dependency ( like jquery ) via npm or yarn

2. Include the module in a script tag, since sri and imports are still at odds as you mentioned

3. Have rollup bundle the whole website

4. And have this plugin resolve script tag sources to either the filesystem or a CDN

Yes. Though you could keep your script tags in source as working filesystem paths, e.g.:

<script src="./node_modules/jquery/dist/jquery.js"></script>

...and transform that, allowing it to work locally without a build step. (You could potentially go the other direction, having CDN URLs in source and mapping them to local npm files, auto-updating the URLs when new versions are detected in node_modules; but that would create diffs in source when updating, would be easier to corrupt, and wouldn't work offline, so local paths seem a more reasonable development default.)

If I understood you correctly, this functionality would be much better off as its own plugin. I have thought about this recently too; a plugin that resolves imports to a CDN would be really handy, especially when building an existing project for Deno.

Ah yes, I can see that.

The resolution algorithm would be the biggest problem though, how would the plugin know that some/path/to/jquery.min.js should resolve to a cdn? How does it now which version to use based on the path? Should it use the latest version? What about semver-ranges?

One would need a config file, such as in the format in my original post, which would take regular-expression strings using named capturing groups (specifically expecting name, version, and path/extensions) to extract pieces out of the original path and map them to a CDN URL, using a separate replacement expression to determine how the parts would be reassembled into a URL.

Let's say we have this in source:

<script src="./node_modules/leaflet/dist/leaflet.js"></script>

In the simplest scenario, the following would be our library's default find expression (though it could also be added explicitly to a user's config file). This:

/\.\/node_modules\/(?<name>[^/]*)\/(?<path>[^'"]*)/

...could be used to identify the name and path out of all local paths found in our HTML and map it to a CDN URL. In the simplest scenario, our config would simply map all modules to the same CDN (if we wanted to bless "unpkg", that could even be the default when there was no config):

{'*': 'unpkg'}

...with "unpkg" being a reserved name since it is a popular CDN for npm packages.

This would be an out-of-the-box equivalent to the following config (which could be used instead if desired):

{'*': 'https://unpkg.com/$<name>@$<version>/$<path>'}

...which means:

  1. Add the string https://unpkg.com/, followed by
  2. The "name" capturing group, in this case "leaflet".
  3. A "@"
  4. The "version" capturing group, or, since there is no "version" capturing group in our default find expression, the development build library would default instead to looking inside node_modules/<name>/package.json (i.e., node_modules/leaflet/package.json) to see what version had been installed (especially since we can only run a safe checksum against a file we have already downloaded). Let's say it finds "1.4.0" there.
  5. A "/"
  6. The "path" capturing group, in this case "dist/leaflet.js".

I.e., it would therefore build:

<script src="https://unpkg.com/leaflet@1.4.0/dist/leaflet.js" integrity="..." crossorigin="..."></script>

...where the integrity was based on a checksum it made against the local ./node_modules/leaflet/dist/leaflet.js (crossorigin, etc., could be defaults or subject to config, similar to your plugin).

Our development library should probably also try a HEAD request with the resulting CDN URL to confirm that the targeted URL gives a 200 response.
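
As a rough sketch (not the actual implementation), the find-expression/template mechanism walked through above could look like this in plain JavaScript, with the package.json version lookup stubbed out as a caller-provided callback:

```javascript
// Sketch of the proposed resolution: a find expression with named capturing
// groups, plus a replacement template reassembling the parts into a CDN URL.
const findExpr = /\.\/node_modules\/(?<name>[^/]*)\/(?<path>[^'"]*)/;
const template = 'https://unpkg.com/$<name>@$<version>/$<path>';

function resolveToCdn(src, getInstalledVersion) {
  const match = findExpr.exec(src);
  if (!match) return src; // not a node_modules path; leave untouched
  // The version isn't in the path, so look it up (normally from
  // node_modules/<name>/package.json); here it's a stub.
  const version = getInstalledVersion(match.groups.name);
  return src.replace(findExpr, template.replace('$<version>', version));
}

// leaflet@1.4.0 is the version the walk-through assumes was installed.
const url = resolveToCdn('./node_modules/leaflet/dist/leaflet.js', () => '1.4.0');
console.log(url); // https://unpkg.com/leaflet@1.4.0/dist/leaflet.js
```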

I really think that it's way easier to just use the CDN script tags directly than building and configuring a plugin that desperately tries to keep local files and CDN paths in sync.

I've got a good amount of the work done locally for building the plugin. As far as configuring it, one just needs to add a simple JSON config file, and add an npm script to run our library to build some production HTML (or build some JSON config for server-side use which includes the latest version and integrity info and can be read at run-time to be rendered server-side).

I think it'll be a lot nicer than having to hunt out websites' download pages every time there is a new version to find the new checksum and manage adding it to source.

Instead, one could just run an npm package like npm-check-updates or updates to get all of one's local dependencies updated to the latest version, and run this proposed CDN script to handle the checksums and versioned URLs for us, based on the latest updates (it would only break if our npm package and/or CDN changed their file path structure--but this is the case with any dependency we use, and typically, as with jquery, they don't tend to change). This will be especially useful for CDNs like unpkg which are predictable and consistent in their URL format (and in the case of unpkg, tied smoothly to package name and version, but pretty much any CDN should do this).

That said, some sort of cdn resolver plugin would be really nice. I'm going to close this issue for now, but we can still discuss the cdn resolving thingy more. Maybe we can sketch it out enough so it can become its own thing.

Yeah, sure. I can let you know when I may get my own package working. I just think it'd be cool to tie it into a Rollup HTML process, since one typically has some external dependencies for which a CDN makes sense, and some internal dependencies for which Rollup makes sense.


JonasKruckenberg commented on July 17, 2024

Yeah, alright, what you're saying makes sense. What you basically want is to lock dependencies based on the hash of the local dependency, so the remote CDN resource gets blocked if it is different from the local hash.
I would suggest this should be based on either the yarn.lock or package-lock.json file, as it already has an integrity attribute.
So the plugin would

  1. Look at imports and script src tags and resolve file paths to module names by looking up the nearest package.json. One should also look up the right field in the package.json (i.e. the main, module, browser or even unpkg field).

    This step would only select resources that match the given regexes.
  2. Look up each dependency in the lock file, and store the integrity and version number.
  3. Map the package name, version and field from step 1 to a URL pointing to the CDN. This would take the template string from the configuration and populate it.
  4. Maybe check each URL for availability and switch to a fallback CDN just in case.
  5. Export the URLs, either by injecting them into an HTML file or, as you pointed out, into a separate JSON file for later processing.
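
Steps 2 and 3 of the plan above could be sketched as follows; the lock-file object is mocked, and the function name and shape are assumptions for illustration, not an actual API:

```javascript
// Sketch of steps 2-3: look up version and integrity in a (mocked, parsed)
// package-lock.json, then populate the CDN URL template from the config.
const lockfile = {
  dependencies: {
    // Mock entry; a real package-lock.json holds an actual sha512 hash.
    jquery: { version: '3.5.1', integrity: 'sha512-EXAMPLEONLY' }
  }
};

function toCdnEntry(name, path, template) {
  const entry = lockfile.dependencies[name];
  if (!entry) throw new Error(`${name} not found in lock file`);
  return {
    url: template
      .replace('$<name>', name)
      .replace('$<version>', entry.version)
      .replace('$<path>', path),
    integrity: entry.integrity
  };
}

const entry = toCdnEntry('jquery', 'dist/jquery.slim.min.js',
  'https://unpkg.com/$<name>@$<version>/$<path>');
console.log(entry.url); // https://unpkg.com/jquery@3.5.1/dist/jquery.slim.min.js
```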

Would you be willing to contribute to a new rollup plugin that does this? Maybe I'll set up a new GitHub repo later today...


brettz9 commented on July 17, 2024

Yeah, alright, what you're saying makes sense. What you basically want is to lock dependencies based on the hash of the local dependency, so the remote CDN resource gets blocked if it is different from the local hash.

Right--by updating the production HTML with the npm integrity values and then letting SRI do what it does.

I would suggest this should be based on either the yarn.lock or package-lock.json file, as it already has an integrity attribute.

That could be helpful but:

  1. Some packages use .npmrc config to prevent generation of a package-lock file.
  2. Some packages accidentally include both yarn.lock and package-lock.json.
  3. Perhaps npm/yarn has checks against this happening, but I'd worry (esp. for yarn perhaps which has that resolutions feature to do overrides) that if someone made a malicious commit to a lock file, it'd be less easily noticed than if the package.json dependency version was changed.

Still, it might at least be nice to have an option to report inconsistencies to the user.

So the plugin would

1. Look at imports and script src tags and resolve file paths to module names, by looking up the nearest package.json.

I would think that only checking script src tags would be necessary since imports don't have a mechanism for SRI.

Btw, while I don't see any problems in serving local files with integrity, I'm not clear what uses SRI has with local files. If a server is under one's control, presumably one already trusts the other files on one's system. If one's server is compromised (or if it is only using http allowing for man-in-the-middle attacks), the hacker can presumably modify any file on the system, including the file giving the integrity. Doing such checking no doubt comes with at least some performance penalty too. I understand the use of integrity as primarily with third-party CDN's where one knows what the contents should be (and wishes to avail oneself of their potential higher speed delivery or global availability), but where one doesn't want to take the CDN's at their word that they are not modifying the contents.

One should also look up the right field in the package.json (i.e. the main, module, browser or even unpkg field).
This step would only select resources that match the given regexes.

Since there isn't any way for imports to be used with SRI, I wouldn't think we'd need to go into the main, module, etc. fields--just look for the file path. I guess we could allow scripts like <script src="jquery/dist/jquery.js"></script> and map that to node_modules, but that would be ambiguous with relative paths (i.e., is there a "jquery" folder in the same directory as the HTML file, or is that supposed to be a node_modules package?). One could allow some special syntax like <script src="npm:jquery/dist/jquery.js"></script> and convert that, but I don't think introducing a non-standard syntax is very helpful, esp. since having a real local path has the advantage of allowing one to directly view the HTML source.

2. Look up each dependency in the lock file, and store the integrity and version number

3. Map package name, version and field from step 1 to a url pointing to the CDN. This would take the template string from the configuration and populate it.

4. Maybe check each url for availability and switch to a fallback CDN just in case.

A fallback CDN is an interesting idea, but:

  1. A CDN failing at build time might be due to mere chance.
  2. One would need a mapping for how to do so.

When I was speaking about a fallback, though, I meant allowing the config to be aware of a global, so that the fallback could occur at runtime if the first file didn't load. For example, one might wish to have this entire block generated (code which will, if the CDN doesn't load, synchronously load the local jQuery fallback):

<script src="https://code.jquery.com/jquery-3.5.1.slim.min.js" integrity="..." crossorigin="..."></script>
<script>
  window.jQuery || document.write(
    decodeURI('%3Cscript src="./node_modules/jquery/dist/jquery.slim.min.js"%3E%3C/script%3E')
  )
</script>

...based solely on these inputs:

  1. The path (whether found in an HTML script tag or in a JSON config file): node_modules/jquery/dist/jquery.slim.min.js
  2. The global to check (window.jQuery)--as indicated in a JSON config file
  3. The locally detected npm data (package.json, and node_modules/jquery contents)
  4. The additional tag metadata (e.g., crossorigin) as indicated in a JSON file--if not following the library defaults.
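
A sketch of how those four inputs could be turned into the fallback block; the function name and attribute defaults are hypothetical, and the integrity value is a mock, since the real one would be hashed from the local file:

```javascript
// Hypothetical generator for the CDN-plus-local-fallback block shown above.
function fallbackBlock({ cdnUrl, localPath, globalName, integrity }) {
  const localTag = `<script src="${localPath}"></script>`;
  // Escape < and > so the inline document.write survives HTML parsing,
  // matching the decodeURI pattern in the snippet above.
  const encoded = localTag.replace(/</g, '%3C').replace(/>/g, '%3E');
  return [
    `<script src="${cdnUrl}" integrity="${integrity}" crossorigin="anonymous"></script>`,
    '<script>',
    `  window.${globalName} || document.write(`,
    `    decodeURI('${encoded}')`,
    '  )',
    '</script>'
  ].join('\n');
}

const block = fallbackBlock({
  cdnUrl: 'https://code.jquery.com/jquery-3.5.1.slim.min.js',
  localPath: './node_modules/jquery/dist/jquery.slim.min.js',
  globalName: 'jQuery',
  integrity: 'sha512-EXAMPLEONLY' // mock; would be computed from the local file
});
console.log(block);
```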

5. Export the URLs, either by injecting them into an HTML file or, as you pointed out, into a separate JSON file for later processing.

Would you be willing to contribute to a new rollup plugin that does this? Maybe I set up a new github repo later today...

I may be able to find some time, sure. But my main idea is to have a separate library for this and just let an HTML-bundling Rollup plugin call my separate library (to which you'd be welcome to contribute if you like, once I get it going at least) to find the integrity and other info, which the Rollup HTML plugin could use to build the tags. As I see it, in the absence of SRI support for ESM imports, there is little connection of my tool to Rollup, except that it seems to make sense to convert the CDN SRI scripts (as my plugin could help handle, though it wouldn't need to do the HTML parsing here) at the same time as one is parsing HTML and bundling one's local ESM imports (which again, as I understand it, shouldn't really even need SRI).


JonasKruckenberg commented on July 17, 2024

3. Perhaps npm/yarn has checks against this happening, but I'd worry (esp. for yarn perhaps which has that resolutions feature to do overrides) that if someone made a malicious commit to a lock file, it'd be less easily noticed than if the package.json dependency version was changed.

Yeah, true, that potentially opens a can of worms, but in the end, aren't lock files supposed to do just what we need: lock dependencies to a specific version? But I get your point, definitely something to keep in mind.

I'm not clear what uses SRI has with local files. If a server is under one's control, presumably one already trusts the other files on one's system.

I run a production website that uses SRI for all resources, no matter their origin, and I've not run into any notable performance penalties from it ( the front page loads within ~1 sec ).
While, yes, SRI makes the most sense when using cross-origin CDNs, most people nowadays don't own their hardware/servers anymore; they use CDNs too, like Netlify, AWS or Cloudflare. Using SRI there protects you just in case: maybe the CDN got compromised, or someone is doing an HTTP desync attack on your site. In my opinion, having SRI everywhere has more advantages than disadvantages.

Since there isn't any way for imports to be used with SRI, I wouldn't think we'd need to go into the main, module, etc. fields--just look for the file path.

So yeah, I get that; too bad that import maps and the like are not quite ready yet. What I do on my websites ( only the important pages though ) is manually add preload and prefetch tags for ESM modules to speed things up a bit.
For this use case it would be important to generate SRIs for all modules, scripts and stylesheets. You also said that you'd want SSR support as a feature; dynamically generating preload and prefetch tags would be nice in those situations too, right?

<script>
  window.jQuery || document.write(
    decodeURI('%3Cscript src="./node_modules/jquery/dist/jquery.slim.min.js"%3E%3C/script%3E')
  )
</script>

I don't think this is the right way to approach this: creating inline scripts requires people to allow unsafe-inline in their CSPs, which is not something we should make people do. CSP is one of the most important security features on the web, in my opinion.

Well I'm definitely interested, you can open another issue or reference this one once you're ready to publish your tool!


brettz9 commented on July 17, 2024

  1. Perhaps npm/yarn has checks against this happening, but I'd worry (esp. for yarn perhaps which has that resolutions feature to do overrides) that if someone made a malicious commit to a lock file, it'd be less easily noticed than if the package.json dependency version was changed.

Yeah, true, that potentially opens a can of worms, but in the end, aren't lock files supposed to do just what we need: lock dependencies to a specific version? But I get your point, definitely something to keep in mind.

Yeah. It turns out the Yarn resolutions need to be put in package.json (where things are typically more human-readable) anyways. Otherwise, I believe the install ought to complain about a bad/inconsistent lock file if there is no suitable semver or matching integrity within that semver on the npm registry.

Lock files, at least package-lock.json, are meant to ensure consistent installs for those performing a local npm install (not regular production npm install <package> use), so if there were such a vulnerability unique to the lock files, the problem would be limited to developers, though of course that'd be a problem too. Maintainers who keep a package-lock.json can and should run npm audit to make sure they are not using known vulnerabilities within the indicated range, so the onus is on projects for vulnerabilities within semver (or, if they take on the risk of pointing to GitHub URLs which are not versioned: since these apparently also only get locked for local installs, a dependency could maliciously amend the branch on GitHub, in this case impacting regular users rather than developers).

While it is reasonable to check lock files, if nothing else to report when the user's local copy doesn't match, I think we need to check the installed file, not only for the first two reasons previously mentioned (repeated for convenience), but also a third reason:

  1. It is pretty common for projects to prevent creation of lock files.
  2. Some projects have both kinds of lock files.
  3. The person building the CDN versions may have forgotten to run a fresh npm install to get the latest versions indicated by their project's lock file; in such a case, we'd be pointing their CDN (and integrity) at the latest version while the developer tests locally with an outdated one. If the lock hashes are accurate, this shouldn't cause a CDN failure, but it may confuse the developer (and cause problems if we applied that same hash to their local file without doing our own check of it).

I'm not clear what uses SRI has with local files. If a server is under one's control, presumably one already trusts the other files on one's system.

I run a production website that uses SRI for all resources, no matter their origin, and I've not run into any notable performance penalties from it ( the front page loads within ~1 sec ).
While, yes, SRI makes the most sense when using cross-origin CDNs, most people nowadays don't own their hardware/servers anymore; they use CDNs too, like Netlify, AWS or Cloudflare. Using SRI there protects you just in case: maybe the CDN got compromised, or someone is doing an HTTP desync attack on your site. In my opinion, having SRI everywhere has more advantages than disadvantages.

I'd like to check that out when possible. Thank you for the info and the benefit of your experience.

Since there isn't any way for imports to be used with SRI, I wouldn't think we'd need to go into the main, module, etc. fields--just look for the file path.

So yeah, I get that; too bad that import maps and the like are not quite ready yet. What I do on my websites ( only the important pages though ) is manually add preload and prefetch tags for ESM modules to speed things up a bit.
For this use case it would be important to generate SRIs for all modules, scripts and stylesheets. You also said that you'd want SSR support as a feature; dynamically generating preload and prefetch tags would be nice in those situations too, right?

That's an interesting use case. Hadn't thought of that. But if the CDN had corrupted data, while the prefetch would be forced to fail, when ready for live fetch, I imagine the browser would nevertheless attempt again without the checks, thus getting the corrupted file, no?

<script>
  window.jQuery || document.write(
    decodeURI('%3Cscript src="./node_modules/jquery/dist/jquery.slim.min.js"%3E%3C/script%3E')
  )
</script>

I don't think this is the right way to approach this: creating inline scripts requires people to allow unsafe-inline in their CSPs, which is not something we should make people do. CSP is one of the most important security features on the web, in my opinion.

Good point. However, I don't think it would hurt to provide this as a non-default option, as for things like some open source demos on GitHub Pages, it may be nice to have a working CDN URL falling back to a local copy, without a lot of concern about the header policy in such situations, while still wanting good and reliable performance for users.

Well I'm definitely interested, you can open another issue or reference this one once you're ready to publish your tool!

Sure. Btw, one thing I realized: some projects may want to store the CDN-versioned copy in their source, despite the need for diffs upon version changes, since it could be used as the demo (e.g., if their master branch were hosted on GitHub Pages). So I think the package should be prepared to read and convert versioned CDN links into updated versioned CDN links, as well as convert unversioned local links into versioned CDN links, and also to support converting files in place. I hope to work on this tomorrow.
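
Updating an already-versioned CDN link in place could be sketched like this; the unpkg URL pattern and function name are assumptions for illustration, and other CDNs would need their own patterns:

```javascript
// Hypothetical sketch: re-version a link that already points at unpkg.
const versionedUrl = /https:\/\/unpkg\.com\/(?<name>[^@/]+)@(?<version>[^/]+)\/(?<path>\S+)/;

function bumpCdnUrl(url, latestVersion) {
  const match = versionedUrl.exec(url);
  if (!match) return url; // not a recognized versioned CDN link
  const { name, path } = match.groups;
  return `https://unpkg.com/${name}@${latestVersion}/${path}`;
}

console.log(bumpCdnUrl('https://unpkg.com/leaflet@1.4.0/dist/leaflet.js', '1.7.1'));
// https://unpkg.com/leaflet@1.7.1/dist/leaflet.js
```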

Btw, FWIW, one Rollup idea for Deno if imports can ever be made to work... A plugin to retrieve and roll-up a live URL's dependencies and transitive dependencies, as a snapshot, storing them locally and hashing them so that a URL advertising itself as fixed could be verified as doing so (and the project could then review the cached code before releasing). Came to mind as a kind of poor-man's decentralized package manager. (Such a tool could be used now for the bundle it created at least.)


brettz9 commented on July 17, 2024

Hi,

I've found out that lock files won't help us with anything besides versions anyways, as their integrity applies to the whole package, whereas we are interested in specific distribution files.

I've now gotten a preliminary package added at https://github.com/brettz9/integrity-matters . I went ahead and added some HTML parsing so the package could be used on its own without Rollup, but it still can be used programmatically and potentially added into a Rollup plugin routine.

No promises on getting to them, especially the second one, but my remaining higher-priority to-dos for the project at the moment are:

  1. Add tests/coverage
  2. See about modifying cheerio or, as needed, its parser dependencies, to allow preserving inter-attribute whitespace

No doubt the docs can be improved, but hopefully getting to the tests should help with that, directly (by inspiring improvements to the docs) or at least indirectly (by having tests-as-docs).


brettz9 commented on July 17, 2024

FYI, I've now added full tests/coverage. I still want to apply it to some real-world projects and see if it needs any further tweaking, but at least the behavior is now specified in tests.


JonasKruckenberg commented on July 17, 2024

Alright, I'll see if I can find the time in the coming week to check it out. Doing my research after our discussion, though, I noticed the following:
In Safari and recent Chrome builds the cache is double-keyed, meaning that CDN-hosted resources will not be shared between origins; they will always be refetched ( kind of negating the point of CDNs in my mind ).
So I have a question that you might be able to answer:
What are you using CDNs for, and why?
( I see the benefit in some cases, like CDNs that auto-optimize your images )
But what are other real benefits, in your opinion?


brettz9 commented on July 17, 2024

Let me say off the bat, while I of course try to follow best practices and keep performance in mind, it is not my particular focus to track performance in my work.

I think coming to such conclusions requires more data and testing than is my personal focus.

My perspective is more that:

  1. A good number of open source webapp/website projects use CDNs
  2. Manually updating these links is a hassle.
  3. Even if CDNs no longer have the browser-caching advantage, I presume there may still be other factors that make them beneficial in some cases; and even if not, it may still be difficult to conclusively demonstrate any purported lack of benefit, such that these projects will be convinced to stop using them.

That all being said, it does seem to me that even if browser caching has been hobbled by the apparent attempts to prevent ETag tracking, there is still the remaining benefit CDNs may offer over traditional, cheap hosts, whereby their widely distributed proxy servers can be close enough to more users. (Whether or how that is outweighed by the cost of setting up a separate connection is not something I have tested or have data on.)

If they may be of interest, a few pages I came across on the topic (I didn't dive into them in depth though):


brettz9 commented on July 17, 2024

Just wanted to mention I've now had a chance to apply this to two real-world projects, one using HTML and one using JSON (one which doesn't modify the source file, and one which does), and having made some adjustments, I'm happy it's working now at a basic level (though it can no doubt still be improved).

