I opened a preliminary PR (#10) for Federation, but it is probably best to go via an issue first to better enable discussion around it. Here is what I've been thinking so far.
Old proposal: https://gist.github.com/victorb/82ace9e6fe7adf578833527b8b94f914
New proposal:
Open-Registry Federation
Summary
Open-Registry, as a crowdfunded registry, won't be able to reach the same scale
as npm Inc.'s registry without raising a significant amount of funds. What we can
do, however, is set up a federation of registries, which would significantly lower
our operating costs and also give users the benefit of faster performance and
local resource sharing.
The model of federation proposed here will decentralize the storage and
transfer of tarballs first, as that is an easier way for Open-Registry to get
started with federation.
Once implemented and used, we can start focusing on research about federated
publishing as well.
Motivation
- Lower bandwidth/storage/hosting expenses
- Faster performance for participants
- Resilience
- User Control
Constraints
- Needs to handle npm namespace to be npm compatible (global + scoped packages)
- Handles propagation of package updates
- Anti-spam measures if needed
- Cheap to run (Federated version needs to be lightweight)
- Downloads metadata + tarballs on-demand
- Space aware (never cause "out of space" state by itself)
- Users should be able to benefit from federation by just changing the registry
URL (DNS/HTTP federation)
- Users can benefit further by running federation software locally
- Runs offline
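One way the "space aware" constraint could be approached is a cache with a fixed byte budget that evicts the least recently used tarballs before storing new ones. The sketch below keeps everything in memory for brevity; the class name, the LRU policy, and the budget parameter are assumptions for illustration, not part of the proposal.

```python
from collections import OrderedDict
from typing import Optional


class BoundedTarballCache:
    """In-memory stand-in for an on-disk tarball cache with a byte budget."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        data = self._entries.get(key)
        if data is not None:
            self._entries.move_to_end(key)  # mark as most recently used
        return data

    def put(self, key: str, data: bytes) -> bool:
        """Store data, evicting old entries first. Returns False if it can never fit."""
        if len(data) > self.max_bytes:
            return False
        if key in self._entries:
            self.used_bytes -= len(self._entries.pop(key))
        # Evict least recently used entries until the new one fits in the budget.
        while self.used_bytes + len(data) > self.max_bytes:
            _, evicted = self._entries.popitem(last=False)
            self.used_bytes -= len(evicted)
        self._entries[key] = data
        self.used_bytes += len(data)
        return True
```

Because eviction happens before the write, the cache never exceeds its budget, which is the property the constraint asks for. A real node would track file sizes on disk instead of holding bytes in memory.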
Use Cases
- Individuals can find closer mirrors
- Teams can share the same mirror
- Companies can deploy on-prem mirrors
- Organizations depending on Open Source packages can help host packages
- Registry will continue to work even if the main mirror is down
Security
- Malicious people might participate in the federation honestly, until they
aren't honest anymore
- Content-addressing helps address this specific issue
- Tarballs are verified via content-addressing when downloaded, and popular
clients (npm + yarn) check the checksums before extracting, so mutating
served tarballs is hard without the client detecting it
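To illustrate the checksum step mentioned above: npm registry metadata carries a `dist.integrity` value in Subresource Integrity form (for example `sha512-<base64 digest>`), which clients compare against the downloaded tarball. The following is a minimal sketch of that comparison, assuming a single-hash integrity string; the function names are hypothetical.

```python
import base64
import hashlib
import hmac

# Assumption: only these SHA-2 algorithms are accepted in this sketch.
ALLOWED_ALGORITHMS = {"sha256", "sha384", "sha512"}


def compute_integrity(data: bytes, algorithm: str = "sha512") -> str:
    """Return an SRI-style string such as 'sha512-<base64 digest>'."""
    digest = hashlib.new(algorithm, data).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def verify_tarball(data: bytes, integrity: str) -> bool:
    """Check tarball bytes against a single-hash integrity string."""
    algorithm, _, expected = integrity.partition("-")
    if algorithm not in ALLOWED_ALGORITHMS or not expected:
        return False
    actual = compute_integrity(data, algorithm)
    # Constant-time comparison; not strictly required here, but harmless.
    return hmac.compare_digest(actual, integrity)
```

A federation node that serves a modified tarball would fail this check on the client side, regardless of which peer delivered the bytes.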
Practical steps
Ok, so the working plan is the following:
This is the small, MVP version to ensure the idea is viable in the wild.
The first step towards federation is keeping the metadata index centralized with
Open-Registry, while tarballs can be served from anywhere and by anyone.
The plan is to use ipfs-lite by @hsanjuan to start an embedded libp2p node that
will expose the traditional registry interface as HTTP endpoints.
The software will connect to the central registry to find out the latest root
hash and also listen for any changes, automatically updating its local pointer
when Open-Registry's pointer changes.
The root hash can be found in multiple different ways, depending on the
environment of the software.
The software will basically be a resolver for (packageName, packageVersion) =>
IPFS hash via its local proxy.
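As a rough sketch of what such a resolver might do with an incoming request: the public npm registry serves tarballs at paths of the form `/<name>/-/<basename>-<version>.tgz`, so the proxy can recover (packageName, packageVersion) from the path and map it to a location under the current root hash. The IPFS directory layout used here mirrors the npm path and is purely an assumption, as are the function names and the placeholder hash.

```python
from typing import Optional, Tuple


def parse_tarball_request(path: str) -> Optional[Tuple[str, str]]:
    """Split an npm-style tarball path into (name, version), or None if it does not match."""
    package, sep, filename = path.lstrip("/").partition("/-/")
    if not sep or not package or not filename.endswith(".tgz"):
        return None
    basename = package.split("/")[-1]  # "@scope/pkg" -> "pkg"
    prefix = basename + "-"
    if not filename.startswith(prefix):
        return None
    version = filename[len(prefix):-len(".tgz")]
    return (package, version) if version else None


def ipfs_path(root_hash: str, name: str, version: str) -> str:
    """Build a path under the root hash (hypothetical layout mirroring npm URLs)."""
    basename = name.split("/")[-1]
    return f"/ipfs/{root_hash}/{name}/-/{basename}-{version}.tgz"
```

For example, `parse_tarball_request("/class-is/-/class-is-1.1.0.tgz")` yields `("class-is", "1.1.0")`, which `ipfs_path` can then turn into a path under whatever root hash the node currently holds.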
CLI interface
$ open-registry --federate
                --share
                --update-type=<http|dns|ipns|pubsub>
                --offline

--federate <multiaddr> - Connect to an already running instance and use its
                         root hash.
                         Default: /dns4/npm.open-registry.dev/tcp/6736
--share                - Enable other peers to connect to you and download
                         public packages.
                         Default: true
--update-type          - How to get the latest root hash from Open-Registry.
                         Default: http
--offline              - Don't make any connections, use the last known root hash.
                         Default: false
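The flags above could be modelled roughly as follows. This is only a sketch of the option surface using Python's `argparse`; the `--no-share` form for disabling sharing is an assumption (the proposal only states that `--share` defaults to true), and the actual tool may be written in another language entirely.

```python
import argparse

DEFAULT_FEDERATE = "/dns4/npm.open-registry.dev/tcp/6736"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="open-registry")
    parser.add_argument(
        "--federate",
        metavar="<multiaddr>",
        default=DEFAULT_FEDERATE,
        help="Connect to an already running instance and use its root hash.",
    )
    # BooleanOptionalAction (Python 3.9+) also generates --no-share,
    # which is an assumed way to turn the default-true flag off.
    parser.add_argument(
        "--share",
        action=argparse.BooleanOptionalAction,
        default=True,
        help="Enable other peers to connect to you and download public packages.",
    )
    parser.add_argument(
        "--update-type",
        choices=["http", "dns", "ipns", "pubsub"],
        default="http",
        help="How to get the latest root hash from Open-Registry.",
    )
    parser.add_argument(
        "--offline",
        action="store_true",
        help="Don't make any connections, use the last known root hash.",
    )
    return parser
```

Calling `build_parser().parse_args([])` returns the defaults listed above, while `parse_args(["--update-type=dns", "--offline"])` selects DNS updates and offline mode.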
Example usage:
$ open-registry
Connecting to npm.open-registry.dev
Getting latest hash via HTTP over TLS
Started sharing downloaded public packages with others
Started HTTP server on http://localhost:6736 # mnemonic: "open" in T9
...
Currently connected to 3 peers
Upload/Download [current/total]: 32kbps/0kbps [3mb/7.3mb]
Pointing your package manager to http://localhost:6736 should now allow you to
download and install packages on-demand, while caching them and serving them to
other users who are trying to download them too.
Federation Protocol
When the federation software is started on the user's device, it connects to
the main registry.
Once the connection has been established, it asks for the latest version of the
registry (just a pointer) and saves it for future use.
Concurrently, it starts an HTTP server locally.
Now the user can point their client to the local HTTP server.
Requests will be proxied via the latest root hash the federation software knows
about, and fetched data will be cached.
When the root hash of the main registry changes, the main registry publishes the
new hash in the following ways:
- As a TXT record on npm.open-registry.dev in the format "registry-hash="
- Under the property hash in the response to a GET request to npm.open-registry.dev
- By sending the hash via the topic npm.open-registry.dev on the libp2p network in use
- (maybe) By updating the IPNS name that the main registry uses
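The first two channels in the list above could be consumed along these lines. The sketch only covers parsing: it assumes the TXT value is the prefix followed directly by the hash, and that the HTTP response body is a JSON object. The function names and the example hash are placeholders, and the DNS lookup and HTTP request themselves are left out.

```python
import json
from typing import Iterable, Optional

TXT_PREFIX = "registry-hash="


def hash_from_txt_records(records: Iterable[str]) -> Optional[str]:
    """Return the root hash from the first matching TXT record, if any."""
    for record in records:
        value = record.strip().strip('"')  # resolvers often return quoted strings
        if value.startswith(TXT_PREFIX) and len(value) > len(TXT_PREFIX):
            return value[len(TXT_PREFIX):]
    return None


def hash_from_http_body(body: str) -> Optional[str]:
    """Return the 'hash' property from a JSON response body (assumed format)."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return None
    value = payload.get("hash") if isinstance(payload, dict) else None
    return value if isinstance(value, str) and value else None
```

Both helpers return `None` rather than raising when the hash is missing, so a node can fall back to its last known root hash.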
If the local client makes a request for a package that doesn't exist in the
local root hash, the client needs to make a request to the central registry to
download the package. After this is done, the package will be included in the
new root hash and can therefore be downloaded by the local client without any
further requests to the central registry.
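The fallback described above might look like the following in outline. The shape of the central registry's answer (a new root hash plus the tarball's content hash) is an assumption made for this sketch, as are the class and parameter names; the central registry is represented by a plain callable so the flow can be exercised without a network.

```python
from typing import Callable, Dict, Tuple

PackageKey = Tuple[str, str]  # (name, version)
# Hypothetical central lookup: returns (new_root_hash, tarball_content_hash).
CentralFetch = Callable[[str, str], Tuple[str, str]]


class FederationNode:
    def __init__(self, root_hash: str, index: Dict[PackageKey, str],
                 fetch_from_central: CentralFetch) -> None:
        self.root_hash = root_hash
        self.index = index  # packages known under the current root hash
        self.fetch_from_central = fetch_from_central

    def lookup(self, name: str, version: str) -> str:
        """Resolve a package locally, asking the central registry only on a miss."""
        key = (name, version)
        if key in self.index:
            return self.index[key]
        new_root, content_hash = self.fetch_from_central(name, version)
        # The package is now part of the new root, so later lookups stay local.
        self.root_hash = new_root
        self.index[key] = content_hash
        return content_hash
```

A second `lookup` for the same package is answered from the local index, which matches the "without any requests to the central registry" behaviour described above.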
Simulator
First step of the federation setup is creating a suitable testing environment
where we can run tests about how well the federation is working.
Simulator should start with running the following scenarios:
More elaborate schemes can be created in the future.
Bootstrap nodes
Open-Registry will run a couple of bootstrap nodes. These are responsible for
being accessible to the federation nodes and providing metadata and tarballs
if the federation nodes don't have them locally.
Metrics
Both the bootstrap nodes and the main registry index should publish metrics in
the Prometheus format to be collected by the metrics gatherer. These metrics
will eventually be made accessible via a public dashboard.
For the federation nodes, we can offer opt-in metrics in the future, so we can
see the health of the federation.
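For reference, the Prometheus text exposition format is line-based: each metric has optional `# HELP` and `# TYPE` lines followed by a sample line. The sketch below renders that format by hand to show what a bootstrap node's metrics endpoint could return; the metric names are invented for illustration, and a real deployment would more likely use an existing Prometheus client library.

```python
from typing import Dict, Tuple, Union

Number = Union[int, float]
# name -> (metric type, help text, current value)
Metrics = Dict[str, Tuple[str, str, Number]]


def render_metrics(metrics: Metrics) -> str:
    """Render unlabelled metrics in the Prometheus text exposition format."""
    lines = []
    for name, (metric_type, help_text, value) in sorted(metrics.items()):
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {metric_type}")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical metric names, not defined anywhere in the proposal.
EXAMPLE = render_metrics({
    "open_registry_connected_peers": ("gauge", "Currently connected peers.", 3),
    "open_registry_tarball_requests_total": ("counter", "Tarball requests served.", 42),
})
```

Serving `EXAMPLE` as plain text on an HTTP endpoint would be enough for a metrics gatherer to scrape it.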
Existing infrastructure migration
The current Open-Registry is just one instance which is the main Open-Registry
index. With federation, the architecture would change to add another component
which would be the federated instances. We have more flexibility on where to
place these but are in no rush to add them currently.
Potential Issues
- Lockfiles contain direct location-based URLs
- hard for projects to migrate without having to rewrite their lockfiles
- Efficient and fast lookup in the IPFS network
- private networks solve this but bring their own problems
Drawbacks
- Requiring software to be installed and run in the background for people
wanting to take advantage of it
- ^ could possibly be solved with HTTP/DNS routing, but initial routing will
be centralized in that case and require internet connectivity
Alternatives
- Continue to run a centralized service
- Skip federation and start researching an architecture for a fully decentralized
registry for both tarballs and metadata
- Probably too large an undertaking right now
Unresolved Problems
- Using a private vs. public IPFS network
- private network will be faster to bootstrap + finding content
- public network gives us a bigger reach and ability to download content from
other nodes
- Should benchmark and see which one is faster (although private network is
pretty much sure to be faster, would be interesting to see how much)
Future
- After implementation of the tarball federation, further research should be
done on how metadata can be federated as well
- Research URL scheme currently used to define packages
- Right now, the entire ecosystem is in one namespace (let's call it the npm
namespace)
- Things are referred to as class-is directly in the package.json and lockfiles
- We'd like to support multiple registries by doing something similar to
/registry.npmjs.org/class-is instead. More verbose, but more accurate and
flexible
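To make the registry-qualified scheme above concrete, here is a small sketch of how a reference could be split into registry and package name. Treating bare names as belonging to registry.npmjs.org is an assumption standing in for "the npm namespace", and the type and function names are hypothetical.

```python
from typing import NamedTuple

# Assumption: unqualified names resolve against the public npm registry.
DEFAULT_REGISTRY = "registry.npmjs.org"


class PackageRef(NamedTuple):
    registry: str
    name: str


def parse_package_ref(ref: str) -> PackageRef:
    """Parse 'class-is' or '/registry.npmjs.org/class-is' into a PackageRef."""
    if not ref:
        raise ValueError("empty package reference")
    if ref.startswith("/"):
        # Everything up to the next slash is the registry host; the rest is
        # the package name, which may itself contain a slash (scoped packages).
        registry, sep, name = ref[1:].partition("/")
        if not sep or not registry or not name:
            raise ValueError(f"malformed qualified reference: {ref!r}")
        return PackageRef(registry, name)
    return PackageRef(DEFAULT_REGISTRY, ref)
```

With this, `parse_package_ref("class-is")` and `parse_package_ref("/registry.npmjs.org/class-is")` name the same package, while `/npm.open-registry.dev/class-is` would point at a different registry.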