The dweb-transports from internetarchive

split out createReadStream

createReadStream is a useful function in its own right, for cases where the request does not come from an AV element.

Transports should have createReadStream looping through transports like other functions - see below
IPFS should implement createReadStream - see below
HTTP should implement createReadStream - see below
HASH gets it for free when HTTP implements it - see below
WEBTORRENT should implement createReadStream - see below
FLUENCE, GUN, WOLK, YJS don't implement p_f_createReadStream
Testing - webtorrent esp from client against archive
WRITINGSHIMS.md should reflect
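
A minimal sketch of the "looping through transports" idea from the list above. The helper name `createReadStreamFromAny`, the transport object shape, and the stand-in transports are assumptions for illustration, not the actual dweb-transports API.

```javascript
const { Readable } = require("stream");

// Hypothetical helper: try each transport in order, return the first stream obtained.
// `transports` is assumed to be an array of objects that may expose createReadStream(url, opts).
function createReadStreamFromAny(transports, url, opts = {}) {
  const errors = [];
  for (const t of transports) {
    // Skip transports without the method (e.g. FLUENCE, GUN, WOLK, YJS today).
    if (typeof t.createReadStream !== "function") continue;
    try {
      return { name: t.name, stream: t.createReadStream(url, opts) };
    } catch (err) {
      errors.push(`${t.name}: ${err.message}`); // remember why, then try the next one
    }
  }
  const detail = errors.join("; ") || "no transport implements createReadStream";
  throw new Error(`Could not stream ${url} (${detail})`);
}

// Stand-in transports, for illustration only.
const demoTransports = [
  { name: "GUN" },
  { name: "IPFS", createReadStream() { throw new Error("CID not found"); } },
  {
    name: "HTTP",
    createReadStream(url, { start = 0, end } = {}) {
      const range = end === undefined ? `${start}-` : `${start}-${end}`;
      return Readable.from([`bytes ${range} of ${url}`]);
    },
  },
];

const picked = createReadStreamFromAny(demoTransports, "https://example.org/file.mp4", { start: 0, end: 99 });
```

Here `picked.name` identifies which transport supplied the stream, which should help when testing webtorrent from a client against the archive.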

IPFS - Websockets - disconnected subnets & single point of failure

IPFS currently uses WebSocketStar (WSS), since WebRTC crashes browsers on pretty much any decentralized platform, not just IPFS (see internetarchive/dweb-transport#1).

There are several issues with WSS:
Most critical is that clients connecting via WSS can only retrieve ipfs CIDs that are known by the node they are connected to. This essentially means CIDs aren't universal, just known to the subset of connected peers. The "websocket-relay" project at Protocol is supposed to fix this.

Most urgent: This could be improved (for the archive), especially in the short term, by connecting directly to the IPFS instance at the archive, since that node also knows all the IA files we've added, but so far none of the Protocol Labs people have been able to do this.

Most important long term: WSS's star topology gives a single point of failure, which means that IPFS using WSS is inappropriate for any anti-censorship application. I think that the WSS-Relay could be used along with a changing list of places to connect to. Ideally that would be built into IPFS, but in its absence someone is going to have to build a wrapper that, for example, saves potential places to connect to between sessions and feeds them to the config during p_connect.
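
A rough sketch of what such a wrapper might look like, assuming a localStorage-like store. The storage key, the `relays` option name, and the example addresses are all hypothetical; the real config shape passed to p_connect would need checking against IPFS.

```javascript
// Hypothetical wrapper: remember places to connect to between sessions.
const RELAYS_KEY = "dweb-known-relays"; // illustrative storage key

// Read the remembered list; tolerate missing or corrupt data.
function loadRelays(storage) {
  try {
    const parsed = JSON.parse(storage.getItem(RELAYS_KEY) || "[]");
    return Array.isArray(parsed) ? parsed : [];
  } catch (err) {
    return [];
  }
}

// Save a de-duplicated, bounded list for the next session.
function saveRelays(storage, relays, max = 20) {
  const unique = [...new Set(relays)].slice(0, max);
  storage.setItem(RELAYS_KEY, JSON.stringify(unique));
  return unique;
}

// Build the options to feed to the config during connect:
// remembered addresses first, then the built-in defaults.
function connectOptions(storage, defaults) {
  return { relays: [...new Set([...loadRelays(storage), ...defaults])] };
}

// In-memory stand-in for localStorage so the sketch also runs under Node.
function memoryStorage() {
  const m = new Map();
  return {
    getItem: (k) => (m.has(k) ? m.get(k) : null),
    setItem: (k, v) => { m.set(k, String(v)); },
  };
}
```

In a browser, `window.localStorage` could be passed as `storage`; the list would be updated with `saveRelays` whenever new places to connect to are learned.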

Feel free to pull these into separate issues if working on them ....

Most urgent is

GUN - storage full

GUN has a problem with not managing full storage. See internetarchive/dweb-archive#46 for why we are not testing with GUN in dweb-archive because of this problem.

GUN: add rawfetch

Add DAT

dweb-transports should support the DAT protocol; this should be relatively straightforward.

Note, this won't (currently) work in the browser due to WebRTC issues, but should work in Node (e.g. in dweb-mirror).

See DAT meta: mitra42/dweb-universal#1

Can't upgrade IPFS beyond 0.35.0 due to uglify issues

See ipfs/js-ipfs#2411

Webtorrent: Fork or monkeypatch to support http urls

Should fork or monkeypatch the Webtorrent library so that, if it sees an HTTP (or WS) URL for the download or for the tracker, is running under https, and has no other usable URL, it will try the https or wss URL instead.
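
The rule above could be sketched as a small pure function. This is only an illustration of the intended behavior; `upgradeInsecureUrls` is a hypothetical helper, not part of the Webtorrent library, and where it would hook into Webtorrent is left open.

```javascript
// Hypothetical helper: given candidate URLs (web seeds or trackers) and the
// page protocol, decide which URLs to try.
function upgradeInsecureUrls(urls, pageProtocol) {
  // Not running under https: nothing is blocked, leave the list alone.
  if (pageProtocol !== "https:") return urls.slice();

  // Under https only https: and wss: URLs are usable as they stand.
  const usable = urls.filter((u) => /^(https|wss):/i.test(u));
  if (usable.length > 0) return usable;

  // No usable URL: try the secure equivalent of each http: or ws: URL.
  return urls
    .filter((u) => /^(http|ws):/i.test(u))
    .map((u) => u.replace(/^http:/i, "https:").replace(/^ws:/i, "wss:"));
}
```

In a browser the second argument would be `window.location.protocol`. Whether the upgraded URL actually answers is up to the server, so the caller still needs normal error handling.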

Naming: cors and 403 errors on /services/img

Also … When I fetch images via services I’m seeing data dependent results which just look wrong …
https://archive.org/services/img/software etc and all of them work fine

HTTP/1.1 200 OK
Server: nginx/1.14.0 (Ubuntu)
Date: Sat, 23 Nov 2019 06:14:06 GMT
Content-Type: image/jpeg; charset=UTF-8
Content-Length: 3286
Connection: keep-alive
Cache-Control: max-age=21600
Expires: Sat, 23 Nov 2019 07:14:06 GMT
Last-Modified: Thu, 05 Jul 2018 02:34:06 GMT
ETag: "5b3d839e-cd6"
Expires: Sat, 23 Nov 2019 12:14:06 GMT
Access-Control-Allow-Origin: *
Accept-Ranges: bytes
Strict-Transport-Security: max-age=15724800
Accept-Ranges: bytes

But some images are missing the CORS headers, e.g.

curl -o/dev/null -Lv "https://archive.org/services/img/DonkeyKong64_101p"
HTTP/1.1 200 OK
Server: nginx/1.14.0 (Ubuntu)
Date: Sat, 23 Nov 2019 06:13:38 GMT
Content-Type: image/jpeg; charset=UTF-8
Content-Length: 7181
Connection: keep-alive
Cache-Control: max-age=3600
Expires: Sat, 23 Nov 2019 07:13:18 GMT
Last-Modified: Sat, 23 Nov 2019 06:13:18 GMT
Strict-Transport-Security: max-age=3600
X-Fastcgi-Cache: HIT
Accept-Ranges: bytes

https://archive.org/services/img/opensource_movies fails with a 403 (Forbidden). If I access it directly in the browser it's fine. I can't think what could be different about it.

Adding "seed"

I'm working on adding "seed" as another supported function - the design thinking is in dweb-mirror#117 which is the first use case.

@rodneywitcher - particularly interested in how we might want this to work with Wolk as well. I think it would involve adding keys during config, but not sure what info needs passing during the request to seed a file or directory.

When no collection image should display no image at all

Collections without thumbnail images display the IA logo; instead they should not display any logo.

See for example http://localhost:4244/details/@mitra and compare to http://dweb.archive.org/details/@mitra

GUN issue with node.once

A problem with using the same code on Node as in the browser means we can't use GUN for metadata on the Node-based dweb-mirror.

Naming - move to routing.js in dweb-archivecontroller

Move the newly split naming into dweb-archivecontroller, so dweb-transports becomes less archive.org specific.
See #22

Split transports up, make the bundle smaller, include transports separately

As more transports get integrated into dweb-transports, and as the IA's UI team starts looking at using dweb-archivecontroller, which depends on dweb-transports, it has become necessary to split up dweb-transports and make it lighter.

Solution will need ...

Work in nodejs
Work in browser via webpack
At some point work in browser via ES6 modules

Experimentation is in the 'split' branch, which may or may not always be working!

Steps might be ... (this section will be edited)

Persistence of storage - IPFS

Moved from: internetarchive/dweb-transport#2
There are issues with persistence of the IPFS content stored. This is inherent to IPFS since there is no guarantee of persistence in IPFS and things are only stored by people who publish, pin, or for a period look at them.

Since the publisher is a browser, and is probably offline at this point, and no one may have looked at the content, we need a way to be able to store it. It's unclear if this should be via pinning, or if we have to go outside of IPFS to do so.

For now - given the challenge of pinning on a browser, this is solved with https://github.com/internetarchive/dweb-transport/issues/13 which stores both on our HTTP servers and in IPFS.
Note, I'm leaving this open in the hope that an IPFS specific solution can be found.
2018-01-23: Confirmed this is not possible directly in IPFS currently. A solution would be building a pinning service, e.g. one that the client hits over HTTP and that then pins the content. This would introduce another single point of failure (client access to HTTP), so it would really need to use something like an IPFS pubsub channel that picks it up and passes it back for pinning, which needs GoLang skills or maybe a separate node.js client at IA. For now will stick to HTTP for persistent storing from browsers.

Wolk: down

I'm seeing two issues with wolk.
a) URLs like https://cloud.wolk.com/dweb.archive.org/metadata/netlabels are failing with "Key not found"
b) The library is returning that failure as a success, with a data structure that includes headers and a 404.
I've disabled it for now in the default library.
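
Until (b) is fixed in the library, a caller-side guard might look like the sketch below. The result shape `{ statusCode, headers, body }` is an assumption; the actual structure returned by the Wolk library would need checking.

```javascript
// Hypothetical guard: turn an error status wrapped in a "successful" result
// into a thrown error. Assumed result shape: { statusCode, headers, body }.
function unwrapOrThrow(result, url) {
  const status = result ? result.statusCode : undefined;
  if (typeof status === "number" && status >= 400) {
    const err = new Error(`Fetch of ${url} failed with status ${status}`);
    err.statusCode = status; // keep the status so callers can fall back to another transport
    throw err;
  }
  // Hand back the payload if the assumed shape is present, else the raw result.
  return result && result.body !== undefined ? result.body : result;
}
```

With this, a 404 such as "Key not found" surfaces as a failure, so the caller can move on to the next transport instead of treating the error structure as data.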

Split: Naming - move into transports then refactor

Creating naming.js
integrate into Naming.js
in "dweb" repo, remove need for bootloader
Test on dweb.me
remove register.js and dependencies on dweb-objects from dweb-transport
Clean up dweb-objects see issue149
Remove objects from dweb-mirror See internetarchive/dweb-mirror#250
move to /metadata from /arc/archive.org/metadata etc (see below)
Catch https://dweb.me and https://dweb.archive.org and https://archive.org in namer for dweb-mirror as well as dweb:
Move to https://dweb.archive.org instead of dweb: see below
Check for documentation of naming in this repo and google docs
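
One way the "catch" and "move to /metadata" steps in the list above might fit together, as a sketch only. `canonicalPath` and the prefix list are illustrative and are not the existing naming.js code.

```javascript
// Prefixes the namer is meant to catch, per the list above.
const ARCHIVE_PREFIXES = [
  "https://dweb.me",
  "https://dweb.archive.org",
  "https://archive.org",
  "dweb:",
];

// Hypothetical helper: reduce any caught URL to a canonical path such as
// /metadata/foo, dropping the older /arc/archive.org prefix if present.
// Returns undefined for URLs the namer should not handle.
function canonicalPath(url) {
  const prefix = ARCHIVE_PREFIXES.find((p) => url === p || url.startsWith(p + "/"));
  if (prefix === undefined) return undefined;
  let path = url.slice(prefix.length);
  if (path.startsWith("/arc/archive.org/")) {
    path = path.slice("/arc/archive.org".length);
  }
  return path.startsWith("/") ? path : "/" + path;
}
```

For example, `dweb:/arc/archive.org/metadata/foo` and `https://archive.org/metadata/foo` would both reduce to `/metadata/foo`.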

List deletion

Moved from: internetarchive/dweb-transport#7
Lists should support deletion, note that a deletion is just a flag of some sort (I think YJS supports it) so any retrieval should also have the option of eliminating deletions or retaining them.

Note there is already code that filters out duplicates; whether to eliminate deletions probably belongs as an argument to that code (eliminating them first, so that deduplication gets the not-deleted one).

Note - part of this is having some way to delete a list all the way back to empty.
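
A minimal sketch of the retrieval option described above, assuming list entries shaped like `{ url, deleted }`. Both the entry shape and the `keepDeletions` option name are hypothetical.

```javascript
// Hypothetical list resolver: optionally drop entries flagged as deleted,
// then de-duplicate by url. Deletions are removed first so that
// de-duplication keeps a not-deleted copy.
function resolveList(entries, { keepDeletions = false } = {}) {
  const candidates = keepDeletions ? entries : entries.filter((e) => !e.deleted);
  const seen = new Set();
  return candidates.filter((e) => {
    if (seen.has(e.url)) return false; // duplicate of an earlier entry
    seen.add(e.url);
    return true;
  });
}

const demoEntries = [
  { url: "a", deleted: true },
  { url: "a" },
  { url: "b" },
  { url: "b" },
];
const live = resolveList(demoEntries);
const withDeletions = resolveList(demoEntries, { keepDeletions: true });
```

Deleting a list all the way back to empty would then just mean every entry is flagged, so the default retrieval returns nothing.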

Status of projects integrating with dweb-transports.

The following systems are integrated currently, updates are welcome.

Naming: localhost

Need a shim that will work with naming and intercept archive.org for localhost,
THEN can remove the ternaries in DwebMirror
Part of #22 and #20
