Currently, emscripten stubs out or fakes most filesystem access calls. This causes plenty of software to either not work at all or fail subtly due to incomplete functionality. I propose implementing a lightweight file system that would allow users of emscripted programs to set up access to their real files at the paths that the emscripted program expects.
Basically, I plan to follow the same pre-configuration approach currently used for `STDIO.prepare()`. I imagine a main `Module.FS` object which would be used to set up all the paths before running the module. To create a file you could pass actual data or a relative URL that would be loaded lazily when/if requested. In non-browser contexts, the relative URL can be used as a real filesystem path.
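A minimal sketch of how such a pre-configuration object might behave, assuming a synchronous `loader(url)` callback supplied by the embedding page (XHR in a browser, a real file read elsewhere). The names `createFS`, `createFile`, `readFile` and `loader` are hypothetical and only illustrate the idea; they are not an existing emscripten API.

```javascript
// Hypothetical sketch of the proposed pre-configuration object.
// createFS, createFile, readFile and loader are illustrative names only.
function createFS(loader) {
  // loader(url) must synchronously return the data for a relative URL:
  // e.g. via XHR in a browser, or by reading a real path elsewhere.
  const root = { isFolder: true, contents: {} };

  function splitPath(path) {
    return path.split('/').filter((p) => p.length > 0);
  }

  return {
    // Register a file from inline data ({ data }) or a lazy URL ({ url }).
    createFile(path, source) {
      const parts = splitPath(path);
      let node = root;
      for (const name of parts.slice(0, -1)) {
        if (!Object.prototype.hasOwnProperty.call(node.contents, name)) {
          node.contents[name] = { isFolder: true, contents: {} }; // implicit folder
        }
        node = node.contents[name];
        if (!node.isFolder) throw new Error('ENOTDIR: ' + path);
      }
      node.contents[parts[parts.length - 1]] = {
        isFolder: false,
        data: source.data !== undefined ? source.data : null,
        url: source.url !== undefined ? source.url : null,
      };
    },

    // Return a file's data, fetching it on first access if it has a URL.
    readFile(path) {
      let node = root;
      for (const name of splitPath(path)) {
        if (!node.isFolder || !Object.prototype.hasOwnProperty.call(node.contents, name)) {
          throw new Error('ENOENT: ' + path);
        }
        node = node.contents[name];
      }
      if (node.isFolder) throw new Error('EISDIR: ' + path);
      if (node.data === null && node.url !== null) {
        node.data = loader(node.url); // lazy: only loaded when requested
      }
      return node.data;
    },
  };
}
```

With this shape, `createFile('/data/big.bin', { url: 'files/big.bin' })` costs nothing until the program first reads the file, and later reads reuse the cached data.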
A similar method could be used to create virtual folders explicitly, or to mark leaf folders for lazy loading, so that any read of a file inside such a folder would be forwarded to a given URL.
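Forwarding reads from a lazily loaded leaf folder could be as simple as substituting the requested file name into a URL pattern. The sketch below assumes the pattern syntax used later in this proposal (`'/getfileFromSubfolder?name=%s'`), with `%s` standing for the URL-encoded file name; `resolveLazyRead` is a hypothetical helper.

```javascript
// Hypothetical helper: map a read inside a lazily loaded leaf folder to a URL.
// Assumes "%s" in the folder's pattern stands for the URL-encoded file name.
function resolveLazyRead(folder, fileName) {
  if (!folder.isFolder || typeof folder.url !== 'string') {
    throw new Error('not a lazily loaded folder');
  }
  if (fileName.length === 0 || fileName.includes('/')) {
    // Leaf folders only: nested paths are not forwarded in this sketch.
    throw new Error('ENOENT: ' + fileName);
  }
  // A replacer function avoids special "$" handling in the replacement string.
  return folder.url.replace('%s', () => encodeURIComponent(fileName));
}

// Example folder entry, shaped like the proposal's lazily loaded folder.
const lazyFolder = {
  isFolder: true,
  read: true,
  write: false,
  url: '/getfileFromSubfolder?name=%s',
};
```

Once fetched, the result could be cached as an ordinary file entry inside the folder so repeated reads do not hit the server again.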
In the interest of keeping it lightweight, I see no reason to support any of the following:
- Separate read and execute permissions.
- The sticky permission bit.
- Multiple users/groups.
- Separate created and modified timestamps.
- Hard links.
- Block devices.
Any modification made by the emscripted program (writing to files, creating or deleting folders, changing permissions) could either be handled by a user-specified callback, or be applied in memory, with the user able to flush all the changes after the program finishes running. The latter would be simpler, but the former is significantly more flexible.
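Both strategies can sit behind one small interface. In this sketch (all names hypothetical, including the change objects such as `{ type: 'write', path, data }`), a callback receives each change immediately if one is supplied; otherwise changes are logged in memory and handed over by `flush()`.

```javascript
// Hypothetical sketch of the two strategies for handling modifications.
// onChange, record, flush and the change object shape are illustrative only.
function createChangeTracker(onChange) {
  let log = []; // buffered changes, used only when no callback is given

  return {
    // Called by the file system for every write, mkdir, unlink, chmod, etc.
    record(change) {
      if (typeof onChange === 'function') {
        onChange(change); // callback strategy: user decides what to do
      } else {
        log.push(change); // in-memory strategy: keep until flushed
      }
    },
    // Return all buffered changes and start a fresh log.
    flush() {
      const changes = log;
      log = [];
      return changes;
    },
  };
}
```

The in-memory variant needs no user code during the run, which is what makes it simpler; the callback variant lets the embedder react to each change as it happens.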
I believe this would allow emscripten to implement all of the Unix filesystem API, at least at a minimal level.
In terms of actual implementation, I see the file system as a simple nested dictionary (`Object`). A straightforward but memory-wasteful approach would be to have something like this for a simple file system:
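```javascript
// Device callbacks (hypothetical stubs so the example runs).
function readFromDevice() { return null; }
function writeToDevice(data) {}

var root = {
  read: true,
  write: false,
  isFolder: true,
  timestamp: new Date(),
  inodeNumber: 1,
  contents: {
    subfolder1: {
      read: true,
      write: true,
      isFolder: true,
      timestamp: new Date(),
      inodeNumber: 2,
      contents: {
        // Regular file, loaded lazily from a URL.
        myfile: {
          read: true,
          write: true,
          isFolder: false,
          timestamp: new Date(),
          inodeNumber: 3,
          url: '/path/to/myfile/on/server'
        },
        // Regular file whose data already lives in memory.
        myotherfile: {
          read: true,
          write: true,
          isFolder: false,
          timestamp: new Date(),
          inodeNumber: 4,
          contents: 0 // placeholder for an index into HEAP
        },
        // Symbolic link.
        mylink: {
          read: true,
          write: true,
          isFolder: false,
          timestamp: new Date(),
          inodeNumber: 5,
          link: '/subfolder1/myfile'
        },
        // Device. The I/O callbacks need keys of their own: reusing
        // read/write would overwrite the permission flags.
        mydevice: {
          read: true,
          write: true,
          isFolder: false,
          timestamp: new Date(),
          inodeNumber: 6,
          readCallback: readFromDevice, // hypothetical key name
          writeCallback: writeToDevice  // hypothetical key name
        },
        // Leaf folder whose reads are forwarded to a URL pattern.
        subfolder3: {
          read: true,
          write: false,
          isFolder: true,
          timestamp: new Date(),
          inodeNumber: 7,
          url: '/getfileFromSubfolder?name=%s'
        }
      }
    },
    subfolder2: {
      read: false,
      write: false,
      isFolder: true,
      timestamp: new Date(),
      inodeNumber: 8,
      contents: {}
    }
  }
};
```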
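Resolving a path in this layout is a walk through nested `contents` objects, restarting from the root whenever a `link` is met. The sketch below assumes links hold absolute paths (as `'/subfolder1/myfile'` does) and uses an arbitrary hop limit to catch link cycles, in the spirit of `ELOOP`; `lookup` and `MAX_LINK_HOPS` are hypothetical names, and lazily loaded folders and permission checks are left out.

```javascript
// Hypothetical path lookup over the nested-object layout.
// Links are assumed to hold absolute paths.
const MAX_LINK_HOPS = 8; // arbitrary limit to detect link cycles (cf. ELOOP)

function lookup(root, path, hops = 0) {
  if (hops > MAX_LINK_HOPS) throw new Error('ELOOP: ' + path);
  const parts = path.split('/').filter((p) => p.length > 0);
  let node = root;
  for (let i = 0; i < parts.length; i++) {
    if (!node.isFolder) throw new Error('ENOTDIR: ' + path);
    if (!node.contents) throw new Error('lazy folder, not handled here: ' + path);
    if (!Object.prototype.hasOwnProperty.call(node.contents, parts[i])) {
      throw new Error('ENOENT: ' + path);
    }
    node = node.contents[parts[i]];
    if (node.link !== undefined) {
      // Follow the link, then continue with the remaining components.
      const rest = parts.slice(i + 1).join('/');
      return lookup(root, node.link + (rest ? '/' + rest : ''), hops + 1);
    }
  }
  return node;
}

// A cut-down copy of the example tree.
const root = {
  isFolder: true,
  contents: {
    subfolder1: {
      isFolder: true,
      contents: {
        myfile: { isFolder: false, inodeNumber: 3, url: '/path/to/myfile/on/server' },
        mylink: { isFolder: false, inodeNumber: 5, link: '/subfolder1/myfile' },
      },
    },
  },
};
```

Here `lookup(root, '/subfolder1/mylink')` returns the `myfile` entry (inode 3), which is how a `stat`-style call would see through the link.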
In the interest of memory conservation, the `read` and `write` permissions could be merged into a single flags field (1 for read, 2 for write, so 3 means read/write), and instead of an `Object` for each file, we can use a flatter 5-tuple: `permissions`, `inode`, `timestamp`, `type` and `content`, where `type` implies `isFolder`. The above structure would look as follows using this approach:
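```javascript
// Device callbacks (hypothetical stubs so the example runs).
function readFromDevice() { return null; }
function writeToDevice(data) {}

// Each entry: [permissions, inode, timestamp, type, content]
var root = [1, 1, new Date(), 'contents', {
  subfolder1: [3, 2, new Date(), 'contents', {
    myfile: [3, 3, new Date(), 'url', '/path/to/myfile/on/server'],
    myotherfile: [3, 4, new Date(), 'data', 0 /* placeholder: index into HEAP */],
    mylink: [3, 5, new Date(), 'link', '/subfolder1/myfile'],
    mydevice: [3, 6, new Date(), 'device', [readFromDevice, writeToDevice]],
    subfolder3: [1, 7, new Date(), 'url_pattern', '/getfileFromSubfolder?name=%s']
  }],
  subfolder2: [0, 8, new Date(), 'contents', {}]
}];
```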
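With the flattened layout, named index constants and bit masks keep the access code readable. This is a sketch under the encoding used in the example (1 = read, 2 = write); `lookupTuple`, `canRead`, `canWrite` and the constants are hypothetical names. Because the proposal merges read and execute permission, read permission is also treated as what allows a folder to be traversed.

```javascript
// Hypothetical helpers for the flattened layout:
// [permissions, inode, timestamp, type, content]
const PERMS = 0, INODE = 1, TYPE = 3, CONTENT = 4;
const READ = 1, WRITE = 2; // permission bits, matching the example values

function canRead(entry) { return (entry[PERMS] & READ) !== 0; }
function canWrite(entry) { return (entry[PERMS] & WRITE) !== 0; }

// Resolve a path. Only 'contents' folders are walked here; 'url_pattern'
// folders would need the lazy-loading path instead.
function lookupTuple(root, path) {
  let entry = root;
  for (const name of path.split('/').filter((p) => p.length > 0)) {
    if (entry[TYPE] !== 'contents') throw new Error('ENOTDIR: ' + path);
    if (!canRead(entry)) throw new Error('EACCES: ' + path); // read doubles as search
    const children = entry[CONTENT];
    if (!Object.prototype.hasOwnProperty.call(children, name)) {
      throw new Error('ENOENT: ' + path);
    }
    entry = children[name];
  }
  return entry;
}

// A cut-down copy of the example tree.
const tree = [1, 1, new Date(), 'contents', {
  subfolder1: [3, 2, new Date(), 'contents', {
    myfile: [3, 3, new Date(), 'url', '/path/to/myfile/on/server'],
  }],
  subfolder2: [0, 8, new Date(), 'contents', {}],
}];
```

For instance, any lookup below `subfolder2` fails with `EACCES`, since its permissions field is 0.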