Comments (12)
Is there any overlap between the individual images? Or are they completely separate sets of files?
Each dwarfs file contains its own set of files, there is only one copy of each file across all the dwarfs files.
There's probably a dozen different ways to implement "mounting multiple archives to the same path".
One idea I have for how this might be implemented:
For example, if the user mounted two DwarFS images using dwarfs -i file1.dwarfs -i file2.dwarfs -o ./mount,
the program would "combine" the file lists from each of the archives. If a file exists in both archives, it should use the file from the first archive (file1.dwarfs).
For example, the user has two dwarfs files that look like this:
image.png
another_image.png
video.mp4
audio.mp3
documents/project.odt
documents/test.md
And the resulting mount directory would look like this:
audio.mp3
documents/project.odt
documents/test.md
image.png
another_image.png
video.mp4
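The "first archive wins" rule described above can be sketched on plain file lists. This is only an illustration: merge_file_lists is a hypothetical helper, and the list files (one path per line, one list per archive) are assumed inputs, not something dwarfs produces in this form.

```shell
# Hypothetical helper: merge several file lists, earlier lists take priority.
# Each argument is a text file with one path per line (one list per archive).
# Output: "<path><TAB><list that provides it>", in order of first appearance.
merge_file_lists() {
  awk '!($0 in owner) { owner[$0] = FILENAME; print $0 "\t" FILENAME }' "$@"
}
```

Calling merge_file_lists file1.list file2.list prints every path once; a path present in both lists is attributed to file1.list.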
I'm horrible at explaining things.
from dwarfs.
Hi!
This sounds to me like what you want is very similar, if not identical, to "incremental backup" functionality, i.e. the ability to add a new snapshot of a directory to a DwarFS image, but only storing the changes relative to the previous snapshot.
I'm not entirely sure, though, because I don't really understand how you achieve this by creating multiple archives and then merge-mounting them. It'd be good to have a more detailed example of exactly what you're doing.
As for the "incremental backup" functionality, that's been requested before and it's definitely something I want to add. See #18, #208.
from dwarfs.
Here is the unholy shell script I use:
dwarfs -o workers=16 -o allow_root -o readonly comp/collection1.dwarfs ./mount/collection1/
dwarfs -o workers=16 -o allow_root -o readonly comp/collection2.dwarfs ./mount/collection2/
dwarfs -o workers=16 -o allow_root -o readonly comp/blalange1.dwarfs ./mount/blalange1/
dwarfs -o workers=16 -o allow_root -o readonly comp/collection3.dwarfs ./mount/collection3/
dwarfs -o workers=16 -o allow_root -o readonly comp/rantonse1.dwarfs ./mount/rantonse1/
dwarfs -o workers=16 -o allow_root -o readonly comp/collection4.dwarfs ./mount/collection4/
dwarfs -o workers=16 -o allow_root -o readonly comp/collection5.dwarfs ./mount/collection5/
dwarfs -o workers=16 -o allow_root -o readonly comp/blalange2.dwarfs ./mount/blalange2/
dwarfs -o workers=16 -o allow_root -o readonly comp/collection6.dwarfs ./mount/collection6/
sudo mergerfs -o cache.files=partial,dropcacheonclose=true,allow_other \
./mount/collection1:./mount/rantonse1:./mount/collection2:./mount/blalange1:./mount/collection3:./mount/collection4:./mount/collection5:./mount/blalange2:./mount/collection6 \
./pywb/collections/main/archive/
I think you understand why it would be great to have this implemented in dwarfs.
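For readers who want a shorter version of the script above, here is a sketch that generates the same commands from a list of names. It is a dry run that only prints the commands. The function names and the IMAGES variable are made up for this example; the dwarfs and mergerfs options are copied unchanged from the script, and the names are listed in the mergerfs branch order.

```shell
#!/bin/sh
# Image names, in the order used for the mergerfs branch list above.
IMAGES="collection1 rantonse1 collection2 blalange1 collection3 collection4 collection5 blalange2 collection6"

# Print one dwarfs mount command per image (dry run: nothing is mounted).
print_mount_commands() {
  for name in $IMAGES; do
    echo "dwarfs -o workers=16 -o allow_root -o readonly comp/$name.dwarfs ./mount/$name/"
  done
}

# Build the colon-separated branch list that mergerfs expects.
branch_list() {
  branches=""
  for name in $IMAGES; do
    branches="${branches:+$branches:}./mount/$name"
  done
  echo "$branches"
}

# Print the mergerfs command that combines all mounted images.
print_mergerfs_command() {
  echo "sudo mergerfs -o cache.files=partial,dropcacheonclose=true,allow_other $(branch_list) ./pywb/collections/main/archive/"
}
```

Running print_mount_commands followed by print_mergerfs_command shows what would be executed; piping the output to sh would run it.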
from dwarfs.
That part was clear from your description. I'm more interested in how you actually build the individual archives. I assume you're creating those from the writable layer in the merged file system?
from dwarfs.
Ah, sorry about that. No, I create the archives separately.
from dwarfs.
In my (irrelevant) opinion, this functionality should be left to specialized union filesystems, like the mergerfs you use in your script, or overlayfs. There is a whole list of special considerations when it comes to having multiple filesystems at one location, one of them being filename clashes.
from dwarfs.
Ah, sorry about that. No, I create the archives separately.
And that means?
Assume I know nothing about your data (or exactly what mergerfs does in your use case).
Is there any overlap between the individual images? Or are they completely separate sets of files?
There's probably a dozen different ways to implement "mounting multiple archives to the same path".
I've looked at the mergerfs README for the last 15 minutes and it's unclear to me what exactly it does. I understand the overlayfs/unionfs approach, but mergerfs is apparently different from that. How does it behave if the same path exists in multiple branches but with different contents?
from dwarfs.
I've looked at the mergerfs README for the last 15 minutes and it's unclear to me what exactly it does. I understand the overlayfs/unionfs approach, but mergerfs is apparently different from that. How does it behave if the same path exists in multiple branches but with different contents?
In my understanding, the traditional way (overlayfs/unionfs) is to have one bottom filesystem and one or more on top, whereas mergerfs uses a merge policy (similar to git merge) that creates a virtual combination of filesystems.
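To make the question about clashing paths concrete, here is a sketch of one possible policy, "first found": the branches are checked in order and the first one containing the path wins. This is an assumption for illustration only; first_found is a hypothetical helper and does not describe how mergerfs is implemented or configured.

```shell
# Hypothetical "first found" lookup across branch directories.
# Usage: first_found <relative-path> <branch-dir>...
# Prints the full path in the first branch that contains it; returns 1 if none does.
first_found() {
  relpath=$1
  shift
  for branch in "$@"; do
    if [ -e "$branch/$relpath" ]; then
      echo "$branch/$relpath"
      return 0
    fi
  done
  return 1
}
```

With this policy, a file that exists in several branches with different contents is always served from the earliest branch, and the later copies are hidden.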
from dwarfs.
Kind of like ratarmount's union mounting system
from dwarfs.
I'm horrible at explaining things
Your reply definitely helps, though! :)
The problem I'm having is the open questions this leaves. And I agree with @silentnoodlemaster that special/different cases should be left to special tools.
The one thing that is definitely ugly about your use case is that you have a myriad of dwarfs processes running, each of which has its own config and, much more importantly, its own independent cache. This has been bugging me for a while now, as I have a somewhat similar use case: tens (maybe hundreds in the future) of dwarfs images that I'd like to mount simultaneously, but for which I don't need a merged view (I'm perfectly fine if they live in separate directories).
So what I'd like to implement, and this is likely going to happen sooner than the incremental-backup feature, is a way to add to (and remove from) a running dwarfs process additional mounts that will share the same cache.
I just haven't figured out all the details yet. And then I need to find the time to do it. So don't hold your breath just yet.
from dwarfs.
Here's a quick brain dump, feel free to comment, I'd definitely appreciate feedback.
None of these will implement any kind of "merging", though.
Mounting multiple DwarFS images
Single mount of multiple images
A single mount of multiple file systems (will show up as one FUSE mount) in a single process; shared cache
dwarfs multi [<subdir1>:]<image1> [<subdir2>:]<image2> ... <mountpoint> [options]
dwarfs add [<subdir3>:]<image3> [<subdir4>:]<image4> <mountpoint> ...
dwarfs remove <mountpoint>/<subdir> <mountpoint>/<subdir> ...
dwarfs remove -m <mountpoint> <subdir> <subdir> ...
add and remove only work for multi mounts. Actually, multi might not even be needed; add alone might be good enough.
The contents of each DwarFS image would be accessible at <mountpoint>/<subdir> instead of just <mountpoint>.
dwarfs config <mountpoint> # show config?
dwarfs config <mountpoint> cachesize=8g # change cache size
The config command would also work in the following scenarios.
Multiple mounts sharing the same process/cache
Multiple mounts of multiple file systems (will show up as multiple FUSE mounts) in a single process; shared cache
dwarfs <image1> <mountpoint1>
dwarfs <image2> <mountpoint2> -oattach=<mountpoint1>
dwarfs <image3> <mountpoint3> -oattach=<mountpoint1>
Options that cannot be changed at run-time will report an error.
I'm definitely open to suggestions regarding a name other than attach. Or maybe even a different syntax for the command.
Multiple mounts with distinct process/cache
Multiple mounts of multiple file systems in multiple processes (current behaviour); exclusive caches
from dwarfs.
Single mount of multiple images
A single mount of multiple file systems (will show up as one FUSE mount) in a single process; shared cache
That implementation seems like the cleanest and most user friendly alternative.
I assume specifying the subdir is optional ([<subdir1>:]<image1>), which would be great as it would allow you to use globs to mount multiple images without creating a ridiculously long command.
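A sketch of how an optional subdir could be handled when parsing the proposed [<subdir>:]<image> arguments. Nothing here exists in dwarfs: parse_image_spec is a hypothetical helper, and falling back to the image file name without the .dwarfs suffix is purely an assumption about what a sensible default might be.

```shell
# Split a proposed "[<subdir>:]<image>" argument into subdir and image path.
# Assumption: without an explicit subdir, use the image file name minus ".dwarfs".
# Output: "<subdir><TAB><image>"
parse_image_spec() {
  spec=$1
  case $spec in
    *:*)
      subdir=${spec%%:*}   # text before the first colon
      image=${spec#*:}     # text after the first colon
      ;;
    *)
      image=$spec
      subdir=$(basename "$image" .dwarfs)
      ;;
  esac
  printf '%s\t%s\n' "$subdir" "$image"
}
```

With a default like this, a glob such as comp/*.dwarfs would expand to plain image paths, and each image would get a subdir named after its file.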
from dwarfs.