Comments (12)

hexahigh commented on July 17, 2024

Is there any overlap between the individual images? Or are they completely separate sets of files?

Each dwarfs file contains its own set of files, there is only one copy of each file across all the dwarfs files.

There's probably a dozen different ways to implement "mounting multiple archives to the same path".

One idea I have of how this might be implemented is this:
For example, if the user mounted two DwarFS images with dwarfs -i file1.dwarfs -i file2.dwarfs -o ./mount, the program would combine the file lists from both archives. If a file exists in both archives, the mount should use the copy from the first archive (file1.dwarfs).

For example, the user has two dwarfs files that look like this:

image.png
another_image.png
video.mp4
audio.mp3
documents/project.odt
documents/test.md

And the resulting mount directory would look like this:

audio.mp3
documents/project.odt
documents/test.md
image.png
another_image.png
video.mp4
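
The "first archive wins" idea can be sketched in a few lines of Python. This is only an illustration of the merge rule, not how dwarfs works or would implement it. The function name merge_listings is made up, and the split of files between the two archives (including the duplicate image.png) is assumed for the sake of the example.

```python
from typing import Dict, List, Tuple


def merge_listings(archives: List[Tuple[str, List[str]]]) -> Dict[str, str]:
    """Map each path to the first archive (in command-line order) that has it."""
    merged: Dict[str, str] = {}
    for archive_name, paths in archives:
        for path in paths:
            # setdefault keeps the existing entry, so earlier archives win.
            merged.setdefault(path, archive_name)
    return merged


# Hypothetical archive contents; image.png is duplicated on purpose.
archives = [
    ("file1.dwarfs", ["image.png", "another_image.png", "video.mp4"]),
    ("file2.dwarfs", ["audio.mp3", "documents/project.odt",
                      "documents/test.md", "image.png"]),
]

merged = merge_listings(archives)
for path in sorted(merged):
    print(f"{path} <- {merged[path]}")
```

The clashing image.png resolves to file1.dwarfs because that archive was listed first.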

I'm horrible at explaining things.

from dwarfs.

mhx commented on July 17, 2024

Hi!

This sounds to me like what you want is very similar, if not identical, to "incremental backup" functionality, i.e. the ability to add a new snapshot of a directory to a DwarFS image, but only storing the changes relative to the previous snapshot.

I'm not entirely sure, though, because I don't really understand how you achieve this with creating multiple archives and then merge-mounting them. It'd be good to have a more detailed example of exactly what you're doing.

As for the "incremental backup" functionality, that's been requested before and it's definitely something I want to add. See #18, #208.

hexahigh commented on July 17, 2024

Here is the unholy shell script I use:

dwarfs -o workers=16 -o allow_root -o readonly comp/collection1.dwarfs ./mount/collection1/
dwarfs -o workers=16 -o allow_root -o readonly comp/collection2.dwarfs ./mount/collection2/
dwarfs -o workers=16 -o allow_root -o readonly comp/blalange1.dwarfs ./mount/blalange1/
dwarfs -o workers=16 -o allow_root -o readonly comp/collection3.dwarfs ./mount/collection3/
dwarfs -o workers=16 -o allow_root -o readonly comp/rantonse1.dwarfs ./mount/rantonse1/
dwarfs -o workers=16 -o allow_root -o readonly comp/collection4.dwarfs ./mount/collection4/
dwarfs -o workers=16 -o allow_root -o readonly comp/collection5.dwarfs ./mount/collection5/
dwarfs -o workers=16 -o allow_root -o readonly comp/blalange2.dwarfs ./mount/blalange2/
dwarfs -o workers=16 -o allow_root -o readonly comp/collection6.dwarfs ./mount/collection6/

sudo mergerfs -o cache.files=partial,dropcacheonclose=true,allow_other \
	./mount/collection1:./mount/rantonse1:./mount/collection2:./mount/blalange1:./mount/collection3:./mount/collection4:./mount/collection5:./mount/blalange2:./mount/collection6 \
	./pywb/collections/main/archive/

I think you understand why it would be great to have this implemented in dwarfs.

mhx commented on July 17, 2024

That part was clear from your description. I'm more interested in how you actually build the individual archives. I assume you're creating those from the writable layer in the merged file system?

hexahigh commented on July 17, 2024

Ah, sorry about that. No, I create the archives separately.

silentnoodlemaster commented on July 17, 2024

In my (irrelevant) opinion, this functionality should be left to specialized union filesystems, like the mergerfs you use in your script, or overlayfs. There is a whole list of special considerations when it comes to having multiple filesystems at one location, one of them being filename clashes.

mhx commented on July 17, 2024

Ah, sorry about that. No, I create the archives separately.

And that means?

Assume I know nothing about your data (or exactly what mergerfs does in your use case).

Is there any overlap between the individual images? Or are they completely separate sets of files?

There's probably a dozen different ways to implement "mounting multiple archives to the same path".

I've looked at the mergerfs README for the last 15 minutes and it's unclear to me what exactly it does. I understand the overlayfs/unionfs approach, but mergerfs is apparently different from that. How does it behave if the same path exists in multiple branches but with different contents?

silentnoodlemaster commented on July 17, 2024

I've looked at the mergerfs README for the last 15 minutes and it's unclear to me what exactly it does. I understand the overlayfs/unionfs approach, but mergerfs is apparently different from that. How does it behave if the same path exists in multiple branches but with different contents?

In my understanding, the traditional approach (overlayfs/unionfs) is to have one bottom filesystem with one or more layered on top, whereas mergerfs uses a merge policy (similar to git merge) that creates a virtual combination of the filesystems.

hexahigh commented on July 17, 2024

Kind of like ratarmount's union mounting system

mhx commented on July 17, 2024

I'm horrible at explaining things.

Your reply definitely helps, though! :)

The problem I'm having is the open questions this leaves. And I agree with @silentnoodlemaster that special/different cases should be left to special tools.

The one thing that is definitely ugly about your use case is that you have a myriad of dwarfs processes running, each of which has its own config and, much more importantly, its own independent cache. This has been bugging me for a while now, as I have a somewhat similar use case: tens (maybe hundreds in the future) of dwarfs images that I'd like to mount simultaneously, but for which I don't need a merged view (I'm perfectly fine if they live in separate directories).

So what I'd like to implement, and this is likely going to happen sooner than the incremental-backup feature, is a way to add to (and remove from) a running dwarfs process additional mounts that will share the same cache.
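
To make the "shared cache" idea concrete, here is a rough Python sketch of one LRU block cache serving several images, keyed by (image, block index). It is purely illustrative and says nothing about how the DwarFS block cache is actually implemented. The class name SharedBlockCache, the loader callback, and the image names are all made up for the example.

```python
from collections import OrderedDict
from typing import Callable, List, Tuple

BlockKey = Tuple[str, int]  # (image path, block index)


class SharedBlockCache:
    """One LRU cache for blocks from any number of mounted images."""

    def __init__(self, max_blocks: int) -> None:
        self.max_blocks = max_blocks
        self._blocks: "OrderedDict[BlockKey, bytes]" = OrderedDict()

    def get(self, image: str, index: int,
            load: Callable[[str, int], bytes]) -> bytes:
        key = (image, index)
        if key in self._blocks:
            self._blocks.move_to_end(key)  # mark as most recently used
            return self._blocks[key]
        data = load(image, index)  # cache miss: fetch the block
        self._blocks[key] = data
        if len(self._blocks) > self.max_blocks:
            self._blocks.popitem(last=False)  # evict least recently used
        return data

    def __contains__(self, key: BlockKey) -> bool:
        return key in self._blocks

    def __len__(self) -> int:
        return len(self._blocks)


loads: List[BlockKey] = []


def fake_load(image: str, index: int) -> bytes:
    """Stand-in for reading and decompressing a block from an image."""
    loads.append((image, index))
    return f"{image}:{index}".encode()


cache = SharedBlockCache(max_blocks=2)
cache.get("a.dwarfs", 0, fake_load)
cache.get("b.dwarfs", 0, fake_load)
cache.get("a.dwarfs", 0, fake_load)  # hit, no load
cache.get("c.dwarfs", 0, fake_load)  # evicts the block from b.dwarfs
print(len(loads), len(cache))
```

With separate processes, each image would get its own fixed-size cache regardless of how busy it is. A single shared cache lets the memory budget follow whichever images are actually being read.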

I just haven't figured out all the details yet. And then I need to find the time to do it. So don't hold your breath just yet.

mhx commented on July 17, 2024

Here's a quick brain dump, feel free to comment, I'd definitely appreciate feedback.

None of these will implement any kind of "merging", though.

Mounting multiple DwarFS images

Single mount of multiple images

A single mount of multiple file systems (will show up as one FUSE mount) in a single process; shared cache

dwarfs multi [<subdir1>:]<image1> [<subdir2>:]<image2> ... <mountpoint> [options]
dwarfs add [<subdir3>:]<image3> [<subdir4>:]<image4> <mountpoint> ...
dwarfs remove <mountpoint>/<subdir> <mountpoint>/<subdir> ...
dwarfs remove -m <mountpoint> <subdir> <subdir> ...

add and remove only work for multi mounts. Actually, multi might not even be needed; add alone might be good enough.

The contents of each DwarFS image would be accessible at <mountpoint>/<subdir> instead of just <mountpoint>.
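
As a rough sketch of how the [<subdir>:]<image> arguments might be interpreted, the Python snippet below splits each argument into a subdirectory and an image path. None of this exists in dwarfs. The function name parse_image_specs is hypothetical, and falling back to the image's file name (without extension) when no subdir is given is an assumption, since the proposal does not say what the default would be.

```python
from pathlib import PurePosixPath
from typing import Dict, List


def parse_image_specs(specs: List[str]) -> Dict[str, str]:
    """Turn '[<subdir>:]<image>' arguments into a {subdir: image} mapping."""
    mounts: Dict[str, str] = {}
    for spec in specs:
        subdir, sep, image = spec.partition(":")
        if not sep:
            # No explicit subdir. Assumed default: image name without extension.
            image = spec
            subdir = PurePosixPath(image).stem
        if not subdir or not image:
            raise ValueError(f"invalid image spec: {spec!r}")
        if subdir in mounts:
            # Two images cannot share one subdirectory without merging.
            raise ValueError(f"duplicate subdir: {subdir!r}")
        mounts[subdir] = image
    return mounts


mounts = parse_image_specs([
    "main:comp/collection1.dwarfs",
    "comp/collection2.dwarfs",
])
for subdir, image in mounts.items():
    print(f"<mountpoint>/{subdir} -> {image}")
```

Rejecting duplicate subdirectories keeps the sketch in line with the statement above that none of these variants would implement any kind of merging.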

dwarfs config <mountpoint>                   # show config?
dwarfs config <mountpoint> cachesize=8g      # change cache size

The config command would also work in the following scenarios.

Multiple mounts sharing the same process/cache

Multiple mounts of multiple file systems (will show up as multiple FUSE mounts) in a single process; shared cache

 dwarfs <image1> <mountpoint1>
 dwarfs <image2> <mountpoint2> -oattach=<mountpoint1>
 dwarfs <image3> <mountpoint3> -oattach=<mountpoint1>

Options that cannot be changed at run-time will report an error.

I'm definitely open to suggestions for a name other than attach. Or maybe even a different syntax for the command.

Multiple mounts with distinct process/cache

Multiple mounts of multiple file systems in multiple processes (current behaviour); exclusive caches

hexahigh commented on July 17, 2024

Single mount of multiple images

A single mount of multiple file systems (will show up as one FUSE mount) in a single process; shared cache

That implementation seems like the cleanest and most user-friendly alternative.
I assume specifying the subdir is optional ([<subdir1>:]<image1>), which would be great, as it would allow you to use globs to mount multiple images without creating a ridiculously long command.
