
Comments (2)

Jojo-1000 commented on June 26, 2024

I thought a bit more about this, especially how it will work with database recreate.

I think the only possibility is to record the blocks in the DuplicateBlock table, because a database recreate can't know which version of the block should be the deleted one. Instead, it marks all later occurrences as duplicates and only records the first occurrence in the Block table.

To make this work properly, I suggest these changes:

  • When moving the blocks to the new volume, record the existing copy in the DuplicateBlock table
  • When the old volume gets deleted after uploading the new one, the duplicated blocks from that volume also need to be cleared from the table
  • In the wasted space calculation, treat duplicated blocks as wasted space (same as deleted blocks)
  • In the test operation, check the duplicated blocks and don't treat them as errors or extra hashes (this should also fix all other cases of "extra hash" after a database recreate)
  • There needs to be a way to have duplicated deleted blocks (after a duplicated block is deleted)
  • (Optional) With database recreate, make sure that all but the last occurrence of the block (based on volume timestamp) are marked as duplicates. This would make sure that after an interrupted compact, the recreated database also has the duplicated blocks in the un-compacted volume instead of the new one, so no space is wasted in the new volume and the duplicates disappear with the next compact
  • (Optional) When compacting a volume (maybe even in the waste calculation?), check whether there are duplicate blocks in other volumes that will not be compacted. Then move the "real" Block entry to one of those volumes and remove it from the compacted volume. This would over time compact away any duplicated blocks, no matter how they are distributed

The optional changes seem low-priority to me, because this entire case should only happen rarely, and it is better to be correct than space-efficient.

Originally posted by @Jojo-1000 in #4967 (comment)

An alternative solution is to temporarily allow an inconsistent database. In combination with #4982, the next compact would remove the extra blocks. However, the space calculation would be incorrect in the meantime.
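
To make the bookkeeping in the list above concrete, here is a minimal sketch in Python with sqlite3. It is not Duplicati's actual code (which is C#); the table and column names (Block, DeletedBlock, DuplicateBlock, VolumeID) are simplified assumptions loosely modeled on Duplicati's local database, and the helper functions are hypothetical.

```python
import sqlite3

# Simplified, assumed schema; Duplicati's real local database has more
# tables and columns than this sketch.
SCHEMA = """
CREATE TABLE Block          (ID INTEGER PRIMARY KEY, Hash TEXT, Size INTEGER, VolumeID INTEGER);
CREATE TABLE DeletedBlock   (ID INTEGER PRIMARY KEY, Hash TEXT, Size INTEGER, VolumeID INTEGER);
CREATE TABLE DuplicateBlock (BlockID INTEGER, VolumeID INTEGER);
"""

def move_block_to_new_volume(db, block_id, old_volume_id, new_volume_id):
    """Compact moves a block: keep the copy left in the old volume recorded
    as a duplicate instead of pretending it no longer exists."""
    db.execute("INSERT INTO DuplicateBlock (BlockID, VolumeID) VALUES (?, ?)",
               (block_id, old_volume_id))
    db.execute("UPDATE Block SET VolumeID = ? WHERE ID = ?",
               (new_volume_id, block_id))

def clear_duplicates_for_deleted_volume(db, volume_id):
    """Once the old volume is deleted, its duplicate entries go away too."""
    db.execute("DELETE FROM DuplicateBlock WHERE VolumeID = ?", (volume_id,))

def wasted_space(db, volume_id):
    """Treat duplicated blocks as wasted space, the same as deleted blocks."""
    (deleted,) = db.execute(
        "SELECT COALESCE(SUM(Size), 0) FROM DeletedBlock WHERE VolumeID = ?",
        (volume_id,)).fetchone()
    (duplicated,) = db.execute(
        "SELECT COALESCE(SUM(b.Size), 0) FROM DuplicateBlock d "
        "JOIN Block b ON b.ID = d.BlockID WHERE d.VolumeID = ?",
        (volume_id,)).fetchone()
    return deleted + duplicated

if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.execute("INSERT INTO Block VALUES (1, 'aabb', 1024, 10)")
    move_block_to_new_volume(db, block_id=1, old_volume_id=10, new_volume_id=20)
    print(wasted_space(db, 10))   # 1024: the old copy now counts as waste
    clear_duplicates_for_deleted_volume(db, 10)
    print(wasted_space(db, 10))   # 0
```

The point of the sketch is the invariant: a block hash appears once in Block, any further physical copies are tracked in DuplicateBlock, and those copies count as waste until their volume is deleted or compacted.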


ts678 commented on June 26, 2024

Thanks for the continuing work. I've been digging through history in 2022 and 2023 about this bug, as it's been a long story.

Taken from merged PR

I assume this references the PR from the original post, which was only partially merged in 2023, taking only the more critical missing-file error fix.
The Extra: hashes seen in test with full-remote-verification were hoped to be an annoyance without other impact.

I'll need to read this a while longer to see whether that has changed, but the collection of imperfections that Duplicati gains in its records over time worries me, as they might show up as mystery issues after more runs.

They're also potential land mines for new code that makes assumptions about the data being by-the-book, when it isn't really. Having said that, finite resources tend to focus on the things that break right now, and I can't disagree. Glad we're catching up.

Another option is to write a book, so "by-the-book" has meaning. Document all the oddities, e.g. the NULL and sentinel values. People coming in from the Duplicati Inc. side might want that as training, and the pieces that I post in the forum only go so far.

So that's an editorial break, but going back to March 2022, I found one note in my files on a possible test case which might be

test all with full-remote-verification shows "Extra" hashes from error in compact #4693 (but it's not the network error, as here)

and has a note on it that it may be fixed by

Fix "test" error: extra hashes #4982 which is still a draft and is not the OP PR. I find its note repeating editorial that I said above:

Reuse existing deleted blocks to prevent uploading duplicated copies that can cause problems later

Ultimately I think we need some opinion on this that is more expert than either of us, and I'm glad that might be possible again.

Even the top expert might still have some limits, as I've been asking annoying questions about the correctness of the commit design, including how well it handles different threads committing based on their own needs. Is a good time for one also good for the rest? I'm hoping to get a design document from someone at some point on the concurrency plan and its interaction with the transaction scheme.

