I thought a bit more about this, especially how it will work with database recreate. I think the only possibility is to record the blocks in the DuplicateBlock table, because a database recreate can't know which version of the block should be the deleted one. Instead, it will mark all of the duplicate blocks and only record the first occurrence in the Block table.
To make this work properly, I suggest these changes:
- When moving the blocks to the new volume, put the existing one in the DuplicateBlock table
- When the old volume gets deleted after uploading the new one, the duplicated blocks from that volume also need to be cleared from the table
- In the wasted space calculation, treat duplicated blocks as wasted space (same as deleted blocks)
- In the test operation, check the duplicated blocks and don't treat them as errors or extra hashes (this should also fix all other cases of "extra hash" after a database recreate)
- There needs to be a way to have duplicated deleted blocks (after a duplicated block is deleted)
- (Optional) With database recreate, make sure that all but the last (based on volume timestamp) occurrence of the block are marked as duplicate. This would make sure that after an interrupted compact, the recreated database also has the duplicated blocks in the un-compacted volume, instead of the new one. This makes sure no space is wasted in the new volume, and the duplicates will disappear with the next compact
- (Optional) When compacting a volume (maybe even in the waste calculation?), check whether there are duplicate blocks in other volumes that will not be compacted. Then move the "real" Block entry to one of those volumes, and remove it from the compacted volume. This would over time compact away any duplicated blocks, no matter how they are distributed
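The recreate bookkeeping described above can be sketched in Python with sqlite3. This is a minimal illustration, assuming a simplified schema: the `Block`, `DuplicateBlock`, and `RemoteVolume` table layouts here are invented for the sketch and are not Duplicati's actual database layout. The first occurrence of a block lands in `Block`, later occurrences in `DuplicateBlock`, and an optional pass re-points `Block` at the occurrence in the newest volume so duplicates end up in older, soon-to-be-compacted volumes:

```python
import sqlite3

# Simplified, assumed schema for illustration only.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE RemoteVolume (ID INTEGER PRIMARY KEY, Timestamp INTEGER);
    CREATE TABLE Block (Hash TEXT, Size INTEGER, VolumeID INTEGER);
    CREATE TABLE DuplicateBlock (Hash TEXT, Size INTEGER, VolumeID INTEGER);
""")

def record_block(hash_, size, volume_id):
    # During recreate: the first occurrence of a (hash, size) pair goes into
    # Block; any later occurrence is recorded in DuplicateBlock instead.
    seen = con.execute(
        "SELECT 1 FROM Block WHERE Hash = ? AND Size = ?", (hash_, size)
    ).fetchone()
    table = "DuplicateBlock" if seen else "Block"
    con.execute(f"INSERT INTO {table} VALUES (?, ?, ?)", (hash_, size, volume_id))

def prefer_newest_volume():
    # Optional pass: for each duplicated block, find the occurrence in the
    # newest volume (by timestamp) across both tables, keep that one in Block,
    # and demote the previous Block entry to DuplicateBlock.
    winners = con.execute("""
        SELECT Hash, Size, VolumeID FROM (
            SELECT o.Hash, o.Size, o.VolumeID,
                   ROW_NUMBER() OVER (PARTITION BY o.Hash, o.Size
                                      ORDER BY v.Timestamp DESC) AS rn
            FROM (SELECT Hash, Size, VolumeID FROM Block
                  UNION ALL
                  SELECT Hash, Size, VolumeID FROM DuplicateBlock) AS o
            JOIN RemoteVolume v ON v.ID = o.VolumeID
        ) WHERE rn = 1
    """).fetchall()
    for hash_, size, best_vol in winners:
        cur_vol = con.execute(
            "SELECT VolumeID FROM Block WHERE Hash = ? AND Size = ?",
            (hash_, size)).fetchone()[0]
        if cur_vol != best_vol:
            # Swap: the newest occurrence becomes the real block, the old
            # Block entry becomes a duplicate in the older volume.
            con.execute("DELETE FROM DuplicateBlock WHERE Hash = ? AND Size = ? AND VolumeID = ?",
                        (hash_, size, best_vol))
            con.execute("INSERT INTO DuplicateBlock VALUES (?, ?, ?)",
                        (hash_, size, cur_vol))
            con.execute("UPDATE Block SET VolumeID = ? WHERE Hash = ? AND Size = ?",
                        (best_vol, hash_, size))
```

With this pass, a recreate after an interrupted compact leaves the real Block entry in the newly uploaded volume and the duplicates in the un-compacted one, matching the optional behavior suggested above.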
The optional changes seem low-priority to me, because this entire case should only happen rarely and it is better to be correct than space efficient.
Originally posted by @Jojo-1000 in #4967 (comment)
An alternative solution is to temporarily allow an inconsistent database. In combination with #4982, the next compact would remove the extra blocks. However, the space calculation would be incorrect in the meantime.
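The wasted-space rule from the list above (treat duplicated blocks the same as deleted blocks) can be sketched as follows. This is a hedged illustration, not Duplicati's implementation: the `DeletedBlock`/`DuplicateBlock` table names mirror the discussion, and the compact threshold value is made up for the example.

```python
import sqlite3

def wasted_bytes(con, volume_id):
    # Waste in a volume = bytes in deleted blocks + bytes in duplicated
    # blocks stored there. Table layouts are assumptions for illustration.
    (deleted,) = con.execute(
        "SELECT COALESCE(SUM(Size), 0) FROM DeletedBlock WHERE VolumeID = ?",
        (volume_id,)).fetchone()
    (duplicated,) = con.execute(
        "SELECT COALESCE(SUM(Size), 0) FROM DuplicateBlock WHERE VolumeID = ?",
        (volume_id,)).fetchone()
    return deleted + duplicated

def should_compact(con, volume_id, volume_size, threshold=0.25):
    # Compact once the wasted fraction crosses a threshold
    # (the 25% default here is illustrative, not Duplicati's setting).
    return wasted_bytes(con, volume_id) / volume_size >= threshold
```

Because duplicates count toward the waste fraction, a volume holding only duplicated copies of blocks eventually crosses the threshold and gets compacted away, which is what makes the duplicates self-healing over time.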
Thanks for the continuing work. I've been digging through history in 2022 and 2023 about this bug, as it's been a long story.
> Taken from merged PR

I assume this references the PR from the original post, which was only partially merged in 2023, taking only the more critical missing-file error fix.
The `Extra:` hashes seen in `test` with `full-remote-verification` had been hoped to be an annoyance without other impact.
I'll need to read this awhile longer to see if that has changed, but one thing that bothers me is the collection of imperfections that Duplicati gains in its records over time; they might show up as mystery issues after more runs. They're also potential land mines for new code that assumes the data is by-the-book when it isn't really. Having said that, finite resources tend to focus on the things that break right now, and I can't disagree. Glad we're catching up.
Another option is to write a book, so "by-the-book" has meaning. Document all the oddities, e.g. the NULL and sentinel values. People coming in from the Duplicati Inc. side might want that as training, and the pieces that I post in the forum only go so far.
So that's an editorial break, but going back to March 2022, I found one note in my files on a possible test case, which might be "test all with full-remote-verification shows 'Extra' hashes from error in compact" #4693 (but it's not the network error, as here). It has a note that it may be fixed by "Fix 'test' error: extra hashes" #4982, which is still a draft and is not the OP PR. I find its note repeating the editorial point I made above:

> Reuse existing deleted blocks to prevent uploading duplicated copies that can cause problems later
Ultimately I think we need some opinion on this that is more expert than either of us, and I'm glad that might be possible again.
Even the top expert might still have some limits, as I've been asking annoying questions about the correctness of the commit design, including how well it handles different threads doing commits based on their own needs. Is a good time for one also good for the rest? I'm hoping to get a design document from someone sometime on the concurrency plan and its interaction with the transaction scheme.