Comments (13)

owlshrimp commented on August 14, 2024 14

If there is functionality around encryption that is known to cause corruption, there really ought to be an unavoidable warning in the software to act as a catch. Not everyone reads all relevant documentation before executing each command, and it seems like issues with encryption have been around long enough for software warnings to be implemented (this isn't a freshly-discovered bug).

I know of plenty of people personally who presumably didn't see any warnings, just commands that went through cleanly, when turning on encryption and assumed that if it's shipping in ZFS it must be safe.

from openzfs-docs.

rbrewer123 commented on August 14, 2024 6

@Matthew-Bradley that's a great idea. Sounds to me like both the documentation and the software should warn against enabling ZFS-native encryption. This is how the ZFS community can show respect and courtesy to users, especially considering that ZFS-native encryption has been causing real headaches and time-loss for real people for years now.

h-2 commented on August 14, 2024 6

This is news to me. I assumed that zfs native encryption is not as fast as native encryption, but that it is stable.

Could someone more knowledgeable please clarify whether encryption is considered unsafe in general, or whether it is unsafe in combination with other features or usage patterns, and if so with which?

AndrewJDR commented on August 14, 2024 6

I'm going to try to steer things back on topic to the idea of adding some warnings about this feature.

  1. Even if these issues can always be worked around by rebooting and scrubbing twice (btw, you can find reports where this is not the case), the need for unexpected reboots and multiple scrubs rules out many production use cases, so some sort of warning in the documentation and/or tools would still be justified. You can see @wohali articulating this clearly toward the latter half of openzfs/zfs#11688 (comment). Forced reboots and scrubs are something we can all probably tolerate on home/test lab servers, but not on production systems that hundreds or thousands of people depend on 24/7 and where scrubs can take hours or days.

  2. Unfortunately, we also know that other sorts of issues exist with native encryption, because a zfs contributor has testbed reproduction cases that trigger: a) kernel panics, and b) corruption of encryption key data that requires special recovery that a non-zfs developer will probably not know how to perform. See my OP for more information on this.

wdoekes commented on August 14, 2024 2

(Disclaimer: I'm not knowledgeable about zfs internals. But I am an experienced user.)

We've been running encryption on Ubuntu systems since it was available in the distro. We have never had any corrupt data.

Until recently, the only problem we had was with Ubuntu/Jammy and a missing patch, which caused snapshots to be unmountable. The patch (or a send/recv loop) fixed that: https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1987190

But recently we did observe the send/recv issues that people are talking about. They made recently created snapshots unavailable/useless, although the data does appear to be there (according to zdb). This is tracked in #15474 but it very much looks like #12014.

openzfs/zfs#15474 (comment) (wdoekes, Nov 2023)

Reading with zdb -r does seem to work: [...]

openzfs/zfs#12014 (comment) (aerusso, May 2021)

In my case, openzfs/zfs#11688 (which you already reference), I've discovered that rebooting "heals" the snapshot

openzfs/zfs#12014 (comment) (jgoerzen, May 2021)

After a reboot but before a scrub, the zfs send you gave executes fine.

openzfs/zfs#12014 (comment) (J0riz, Nov 2023)

Rebooting the server and running two scrubs afterwards resolves the error.

Usage pattern that appears to trigger this issue:

  • a quick succession of zfs snapshot + zfs send (without --raw) of the same or different datasets

So. Yes, I would prefer that the bug gets fixed. But if you're willing to put up with maintenance of the occasional snapshot failures - which might never happen - then I think you should be fine.
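The trigger pattern above involves zfs send without --raw, so one way to make a backup script's choice explicit is to build the send command in one place and default it to raw. This is only a minimal sketch: `build_send_argv` is a hypothetical helper name, and the dataset and snapshot names in the usage note are placeholders. The `--raw` and `-i` options are existing `zfs send` options. Note that a raw send transfers the blocks in their encrypted form, so the receiving side ends up with a still-encrypted dataset; whether raw sends avoid the problem in every case is not established in this thread.

```python
def build_send_argv(snapshot, incremental_from=None, raw=True):
    """Return the argv list for a `zfs send` of one snapshot.

    snapshot         -- full snapshot name, e.g. "pool/data@today" (placeholder)
    incremental_from -- optional earlier snapshot for an incremental send (-i)
    raw              -- when True, add --raw so blocks are sent as stored on disk
    """
    if "@" not in snapshot:
        raise ValueError("expected a snapshot name of the form dataset@snap")
    argv = ["zfs", "send"]
    if raw:
        argv.append("--raw")
    if incremental_from is not None:
        argv += ["-i", incremental_from]
    argv.append(snapshot)
    return argv
```

For example, `build_send_argv("pool/data@b", incremental_from="pool/data@a")` yields an incremental raw send, and passing `raw=False` makes the non-raw case a deliberate, visible choice in the calling script.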

AndrewJDR commented on August 14, 2024 2

I've added a draft of a warning message to the OP. It can of course be adjusted, but it is just there to get the ball rolling.

rincebrain commented on August 14, 2024 2

I've personally given up trying to fix native encryption issues after the project's continued refusal to acknowledge they fucked up by merging this. I do not have the energy to argue any more that introducing a 1% chance of lighting your shit on fire in a project that's supposed to be about "reliability" where one did not exist before is a catastrophic failure, or that "it's not technically data loss because someone could write tooling to recover it" doesn't really matter if you don't have the tooling in hand, that's still data loss from everyone else's perspective, or "snapshots can cause you to get IO errors if you're doing send/recv at the same time" is a sign of how badly this is broken.

Of course, as always, leadership will probably be along shortly to claim there's no issue, and that it would be bad PR to admit there's an issue, and that's why they won't warn people, like the last 2 or 3 times I've asked them to do this.

The reason the reproducer system I have is difficult to debug is that it's a little sparc box, and the race in question in openzfs/zfs#11679 is very finicky, so A) it being a sparc box means most of Linux's kernel debugging tools just laugh at you and don't run, and B) if you add too many debug prints, the timing gets less reliable, so you can't just get all the information you want out reliably.

h-2 commented on August 14, 2024 2

"I'm going to try to steer things back on topic"

I do think it is on-topic to try to document as best as possible when these issues occur. It would strengthen the case for putting up a warning and help readers of such a warning to make an informed decision.

owlshrimp commented on August 14, 2024 1

"I do think it is on-topic to try to document as best as possible when these issues occur."

Agreed. There should be some clarity about exactly *what* is being warned against.

Whether a specific warning/lockout is warranted when using certain features in combination with native encryption depends on whether the problems can *truly* be narrowed to specific combinations. Right now it looks like the answer is No, with kernel panics and random data corruption in the mix. Even if that is disregarded, corruption with native encryption seems from this thread to impact multiple features across snapshots, send/receive, and scrubbing. In these latter two cases (panics and widespread impact), top-level warnings against enabling native encryption in both the documentation and tools must be part of the solution. Some tooling friction against enabling it may also be warranted.

A corresponding issue should probably be set up in https://github.com/openzfs/zfs/issues or similar to track changes to the tooling.

The internet at large has already picked up on this (I first became aware of it from a phoronix article). The best thing the project can do is put strong safeguards in place to stop the flow of people being bitten. People should be absolutely certain that they can't do something dangerous without at least running into a warning or error.

AndrewJDR commented on August 14, 2024 1

Absolutely agreed that having as much clarity as we can achieve is a good thing. If anyone has suggestions on tweaks for the warning message based on what you've learned, please chime in -- the draft is in the OP. I've tried to clarify it as much as I can, based on what I've been able to learn from the publicly available information. If someone wants to work in the testbed results from @rincebrain, that seems fine as well. I thought about doing it, but struggled with how to phrase it.

Personally, I think it's important not to make the message "too scary", because that can provide folks an opening to muddy the waters with comments like "Well, I've been using it fine for years!", which while not untrue, doesn't really help the many people that try it and run into the issues. This is why the current draft of the warning mentions that many have been able to use it without issue.

bill-mcgonigle commented on August 14, 2024

It says here:

For the time being i suggest you to make sure you don't create or delete snapshots while an unencrypted send is running. If you only do raw encrypted zfs sends, the problem does not occur.

It seems better to have Known Issues than general guidance to not use encryption, unless there are totally unknown causes to verified and unsolved problems. But, yes, docs for any feature should advise people to consult Known Issues when they exist.
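The quoted guidance (no snapshot creation or deletion while a non-raw send is running) could be enforced in scripts by making every snapshot and send job take the same lock. The following is a minimal sketch, assuming Linux and that all such jobs go through this wrapper; `zfs_job_lock`, `run_serialized`, and the lock file path are hypothetical names chosen for illustration, not part of any ZFS tooling. It uses an advisory `flock`, so it only protects against jobs that cooperate by taking the lock.

```python
import fcntl
import os
import subprocess
import tempfile
from contextlib import contextmanager

# Hypothetical lock location; any path shared by all snapshot/send jobs works.
DEFAULT_LOCK = os.path.join(tempfile.gettempdir(), "zfs-snapshot-send.lock")


@contextmanager
def zfs_job_lock(lock_path=DEFAULT_LOCK, blocking=True):
    """Hold an exclusive advisory lock for the duration of one job."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        flags = fcntl.LOCK_EX if blocking else fcntl.LOCK_EX | fcntl.LOCK_NB
        # Raises BlockingIOError when non-blocking and another job holds the lock.
        fcntl.flock(fd, flags)
        yield
    finally:
        os.close(fd)  # closing the descriptor releases the lock


def run_serialized(argv, lock_path=DEFAULT_LOCK, runner=subprocess.run):
    """Run one zfs command (snapshot, destroy, or send) under the shared lock."""
    with zfs_job_lock(lock_path):
        return runner(argv, check=True)
```

A cron job would then call something like `run_serialized(["zfs", "snapshot", "pool/data@nightly"])` (placeholder names), and the send job would use the same wrapper. Piping the send stream to a receiver is left out of the sketch; the point is only that snapshot changes wait until a running send has finished, and vice versa.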

mabod commented on August 14, 2024

@rincebrain: In your reddit post you say that you are able to reproduce "one" encryption issue 50% of the time on your test system. Which issue is that? And why can't it be further debugged if it is reproducible?
