v6ak / qubes-incremental-backup-poc
proof of concept of incremental backup scheme for Qubes
TODO: define
A VM call can return a huge amount of data, DoSing dom0. With a wrong implementation (which seems to be limited to a wrong buffer operation, so it is not very likely), it could even overflow.
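A minimal sketch of one way to bound the data dom0 accepts from a VM call. The names `read_limited` and `ResponseTooLarge` are hypothetical and not part of the project; the limit itself would have to be chosen per call.

```python
import io


class ResponseTooLarge(Exception):
    """Raised when a VM sends more data than dom0 is willing to accept."""


def read_limited(stream, max_bytes, chunk_size=65536):
    """Read a binary stream to its end, refusing to buffer more than max_bytes."""
    chunks = []
    total = 0
    while True:
        # Request at most one byte beyond the limit, so an oversized
        # reply is detected without buffering much more than max_bytes.
        chunk = stream.read(min(chunk_size, max_bytes + 1 - total))
        if not chunk:
            break
        total += len(chunk)
        if total > max_bytes:
            raise ResponseTooLarge("VM returned more than %d bytes" % max_bytes)
        chunks.append(chunk)
    return b"".join(chunks)


if __name__ == "__main__":
    # io.BytesIO stands in for the pipe connected to the VM call.
    print(read_limited(io.BytesIO(b"small reply"), max_bytes=1024))
```

Because the size is checked while reading, a misbehaving VM can cause a rejected call but not unbounded memory use in dom0.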
Two different encodings are used, ASCII and UTF-8. ASCII is much simpler, but UTF-8 might be needed in some cases (e.g. passphrase with national characters).
When UTF-8 is parsed from an untrusted source, it adds some attack surface. So, I should review whether UTF-8 is used only when the input is not controlled by an attacker. (I believe so, but it should still be verified.)
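As an illustration of the simpler ASCII path, untrusted bytes could be checked against a strict printable-ASCII whitelist before any decoding happens. This is only a sketch; `decode_untrusted_ascii` and the length limit are assumptions, not existing project code.

```python
def decode_untrusted_ascii(data, max_len=255):
    """Decode bytes from an untrusted VM, accepting only printable ASCII."""
    if len(data) > max_len:
        raise ValueError("untrusted string is too long")
    # Reject control characters and anything outside 7-bit ASCII
    # before the data ever reaches a decoder.
    if any(byte < 0x20 or byte > 0x7E for byte in data):
        raise ValueError("untrusted string contains non-printable or non-ASCII bytes")
    return data.decode("ascii")


if __name__ == "__main__":
    print(decode_untrusted_ascii(b"work-vm"))
```

UTF-8 decoding would then remain reserved for trusted input such as a passphrase typed in dom0.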
Firewall rules, netvm, storage size, template, …
e.g. master password salt, bcrypt parameters etc.
This metadata is public and should be kept with each backup. It allows the backup to be restored without any further metadata.
I would like to use this to back up my personal laptop. How can I do that?
The main benefit I can see is starting the DVM while entering the password.
Calling sync in the VM's shell is not very universal. Theoretically, it can also cause something completely different. So we should use some RPC endpoint for FS sync when it is implemented.
This is a follow-up to #13.
It can currently be achieved by having separate backups handled through separate BackupStorageVMs. But it is not so convenient.
Related issue in another project: duplicati/duplicati#1265, duplicati/duplicati#479
When ignoring some files, it would be useful to ignore them in:
a. all VMs
b. in all VMs based on a particular template
It would be useful to have global exclude lists that could do this instead of hardcoding such files in scripts.
This should not be hard and should handle various edge cases of VMs we cannot currently back up.
Currently, we use Duplicity. The reason is not that it was carefully chosen as the best one. The reason is that I have some experience with it, although I chose it in the past for quite a different scenario. So, I am collecting info about backup backends in order to decide well: https://docs.google.com/spreadsheets/d/1rUXn8VkR5nrrtDhywKBpNu2zuTHzOHDX6F053ynBSjw/edit?usp=sharing
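One possible shape for such lists, sketched under assumptions: the names `GLOBAL_EXCLUDES`, `TEMPLATE_EXCLUDES`, `VM_EXCLUDES` and all patterns below are hypothetical examples, and glob-style matching via `fnmatch` is just one choice.

```python
import fnmatch

# Hypothetical configuration; none of these patterns come from the project.
GLOBAL_EXCLUDES = [".cache/*", "*.tmp"]                      # a. all VMs
TEMPLATE_EXCLUDES = {"fedora-23": [".mozilla/*/Cache/*"]}    # b. VMs based on a template
VM_EXCLUDES = {"work": ["Downloads/*"]}                      # per-VM additions


def excludes_for(vm_name, template_name):
    """Combine the global, per-template and per-VM exclude patterns."""
    return (
        list(GLOBAL_EXCLUDES)
        + TEMPLATE_EXCLUDES.get(template_name, [])
        + VM_EXCLUDES.get(vm_name, [])
    )


def is_excluded(relative_path, patterns):
    """Return True if the path matches any glob pattern (note: * also matches /)."""
    return any(fnmatch.fnmatchcase(relative_path, pattern) for pattern in patterns)


if __name__ == "__main__":
    patterns = excludes_for("work", "fedora-23")
    print(is_excluded("Downloads/file.iso", patterns))
    print(is_excluded("Documents/notes.txt", patterns))
```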
Legend for features:
Legend for first column:
What do we want:
We will want at least one file-based backend and one block-based backend (qvm-backup or similar).
If you can fill in some missing info or suggest another great backend, write it here, please!
Crypto is currently handled by the openssl CLI. (Except for scrypt, which calls Perl.) See the cryptopunk.py file. This is a high-latency solution that makes some assumptions (the attacker can't read /proc, which is justifiable in dom0) that I would like to get rid of.
Maybe the best available alternative would be nacl/sodium. For Python, we could use python3-pynacl or python3-libnacl. But I can't see those packages in Fedora 23. If Qubes 4.0 updates dom0 to Fedora 24 or newer (not sure), we can go this way. Until then, I'd wait. Or maybe someone will suggest a better solution that we can use without waiting for Qubes 4.0.
Long-term metatask.
Target scenario:
Results of the procedure:
TODO: list relevant issues.
Currently, we set logger to “some logger”, which is something nonexistent. Think what could be done better.
Also, we do some independent logging to /tmp, which is just a quick hack.
We should hide VM names from filenames. Options:
a. Encrypt.
b. Hash.
c. Create a table.
Maybe it would be handy if we have direct access. This disqualifies encrypted names with explicit IVs.
Hashes obscure VM name length (which is not the only way to obscure it, though), but are impractical if you cannot enumerate the VM names.
An encrypted file with a table seems to be a rather hard approach.
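To make option b concrete, here is a sketch that uses a keyed hash (HMAC-SHA256) rather than a plain hash, which is an assumption on my side; with a plain hash, anyone could test guessed VM names. The function names and the key are hypothetical.

```python
import hashlib
import hmac


def hidden_name(key, vm_name):
    """Map a VM name to a fixed-length identifier usable as a filename."""
    return hmac.new(key, vm_name.encode("utf-8"), hashlib.sha256).hexdigest()


def find_vm_name(key, hidden, known_vm_names):
    """Reverse lookup; only possible by enumerating candidate VM names."""
    for vm_name in known_vm_names:
        if hmac.compare_digest(hidden_name(key, vm_name), hidden):
            return vm_name
    return None


if __name__ == "__main__":
    key = b"example key derived from the master secret"  # placeholder value
    print(hidden_name(key, "work"))
```

This keeps direct access (the filename can be computed from the VM name) and hides the name length, but it shows the drawback noted above: without a list of VM names, the identifiers cannot be mapped back.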
Qubes 4.0 will be able to create a DVM from any AppVM. By default, we should use the DVM of the template the VM belongs to.
Rather a simple key-value interface:
The interface should be very similar to BackupStorageVM<->dom0, but dom0 has to verify the permissions and maybe handle encryption.
The directory structure would be implemented on top of the mentioned key-value storage as a Merkle tree.
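A rough sketch of how a directory structure could sit on a key-value store as a Merkle tree. It assumes content-addressed keys (the SHA-256 of the value) and a JSON directory listing; both are illustrative choices, and `MemoryStore` and `store_tree` are hypothetical names.

```python
import hashlib
import json


class MemoryStore:
    """In-memory stand-in for the key-value storage; keys are SHA-256 of values."""

    def __init__(self):
        self._data = {}

    def put(self, value):
        key = hashlib.sha256(value).hexdigest()
        self._data[key] = value
        return key

    def get(self, key):
        value = self._data[key]
        # Storage is untrusted, so verify the content against its key.
        if hashlib.sha256(value).hexdigest() != key:
            raise ValueError("stored value does not match its key")
        return value


def store_tree(store, tree):
    """Store a nested dict (name -> bytes or dict) and return the root key."""
    entries = {}
    for name, node in tree.items():
        if isinstance(node, dict):
            entries[name] = ["dir", store_tree(store, node)]
        else:
            entries[name] = ["file", store.put(node)]
    # A directory is just another value: a listing of its children's keys.
    listing = json.dumps(entries, sort_keys=True).encode("utf-8")
    return store.put(listing)


if __name__ == "__main__":
    store = MemoryStore()
    root = store_tree(store, {"home": {"notes.txt": b"hello"}, "etc": {}})
    print(root)
```

Since every directory listing contains the keys of its children, the root key authenticates the whole tree, and unchanged subtrees map to the same keys across backups.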
This has to be elaborated. It does not seem to be easy, if even possible.
Until #6 is implemented, one should be able to pass private.img size.
Expected behavior: It exits with a proper error message.
It should not expose the keys, because it could cause accidental exposure.
TODO: define
Related to #14, but this time, we want something that will check the software quality itself.
Maybe we should have mostly integration tests, because we mostly integrate various pieces of software.
Currently, the VM is created with its final name and then the restore is performed. If the restore fails, a partially restored VM with a valid name exists.
When a VM with a correct name appears, it should have correct content.
Proposed behavior:
This is a challenging task.
We could use the passphrase to derive the password directly. But this would skip the master secret derivation, essentially bypassing all custom-configured password-stretching parameters. This is bad in the long term, as it does not allow using better key-stretching parameters in the future without breaking compatibility. It also cannot be salted by anything other than the storage URL and username. Salting with the storage URL and username has some drawbacks (mostly the need for exactly the same URL and username, even if the backend tolerates some deviation such as letter case), but they are probably justifiable.
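The direct derivation discussed above could look roughly like the sketch below. It is not the project's scheme: the function names are hypothetical, scrypt via `hashlib.scrypt` (available when Python is built against OpenSSL 1.1 or newer) is only one possible stretching function, and the parameters in the example are deliberately small and must not be used for real passwords.

```python
import hashlib


def storage_salt(storage_url, username):
    """Build a salt from the only public inputs available: URL and username."""
    parts = [storage_url.encode("utf-8"), username.encode("utf-8")]
    # Length prefixes keep ("ab", "c") and ("a", "bc") from colliding.
    return b"".join(len(part).to_bytes(4, "big") + part for part in parts)


def derive_storage_password(passphrase, storage_url, username, n, r, p):
    """Stretch the passphrase directly into a storage password (hex string)."""
    key = hashlib.scrypt(
        passphrase.encode("utf-8"),
        salt=storage_salt(storage_url, username),
        n=n, r=r, p=p, dklen=32,
    )
    return key.hex()


if __name__ == "__main__":
    # The parameters are fixed in code here; changing them later changes the
    # derived password, which is the compatibility problem described above.
    print(derive_storage_password("passphrase", "https://example.org/backup", "alice",
                                  n=2**10, r=8, p=1))
```

The sketch also shows the salting drawback: `https://Example.org/backup` and `https://example.org/backup` produce different salts and therefore different passwords, even if the backend treats them as the same location.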
We could also download some public data from the backup storage (this can hardly be storage-agnostic) to get key derivation parameters. Those key stretching parameters have to be considered as untrusted. This implies:
Another disadvantage: This can increase the practical value of shoulder-surfing attacks.
However, maybe the hassle with design and implementation and all the risks are simply not worth the enhancement.
*) Also anyone who can attack the connection can do this. So, the connection to backup storage is a new weak point.
We should ensure it does not leak any sensitive data.
Ideas:
“Clone” is way too generic, so it can collide with some other software. We should use something more specific.
Various checks can be considered:
a. Backup can be restored without errors.
b. Compare data from backup and real system. (Challenge: exclusions.)
c. Perform some user-defined VM-specific test.
(And possibly some other options.)
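Check b could be sketched as a comparison of file digests between the live data and a restored copy, skipping excluded paths. This is only an illustration: the function names are hypothetical, glob-style exclusions are an assumption, and it ignores metadata such as permissions and timestamps.

```python
import fnmatch
import hashlib
import os


def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


def snapshot(root, excludes=()):
    """Map each non-excluded relative file path under root to its digest."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            relative = os.path.relpath(full_path, root)
            if any(fnmatch.fnmatchcase(relative, pattern) for pattern in excludes):
                continue
            result[relative] = file_digest(full_path)
    return result


def differences(original_root, restored_root, excludes=()):
    """List relative paths that are missing or differ between the two trees."""
    original = snapshot(original_root, excludes)
    restored = snapshot(restored_root, excludes)
    return sorted(
        path for path in set(original) | set(restored)
        if original.get(path) != restored.get(path)
    )
```

An empty result from `differences` means every non-excluded file has the same content in both trees; the exclusions have to match those used for the backup, which is the challenge noted in b.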
Pros:
Cons:
Since one can do this manually, I am postponing it away from MVP.
The DVM that performs the backup (BDVM) needs no network access. According to the principle of least privilege, it should not have it.
If the BDVM had no direct access to the Internet, the adversary would not be able to get Internet access and deanonymize the user this way. However, the advantage of a BDVM without Internet access is somewhat limited there. If the adversary has access to the backup storage, she can deanonymize the user anyway. Offloading encryption from the BDVM could help partially, but the attacker would still be able to observe backup sizes.
When installing BackupStorageVM tools to a VM that is neither standalone nor a template, the policy files do not survive a reboot. Fix it.
It might be reasonable to back up just package lists and a few files instead of the whole filesystem.