In which combination in case of fast storage/cheap storage dedup/erasure/compress can

dedup/erasure/compress about reedsolomon HOT 1 CLOSED

klauspost commented on August 26, 2024

dedup/erasure/compress


Comments (1)

klauspost commented on August 26, 2024

It depends a lot on your data. Is it general-purpose file storage, or log files (streams)?

The general approach should be dedup -> compress -> erasure. That will ensure the smallest amount of data.
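As a rough illustration of that ordering, here is a minimal Python sketch. Everything in it is an assumption made for the example: the function names are hypothetical, chunks are handed in pre-split, and a single XOR parity shard stands in for real Reed-Solomon coding, so it only survives the loss of one shard.

```python
import hashlib
import zlib


def dedup(chunks):
    """Keep one copy of each distinct chunk; return (store, recipe)."""
    store = {}   # digest -> chunk bytes, in first-seen order
    recipe = []  # digests in original order, enough to rebuild the input
    for chunk in chunks:
        digest = hashlib.sha256(chunk).digest()
        store.setdefault(digest, chunk)
        recipe.append(digest)
    return store, recipe


def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def encode_with_parity(payload, k):
    """Split payload into k equal data shards plus one XOR parity shard.

    XOR parity is only a stand-in for Reed-Solomon in this sketch.
    """
    shard_len = -(-len(payload) // k)  # ceiling division
    padded = payload.ljust(shard_len * k, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    parity = bytes(shard_len)
    for shard in shards:
        parity = xor_bytes(parity, shard)
    return shards + [parity]


def recover_missing(shards):
    """Rebuild the single shard that is None by XOR-ing all the others."""
    missing = shards.index(None)
    present = [s for s in shards if s is not None]
    rebuilt = bytes(len(present[0]))
    for shard in present:
        rebuilt = xor_bytes(rebuilt, shard)
    fixed = list(shards)
    fixed[missing] = rebuilt
    return fixed


def pipeline(chunks, k=4):
    """dedup -> compress -> erasure, in that order."""
    store, recipe = dedup(chunks)
    compressed = zlib.compress(b"".join(store.values()))
    shards = encode_with_parity(compressed, k)
    return recipe, len(compressed), shards
```

Deduplicating first means the compressor never sees repeated chunks, and adding parity last means redundancy is only computed over data that is already as small as it will get.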

However, a lot of the gains from deduplication come from running it across multiple TB of data, so if you treat files as separate entities you will of course not get the main benefit.
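To make the difference concrete, here is a small Python sketch that counts how many bytes end up stored with a per-file index versus one shared index. The function name and the toy inputs are hypothetical, and files are assumed to be pre-split into chunks.

```python
import hashlib


def stored_bytes(files, global_index):
    """Return bytes kept after dedup.

    files: list of files, each a list of chunk bytes.
    global_index: if True, one chunk index is shared by all files;
                  if False, every file gets its own fresh index.
    """
    shared = set()
    stored = 0
    for chunks in files:
        seen = shared if global_index else set()
        for chunk in chunks:
            digest = hashlib.sha256(chunk).digest()
            if digest not in seen:
                seen.add(digest)
                stored += len(chunk)
    return stored


files = [
    [b"x" * 100, b"y" * 100],
    [b"x" * 100, b"z" * 100],  # shares its first chunk with the other file
]
per_file = stored_bytes(files, global_index=False)
shared = stored_bytes(files, global_index=True)
```

With these inputs the per-file index stores all four chunks, while the shared index stores the repeated chunk only once.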

ZPAQ includes deduplication and does it as follows:

Step 1, deduplicate into fragments of 4-64KB size.
Step 2, collect fragments into a block until you have 16MB of data.
Step 3, compress each block.

A file entry then contains information about which blocks/fragments are used to reconstruct each file. The 16MB block size is the maximum penalty for retrieving a single fragment, since the block that contains it has to be decompressed.
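The layout described in the steps above can be sketched roughly as follows. This is not ZPAQ's code or on-disk format: the `Archive` class and its methods are hypothetical, fragments are cut at fixed offsets for simplicity, and the sizes are shrunk to a few bytes so the behaviour is easy to follow.

```python
import hashlib
import zlib

FRAGMENT_SIZE = 4   # toy value; the text describes 4-64KB fragments
BLOCK_SIZE = 16     # toy value; the text describes 16MB blocks


class Archive:
    def __init__(self):
        self.index = {}             # digest -> (block number, offset, length)
        self.blocks = []            # sealed blocks, each compressed on its own
        self.pending = bytearray()  # fragments waiting to be compressed
        self.files = {}             # name -> ordered list of fragment digests

    def _seal(self):
        """Step 3: compress the collected fragments as one block."""
        if self.pending:
            self.blocks.append(zlib.compress(bytes(self.pending)))
            self.pending = bytearray()

    def add_file(self, name, data):
        recipe = []
        for i in range(0, len(data), FRAGMENT_SIZE):
            frag = data[i:i + FRAGMENT_SIZE]
            digest = hashlib.sha256(frag).digest()
            if digest not in self.index:  # step 1: store new fragments only
                if len(self.pending) + len(frag) > BLOCK_SIZE:
                    self._seal()          # step 2: block is full
                self.index[digest] = (len(self.blocks), len(self.pending), len(frag))
                self.pending += frag
            recipe.append(digest)
        self.files[name] = recipe

    def close(self):
        """Seal the last partial block; call this before reading."""
        self._seal()

    def read_file(self, name):
        cache = {}  # block number -> decompressed block
        out = bytearray()
        for digest in self.files[name]:
            block_no, offset, length = self.index[digest]
            if block_no not in cache:
                # Fetching one fragment costs decompressing its whole block.
                cache[block_no] = zlib.decompress(self.blocks[block_no])
            out += cache[block_no][offset:offset + length]
        return bytes(out)
```

A short usage example: after `a = Archive()`, `a.add_file("one", b"AAAABBBBCCCCDDDDEEEE")`, `a.add_file("two", b"BBBBEEEEFFFF")` and `a.close()`, the second file reuses two fragments already stored for the first, and `a.read_file("two")` has to decompress both blocks to get them back.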

For a datacenter-type job, I would look into the possibility of doing deduplication globally. The 'dedup' currently doesn't offer "DIY" splitting, but that could quite easily be added.

If you would like to discuss things in more detail, you are very welcome to write a mail with your business case. I would be happy to help out!
