Comments (6)
Wow, so many questions! I'll try to answer them all:
0
- .tpxz is just similar to how other tar things are named. E.g. .tar.gz == .tgz, .tar.bz2 == .tbz2. So if pixz creates .tar.pxz files, it can also call them .tpxz
- No idea how popular it is, I don't know how I'd even find out
- I'm not aware of any other readers or writers of this format
- There's one more advantage of pixz over xz -T: it does parallel _de_compression. But yes, the indexed format is also useful!
1
- Nice diagram, thanks! It looks correct
- The number in the footer offset isn't really used, as far as I can remember. I'm not sure what I put it there for in the first place!
2
- Correct, there's no padding before the file index
- To find the file index, we just use the last XZ block. We don't actually use the footer offset, unless I'm misremembering
3
- The file index needs its own block so that we can find it easily, by just looking for the last XZ block! (see above) You're right that we could theoretically use the footer offset instead to find it, and then just append the index to the tar data. But I wanted to keep it simple.
4
- Sure, you technically could save a byte by dropping the null byte
5
- Yup, we don't do anything special to file names
6
- Pixz's block sizes are controllable through a variety of parameters. The -1, -2, etc compression levels will affect it, as will -e or -f. See the manpage for details. Pixz can handle different block sizes just fine.
- The file index is always stored in just one block. Since we don't parallelize it at all, and it's structured pretty sequentially, there's not much point to splitting it.
- Pixz is happy reading blocks of different sizes, so you could certainly attempt to optimize where block boundaries fall. I'm not sure it makes much of a difference to performance in most use cases, but it might if you have something very specific in mind.
from pixz.
Thank you so much for your detailed reply, you're awesome! 😍
So the only thing still unclear is the offset in the file index footer, i.e. the very last 8 bytes. Could I rely on it to find the beginning of the file index? Doing so would spare readers from needing to know about the xz format at all, which could make the implementation easier.
I understand that when creating a tpxz file, the file index must be alone in the last xz block, but my point is about reading a tpxz file.
from pixz.
Somewhat related to this:
It would be nice if the pixz manpage could clarify whether or not the files it creates (including those with indexing) are expected to be compatible with the "standard" xz-utils.
from pixz.
> It would be nice if the pixz manpage could clarify whether or not the files it creates (including those with indexing) are expected to be compatible with the "standard" xz-utils.
@calestyo: yes, they are compatible with "standard" xz-utils:
- in non-tar mode (no pixz indexing): the created files are regular valid xz files that can be decompressed with unxz
- in tar mode (with pixz indexing): the created files are compatible with tar xJ (i.e. GNU tar using xz -d to decompress; other implementations of tar are expected to work as well); the only change is that the generated tar file (within the xz compression) has some data after the tar's end-of-archive, which is expected to be ignored by tar readers
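To illustrate the tar-mode point, here is a minimal Python sketch showing that a tar reader stops at the end-of-archive marker and never looks at whatever follows it. It does not use pixz at all: the trailing bytes are a made-up placeholder, not the real pixz file index layout, and the helper names `build_tar` and `list_members` are hypothetical.

```python
import io
import lzma
import tarfile


def build_tar(files):
    """Build an uncompressed tar archive in memory from a {name: bytes} mapping."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def list_members(xz_bytes):
    """List member names of an xz-compressed tar held in memory."""
    with tarfile.open(fileobj=io.BytesIO(xz_bytes), mode="r:xz") as tar:
        return tar.getnames()


tar_bytes = build_tar({"a.txt": b"alpha", "b.txt": b"beta"})

# Placeholder standing in for "some data after the tar's end-of-archive".
# Assumption: this is NOT the actual pixz index format, just arbitrary bytes.
trailer = b"PLACEHOLDER-INDEX-NOT-THE-REAL-PIXZ-LAYOUT"

archive = lzma.compress(tar_bytes + trailer)
names = list_members(archive)
print(names)
```

Python's tarfile, like GNU tar, stops at the first all-zero header block here, so the extra bytes are simply never read.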
from pixz.
Yes, and note that even in tar mode, the files are still 100% compatible with xz/liblzma.
> So the only thing still unclear is the offset in the file index footer, i.e. the very last 8 bytes. Could I rely on it to find the beginning of the file index? Doing so would spare readers from needing to know about the xz format at all, which could make the implementation easier.
Hm, how will you seek to the last 8 bytes of the file-index without understanding the xz file format? You'll need liblzma (or equivalent) to do that, and once you have liblzma it's not really any harder to ask "please go to the last block in the file".
Plus there are other downsides:
- Since you're checking the footer before the magic, if the file doesn't have a pixz-file-index, you'll see totally invalid data when you try to read the footer. But you'll have no way to know it's invalid, so maybe you end up seeking to a random point in the file. Dealing with that sounds unpleasant.
- liblzma (or whatever else you're using to seek in an xz file) will have to read & decompress the entire file-index just to get to the footer, since it's all in one block. It's pretty silly to decompress all that data just to get 8 bytes, then throw it out, and read/decompress it all over again.
from pixz.
> Hm, how will you seek to the last 8 bytes of the file-index without understanding the xz file format? You'll need liblzma (or equivalent) to do that, and once you have liblzma it's not really any harder to ask "please go to the last block in the file".
Well, for example, the native lzma module in Python can decompress xz files as streams without any information about xz block boundaries. I won't be able to use it anyway, because with that implementation seeking to xz block 2 would mean decompressing block 1 for nothing, but it was just an example.
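For reference, a rough sketch of what that stream-only approach looks like with Python's lzma module. The payload and the 8-byte little-endian footer encoding are assumptions made up for the example (the thread only says the footer is the last 8 bytes), and `read_tail` is a hypothetical helper. The seek is emulated by decompression, so everything before the tail gets decompressed just to reach it.

```python
import io
import lzma
import struct

# Stand-in for decompressed content followed by an 8-byte footer.
# Assumption: little-endian unsigned 64-bit, chosen only for illustration.
payload = b"x" * 100_000
footer = struct.pack("<Q", 12345)
compressed = lzma.compress(payload + footer)


def read_tail(xz_bytes, n=8):
    """Return the last n decompressed bytes of an in-memory xz file.

    LZMAFile has no notion of xz blocks: seeking relative to the end is
    emulated by decompressing the stream, so this is O(size of the data).
    """
    with lzma.open(io.BytesIO(xz_bytes)) as f:
        f.seek(-n, io.SEEK_END)
        return f.read(n)


tail = read_tail(compressed)
value = struct.unpack("<Q", tail)[0]
print(value)
```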
> It's pretty silly to decompress all that data just to get 8 bytes, then throw it out, and read/decompress it all over again.
Very good point! For some reason I completely missed that, my bad.
So I'm convinced! I will definitely have my reader find the pixz file index by starting with the last xz block instead of using the offset in the file footer.
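For anyone wanting to locate the last xz block without liblzma, here is a minimal pure-Python sketch based on the xz container layout (stream footer, then the xz index in front of it). It assumes a single stream with no stream padding, and it skips all CRC checks. The function names are hypothetical, and this is not how pixz itself does it (pixz uses liblzma).

```python
import lzma
import struct

STREAM_HEADER_SIZE = 12
STREAM_FOOTER_SIZE = 12


def read_vli(buf, pos):
    """Decode an xz variable-length integer at buf[pos]; return (value, new_pos)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return value, pos
        shift += 7


def last_block(xz_bytes):
    """Return (offset, unpadded_size, uncompressed_size) of the last xz block.

    Assumes one stream and no stream padding. Returns None if the stream
    contains no blocks.
    """
    footer = xz_bytes[-STREAM_FOOTER_SIZE:]
    if footer[-2:] != b"YZ":
        raise ValueError("missing xz stream footer magic")
    # Footer layout: CRC32 (4), backward size (4), stream flags (2), magic (2).
    stored = struct.unpack("<I", footer[4:8])[0]
    index_size = (stored + 1) * 4
    index_start = len(xz_bytes) - STREAM_FOOTER_SIZE - index_size
    index = xz_bytes[index_start:index_start + index_size]
    if index[0] != 0x00:
        raise ValueError("missing xz index indicator")
    count, pos = read_vli(index, 1)
    offset = STREAM_HEADER_SIZE
    result = None
    for _ in range(count):
        unpadded, pos = read_vli(index, pos)
        uncompressed, pos = read_vli(index, pos)
        result = (offset, unpadded, uncompressed)
        # Each block is padded to a multiple of 4 bytes in the file.
        offset += (unpadded + 3) & ~3
    return result


data = b"hello" * 1000
xz = lzma.compress(data)  # Python's lzma writes a single block per stream
print(last_block(xz))
```

In a .tpxz file, the block found this way would be the one holding the file index, per the discussion above. Decoding that block still needs an LZMA2 decoder that can start at a block boundary.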
from pixz.