Comments (10)
Thank you for the fast reply.
I'm interested in a reproducible output as part of a bigger project (https://reproducible-builds.org/), comparing the files and metadata is something I want to avoid.
Section 4 of that specification states that every Ogg stream is composed of logical bitstreams with an identifying serial number, and "this unique serial number is created randomly".
Yes, I have no idea how an Ogg file is structured, but by doing a diff I can see that the differences are minimal. With diffoscope it is possible to see that only the serial number and the checksum differ.
And I can also confirm that the option works as you've described.
doing so makes it possible to easily create valid, chained Ogg streams by catting them together
Does it mean that catting a file to itself is not supported?
Either way, since the input file and the output file are platonically the same audio file, wouldn't it make sense to reuse the same ids? Would there be any disadvantages?
from optivorbis.
Thank you for the detailed explanation.
However, it can be argued that catting (i.e., chaining) Ogg files is a very niche use case. [...] and hardly anyone on the Internet mentions it as an advantage of the Ogg Vorbis format.
Yeah, I'm not interested in catting ogg files together, especially now, knowing that cat might create an invalid file depending on the input, and the user has no control/feedback.
I suppose there are better tools for concatenating audio files together.
The Ogg specification says that a bitstream serial number "does not have any connection to the content or encoder of the logical bitstream it represents". My interpretation of that sentence is that serial numbers should always be randomly generated, as otherwise there would be a "connection" between the content of the logical bitstream and its serial number.
Mhm, OK, I hoped there would be a way to reach a good compromise: non-random ids, and different ids for files that had different ids before the optimization.
Since optivorbis is optimizing an existing file, I also thought that changing ids might be out of scope, as the two files should behave the same.
It seems to me you would prefer to create random ids by default, and non-random ones only on explicit request (the status quo).
Obviously I would prefer it to be the other way round XD
Would it at least be possible to support the environment variable SOURCE_DATE_EPOCH (see https://reproducible-builds.org/docs/source-date-epoch/)?
Yes, I can set the flag --remuxer_option randomize_stream_serials=false, or define a shell alias.
The main advantage of SOURCE_DATE_EPOCH is that it is a standardized environment variable; a single API for controlling multiple tools (compiler, archiving tools, ...).
Being an environment variable, it works "well" when I'm not invoking optivorbis directly; for example, if it is used internally by some other tool.
from optivorbis.
Just thinking out loud; I think with ffmpeg it is possible to concatenate without transcoding/re-encoding the audio stream, but maybe only in some situations (like equal sample rate)?
It may be possible, but I couldn't get FFmpeg to output an Ogg Vorbis file with two logical bitstreams, as cat would do. It looks like the lossless concatenation features offered by FFmpeg insist on outputting a single logical bitstream, which, unlike the catting approach, can only work well if two Ogg Vorbis files share exactly the same codec parameters (number of audio channels, sampling frequency, and codec setup data that OptiVorbis adapts to each file to achieve more efficient encoding).
I do not think I've understood that part
I was just thinking out loud how that idea would prevent repeating serials under most circumstances, don't worry about the specifics! I'm happy to read that the environment variable is an acceptable solution for you, and I'm also happy to not make things harder for any folks out there who might use Ogg stream chaining by default 😉
I'm reopening this issue to better track progress on the related reproducible builds standard.
from optivorbis.
Hey @fekir, I've just implemented support for the SOURCE_DATE_EPOCH environment variable! 🎉
Please feel free to look at the corresponding commit above and give feedback on the changes. There hasn't been a release with these changes yet, so you'll need to get OptiVorbis executables from CI (which is currently broken due to a regression in rustdoc that should be fixed soon) or by building from source.
I'll close this issue as soon as I receive positive feedback about the new feature or I decide to release it, whichever happens sooner.
from optivorbis.
Thank you for looking at it.
I did not test it (I do not have the possibility to build it right now), but the changes you made at least make sense :)
from optivorbis.
Hi, thanks for reaching out! 😄
By default, OptiVorbis strives to adhere closely to the Ogg format specification. Section 4 of that specification states that every Ogg stream is composed of logical bitstreams with an identifying serial number, and "this unique serial number is created randomly". Therefore, it is expected that OptiVorbis will generate slightly different files for the same input by default, since the logical bitstreams they contain should have different serial numbers.
However, if reproducibility of the generated files is your utmost concern, you can opt out of randomizing stream serials by using the already available randomize_stream_serials remuxer option, like this: optivorbis --remuxer_option randomize_stream_serials=false file1.ogg file2.ogg (see also the related first_stream_serial_offset option). After doing this, the generated files for the same input and options should always be byte-identical (i.e., have the same hashes).
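For anyone who wants to script the "same hashes" check, here is a minimal Python sketch. It only compares two files that already exist on disk; the names `run1.ogg` and `run2.ogg` in the usage note are hypothetical outputs of two separate runs with randomization disabled.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def outputs_match(first: Path, second: Path) -> bool:
    """True when the two files have the same digest, i.e. are byte-identical."""
    return sha256_of(first) == sha256_of(second)
```

Calling `outputs_match(Path("run1.ogg"), Path("run2.ogg"))` should then return True for two runs on the same input with the same options.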
Please note that randomly generating stream serials is not just a pedantic requirement of the Ogg specification: doing so makes it possible to easily create valid, chained Ogg streams by catting them together, which would not be possible if serial numbers were shared across files. (Source.)
If you want to verify that OptiVorbis has indeed generated equivalent audio files for the same inputs without giving up on random serial number generation, I suggest comparing the decoded audio samples and other file metadata instead. You can do this by comparing the output of the ogginfo and oggdec command-line tools for the generated files.
I'm closing this as I believe the feature you requested is already available and this comment provided the necessary context to use it properly, but please feel free to get in touch if you have anything else to add.
from optivorbis.
I'm glad to hear that the option worked well for you!
Does it mean that catting a file to itself is not supported?
Yes, that's correct. Chaining an Ogg file to itself with cat will not work, because it would create an invalid physical Ogg bitstream, with at least two logical bitstreams having the same serial number. Decoders strictly require that the serial numbers within a file are different: otherwise they can't properly parse and seek each bitstream.
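To make the collision concrete, here is a rough Python sketch that walks the Ogg page headers of a byte string and reports serial numbers that start more than one logical bitstream. It relies only on the page header layout from the Ogg specification (the "OggS" capture pattern, the header type byte at offset 5, the little-endian serial at offset 14, and the segment table from offset 26). The function names are made up for illustration, and the parser assumes contiguous, well-formed pages: it does not verify checksums or resynchronize.

```python
import struct
from collections import Counter

BOS_FLAG = 0x02  # header type bit set on the first page of a logical bitstream


def iter_pages(data: bytes):
    """Yield (header_type, serial) for each Ogg page found in data."""
    pos = 0
    while pos + 27 <= len(data):
        if data[pos:pos + 4] != b"OggS":
            raise ValueError(f"no Ogg page at offset {pos}")
        header_type = data[pos + 5]
        serial = struct.unpack_from("<I", data, pos + 14)[0]
        n_segments = data[pos + 26]
        lacing = data[pos + 27:pos + 27 + n_segments]
        yield header_type, serial
        # Skip the fixed header, the segment table and the page body.
        pos += 27 + n_segments + sum(lacing)


def repeated_serials(data: bytes) -> list[int]:
    """Return the serials that begin more than one logical bitstream."""
    starts = Counter(
        serial for flags, serial in iter_pages(data) if flags & BOS_FLAG
    )
    return sorted(serial for serial, count in starts.items() if count > 1)
```

Feeding it the bytes of a file concatenated with itself should report that file's serial, while a single file or two files with distinct serials should give an empty list.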
Either way, since the input file and the output file are platonically the same audio file, wouldn't it make sense to reuse the same ids? Would there be any disadvantages?
The Ogg specification says that a bitstream serial number "does not have any connection to the content or encoder of the logical bitstream it represents". My interpretation of that sentence is that serial numbers should always be randomly generated, as otherwise there would be a "connection" between the content of the logical bitstream and its serial number.
On the practical side, the default behavior of generating random serial numbers allows catting a file to an equivalent version of itself, and also enables catting of files that share stream serial numbers, where such numbers were only unique within their own files. Passing through serial numbers would not guarantee these properties, which I think are nice to have by default.
However, it can be argued that catting (i.e., chaining) Ogg files is a very niche use case. Frankly, I didn't know about it until I did the necessary Ogg format research to develop OptiVorbis, and hardly anyone on the Internet mentions it as an advantage of the Ogg Vorbis format. The fact that two different files share bitstream serials does not matter for decoding: after all, decoders have to work with each file independently, the possibility of serial number collisions between files is not zero, and that serial numbers are generated "randomly" is hard to test for. So, if your users will never chain Ogg files, not randomizing serial generation can be the right call. Besides the chaining use case, I'm not aware of any other drawbacks to not randomizing serials.
Given these considerations, I don't think I will introduce functionality to copy stream serial numbers in OptiVorbis, but you are welcome to disable their randomization if that fits your use case better. If you are inclined to copy stream serials anyway, you might want to fetch the original serial with ogginfo, and then use the randomize_stream_serials and first_stream_serial_offset options to tell OptiVorbis to use that serial for the bitstream it will generate. Alternatively, you can use the rogg_serial tool, which is part of the rogg suite of tools mentioned in the OptiVorbis README.
from optivorbis.
[...] I suppose there are better tools for concatenating audio files together.
The unusual beauty of chaining is that, unlike joining audio files together with tools such as Audacity, GStreamer or ffmpeg, no audio data is re-encoded, which is lossless and much faster. But this feature is little known, poses problems with some popular decoders, and has the pitfalls we have identified, so it's barely used in practice. A proper tool for Ogg Vorbis file chaining would ensure that serials are unique and then cat the files together, but I'm not aware of its existence. The necessary building blocks are already publicly available, though.
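As a rough illustration of what such a chaining tool would have to do, here is a Python sketch that gives each input its own serial, recomputes the page checksums, and joins the results. It is not an existing tool: the function names are hypothetical, each input is assumed to hold exactly one well-formed logical bitstream, and the new serials are simply 1, 2, 3 and so on. The checksum is the Ogg page CRC (polynomial 0x04C11DB7, initial value 0, no bit reflection, computed with the checksum field zeroed).

```python
import struct


def _crc_table() -> list[int]:
    """Build the lookup table for the Ogg page CRC."""
    table = []
    for i in range(256):
        r = i << 24
        for _ in range(8):
            r = ((r << 1) ^ 0x04C11DB7) if r & 0x80000000 else (r << 1)
            r &= 0xFFFFFFFF
        table.append(r)
    return table


_TABLE = _crc_table()


def ogg_crc(page: bytes) -> int:
    """CRC of a whole page whose checksum field has been zeroed."""
    crc = 0
    for byte in page:
        crc = ((crc << 8) & 0xFFFFFFFF) ^ _TABLE[((crc >> 24) ^ byte) & 0xFF]
    return crc


def with_serial(data: bytes, new_serial: int) -> bytes:
    """Rewrite every page of a single-bitstream Ogg file with new_serial."""
    out = bytearray()
    pos = 0
    while pos < len(data):
        if pos + 27 > len(data) or data[pos:pos + 4] != b"OggS":
            raise ValueError(f"no complete Ogg page at offset {pos}")
        n_segments = data[pos + 26]
        size = 27 + n_segments + sum(data[pos + 27:pos + 27 + n_segments])
        page = bytearray(data[pos:pos + size])
        struct.pack_into("<I", page, 14, new_serial)     # serial number field
        struct.pack_into("<I", page, 22, 0)              # zero the checksum
        struct.pack_into("<I", page, 22, ogg_crc(page))  # store the new one
        out += page
        pos += size
    return bytes(out)


def chain(files: list[bytes]) -> bytes:
    """Concatenate single-bitstream Ogg files, giving each a distinct serial."""
    return b"".join(
        with_serial(data, index + 1) for index, data in enumerate(files)
    )
```

No audio packets are touched here, only the serial and checksum fields of each page header, which is what keeps the operation lossless.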
Would it at least be possible to support the environment variable SOURCE_DATE_EPOCH (see https://reproducible-builds.org/docs/source-date-epoch/)?
I think this is a great idea! The way I see it, it could work by fixing an RNG algorithm and seed, and then perturbing the resulting predictable RNG sequence with a checksum of the input data. This way, different files in different OptiVorbis runs would still have random but reproducible and different serial numbers, while the same files optimized in the same OptiVorbis run (which is currently only possible via the Rust API, not the CLI) would still get different serials. This could accommodate all of the use cases I'm thinking of quite nicely, what do you think? 😄
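A minimal Python sketch of that idea, for illustration only: it is not how OptiVorbis implements it. Instead of seeding an RNG and perturbing its output, the sketch swaps in a plain hash over the SOURCE_DATE_EPOCH value, a stream index, and the input data, which gives the same "reproducible but input-dependent" property. The function name and the `stream_index` parameter are assumptions.

```python
import hashlib
import os
import secrets


def stream_serial(input_data: bytes, stream_index: int = 0) -> int:
    """Pick a 32-bit serial: reproducible if SOURCE_DATE_EPOCH is set."""
    epoch = os.environ.get("SOURCE_DATE_EPOCH")
    if epoch is None:
        # Default behavior: a fresh random serial on every run.
        return secrets.randbits(32)
    digest = hashlib.sha256()
    digest.update(epoch.encode("ascii") + b"\0")        # fixed "seed"
    digest.update(stream_index.to_bytes(4, "little"))   # separates streams
    digest.update(input_data)                           # input "perturbation"
    return int.from_bytes(digest.digest()[:4], "little")
```

With the variable set, the same input always maps to the same serial, while different inputs or stream indexes almost certainly map to different ones; with the variable unset, serials stay random.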
from optivorbis.
[...] tools such as Audacity, GStreamer or ffmpeg, no audio data is re-encoded
Just thinking out loud; I think with ffmpeg it is possible to concatenate without transcoding/re-encoding the audio stream, but maybe only in some situations (like equal sample rate)?
what do you think
Well, I would still prefer not to have to define the variable (it was a surprise for me that different runs on the same input generate different outputs; I was just lucky that I decided to check), but it sounds good.
the same files optimized in the same OptiVorbis run (which is currently only possible via the Rust API, not the CLI) would still get different serials
I do not think I've understood that part, unless you meant
the same files optimized in the same OptiVorbis run (which is currently only possible via the Rust API, not the CLI) would still get same serials
or
the different files optimized in the same OptiVorbis run (which is currently only possible via the Rust API, not the CLI) would still get different serials
from optivorbis.
After some other QoL changes, OptiVorbis v0.2.0 has been released with this feature! 🎉
Sorry it took me so long to make a release. I hope it was at least worth the wait 🙂
from optivorbis.