aomediacodec / libiamf
Reference Software for IAMF
License: BSD 3-Clause Clear License
Hello~
Common issues (Opus codec):
Issue 1: the pre-skip value is 0 in codec_config_metadata { }.
Referring to the spec, pre-skip shall be the same as the number of audio samples to be trimmed at the start of the substreams.
Issue 2: relative_offset of OBU_DATA_TYPE_PARAMETER is 0.
Referring to AOMediaCodec/iamf#331, in my opinion the relative_offset of the parameter should be the same as the trimming-start value.
Issue 3: there is no edit list box in the mp4 file.
Referring to the spec: take the number of trimmed samples from the IA sequence and convert it to the sample rate used by the elst boxes.
Issue 4: after decoding, the wave quality is poor compared with the original wav file.
I think this could be resolved by raising the bitrate setting; I guess a low bitrate was used.
Individual issues:
Issue 5: in test_000015, the parameter_id in parameter_block_metadata{} is 100 and 101, but in element_mix_config{} and element_mix_config{} both parameter IDs are 100, so they do not match.
Thanks.
In the tests that use the Opus codec, the number of samples in test_0000**_decoded_substream_*.wav is short by the pre-skip amount.
For example, the decoded substream wav file of test_000020 has 23688 (= 24000 - 312) samples instead of 24000.
Please check test_000020-28, 32-35, and 49-56.
Thanks.
I tried understanding what the tests were about. I opened this test proto and looked at the human_readable_description: https://github.com/AOMediaCodec/libiamf/blob/main/tests/test_000056.textproto#L5.
It says "A variant of test_000055". test_000055 pointed me to test_000054, which pointed me to test_000050 ... and recursively I ended up at test_000004, which does not exist.
Generally, the tests should be self-explanatory. Having a list of keywords for the features they exercise might help.
mdhd.duration shall be equal to (summation of stts.sample_deltas - elst.media_time).
The current mdhd.duration of test_000056_s.mp4 is 240034.
The correct mdhd.duration is 240034 - 312 = 239,722.
The number of output audio samples of test_000056_s.mp4 is also 239,722 (= 240034 - 312).
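The relation above can be checked mechanically. Here is a minimal Python sketch, assuming the stts box is available as (sample_count, sample_delta) pairs and that elst.media_time is expressed in the same timescale. The function name is hypothetical and not part of libiamf. The real stts layout of test_000056_s.mp4 is not given in this report, so the single entry in the usage line is only a stand-in whose sum matches the reported 240034.

```python
def expected_mdhd_duration(stts_entries, elst_media_time):
    """Return sum(stts sample deltas) - elst.media_time.

    stts_entries: iterable of (sample_count, sample_delta) pairs.
    elst_media_time: number of leading media ticks skipped by the edit list.
    """
    total = sum(count * delta for count, delta in stts_entries)
    if elst_media_time < 0 or elst_media_time > total:
        raise ValueError("elst_media_time must lie within the media duration")
    return total - elst_media_time


# Stand-in stts layout (assumption): only the sum of 240034 comes from the report.
print(expected_mdhd_duration([(1, 240034)], 312))
```

With the numbers from the report this yields 239722, which is the value proposed for mdhd.duration.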
Currently, the IAMF specification assumes that PTS and DTS are the same, as in other ordinary audio codecs.
According to https://aomediacodec.github.io/iamf/#isobmff-singletrack-basicencapsulationscheme, the pre-skip value is stored in elst.media_time, not in ctts (which currently carries the pre-skip value).
Although ctts is useful for ordinary video codecs, where PTS and DTS generally differ,
in an IAMF bitstream the ctts box in the stbl box is not helpful.
I think the ctts box could mislead our IAMF decoder,
so it should be omitted.
The last frame OBU has only 64 samples, not 128.
test_000000_3.textproto:
codec_config_metadata {
  codec_config_id: 200
  codec_config {
    codec_id: 0x6970636d # "ipcm"
    num_samples_per_frame: 128
    roll_distance: 0
    decoder_config_lpcm {
      sample_format_flags: LPCM_LITTLE_ENDIAN
      sample_size: 16
      sample_rate: 16000
    }
  }
}
The input bitstream for test_000051 to 000056 seems to be part of the wav file audiolab-acoustic-guitar_2OA_470_ALLRAD_5s.wav.
Is it correct that the input bitstream has 239,722 samples while the wav file has 240,000 samples?
If so, could you describe which part of the wav file matches the input bitstream in the textproto?
Alternatively, you could provide the input bitstream as a separate wav file.
Hi,
I think that in the Opus coding cases the sample files contain strange signals.
Even after we trim as many samples as the pre-skip, there is additional strange data of pre-skip size.
If trim_at_start is not applied in the decoder, the strange signal is twice the size of the pre-skip.
Therefore, it seems that the last part of the input signal is not included.
(test_000020~000028 etc., the cases using Opus coding)
For example:
Input sample: 24000
num_samples_per_frame: 128
What I expected:
312(pre-skip) | 24000(input sample) | 8(padding for num_samples_per_frame)
What I guess:
312(??) + 312(pre-skip) | 23696(shorter than input)
Please check and correct it.
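The expected layout above follows from simple frame arithmetic. A minimal Python sketch, assuming the encoder prepends pre_skip samples and pads the tail up to a whole number of frames; the function name is hypothetical and this is not how libiamf or libopus is known to compute it.

```python
def expected_frame_layout(input_samples, pre_skip, num_samples_per_frame):
    """Return (num_frames, padding) for pre_skip + input, padded to whole frames."""
    if num_samples_per_frame <= 0:
        raise ValueError("num_samples_per_frame must be positive")
    covered = pre_skip + input_samples
    # Ceiling division without floating point.
    num_frames = -(-covered // num_samples_per_frame)
    padding = num_frames * num_samples_per_frame - covered
    return num_frames, padding


# Values from the example above: 24000 input samples, pre-skip 312, 128 per frame.
print(expected_frame_layout(24000, 312, 128))
```

For these values the sketch gives 190 frames with 8 samples of padding, matching the "What I expected" line.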
Hi,
In the case of test_000017, samples of more than 1 frame are trimmed.
I think it's an invalid case (AOMediaCodec/iamf#395).
Please check and correct it.
In test_000036, samples_to_trim_at_end is 149, but the decoded wave is not trimmed.
Is it correct that the decoded wave has the same number of samples as the input wave?
I have found some issues with the updated mp4 files (#19).
Segment duration | Media time | Media rate integer | Media rate fraction
20 | 4294967295 | 1 | 0
1481 | 312 | 1 | 0
In my opinion, one entry with a media time of 312 would be enough. I am not sure what the first entry is for.
I think the sample delta of the first entry should be 960, not 648 (just check one of the TCs, test_000026).
Even though the trimming start is 312, libopus encodes 960 samples in that frame.
The total duration after decoding is not the same as that of the original wav file (again, just check one of the TCs, test_000026).
In my opinion, there are 312 pending samples that libopus has not output.
If the encoder outputs the pending samples, the sample delta of the last frame is 312 and the trimming end is 648.
Could you tell us how you got the decoded wav files in PR #19?
Can the decoded wav files be used as reference waves for test verification?
The duration of parameter_block_metadata should cover the pre-skip and the padding data of the last frame as well as the input wave samples (PR #433).
However, the duration in the textproto of the test vectors seems to cover only the pre-skip and the input wave samples.
In the case of test_000049, the duration of parameter_block_metadata should be as follows:
The number of samples in the input: 644971
pre_skip: 312
num_samples_per_frame: 2880
The number of encoded samples:
312(pre_skip) + 644971 + 2717(padding for the last frame) = 648000 = 2880 x 225
Therefore, Parameter block duration = 648000
However, duration in textproto is 312(pre_skip) + 644971 = 645283
test_000002 and test_000036 (other tests including padding for the last frame) also have the same problem.
Please check it. Thanks.
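To make the comparison repeatable across test vectors, here is a small Python sketch of the calculation above, under the reading that the parameter block duration must cover every encoded sample, padding included. The function names are hypothetical.

```python
def expected_parameter_block_duration(input_samples, pre_skip, num_samples_per_frame):
    """Total encoded samples: pre-skip + input, rounded up to whole frames."""
    covered = pre_skip + input_samples
    num_frames = -(-covered // num_samples_per_frame)  # ceiling division
    return num_frames * num_samples_per_frame


def duration_shortfall(declared_duration, input_samples, pre_skip, num_samples_per_frame):
    """How many samples the declared duration falls short of the expected one."""
    expected = expected_parameter_block_duration(
        input_samples, pre_skip, num_samples_per_frame)
    return expected - declared_duration


# test_000049 numbers from the report.
print(expected_parameter_block_duration(644971, 312, 2880))
print(duration_shortfall(645283, 644971, 312, 2880))
```

A non-zero shortfall (2717 here) is exactly the last-frame padding that the textproto duration does not cover.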
In the case of test_000041, there is only a textproto file and no iamf or mp4 file.
Also, file_name_prefix in test_vector_metadata is set to test_000005, so please check and correct it.
Thanks.
1. All parameter types should have a unique OBU ID; as far as I know, this was discussed at the last weekly meeting.
2. The duration of the parameter may need to refer to AOMediaCodec/iamf#366.
In my opinion, setting 'obu_redundant_copy' to 1 should not be allowed during mp4 encapsulation; otherwise we cannot do seeking.
3. test 12
In the STTS box of test_000012_s.mp4, the last audio packet has 62 samples and the trimming end is 2.
So after decoding, the total number of samples will be 124*64 + 62 = 7998, which does not match the number of samples in the original file (sawtooth_100_stereo.wav).
Thanks.
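The total in item 3 is just the sum over the stts run-length entries. A minimal Python sketch, assuming the box has already been parsed into (sample_count, sample_delta) pairs; the helper name is hypothetical.

```python
def total_samples_from_stts(stts_entries):
    """Sum sample_count * sample_delta over all stts entries."""
    return sum(count * delta for count, delta in stts_entries)


# test_000012_s.mp4 as described above: 124 packets of 64 samples, then one of 62.
print(total_samples_from_stts([(124, 64), (1, 62)]))
```

The result, 7998, is the figure to compare against the sample count of the original wav file.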
I remember that you used all obu_id:0 before changing to 100, 200, etc.
It seems that the modification is not reflected.
If you don't have any other intention, please fix it so that it's the same as other cases.
Hello ~
Issue 1: there are no demix mode parameters or recon gain parameters in the bitstream; it seems that the encoder doesn't downmix channels following https://aomediacodec.github.io/iamf/#iamfgeneration-scalablechannelaudio
Issue 2: the L/R and Ls/Rs channels are swapped.
For example, test_000049.textproto:
audio_frame_metadata {
  ...
  substream_id_ordering: [0, 1, 2, 3] # L/R, Ls/Rs, C, LFE
  ...
}
The channel order should be Ls/Rs, L/R, C, LFE.
Please refer to figure 21 in https://aomediacodec.github.io/iamf/#iamfgeneration-scalablechannelaudio-channelgroupgenerationrule
Hi
There are no reference files for test_000030 (sawtooth_10000_stereo_44100hz_s16le.wav) and test_000031 (sawtooth_10000_stereo_48khz_s24le.wav).
Please upload these files.
The mdhd timescale for some tests with a sampling rate of 44.1 kHz or 48 kHz is 16000.
Could you please check tests test_000029, 30, 50~56, etc.?
Currently, trun.sample_composition_time exists in test_000056_f.mp4.
The elst.media_time in test_000056_f.mp4 holds the pre-skip value, which is 312.
I think that trun.sample_composition_time may mislead our IAMF decoder about the decoding timing model (PTS == DTS).
So trun.sample_composition_time should be removed.
In the test_000048 case,
16000 is used as the parameter_rate value of the mix_gain parameter.
However, the duration value of parameter_block_metadata seems to be set based on 48000
(because the input file length is 24000 samples and the duration is 23688).
If this is intentional, please let us know clearly.
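To illustrate the mismatch, here is a minimal Python sketch that converts a length in audio samples into ticks of parameter_rate. It assumes the audio sample rate of this test is 48000, which the report only infers, and the function name is hypothetical.

```python
from fractions import Fraction


def duration_in_parameter_ticks(num_samples, sample_rate, parameter_rate):
    """Convert a sample count at sample_rate into ticks at parameter_rate.

    Raises ValueError when the result is not a whole number of ticks.
    """
    ticks = Fraction(num_samples * parameter_rate, sample_rate)
    if ticks.denominator != 1:
        raise ValueError("duration is not a whole number of parameter ticks")
    return int(ticks)


# 23688 samples at an assumed 48000 Hz, expressed at parameter_rate 16000.
print(duration_in_parameter_ticks(23688, 48000, 16000))
```

Under that assumption a duration of 23688 samples corresponds to 7896 ticks at a parameter_rate of 16000, so a textproto value of 23688 would only be consistent with a rate of 48000.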
I also have another, personal question: do you use the textproto file as an input option, extract the values from the encoder, or write it manually (maybe not)?
I wonder how you are using protocol_buffers.
Thanks :)
It would be good to generate an HTML report listing all the tests (filename, human description, is valid or not...) maybe in the form of a table. That report could be generated automatically whenever a PR is merged and published on GitHub pages. That would offer a synthesis of what is tested.
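As a rough idea of what such a report generator could look like, here is a minimal Python sketch. It assumes the test metadata has already been collected into dictionaries with the hypothetical keys filename, description, and is_valid; how those values are read from the textproto files is left out.

```python
import html


def render_test_report(tests):
    """Render an HTML table with one row per test.

    tests: iterable of dicts with keys 'filename', 'description', 'is_valid'.
    """
    lines = ["<table>",
             "<tr><th>File</th><th>Description</th><th>Valid</th></tr>"]
    for test in tests:
        lines.append("<tr><td>{}</td><td>{}</td><td>{}</td></tr>".format(
            html.escape(test["filename"]),
            html.escape(test["description"]),
            "yes" if test["is_valid"] else "no"))
    lines.append("</table>")
    return "\n".join(lines)


print(render_test_report([
    {"filename": "test_000056.textproto",
     "description": "A variant of test_000055",
     "is_valid": True},
]))
```

Escaping the description keeps free-form text from breaking the table; publishing the output on GitHub Pages would be a separate CI step.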
This may not be allowed depending on how we address AOMediaCodec/iamf#377
In previous meetings we discussed that it is possible to test bit-exactness of the decoded WAV (before rendering).
James Zern also raised this question: https://groups.aomedia.org/g/WG-Storage-and-Transport/message/567
If the samples become much larger, we could split off the large sample files into their own repository so that users of libiamf don't need to fetch them.
Hello, I'm Yongmin.
test_000022 is an example of an incorrect roll_distance, with a roll_distance value of -5 when num_samples_per_frame is 960.
According to Section 4.6 (https://www.rfc-editor.org/rfc/rfc7845#page-11), decoding should start at least 3840 samples before the target, and this case seems to satisfy that condition (960 * 5 = 4800 samples).
Please check this issue.
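The arithmetic behind this question can be sketched in Python as below. The 3840-sample figure is the 80 ms pre-roll from RFC 7845 Section 4.6; the function names are hypothetical, and the sketch only checks whether the pre-roll is covered, not whether the IAMF spec requires the minimal value.

```python
OPUS_PREROLL_SAMPLES = 3840  # 80 ms at 48 kHz, RFC 7845 Section 4.6


def preroll_frames_needed(num_samples_per_frame):
    """Smallest number of frames whose total length reaches the pre-roll."""
    return -(-OPUS_PREROLL_SAMPLES // num_samples_per_frame)  # ceiling division


def roll_distance_covers_preroll(roll_distance, num_samples_per_frame):
    """True if |roll_distance| frames span at least the pre-roll length."""
    return -roll_distance * num_samples_per_frame >= OPUS_PREROLL_SAMPLES


print(preroll_frames_needed(960))
print(roll_distance_covers_preroll(-5, 960))
```

For 960-sample frames the minimum is 4 frames, so -5 covers the pre-roll (4800 samples) but is larger in magnitude than the minimum. Whether that makes it "incorrect" is the open question here.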
The test_000231_decoded_substream_0.wav file for this test vector is missing.
We can compare the size of the standalone iamf streams to the component bitstreams (or their bitrates) to compute the overall metadata overhead.
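A minimal Python sketch of that comparison, assuming the sizes of the standalone IAMF file and of each component bitstream are already known in bytes; the function name and the example sizes are hypothetical.

```python
def metadata_overhead(iamf_size_bytes, component_sizes_bytes):
    """Return (overhead_bytes, overhead_fraction_of_iamf_file)."""
    if iamf_size_bytes <= 0:
        raise ValueError("iamf_size_bytes must be positive")
    payload = sum(component_sizes_bytes)
    overhead = iamf_size_bytes - payload
    return overhead, overhead / iamf_size_bytes


# Made-up sizes, for illustration only.
overhead_bytes, overhead_fraction = metadata_overhead(1000, [400, 500])
print(overhead_bytes, overhead_fraction)
```

The same function works on bitrates instead of sizes, as long as both inputs use the same unit.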