aomediacodec / libiamf

Reference Software for IAMF

License: BSD 3-Clause Clear License

Python 1.27% CMake 0.40% Shell 0.02% C 98.32%

libiamf's People

yilun-zhangs jwcullen yimingprom hkchung72 felicialim yongmin-kwon

libiamf's Issues

Reviewing test_000013 - test_000035

Hello~

Common issues (Opus codec):
Issue 1: the preskip value is 0 in codec_config_metadata { }.
Referring to the spec, pre-skip shall be the same as the number of audio samples to be trimmed at the start of the substreams.

Issue 2: the relative_offset of OBU_DATA_TYPE_PARAMETER is 0.
Referring to AOMediaCodec/iamf#331, in my opinion the relative_offset of the parameter should be the same as the trimming-at-start value.

Issue 3: there is no edit list box in the mp4 file.
Referring to the spec, the number of trimmed samples should be taken from the IA sequence and converted to the sample rate used by the elst boxes.

Issue 4: after decoding, the wave quality is poor compared with the original wav file.
I think this could be resolved by improving the bitrate setting; I guess you have set a low bitrate.

Individual issues:
Issue 5: in test_000015, the parameter_id values in parameter_block_metadata{} are 100 and 101, but in element_mix_config{} and element_mix_config{} both parameter IDs are 100, so they do not match.

Thanks.

PR #30 [Cleanup test 000000-000056] the number of samples in decoded substream wave files for tests using opus codec

In the tests using the Opus codec, the number of samples in test_0000**_decoded_substream_*.wav is short by the pre-skip amount.

For example, the decoded substream wav file of test_000020 has 23688 (= 24000 - 312) samples instead of 24000.

Please check test_000020-28, 32-35 and 49-56.
Thanks.

Improve human_readable_description

I tried understanding what the tests were about. I opened this test proto and looked at the human_readable_description: https://github.com/AOMediaCodec/libiamf/blob/main/tests/test_000056.textproto#L5.
It says "A variant of test_000055". test_000055 pointed me to test_000054, which pointed me to test_000050 ... and recursively I ended up at test_000004, which does not exist.

Generally, the tests should be self-explanatory; maybe having a list of keywords for the features they exercise would help.

[Cleanup test 000000-000056][test_000056_s.mp4] mdhd.duration value is wrong.

mdhd.duration shall be equal to (the sum of stts.sample_deltas - elst.media_time).
The current mdhd.duration of test_000056_s.mp4 is 240034.
The correct mdhd.duration is 240034 - 312 (= 239,722).
The number of output audio samples of test_000056_s.mp4 is also 239,722 (= 240034 - 312).
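
The rule proposed above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not code from libiamf: the function name and the (sample_count, sample_delta) pair layout are assumptions, and the stts entries used in the example are hypothetical placeholders chosen only so that they sum to 240034.

```python
def expected_mdhd_duration(stts_entries, elst_media_time):
    """Return sum(sample_count * sample_delta) - elst.media_time.

    stts_entries is an iterable of (sample_count, sample_delta) pairs,
    mirroring the run-length layout of the stts box.
    """
    total_delta = sum(count * delta for count, delta in stts_entries)
    return total_delta - elst_media_time


# Hypothetical stts entries that sum to 240034 (the real table may differ).
stts = [(250, 960), (1, 34)]
print(expected_mdhd_duration(stts, 312))
```

With a media_time of 312 this gives the 239,722 quoted in the issue.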

[Cleanup test 000000-000056][test_000056_s.mp4] ctts box in stbl box is not helpful.

Currently, the IAMF specification assumes that PTS and DTS are the same, as in other ordinary audio codecs.
According to https://aomediacodec.github.io/iamf/#isobmff-singletrack-basicencapsulationscheme, the preskip value is stored in elst.media_time, not in ctts.
Though ctts is useful for ordinary video codecs, in which PTS and DTS are generally different,
in an IAMF bitstream the ctts box in the stbl box is not helpful.
I think that the ctts box makes our IAMF decoder misinterpret the timing model,
so it should be omitted.

[test_000000_3.iamf] the samples of last frame obu.

The last frame OBU has only 64 samples, not 128.
test_000000_3.textproto:
codec_config_metadata {
  codec_config_id: 200
  codec_config {
    codec_id: 0x6970636d # "ipcm"
    num_samples_per_frame: 128
    roll_distance: 0
    decoder_config_lpcm {
      sample_format_flags: LPCM_LITTLE_ENDIAN
      sample_size: 16
      sample_rate: 16000
    }
  }
}

[test_000051 ~ 56] the input bitstream range

The input bitstream for test_000051 to test_000056 seems to be part of the wav file audiolab-acoustic-guitar_2OA_470_ALLRAD_5s.wav.
Is it correct that the input bitstream has 239,722 samples while the wav file has 240,000 samples?
If it is, could you describe in the textproto which part of the wav file matches the input bitstream?
Or you could provide the input bitstream as a separate wav file.

[Cleanup test 000000-000056] abnormal signal when use opus coding

#30

Hi,

I think that, in the cases using Opus coding, the sample files contain strange signals.

Even though we trim as many samples as the pre-skip, there is additional strange data of pre-skip size.

If trim_at_start is not applied in the decoder, there is a strange signal twice the size of the pre-skip.
Therefore, it seems that the tail of the input signal is not included.

(test_000020~000028 etc., the cases using Opus coding)

For example:
Input sample: 24000
num_samples_per_frame: 128

What I expected:
312(pre-skip) | 24000(input sample) | 8(padding for num_samples_per_frame)

What I guess:
312(??) + 312(pre-skip) | 23696(shorter than input)

Please check and correct it.

[Cleanup test 000000-000056][test_000017] trimming at end

#30

Hi,

In the case of test_000017, more than one frame of samples is trimmed.
I think it's an invalid case (AOMediaCodec/iamf#395).

Please check and correct it.

test_000036 decoded wave is not trimmed at the end

In test_000036, samples_to_trim_at_end is 149, but the decoded wave is not trimmed.
Is it correct that the decoded wave has the same number of samples as the input wave?

Mp4 container issues

I have found some issues with the updated mp4 files (#19).

Issue 1, in the edts box, there are 2 entries:

Segment duration | Media time | Media rate integer | Media rate fraction
20 | 4294967295 | 1 | 0
1481 | 312 | 1 | 0

In my opinion, one entry with a media time of 312 will be enough. I am not sure what the first entry is.
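
One possible reading of the first entry, offered as a sketch rather than a statement about how these files were produced: 4294967295 is 0xFFFFFFFF, which is -1 when the version 0 elst media_time field is read as a signed 32-bit integer, and ISO BMFF uses a media_time of -1 to mark an empty edit. The helper name below is hypothetical.

```python
def describe_elst_entry(segment_duration, media_time):
    """Describe one elst entry, treating media_time as a signed 32-bit value."""
    # Reinterpret an unsigned 32-bit dump value as signed (version 0 elst).
    if media_time >= 1 << 31:
        media_time -= 1 << 32
    if media_time == -1:
        # media_time == -1 marks an empty edit in ISO BMFF.
        return f"empty edit lasting {segment_duration} movie-timescale units"
    return f"play {segment_duration} units starting at media time {media_time}"


for entry in [(20, 4294967295), (1481, 312)]:
    print(describe_elst_entry(*entry))
```

Under that reading, the first entry would insert 20 movie-timescale units of empty presentation before the edit that starts at media time 312.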

Issue 2, STTS box.

I think the sample delta of the first entry should be 960, not 648 (I checked just one of the TCs, test_000026).
Even though the trimming at start is 312, libopus encodes 960 samples.

Issue 3, missing last audio frame.

(I checked just one of the TCs, test_000026.) The total duration after decoding is not the same as that of the original wav file.
In my opinion, there are 312 pending samples which are not output from libopus.
If the encoder outputs the pending samples, the sample delta of the last frame is 312, and the trimming at end is 648.

Decoded wav files

Could you tell us how you got the decoded wav files in PR #19?
Can the decoded wav files be used as reference waves for test verification?

PR #30 [Cleanup test 000000-000056] duration of parameter_block_meta seems to be wrong

The duration of parameter_block_metadata should cover the pre-skip and the padding data of the last frame as well as the input wave samples. (PR #433)
However, the duration in the textproto of the test vectors seems to cover only the pre-skip and the input wave samples.

In the case of test_000049, the duration of parameter_block_metadata should be as below:

The number of samples in the input: 644971
pre_skip: 312
num_samples_per_frame: 2880

The number of encoded samples:
312(pre_skip) + 644971 + 2717(padding for the last frame) = 648000 = 2880 x 225

Therefore, Parameter block duration = 648000
However, duration in textproto is 312(pre_skip) + 644971 = 645283

test_000002 and test_000036 (other tests including padding for the last frame) also have the same problem.

Please check it. Thanks.
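
The arithmetic above can be reproduced with a short Python sketch. It assumes, as the issue does, that the encoded length is the pre-skip plus the input samples, rounded up to a whole number of frames; the function name is hypothetical and not part of any IAMF tool.

```python
def padded_duration(num_input_samples, pre_skip, num_samples_per_frame):
    """Return (total_duration, padding, num_frames) for a padded stream."""
    unpadded = pre_skip + num_input_samples
    # Ceiling division: number of whole frames needed to hold all samples.
    num_frames = -(-unpadded // num_samples_per_frame)
    total = num_frames * num_samples_per_frame
    return total, total - unpadded, num_frames


# test_000049 figures quoted in the issue.
print(padded_duration(644971, 312, 2880))
```

The same helper applied to 24000 input samples, a pre-skip of 312 and 128 samples per frame gives 8 samples of padding.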

Missing files(test_000041)

In the case of test_000041, there is only textproto file and no iamf or mp4 file.

Also, file_name_prefix in test_vector_metadata is set to test_000005, so please check and correct it.
Thanks.

Set `recon_gain_is_present_flag` to 0 when omitting Parameter Block OBUs.

Reported in #40 comment.

Existing test vectors have recon_gain_is_present_flag == 1 when there are multiple layers, even if the parameter blocks are omitted. In these cases it should be set to 0 instead.

Test vectors review issues

1, All parameter types should have a unique OBU ID; as far as I know, this was discussed at the last weekly meeting.
2, The duration of the parameter may need to refer to AOMediaCodec/iamf#366.
In my opinion, setting 'obu_redundant_copy' to 1 should not be allowed during mp4 encapsulation, or we cannot do seeking.
3, test 12
In the STTS box of test_000012_s.mp4, the last audio packet has 62 samples, and the trimming at end is 2.
So, after decoding, the total number of samples will be 124*64 + 62 = 7998, which does not match the number of samples in the original file (sawtooth_100_stereo.wav).

Thanks.

[test_000014] obu_id

I remember that you used all obu_id:0 before changing to 100, 200, etc.
It seems that the modification is not reflected.

If you don't have any other intention, please fix it so that it's the same as other cases.

Review test_000049, test_000050, test_000052~test_000056

Hello ~

Issue 1: there are no demix mode parameters or recon gain parameters in the bitstream, so it seems that the encoder doesn't downmix channels following https://aomediacodec.github.io/iamf/#iamfgeneration-scalablechannelaudio

Issue 2: L&R channels switch to SL&SR channels
For example, in test_000049.textproto:
audio_frame_metadata {
  ...
  substream_id_ordering: [0, 1, 2, 3] # L/R, Ls/Rs, C, LFE
  ...
}
The channel order should be Ls/Rs, L/R, C, LFE;
please refer to figure 21 in https://aomediacodec.github.io/iamf/#iamfgeneration-scalablechannelaudio-channelgroupgenerationrule

Reference wav files

There are no reference files for test_000030 (sawtooth_10000_stereo_44100hz_s16le.wav) and test_000031 (sawtooth_10000_stereo_48khz_s24le.wav).

Please upload these files.

mdhd timescale in MP4 container for 48kHz or 44.1kHz tests

The mdhd timescale for some tests with a sampling rate of 44.1 kHz or 48 kHz is 16000.
Could you please check test_000029, 30, 50~56, etc.?

[Cleanup test 000000-000056][test_000056_f.mp4] trun.sample_composition_time in trun box is not helpful.

Currently, trun.sample_composition_time exists in test_000056_f.mp4.
The elst.media_time in test_000056_f.mp4 already holds the preskip value, which is 312.
I think that trun.sample_composition_time may make our IAMF decoder misunderstand the decoding timing model (PTS == DTS).
So, trun.sample_composition_time should be removed.

[test_000048] parameter_rate value seems to be strange.

In the test_000048 case,
16000 is used as the parameter_rate value of the mix_gain parameter.

However, the duration value of parameter_block_metadata seems to be set based on 48000
(because the input file length is 24000 samples and the duration is 23688).
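
To make the mismatch concrete, here is a rough Python sketch of the conversion, assuming that the parameter block duration is counted in ticks of parameter_rate and that the audio runs at 48000 Hz (both taken from the discussion above, not verified against the test vector). The helper name is hypothetical.

```python
from fractions import Fraction


def samples_to_ticks(num_samples, sample_rate, parameter_rate):
    """Convert a sample count into ticks of parameter_rate (exact or error)."""
    ticks = Fraction(num_samples * parameter_rate, sample_rate)
    if ticks.denominator != 1:
        raise ValueError("sample count is not a whole number of ticks")
    return int(ticks)


# 23688 samples at 48 kHz, expressed at two candidate parameter rates.
print(samples_to_ticks(23688, 48000, 48000))
print(samples_to_ticks(23688, 48000, 16000))
```

Under these assumptions, a duration of 23688 matches a parameter_rate of 48000, while a parameter_rate of 16000 would give 7896 ticks for the same time span.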

If this is intentional, please let us know clearly.

And I have another, personal question: do you use the textproto file as an input option, extract the values from the encoder, or write it manually (maybe not)?
I wonder how you are using protocol_buffers.
Thanks :)

Generate list of test report

It would be good to generate an HTML report listing all the tests (filename, human description, is valid or not...) maybe in the form of a table. That report could be generated automatically whenever a PR is merged and published on GitHub pages. That would offer a synthesis of what is tested.
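
As a starting point, the report could be produced by something like the Python sketch below. It is only an outline under stated assumptions: the dictionary keys (filename, description, is_valid) are hypothetical, and extracting them from the textproto files is left out.

```python
import html


def render_report(tests):
    """Render a list of test summaries as a plain HTML table."""
    header = "<tr><th>File</th><th>Description</th><th>Valid</th></tr>"
    rows = []
    for test in tests:
        cells = (
            test["filename"],
            test["description"],
            "yes" if test["is_valid"] else "no",
        )
        # Escape every cell so descriptions cannot break the markup.
        rows.append(
            "<tr>" + "".join(f"<td>{html.escape(c)}</td>" for c in cells) + "</tr>"
        )
    return "<table>\n" + "\n".join([header] + rows) + "\n</table>"


# Hypothetical entry; real values would come from the textproto files.
print(render_report([
    {"filename": "test_000056.textproto",
     "description": "A variant of test_000055",
     "is_valid": True},
]))
```

A CI job could run such a script after each merge and publish the output to GitHub Pages.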

According to Section 4.6 (https://www.rfc-editor.org/rfc/rfc7845#page-11), decoding should start at least 3840 samples before the target sample, and this case seems to satisfy the condition (960 * 5 = 4800 samples).

Please check about this issue.
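
The check mentioned above is simple enough to write down. The Python sketch below assumes the 3840-sample (80 ms at 48 kHz) pre-roll recommended in RFC 7845 Section 4.6 and a frame size of 960 samples, as in the comment; the function name is hypothetical.

```python
OPUS_MIN_PREROLL_SAMPLES = 3840  # 80 ms at 48 kHz (RFC 7845, Section 4.6)


def min_preroll_frames(frame_size, min_preroll=OPUS_MIN_PREROLL_SAMPLES):
    """Smallest number of whole frames covering the recommended pre-roll."""
    return -(-min_preroll // frame_size)  # ceiling division


frames_available = 5
print(min_preroll_frames(960))
print(frames_available * 960 >= OPUS_MIN_PREROLL_SAMPLES)
```

With 960-sample frames, four frames are enough, so the five frames (4800 samples) in this case cover the recommended pre-roll.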

Add 32-bit wav file for test_000231.

The test_000231_decoded_substream_0.wav file for this test vector is missing.

Compute metadata overhead

We can compare the size of the standalone iamf streams to the component bitstreams (or their bitrates) to compute the overall metadata overhead.
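
A minimal Python sketch of that comparison follows. It assumes the component bitstream sizes are already known in bytes; the function name and the example sizes are hypothetical, not measurements from the test vectors.

```python
def metadata_overhead(iamf_size_bytes, component_sizes_bytes):
    """Return (overhead_bytes, overhead_fraction) of an IAMF stream."""
    if iamf_size_bytes <= 0:
        raise ValueError("iamf_size_bytes must be positive")
    payload = sum(component_sizes_bytes)
    overhead = iamf_size_bytes - payload
    return overhead, overhead / iamf_size_bytes


# Hypothetical sizes: a 1000-byte IAMF file carrying two substreams.
print(metadata_overhead(1000, [400, 500]))
```

File sizes could be gathered with os.path.getsize; dividing by the stream duration would turn the same figures into bitrates.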