Limiter applied? (libmpegh, closed)

sclsj commented on August 23, 2024
Limiter applied?

from libmpegh.

Comments (18)

SakethSathuvalli commented on August 23, 2024

Hi @sclsj,

The decoded output depends on the configuration enabled at the encoder end. The decoder as such does not do any post-processing apart from what the specification prescribes. So if you are observing that the first two channels are heavily limited, then in all probability that is how the stream was encoded (as intended by the creator of the stream).

Please close the issue if this answers your question!

Thanks!

sclsj commented on August 23, 2024

Thank you!

The reason I'm asking is that it appears that if I select 2ch rather than 7.1ch as output, the first two channels are more limited/compressed, although I could be wrong about that.

I will close this issue when I'm able to extract the individual objects to see if they are compressed as well.

sclsj commented on August 23, 2024

Hello,

Please see below concrete evidence that a limiter is applied. Please let me know which options I can pass to lower the gain and avoid hitting the limiter.

-cicp:2:

[Screenshot 2022-12-26 15:19:07]

-cicp:13 (only first two channels):

[Screenshot 2022-12-26 15:19:24]

As you can see in the latter example, without the audio from the other channels the signal does not reach as high a level as it does in the 2-channel rendering, so it does not hit the limiter as hard.
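A minimal sketch of the level arithmetic behind this observation, assuming a simple fold-down in which each stereo output channel is a weighted sum of bed channels. The -3 dB gains and the channel grouping below are hypothetical, not the coefficients libmpegh's renderer uses, and an identical in-phase tone in every channel is a worst case; real program material adds up less coherently.

```python
import math

# Hypothetical fold-down gains for the LEFT output of a 7.1.4 bed: front left at
# unity, centre/surround/height channels at -3 dB. Illustrative only; these are
# NOT the coefficients used by libmpegh's renderer.
MINUS_3DB = 10 ** (-3 / 20)
LEFT_FOLD = {"L": 1.0, "C": MINUS_3DB, "Lss": MINUS_3DB, "Lb": MINUS_3DB,
             "Tfl": MINUS_3DB, "Tbl": MINUS_3DB}


def peak_dbfs(samples):
    """Sample peak relative to full scale (1.0 = 0 dBFS)."""
    peak = max(abs(s) for s in samples)
    return 20 * math.log10(peak) if peak > 0 else float("-inf")


def fold_down(bed, gains):
    """Weighted sum of the named bed channels into one output channel."""
    n = len(next(iter(bed.values())))
    return [sum(g * bed[name][i] for name, g in gains.items() if name in bed)
            for i in range(n)]


# Worst case: every left-side channel carries the same 1 kHz tone at -6 dBFS.
rate = 48000
tone = [0.5 * math.sin(2 * math.pi * 1000 * n / rate) for n in range(480)]
bed = {name: tone for name in LEFT_FOLD}
left = fold_down(bed, LEFT_FOLD)
print(f"L alone: {peak_dbfs(bed['L']):.1f} dBFS, "
      f"folded-down left: {peak_dbfs(left):.1f} dBFS")
```

Each channel on its own stays well below full scale, but their sum lands roughly 7 dB above it, which is the kind of overshoot an end-of-chain limiter would then have to pull back.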

SakethSathuvalli commented on August 23, 2024

Hi,

If the speaker layout of the bitstream differs from the speaker layout requested on the command line with the -cicp: option, the decoder converts the output to the requested layout using the rendering algorithm suitable for the file.

From the pictures you shared, it looks like the rendering result gives the impression that the signal is compressed. As mentioned earlier, the decoder does not apply any limiter beyond the end-of-chain processing described in the specification.
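To make "limiting" concrete, here is a generic feed-forward peak limiter with instant attack and exponential release, sketched in Python. It is a hypothetical illustration of the technique, not the end-of-chain limiter the specification describes; the -1 dBFS threshold and 50 ms release are assumed values.

```python
import math


def peak_limit(samples, threshold_db=-1.0, release_ms=50.0, rate=48000):
    """Generic feed-forward peak limiter: instant attack, exponential release.

    Hypothetical illustration, not the MPEG-H end-of-chain limiter.
    Returns the limited signal and the gain applied to each sample.
    """
    threshold = 10 ** (threshold_db / 20)
    release = math.exp(-1.0 / (release_ms * 1e-3 * rate))  # per-sample smoothing
    gain, out, gains = 1.0, [], []
    for x in samples:
        # Largest gain that keeps this sample at or below the threshold.
        target = min(1.0, threshold / abs(x)) if x else 1.0
        if target < gain:
            gain = target  # attack: clamp immediately
        else:
            gain = target - (target - gain) * release  # release: recover gradually
        out.append(x * gain)
        gains.append(gain)
    return out, gains


# A tone peaking about 7 dB over full scale, as a summed downmix might.
rate = 48000
loud = [2.27 * math.sin(2 * math.pi * 1000 * n / rate) for n in range(4800)]
limited, gains = peak_limit(loud, rate=rate)
print(f"peak out: {max(abs(s) for s in limited):.3f}, "
      f"max gain reduction: {-20 * math.log10(min(gains)):.1f} dB")
```

Because the release is long compared with the period of the tone, the gain barely recovers between peaks, so the limiter behaves like continuous heavy gain reduction, which is what makes a waveform display look squashed.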

Thanks!

sclsj commented on August 23, 2024

That makes sense. However, you did not answer my question: what can I do to prevent this from happening?

SakethSathuvalli commented on August 23, 2024

Can you let us know the number of channels/objects in this stream and the speaker layout of the bitstream?

Also, can you please explain the "Stereo Mix provided by the artist" part? What does this refer to?

sclsj commented on August 23, 2024

Yes. This is for the "sample audio in the last issue".

mediainfo /Users/jin/Desktop/Music/Stem\ Separation/Tidal/YOASOBI\ -\ 360/\ THE\ BOOK\ \(360\ Reality\ Audio\)\ \[203033692\]\ \[2021\]/06\ -\ YOASOBI\ -\ 群青\ \(360\ Reality\ Audio\).mp4
General
Complete name                            : /Users/jin/Desktop/Music/Stem Separation/Tidal/YOASOBI - 360/ THE BOOK (360 Reality Audio) [203033692] [2021]/06 - YOASOBI - 群青 (360 Reality Audio).mp4
Format                                   : MPEG-4
Format profile                           : Base Media / Version 2
Codec ID                                 : mp42 (mp42/isom)
File size                                : 19.8 MiB
Duration                                 : 4 min 8 s
Overall bit rate                         : 670 kb/s
Encoded date                             : UTC 2021-09-18 08:59:38
Tagged date                              : UTC 2021-09-18 08:59:38

Audio
ID                                       : 1
Format                                   : MPEG-H 3D Audio
Format profile                           : LC@L3
Codec ID                                 : mha1
Duration                                 : 4 min 8 s
Source duration                          : 4 min 8 s
Bit rate                                 : 666 kb/s
Channel(s)                               : 12 channels (7.1.4)
Channel layout                           : L R C LFE Lb Rb Lss Rss Tfl Tfr Tbl Tbr
Sampling rate                            : 48.0 kHz
Frame rate                               : 46.875 FPS (1024 SPF)
Stream size                              : 19.7 MiB (99%)
Source stream size                       : 19.7 MiB (100%)
Encoded date                             : UTC 2021-09-18 08:59:38
Tagged date                              : UTC 2021-09-18 08:59:38
Signal group #1                          : 10 objects
 Type                                    : Object
 Number of objects                       : 10 objects
Codec configuration box                  : mhaC

The stereo mix provided by the artist is what the name suggests: the stereo version of the album. On Tidal, it would be the [M] version of an album. It can also refer to the fallback stereo mix of a 360 Reality Audio album on unsupported devices. The two are basically the same in terms of the amount of dynamic range compression/limiting used (all brickwall).

The one I'm using for reference:

mediainfo /Users/jin/Music/iTunes/iTunes\ Media/Music/YOASOBI/群青/01\ 群青.m4a
General
Complete name                            : /Users/jin/Music/iTunes/iTunes Media/Music/YOASOBI/群青/01 群青.m4a
Format                                   : MPEG-4
Format profile                           : Apple audio with iTunes info
Codec ID                                 : M4A  (M4A /isom/iso2)
File size                                : 56.6 MiB
Duration                                 : 4 min 8 s
Overall bit rate mode                    : Variable
Overall bit rate                         : 1 910 kb/s
Album                                    : 群青
Album/Performer                          : YOASOBI
Part/Position                            : 1
Part/Total                               : 1
Track name                               : 群青
Track name/Position                      : 1
Track name/Total                         : 1
Performer                                : YOASOBI
Writing application                      : Lavf58.38.101
Cover                                    : Yes
Comment                                  : Brought to you by OTOTOY.JP http://ototoy.jp/_/default/p/599054

Audio
ID                                       : 2
Format                                   : ALAC
Codec ID                                 : alac
Codec ID/Info                            : Apple Lossless Audio Codec
Duration                                 : 4 min 8 s
Duration_LastFrame                       : -46 ms
Bit rate mode                            : Variable
Bit rate                                 : 1 883 kb/s
Nominal bit rate                         : 2 304 kb/s
Channel(s)                               : 2 channels
Sampling rate                            : 48.0 kHz
Bit depth                                : 24 bits
Stream size                              : 55.8 MiB (99%)
Default                                  : Yes
Alternate group                          : 1

sclsj commented on August 23, 2024

This is the mediainfo output for the example I used in this issue to show that limiting is applied (per the specification, yes, but still applied by the decoder).

mediainfo /Users/jin/orpheusdl/downloads/Staff\ Picks\ -\ \ 360\ Reality\ Audio/05.\ Essence\ \(feat.\ Justin\ Bieber\ \&\ Tems\)\ \(360RA\).m4a 
General
Complete name                            : /Users/jin/orpheusdl/downloads/Staff Picks -  360 Reality Audio/05. Essence (feat. Justin Bieber & Tems) (360RA).m4a
Format                                   : MPEG-4
Format profile                           : Base Media / Version 2
Codec ID                                 : mp42 (mp42/isom)
File size                                : 21.9 MiB
Duration                                 : 4 min 23 s
Overall bit rate                         : 698 kb/s
Album                                    : Essence (feat. Justin Bieber & Tems) (360RA)
Album/Performer                          : Wizkid
Part/Position                            : 1
Part/Total                               : 1
Track name                               : Essence (feat. Justin Bieber & Tems) (360RA)
Track name/Position                      : 5
Track name/Total                         : 24
Performer                                : Wizkid / Justin Bieber / Tems
Composer                                 : Oniko Eddie Uzezi / Ayodeji Ibrahim Balogun / Oniko Evawero Okiemute / Temilade Openiyi / Richard Isong / Justin Bieber
Lyricist                                 : Oniko Eddie Uzezi / Richard Isong / Temilade Openiyi / Ayodeji Ibrahim Balogun / Justin Bieber / Oniko Evawero Okiemute
Producer                                 : Legendury Beatz / P2J
Recorded date                            : 2021-09-03
Encoded date                             : UTC 2021-08-31 17:30:45
Tagged date                              : UTC 2021-08-31 17:30:45
ISRC                                     : USRC12102645
Copyright                                : (P) 2021 Starboy Entertainment Ltd., under exclusive license to RCA Records
Cover                                    : Yes
Rating                                   : Clean
Assistant Engineer                       : Heidi Wang
UPC                                      : 886449551712
Vocal Producer                           : Josh Gudwin
Engineer                                 : Benjamin Rice
Mixing Engineer                          : Leandro "Dro" Hidalgo
Mastering Engineer                       : Colin Leonard / Daniel Avila
Associated Performer                     : Wizkid / Justin Bieber / Tems
Recording Engineer                       : Josh Gudwin / Joseph "Giggz" Njenga

Audio
ID                                       : 1
Format                                   : MPEG-H 3D Audio
Format profile                           : LC@L3
Codec ID                                 : mha1
Duration                                 : 4 min 23 s
Source duration                          : 4 min 23 s
Bit rate                                 : 666 kb/s
Channel(s)                               : 12 channels (7.1.4)
Channel layout                           : L R C LFE Lb Rb Lss Rss Tfl Tfr Tbl Tbr
Sampling rate                            : 48.0 kHz
Frame rate                               : 46.875 FPS (1024 SPF)
Stream size                              : 20.9 MiB (95%)
Source stream size                       : 20.9 MiB (95%)
Encoded date                             : UTC 2021-08-31 17:30:45
Tagged date                              : UTC 2021-08-31 17:30:45
Signal group #1                          : 10 objects
 Type                                    : Object
 Number of objects                       : 10 objects
Codec configuration box                  : mhaC

SakethSathuvalli commented on August 23, 2024

Hi @sclsj,

Can you please let us know whether you see the "limiting effect" when the command-line option -cicp: is not used?

sclsj commented on August 23, 2024

For the "sample audio in the last issue" one, yes. It is still quite obvious (especially in the first two channels).

[Screenshot 2023-01-02 13:48:08]

For the other example, not really. Well, to be honest, it is still obvious in channel 3. That channel does not change regardless of the cicp configuration, which suggests that quite a few objects are concentrated in that direction/speaker (or simply that the vocals are quite loud, i.e. high gain compared to everything else).

[Screenshot 2023-01-02 13:50:33]

sclsj commented on August 23, 2024

And speaking of which, I would really like to see support for outputting to (decoding to) an object-based format such as ADM BWF. It makes more sense to call that decoding, and to call mixing those objects down to a specific channel layout rendering.

SakethSathuvalli commented on August 23, 2024

Do the two pictures here correspond to different audio streams, or to the same stream decoded with different options? Can you please elaborate?

SakethSathuvalli commented on August 23, 2024

We don't currently support ADM BWF. However, it is possible to get the individual decoded objects using the -ext_ren: flag.

sclsj commented on August 23, 2024

Two different ones. First one is 群青, second one is Essence. Sorry for not making that clear.

sclsj commented on August 23, 2024

I saw the -ext_ren: flag in the GSG docx. I tried the -ext_ren:1 flag, and I got _ext_ren_pcm.raw and _ext_ren_oam_md.bs in the executable's folder (this agrees with the command-line description but conflicts with the documentation). I found the section referred to in the documentation, but I still have some doubts:

17.10.6 Audio PCM data
The PCM data of the channels and objects interfaces shall be provided through the decoder PCM buffer, which first contains the regular rendered PCM signals (e.g. 12 signals for a 7.1+4 setup). Subsequently n_chan,out additional signals carry the PCM data of the originally transmitted channel representation. These are followed by n_obj,out signals carrying the PCM data of the un-rendered output objects. Then additional signals carry the n_HOA,out HOA data which number is indicated in the HOA metadata interface via the HOA order (e.g. 16 signals for HOA order 3). The HOA audio data in the HOA output interface is provided in the so-called equivalent spatial domain representation. The conversion from the HOA domain into the equivalent spatial domain representation and vice versa is described in Annex C.5.1.
The decoder shall signal the offset index of the PCM buffer for the first un-rendered output object and the offset index of the PCM buffer for the first HOA audio signal.

Well, that gives us 12 + 12 + 10 = 34 channels. Assuming 16-bit samples at 48000 Hz, that would result in a 747 MB file, but I got a 357 MB file.
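A sketch of the arithmetic in the two paragraphs above, in Python. It orders signals as the quoted 17.10.6 text describes (rendered, then transmitted channels, then objects, then HOA, with (order + 1)² HOA signals) and estimates headerless raw PCM sizes. Whether libmpegh's _ext_ren_pcm.raw actually follows this buffer layout, and which sample width it uses, are assumptions here, not something this thread confirms. Note also that mediainfo reports a single 10-object signal group for these streams, so n_chan,out might turn out to be 0 rather than 12.

```python
from dataclasses import dataclass


@dataclass
class PcmBufferLayout:
    total_signals: int
    first_object_index: int  # offset of the first un-rendered object signal
    first_hoa_index: int     # offset of the first HOA signal


def pcm_buffer_layout(n_rendered, n_chan_out, n_obj_out, hoa_order=None):
    """Signal order per the quoted 17.10.6 text: rendered, channels, objects, HOA."""
    n_hoa = 0 if hoa_order is None else (hoa_order + 1) ** 2
    first_object = n_rendered + n_chan_out
    first_hoa = first_object + n_obj_out
    return PcmBufferLayout(first_hoa + n_hoa, first_object, first_hoa)


def raw_pcm_bytes(n_signals, bytes_per_sample, rate, seconds):
    """Size in bytes of headerless, interleaved PCM."""
    return n_signals * bytes_per_sample * rate * seconds


def matching_formats(file_bytes, rate, seconds, max_signals=40, tolerance=0.01):
    """(signal count, bytes per sample) pairs whose raw size is within tolerance."""
    hits = []
    for width in (2, 3, 4):  # 16-, 24- and 32-bit samples
        for n in range(1, max_signals + 1):
            expected = raw_pcm_bytes(n, width, rate, seconds)
            if abs(expected - file_bytes) <= tolerance * file_bytes:
                hits.append((n, width))
    return hits


layout = pcm_buffer_layout(n_rendered=12, n_chan_out=12, n_obj_out=10)
print(layout)
print(matching_formats(357_000_000, 48000, 248))
```

For example, if the file is 357 MB (10^6 bytes) and the decoded track is 群青 at 4 min 8 s (248 s), the size fits both 15 signals at 16 bits and 10 signals at 24 bits, so file size alone cannot tell a wrong channel count apart from a wrong sample width.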

When I try to convert it, I also get mostly garbage channels (channels containing random noise). I'm not sure what I'm doing wrong here. I'm using: ffmpeg -f s16le -ar 48k -ac 15 -i /Users/jin/Desktop/libmpegh/_ext_ren_pcm.raw /Users/jin/Desktop/libmpegh/_ext_ren_pcm.wav. The channel count of 15 comes from a rough estimate based on the file size. I also tried other values, ranging from 2 to 35 channels, but either all or most of the channels are noise.
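One way to test a guess at the channel count and sample width before converting, sketched in Python with only the standard library: correctly framed audio at 48 kHz is strongly correlated from one sample to the next, while bytes read with the wrong sample width tend to look like noise. This assumes headerless, interleaved, little-endian signed PCM; the lag-1 correlation check is a rough diagnostic of my own choosing, not a libmpegh feature.

```python
import math


def decode_sample(chunk):
    """Little-endian signed integer PCM sample (any width) -> float in [-1, 1)."""
    value = int.from_bytes(chunk, "little", signed=True)
    return value / float(1 << (8 * len(chunk) - 1))


def lag1_correlation(raw, n_channels, bytes_per_sample, max_frames=48000):
    """Per-channel normalised lag-1 autocorrelation of interleaved PCM.

    Real audio at 48 kHz changes little between neighbouring samples, so this
    is typically close to 1; misframed bytes look like noise and score near 0.
    """
    frame_size = n_channels * bytes_per_sample
    n_frames = min(len(raw) // frame_size, max_frames)
    energy = [0.0] * n_channels
    cross = [0.0] * n_channels
    prev = [0.0] * n_channels
    for f in range(n_frames):
        base = f * frame_size
        for ch in range(n_channels):
            start = base + ch * bytes_per_sample
            s = decode_sample(raw[start:start + bytes_per_sample])
            energy[ch] += s * s
            if f > 0:
                cross[ch] += s * prev[ch]
            prev[ch] = s
    return [cross[ch] / energy[ch] if energy[ch] > 0 else 0.0
            for ch in range(n_channels)]
```

For example, read the first few megabytes of _ext_ren_pcm.raw with `open(path, "rb").read(...)` and compare `lag1_correlation(raw, 15, 2)` with `lag1_correlation(raw, 10, 3)`. A wrong channel count at the correct width usually still sounds like scrambled audio rather than noise, so noise points more toward a wrong sample width or byte offset; if the 24-bit guess scores well, the matching ffmpeg input format would be s24le.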

Is there a flag I can use to make the tool output a WAV file instead of raw PCM?
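Absent such a flag, one option is to wrap the raw file in a WAV header yourself. A Python sketch using the standard-library wave module, assuming headerless interleaved little-endian signed PCM (2 to 4 bytes per sample) and a channel count and sample width you have already determined. The wave module writes a plain PCM header; some tools prefer WAVE_FORMAT_EXTENSIBLE for more than two channels, so treat this as a convenience, not a reference encoder.

```python
import wave


def raw_to_wav(raw_path, wav_path, n_channels, bytes_per_sample, rate=48000,
               chunk_frames=65536):
    """Copy headerless interleaved little-endian PCM into a WAV container.

    Any trailing bytes that do not form a complete frame are dropped.
    """
    frame_size = n_channels * bytes_per_sample
    with open(raw_path, "rb") as src, wave.open(wav_path, "wb") as dst:
        dst.setnchannels(n_channels)
        dst.setsampwidth(bytes_per_sample)
        dst.setframerate(rate)
        while True:
            # Reads are whole multiples of frame_size, so only the final
            # block can end in a partial frame.
            block = src.read(frame_size * chunk_frames)
            if not block:
                break
            usable = len(block) - len(block) % frame_size
            dst.writeframes(block[:usable])
```

For example, `raw_to_wav("_ext_ren_pcm.raw", "_ext_ren_pcm.wav", 10, 3)` would produce a 10-channel, 24-bit, 48 kHz WAV under that particular guess.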

Also, if I read the specification right, according to 17.10.3 objects are still processed (DRC, gain, and peak limiter) before they are exported. Can I disable that?

SakethSathuvalli commented on August 23, 2024

Hi @sclsj

Can you please refer to our wiki page on external rendering interfaces?

Thanks!

SakethSathuvalli commented on August 23, 2024

Hi @sclsj,

Can you please close this issue if it is similar to what has been discussed in #19?

Thanks!

sclsj commented on August 23, 2024

Yes, it’s kind of the same thing. I’m having some other related issues but I need to investigate further before posting them.
