Comments (18)
Hi @sclsj,
The decoded output depends on the configurations enabled at the encoder end. The decoder as such does not do any kind of post processing apart from what the specification suggests. So in your case if you are observing that the first two channels are heavily limited then in all probability it is the way the stream is encoded (intended by the creator of the stream).
Please close the issue if this answers your question!
Thanks!
from libmpegh.
Thank you!
The reason I'm asking is that it appears that if I select 2ch rather than 7.1ch as output, the first two channels are more limited/compressed, although I could be wrong about that.
I will close this issue when I'm able to extract the individual objects to see if they are compressed as well.
from libmpegh.
Hello,
Please see below concrete evidence that a limiter is applied. Please let me know what options I can pass to lower the gain and avoid hitting the limiter.
-cicp:2: [image]
-cicp:13 (only first two channels): [image]
As you can see in the latter example, without the audio from the other channels the signal does not reach as high a level as it does in the 2-channel rendering, so it does not hit the limiter as hard.
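The effect described above can be illustrated with a toy mix-down. This is a minimal sketch, not libmpegh's renderer or limiter: `downmix`, `peak`, and the 0.4 amplitude are hypothetical names and values chosen for illustration. It shows only that folding several objects into one output channel raises the peak level, which would give a full-scale limiter more to do on a 2-channel render than on a wider layout.

```python
import math


def downmix(signals, gains):
    """Sum mono signals into one output channel, scaling each by its gain."""
    length = len(signals[0])
    return [sum(g * s[i] for s, g in zip(signals, gains)) for i in range(length)]


def peak(signal):
    """Largest absolute sample value (1.0 is treated as full scale here)."""
    return max(abs(x) for x in signal)


# Three hypothetical objects: identical 100 Hz sines at 48 kHz,
# each well below full scale on its own.
objects = [
    [0.4 * math.sin(2 * math.pi * 100 * n / 48000) for n in range(480)]
    for _ in range(3)
]

single_peak = peak(objects[0])                 # one object on its own speaker
mix_peak = peak(downmix(objects, [1.0] * 3))   # all three folded into one channel

# A limiter only has to act when the peak exceeds full scale.
print(f"single object peak: {single_peak:.2f}")
print(f"mixed channel peak: {mix_peak:.2f}")
```

Real object signals are not identical and in phase, so the increase is normally smaller than in this worst case, but the direction of the effect is the same.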
from libmpegh.
Hi,
If the speaker layout of the bitstream differs from the speaker layout requested on the command line with the -cicp: option, the decoder converts the output to the requested layout using a rendering algorithm suitable for the file.
From the pictures you shared, it looks like the result of rendering gives the impression that the signal is compressed. As mentioned earlier, the decoder does not apply a limiter of its own beyond the end-of-chain processing described in the specification.
Thanks!
from libmpegh.
That made sense. However, you did not answer my question. What can I do to not let this happen?
from libmpegh.
Can you let us know the number of channels / objects in this stream and the speaker layout of the bitstream?
Also, can you please explain the "Stereo Mix provided by the artist" part? What does this refer to?
from libmpegh.
Yes. This is for the "sample audio in the last issue".
mediainfo /Users/jin/Desktop/Music/Stem\ Separation/Tidal/YOASOBI\ -\ 360/\ THE\ BOOK\ \(360\ Reality\ Audio\)\ \[203033692\]\ \[2021\]/06\ -\ YOASOBI\ -\ 群青\ \(360\ Reality\ Audio\).mp4
General
Complete name : /Users/jin/Desktop/Music/Stem Separation/Tidal/YOASOBI - 360/ THE BOOK (360 Reality Audio) [203033692] [2021]/06 - YOASOBI - 群青 (360 Reality Audio).mp4
Format : MPEG-4
Format profile : Base Media / Version 2
Codec ID : mp42 (mp42/isom)
File size : 19.8 MiB
Duration : 4 min 8 s
Overall bit rate : 670 kb/s
Encoded date : UTC 2021-09-18 08:59:38
Tagged date : UTC 2021-09-18 08:59:38
Audio
ID : 1
Format : MPEG-H 3D Audio
Format profile : LC@L3
Codec ID : mha1
Duration : 4 min 8 s
Source duration : 4 min 8 s
Bit rate : 666 kb/s
Channel(s) : 12 channels (7.1.4)
Channel layout : L R C LFE Lb Rb Lss Rss Tfl Tfr Tbl Tbr
Sampling rate : 48.0 kHz
Frame rate : 46.875 FPS (1024 SPF)
Stream size : 19.7 MiB (99%)
Source stream size : 19.7 MiB (100%)
Encoded date : UTC 2021-09-18 08:59:38
Tagged date : UTC 2021-09-18 08:59:38
Signal group #1 : 10 objects
Type : Object
Number of objects : 10 objects
Codec configuration box : mhaC
"Stereo mix provided by the artist" is what the name suggests: the stereo version of the album. On Tidal, it would be the [M] version of an album. It can also refer to the fallback stereo mix for a 360 Reality Audio album on unsupported devices. These are basically the same in terms of the amount of dynamic range compression / limiting used (all brickwall).
The one I'm using for reference:
mediainfo /Users/jin/Music/iTunes/iTunes\ Media/Music/YOASOBI/群青/01\ 群青.m4a
General
Complete name : /Users/jin/Music/iTunes/iTunes Media/Music/YOASOBI/群青/01 群青.m4a
Format : MPEG-4
Format profile : Apple audio with iTunes info
Codec ID : M4A (M4A /isom/iso2)
File size : 56.6 MiB
Duration : 4 min 8 s
Overall bit rate mode : Variable
Overall bit rate : 1 910 kb/s
Album : 群青
Album/Performer : YOASOBI
Part/Position : 1
Part/Total : 1
Track name : 群青
Track name/Position : 1
Track name/Total : 1
Performer : YOASOBI
Writing application : Lavf58.38.101
Cover : Yes
Comment : Brought to you by OTOTOY.JP http://ototoy.jp/_/default/p/599054
Audio
ID : 2
Format : ALAC
Codec ID : alac
Codec ID/Info : Apple Lossless Audio Codec
Duration : 4 min 8 s
Duration_LastFrame : -46 ms
Bit rate mode : Variable
Bit rate : 1 883 kb/s
Nominal bit rate : 2 304 kb/s
Channel(s) : 2 channels
Sampling rate : 48.0 kHz
Bit depth : 24 bits
Stream size : 55.8 MiB (99%)
Default : Yes
Alternate group : 1
from libmpegh.
This is the mediainfo for the example I used in this issue to prove limiting is done (as per specification, yes, but still done by the decoder).
mediainfo /Users/jin/orpheusdl/downloads/Staff\ Picks\ -\ \ 360\ Reality\ Audio/05.\ Essence\ \(feat.\ Justin\ Bieber\ \&\ Tems\)\ \(360RA\).m4a
General
Complete name : /Users/jin/orpheusdl/downloads/Staff Picks - 360 Reality Audio/05. Essence (feat. Justin Bieber & Tems) (360RA).m4a
Format : MPEG-4
Format profile : Base Media / Version 2
Codec ID : mp42 (mp42/isom)
File size : 21.9 MiB
Duration : 4 min 23 s
Overall bit rate : 698 kb/s
Album : Essence (feat. Justin Bieber & Tems) (360RA)
Album/Performer : Wizkid
Part/Position : 1
Part/Total : 1
Track name : Essence (feat. Justin Bieber & Tems) (360RA)
Track name/Position : 5
Track name/Total : 24
Performer : Wizkid / Justin Bieber / Tems
Composer : Oniko Eddie Uzezi / Ayodeji Ibrahim Balogun / Oniko Evawero Okiemute / Temilade Openiyi / Richard Isong / Justin Bieber
Lyricist : Oniko Eddie Uzezi / Richard Isong / Temilade Openiyi / Ayodeji Ibrahim Balogun / Justin Bieber / Oniko Evawero Okiemute
Producer : Legendury Beatz / P2J
Recorded date : 2021-09-03
Encoded date : UTC 2021-08-31 17:30:45
Tagged date : UTC 2021-08-31 17:30:45
ISRC : USRC12102645
Copyright : (P) 2021 Starboy Entertainment Ltd., under exclusive license to RCA Records
Cover : Yes
Rating : Clean
Assistant Engineer : Heidi Wang
UPC : 886449551712
Vocal Producer : Josh Gudwin
Engineer : Benjamin Rice
Mixing Engineer : Leandro "Dro" Hidalgo
Mastering Engineer : Colin Leonard / Daniel Avila
Associated Performer : Wizkid / Justin Bieber / Tems
Recording Engineer : Josh Gudwin / Joseph "Giggz" Njenga
Audio
ID : 1
Format : MPEG-H 3D Audio
Format profile : LC@L3
Codec ID : mha1
Duration : 4 min 23 s
Source duration : 4 min 23 s
Bit rate : 666 kb/s
Channel(s) : 12 channels (7.1.4)
Channel layout : L R C LFE Lb Rb Lss Rss Tfl Tfr Tbl Tbr
Sampling rate : 48.0 kHz
Frame rate : 46.875 FPS (1024 SPF)
Stream size : 20.9 MiB (95%)
Source stream size : 20.9 MiB (95%)
Encoded date : UTC 2021-08-31 17:30:45
Tagged date : UTC 2021-08-31 17:30:45
Signal group #1 : 10 objects
Type : Object
Number of objects : 10 objects
Codec configuration box : mhaC
from libmpegh.
Hi @sclsj,
Can you please let us know whether you see the "limiting effect" when the command-line option -cicp: is not used?
from libmpegh.
For the "sample audio in the last issue" one: yes. Still quite obvious (especially in the first two channels).
For the other example, not really. (Well, to be honest, it's still obvious in channel 3. That one does not change regardless of the cicp configuration, suggesting that quite a few objects are concentrated at that coordinate/direction/speaker, or just that the vocals are quite loud / high gain compared to everything else.)
from libmpegh.
Speaking of which, I would really like to see support for output in (decoding to) an object-oriented format such as ADM BWF. It makes more sense to call that decoding, and to call mixing those objects down to a specific channel layout rendering.
from libmpegh.
Do the two pictures here correspond to different audio streams, or to the same stream decoded with different options? Can you please elaborate?
from libmpegh.
We don't currently have support for ADM BWF. However, it's possible to get the individual decoded objects using the -ext_ren: flag.
from libmpegh.
Two different ones. First one is 群青, second one is Essence. Sorry for not making that clear.
from libmpegh.
I saw this in the GSG docx. I tried the -ext_ren:1 flag, and I got _ext_ren_pcm.raw and _ext_ren_oam_md.bs in the executable folder (this agrees with the command-line description but conflicts with the documentation). I found the section referred to in the documentation, but I still have some doubts:
17.10.6 Audio PCM data
The PCM data of the channels and objects interfaces shall be provided through the decoder PCM buffer, which first contains the regular rendered PCM signals (e.g. 12 signals for a 7.1+4 setup). Subsequently n_chan,out additional signals carry the PCM data of the originally transmitted channel representation. These are followed by n_obj,out signals carrying the PCM data of the un-rendered output objects. Then additional signals carry the n_HOA,out HOA data which number is indicated in the HOA metadata interface via the HOA order (e.g. 16 signals for HOA order 3). The HOA audio data in the HOA output interface is provided in the so-called equivalent spatial domain representation. The conversion from the HOA domain into the equivalent spatial domain representation and vice versa is described in Annex C.5.1.
The decoder shall signal the offset index of the PCM buffer for the first un-rendered output object and the offset index of the PCM buffer for the first HOA audio signal.
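The buffer ordering in the quoted passage can be written out as simple offset arithmetic. This is a minimal sketch based only on that passage: `pcm_buffer_offsets` is a hypothetical helper, not part of libmpegh, and the example counts (12 rendered, 12 transmitted channels, 10 objects, no HOA) are assumptions taken from the mediainfo output in this thread. In particular, whether an object-only stream really carries 12 transmitted channel signals is not established here.

```python
def pcm_buffer_offsets(n_rendered, n_chan_out, n_obj_out, n_hoa_out):
    """Return the index of the first signal in each section of the PCM buffer.

    Section order follows the quoted text: rendered signals, transmitted
    channels, un-rendered objects, then HOA signals.
    """
    channels_start = n_rendered
    objects_start = channels_start + n_chan_out
    hoa_start = objects_start + n_obj_out
    return {
        "rendered": 0,
        "channels": channels_start,
        "objects": objects_start,
        "hoa": hoa_start,
        "total": hoa_start + n_hoa_out,
    }


# Assumed counts for illustration only.
with_channels = pcm_buffer_offsets(12, 12, 10, 0)
objects_only = pcm_buffer_offsets(12, 0, 10, 0)
print(with_channels)
print(objects_only)
```

If the transmitted channel count is 0 for an object-only stream, the total drops from 34 to 22 signals, which changes any size estimate built on it.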
Well, that gives us 12 + 12 + 10 = 34 channels. Assuming 16-bit samples at 48000 Hz, that would result in a 747 MB file, but I got a 357 MB file.
When I try to decode it, I also get (mostly) garbage channels (channels with random noise). Not sure what I'm doing wrong here. I'm using:
ffmpeg -f s16le -ar 48k -ac 15 -i /Users/jin/Desktop/libmpegh/_ext_ren_pcm.raw /Users/jin/Desktop/libmpegh/_ext_ren_pcm.wav
The 15-channel count comes from a rough estimate based on file size. I also tried other counts, ranging from 2 to 35 channels, but either all or most of the channels are noise.
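The file-size estimate can be made explicit. The sketch below is only arithmetic and says nothing about what the tool actually writes. It assumes that the 357 MB figure means 357,000,000 bytes and that the file came from the 4 min 8 s track (248 s); neither is stated in the thread. `expected_raw_bytes` and `layout_candidates` are hypothetical helper names.

```python
def expected_raw_bytes(channels, bits, sample_rate, duration_s):
    """Size of headerless PCM for the given layout and duration."""
    return channels * (bits // 8) * sample_rate * duration_s


def layout_candidates(file_bytes, duration_s, sample_rate,
                      max_channels=40, tolerance=0.03):
    """List (channels, bits) pairs whose frame size matches the file size.

    A frame is one sample for every channel; the match is accepted when it is
    within `tolerance` (relative) of the measured bytes per frame.
    """
    bytes_per_frame = file_bytes / (duration_s * sample_rate)
    matches = []
    for bits in (16, 24, 32):
        for channels in range(1, max_channels + 1):
            frame = channels * bits // 8
            if abs(frame - bytes_per_frame) <= tolerance * bytes_per_frame:
                matches.append((channels, bits))
    return bytes_per_frame, matches


# Assumed inputs: 357,000,000 bytes, 248 s, 48 kHz.
bytes_per_frame, matches = layout_candidates(357_000_000, 248, 48_000)
print(f"bytes per frame: {bytes_per_frame:.2f}")
print("candidate (channels, bits):", matches)
print("34 ch, 16-bit, 248 s:", expected_raw_bytes(34, 16, 48_000, 248), "bytes")
```

Under these assumptions, two layouts fit a frame of about 30 bytes: 15 channels at 16 bits and 10 channels at 24 bits. The second would match the 10 objects reported by mediainfo, but nothing in this thread confirms the sample format, so treat it as a hypothesis to check against the project's documentation.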
Is there a flag I can use for the tool to output a wav instead of a raw pcm?
Also, if I read the specification right, according to 17.10.3 objects are still processed (DRC, gain, and peak limiter) before they are exported. Can I disable that?
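On the WAV question, one workaround that does not depend on any decoder flag is to wrap the raw file in a WAV header afterwards. This is a minimal sketch using Python's standard `wave` module. `raw_to_wav` is a hypothetical helper, and the channel count, sample width, and little-endian byte order are assumptions that have to come from the project's documentation. If they are wrong, the result will still sound like noise.

```python
import wave


def raw_to_wav(raw_path, wav_path, channels, sample_width_bytes, sample_rate=48000):
    """Wrap headerless little-endian PCM in a WAV container.

    Samples are copied unchanged; a trailing partial frame is dropped.
    Returns the number of frames written.
    """
    frame_size = channels * sample_width_bytes
    frames = 0
    with open(raw_path, "rb") as src, wave.open(wav_path, "wb") as dst:
        dst.setnchannels(channels)
        dst.setsampwidth(sample_width_bytes)
        dst.setframerate(sample_rate)
        while True:
            # Read whole frames in chunks so large files are not held in memory.
            chunk = src.read(frame_size * 65536)
            usable = len(chunk) - len(chunk) % frame_size
            if usable == 0:
                break
            dst.writeframes(chunk[:usable])
            frames += usable // frame_size
    return frames
```

A call would look like `raw_to_wav("_ext_ren_pcm.raw", "_ext_ren_pcm.wav", channels=N, sample_width_bytes=W)`, with `N` and `W` filled in once the actual layout is known.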
from libmpegh.
Hi @sclsj
Can you please refer to our wiki page on external rendering interfaces?
Thanks!
from libmpegh.
Hi @sclsj,
Can you please close this issue if it is similar to what is being discussed in #19?
Thanks!
from libmpegh.
Yes, it’s kind of the same thing. I’m having some other related issues but I need to investigate further before posting them.
from libmpegh.