bbc / vc2-reference
A reference encoder and decoder for SMPTE ST 2042-1 "VC-2 Video Compression"
License: Other
VC-2 Reference Encoder and Decoder
----------------------------------

Copyright (C) Tim Borer, James Weaver and Galen Reich 2010-2020, British Broadcasting Corporation.
< [email protected] >

This repository contains a SMPTE 2042-1 VC-2 reference encoder and decoder. It can be compiled using autotools on Linux or Windows and includes the following executables once compiled:

  o EncodeStream -- an encoder which will encode a VC-2 compliant stream using one of the supported LD or HQ profiles.
  o DecodeStream -- a decoder which will decode a VC-2 compliant stream which complies with the LD or HQ profiles.

The EncodeStream tool supports the following profiles:

  o HQ_CBR -- an encoder for the High Quality (HQ) profile of VC-2 which encodes at a constant bit rate.
  o HQ_ConstQ -- an encoder for the High Quality (HQ) profile of VC-2 which encodes with a constant quantiser value.
  o LD -- an OBSOLETE encoder for the Low Delay (LD) profile of VC-2 (included for backwards compatibility).

In addition, an optional utility (DecodeFrame) is included which takes in the compressed bytes of a VC-2 frame without any surrounding headers. This is not compiled by default but can be enabled with the --enable-frame-decoder flag (./configure --enable-frame-decoder).

The googletest testing framework can be used to run tests on the repository. This requires the googletest submodule to be added by

  git submodule init
  git submodule update

(or using --recurse-submodules when cloning), then using `make check' to build and run the tests.

Additional help on each executable will be printed if it is run with the --help parameter.
The EncodeHQ-ConstQ command requires that its -q (quantisation index) argument is in the range 0 to 63.
If the following assumptions are made:
The maximum sensible quantisation index would be 103. Specifically, for a
fidelity transform with four 2D levels and one horizontal-only level on 16 bit
inputs, it is possible to contrive an input picture where the qindex must be at
least 103 for all transform coefficients to become zero.
The constant quantiser HQ profile encoder (EncodeHQ-ConstQ) produces
non-conforming bitstreams. Specifically, it appears to produce malformed slice
data such that each slice is larger than it should be. The resultant streams,
in some cases, fail to decode under DecodeStream, but whether or not they
decode they are non-conforming.
As test cases, the following minimal inputs are provided:
Mid-Grey frame (1 frame, 10 bit, 1280x720, 4:2:2, progressive, YCbCr)
This mid-grey frame can be coded with all transform coefficients set to zero,
and therefore with all slices of minimum size. When coded by EncodeHQ-ConstQ,
it appears that more zero bytes are written to the stream than expected.
Real pictures (3 frames, 10 bit, 1280x720, 4:2:2, progressive, YCbCr)
A series of real pictures.
The following is used to encode:
$ EncodeHQ-ConstQ \
-x 1280 -y 720 -f 4:2:2 -z 10 -r 6 \
-k LeGall -d 3 -a 2 -u 1 -q 0 -S 2 \
input.16p2 output.vc2
The encoded 'mid-grey' frame is correctly decoded by DecodeStream but is
non-conforming, with the picture_parse data unit being 28822 bytes long, not
36022 as the next_parse_offset field suggested.
The real picture case is similar, though this time the end-of-stream is
encountered before the whole picture has been decoded. This stream also fails to
decode using DecodeStream:
$ DecodeStream -v output.vc2 output.16p2
Failed to read compressed framenumber 0
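A mismatch like the one reported above can be found mechanically by following the next_parse_offset links through a stream. The following is a minimal sketch and not part of the reference codec. It assumes the 13-byte parse info layout of ST 2042-1 (a 4-byte 'BBCD' prefix, a 1-byte parse code, then 4-byte big-endian next and previous parse offsets). The name walk_parse_info is hypothetical.

```python
import struct

PARSE_INFO_PREFIX = b"BBCD"  # 0x42 0x42 0x43 0x44
PARSE_INFO_SIZE = 13         # prefix + parse code + two 32-bit offsets


def walk_parse_info(stream):
    """Yield (offset, parse_code, next_parse_offset, problem) per data unit.

    `problem` is None when the next parse info header sits where
    next_parse_offset says it should, otherwise a short description.
    Assumption: `stream` is a bytes object holding the whole file.
    """
    offset = 0
    while offset + PARSE_INFO_SIZE <= len(stream):
        prefix, code, next_off, _prev_off = struct.unpack_from(
            ">4sBII", stream, offset)
        if prefix != PARSE_INFO_PREFIX:
            yield offset, None, None, "no parse info prefix at this offset"
            return
        if next_off == 0:
            # Zero means the offset is not given (as at end of sequence),
            # so there is no link to follow.
            yield offset, code, next_off, None
            return
        target = offset + next_off
        problem = None
        if stream[target:target + 4] != PARSE_INFO_PREFIX:
            problem = "next_parse_offset does not point at a parse info header"
        yield offset, code, next_off, problem
        if problem:
            return
        offset = target
```

Running this over an encoder's output and printing any entry whose last field is not None gives a quick conformance smoke test for this one field. It does not check anything else about the stream.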
The DecodeHQ and DecodeLD utilities imply that they perform the inverse process to EncodeHQ-* and EncodeLD.
DecodeHQ/LD are intended for decoding VC2 frame data without the complete header information of a VC2 stream.
EncodeHQ-*/LD are used to encode a VC2 stream, complete with header information.
DecodeStream can be used to decode the VC2 streams generated by EncodeHQ-*/LD
I suggest renaming the frame tools to DecodeHQFrame and DecodeLDFrame.
The various Encode* commands fail to specify a clean area appropriate to the
format provided when the base video format does not exactly match that format.
For example, given the following sample frame (1 frame, 8 bit, 176x144, 4:4:4, progressive, YCbCr),
and encoded as follows:
EncodeHQ-CBR \
-x 176 -y 144 -f 4:4:4 -n 1 -l 8 \
-k LeGall -d 1 -a 1 -u 1 -s $((176*144*3*1)) \
qcif_ramps.raw out.vc2
The output bitstream uses base video format 0 (custom format) and fails to
override the clean area from 640x480+0+0, which in this case is larger than the
picture resolution and therefore non-conforming.
For larger input formats, the same problem occurs although here the clean area
may not cover the whole frame which arguably is not a sensible default.
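The conformance condition described in this report, that the clean area must lie inside the picture, can be written as a small check. This is a sketch only. The function name and argument names are hypothetical and it is not taken from the encoder source.

```python
def clean_area_fits(frame_width, frame_height,
                    clean_width, clean_height,
                    left_offset=0, top_offset=0):
    """Return True if the clean area lies wholly inside the frame."""
    return (left_offset >= 0 and top_offset >= 0
            and clean_width + left_offset <= frame_width
            and clean_height + top_offset <= frame_height)


# The case from the report: a 176x144 picture with the custom-format
# default clean area of 640x480+0+0.
print(clean_area_fits(176, 144, 640, 480))  # False
# Overriding the clean area to match the picture satisfies the check.
print(clean_area_fits(176, 144, 176, 144))  # True
```

An encoder could apply a check of this kind after choosing a base video format and emit a clean area override whenever it fails.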
LD mode fails to produce conformant output: the slice_bytes numerator and denominator are both 0.
To reproduce, use the following incantation:
EncodeStream -m LD --width 1280 --height 720 --format 4:2:2 --bitDepth 10 --framerate 6 --kernel Haar1 --waveletDepth 2 --hSlice 4 --vSlice 1 --compressedBytes 2073600 ./static_ramps.16p2 static_ramps_broken.vc2
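For comparison, the slice byte count one would expect for this command can be worked out by hand. The sketch below assumes that hSlice and vSlice are given in units of 2**(wavelet depth), as the verbose output of the encoders puts it, and that the compressed bytes are shared evenly over all slices. The function name ld_slice_bytes is hypothetical.

```python
from fractions import Fraction


def ld_slice_bytes(width, height, wavelet_depth, h_slice, v_slice,
                   compressed_bytes):
    """Return (slices_x, slices_y, slice_bytes) for one picture.

    h_slice and v_slice are in units of 2**wavelet_depth samples
    (an assumption based on the encoders' verbose output).
    """
    unit = 2 ** wavelet_depth
    slices_x = width // (h_slice * unit)
    slices_y = height // (v_slice * unit)
    slice_bytes = Fraction(compressed_bytes, slices_x * slices_y)
    return slices_x, slices_y, slice_bytes


# The command above: 1280x720, depth 2, hSlice 4, vSlice 1, 2073600 bytes.
sx, sy, sb = ld_slice_bytes(1280, 720, 2, 4, 1, 2073600)
print(sx, sy, sb.numerator, sb.denominator)  # 80 180 144 1
```

Under these assumptions the expected values are a numerator of 144 and a denominator of 1, rather than 0 and 0. The same calculation for 1920x1080 at depth 3 with hSlice 2, vSlice 1 and 4147200 bytes gives 256 and 1.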
It would be nice if a user could specify a minimal set of command-line arguments and still get sensible output.
Arguments that specify the input format (width, height, colour format) must remain required.
But currently required arguments such as waveletDepth, hSlice, vSlice, mode and kernel could be selected automatically or given sensible defaults.
Given a slice size which doesn't exactly divide the transform coefficients the
encoder will sometimes crash.
For example with the following sample frame (1 frame, 10 bit, 1280x720, 4:2:2, progressive, YCbCr):
$ EncodeHQ-CBR \
-x 1280 -y 720 -f 4:2:2 -l 10 -r 6 \
-k LeGall -d 3 -a 1 -u 1 -s $((1280*720*2*2/4)) \
pictures.16p2 out2.vc2
EncodeHQ-CBR: ../../src/boost/multi_array/view.hpp:317: boost::detail::multi_array::multi_array_view<T, NumDims>& boost::detail::multi_array::multi_array_view<T, NumDims>::operator=(const ConstMultiArray&) [with ConstMultiArray = boost::multi_array<int, 2>; T = int; long unsigned int NumDims = 2]: Assertion `std::equal(other.shape(),other.shape()+this->num_dimensions(), this->shape())' failed.
Aborted (core dumped)
I have also seen in the past (but have failed to reproduce for this report)
cases where encoding succeeds but produces an invalid output. Specifically, the
encoder will sometimes set the VC-2 level value to something which explicitly
prohibits non-constant slice sizes. In other cases, I've also seen corrupted
outputs produced.
When in HQ_CBR mode, if the compressedBytes parameter is set too large, the encoding fails with:
Error: Bytes: value bigger than specified number of bytes
This can be replicated with:
EncodeStream --mode HQ_CBR --fragmentLength 0 --prefix 0 --scalar 1 --framerate 3 --width 1280 --height 720 --format 4:2:2 --bytes 2 --bitDepth 10 --kernel LeGall --waveletDepth 3 --vSlice 2 --hSlice 2 --compressedBytes 1000000 --output Stream --verbose ./linear_ramps.16p2 linear_ramps.vc2
I am having a problem with the LD profile workflow. The image coming out does not look like the image going in.
Test script:
ffmpeg -f lavfi -i smptebars=duration=1:size=1920x1080:rate=1 -pix_fmt yuv422p10le -c:v rawvideo bars.yuv
~/vc2-reference/tools/convert_to_16p2 bars.yuv
~/vc2-reference/src/EncodeLD/EncodeLD -v -x 1920 -y 1080 -f 4:2:2 -l 10 -k LeGall -d 3 -u 1 -a 2 -s 4147200 bars.yuv.16p2 bars.vc2
~/vc2-reference/tools/vc2streamdebugger bars.vc2
~/vc2-reference/src/DecodeLD/DecodeLD -v -x 1920 -y 1080 -f 4:2:2 -l 10 -k LeGall -d 3 -u 1 -a 2 -s 4147200 bars.vc2 barsout.16p2
~/vc2-reference/tools/convert_from_16p2 barsout.16p2
ffmpeg -f rawvideo -s 1920x1080 -pix_fmt yuv422p10le -i barsout.16p2.yuv barsout.png
Text output:
/home/ubuntu/vc2-reference/src/EncodeLD/EncodeLD -v -x 1920 -y 1080 -f 4:2:2 -l 10 -k LeGall -d 3 -u 1 -a 2 -s 4147200 bars.yuv.16p2 bars.vc2
input file = bars.yuv.16p2
output file = bars.vc2
bytes per sample= 2
luma depth (bits) = 10
chroma depth (bits) = 10
height = 1080
width = 1920
chroma format = 4:2:2
interlaced = false
wavelet kernel = LeGall (5,3) ("LeGall")
wavelet depth = 3
vertical slice size (in units of 2**(wavelet depth)) = 1
horizontal slice size (in units of 2**(wavelet depth)) = 2
compressed bytes = 4147200
output = Stream
Vertical slices per picture = 135
Horizontal slices per picture = 120
Slice bytes numerator = 256
Slice bytes denominator = 1
Quantisation matrix = 4, 2, 2, 0, 4, 4, 2, 5, 5, 3
Writing Sequence Header
Reading input frame number 0
Forward transform
Determine quantisation indices
Quantise transform coefficients
Split quantised coefficients into slices
Writing compressed picture to file
Mean, Standard Deviation of quantiser index = 0.00, 0.00
End of input reached after 1 frames
0x0000000000 : [ PARSE INFO ]
parse_code : 0x00
next_parse_offset : 0x00000011
prev_parse_offset : 0x00000000
-- Sequence Header --
4 bytes of coded parameters
Major Version : 1
Minor Version : 0
Profile : 0
Level : 3
Base Video Format : 12
Custom Scan Format
Source Sampling : 0
Picture Coding Mode : 0
0x0000000011 : [ PARSE INFO ]
parse_code : 0xc8
next_parse_offset : 0x003f4819
prev_parse_offset : 0x00000011
-- Low Delay Picture --
4147212 bytes of coded data
0x00003f482a : [ PARSE INFO ]
parse_code : 0x10
next_parse_offset : 0x00000000
prev_parse_offset : 0x003f4819
-- End of Sequence --
0x00003f4837 : [ END ]
/home/ubuntu/vc2-reference/src/DecodeLD/DecodeLD -v -x 1920 -y 1080 -f 4:2:2 -l 10 -k LeGall -d 3 -u 1 -a 2 -s 4147200 bars.vc2 barsout.16p2
input file = bars.vc2
output file = barsout.16p2
bytes per sample= 2
luma depth (bits) = 10
chroma depth (bits) = 10
height = 1080
width = 1920
chroma format = 4:2:2
interlaced = false
wavelet kernel = LeGall (5,3) ("LeGall")
wavelet depth = 3
vertical slice size (in units of 2**(wavelet depth)) = 1
horizontal slice size (in units of 2**(wavelet depth)) = 2
output = Decoded
Vertical slices per picture = 135
Horizontal slices per picture = 120
Slice bytes numerator = 256
Slice bytes denominator = 1
Quantisation matrix = 4, 2, 2, 0, 4, 4, 2, 5, 5, 3
Reading compressed input frame number 0
Merge slices into full picture
Inverse quantise
Inverse transform
Copy picture to output frame
Clipping output
Writing decoded output file
End of input reached after 1 frames
The following wavelet test pattern (1 frame, 10 bit, 1280x720, 4:2:2, progressive, YCbCr) causes EncodeHQ-CBR
to crash during quantisation index selection.
Example invocation:
$ EncodeHQ-CBR \
-x 1280 -y 720 -r 6 -f 4:2:2 -l 10 \
-k LeGall -d 3 -a 2 -u 1 -s 921600 -S 2 \
input.16p2 out.vc2 -v
bytes per sample= 2
luma depth (bits) = 10
chroma depth (bits) = 10
height = 720
width = 1280
chroma format = 4:2:2
interlaced = false
wavelet kernel = LeGall (5,3) ("LeGall")
wavelet depth = 3
vertical slice size (in units of 2**(wavelet depth)) = 1
horizontal slice size (in units of 2**(wavelet depth)) = 2
compressed bytes = 921600
output = Stream
Vertical slices per picture = 90
Horizontal slices per picture = 80
Slice bytes numerator = 128
Slice bytes denominator = 1
Quantisation matrix = 4, 2, 2, 0, 4, 4, 2, 5, 5, 3
Writing Sequence Header
Reading input frame number 0
Forward transform
Determine quantisation indices
Floating point exception (core dumped)
Note: The test pattern consists of in-range values (i.e. it is a valid picture)
and is designed to produce near-maximum values in intermediate and output
stages of the wavelet transform.
[Image: wavelet test pattern]
This file does not appear to crash EncodeLD.
When encoding a bitstream, the top-field-first flag is determined by the base
video format selected. The various Encode* implementations select the base
video format based on unrelated criteria.
In the examples below the following sample frame (1 frame, 10 bit, 1280x720, 4:2:2, progressive, YCbCr)
is used.
Below, the -t argument is given (indicating that top-field-first should be
selected); however, the encoder chooses base video format 0 (custom format),
which defines top field first to be false.
EncodeHQ-CBR \
-t \
-x 1280 -y 720 -f 4:2:2 -l 10 -r 1 \
-k LeGall -d 3 -a 2 -u 1 -s $((1280*720*2*2/4)) -S 2 \
input.16p2 /tmp/out.vc2
The custom base format is chosen by the encoder in this case because the
resolution/subsampling/framerate combination does not correspond with an
existing base video format. (As an aside, the most efficient encoding would be
to pick base video format 10 and override the frame rate -- this would also set
the field order correctly in this case.)
As an example of the opposite problem, below the -b argument is given (bottom
field first) but the encoder chooses base video format 10 (HD 720p 50), which
sets top-field-first to true.
EncodeHQ-CBR \
-b \
-x 1280 -y 720 -f 4:2:2 -l 10 -r 6 \
-k LeGall -d 3 -a 2 -u 1 -s $((1280*720*2*2/4)) -S 2 \
input.16p2 /tmp/out.vc2
This time the video format specified matches base format 10, so the encoder
picks that format, but the field order specification is ignored.
Bitstreams which omit the (optional) next_parse_offset field in the parse
info header are not correctly decoded by DecodeStream, typically resulting in
bizarre failures to interpret a non-existent sequence header.
For example, with the following sample bitstream:
$ DecodeStream absent_next_parse_offset.vc2 out.16p2
Error: DataUnitIO: Custom Quantisation Matrix flag not supported
Picture numbers must have consecutive, ascending integer values, wrapping at (2**32)-1 back to 0.
Using EncodeHQ-ConstQ to encode the real_pictures file with
EncodeHQ-ConstQ \
-x 1280 -y 720 -f 4:2:2 -z 10 -r 6 \
-k LeGall -d 3 -a 2 -u 1 -v -q 0 -S 2 \
./real_pictures.16p2 ./encoded_real_pictures.vc2
gives a non-conformant bitstream in which successive pictures all have a picture number of zero. It also gives a non-conformant stream when the interlaced flag (-i) is set.
Encode_CBR produces a non-conformant bitstream with the same error if the interlaced flag is set (but not otherwise).
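The picture numbering rule quoted at the top of this report is easy to check once the picture numbers have been extracted from a stream. The sketch below covers only that rule (consecutive, ascending, wrapping from (2**32)-1 to 0) and ignores any further constraints the standard may place on picture numbers. Both function names are hypothetical.

```python
def next_picture_number(n):
    """Picture number expected after n, wrapping at (2**32)-1 back to 0."""
    return (n + 1) % (2 ** 32)


def first_picture_number_error(picture_numbers):
    """Return the index of the first out-of-sequence picture, or None."""
    for i in range(1, len(picture_numbers)):
        if picture_numbers[i] != next_picture_number(picture_numbers[i - 1]):
            return i
    return None


print(first_picture_number_error([0, 1, 2]))           # None
print(first_picture_number_error([2 ** 32 - 1, 0, 1]))  # None
print(first_picture_number_error([0, 0, 0]))            # 1
```

The last call models the reported stream, in which every picture carries the number zero, so the second picture (index 1) is the first to break the rule.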
The error "Error: Bytes: value bigger than specified number of bytes" is thrown when the compressedBytes parameter is too large.
It can be reproduced using the following command, which uses unreasonably large (but valid) slices.
EncodeStream --compressedBytes 527521 --fragmentLength 0 --prefix 0 --scalar 419 --framerate 3 --width 1280 --height 720 --format 4:2:2 --bytes 2 --bitDepth 10 --kernel LeGall --waveletDepth 3 --vSlice 90 --hSlice 80 --output Stream --mode HQ_CBR --verbose ./linear_ramps.16p2 linear_ramps_test.vc2
In this example the number of allocated bytes (527521) is on the cusp of success: --compressedBytes 527520 succeeds.
What should be the input file format for YUV 422?
The DecodeStream command fails to decode all (concatenated) sequences in a
stream.
For example, given a sample bitstream containing two sequences with one picture each:
$ DecodeStream concatenated_sequences.vc2 out.16p2
Have read data unit of type: Sequence Header
Parsing Sequence Header
height = 720
width = 1280
chroma format = 4:2:2
interlaced = false
frame rate = 50 fps
Have read data unit of type: HQ Picture
Parsing Picture Header
Picture number : 0
Wavelet Kernel : LeGall (5,3) ("LeGall")
Transform Depth : 3
Slices Horizontally : 80
Slices Verically : 90
Slice Prefix : 0
Slice Size Scalar : 1
Quantisation matrix = 4, 2, 2, 0, 4, 4, 2, 5, 5, 3
Reading compressed input frame number 0
Merge slices into full picture
Inverse quantise
Inverse transform
Copy picture to output frame
Clipping output
Writing decoded output file
Have read data unit of type: End of Sequence
End of Sequence after 1 frames, exiting
This is likely to be system dependent as others have not had this problem.
When building the project with autotools, my build fails with the error:
./configure: line 16781: syntax error near unexpected token 'fi'
./configure: line 16781: 'fi'
This relates to an empty else clause in 'configure'. The build is successful if the empty else statements are deleted.
Is this codec under active development? I am interested in getting involved.
The different encoders currently contain lots of repeated code. It would be good to refactor this so that code is shared where possible.
Line endings are a mix of LF and CRLF throughout the repository - it would be nice if these were tidied up.
Many of these overrides are not supported (despite generally being purely
metadata and not requiring any special decoder behaviour). The following test
bitstreams exercise all permissible settings of these flags.
$ DecodeStream custom_flags_1.vc2 out.16p2
$ DecodeStream custom_flags_2.vc2 out.16p2
Error: DataUnitIO: custom_pixel_aspect_ratio_flag set, shouldn't be
$ DecodeStream custom_flags_3.vc2 out.16p2
Error: DataUnitIO: Invalid Frame Rate on Input: 0
$ DecodeStream custom_flags_4.vc2 out.16p2
Error: DataUnitIO: Invalid Frame Rate on Input: 0
NB: The first test file sets all custom flags to false (and is correctly decoded by the decoder). The remaining test files encode exactly the same video format but set some or all custom value flags.
Tested on a fresh setup of xcode8, reproduced also with vanilla clang-3.8.1.
I tested with boost 1.85.0 and boost 1.62.0.
I used the command line from the help and a large enough sample (using /dev/zero works the same way).
Reading input frame number 0
Assertion failed: (std::equal(other.shape(),other.shape()+this->num_dimensions(), this->shape())), function operator=, file /usr/local/include/boost/multi_array/multi_array_ref.hpp, line 484.
Process 58525 stopped
* thread #1: tid = 0x1971f1, 0x00007fffa7e83dda libsystem_kernel.dylib`__pthread_kill + 10, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
frame #0: 0x00007fffa7e83dda libsystem_kernel.dylib`__pthread_kill + 10
libsystem_kernel.dylib`__pthread_kill:
-> 0x7fffa7e83dda <+10>: jae 0x7fffa7e83de4 ; <+20>
0x7fffa7e83ddc <+12>: movq %rax, %rdi
0x7fffa7e83ddf <+15>: jmp 0x7fffa7e7cd6f ; cerror_nocancel
0x7fffa7e83de4 <+20>: retq
(lldb) bt
* thread #1: tid = 0x1971f1, 0x00007fffa7e83dda libsystem_kernel.dylib`__pthread_kill + 10, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
* frame #0: 0x00007fffa7e83dda libsystem_kernel.dylib`__pthread_kill + 10
frame #1: 0x00007fffa7f6e797 libsystem_pthread.dylib`pthread_kill + 90
frame #2: 0x00007fffa7de9440 libsystem_c.dylib`abort + 129
frame #3: 0x00007fffa7db08b3 libsystem_c.dylib`__assert_rtn + 320
frame #4: 0x000000010000684e EncodeHQ-ConstQ`boost::multi_array_ref<int, 2ul>& boost::multi_array_ref<int, 2ul>::operator=<boost::multi_array<int, 2ul, std::__1::allocator<int> > >(this=<unavailable>, other=<unavailable>) + 462 at multi_array_ref.hpp:483 [opt]
frame #5: 0x0000000100002f06 EncodeHQ-ConstQ`main [inlined] boost::multi_array<int, 2ul, std::__1::allocator<int> >::operator=(boost::multi_array<int, 2ul, std::__1::allocator<int> > const&) + 6758 at multi_array.hpp:377 [opt]
frame #6: 0x0000000100002f01 EncodeHQ-ConstQ`main [inlined] Picture::operator=(Picture const&) + 7 at Picture.h:53 [opt]
frame #7: 0x0000000100002efa EncodeHQ-ConstQ`main(argc=<unavailable>, argv=<unavailable>) + 6746 at EncodeHQ-ConstQ.cpp:268 [opt]
frame #8: 0x00007fffa7d55255 libdyld.dylib`start + 1
When using the Low Delay mode and settings specified in SMPTE RP 2047-1:2009, the encode fails with the following error message:
Error: Attempt to write beyond end of bounded write
The incantation used to generate this error is as follows, which corresponds to a compression ratio of 4:1.
EncodeStream --mode LD --width 1920 --height 1080 --format 4:2:2 --bitDepth 10 --framerate 6 --kernel Haar1 --waveletDepth 2 --hSlice 4 --vSlice 1 --compressedBytes 2073600 ./color_bars_1920_1080.16p2 ./color_bars_1920_1080_0.vc2 --verbose
Given a mid-grey frame as input, EncodeHQ-CBR crashes with a floating point
exception during qindex selection.
Mid-Grey frame (1 frame, 10 bit, 1280x720, 4:2:2, progressive, YCbCr)
Coding fails as follows:
$ EncodeHQ-CBR \
-v \
-x 1280 -y 720 --f 4:2:2 -l 10 -r 6 \
-k LeGall -d 3 -a 2 -u 1 -s $((1280*720*2*2/4)) \
synthetic_grey.16p2 out.vc2
input file = synthetic_grey.16p2
output file = out.vc2
bytes per sample= 2
luma depth (bits) = 10
chroma depth (bits) = 10
height = 720
width = 1280
chroma format = 4:2:2
interlaced = false
wavelet kernel = LeGall (5,3) ("LeGall")
wavelet depth = 3
vertical slice size (in units of 2**(wavelet depth)) = 1
horizontal slice size (in units of 2**(wavelet depth)) = 2
compressed bytes = 921600
output = Stream
Vertical slices per picture = 90
Horizontal slices per picture = 80
Slice bytes numerator = 128
Slice bytes denominator = 1
Quantisation matrix = 4, 2, 2, 0, 4, 4, 2, 5, 5, 3
Writing Sequence Header
Reading input frame number 0
Forward transform
Determine quantisation indices
Floating point exception (core dumped)
The EncodeHQ-CBR and EncodeHQ-ConstQ commands require the slice size scaler
(see section 13.5.4 of ST 2042-1) to be specified manually using the -S
argument. If an insufficient slice size scaler is provided, the following
cryptic error is produced:
Error: Bytes: value bigger than specified number of bytes
At a minimum, a better error message is required, but ideally the reference
codec should select a suitable slice size scaler automatically.
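As an illustration of what automatic selection could look like, the sketch below picks the smallest scaler that can represent a given component length. It assumes that each HQ slice component length is signalled as a single byte (0 to 255) multiplied by the slice size scaler, which is my reading of section 13.5.4. The function name and the idea of passing in the largest component length are hypothetical and not taken from the encoder source.

```python
def minimum_slice_size_scaler(largest_component_bytes):
    """Smallest scaler s >= 1 such that 255 * s >= largest_component_bytes.

    Assumption: a component length is coded as one byte times the scaler.
    """
    # Ceiling division by 255 without floating point.
    return max(1, -(-largest_component_bytes // 255))


print(minimum_slice_size_scaler(255))  # 1
print(minimum_slice_size_scaler(256))  # 2
```

An encoder would need an upper bound on the coded length of any slice component before writing the picture header, so in practice this would be applied to a worst-case figure rather than a measured one.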
The conversion tools convert_from_16p2 and convert_to_16p2 are not compatible with Python 3 due to the changes in handling of integer division and byte arrays.
Similarly, vc2streamdebugger uses the Python 2 print statement.
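The three incompatibilities named above can each be avoided with constructs that behave identically on Python 2.7 and Python 3. The sketch below is a generic illustration and is not taken from the conversion tools. The helper names are hypothetical and nothing here describes the actual 16p2 layout.

```python
from __future__ import division, print_function


def half_width(width):
    """Integer halving: '//' yields an int on both Python 2 and 3,
    whereas '/' yields a float on Python 3."""
    return width // 2


def byte_values(raw):
    """Return the bytes of `raw` as a list of ints.

    Wrapping in bytearray gives ints on both versions. Iterating a
    Python 2 str would give 1-character strings instead.
    """
    return list(bytearray(raw))


# print as a function works on both versions with the __future__ import.
print(half_width(1280), byte_values(b"\x01\xff"))
```

Applying the same three substitutions (floor division, bytearray, print function) is usually enough for small scripts of this kind, though each tool would still need testing against known-good files.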
The build process produces quite a few warnings. It would be good to tidy them up.