aomediacodec / av1-avif

AV1 Image File Format Specification - ISO-BMFF/HEIF derivative

Home Page: https://aomediacodec.github.io/av1-avif/

License: BSD 2-Clause "Simplified" License

HTML 90.60% Makefile 1.31% Bikeshed 8.10%

av1-avif's Introduction

av1-avif

This document describes how to use ISO-BMFF structures to generate a HEIF/MIAF compatible file that contains one or more still images encoded using AV1.

The specification is written using a special syntax (mixing markup and markdown) to enable generation of cross-references, syntax highlighting, ... The file using this syntax is index.bs.

index.bs is processed to produce an HTML version (index.html) by a tool called Bikeshed (https://github.com/tabatkins/bikeshed), which is run when content is pushed onto the master branch or when Pull Requests are made.

av1-avif's People

Contributors

agrange, aklemets, baumanj, cconcolato, crissov, lambdapioneer, ledyba-z, leo-barnes, paukerr, rzumer, tdaede, y-guyon

av1-avif's Issues

How about an additional constraint: "All frames in an Image Sequence file should have the same size (width x height)"

Hi. I am currently supporting AVIF/AVIFS images in Chrome, and implementing AVIFS in cavif and davif.

According to Wan-Teh, a Chrome developer, Chrome requires all frames in an animated image to be the same size (width x height).

I think that is a reasonable constraint.

FYI: In WebP, it looks like all frames have the same size.
https://github.com/webmproject/libwebp/blob/master/src/webp/mux.h#L277-L312

AVIF, sRGB and CICP

MIAF currently specifies:

For RGB encoding, sRGB should be assumed as indicated by colour_primaries equal to 1, transfer_characteristics equal to 13, matrix_coefficients equal to 0, and full_range_flag equal to 1.

In this context, MPEG/ITU issued a "Defect report for Video CICP (ISO/IEC 23091-2)" on 2019-10-11, N18912.

This raises 2 questions (at least):

  1. what values should be used for AVIF when encoding sRGB content;
  2. Given the default value in MIAF, should the AVIF spec be changed to mandate the colr to include the recommendations from question 1)?
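
To make the four code points in the MIAF quote concrete, here is a minimal sketch that stores them as a record and uses them as a fallback. The names Cicp, MIAF_SRGB_DEFAULT and effective_cicp are hypothetical, and treating the MIAF values as the fallback when no colour information is signalled is an assumption made only for illustration; it does not answer either question above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cicp:
    """The four colour-signalling values named in the MIAF quote."""
    colour_primaries: int
    transfer_characteristics: int
    matrix_coefficients: int
    full_range_flag: int


# Values quoted from MIAF for RGB-encoded sRGB content.
MIAF_SRGB_DEFAULT = Cicp(
    colour_primaries=1,
    transfer_characteristics=13,
    matrix_coefficients=0,
    full_range_flag=1,
)


def effective_cicp(signalled: Optional[Cicp]) -> Cicp:
    """Return the signalled values, or the MIAF sRGB values if none were given.

    Assumption: falling back like this is one possible reader policy,
    not something the AVIF spec is known to mandate.
    """
    return signalled if signalled is not None else MIAF_SRGB_DEFAULT
```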

Add cavif/davif to the wiki

Hi.

I would like to add links to cavif and davif to the wiki.
They were used to generate the test images I prepared, and to decode them again for comparison with the original images.

While libavif supports many encoders and decoders, cavif and davif use only libaom to encode and dav1d to decode. In exchange, cavif and davif let you control the encoder and decoder in detail.

Thank you.

Distinguish between picture tracks and video tracks

The spec currently treats picture tracks and video tracks as if they were the same. I think a picture track can be a sequence of burst images, for example, and would be generated by the "picture module" of a camera. But a video track is just a plain video track that would be generated by the "video module".
So, while it would be OK to have special restrictions on "picture tracks" (for example, I-frames only), "video tracks" should be mostly unrestricted as they would be created by the camera's normal video module.

Default matrixCoefficients? (handling "unspecified")

Hi all --

In some discussions with @wantehchang, we were curious whether there is an official fallback value for the matrixCoefficients enum when it is unspecified. Should we fall back to BT709 or BT601?

The current libavif falls back to BT709 coefficients (1) if it is unspecified, but I believe traditional video paths (ffmpeg) tend to fall back to BT601 (6). Is there anything in the HEIF or AVIF standard that would guide us here?
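
The choice matters because the two matrices weight red, green and blue differently when computing luma. The sketch below shows the fallback decision and its effect. The function names are hypothetical, the value 2 for "unspecified" comes from the CICP code point table rather than from this issue, and the luma weights are the standard BT.709 and BT.601 ones. It does not reflect what libavif or ffmpeg actually implement.

```python
MC_BT709 = 1        # value quoted in the issue
MC_UNSPECIFIED = 2  # CICP "unspecified" code point (not stated in the issue)
MC_BT601 = 6        # value quoted in the issue

# Luma weights (Kr, Kb) for each matrix; Kg is 1 - Kr - Kb.
LUMA_WEIGHTS = {
    MC_BT709: (0.2126, 0.0722),
    MC_BT601: (0.299, 0.114),
}


def resolve_matrix_coefficients(signalled: int, fallback: int = MC_BT709) -> int:
    """Replace the 'unspecified' code point with a caller-chosen fallback."""
    return fallback if signalled == MC_UNSPECIFIED else signalled


def luma(r: float, g: float, b: float, matrix_coefficients: int) -> float:
    """Compute luma from normalised RGB using the weights of the given matrix."""
    kr, kb = LUMA_WEIGHTS[matrix_coefficients]
    return kr * r + (1.0 - kr - kb) * g + kb * b
```

With these weights, a pure red pixel yields a luma of 0.2126 under BT.709 but 0.299 under BT.601, so decoding with the wrong fallback visibly shifts colours.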

Bikeshed generation

I just worked on my first PR, and both my local bikeshed copy and the Curl API https://tabatkins.github.io/bikeshed/#curl generate a docs/index.html with a lot of changes compared to the status quo.

It would be great to have either (or both) of the following:

  • A reproducible command for local generation (e.g. fixed version)
  • An automated generation (e.g. using CI)

Create conformance file with odd framesizes

There is value in having conformance files that have odd frame sizes. Currently, all test image files have a frame size that is divisible by 8.

By default, AV1/libaom will internally use a frame size aomImage->w, aomImage->h that is a multiple of 8(?). However, it also exposes the to-be-displayed frame size aomImage->d_w, aomImage->d_h and the to-be-rendered frame size aomImage->r_w, aomImage->r_h. (I don't know if there's an important difference between d_w and r_w.)

Adding the conformance files makes sure that the right dimensions are referenced for the output. I suggest the following resolutions:

  • Single image: 63x65, 4:2:0
  • Single image: 65x63, 4:4:4

Maybe we want to also include a derived grid image with the "sub images" having odd resolutions?
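
As a rough illustration of why the suggested sizes are useful, the sketch below computes the 8-aligned size and the chroma plane size for each of them. The helper names are hypothetical, the alignment to 8 is the unconfirmed guess from the text above, and rounding chroma dimensions up is the usual convention for subsampled planes.

```python
def align_up(value: int, multiple: int = 8) -> int:
    """Round value up to the next multiple (assumed internal alignment)."""
    return (value + multiple - 1) // multiple * multiple


def chroma_size(width: int, height: int, subsampling: str) -> tuple:
    """Return the (width, height) of each chroma plane, rounding up."""
    if subsampling == "4:4:4":
        return width, height
    if subsampling == "4:2:2":
        return (width + 1) // 2, height
    if subsampling == "4:2:0":
        return (width + 1) // 2, (height + 1) // 2
    raise ValueError(f"unsupported subsampling: {subsampling}")


for w, h, ss in [(63, 65, "4:2:0"), (65, 63, "4:4:4")]:
    print(f"{w}x{h} {ss}: aligned {align_up(w)}x{align_up(h)}, "
          f"chroma {chroma_size(w, h, ss)}")
```

For 63x65 in 4:2:0, the displayed size, the aligned size (64x72) and the chroma size (32x33) all differ, so a reader that picks the wrong pair of dimensions should fail visibly on such a file.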

How about adding an additional constraint: (bit depth of primary image == bit depth of alpha image)

According to #68, an AVIF file in which the bit depth of the primary image differs from that of the alpha plane is valid.

(For example, an AVIF file with an 8bpc primary image and a 10bpc alpha plane.)

However, as @joedrago says:

I think this creates a lot of pain points, corner cases, and complexity for very little gain, if I'm being honest.
AOMediaCodec/libavif#86

I really agree with him. In fact, I support such images in cavif and davif, but it made the code a little more complex and makes it very difficult to write tests (because there are many combinations to test...).

How about adding an additional constraint, such as "The bit depth of the primary image and that of the alpha image should be the same"?
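
To illustrate the extra work that mismatched depths impose, here is a minimal sketch of the per-sample conversion a decoder needs before it can combine an 8bpc colour plane with a 10bpc alpha plane. The function names are hypothetical, and the rounding rule is one reasonable choice, not the method used by cavif, davif or libavif.

```python
def rescale_sample(value: int, from_bits: int, to_bits: int) -> int:
    """Map a sample from one bit depth to another, rounding to nearest."""
    from_max = (1 << from_bits) - 1
    to_max = (1 << to_bits) - 1
    return (value * to_max + from_max // 2) // from_max


def satisfies_proposed_constraint(primary_bits: int, alpha_bits: int) -> bool:
    """Check the constraint proposed above: both depths must be equal."""
    return primary_bits == alpha_bits
```

With the constraint in place, rescale_sample would never be needed for alpha, and the test matrix would shrink from every pair of depths to one case per depth.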

Multi-layer images

The current spec is not clear about the status of scalable images. An item is said to be an AV1 Sync Sample, but this is allowed to contain multiple frames from different layers. This should probably be made explicit and maybe restricted in profiles.

Consider defining a brand to identify intra-only sequences

As discussed last week, we should consider whether we want to define a brand to identify when image sequences are intra-only coded.

As of today, you have to inspect the track and detect that all samples are 'sync' samples. This might be a bit late and could result in files being downloaded while not being playable, if the decoder only supports intra.

It might be better defined also within MPEG, as this is codec-independent.
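
The inspection described above can be sketched as follows, assuming the track's sample count and the contents of its sync sample box (stss) have already been parsed. The function name is hypothetical. The rule that a missing stss box means every sample is a sync sample comes from ISOBMFF.

```python
from typing import Optional, Sequence


def is_intra_only(sample_count: int,
                  sync_sample_numbers: Optional[Sequence[int]]) -> bool:
    """Return True if every sample in the track is a sync sample.

    sync_sample_numbers holds the 1-based sample numbers listed in the
    stss box, or None when the track has no stss box at all.
    """
    if sync_sample_numbers is None:
        # No stss box: ISOBMFF treats every sample as a sync sample.
        return True
    return set(sync_sample_numbers) >= set(range(1, sample_count + 1))
```

This check is only possible once the sample tables are available, which is why a brand in the FileTypeBox at the very start of the file would let an intra-only decoder reject a file earlier.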

Potential fragmented-only decoder constraint

Based on the AOM call and discussion on the Chrome bug for AVIF, there may be interest in allowing a constrained decoder to only support AVIF in fragmented MP4, as opposed to requiring unfragmented MP4 support.

This could be similar to the way the decoder constraint for reduced / intra-only tooling is handled. Not sure if a flag is required or if the file can simply be inspected.

File extension is misleading

The .avif file extension is very misleading. It looks very similar to .avi, so a user may expect the file to be a video, but it is not. Why is the .webp extension not used? If .webp can't be used because the internal container has changed (from RIFF to ISOBMFF), we could use .webi (Web Image) or .aomi (AOMedia Image), for example.

Grouping of images and new properties

HEIF 2nd Ed Amd 1 defines a set of new properties for the purpose of distinguishing images in a group (bracketing or panorama):

  • AutoExposureProperty
  • WhiteBalanceProperty
  • FocusProperty
  • FlashExposureProperty
  • DepthOfFieldProperty
  • PanoramaProperty

We should determine if we want to do something special with these properties.

still_picture and reduced_still_picture_header in items

As discussed during today's AOM Image call:

  • why do we require still_picture flag set to 1 in item data? That requires flipping this bit when creating an image item from a KEY FRAME in a video sequence. What do we gain by having this flag set?
  • why are we recommending to have reduced_still_picture_header flag set to 1? Should leave freedom to encoders to do what they want.

How to determine repetition count for the AVIFS image sequence?

Hi, I'm currently working on supporting AVIF and AVIFS, and I have a question.

How can I determine the repetition count (infinite loop, fixed time loop, no loop) for the AVIFS image sequence?

I checked ISOBMFF, MIAF, HEIF, and av1-isobmff, but I can't find how many times the avifs should be repeated.

(It looks like the sample_delta field in the stts box determines the display time, but it is unsigned, so I think it can't be used to represent an infinite loop.)
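
For reference, this is a minimal sketch of what the stts box does provide: per-frame start times and the duration of one pass through the sequence. The function name is hypothetical, and the entries and timescale are assumed to be already parsed. Nothing in it expresses a repetition count, which is the gap this question is about.

```python
from typing import List, Sequence, Tuple


def frame_start_times(stts_entries: Sequence[Tuple[int, int]],
                      timescale: int) -> Tuple[List[float], float]:
    """Expand (sample_count, sample_delta) runs into start times in seconds.

    Returns the list of start times and the total duration of one pass.
    """
    times = []
    ticks = 0
    for sample_count, sample_delta in stts_entries:
        for _ in range(sample_count):
            times.append(ticks / timescale)
            ticks += sample_delta
    return times, ticks / timescale


# Two frames shown for 100 ticks each, then one frame shown for 300 ticks.
starts, duration = frame_start_times([(2, 100), (1, 300)], timescale=1000)
print(starts, duration)
```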

Question about "should" statement verification

Section 6.1: "The FileTypeBox should declare at least one profile that enables decoding of the primary image item, or one of its alternates. The profile should allow decoding of any associated auxiliary images, unless it is acceptable to decode the image item without its auxiliary images."

How can we know if it is "acceptable to decode the image item without its auxiliary images"?
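
For context, the brands that a reader can actually see in the FileTypeBox can be extracted as in the sketch below. The function name is hypothetical, the brand values in the example are illustrative only, and box sizes of 0 or 1 (open-ended or 64-bit) are not handled. The sketch shows that the box carries only a list of brands, with nothing that says whether decoding without auxiliary images is acceptable.

```python
import struct
from typing import List, Tuple


def parse_ftyp(data: bytes) -> Tuple[str, int, List[str]]:
    """Parse a FileTypeBox located at the start of data.

    Returns (major_brand, minor_version, compatible_brands).
    """
    size, box_type = struct.unpack(">I4s", data[:8])
    if box_type != b"ftyp":
        raise ValueError("file does not start with a FileTypeBox")
    if size < 16 or size > len(data):
        raise ValueError("unsupported or truncated FileTypeBox")
    major_brand = data[8:12].decode("ascii")
    (minor_version,) = struct.unpack(">I", data[12:16])
    compatible = [data[i:i + 4].decode("ascii") for i in range(16, size, 4)]
    return major_brand, minor_version, compatible


# Build an illustrative box: major brand, minor version, three compatible brands.
payload = b"avif" + struct.pack(">I", 0) + b"avif" + b"mif1" + b"miaf"
box = struct.pack(">I4s", 8 + len(payload), b"ftyp") + payload
print(parse_ftyp(box))
```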

Provide reference library and CLI tools (libavif)

For easy integration and experimentation, a reference library and command line tools for AVIF should be provided somewhere. It could have a similar structure to WebP, which provides a library, libwebp, and the command line tools cwebp and dwebp.

I would suggest the following 6 packages to be created:

.avif (single frame)

  • libavif: Encoding and decoding .avif library
  • cavif: Command line .avif encoding tool
  • davif: Command line .avif decoding tool

.avifs (multi frame / animation)

  • libavifs: Encoding and decoding .avifs library
  • cavifs: Command line .avifs encoding tool
  • davifs: Command line .avifs decoding tool

In this configuration, the avif tools could be very lightweight, since only the AV1 intra encoding and decoding tools are needed. The avifs tools extend avif with the inter encoding and decoding code. cavifs could replace both gif2webp and img2webp.

Patents and Licenses

One of the primary concerns for adopting new image formats is if they are patent encumbered or have non-compatible licenses with what businesses need. The current documentation does not mention patents or licenses.

Even if this is a decision that is planned to be made after the format becomes more stable, that should be mentioned.

4:4:4 chroma sub-sampling and profile "High"

Motivation

Allowing 4:4:4 Chroma sub-sampling is critical to many application areas (e.g. graphical images or photographic content with overlays). AVIF should allow for storing image information with 4:4:4 chroma sub-sampling.

Comparison with other formats:

Proposed change

Change required reader profile to "High" or higher to allow 4:4:4 bit streams.

From the AV1 spec https://aomediacodec.github.io/av1-spec/av1-spec.pdf, p635:

The Main profile supports YUV 4:2:0 or monochrome bitstreams with bit depth equal to 8 or 10. The High profile further adds support for 4:4:4 bitstreams with the same bit depth constraints. Finally, the Professional profile extends support over the High profile to also bitstreams with bit depth equal to 12, and also adds support for the 4:2:2 video format.
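
The quoted rules can be written as a small lookup that returns the lowest profile able to carry a given format. The function name and the string labels for subsampling are hypothetical, and the logic encodes only what the quote states.

```python
def minimum_av1_profile(subsampling: str, bit_depth: int) -> str:
    """Return the lowest AV1 profile that supports the given format.

    subsampling is one of "4:0:0" (monochrome), "4:2:0", "4:2:2", "4:4:4".
    """
    if subsampling not in ("4:0:0", "4:2:0", "4:2:2", "4:4:4"):
        raise ValueError(f"unknown subsampling: {subsampling}")
    if bit_depth not in (8, 10, 12):
        raise ValueError(f"unsupported bit depth: {bit_depth}")
    # 12-bit content and 4:2:2 are only available in Professional.
    if bit_depth == 12 or subsampling == "4:2:2":
        return "Professional"
    # 4:4:4 at 8 or 10 bits needs High.
    if subsampling == "4:4:4":
        return "High"
    # 4:2:0 or monochrome at 8 or 10 bits fits in Main.
    return "Main"
```

Under these rules, an 8-bit 4:4:4 image already requires High, which is why a reader limited to Main cannot decode the graphical content described in the motivation.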

Other

I'd suspect that 444 would play particularly well to the strength of the "chroma from luma prediction" in AV1: https://people.xiph.org/~xiphmont/demo/av1/demo1.shtml

See related issue: #15

why not use WebP

From pretty early on in AV1 development it was assumed that it would replace VP8 in WebP, so why not do that? There are lots of people (including myself) who are not fans of the trend of creating a new container for every use case and every combination of codecs, especially when it is just a limited subset of one that already exists (which is why so many people initially disliked WebM). The way I see it, it should either be based on WebP, in which case call it WebP, or be based on HEIF, in which case call it HEIF. This should be a specification for the "carriage of AV1" in an already existing container (like the av1-isobmff specification is doing).

Edit: Actually why not both.

Edit 2: The simple version for those that are confused; use a common file extension so regular users don't get scared and use some online "converter" to "convert" to jpeg (what the vast majority of people think pictures should be).

Restriction on AV1CodecConfigurationBox not compatible with AV1 ISOBMFF mapping

In this spec it says that the AV1CodecConfigurationBox shall not contain any Sequence Header OBUs. But in the AV1 ISOBMFF mapping spec, it says the opposite. In that spec, the Sequence Header is mandatory and it shall be the first one.

This discrepancy is problematic because ideally the AV1F spec should just be a superset of the AV1-ISOBMFF mapping spec. It should not define things that directly conflict with the AV1-ISOBMFF mapping spec.
It gets particularly confusing when the AV1F file contains an image sequence. As defined in the HEIF spec, an image sequence is stored in a 'pict' track, which is defined just like a video track, except that the HandlerType is 'pict'. But an AV1 video track would require Sequence Header OBUs in the AV1CodecConfigurationBox. The result is that within the same AV1F file, there will be conflicting requirements for the AV1CodecConfigurationBox depending on whether the AV1CodecConfigurationBox is used for a still image or for a picture track. This is not a good situation, IMO.
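
A reader that has to cope with both rules could inspect the box as in the sketch below. It assumes the layout from the AV1 ISOBMFF mapping, in which four bytes of fixed fields are followed by the optional configOBUs, and it takes the OBU type from bits 6 to 3 of the first OBU header byte. The function name is hypothetical and the example bytes are illustrative, not taken from a real file.

```python
from typing import Optional

OBU_SEQUENCE_HEADER = 1  # obu_type value for a Sequence Header OBU


def first_config_obu_type(av1c_payload: bytes) -> Optional[int]:
    """Return the obu_type of the first configOBU, or None if there is none.

    av1c_payload is the body of the AV1CodecConfigurationBox, without the
    8-byte box header.
    """
    if len(av1c_payload) < 4:
        raise ValueError("AV1CodecConfigurationBox payload is too short")
    config_obus = av1c_payload[4:]
    if not config_obus:
        return None
    # OBU header byte: forbidden bit, 4-bit obu_type, extension flag,
    # has_size_field flag, reserved bit.
    return (config_obus[0] >> 3) & 0x0F


fixed_fields = bytes([0x81, 0x00, 0x0C, 0x00])      # illustrative values
print(first_config_obu_type(fixed_fields))           # no configOBUs
print(first_config_obu_type(fixed_fields + bytes([0x0A, 0x01, 0x00])))
```

The first call corresponds to the rule in this spec (no configOBUs) and the second to the mapping spec (a Sequence Header OBU first), so a single file mixing items and tracks would exercise both branches.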

AVIF and audio/video; create a normal format like "AOM" (AOMedia)

AVIF is based on HEIF, which means it can also contain audio (sic!) and subtitles. Why not make a new format and name it something like AOM (AOMedia)? AV1 for video, AA1 for audio, AS1 for subtitles. Yes, you will need to reinvent things, but you will immediately have a normal format that will not be embarrassing to release to the masses. There will be a single ecosystem that does not oblige anyone to support all that obsolete garbage from WebM.

Wishes:

  1. Thumbnail should be encoded AV1, why do you need JPEG here?
  2. At the beginning of the file (magic number) you can use "aom". And then add "a" if file contain audio, "v" for video, "s" for subtitles. The order of additional designations is exactly what was called earlier ("avs").
  3. Would not prevent a gradual download, as in the FLIF format. It would be nice to do this by default, but it's already impossible, right? AV1 is already standardized...
  4. Try to make the design of the format more minimalistic, you do not need to do "if it's here, then there, otherwise in other places". Be simpler.
  5. Be doubly simpler. You do not need to support all possible metadata formats. (Make your own, but well-thought-out. Meow.)
  6. Force it on the web.

Let's do better than HEVC, PNG, JPEG, GIF, FLIF, ASS (SSA) and any audio format. We will reconsider everything that we have done before, and we will do better. And we will spread this format by all means, because AOMedia contains companies that everyone has heard of.

Predictive images and unknown essential properties

HEIF 2nd Edition AMD 1 specifies the ability to store predicted images (inter frames) as items. The item type of such an image is the same as for intra images. The way to detect that an image is predicted is by searching for an essential property of type pred.

We should determine if we want to have support for AV1 inter frames or not. If so, should we wait for a possible MIAF update?

We should raise awareness of implementers that they should not process image items containing essential properties that they don't understand.

Add expected rendering examples for test files

I'm working on AVIF support for Firefox, and the sample images in https://github.com/AOMediaCodec/av1-avif/tree/master/testFiles have been very helpful, but given the dearth of renderers in the current ecosystem, it's difficult to have confidence in the correctness of output.

For example, I'm pretty confident that Netflix/avif/cosmos_frame01000_yuv420_8bpc_bt709_g22_qlossless.avif is rendering correctly based on this screenshot of my prototype rendering in browser:
cosmos_frame01000_yuv420_8bpc_bt709_g22_qlossless

But I assumed the same frame in 10-bit (cosmos_frame01000_yuv420_10bpc_bt2020_hlg_qlossless.avif) should look essentially the same, causing me to doubt my output when it did not:
cosmos_frame01000_yuv420_10bpc_bt2020_hlg_qlossless

However, the 10-bit version rendered by VLC nightly looks quite similar:
cosmos_frame01000_yuv420_10bpc_bt2020_hlg_qlossless

As does the version rendered by Windows:
cosmos_frame01000_yuv420_10bpc_bt2020_hlg_qlossless

Though I have my doubts about relying on the correctness of other renderers based on the Windows output for cosmos_frame01000_yuv444_8bpc_bt709_g22_qlossless.avif:
cosmos_frame01000_yuv444_8bpc_bt709_g22_qlossless

For comparison here's that same input as rendered by Firefox:
cosmos_frame01000_yuv444_8bpc_bt709_g22_qlossless

And VLC:
cosmos_frame01000_yuv444_8bpc_bt709_g22_qlossless

If it would be possible to see what the input was, it would be a great help in validating the correctness of the output.

New HEIF item properties for metadata

HEIF 2nd Ed Amd 1 defines a lot of new item properties, which can be categorized as metadata properties:

  • CreationTimeProperty
  • ModificationTimeProperty
  • UserDescriptionProperty
  • AccessibilityTextProperty

We should determine if we want to update the spec to mention/mandate some of them.

AVIF should have a different extension/mime-type for lossless mode, as well as additional restrictions

Since the format allows lossless encoding, it should have these to distinguish lossless images from lossy ones.
In this case, of course, the container should only allow image parameters that are consistent with the image being lossless for the file to be valid. (Not sure what else could be done to guarantee that the image is lossless besides a checksum/hash code.)
Otherwise the image should be rejected.

Lossless images of course have some distinct use cases. And it helps if the user can tell whether he acquired a lossless image, or saved a lossless image just by choosing the lossless AVIF variant. One expects them to be editable and re-encodable without quality loss.

Potential issues with two MIME types

  1. There's no precedent for image formats with distinct MIME types for their still and animated variants, e.g., GIF, PNG, and WebP(?). Why was a different decision made here?
  2. As far as I could tell from the document, there's no distinct processing model defined for consumers. Producers are told to use specific MIME types, but there are no requirements for consumers getting MIME types with certain bits being wrong. Absent such requirements, consumers will likely use the same decoding path for both, and producers might eventually stop caring about the difference as well, since there's no discernible difference.
