
goofy's Introduction

Goofy - Realtime DXT1/ETC1 encoder


Run WebAssembly Tests in your browser

About

This is a very fast DXT/ETC encoder that I wrote to test the following idea: "What if, while designing a block compression algorithm, we put compression speed before everything else?" Of course, the compressed results should still be reasonable; let's say they should be better than a baseline. Let's set our baseline to a texture downsampled by a factor of two in RGB565 format, which gives us the same memory footprint as DXT1/BC1.
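The footprint claim can be checked with a little arithmetic. The sketch below assumes "downsampled by a factor of two" means half the width and half the height, and that both dimensions are multiples of 4. The helper names are made up for illustration and are not part of Goofy.

```cpp
#include <cstddef>

// Bytes used by a DXT1/BC1 texture: one 8-byte block per 4x4 pixels.
// Assumes width and height are multiples of 4.
constexpr std::size_t dxt1Bytes(std::size_t width, std::size_t height)
{
    return (width / 4) * (height / 4) * 8;
}

// Bytes used by the baseline: the texture downsampled by two along each
// axis (an assumption about the README's wording) and stored as RGB565,
// which takes 2 bytes per pixel.
constexpr std::size_t halfResRgb565Bytes(std::size_t width, std::size_t height)
{
    return (width / 2) * (height / 2) * 2;
}

// Both layouts cost 4 bits per source pixel.
static_assert(dxt1Bytes(1024, 1024) == halfResRgb565Bytes(1024, 1024),
              "DXT1 and half-resolution RGB565 have the same footprint");
```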

Why would we need a compressor that is very-very fast but cannot compete with well-known codecs in terms of quality?

I think such a compressor might be useful for several reasons:

  • To quickly encode an uncompressed texture on the fly. When you need to render with an uncompressed texture (synthesised or fetched from the Internet), it may be a good option to compress it first using Goofy to save some device memory and improve performance.
  • To make a "preview" build for massive projects. Usually, you need to compress thousands of textures before you can play or test the build. Sometimes you don't care about texture quality that much, and you only want to get your playable build as fast as possible.
  • Quick preview for live-sync tools. You can immediately show any texture changes using Goofy and then run a higher-quality but slower encoder in parallel to improve the final look of the texture (progressive live texture sync).

Design Principles

  • Performance over quality.
  • One heuristic to rule them all. In favor of speed, I can't afford to check different combinations or explore the solution space deeply enough.
  • SSE2 friendly. Let's push this SSE2 thing to the extreme! I should be able to run the encoder sixteen SIMD lanes wide using the SSE2 instruction set.

Goofy Algorithm

  1. Find the principal axis using the diagonal of the bounding box in RGB space.
  2. Convert the principal axis to perceptual brightness using the YCoCg color model.
  3. Convert all 16 pixels of the block from RGB to perceptual brightness.
  4. Project the 16 pixels onto the principal axis using their brightness values.
  5. For ETC1 encoder, get a base color as an average color of 16 pixels but adjust the brightness to get into the center of the principal axis.
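Steps 1 to 4 can be sketched in scalar C++ for a single 4x4 block. This is only an illustration of the idea, not the code from goofy_tc.h: the real encoder is SSE2 and handles many details differently. The names `Rgb`, `BlockAxis`, `brightness` and `projectBlock` are hypothetical, the brightness formula is the usual integer luma term of YCoCg, and the 0..3 positions are not the DXT1 selector bit layout. Step 5 (the ETC1 base color) is left out.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>

struct Rgb { std::uint8_t r, g, b; };

// Luma term of the YCoCg transform: Y = R/4 + G/2 + B/4 (integer version).
inline int brightness(const Rgb& c)
{
    return (c.r + 2 * c.g + c.b) >> 2;
}

struct BlockAxis
{
    Rgb lo;                                 // bounding-box minimum corner
    Rgb hi;                                 // bounding-box maximum corner
    std::array<std::uint8_t, 16> position;  // 0 = nearest to lo, 3 = nearest to hi
};

// Hypothetical helper: a scalar version of steps 1-4 for one 4x4 block.
inline BlockAxis projectBlock(const std::array<Rgb, 16>& pixels)
{
    BlockAxis out{};
    out.lo = Rgb{255, 255, 255};
    out.hi = Rgb{0, 0, 0};

    // Step 1: the bounding box in RGB space; its diagonal is the axis.
    for (const Rgb& p : pixels) {
        out.lo.r = std::min(out.lo.r, p.r);
        out.lo.g = std::min(out.lo.g, p.g);
        out.lo.b = std::min(out.lo.b, p.b);
        out.hi.r = std::max(out.hi.r, p.r);
        out.hi.g = std::max(out.hi.g, p.g);
        out.hi.b = std::max(out.hi.b, p.b);
    }

    // Step 2: express the axis endpoints as brightness values.
    const int yLo = brightness(out.lo);
    const int yHi = brightness(out.hi);
    const int range = yHi - yLo;

    // Steps 3-4: convert each pixel to brightness and place it on the axis.
    // Brightness is monotonic in every channel, so each pixel's value lies
    // between yLo and yHi and no clamping is needed.
    for (std::size_t i = 0; i < pixels.size(); ++i) {
        if (range == 0) {
            out.position[i] = 0;  // flat block: every pixel sits at one point
            continue;
        }
        const int t = brightness(pixels[i]) - yLo;
        out.position[i] = static_cast<std::uint8_t>((t * 3 + range / 2) / range);
    }
    return out;
}
```

Working with one brightness value per pixel instead of three color channels is what makes the projection cheap: it becomes a one-dimensional quantization.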

Of course, the devil is in the details; there are a lot of small optimizations and tricks that make it fast and parallel for 16 pixels at a time. I recommend looking at the code for this; I tried to make it as clear as possible and added a lot of comments to keep the data transformation flow clear.

ETC1 is always encoded using the ETC1S format.

NOTE: Because quantization is based on perceptual brightness, and because of the limitations of the ETC1S format, the Goofy codec is not a good fit for normal maps.

Performance and Quality

All the performance timings below were gathered on the following CPU: i7-7820HQ, 2.9 GHz, single thread. To compute the timings, I ran the encoder 128 times and took the fastest run to avoid noise from the OS.
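A minimal sketch of this kind of measurement, assuming MP/s means megapixels per second. `bestOfRuns` and `megapixelsPerSecond` are hypothetical helpers written for this example; they are not the benchmark harness used to produce the numbers in this README.

```cpp
#include <chrono>
#include <cstddef>
#include <limits>

// Runs `work` `runs` times and returns the fastest wall-clock time in
// seconds. Taking the minimum filters out runs disturbed by the OS.
template <typename Fn>
double bestOfRuns(Fn&& work, int runs = 128)
{
    using Clock = std::chrono::steady_clock;
    double best = std::numeric_limits<double>::infinity();
    for (int i = 0; i < runs; ++i) {
        const auto start = Clock::now();
        work();
        const std::chrono::duration<double> elapsed = Clock::now() - start;
        if (elapsed.count() < best) {
            best = elapsed.count();
        }
    }
    return best;
}

// Throughput in megapixels per second for an image of the given size.
inline double megapixelsPerSecond(std::size_t width, std::size_t height, double seconds)
{
    return static_cast<double>(width) * static_cast<double>(height) / (seconds * 1.0e6);
}
```

To time an encoder, pass a lambda that performs one full compression of the image to `bestOfRuns`, then feed the result to `megapixelsPerSecond`.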

Encoder    | MP/s | RGB-PSNR (dB)
Baseline   | n/a  | 33.39
Goofy DXT1 | 1429 | 37.02
Goofy ETC1 | 1221 | 36.30

Those numbers look pretty good. As far as I can tell, this is the fastest CPU compressor available at the moment. https://github.com/castano/nvidia-texture-tools/wiki/RealTimeDXTCompression
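For readers unfamiliar with the quality metric, here is a sketch of the conventional PSNR definition applied to 8-bit RGB data, with the mean squared error taken over all three channels together. This README does not say exactly how RGB-PSNR was computed for the tables, so treat this as an assumption. `rgbPsnr` is a hypothetical helper.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>

// PSNR in dB between two interleaved 8-bit RGB buffers of equal length.
// `byteCount` is the number of bytes in each buffer (3 * pixel count).
inline double rgbPsnr(const std::uint8_t* a, const std::uint8_t* b, std::size_t byteCount)
{
    double sumSq = 0.0;
    for (std::size_t i = 0; i < byteCount; ++i) {
        const double d = static_cast<double>(a[i]) - static_cast<double>(b[i]);
        sumSq += d * d;
    }
    if (sumSq == 0.0) {
        // Identical images: the error is zero and PSNR is unbounded.
        return std::numeric_limits<double>::infinity();
    }
    const double mse = sumSq / static_cast<double>(byteCount);
    return 10.0 * std::log10((255.0 * 255.0) / mse);
}
```

Higher is better. PSNR is logarithmic, so a difference of a few dB is a large change in mean squared error.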

Examples of Compressed Images

[Images: Kodim17, Kodim18, Lena]

Comparison with other Encoders

For all the encoders in the comparison, I've used the fastest available options/lowest quality.

Encoder                                                            | MP/s | RGB-PSNR (dB)
Baseline                                                           | n/a  | 33.39
Goofy DXT1                                                         | 1429 | 37.02
icbc DXT1 v1.0 (SSE2 enabled, fast DXT encoding using box fitting) | 24   | 41.00
rgbcx v1.08 (level0 low-quality)                                   | 60   | 40.85
ryg DXT1 (STB_DXT_NORMAL)                                          | 43   | 40.82
Goofy ETC1                                                         | 1221 | 36.30
Basisu ETC1                                                        | n/a  | 36.27
rg v1.04 ETC1 (low-quality, dithering disabled)                    | 3    | 40.87

The following chart shows RGB-PSNR vs. performance for every image in the test image set.

[Figure: Comparison Chart]

Note: As I mentioned earlier, compressed normal map quality is much worse than that of photos or albedo textures.

Note: The comparison with Basisu is not fair, because that library is a supercompressor that aims to reduce the final image size. But it is the only other ETC1S codec available to compare against.

Usage

Goofy is a header-only library and it's very easy to use.

// Add a preprocessor definition and include the Goofy header
#define GOOFYTC_IMPLEMENTATION
#include <goofy_tc.h>

// You are all set
void test(unsigned char* result, const unsigned char* input, unsigned int width, unsigned int height, unsigned int stride)
{
  // Both calls write to `result`; in real code, give each encoder its own output buffer.
  goofy::compressDXT1(result, input, width, height, stride);
  goofy::compressETC1(result, input, width, height, stride);
}
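The caller has to provide the output buffer. DXT1 and ETC1 both store each 4x4 pixel block in 8 bytes, so the size can be computed up front. The helpers below are hypothetical and not part of goofy_tc.h. They round partial blocks up, but this README does not state what image dimensions or input pixel layout the encoder accepts, so check the header before relying on that.

```cpp
#include <cstddef>
#include <vector>

// Size in bytes of a DXT1 or ETC1 texture: 8 bytes per 4x4 block.
// Partial blocks are rounded up (an assumption, see the text above).
inline std::size_t compressedSizeBytes(unsigned int width, unsigned int height)
{
    const std::size_t blocksX = (static_cast<std::size_t>(width) + 3u) / 4u;
    const std::size_t blocksY = (static_cast<std::size_t>(height) + 3u) / 4u;
    return blocksX * blocksY * 8u;
}

// Allocates a zero-initialized output buffer large enough for one
// compressed image of the given size.
inline std::vector<unsigned char> makeOutputBuffer(unsigned int width, unsigned int height)
{
    return std::vector<unsigned char>(compressedSizeBytes(width, height));
}
```

The buffer's `data()` pointer can then be passed as the `result` argument of `goofy::compressDXT1` or `goofy::compressETC1`, with one buffer per encoder.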

Next steps

At some point, I hope to make a DXT5/ETC2 alpha encoder based on this code. It should be fairly straightforward because I can use alpha directly instead of brightness.

It looks like it should be easy enough to add support for the ARM NEON instruction set. The lack of a _mm_movemask_epi8 analog may cause some extra trouble, but everything else should be fine.
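For anyone attempting a port, this is what a replacement has to compute: _mm_movemask_epi8 gathers the most significant bit of each of the 16 bytes into a 16-bit mask. The scalar reference below is a sketch that documents those semantics and could be used to validate a NEON version. It is not a fast implementation, and `movemaskReference` is a hypothetical name.

```cpp
#include <cstdint>

// Scalar reference for _mm_movemask_epi8: bit i of the result is the
// most significant bit of byte i of the 16-byte input.
inline std::uint32_t movemaskReference(const std::uint8_t bytes[16])
{
    std::uint32_t mask = 0;
    for (int i = 0; i < 16; ++i) {
        mask |= static_cast<std::uint32_t>(bytes[i] >> 7) << i;
    }
    return mask;
}
```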

I appreciate any pull requests and improvements. Feel free to ping me and/or send your PRs.

Useful reading (in random order):

Basis Universal GPU Texture Codec by Binomial LLC

https://github.com/BinomialLLC/basis_universal

DXT1/DXT5 compressor. Originally written by Fabian "ryg" Giesen

https://github.com/nothings/stb/blob/master/stb_dxt.h

rg-etc1 encoder by Rich Geldreich

https://github.com/richgel999/rg-etc1

Fast, single source file BC1-5 and BC7/BPTC GPU texture encoders by Rich Geldreich

https://github.com/richgel999/bc7enc

ICBC - A High Quality BC1 Encoder by Ignacio Castano

https://github.com/castano/icbc

The squish open source DXT compression library. Originally written by Simon Brown.

https://github.com/Cavewhere/squish

Etc2Comp - Texture to ETC2 compressor. Originally written by Colt McAnlis.

https://github.com/google/etc2comp

https://medium.com/@duhroach/building-a-blazing-fast-etc2-compressor-307f3e9aad99

SIMD transposes by Fabian "ryg" Giesen

https://fgiesen.wordpress.com/2013/07/09/simd-transposes-1/

Performance Tuning for CPU. Part 2: Advanced SIMD Optimization by Marat Dukhan

https://docs.google.com/presentation/u/1/d/1I0-SiHid1hTsv7tjLST2dYW5YF5AJVfs9l4Rg9rvz48/htmlpresent

A few missing SSE intrinsics by Alfred Klomp

http://www.alfredklomp.com/programming/sse-intrinsics/

Accelerating Texture Compression with Intel Streaming SIMD Extensions by RADU V. (Intel)

https://software.intel.com/en-us/articles/accelerating-texture-compression-with-intel-streaming-simd-extensions

KTX (Khronos Texture) Library and Tools

https://github.com/KhronosGroup/KTX-Software

DXT Compression Techniques by Simon Brown

http://sjbrown.co.uk/2006/01/19/dxt-compression-techniques/

Extreme DXT Compression by Peter Uliciansky

http://www.cauldron.sk/files/extreme_dxt_compression.pdf

Real-Time YCoCg-DXT Compression by J.M.P. van Waveren and Ignacio Castano

https://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf

Real-Time DXT Compression by J.M.P. van Waveren

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.215.7942&rep=rep1&type=pdf

goofy's People

Contributors

adam-ce, sergeymakeev

goofy's Issues

GCC/Linux support

https://github.com/SergeyMakeev/Goofy/blob/master/GoofyTC/goofy_tc.h#L1453

blRangeY.m128i_u8[0] is not available outside MSVC.

Possible replacement (and I am no expert in using processor intrinsics):

alignas(16) unsigned char m128i_u8[16];
_mm_storeu_si128((__m128i*)m128i_u8, blRangeY);

const uint32_t block0a = etc1BrighnessRangeTocontrolByte[m128i_u8[0]] | ((baseColors.r0 << 3ull) & 0xFFFFFF);

Seems to work OK for me.
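Another option, when only byte 0 is needed, is to move the low 32 bits into a general-purpose register with the SSE2 intrinsic _mm_cvtsi128_si32 and mask off the low byte. This is a sketch, not something tested against goofy_tc.h. `firstByte` is a hypothetical helper, and it takes a plain byte array so that the example also builds on targets without SSE2.

```cpp
#include <cstdint>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define EXAMPLE_HAS_SSE2 1
#endif

// Returns byte 0 of a 16-byte vector without touching the MSVC-only
// m128i_u8 member.
inline std::uint8_t firstByte(const std::uint8_t bytes[16])
{
#ifdef EXAMPLE_HAS_SSE2
    const __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(bytes));
    // _mm_cvtsi128_si32 returns the low 32 bits; byte 0 is the low byte.
    return static_cast<std::uint8_t>(_mm_cvtsi128_si32(v) & 0xFF);
#else
    // Fallback for targets without SSE2.
    return bytes[0];
#endif
}
```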

Some other things needed

#if defined(__GNUC__) || defined(__clang__)
#  define ALIGN(x) __attribute__((aligned(x)))
#  define goofy_inline __attribute__((always_inline)) inline
#  define goofy_restrict __restrict
#elif defined(_MSC_VER)
#  define ALIGN(x) __declspec(align(x))
#  define goofy_inline __forceinline
#  define goofy_restrict __restrict
#else
#  error "Unknown compiler; can't define ALIGN"
#endif

    // constants
    ALIGN(16) static const uint32_t gConstEight[4] = { 0x08080808, 0x08080808, 0x08080808, 0x08080808 };
    ALIGN(16) static const uint32_t gConstSixteen[4] = { 0x10101010, 0x10101010, 0x10101010, 0x10101010 };
    ALIGN(16) static const uint32_t gConstMaxInt[4] = { 0x7f7f7f7f, 0x7f7f7f7f, 0x7f7f7f7f, 0x7f7f7f7f };

Possibly also x86intrin.h instead of immintrin.h

Also, the program completely fails if compressETC1 is not wrapped in #pragma optimize( "", off ).
