In this post I go over some simple data transpose options with some rather non-intuitive results.

Previous posts on this topic:

Part 1 - Intro

Part 2 - The Basics

Part 4 - Entropy

In the last post we determined the baseline of straight up LZMA compression on DXT5 data, which is 2.28bpp on average for my test data set from Orbital Comm Tower in Firefall.
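For reference, the bpp figures in this post are just compressed bits divided by pixel count. A minimal sketch of that arithmetic, assuming the standard DXT5 layout of 16 bytes per 4x4 block (8bpp before LZMA); `compressed_bpp` is a made-up helper name, not something from my tooling:

```c
#include <stddef.h>

/* DXT5 stores each 4x4 pixel block in 16 bytes, i.e. 8 bits per pixel
   before any further compression. */
enum { DXT5_BLOCK_BYTES = 16, DXT5_BLOCK_PIXELS = 16 };

/* Bits per pixel after compression: compressed bits divided by the
   number of pixels covered by the blocks. */
double compressed_bpp(size_t compressed_bytes, size_t block_count)
{
    return ((double)compressed_bytes * 8.0) /
           ((double)block_count * DXT5_BLOCK_PIXELS);
}
```

Uncompressed data comes out at exactly 8bpp with this function, so 2.28bpp is a bit under 29% of the original DXT5 size.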

## Transposing DXT5

Transposing here means converting the blocks from an array of structures (AOS) to a structure of arrays (SOA).

    // array of structures (AOS)
    struct {
        unsigned char alphaLineStart;
        unsigned char alphaLineEnd;
        unsigned char alphaSelectionBits[6];
        unsigned short colorLineStart;
        unsigned short colorLineEnd;
        unsigned int colorSelectionBits;
    } dxt5Blocks[1024];

    // structure of arrays (SOA)
    struct {
        unsigned char alphaLineStart[1024];
        unsigned char alphaLineEnd[1024];
        unsigned char alphaSelectionBits[1024*6];
        unsigned short colorLineStart[1024];
        unsigned short colorLineEnd[1024];
        unsigned int colorSelectionBits[1024];
    } dxt5Blocks;

Conversion to SOA should be better for compression because similar elements end up adjacent to each other, rather than interleaved (the default layout) with elements that are related but have different statistical properties.
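The transpose itself is a single pass over the blocks. Here is a minimal sketch, assuming fixed-width types in place of the `unsigned char`/`short`/`int` above and caller-allocated output arrays; the `Dxt5Block`, `Dxt5Soa` and `dxt5_transpose` names are invented for illustration:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One DXT5 block in its native interleaved layout (16 bytes, no padding). */
typedef struct {
    uint8_t  alphaLineStart;
    uint8_t  alphaLineEnd;
    uint8_t  alphaSelectionBits[6];
    uint16_t colorLineStart;
    uint16_t colorLineEnd;
    uint32_t colorSelectionBits;
} Dxt5Block;

/* Caller-provided output arrays, one per component. */
typedef struct {
    uint8_t  *alphaLineStart;      /* count entries     */
    uint8_t  *alphaLineEnd;        /* count entries     */
    uint8_t  *alphaSelectionBits;  /* count * 6 entries */
    uint16_t *colorLineStart;      /* count entries     */
    uint16_t *colorLineEnd;        /* count entries     */
    uint32_t *colorSelectionBits;  /* count entries     */
} Dxt5Soa;

/* Scatter each field of every block into its own contiguous array. */
void dxt5_transpose(const Dxt5Block *blocks, size_t count, Dxt5Soa *out)
{
    for (size_t i = 0; i < count; ++i) {
        out->alphaLineStart[i] = blocks[i].alphaLineStart;
        out->alphaLineEnd[i]   = blocks[i].alphaLineEnd;
        memcpy(out->alphaSelectionBits + i * 6,
               blocks[i].alphaSelectionBits, 6);
        out->colorLineStart[i]     = blocks[i].colorLineStart;
        out->colorLineEnd[i]       = blocks[i].colorLineEnd;
        out->colorSelectionBits[i] = blocks[i].colorSelectionBits;
    }
}
```

Each output array can then be handed to the compressor on its own, or concatenated with others first, which is what the permutations below explore.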

We have 3 textures of DXT5 blocks with 6 components for each block.

- color selection bits per pixel (C[0-2] bits)
- alpha selection bits per pixel (A[0-2] bits)
- a color line start (C[0-2] start)
- a color line end (C[0-2] end)
- an alpha line start (A[0-2] start)
- an alpha line end (A[0-2] end)

I attempted compressing the following permutations of these components:

- __2.28bpp__ - **Compress all arrays at once**. I.e., LZ(C0 bits, C1 bits, C2 bits, A0 bits, A1 bits, A2 bits, C0 start, C1 start, C2 start, C0 end, C1 end, C2 end, A0 start, A1 start, A2 start, A0 end, A1 end, A2 end).
- __2.38bpp__ - **Compress 18 different arrays individually**. I.e., LZ(C0 bits), LZ(C1 bits), LZ(C2 bits), LZ(A0 bits), LZ(A1 bits), LZ(A2 bits), LZ(C0 start), LZ(C1 start), LZ(C2 start), LZ(C0 end), LZ(C1 end), LZ(C2 end), LZ(A0 start), LZ(A1 start), LZ(A2 start), LZ(A0 end), LZ(A1 end), LZ(A2 end).
- __2.32bpp__ - **Compress 6 different arrays, one for each DXT5 component, where the component data of all 3 textures is concatenated end to end**. I.e., LZ(C[0-2] bits), LZ(A[0-2] bits), LZ(C[0-2] start), LZ(C[0-2] end), LZ(A[0-2] start), LZ(A[0-2] end).

An alternate variation is grouping the color start and end points together and treating them as a pair. This gives 4 different components to a DXT5 block:

- color selection bits per pixel (C[0-2] bits)
- alpha selection bits per pixel (A[0-2] bits)
- a color line start & end (C[0-2] line)
- an alpha line start & end (A[0-2] line)

I tested the following permutations with this variation of logical DXT elements:

- __2.27bpp__ - **Compress all arrays at once**. I.e., LZ(C bits, A bits, C line, A line).
- __2.32bpp__ - **Compress 12 different arrays individually**. I.e., LZ(C0 bits), LZ(C1 bits), LZ(C2 bits), LZ(A0 bits), LZ(A1 bits), LZ(A2 bits), LZ(C0 line), LZ(C1 line), LZ(C2 line), LZ(A0 line), LZ(A1 line), LZ(A2 line).
- __2.27bpp__ - **Compress 4 different arrays, one for each component, where the data of all 3 textures is concatenated end to end**. I.e., LZ(C bits), LZ(A bits), LZ(C line), LZ(A line).

There is also another variation that is better @ 2.26bpp and is also faster perf-wise: combine the color line and alpha line into one data set (non-interleaved). That leaves 3 components:

- color selection bits per pixel (C[0-2] bits)
- alpha selection bits per pixel (A[0-2] bits)
- a color/alpha line start & end (C[0-2] lines, A[0-2] lines)

I tried a few other variations but overall, 2.26bpp was the best.
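To make the combined "lines" data set concrete, here is a rough sketch of how it could be packed: every color start/end pair for all textures first, then every alpha start/end pair. The `Dxt5Block` type and `dxt5_pack_lines` function are hypothetical names, and the 16-bit endpoints are copied in host byte order, which I am assuming matches the file's little-endian layout:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One DXT5 block in its native interleaved layout (16 bytes). */
typedef struct {
    uint8_t  alphaLineStart;
    uint8_t  alphaLineEnd;
    uint8_t  alphaSelectionBits[6];
    uint16_t colorLineStart;
    uint16_t colorLineEnd;
    uint32_t colorSelectionBits;
} Dxt5Block;

/* Writes (C[0..n) lines, A[0..n) lines) into `out`, non-interleaved.
   `out` must hold 6 bytes per block across all textures.
   Returns the number of bytes written. */
size_t dxt5_pack_lines(const Dxt5Block *const textures[],
                       const size_t counts[], size_t texture_count,
                       uint8_t *out)
{
    uint8_t *p = out;

    /* Color lines: start/end pair of each block, texture after texture. */
    for (size_t t = 0; t < texture_count; ++t) {
        for (size_t i = 0; i < counts[t]; ++i) {
            memcpy(p, &textures[t][i].colorLineStart, 2); p += 2;
            memcpy(p, &textures[t][i].colorLineEnd, 2);   p += 2;
        }
    }

    /* Alpha lines follow only after all of the color lines. */
    for (size_t t = 0; t < texture_count; ++t) {
        for (size_t i = 0; i < counts[t]; ++i) {
            *p++ = textures[t][i].alphaLineStart;
            *p++ = textures[t][i].alphaLineEnd;
        }
    }
    return (size_t)(p - out);
}
```

The color and alpha selection bits would be gathered into their own two buffers the same way, giving three streams to compress.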

The question still remains though, which method described above is most complementary to the baseline? As in, which does the best job at compressing stuff that the baseline does not?

The best complementary transpose @ 2.26bpp is LZMA compression with the following knobs:

- LZ(C[0-2] bits), default knobs
- LZ(A[0-2] bits), default knobs
- LZ(C[0-2] line, A[0-2] line), lc = 0, lp = 1, pb = 1
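For anyone unfamiliar with the knobs, here is roughly how `lc` and `lp` select the literal context in LZMA, as I understand the format. `literal_context_index` is a made-up helper for illustration, not a call from liblzma or the LZMA SDK:

```c
#include <stdint.h>

/* Index of the literal probability table LZMA uses for the byte at
   offset `pos`: the low `lp` bits of the offset, combined with the
   top `lc` bits of the previous byte. */
unsigned literal_context_index(uint64_t pos, uint8_t prev_byte,
                               unsigned lc, unsigned lp)
{
    unsigned pos_bits  = (unsigned)(pos & ((1u << lp) - 1u));
    unsigned prev_bits = (unsigned)prev_byte >> (8u - lc);
    return (pos_bits << lc) + prev_bits;
}
```

With lc = 0 and lp = 1 the previous byte is ignored and the context only depends on whether the offset is even or odd. That lines up with the two-byte structure of the line data (alpha start vs. end, low vs. high byte of a 16-bit color endpoint), which is presumably why these settings do well here. `pb` plays the same alignment role for the match and length coders.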

The down side is that on average transposing does not make compression __significantly__ better when compared with optimally tuned straight up LZMA.

The up side is that if you switch between baseline and the best complementary transpose conditionally, you can get the size down a small bit further to 2.25bpp. A small but expected improvement of about 0.5%.
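The switching logic amounts to compressing both ways and keeping the smaller result. A minimal sketch, assuming one flag byte per item to record which layout was used (that bookkeeping cost and the names below are my assumptions for illustration):

```c
#include <stddef.h>

typedef enum { LAYOUT_BASELINE = 0, LAYOUT_TRANSPOSED = 1 } Layout;

/* Pick the smaller of two already-compressed encodings of the same data.
   Ties go to the baseline, since it needs no un-transpose on load. */
Layout choose_layout(size_t baseline_bytes, size_t transposed_bytes)
{
    return transposed_bytes < baseline_bytes ? LAYOUT_TRANSPOSED
                                             : LAYOUT_BASELINE;
}

/* Total size when choosing per item, including one flag byte per item
   so the decoder knows which layout to expect. */
size_t total_with_switching(const size_t *baseline, const size_t *transposed,
                            size_t n, Layout *choices)
{
    size_t total = 0;
    for (size_t i = 0; i < n; ++i) {
        choices[i] = choose_layout(baseline[i], transposed[i]);
        total += (choices[i] == LAYOUT_TRANSPOSED ? transposed[i]
                                                  : baseline[i]) + 1;
    }
    return total;
}
```

The catch is that compression time doubles, since every item has to be compressed with both layouts before you can compare.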

If, however, you do **not** have the option to optimally tune LZMA knobs (e.g. transferring to a web browser over JavaScript via gzip), trying out the best complementary layout may give better results in some cases than the more typical transpose of all elements. As always, be sure to measure and test your own data sets.