How to Compress an Index Buffer

1/26/2021

This is an article going over the technique to compress index buffers presented in my 2007,2008 presentations in more detail here.

Update: Fixed some bugs in my pseudo code. Also important to point out that your choice of index buffer optimizer greatly affects the effectiveness of this technique. Hence, look for a future post where I talk about that a bit.

The process is as follows:

First clean up the data...
1) Optimize the index buffers for cache coherency (another article on this later eventually, there's better ways than the typical methods)
2) Remove degenerate triangles (triangles with two or more indexes which are identical)
3) Remove duplicate triangles (triangles which occur more than once in the same index buffer, even if in different orders)
4) Order the vertexes such that they appear in the same order as the index buffer itself.

The next step we make note that each triangle is typically re-using many elements of the previous triangle - so don't repeat yourself! The most common patterns occur in triangle fans and triangle strips which the index buffer optimizer creates in abundance. This usually has two repeating indexes, and one new index. There are some other cases with only two new indexes and such, but they are rare enough that we just escape all other cases to 3 new indexes. Additionally, we can losslessly rotate the indexes of the triangle to reduce the set of possibilities we need to encode.

So we in effect create a 2 bit "repeat table", where we have an entry per triangle which describes the most common repeating patterns - and also creating a new index buffer which has those repeats omitted.

For the 2 bit table, each value would mean
0 == first index same as last triangle's first index, second index same as last triangle's third index.
1 = = first index same as last triangle's third index, second index same as last triangle's second index.
2 == first index same as last triangle's second index, second index same as last triangle's first index.
3 == escape to encode all 3 indexes as if not repeated.

In code this would be something like:
// need a 2-bit entry for each triangle, except the first one which is assumed to be "3"
char *repeat_table = (char*)malloc(num_indexes/3-1);
char *rptbl = repeat_table; // the 2-bit repeat table, in bytes. entropy coder will do the right thing later.
int *unq_idxs = unique_index_buffer;
int *idxs = index_buffer;
int * idxs_end = index_buffer + num_indexes;
// First triangle is always there.
*unq_idxs++ = *idxs++;
*unq_idxs++ = *idxs++;
*unq_idxs++ = *idxs++;
for(; idxs < idxs_end; idxs+=3) {
int *prev_tri = idxs-3;
bool is_idx_old[3] = {false};
int num_old_idxs = 0;
for(int j = 0; j < 3; ++j) {
if(idxs[j] == prev_tri[0] || idxs[j] == prev_tri[1] || idxs[j] == prev_tris[2]) {
is_idx_old[j] = true;
++num_old_idxs;
}
}
// Previous cleanups should guarantee that degenerate triangles do not happen.
assert(num_old_idxs != 3);
// If we have less than 2 index repeats, then just encode an escape into the repeat table.
if(num_old_idxs < 2) {
memcpy(unq_idxs, idxs, sizeof(*idxs)*3);
unq_idxs += 3;
*rptbl++ = 3;
}
// Rotate the triangle to match one of the encoding if possible.
// We want the two index repeats in the first two indexes of this triangle.
while(!is_idx_old[0] || !is_idx_old[1]) {
// Rotate the triangle
int tmp = idxs[0];
memmove(idxs, idxs+1, sizeof(*idxs)*2);
int btmp = is_idx_old[0];
memmove(is_idx_old, is_idx_old+1, sizeof(*is_idx_old)*2);
}
if(idxs[0] == prev_tri[0] && idxs[1] == prev_tri[2]) {
*unq_idxs++ = idxs[2];
*rptbl++ = 0;
continue;
}
if(idxs[0] == prev_tri[2] && idxs[1] == prev_tri[1]) {
*unq_idxs++ = idxs[2];
*rptbl++ = 1;
continue;
}
if(idxs[0] == prev_tri[1] && idxs[1] == prev_tri[0]) {
*unq_idxs++ = idxs[2];
*rptbl++ = 2;
continue;
}
memcpy(unq_idxs, idxs, sizeof(*idxs)*3);
unq_idxs += 3;
*rptbl++ = 3;
}

The next step is to notice that the indexes which are left are typically sequential (0,1,2,3,4,5,6,7,8,etc) as we re-ordered the vertex buffer to specifically be that way up-front. We can encode that explicitly with a 1-bit table which represents that...
0 == not sequential
1 == sequential

code for that would look something like...

int num_unique_indexes = unq_idxs - unique_index_buffer;
char *chaos_index_buffer = (char*)malloc(num_unique_indexes); // maximum possible size
char *chaos_idxs = chaos_index_buffer;
char *chaos_bits = (char*)malloc(num_unique_indexes);
for(int i = 0, j = 0; i < num_unqiue_indexes; ++i) {
if(unqiue_index_buffer[i] == j) {
chaos_bits[i] = 1;
++j;
} else {
chaos_bits[i] = 0;
*chaos_idxs++ = unique_index_buffer[i];
}
}

Next up in the compression is to notice that the difference in numbers is typically small and can be represented in a small number of bits by delta encoding. Word of warning though, that this was built in the time of SPUs where you had to chunk your data up - as such sometimes depending on your data, that the deltas encoded in the following may be not worth the effort. I would bet though that there would be some on average gain though for most uses.

int num_chaos_indexes = chaos_idxs - chaos_index_buffer;
// Delta-encode the indexes
for(int i = num_chaos_indexes-1; i >= 1; --i) {
chaos_index_buffer[i] -= chaos_index_buffer[i-1];
}

Since the values can go positive or negative, you next need to shift the data to all positive.

// First find the min value
int min_idx = INT_MAX;
for(int i = 0; i < num_chaos_indexes; ++i) {
min_idx = min_idx < chaos_index_buffer[i] ? min_idx : chaos_index_buffer[i];
}

// Then shift it up so its 0 based.
for(int i = 0; i < num_chaos_indexes; ++i) {
chaos_index_buffer[i] -= min_idx;
}

Finally, you need to encode the following data...
1) min_idx
2) chaos_bits
3) chaos_index_buffer
4) repeat_table

and you have a few choices for doing that.
1) You could just throw it at Kraken/Zip and let it handle all the heavy lifting.
2) You could just pack the data straight forward like.

min_idx is 4 bytes (or 2 bytes if this is known 16-bit index buffer)
repeat_table is 2-bits per entry
chaos_bits is 1 bit per entry
chaos_index_buffer is some number of bits, typically less than a full 16bits or 32bits per entry - so you can just find the minimum number of bits it can fit in and encode that directly.

Obviously, Kraken/Zip is probably going to get you the biggest wins, but for a higher overall computational cost vs bit unpacking.

When Kraken encoding, its probably beneficial to split the high and low bytes so that the compression doesn't have to do any heaving lifting there (high bytes should be almost always zero).

For now I leave the decoding of the data as an exercise for the reader - its pretty straight forward just doing the operations in reverse.

0 Comments

How to Compress an Index Buffer

Leave a Reply.

Archives

Categories