In this series of blog posts, I'm going to cover various parts of the research and development behind Bink 2 HDR.
There are four color standards to be aware of: Rec.601, Rec.709, Rec.2020, and Rec.2100. Rec.601, Rec.709, and Rec.2020 each define R, G, and B differently. Rec.2020 notably defines a "wide color gamut". Rec.2100 has the same color gamut as Rec.2020, but adds the Perceptual Quantizer (PQ) and Hybrid Log-Gamma (HLG) transfer functions for HDR.
First, when working with HDR color spaces, be sure to use linear RGB inputs. No sRGB, Adobe RGB, etc. What if you're loading a floating-point format like EXR or HDR? These are both linear, so no conversion *should* be necessary.
There are three color spaces in Rec.2100.
Non-Constant Luminance (NCL) YCbCr
This color space is very similar to LDR YCbCr after tone mapping from 10k Luma to LDR range.
The code for converting between linear RGB and NCL Y'Cb'Cr' is as follows...
// rgbx: linear Rec.2020 RGB normalized to 0..1, 4 floats per pixel (4th channel passed through)
void RGBtoYCbCr2100(float *rgbx, float *ycbcr, int num) {
    for(int i = 0; i < num*4; i+=4) {
        float r = rgbx[i+0], g = rgbx[i+1], b = rgbx[i+2];
        // Apply the PQ curve first; luma and chroma are computed from non-linear R'G'B'
        r = smpte2084_encodef(r);
        g = smpte2084_encodef(g);
        b = smpte2084_encodef(b);
        ycbcr[i+0] = 0.2627f*r + 0.6780f*g + 0.0593f*b;
        ycbcr[i+1] = -0.13963f*r - 0.36037f*g + 0.5f*b;
        ycbcr[i+2] = 0.5f*r - 0.45979f*g - 0.040214f*b;
        ycbcr[i+3] = rgbx[i+3];
    }
}
void YCbCrtoRGB2100(float *ycbcrx, float *rgbx, int num) {
    for(int i = 0; i < num*4; i+=4) {
        float Y = ycbcrx[i+0], Cb = ycbcrx[i+1], Cr = ycbcrx[i+2];
        // Recover non-linear R'G'B', then decode PQ back to linear
        float r = Y + 1.4746f*Cr;
        float g = Y - 0.164552f*Cb - 0.571352f*Cr;
        float b = Y + 1.8814f*Cb;
        r = smpte2084_decodef(r);
        g = smpte2084_decodef(g);
        b = smpte2084_decodef(b);
        rgbx[i+0] = r;
        rgbx[i+1] = g;
        rgbx[i+2] = b;
        rgbx[i+3] = ycbcrx[i+3];
    }
}
We can display these color spaces using LDR images via tone mapping - as shown below.
Constant Luminance ICtCp
Rec.2100, the latest standard, defines a new color space for HDR called ICtCp, which improves on both NCL and CL YCbCr.
The code for converting from Rec.2020 linear RGB to ICtCp is as follows...
void RGBtoICtCp(float *rgbx, float *ictcpx, int num) {
    for(int i = 0; i < num*4; i+=4) {
        double r = rgbx[i+0], g = rgbx[i+1], b = rgbx[i+2];
        // Linear RGB -> LMS
        double L = 0.4121093750000000*r + 0.5239257812500000*g + 0.0639648437500000*b;
        double M = 0.1667480468750000*r + 0.7204589843750000*g + 0.1127929687500000*b;
        double S = 0.0241699218750000*r + 0.0754394531250000*g + 0.9003906250000000*b;
        // PQ-encode each LMS channel
        L = smpte2084_encodef(L);
        M = smpte2084_encodef(M);
        S = smpte2084_encodef(S);
        // L'M'S' -> ICtCp
        ictcpx[i+0] = 0.5*L + 0.5*M;
        ictcpx[i+1] = 1.613769531250000*L - 3.323486328125000*M + 1.709716796875000*S;
        ictcpx[i+2] = 4.378173828125000*L - 4.245605468750000*M - 0.132568359375000*S;
        ictcpx[i+3] = rgbx[i+3];
    }
}
The inverse transform of ICtCp back to Rec.2020 RGB is as follows:
void ICtCptoRGB(float *ictcpx, float *rgbx, int num) {
    for(int i = 0; i < num*4; i+=4) {
        double I = ictcpx[i+0], T = ictcpx[i+1], P = ictcpx[i+2];
        // ICtCp -> L'M'S'
        double L = I + 0.00860903703793281*T + 0.11102962500302593*P;
        double M = I - 0.00860903703793281*T - 0.11102962500302593*P;
        double S = I + 0.56003133571067909*T - 0.32062717498731880*P;
        // PQ-decode back to linear LMS
        L = smpte2084_decodef(L);
        M = smpte2084_decodef(M);
        S = smpte2084_decodef(S);
        // LMS -> linear RGB
        rgbx[i+0] = 3.4366066943330793*L - 2.5064521186562705*M + 0.0698454243231915*S;
        rgbx[i+1] = -0.7913295555989289*L + 1.9836004517922909*M - 0.1922708961933620*S;
        rgbx[i+2] = -0.0259498996905927*L - 0.0989137147117265*M + 1.1248636144023192*S;
        rgbx[i+3] = ictcpx[i+3];
    }
}
ICtCp is claimed to provide an improved color representation designed for high dynamic range (HDR) and wide color gamut (WCG). It is also claimed that, in terms of CIEDE2000 color quantization error, 10-bit ICtCp is equivalent to 11.5-bit YCbCr. Constant luminance is also said to be improved: ICtCp has a luminance relationship of 0.998 between the luma and encoded brightness, while YCbCr has a luminance relationship of 0.819. Improved constant luminance is an advantage for color processing operations such as chroma subsampling and gamut mapping, where only color information is changed.
Note that I haven't verified these claims yet... Again, here is this color space converted to LDR via tone mapping.
Evaluation of Color Spaces
To evaluate the color spaces, I am looking for a few different properties (that come to mind)
Additional Snippets
// PQ-encoded 0..1 -> normalized linear 0..1
static double smpte2084_decodef(double fv) {
    double num, denom;
    fv = pow(fv, 1/SMPTE_2084_M2);
    num = fv - SMPTE_2084_C1;
    num = num < 0 ? 0 : num;
    denom = SMPTE_2084_C2 - SMPTE_2084_C3 * fv;
    return pow(num / denom, 1/SMPTE_2084_M1);
}
// Normalized linear 0..1 -> PQ-encoded 0..1
static double smpte2084_encodef(double v) {
    double tmp = pow(v, SMPTE_2084_M1);
    return pow((SMPTE_2084_C1 + SMPTE_2084_C2 * tmp) / (1 + SMPTE_2084_C3 * tmp), SMPTE_2084_M2);
}
References
Probably others, but these are the important ones.
In this series of blog posts, I'm going to cover various parts of the research and development behind Bink 2 HDR.
So the first thing is deciding on an encoding, of which there are very many to choose from. There is...
Just to name a few. With video games, we have additional constraints such as texture filtering and performance considerations. For example, bi-linear filtering is a linear operation, so the luma representation would have to behave correctly under linear transforms (or at least be fast enough to decode that it wouldn't matter to first decode and then interpolate). On top of that, for a video format like Bink, we need to consider various compression artifacts and what those would look like.
With so many different formats to choose from, you have to take a step back and instead look at the actual encoding used by the output itself, as that really determines what is best (or used directly). Which leads me to the next topic: SMPTE 2084.
SMPTE-2084 ... aka High Dynamic Range Electro-Optical Transfer Function of Mastering Reference Displays (try saying that 5 times fast!)
SMPTE-2084 is the format that Dolby Vision and HDR10 displays use, so it's basically the narrow part of the pipeline. Everything you want to display has to go through this non-linear encoding at some point before being displayed on the TV (and is decoded back to linear in the process as well). The SMPTE-2084 format is locked behind a paywall (yay). I have purchased it and will boil it down to what I believe are the most important parts.
The format defines luma in absolute values from 0 to 10,000 cd/m^2 (candelas per square meter). The caveat is that in real implementations of the spec, 10k luma won't actually be representable in anything but pure white. Additionally, actual displays deviate from the absolute curve due to output limitations and the effects of non-ideal viewing environments. While the format supports 10, 12, 14, and 16-bit luma representations, as currently deployed Dolby Vision is 12-bit and HDR10 is 10-bit. 14 and 16-bit are not widely deployed, if deployed at all anywhere other than the reference monitor.
Additionally, these are positive numbers only. No negative values. The code to encode/decode SMPTE-2084 is as follows.
#define SMPTE_2084_M1 (2610.f/4096*0.25f)
#define SMPTE_2084_M2 (2523.f/4096*128)
#define SMPTE_2084_C1 (3424.f/4096)
#define SMPTE_2084_C2 (2413.f/4096*32)
#define SMPTE_2084_C3 (2392.f/4096*32)
// Gives a value of 0 .. 10,000 in linear absolute brightness
float smpte2084_decode(unsigned v, int bits) {
    float fv, num, denom;
    fv = v / ((1 << bits)-1.f);
    fv = fv > 1.f ? 1.f : fv; // Clamp 0 .. 1
    fv = powf(fv, 1.f/SMPTE_2084_M2);
    num = fv - SMPTE_2084_C1;
    num = num < 0.f ? 0.f : num;
    denom = SMPTE_2084_C2 - SMPTE_2084_C3 * fv;
    return powf(num / denom, 1.f/SMPTE_2084_M1) * 10000.f;
}
// Gives a value between 0 and 2^bits-1 (non-linear)
unsigned smpte2084_encode(float v, int bits) {
    float n, tmp;
    v /= 10000.f;
    v = v > 1.f ? 1.f : v < 0.f ? 0.f : v; // Clamp 0 .. 1
    tmp = powf(v, SMPTE_2084_M1);
    n = powf((SMPTE_2084_C1 + SMPTE_2084_C2 * tmp) / (1 + SMPTE_2084_C3 * tmp), SMPTE_2084_M2);
    return (unsigned)floorf(((1 << bits)-1) * n + 0.5f); // Round to nearest code value
}
The pretty nice thing about the limited range here (10 to 12 bits) is that you can pre-generate a table to decode into floats and store it in a texture or constant buffer or whatever. This makes decoding rather inexpensive! There are still some open questions here regarding its suitability as a video encoding format that I have. Namely...
To be continued....
Announcing my STB style 256 lines of code MPEG1/2 writer!
In this initial release the features are:
1) 256 lines of C code (single file)
2) no memory allocations
3) public domain
4) patent free (as far as I know)
There is a lot left to be done, notably the encoding of P frames and audio. However, this is very useful as is for a drop-in MPEG writer in any project.
Basic Usage:
FILE *fp = fopen("foo.mpg", "wb");
jo_write_mpeg(fp, frame0_rgbx, width, height, 60);
jo_write_mpeg(fp, frame1_rgbx, width, height, 60);
jo_write_mpeg(fp, frame2_rgbx, width, height, 60);
// ...
fclose(fp);
Note that this writer takes advantage of the fact that MPEG1/2 is designed so that you can literally concatenate files together to combine movies. Technically each frame is its own movie (until P-frames are implemented, then a set of frames would be its own movie). Some video players mess up here and don't decode correctly, but MPlayer, SMPlayer, FFMpeg and others work correctly. So this is great as a quick intermediate format!
Have fun and code responsibly! (j/k)
Many years ago I started work on translating a poorly written paper on a really good shadow mapping technique for publication in Game Engine Gems. I never finished the paper, but the technique is used in production in Firefall. Of all the shadow mapping techniques I've tried, it's the best for Firefall's use case. Rather than wait for perfection, I figured I'd just post it in case somebody finds it useful.
Basic Usage:
// 4 component. RGBX format, where X is unused
char *frame = new char[128*128*4];
jo_gif_t gif = jo_gif_start("foo.gif", 128, 128, 0, 32);
jo_gif_frame(&gif, frame, 4, false); // frame 1
jo_gif_frame(&gif, frame, 4, false); // frame 2
jo_gif_frame(&gif, frame, 4, false); // frame 3, ...
jo_gif_end(&gif);
Where frame holds the RGBA pixels for a frame. You call start, then frame a bunch of times, then end.
Here is a more interesting example used to create the image above. :) Enjoy!
void hsv2rgb(float hsv[3], float rgb[3]) {
    if(hsv[1] <= 0.0) { // < is bogus, just shuts up warnings
        rgb[0] = rgb[1] = rgb[2] = hsv[2];
        return;
    }
    float hh = hsv[0];
    if(hh >= 360) {
        hh = 0;
    }
    hh /= 60;
    long i = (long)hh;
    float ff = hh - i;
    float p = hsv[2] * (1.f - hsv[1]);
    float q = hsv[2] * (1.f - (hsv[1] * ff));
    float t = hsv[2] * (1.f - (hsv[1] * (1.f - ff)));
    switch(i) {
    case 0: rgb[0] = hsv[2]; rgb[1] = t; rgb[2] = p; break;
    case 1: rgb[0] = q; rgb[1] = hsv[2]; rgb[2] = p; break;
    case 2: rgb[0] = p; rgb[1] = hsv[2]; rgb[2] = t; break;
    case 3: rgb[0] = p; rgb[1] = q; rgb[2] = hsv[2]; break;
    case 4: rgb[0] = t; rgb[1] = p; rgb[2] = hsv[2]; break;
    case 5: rgb[0] = hsv[2]; rgb[1] = p; rgb[2] = q; break;
    }
}
int main(int argc, char **argv) {
    const int w = 256, h = 256;
    jo_gif_t gif = jo_gif_start("foo.gif", w,h,0,32);
    for(int frame = 0; frame < 360/4; ++frame) {
        unsigned char tmp[w*h*4];
        double coordX = -0.74529;
        double coordY = 0.113075;
        double zoom = 1.5E-4*0.5;
        for(int y = 0; y < h; ++y) {
            for(int x = 0; x < w; ++x) {
                // Map the pixel into the zoomed region of the Mandelbrot set
                double x0 = (x/double(w) * 3.5 - 2.5) * zoom + coordX;
                double y0 = (y/double(h) * 3.0 - 1.5) * zoom + coordY;
                double xx = 0, yy = 0;
                int iter = 0;
                while(xx*xx + yy*yy < 2*2 && iter++ < 4096) {
                    double xtmp = xx*xx - yy*yy + x0;
                    yy = 2*xx*yy + y0;
                    xx = xtmp;
                }
                int i = y*w*4+x*4;
                // Rotate the hue per frame to animate the colors
                int iter2 = iter + 360 - frame*4;
                float hsv[3] = { float(iter2%360), 1, iter2 < 4096 ? 1.f : 0.f };
                float rgb[3];
                hsv2rgb(hsv, rgb);
                tmp[i+0] = (unsigned char)(rgb[0] * 255);
                tmp[i+1] = (unsigned char)(rgb[1] * 255);
                tmp[i+2] = (unsigned char)(rgb[2] * 255);
                tmp[i+3] = 255;
            }
        }
        jo_gif_frame(&gif, tmp, 4, false);
    }
    jo_gif_end(&gif);
}
The Crescent Bay demo was very good. The headset was light, latency was fantastic, the picture was solid and sharp taking very good advantage of low persistence, the content was beautifully rendered and very enjoyable. All told a very solid improvement over DK2.
The Valve demo was better and it's hard to explain why.
Their tracking was just as good, their latency and/or persistence was *slightly* worse, I *think* the headset was heavier (I didn't have both at the same time to directly compare), they had no HRTF audio - so it wasn't technically a win but still it was better for some reason... Why? I think for a few reasons. One is that the walkable area is 5 times bigger than what Oculus demoed, increasing immersion immensely. Second is they had perfectly tracked controllers, which provided a way for you to interact with the virtual world in a very fun and personal way. Third (and most important) is that the content they showed was amazingly fun and really showed off the walkable area and controller interaction.
The first demo was a small controller introduction where you would press the right trigger and a balloon would blow up out of your hand and float away. It was physically simulated so that you could then interact with the balloon with the controllers. At one point I tried to catch the balloon by instinctively pressing it against myself, but it went right through me (which was a very weird sensation). This demo was so incredibly fun.
The second demo iirc was where I was on a bridge of a sunken ship under the ocean. Lots of creatures swam by including a giant whale. Very peaceful.
Next I think was the VR painting demo which showed a really cool 3d interface and some pretty awesome painting. Very beautiful and very fun!
Another demo was of a tabletop game where miniature people were fighting each other. Pretty cool, but nothing to write home about there. Though I can see a cool game being made with this kind of setup.
There was a surgeon simulator demo which was pretty darn awesome :) You are in space with an alien on the table and another table with various tools on it. Your controllers turned into hands which you could open and close very similarly to a prosthetic hand.
It was a little awkward, but I laughed and had a lot of fun doing surgery then taking alien organs and stuffing them into the alien's mouth. Lol
There was another demo where you would walk around and depending on where you walked the room would change to a different room. I think this game was there to show off a kind of transportation travel method. It was fun, but a bit confusing.
Last demo was an Aperture Science demo where they had you open drawers, pull levers and try to fix a broken robot. Was lots of fun. Then finally the walls were torn away to find yourself in a shipping crate. A giant robot came by. The floor started to tear away. Was just fantastic.
That's the end of the demos and then something funny happened that I can't fully explain. When the headset came off, I had a very primal need to get back into VR. I didn't *want* to get back into VR, I *needed* to. Something was compelling me. I noticed it right away as foreign and was a bit confused about how a VR experience can elicit a drug-like response. I spend a lot of time in VR, so this is very unusual. I think the primary cause is the content difference, but I can't be sure. I have been told I am not the only one, and that is kinda cool and also kinda scary.
It means awesome things if you are a VR dev as it means VR will spread like an unstoppable virus. They will not be able to make VR headsets fast enough to meet demand - not by a long shot. The down side is that there are some possibly serious and negative societal side effects of VR. That is beyond what I already worried about before it was addictive. We may see some government regulation of VR.
All told though as a VR dev myself, I'm super excited and impressed that VR has come this far in such a short time. One thing is clear, the future is virtual.
Welcome to part 4 of the DXT compression series. In this series I go over the techniques and results used to compress Firefall's texture data as they are uncovered and implemented.
In this post I go over a lossy algorithm for reducing the data size on disk - or how I went from 2.5bpp to 1.5bpp with very little visible and measurable loss in quality.
Previous posts on this topic:
Part 1 - Intro
Part 2 - The Basics
Part 3 - Transposes
In Part 2 we determined the baseline of optimized LZMA compression on DXT5 data, which is 2.28bpp on average for my test data set from Orbital Comm Tower in Firefall. In Part 3, we went over various transposes of data and found that they make only a small impact, bringing the average to 2.25bpp.
Alright, here we go!
Welcome to part 3 of the DXT compression series. In this series I go over the techniques and results used to compress Firefall's texture data as they are uncovered and implemented.
In this post I go over some simple data transpose options with some rather non-intuitive results.
Previous posts on this topic:
Part 1 - Intro
Part 2 - The Basics
Part 4 - Entropy
In the last post we determined the baseline of straight up LZMA compression on DXT5 data, which is 2.28bpp on average for my test data set from Orbital Comm Tower in Firefall.
Welcome to part 2 of the DXT compression series. In this series I go over the techniques and results used to compress Firefall's texture data as they are discovered and implemented. Red 5 Studios has graciously allowed me to post about this work publicly with the intention that peer review and group process will end up with something better overall, not only just for Red 5 but for others in the industry as well. So please do comment and suggest improvements if you have ideas or thoughts on the matter :)
Today I've been researching various DXT compression algorithms that attempt to reduce the on disk footprint of DXT textures. I have a loose requirement though that it has to be lossless. The reason is we already have blocking artifacts due to DXT and I don't want to make them worse. The exception of course is if it really is absolutely not noticeable, even to an artist.
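For reference, the bpp (bits per pixel) figures quoted throughout this series are just the on-disk size in bits divided by the pixel count. A small sketch of that arithmetic follows; the helper names bits_per_pixel and dxt5_raw_bytes are my own. Raw DXT5 stores each 4x4 block in 16 bytes, which works out to 8bpp before any further compression, so a result like 2.28bpp is roughly 28.5% of the raw DXT5 size.

```c
#include <stddef.h>

// Hypothetical helper: bits per pixel for a texture stored in `bytes` bytes.
static double bits_per_pixel(size_t bytes, unsigned width, unsigned height) {
    return (bytes * 8.0) / ((double)width * (double)height);
}

// Size of raw (not further compressed) DXT5 data for a single mip level:
// 16 bytes per 4x4 block, with partial blocks rounded up.
static size_t dxt5_raw_bytes(unsigned width, unsigned height) {
    size_t blocks_x = (width + 3) / 4;
    size_t blocks_y = (height + 3) / 4;
    return blocks_x * blocks_y * 16;
}
```

Feeding the size of the LZMA output into bits_per_pixel instead of the raw size gives numbers directly comparable to the ones in these posts.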