Comments on chilliant: sRGB Integer Conversions

Linear to sRGB is the hard part here. The linear t...

2023-11-26T18:55:20.783+00:00

Linear to sRGB is the hard part here. The linear-to-sRGB curve resembles a square root and is approximated poorly by polynomials, which makes it extremely difficult to round-trip all 256 values. The lookup table method uses a 3295-byte table that maps linear 0 to 3294 onto sRGB 0 to 255, with the scale chosen so that the 1/12.92 slope of the linear segment corresponds to a step size of 1. Bit scan reverse or find first set could be used to split the linear value into an exponent and mantissa and possibly compress the table, but that doesn't seem applicable to vectorized implementations such as MMX or SSE2.

One possible way to approximate sRGB to linear from an 8-bit value i is int16_t j=i<<7; int16_t a=10810; a=(a*j)>>16; a+=7424; a=(a*j)>>16; a+=498; a=(a*j)>>16; which produces a range of 0 to 3424 and can be implemented with the PUNPCKLBW, PSLLW, PMULHW, and PADDW instructions. The reverse direction, however, doesn't seem numerically stable enough for a single polynomial, and a piecewise method using the PCMPGTW instruction might run out of registers if you tried to splice two quadratics and keep the constants in registers.

I'm new to SIMD/vector instructions, so I'm still trying to figure things out here. I've left the GNU/LLVM duopoly, so I'm trying to optimize all I can.
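
For reference, the sRGB-to-linear polynomial quoted above can be written out as a scalar C function. This is only a sketch of the same arithmetic: the function name is made up for illustration, and the comments note which of the named instructions each step would presumably correspond to in a vectorized version.

```c
#include <stdint.h>

/* Scalar sketch of the cubic quoted above: 8-bit sRGB in, linear value in
   the range 0..3424 out. The function name is hypothetical. */
static int16_t srgb8_to_linear_poly(uint8_t i)
{
    /* Widen to 16 bits and scale to 0..32640 (PUNPCKLBW, then PSLLW by 7). */
    int16_t j = (int16_t)(i << 7);
    int16_t a = 10810;

    /* Each multiply keeps the high 16 bits of the signed 16x16 product,
       which is what PMULHW computes; each add is a PADDW. */
    a = (int16_t)((a * j) >> 16);
    a = (int16_t)(a + 7424);
    a = (int16_t)((a * j) >> 16);
    a = (int16_t)(a + 498);
    a = (int16_t)((a * j) >> 16);
    return a;
}
```

Every intermediate value stays non-negative and below 32768, so the signed high multiply never sees a negative operand. Because all three coefficients are positive, the result never decreases as i increases.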

If you're still reading after all these years,...

2019-02-22T02:18:31.779+00:00

If you're still reading after all these years, this is my attempt:
https://github.com/ncruces/go-image/blob/master/imageutil/srgb.go

My approach is different, with two 512-byte LUTs, one each for the forward and reverse transforms.

Isn't the final table two kilobytes, not one? ...

2018-10-28T06:20:38.855+00:00

Isn't the final table two kilobytes, not one?

This is interesting, but it's kind of unfortunate that your focus is on converting to linear instead of from linear. I'm not sure what your use case is for this, because I don't know when you would ever have 16-bit data in the sRGB gamma, let alone need to convert it to linear as fast as possible. In the real world, most input data would be 8-bit sRGB, and you want to convert it to 16-bit linear to do high-precision linear transformations. This can be done directly with a 512-byte LUT (256 entries of 16 bits each). The hard part is converting 16-bit linear back down to 8-bit sRGB efficiently without a huge 64 KB LUT.
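
As a rough illustration of the 512-byte table mentioned above, here is a minimal C sketch that fills a 256-entry uint16_t table from the standard sRGB transfer function. The names are hypothetical, and scaling linear to the full 0..65535 range is an assumption (the comment does not say which scale is intended). To keep the sketch free of a libm dependency, x^2.4 is computed as x^2 times the fifth root of x^2 using Newton's method; pow(x, 2.4) from <math.h> would do the same job.

```c
#include <stdint.h>

/* 256 entries * 2 bytes = 512 bytes. Name and 0..65535 scale are assumptions. */
static uint16_t srgb8_to_linear16[256];

/* Fifth root of v for 0 <= v <= 1 (or marginally above), by Newton's method
   on r^5 = v. Starting at r = 1 the iterates approach the root from above. */
static double fifth_root(double v)
{
    if (v <= 0.0)
        return 0.0;
    double r = 1.0;
    for (int k = 0; k < 60; k++) {
        double r4 = r * r * r * r;
        r = (4.0 * r + v / r4) / 5.0;
    }
    return r;
}

/* Fill the table with the sRGB decoding curve, rounded to nearest. */
static void init_srgb8_to_linear16(void)
{
    for (int i = 0; i < 256; i++) {
        double s = i / 255.0;
        double lin;
        if (s <= 0.04045) {
            lin = s / 12.92;                 /* linear segment near black */
        } else {
            double x = (s + 0.055) / 1.055;
            double v = x * x;
            lin = v * fifth_root(v);         /* x^2 * x^0.4 = x^2.4 */
        }
        srgb8_to_linear16[i] = (uint16_t)(lin * 65535.0 + 0.5);
    }
}
```

After calling init_srgb8_to_linear16() once, decoding a pixel is a single indexed load, srgb8_to_linear16[value]. The reverse direction is not covered by this sketch.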

I imagine your solution here would work just as well though, and since typically we only care about 8-bit sRGB, it would probably work with just a 16-bit LUT for the linear piecewise approximation, so it really would be 1 KB. It's not exactly straightforward to compute the tables (and get the rounding right), but I can't find any better solution.