Petr Kobalíček has spurred me on to look more closely at the HLSL conversion between RGB and HSV colour spaces. With a bit of refactoring, I've managed to shave a few more GPU cycles off:
float3 RGBtoHSV(in float3 RGB)
{
    float3 HSV = 0;
#if NO_ASM
    // V is the largest channel; chroma C is the spread between largest and smallest
    HSV.z = max(RGB.r, max(RGB.g, RGB.b));
    float M = min(RGB.r, min(RGB.g, RGB.b));
    float C = HSV.z - M;
#else
    float4 RGB4 = RGB.rgbr;
    asm { max4 HSV.z, RGB4 };
    asm { max4 RGB4.w, -RGB4 };   // RGB4.w = -min(R, G, B)
    float C = HSV.z + RGB4.w;
#endif
    if (C != 0)
    {
        float4 RGB0 = float4(RGB, 0);
        float4 Delta = (HSV.z - RGB0) / C;   // Delta.w = V / C
        Delta.rgb -= Delta.brg;
        Delta.rgb += float3(2,4,6);
        Delta.rgb *= step(HSV.z, RGB.gbr);   // keep only the term for the largest channel
#if NO_ASM
        HSV.x = max(Delta.r, max(Delta.g, Delta.b));
#else
        float4 Delta4 = Delta.rgbr;
        asm { max4 HSV.x, Delta4 };
#endif
        HSV.x = frac(HSV.x / 6);
        HSV.y = 1 / Delta.w;   // S = C / V
    }
    return HSV;
}
Monday 31 January 2011
Sunday 30 January 2011
16-bit Shifts on Z80
One would think that 16-bit, unsigned, binary shifts on a Z80 microprocessor would be as trivial as they come. But I've only recently realised just how little has actually been written down about optimising Z80 code for even these simplest of cases. So here's a quick guide; alas, probably thirty years too late!
These shifts aren't obvious because of the inherent asymmetry in the Z80 processor. Although it has a fairly orthogonal 8-bit ALU, its 16-bit (address) pipeline only has a simple adder. Multiplying HL by two (a left shift) is trivial: ADD HL, HL. Shifting right is a whole can of worms; to shift a 16-bit quantity to the right, it is sometimes quicker (and/or shorter) to shift/rotate left and adjust.
I've written a library of routines to perform the shifts, including some that use self-modifying code to eke out a few more T-cycles. I believe they're optimal. Anyone out there know different?
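To make the asymmetry concrete, here is a minimal sketch. It is not taken from the library mentioned above, and the choice of a 7-bit shift is only an illustrative assumption. The three fragments are independent of one another, each treats HL as an unsigned 16-bit value, and the T-state counts are the standard documented Z80 timings.

```z80
; HL << 1: the 16-bit adder does it in one instruction (11 T-states).
        ADD  HL,HL

; HL >> 1: no 16-bit equivalent, so shift each half through carry
; (8 + 8 = 16 T-states).
        SRL  H          ; H >>= 1, old bit 0 of H goes into carry
        RR   L          ; L >>= 1, carry comes in as bit 7

; HL >> 7 by shifting LEFT once and adjusting
; (11 + 4 + 7 + 8 = 30 T-states, against 7 * 16 = 112 for
; seven SRL H / RR L pairs).
        ADD  HL,HL      ; carry:H:L now holds HL << 1
        LD   L,H        ; dropping the low byte is a right shift by 8
        LD   H,0        ; LD leaves the carry flag untouched
        RL   H          ; H = carry, i.e. the original bit 15
```

For example, starting from HL = FFFFh, the last fragment leaves HL = 01FFh, which is FFFFh shifted right by seven places.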