Petr Kobalíček has spurred me on to look more closely at the HLSL conversion between the RGB and HSV colour spaces. With a bit of refactoring, I've managed to shave a few more GPU cycles off:

    float3 RGBtoHSV(in float3 RGB)
    {
        float3 HSV = 0;
    #if NO_ASM
        // V = max(R, G, B); chroma C = V - min(R, G, B)
        HSV.z = max(RGB.r, max(RGB.g, RGB.b));
        float M = min(RGB.r, min(RGB.g, RGB.b));
        float C = HSV.z - M;
    #else
        // Pad to four components (R repeated) for max4
        float4 RGB4 = RGB.rgbr;
        asm { max4 HSV.z, RGB4 };
        // max(-R, -G, -B) = -min(R, G, B), so C = V + RGB4.w
        asm { max4 RGB4.w, -RGB4 };
        float C = HSV.z + RGB4.w;
    #endif
        if (C != 0)
        {
            // The w component carries V / C, used for saturation below
            float4 RGB0 = float4(RGB, 0);
            float4 Delta = (HSV.z - RGB0) / C;
            Delta.rgb -= Delta.brg;
            Delta.rgb += float3(2,4,6);
            // Keep only the candidates whose channel equals the maximum
            Delta.rgb *= step(HSV.z, RGB.gbr);
    #if NO_ASM
            HSV.x = max(Delta.r, max(Delta.g, Delta.b));
    #else
            float4 Delta4 = Delta.rgbr;
            asm { max4 HSV.x, Delta4 };
    #endif
            HSV.x = frac(HSV.x / 6);
            HSV.y = 1 / Delta.w;    // S = C / V
        }
        return HSV;
    }
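For checking the shader's arithmetic off the GPU, here is a minimal CPU-side sketch in C that follows the `NO_ASM` path step by step. It is an illustration rather than part of the shader: the `HSV` struct and the helper names (`max3`, `min3`, `step_ge`, `frac_nonneg`, `rgb_to_hsv_ref`) are hypothetical, and it assumes inputs in the range [0, 1].

```c
#include <assert.h>

/* Hypothetical helper type for this sketch: h, s, v each in [0, 1]. */
typedef struct { float h, s, v; } HSV;

float max3(float a, float b, float c) { float m = a > b ? a : b; return m > c ? m : c; }
float min3(float a, float b, float c) { float m = a < b ? a : b; return m < c ? m : c; }

/* Same behaviour as HLSL step(edge, x): 1 when x >= edge, else 0. */
float step_ge(float edge, float x) { return x >= edge ? 1.0f : 0.0f; }

/* frac() for non-negative inputs only, which is all the hue path needs. */
float frac_nonneg(float x) { assert(x >= 0.0f); return x - (float)(int)x; }

HSV rgb_to_hsv_ref(float r, float g, float b)
{
    HSV out = { 0.0f, 0.0f, 0.0f };
    out.v = max3(r, g, b);
    float c = out.v - min3(r, g, b);          /* chroma */
    if (c != 0.0f) {
        /* Delta = (V - float4(RGB, 0)) / C */
        float dr = (out.v - r) / c;
        float dg = (out.v - g) / c;
        float db = (out.v - b) / c;
        float dw = out.v / c;
        /* Delta.rgb -= Delta.brg; Delta.rgb += (2, 4, 6) */
        float hr = dr - db + 2.0f;            /* hue candidate when G is the maximum */
        float hg = dg - dr + 4.0f;            /* hue candidate when B is the maximum */
        float hb = db - dg + 6.0f;            /* hue candidate when R is the maximum */
        /* Delta.rgb *= step(V, RGB.gbr): zero the candidates that do not apply */
        hr *= step_ge(out.v, g);
        hg *= step_ge(out.v, b);
        hb *= step_ge(out.v, r);
        out.h = frac_nonneg(max3(hr, hg, hb) / 6.0f);
        out.s = 1.0f / dw;                    /* S = C / V */
    }
    return out;
}
```

Calling `rgb_to_hsv_ref` on a handful of known colours (primaries, secondaries, a grey) and comparing against the shader's output is a quick way to catch swizzle mistakes.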

## Monday, 31 January 2011

## Sunday, 30 January 2011

### 16-bit Shifts on Z80

One would think that 16-bit, unsigned, binary shifts on a Z80 microprocessor would be as trivial as they come. But I've only recently realised just how little has actually been written down about optimising Z80 code for even these simplest of cases. So here's a quick guide; alas, probably thirty years too late!

These shifts aren't obvious because of the inherent asymmetry in the Z80 processor. Although it has a fairly orthogonal 8-bit ALU, its 16-bit (address) pipeline only has a simple adder. Multiplying HL by two (a left shift) is trivial: ADD HL, HL. Shifting right is a whole can of worms; to shift a 16-bit quantity to the right, it is sometimes quicker (and/or shorter) to shift or rotate left and then adjust.

I've written a library of routines to perform the shifts, including some that use self-modifying code to eke out a few more T-cycles. I believe they're optimal. Anyone out there know different?
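To make the "shift left and adjust" idea concrete, here is a small C model of one case: a right shift by seven done as a left shift by one followed by a byte move, since HL >> 7 equals (HL << 1) >> 8 when the bit carried out of the left shift is kept. This is only a sketch under my own assumptions, not one of the library routines mentioned above and not claimed to be optimal; the function names are hypothetical, and the Z80 sequence in the comments is one plausible mapping.

```c
#include <assert.h>
#include <stdint.h>

/* Direct version: seven single-bit right shifts.
   On a Z80 this corresponds to repeating SRL H / RR L seven times
   (16 T-states per pair, 112 in total). */
uint16_t shr7_direct(uint16_t hl)
{
    for (int i = 0; i < 7; ++i)
        hl >>= 1;
    return hl;
}

/* Shift left once, then take the top 16 bits of the 17-bit result.
   Models: XOR A / ADD HL,HL / RLA / LD L,H / LD H,A (27 T-states). */
uint16_t shr7_via_left(uint16_t hl)
{
    uint8_t a = 0;                            /* XOR A                      */
    uint8_t carry = (uint8_t)(hl >> 15);      /* bit 15 leaves via carry    */
    hl = (uint16_t)(hl << 1);                 /* ADD HL,HL                  */
    a = (uint8_t)((a << 1) | carry);          /* RLA: carry into bit 0 of A */
    assert(a <= 1);                           /* A holds only the old bit 15 */
    uint8_t h = (uint8_t)(hl >> 8);
    uint8_t l = h;                            /* LD L,H                     */
    h = a;                                    /* LD H,A                     */
    return (uint16_t)((h << 8) | l);
}

/* Exhaustive comparison over all 65536 inputs; returns 1 if they agree. */
int shr7_matches_everywhere(void)
{
    for (uint32_t v = 0; v <= 0xFFFFu; ++v)
        if (shr7_direct((uint16_t)v) != shr7_via_left((uint16_t)v))
            return 0;
    return 1;
}
```

The same pattern applies to other shift counts near a byte boundary: shift left by (8 - n), collecting the bits that spill out of H, then move bytes instead of shifting right n times.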
