chilliant: Modulo Arithmetic in Z80 assembler

Sunday, 26 December 2010

Modulo Arithmetic in Z80 assembler

I've done a little 8-bit assembly coding in the past, mostly Z80 (on Sinclair machines) and 6502 (on BBC Micro), but I've never really thought about arithmetic optimisations for those platforms as much as I have the last couple of months.

Imagine that you are (in the 21st century) writing Z80 date manipulation functions where the day of the week was important. You'd naturally want to perform "modulo 7" arithmetic on 16-bit quantities. If you had a routine to perform integer division by 7, you might find "z = x mod 7" given "y = floor(x / 7)" for unsigned 16-bit values as follows:

; Z80 assembly to compute 'x mod 7' from 'x' and 'x div 7'
; 'x' in HL
; 'x div 7' in DE
LD B, D ; 4T 1B -- Take a copy of DE
LD C, E   ; 4T 1B -- BC := x div 7
EX DE, HL ; 4T 1B -- DE := x, HL := x div 7
ADD HL, HL    ; 11T 1B -- HL := (x div 7) * 2
ADD HL, HL    ; 11T 1B -- HL := (x div 7) * 4
ADD HL, HL    ; 11T 1B -- HL := (x div 7) * 8
AND A       ; 4T 1B -- Clear carry flag
SBC HL, BC ; 15T 2B -- HL := (x div 7) * 7
EX DE, HL    ; 4T 1B -- HL := x, DE := (x div 7) * 7
AND A       ; 4T 1B -- Clear carry flag
SBC HL, BC ; 15T 2B -- HL := x - (x div 7) * 7
LD A, L ; 4T 1B -- A:= x mod 7
; 'x mod 7' in A

At first glance, this seems fairly fast, compact code (91 T-cycles and 14 code bytes), but notice that, by the end of it, one side-effect is setting H to zero. Not very surprising as "x mod 7" is clearly an 8-bit quantity. So why go to the trouble of computing full 16-bit intermediates that have only partial bearing on the final result? Providing that the upper eight bits don't "pollute" the lower eight (e.g. during division or change of sign), you need not compute them. Especially as 16-bit arithmetic is disproportionately expensive on Z80:

Obviously uses more precious registers,
Addition of 16-bit quantites takes at least 11 T-cycles compared to 4 T-cycles for 8-bit quantities,
Subtraction via 'SBC' requires worrying about the carry flag as there is no 16-bit 'SUB' instruction,
Shifting 16-bit quantities (multiplication and division by two) other than 'ADD HL, HL' requires two, long instructions.

So the 8-bit equivalent could be:

; Z80 assembly to compute 'x mod 7' from 'x' and 'x div 7'
; 'x' in HL
; 'x div 7' in DE
LD A, E ; 4T 1B -- A := x div 7 [low bits]
ADD A, A    ; 4T 1B -- A := (x div 7) * 2 [low bits]
ADD A, A    ; 4T 1B -- A := (x div 7) * 4 [low bits]
ADD A, A    ; 4T 1B -- A := (x div 7) * 8 [low bits]
SUB E       ; 4T 1B -- A := (x div 7) * 7 [low bits]
NEG      ; 8T 2B -- A := (x div 7) * -7 [low bits]
ADD A, L ; 4T 1B -- A := x mod 7
; 'x mod 7' in A

This is both faster and smaller (32 T-cycles and 8 code bytes) with no loss of accuracy, so I'm a bit bemused that it's taken me nearly three decades to stumble across it. And a lot of good it will do me too!

P.S. Why is 'NEG' so expensive on Z80? Given that 'CPL' is only 4 T-cycles and 2 code bytes, it seems a waste to dedicate precious silicon on a similar instruction that can be represented with existing instructions ('CPL' then 'INC A') for the same cost. Bah humbug!

4 comments :

tecnoxarxaTuesday, 10 November 2015 at 01:37:00 UTC
Thank a lot! The second routine is a great inspiration. Cheers!
ReplyDelete
Replies
AnonymousThursday, 15 August 2024 at 14:50:00 UTC
Hi!
Writes for the seizure of the M4 FORTH translator for the ZX Spectrum (also Z80).
This means that I will solve all possible algorithms and the problems that it causes.
One of the areas is optimization, when some value is known at the time of translation (constant).
Multiplied by a constant.
Divide by a constant.
And modulo the constant is also included in that.
The math I came up with is quite interesting.
You can calculate the modulo without first calculating the division.
It works on the principle that you can divide/break a number into two parts.
Figure out how to modify the first part so that it can be ignored as it won't affect the result and continue the calculation with a smaller number.
For example, a 16-bit number with mod 7 can be divided into upper 7 bits and lower 9 bits.
The upper 7 bits represent the number 512*hi9.
512 can be divided into (511+1)*hi9 = 7*73*x+x
Where the 7*73*x part can be ignored.
And we can continue the calculation with the number hi9+lo9...

dworkin@dw-A15:~/Programovani/ZX/Forth/Test$ ../check_word.sh '_7UMOD'
;[27:121+] 7umod ( u -- u_mod_7 )
;# num16bit = 512*hi9bit + lo9bit
;# num16bit = 73*7*hi9bit + hi9bit + lo9bit --> num10bit
ld A, H ; 1:4 7umod L = lo(lo9bit)
and 0x01 ; 2:7 7umod
ld B, A ; 1:4 7umod B = hi(lo9bit)
xor H ; 1:4 7umod
rra ; 1:4 7umod
ld C, A ; 1:4 7umod C = lo(hi9bit)
ld H, 0x00 ; 2:7 7umod H = hi(hi9bit)
add HL, BC ; 1:11 7umod num10bit = hi9bit+lo9bit = 0 .. 638
;# num10bit = 64*hi6bit + lo6bit
;# num10bit = 9*7*hi6bit + hi6bit + lo6bit --> num7bit
ld A, L ; 1:4 7umod
and 64-1 ; 2:7 7umod A = lo6bit
add HL, HL ; 1:11 7umod
add HL, HL ; 1:11 7umod H = hi6bit
add A, H ; 1:4 7umod num7bit = hi6bit+lo6bit = 0..71
ld HL, 0x0007 ; 3:10 7umod
sub 35 ; 2:7 7umod 5*7
jr nc, $-2 ; 2:7/12 7umod
add A, L ; 1:4 7umod
jr nc, $-1 ; 2:7/12 7umod
ld L, A ; 1:4 7umod HL = num16bit mod 7 = 0..6
; seconds: 0 ;[27:121]
ReplyDelete
Replies
AnonymousThursday, 15 August 2024 at 14:53:00 UTC
"seizure" --> translation / linker
Google translate failed
ReplyDelete
Replies

Add comment

chilliant

Sunday, 26 December 2010

Modulo Arithmetic in Z80 assembler

4 comments :

Search This Blog

Blog Archive

About Me

Links