Computus: PUSH DE PUSH HL LD D, H LD A, L SRL D RRA LD E, A LD A, 1 AND H LD H, A LD A, 133 SUB D LD C, A LD B, 0 ADD HL, BC LD A, 63 AND L ADD HL, HL ADD HL, HL LD L, A LD A, H RLCA RLCA RLCA SUB H ADD A, L LD L, A RRA ADD A, L RRA OR A RRA ADD A, L RRA ADD A, L RLCA RLCA RLCA AND 7 LD H, A RLA RLA RLA ADD A, H RLA ADD A, H SUB L NEG LD C, A LD H, D LD L, E SRL H RR L PUSH HL SRL H RR L ADD HL, DE PUSH BC LD B, H LD C, L SUB A ADD HL, HL RLA ADD HL, HL RLA ADD HL, HL RLA LD L, H LD H, A EX DE, HL SBC HL, DE LD DE, 16 ADD HL, DE SUB A ADD HL, HL RLA ADD HL, HL RLA ADD HL, HL RLA LD L, H LD H, A ADD HL, BC SUB A ADD HL, HL RLA ADD HL, HL RLA LD L, H LD H, A LD D, H LD E, L ADD HL, HL ADD HL, HL ADD HL, DE ADD HL, HL LD A, 66 SUB H LD C, A LD B, 0 ADD HL, HL ADD HL, HL ADD HL, BC ADD HL, DE ADD HL, HL POP BC LD B, H LD H, D LD L, E SRL H RR L SRL H RR L EX DE, HL XOR A SBC HL, DE PUSH HL LD E, B LD A, C ADD A, A ADD A, A ADD A, A LD H, 0 LD L, A LD D, H ADD HL, HL LD A, 5 ADD A, C LD B, A ADD A, A ADD A, B SBC HL, DE LD E, A ADD HL, DE POP DE ADD HL, DE LD A, L AND 31 ADD HL, HL ADD HL, HL ADD HL, HL SLA H ADD A, H LD B, A POP HL SBC HL, DE PUSH HL RRCA RRCA RRCA RRCA AND 15 ADD A, B INC A AND 224 LD H, A RLCA RLCA RLCA RLCA SUB H ADD A, B LD D, 0 LD B, A ADD A, A ADD A, A ADD A, A ADD A, A LD H, D RL H LD L, A LD A, 53 ADD A, C LD E, A ADD HL, DE LD A, H RRA CPL ADD A, 29 ADD A, B LD C, A ADD A, 2 LD E, A POP HL ADD HL, DE EX DE, HL POP HL PUSH HL XOR A ADD HL, DE RLA LD E, L ADD HL, HL RLA ADD HL, HL RLA LD D, H RL H RLA RL H RLA LD B, A LD A, D AND 63 LD D, A LD A, E AND 63 ADD A, D ADD A, B LD D, A LD E, 31 RRA RRA RRA AND E ADD A, D RRA RRA INC A RRA AND E ADD A, D RRA RRA RRA AND E ADD A, D AND 248 LD E, A RRA RRA RRA SUB E ADD A, D SUB C NEG LD D, A SUB 32 SBC A, A ADD A, 4 LD C, A SUB 3 ADD A, D AND 31 LD B, A LD A, D POP HL POP DE RET
Here's a faster version that runs in less than a third of a millisecond (on a standard 4Mhz Z-80). It takes 1323 T cycles, down from 1513 in the version I posted here on February 8.
I put a copy at http://petrofsky.org/src/computus.asm
The source, with comments, won't fit within this site's 4k comment size limit, but here's the terse version:
; Compute proleptic Gregorian/Anglican Easter, per the ; British Calendar Act, 24 Geo. II c. 23 (1751) and "Romani ; Calendarii a Gregorio XIII. P. M. Restituti Explicatio" ; (Clavius, 1603).
; Authored 2011 by Al Petrofsky .
; Derived from a version by Ian Taylor ; dated January 2011. I rewrote and sped up the code for ; the five major division/modulo operations (divisors 19, ; 100, 25, 30, and 7). I also removed all branching. Some ; of the miscellaneous values named by Taylor have names ; starting with "T". Names I added start with U.
; Features: ; No self-modifying code (suitable for ROMming) ; No subroutines or data tables (completely relocatable code) ; No branching (just put one foot in front of the other)
; Measurements, as of rev. 1.17: ; clock ticks: 1323 ; code bytes: 278 ; opcode fetches: 241 ; stack words: 4 (plus return address) ; (those all exclude the call but include the return)
; On entry: ; HL is Gregorian year (0..65535) ; On exit: ; A is "Day of March" for Easter (22..56) ; B is day of month of Easter (1..31) ; C is month of year of Easter (3..4) ; F is trashed ; DE and HL are saved and restored. ; IX, IY, and the alternate registers are never touched ; (and therefore will be available during interrupts)
Computus: PUSH DE ; Save DE PUSH HL ; Save Year
;;; Compute Year % 19 (Golden Number - 1) (full 16-bit numerator):
LD D, H ; DE = Year LD E, L LD A, 1 ; Let U2 = (Year & 511) + 133 - (Year >> 9) AND H LD H, A ; HL = Year & 511 LD A, D RRA CPL ADD A, 134 ; A = 133 - (Year >> 9) LD C, A LD B, 0 ADD HL, BC ; HL = U2 (range 6 to 644) (U2 % 19 == Year % 19) LD A, 63 ; Let U5 = 7 * (U2 >> 6) + (U2 & 63) AND L ; A = U2 & 63 ADD HL, HL ADD HL, HL ; H = U2 >> 6 LD L, A LD A, H RLCA RLCA RLCA SUB H ; A = 7 * (U2 >> 6) ADD A, L LD L, A ; L = U5 (range 6 to 126) (U5 % 19 == Year % 19) RRA ; Let U6 = (((U5 >> 1) + U5 >> 2) + U5 >> 1) + U5 >> 5 << 4 ADD A, L RRA OR A RRA ADD A, L RRA ADD A, L RRA AND 240 LD H, A ; H = U6 RRA ; Let U7 = U5 - (((U6 >> 1) + U6 >> 3) + U6) ADD A, H RRA RRA RRA ; A = (U6 >> 1) + U6 >> 3 ADD A, H SUB L LD C, A ; C = - U7
Sorry about that, blogspot decided your second comment was spam. I've restored it. The comments answer my questions nicely, though I'll have to study your optimisations carefully.
Since my February 11 post, I've managed to reduce runtime another 20% (from 1323 clock cycles to 1056), and stack usage has shrunk a whopping 50% (from 4 words to 2). I've also improved the documentation (although it still falls well short of really explaining anything). See rev. 1.32 of computus.asm in the src directory on petrofsky.org (to which there should be a hyperlink under my name above this comment). For older versions, and the log of changes between versions, see the RCS subdirectory.
Thank you for the esoteric challenge! I don't think I'll be able to make the code much faster than this. Actually, I know I could save ten more cycles by using your ending, which makes use of a conditional return, but I really like the feature of a fixed execution time, just in case, say, you want to compute Easter date tables during the delay loop while driving a speaker to generate an accurate pitch for tuning your oboe.
In addition to testing the code for all years (0 to 65535) on an emulator, I've also now lightly tested it (for a few hand-typed years) on some real hardware: the Z-80A in a TRS-80 Model 4D. Here's the BASIC code (at boot-time, answer 32000 (or less) to "Memory Size?"):
100 POKE 16526,1:POKE 16527,125 110 FOR X%=32001 TO 32264:READ I%:POKE X%,I%:NEXT 120 DA%=0:X=VARPTR(DA%):POKE 32012,X AND 255:POKE 32013,X/256 130 MO%=0:X=VARPTR(MO%):POKE 32016,X AND 255:POKE 32017,X/256 140 INPUT "Year";YR 150 IF YR>32767 THEN YR=YR-65536 160 DM=USR(YR) 170 IF MO%=3 THEN MO$="March" ELSE MO$="April" 180 PRINT "Easter Sunday is " MO$ DA% 190 GOTO 140 200 REM Code to interface between BASIC and Computus 210 DATA 205,127,10,205,21,125,111,175,103,120,50,0 220 DATA 0,121,50,0,0,195,154,10 230 REM Computus, from computus.asm rev 1.32. 240 DATA 76,175,203,25,71,23,213,93,203,19,23,203 250 DATA 19,23,87,135,130,135,130,198,133,145,79,125 260 DATA 230,63,129,95,31,131,31,183,31,131,31,131 270 DATA 31,230,240,79,31,129,31,31,31,129,147,198 280 DATA 138,87,124,31,229,203,29,31,203,29,230,63 290 DATA 31,132,31,183,31,132,203,16,23,203,16,79 300 DATA 135,129,135,135,135,129,149,47,198,26,95,31 310 DATA 31,230,63,31,131,31,183,31,131,23,23,23 320 DATA 23,230,15,129,79,136,145,71,96,105,41,41 330 DATA 9,41,41,62,51,148,41,9,41,133,124,206 340 DATA 13,96,105,203,60,203,29,9,203,60,203,29 350 DATA 149,79,156,145,71,122,135,135,130,135,130,145 360 DATA 79,152,145,31,121,31,95,6,31,31,31,31 370 DATA 160,31,60,131,31,31,31,230,30,129,160,79 380 DATA 203,34,209,222,28,62,235,153,79,9,235,68 390 DATA 125,203,24,31,230,254,131,95,122,136,31,87 400 DATA 125,47,23,230,7,203,19,23,203,19,23,130 410 DATA 87,123,31,31,130,87,31,31,230,127,31,130 420 DATA 31,31,230,63,31,138,31,31,31,130,230,7 430 DATA 145,87,254,32,222,255,71,146,198,3,79,122 440 DATA 209,203,168,201
I'll just leave this here.
ReplyDeleteComputus:
PUSH DE
PUSH HL
LD D, H
LD A, L
SRL D
RRA
LD E, A
LD A, 1
AND H
LD H, A
LD A, 133
SUB D
LD C, A
LD B, 0
ADD HL, BC
LD A, 63
AND L
ADD HL, HL
ADD HL, HL
LD L, A
LD A, H
RLCA
RLCA
RLCA
SUB H
ADD A, L
LD L, A
RRA
ADD A, L
RRA
OR A
RRA
ADD A, L
RRA
ADD A, L
RLCA
RLCA
RLCA
AND 7
LD H, A
RLA
RLA
RLA
ADD A, H
RLA
ADD A, H
SUB L
NEG
LD C, A
LD H, D
LD L, E
SRL H
RR L
PUSH HL
SRL H
RR L
ADD HL, DE
PUSH BC
LD B, H
LD C, L
SUB A
ADD HL, HL
RLA
ADD HL, HL
RLA
ADD HL, HL
RLA
LD L, H
LD H, A
EX DE, HL
SBC HL, DE
LD DE, 16
ADD HL, DE
SUB A
ADD HL, HL
RLA
ADD HL, HL
RLA
ADD HL, HL
RLA
LD L, H
LD H, A
ADD HL, BC
SUB A
ADD HL, HL
RLA
ADD HL, HL
RLA
LD L, H
LD H, A
LD D, H
LD E, L
ADD HL, HL
ADD HL, HL
ADD HL, DE
ADD HL, HL
LD A, 66
SUB H
LD C, A
LD B, 0
ADD HL, HL
ADD HL, HL
ADD HL, BC
ADD HL, DE
ADD HL, HL
POP BC
LD B, H
LD H, D
LD L, E
SRL H
RR L
SRL H
RR L
EX DE, HL
XOR A
SBC HL, DE
PUSH HL
LD E, B
LD A, C
ADD A, A
ADD A, A
ADD A, A
LD H, 0
LD L, A
LD D, H
ADD HL, HL
LD A, 5
ADD A, C
LD B, A
ADD A, A
ADD A, B
SBC HL, DE
LD E, A
ADD HL, DE
POP DE
ADD HL, DE
LD A, L
AND 31
ADD HL, HL
ADD HL, HL
ADD HL, HL
SLA H
ADD A, H
LD B, A
POP HL
SBC HL, DE
PUSH HL
RRCA
RRCA
RRCA
RRCA
AND 15
ADD A, B
INC A
AND 224
LD H, A
RLCA
RLCA
RLCA
RLCA
SUB H
ADD A, B
LD D, 0
LD B, A
ADD A, A
ADD A, A
ADD A, A
ADD A, A
LD H, D
RL H
LD L, A
LD A, 53
ADD A, C
LD E, A
ADD HL, DE
LD A, H
RRA
CPL
ADD A, 29
ADD A, B
LD C, A
ADD A, 2
LD E, A
POP HL
ADD HL, DE
EX DE, HL
POP HL
PUSH HL
XOR A
ADD HL, DE
RLA
LD E, L
ADD HL, HL
RLA
ADD HL, HL
RLA
LD D, H
RL H
RLA
RL H
RLA
LD B, A
LD A, D
AND 63
LD D, A
LD A, E
AND 63
ADD A, D
ADD A, B
LD D, A
LD E, 31
RRA
RRA
RRA
AND E
ADD A, D
RRA
RRA
INC A
RRA
AND E
ADD A, D
RRA
RRA
RRA
AND E
ADD A, D
AND 248
LD E, A
RRA
RRA
RRA
SUB E
ADD A, D
SUB C
NEG
LD D, A
SUB 32
SBC A, A
ADD A, 4
LD C, A
SUB 3
ADD A, D
AND 31
LD B, A
LD A, D
POP HL
POP DE
RET
Here's a faster version that runs in less than a third of a millisecond (on a standard 4Mhz Z-80). It takes 1323 T cycles, down from 1513 in the version I posted here on February 8.
ReplyDeleteI put a copy at http://petrofsky.org/src/computus.asm
The source, with comments, won't fit within this site's 4k comment size limit, but here's the terse version:
d5 e5 54 5d 3e 01 a4 67 7a 1f 2f c6 86 4f 06 00 09 3e 3f a5 29 29 6f 7c 07 07 07 94 85 6f 1f 85 1f b7 1f 85 1f 85 1f e6 f0 67 1f 84 1f 1f 1f 84 95 4f 62 6b cb 3c cb 1d cb 3c cb 1d e5 c5 7c 60 b7 1f 82 1f b7 1f 82 cb 10 17 cb 10 4f 87 81 87 87 87 81 95 2f c6 1a 5f 1f 1f e6 3f 1f 83 1f b7 1f 83 17 17 17 17 e6 0f 6f 09 2b 54 5d 29 29 19 29 3e 42 94 06 00 4f 29 29 09 19 29 4c 62 6b cb 3c cb 1d cb 3c cb 1d af ed 52 eb e1 7d e1 19 e5 eb 09 5f 87 87 83 87 83 c6 e1 95 6f 9c 95 1f 7d 1f 1f 1f 1f e6 1e 67 7d e6 1f 84 57 1f 1f 1f e6 1f 1f 8a e6 e0 67 07 07 07 07 94 82 4f 87 87 87 87 60 cb 14 6f 3e 35 93 50 5f 19 7c 1f 2f c6 1e 81 4f 3c 5f e1 19 d1 d5 af 19 17 5d 29 17 29 17 54 cb 14 17 cb 14 17 47 7a 26 3f a4 57 7b a4 82 80 57 1f 1f a4 1f 82 1f 1f a4 1f 8a e6 f8 5f 1f 1f 1f 93 82 91 2f 57 c6 e0 ce 20 47 92 c6 03 4f cb a8 7a e1 d1 c9
And here's the start of the verbose version:
; Z-80 Computus
; Compute proleptic Gregorian/Anglican Easter, per the
; British Calendar Act, 24 Geo. II c. 23 (1751) and "Romani
; Calendarii a Gregorio XIII. P. M. Restituti Explicatio"
; (Clavius, 1603).
; Authored 2011 by Al Petrofsky .
; Derived from a version by Ian Taylor
; dated January 2011. I rewrote and sped up the code for
; the five major division/modulo operations (divisors 19,
; 100, 25, 30, and 7). I also removed all branching. Some
; of the miscellaneous values named by Taylor have names
; starting with "T". Names I added start with U.
; Features:
; No self-modifying code (suitable for ROMming)
; No subroutines or data tables (completely relocatable code)
; No branching (just put one foot in front of the other)
; Measurements, as of rev. 1.17:
; clock ticks: 1323
; code bytes: 278
; opcode fetches: 241
; stack words: 4 (plus return address)
; (those all exclude the call but include the return)
; On entry:
; HL is Gregorian year (0..65535)
; On exit:
; A is "Day of March" for Easter (22..56)
; B is day of month of Easter (1..31)
; C is month of year of Easter (3..4)
; F is trashed
; DE and HL are saved and restored.
; IX, IY, and the alternate registers are never touched
; (and therefore will be available during interrupts)
Computus:
PUSH DE ; Save DE
PUSH HL ; Save Year
;;; Compute Year % 19 (Golden Number - 1) (full 16-bit numerator):
LD D, H ; DE = Year
LD E, L
LD A, 1 ; Let U2 = (Year & 511) + 133 - (Year >> 9)
AND H
LD H, A ; HL = Year & 511
LD A, D
RRA
CPL
ADD A, 134 ; A = 133 - (Year >> 9)
LD C, A
LD B, 0
ADD HL, BC ; HL = U2 (range 6 to 644) (U2 % 19 == Year % 19)
LD A, 63 ; Let U5 = 7 * (U2 >> 6) + (U2 & 63)
AND L ; A = U2 & 63
ADD HL, HL
ADD HL, HL ; H = U2 >> 6
LD L, A
LD A, H
RLCA
RLCA
RLCA
SUB H ; A = 7 * (U2 >> 6)
ADD A, L
LD L, A ; L = U5 (range 6 to 126) (U5 % 19 == Year % 19)
RRA ; Let U6 = (((U5 >> 1) + U5 >> 2) + U5 >> 1) + U5 >> 5 << 4
ADD A, L
RRA
OR A
RRA
ADD A, L
RRA
ADD A, L
RRA
AND 240
LD H, A ; H = U6
RRA ; Let U7 = U5 - (((U6 >> 1) + U6 >> 3) + U6)
ADD A, H
RRA
RRA
RRA ; A = (U6 >> 1) + U6 >> 3
ADD A, H
SUB L
LD C, A ; C = - U7
Okay, I give in! Brilliant. How did you come up with that? Also, I'm a bit confused as to what B and C hold on return.
ReplyDeleteSorry about that, blogspot decided your second comment was spam. I've restored it. The comments answer my questions nicely, though I'll have to study your optimisations carefully.
ReplyDeleteSince my February 11 post, I've managed to reduce runtime another 20% (from 1323 clock cycles to 1056), and stack usage has shrunk a whopping 50% (from 4 words to 2). I've also improved the documentation (although it still falls well short of really explaining anything). See rev. 1.32 of computus.asm in the src directory on petrofsky.org (to which there should be a hyperlink under my name above this comment). For older versions, and the log of changes between versions, see the RCS subdirectory.
ReplyDeleteThank you for the esoteric challenge! I don't think I'll be able to make the code much faster than this. Actually, I know I could save ten more cycles by using your ending, which makes use of a conditional return, but I really like the feature of a fixed execution time, just in case, say, you want to compute Easter date tables during the delay loop while driving a speaker to generate an accurate pitch for tuning your oboe.
In addition to testing the code for all years (0 to 65535) on an emulator, I've also now lightly tested it (for a few hand-typed years) on some real hardware: the Z-80A in a TRS-80 Model 4D. Here's the BASIC code (at boot-time, answer 32000 (or less) to "Memory Size?"):
100 POKE 16526,1:POKE 16527,125
110 FOR X%=32001 TO 32264:READ I%:POKE X%,I%:NEXT
120 DA%=0:X=VARPTR(DA%):POKE 32012,X AND 255:POKE 32013,X/256
130 MO%=0:X=VARPTR(MO%):POKE 32016,X AND 255:POKE 32017,X/256
140 INPUT "Year";YR
150 IF YR>32767 THEN YR=YR-65536
160 DM=USR(YR)
170 IF MO%=3 THEN MO$="March" ELSE MO$="April"
180 PRINT "Easter Sunday is " MO$ DA%
190 GOTO 140
200 REM Code to interface between BASIC and Computus
210 DATA 205,127,10,205,21,125,111,175,103,120,50,0
220 DATA 0,121,50,0,0,195,154,10
230 REM Computus, from computus.asm rev 1.32.
240 DATA 76,175,203,25,71,23,213,93,203,19,23,203
250 DATA 19,23,87,135,130,135,130,198,133,145,79,125
260 DATA 230,63,129,95,31,131,31,183,31,131,31,131
270 DATA 31,230,240,79,31,129,31,31,31,129,147,198
280 DATA 138,87,124,31,229,203,29,31,203,29,230,63
290 DATA 31,132,31,183,31,132,203,16,23,203,16,79
300 DATA 135,129,135,135,135,129,149,47,198,26,95,31
310 DATA 31,230,63,31,131,31,183,31,131,23,23,23
320 DATA 23,230,15,129,79,136,145,71,96,105,41,41
330 DATA 9,41,41,62,51,148,41,9,41,133,124,206
340 DATA 13,96,105,203,60,203,29,9,203,60,203,29
350 DATA 149,79,156,145,71,122,135,135,130,135,130,145
360 DATA 79,152,145,31,121,31,95,6,31,31,31,31
370 DATA 160,31,60,131,31,31,31,230,30,129,160,79
380 DATA 203,34,209,222,28,62,235,153,79,9,235,68
390 DATA 125,203,24,31,230,254,131,95,122,136,31,87
400 DATA 125,47,23,230,7,203,19,23,203,19,23,130
410 DATA 87,123,31,31,130,87,31,31,230,127,31,130
420 DATA 31,31,230,63,31,138,31,31,31,130,230,7
430 DATA 145,87,254,32,222,255,71,146,198,3,79,122
440 DATA 209,203,168,201
Fantastic, I miss the Z80 :-(
ReplyDelete