Documentation/tech/cpu/6502_cpu.txt

   1 #
   2 # $Id: 6502_cpu.txt,v 1.1 2002/05/21 00:42:27 xodnizel Exp $
   3 #
   4 # This file is part of Commodore 64 emulator
   5 #      and Program Development System.
   6 #
   7 # See README for copyright notice
   8 #
   9 # This file contains documentation for 6502/6510/8500/8502 instruction set.
  10 #
  11 #
  12 # Written by
  13 #   John West       (john@ucc.gu.uwa.edu.au)
  14 #   Marko M\8akel\8a    (msmakela@kruuna.helsinki.fi)
  15 #
  16 #
  17 # $Log: 6502_cpu.txt,v $
  18 # Revision 1.1  2002/05/21 00:42:27  xodnizel
  19 # updates
  20 #
  21 # Revision 1.8  1994/06/03  19:50:04  jopi
  22 # Patchlevel 2
  23 #
  24 # Revision 1.7  1994/04/15  13:07:04  jopi
  25 # 65xx Register descriptions added
  26 #
  27 # Revision 1.6  1994/02/18  16:09:36  jopi
  28 #
  29 # Revision 1.5  1994/01/26  16:08:37  jopi
  30 # X64 version 0.2 PL 1
  31 #
  32 # Revision 1.4  1993/11/10  01:55:34  jopi
  33 #
  34 # Revision 1.3  93/06/21  13:37:18  jopi
  35 #  X64 version 0.2 PL 0
  36 #
  37 # Revision 1.2  93/06/21  13:07:15  jopi
  38 # *** empty log message ***
  39 #
  40 #
  41
  42  Note: To extract the uuencoded ML programs in this article most
  43        easily you may use e.g. "uud" by Edwin Kremer ,
  44        which extracts them all at once.
  45
  46
  47 Documentation for the NMOS 65xx/85xx Instruction Set
  48
  49         6510 Instructions by Addressing Modes
  50         6502 Registers
  51         6510/8502 Undocumented Commands
  52         Register selection for load and store
  53         Decimal mode in NMOS 6500 series
  54         6510 features
  55         Different CPU types
  56         6510 Instruction Timing
  57         How Real Programmers Acknowledge Interrupts
  58         Memory Management
  59         Autostart Code
  60         Notes
  61         References
  62
  63
  64 6510 Instructions by Addressing Modes
  65
  66 off- ++++++++++ Positive ++++++++++  ---------- Negative ----------
  67 set  00      20      40      60      80      a0      c0      e0      mode
  68
  69 +00  BRK     JSR     RTI     RTS     NOP*    LDY     CPY     CPX     Impl/immed
  70 +01  ORA     AND     EOR     ADC     STA     LDA     CMP     SBC     (indir,x)
  71 +02   t       t       t       t      NOP*t   LDX     NOP*t   NOP*t     ? /immed
  72 +03  SLO*    RLA*    SRE*    RRA*    SAX*    LAX*    DCP*    ISB*    (indir,x)
  73 +04  NOP*    BIT     NOP*    NOP*    STY     LDY     CPY     CPX     Zeropage
  74 +05  ORA     AND     EOR     ADC     STA     LDA     CMP     SBC     Zeropage
  75 +06  ASL     ROL     LSR     ROR     STX     LDX     DEC     INC     Zeropage
  76 +07  SLO*    RLA*    SRE*    RRA*    SAX*    LAX*    DCP*    ISB*    Zeropage
  77
  78 +08  PHP     PLP     PHA     PLA     DEY     TAY     INY     INX     Implied
  79 +09  ORA     AND     EOR     ADC     NOP*    LDA     CMP     SBC     Immediate
  80 +0a  ASL     ROL     LSR     ROR     TXA     TAX     DEX     NOP     Accu/impl
  81 +0b  ANC**   ANC**   ASR**   ARR**   ANE**   LXA**   SBX**   SBC*    Immediate
  82 +0c  NOP*    BIT     JMP     JMP ()  STY     LDY     CPY     CPX     Absolute
  83 +0d  ORA     AND     EOR     ADC     STA     LDA     CMP     SBC     Absolute
  84 +0e  ASL     ROL     LSR     ROR     STX     LDX     DEC     INC     Absolute
  85 +0f  SLO*    RLA*    SRE*    RRA*    SAX*    LAX*    DCP*    ISB*    Absolute
  86
  87 +10  BPL     BMI     BVC     BVS     BCC     BCS     BNE     BEQ     Relative
  88 +11  ORA     AND     EOR     ADC     STA     LDA     CMP     SBC     (indir),y
  89 +12   t       t       t       t       t       t       t       t         ?
  90 +13  SLO*    RLA*    SRE*    RRA*    SHA**   LAX*    DCP*    ISB*    (indir),y
  91 +14  NOP*    NOP*    NOP*    NOP*    STY     LDY     NOP*    NOP*    Zeropage,x
  92 +15  ORA     AND     EOR     ADC     STA     LDA     CMP     SBC     Zeropage,x
  93 +16  ASL     ROL     LSR     ROR     STX  y) LDX  y) DEC     INC     Zeropage,x
  94 +17  SLO*    RLA*    SRE*    RRA*    SAX* y) LAX* y) DCP*    ISB*    Zeropage,x
  95
  96 +18  CLC     SEC     CLI     SEI     TYA     CLV     CLD     SED     Implied
  97 +19  ORA     AND     EOR     ADC     STA     LDA     CMP     SBC     Absolute,y
  98 +1a  NOP*    NOP*    NOP*    NOP*    TXS     TSX     NOP*    NOP*    Implied
  99 +1b  SLO*    RLA*    SRE*    RRA*    SHS**   LAS**   DCP*    ISB*    Absolute,y
 100 +1c  NOP*    NOP*    NOP*    NOP*    SHY**   LDY     NOP*    NOP*    Absolute,x
 101 +1d  ORA     AND     EOR     ADC     STA     LDA     CMP     SBC     Absolute,x
 102 +1e  ASL     ROL     LSR     ROR     SHX**y) LDX  y) DEC     INC     Absolute,x
 103 +1f  SLO*    RLA*    SRE*    RRA*    SHA**y) LAX* y) DCP*    ISB*    Absolute,x
 104
 105         ROR intruction is available on MC650x microprocessors after
 106         June, 1976.
 107
 108         Legend:
 109
 110         t       Jams the machine
 111         *t      Jams very rarely
 112         *       Undocumented command
 113         **      Unusual operation
 114         y)      indexed using Y instead of X
 115         ()      indirect instead of absolute
 116
 117 Note that the NOP instructions do have other addressing modes than the
 118 implied addressing. The NOP instruction is just like any other load
 119 instruction, except it does not store the result anywhere nor affects the
 120 flags.
 121
 122 6502 Registers
 123
 124 The NMOS 65xx processors are not ruined with too many registers. In addition
 125 to that, the registers are mostly 8-bit. Here is a brief description of each
 126 register:
 127
 128      PC Program Counter
 129           This register points the address from which the next instruction
 130           byte (opcode or parameter) will be fetched. Unlike other
 131           registers, this one is 16 bits in length. The low and high 8-bit
 132           halves of the register are called PCL and PCH, respectively. The
 133           Program Counter may be read by pushing its value on the stack.
 134           This can be done either by jumping to a subroutine or by causing
 135           an interrupt.
 136      S Stack pointer
 137           The NMOS 65xx processors have 256 bytes of stack memory, ranging
 138           from $0100 to $01FF. The S register is a 8-bit offset to the stack
 139           page. In other words, whenever anything is being pushed on the
 140           stack, it will be stored to the address $0100+S.
 141
 142           The Stack pointer can be read and written by transfering its value
 143           to or from the index register X (see below) with the TSX and TXS
 144           instructions.
 145      P Processor status
 146           This 8-bit register stores the state of the processor. The bits in
 147           this register are called flags. Most of the flags have something
 148           to do with arithmetic operations.
 149
 150           The P register can be read by pushing it on the stack (with PHP or
 151           by causing an interrupt). If you only need to read one flag, you
 152           can use the branch instructions. Setting the flags is possible by
 153           pulling the P register from stack or by using the flag set or
 154           clear instructions.
 155
 156           Following is a list of the flags, starting from the 8th bit of the
 157           P register (bit 7, value $80):
 158                N Negative flag
 159                     This flag will be set after any arithmetic operations
 160                     (when any of the registers A, X or Y is being loaded
 161                     with a value). Generally, the N flag will be copied from
 162                     the topmost bit of the register being loaded.
 163
 164                     Note that TXS (Transfer X to S) is not an arithmetic
 165                     operation. Also note that the BIT instruction affects
 166                     the Negative flag just like arithmetic operations.
 167                     Finally, the Negative flag behaves differently in
 168                     Decimal operations (see description below).
 169                V oVerflow flag
 170                     Like the Negative flag, this flag is intended to be used
 171                     with 8-bit signed integer numbers. The flag will be
 172                     affected by addition and subtraction, the instructions
 173                     PLP, CLV and BIT, and the hardware signal -SO. Note that
 174                     there is no SEV instruction, even though the MOS
 175                     engineers loved to use East European abbreviations, like
 176                     DDR (Deutsche Demokratische Republik vs. Data Direction
 177                     Register). (The Russian abbreviation for their former
 178                     trade association COMECON is SEV.) The -SO (Set
 179                     Overflow) signal is available on some processors, at
 180                     least the 6502, to set the V flag. This enables response
 181                     to an I/O activity in equal or less than three clock
 182                     cycles when using a BVC instruction branching to itself
 183                     ($50 $FE).
 184
 185                     The CLV instruction clears the V flag, and the PLP and
 186                     BIT instructions copy the flag value from the bit 6 of
 187                     the topmost stack entry or from memory.
 188
 189                     After a binary addition or subtraction, the V flag will
 190                     be set on a sign overflow, cleared otherwise. What is a
 191                     sign overflow? For instance, if you are trying to add
 192                     123 and 45 together, the result (168) does not fit in a
 193                     8-bit signed integer (upper limit 127 and lower limit
 194                     -128). Similarly, adding -123 to -45 causes the
 195                     overflow, just like subtracting -45 from 123 or 123 from
 196                     -45 would do.
 197
 198                     Like the N flag, the V flag will not be set as expected
 199                     in the Decimal mode. Later in this document is a precise
 200                     operation description.
 201
 202                     A common misbelief is that the V flag could only be set
 203                     by arithmetic operations, not cleared.
 204                1 unused flag
 205                     To the current knowledge, this flag is always 1.
 206                B Break flag
 207                     This flag is used to distinguish software (BRK)
 208                     interrupts from hardware interrupts (IRQ or NMI). The B
 209                     flag is always set except when the P register is being
 210                     pushed on stack when jumping to an interrupt routine to
 211                     process only a hardware interrupt.
 212
 213                     The official NMOS 65xx documentation claims that the BRK
 214                     instruction could only cause a jump to the IRQ vector
 215                     ($FFFE). However, if an NMI interrupt occurs while
 216                     executing a BRK instruction, the processor will jump to
 217                     the NMI vector ($FFFA), and the P register will be
 218                     pushed on the stack with the B flag set.
 219                D Decimal mode flag
 220                     This flag is used to select the (Binary Coded) Decimal
 221                     mode for addition and subtraction. In most applications,
 222                     the flag is zero.
 223
 224                     The Decimal mode has many oddities, and it operates
 225                     differently on CMOS processors. See the description of
 226                     the ADC, SBC and ARR instructions below.
 227                I Interrupt disable flag
 228                     This flag can be used to prevent the processor from
 229                     jumping to the IRQ handler vector ($FFFE) whenever the
 230                     hardware line -IRQ is active. The flag will be
 231                     automatically set after taking an interrupt, so that the
 232                     processor would not keep jumping to the interrupt
 233                     routine if the -IRQ signal remains low for several clock
 234                     cycles.
 235                Z Zero flag
 236                     The Zero flag will be affected in the same cases than
 237                     the Negative flag. Generally, it will be set if an
 238                     arithmetic register is being loaded with the value zero,
 239                     and cleared otherwise. The flag will behave differently
 240                     in Decimal operations.
 241                C Carry flag
 242                     This flag is used in additions, subtractions,
 243                     comparisons and bit rotations. In additions and
 244                     subtractions, it acts as a 9th bit and lets you to chain
 245                     operations to calculate with bigger than 8-bit numbers.
 246                     When subtracting, the Carry flag is the negative of
 247                     Borrow: if an overflow occurs, the flag will be clear,
 248                     otherwise set. Comparisons are a special case of
 249                     subtraction: they assume Carry flag set and Decimal flag
 250                     clear, and do not store the result of the subtraction
 251                     anywhere.
 252
 253                     There are four kinds of bit rotations. All of them store
 254                     the bit that is being rotated off to the Carry flag. The
 255                     left shifting instructions are ROL and ASL. ROL copies
 256                     the initial Carry flag to the lowmost bit of the byte;
 257                     ASL always clears it. Similarly, the ROR and LSR
 258                     instructions shift to the right.
 259      A Accumulator
 260           The accumulator is the main register for arithmetic and logic
 261           operations. Unlike the index registers X and Y, it has a direct
 262           connection to the Arithmetic and Logic Unit (ALU). This is why
 263           many operations are only available for the accumulator, not the
 264           index registers.
 265      X Index register X
 266           This is the main register for addressing data with indices. It has
 267           a special addressing mode, indexed indirect, which lets you to
 268           have a vector table on the zero page.
 269      Y Index register Y
 270           The Y register has the least operations available. On the other
 271           hand, only it has the indirect indexed addressing mode that
 272           enables access to any memory place without having to use
 273           self-modifying code.
 274
 275 6510/8502 Undocumented Commands
 276
 277 -- A brief explanation about what may happen while using don't care states.
 278
 279         ANE $8B         A = (A | #$EE) & X & #byte
 280                         same as
 281                         A = ((A & #$11 & X) | ( #$EE & X)) & #byte
 282
 283                         In real 6510/8502 the internal parameter #$11
 284                         may occasionally be #$10, #$01 or even #$00.
 285                         This occurs when the video chip starts DMA
 286                         between the opcode fetch and the parameter fetch
 287                         of the instruction.  The value probably depends
 288                         on the data that was left on the bus by the VIC-II.
 289
 290         LXA $AB         C=Lehti:   A = X = ANE
 291                         Alternate: A = X = (A & #byte)
 292
 293                         TXA and TAX have to be responsible for these.
 294
 295         SHA $93,$9F     Store (A & X & (ADDR_HI + 1))
 296         SHX $9E         Store (X & (ADDR_HI + 1))
 297         SHY $9C         Store (Y & (ADDR_HI + 1))
 298         SHS $9B         SHA and TXS, where X is replaced by (A & X).
 299
 300                         Note: The value to be stored is copied also
 301                         to ADDR_HI if page boundary is crossed.
 302
 303         SBX $CB         Carry and Decimal flags are ignored but the
 304                         Carry flag will be set in substraction. This
 305                         is due to the CMP command, which is executed
 306                         instead of the real SBC.
 307
 308         ARR $6B         This instruction first performs an AND
 309                         between the accumulator and the immediate
 310                         parameter, then it shifts the accumulator to
 311                         the right. However, this is not the whole
 312                         truth. See the description below.
 313
 314 Many undocumented commands do not use AND between registers, the CPU
 315 just throws the bytes to a bus simultaneously and lets the
 316 open-collector drivers perform the AND. I.e. the command called 'SAX',
 317 which is in the STORE section (opcodes $A0...$BF), stores the result
 318 of (A & X) by this way.
 319
 320 More fortunate is its opposite, 'LAX' which just loads a byte
 321 simultaneously into both A and X.
 322
 323         $6B  ARR
 324
 325 This instruction seems to be a harmless combination of AND and ROR at
 326 first sight, but it turns out that it affects the V flag and also has
 327 a special kind of decimal mode. This is because the instruction has
 328 inherited some properties of the ADC instruction ($69) in addition to
 329 the ROR ($6A).
 330
 331 In Binary mode (D flag clear), the instruction effectively does an AND
 332 between the accumulator and the immediate parameter, and then shifts
 333 the accumulator to the right, copying the C flag to the 8th bit. It
 334 sets the Negative and Zero flags just like the ROR would. The ADC code
 335 shows up in the Carry and oVerflow flags. The C flag will be copied
 336 from the bit 6 of the result (which doesn't seem too logical), and the
 337 V flag is the result of an Exclusive OR operation between the bit 6
 338 and the bit 5 of the result.  This makes sense, since the V flag will
 339 be normally set by an Exclusive OR, too.
 340
 341 In Decimal mode (D flag set), the ARR instruction first performs the
 342 AND and ROR, just like in Binary mode. The N flag will be copied from
 343 the initial C flag, and the Z flag will be set according to the ROR
 344 result, as expected. The V flag will be set if the bit 6 of the
 345 accumulator changed its state between the AND and the ROR, cleared
 346 otherwise.
 347
 348 Now comes the funny part. If the low nybble of the AND result,
 349 incremented by its lowmost bit, is greater than 5, the low nybble in
 350 the ROR result will be incremented by 6. The low nybble may overflow
 351 as a consequence of this BCD fixup, but the high nybble won't be
 352 adjusted. The high nybble will be BCD fixed in a similar way. If the
 353 high nybble of the AND result, incremented by its lowmost bit, is
 354 greater than 5, the high nybble in the ROR result will be incremented
 355 by 6, and the Carry flag will be set. Otherwise the C flag will be
 356 cleared.
 357
 358 To help you understand this description, here is a C routine that
 359 illustrates the ARR operation in Decimal mode:
 360
 361         unsigned
 362            A,  /* Accumulator */
 363            AL, /* low nybble of accumulator */
 364            AH, /* high nybble of accumulator */
 365
 366            C,  /* Carry flag */
 367            Z,  /* Zero flag */
 368            V,  /* oVerflow flag */
 369            N,  /* Negative flag */
 370
 371            t,  /* temporary value */
 372            s;  /* value to be ARRed with Accumulator */
 373
 374         t = A & s;                      /* Perform the AND. */
 375
 376         AH = t >> 4;                    /* Separate the high */
 377         AL = t & 15;                    /* and low nybbles. */
 378
 379         N = C;                          /* Set the N and */
 380         Z = !(A = (t >> 1) | (C << 7)); /* Z flags traditionally */
 381         V = (t ^ A) & 64;               /* and V flag in a weird way. */
 382
 383         if (AL + (AL & 1) > 5)          /* BCD "fixup" for low nybble. */
 384           A = (A & 0xF0) | ((A + 6) & 0xF);
 385
 386         if (C = AH + (AH & 1) > 5)      /* Set the Carry flag. */
 387           A = (A + 0x60) & 0xFF;        /* BCD "fixup" for high nybble. */
 388
 389         $CB  SBX   X <- (A & X) - Immediate
 390
 391 The 'SBX' ($CB) may seem to be very complex operation, even though it
 392 is a combination of the subtraction of accumulator and parameter, as
 393 in the 'CMP' instruction, and the command 'DEX'. As a result, both A
 394 and X are connected to ALU but only the subtraction takes place. Since
 395 the comparison logic was used, the result of subtraction should be
 396 normally ignored, but the 'DEX' now happily stores to X the value of
 397 (A & X) - Immediate.  That is why this instruction does not have any
 398 decimal mode, and it does not affect the V flag. Also Carry flag will
 399 be ignored in the subtraction but set according to the result.
 400
 401  Proof:
 402
 403 begin 644 vsbx
 404 M`0@9$,D'GL(H-#,IJC(U-JS"*#0T*:HR-@```*D`H#V1*Z`_D2N@09$KJ0>%
 405 M^QBE^VEZJ+$KH#F1*ZD`2"BI`*(`RP`(:-B@.5$K*4#P`E@`H#VQ*SAI`)$K
 406 JD-Z@/[$K:0"1*Y#4J2X@TO\XH$&Q*VD`D2N0Q,;[$+188/_^]_:_OK>V
 407 `
 408 end
 409
 410  and
 411
 412 begin 644 sbx
 413 M`0@9$,D'GL(H-#,IJC(U-JS"*#0T*:HR-@```'BI`*!-D2N@3Y$KH%&1*ZD#
 414 MA?L8I?M*2)`#J1@LJ3B@29$K:$J0`ZGX+*G8R)$K&/BXJ?2B8\L)AOP(:(7]
 415 MV#B@3;$KH$\Q*Z!1\2L(1?SP`0!H1?TIM]#XH$VQ*SAI`)$KD,N@3[$K:0"1
 416 9*Y#!J2X@TO\XH%&Q*VD`D2N0L<;[$))88-#X
 417 `
 418 end
 419
 420 These test programs show if your machine is compatible with ours
 421 regarding the opcode $CB. The first test, vsbx, proves that SBX does
 422 not affect the V flag. The latter one, sbx, proves the rest of our
 423 theory. The vsbx test tests 33554432 SBX combinations (16777216
 424 different A, X and Immediate combinations, and two different V flag
 425 states), and the sbx test doubles that amount (16777216*4 D and C flag
 426 combinations). Both tests have run successfully on a C64 and a Vic20.
 427 They ought to run on C16, +4 and the PET series as well. The tests
 428 stop with BRK, if the opcode $CB does not work as expected. Successful
 429 operation ends in RTS. As the tests are very slow, they print dots on
 430 the screen while running so that you know that the machine has not
 431 jammed. On computers running at 1 MHz, the first test prints
 432 approximately one dot every four seconds and a total of 2048 dots,
 433 whereas the second one prints half that amount, one dot every seven
 434 seconds.
 435
 436 If the tests fail on your machine, please let us know your processor's
 437 part number and revision. If possible, save the executable (after it
 438 has stopped with BRK) under another name and send it to us so that we
 439 know at which stage the program stopped.
 440
 441 The following program is a Commodore 64 executable that Marko M"akel"a
 442 developed when trying to find out how the V flag is affected by SBX.
 443 (It was believed that the SBX affects the flag in a weird way, and
 444 this program shows how SBX sets the flag differently from SBC.)  You
 445 may find the subroutine at $C150 useful when researching other
 446 undocumented instructions' flags. Run the program in a machine
 447 language monitor, as it makes use of the BRK instruction. The result
 448 tables will be written on pages $C2 and $C3.
 449
 450 begin 644 sbx-c100
 451 M`,%XH`",#L&,$,&,$L&XJ8*B@LL7AOL(:(7\N#BM#L$M$,'M$L$(Q?OP`B@`
 452 M:$7\\`,@4,'N#L'0U.X0P=#/SB#0[A+!T,<``````````````)BJ\!>M#L$M
 453 L$,'=_\'0":T2P=W_PM`!8,K0Z:T.P2T0P9D`PID`!*T2P9D`PYD`!
 454
 455 Other undocumented instructions usually cause two preceding opcodes
 456 being executed. However 'NOP' seems to completely disappear from 'SBC'
 457 code $EB.
 458
 459 The most difficult to comprehend are the rest of the instructions
 460 located on the '$0B' line.
 461
 462 All the instructions located at the positive (left) side of this line
 463 should rotate either memory or the accumulator, but the addressing
 464 mode turns out to be immediate! No problem. Just read the operand, let
 465 it be ANDed with the accumulator and finally use accumulator
 466 addressing mode for the instructions above them.
 467
 468 RELIGION_MODE_ON
 469 /* This part of the document is not accurate.  You can
 470    read it as a fairy tale, but do not count on it when
 471    performing your own measurements. */
 472
 473 The rest two instructions on the same line, called 'ANE' and 'LXA'
 474 ($8B and $AB respectively) often give quite unpredictable results.
 475 However, the most usual operation is to store ((A | #$ee) & X & #$nn)
 476 to accumulator. Note that this does not work reliably in a real 64!
 477 In the Commodore 128 the opcode $8B uses values 8C, CC, EE, and
 478 occasionally 0C and 8E for the OR instead of EE,EF,FE and FF used in
 479 the C64. With a C128 running at 2 MHz #$EE is always used.  Opcode $AB
 480 does not cause this OR taking place on 8502 while 6510 always performs
 481 it. Note that this behaviour depends on processor and/or video chip
 482 revision.
 483
 484 Let's take a closer look at $8B (6510).
 485
 486         A <- X & D & (A | VAL)
 487
 488         where VAL comes from this table:
 489
 490        X high   D high  D low   VAL
 491         even     even    ---    $EE (1)
 492         even     odd     ---    $EE
 493         odd      even    ---    $EE
 494         odd      odd      0     $EE
 495         odd      odd     not 0  $FE (2)
 496
 497 (1) If the bottom 2 bits of A are both 1, then the LSB of the result may
 498     be 0. The values of X and D are different every time I run the test.
 499     This appears to be very rare.
 500 (2) VAL is $FE most of the time. Sometimes it is $EE - it seems to be random,
 501     not related to any of the data. This is much more common than (1).
 502
 503   In decimal mode, VAL is usually $FE.
 504
 505 Two different functions have been discovered for LAX, opcode $AB. One
 506 is A = X = ANE (see above) and the other, encountered with 6510 and
 507 8502, is less complicated A = X = (A & #byte). However, according to
 508 what is reported, the version altering only the lowest bits of each
 509 nybble seems to be more common.
 510
 511 What happens, is that $AB loads a value into both A and X, ANDing the
 512 low bit of each nybble with the corresponding bit of the old
 513 A. However, there are exceptions. Sometimes the low bit is cleared
 514 even when A contains a '1', and sometimes other bits are cleared. The
 515 exceptions seem random (they change every time I run the test). Oops -
 516 that was in decimal mode. Much the same with D=0.
 517
 518 What causes the randomness?  Probably it is that it is marginal logic
 519 levels - when too much wired-anding goes on, some of the signals get
 520 very close to the threshold. Perhaps we're seeing some of them step
 521 over it. The low bit of each nybble is special, since it has to cope
 522 with carry differently (remember decimal mode). We never see a '0'
 523 turn into a '1'.
 524
 525 Since these instructions are unpredictable, they should not be used.
 526
 527 There is still very strange instruction left, the one named SHA/X/Y,
 528 which is the only one with only indexed addressing modes. Actually,
 529 the commands 'SHA', 'SHX' and 'SHY' are generated by the indexing
 530 algorithm.
 531
 532 While using indexed addressing, effective address for page boundary
 533 crossing is calculated as soon as possible so it does not slow down
 534 operation. As a result, in the case of SHA/X/Y, the address and data
 535 are processed at the same time making AND between them to take place.
 536 Thus, the value to be stored by SAX, for example, is in fact (A & X &
 537 (ADDR_HI + 1)).  On page boundary crossing the same value is copied
 538 also to high byte of the effective address.
 539
 540 RELIGION_MODE_OFF
 541
 542
 543 Register selection for load and store
 544
 545    bit1 bit0     A  X  Y
 546     0    0             x
 547     0    1          x
 548     1    0       x
 549     1    1       x  x
 550
 551 So, A and X are selected by bits 1 and 0 respectively, while
 552  ~(bit1|bit0) enables Y.
 553
 554 Indexing is determined by bit4, even in relative addressing mode,
 555 which is one kind of indexing.
 556
 557 Lines containing opcodes xxx000x1 (01 and 03) are treated as absolute
 558 after the effective address has been loaded into CPU.
 559
 560 Zeropage,y and Absolute,y (codes 10x1 x11x) are distinquished by bit5.
 561
 562
 563 Decimal mode in NMOS 6500 series
 564
 565   Most sources claim that the NMOS 6500 series sets the N, V and Z
 566 flags unpredictably when performing addition or subtraction in decimal
 567 mode. Of course, this is not true. While testing how the flags are
 568 set, I also wanted to see what happens if you use illegal BCD values.
 569
 570   ADC works in Decimal mode in a quite complicated way. It is amazing
 571 how it can do that all in a single cycle. Here's a C code version of
 572 the instruction:
 573
 574         unsigned
 575            A,  /* Accumulator */
 576            AL, /* low nybble of accumulator */
 577            AH, /* high nybble of accumulator */
 578
 579            C,  /* Carry flag */
 580            Z,  /* Zero flag */
 581            V,  /* oVerflow flag */
 582            N,  /* Negative flag */
 583
 584            s;  /* value to be added to Accumulator */
 585
 586         AL = (A & 15) + (s & 15) + C;         /* Calculate the lower nybble. */
 587
 588         AH = (A >> 4) + (s >> 4) + (AL > 15); /* Calculate the upper nybble. */
 589
 590         if (AL > 9) AL += 6;                  /* BCD fixup for lower nybble. */
 591
 592         Z = ((A + s + C) & 255 != 0);         /* Zero flag is set just
 593                                                  like in Binary mode. */
 594
 595         /* Negative and Overflow flags are set with the same logic than in
 596            Binary mode, but after fixing the lower nybble. */
 597
 598         N = (AH & 8 != 0);
 599         V = ((AH << 4) ^ A) & 128 && !((A ^ s) & 128);
 600
 601         if (AH > 9) AH += 6;                  /* BCD fixup for upper nybble. */
 602
 603         /* Carry is the only flag set after fixing the result. */
 604
 605         C = (AH > 15);
 606         A = ((AH << 4) | (AL & 15)) & 255;
 607
 608   The C flag is set as the quiche eaters expect, but the N and V flags
 609 are set after fixing the lower nybble but before fixing the upper one.
 610 They use the same logic than binary mode ADC. The Z flag is set before
 611 any BCD fixup, so the D flag does not have any influence on it.
 612
 613 Proof: The following test program tests all 131072 ADC combinations in
 614        Decimal mode, and aborts with BRK if anything breaks this theory.
 615        If everything goes well, it ends in RTS.
 616
 617 begin 600 dadc
 618 M 0@9",D'GL(H-#,IJC(U-JS"*#0T*:HR-@   'BI&*  A/N$_$B@+)$KH(V1
 619 M*Q@(I?PI#X7]I?LI#V7]R0J0 FD%J"D/A?VE^RGP9?PI\ C $) ":0^JL @H
 620 ML ?)H) &""@X:5\X!?V%_0AH*3W@ ! ""8"HBD7[$ JE^T7\, 28"4"H**7[
 621 M9?S0!)@) J@8N/BE^V7\V A%_= G:(3]1?W0(.;[T(?F_-"#:$D8\ )88*D=
 622 0&&4KA?NI &4LA?RI.&S[  A%
 623
 624 end
 625
 626   All programs in this chapter have been successfully tested on a Vic20
 627 and a Commodore 64 and a Commodore 128D in C64 mode. They should run on
 628 C16, +4 and on the PET series as well. If not, please report the problem
 629 to Marko M"akel"a. Each test in this chapter should run in less than a
 630 minute at 1 MHz.
 631
 632 SBC is much easier. Just like CMP, its flags are not affected by
 633 the D flag.
 634
 635 Proof:
 636
 637 begin 600 dsbc-cmp-flags
 638 M 0@9",D'GL(H-#,IJC(U-JS"*#0T*:HR-@   'B@ (3[A/RB XH8:66HL2N@
 639 M09$KH$R1*XII::BQ*Z!%D2N@4)$K^#BXI?OE_-@(:(7].+BE^^7\"&A%_? !
 640 5 .;[T./F_-#?RA"_8!@X&#CEY<7%
 641
 642 end
 643
 644   The only difference in SBC's operation in decimal mode from binary mode
 645 is the result-fixup:
 646
 647         unsigned
 648            A,  /* Accumulator */
 649            AL, /* low nybble of accumulator */
 650            AH, /* high nybble of accumulator */
 651
 652            C,  /* Carry flag */
 653            Z,  /* Zero flag */
 654            V,  /* oVerflow flag */
 655            N,  /* Negative flag */
 656
 657            s;  /* value to be added to Accumulator */
 658
 659         AL = (A & 15) - (s & 15) - !C;        /* Calculate the lower nybble. */
 660
 661         if (AL & 16) AL -= 6;                 /* BCD fixup for lower nybble. */
 662
 663         AH = (A >> 4) - (s >> 4) - (AL & 16); /* Calculate the upper nybble. */
 664
 665         if (AH & 16) AH -= 6;                 /* BCD fixup for upper nybble. */
 666
 667         /* The flags are set just like in Binary mode. */
 668
 669         C = (A - s - !C) & 256 != 0;
 670         Z = (A - s - !C) & 255 != 0;
 671         V = ((A - s - !C) ^ s) & 128 && (A ^ s) & 128;
 672         N = (A - s - !C) & 128 != 0;
 673
 674         A = ((AH << 4) | (AL & 15)) & 255;
 675
 676   Again Z flag is set before any BCD fixup. The N and V flags are set
 677 at any time before fixing the high nybble. The C flag may be set in any
 678 phase.
 679
 680   Decimal subtraction is easier than decimal addition, as you have to
 681 make the BCD fixup only when a nybble overflows. In decimal addition,
 682 you had to verify if the nybble was greater than 9. The processor has
 683 an internal "half carry" flag for the lower nybble, used to trigger
 684 the BCD fixup. When calculating with legal BCD values, the lower nybble
 685 cannot overflow again when fixing it.
 686 So, the processor does not handle overflows while performing the fixup.
 687 Similarly, the BCD fixup occurs in the high nybble only if the value
 688 overflows, i.e. when the C flag will be cleared.
 689
 690   Because SBC's flags are not affected by the Decimal mode flag, you
 691 could guess that CMP uses the SBC logic, only setting the C flag
 692 first. But the SBX instruction shows that CMP also temporarily clears
 693 the D flag, although it is totally unnecessary.
 694
 695   The following program, which tests SBC's result and flags,
 696 contains the 6502 version of the pseudo code example above.
 697
 698 begin 600 dsbc
 699 M 0@9",D'GL(H-#,IJC(U-JS"*#0T*:HR-@   'BI&*  A/N$_$B@+)$KH':1
 700 M*S@(I?PI#X7]I?LI#^7]L /I!1@I#ZBE_"GPA?VE^RGP"#CE_2GPL KI7RBP
 701 M#ND/.+ )*+ &Z0^P NE?A/T%_87]*+BE^^7\"&BH.+CXI?OE_-@(1?W0FVB$
 702 8_47]T)3F^]">YOS0FFA)&- $J3C0B%A@
 703
 704 end
 705
 706   Obviously the undocumented instructions RRA (ROR+ADC) and ISB
 707 (INC+SBC) have inherited also the decimal operation from the official
 708 instructions ADC and SBC. The program droradc proves this statement
 709 for ROR, and the dincsbc test proves this for ISB. Finally,
 710 dincsbc-deccmp proves that ISB's and DCP's (DEC+CMP) flags are not
 711 affected by the D flag.
 712
 713 begin 644 droradc
 714 M`0@9",D'GL(H-#,IJC(U-JS"*#0T*:HR-@```'BI&*``A/N$_$B@+)$KH(V1
 715 M*S@(I?PI#X7]I?LI#V7]R0J0`FD%J"D/A?VE^RGP9?PI\`C`$)`":0^JL`@H
 716 ML`?)H)`&""@X:5\X!?V%_0AH*3W@`!`""8"HBD7[$`JE^T7\,`28"4"H**7[
 717 M9?S0!)@)`J@XN/BE^R;\9_S8"$7]T"=HA/U%_=`@YOO0A>;\T(%H21CP`EA@
 718 2J1T892N%^ZD`92R%_*DX;/L`
 719 `
 720 end
 721
 722 begin 644 dincsbc
 723 M`0@9",D'GL(H-#,IJC(U-JS"*#0T*:HR-@```'BI&*``A/N$_$B@+)$KH':1
 724 M*S@(I?PI#X7]I?LI#^7]L`/I!1@I#ZBE_"GPA?VE^RGP"#CE_2GPL`KI7RBP
 725 M#ND/.+`)*+`&Z0^P`NE?A/T%_87]*+BE^^7\"&BH.+CXI?O&_.?\V`A%_="9
 726 ::(3]1?W0DN;[T)SF_-"8:$D8T`2I.-"&6&#\
 727 `
 728 end
 729
 730 begin 644 dincsbc-deccmp
 731 M`0@9",D'GL(H-#,IJC(U-JS"*#0T*:HR-@```'B@`(3[A/RB`XH8:7>HL2N@
 732 M3Y$KH%R1*XII>ZBQ*Z!3D2N@8)$KBFE_J+$KH%61*Z!BD2OX.+BE^^;\Q_S8
 733 L"&B%_3BXI?OF_,?\"&A%_?`!`.;[T-_F_-#;RA"M8!@X&#CFYL;&Q\?GYP#8
 734 `
 735 end
 736
 737
 738 6510 features
 739
 740    o  PHP always pushes the Break (B) flag as a `1' to the stack.
 741       Jukka Tapanim"aki claimed in C=lehti issue 3/89, on page 27 that the
 742       processor makes a logical OR between the status register's bit 4
 743       and the bit 8 of the stack pointer register (which is always 1).
 744       He did not give any reasons for this argument, and has refused to clarify
 745       it afterwards. Well, this was not the only error in his article...
 746
 747    o  Indirect addressing modes do not handle page boundary crossing at all.
 748       When the parameter's low byte is $FF, the effective address wraps
 749       around and the CPU fetches high byte from $xx00 instead of $xx00+$0100.
 750       E.g. JMP ($01FF) fetches PCL from $01FF and PCH from $0100,
 751       and LDA ($FF),Y fetches the base address from $FF and $00.
 752
 753    o  Indexed zero page addressing modes never fix the page address on
 754       crossing the zero page boundary.
 755       E.g. LDX #$01 : LDA ($FF,X) loads the effective address from $00 and $01.
 756
 757    o  The processor always fetches the byte following a relative branch
 758       instruction. If the branch is taken, the processor reads then the
 759       opcode from the destination address. If page boundary is crossed, it
 760       first reads a byte from the old page from a location that is bigger
 761       or smaller than the correct address by one page.
 762
 763    o  If you cross a page boundary in any other indexed mode,
 764       the processor reads an incorrect location first, a location that is
 765       smaller by one page.
 766
 767    o  Read-Modify-Write instructions write unmodified data, then modified
 768       (so INC effectively does LDX loc;STX loc;INX;STX loc)
 769
 770    o  -RDY is ignored during writes
 771       (This is why you must wait 3 cycles before doing any DMA --
 772       the maximum number of consecutive writes is 3, which occurs
 773       during interrupts except -RESET.)
 774
 775    o  Some undefined opcodes may give really unpredictable results.
 776
 777    o  All registers except the Program Counter remain unmodified after -RESET.
 778       (This is why you must preset D and I flags in the RESET handler.)
 779
 780
 781 Different CPU types
 782
 783 The Rockwell data booklet 29651N52 (technical information about R65C00
 784 microprocessors, dated October 1984), lists the following differences between
 785 NMOS R6502 microprocessor and CMOS R65C00 family:
 786
 787
 788      1. Indexed addressing across page boundary.
 789              NMOS: Extra read of invalid address.
 790              CMOS: Extra read of last instruction byte.
 791
 792
 793      2. Execution of invalid op codes.
 794              NMOS: Some terminate only by reset. Results are undefined.
 795              CMOS: All are NOPs (reserved for future use).
 796
 797
 798      3. Jump indirect, operand = XXFF.
 799              NMOS: Page address does not increment.
 800              CMOS: Page address increments and adds one additional cycle.
 801
 802
 803      4. Read/modify/write instructions at effective address.
 804              NMOS: One read and two write cycles.
 805              CMOS: Two read and one write cycle.
 806
 807
 808      5. Decimal flag.
 809              NMOS: Indeterminate after reset.
 810              CMOS: Initialized to binary mode (D=0) after reset and interrupts.
 811
 812
 813      6. Flags after decimal operation.
 814              NMOS: Invalid N, V and Z flags.
 815              CMOS: Valid flag adds one additional cycle.
 816
 817
 818      7. Interrupt after fetch of BRK instruction.
 819              NMOS: Interrupt vector is loaded, BRK vector is ignored.
 820              CMOS: BRK is executed, then interrupt is executed.
 821
 822
 823 6510 Instruction Timing
 824
 825   The NMOS 6500 series processors always perform at least two reads
 826 for each instruction. In addition to the operation code (opcode), they
 827 fetch the next byte. This is quite efficient, as most instructions are
 828 two or three bytes long.
 829
 830   The processors also use a sort of pipelining. If an instruction does
 831 not store data in memory on its last cycle, the processor can fetch
 832 the opcode of the next instruction while executing the last cycle. For
 833 instance, the instruction EOR #$FF truly takes three cycles. On the
 834 first cycle, the opcode $49 will be fetched. During the second cycle
 835 the processor decodes the opcode and fetches the parameter #$FF. On
 836 the third cycle, the processor will perform the operation and store
 837 the result to accumulator, but simultaneously it fetches the opcode
 838 for the next instruction. This is why the instruction effectively
 839 takes only two cycles.
 840
 841   The following tables show what happens on the bus while executing
 842 different kinds of instructions.
 843
 844   Interrupts
 845
 846      NMI and IRQ both take 7 cycles. Their timing diagram is much like
 847      BRK's (see below). IRQ will be executed only when the I flag is
 848      clear. IRQ and BRK both set the I flag, whereas the NMI does not
 849      affect its state.
 850
 851      The processor will usually wait for the current instruction to
 852      complete before executing the interrupt sequence. To process the
 853      interrupt before the next instruction, the interrupt must occur
 854      before the last cycle of the current instruction.
 855
 856      There is one exception to this rule: the BRK instruction. If a
 857      hardware interrupt (NMI or IRQ) occurs before the fourth (flags
 858      saving) cycle of BRK, the BRK instruction will be skipped, and
 859      the processor will jump to the hardware interrupt vector. This
 860      sequence will always take 7 cycles.
 861
 862      You do not completely lose the BRK interrupt, the B flag will be
 863      set in the pushed status register if a BRK instruction gets
 864      interrupted. When BRK and IRQ occur at the same time, this does
 865      not cause any problems, as your program will consider it as a
 866      BRK, and the IRQ would occur again after the processor returned
 867      from your BRK routine, unless you cleared the interrupt source in
 868      your BRK handler. But the simultaneous occurrence of NMI and BRK
 869      is far more fatal. If you do not check the B flag in the NMI
 870      routine and subtract two from the return address when needed, the
 871      BRK instruction will be skipped.
 872
 873      If the NMI and IRQ interrupts overlap each other (one interrupt
 874      occurs before fetching the interrupt vector for the other
 875      interrupt), the processor will most probably jump to the NMI
 876      vector in every case, and then jump to the IRQ vector after
 877      processing the first instruction of the NMI handler. This has not
 878      been measured yet, but the IRQ is very similar to BRK, and many
 879      sources state that the NMI has higher priority than IRQ. However,
 880      it might be that the processor takes the interrupt that comes
 881      later, i.e. you could lose an NMI interrupt if an IRQ occurred in
 882      four cycles after it.
 883
 884      After finishing the interrupt sequence, the processor will start
 885      to execute the first instruction of the interrupt routine. This
 886      proves that the processor uses a sort of pipelining: it finishes
 887      the current instruction (or interrupt sequence) while reading the
 888      opcode of the next instruction.
 889
 890      RESET does not push program counter on stack, and it lasts
 891      probably 6 cycles after deactivating the signal. Like NMI, RESET
 892      preserves all registers except PC.
 893
 894   Instructions accessing the stack
 895
 896      BRK
 897
 898         #  address R/W description
 899        --- ------- --- -----------------------------------------------
 900         1    PC     R  fetch opcode, increment PC
 901         2    PC     R  read next instruction byte (and throw it away),
 902                        increment PC
 903         3  $0100,S  W  push PCH on stack (with B flag set), decrement S
 904         4  $0100,S  W  push PCL on stack, decrement S
 905         5  $0100,S  W  push P on stack, decrement S
 906         6   $FFFE   R  fetch PCL
 907         7   $FFFF   R  fetch PCH
 908
 909      RTI
 910
 911         #  address R/W description
 912        --- ------- --- -----------------------------------------------
 913         1    PC     R  fetch opcode, increment PC
 914         2    PC     R  read next instruction byte (and throw it away)
 915         3  $0100,S  R  increment S
 916         4  $0100,S  R  pull P from stack, increment S
 917         5  $0100,S  R  pull PCL from stack, increment S
 918         6  $0100,S  R  pull PCH from stack
 919
 920      RTS
 921
 922         #  address R/W description
 923        --- ------- --- -----------------------------------------------
 924         1    PC     R  fetch opcode, increment PC
 925         2    PC     R  read next instruction byte (and throw it away)
 926         3  $0100,S  R  increment S
 927         4  $0100,S  R  pull PCL from stack, increment S
 928         5  $0100,S  R  pull PCH from stack
 929         6    PC     R  increment PC
 930
 931      PHA, PHP
 932
 933         #  address R/W description
 934        --- ------- --- -----------------------------------------------
 935         1    PC     R  fetch opcode, increment PC
 936         2    PC     R  read next instruction byte (and throw it away)
 937         3  $0100,S  W  push register on stack, decrement S
 938
 939      PLA, PLP
 940
 941         #  address R/W description
 942        --- ------- --- -----------------------------------------------
 943         1    PC     R  fetch opcode, increment PC
 944         2    PC     R  read next instruction byte (and throw it away)
 945         3  $0100,S  R  increment S
 946         4  $0100,S  R  pull register from stack
 947
 948      JSR
 949
 950         #  address R/W description
 951        --- ------- --- -------------------------------------------------
 952         1    PC     R  fetch opcode, increment PC
 953         2    PC     R  fetch low address byte, increment PC
 954         3  $0100,S  R  internal operation (predecrement S?)
 955         4  $0100,S  W  push PCH on stack, decrement S
 956         5  $0100,S  W  push PCL on stack, decrement S
 957         6    PC     R  copy low address byte to PCL, fetch high address
 958                        byte to PCH
 959
 960   Accumulator or implied addressing
 961
 962         #  address R/W description
 963        --- ------- --- -----------------------------------------------
 964         1    PC     R  fetch opcode, increment PC
 965         2    PC     R  read next instruction byte (and throw it away)
 966
 967   Immediate addressing
 968
 969         #  address R/W description
 970        --- ------- --- ------------------------------------------
 971         1    PC     R  fetch opcode, increment PC
 972         2    PC     R  fetch value, increment PC
 973
 974   Absolute addressing
 975
 976      JMP
 977
 978         #  address R/W description
 979        --- ------- --- -------------------------------------------------
 980         1    PC     R  fetch opcode, increment PC
 981         2    PC     R  fetch low address byte, increment PC
 982         3    PC     R  copy low address byte to PCL, fetch high address
 983                        byte to PCH
 984
 985      Read instructions (LDA, LDX, LDY, EOR, AND, ORA, ADC, SBC, CMP, BIT,
 986                         LAX, NOP)
 987
 988         #  address R/W description
 989        --- ------- --- ------------------------------------------
 990         1    PC     R  fetch opcode, increment PC
 991         2    PC     R  fetch low byte of address, increment PC
 992         3    PC     R  fetch high byte of address, increment PC
 993         4  address  R  read from effective address
 994
 995      Read-Modify-Write instructions (ASL, LSR, ROL, ROR, INC, DEC,
 996                                      SLO, SRE, RLA, RRA, ISB, DCP)
 997
 998         #  address R/W description
 999        --- ------- --- ------------------------------------------
1000         1    PC     R  fetch opcode, increment PC
1001         2    PC     R  fetch low byte of address, increment PC
1002         3    PC     R  fetch high byte of address, increment PC
1003         4  address  R  read from effective address
1004         5  address  W  write the value back to effective address,
1005                        and do the operation on it
1006         6  address  W  write the new value to effective address
1007
1008      Write instructions (STA, STX, STY, SAX)
1009
1010         #  address R/W description
1011        --- ------- --- ------------------------------------------
1012         1    PC     R  fetch opcode, increment PC
1013         2    PC     R  fetch low byte of address, increment PC
1014         3    PC     R  fetch high byte of address, increment PC
1015         4  address  W  write register to effective address
1016
1017   Zero page addressing
1018
1019      Read instructions (LDA, LDX, LDY, EOR, AND, ORA, ADC, SBC, CMP, BIT,
1020                         LAX, NOP)
1021
1022         #  address R/W description
1023        --- ------- --- ------------------------------------------
1024         1    PC     R  fetch opcode, increment PC
1025         2    PC     R  fetch address, increment PC
1026         3  address  R  read from effective address
1027
1028      Read-Modify-Write instructions (ASL, LSR, ROL, ROR, INC, DEC,
1029                                      SLO, SRE, RLA, RRA, ISB, DCP)
1030
1031         #  address R/W description
1032        --- ------- --- ------------------------------------------
1033         1    PC     R  fetch opcode, increment PC
1034         2    PC     R  fetch address, increment PC
1035         3  address  R  read from effective address
1036         4  address  W  write the value back to effective address,
1037                        and do the operation on it
1038         5  address  W  write the new value to effective address
1039
1040      Write instructions (STA, STX, STY, SAX)
1041
1042         #  address R/W description
1043        --- ------- --- ------------------------------------------
1044         1    PC     R  fetch opcode, increment PC
1045         2    PC     R  fetch address, increment PC
1046         3  address  W  write register to effective address
1047
1048   Zero page indexed addressing
1049
1050      Read instructions (LDA, LDX, LDY, EOR, AND, ORA, ADC, SBC, CMP, BIT,
1051                         LAX, NOP)
1052
1053         #   address  R/W description
1054        --- --------- --- ------------------------------------------
1055         1     PC      R  fetch opcode, increment PC
1056         2     PC      R  fetch address, increment PC
1057         3   address   R  read from address, add index register to it
1058         4  address+I* R  read from effective address
1059
1060        Notes: I denotes either index register (X or Y).
1061
1062               * The high byte of the effective address is always zero,
1063                 i.e. page boundary crossings are not handled.
1064
1065      Read-Modify-Write instructions (ASL, LSR, ROL, ROR, INC, DEC,
1066                                      SLO, SRE, RLA, RRA, ISB, DCP)
1067
1068         #   address  R/W description
1069        --- --------- --- ---------------------------------------------
1070         1     PC      R  fetch opcode, increment PC
1071         2     PC      R  fetch address, increment PC
1072         3   address   R  read from address, add index register X to it
1073         4  address+X* R  read from effective address
1074         5  address+X* W  write the value back to effective address,
1075                          and do the operation on it
1076         6  address+X* W  write the new value to effective address
1077
1078        Note: * The high byte of the effective address is always zero,
1079                i.e. page boundary crossings are not handled.
1080
1081      Write instructions (STA, STX, STY, SAX)
1082
1083         #   address  R/W description
1084        --- --------- --- -------------------------------------------
1085         1     PC      R  fetch opcode, increment PC
1086         2     PC      R  fetch address, increment PC
1087         3   address   R  read from address, add index register to it
1088         4  address+I* W  write to effective address
1089
1090        Notes: I denotes either index register (X or Y).
1091
1092               * The high byte of the effective address is always zero,
1093                 i.e. page boundary crossings are not handled.
1094
1095   Absolute indexed addressing
1096
1097      Read instructions (LDA, LDX, LDY, EOR, AND, ORA, ADC, SBC, CMP, BIT,
1098                         LAX, LAE, SHS, NOP)
1099
1100         #   address  R/W description
1101        --- --------- --- ------------------------------------------
1102         1     PC      R  fetch opcode, increment PC
1103         2     PC      R  fetch low byte of address, increment PC
1104         3     PC      R  fetch high byte of address,
1105                          add index register to low address byte,
1106                          increment PC
1107         4  address+I* R  read from effective address,
1108                          fix the high byte of effective address
1109         5+ address+I  R  re-read from effective address
1110
1111        Notes: I denotes either index register (X or Y).
1112
1113               * The high byte of the effective address may be invalid
1114                 at this time, i.e. it may be smaller by $100.
1115
1116               + This cycle will be executed only if the effective address
1117                 was invalid during cycle #4, i.e. page boundary was crossed.
1118
1119      Read-Modify-Write instructions (ASL, LSR, ROL, ROR, INC, DEC,
1120                                      SLO, SRE, RLA, RRA, ISB, DCP)
1121
1122         #   address  R/W description
1123        --- --------- --- ------------------------------------------
1124         1    PC       R  fetch opcode, increment PC
1125         2    PC       R  fetch low byte of address, increment PC
1126         3    PC       R  fetch high byte of address,
1127                          add index register X to low address byte,
1128                          increment PC
1129         4  address+X* R  read from effective address,
1130                          fix the high byte of effective address
1131         5  address+X  R  re-read from effective address
1132         6  address+X  W  write the value back to effective address,
1133                          and do the operation on it
1134         7  address+X  W  write the new value to effective address
1135
1136        Notes: * The high byte of the effective address may be invalid
1137                 at this time, i.e. it may be smaller by $100.
1138
1139      Write instructions (STA, STX, STY, SHA, SHX, SHY)
1140
1141         #   address  R/W description
1142        --- --------- --- ------------------------------------------
1143         1     PC      R  fetch opcode, increment PC
1144         2     PC      R  fetch low byte of address, increment PC
1145         3     PC      R  fetch high byte of address,
1146                          add index register to low address byte,
1147                          increment PC
1148         4  address+I* R  read from effective address,
1149                          fix the high byte of effective address
1150         5  address+I  W  write to effective address
1151
1152        Notes: I denotes either index register (X or Y).
1153
1154               * The high byte of the effective address may be invalid
1155                 at this time, i.e. it may be smaller by $100. Because
1156                 the processor cannot undo a write to an invalid
1157                 address, it always reads from the address first.
1158
1159   Relative addressing (BCC, BCS, BNE, BEQ, BPL, BMI, BVC, BVS)
1160
1161         #   address  R/W description
1162        --- --------- --- ---------------------------------------------
1163         1     PC      R  fetch opcode, increment PC
1164         2     PC      R  fetch operand, increment PC
1165         3     PC      R  Fetch opcode of next instruction,
1166                          If branch is taken, add operand to PCL.
1167                          Otherwise increment PC.
1168         4+    PC*     R  Fetch opcode of next instruction.
1169                          Fix PCH. If it did not change, increment PC.
1170         5!    PC      R  Fetch opcode of next instruction,
1171                          increment PC.
1172
1173        Notes: The opcode fetch of the next instruction is included to
1174               this diagram for illustration purposes. When determining
1175               real execution times, remember to subtract the last
1176               cycle.
1177
1178               * The high byte of Program Counter (PCH) may be invalid
1179                 at this time, i.e. it may be smaller or bigger by $100.
1180
1181               + If branch is taken, this cycle will be executed.
1182
1183               ! If branch occurs to different page, this cycle will be
1184                 executed.
1185
1186   Indexed indirect addressing
1187
1188      Read instructions (LDA, ORA, EOR, AND, ADC, CMP, SBC, LAX)
1189
1190         #    address   R/W description
1191        --- ----------- --- ------------------------------------------
1192         1      PC       R  fetch opcode, increment PC
1193         2      PC       R  fetch pointer address, increment PC
1194         3    pointer    R  read from the address, add X to it
1195         4   pointer+X   R  fetch effective address low
1196         5  pointer+X+1  R  fetch effective address high
1197         6    address    R  read from effective address
1198
1199        Note: The effective address is always fetched from zero page,
1200              i.e. the zero page boundary crossing is not handled.
1201
1202      Read-Modify-Write instructions (SLO, SRE, RLA, RRA, ISB, DCP)
1203
1204         #    address   R/W description
1205        --- ----------- --- ------------------------------------------
1206         1      PC       R  fetch opcode, increment PC
1207         2      PC       R  fetch pointer address, increment PC
1208         3    pointer    R  read from the address, add X to it
1209         4   pointer+X   R  fetch effective address low
1210         5  pointer+X+1  R  fetch effective address high
1211         6    address    R  read from effective address
1212         7    address    W  write the value back to effective address,
1213                            and do the operation on it
1214         8    address    W  write the new value to effective address
1215
1216        Note: The effective address is always fetched from zero page,
1217              i.e. the zero page boundary crossing is not handled.
1218
1219      Write instructions (STA, SAX)
1220
1221         #    address   R/W description
1222        --- ----------- --- ------------------------------------------
1223         1      PC       R  fetch opcode, increment PC
1224         2      PC       R  fetch pointer address, increment PC
1225         3    pointer    R  read from the address, add X to it
1226         4   pointer+X   R  fetch effective address low
1227         5  pointer+X+1  R  fetch effective address high
1228         6    address    W  write to effective address
1229
1230        Note: The effective address is always fetched from zero page,
1231              i.e. the zero page boundary crossing is not handled.
1232
1233   Indirect indexed addressing
1234
1235      Read instructions (LDA, EOR, AND, ORA, ADC, SBC, CMP)
1236
1237         #    address   R/W description
1238        --- ----------- --- ------------------------------------------
1239         1      PC       R  fetch opcode, increment PC
1240         2      PC       R  fetch pointer address, increment PC
1241         3    pointer    R  fetch effective address low
1242         4   pointer+1   R  fetch effective address high,
1243                            add Y to low byte of effective address
1244         5   address+Y*  R  read from effective address,
1245                            fix high byte of effective address
1246         6+  address+Y   R  read from effective address
1247
1248        Notes: The effective address is always fetched from zero page,
1249               i.e. the zero page boundary crossing is not handled.
1250
1251               * The high byte of the effective address may be invalid
1252                 at this time, i.e. it may be smaller by $100.
1253
1254               + This cycle will be executed only if the effective address
1255                 was invalid during cycle #5, i.e. page boundary was crossed.
1256
1257      Read-Modify-Write instructions (SLO, SRE, RLA, RRA, ISB, DCP)
1258
1259         #    address   R/W description
1260        --- ----------- --- ------------------------------------------
1261         1      PC       R  fetch opcode, increment PC
1262         2      PC       R  fetch pointer address, increment PC
1263         3    pointer    R  fetch effective address low
1264         4   pointer+1   R  fetch effective address high,
1265                            add Y to low byte of effective address
1266         5   address+Y*  R  read from effective address,
1267                            fix high byte of effective address
1268         6   address+Y   R  read from effective address
1269         7   address+Y   W  write the value back to effective address,
1270                            and do the operation on it
1271         8   address+Y   W  write the new value to effective address
1272
1273        Notes: The effective address is always fetched from zero page,
1274               i.e. the zero page boundary crossing is not handled.
1275
1276               * The high byte of the effective address may be invalid
1277                 at this time, i.e. it may be smaller by $100.
1278
1279      Write instructions (STA, SHA)
1280
1281         #    address   R/W description
1282        --- ----------- --- ------------------------------------------
1283         1      PC       R  fetch opcode, increment PC
1284         2      PC       R  fetch pointer address, increment PC
1285         3    pointer    R  fetch effective address low
1286         4   pointer+1   R  fetch effective address high,
1287                            add Y to low byte of effective address
1288         5   address+Y*  R  read from effective address,
1289                            fix high byte of effective address
1290         6   address+Y   W  write to effective address
1291
1292        Notes: The effective address is always fetched from zero page,
1293               i.e. the zero page boundary crossing is not handled.
1294
1295               * The high byte of the effective address may be invalid
1296                 at this time, i.e. it may be smaller by $100.
1297
1298   Absolute indirect addressing (JMP)
1299
1300         #   address  R/W description
1301        --- --------- --- ------------------------------------------
1302         1     PC      R  fetch opcode, increment PC
1303         2     PC      R  fetch pointer address low, increment PC
1304         3     PC      R  fetch pointer address high, increment PC
1305         4   pointer   R  fetch low address to latch
1306         5  pointer+1* R  fetch PCH, copy latch to PCL
1307
1308        Note: * The PCH will always be fetched from the same page
1309                than PCL, i.e. page boundary crossing is not handled.
1310
1311                 How Real Programmers Acknowledge Interrupts
1312
1313   With RMW instructions:
1314
1315         ; beginning of combined raster/timer interrupt routine
1316         LSR $D019       ; clear VIC interrupts, read raster interrupt flag to C
1317         BCS raster      ; jump if VIC caused an interrupt
1318         ...             ; timer interrupt routine
1319
1320         Operational diagram of LSR $D019:
1321
1322           #  data  address  R/W
1323          --- ----  -------  ---  ---------------------------------
1324           1   4E     PC      R   fetch opcode
1325           2   19    PC+1     R   fetch address low
1326           3   D0    PC+2     R   fetch address high
1327           4   xx    $D019    R   read memory
1328           5   xx    $D019    W   write the value back, rotate right
1329           6  xx/2   $D019    W   write the new value back
1330
1331         The 5th cycle acknowledges the interrupt by writing the same
1332         value back. If only raster interrupts are used, the 6th cycle
1333         has no effect on the VIC. (It might acknowledge also some
1334         other interrupts.)
1335
1336   With indexed addressing:
1337
1338         ; acknowledge interrupts to both CIAs
1339         LDX #$10
1340         LDA $DCFD,X
1341
1342         Operational diagram of LDA $DCFD,X:
1343
1344           #  data  address  R/W  description
1345          --- ----  -------  ---  ---------------------------------
1346           1   BD     PC      R   fetch opcode
1347           2   FD    PC+1     R   fetch address low
1348           3   DC    PC+2     R   fetch address high, add X to address low
1349           4   xx    $DC0D    R   read from address, fix high byte of address
1350           5   yy    $DD0D    R   read from right address
1351
1352         ; acknowledge interrupts to CIA 2
1353         LDX #$10
1354         STA $DDFD,X
1355
1356         Operational diagram of STA $DDFD,X:
1357
1358           #  data  address  R/W  description
1359          --- ----  -------  ---  ---------------------------------
1360           1   9D     PC      R   fetch opcode
1361           2   FD    PC+1     R   fetch address low
1362           3   DC    PC+2     R   fetch address high, add X to address low
1363           4   xx    $DD0D    R   read from address, fix high byte of address
1364           5   ac    $DE0D    W   write to right address
1365
1366   With branch instructions:
1367
1368         ; acknowledge interrupts to CIA 2
1369                 LDA #$00  ; clear N flag
1370                 JMP $DD0A
1371         DD0A    BPL $DC9D ; branch
1372         DC9D    BRK       ; return
1373
1374         You need the following preparations to initialize the CIA registers:
1375
1376                 LDA #$91  ; argument of BPL
1377                 STA $DD0B
1378                 LDA #$10  ; BPL
1379                 STA $DD0A
1380                 STA $DD08 ; load the ToD values from the latches
1381                 LDA $DD0B ; freeze the ToD display
1382                 LDA #$7F
1383                 STA $DC0D ; assure that $DC0D is $00
1384
1385         Operational diagram of BPL $DC9D:
1386
1387           #  data  address  R/W  description
1388          --- ----  -------  ---  ---------------------------------
1389           1   10    $DD0A    R   fetch opcode
1390           2   91    $DD0B    R   fetch argument
1391           3   xx    $DD0C    R   fetch opcode, add argument to PCL
1392           4   yy    $DD9D    R   fetch opcode, fix PCH
1393         ( 5   00    $DC9D    R   fetch opcode )
1394
1395         ; acknowledge interrupts to CIA 1
1396                 LSR       ; clear N flag
1397                 JMP $DCFA
1398         DCFA    BPL $DD0D
1399         DD0D    BRK
1400
1401         ; Again you need to set the ToD registers of CIA 1 and the
1402         ; Interrupt Control Register of CIA 2 first.
1403
1404         Operational diagram of BPL $DD0D:
1405
1406           #  data  address  R/W  description
1407          --- ----  -------  ---  ---------------------------------
1408           1   10    $DCFA    R   fetch opcode
1409           2   11    $DCFB    R   fetch argument
1410           3   xx    $DCFC    R   fetch opcode, add argument to PCL
1411           4   yy    $DC0D    R   fetch opcode, fix PCH
1412         ( 5   00    $DD0D    R   fetch opcode )
1413
1414         ; acknowledge interrupts to CIA 2 automagically
1415                 ; preparations
1416                 LDA #$7F
1417                 STA $DD0D       ; disable all interrupt sources of CIA2
1418                 LDA $DD0E
1419                 AND #$BE        ; ensure that $DD0C remains constant
1420                 STA $DD0E       ; and stop the timer
1421                 LDA #$FD
1422                 STA $DD0C       ; parameter of BPL
1423                 LDA #$10
1424                 STA $DD0B       ; BPL
1425                 LDA #$40
1426                 STA $DD0A       ; RTI/parameter of LSR
1427                 LDA #$46
1428                 STA $DD09       ; LSR
1429                 STA $DD08       ; load the ToD values from the latches
1430                 LDA $DD0B       ; freeze the ToD display
1431                 LDA #$09
1432                 STA $0318
1433                 LDA #$DD
1434                 STA $0319       ; change NMI vector to $DD09
1435                 LDA #$FF        ; Try changing this instruction's operand
1436                 STA $DD05       ; (see comment below).
1437                 LDA #$FF
1438                 STA $DD04       ; set interrupt frequency to 1/65536 cycles
1439                 LDA $DD0E
1440                 AND #$80
1441                 ORA #$11
1442                 LDX #$81
1443                 STX $DD0D       ; enable timer interrupt
1444                 STA $DD0E       ; start timer
1445
1446                 LDA #$00        ; To see that the interrupts really occur,
1447                 STA $D011       ; use something like this and see how
1448         LOOP    DEC $D020       ; changing the byte loaded to $DD05 from
1449                 BNE LOOP        ; #$FF to #$0F changes the image.
1450
1451         When an NMI occurs, the processor jumps to Kernal code, which jumps to
1452         ($0318), which points to the following routine:
1453
1454         DD09    LSR $40         ; clear N flag
1455                 BPL $DD0A       ; Note: $DD0A contains RTI.
1456
1457         Operational diagram of BPL $DD0A:
1458
1459           #  data  address  R/W  description
1460          --- ----  -------  ---  ---------------------------------
1461           1   10    $DD0B    R   fetch opcode
1462           2   11    $DD0C    R   fetch argument
1463           3   xx    $DD0D    R   fetch opcode, add argument to PCL
1464           4   40    $DD0A    R   fetch opcode, (fix PCH)
1465
1466   With RTI:
1467
1468         ; the fastest possible interrupt handler in the 6500 family
1469                 ; preparations
1470                 SEI
1471                 LDA $01         ; disable ROM and enable I/O
1472                 AND #$FD
1473                 ORA #$05
1474                 STA $01
1475                 LDA #$7F
1476                 STA $DD0D       ; disable CIA 2's all interrupt sources
1477                 LDA $DD0E
1478                 AND #$BE        ; ensure that $DD0C remains constant
1479                 STA $DD0E       ; and stop the timer
1480                 LDA #$40
1481                 STA $DD0C       ; store RTI to $DD0C
1482                 LDA #$0C
1483                 STA $FFFA
1484                 LDA #$DD
1485                 STA $FFFB       ; change NMI vector to $DD0C
1486                 LDA #$FF        ; Try changing this instruction's operand
1487                 STA $DD05       ; (see comment below).
1488                 LDA #$FF
1489                 STA $DD04       ; set interrupt frequency to 1/65536 cycles
1490                 LDA $DD0E
1491                 AND #$80
1492                 ORA #$11
1493                 LDX #$81
1494                 STX $DD0D       ; enable timer interrupt
1495                 STA $DD0E       ; start timer
1496
1497                 LDA #$00        ; To see that the interrupts really occur,
1498                 STA $D011       ; use something like this and see how
1499         LOOP    DEC $D020       ; changing the byte loaded to $DD05 from
1500                 BNE LOOP        ; #$FF to #$0F changes the image.
1501
1502         When an NMI occurs, the processor jumps to Kernal code, which
1503         jumps to ($0318), which points to the following routine:
1504
1505         DD0C    RTI
1506
1507         How on earth can this clear the interrupts? Remember, the
1508         processor always fetches two successive bytes for each
1509         instruction.
1510
1511         A little more practical version of this is redirecting the NMI
1512         (or IRQ) to your own routine, whose last instruction is JMP
1513         $DD0C or JMP $DC0C.  If you want to confuse more, change the 0
1514         in the address to a hexadecimal digit different from the one
1515         you used when writing the RTI.
1516
1517         Or you can combine the latter two methods:
1518
1519         DD09    LSR $xx  ; xx is any appropriate BCD value 00-59.
1520                 BPL $DCFC
1521         DCFC    RTI
1522
1523         This example acknowledges interrupts to both CIAs.
1524
1525   If you want to confuse the examiners of your code, you can use any
1526 of these techniques. Although these examples use no undefined opcodes,
1527 they do not necessarily run correctly on CMOS processors. However, the
1528 RTI example should run on 65C02 and 65C816, and the latter branch
1529 instruction example might work as well.
1530
1531   The RMW instruction method has been used in some demos, others were
1532 developed by Marko M"akel"a. His favourite is the automagical RTI
1533 method, although it does not have any practical applications, except
1534 for some time dependent data decryption routines for very complicated
1535 copy protections.
1536
1537