This version is of historical interest only. See updated version.

Adding Structured Control-flow to any* Assembler (part 1)

[* Well almost any. Unfortunately the following method cannot be used with the GNU Assembler because it does not allow the assembly location to be moved backwards. For the GNU Assembler, an attempt to .org backwards is an error.]

I assume that most programmers appreciate the advantages of structured control-flow over "spaghetti code" (using labels and jumps or "go-to"s). The basic problem is that a label is a "come-from" statement with no indication of where control can come from. With structured control-flow we have sets of matching reserved-words that indicate the direction of the jump (forward or backward) and we use indentation to indicate which words are connected.

e.g. Instead of                           we write

        tst     R8                        tst     R8

        jnz     label                     _IF     _Z

        mov     R9,R8                         mov     R9,R8

label:                                    _ENDIF

which generates exactly the same machine code. This is not to be confused with conditional assembly.

And instead of                            we write something like

label:                                    _BEGIN

        sub     R9,R8                         sub     R9,R8

        jnc     label                     _UNTIL  _C

Users of Forth systems have had structured control-flow in their assembly languages since 1968, the same year Dijkstra's famous letter ("Go To Statement Considered Harmful") was published. I was pleased to learn that the x86 community, at least, now has support for structured control-flow built into the assemblers MASM and NASM. It seems incredible to me that assemblers still exist with no support for structured control-flow. The IAR Workbench assembler, which I use to program microcontrollers, is one of them.

The method used to implement structured control-flow in Forth-based assemblers (and in the Forth language itself) was nicely refined by the time of the first ANS/ISO standard Forth. You can read about it at http://lars.nocrew.org/dpans/dpansa3.htm#A.3.2.3.2 but you don't need to, since below I have translated the Forth definitions into assembler macro definitions. You don't need to use Forth, or know anything about it. I only mention Forth to honour the source of the method.

Some of the words used for control-flow in Forth are confusing when you try to relate them to C, Pascal or structured BASIC. That's because (a) those languages were only a glint in their creators' eyes when these things were included in Forth (only Fortran's counted DO loop had any influence), and (b) the elegant stack-based implementation limits us to a context-free grammar. But you can change the actual words to be whatever you want. In the examples below I substitute ENDIF and ENDW for Forth's uncommon use of THEN and REPEAT. The main thing is the method, which can be adapted to add structured control-flow to almost any assembler, without requiring access to the assembler's source code. It only requires assembler variables and macros, and the ability to move the current assembly location backwards.

The code below implements structured control-flow for a specific assembler (IAR) and a specific target processor (MSP430) but the text explains how to modify it to work with any combination of assembler and target.

The first thing you need to do is implement a control-flow stack using assembler variables. The assembler I'm using has the usual EQU to assign constant values to labels, but it also has SET to assign variable values. So the stack can be implemented like this:

_CS_PUSH    MACRO   arg

_CS4        SET     _CS3

_CS3        SET     _CS2

_CS2        SET     _CS_TOP

_CS_TOP     SET     arg

            ENDM

_CS_DROP    MACRO

_CS_TOP     SET     _CS2

_CS2        SET     _CS3

_CS3        SET     _CS4

_CS4        SET     0

            ENDM

We use underscores at the start of all the macro and variable names so they are less likely to clash with anything in the source code. "CS" stands for Control-flow Stack. We implement DROP instead of POP since we can just access the top item directly as _CS_TOP. I've only shown a four level stack, but you get the idea. Eight levels is probably sufficient.
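
To see the shifting behaviour without running the assembler, here is a minimal sketch of the same fixed-depth stack in Python. The class `ControlStack` and its method names are made up for illustration and are not part of the macro set; slot 0 stands for _CS_TOP, slot 1 for _CS2, and so on. Like the macros, it silently loses the deepest item on overflow and feeds in zeros on underflow.

```python
class ControlStack:
    """Fixed-depth stack modelled on _CS_PUSH / _CS_DROP (illustrative only)."""

    def __init__(self, depth=4):
        # slots[0] plays the role of _CS_TOP, slots[1] of _CS2, and so on.
        self.slots = [0] * depth

    def push(self, value):
        # Shift every slot one place deeper; the deepest value is lost.
        self.slots = [value] + self.slots[:-1]

    def drop(self):
        # Shift every slot one place up; a zero enters at the bottom.
        self.slots = self.slots[1:] + [0]

    @property
    def top(self):
        return self.slots[0]


cs = ControlStack()
cs.push(0x1000)      # e.g. the address of an outer _BEGIN
cs.push(0x1010)      # e.g. the address of an inner _BEGIN
print(hex(cs.top))   # the inner address is on top
cs.drop()
print(hex(cs.top))   # the outer address is back on top
```

Changing `depth` to 8 corresponds to adding _CS5 to _CS8 to the two macros in the same pattern.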

We don't need the full generality of ANS Forth's CS-PICK and CS-ROLL, so don't worry if you don't know what they are. We only need one more stack operation, which swaps the top two elements. This one uses an XOR-swap because hey, they're tricky, and how often do you get the chance. :-)

_CS_SWAP    MACRO

_CS_TOP     SET     _CS_TOP ^ _CS2

_CS2        SET     _CS_TOP ^ _CS2

_CS_TOP     SET     _CS_TOP ^ _CS2

            ENDM
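
If the XOR-swap looks like magic, the three steps can be traced in a few lines of Python. This is only a sketch of the arithmetic; the function name `xor_swap` is made up for illustration.

```python
def xor_swap(top, second):
    """Swap two integers using the same three XOR steps as _CS_SWAP."""
    top = top ^ second      # top now holds (old top XOR old second)
    second = top ^ second   # XORing out the old second leaves the old top
    top = top ^ second      # XORing out the old top leaves the old second
    return top, second


top, second = xor_swap(0x1234, 0x00F0)
print(hex(top), hex(second))
```

The two values come back exchanged, with no temporary variable needed.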

Next you need to look up the format of your jump instructions. They will generally consist of three bit-fields: the constant opcode, the condition code and the address offset. You need to write a macro that will take a condition code and an address offset as arguments and assemble the corresponding jump instruction. e.g. for the MSP430:

_ASM_Jxx    MACRO   cond,offset

            DW      1<<13 | (cond&7)<<10 | (offset>>1)&$03FF

            ENDM
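
The bit-packing in _ASM_Jxx can be checked outside the assembler. Here is a minimal sketch in Python of the same expression; the function name `encode_jump` is made up for illustration. The offset is in bytes, measured from the word following the jump, and the MSP430 stores it as a 10-bit word offset.

```python
def encode_jump(cond, offset):
    """Build a 16-bit MSP430 jump word, mirroring _ASM_Jxx.

    cond   -- 3-bit condition field (0..7)
    offset -- byte offset from the address after the jump (must be even)
    """
    return (1 << 13) | ((cond & 7) << 10) | ((offset >> 1) & 0x03FF)


# Condition field 0 (jnz), skipping one following word: offset +2 bytes.
print(hex(encode_jump(0, 2)))    # 0x2001
# Condition field 7 (jmp), jumping back 4 bytes.
print(hex(encode_jump(7, -4)))   # 0x3ffe
```

Note that the mask silently wraps an offset that does not fit in 10 bits, so a more defensive version might check the range first.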

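Now we need to define some meaningful constant labels for the various jump conditions. But we actually define each one as the condition field of the jump instruction with the inverse condition, because the jump is taken when the stated condition is false: it skips the body of an _IF, leaves a _WHILE loop, or goes back around an _UNTIL loop. Notice how, in the examples above, the _IF _Z needs to assemble a jnz instruction and the _UNTIL _C assembles a jnc.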

; Define condition codes for structured assembly. Used as arguments to _IF, _WHILE, _UNTIL.

; MSP430 condition codes

_Z      EQU     0   ; (jnz) Zero

_NZ     EQU     1   ; (jz ) Not Zero

_C      EQU     2   ; (jnc) Carry

_NC     EQU     3   ; (jc ) No Carry

_NN     EQU     4   ; (jn ) Not Negative

_L      EQU     5   ; (jge) Less (signed), (N xor V)

_GE     EQU     6   ; (jl ) Greater or Equal (signed), not(N xor V)

_NEVER  EQU     7   ; (jmp) Never (unconditional jump). Used in defs of _ELSE, _ENDW
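
The inversion in the table above can be laid out as two lookups. This is a sketch in Python for illustration only; the dictionary and function names are made up, and the values are simply those in the table.

```python
# Condition field values of the MSP430 jump instructions.
JUMP_FIELD = {"jnz": 0, "jz": 1, "jnc": 2, "jc": 3,
              "jn": 4, "jge": 5, "jl": 6, "jmp": 7}

# Each structured condition assembles the jump with the opposite sense,
# because the jump is taken when the stated condition is false.
STRUCTURED = {"_Z": "jnz", "_NZ": "jz", "_C": "jnc", "_NC": "jc",
              "_NN": "jn", "_L": "jge", "_GE": "jl", "_NEVER": "jmp"}


def cond_field(name):
    """Return the condition field assembled for a structured condition name."""
    return JUMP_FIELD[STRUCTURED[name]]


for name in STRUCTURED:
    print(f"{name:7s} -> {STRUCTURED[name]:4s} field {cond_field(name)}")
```

There is no _N in the table because the MSP430 has a jn instruction but no jump-if-not-negative to assemble for it.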

Now we can start defining the actual control-flow macros that we will use in our source code.

; Define macros for conditional execution

; _IF _cc ... _ENDIF

; _IF _cc ... _ELSE ... _ENDIF

; Mark the origin of a forward jump.

; Called by _ELSE and _WHILE.

_IF         MACRO       cond

            _CS_PUSH    (cond << 29) | ($ & $1FFFFFFF) ; Push the condition code and the address

                                            ; where the jump instruction will be filled-in later

            ORG         $+2                 ; Skip over that location

            ENDM

; Resolve a forward jump due to the most recent _IF, _ELSE or _WHILE.

; Called by _ELSE and _ENDW.

_ENDIF      MACRO

_destin     SET         $                   ; Remember where we were up to in assembling

_origin     SET         _CS_TOP & $1FFFFFFF ; Extract the origin address

_cond       SET         _CS_TOP>>29         ; Extract the condition code

_offset     SET         _destin-_origin-2   ; Calculate the offset in bytes

            ORG         _origin             ; Go back to address on top of control-flow stack

            _ASM_Jxx    _cond,_offset       ; Assemble the jump instruction with offset

            ORG         _destin             ; Go forward again to continue assembling

            _CS_DROP                        ; Drop address and cond code off control-flow stack

            ENDM
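
The packing done by _IF and the unpacking and offset arithmetic done by _ENDIF are easy to check by hand. A minimal sketch in Python follows, assuming made-up addresses; `pack` and `unpack` are illustrative names only. Unlike _IF, this sketch masks the condition to 3 bits when packing (the macros leave that to _ASM_Jxx).

```python
ADDR_MASK = 0x1FFFFFFF  # low 29 bits hold the address, as in _IF and _ENDIF


def pack(cond, addr):
    """Combine a condition and an address into one stack entry, as _IF does."""
    return ((cond & 7) << 29) | (addr & ADDR_MASK)


def unpack(entry):
    """Split a stack entry back into (cond, origin), as _ENDIF does."""
    return (entry >> 29) & 7, entry & ADDR_MASK


# _IF _Z met at address 0x8002, _ENDIF met at 0x8006 (addresses are made up).
entry = pack(0, 0x8002)
cond, origin = unpack(entry)
destin = 0x8006
offset = destin - origin - 2   # bytes, relative to the word after the jump
print(cond, hex(origin), offset)
```

With one 2-byte instruction between _IF and _ENDIF the offset comes out as 2 bytes, i.e. one word to skip.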

; Mark the origin of a forward unconditional jump and

; resolve a forward jump due to an _IF.

_ELSE       MACRO

            _IF         _NEVER      ; Leave space for an unconditional jump and push its address

            _CS_SWAP                ; Get the original _IF address back on top

            _ENDIF                  ; Back-fill the jump instruction for the previous _IF

            ENDM

; Define macros for conditional loops

; _BEGIN ... _UNTIL _cc              (post-tested loop)

; _BEGIN ... _WHILE _cc ... _ENDW    (pre or mid tested loop)

; Mark a backward destination (i.e. the start of a loop).

_BEGIN      MACRO

            _CS_PUSH    $              ; Push the address to jump back to

            ENDM

; Resolve the most recent _BEGIN with a backward conditional jump.

; The end of a post-tested loop.

; Called by _ENDW.

_UNTIL      MACRO       cond

_offset     SET         _CS_TOP-$-2     ; Calculate the offset back to the most recent _BEGIN

            _ASM_Jxx    cond,_offset    ; Assemble cond. jump back to address on top of stack

            _CS_DROP                    ; Drop the address off the control-flow stack

            ENDM

; Mark the origin of a forward conditional jump out of a loop.

; The test of a pre-tested or mid-tested loop.

_WHILE      MACRO       cond

            _IF         cond            ; Leave space for conditional jump and push its address

            _CS_SWAP                    ; Get the _BEGIN address back on top

            ENDM

; Resolve the most recent _BEGIN with a backward unconditional jump and

; resolve a forward jump due to the most recent _WHILE.

; The end of a pre-tested or mid-tested loop.

_ENDW       MACRO

            _UNTIL      _NEVER          ; Assemble uncond. jump back to the most recent _BEGIN

            _ENDIF                      ; Back-fill the jump instruction for the last _WHILE

            ENDM

These definitions are delightfully simple. In real life, someone would probably want us to clutter them up with some error checking and listing control.
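
To watch the back-filling happen, here is a toy model of the macros in Python. It is a sketch for illustration, not an MSP430 assembler: the class `Asm` and its method names are made up, ordinary instructions are stood in for by strings that occupy one word each, and the control-flow stack holds Python tuples instead of packed integers.

```python
Z, NEVER = 0, 7   # condition fields, as in the table of condition codes


class Asm:
    """Toy model of the structured control-flow macros (illustrative only)."""

    def __init__(self, origin=0x8000):
        self.pc = origin   # plays the role of $
        self.mem = {}      # address -> jump word, or a string placeholder
        self.cs = []       # control-flow stack; the top is the last element

    def word(self, value):
        """Emit one word at the current location and advance."""
        self.mem[self.pc] = value
        self.pc += 2

    def jxx(self, cond, offset):
        """_ASM_Jxx: emit a jump word."""
        self.word((1 << 13) | ((cond & 7) << 10) | ((offset >> 1) & 0x03FF))

    def if_(self, cond):
        self.cs.append((cond, self.pc))   # remember condition and origin
        self.pc += 2                      # ORG $+2: leave a hole for the jump

    def endif(self):
        cond, origin = self.cs.pop()
        destin = self.pc
        self.pc = origin                  # ORG _origin: go back to the hole
        self.jxx(cond, destin - origin - 2)
        self.pc = destin                  # ORG _destin: carry on assembling

    def swap(self):
        self.cs[-1], self.cs[-2] = self.cs[-2], self.cs[-1]

    def else_(self):
        self.if_(NEVER)
        self.swap()
        self.endif()

    def begin(self):
        self.cs.append(self.pc)

    def until(self, cond):
        target = self.cs.pop()
        self.jxx(cond, target - self.pc - 2)

    def while_(self, cond):
        self.if_(cond)
        self.swap()

    def endw(self):
        self.until(NEVER)
        self.endif()


a = Asm()
a.begin()          # _BEGIN
a.word("tst")      #     test instruction at 0x8000
a.while_(Z)        # _WHILE _Z   (hole left at 0x8002)
a.word("body")     #     loop body at 0x8004
a.endw()           # _ENDW       (jmp at 0x8006, then the hole is filled)

for addr in sorted(a.mem):
    v = a.mem[addr]
    print(hex(addr), v if isinstance(v, str) else hex(v))
```

The hole at 0x8002 ends up holding 0x2002 (a jnz forward to 0x8008, just past the loop) and 0x8006 holds 0x3FFC (a jmp back to 0x8000), which is what you would write by hand with labels.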

You may feel that the word for starting a loop should be either _DO, _LOOP or _REPEAT instead of _BEGIN. But these words already have other control-flow meanings in Forth, so it would be confusing, and disrespectful to the community that has given us this simple implementation, to redefine them. In Forth, as in Fortran and PL/I, DO is used to start a counted loop (in Forth's case post-tested). In Forth, LOOP ends a counted loop and REPEAT ends a pre- or mid-tested conditional loop and so is synonymous with our _ENDW above.

A loop with multiple exits is not strictly structured, but can be implemented by adding _WHILEs inside the loop. Each additional _WHILE must be matched by an additional _ENDIF or _ELSE ... _ENDIF following the loop. For more information on this see http://lars.nocrew.org/dpans/dpansa3.htm#figure.a.2.

Notice that the 5 basic control-flow elements are _IF, _ENDIF, _BEGIN, _UNTIL and _CS_SWAP. All the other elements are defined using those. We can use these 5 to define other structures such as counted loops, switch/case statements and short-circuit-OR. You can read about these in part 2.

-- Dave Keenan, 2010-Nov-24 (last updated 2014-May-26)