This version is of historical interest only. See updated version.

Adding Structured Control-flow to any* Assembler (part 1)

[* Well, almost any. Unfortunately, the following method cannot be used with the GNU Assembler because it does not allow the assembly location to be moved backwards: an attempt to .org backwards is an error.]

I assume that most programmers appreciate the advantages of structured control-flow over "spaghetti code" (using labels and jumps or "go-to"s). The basic problem is that a label is a "come-from" statement with no indication of where control can come from. With structured control-flow we have sets of matching reserved-words that indicate the direction of the jump (forward or backward) and we use indentation to indicate which words are connected.

e.g. Instead of                           we write

 
        tst     R8                        tst     R8
        jnz     label                     _IF     _Z
        mov     R9,R8                         mov     R9,R8
label:                                    _ENDIF

which generates exactly the same machine code. This is not to be confused with conditional assembly: the test is still made at run time, not at assembly time.

And instead of                            we write something like

 
label:                                    _BEGIN
        sub     R9,R8                         sub     R9,R8
        jnc     label                     _UNTIL  _C

Users of Forth systems have had structured control-flow in their assembly languages since 1968, the same year Dijkstra's famous letter "Go To Statement Considered Harmful" was published. I was pleased to learn that the x86 community, at least, now has support for structured control-flow built into the assemblers MASM and NASM. It seems incredible to me that assemblers still exist with no support for structured control-flow. I program microcontrollers using the IAR Workbench assembler, where this is the case.

The method used to implement structured control-flow in Forth-based assemblers (and in the Forth language itself) was nicely refined by the time of the first ANS/ISO standard Forth. You can read about it at http://lars.nocrew.org/dpans/dpansa3.htm#A.3.2.3.2. But you don't need to: below, I have translated the Forth definitions into assembler macro definitions, so you don't need to use Forth or know anything about it. I only mention Forth to honour the source of the method.

Some of the words used for control-flow in Forth are confusing when you try to relate them to C, Pascal or structured BASIC. That's because (a) those languages were only a glint in their creators' eyes when these things were included in Forth (only Fortran's counted DO loop had any influence), and (b) the elegant stack-based implementation limits us to a context-free grammar. But you can change the actual words to be whatever you want. In the examples below I substitute _ENDIF and _ENDW for Forth's uncommon use of THEN and REPEAT. The main thing is the method, which can be adapted to add structured control-flow to almost any assembler without requiring access to the assembler's source code. It only requires assembler variables and macros, and the ability to move the current assembly location backwards.

The code below implements structured control-flow for a specific assembler (IAR) and a specific target processor (MSP430) but the text explains how to modify it to work with any combination of assembler and target.

The first thing you need to do is implement a control-flow stack using assembler variables. The assembler I'm using has the usual EQU to assign constant values to labels, but it also has SET to assign variable values. So the stack can be implemented like this:

 
_CS_PUSH    MACRO   arg
_CS4        SET     _CS3
_CS3        SET     _CS2
_CS2        SET     _CS_TOP
_CS_TOP     SET     arg
            ENDM
 
_CS_DROP    MACRO
_CS_TOP     SET     _CS2
_CS2        SET     _CS3
_CS3        SET     _CS4
_CS4        SET     0
            ENDM

We use underscores at the start of all the macro and variable names so they are less likely to clash with anything in the source code. "CS" stands for Control-flow Stack. We implement DROP instead of POP since we can just access the top item directly as _CS_TOP. I've only shown a four-level stack, but you get the idea. Eight levels is probably sufficient.
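To see what the fixed number of levels means in practice, here is a small Python model of the two macros. It is only a sketch: Python is not part of the original method, and the class and method names are invented for illustration. It suggests that one push too many quietly loses the deepest entry, and that dropping past the bottom hands back a 0.

```python
# Hypothetical Python model of the four-slot control-flow stack above.
# The slot order mirrors _CS_TOP, _CS2, _CS3, _CS4; none of this is IAR syntax.

class ControlStack:
    def __init__(self):
        self.slots = [0, 0, 0, 0]          # index 0 plays the role of _CS_TOP

    def push(self, arg):
        # Mirrors _CS_PUSH: every slot moves down one place and the old
        # _CS4 value is overwritten, so a fifth push silently loses it.
        self.slots = [arg] + self.slots[:-1]

    def drop(self):
        # Mirrors _CS_DROP: every slot moves up one place and _CS4 becomes 0.
        self.slots = self.slots[1:] + [0]

    @property
    def top(self):
        return self.slots[0]


cs = ControlStack()
for value in (10, 20, 30, 40, 50):         # one push too many for four slots
    cs.push(value)

popped = []
for _ in range(5):
    popped.append(cs.top)
    cs.drop()

print(popped)   # [50, 40, 30, 20, 0] : the 10 pushed first is gone
```

In assembler terms, nesting deeper than the stack would presumably produce a jump with a wrong offset rather than an error message, which is one reason to allow a few more levels than you expect to need.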

We don't need the full generality of ANS Forth's CS-PICK and CS-ROLL, so don't worry if you don't know what they are. We only need one more stack operation, which swaps the top two elements. This one uses an XOR-swap because hey, they're tricky, and how often do you get the chance? :-)

 
_CS_SWAP    MACRO
_CS_TOP     SET     _CS_TOP ^ _CS2
_CS2        SET     _CS_TOP ^ _CS2
_CS_TOP     SET     _CS_TOP ^ _CS2
            ENDM
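If the XOR-swap looks like magic, the same three steps can be followed in Python, whose ^ operator is also bitwise XOR. This is just an illustrative check; the function name and the two sample values are made up.

```python
def cs_swap(top, second):
    # The same three steps as _CS_SWAP, one assignment per macro line.
    top = top ^ second        # top now holds (old top XOR old second)
    second = top ^ second     # XORing out the old second leaves the old top
    top = top ^ second        # XORing out the old top leaves the old second
    return top, second


swapped = cs_swap(0x4402, 0x4400)
print([hex(v) for v in swapped])   # ['0x4400', '0x4402']
```

The attraction is that no temporary variable is needed, at the cost of being harder to read than an ordinary three-assignment swap.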

Next you need to look up the format of your jump instructions. They will generally consist of three bit-fields: the constant opcode, the condition code and the address offset. You need to write a macro that will take a condition code and an address offset as arguments and assemble the corresponding jump instruction. For example, for the MSP430:

 
_ASM_Jxx    MACRO   cond,offset
            DW      1<<13 | (cond&7)<<10 | (offset>>1)&$03FF
            ENDM
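The bit-twiddling can be checked outside the assembler. The following Python sketch evaluates the same expression as the DW line (the function name is invented for illustration) for two cases: a jump with condition field 0 that skips one word, and an unconditional jump (condition field 7 on the MSP430) back to itself.

```python
def asm_jxx(cond, offset):
    """Return the 16-bit jump word for a condition field and a byte offset."""
    # Same expression as the DW line: opcode bits 001, a 3-bit condition,
    # and a 10-bit signed word offset (the byte offset divided by two).
    return (1 << 13) | ((cond & 7) << 10) | ((offset >> 1) & 0x03FF)


skip_one_word = asm_jxx(0, 2)    # condition field 0, jump over 2 bytes
jump_to_self = asm_jxx(7, -2)    # condition field 7, offset back to itself
print(hex(skip_one_word), hex(jump_to_self))   # 0x2001 0x3fff
```

The offset is measured from the address after the jump instruction, which is why a jump to itself needs an offset of -2 bytes. Masking with $03FF is what turns a negative offset into the 10-bit two's complement field.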

Now we need to define some meaningful constant labels for the various jump conditions. But we actually define each one as the condition field of the jump instruction with the inverse condition, because the jump has to skip the code that follows when the named condition is false. Notice how, in the examples above, the _IF _Z needs to assemble a jnz instruction and the _UNTIL _C assembles a jnc.

 
; Define condition codes for structured assembly. Used as arguments to _IF, _WHILE, _UNTIL.
; MSP430 condition codes
_Z      EQU     0   ; (jnz) Zero
_NZ     EQU     1   ; (jz ) Not Zero
_C      EQU     2   ; (jnc) Carry
_NC     EQU     3   ; (jc ) No Carry
_NN     EQU     4   ; (jn ) Not Negative
_L      EQU     5   ; (jge) Less (signed), (N xor V)
_GE     EQU     6   ; (jl ) Greater or Equal (signed), not(N xor V)
_NEVER  EQU     7   ; (jmp) Never (unconditional jump). Used in defs of _ELSE, _ENDW
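The inversion is easy to get backwards, so here is the table restated as a Python lookup. This is only a cross-check of the values listed above against the MSP430's eight jump mnemonics; the dictionary and function names are invented for illustration.

```python
# MSP430 jump mnemonic selected by each 3-bit condition field.
MNEMONIC = {0: "jnz", 1: "jz", 2: "jnc", 3: "jc",
            4: "jn", 5: "jge", 6: "jl", 7: "jmp"}

# The structured-assembly names and values from the table above.
COND = {"_Z": 0, "_NZ": 1, "_C": 2, "_NC": 3,
        "_NN": 4, "_L": 5, "_GE": 6, "_NEVER": 7}


def jump_assembled_for(name):
    # The jump skips the body, so it tests the opposite of the name.
    return MNEMONIC[COND[name]]


print(jump_assembled_for("_Z"), jump_assembled_for("_C"))   # jnz jnc
```

This also shows why the table has _NN but no _N: the MSP430 has a jn instruction but no jump-if-not-negative, so there is nothing to assemble for an _IF that should run its body only when the result is negative.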

Now we can start defining the actual control-flow macros that we will use in our source code.

 
; Define macros for conditional execution
; _IF _cc ... _ENDIF
; _IF _cc ... _ELSE ... _ENDIF
 
; Mark the origin of a forward jump.
; Called by _ELSE and _WHILE.
 
_IF         MACRO       cond
            _CS_PUSH    (cond << 29) | ($ & $1FFFFFFF) ; Push the condition code and the address
                                            ; where the jump instruction will be filled-in later
            ORG         $+2                 ; Skip over that location
            ENDM
 
 
; Resolve a forward jump due to the most recent _IF, _ELSE or _WHILE.
; Called by _ELSE and _ENDW.
 
_ENDIF      MACRO
_destin     SET         $                   ; Remember where we were up to in assembling
_origin     SET         _CS_TOP & $1FFFFFFF ; Extract the origin address
_cond       SET         _CS_TOP>>29         ; Extract the condition code
_offset     SET         _destin-_origin-2   ; Calculate the offset in bytes
            ORG         _origin             ; Go back to address on top of control-flow stack
            _ASM_Jxx    _cond,_offset       ; Assemble the jump instruction with offset
            ORG         _destin             ; Go forward again to continue assembling
            _CS_DROP                        ; Drop address and cond code off control-flow stack
            ENDM
 
 
; Mark the origin of a forward unconditional jump and
; resolve a forward jump due to an _IF.
 
_ELSE       MACRO
            _IF         _NEVER      ; Leave space for an unconditional jump and push its address
            _CS_SWAP                ; Get the original _IF address back on top
            _ENDIF                  ; Back-fill the jump instruction for the previous _IF
            ENDM
 
 
; Define macros for conditional loops
; _BEGIN ... _UNTIL _cc              (post-tested loop)
; _BEGIN ... _WHILE _cc ... _ENDW    (pre or mid tested loop)
 
; Mark a backward destination (i.e. the start of a loop).
 
_BEGIN      MACRO
            _CS_PUSH    $              ; Push the address to jump back to
            ENDM
 
 
; Resolve the most recent _BEGIN with a backward conditional jump.
; The end of a post-tested loop.
; Called by _ENDW.
 
_UNTIL      MACRO       cond
_offset     SET         _CS_TOP-$-2     ; Calculate the offset back to the most recent _BEGIN
            _ASM_Jxx    cond,_offset    ; Assemble cond. jump back to address on top of stack
            _CS_DROP                    ; Drop the address off the control-flow stack
            ENDM
 
 
; Mark the origin of a forward conditional jump out of a loop.
; The test of a pre-tested or mid-tested loop.
 
_WHILE      MACRO       cond
            _IF         cond            ; Leave space for conditional jump and push its address
            _CS_SWAP                    ; Get the _BEGIN address back on top
            ENDM
 
 
; Resolve the most recent _BEGIN with a backward unconditional jump and
; resolve a forward jump due to the most recent _WHILE.
; The end of a pre-tested or mid-tested loop.
 
_ENDW       MACRO
            _UNTIL      _NEVER          ; Assemble uncond. jump back to the most recent _BEGIN
            _ENDIF                      ; Back-fill the jump instruction for the last _WHILE
            ENDM
 

These definitions are delightfully simple. In real life, someone would probably want us to clutter them up with some error checking and listing control.
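For anyone adapting the method to another assembler, it may help to watch the back-filling happen somewhere easier to poke at. The following Python sketch models the macros above under some loudly stated assumptions: `org` stands in for the assembly location ($), `image` maps byte addresses to 16-bit words, the stack is an ordinary unbounded list rather than four assembler variables, and every instruction other than a jump is represented by a made-up placeholder word. None of the names are IAR syntax.

```python
# Hypothetical Python model of the control-flow macros above.
ADDR_MASK = 0x1FFFFFFF
Z, NZ, C, NC, NN, L, GE, NEVER = range(8)   # same values as the EQU table


class Asm:
    def __init__(self, origin=0):
        self.org = origin            # plays the role of $
        self.image = {}              # byte address -> 16-bit word
        self.stack = []              # control-flow stack, top at the end

    def word(self, value):           # assemble one 16-bit word at org
        self.image[self.org] = value & 0xFFFF
        self.org += 2

    def jxx(self, cond, offset):     # same bit layout as _ASM_Jxx
        self.word((1 << 13) | ((cond & 7) << 10) | ((offset >> 1) & 0x03FF))

    def swap(self):                  # _CS_SWAP
        self.stack[-1], self.stack[-2] = self.stack[-2], self.stack[-1]

    def IF(self, cond):
        self.stack.append((cond << 29) | (self.org & ADDR_MASK))
        self.org += 2                # leave a hole for the jump

    def ENDIF(self):
        destin = self.org
        entry = self.stack.pop()
        origin, cond = entry & ADDR_MASK, entry >> 29
        self.org = origin            # go back and fill the hole
        self.jxx(cond, destin - origin - 2)
        self.org = destin            # then carry on where we were

    def ELSE(self):
        self.IF(NEVER); self.swap(); self.ENDIF()

    def BEGIN(self):
        self.stack.append(self.org)

    def UNTIL(self, cond):
        self.jxx(cond, self.stack.pop() - self.org - 2)

    def WHILE(self, cond):
        self.IF(cond); self.swap()

    def ENDW(self):
        self.UNTIL(NEVER); self.ENDIF()


a = Asm()
a.word(0xAAAA)      # placeholder for a one-word test instruction, at 0
a.IF(Z)             # hole at 2
a.word(0xBBBB)      # placeholder "then" body, at 4
a.ELSE()            # hole for the jmp at 6; the _IF hole at 2 is filled
a.word(0xCCCC)      # placeholder "else" body, at 8
a.ENDIF()           # the jmp hole at 6 is filled

b = Asm()
b.BEGIN()           # loop starts at 0
b.word(0xAAAA)      # placeholder test, at 0
b.WHILE(NC)         # hole at 2
b.word(0xBBBB)      # placeholder body, at 4
b.ENDW()            # jmp back to 0 at 6; the _WHILE hole at 2 is filled

print({k: hex(v) for k, v in sorted(a.image.items())})
print({k: hex(v) for k, v in sorted(b.image.items())})
```

In the first image the word at address 2 comes out as 0x2002 (a jnz over two words, landing on the "else" body at 8) and the word at 6 as 0x3c01 (a jmp over one word, landing at 10). In the second, the word at 2 is 0x2c02 (a jc out of the loop to 8) and the word at 6 is 0x3ffc (a jmp back to 0). Both stacks end up empty, which is the kind of thing an error-checking version of the macros could test for at the end of assembly.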

You may feel that the word for starting a loop should be either _DO, _LOOP or _REPEAT instead of _BEGIN. But these words already have other control-flow meanings in Forth, so it would be confusing, and disrespectful to the community that has given us this simple implementation, to redefine them. In Forth, as in Fortran and PL/I, DO is used to start a counted loop (in Forth's case post-tested). In Forth, LOOP ends a counted loop and REPEAT ends a pre- or mid-tested conditional loop and so is synonymous with our _ENDW above.

A loop with multiple exits is not strictly structured, but can be implemented by adding _WHILEs inside the loop. Each additional _WHILE must be matched by an additional _ENDIF or _ELSE ... _ENDIF following the loop. For more information on this see http://lars.nocrew.org/dpans/dpansa3.htm#figure.a.2.
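Why the extra _ENDIF is needed can be seen by tracing the control-flow stack for a loop with two _WHILEs. The following Python sketch does only that: strings stand in for the packed address and condition values, and the function names are invented for illustration.

```python
# Hypothetical trace of the control-flow stack for a loop with two exits.
stack = []


def begin(tag):            # _BEGIN pushes the loop-start address
    stack.append(tag)


def while_(tag):           # _WHILE pushes a forward-jump hole, then swaps
    stack.append(tag)
    stack[-1], stack[-2] = stack[-2], stack[-1]


def endw():                # _ENDW is _UNTIL _NEVER (uses the _BEGIN), then _ENDIF
    loop_start = stack.pop()
    resolved = stack.pop()
    return loop_start, resolved


def endif():               # _ENDIF resolves whatever hole is on top
    return stack.pop()


begin("loop start")
while_("first exit")
while_("second exit")
closed = endw()
leftover = endif()         # the additional _ENDIF after the loop
print(closed, leftover, stack)
```

The _ENDW pairs the loop start with the most recent _WHILE, so the second exit lands immediately after the loop. The first exit is still on the stack and is resolved by the additional _ENDIF, so it lands after any code placed between _ENDW and that _ENDIF.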

Notice that the five basic control-flow elements are _IF, _ENDIF, _BEGIN, _UNTIL and _CS_SWAP. All the other elements are defined using those. We can use these five to define other structures such as counted loops, switch/case statements and short-circuit-OR. You can read about these in part 2.

-- Dave Keenan, 2010-Nov-24 (last updated 2014-May-26)
