Macros to Add Structured Control-flow to any Assembler (part 1) (version 2)

For those who just want to use it, and don’t care how it works, here’s the Quick Reference.

In memory of Wil Baden (1928-2016). Among Wil's many gifts to the world is the elegant control-flow implementation scheme in ANS Forth (1994), on which this work is based.

[The first version of this article required the following disclaimer:

“Well almost any. Unfortunately the following method cannot be used with the GNU assembler because it does not allow the assembly location to be moved backwards. For the GNU assembler, an attempt to .org backwards is an error.”

However, Bill Westfield was inspired by this to devise an org-free way of doing it in the GNU assembler, using computed local labels. I have now adopted some of Bill’s ideas to create an org-free variant of my method, described below. It uses computed labels (ordinary, not local), and will therefore work with other assemblers as well as the GNU assembler. It turns out to be simpler than my first method, and it generates far more readable listings. Thanks Bill.]

I assume that most programmers appreciate the advantages of structured control-flow over "spaghetti code" (using labels and jumps or "go-to"s). The basic problem is that a label is a "come-from" statement with no indication of where control can come from. With structured control-flow we have sets of matching reserved-words that indicate the direction of the jump (forward or backward) and we use indentation to indicate which words are connected.

For example, instead of the code on the left, we write the code on the right:

        tst     R8                        tst     R8
        jnz     label                     _IF     Z
        mov     R9,R8                         mov     R9,R8
label:                                    _ENDIF

which generates exactly the same machine code. This is not to be confused with conditional assembly.

And instead of the code on the left, we write the code on the right:

label:                                    _REPEAT
        sub     R9,R8                         sub     R9,R8
        jnc     label                     _UNTIL  C

You don’t need to know anything about Forth in order to use this method, and the words I have chosen for the macro names owe more to C, Pascal and BASIC. But I feel the need to say a few words to honour the source of the method. Users of Forth systems have had structured control-flow in their assembly languages since 1968, the same year Dijkstra's famous letter was published (Go To Statement Considered Harmful). I was pleased to learn that the x86 community, at least, now has support for structured control-flow built into the assemblers MASM and NASM. It seems incredible to me that assemblers still exist with no support for structured control-flow. I program microcontrollers using the IAR Workbench assembler, where this is the case.

The method used to implement structured control-flow in Forth-based assemblers (and in the Forth language itself) was nicely refined by the time of the first ANS/ISO standard Forth in 1994. If you’re curious, you can read about it here: http://lars.nocrew.org/dpans/dpansa3.htm#A.3.2.3.2. But you don't need to, because below I have translated the method into assembler macro definitions, so you don't need to use Forth, or know anything about it.

Some of the words used for control-flow in Forth are confusing when you try to relate them to programming languages in common use today. That's because those languages were only a glint in their creators' eyes when Chuck Moore included structured control-flow in Forth. So I have changed those words to more familiar ones. The main thing is the method, which can be adapted to add structured control-flow to any assembler, without requiring access to the assembler's source code. It only requires assembler variables and macros.

We can’t make it read exactly like any higher-level language because (a) the elegant stack-based implementation limits us to a context-free grammar, and (b) in assembly language, a conditional branch must come textually after the code that performs the test, i.e. IF and UNTIL are inherently postfix operations in assembly language. But I’ve done my best to make it read in a familiar manner.

The code below implements structured control-flow for a specific assembler (IAR) and a specific target processor (MSP430), but with minor modifications the method can be applied to any combination of assembler and target. Here’s the full source code for the IAR/MSP430 version. And here it is for the CCS assembler and the TMS320C28x processor. Please let me know if you implement this coding pattern for any other processor/assembler combination, so I can include a link here.
John Hardy has implemented it for the Z80 using asm80 and gives a nice description of how it works, in case you find mine too hard to follow.

In general terms, the method consists of macros that push some information (usually a label number) onto a stack when assembling the start of a structure, and other macros that pop it off and make use of it when assembling the end of the corresponding structure. The use of a stack allows structures to be nested. I once saw a forum post where someone claimed it was impossible to use such a method with most assemblers, because they do not provide an assembly-time stack. I thought so too, until one day I realised I could make such a stack, using assembler variables and macros, by “brute force” as it were. Garth Wilson came up with the same idea independently and has implemented it for the C32 assembler for 65C02 and MPASM for PIC16. All assemblers have a directive like SET, to assign variable values to labels, as opposed to constant values. So the stack can be implemented like this:

_CS_PUSH    MACRO   arg
_CS4        SET     _CS3
_CS3        SET     _CS2
_CS2        SET     _CS_TOP
_CS_TOP     SET     arg
            ENDM

_CS_DROP    MACRO
_CS_TOP     SET     _CS2
_CS2        SET     _CS3
_CS3        SET     _CS4
_CS4        SET     0
            ENDM

We use underscores at the start of all the macro and variable names so they don’t clash with assembler directives and are unlikely to clash with anything in the application code. "CS" stands for Control-flow Stack. We implement DROP instead of POP, since we can access the top item directly as _CS_TOP. I've only shown a four-level stack, but you get the idea. I find that 12 levels are sufficient, but this will depend mostly on the maximum number of cases you want to have in a _CASE statement, which we will meet in Part 2.
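To see the shifting mechanism outside the assembler, here is a minimal Python model of _CS_PUSH and _CS_DROP. It is not part of the macro package. The class ShiftStack and its methods are hypothetical names chosen for illustration, and a list of slots stands in for the assembler variables _CS_TOP, _CS2, _CS3 and _CS4.

```python
class ShiftStack:
    """Python model of _CS_PUSH/_CS_DROP: a fixed set of 'variables'
    (slots) that are shifted down on push and up on drop."""

    def __init__(self, depth=4):
        # slots[0] stands for _CS_TOP, slots[1] for _CS2, and so on.
        self.slots = [0] * depth

    def push(self, value):
        # Like the SET chain in _CS_PUSH: every slot moves down one place,
        # the new value lands on top, and the bottom item is silently lost.
        self.slots = [value] + self.slots[:-1]

    def drop(self):
        # Like _CS_DROP: every slot moves up one place and the bottom
        # slot is refilled with 0 (the "_CS4 SET 0" line).
        self.slots = self.slots[1:] + [0]

    @property
    def top(self):
        # There is no POP: the top item is simply read, like _CS_TOP.
        return self.slots[0]


if __name__ == "__main__":
    cs = ShiftStack()
    for n in (100, 101, 102):
        cs.push(n)
    print(cs.slots)  # [102, 101, 100, 0]
    cs.drop()
    print(cs.top)    # 101
```

Pushing a fifth item into this four-slot model quietly discards the first one, just as the SET chain would. That is why the stack-count check described further down is worth the clutter.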

It turns out we need one more stack operation, SWAP, to implement some words that can occur in the middle of a structure, such as ELSE and WHILE. It swaps the top two elements. This one uses an XOR-swap because, hey, they're tricky, and how often do you get the chance? :-)

_CS_SWAP    MACRO
_CS_TOP     SET     _CS_TOP ^ _CS2
_CS2        SET     _CS_TOP ^ _CS2
_CS_TOP     SET     _CS_TOP ^ _CS2
            ENDM
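The XOR trick can be checked with a few lines of Python. This sketch models _CS_SWAP on two plain integers standing in for _CS_TOP and _CS2. The name cs_swap is hypothetical, not part of the macros.

```python
def cs_swap(top, second):
    """Model of _CS_SWAP: the same three XOR assignments, in the same order."""
    top = top ^ second     # _CS_TOP SET _CS_TOP ^ _CS2
    second = top ^ second  # _CS2    SET _CS_TOP ^ _CS2  -> old top
    top = top ^ second     # _CS_TOP SET _CS_TOP ^ _CS2  -> old second
    return top, second


if __name__ == "__main__":
    print(cs_swap(101, 102))  # (102, 101)
```

A conventional swap through a temporary assembler variable would work just as well. The XOR version merely avoids declaring one.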

Then we need a macro that will take an integer assembler variable and assemble a unique label based on its value, e.g. _L followed by the decimal representation of the value. This is easy in the GNU assembler, thanks to the % operator, which becomes available after an .altmacro directive. It evaluates a numeric expression and turns it into a string. But in the IAR assembler I had to be more creative. I came up with the following recursive macro. At each level of the recursion, the number is divided by 10 and the string grows by one digit, until the number hits zero.

_LABEL  MACRO   num, str        ; "\2" below is equivalent to "str" (2nd argument) but can be concatenated
        IF      num == 0
_L\2                            ; No digits left: assemble the finished label
        ELSE                    ; Prepend the lowest decimal digit and recurse on the rest
        IF      num % 10 == 0
        _LABEL  num / 10, 0\2
        ENDIF
        IF      num % 10 == 1
        _LABEL  num / 10, 1\2
        ENDIF
        IF      num % 10 == 2
        _LABEL  num / 10, 2\2
        ENDIF
        IF      num % 10 == 3
        _LABEL  num / 10, 3\2
        ENDIF
        IF      num % 10 == 4
        _LABEL  num / 10, 4\2
        ENDIF
        IF      num % 10 == 5
        _LABEL  num / 10, 5\2
        ENDIF
        IF      num % 10 == 6
        _LABEL  num / 10, 6\2
        ENDIF
        IF      num % 10 == 7
        _LABEL  num / 10, 7\2
        ENDIF
        IF      num % 10 == 8
        _LABEL  num / 10, 8\2
        ENDIF
        IF      num % 10 == 9
        _LABEL  num / 10, 9\2
        ENDIF
        ENDIF                   ; End of the outer IF/ELSE
        ENDM
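Here is the same digit-by-digit recursion as a minimal Python model. It assumes the assembler's / truncates as integer division does for the non-negative label numbers used here. The name make_label is hypothetical. The ten IF branches collapse into a single str() call because Python can convert one digit directly.

```python
def make_label(num, digits=""):
    """Model of the recursive _LABEL macro: peel off the lowest decimal
    digit, prepend it to the accumulated string, recurse on num // 10."""
    if num == 0:
        # Recursion bottoms out: emit the label built so far.
        return "_L" + digits
    # One level of recursion adds exactly one digit on the left.
    return make_label(num // 10, str(num % 10) + digits)


if __name__ == "__main__":
    print(make_label(100))  # _L100
```

Note that make_label(0) gives a bare _L, because the recursion stops before emitting any digit. The label numbers used below start at 100, so this case does not arise there.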

And we need a macro that will take a condition code and a label number, and assemble the corresponding jump instruction. We use the same recursive conversion from label number to label string.

_JUMP   MACRO   cond, num, str  ; "\3" below is equivalent to "str" (3rd argument) but can be concatenated
        IF      num == 0
        J\1     _L\3            ; No digits left: assemble the jump to the finished label
        ELSE                    ; Prepend the lowest decimal digit and recurse on the rest
        IF      num % 10 == 0
        _JUMP   cond, num / 10, 0\3
        ENDIF
        IF      num % 10 == 1
        _JUMP   cond, num / 10, 1\3
        ENDIF
        IF      num % 10 == 2
        _JUMP   cond, num / 10, 2\3
        ENDIF
        IF      num % 10 == 3
        _JUMP   cond, num / 10, 3\3
        ENDIF
        IF      num % 10 == 4
        _JUMP   cond, num / 10, 4\3
        ENDIF
        IF      num % 10 == 5
        _JUMP   cond, num / 10, 5\3
        ENDIF
        IF      num % 10 == 6
        _JUMP   cond, num / 10, 6\3
        ENDIF
        IF      num % 10 == 7
        _JUMP   cond, num / 10, 7\3
        ENDIF
        IF      num % 10 == 8
        _JUMP   cond, num / 10, 8\3
        ENDIF
        IF      num % 10 == 9
        _JUMP   cond, num / 10, 9\3
        ENDIF
        ENDIF                   ; End of the outer IF/ELSE
        ENDM

Now we need to define some macros for jump instructions with the opposite condition from that used in the _IF or _UNTIL. Notice how, in the examples above, the _IF Z needs to assemble a jnz instruction and the _UNTIL C assembles a jnc.

; Translate the jump instructions generated by _JUMP above, when "not" is placed before the condition code.
; Used by _IF and _UNTIL.

JnotZ   MACRO   label
        JNZ     label
        ENDM

JnotNZ  MACRO   label
        JZ      label
        ENDM

JnotEQ  MACRO   label
        JNE     label
        ENDM

JnotNE  MACRO   label
        JEQ     label
        ENDM

JnotHS  MACRO   label
        JLO     label
        ENDM

JnotC   MACRO   label
        JNC     label
        ENDM

JnotNC  MACRO   label
        JC      label
        ENDM

JnotLO  MACRO   label
        JHS     label
        ENDM

JnotN   MACRO   label           ; MSP430 specific.
        JN      $+4             ; The best substitute for the non-existent JNN instruction
        JMP     label           ; Thanks to Anders Lindgren
        ENDM

JnotNN  MACRO   label
        JN      label
        ENDM

JnotL   MACRO   label
        JGE     label
        ENDM

JnotGE  MACRO   label
        JL      label
        ENDM

JnotNEVER MACRO label           ; An unconditional jump
        JMP     label
        ENDM
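The condition inversion can be summarised as a lookup table. The following Python sketch models what `_JUMP not<cc>, num` ends up assembling for each MSP430 condition code listed in this article. JNOT and jump_not are hypothetical names, and str() stands in for the recursive digit conversion (valid for positive label numbers).

```python
# Hypothetical table modelling the Jnot<cc> macros for the MSP430:
# condition code -> instruction template(s), with {0} standing for the label.
JNOT = {
    "Z": ["JNZ {0}"],
    "NZ": ["JZ {0}"],
    "EQ": ["JNE {0}"],
    "NE": ["JEQ {0}"],
    "HS": ["JLO {0}"],
    "C": ["JNC {0}"],
    "NC": ["JC {0}"],
    "LO": ["JHS {0}"],
    "N": ["JN $+4", "JMP {0}"],  # no JNN: hop over an unconditional jump
    "NN": ["JN {0}"],
    "L": ["JGE {0}"],
    "GE": ["JL {0}"],
    "NEVER": ["JMP {0}"],        # "not never" means always
}


def jump_not(cc, num):
    """Model of `_JUMP not<cc>, num`: name the label _L<num> and return the
    line(s) that the matching Jnot<cc> macro would assemble."""
    label = "_L" + str(num)
    # Templates without {0} (like "JN $+4") are returned unchanged.
    return [template.format(label) for template in JNOT[cc]]


if __name__ == "__main__":
    print(jump_not("Z", 100))  # ['JNZ _L100']
    print(jump_not("N", 101))  # ['JN $+4', 'JMP _L101']
```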

Now we initialise the variable used to generate unique labels beginning with _L.

_LABEL_NUM SET 100

And we define a couple of macros that will improve the readability of the other macros below, by allowing us to move variable names away from the first column.

_INC    MACRO   var
var     SET     var + 1
        ENDM

_SET    MACRO   var, expr
var     SET     expr
        ENDM

Now we can start defining the actual control-flow macros that we will use in our source code. First we define the macros that let us write conditional execution like this:

        <test>
        _IF cc
            ...
        _ENDIF

or this:

        <test>
        _IF cc
            ...
        _ELSE
            ...
        _ENDIF

Note that I’m using “...” here to stand for any number of lines of assembly language. “<test>” also stands for any number of lines, but with the specific purpose of affecting some processor condition flag (status bit). And in the case of the MSP430 processor, “cc” stands for one of Z, NZ, EQ, NE, C, NC, HS, LO, N, NN, L, GE or NEVER.

; Mark the origin of a forward jump.
; Called by _ELSE and _WHILE.
_IF     MACRO   cond                   ; "\1" below is equivalent to "cond" (1st argument) but can be concatenated
        _JUMP   not\1, _LABEL_NUM      ; Assemble a conditional jump with the opposite condition
        _CS_PUSH       _LABEL_NUM      ; Push its label number
        _INC           _LABEL_NUM      ; Increment the label number
        ENDM

; Resolve a forward jump due to the most recent _IF, _ELSE or _WHILE.
; Called by _ELSE and _ENDW.
_ENDIF  MACRO
        _LABEL  _CS_TOP                ; Assemble the label for the previous _IF.
        _CS_DROP                       ; Drop its label number off the control-flow stack
        ENDM

; Mark the origin of a forward unconditional jump and
; resolve a forward jump due to an _IF.
_ELSE   MACRO
        _IF     NEVER                  ; Assemble an unconditional jump and push its label number
        _CS_SWAP                       ; Get the prior _IF’s label number back on top
        _ENDIF                         ; Assemble the label for the prior _IF, and drop its number
        ENDM
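To see how the stack ties the pieces together, here is a small Python model that records the lines _IF, _ELSE and _ENDIF would emit. It is a sketch under simplifying assumptions. IfModel and its methods are hypothetical names, a Python list plays the control-flow stack, and only a few condition codes are included in its jump table.

```python
class IfModel:
    """Python model of _IF, _ELSE and _ENDIF. Instead of assembling
    anything, it records the lines the macros would emit."""

    # Opposite-condition jumps (a subset of the Jnot<cc> macros).
    JUMP_NOT = {"Z": "JNZ", "NZ": "JZ", "C": "JNC", "NC": "JC", "NEVER": "JMP"}

    def __init__(self, first_label=100):
        self.label_num = first_label  # _LABEL_NUM SET 100
        self.cs = []                  # control-flow stack; cs[-1] is _CS_TOP
        self.out = []                 # emitted lines

    def code(self, line):
        # Ordinary application code, indented like an instruction.
        self.out.append("        " + line)

    def if_(self, cc):
        # _JUMP not<cc>, _LABEL_NUM / _CS_PUSH _LABEL_NUM / _INC _LABEL_NUM
        self.out.append(f"        {self.JUMP_NOT[cc]} _L{self.label_num}")
        self.cs.append(self.label_num)
        self.label_num += 1

    def endif(self):
        # _LABEL _CS_TOP / _CS_DROP
        self.out.append(f"_L{self.cs.pop()}")

    def else_(self):
        self.if_("NEVER")                                    # unconditional jump
        self.cs[-1], self.cs[-2] = self.cs[-2], self.cs[-1]  # _CS_SWAP
        self.endif()                                         # resolve the _IF


if __name__ == "__main__":
    m = IfModel()
    m.code("tst R8")
    m.if_("Z")
    m.code("mov R9,R8")
    m.else_()
    m.code("mov R10,R8")
    m.endif()
    print("\n".join(m.out))
```

Running it prints the expansion of an _IF Z ... _ELSE ... _ENDIF around two mov instructions. JNZ _L100 skips the first branch, JMP _L101 skips the second, and the stack is empty at the end.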

Now we define the macros that let us write conditional loops like this:

        _REPEAT
            ...
            <test>                     ; post-tested loop
        _UNTIL cc

and this:

        _DO
            ...
            <test>                     ; pre- or mid-tested loop
        _WHILE cc
            ...
        _ENDW

; Mark a backward destination (i.e. the start of a loop).
_REPEAT MACRO
        _LABEL   _LABEL_NUM            ; Assemble a label
        _CS_PUSH _LABEL_NUM            ; Push the number of the label to jump back to
        _INC     _LABEL_NUM            ; Increment the label number
        ENDM

_DO     MACRO
        _REPEAT
        ENDM

; Resolve the most recent _REPEAT or _DO with a backward conditional jump.
; The end of a post-tested loop.
; Called by _ENDW.
_UNTIL  MACRO  cond                    ; "\1" below is equivalent to "cond" (1st argument) but can be concatenated
        _JUMP   not\1, _CS_TOP         ; Assemble a conditional jump back to the corresponding _REPEAT or _DO
        _CS_DROP                       ; Drop its label number off the control-flow stack
        ENDM

; Mark the origin of a forward conditional jump out of a loop.
; The test of a pre-tested or mid-tested loop.
_WHILE  MACRO   cond
        _IF     cond                   ; Assemble a conditional jump and push its label number
        _CS_SWAP                       ; Get the _DO label number back on top
        ENDM

; Resolve the most recent _DO with a backward unconditional jump and
; resolve a forward jump due to the most recent _WHILE.
; The end of a pre-tested or mid-tested loop.
_ENDW   MACRO
        _UNTIL  NEVER                  ; Assemble a jump back to the most recent _DO
        _ENDIF                         ; Assemble the label for the last _WHILE
        ENDM

Note that _DO and _REPEAT are equivalent. They both just generate a label to jump back to. But having different words improves readability by allowing you to tell if the loop is post-tested or not, right from its start.

A loop with multiple exits is not strictly structured, but can be implemented by adding extra _WHILEs inside the loop. Each additional _WHILE must be matched by an additional _ENDIF (or _ELSE ... _ENDIF) following the loop. For more information on this, see http://lars.nocrew.org/dpans/dpansa3.htm#figure.a.2.
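The loop macros can be modelled the same way. In this self-contained Python sketch, LoopModel and its method names are hypothetical. while_ has a trailing underscore because while is a Python keyword. The sketch reproduces the _REPEAT ... _UNTIL C example from the start of this article and a two-exit loop written as _DO ... _WHILE ... _WHILE ... _ENDW ... _ENDIF.

```python
class LoopModel:
    """Python model of _IF/_ENDIF plus the loop macros; records emitted lines."""

    # Opposite-condition jumps (a subset of the Jnot<cc> macros).
    JUMP_NOT = {"Z": "JNZ", "NZ": "JZ", "C": "JNC", "NC": "JC", "NEVER": "JMP"}

    def __init__(self, first_label=100):
        self.label_num = first_label  # _LABEL_NUM SET 100
        self.cs = []                  # control-flow stack; cs[-1] is _CS_TOP
        self.out = []                 # emitted lines

    def code(self, line):
        self.out.append("        " + line)

    def _jump_not(self, cc, num):     # models _JUMP not<cc>, num
        self.out.append(f"        {self.JUMP_NOT[cc]} _L{num}")

    def _swap(self):                  # models _CS_SWAP
        self.cs[-1], self.cs[-2] = self.cs[-2], self.cs[-1]

    def if_(self, cc):                # forward jump, resolved later
        self._jump_not(cc, self.label_num)
        self.cs.append(self.label_num)
        self.label_num += 1

    def endif(self):                  # resolve the most recent forward jump
        self.out.append(f"_L{self.cs.pop()}")

    def repeat(self):                 # backward destination
        self.out.append(f"_L{self.label_num}")
        self.cs.append(self.label_num)
        self.label_num += 1

    do = repeat                       # _DO is identical to _REPEAT

    def until(self, cc):              # backward conditional jump
        self._jump_not(cc, self.cs.pop())

    def while_(self, cc):             # _IF, then put the _DO label back on top
        self.if_(cc)
        self._swap()

    def endw(self):                   # jump back, then resolve the last _WHILE
        self.until("NEVER")
        self.endif()


if __name__ == "__main__":
    r = LoopModel()
    r.repeat()
    r.code("sub R9,R8")
    r.until("C")
    print("\n".join(r.out))

    w = LoopModel()
    w.do()
    w.code("tst R8")
    w.while_("NZ")       # first exit
    w.code("cmp R10,R9")
    w.while_("NC")       # second exit
    w.code("add R9,R8")
    w.endw()
    w.endif()            # the extra _ENDIF for the extra _WHILE
    print("\n".join(w.out))
```

In the two-exit loop, _ENDW resolves the second _WHILE (label _L102) and the extra _ENDIF resolves the first (label _L101). Any code placed between _ENDW and _ENDIF therefore runs only when the loop leaves through the second exit.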

These definitions are delightfully simple. In real life, someone would probably want us to clutter them up with some error checking and listing control.

The simplest error checking uses an assembler variable to keep a count of the items on the control-flow stack. We initialise it to zero.

_CS_COUNT SET 0

Then we add the following to the start of _CS_PUSH above.

_CS_COUNT SET _CS_COUNT+1
        IF _CS_COUNT > 12               ; Or whatever your stack size is
#error "Control flow stack overflow"
        ENDIF

And we add the following to the start of _CS_DROP above.

_CS_COUNT SET _CS_COUNT-1
        IF _CS_COUNT < 0
#error "Control flow stack underflow"
        ENDIF

Then we invoke the following macro at the end of the program, or anywhere that control flow structures should all be complete, to check that the control flow stack is empty and has not underflowed.

_CS_CHECK MACRO
        IF _CS_COUNT != 0
#error "Control-flow stack is unbalanced"
        ENDIF
        ENDM
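The counting scheme can also be sketched in Python. In this model, CountingStack is a hypothetical name, an exception stands in for #error, and the depth of 12 matches the stack size suggested earlier.

```python
class CountingStack:
    """Model of the _CS_COUNT checks: count items, complain on overflow or
    underflow, and let a final check confirm the stack is balanced."""

    def __init__(self, depth=12):
        self.depth = depth
        self.count = 0  # _CS_COUNT SET 0

    def push(self):
        # Added to the start of _CS_PUSH.
        self.count += 1
        if self.count > self.depth:
            raise RuntimeError("Control flow stack overflow")

    def drop(self):
        # Added to the start of _CS_DROP.
        self.count -= 1
        if self.count < 0:
            raise RuntimeError("Control flow stack underflow")

    def check(self):
        # _CS_CHECK: invoked where all structures should be complete.
        if self.count != 0:
            raise RuntimeError("Control-flow stack is unbalanced")


if __name__ == "__main__":
    s = CountingStack()
    s.push()          # an _IF ...
    try:
        s.check()     # ... with no matching _ENDIF
    except RuntimeError as e:
        print(e)      # Control-flow stack is unbalanced
```

For example, an _IF with no matching _ENDIF leaves the count at 1, so the final check() fails.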

Notice that the five basic control-flow elements are _IF, _ENDIF, _REPEAT, _UNTIL and _CS_SWAP. All the other elements are defined using those. We can use these five to define other structures, such as counted loops, switch/case statements and short-circuit conditionals. You can read about these in Part 2.

Go to Part 2. [Note: This Go To is not considered harmful :-) ]

-- Dave Keenan, 2018-Jan-01 (last updated 2018-Nov-19)