UNIVERSITY OF NORTHERN BRITISH COLUMBIA

Winter 2002
Computer Science

Code Generator

1. General points

To continue onto code generation, it is essential that you understand the Tiny Machine (TM) Runtime Environment

This is the final stage, where object code is produced from the annotated syntax tree and the symbol table for the TM virtual machine.

Since Cminus has recursive procedures it is necessary to have runtime environment that is stack based, but since Cminus contains no pointers or dynamic allocation there is no need for a heap.

When developing the Code Generator, pay special care to:

Standard prelude that initializes the environment

The management of the registers, and determining what registers are appropriate¹

Converting absolute address jumps to relative address jumps

The Calling sequence

Distinction between lvalues and rvalues

Array manipulation

Global references relative to the gp (global pointer)

1 It is not necessary to use all the registers; optimization of register use is considered beyond the scope of this project. However, you are more than welcome to implement register optimization if you wish to do so.

2. Suggested Readings

Chapter 7
Chapter 8

3. Tiny Machine Runtime Environment

The following description assumes a knowledge of the Tiny Machine as given in Section 8.7 and an understanding of stack-based runtime environments from Chapter 7. Since C- (unlike TINY) has recursive procedures, the runtime environment must be stack based. The environment consists of a global area at the top of dMem, and the stack just below it, growing downward toward 0. Since C- contains no pointers or dynamic allocation, there is no need for a heap. The basic organization of each activation record (or stack frame) in C- is:

ofp

ret

…

params

…

local vars

…

temps

…

<- fp points here

ofpFO = 0

retFO = -1

initFO = -2

Here, fp is the current frame pointer, which is kept in a register for easy access. The ofp (old frame pointer) is the control link as discussed in Chapter 7 of the text. The constants to the right ending in FO (for frame offset) are the offsets at which each of the indicated quantities are stored. The value initFO is the offset at which the params and local vars begin their storage areas in an activation record. Since the Tiny Machine contains no stack pointer, all references to fields within an activation record will use the fp, with negative frame offsets.

For example, if we have the following C- function declaration,

int f(int x, int y)

{

int z;

. . .

}

then x, y, and z must be allocated in the current frame, and the frame offset at the beginning of the generation of code for the body of f will be -5 (one location each for x, y, z and two locations for the bookkeeping information of the activation record). The offsets of x, y, and z are -2, -3, and -4, respectively.

Global references can be found at absolute locations in memory. Nevertheless, as

with TINY, we prefer also to reference these variables by offset from a register. We do this by keeping a fixed register, which we call the gp, and which always points at the maximum address. Since the TM simulator stores this address into location 0 before execution begins, the gp can be loaded from location 0 on start-up, and the following standard prelude initializes the runtime environment:

O: LD gp, 0(ac) * load gp with maxaddress

1: LDA fp, 0(gp) * copy gp to fp

2: ST ac, 0(ac) * clear location 0

Function calls also require that the beginning code location for their bodies be used in a calling sequence. We also prefer to call functions by performing a relative jump using the current value of the pc rather than an absolute jump. (This makes the code potentially relocatable.) The utility procedure emitRAbs in code.h/code.c can be used for this purpose. (It takes an absolute code location and relativizes it by using the current code generation location.)

For example, suppose we want to call a function f whose code begins at location 27, and we are currently at location 42. Then instead of generating the absolute jump

42: LDC pc, 27(*)

we would generate

42: LDA pc, -16(pc)

since 27 - (42 + 1) = -16.

The Calling Sequence A reasonable division of labor between caller and callee is to have the caller store the values of the arguments in the new frame and create the new frame, except for the storing of the return pointer in the retFO position. Instead of storing the return pointer itself, the caller leaves it in the ac register, and the callee stores it into the new frame. Thus, every function body must begin with code to store that value in the (now current) frame:

ST ac, retFO(fp)

This saves one instruction at each call site. On return, each function then loads the pc with this return address by executing the instruction

LD pc, retFO(fp)

Correspondingly, the caller computes the arguments one by one, and pushes each onto the stack in its appropriate position before pushing the new frame. The caller must also save the current fp into the frame at ofpFO before pushing the new frame. After a return from the callee, the caller then discards the new frame by loading the fp with the old fp. Thus, a call to a function of two parameters will cause the generation of the following code:

<code to compute first arg>

ST ac, frameoffset+initFO (fp)

<code to compute second arg>

ST ac, frameoffset+initFO-1 (fp)

ST fp, frameoffset+ofpFO (fp) * store current fp

LDA fp frameOffset(fp) * push new frame

LDA ac,l(pc) * save return in ac

LDA pc, . . .(pc) * relative jump to function entry

LD fp, ofpFO(fp) * pop current frame

Address Computations Since both variables and subscripted arrays are allowed on the lefthand side of assignment, we must distinguish between addresses and values during compilation. For example, in the statement

a[i] := a[i+l];

the expression a[i] refers to the address of a[i], while the expression a[i+l]

refers to the value of a at the i+1 location. This distinction can be achieved by using an isAddress parameter to the cGen procedure. When this parameter is true, cGen generates code to compute the address of a variable rather than its value. In the case of a simple variable, this means adding the offset to either the gp (in the case of a global variable) or the fp (in the case of a local variable) and loading the result into the ac:

LDA ac, off set (fp) * put address of local var in ac

In the case of an array variable, this means adding the value of the index to the base address of the array and loading the result into the ac, as described below.

Arrays Arrays are allocated on the stack beginning at the current frame offset and extending downward in memory in order of increasing subscript, as follows:

…

A [0]

A [1]

A [2]

etc.

…

<- base address of array A

allocated space for elements

of array A in the stack

Note that array locations are computed by subtracting the index value from the base address.

When an array is passed to a function, only the base address is passed. The allocation of the area for the base elements is done only once and remains fixed for the life of the array. Function arguments do not include the actual elements of an array, only the address. Thus, array parameters become reference parameters. This causes an anomaly when array parameters are referenced inside a function, since they must be treated as having their base addresses rather than their values stored in memory. Thus, an array parameter has its base address computed using an LD operation instead of an LDA.

4. A general outline of runtime memory

Below is one example of how your runtime memory could be organized. This is only an example and you may organize your runtime memory differently.

5. Code Generator I/O Structure

6. Sample Output

Object code output for three programs: euclid.mns, nfact.mns and quicksort.mns can be found by following this link.

7. What to hand in

The write-up should be approximately 2 pages excluding diagrams, tables and code. The assignment should have all of the following sections:

Description of the solution
- A description of the main functions.
- A description of how you used the registers
- A diagram and description of your runtime environment memory organization
Bugs In Semantic Analyzer
- List the bugs that were found by the marker in your Semantic Analyzer project
- State explicitly which of these bugs have been corrected and which bugs have not been corrected and why.
Bugs in Code Generator
Solutions to these Bugs in Code Generator
Description of Conventions
- State each error message and its meaning.
- State any other conventions that you may have produced.
Compiling and Running Cminus code
- State all the modified files in this section of the report.
- Show how you compile and run your Cminus code using descriptions and/or examples.
- Show how you run a compiled Cminus program using descriptions and/or examples.
Code
- State all the modified files in this section of the report.
- Include all code.

7. Code Generator Testing

The test code will only be made available after this part of the project has been marked, so that you can correct your code. It is your responsibility to develop your own tests and make sure this part of the project works. Note that you do not submit tests that you create yourself.

Code Generator (this link is only made active after you this part of the project has been marked)