UNIVERSITY OF NORTHERN BRITISH COLUMBIA

Winter 2002
Computer Science


            Semantic Analyzer

                                                        

1. General points

  • The major task of semantic analysis is the gathering of information in the symbol table and performing semantic analysis (i.e. type checking, scope checking, et cetera) upon the symbol table. At the same time, you annotate the leaves of your parse tree (such as the computation of constant expressions to optimize the compiler).
  • The basic ideology of semantic analysis is as you traverse the parse tree1:
    • Add information to the symbol table.
    • Annotate your syntax tree.
    • Check to make sure that the semantic rules are followed.
  • It will be necessary to determine what type of data structure to use as the symbol table but it should have constant-time insert operation, lookup and delete operations that are linear time in the size of the list, and be efficient .
    • Binary search trees, AVL trees, B trees and hash tables are all acceptable choices but  a hash table is recommended because all three operations can be performed in almost constant time
  • If you use a hash table for your symbol table things to consider are:
    • choose a prime number for the size (i.e. 211)
    • scope must be incorporated into your hash table (i.e. separate tables, each with a different scope level could be chained together or you could incorporate the scope within the data types stored in the hash table)
  • Information you may want to include in your hash table is:
    • type (i.e. int or void)
    • declaration name
    • location (just set to a default value for now, you will use this later in code generation)
    • block number
    • scope name (global or main)
    • line numbers (list of line numbers where it is referenced)
  • It is important to note:
    • Cminus uses static scoping
    • Two functions "int input(void) {....}" and "void output(int x) {...}" are considered to be predefined in the global environment
  • Besides producing an annotated syntax tree and symbol table, a major purpose of the semantic analysis for a compiler is to be able to recognize errors and to output meaningful error messages. After encountering and error, compilation should be continued without cascading the error.

1 Your should only output one symbol table and an annotated parse  tree

      
2. Suggested Readings

  1. Chapter #6


 

            

3. Semantic Rules

For each of these grammar rules we give a short explanation of the associated semantics.  

1. program -> declaration-list

2. declaration-list -> declaration-list declaration | declaration

3. declaration -> var-declaration | fun-declaration

A program consists of a list (or sequence) of declarations, which may be function or variable declarations, in any order. There must be at least one declaration. Semantic restrictions are as follows (these do not occur in C). All variables and functions must be declared before they are used (this avoids backpatching references). The last declaration in a program must be a function declaration of the form void main (void). Note that C- lacks prototypes, so that no distinction is made between declarations and definitions (as in C).  

4. var-declaration -> type-specifier ID ; | type-specifier ID [ NUM ] ;

5. type-specifier -> int | void

A variable declaration declares either a simple variable of integer type or an array variable whose base type is integer, and whose indices range from 0..NUM- 1. Note that in C- the only basic types are integer and void. In a variable declaration, only the type specifier int can be used. Void is for function declarations (see below). Note, also, that only one variable can be declared per declaration.

6. fun-declaration -> type-specifier ID ( params ) compound-stmt

7. params -> param-list | void

8. param-list -> param-list , param | param

9. param -> type-specifier ID | type-specifier ID [ ]

A function declaration consists of a return type specifier, an identifier, and a comma-separated list of parameters inside parentheses, followed by a compound statement with the code for the function. If the return type of the function is void, then the function returns no value (i.e., is a procedure). Parameters of a function are either void (i.e., there are no parameters) or a list representing the function's parameters. Parameters followed by brackets are array parameters whose size can vary. Simple integer parameters are passed by value. Array parameters are passed by reference (i.e., as pointers) and must be matched by an array variable during a call. Note that there are no parameters of type "function". The parameters of a function have scope equal to the compound statement of the function declaration, and each invocation of a function has a separate set of parameters. Functions may be recursive (to the extent that declaration before use allows).

10. compound-stmt -> { local-declarations statement-list }

      A compound statement consists of curly brackets surrounding a set of declarations and statements. A compound statement is executed by executing the statement sequence in the order given. The local declarations have scope equal to the statement list of the compound statement and supersede any global declarations.

11. local-declarations -> local-declarations var-declaration | empty

12. statement-list -> statement-list statement | empty

Note that both declarations and statement lists may be empty. (The nonterminal empty stands for the empty string, sometimes written as ,.)

13. statement -> expression- stmt

| compound-stmt

| selection-stmt

| iteration-stmt

| return-stmt

14. expression-stmt -> expression ; | ;

An expression statement has an optional expression followed by a semicolon. Such expressions are usually evaluated for their side effects. Thus, this statement is used for assignments and function calls.

15. selection-stmt -> if ( expression ) statement

| if ( expression ) statement else statement

The if-statement has the usual semantics: the expression is evaluated; a nonzero value causes execution of the first statement; a zero value causes execution of the second statement, if it exists. This rule results in the classical dangling else ambiguity, which is resolved in the standard way: the else part is always parsed immediately as a substructure of the current if (the "most closely nested" disambiguating rule).

16. iteration-stmt -> while ( expression ) statement

The while-statement is the only iteration statement in C-. It is executed by repeatedly evaluating the expression and then executing the statement if the expression evaluates to a nonzero value, ending when the expression evaluates to 0.

17. return-stmt -> return ; | return expression ;

A return statement may either return a value or not. Functions not declared void must return values. Functions declared void must not return values. A return causes transfer of control back to the caller (or termination of the program if it is inside main).

18. expression -> var = expression | simple-expression

19. var -> ID | ID [ expression ]

An expression is a variable reference followed by an assignment symbol (equal sign) and an expression, or just a simple expression. The assignment has the usual storage semantics: the location of the variable represented by var is found, then the subexpression to the right of the assignment is evaluated, and the value of the subexpression is stored at the given location. This value is also returned as the value of the entire xpression. A var is either a simple (integer) variable or a subscripted array variable. A negative subscript causes the program to halt (unlike C). However, upper bounds of subscripts are not checked. Vars represent a further restriction of C- from C. In C the target of an assignment must be an 1-value, and l-values are addresses that can be obtained by many operations. In C- the only 1-values are those given by the var syntax, and so this category is checked syntactically, instead of during type checking as in C. Thus, pointer arithmetic is forbidden in C-.

20. simple-expression -> additive-expression relop additive-expression

| additive-expression

21. relop -> <= | < | > | >= | == | !=

A simple expression consists of relational operators that do not associate (that is, an unparenthesized expression can only have one relational operator). The value of a simple expression is either the value of its additive expression if it contains no relational operators, or 1 if the relational operator evaluates to true, or 0 if it evaluates to false.

22. additive-expression -> additive-expression addop term | term

23. addop -> + | -

24. term -> term mulop factor | factor

25. mulop -> * | /

Additive expressions and terms reprerent the typical associativity and precedence of the arithmetic operators. The / symbol represents integer division; that is, any remainder is truncated.

26. factor -> ( expression ) | var | call | NUM

A factor is an expression enclosed in parentheses, a variable, which evaluates to the value of its variable; a call of a function, which evaluates to the returned value of the function; or a NUM, whose value is computed by the scanner. An array variable must be subscripted, except in the case of an expression consisting of a single ID and used in a function call with an array parameter (see below).

27. call -> ID ( args )

28. args -> arg-list | empty

29. arg-list -> arg-list , expression | expression

  A function call consists of an ID (the name of the function), followed by parentheses enclosing its arguments. Arguments are either empty or consist of a comma separated list of expressions, representing the values to be assigned to parameters during a call. Functions must be declared before they are called, and the number of parameters in a declaration must be equal the number of arguments in a call. An array parameter in a function declaration must be matched with an expression consisting of a single identifier representing an array variable.

  Finally, the above rules give no input or output statement. We must include such functions in the definition of C-, since unlike C, C- has no separate compilation or linking facilities. We, therefore, consider two functions to be predefined in the global environment, as though they had the indicated declarations:

int input (void) { … }

void output(int x) { … }

 

  The input function has no parameters and returns an integer value from the standard input device (usually the keyboard). The output function takes one integer parameter, whose value it prints to the standard output (usually the screen), together with a newline.

 

 

4. Semantic Analyzer structure

 

 

   
5. Sample Output


The semantic analyzer output must contain at least the functionality displayed. Feel free to make additions that you deem necessary. Examples of the output for the programs testblock.mns, euclid.mns, and quicksort.mns can be found by following this link.

6. What to hand in


The write-up should be approximately 3 pages excluding diagrams, tables and code. The assignment should have all of the following sections:

  1. Description of the solution
    • A description of the main functions.
    • A description of the symbol table data structure and how it was created
    • A list of all optimizations implemented.
  2. Bugs In Parser
    • List the bugs that were found by the marker in your Parser project
    • State explicitly which of these bugs have been corrected and which bugs have not been corrected and why.
  3. Bugs in Semantic Analyzer
  4. Solutions to these Bugs in Semantic Analyzer
  5. Description of Conventions
    • State each error message and its meaning (this is an important part of your write-up)
    • State any other conventions that you may have produced.
  6. Compiling and Running Cminus code
    • State all the modified files in this section of the report.
    • Show how you compile and run your Cminus code using descriptions and/or examples.
    • Show how you run a compiled Cminus program using descriptions and/or examples.
  7. Code
    • State all the modified files in this section of the report.
    • Include all code.

 

7. Semantic Analyzer Testing

The test code will only be made available after this part of the project has been marked, so that you can correct your code. It is your responsibility to develop your own tests and make sure this part of the project works. Note that you do not submit tests that you create yourself.

Semantic Analyzer (this link is only made active after you this part of the project has been marked)