ECE 466 - Compilers


Syllabus

Please read the syllabus which contains information about suggested text references and course expectations!

The definitive reference for the C language is the ISO standard. The 1999 standard introduced a number of language extensions, some of which we may chose to ignore in class. A working draft of that standard was openly available and a copy may be downloaded here. A more recent revision known as "C11" which was adopted in 2011 is here

In this course, we will be building a heavily simplified compiler for C-99. In fact, with all the features that will become optional, the standard will be closer to C-89. Therefore, any of the standards above can be used for reference. There will be small details that affect backward compatability and we can discuss these in class.

Lecture Notes and Assignments




Unit 1 - Intro/Lexical

Topics include: overview of the compilation process, front-end vs back-end, the role of lexical, syntactical and semantic analysis, regular expressions, finite automata construction, lex/flex programs.
Lecture Notes (PS).

Suggested readings and activities


* Read the documentation for flex. Try out some examples.
* Read section 6.4 (Lexical Elements) of the C language standard. And/or read Chapter 2 of Harbison and Steele.
* Read Chapter 3 of the dragon book. Don't bother reading chapters 1 and 2; they will just confuse you.

Assignment 1 - Lexer

The first assignment is to write the lexical analysis phase of the C compiler. To stay on track, this should be completed by ~ Feb 7 Assignment sheet (updated 1/23) format)
Lecture Notes 1
Support files

Unit 2 - Syntactical Analysis (Parsing)

Topics: context-free grammars, derivations and parse trees, LL and LR grammars, top-down and bottom-up parsing, overview of yacc/bison. Ambiguous and non-LALR grammars and resolution, precedence/associativity. Embedded semantic actions, synthesized vs inherited attributes.

Suggested study


* Read the entire Bison manual.
* Read Chapters 4 and 5 of the Dragon Book.
* Read Chapter 7 (Expressions) in H&S. And/or read section 6.5 of the C standard.
Lecture notes 2 (revised/corrected 2/7)

Assignment 2 - Expression Parsing

This is the first exploration of Bison. You will write an interpreter which understands and evaluates the C expression syntax (limited subset) and transforms it into Abstract Syntax Trees. Target for completion: Feb 21
Assignment 2 worksheet
some test cases and sample output from test cases

Remember, these test cases are not a comprehensive test suite! Your output does not have to match exactly.



Unit 3 -- Error handling and memory allocation

Topics: Errors vs warnings, error recovery, error reporting, memory allocation strategies within the compiler.
Lecture notes 3 (upd 2/13)(/A>


Unit 4 -- Semantic Analysis

Topics: compile-time vs run-time, syntax-directed translation, abstract syntax trees, type systems, symbol tables and scoping rules, feedback between lexer, parser and semantic stages, overview of C language type and namespace issues.
Lecture notes 4 (upd 3/7)(

Suggested Readings


* Sections 6.2 and 6.7 of the C language standard.
* H & S, chapters 4 and 5.
Assignment #3: Parsing declarations, representing types, managing symbol tables. There are some files here that you might find useful in testing your code. They are not intended as exhaustive compliance tests. As with previous test cases, your output doesn't have to match my sample output exactly. Target: March 14


Assignment #4: Completion of the AST to include all statements. There are a few test cases along with example AST output. As always, these are examples only and there is no requirement that your output match up exactly. Assignment #4 PDF Target: April 4

Unit 5 -- Intermediate Representations

Graphical vs linear IR, internal vs external IR. 1-address and 3-address linear IRs. Overview of IR approaches of popular languages including Perl, Python, Java, PHP. Overview of LLVM IR. Techniques for generating IR from AST.
Lecture notes (updated 3/29)

Assignment 5 -- Generation of Quads

Having completed AST generation and declaration/symbol handling for the entire language, it is now time to enter the final phase "how do we do it?" Target: April 25
Assignment #5 PDF


Unit 6 -- Architecture-Neutral IR Optimization

Theory of optimization at the quad level. Examples of several optimization algorithms.
Lecture notes (Revised Apr 5)


Unit 7 -- Target Environment

We'll explore the X86 assembly language in some depth, with special attention to variables, addressing modes, and function calling/register usage conventions. If time permits, we'll contrast this CISC architecture with a RISC architecture.
Lecture notes


Unit 8 -- Target Code Generation

Overview of Instruction selection and Register Allocation. Challenges in one- and two-address architectures.
Lecture notes

Assignment 6 -- Generation of Quads

It's the final curtain!All work must be submitted by the last day of the semester, May 12
Assignment #6 PDF
Here are some resources to help you complete this last assignment: Note: these are old, I'm looking for more recent versions
Intel X86 reference manual (over 2000 pages!) Note that the instruction set is presented according to Intel syntax, which differs from the "UNIX" or "AT&T" syntax which is used on Solaris, Linux and BSD systems.
Sun Microsystems x86 Assembly Language Reference This booklet presents an overview of the X86 assembler in UNIX notation and provides a translation of opcode names and addressing mode nomenclature between UNIX and Intel formats. It does not describe the individual instructions in detail
SPARC Assembly Reference Manual Provides an overview of the SPARC V8 and V9 assemblers. Doesn't give a detailed explanation of each instruction but there is a summary of all instructions and addressing modes
SPARC Architecture Manual Provides detailed coverage of the SPARC V9 architecture including operation of the processor, register windowing, addressing modes, opcodes and instruction formats. This manual was written by the SPARC International group and as such is OS and assembler agnostic. But unlike the Intel reference manuals, the notation used follows the UNIX standard very closely.