ECE 466 - Compilers


Syllabus

Please read the syllabus which contains information about suggested text references and course expectations!

Lecture Notes and Assignments


The 2023 lecture notes and assignments are posted here. Some of these will be updated.



Unit 1 - Intro/Lexical

Topics include: overview of the compilation process, front-end vs back-end, the role of lexical, syntactical and semantic analysis, regular expressions, finite automata construction, lex/flex programs.
Lecture Notes .

Suggested readings and activities


* Read the documentation for flex. Try out some examples.
* Read section 6.4 (Lexical Elements) of the C language standard. And/or read Chapter 2 of Harbison and Steele.
* Read Chapter 3 of the dragon book. Don't bother reading chapters 1 and 2; they will just confuse you.

Assignment 1 - Lexer

The first assignment is to write the lexical analysis phase of the C compiler.
Assignment sheet
Support files

Unit 2 - Syntactical Analysis (Parsing)

Topics: context-free grammars, derivations and parse trees, LL and LR grammars, top-down and bottom-up parsing, overview of yacc/bison. Ambiguous and non-LALR grammars and resolution, precedence/associativity. Embedded semantic actions, synthesized vs inherited attributes.

Suggested study


* Read the entire Bison manual.
* Read Chapters 4 and 5 of the Dragon Book.
* Read Chapter 7 (Expressions) in H&S. And/or read section 6.5 of the C standard.
Lecture notes 2

Assignment 2 - Expression Parsing

This is the first exploration of Bison. You will write an interpreter which understands and evaluates the C expression syntax (limited subset) and transforms it into Abstract Syntax Trees. Target for completion: Feb 21
Assignment 2 worksheet
some test cases and sample output from test cases

Remember, these test cases are not a comprehensive test suite! Your output does not have to match exactly.



Unit 3 -- Error handling and memory allocation

Topics: Errors vs warnings, error recovery, error reporting, memory allocation strategies within the compiler.
Lecture notes 3


Unit 4 -- Semantic Analysis

Topics: compile-time vs run-time, syntax-directed translation, abstract syntax trees, type systems, symbol tables and scoping rules, feedback between lexer, parser and semantic stages, overview of C language type and namespace issues.
Lecture notes 4Updated 2/26/2025

Suggested Readings


* Sections 6.2 and 6.7 of the C language standard.
* H & S, chapters 4 and 5.
Assignment #3: Parsing declarations, representing types, managing symbol tables. There are some files here that you might find useful in testing your code. They are not intended as exhaustive compliance tests. As with previous test cases, your output doesn't have to match my sample output exactly.


Assignment #4: Completion of the AST to include all statements. There are a few test cases along with example AST output. As always, these are examples only and there is no requirement that your output match up exactly. Assignment #4 PDF

Unit 5 -- Intermediate Representations

Graphical vs linear IR, internal vs external IR. 1-address and 3-address linear IRs. Overview of IR approaches of popular languages including Perl, Python, Java, PHP. Overview of LLVM IR. Techniques for generating IR from AST.
Lecture notes Updated 3/26/2025

Assignment 5 -- Generation of Quads

Having completed AST generation and declaration/symbol handling for the entire language, it is now time to enter the final phase "how do we do it?"
Assignment #5 PDF


Unit 6 -- Architecture-Neutral IR Optimization

Theory of optimization at the quad level. Examples of several optimization algorithms.
Lecture notes revised 4/21/25


Unit 7 -- Target Environment

We'll explore the X86 assembly language in some depth, with special attention to variables, addressing modes, and function calling/register usage conventions. If time permits, we'll contrast this CISC architecture with a RISC architecture.
Lecture notes revised 5/9/25


Unit 8 -- Target Code Generation

Overview of Instruction selection and Register Allocation. Challenges in one- and two-address architectures.
Lecture notes revised 5/7/25

Assignment 6 -- Generation of Quads

All work must be submitted by May 15.
Assignment #6 PDF

C Language Standards

In this course, we'll be constructing a rudimentary compiler for the C language. We won't have time to implement the entire C language standard, but we'll do enough that our compiler can generate some demonstrative results.

This raises the question of what is the C standard? The original definition of C was known as K&R after the authors of the language, Kernighan and Ritchie. It was not a formalized standard. In 1989, an ANSI standard was created, which later became an ISO standard. This is known as C-89. Another significant revision was made in C-99. The C-11 standard was mostly concerned with multi-threading support and library functions, which do not concern us in this course. The C-18 standard (sometimes called C-17) made minor corrections to the C-11 standard. Finally, C-23 has introduced a number of fairly minor changes to the language itself (such as the creation of true and false keywords) and deprecation of some legacy K&R syntax.

For this course, we can implement a compiler which is somewhere between C-89 and C-99 and that is more than enough work! Now, another problem with these standards is that the final version is behind a paywall. But, working draft versions are released which are almost identical to the final versions. I've gathered some of these for you:

  • C-99 working draft
  • C-11 working draft
  • C-23 working draft


    Assembly/Processor Standards


    Intel Architecture Volume 1A breezy 500 page introduction
    Intel 32 and 64 bit Instruction Set Reference Manual 2605 pages of easy reading!
    Intel System Programming ReferenceThis material would be more of interest for Operating Systems
    Note that the instruction set is presented according to Intel syntax, which differs from the "UNIX" or "AT&T" syntax which is used on Solaris, Linux and BSD systems.
    Sun Microsystems x86 Assembly Language Reference This booklet presents an overview of the X86 assembler in UNIX notation and provides a translation of opcode names and addressing mode nomenclature between UNIX and Intel formats. It does not describe the individual instructions in detail
    SPARC Assembly Reference Manual Provides an overview of the SPARC V8 and V9 assemblers. Doesn't give a detailed explanation of each instruction but there is a summary of all instructions and addressing modes
    SPARC Architecture Manual Provides detailed coverage of the SPARC V9 architecture including operation of the processor, register windowing, addressing modes, opcodes and instruction formats. This manual was written by the SPARC International group and as such is OS and assembler agnostic. But unlike the Intel reference manuals, the notation used follows the UNIX standard very closely.