Team

Project description

This project is a spin-off of the project we are working on in the compiler construction class: Boa. While working on this project, we experienced difficulties in the parser implementation that were caused by the nature of the input language. Consider the following code example:

def sign(i: int) -> int:

if i < 0:
return -1
elif i == 0:
return 0
else:
return 1

Nested structures are not defined by BEGIN and END tokens, but by the level of indentation of statements. This feature can be found in several other programming languages around:

The problem with this kind of languages is that they apparently cannot be generated by a context-free grammar. The languages Python and Boa solve the problem by generating special tokens INDENT and DEDENT that have much the same meaning as BEGIN and END. This is done by adding a special compiler component that is either placed before or after the scanner. In any case, it must generate the tokens needed by the parser and thus pass over tokens to it.

However, no general description and parsing mechanism exists for these languages. Instead, compilers include some algorithms to perform this dirty work:

The objective of this project is to analyze these kind of languages in a more generic way and to define some automaton or machine that is able to handle them.

Outline:

Files