toy programming language
implementation for a simple compiler, follow the video list on youtube
source:
- [play list] https://www.youtube.com/channel/UCkyrFlOF7I6U-IMUxum9HrA/videos
- [first video] https://www.youtube.com/watch?v=xqcd6hZD6MY
This document looks messy at the moment. I will revise it with the learning progress.
"premature optimization is the root of all evil".
implemented in basic.py
Lexer = code -> lexer -> Tokens
if xxx:
xxx:
else:
xxx
The compiler read the source code from a file as a large string shows below
if xxx: \n xxx \n else: xxx
tokenizer -> split the code into a list of token
token: lexical unit => <token-name, attribute-value>
实现目标算术器:
- 1 + 1 - 2 + 4 = 4
- (1 + 1) * 2 = 4
- 1 *+ 1 = Error
implemented in grammar.py
Usage
python main.py
tokens -> AST [] -> 递归 -> tree
文法: 语言的规则,toypl
BNF(巴科斯范式) context-free-grammar, 上下文无关文法,可以递归嵌套
Grammar
expr -> term (( PLUS | MINUS ) term)*
term -> factor (( MUL | DIV ) factor)*
factor -> INT | FLOAT
-> (PLUS | MINUS ) factor
-> LPAREN expr RPAREN
解释:
- ‘*’ 表示匹配0或者多次
- expr -> term (( PLUS | MINUS ) term)*
- exp表达式,包括两部分
- 表达式1:一个term表达式,
- 表达式2:一个操作(PLUS或者MINUS)在连接一个表达式term
- 表达式2可以出现0次或者多次
- term -> factor ( MUL | DIV ) factor)*
- term表达式,包含两部分
- 表达式:一个factor表达式
- 表达式2:( MUL | DIV ) factor)*
- factor -> INT | FLOAT
-> (PLUS | MINUS ) factor
-> LPAREN expr RPAREN
- 可以有一个int或者float
- 可以有一个PLUS | MINUS
- 可以有 '(' + 一个expr表达式 + ')'
Principle
- token parser -> RTResult(store the result from interpreter)
- Execution for Interpreter: do correspond operation based on the properties from Ast Node中
For the BinOp, such as 1+1
left node -> 1,
right node -> 1
we execute + operation for left node and right node