Notes

Part 1 - Introduction and Setting up REPL

Used ChatGPT/Claude to understand what happens to the query input when it goes to the front-end. I summarized the LLMs' output to help myself grasp the concepts learned.

REPL: Read-Eval-Print-Loop

Front-end here refers to the user interface or application layer that enables users or other software to interact with the SQLite database.

The output of the frontend is SQLite VM bytecode that the back-end will take in as instructions.

Front-end consists of:

tokenizer - responsible for breaking down input SQL query into sequence of tokens. The tokenizer scans the input character by character and identifies keywords, identifiers, literals, and other syntactic elements. It outputs a stream of tokens that the parser will use.
parser - takes a stream of tokens and constructs a syntactic tree or parse tree. The parser enforces grammer rules of the SQL language. If the query is syntactically correct, the parser generates a parse tree that represents the query's structure
code generator - takes the parse tree and produces a set of instructions for the DB engine to follow in order to fulfill the query. The code generator transforms the parse tree into an intermediate presentation or machine code that can be executed by the DB engine. This involves optimizing the execution plan, considering factors such as indexes, joins, etc.

Back-end consists of:

virtual machine - takes bytecode generated by the front-end as instructions. It can then perform operations on one or more tables or indexes, each of which is stored in a data structure called a B-tree. The VM is essentially a big switch statement on the type of bytecode instruction.
B-tree - known as a Balance Tree, is a data structure used to organize and store data. Each B-tree consists of many nodes. Each node is one page in length. The B-tree can retrieve a page from disk or save it back to disk by issuing commands to the pager.
pager - receives commands to read or write pages of data. It is responsible for reading/writing at appropriate offsets in the database file. It also keeps a cache of recently-accessed pages in memory, and determines when those pages need to be written back to disk.
os interface - is the layer that differs depending on which operating system sqlite was compiled for. In this tutorial, I'm not going to support multiple platforms.

Part 2 - World's Simplest SQL Compiler and Virtual Machine

Let's say the "front-end" of sqlite is a SQL compiiler that parses a string and outputs bytecode. This byte code is passed to the VM, which executes it.

We broke things down into 2 steps to reduce complexity and allow compiling common queries once and caching the bytecode for improved performance.

Non-SQL statements are called meta-commands. They all start with a dot and we handled them in a separate function.

In this section we added support for two new keywords: select and insert.

We also added some enums for response codes to handle errors or to tell us if our operation succeeded.

Here's some logic we added to our sqlite front-end:

add a step that conversts line of input into internal representation of a statement (prepare_statement)
pass the response of prepare_statement to execute_statement which will eventually become our VM

We laid out some of the skeleton for the DB so far, in the next part we will implement insert and select.

Part 3 - An In-Memory, Append-Only, Single-Table Database

To start small, here are some some limitations for our DB:

support two operations: inserting a row and printing all rows
reside only in memory (no persistence to disk)
support a single, hard-coded table

We will upgrade prepare_statement to parse arguments and store those parsed arguments in a new Row data structure.

We need to also copy that data into some data structure that represents a table. SQLite uses B-Tree for fast lookups, inserts and deletes. To simplify, we will arrange pages of data as an array.

Plan:

store rows in blocks of memory called pages
each page stores as manuy rows as it can fit
rows are serialiazed into a compact representation with each page
Pages are only allocated as needed
Keep a fixed-size array of pointers to pages

Page will be 4kb because that's the same size as a page used in virtual memory systems of most computer architectures. One page in our DB corresponds to one page used by the OS. The OS will move pages in an out of memory as whole units instead of breaking them up.

In this section we enable our clone to actually insert and save data into memory. We establish Row and Table structure and created some helper methods to handle data storage and retrieval.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

NOTES.md

NOTES.md

Part 1 - Introduction and Setting up REPL

Part 2 - World's Simplest SQL Compiler and Virtual Machine

Part 3 - An In-Memory, Append-Only, Single-Table Database

Files

NOTES.md

Latest commit

History

NOTES.md

File metadata and controls

Part 1 - Introduction and Setting up REPL

Part 2 - World's Simplest SQL Compiler and Virtual Machine

Part 3 - An In-Memory, Append-Only, Single-Table Database