Notes
Used ChatGPT/Claude to understand what happens to the query input when it goes to the front-end. I summarized the LLMs' output to help myself grasp the concepts learned.
REPL: Read-Eval-Print-Loop
Front-end here refers to the user interface or application layer that enables users or other software to interact with the SQLite database.
- The output of the frontend is SQLite VM bytecode that the back-end will take in as instructions.
Front-end consists of:
- tokenizer - responsible for breaking down input SQL query into sequence of tokens. The tokenizer scans the input character by character and identifies keywords, identifiers, literals, and other syntactic elements. It outputs a stream of tokens that the parser will use.
- parser - takes a stream of tokens and constructs a syntactic tree or parse tree. The parser enforces grammer rules of the SQL language. If the query is syntactically correct, the parser generates a parse tree that represents the query's structure
- code generator - takes the parse tree and produces a set of instructions for the DB engine to follow in order to fulfill the query. The code generator transforms the parse tree into an intermediate presentation or machine code that can be executed by the DB engine. This involves optimizing the execution plan, considering factors such as indexes, joins, etc.
Back-end consists of:
- virtual machine - takes bytecode generated by the front-end as instructions. It can then perform operations on one or more tables or indexes, each of which is stored in a data structure called a B-tree. The VM is essentially a big switch statement on the type of bytecode instruction.
- B-tree - known as a Balance Tree, is a data structure used to organize and store data. Each B-tree consists of many nodes. Each node is one page in length. The B-tree can retrieve a page from disk or save it back to disk by issuing commands to the pager.
- pager - receives commands to read or write pages of data. It is responsible for reading/writing at appropriate offsets in the database file. It also keeps a cache of recently-accessed pages in memory, and determines when those pages need to be written back to disk.
- os interface - is the layer that differs depending on which operating system sqlite was compiled for. In this tutorial, I'm not going to support multiple platforms.
Let's say the "front-end" of sqlite is a SQL compiiler that parses a string and outputs bytecode. This byte code is passed to the VM, which executes it.
We broke things down into 2 steps to reduce complexity and allow compiling common queries once and caching the bytecode for improved performance.
Non-SQL statements are called meta-commands. They all start with a dot and we handled them in a separate function.
In this section we added support for two new keywords: select
and insert
.
We also added some enums for response codes to handle errors or to tell us if our operation succeeded.
Here's some logic we added to our sqlite front-end:
- add a step that conversts line of input into internal representation of a statement (
prepare_statement
) - pass the response of
prepare_statement
toexecute_statement
which will eventually become our VM
We laid out some of the skeleton for the DB so far, in the next part we will implement insert
and select
.
To start small, here are some some limitations for our DB:
- support two operations: inserting a row and printing all rows
- reside only in memory (no persistence to disk)
- support a single, hard-coded table
We will upgrade prepare_statement
to parse arguments and store those parsed arguments in a new Row
data structure.
We need to also copy that data into some data structure that represents a table. SQLite uses B-Tree for fast lookups, inserts and deletes. To simplify, we will arrange pages of data as an array.
Plan:
- store rows in blocks of memory called pages
- each page stores as manuy rows as it can fit
- rows are serialiazed into a compact representation with each page
- Pages are only allocated as needed
- Keep a fixed-size array of pointers to pages
Page will be 4kb because that's the same size as a page used in virtual memory systems of most computer architectures. One page in our DB corresponds to one page used by the OS. The OS will move pages in an out of memory as whole units instead of breaking them up.
In this section we enable our clone to actually insert and save data into memory. We establish Row and Table structure and created some helper methods to handle data storage and retrieval.