Data, Live, 2026

MuriukiDB

Custom relational database engine built from scratch — hand-written SQL lexer, recursive-descent AST parser, query executor with validation, and an in-memory B-Tree with node splits. Submitted to the Pesapal Junior Dev Challenge 2026 as a job application.

TypeScript, React 18, Vite, Supabase, Custom SQL parser, B-Tree indexing

What it is

A working relational database engine written from scratch in TypeScript: hand-written SQL lexer, recursive-descent AST parser, query executor, and an in-memory B-Tree with node splits. The engine sits behind a terminal-style REPL with syntax highlighting and ships as an installable PWA. Submitted to the Pesapal Junior Dev Challenge 2026 as a job application; not a competition entry.

The problem

Most full-stack engineers can wire up Postgres, but few can explain the path from text to result. I wanted to prove I could build the whole stack — lexer, parser, executor, indexing — at depth, not just describe it.

What I built

The pipeline

Source string → Lexer (token stream) → Parser (AST) → Validator → Executor (against the in-memory store + B-Tree index) → Result. Each stage has its own test suite and can be exercised in isolation from the REPL.
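
To make the first stage concrete, here is a minimal sketch of a hand-written lexer that turns a source string into a token stream. It is an illustration under stated assumptions, not the project's actual code: the `SqlToken` shape, the `tokenize` name, and the short keyword list are all hypothetical, and number handling is deliberately simplified.

```typescript
// Hypothetical token shape; the project's real token types are not shown in this write-up.
type SqlTokenKind = "keyword" | "identifier" | "number" | "string" | "symbol";
interface SqlToken { kind: SqlTokenKind; text: string }

// Small illustrative keyword list, not the engine's full set.
const SQL_KEYWORDS = ["SELECT", "FROM", "WHERE", "AND", "OR", "ORDER", "BY", "JOIN", "ON"];

function tokenize(source: string): SqlToken[] {
  const tokens: SqlToken[] = [];
  let i = 0;
  while (i < source.length) {
    const ch = source.charAt(i);
    if (/\s/.test(ch)) {
      i++; // skip whitespace
    } else if (/[0-9]/.test(ch)) {
      // Simplified numeric literal: a run of digits and dots.
      let j = i;
      while (j < source.length && /[0-9.]/.test(source.charAt(j))) j++;
      tokens.push({ kind: "number", text: source.slice(i, j) });
      i = j;
    } else if (/[A-Za-z_]/.test(ch)) {
      // Keyword or identifier; keywords are normalised to upper case.
      let j = i;
      while (j < source.length && /[A-Za-z0-9_]/.test(source.charAt(j))) j++;
      const word = source.slice(i, j);
      const upper = word.toUpperCase();
      const kind: SqlTokenKind = SQL_KEYWORDS.indexOf(upper) >= 0 ? "keyword" : "identifier";
      tokens.push({ kind, text: kind === "keyword" ? upper : word });
      i = j;
    } else if (ch === "'") {
      // Single-quoted string literal (no escape handling in this sketch).
      const end = source.indexOf("'", i + 1);
      if (end === -1) throw new Error(`Unterminated string at offset ${i}`);
      tokens.push({ kind: "string", text: source.slice(i + 1, end) });
      i = end + 1;
    } else {
      // Try two-character operators before single-character symbols.
      const two = source.slice(i, i + 2);
      if (["<=", ">=", "<>", "!="].indexOf(two) >= 0) {
        tokens.push({ kind: "symbol", text: two });
        i += 2;
      } else if ("=<>*,();".indexOf(ch) >= 0) {
        tokens.push({ kind: "symbol", text: ch });
        i++;
      } else {
        throw new Error(`Unexpected character '${ch}' at offset ${i}`);
      }
    }
  }
  return tokens;
}
```

Because the function takes a string and returns plain data, a stage like this can be tested on its own, which is the property the pipeline description relies on.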

The B-Tree

Hand-written with node splits and rebalancing on insert. Supports range scans for indexed columns. Keys are typed (INTEGER, TEXT, REAL, BOOLEAN, DATE).
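
The sketch below shows one common way to implement insert-time node splits and a pruned range scan, following the textbook "split full nodes on the way down" approach. It assumes plain `number` keys and a hypothetical `MIN_DEGREE` of 2; the project's typed keys and actual node layout are not shown in this write-up, so every name here is illustrative.

```typescript
// Illustrative B-Tree with number keys only. MIN_DEGREE and all names are assumptions.
const MIN_DEGREE = 2; // a node holds at most 2 * MIN_DEGREE - 1 = 3 keys

interface BTreeNode {
  keys: number[];
  children: BTreeNode[]; // empty for leaf nodes
}

const isLeaf = (n: BTreeNode): boolean => n.children.length === 0;
const isFull = (n: BTreeNode): boolean => n.keys.length === 2 * MIN_DEGREE - 1;

// Split the full child at parent.children[index]; its median key moves up into the parent.
function splitChild(parent: BTreeNode, index: number): void {
  const child = parent.children[index];
  const median = child.keys[MIN_DEGREE - 1];
  const right: BTreeNode = {
    keys: child.keys.slice(MIN_DEGREE),
    children: isLeaf(child) ? [] : child.children.slice(MIN_DEGREE),
  };
  if (!isLeaf(child)) child.children = child.children.slice(0, MIN_DEGREE);
  child.keys = child.keys.slice(0, MIN_DEGREE - 1);
  parent.keys.splice(index, 0, median);
  parent.children.splice(index + 1, 0, right);
}

// Insert into a node that is known not to be full, splitting full children before descending.
function insertNonFull(node: BTreeNode, key: number): void {
  let i = 0;
  while (i < node.keys.length && key >= node.keys[i]) i++;
  if (isLeaf(node)) {
    node.keys.splice(i, 0, key);
    return;
  }
  if (isFull(node.children[i])) {
    splitChild(node, i);
    if (key > node.keys[i]) i++;
  }
  insertNonFull(node.children[i], key);
}

// Returns the (possibly new) root; the tree only grows taller when the root itself splits.
function insert(root: BTreeNode, key: number): BTreeNode {
  if (!isFull(root)) {
    insertNonFull(root, key);
    return root;
  }
  const newRoot: BTreeNode = { keys: [], children: [root] };
  splitChild(newRoot, 0);
  insertNonFull(newRoot, key);
  return newRoot;
}

// In-order walk that skips subtrees lying entirely outside [lo, hi].
function rangeScan(node: BTreeNode, lo: number, hi: number, out: number[] = []): number[] {
  for (let i = 0; i <= node.keys.length; i++) {
    const lower = i === 0 ? -Infinity : node.keys[i - 1];
    const upper = i === node.keys.length ? Infinity : node.keys[i];
    if (!isLeaf(node) && upper >= lo && lower <= hi) rangeScan(node.children[i], lo, hi, out);
    if (i < node.keys.length && node.keys[i] >= lo && node.keys[i] <= hi) out.push(node.keys[i]);
  }
  return out;
}

let root: BTreeNode = { keys: [], children: [] };
for (const k of [5, 1, 9, 3, 7, 2, 8, 4, 10, 6]) root = insert(root, k);
console.log(rangeScan(root, 3, 7)); // [3, 4, 5, 6, 7]
```

Splitting on the way down means an insert never has to walk back up the tree, which keeps the code short at the cost of occasionally splitting a node slightly earlier than strictly necessary.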

The REPL

Terminal-style input with syntax highlighting (custom tokenizer reusing the lexer's output), command history, and result tables that respect column type.
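
One way a highlighter can reuse lexer output is to wrap each token's source span in a styled element. The sketch below is an assumption about how that could look, not the project's implementation: the `HighlightToken` shape (with `start` and `end` offsets), the `tok-` class names, and the choice to build an HTML string rather than React elements are all hypothetical.

```typescript
// Hypothetical token shape carrying source offsets; the real lexer output may differ.
interface HighlightToken {
  kind: "keyword" | "identifier" | "number" | "string" | "symbol";
  start: number; // inclusive offset into the source
  end: number;   // exclusive offset into the source
}

// Escape the characters that would otherwise be interpreted as markup.
function escapeHtml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Wrap each token in a span; text between tokens (whitespace) is copied through unchanged.
function highlightSql(source: string, tokens: HighlightToken[]): string {
  let html = "";
  let cursor = 0;
  for (const tok of tokens) {
    html += escapeHtml(source.slice(cursor, tok.start));
    html += `<span class="tok-${tok.kind}">${escapeHtml(source.slice(tok.start, tok.end))}</span>`;
    cursor = tok.end;
  }
  return html + escapeHtml(source.slice(cursor));
}

const sample = "SELECT id";
console.log(highlightSql(sample, [
  { kind: "keyword", start: 0, end: 6 },
  { kind: "identifier", start: 7, end: 9 },
]));
// <span class="tok-keyword">SELECT</span> <span class="tok-identifier">id</span>
```

Driving the highlighter from the same tokens the parser consumes means the colours cannot disagree with how the engine actually reads the query.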

Persistence + auth

Supabase backs user accounts and per-user database state so saved tables survive across sessions; the engine itself is in-memory while the REPL is open.

Engineering decisions

Why recursive descent over a parser generator

Recursive descent forces you to understand the grammar. A generator would have been faster but would hide the most interesting part — how SELECT, WHERE, JOIN, and ORDER BY actually compose.
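
To show what "understanding the grammar" looks like in practice, here is a small recursive-descent parser for a WHERE-style condition. It is a sketch over an illustrative grammar subset, not the project's parser: the `WhereAst` node shapes, the `parseWhere` name, the regex-based tokenizing, and the restriction to numeric comparisons are all assumptions made to keep the example short.

```typescript
// Illustrative grammar (one function per rule):
//   expr       := andExpr ("OR" andExpr)*
//   andExpr    := comparison ("AND" comparison)*
//   comparison := IDENT ("=" | "<" | ">") NUMBER | "(" expr ")"

// Hypothetical AST node shapes.
type WhereAst =
  | { type: "compare"; column: string; op: string; value: number }
  | { type: "logical"; op: "AND" | "OR"; left: WhereAst; right: WhereAst };

function parseWhere(input: string): WhereAst {
  // Regex tokenizing keeps the sketch short; a real engine would use its lexer.
  const tokens: string[] = input.match(/[A-Za-z_]\w*|\d+|[=<>()]/g) || [];
  let pos = 0;

  const peekUpper = (): string => (pos < tokens.length ? tokens[pos].toUpperCase() : "");
  const next = (): string => {
    if (pos >= tokens.length) throw new Error("Unexpected end of input");
    return tokens[pos++];
  };

  // Lowest precedence: OR.
  function parseExpr(): WhereAst {
    let left = parseAnd();
    while (peekUpper() === "OR") {
      next();
      left = { type: "logical", op: "OR", left, right: parseAnd() };
    }
    return left;
  }

  // AND binds tighter than OR because parseExpr calls down into it.
  function parseAnd(): WhereAst {
    let left = parseComparison();
    while (peekUpper() === "AND") {
      next();
      left = { type: "logical", op: "AND", left, right: parseComparison() };
    }
    return left;
  }

  function parseComparison(): WhereAst {
    const tok = next();
    if (tok === "(") {
      const inner = parseExpr();
      if (next() !== ")") throw new Error("Expected ')'");
      return inner;
    }
    const op = next();
    if (["=", "<", ">"].indexOf(op) === -1) throw new Error(`Expected comparison operator, got '${op}'`);
    const value = Number(next());
    if (isNaN(value)) throw new Error("Expected a numeric literal");
    return { type: "compare", column: tok, op, value };
  }

  const ast = parseExpr();
  if (pos < tokens.length) throw new Error(`Unexpected token '${tokens[pos]}'`);
  return ast;
}

console.log(JSON.stringify(parseWhere("a = 1 OR b = 2 AND c = 3"), null, 2));
```

In the example call the root node is the OR, with the AND nested on its right. Operator precedence falls out of which function calls which, and that mapping from grammar rule to function is what a generator would hide.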

Why B-Tree over hash index

B-Trees keep keys in sorted order, so they support range queries. A hash index gives O(1) average-case point lookups but cannot serve `WHERE id BETWEEN ...` without scanning every entry, and that is the kind of query a portfolio reviewer is most likely to ask about.
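
The trade-off can be illustrated by counting how many entries each structure touches for the same range. In this sketch a `Map` stands in for a hash index and a sorted array stands in for the key order a B-Tree maintains; both are simplifications chosen for brevity, and the names `hashRange` and `orderedRange` are hypothetical.

```typescript
// Stand-ins: a Map plays the hash index, a sorted array plays the B-Tree's key order.
const ids: number[] = [];
for (let id = 1; id <= 1000; id++) ids.push(id);

const hashIndex = new Map<number, number>(); // id -> row position
ids.forEach((id, row) => hashIndex.set(id, row));

interface RangeResult { matches: number[]; visited: number }

// A hash index has no useful ordering, so a range predicate must test every key.
function hashRange(lo: number, hi: number): RangeResult {
  const matches: number[] = [];
  let visited = 0;
  hashIndex.forEach((_row, id) => {
    visited++;
    if (id >= lo && id <= hi) matches.push(id);
  });
  return { matches, visited };
}

// An ordered index can binary-search to the lower bound, then walk forward until past hi.
function orderedRange(sorted: number[], lo: number, hi: number): RangeResult {
  let left = 0;
  let right = sorted.length;
  let visited = 0;
  while (left < right) {
    const mid = (left + right) >> 1;
    visited++;
    if (sorted[mid] < lo) left = mid + 1;
    else right = mid;
  }
  const matches: number[] = [];
  for (let i = left; i < sorted.length && sorted[i] <= hi; i++) {
    visited++;
    matches.push(sorted[i]);
  }
  return { matches, visited };
}

console.log("hash index visited:", hashRange(100, 110).visited);
console.log("ordered index visited:", orderedRange(ids, 100, 110).visited);
```

Both functions return the same rows; the difference is that the hash version's work grows with the size of the table, while the ordered version's work grows with the logarithm of the table size plus the size of the result.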

Why TypeScript instead of a systems language

The point was to demonstrate engineering depth, not maximum throughput. TypeScript meant the focus stayed on the algorithm and its tests; readers can audit the lexer in their browser.

What I'd do differently

The engine itself is in-memory only: it has no durable storage of its own (saved tables rely on the Supabase layer), no ACID guarantees, last-write-wins concurrency, and no nested SELECT subqueries. A v2 should land WAL-style persistence and proper transaction semantics before adding more SQL surface area.
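
As a rough illustration of what WAL-style persistence means, the sketch below appends every change to a log before applying it in memory, then rebuilds state by replaying that log. It is a toy under loud assumptions: the log is an in-memory string array standing in for an append-only file, the store is key-value rather than relational, and `WalStore`, `WalRecord`, and the method names are hypothetical. Nothing like this exists in the project yet.

```typescript
// Hypothetical log record shape for a toy key-value store.
type WalRecord =
  | { op: "put"; key: string; value: string }
  | { op: "delete"; key: string };

class WalStore {
  private state = new Map<string, string>();

  // `log` stands in for an append-only file; existing entries are replayed on startup.
  constructor(private log: string[] = []) {
    for (const line of log) this.apply(JSON.parse(line) as WalRecord);
  }

  // Apply a record to in-memory state only.
  private apply(rec: WalRecord): void {
    if (rec.op === "put") this.state.set(rec.key, rec.value);
    else this.state.delete(rec.key);
  }

  // Write-ahead rule: record the change in the log first, then mutate memory.
  private write(rec: WalRecord): void {
    this.log.push(JSON.stringify(rec));
    this.apply(rec);
  }

  put(key: string, value: string): void {
    this.write({ op: "put", key, value });
  }

  remove(key: string): void {
    this.write({ op: "delete", key });
  }

  get(key: string): string | undefined {
    return this.state.get(key);
  }
}

// A second store built from a copy of the log recovers the same state.
const walLog: string[] = [];
const live = new WalStore(walLog);
live.put("users:1", "alice");
live.put("users:2", "bob");
live.remove("users:1");
const recovered = new WalStore(walLog.slice());
console.log(recovered.get("users:1"), recovered.get("users:2")); // undefined bob
```

A real implementation would also need to flush the log to disk before acknowledging a write and to checkpoint so the log does not grow without bound; those parts are omitted here.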