Skunk, Explained As A Pipeline
This booklet is meant to be printed, annotated, and read slowly. It starts from the top-level story of the compiler, then follows one tiny Skunk program through parsing, checking, layouts, IR generation, and native build.
Who It Is For
Someone who is new to compiler architecture and to LLVM, and who wants to learn this codebase without being thrown into the deepest file first.
Best Way To Use It
Read one chapter, then open the linked file and inspect the exact function names mentioned there. This guide is a map, not a replacement for reading code.
How To Read This
The most important idea in this whole booklet is that Skunk is a pipeline. Each stage receives a program in one form and hands a more useful form to the next stage. If you always know which stage you are in, the compiler stops feeling like a pile of unrelated files.
The second important idea is that not every program exercises every pass equally. A tiny non-generic program will mostly glide through monomorphization. A generic one will make that pass much more interesting. That is normal.
- main in src/main.rs
- load_program in src/source.rs
- prepare_program in src/monomorphize.rs
- check in src/type_checker.rs
- compile_to_executable in src/compiler.rs
The Whole Pipeline
The high-level path through the compiler is short enough to memorize. That is helpful, because it lets you classify almost every file by its role in the bigger machine.
- Front end: loading, grammar, AST construction, and semantic checks.
- Preparation: monomorphization makes generic programs more concrete before later passes.
- Back end: layouts, lowering, runtime linkage, and native build.
- main in src/main.rs
- compile_to_llvm_ir and compile_to_executable in src/compiler.rs
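The stage order described in this guide can be pictured as a chain of functions, each consuming the previous stage's output. The following is a minimal sketch under loud assumptions: only the stage names (load_program, prepare_program, check, compile_to_llvm_ir) come from this guide. The `Program` type, the signatures, and every function body are hypothetical stand-ins, and the real code will differ.

```rust
// Hypothetical stand-in type: the real compiler passes much richer structures.
#[derive(Debug, Clone, PartialEq)]
struct Program {
    // Names of the stages that have processed this program so far.
    stages_seen: Vec<&'static str>,
    source: String,
}

// Each stage takes a program in one form and returns a more useful form.
// The bodies only record that the stage ran; they do no real work.
fn load_program(source: &str) -> Result<Program, String> {
    if source.trim().is_empty() {
        return Err("empty source".to_string());
    }
    Ok(Program { stages_seen: vec!["load"], source: source.to_string() })
}

fn prepare_program(mut program: Program) -> Program {
    program.stages_seen.push("monomorphize");
    program
}

fn check(program: &Program) -> Result<(), String> {
    // Stand-in rule (an assumption): a program must define `main`.
    if program.source.contains("function main") {
        Ok(())
    } else {
        Err("no main function".to_string())
    }
}

fn compile_to_llvm_ir(program: &Program) -> String {
    format!("; lowered after stages: {}", program.stages_seen.join(" -> "))
}

// The driver mirrors the stage order; `?` stops at the first failing stage.
fn run_pipeline(source: &str) -> Result<String, String> {
    let loaded = load_program(source)?;
    let prepared = prepare_program(loaded);
    check(&prepared)?;
    Ok(compile_to_llvm_ir(&prepared))
}
```

Calling `run_pipeline` with a source string that contains a `main` function returns the placeholder IR comment. An empty source fails in the loading stage, before any later stage runs, which is the practical benefit of always knowing which stage you are in.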
Parsing And AST
Skunk parsing is split into two layers. src/grammar.pest describes which source forms are valid. src/ast.rs turns those grammar matches into the compiler's internal tree of Node values.
This matters because the rest of the compiler does not want to reason about raw strings. It wants to reason about named constructs like StructDeclaration, FunctionDeclaration, StructInitialization, and Access.
source text:
    "Point { x: 20, y: 22 }"
grammar match:
    recognized as a struct initialization
AST:
    Node::StructInitialization {
        _type: Point,
        fields: [("x", 20), ("y", 22)]
    }
- PestImpl::parse in src/ast.rs
- create_ast in src/ast.rs
- create_primary, create_access, and create_struct_init in src/ast.rs
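To make "named constructs instead of raw strings" concrete, here is a small sketch of a tree type and a hand-built value for `Point { x: 20, y: 22 }`. The variant name StructInitialization and the `_type` field follow the example in this guide. The other variants, the field types, and both helper functions are assumptions for illustration, not the real definitions in src/ast.rs.

```rust
// Simplified, hypothetical AST: the real Node enum has many more variants.
#[derive(Debug, Clone, PartialEq)]
enum Node {
    IntLiteral(i64),
    Identifier(String),
    StructInitialization {
        _type: String,
        fields: Vec<(String, Node)>,
    },
    // An access such as `p.x`: a base expression plus a member name.
    Access {
        base: Box<Node>,
        member: String,
    },
}

// Build the tree for `Point { x: 20, y: 22 }` by hand, the way a parser
// might after the grammar has recognized a struct initialization.
fn point_literal() -> Node {
    Node::StructInitialization {
        _type: "Point".to_string(),
        fields: vec![
            ("x".to_string(), Node::IntLiteral(20)),
            ("y".to_string(), Node::IntLiteral(22)),
        ],
    }
}

// Later passes match on named constructs instead of inspecting raw strings.
fn field_names(node: &Node) -> Vec<&str> {
    match node {
        Node::StructInitialization { fields, .. } => {
            fields.iter().map(|(name, _)| name.as_str()).collect()
        }
        _ => Vec::new(),
    }
}
```

`field_names(&point_literal())` yields the field names in source order. Any node that is not a struct initialization yields an empty list, so callers never have to parse text again.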
Modules And Normalization
src/source.rs is where the compiler stops thinking in terms of "one file the user opened" and starts thinking in terms of "one program the compiler can analyze."
The source loader resolves imports, validates module declarations, detects cycles, and uses the module normalizer to rename private symbols when needed. That makes later global passes much simpler.
This file turns many source files into one merged, safer program tree.
- load_program in src/source.rs
- ProgramLoader::load_file and ProgramLoader::module_path in src/source.rs
- ModuleNormalizer::normalize in src/source.rs
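Cycle detection is the least obvious of the loader's jobs, so a sketch may help. One common technique is a depth-first search that remembers the modules on the current import path. This is an assumption about technique, not a description of how ProgramLoader actually does it, and the `ImportGraph` type and both function names are hypothetical.

```rust
use std::collections::{BTreeMap, HashSet};

// Hypothetical module graph: module name -> modules it imports.
type ImportGraph = BTreeMap<String, Vec<String>>;

// Depth-first search that reports the first import cycle it finds,
// as the chain of module names leading back to the repeated module.
fn find_import_cycle(graph: &ImportGraph) -> Option<Vec<String>> {
    let mut done: HashSet<String> = HashSet::new();
    for start in graph.keys() {
        let mut stack: Vec<String> = Vec::new();
        if let Some(cycle) = visit(graph, start, &mut stack, &mut done) {
            return Some(cycle);
        }
    }
    None
}

fn visit(
    graph: &ImportGraph,
    module: &str,
    stack: &mut Vec<String>,
    done: &mut HashSet<String>,
) -> Option<Vec<String>> {
    // Seeing a module that is still on the current path means a cycle.
    if let Some(pos) = stack.iter().position(|m| m.as_str() == module) {
        let mut cycle = stack[pos..].to_vec();
        cycle.push(module.to_string());
        return Some(cycle);
    }
    // Modules that were fully explored earlier cannot start a new cycle.
    if done.contains(module) {
        return None;
    }
    stack.push(module.to_string());
    if let Some(imports) = graph.get(module) {
        for import in imports {
            if let Some(cycle) = visit(graph, import, stack, done) {
                return Some(cycle);
            }
        }
    }
    stack.pop();
    done.insert(module.to_string());
    None
}
```

Returning the whole chain rather than a boolean lets an error message name every module involved, which is usually what a user needs to break the cycle.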
Monomorphization
Generics are comfortable for programmers and inconvenient for backends. Skunk's answer is a preparation pass in src/monomorphize.rs that turns generic templates into concrete specialized program pieces when needed.
The pass is easiest to understand if you think in terms of recipes and finished dishes. A generic function is a recipe. A monomorphized concrete function is one finished dish for one concrete set of type arguments.
- Gather generic templates and concrete declarations.
- Figure out which concrete instances are actually needed.
- Produce a prepared program with concrete declarations ready for later passes.
- prepare_program in src/monomorphize.rs
- Monomorphizer::new and Monomorphizer::prepare in src/monomorphize.rs
- apply_substitutions, specialized_struct_name, and specialized_function_name in src/monomorphize.rs
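In recipe terms, turning a recipe into a dish takes two small operations: substituting concrete types for type parameters, and giving each finished instance its own name. The sketch below illustrates both. The `Type` enum, the function names, and the `__` naming scheme are invented for illustration and are not the scheme used by apply_substitutions or specialized_struct_name.

```rust
use std::collections::HashMap;

// Hypothetical, simplified type representation.
#[derive(Debug, Clone, PartialEq)]
enum Type {
    Int,
    Bool,
    // A type parameter such as `T`.
    Param(String),
    // A named type applied to zero or more type arguments.
    Named(String, Vec<Type>),
}

// Replace every bound type parameter with its concrete type argument.
// Unbound parameters are left untouched.
fn substitute(ty: &Type, bindings: &HashMap<String, Type>) -> Type {
    match ty {
        Type::Param(name) => bindings.get(name).cloned().unwrap_or_else(|| ty.clone()),
        Type::Named(name, args) => Type::Named(
            name.clone(),
            args.iter().map(|arg| substitute(arg, bindings)).collect(),
        ),
        other => other.clone(),
    }
}

// Build a distinct name for one concrete instance (invented scheme).
fn instance_name(base: &str, args: &[Type]) -> String {
    let mut name = base.to_string();
    for arg in args {
        name.push_str("__");
        name.push_str(&type_key(arg));
    }
    name
}

// Spell a type as a name fragment, recursing into nested arguments.
fn type_key(ty: &Type) -> String {
    match ty {
        Type::Int => "int".to_string(),
        Type::Bool => "bool".to_string(),
        Type::Param(name) => name.clone(),
        Type::Named(name, args) => instance_name(name, args),
    }
}
```

With `T` bound to `Type::Int`, substituting into a `Pair` of `T` and `Bool` gives a `Pair` of `Int` and `Bool`, and `instance_name("Pair", ...)` for those arguments gives `Pair__int__bool`. Later passes can then treat that instance as an ordinary non-generic declaration.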
Type Checking
The type checker is where the compiler shifts from "this parses" to "this is a legal Skunk program."
The public entry point is check. The most important recursive engine under it is resolve_type. It walks expressions, determines the type they produce, and validates whether the operations used are allowed.
One especially valuable helper in this file is resolve_access, because many language rules come together in access chains like self.x, ptr.*, slice[0], or window.draw_rect(...).
- The names used by the program exist.
- The operations on those names make sense.
- Assignments are legal.
- Returns match declared function types.
- Bounds and trait relationships are satisfied.
- check in src/type_checker.rs
- resolve_type, resolve_access, and is_assignable in src/type_checker.rs
- GlobalScope::add and SymbolTables in src/type_checker.rs
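The idea of a recursive engine that walks expressions and reports the type they produce fits in a few lines. The toy below is not the real resolve_type. Its expression and type enums, its error messages, and the `toy_resolve_type` and `check_return` names are all assumptions chosen to show the shape of the recursion.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
enum Type {
    Int,
    Bool,
}

#[derive(Debug, Clone)]
enum Expr {
    Int(i64),
    Bool(bool),
    Var(String),
    Add(Box<Expr>, Box<Expr>),
}

// Walk an expression and return the type it produces, or an error message.
fn toy_resolve_type(expr: &Expr, scope: &HashMap<String, Type>) -> Result<Type, String> {
    match expr {
        Expr::Int(_) => Ok(Type::Int),
        Expr::Bool(_) => Ok(Type::Bool),
        // The name must exist in scope.
        Expr::Var(name) => scope
            .get(name)
            .cloned()
            .ok_or_else(|| format!("unknown name `{}`", name)),
        // The operation must make sense for its operand types.
        Expr::Add(lhs, rhs) => {
            let left = toy_resolve_type(lhs, scope)?;
            let right = toy_resolve_type(rhs, scope)?;
            if left == Type::Int && right == Type::Int {
                Ok(Type::Int)
            } else {
                Err(format!("cannot add {:?} and {:?}", left, right))
            }
        }
    }
}

// A returned expression must match the declared function type.
fn check_return(body: &Expr, declared: &Type, scope: &HashMap<String, Type>) -> Result<(), String> {
    let actual = toy_resolve_type(body, scope)?;
    if &actual == declared {
        Ok(())
    } else {
        Err(format!("expected {:?}, found {:?}", declared, actual))
    }
}
```

With `x` bound to `Int`, the expression `x + 22` resolves to `Int`. Adding a boolean to an integer, or using an unbound name, produces an error instead of a type. The real checker covers far more cases, but each one follows the same pattern: resolve the parts, then validate the whole.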
LLVM, Layouts, And Runtime
The backend in src/compiler.rs is where language concepts become storage and instructions. Its own internal vocabulary is LlvmType.
This file also contains the layout structures that describe how values live in memory: StructLayout, EnumLayout, TraitLayout, and TraitMethodLayout.
compile_to_llvm_ir emits textual LLVM IR. Then compile_to_executable writes the IR to disk and invokes clang along with the runtime support files.
- Describe memory shape so the backend knows where fields and payloads live.
- Translate statements and expressions into LLVM instructions.
- Pull in support code from runtime/ when the compiled program needs it.
- LlvmType and llvm_type in src/compiler.rs
- collect_struct_layouts, collect_enum_layouts, and collect_trait_layouts in src/compiler.rs
- compile_statement, compile_expr_with_expected, compile_struct_literal, and coerce_expr in src/compiler.rs
- compile_to_llvm_ir and compile_to_executable in src/compiler.rs
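A backend type vocabulary is easier to picture with a small model. The sketch below maps source-level type names to a tiny enum and renders them in textual LLVM IR spelling. `IrType`, `lower_type`, and `struct_definition` are hypothetical names, not the real LlvmType or llvm_type. The mapping of int to i32 follows the layout example in this guide, and the bool case is an assumption.

```rust
// Hypothetical backend type vocabulary; the real LlvmType likely has more cases.
#[derive(Debug, Clone, PartialEq)]
enum IrType {
    I1,
    I32,
    Ptr,
    // A named struct type, referenced in IR as %Name.
    Struct(String),
}

impl IrType {
    // Render the type the way it is spelled in textual LLVM IR.
    fn render(&self) -> String {
        match self {
            IrType::I1 => "i1".to_string(),
            IrType::I32 => "i32".to_string(),
            IrType::Ptr => "ptr".to_string(),
            IrType::Struct(name) => format!("%{}", name),
        }
    }
}

// Map a source-level type name to a backend type.
// Anything that is not a known primitive is treated as a struct name.
fn lower_type(source_name: &str) -> IrType {
    match source_name {
        "int" => IrType::I32,
        "bool" => IrType::I1, // assumed mapping
        other => IrType::Struct(other.to_string()),
    }
}

// Emit a named struct type definition line from source-level field types.
fn struct_definition(name: &str, field_types: &[&str]) -> String {
    let fields: Vec<String> = field_types
        .iter()
        .map(|t| lower_type(t).render())
        .collect();
    format!("%{} = type {{ {} }}", name, fields.join(", "))
}
```

For a struct named Point with two int fields, `struct_definition` produces `%Point = type { i32, i32 }`, which is how a named struct type is declared in LLVM IR.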
Worked Example: One Tiny Program Through The Compiler
The best way to make the pipeline feel real is to trace one small program through it. Here is the example used in Part 2 of the notebook:
    struct Point {
        x: int;
        y: int;
    }

    attach Point {
        function sum(self): int {
            return self.x + self.y;
        }
    }

    function main(): int {
        p: Point = Point { x: 20, y: 22 };
        return p.sum();
    }
Step 1: Parse It
The parser recognizes a struct declaration, an attach declaration, and a main function. The method body becomes a nested expression tree rather than a flat string.
Step 2: Load It
Because this example has no imports, load_program has little visible work to do. But it still wraps the result as one coherent program node.
Step 3: Prepare It
Because this example is non-generic, monomorphization mostly passes it through. That is a useful lesson in itself: not every pass dramatically changes every program.
Step 4: Type-Check It
The checker proves that Point exists, the fields are legal, the struct literal initializes valid fields with assignable types, and p.sum() returns an int.
Step 5: Build Layouts
    StructLayout("Point")
        field 0 -> x : i32
        field 1 -> y : i32
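The layout above is essentially a lookup table from field name to position and machine type. A minimal sketch of that idea follows. `FieldSlot`, `StructLayoutSketch`, and `build_layout` are hypothetical names, and the real StructLayout in src/compiler.rs may store more than this.

```rust
// Hypothetical layout record: one entry per field, in declaration order.
#[derive(Debug, Clone, PartialEq)]
struct FieldSlot {
    name: String,
    index: usize,
    ir_type: String,
}

#[derive(Debug, Clone, PartialEq)]
struct StructLayoutSketch {
    name: String,
    fields: Vec<FieldSlot>,
}

// Assign each field its position and a backend type spelling.
// Only `int` is treated as a primitive here; other names are assumed
// to refer to structs.
fn build_layout(name: &str, fields: &[(&str, &str)]) -> StructLayoutSketch {
    let fields = fields
        .iter()
        .enumerate()
        .map(|(index, (field, ty))| FieldSlot {
            name: field.to_string(),
            index,
            ir_type: match *ty {
                "int" => "i32".to_string(),
                other => format!("%{}", other),
            },
        })
        .collect();
    StructLayoutSketch { name: name.to_string(), fields }
}

impl StructLayoutSketch {
    // The backend needs the numeric position to address a field.
    fn field_index(&self, field: &str) -> Option<usize> {
        self.fields
            .iter()
            .find(|slot| slot.name == field)
            .map(|slot| slot.index)
    }
}
```

For Point, `field_index("x")` is `Some(0)` and `field_index("y")` is `Some(1)`, matching the table above. An unknown field gives `None`, although the type checker should have rejected such a program before the backend ever asks.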
Step 6: Emit LLVM IR
The backend lowers the struct literal, method body, and return path into LLVM IR. You do not need to master LLVM syntax to understand the shape: build a value, access its fields, add them, and return the result.
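To give that shape something concrete to look at, the sketch below generates plausible textual LLVM IR for a method that adds two i32 fields. The IR instructions used (getelementptr, load, add, ret) are standard LLVM. The symbol name `Point_sum`, passing `self` by pointer, and the `emit_field_sum` helper are assumptions, so the IR that Skunk actually emits may look different.

```rust
// Emit textual LLVM IR for a method that adds two i32 fields of a struct.
// Symbol naming and the by-pointer `self` parameter are illustrative guesses.
fn emit_field_sum(struct_name: &str, method: &str, left_index: u32, right_index: u32) -> String {
    let mut ir = String::new();
    ir.push_str(&format!("define i32 @{}_{}(ptr %self) {{\n", struct_name, method));
    ir.push_str("entry:\n");
    for (label, index) in [("lhs", left_index), ("rhs", right_index)] {
        // Compute the address of the field from its layout index.
        ir.push_str(&format!(
            "  %{l}.addr = getelementptr inbounds %{s}, ptr %self, i32 0, i32 {i}\n",
            l = label,
            s = struct_name,
            i = index
        ));
        // Load the field value from that address.
        ir.push_str(&format!("  %{l} = load i32, ptr %{l}.addr\n", l = label));
    }
    ir.push_str("  %sum = add i32 %lhs, %rhs\n");
    ir.push_str("  ret i32 %sum\n}\n");
    ir
}
```

Calling `emit_field_sum("Point", "sum", 0, 1)` yields a function that addresses field 0 and field 1, loads both, adds them, and returns the result. The two index arguments are exactly what the struct layout provides.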
Step 7: Link The Binary
Finally the compiler writes a .ll file and asks clang to produce a native executable, linking runtime support as needed.
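The final step amounts to assembling a command line. The sketch below builds, but does not run, a clang invocation with Rust's standard `std::process::Command`. The `clang_command` helper and its argument order are assumptions. Whether the real compile_to_executable passes runtime files as sources or prebuilt objects, and which extra flags it adds, is not stated in this guide.

```rust
use std::path::Path;
use std::process::Command;

// Build (but do not run) a clang invocation that compiles an LLVM IR file
// together with runtime support sources into one executable.
fn clang_command(ir_path: &Path, runtime_sources: &[&Path], output: &Path) -> Command {
    let mut cmd = Command::new("clang");
    // clang accepts textual IR (.ll) files as ordinary inputs.
    cmd.arg(ir_path);
    cmd.args(runtime_sources);
    cmd.arg("-o").arg(output);
    cmd
}
```

A caller would then invoke `.status()` on the returned command and treat a non-zero exit code as a build failure. Keeping construction separate from execution makes the argument list easy to inspect in tests without needing clang installed.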
- create_struct_init and create_access in src/ast.rs
- resolve_access and resolve_type in src/type_checker.rs
- collect_struct_layouts, compile_struct_literal, and compile_expr_with_expected in src/compiler.rs
Extending Skunk
If Parts 1 and 2 teach you how to read the compiler, Part 3 teaches you how to change it. The key idea is to stop thinking of a feature as one edit and start thinking of it as a path through the pipeline.
- Grammar, AST construction, and maybe tests are often enough for small syntax sugar features.
- Type checking becomes central when the feature changes meaning, validity rules, or inferred types.
- Backend lowering and native runtime support matter when the feature requires execution-time behavior.
- Start with one tiny example program.
- Decide whether the feature is syntax sugar or a new semantic kind of thing.
- Touch only the stages that actually need to know about it.
- Add parser, type-checker, and compiler/runtime tests as needed.
- Update docs and examples so the feature is teachable, not just implemented.
- src/grammar.pest and src/ast.rs for syntax
- src/type_checker.rs for meaning and rules
- src/compiler.rs and runtime/ for execution behavior
- Open Part 3 for the full extending guide
Recommended Reading Order
Read the compiler in this order if you want the architecture before the details:
- src/main.rs
- src/source.rs
- src/ast.rs
- src/type_checker.rs
- src/compiler.rs
Then go deeper with:
- src/grammar.pest
- src/monomorphize.rs
- src/interpreter.rs
- runtime/skunk_runtime.c
- runtime/skunk_window_runtime.m
How To Contribute Without Getting Lost
Do not try to understand every file before changing anything. Pick one feature, identify which stage first sees it, and trace only the stages that need to know about it.
A good beginner rhythm is:
- Start with one tiny example program.
- Find its syntax in the grammar and AST.
- See how the type checker validates it.
- See how the backend lowers it.
- Add or update a focused test.
The markdown versions of this guide are here: Part 1, Part 2, and Part 3.