RFC 009: Sized Integer Types & Builtin Type Registry¶
Status: Planned
Created: 2024-12-11
Updated: 2024-12-23
Summary¶
Add explicit sized integer and float types to Incan, enabling precise control over numeric representation for FFI, binary protocols, and memory-sensitive applications.
This RFC also introduces a Builtin Type Registry infrastructure (Phase 0) that allows incan_core::lang
registries to be the source of truth for builtin type methods, eliminating hardcoded method lookups scattered across
the compiler.
Motivation¶
Currently, Incan has only two numeric types:
int→i64(64-bit signed)float→f64(64-bit float)
This simplicity is great for general-purpose code, but insufficient for:
- FFI / Rust Interop — Calling Rust functions that expect specific sizes
- Binary Protocols — Parsing network packets, file formats (PNG, MP3, etc.)
- Memory Efficiency — Large arrays where
i8vsi64matters (8x difference) - Hardware Interfaces — GPIO pins, registers, embedded systems
- Cryptography — Exact bit-width operations
Example Problem¶
# Current: Can't express this cleanly
from rust::std::net import TcpStream
# TcpStream::connect expects &str, but port is u16 internally
# How do we work with u16 port numbers?
Design¶
New Types¶
| Incan Type | Rust Type | Description |
|---|---|---|
i8 |
i8 |
8-bit signed (-128 to 127) |
i16 |
i16 |
16-bit signed |
i32 |
i32 |
32-bit signed |
i64 |
i64 |
64-bit signed (same as int) |
i128 |
i128 |
128-bit signed |
u8 |
u8 |
8-bit unsigned (0 to 255) |
u16 |
u16 |
16-bit unsigned |
u32 |
u32 |
32-bit unsigned |
u64 |
u64 |
64-bit unsigned |
u128 |
u128 |
128-bit unsigned |
f32 |
f32 |
32-bit float |
f64 |
f64 |
64-bit float (same as float) |
isize |
isize |
Pointer-sized signed |
usize |
usize |
Pointer-sized unsigned |
Type Aliases (Preserved)¶
The existing int and float remain as aliases:
# These are equivalent
x: int # Preferred for general code
x: i64 # Explicit when size matters
y: float # Preferred for general code
y: f64 # Explicit when size matters
Literals¶
Without Suffix (Default)¶
Unsuffixed literals infer type from context, defaulting to int/float:
let x = 42 # int (i64)
let y = 3.14 # float (f64)
let z: i32 = 42 # i32 (from annotation)
With Suffix (Explicit)¶
let a = 42i8 # i8
let b = 42u16 # u16
let c = 42i32 # i32
let d = 42u64 # u64
let e = 3.14f32 # f32
let f = 255u8 # u8
Conversions¶
Explicit Casting Required¶
No implicit narrowing — all conversions are explicit:
let big: i64 = 1000
let small: i16 = big as i16 # Explicit cast (may truncate)
let safe: i16 = i16.try_from(big)? # Safe conversion (returns Result)
Conversion Functions¶
# Fallible conversions (return Result)
let x: Result[i16, ConversionError] = i16.try_from(big_value)
# Infallible widening (always safe)
let wide: i64 = i64.from(small_value) # i16 -> i64 always works
# Truncating cast (use with caution)
let truncated: i8 = big_value as i8 # May lose data
Arithmetic¶
Same-type operations return the same type:
let a: i32 = 10
let b: i32 = 20
let c: i32 = a + b # i32
# Mixed types require explicit conversion
let x: i32 = 10
let y: i64 = 20
let z = x as i64 + y # Must convert first
Overflow Behavior¶
Follow Rust's behavior:
- Debug builds: Panic on overflow
- Release builds: Wrap (two's complement)
Explicit wrapping/saturating operations available:
let x: u8 = 255
let y = x.wrapping_add(1) # 0 (wrapped)
let z = x.saturating_add(1) # 255 (saturated)
let w = x.checked_add(1) # None (overflow detected)
Examples¶
Binary Protocol Parsing¶
def parse_header(data: bytes) -> Result[Header, ParseError]:
if len(data) < 8:
return Err(ParseError.TooShort)
let magic: u32 = u32.from_le_bytes(data[0..4])
let version: u16 = u16.from_le_bytes(data[4..6])
let flags: u8 = data[6]
let reserved: u8 = data[7]
return Ok(Header(magic, version, flags))
Network Port Handling¶
def connect(host: str, port: u16) -> Result[Connection, IoError]:
let addr = f"{host}:{port}"
return Connection.open(addr)
# Usage
let conn = connect("localhost", 8080u16)?
Memory-Efficient Arrays¶
# Image pixel data - 3 bytes per pixel instead of 24
model Pixel:
r: u8
g: u8
b: u8
let image: List[Pixel] = load_image("photo.png")
Bit Manipulation¶
def set_bit(value: u32, bit: u8) -> u32:
return value | (1u32 << bit)
def clear_bit(value: u32, bit: u8) -> u32:
return value & ~(1u32 << bit)
def test_bit(value: u32, bit: u8) -> bool:
return (value & (1u32 << bit)) != 0
Implementation Plan¶
Phase 0: Builtin Type Registry¶
Before adding new types, establish registry-first infrastructure so the compiler has a single source of truth for:
- builtin type spellings (e.g.
i32,u8) - builtin/surface method vocabulary (e.g.
str.upper(),List.append()) - metadata like “introduced in RFC”, stability, and docs
Current crate layout note: in the current workspace, canonical language vocabulary lives in:
crates/incan_core/src/lang/types/*for builtin type namescrates/incan_core/src/lang/surface/*for method/function/type “surface” vocabularycrates/incan_syntaxfor lexer/token/parsing (syntax), consuming registries where appropriate
So Phase 0 should be implemented by extending these registries and migrating compiler call sites to consult them,
rather than parsing incan_stdlib source code.
Problem¶
Currently, builtin type methods (e.g., str.upper(), List.append(), FrozenStr.len()) are hardcoded in multiple places:
src/frontend/typechecker/check_expr/access.rs— method return typessrc/backend/ir/emit/mod.rs— code generationsrc/backend/ir/emit/expressions/methods.rs— method emission
This creates triple maintenance burden and increases drift risk between docs/spec/compiler behavior.
Solution¶
-
Define method vocab in
incan_core(source of truth):crates/incan_core/src/lang/surface/methods.rs(surface method vocab registries; re-exported asincan_core::lang::surface::{string_methods, list_methods, dict_methods, ...}for stable import paths)- For sized integer methods, add a new registry section within
methods.rs(or a sibling module re-exported fromcrates/incan_core/src/lang/surface/mod.rsif it grows too large)
-
Compiler consumes registries instead of string matches:
- Typechecker consults the registries to determine allowed methods and return types.
- Backend/codegen consults the same registries (or enum/ID mapping derived from them) to emit method calls.
-
Guardrails
- Add tests that ensure registry uniqueness and prevent drift with lexer/parser expectations (parity checks).
Benefits¶
- Single source of truth:
incan_coredefines the language surface vocabulary. - No hardcoding: typechecker and emitter read from registries.
- Extensibility: adding a method is a single edit in
incan_core::lang::surface. - Documentation: reference docs can be generated from registries.
Scope¶
Phase 0 covers:
- Registry infrastructure (IDs + metadata tables + docs generation + guardrails)
- Migration of existing hardcoded types:
str,bytesList,Dict,SetFrozenStr,FrozenBytesFrozenList,FrozenSet,FrozenDict
- Foundation for sized integer methods in Phase 5
Phase 1: Lexer¶
Add token recognition for:
- Type names:
i8,i16,i32,i64,i128,u8,u16,u32,u64,u128,f32,isize,usize - Literal suffixes:
42i32,255u8,3.14f32
Phase 2: Parser¶
- Parse sized type annotations
- Parse suffixed numeric literals
Phase 3: Type Checker¶
- Add sized integer types to type system
- Enforce explicit conversions (no implicit narrowing)
- Type inference for unsuffixed literals
- Use builtin registry (from Phase 0) for method lookups
Phase 4: Codegen¶
- Map types directly to Rust equivalents
- Emit proper literal suffixes
- Generate conversion code
- Use builtin registry for method emission
Phase 5: Standard Library¶
- Add conversion methods (
try_from,from,as) to registry - Add byte conversion methods (
from_le_bytes,to_be_bytes, etc.) to registry - Add overflow-handling methods (
wrapping_add,saturating_sub, etc.) to registry - Implement corresponding Rust methods in
incan_stdlib
Alternatives Considered¶
1. Only Add When Needed via Rust Interop¶
from rust::std::primitive import i16, u8
Rejected: Awkward syntax, doesn't integrate well with literals.
2. Python-Style Arbitrary Precision¶
Python's int is arbitrary precision. We could do the same.
Rejected: Doesn't help with FFI, binary protocols, or memory efficiency. Also has performance overhead.
3. Wrapper Types Only¶
newtype Port = u16 # Define in stdlib
Rejected: Still need the underlying sized types.
4. C-Style Type Names¶
let x: short = 10 # i16
let y: unsigned int = 20 # u32
Rejected: Verbose, platform-dependent sizes in C, Rust-style is clearer.
Open Questions¶
- Literal inference: Should
let x: u8 = 256be a compile error or runtime panic? -
Proposal: Compile error for out-of-range literals
-
Default integer type: Keep
intasi64or make it platform-dependent likeisize? -
Proposal: Keep as
i64for predictability -
Char type: Add
charas a separate type (Rust's 4-byte Unicode scalar)? -
Proposal: Defer to separate RFC
-
SIMD types: Should we include
i8x16,f32x4etc? -
Proposal: Defer to separate RFC for SIMD
-
List/array indexing: How should
intwork with indexing? - Problem:
arr[i]wherei: intfails in Rust (needsusize) - Option A: Auto-coerce
int→usizein indexing context (ergonomic, implicit) - Option B: Require explicit cast
arr[i as usize](explicit, verbose) - Option C: Make
range()returnusize(natural for loops, but breaks if used elsewhere) - Proposal: Option A — compiler inserts
as usizefor indexing; matches Python's ergonomics
Checklist¶
Phase 0: Builtin Type Registry (checklist)¶
- [ ] Extend
incan_core::lang::typeswith sized type vocabulary (i8,u16,f32,usize, ...) - [ ] Extend
incan_core::lang::surfacewith sized integer method vocabulary (new registry module) - [ ] Typechecker: consult
incan_core::lang::surfaceregistries for method typing - [ ] Backend: consult
incan_core::lang::surfaceregistries (or enum mapping) for method emission - [ ] Migrate
strmethods from hardcoded to registry - [ ] Migrate
bytesmethods from hardcoded to registry - [ ] Migrate
Listmethods from hardcoded to registry - [ ] Migrate
Dictmethods from hardcoded to registry - [ ] Migrate
Setmethods from hardcoded to registry - [ ] Migrate
FrozenStrmethods from hardcoded to registry - [ ] Migrate
FrozenBytesmethods from hardcoded to registry - [ ] Migrate
FrozenList/FrozenSet/FrozenDictmethods from hardcoded to registry - [ ] Update emitter to use registry for method emission
- [ ] Remove hardcoded method matches from
check_expr/access.rs - [ ] Remove hardcoded method matches from
emit/mod.rs
Phase 1-4: Sized Types¶
- [ ] Lexer: recognize sized type names
- [ ] Lexer: parse literal suffixes (
42i32,3.14f32) - [ ] Parser: sized type annotations
- [ ] Type checker: sized integer types in type system
- [ ] Type checker: enforce explicit conversions
- [ ] Type checker: literal range checking
- [ ] Codegen: emit proper Rust types
- [ ] Codegen: emit literal suffixes
- [ ] Codegen: auto-coerce
int→usizefor list indexing
Phase 5: Standard Library Methods¶
- [ ] Stdlib: add sized type interface traits (
I8Methods,U16Methods, etc.) - [ ] Stdlib: conversion methods (
try_from,from) - [ ] Stdlib: byte conversion methods (
from_le_bytes,to_be_bytes, etc.) - [ ] Stdlib: overflow-handling methods (
wrapping_add,saturating_sub, etc.)
Documentation¶
- [ ] Documentation update
- [ ] Examples
References¶
- Rust Primitive Types
- Rust Integer Overflow
- Python struct module (for comparison)