RFC 006: Python-style generators¶
- Status: In Progress
- Created: 2024-12-10
- Author(s): Danny Meijer (@dannymeijer)
- Related: RFC 016 (loop and break value), RFC 019 (runner testing), RFC 068 (protocol hooks for core language syntax)
- Issue: https://github.com/dannys-code-corner/incan/issues/324
- Follow-up: RFC 088 (iterator adapter surface), tracked by https://github.com/dannys-code-corner/incan/issues/127
- RFC PR: —
- Written against: v0.1
- Shipped in: —
Summary¶
This RFC introduces Python-style generators to Incan in two connected forms: generator functions that use yield inside def, and generator expressions that produce lazy generator values inline. Both forms describe the same underlying language model: a resumable producer of T exposed as Generator[T].
Motivation¶
Incan currently has eager collections and iterator-shaped loops, but it does not have a first-class way to express lazy, stateful iteration in the language surface itself. That hurts several common cases:
- large or unbounded sequences that should not be materialized eagerly;
- streaming transformations that would otherwise allocate intermediate lists;
- recursive traversals where the natural shape is "produce one value, suspend, resume";
- portability for authors coming from Python, where
yieldand generator expressions are familiar ways to express lazy iteration.
The feature also fits Incan's backend well. Incan does not need a separate user-facing coroutine model just to support this control-flow shape; the compiler can lower generators to ordinary backend state-machine machinery.
Goals¶
- Add a first-class generator model based on
yieldandGenerator[T]. - Support both generator functions and generator expressions as part of that model.
- Make lazy iteration explicit in type signatures instead of inferring it from incidental usage.
- Preserve ordinary
for-loop ergonomics over generator results. - Standardize the minimum helper surface needed to transform and realize generator values ergonomically:
.map(),.filter(),.take(), and.collect(). - Distinguish generator
yieldclearly from fixtureyield.
Non-Goals¶
- Async generators in this RFC.
- Bidirectional coroutine features such as Python-style
send(). - A wholesale redesign of comprehensions or iterator adapters beyond generator support itself.
- Standardizing the broader
Iterator[T]adapter surface in the initial draft. Methods such as.flat_map(),.skip(),.chain(),.enumerate(),.zip(),.take_while(),.skip_while(),.count(),.any(),.all(),.find(),.fold(), and.batch()belong in RFC 088.
Guide-level explanation (how users think about it)¶
Generator functions¶
def count_up(start: int, end: int) -> Generator[int]:
mut i = start
while i < end:
yield i
i += 1
for n in count_up(0, 1_000_000):
if n > 100:
break
println(n)
The function above does not build a list. Each yield produces one element for the surrounding iteration and then suspends the function until the next value is requested.
Generator expressions¶
squares = (x * x for x in range(10))
for sq in squares:
println(sq)
Generator expressions are the expression form of the same idea: they produce a lazy Generator[T] instead of an eager list.
Infinite generators¶
def fibonacci() -> Generator[int]:
mut a, b = 0, 1
while true:
yield a
a, b = b, a + b
fibs = fibonacci().take(10).collect()
This is the core value proposition: the generator can describe an unbounded stream, while the consumer decides how much to realize.
Lazy transformations¶
def square(n: int) -> int:
return n * n
def is_even(n: int) -> bool:
return n % 2 == 0
evens_squared = count_up(0, 1_000_000)
.filter(is_even)
.map(square)
.take(10)
.collect()
Generator helper methods are lazy until a terminal operation consumes them. In the example above, .filter(), .map(), and .take() build a generator pipeline; .collect() realizes only the first ten transformed values as a list[int].
Recursive traversal¶
def walk_tree(node: Node) -> Generator[Node]:
yield node
for child in node.children:
for descendant in walk_tree(child):
yield descendant
Generators are useful when the control flow is naturally incremental instead of collection-oriented.
Reference-level explanation (precise rules)¶
Generator functions reference¶
- A function is a generator function when its body contains
yieldand its declared return type isGenerator[T]. yield exprproduces one element of typeTfor the surrounding generator.yieldmust not appear in ordinary functions, except where another RFC explicitly givesyieldspecial meaning for a distinct construct such as fixtures.- A generator function may use
returnwithout a value to terminate iteration early. return valueinside a generator is a compile-time error. Generator return values are reserved for a future delegation or coroutine-oriented RFC.
Generator expressions reference¶
- A generator expression has the form
(expr for binding in iterable [if condition] ...)and yields aGenerator[T], whereTis the type ofexpr. - Generator expressions support the same comprehension-clause shape as list comprehensions: one leading
forclause followed by zero or more nestedforclauses and zero or more trailingiffilters. - The iterable source is consumed lazily as the resulting generator is advanced.
- A generator expression is semantically equivalent to an anonymous generator that iterates the source and yields
exprfor each bound element. - Generator expressions are lazy; the list-comprehension surface remains the eager collection form.
Typing¶
- Every yielded expression must type-check against the element type
TinGenerator[T]. - Declaring
Generator[T]without any reachableyieldis a compile-time error. - Using
yieldwithout aGenerator[T]return type is a compile-time error unless another construct has already claimedyieldsemantics for that context.
Consumption¶
Generator[T]is the semantic type for values produced by generator functions and generator expressions.Generator[T]must satisfy the static iteration protocol from RFC 068: it may be consumed anywhere anIterable[T]/Iterator[T]value is accepted, andforloops must accept generator values anywhere they accept iterable values.- Generator values must expose the following minimum helper surface:
def map[U](self, f: (T) -> U) -> Generator[U]
def filter(self, f: (T) -> bool) -> Generator[T]
def take(self, n: int) -> Generator[T]
def collect(self) -> list[T]
.map(),.filter(), and.take()are lazy: they return new generator values and must not materialize intermediate lists..collect()is terminal and eager: it consumes the generator and returns alist[T].- Exhausting a generator ends iteration normally.
Design details¶
One generator model, two surfaces¶
Generator functions and generator expressions are not separate features stitched together for convenience. They are two surfaces over the same language model:
- generator functions are statement-oriented and better for named, reusable, stateful producers;
- generator expressions are expression-oriented and better for inline lazy transforms.
This RFC treats both as first-class parts of Python-style generator support rather than as rollout stages.
Distinction from fixtures¶
The language already uses yield in fixture-oriented testing flows. That overlap is tolerable only if the surrounding declaration makes the meaning unambiguous:
- fixture declarations keep fixture lifecycle semantics;
- ordinary functions returning
Generator[T]use lazy iteration semantics.
This RFC therefore treats the declaration context, not the token alone, as the source of truth for yield meaning.
No separate lint or style guidance is required. Fixture yield and generator yield share the same surface mental model: produce a value, suspend, and later resume. The compiler only needs to make invalid context errors precise.
Lowering model¶
The intended implementation strategy is to lower generator functions and generator expressions through a compiler-owned state-machine transformation or equivalent backend support. That lowering choice is not the language definition; the language contract is only that generators behave as lazy, resumable producers of T.
Generator helper methods should lower to backend-native lazy iterator machinery where the target supports it. For the Rust backend, .map(), .filter(), .take(), and .collect() should preserve Rust iterator-chain behavior rather than inserting eager intermediate collections.
Interaction with existing features¶
forloops consume generators the same way they consume other iterable sources.- Recursive generators are valid as long as the yielded element type remains consistent.
- Generator expressions are the lazy counterpart to eager list-comprehension syntax rather than a separate collection feature.
- The broader
Iterator[T]adapter API is a follow-up surface, not part of this RFC. RFC 088 owns that design and issue #127 tracks it.
Compatibility / migration¶
The feature is additive. Existing functions, loops, and comprehensions keep their meaning.
Alternatives considered¶
- Explicit
genkeyword -
Clear, but more backend-shaped than Incan needs. Requiring
Generator[T]plusyieldalready communicates intent. -
Dedicated
generatordeclaration form -
Avoids overloading ordinary
def, but splits the function surface for a feature that is still "a function producing values over time." -
Functions only, expressions later
- Not actually more principled. It would make the RFC weaker while still aiming at the same north-star generator model.
Drawbacks¶
yieldnow carries two meanings in the language, so diagnostics must be explicit.- Generators introduce suspension semantics that users must learn alongside ordinary function control flow.
- Generator expressions add grammar and precedence surface that the language and tooling must handle carefully.
Layers affected¶
- Language surface:
yieldmust be valid in generator function bodies, and generator-expression syntax must be recognized. - Type system: yielded expressions must match
Generator[T], and generator declarations must remain internally consistent. - Execution model: implementations must preserve suspension points and lazy iteration semantics for both named and anonymous generator forms.
- Stdlib / surface vocabulary: the language must define the
Generatortype, its RFC 068 iteration-protocol conformance, and the minimum stable helper methods promised publicly. - Formatter / tooling: multi-line generators should format predictably, and diagnostics should explain generator-specific behavior clearly.
Implementation Plan¶
Phase 1: Spec, parser, and AST¶
- Keep RFC 006 aligned with the settled design decisions: full generator-expression clause support, no
return valuein generators, and no extra fixture/generator lint policy. - Extend parser and AST support for generator expressions using the existing comprehension clause surface.
- Preserve existing fixture
yieldparsing while separating generator-context validation into later semantic stages.
Phase 2: Typechecker and diagnostics¶
- Add
Generator[T]as a checked semantic type that satisfies the RFC 068 iteration protocol. - Validate generator function bodies: yielded expressions must match
Generator[T], ordinary functions must rejectyield, and generator functions must rejectreturn value. - Type-check generator expressions lazily while preserving the same binding and filter semantics as list comprehensions.
- Type-check the minimum generator helper surface:
.map(),.filter(),.take(), and.collect().
Phase 3: Lowering, emission, and runtime surface¶
- Lower generator functions and generator expressions to lazy backend state-machine or iterator-equivalent behavior.
- Emit Rust code that preserves suspension, ordering, and laziness for generator values.
- Expose the minimum generator helper surface through stdlib-visible declarations and runtime support as needed.
- Ensure generator values work in
forloops anywhere the RFC 068 iteration protocol is accepted.
Phase 4: Tooling, docs, and release readiness¶
- Format generator functions and multi-clause generator expressions predictably.
- Add parser, typechecker, codegen snapshot, integration, and diagnostic tests for the full RFC surface.
- Update authored user-facing docs, release notes, and the active development version when implementation lands.
Progress Checklist¶
Spec / design¶
- Decide that generator expressions support the full comprehension-clause surface.
- Decide that
return valuein generators is rejected and reserved for future delegation or coroutine work. - Decide that fixture and generator
yieldneed precise diagnostics but no extra lint/style guidance.
Parser / AST / formatter¶
- Parser: parse generator expressions with the full comprehension-clause surface.
- AST: represent generator expressions without confusing them with eager list comprehensions.
- Parser: preserve existing fixture
yieldparsing behavior. - Formatter: round-trip generator functions and multi-clause generator expressions stably.
Typechecker / diagnostics¶
- Type system: represent
Generator[T]as the checked generator semantic type. - Protocols: make
Generator[T]satisfy the RFC 068 iteration protocol. - Diagnostics: reject
yieldoutside generator functions and fixture contexts. - Diagnostics: reject generator functions whose yielded values do not match
Generator[T]. - Diagnostics: reject
return valueinside generator functions. - Typechecker: validate generator expressions with nested
forclauses and trailingiffilters. - Typechecker: validate
.map(),.filter(),.take(), and.collect()on generator values.
Lowering / emission / runtime¶
- Lower generator functions to lazy resumable producer behavior.
- Lower generator expressions to lazy generator values.
- Emit Rust that preserves generator suspension, ordering, and laziness.
- Runtime/stdlib: expose the
Generator[T]surface and minimum helper methods. - Runtime/stdlib: ensure
.map(),.filter(), and.take()stay lazy and.collect()is terminal.
Tests¶
- Parser tests for generator function
yieldand generator-expression clauses. - Typechecker tests for valid and invalid generator function return/yield combinations.
- Typechecker tests for invalid
return valuein generator functions. - Typechecker tests for generator helper callback arity and return-type diagnostics.
- Codegen snapshot tests for generator functions, generator expressions, and helper chains.
- Integration tests for
forloops over generators and.map().filter().take().collect()chains. - Regression tests that fixture
yieldbehavior still works.
Docs / release¶
- Update generator language reference or tutorial docs.
- Update release notes for the active
0.3development line. - Bump the active
0.3.0-dev.Nversion.
Design decisions¶
Generator[T]is the user-facing semantic type foryield-produced values. It should satisfy the iteration protocol rather than being collapsed into a bareIterator[T]return type.- The first stable generator helper surface is intentionally small:
.map(),.filter(),.take(), and.collect(). - The broader iterator adapter surface belongs in RFC 088, tracked by issue #127.
- Generator expressions support the full comprehension-clause surface rather than only a single
forclause. returnwithout a value may terminate a generator early, butreturn valueis rejected in RFC 006 and reserved for future generator delegation or coroutine work.- Fixture
yieldand generatoryielddo not need extra lint or style guidance beyond precise context-sensitive diagnostics.