RFC 024: Extensible Derive Protocol¶
- Status: Planned
- Author(s): Danny Meijer (@dannymeijer)
- Issue: #148
- RFC PR: —
- Created: 2026-02-17
- Related:
Summary¶
This RFC proposes an extensible derive protocol that lets modules declare themselves as derivable. A derivable module
exposes a __derives__ list that declares which of the module's traits are adoptable via @derive(). When a type
derives the module, those traits — and their methods — are adopted onto the type. This replaces the current closed
DeriveId registry for format-related derives with a trait-based, module-driven mechanism — enabling user-defined
serialization formats, schema generators, and behavioral adapters without compiler changes.
Motivation¶
The derive system is closed¶
Today, @derive(Serialize, Deserialize) is backed by a hardcoded DeriveId enum in
crates/incan_core/src/lang/derives.rs. The method injection (to_json, from_json) is wired into the typechecker via
inject_json_methods(). Adding a new serialization format requires changes across multiple compiler stages — there is no
user-facing mechanism to create custom derivable traits.
Serialization isn't just JSON¶
A natural extension of Incan's model system is serving multiple wire formats from one definition: JSON, YAML, Protobuf, Avro, Arrow, and more. Each format needs its own serialization/deserialization methods, and users need the ability to pick exactly which formats a model supports. For example:
from std.serde import json, yaml
from std.schema import protobuf
@derive(json, yaml, protobuf)
model CustomerEvent:
    customer_id: str
    email: str
    event_type: str
    amount: int
    timestamp: datetime
This model gets .to_json(), .from_json(), .to_yaml(), .from_yaml(), .proto_schema() — all statically verified,
all type-safe.
Users need custom derives¶
Data engineering workflows (steps, pipelines, Readers/Writers) often use internal formats. Teams need to create their own derivable modules — an internal binary codec, a company-specific schema format, a custom wire protocol — without forking the compiler or waiting for stdlib additions.
Injected methods need trait bounds¶
The current inject_json_methods() approach makes .to_json() appear on types that derive Serialize, but there's no
trait backing it. This means generic functions cannot express "T must be JSON-serializable":
# Impossible today: no trait to bind against
def export[T](data: T) -> str:
    return data.to_json()  # Compiler: "T has no method to_json"
With trait-based derives, this becomes expressible:
def export[T with json.Serialize](data: T) -> str:
    return data.to_json()  # Verified: json.Serialize guarantees .to_json()
Non-Goals¶
- Implementing specific format libraries. This RFC uses YAML, Protobuf, Avro, SQL DDL, and others as illustrative
examples of what the protocol enables. It does not propose adding those libraries to the stdlib. Each format would
be introduced by its own RFC or feature issue (e.g., RFC 012 for JsonValue).
- Migrating built-in derives (Eq, Clone, Debug, etc.) to the __derives__ protocol. These remain compiler intrinsics
handled by the DeriveId registry. See Interaction with existing features for details.
- Runtime reflection of field values. The protocol relies on existing __fields__() metadata reflection for schema
generators. Dynamic field value access (needed to express __eq__ or __repr__ in pure Incan) is out of scope.
Guide-level explanation (how users think about it)¶
Deriving a format¶
Users import a format module and derive it:
from std.serde import json
@derive(json)
model Config:
    host: str
    port: int
    debug: bool
Config now has .to_json() and .from_json() methods. The user can see exactly where they come from — the json
module defines the traits. See Phase 4 of this RFC for the std.serde.json module migration.
Deriving multiple formats¶
from std.serde import json, yaml
@derive(json, yaml)
model Config:
    host: str
    port: int
config = Config(host="localhost", port=8080)
json_str = config.to_json()
yaml_str = config.to_yaml()
Both format modules define their own Serialize trait, each carrying @rust.derive("serde::Serialize"). The compiler
deduplicates to a single Rust-level derive, while each module injects its own distinct methods (.to_json() vs
.to_yaml()).
Partial derives (serialize-only, deserialize-only)¶
from std.serde.json import Serialize as json_write
@derive(json_write)
model LogEntry:
    message: str
    level: str
    timestamp: datetime
# LogEntry has .to_json() but NOT .from_json()
Schema generation (pure Incan, no Rust needed)¶
Not all derives involve serialization. Schema generators produce text artifacts from a model's field metadata using
__fields__() reflection — no rust:: imports required:
from std.schema import sql
@derive(sql)
model Users:
    id: int
    name: str
    email: str
print(Users.sql_ddl())
# CREATE TABLE Users (
#     id INTEGER,
#     name TEXT,
#     email TEXT
# );
The (hypothetical) sql module in this example defines a SqlSchema trait whose sql_ddl() method is implemented
entirely in Incan by iterating over __fields__(). The same pattern works for OpenAPI, GraphQL type definitions, or any
text-based schema format.
Behavioral derives¶
Derives aren't limited to formats. A derivable module can attach any behavior to a model:
from std.schema import sql
from my_company.observability import auditable
@derive(sql, auditable)
model Account:
    id: int
    owner: str
    balance: int
Here auditable might define an Auditable trait that provides a .diff(other) method, a .changelog() method, or
field-level change tracking — whatever the module's traits declare. The mechanism is the same regardless of whether the
derive produces bytes, text, or behavior.
Using trait bounds in generic functions¶
Because derives are backed by traits, they compose with the with bound syntax (RFC 023):
from std.serde import json, yaml
def export[T with (json.Serialize, yaml.Serialize)](
    data: T,
    format: str,
) -> str:
    if format == "json":
        return data.to_json()
    return data.to_yaml()
Writing a custom derivable module¶
No compiler changes are needed. A user-defined module follows exactly the same pattern as the stdlib modules:
# my_company/formats/internal.incn
from rust::my_codec import encode, decode
__derives__ = [Serialize, Deserialize]
# No @rust.derive needed: encode/decode handle serialization directly, without requiring a Rust derive on the struct.
trait Serialize:
    def to_internal(self) -> bytes:
        return encode(self)?
trait Deserialize:
    def from_internal(data: bytes) -> Result[Self, str]:
        return decode(data)?
Then anywhere in the codebase:
from my_company.formats import internal
@derive(internal)
model SensorReading:
    device_id: str
    value: float
SensorReading now has .to_internal() and .from_internal().
Reference-level explanation (precise rules)¶
The __derives__ module attribute¶
A module that defines a __derives__ attribute at module level is a derivable module. The attribute lists which of
the module's traits are adoptable via @derive():
__derives__ = [Serialize, Deserialize]
Here, Serialize and Deserialize refer to traits defined in the same module. When a type writes
@derive(module_name), the compiler:
- Resolves module_name to the imported module
- Reads module_name.__derives__ to get the list of derivable traits
- Adopts those traits onto the type; their methods become available on instances of the type
- Determines the Rust-level #[derive(...)] attributes needed (an emission concern, derived from @rust.derive decorators on the adopted traits)
Trait adoption via derive¶
The traits listed in __derives__ are adopted by any type that derives the module. This is equivalent to the type
writing with TraitName for each listed trait, but driven by the @derive() decorator. Only traits explicitly listed
in __derives__ are adopted — other traits defined in the module are not automatically included.
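The adoption rule can be modelled in a few lines. The following Python sketch is not compiler code: Module, DeriveError, and expand_module_derive are hypothetical names, and it assumes a module can be reduced to the set of traits it defines plus its optional __derives__ list. It illustrates that only listed traits are adopted, and that an empty list or a name that is not a trait of the same module is rejected.

```python
from dataclasses import dataclass
from typing import Optional


class DeriveError(Exception):
    """Hypothetical stand-in for a compiler diagnostic."""


@dataclass
class Module:
    name: str
    traits: set                      # names of traits defined in this module
    derives: Optional[list] = None   # the __derives__ list, if declared


def expand_module_derive(module: Module) -> list:
    """Return the traits a type adopts when it writes @derive(module)."""
    if module.derives is None:
        raise DeriveError(f"module '{module.name}' is not derivable")
    if not module.derives:
        raise DeriveError(f"'{module.name}.__derives__' must not be empty")
    for trait in module.derives:
        if trait not in module.traits:
            raise DeriveError(
                f"'{trait}' in '{module.name}.__derives__' is not a trait of that module"
            )
    return list(module.derives)


json_module = Module(
    name="json",
    traits={"Serialize", "Deserialize", "Pretty"},  # Pretty is hypothetical and unlisted
    derives=["Serialize", "Deserialize"],
)
adopted = expand_module_derive(json_module)
print(adopted)
```

The unlisted Pretty trait is defined in the module but never adopted, which mirrors the rule that other traits in the module are not automatically included.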
Rust derive binding via @rust.derive¶
A trait in a derivable module may need the compiler to emit a Rust #[derive(...)] attribute on any struct that adopts
it. This is distinct from @rust.extern (which delegates a method call to Rust) — @rust.derive declares that the
type itself requires a Rust-level derive for the trait's methods to work.
The @rust.derive("path::to::Derive") decorator on a trait declaration carries this binding:
@rust.derive("serde::Serialize")
trait Serialize:
    def to_json(self) -> str:
        return to_string(self)?
When a type adopts this trait via @derive(), the compiler emits #[derive(serde::Serialize)] on the Rust struct.
Traits that don't need a Rust-level derive (pure Incan behavioral traits, schema generators using __fields__()
reflection) simply omit @rust.derive — their methods compile normally without any struct-level annotation.
Derive deduplication¶
Multiple modules may declare the same @rust.derive path. For example, both json.Serialize and yaml.Serialize
carry @rust.derive("serde::Serialize"). The compiler collects all @rust.derive paths from all adopted traits into a
set before emission, producing one #[derive(serde::Serialize, serde::Deserialize)] regardless of how many format
modules are derived.
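As an illustration of the deduplication step, here is a minimal Python sketch. It assumes the emitter sees each adopted trait as a qualified name mapped to its @rust.derive paths; collect_rust_derives and render_derive_attribute are hypothetical names, and keeping first-seen order is a choice made here for stable output, not something the RFC specifies.

```python
def collect_rust_derives(adopted_traits):
    """Merge the @rust.derive paths of all adopted traits, dropping duplicates.

    `adopted_traits` maps a qualified trait name to a tuple of derive paths
    (empty for pure Incan traits). First-seen order is preserved.
    """
    seen = set()
    ordered = []
    for paths in adopted_traits.values():
        for path in paths:
            if path not in seen:
                seen.add(path)
                ordered.append(path)
    return ordered


def render_derive_attribute(paths):
    """Render one Rust derive attribute, or an empty string if none is needed."""
    return f"#[derive({', '.join(paths)})]" if paths else ""


# Traits adopted by a hypothetical model that writes @derive(json, yaml, sql).
adopted = {
    "json.Serialize": ("serde::Serialize",),
    "json.Deserialize": ("serde::Deserialize",),
    "yaml.Serialize": ("serde::Serialize",),
    "yaml.Deserialize": ("serde::Deserialize",),
    "sql.SqlSchema": (),  # pure Incan trait: contributes no derive path
}
attribute = render_derive_attribute(collect_rust_derives(adopted))
print(attribute)
```

Four serde-backed traits collapse to two derive paths, and the pure Incan trait contributes nothing to the attribute.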
Individual trait imports¶
Traits within a derivable module can be imported individually:
from std.serde.json import Serialize
When used in @derive(Serialize), only that single trait is adopted (and its @rust.derive path, if any, is emitted).
This enables fine-grained control — derive only serialization, only deserialization, etc.
Method resolution¶
When a type derives a module, the module's traits are adopted. Method calls on instances of the type resolve through normal trait method lookup. If two derived modules define traits with the same method name, this is a compile-time error (ambiguous method), following normal trait method resolution rules.
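The ambiguity check can be sketched as a simple grouping of method names by the traits that provide them. This Python model is an assumption about how such a check might look, not the typechecker's actual implementation; find_ambiguous_methods and the fmt_a / fmt_b modules are hypothetical.

```python
from collections import defaultdict


def find_ambiguous_methods(adopted_traits):
    """Return {method_name: [traits...]} for methods provided by more than one trait."""
    providers = defaultdict(list)
    for trait, methods in adopted_traits.items():
        for method in methods:
            providers[method].append(trait)
    return {name: traits for name, traits in providers.items() if len(traits) > 1}


# json and yaml share trait names but inject distinct methods, so nothing conflicts.
ok = find_ambiguous_methods({
    "json.Serialize": ["to_json"],
    "yaml.Serialize": ["to_yaml"],
})

# Two hypothetical modules that both inject `dump` would be reported.
conflict = find_ambiguous_methods({
    "fmt_a.Serialize": ["dump"],
    "fmt_b.Serialize": ["dump"],
})
print(ok, conflict)
```

A non-empty result corresponds to the compile-time "ambiguous method" error; identical trait names alone are not a conflict.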
Design details¶
Syntax¶
Three new syntactic elements:
- Module-level __derives__ attribute: a list of derive names assigned at module scope.

      __derives__ = [Serialize, Deserialize]

  Parsed as a const assignment where the name is __derives__ and the value is a list of identifiers. Each identifier must resolve to a trait defined in the same module.
- @derive(module) expansion: the existing @derive(...) syntax is extended to accept module names (not just derive names). When the argument resolves to a module with a __derives__ attribute, it is expanded.

      from std.serde import json
      @derive(json)          # Module derive: expands via __derives__
      @derive(Debug, Clone)  # DeriveId derives: unchanged
      model Foo:
          x: int

- @rust.derive decorator on traits: declares the Rust #[derive(...)] attribute that must be emitted on any struct adopting this trait. This is the bridge between an Incan trait and the Rust code generation it requires.

      @rust.derive("serde::Serialize")
      trait Serialize:
          def to_json(self) -> str:
              return to_string(self)?

  Traits without @rust.derive are pure Incan; no Rust-level derive is emitted for them.
No new keywords. @rust.derive follows the existing @rust.extern decorator pattern.
Semantics¶
When the compiler encounters @derive(name):
- Resolve name: check if it refers to a DeriveId (built-in derive) or an imported symbol.
- If DeriveId: existing behavior; emit the corresponding Rust #[derive(...)].
- If module with __derives__: adopt the traits listed in __derives__ onto the type. The compiler determines the necessary Rust-level derives from the adopted traits during emission.
- If trait from a derivable module: adopt only that single trait onto the type.
- Error: if name is neither a known derive, a derivable module, nor a trait from one, emit a diagnostic.
Trait method injection follows normal trait adoption rules. Methods with self receiver become instance methods on the
adopting type. Methods without a receiver become associated functions (e.g., Model.from_json(s)).
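The resolution order above can be sketched as a small classifier. This is a Python model under stated assumptions, not the compiler's code: DerivableModule, ModuleTrait, resolve_derive, and the scope dictionary are hypothetical, BUILTIN_DERIVES is an illustrative subset of DeriveId, and checking built-ins before imported symbols is a choice made for the sketch.

```python
from dataclasses import dataclass

BUILTIN_DERIVES = {"Debug", "Clone", "Eq"}  # illustrative subset of DeriveId


@dataclass(frozen=True)
class DerivableModule:
    name: str
    derives: tuple  # trait names taken from the module's __derives__


@dataclass(frozen=True)
class ModuleTrait:
    module: str
    name: str


def resolve_derive(name, scope):
    """Classify one @derive(...) argument and return (kind, adopted_traits)."""
    if name in BUILTIN_DERIVES:
        return ("builtin", [])
    symbol = scope.get(name)
    if isinstance(symbol, DerivableModule):
        return ("module", [f"{symbol.name}.{trait}" for trait in symbol.derives])
    if isinstance(symbol, ModuleTrait):
        return ("trait", [f"{symbol.module}.{symbol.name}"])
    raise LookupError(
        f"'{name}' is not a known derive, a derivable module, or a trait from one"
    )


# Imported symbols visible at the @derive site (hypothetical scope).
scope = {
    "json": DerivableModule("json", ("Serialize", "Deserialize")),
    "json_write": ModuleTrait("json", "Serialize"),
}
print(resolve_derive("json", scope))
print(resolve_derive("json_write", scope))
```

The aliased import json_write resolves to a single trait, which matches the partial-derive case where only .to_json() is adopted.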
Three categories of derivable modules¶
1. Serialization formats (data in/out)¶
These convert instances to/from bytes or strings. They use rust:: interop to call codec libraries. Multiple serde
formats define similarly-named traits (each module has its own Serialize / Deserialize) that inject distinct methods:
| Module | __derives__ | Traits / methods |
|---|---|---|
| std.serde.json | [Serialize, Deserialize] | .to_json(), .from_json() |
| std.serde.yaml | [Serialize, Deserialize] | .to_yaml(), .from_yaml() |
| std.serde.toml | [Serialize, Deserialize] | .to_toml(), .from_toml() |
| std.serde.msgpack | [Serialize, Deserialize] | .to_msgpack(), .from_msgpack() |
| std.serde.csv | [Serialize, Deserialize] | .to_csv_row(), .from_csv_row() |
Example implementation:
# stdlib/serde/json.incn
from rust::serde_json import to_string, from_str
__derives__ = [Serialize, Deserialize]
@rust.derive("serde::Serialize")
trait Serialize:
    def to_json(self) -> str:
        return to_string(self)?
@rust.derive("serde::Deserialize")
trait Deserialize:
    def from_json(json_str: str) -> Result[Self, str]:
        return from_str(json_str)?
No rust.module(), no @rust.extern — the .incn file is the complete implementation. @rust.derive declares the
Rust struct-level derive needed for the rust:: interop calls to work. The rust:: interop (RFC 005) provides access
to the underlying Rust codec library.
2. Schema generators (type shape out)¶
These generate schema artifacts from the model's type definition. They operate on field metadata via __fields__()
reflection and are typically pure Incan:
| Module | __derives__ | Traits / methods | Artifact |
|---|---|---|---|
| std.schema.protobuf | [ProtobufMessage] | .proto_schema() | .proto definition |
| std.schema.avro | [AvroSchemaDerive] | .avro_schema() | Avro schema JSON |
| std.schema.openapi | [OpenApiSchema] | .openapi_schema() | OpenAPI spec fragment |
| std.schema.graphql | [GraphqlType] | .graphql_type() | GraphQL type def |
| std.schema.sql | [SqlSchema] | .sql_ddl() | CREATE TABLE |
| std.schema.arrow | [ArrowSchema] | .arrow_schema() | arrow::Schema |
Schema generators with only one trait can still use __derives__ to make the module derivable. Since the trait has no
@rust.derive decorator, no Rust-level derive is emitted — the trait methods are pure Incan reflection:
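# stdlib/schema/sql.incn
__derives__ = [SqlSchema]
trait SqlSchema:
    def sql_ddl(self) -> str:
        # One "name TYPE" entry per field, taken from field metadata
        columns: list[str] = []
        for field in self.__fields__():
            sql_type = _incan_type_to_sql(field.type_name)
            columns.append(f"    {field.wire_name} {sql_type}")
        # Join columns with ",\n" so the last column carries no trailing comma
        lines: list[str] = []
        lines.append(f"CREATE TABLE {self.__class_name__} (")
        lines.append(",\n".join(columns))
        lines.append(");")
        return "\n".join(lines)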
Some formats are hybrids — they need both schema generation AND instance serialization (e.g., Avro needs schema JSON plus binary encode/decode).
3. Behavioral derives¶
These attach behavior to models without producing bytes or schemas. For example:
| Module | __derives__ | What it does |
|---|---|---|
| std.validation | [Validate] | Checked construction via .new() |
| std.governance | [Governed] | PII masking, field-level access control |
| std.versioning | [Versioned] | API version-aware response shapes |
Interaction with existing features¶
Built-in derives (Eq, Clone, Debug, etc.)¶
Built-in derives remain compiler intrinsics. They are not migrated to the __derives__ protocol because their
implementations are Rust proc macros that generate impl blocks — there is no Incan-expressible body to put in a trait.
The DeriveId registry continues to handle these.
The distinction is clear: built-in derives implement language-level semantics (equality, ordering, cloning, debug formatting). Format derives implement library-level functionality (serialization, schema generation). The protocol applies to the latter.
Note: as the language evolves, this might change. It is hypothetically possible to rewrite the built-in derives as traits in the stdlib, but that would be a significant change requiring currently unavailable functionality and syntax that is not in scope for this RFC.
rust:: imports (RFC 005)¶
The rust:: import mechanism is the primary way derivable modules access Rust codec libraries. A derivable module's
trait methods are pure Incan that call into Rust libraries via rust:: imports. The two mechanisms are complementary.
with trait bounds (RFC 023)¶
Traits from derivable modules work with the existing with bound syntax. A function can require specific format
capabilities:
def publish[T with (json.Serialize, avro.Serialize, avro.AvroSchema)](
    events: List[T],
    target: ExportTarget,
) -> Result[int, str]:
    match target:
        ExportTarget.Api =>
            for e in events:
                http_post(e.to_json())
        ExportTarget.Kafka =>
            schema = T.avro_schema()
            for e in events:
                kafka_publish(e.to_avro(), schema)
Field metadata (RFC 021)¶
Derivable modules can read field metadata via __fields__(). This enables format-specific field annotations:
from std.serde import json
from std.schema import protobuf
@derive(json, protobuf)
model Event:
    customer_id: str
    email [pii=True, proto.tag=1]: str
    event_type [proto.tag=2, values=["click", "purchase"]]: str
The json module sees alias, description, etc. The protobuf module reads proto.tag for stable field numbering.
Each format consumes the metadata it understands and ignores the rest.
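One way to picture "consume what you understand, ignore the rest" is a per-format key filter. The Python sketch below is only an illustration: select_metadata is a hypothetical helper, and the two key sets are assumptions built from the keys named in this section (alias and description for json, proto.tag for protobuf), not a specification of what either module accepts.

```python
JSON_KEYS = {"alias", "description"}  # keys the json module is said to see
PROTOBUF_KEYS = {"proto.tag"}         # key the protobuf module reads


def select_metadata(field_metadata, understood_keys):
    """Keep only the metadata entries a given format module understands."""
    return {
        key: value
        for key, value in field_metadata.items()
        if key in understood_keys
    }


# Metadata attached to the `event_type` field in the Event model above.
event_type = {"proto.tag": 2, "values": ["click", "purchase"]}

for_protobuf = select_metadata(event_type, PROTOBUF_KEYS)
for_json = select_metadata(event_type, JSON_KEYS)
print(for_protobuf, for_json)
```

Keys that no derived format understands, such as values here, are passed over without causing an error in either module.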
Compatibility / migration¶
This RFC is additive for the protocol itself — __derives__, @rust.derive, and module-based @derive() are new
capabilities. However, it includes one deprecation: bare @derive(Serialize, Deserialize) will be removed from the
DeriveId registry once the std.serde.json module is available (see design decision #4). Users migrate to the
explicit module form:
# Before (deprecated, will be removed)
@derive(Serialize, Deserialize)
model Config:
    host: str
config.to_json()
# After
from std.serde import json
@derive(json)
model Config:
    host: str
config.to_json()
The migration is mechanical: add the format import, replace bare Serialize/Deserialize with the module name. The
generated Rust output is identical. Built-in derives (Debug, Clone, Eq, etc.) are unaffected.
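Because the rewrite is mechanical, it could in principle be scripted. The Python sketch below is a hypothetical codemod, not a tool this RFC proposes: migrate_source is an invented name, it only handles single-line @derive(...) decorators with no trailing comment, and it rewrites only when both Serialize and Deserialize are present so that serialize-only types are left alone.

```python
import re

BARE_SERDE = {"Serialize", "Deserialize"}
IMPORT_LINE = "from std.serde import json"
DERIVE_RE = re.compile(r"^(\s*)@derive\((.*)\)\s*$")


def migrate_source(source: str) -> str:
    """Replace a bare Serialize + Deserialize pair with the json module derive."""
    out = []
    changed = False
    for line in source.splitlines():
        match = DERIVE_RE.match(line)
        if match:
            indent, raw_args = match.groups()
            args = [arg.strip() for arg in raw_args.split(",") if arg.strip()]
            if BARE_SERDE <= set(args):
                # Drop both bare names and put the module derive first.
                kept = [arg for arg in args if arg not in BARE_SERDE]
                if "json" not in kept:
                    kept.insert(0, "json")
                line = f"{indent}@derive({', '.join(kept)})"
                changed = True
        out.append(line)
    if changed and IMPORT_LINE not in out:
        out.insert(0, IMPORT_LINE)  # add the format import once, at the top
    return "\n".join(out)


before = "@derive(Debug, Serialize, Deserialize)\nmodel Config:\n    host: str"
after = migrate_source(before)
print(after)
```

A real migration would also need to respect existing import ordering and aliased names, which this sketch ignores.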
Alternatives considered¶
1. Decorator-based method injection (current approach)¶
The status quo: hardcode method injection in the typechecker per derive. Rejected because it doesn't scale to N formats and provides no trait for generic bounds.
2. __derive__ as a simple list without traits¶
A module-level __derive__ that maps to Rust derives, with methods injected by convention (e.g., to_<format> always
exists). Rejected because there's no trait to bind against in generic functions, and the method signatures are invisible
to the user.
3. Proc-macro-style user derives¶
Allow users to write Rust proc macros and register them as Incan derives. Rejected because it requires Rust expertise and breaks the "Incan all the way down" principle. The trait-based approach keeps everything in Incan.
4. Making all built-in derives use this protocol too¶
Migrate Eq, Clone, Debug, etc. to __derives__-based modules. Rejected because these are genuinely compiler
intrinsics — their implementations are Rust proc macros that generate impl blocks, not callable functions. The
protocol is for library-level functionality.
Drawbacks¶
- Two derive systems: built-in derives (DeriveId registry) and module-based derives (__derives__ protocol) coexist. This is intentional, since they serve different purposes, but it adds conceptual surface area.
- Naming collisions: if a module defines a Serialize trait and the user also imports Serialize from another module, the compiler must disambiguate. Normal trait resolution rules apply, but the error messages need to be clear.
- Rust derive deduplication: the compiler must correctly deduplicate underlying Rust derives across modules. This is straightforward (collect into a set) but adds a codegen step.
Implementation plan¶
Phase 1: Parser support for __derives__ and @rust.derive¶
- [ ] Extend the parser to recognize module-level __derives__ = [...] as a special attribute
- [ ] Store the derives list in the AST's module metadata (alongside rust.module())
- [ ] Parse @rust.derive("path") as a decorator on trait declarations
- [ ] Store @rust.derive paths in TraitDecl AST metadata
- [ ] Emit compile error for __derives__ = [] (empty list)
- [ ] Emit compile error if __derives__ references a name that isn't a trait in the same module
- Touchpoints: crates/incan_syntax/src/parser/core.rs, crates/incan_syntax/src/ast/decls.rs
Phase 2: Derive expansion in the typechecker¶
- [ ] When @derive(name) resolves to a module (not a DeriveId), read __derives__ from the module
- [ ] Adopt the listed traits onto the type; inject their methods into the type's method table
- [ ] Collect @rust.derive paths from adopted traits for the emission layer
- [ ] When @derive(name) resolves to a single trait (imported from a derivable module), adopt only that trait
- [ ] Replace inject_json_methods() with the general trait adoption mechanism
- [ ] Add diagnostic for ambiguous method names when deriving multiple modules with conflicting trait methods
- Touchpoints: src/frontend/typechecker/collect/decl_helpers.rs, collect/stdlib_imports.rs
Phase 3: Emission deduplication¶
- [ ] Collect all @rust.derive paths from adopted traits, plus DeriveId-mapped derives
- [ ] Deduplicate into a set before emitting #[derive(...)]
- [ ] Verify that @rust.derive with multiple arguments works: @rust.derive("a::B", "c::D")
- Touchpoints: src/backend/ir/emit/decls/structures.rs
Phase 4: Migrate std.serde.json to the protocol¶
- [ ] Rewrite stdlib/serde/json.incn with __derives__, @rust.derive, Serialize trait, Deserialize trait
- [ ] Serialize.to_json() returns str (serialization of a valid model cannot fail); Deserialize.from_json() returns Result[Self, str]
- [ ] Remove inject_json_methods() hardcoding from decl_helpers.rs
- [ ] Remove Serialize / Deserialize from DeriveId registry
- [ ] Verify @derive(json) works end-to-end (typechecks, lowers, emits correct Rust)
- [ ] Verify from std.serde.json import Serialize + @derive(Serialize) works for partial derives
- [ ] Add codegen snapshot tests for single-format and partial derives
- [ ] Update existing tests that use bare @derive(Serialize, Deserialize) to use @derive(json)
Phase 5: Add a second serde format and a schema generator¶
- [ ] Implement std.serde.yaml following the same pattern as std.serde.json
- [ ] Verify multi-format derives work: @derive(json, yaml) (correct deduplication, distinct methods)
- [ ] Verify with trait bounds work across format modules: T with (json.Serialize, yaml.Serialize)
- [ ] Implement one schema generator (e.g., std.schema.sql) to validate the pure-Incan __fields__() reflection path
- [ ] Verify schema generator derives work: @derive(sql) with no @rust.derive on the trait
- [ ] Add codegen snapshot tests for multi-format and schema generator derives
Phase 6: Documentation and migration guide¶
- [ ] Update user-facing docs to show the from std.serde import json pattern
- [ ] Document how to create custom derivable modules (user guide)
- [ ] Add deprecation notice for bare @derive(Serialize, Deserialize) in release notes
Design decisions¶
The following questions were considered during design and are recorded here for posterity.
1. Trait naming within modules: modules use short names (Serialize, Deserialize). Users who need disambiguation use import aliasing: from std.serde.json import Serialize as JsonSerialize. This keeps module definitions simple and pushes naming concerns to the import site where the user has full context.
2. __derives__ syntax: parsed as an implicit const assignment. The dunder convention already signals "compiler-recognized"; an explicit const keyword would be redundant. It is semantically immutable: reassigning __derives__ is a compile error.
3. Missing or empty __derives__: a module without __derives__ is not derivable. A module with __derives__ = [] is a compile error (or at minimum a warning), because an empty list signals a mistake: there is no reason to declare __derives__ without listing at least one trait.
4. Bare Serialize / Deserialize derives: bare @derive(Serialize, Deserialize) ceases to exist as a DeriveId shortcut. Users import the format module and derive it: @derive(json). If direct access to the Rust serde traits is needed, rust:: interop remains available. This eliminates ambiguity and makes the format dependency explicit.
5. @rust.derive validation: treated the same as @rust.extern; the path string is passed through to the emitted Rust code. Validation happens at Rust compile time, not in the Incan compiler. This keeps the protocol simple and works with any Rust derive crate without the Incan compiler needing to know about them.
6. Multiple @rust.derive on one trait: allowed. A single trait may require multiple Rust-level derives. The decorator accepts multiple arguments: @rust.derive("serde::Serialize", "apache_avro::AvroSchema").
Deferred questions¶
- Derive-time metadata: some formats may need per-model configuration (e.g., JSON naming conventions, Protobuf field numbering strategy). Whether this should be decorator args (@derive(json, rename_all="camelCase")), field metadata, or a separate mechanism is out of scope for this RFC and deferred to future format-specific RFCs.
- Pretty printing: should .to_json() accept formatting options (indent, sort keys), or should pretty printing be a separate function (e.g., json.pretty(value, indent=2))? Deferred to the std.serde.json implementation.
References¶
- RFC 005: Rust Interop
- RFC 012: JsonValue Type and Enum Methods
- RFC 025: Multi-Instantiation Trait Dispatch
- RFC 021: Model field metadata and schema-safe aliases
- RFC 023: Compilable Stdlib & Rust Module Binding
- Rust serde crate (format-agnostic serialization)
- Rust prost crate (Protobuf code generation)
- Rust apache-avro crate (Avro serialization and schema)