RFC 056: std.io - in-memory byte streams and binary parsing helpers
- Status: Implemented
- Created: 2026-04-13
- Author(s): Danny Meijer (@dannymeijer)
- Related:
  - RFC 005 (Rust interop)
  - RFC 009 (sized integers)
  - RFC 010 (temporary filesystem objects)
  - RFC 022 (namespaced stdlib modules and compiler handoff)
  - RFC 023 (compilable stdlib and Rust module binding)
  - RFC 041 (first-class Rust interop authoring)
  - RFC 055 (std.fs path-centric filesystem APIs)
- Issue: https://github.com/dannys-code-corner/incan/issues/291
- RFC PR: —
- Written against: v0.2
- Shipped in: v0.3.0-dev.37
Summary
This RFC introduces std.io as Incan's in-memory binary I/O module. Its core abstraction is BytesIO, a writable and seekable byte stream over an in-memory bytes buffer. It targets binary parsing, protocol work, fixtures, and transformation pipelines that should not depend on filesystem paths. The user-facing shape is intentionally recognizable to Python users coming from io.BytesIO, while the underlying semantics borrow from Rust's Cursor and BufRead model where that yields a cleaner, more complete contract for cursor movement, delimiter-based reads, and exact-width numeric helpers.
Motivation
Not all binary input starts as a file on disk. Tests, network clients, generated fixtures, embedded assets, decompression stages, and parser pipelines often begin with a bytes value that is already in memory. Code in those settings should not need to import std.fs or drop into rust::std::io::Cursor just to move a cursor, read exact byte counts, skip until a delimiter, or decode fixed-width numbers. std.io gives Incan a standard home for in-memory byte handling while staying separate from path and OS-file concerns. That separation matters for real binary formats: std.fs should get bytes into memory or stream them from a file, while std.io.BytesIO should make the parsing and re-encoding work itself possible in pure Incan.
Goals
- Provide a BytesIO-like type over bytes for in-memory binary parsing and rewriting.
- Standardize cursor, exact-read, delimiter-read, and overwrite semantics so users do not hand-roll slicing logic for every parser.
- Commit to a complete exact-width numeric read and write surface aligned with RFC 009, including both endian families for multi-byte values.
- Keep std.io independent from filesystem path APIs; users should be able to parse a bytes value without importing std.fs.
- Make the pure-Incan binary parsing story strong enough for real format readers such as GGUF-style metadata and tensor-descriptor parsers.
Non-Goals
- Standardizing filesystem file handles; that belongs to RFC 055.
- Defining async networking or async stream protocols here.
- Mirroring Python's entire io hierarchy.
- Introducing spill-to-disk behavior in BytesIO; spooled temporary storage belongs in std.tempfile, not here.
- Defining general Reader / Writer protocol families in this RFC.
- Reproducing Rust Read / Seek / BufRead trait names one-to-one as the user-facing surface.
Guide-level explanation
Authors use std.io.BytesIO when they already have a bytes value and want to parse or rewrite it incrementally.
    from std.io import BytesIO, Endian

    buf = BytesIO(data)
    magic = buf.read_exact(4)?
    version: u32 = buf.read(Endian.Little)?
    metadata_count: u64 = buf.read(Endian.Little)?
    nul: u8 = 0
    payload = buf.read_until(nul)?
    remaining = buf.remaining()
BytesIO is also writable. It overwrites from the current cursor position unless the caller explicitly seeks elsewhere first.
    from std.io import BytesIO, Endian

    out = BytesIO()
    out.write(b"GGUF")?
    version: u32 = 3
    metadata_count: u64 = 42
    out.write(version, Endian.Little)?
    out.write(metadata_count, Endian.Little)?
    blob = out.into_bytes()
The mental model is: std.fs gets bytes into or out of files, while std.io walks through and rewrites bytes already in memory.
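For readers who know Python better than Incan, the same round trip can be sketched with Python's own io.BytesIO and struct modules. This is only an analogue of the mental model, not Incan code: the trailing NUL-terminated name is an illustrative payload, and Python has no counterpart to the type-directed read(Endian.Little) calls, so widths are spelled out as struct format codes.

```python
import io
import struct

# Build a small GGUF-style header in memory: magic, u32 version, u64 count.
out = io.BytesIO()
out.write(b"GGUF")
out.write(struct.pack("<I", 3))     # u32, little-endian
out.write(struct.pack("<Q", 42))    # u64, little-endian
out.write(b"name\x00")              # illustrative NUL-terminated payload
blob = out.getvalue()

# Walk back through the same bytes without touching the filesystem.
buf = io.BytesIO(blob)
magic = buf.read(4)
(version,) = struct.unpack("<I", buf.read(4))
(metadata_count,) = struct.unpack("<Q", buf.read(8))
remaining = len(blob) - buf.tell()  # bytes left after the fixed-width fields
```

No path or file handle appears anywhere in the sketch, which is the separation the RFC draws between std.fs and std.io.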
Reference-level explanation
Module split and compatibility target
- The standard library must expose std.io for in-memory byte-stream reading, writing, and cursor semantics.
- std.io is deliberately separate from std.fs: open OS-file handles belong to the filesystem module, while BytesIO operates on already-materialized bytes.
- The surface should be recognizable to Python users coming from io.BytesIO, but the committed contract is broader than Python's minimal cursor methods because Incan also standardizes explicit numeric parsing helpers.
- The committed numeric helper surface depends on RFC 009. Width-specific reads and writes must use the sized numeric vocabulary introduced there.
- Implementations may use Rust std::io::Cursor, BufRead, and primitive byte-conversion helpers internally, but user-visible semantics are defined by this RFC and stdlib docs, not by Rust trait names.
Required capabilities (committed contract)
The std.io contract commits to the following BytesIO surface:
- Direct construction: BytesIO(initial: bytes = b"") -> BytesIO.
- Byte reads: read(size: int = -1) -> Result[bytes, E], read_exact(size: int) -> Result[bytes, E].
- Delimiter helpers: read_until(byte: u8) -> Result[bytes, E], skip_until(byte: u8) -> Result[int, E].
- Cursor helpers: tell() -> int, seek(offset: int, whence: int = 0) -> Result[int, E], rewind() -> Result[None, E], seek_relative(offset: int) -> Result[None, E].
- Byte writes: write(data: bytes) -> Result[int, E], truncate(size: int | None = None) -> Result[int, E].
- Buffer extraction and inspection: getvalue() -> bytes, into_bytes() -> bytes, remaining() -> int.
- Trait-backed exact-width numeric reads and writes aligned with RFC 009.
Normative cursor and buffer semantics
Cursor behavior is normative:
- A newly constructed BytesIO(initial) starts with its cursor at position 0.
- read(size) must return at most size bytes and must advance the cursor by the returned byte count.
- read(size) with size = -1 must return the remaining bytes.
- read(size) at EOF must return an empty bytes value.
- read_exact(size) must fail if fewer than size bytes remain.
- seek(offset, whence) must follow the Python-style whence model: 0 for start, 1 for current position, and 2 for end.
- rewind() is the convenience form of seeking to the start of the buffer.
- seek_relative(offset) moves relative to the current cursor position and must fail if the resulting position would be invalid.
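Most of these cursor rules match Python's io.BytesIO, so they can be observed there directly. The sketch below is a Python analogue, not Incan code. read_exact is a hypothetical helper written for this example, since Python's BytesIO has no such method, and raising EOFError stands in for the Result failure the RFC describes.

```python
import io

def read_exact(stream: io.BytesIO, size: int) -> bytes:
    """Hypothetical helper: return exactly `size` bytes or raise EOFError."""
    data = stream.read(size)
    if len(data) != size:
        # Where the cursor ends up after a failed exact read is not modeled here.
        raise EOFError(f"wanted {size} bytes, got {len(data)}")
    return data

buf = io.BytesIO(b"abcdef")
start = buf.tell()                 # new stream: cursor at 0
first = buf.read(2)                # returns b"ab" and advances the cursor by 2
after_first = buf.tell()

buf.seek(-2, io.SEEK_END)          # whence=2: relative to the end
tail = buf.read()                  # size defaults to -1: all remaining bytes
at_eof = buf.read(1)               # at EOF: empty bytes, not an error

buf.seek(1, io.SEEK_SET)           # whence=0: absolute position
pos = buf.seek(2, io.SEEK_CUR)     # whence=1: relative to the current cursor

try:
    read_exact(buf, 10)            # only 3 bytes remain, so this must fail
    exact_failed = False
except EOFError:
    exact_failed = True
```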
Write behavior is also normative:
- BytesIO is always readable, writable, and seekable; separate readable() / writable() / seekable() predicates are not part of the committed surface.
- write(data) writes from the current cursor position. It does not imply append semantics unless the caller has already moved the cursor to the end.
- write(data) must either write the full byte slice or fail. Partial-write behavior is not part of the user-visible contract for an in-memory buffer.
- truncate(size=None) must shrink or extend the buffer to size; when size is omitted, it uses the current cursor position.
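The overwrite-from-cursor rule can be seen in Python's io.BytesIO, which behaves the same way. The truncate function below is a hypothetical model over a plain bytearray, written so that the extension case is explicit. Zero-filling the extended region is an assumption of this sketch; the RFC only says the buffer is extended.

```python
import io
from typing import Optional

buf = io.BytesIO(b"hello world")
buf.write(b"HELLO")                # cursor starts at 0, so this overwrites
overwritten = buf.getvalue()

buf.seek(0, io.SEEK_END)           # append only after an explicit seek to the end
buf.write(b"!")
appended = buf.getvalue()

def truncate(buffer: bytearray, cursor: int, size: Optional[int] = None) -> int:
    """Hypothetical model of truncate: resize to `size`, or to the cursor."""
    new_size = cursor if size is None else size
    if new_size < 0:
        raise ValueError("size must be non-negative")
    if new_size <= len(buffer):
        del buffer[new_size:]      # shrink: drop everything past new_size
    else:
        # Assumption: extension pads with zero bytes; the RFC does not say.
        buffer.extend(bytes(new_size - len(buffer)))
    return new_size

data = bytearray(b"abcdef")
shrunk_to = truncate(data, cursor=2)          # size omitted: use the cursor
shrunk = bytes(data)
grown_to = truncate(data, cursor=2, size=4)   # explicit size larger than the buffer
grown = bytes(data)
```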
Delimiter behavior is normative:
- read_until(byte) must return bytes up to and including the delimiter when the delimiter is found.
- read_until(byte) must return the remaining bytes when EOF is reached before the delimiter.
- skip_until(byte) must discard bytes until the delimiter or EOF and return the total number of discarded bytes, including the delimiter when it is found.
- read_until(byte) and skip_until(byte) must return 0-length / 0-count results at EOF.
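A minimal Python model of the delimiter rules, assuming a stream whose full contents are available through getvalue(). Both functions are hypothetical helpers for illustration and are not part of Python's io module or the Incan implementation.

```python
import io

def read_until(stream: io.BytesIO, delimiter: int) -> bytes:
    """Return bytes up to and including `delimiter`, or the rest at EOF."""
    data = stream.getvalue()
    start = stream.tell()
    if start >= len(data):
        return b""                         # at EOF: zero-length result
    index = data.find(bytes([delimiter]), start)
    end = len(data) if index == -1 else index + 1
    stream.seek(end)                       # cursor lands just past the delimiter
    return data[start:end]

def skip_until(stream: io.BytesIO, delimiter: int) -> int:
    """Discard bytes through `delimiter` (or EOF) and return how many."""
    return len(read_until(stream, delimiter))

buf = io.BytesIO(b"key\x00value")
key = read_until(buf, 0)                   # delimiter found: included in the result
pos_after_key = buf.tell()
skipped = skip_until(buf, 0)               # no delimiter left: runs to EOF
at_eof_read = read_until(buf, 0)
at_eof_skip = skip_until(buf, 0)
```

Including the delimiter in the returned bytes means the caller can tell a terminated record from one cut off by EOF by checking the last byte.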
Buffer extraction behavior is normative:
- getvalue() returns a bytes snapshot of the buffer contents.
- into_bytes() returns the buffer bytes without changing the cursor. Implementations may avoid an extra copy when ownership rules permit it, but callers must not rely on consumption or aliasing behavior.
- remaining() returns the number of unread bytes from the current cursor position to the logical end of the buffer.
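The snapshot property of getvalue() and the arithmetic behind remaining() can be illustrated with Python's io.BytesIO. Python has no remaining() method, so the sketch computes it from the buffer length and tell(), assuming the cursor has not been moved past the end of the buffer.

```python
import io

buf = io.BytesIO(b"abcdef")
buf.read(2)                                 # consume two bytes

snapshot = buf.getvalue()                   # whole buffer, regardless of the cursor
cursor_after_snapshot = buf.tell()          # extraction does not move the cursor
remaining = len(snapshot) - buf.tell()      # unread bytes from cursor to end

buf.seek(0)
buf.write(b"XY")                            # later writes do not affect the snapshot
current = buf.getvalue()
```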
Numeric helper surface
The numeric helper surface is committed, not tentative. It is trait-backed rather than a public matrix of dozens of method names:
- Endian exposes Little and Big byte-order variants.
- BinaryRead[T] exposes read(endian: Endian) -> Result[T, E].
- BinaryWrite[T] exposes write(value: T, endian: Endian) -> Result[None, E].
- BytesIO adopts those traits for u8, i8, u16, i16, u32, i32, u64, i64, u128, i128, f32, and f64.
- Reads are selected by expected result type, so callers must provide static type context: value: u32 = buf.read(Endian.Little)?.
- Writes are selected by the argument type: out.write(value_u32, Endian.Little)?.
- Endianness is ignored for u8 and i8, because byte order is meaningless for one-byte values.
- Convenience overloads for int and float are intentionally deferred. RFC 009 defines those as ergonomic aliases for i64 and f64, and adding both aliases and exact-width overloads needs a coherent compiler-level alias-overload rule rather than duplicate Rust impls.
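The byte-level effect of the width and endian choices can be sketched in Python with the struct module. Python cannot dispatch on the expected result type, so this analogue passes the width as a string key. read_num, write_num, and the CODES table are hypothetical names for illustration only. The 128-bit widths have no struct format code, so the sketch handles u128 with int.to_bytes and int.from_bytes.

```python
import io
import struct

# struct format codes for the widths that struct supports directly.
CODES = {"u8": "B", "i8": "b", "u16": "H", "i16": "h", "u32": "I",
         "i32": "i", "u64": "Q", "i64": "q", "f32": "f", "f64": "d"}

def write_num(stream: io.BytesIO, kind: str, value, endian: str) -> None:
    """Hypothetical helper: `endian` is "<" (little) or ">" (big)."""
    stream.write(struct.pack(endian + CODES[kind], value))

def read_num(stream: io.BytesIO, kind: str, endian: str):
    fmt = endian + CODES[kind]
    size = struct.calcsize(fmt)
    data = stream.read(size)
    if len(data) != size:
        raise EOFError(f"{kind} needs {size} bytes, got {len(data)}")
    return struct.unpack(fmt, data)[0]

buf = io.BytesIO()
write_num(buf, "u32", 0x01020304, "<")
write_num(buf, "u32", 0x01020304, ">")
write_num(buf, "u8", 0xFF, "<")      # one byte: endian choice changes nothing
write_num(buf, "u8", 0xFF, ">")
write_num(buf, "f32", 1.5, "<")
encoded = buf.getvalue()

buf.seek(0)
little = read_num(buf, "u32", "<")
big = read_num(buf, "u32", ">")
byte_le = read_num(buf, "u8", "<")
byte_be = read_num(buf, "u8", ">")
half = read_num(buf, "f32", "<")

# 128-bit values via int.to_bytes / int.from_bytes.
wide = (2 ** 100).to_bytes(16, "big")
wide_back = int.from_bytes(wide, "big")
```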
Expected API shape (skeletal)
BytesIO
- BytesIO(initial: bytes = b"") -> BytesIO.
- read(size: int = -1) -> Result[bytes, E].
- read_exact(size: int) -> Result[bytes, E].
- read_until(byte: u8) -> Result[bytes, E].
- skip_until(byte: u8) -> Result[int, E].
- tell() -> int.
- seek(offset: int, whence: int = 0) -> Result[int, E].
- rewind() -> Result[None, E].
- seek_relative(offset: int) -> Result[None, E].
- write(data: bytes) -> Result[int, E].
- truncate(size: int | None = None) -> Result[int, E].
- getvalue() -> bytes.
- into_bytes() -> bytes.
- remaining() -> int.
- BinaryRead[T].read(endian: Endian) -> Result[T, E].
- BinaryWrite[T].write(value: T, endian: Endian) -> Result[None, E].
Errors and compatibility
- Operations must surface failure through ordinary Result returns unless a helper is explicitly documented otherwise.
- Error payloads should be actionable, including at minimum the failed operation, the requested size or delimiter where relevant, and the cursor position when that improves debugging.
- This RFC is additive. It does not change existing filesystem or builtin contracts.
- std.io helpers must not require rust:: knowledge in ordinary documentation or examples.
- If RFC 009's sized numeric model changes materially before implementation, the width-specific helper signatures in this RFC must be updated to match that final language contract rather than silently drifting.
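One way to picture an actionable error payload is a small record carrying the operation, the cursor position, and the requested size. The RFC does not define the error type E, so every name in this Python sketch (StreamError, its fields, and the read_exact function) is hypothetical. It returns the error as a value to mirror the Result style and does not raise.

```python
from dataclasses import dataclass
from typing import Optional, Union

@dataclass
class StreamError:
    """Hypothetical error payload; the RFC leaves the concrete type open."""
    operation: str
    position: int
    requested: Optional[int] = None
    available: Optional[int] = None

    def message(self) -> str:
        return (f"{self.operation} failed at position {self.position}: "
                f"requested {self.requested} bytes, {self.available} available")

def read_exact(data: bytes, position: int, size: int) -> Union[bytes, StreamError]:
    """Return the requested slice, or a StreamError describing the shortfall."""
    available = len(data) - position
    if size > available:
        return StreamError("read_exact", position, size, available)
    return data[position:position + size]

ok = read_exact(b"abcdef", 2, 3)
err = read_exact(b"abcdef", 4, 8)
text = err.message() if isinstance(err, StreamError) else ""
```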
Design details
Why std.io is separate from std.fs
BytesIO solves a different problem than file handles. It helps when the bytes are already in memory. That includes tests, network payloads, decompressed buffers, and parser stages after a file has already been read. Keeping std.io separate avoids turning the filesystem module into a generic "everything binary" bucket.
Python-shaped surface, Rust-backed semantics
The surface should feel familiar to Python users: BytesIO(data), read, write, tell, seek, and getvalue are all recognizable from io.BytesIO. But the substrate is Rust, and Rust gives a few extra semantics that are worth standardizing instead of hiding.
Rust's Cursor model is the reason BytesIO should be treated as a real writable stream instead of a read-only parser shim. A new cursor starts at the beginning, not the end, and writes overwrite from the current cursor position rather than implying append behavior. Rust also makes rewind() and seek_relative(...) natural convenience operations, so Incan should expose them instead of forcing callers to encode every cursor move through raw seek(...) calls.
Rust's BufRead model also gives a strong case for delimiter-based helpers. read_until and skip_until are not esoteric parser machinery; they are the simple, direct way to handle NUL-terminated strings, line-like records, or bounded marker scans inside binary formats. They belong in a real in-memory binary I/O contract.
Why the numeric helper surface is broad
Once std.io commits to exact-width numeric parsing, arbitrary seams become harder to defend. Supporting only little-endian reads or only a couple of widths would leave the API lopsided for no principled reason. The Rust substrate already supports endian-aware conversion for the full sized-integer and sized-float family, and RFC 009 already defines that vocabulary at the language level. The coherent design is therefore: full width family, both endian families for multi-byte values, and matching read/write overloads.
The default int and float aliases do still matter ergonomically, but this RFC does not add separate alias overloads. They should be added only after the compiler has a clear rule for alias overloads that does not duplicate emitted Rust trait impls for i64 and f64.
Why getbuffer() and generic protocols stay out
Python's BytesIO.getbuffer() exposes a mutable view over the underlying buffer. That is powerful, but it also introduces aliasing and resize constraints that are not worth standardizing before Incan has a broader borrowed-buffer story. This RFC therefore keeps the safe extraction surface small: getvalue() for a snapshot and into_bytes() for retrieving bytes without exposing mutable buffer aliases.
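The aliasing and resize constraints are easy to reproduce in Python itself. In the sketch below, which uses only Python's standard io module, a live getbuffer() view mutates the stream's contents in place, and a write that would grow the buffer raises BufferError until the view is released.

```python
import io

buf = io.BytesIO(b"abcd")
view = buf.getbuffer()             # mutable memoryview over the live buffer

view[0] = ord("X")                 # mutation through the view...
aliased = buf.getvalue()           # ...is visible through the stream

try:
    buf.write(b"0123456789")       # would need to grow the 4-byte buffer
    resize_blocked = False
except BufferError:
    resize_blocked = True          # refused while the view is still exported

view.release()                     # drop the alias
buf.write(b"0123456789")           # the same write now succeeds
final = buf.getvalue()
```

A getvalue() snapshot has neither property: it cannot mutate the stream and it never blocks a later write.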
The same boundary applies to general Reader / Writer protocols. Those may well make sense later for BytesIO, std.fs.File, temporary files, network bodies, or query adapters. But that is a cross-cutting stream-abstraction RFC, not part of the in-memory byte-stream contract itself. RFC 056 should finish the concrete BytesIO design rather than smuggling in a second library proposal.
Interaction with temporary storage
Spill-to-disk behavior does not belong in BytesIO. Python puts that concept in tempfile.SpooledTemporaryFile, not in io.BytesIO, and Incan keeps the same separation. RFC 056 is about pure in-memory streams; RFC 010 standardizes spooled temporary files in std.tempfile, aligned with BytesIO where practical without defining BytesIO as a magical disk-spilling stream.
Alternatives considered
- No std.io; use slicing and builtins only: too low-level and repetitive for real parsers.
- Fold BytesIO into std.fs: rejected because in-memory byte streams are not path-based filesystem APIs.
- Expose only a Rust-shaped Cursor API: exposes substrate vocabulary instead of an Incan-facing contract.
- Require separate struct-style unpacking for every numeric read and write: workable, but worse ergonomically for the common fixed-width cases.
- Include spill-to-disk behavior directly in BytesIO: rejected because storage policy and tempfile lifecycle are separate concerns better handled in std.tempfile.
Drawbacks
- std.io is a modest but real additional stdlib surface to maintain.
- The full numeric helper family creates a larger testing matrix than a tiny parser-only API would.
- Excluding buffer-view APIs means some zero-copy workflows will still need Rust interop or a later dedicated buffer abstraction.
Implementation architecture
(Non-normative.) A practical delivery implements BytesIO as a normal Incan stdlib type backed by Rust cursor and buffer primitives, with exact-width conversions delegated to Rust's primitive byte-conversion helpers. The public API should stay Incan-first even when the runtime maps directly onto Cursor<Vec<u8>>-like semantics underneath.
Implementation plan
- Stdlib module and registry wiring: Add std.io to the stdlib namespace registry and implement BytesIO as an authored stdlib module using direct Rust interop, following the std.fs layout rather than adding a custom Rust extern shim.
- Core byte and cursor behavior: Implement construction, read, read_exact, read_until, skip_until, tell, seek, rewind, seek_relative, write, truncate, getvalue, into_bytes, and remaining with the normative cursor and buffer semantics above.
- Numeric helper family: Implement the RFC 009-aligned trait-backed exact-width read/write overloads, including endian variants.
- Tests and snapshots: Add focused compile/run coverage for the stdlib module, cursor movement, EOF behavior, delimiter behavior, overwrite semantics, truncation, and numeric round trips. Add codegen or stdlib loader coverage where registry wiring can regress silently.
- User-facing docs and versioning: Add a curated std.io reference page, update stdlib navigation and release notes, regenerate generated language reference data when registry output changes, and bump the active development version for the implementation.
Layers affected
- Stdlib / runtime (incan_stdlib): new std.io module and the BytesIO type.
- Language surface: the module, constructor, and methods must be available as specified.
- Builtin numeric surface: numeric helper signatures depend on RFC 009's sized numeric types and aliases.
- LSP / tooling: completions and hovers for std.io.
- Docs / examples: binary parsing examples should use std.io.BytesIO instead of rust:: recipes for the common in-memory path.
Design Decisions
- BytesIO uses direct construction: BytesIO(data), not BytesIO.new(data).
- BytesIO is a writable, seekable, in-memory binary stream rather than a read-only parser cursor.
- The committed contract includes read_until, skip_until, rewind, seek_relative, truncate, getvalue, into_bytes, and remaining.
- Numeric overloads cover the full RFC 009 width family, with both endian families for multi-byte values and matching write helpers.
- Convenience overloads for int and float are deferred until alias overload emission has a compiler-level rule.
- BytesIO does not include close(), getbuffer(), spill-to-disk behavior, or generic Reader / Writer protocols.
Implementation log
- Add std.io namespace registration and authored stdlib source.
- Implement BytesIO construction, byte reads/writes, cursor movement, delimiter helpers, truncation, and buffer extraction.
- Implement trait-backed exact-width numeric read and write overloads for the RFC 009 integer and float family.
- Add tests for core stream semantics and numeric round trips.
- Add the curated std.io reference page and docs navigation updates.
- Update release notes and active development version.
- Regenerate generated language reference output after registry changes.
- Run the repo verification gate before closeout.