RFC 056: std.io - in-memory byte streams and binary parsing helpers
- Status: Planned
- Created: 2026-04-13
- Author(s): Danny Meijer (@dannymeijer)
- Related:
  - RFC 005 (Rust interop)
  - RFC 009 (sized integers)
  - RFC 010 (temporary filesystem objects)
  - RFC 022 (namespaced stdlib modules and compiler handoff)
  - RFC 023 (compilable stdlib and Rust module binding)
  - RFC 041 (first-class Rust interop authoring)
  - RFC 055 (`std.fs` path-centric filesystem APIs)
- Issue: https://github.com/dannys-code-corner/incan/issues/291
- RFC PR: —
- Written against: v0.2
- Shipped in: —
Summary
This RFC introduces std.io as Incan's in-memory binary I/O module. Its core abstraction is BytesIO, a readable, writable, and seekable byte stream over an in-memory bytes buffer. It targets binary parsing, protocol work, fixtures, and transformation pipelines that should not depend on filesystem paths. The user-facing shape is intentionally recognizable to Python users coming from io.BytesIO, while the underlying semantics also draw on Rust's Cursor and BufRead model where that yields a cleaner, more complete contract for cursor movement, delimiter-based reads, and exact-width numeric helpers.
Motivation
Not all binary input starts as a file on disk. Tests, network clients, generated fixtures, embedded assets, decompression stages, and parser pipelines often begin with a bytes value that is already in memory. Those users should not need to import std.fs or drop into rust::std::io::Cursor just to move a cursor, read exact byte counts, skip until a delimiter, or decode fixed-width numbers. std.io gives Incan a standard home for that in-memory story while staying separate from path and OS-file concerns. That separation matters for real binary formats: std.fs should get bytes into memory or stream them from a file, while std.io.BytesIO should make the parsing and re-encoding work itself possible in pure Incan.
Goals
- Provide a `BytesIO`-like type over `bytes` for in-memory binary parsing and rewriting.
- Standardize cursor, exact-read, delimiter-read, and overwrite semantics so users do not hand-roll slicing logic for every parser.
- Commit to a complete exact-width numeric read and write surface aligned with RFC 009, including both endian families for multi-byte values.
- Keep `std.io` independent from filesystem path APIs; users should be able to parse a `bytes` value without importing `std.fs`.
- Make the pure-Incan binary parsing story strong enough for real format readers such as GGUF-style metadata and tensor-descriptor parsers.
Non-Goals
- Standardizing filesystem file handles; that belongs to RFC 055.
- Defining async networking or async stream protocols here.
- Mirroring Python's entire `io` hierarchy.
- Introducing spill-to-disk behavior in `BytesIO`; spooled temporary storage belongs in `std.tempfile`, not here.
- Defining general `Reader` / `Writer` protocol families in this RFC.
- Reproducing Rust `Read` / `Seek` / `BufRead` trait names one-to-one as the user-facing surface.
Guide-level explanation
Authors use std.io.BytesIO when they already have a bytes value and want to parse or rewrite it incrementally.
from std.io import BytesIO
buf = BytesIO(data)
magic = buf.read_exact(4)?
version = buf.read_u32_le()?
metadata_count = buf.read_u64_le()?
payload = buf.read_until(0u8)?
remaining = buf.remaining()
BytesIO is also writable. Writes begin at the current cursor position and overwrite any bytes already there rather than appending; to append to existing content, the caller seeks to the end first.
from std.io import BytesIO
out = BytesIO()
out.write(b"GGUF")?
out.write_u32_le(3u32)?
out.write_u64_le(42u64)?
blob = out.into_bytes()
The mental model is: std.fs gets bytes into or out of files, while std.io walks through and rewrites bytes already in memory.
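For readers who think in Python, the two snippets above correspond roughly to the following round trip built on Python's `io.BytesIO` and `struct`. This is a Python analogue only, not the Incan API: the `read_exact` helper and the field values are illustrative assumptions.

```python
import io
import struct


def read_exact(stream: io.BytesIO, size: int) -> bytes:
    """Hypothetical stand-in for read_exact: fail on a short read."""
    data = stream.read(size)
    if len(data) != size:
        raise EOFError(f"wanted {size} bytes, got {len(data)}")
    return data


# Write side: magic bytes, a little-endian u32, a little-endian u64.
out = io.BytesIO()
out.write(b"GGUF")
out.write(struct.pack("<I", 3))
out.write(struct.pack("<Q", 42))
blob = out.getvalue()

# Read side: walk the same bytes back with a cursor.
buf = io.BytesIO(blob)
magic = read_exact(buf, 4)
(version,) = struct.unpack("<I", read_exact(buf, 4))
(metadata_count,) = struct.unpack("<Q", read_exact(buf, 8))
remaining = len(blob) - buf.tell()
```

The Incan helpers fold the `struct` format strings into the method names, which is the main ergonomic difference from the Python version.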
Reference-level explanation
Module split and compatibility target
- The standard library must expose `std.io` for in-memory byte-stream reading, writing, and cursor semantics.
- `std.io` is deliberately separate from `std.fs`: open OS-file handles belong to the filesystem module, while `BytesIO` operates on already-materialized `bytes`.
- The surface should be recognizable to Python users coming from `io.BytesIO`, but the committed contract is broader than Python's minimal cursor methods because Incan also standardizes explicit numeric parsing helpers.
- The committed numeric helper surface depends on RFC 009. Width-specific reads and writes must use the sized numeric vocabulary introduced there.
- Implementations may use Rust `std::io::Cursor`, `BufRead`, and primitive byte-conversion helpers internally, but user-visible semantics are defined by this RFC and stdlib docs, not by Rust trait names.
Required capabilities (committed contract)
The std.io contract commits to the following BytesIO surface:
- Direct construction: `BytesIO(initial: bytes = b"") -> BytesIO`.
- Byte reads: `read(size: int = -1) -> Result[bytes, E]`, `read_exact(size: int) -> Result[bytes, E]`.
- Delimiter helpers: `read_until(byte: u8) -> Result[bytes, E]`, `skip_until(byte: u8) -> Result[int, E]`.
- Cursor helpers: `tell() -> int`, `seek(offset: int, whence: int = 0) -> Result[int, E]`, `rewind() -> Result[(), E]`, `seek_relative(offset: int) -> Result[(), E]`.
- Byte writes: `write(data: bytes) -> Result[int, E]`, `truncate(size: int | None = None) -> Result[int, E]`.
- Buffer extraction and inspection: `getvalue() -> bytes`, `into_bytes() -> bytes`, `remaining() -> int`.
- Exact-width numeric reads and writes aligned with RFC 009.
Normative cursor and buffer semantics
Cursor behavior is normative:
- A newly constructed `BytesIO(initial)` starts with its cursor at position `0`.
- `read(size)` must return at most `size` bytes and must advance the cursor by the returned byte count.
- `read(size)` with `size = -1` must return the remaining bytes.
- `read(size)` at EOF must return an empty `bytes` value.
- `read_exact(size)` must fail if fewer than `size` bytes remain.
- `seek(offset, whence)` must follow the Python-style `whence` model: `0` for start, `1` for current position, and `2` for end.
- `rewind()` is the convenience form of seeking to the start of the buffer.
- `seek_relative(offset)` moves relative to the current cursor position and must fail if the resulting position would be invalid.
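Most of these cursor rules match what Python's `io.BytesIO` already does, so a Python session is a convenient way to see them in action. The sketch below is an analogue, not the Incan implementation; `read_exact` is a hypothetical helper, and where the cursor ends up after a failed exact read is a property of this sketch, not something the rules above specify.

```python
import io


def read_exact(stream: io.BytesIO, size: int) -> bytes:
    """Hypothetical stand-in for read_exact: fail on a short read."""
    data = stream.read(size)
    if len(data) < size:
        raise EOFError(f"read_exact({size}): only {len(data)} bytes remained")
    return data


buf = io.BytesIO(b"0123456789")
start_pos = buf.tell()        # a new stream starts at position 0
first = buf.read(4)           # at most 4 bytes; cursor advances by 4
pos_after_read = buf.tell()
rest = buf.read(-1)           # -1 means "everything that remains"
at_eof = buf.read(4)          # empty bytes at EOF, not an error

end_pos = buf.seek(-2, 2)     # whence=2: relative to the end
cur_pos = buf.seek(1, 1)      # whence=1: relative to the current position
abs_pos = buf.seek(3, 0)      # whence=0: absolute from the start

try:
    read_exact(buf, 100)      # only 7 bytes remain from position 3
    short_read_failed = False
except EOFError:
    short_read_failed = True
```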
Write behavior is also normative:
- `BytesIO` is always readable, writable, and seekable; separate `readable()` / `writable()` / `seekable()` predicates are not part of the committed surface.
- `write(data)` writes from the current cursor position. It does not imply append semantics unless the caller has already moved the cursor to the end.
- `write(data)` must either write the full byte slice or fail. Partial-write behavior is not part of the user-visible contract for an in-memory buffer.
- `truncate(size=None)` must shrink or extend the buffer to `size`; when `size` is omitted, it uses the current cursor position.
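The overwrite-versus-append distinction is easiest to see with a concrete buffer. The following Python `io.BytesIO` session is an analogue of the write rules above. It only demonstrates shrinking with `truncate`; the extension case is left out because the rules above do not say what the added bytes contain.

```python
import io

buf = io.BytesIO(b"AAAAAAAA")

# The cursor starts at 0, so this overwrites the first two bytes.
written = buf.write(b"BB")
after_overwrite = buf.getvalue()

# Appending requires moving the cursor to the end first.
buf.seek(0, io.SEEK_END)
buf.write(b"CC")
after_append = buf.getvalue()

# With no size argument, truncate cuts at the current cursor position.
buf.seek(4)
new_size = buf.truncate()
after_truncate = buf.getvalue()
```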
Delimiter behavior is normative:
- `read_until(byte)` must return bytes up to and including the delimiter when the delimiter is found.
- `read_until(byte)` must return the remaining bytes when EOF is reached before the delimiter.
- `skip_until(byte)` must discard bytes until the delimiter or EOF and return the total number of discarded bytes, including the delimiter when it is found.
- `read_until(byte)` and `skip_until(byte)` must return `0`-length / `0`-count results at EOF.
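The delimiter rules can be modeled in a few lines of Python over `io.BytesIO`. The functions below are a sketch of the described behavior, not the Incan implementation; reading one byte at a time is chosen for clarity, not speed.

```python
import io


def read_until(stream: io.BytesIO, delimiter: int) -> bytes:
    """Return bytes up to and including `delimiter`, or the rest at EOF."""
    out = bytearray()
    while True:
        chunk = stream.read(1)
        if not chunk:               # EOF before the delimiter
            break
        out += chunk
        if chunk[0] == delimiter:   # include the delimiter, then stop
            break
    return bytes(out)


def skip_until(stream: io.BytesIO, delimiter: int) -> int:
    """Discard through `delimiter` (or to EOF) and return the count."""
    return len(read_until(stream, delimiter))


stream = io.BytesIO(b"key\x00value\x00tail")
first = read_until(stream, 0)    # delimiter found and included
skipped = skip_until(stream, 0)  # "value" plus the NUL byte
tail = read_until(stream, 0)     # EOF reached before any delimiter
empty = read_until(stream, 0)    # already at EOF
zero = skip_until(stream, 0)     # already at EOF
```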
Buffer extraction behavior is normative:
- `getvalue()` returns a `bytes` snapshot of the buffer contents.
- `into_bytes()` consumes the `BytesIO` value and returns the underlying `bytes` without promising a copy.
- `remaining()` returns the number of unread bytes from the current cursor position to the logical end of the buffer.
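A small Python analogue shows what "snapshot" and "remaining" mean in practice. The `remaining` function is a hypothetical helper written for this sketch. The consuming `into_bytes()` has no direct Python counterpart and is not modeled here.

```python
import io


def remaining(stream: io.BytesIO) -> int:
    """Unread bytes between the cursor and the end of the buffer."""
    return max(0, len(stream.getvalue()) - stream.tell())


buf = io.BytesIO(b"abcdef")
buf.read(2)
left = remaining(buf)

# A snapshot is unaffected by later writes to the stream.
snapshot = buf.getvalue()
buf.seek(0)
buf.write(b"ZZ")
current = buf.getvalue()
```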
Numeric helper surface
The numeric helper surface is committed, not tentative:
- One-byte reads and writes: `read_u8() -> Result[u8, E]`, `read_i8() -> Result[i8, E]`, `write_u8(value: u8) -> Result[(), E]`, `write_i8(value: i8) -> Result[(), E]`.
- Unsigned integer reads and writes for `u16`, `u32`, `u64`, and `u128` in both endian families: `read_u16_le`, `read_u16_be`, `read_u32_le`, `read_u32_be`, and so on through `u128`; matching `write_u16_le`, `write_u16_be`, `write_u32_le`, `write_u32_be`, and so on through `u128`.
- Signed integer reads and writes for `i16`, `i32`, `i64`, and `i128` in both endian families: `read_i16_le`, `read_i16_be`, `read_i32_le`, `read_i32_be`, and so on; matching write helpers.
- Floating-point reads and writes for `f32` and `f64` in both endian families: `read_f32_le`, `read_f32_be`, `read_f64_le`, `read_f64_be`; matching write helpers.
- Endianness suffixes are not used for `u8` and `i8`, because byte order is meaningless for one-byte values.
- Convenience aliases for Incan defaults are part of the surface: `read_int_le`, `read_int_be`, `write_int_le`, `write_int_be`, `read_float_le`, `read_float_be`, `write_float_le`, and `write_float_be`.
- RFC 009 defines `int` as the ergonomic alias for `i64` and `float` as the ergonomic alias for `f64`, so those convenience helpers are aliases for the corresponding `i64` and `f64` forms rather than independent numeric contracts.
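To make the `_le` / `_be` suffixes concrete, the Python sketch below decodes the same bytes under both byte orders. It uses `struct` for the widths `struct` supports and `int.from_bytes` for the 128-bit case. It illustrates the byte-order semantics only and says nothing about how the Incan helpers are implemented.

```python
import struct

raw = bytes([0x01, 0x02, 0x03, 0x04])

# Same four bytes, two interpretations (analogous to read_u32_le / read_u32_be).
u32_le = struct.unpack("<I", raw)[0]
u32_be = struct.unpack(">I", raw)[0]

# Signed 16-bit little-endian: bytes FE FF are 0xFFFE, which is -2.
i16_le = struct.unpack("<h", b"\xfe\xff")[0]

# Floats round-trip through their fixed-width encoding.
f32_bytes = struct.pack("<f", 1.5)
f32_le = struct.unpack("<f", f32_bytes)[0]

# struct has no 128-bit format code; int.from_bytes covers that width.
raw16 = bytes(range(16))
u128_le = int.from_bytes(raw16, "little")
u128_be = int.from_bytes(raw16, "big")
u128_roundtrip = u128_le.to_bytes(16, "little")
```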
Expected API shape (skeletal)
BytesIO
- `BytesIO(initial: bytes = b"") -> BytesIO`.
- `read(size: int = -1) -> Result[bytes, E]`.
- `read_exact(size: int) -> Result[bytes, E]`.
- `read_until(byte: u8) -> Result[bytes, E]`.
- `skip_until(byte: u8) -> Result[int, E]`.
- `tell() -> int`.
- `seek(offset: int, whence: int = 0) -> Result[int, E]`.
- `rewind() -> Result[(), E]`.
- `seek_relative(offset: int) -> Result[(), E]`.
- `write(data: bytes) -> Result[int, E]`.
- `truncate(size: int | None = None) -> Result[int, E]`.
- `getvalue() -> bytes`.
- `into_bytes() -> bytes`.
- `remaining() -> int`.
- `read_u8() -> Result[u8, E]`, `read_i8() -> Result[i8, E]`.
- `read_u16_le() -> Result[u16, E]`, `read_u16_be() -> Result[u16, E]`, and corresponding helpers for `u32`, `u64`, and `u128`.
- `read_i16_le() -> Result[i16, E]`, `read_i16_be() -> Result[i16, E]`, and corresponding helpers for `i32`, `i64`, and `i128`.
- `read_f32_le() -> Result[f32, E]`, `read_f32_be() -> Result[f32, E]`, `read_f64_le() -> Result[f64, E]`, `read_f64_be() -> Result[f64, E]`.
- Matching `write_*` helpers for every committed numeric read helper.
- `read_int_le() -> Result[int, E]`, `read_int_be() -> Result[int, E]`, `read_float_le() -> Result[float, E]`, `read_float_be() -> Result[float, E]`.
- `write_int_le(value: int) -> Result[(), E]`, `write_int_be(value: int) -> Result[(), E]`, `write_float_le(value: float) -> Result[(), E]`, `write_float_be(value: float) -> Result[(), E]`.
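The "and corresponding helpers" shorthand above expands mechanically. As a rough aid for sizing documentation and tests, the Python sketch below enumerates the numeric helper names implied by the naming pattern in this RFC. The function name `numeric_helper_names` is made up for this illustration.

```python
def numeric_helper_names() -> list[str]:
    """Enumerate read/write helper names following the RFC's naming pattern."""
    names = []
    families = (
        ("u", (16, 32, 64, 128)),   # unsigned multi-byte integers
        ("i", (16, 32, 64, 128)),   # signed multi-byte integers
        ("f", (32, 64)),            # floating point
    )
    for op in ("read", "write"):
        # One-byte helpers carry no endianness suffix.
        names += [f"{op}_u8", f"{op}_i8"]
        for kind, widths in families:
            for width in widths:
                for endian in ("le", "be"):
                    names.append(f"{op}_{kind}{width}_{endian}")
        # Convenience aliases for the default int and float types.
        for alias in ("int", "float"):
            for endian in ("le", "be"):
                names.append(f"{op}_{alias}_{endian}")
    return names


names = numeric_helper_names()
total = len(names)
```

Under this reading the numeric surface is 26 read helpers and 26 write helpers, 52 names in total.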
Errors and compatibility
- Operations must surface failure through ordinary `Result` returns unless a helper is explicitly documented otherwise.
- Error payloads should be actionable, including at minimum the failed operation, the requested size or delimiter where relevant, and the cursor position when that improves debugging.
- This RFC is additive. It does not change existing filesystem or builtin contracts.
- `std.io` helpers must not require `rust::` knowledge in ordinary documentation or examples.
- If RFC 009's sized numeric model changes materially before implementation, the width-specific helper signatures in this RFC must be updated to match that final language contract rather than silently drifting.
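One way to picture an "actionable" error payload is sketched below in Python. The `ReadError` class, its field names, and the message format are all hypothetical; the RFC leaves the concrete error type `E` open. The sketch also checks availability before reading so the cursor does not move on failure, which is a choice of this example and not a stated requirement.

```python
import io


class ReadError(Exception):
    """Hypothetical error payload: operation, requested size, cursor position."""

    def __init__(self, operation: str, requested: int, position: int, available: int):
        self.operation = operation
        self.requested = requested
        self.position = position
        self.available = available
        super().__init__(
            f"{operation}: requested {requested} bytes at position {position}, "
            f"{available} available"
        )


def read_exact(stream: io.BytesIO, size: int) -> bytes:
    """Read exactly `size` bytes or raise ReadError without moving the cursor."""
    position = stream.tell()
    available = max(0, len(stream.getvalue()) - position)
    if size > available:
        raise ReadError("read_exact", size, position, available)
    return stream.read(size)


buf = io.BytesIO(b"abc")
buf.read(1)
try:
    read_exact(buf, 8)
    error = None
except ReadError as exc:
    error = exc
message = str(error)
```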
Design details
Why std.io is separate from std.fs
BytesIO solves a different problem than file handles. It helps when the bytes are already in memory. That includes tests, network payloads, decompressed buffers, and parser stages after a file has already been read. Keeping std.io separate avoids turning the filesystem module into a generic "everything binary" bucket.
Python-shaped surface, Rust-backed semantics
The surface should feel familiar to Python users: BytesIO(data), read, write, tell, seek, and getvalue are all recognizable from io.BytesIO. But the substrate is Rust, and Rust gives a few extra semantics that are worth standardizing instead of hiding.
Rust's Cursor model is the reason BytesIO should be treated as a real writable stream instead of a read-only parser shim. A new cursor starts at the beginning, not the end, and writes overwrite from the current cursor position rather than implying append behavior. Rust also makes rewind() and seek_relative(...) natural convenience operations, so Incan should expose them instead of forcing callers to encode every cursor move through raw seek(...) calls.
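Both convenience operations reduce to plain `seek` calls, as the Python sketch below suggests. The helper functions are hypothetical, and "invalid position" is interpreted here as "negative" only; whether seeking past the end counts as invalid is not settled by the text above.

```python
import io


def rewind(stream: io.BytesIO) -> None:
    """Convenience form of seeking to the start of the buffer."""
    stream.seek(0, io.SEEK_SET)


def seek_relative(stream: io.BytesIO, offset: int) -> None:
    """Move relative to the cursor; reject a negative resulting position."""
    target = stream.tell() + offset
    if target < 0:
        raise ValueError(f"seek_relative({offset}) would move to {target}")
    stream.seek(target, io.SEEK_SET)


buf = io.BytesIO(b"0123456789")
buf.read(6)
seek_relative(buf, -4)          # from 6 back to 2
pos_after_relative = buf.tell()
two_bytes = buf.read(2)

rewind(buf)
pos_after_rewind = buf.tell()

try:
    seek_relative(buf, -1)      # would land before the start
    rejected = False
except ValueError:
    rejected = True
pos_after_failure = buf.tell()
```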
Rust's BufRead model also gives a strong case for delimiter-based helpers. read_until and skip_until are not esoteric parser machinery; they are the simple, direct way to handle NUL-terminated strings, line-like records, or bounded marker scans inside binary formats. They belong in a real in-memory binary I/O contract.
Why the numeric helper surface is broad
Once std.io commits to exact-width numeric parsing, arbitrary gaps in the supported set become hard to defend. Supporting only little-endian reads or only a couple of widths would leave the API lopsided for no principled reason. The Rust substrate already supports endian-aware conversion for the full sized-integer and sized-float family, and RFC 009 already defines that vocabulary at the language level. The coherent design is therefore: full width family, both endian families for multi-byte values, and matching write helpers.
The default int and float aliases do still matter ergonomically, so read_int_le / read_int_be and read_float_le / read_float_be are good additions. But they are just shorthand over the exact-width forms, not a second independent numeric model.
Why getbuffer() and generic protocols stay out
Python's BytesIO.getbuffer() exposes a mutable view over the underlying buffer. That is powerful, but it also introduces aliasing and resize constraints that are not worth standardizing before Incan has a broader borrowed-buffer story. This RFC therefore keeps the safe extraction surface small: getvalue() for a snapshot and into_bytes() for ownership transfer.
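The aliasing and resize constraints mentioned above can be observed directly in Python. The session below uses only documented `io.BytesIO` behavior and is meant to show what Incan is declining to standardize, not a proposed API.

```python
import io

buf = io.BytesIO(b"abcdef")
view = buf.getbuffer()

# Aliasing: writing through the view mutates the stream's contents.
view[0:2] = b"zz"
aliased = buf.getvalue()

# Resize constraint: while the view is alive, the buffer cannot grow.
buf.seek(0, io.SEEK_END)
try:
    buf.write(b"more")
    blocked = False
except BufferError:
    blocked = True

# Releasing the view lifts the restriction.
view.release()
buf.write(b"more")
final = buf.getvalue()
```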
The same boundary applies to general Reader / Writer protocols. Those may well make sense later for BytesIO, std.fs.File, temporary files, network bodies, or query adapters. But that is a cross-cutting stream-abstraction RFC, not part of the in-memory byte-stream contract itself. RFC 056 should finish the concrete BytesIO design rather than smuggling in a second library proposal.
Interaction with temporary storage
Spill-to-disk behavior does not belong in BytesIO. Python puts that concept in tempfile.SpooledTemporaryFile, not in io.BytesIO, and Incan should keep the same separation. RFC 056 is about pure in-memory streams. If Incan later standardizes spooled temporary files in RFC 010, that type should align with BytesIO where practical without being defined as a magical disk-spilling BytesIO.
Alternatives considered
- No `std.io`; use slicing and builtins only: too low-level and repetitive for real parsers.
- Fold `BytesIO` into `std.fs`: rejected because in-memory byte streams are not path-based filesystem APIs.
- Expose only a Rust-shaped `Cursor` API: exposes substrate vocabulary instead of an Incan-facing contract.
- Require separate `struct`-style unpacking for every numeric read and write: workable, but worse ergonomically for the common fixed-width cases.
- Include spill-to-disk behavior directly in `BytesIO`: rejected because storage policy and tempfile lifecycle are separate concerns better handled in `std.tempfile`.
Drawbacks
- `std.io` is a modest but real additional stdlib surface to maintain.
- The full numeric helper family creates a larger testing matrix than a tiny parser-only API would.
- Excluding buffer-view APIs means some zero-copy workflows will still need Rust interop or a later dedicated buffer abstraction.
Implementation architecture
(Non-normative.) A practical delivery implements BytesIO as a normal Incan stdlib type backed by Rust cursor and buffer primitives, with exact-width conversions delegated to Rust's primitive byte-conversion helpers. The public API should stay Incan-first even when the runtime maps directly onto Cursor<Vec<u8>>-like semantics underneath.
Layers affected
- Stdlib / runtime (`incan_stdlib`): new `std.io` module and the `BytesIO` type.
- Language surface: the module, constructor, and methods must be available as specified.
- Builtin numeric surface: numeric helper signatures depend on RFC 009's sized numeric types and aliases.
- LSP / tooling: completions and hovers for `std.io`.
- Docs / examples: binary parsing examples should use `std.io.BytesIO` instead of `rust::` recipes for the common in-memory path.
Design Decisions
- `BytesIO` uses direct construction: `BytesIO(data)`, not `BytesIO.new(data)`.
- `BytesIO` is a writable, seekable, in-memory binary stream rather than a read-only parser cursor.
- The committed contract includes `read_until`, `skip_until`, `rewind`, `seek_relative`, `truncate`, `getvalue`, `into_bytes`, and `remaining`.
- Numeric helpers cover the full RFC 009 width family, with both endian families for multi-byte values and matching write helpers.
- Convenience aliases for `int` and `float` are included, but the exact-width forms remain canonical.
- `BytesIO` does not include `close()`, `getbuffer()`, spill-to-disk behavior, or generic `Reader` / `Writer` protocols.