RFC 030: std.collections — extended collection types¶
- Status: Draft
- Created: 2026-03-06
- Author(s): Danny Meijer (@dannymeijer)
- Related: RFC 022 (stdlib namespacing), RFC 023 (compilable stdlib), RFC 028 (operator overloading)
- Issue: #164
- RFC PR: —
- Written against: v0.2
- Shipped in: —
Summary¶
Introduce std.collections as Incan's standard library namespace for non-builtin container types that are common, user-facing, and semantically distinct from List, Dict, Set, and Tuple. The north-star module includes queue, multiset, ordered-map, ordered-set, sorted-map, sorted-set, default-valued map, layered-map, and priority-queue surfaces: Deque[T], Counter[T], DefaultDict[K, V], OrderedDict[K, V], OrderedSet[T], SortedDict[K, V], SortedSet[T], ChainMap[K, V], and PriorityQueue[T]. These are ordinary stdlib types under the RFC 022 / RFC 023 model: imported explicitly, specified in Incan-facing terms, and backed by Rust implementations where that is the practical runtime strategy.
This RFC is not a proposal to turn every interesting Rust or Python container into a builtin or direct re-export. It is a commitment that Incan should provide a coherent user-facing collections module with first-class, batteries-included types for the collection shapes that repeatedly matter in real application, analytics, and tooling code.
Core model¶
std.collections sits above the builtin collection floor:
- builtins remain the default, always-available containers for general-purpose code
std.collectionsprovides opt-in specialized containers with distinct semantics- these types are library types, not parser keywords or compiler primitives
- the public contract is Pythonic at the surface where that improves DX, while still taking advantage of Rust-backed implementations where they are clearly stronger
The intended north-star module surface is:
Deque[T]: efficient double-ended queueCounter[T]: multiset / counted occurrencesDefaultDict[K, V]: mapping with default-value behavior on missing-key accessOrderedDict[K, V]: insertion-ordered mappingOrderedSet[T]: insertion-ordered setSortedDict[K, V]: key-sorted mapping with deterministic order and range-oriented behaviorSortedSet[T]: value-sorted set with deterministic order and range-oriented behaviorChainMap[K, V]: layered lookup across multiple maps and record-like layersPriorityQueue[T]: heap-backed priority queue
Motivation¶
Incan's builtins cover the common floor:
| Builtin | Rust backing | Mutable? |
|---|---|---|
List[T] |
Vec<T> |
Yes |
Dict[K, V] |
HashMap<K, V> |
Yes |
Set[T] |
HashSet<T> |
Yes |
Tuple[A, B, ...] |
(A, B, ...) |
Immutable |
FrozenList[T] |
Vec<T> (immutable API) |
No |
FrozenSet[T] |
HashSet<T> (immutable API) |
No |
FrozenDict[K, V] |
HashMap<K, V> (immutable API) |
No |
But many real programs need specialized collection semantics that are still too common to leave to ad hoc userland wrappers:
# Counting occurrences — today requires manual Dict bookkeeping
word_counts: Dict[str, int] = {}
for word in words:
if word in word_counts:
word_counts[word] += 1
else:
word_counts[word] = 1
# With std.collections:
from std.collections import Counter
word_counts = Counter.from_iter(words) # Done.
# Queue with efficient push/pop from both ends
from std.collections import Deque
queue: Deque[str] = Deque()
queue.push_back("first")
queue.push_front("urgent")
item = queue.pop_front() # "urgent"
# Layered config / overlay lookup
from std.collections import ChainMap
defaults = {"region": "eu-west-1", "retries": 3}
override = {"region": "us-east-1"}
cfg = ChainMap(override, defaults)
print(cfg["region"]) # "us-east-1"
print(cfg["retries"]) # 3
# Deterministic sorted keys with range-oriented traversal
from std.collections import SortedDict
prices = SortedDict({
"apple": 2.40,
"banana": 1.80,
"pear": 2.10,
})
for key, value in prices.items():
print(key, value)
Python's collections module proves the user demand, while Rust's std::collections and adjacent ecosystem give us stronger backing choices for ordered, sorted, and queue-like structures. Incan should not mirror either language blindly. It should standardize the collection types that actually make the language feel complete for systems, data, and library work.
Goals¶
- Provide a coherent north-star
std.collectionsmodule rather than a narrow two-type sketch. - Keep builtins and specialized containers clearly separated.
- Make Pythonic surfaces first-class where that improves usability.
- Use Rust-backed implementations where they materially improve correctness, determinism, or performance.
- Include ordered and sorted collection families explicitly rather than forcing everything through future
Dictredesign speculation. - Support layered lookup as a general collection utility, including model-heavy Incan workflows.
Non-Goals¶
- Making these collection types compiler builtins.
- Re-exporting Rust container APIs wholesale.
- Freezing every method and convenience helper down to the last alias in this draft.
- Solving the entire future
Dict/FrozenDictredesign space here. - Settling every comparison-policy detail for
PriorityQueue[T]in this draft.
Guide-level explanation (how users think about it)¶
Importing collection types¶
Collection types in this RFC live under std.collections:
from std.collections import Counter, Deque, OrderedDict, SortedSet
Design principle: specialized collection semantics, not near-duplicate builtins¶
Users should reach for std.collections when the semantics are the point:
- use
Dequewhen both ends matter - use
Counterwhen counts are the data model - use
DefaultDictwhen missing-key defaulting is intentional - use
OrderedDict/OrderedSetwhen insertion order is meaningful and stable - use
SortedDict/SortedSetwhen sorted order and range traversal matter - use
ChainMapwhen layered lookup is the model - use
PriorityQueuewhen heap semantics are the model
The builtin collections remain the right default when none of those semantics matter.
Deque[T]¶
Incan bridges the Python and Rust worlds. Where method naming conventions diverge sharply between the two, std.collections may offer both as aliases. Deque is the clearest case:
| Python convention | Rust convention | Both work in Incan |
|---|---|---|
deque.append(x) |
push_back(x) |
deque.append(x) / deque.push_back(x) |
deque.appendleft(x) |
push_front(x) |
deque.appendleft(x) / deque.push_front(x) |
deque.pop() |
pop_back() |
deque.pop() / deque.pop_back() |
deque.popleft() |
pop_front() |
deque.popleft() / deque.pop_front() |
Aliases are true synonyms. Neither spelling is deprecated. This RFC does not assume every collection type needs dual Python/Rust naming; Deque gets it because both ecosystems use sharply different, equally common names for the same operations.
from std.collections import Deque
tasks: Deque[str] = Deque()
tasks.append("low priority")
tasks.appendleft("urgent")
next_task = tasks.popleft()
Counter[T]¶
Counter is the collection you use when multiplicity matters:
from std.collections import Counter
counts = Counter.from_iter(["apple", "banana", "apple"])
print(counts["apple"]) # 2
print(counts.most_common(1)) # [("apple", 2)]
DefaultDict[K, V]¶
DefaultDict is a first-class type in Incan, not a postponed Dict redesign idea. It exists because missing-key default behavior is semantically meaningful and common enough to deserve its own name:
from std.collections import DefaultDict
groups = DefaultDict[List[str]](...)
groups["a"].append("x")
The exact constructor/default-factory contract is still open in this draft, but the type itself is not.
OrderedDict[K, V] and OrderedSet[T]¶
These preserve insertion order as part of the contract:
from std.collections import OrderedDict
headers = OrderedDict()
headers["x-request-id"] = "abc"
headers["content-type"] = "application/json"
Stable insertion order matters for display, deterministic serialization, protocol-shaped data, and user-facing tooling. Incan should expose that explicitly instead of pretending ordinary hash maps are enough for every case.
SortedDict[K, V] and SortedSet[T]¶
These are key-sorted / value-sorted collections with deterministic ordering and range-friendly behavior:
from std.collections import SortedSet
ids = SortedSet([5, 2, 9, 2])
for id in ids:
print(id) # 2, 5, 9
This is one place where Rust gives Incan a better north-star than Python's stdlib alone. Sorted collections are common enough in analytics and deterministic processing to deserve first-class support.
ChainMap[K, V]¶
ChainMap is the general layered-lookup collection:
from std.collections import ChainMap
cfg = ChainMap({"region": "us-east-1"}, {"region": "eu-west-1", "retries": 3})
print(cfg["region"]) # "us-east-1"
print(cfg["retries"]) # 3
In Incan, ChainMap must also work sensibly with model-heavy code. A model layer participates through a field-overlay view:
model Defaults:
region: str = "eu-west-1"
retries: int = 3
cfg = ChainMap({"region": "us-east-1"}, Defaults())
That does not mean models become dicts. It means ChainMap supports both mapping layers and record-like layers intentionally.
PriorityQueue[T]¶
PriorityQueue belongs in the module scope. Heap semantics are distinct enough from List/Deque to justify a first-class type. The remaining open design work is the exact ordering contract, not whether the type belongs here.
Reference-level explanation (precise rules)¶
Namespace registration¶
std.collections must remain an ordinary stdlib namespace under the RFC 022 / RFC 023 model. It is not a compiler keyword surface, and its types should be imported explicitly rather than treated as global builtins.
Public type set¶
The north-star public surface standardized by this RFC is:
Deque[T]Counter[T]DefaultDict[K, V]OrderedDict[K, V]OrderedSet[T]SortedDict[K, V]SortedSet[T]ChainMap[K, V]PriorityQueue[T]
Additional collection types may be added later, but the module should already read as a complete, deliberate contract rather than a two-type placeholder.
Interaction with existing features¶
- Builtins:
std.collectionstypes are distinct from builtins.List/Dict/Setremain the always-available default containers. - Frozen builtins:
FrozenList,FrozenDict, andFrozenSetremain builtin/foundation surfaces. This RFC does not relocate them. - Generics: All
std.collectionstypes are generic where appropriate and follow the normal builtin generic rules. - Iteration: All collection types participate in ordinary
for-loop iteration through standard collection protocols. - Serialization: Ordered and sorted collections must preserve their defined order in any order-sensitive serialization or display surface; unordered collections need not.
- Models and records:
ChainMapmay accept record/model layers via a field-overlay view. Those layers are read-only inChainMapunless a separate mutable-record contract is standardized later.
Semantics by type¶
Deque[T]¶
- double-ended queue semantics are the point of the type
- both Python-style and Rust-style end-operation names are true aliases
- iteration order is front-to-back
- indexed access is allowed, but random access is not the motivation for the type
- Rust backing:
VecDeque<T>
Counter[T]¶
Counter[T]models counted membership rather than plain set membership- missing keys read as zero
- the type supports
update,subtract,most_common,total, and element expansion helpers - arithmetic-style combination belongs naturally on the type
- the exact count-sign contract remains open in this draft
DefaultDict[K, V]¶
- missing-key access materializes and stores a default value according to the collection's configured defaulting rule
DefaultDictis a distinct public type, not merely documentation sugar forDict- ordinary
Dictremains non-defaulting
OrderedDict[K, V] and OrderedSet[T]¶
- insertion order is preserved by iteration and order-sensitive serialization
- reinserting an existing key/value does not create a duplicate entry
- Rust backing:
IndexMap/IndexSetor equivalent ordered hash collections
SortedDict[K, V] and SortedSet[T]¶
- iteration order is sorted order
- the types support order-aware traversal and range-oriented operations
- keys/values must satisfy the language's ordering requirements for sorted collections
- Rust backing:
BTreeMap/BTreeSet
ChainMap[K, V]¶
- lookup walks layers from first to last; earlier layers override later layers
- writes go to the first writable mapping layer by default
- mapping layers are ordinary map-like collections
- model/record layers participate through field names
- model/record layers are read-only in this RFC
- nested models are not flattened automatically
- this is a general collection utility;
ctxmay use aChainMap-like overlay internally, butChainMapis not defined in terms ofctx
PriorityQueue[T]¶
- heap semantics are the point of the type
- the underlying backing is heap-based (
BinaryHeapor equivalent) - the exact public ordering contract remains open in this draft
Compatibility / migration¶
This RFC is additive. It introduces new opt-in stdlib types under a new namespace and does not change the meaning of builtin collection types.
Design details¶
Python and Rust influence¶
The module should be designed from Incan's point of view, not as a copy of either source ecosystem:
- Python gives the strongest user-facing intuition for
Deque,Counter,DefaultDict,OrderedDict, andChainMap - Rust gives stronger backing choices and vocabulary for
VecDeque,BTreeMap,BTreeSet,BinaryHeap, and theIndexMap/IndexSetfamily - Incan should standardize the public semantics that make sense, then back them with Rust implementations where that is the cleanest runtime strategy
Why separate map and set types still make sense¶
This RFC does not accept the idea that ordered/default/sorted map behavior must wait for a future monolithic Dict redesign. Those collection semantics are important enough, and common enough, that distinct first-class stdlib types are justified now. They are not near-duplicate noise; they are honest semantic distinctions:
- ordinary hash map semantics
- defaulting map semantics
- insertion-ordered map semantics
- sorted map semantics
Trying to force all of that into one future Dict redesign would leave the current collections story underpowered for no real gain.
Why ChainMap belongs here¶
ChainMap is not just "proto-ctx". It is a general collection for layered lookup. But RFC 033 is still relevant: the precedence intuition should align. Earlier layers override later layers. The cleaner dependency direction is that ctx may use a ChainMap-like overlay internally, not that ChainMap is defined in terms of ctx.
Alternatives considered¶
Keep the RFC intentionally narrow¶
Keep std.collections limited to Deque and Counter. Rejected because it reads like a cautious implementation sketch rather than a credible north-star collections module.
Force ordered/default behavior into future Dict redesign only¶
Rejected because it postpones clearly useful collection semantics behind a more abstract future design question. Distinct public types are justified here.
Make all collection types builtins¶
Rejected because these are specialized containers, not global-language defaults.
Re-export Rust collections directly¶
Rejected because that would leak Rust names and backend details into the public contract instead of giving Incan a deliberate collections story.
Leave all specialized collections to third-party libraries¶
Rejected because this is basic language completeness territory, not an exotic ecosystem extension.
Drawbacks¶
- Stdlib surface area grows materially.
- Some collection families overlap conceptually with builtins, so documentation quality matters.
ChainMapbecomes more subtle once model/record layers are supported explicitly.PriorityQueue[T]is in scope before its ordering contract is fully settled, so the draft must stay honest about that unresolved point.
Layers affected¶
- Stdlib registry —
std.collectionsremains a registered stdlib namespace. - Stdlib source — public Incan-facing declarations, docs, examples, and protocol integration for the full type set.
- Stdlib runtime — Rust-backed implementations over
VecDeque,HashMap,IndexMap,IndexSet,BTreeMap,BTreeSet,BinaryHeap, and equivalent runtime structures where appropriate. - Typechecker / protocol surface — generic collection typing, iteration behavior, ordering constraints for sorted collections, and record-layer participation in
ChainMap. - Serialization / docs / tooling — deterministic-order behavior for ordered and sorted types must be documented and surfaced consistently in docs, completions, and examples.
Design Decisions¶
std.collectionsis a full north-star module, not a narrowDeque + Counterplaceholder.DefaultDict,OrderedDict, andOrderedSetare first-class public types in this RFC.SortedDictandSortedSetbelong in the stdlib surface and should be backed by sorted-tree structures.ChainMapremains in scope and should align with RFC 033's precedence intuition without being defined in terms ofctx.ChainMapsupports both mapping layers and record/model layers; record/model layers are read-only field overlays in this RFC.PriorityQueue[T]remains in scope even though one core policy question is still unresolved.NamedTupleis out of scope; Incanmodelalready covers the named-record use case better.FrozenDequeis out of scope for now; this RFC is about specialized mutable/runtime collection semantics first.
Unresolved questions¶
- What is the public ordering contract for
PriorityQueue[T]: max-first only, min-first only, or a construction-time policy? - Should
Counter[T]counts be fully signed, or should the core contract remain non-negative except where arithmetic helpers explicitly produce negatives? - What is the best first-class construction contract for
DefaultDict[K, V]: default value, default factory, or both?