Chapter 44Collections And Algorithms

Collections and Algorithms

Overview

With the stdlib index providing your map, you now dive into Zig’s collection types—the workhorses of data manipulation. This chapter explores dynamic arrays (ArrayList), hash tables (HashMap and variants), priority structures (PriorityQueue), linked lists, specialized containers like MultiArrayList and SegmentedList, and sorting algorithms (see array_list.zig and hash_map.zig). Each collection embraces Zig’s explicit allocator model, giving you control over memory lifetime and enabling leak detection during testing.

Unlike languages with implicit garbage collection, Zig collections require you to deinit() or transfer ownership explicitly. This discipline, combined with the standard library’s rich suite of adapters (unmanaged variants, sentinel-aware slices, custom contexts), makes collections both powerful and predictable. By chapter’s end, you’ll confidently choose the right structure for your use case and understand the performance trade-offs inherent in each design (see sort.zig).

Learning Goals

  • Use ArrayList(T) for dynamic arrays: append, insert, remove, iterate, and understand reallocation strategies.
  • Employ HashMap and AutoHashMap for key-value lookups with custom hash and equality functions.
  • Leverage PriorityQueue for min/max heap operations and understand comparison contexts (see priority_queue.zig).
  • Apply std.sort for in-place sorting with stable and unstable algorithms (pdqsort, block sort, insertion sort).
  • Recognize specialized structures: MultiArrayList for structure-of-arrays layout, SegmentedList for stable pointers, linked lists for intrusive designs (see multi_array_list.zig and segmented_list.zig).
  • Appreciate allocator impact: how collection growth triggers reallocation and how arenas simplify bulk-free patterns (see 10).

ArrayList: Dynamic Arrays

ArrayList(T) is Zig’s foundational growable array, analogous to C++'s std::vector or Rust’s Vec<T>. It manages a contiguous slice of T values, expanding capacity as needed. You interact with .items (the current slice) and call methods like append, pop, insert, and remove.

Basic Operations

Create an ArrayList by specifying the element type and passing an allocator. Call deinit() when done to free the backing memory.

Zig
const std = @import("std");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    var list: std.ArrayList(i32) = .empty;
    defer list.deinit(allocator);

    try list.append(allocator, 10);
    try list.append(allocator, 20);
    try list.append(allocator, 30);

    for (list.items, 0..) |item, i| {
        std.debug.print("Item {d}: {d}\n", .{ i, item });
    }

    const popped = list.pop();
    std.debug.print("Popped: {d}\n", .{popped.?});
    std.debug.print("Remaining length: {d}\n", .{list.items.len});
}
Build
Shell
$ zig build-exe arraylist_basic.zig
Run
Shell
$ ./arraylist_basic
Output
Shell
Item 0: 10
Item 1: 20
Item 2: 30
Popped: 30
Remaining length: 2

ArrayList doubles capacity when full (exponential growth), amortizing reallocation cost. You can pre-allocate with try list.ensureTotalCapacity(allocator, n) if you know the final size.

Ownership and Unmanaged Variants

By default, ArrayList(T) stores its allocator internally (managed variant). For more explicit control, use the unmanaged form by accessing .items and .capacity directly, or use the deprecated Unmanaged APIs. The modern pattern is to use the simpler managed form unless you need to decouple allocation from the list itself.

Zig
const std = @import("std");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    // ArrayList with explicit allocator
    var list: std.ArrayList(u32) = .empty;
    defer list.deinit(allocator);

    try list.append(allocator, 1);
    try list.append(allocator, 2);
    try list.append(allocator, 3);

    std.debug.print("Managed list length: {d}\n", .{list.items.len});

    // Transfer ownership to a slice
    const owned_slice = try list.toOwnedSlice(allocator);
    defer allocator.free(owned_slice);

    std.debug.print("After transfer, original list length: {d}\n", .{list.items.len});
    std.debug.print("Owned slice length: {d}\n", .{owned_slice.len});
}
Build and Run
Shell
$ zig build-exe arraylist_ownership.zig && ./arraylist_ownership
Output
Shell
Managed list length: 3
After transfer, original list length: 0
Owned slice length: 3

toOwnedSlice() empties the list and returns the backing memory as a slice—you become responsible for freeing it with allocator.free(slice).

Insertion and Removal

Beyond append and pop, ArrayList supports mid-array operations. orderedRemove maintains element order (shifts subsequent elements), while swapRemove is O(1) but doesn’t preserve order (swaps with the last element).

Zig
const std = @import("std");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    var list: std.ArrayList(i32) = .empty;
    defer list.deinit(allocator);

    try list.appendSlice(allocator, &.{ 1, 2, 3, 4 });

    // Insert 99 at index 1
    try list.insert(allocator, 1, 99);
    std.debug.print("After insert at 1: {any}\n", .{list.items});

    // Remove at index 2 (shifts elements)
    _ = list.orderedRemove(2);
    std.debug.print("After orderedRemove at 2: {any}\n", .{list.items});

    // Remove at index 1 (swaps with last, no shift)
    _ = list.swapRemove(1);
    std.debug.print("After swapRemove at 1: {any}\n", .{list.items});
}
Build and Run
Shell
$ zig build-exe arraylist_insert_remove.zig && ./arraylist_insert_remove
Output
Shell
After insert at 1: [1, 99, 2, 3, 4]
After orderedRemove at 2: [1, 99, 3, 4]
After swapRemove at 1: [1, 4, 3]

orderedRemove is O(n) in the worst case (removing the first element requires shifting all others); use swapRemove when order doesn’t matter for O(1) performance.

HashMap: Key-Value Lookups

Zig’s hash map family provides O(1) average-case lookups via open addressing and linear probing. HashMap(K, V, Context, max_load_percentage) requires a context with hash and eql functions. For convenience, AutoHashMap auto-generates these for hashable types, and StringHashMap specializes for []const u8 keys.

StringHashMap Basics

For string keys ([]const u8), use StringHashMap(V) which provides optimized string hashing. Note that AutoHashMap does not support slice types like []const u8 to avoid ambiguity—use StringHashMap instead.

Zig
const std = @import("std");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    var map = std.StringHashMap(i32).init(allocator);
    defer map.deinit();

    try map.put("foo", 42);
    try map.put("bar", 100);

    if (map.get("foo")) |value| {
        std.debug.print("Value for 'foo': {d}\n", .{value});
    }

    std.debug.print("Contains 'bar': {}\n", .{map.contains("bar")});
    std.debug.print("Contains 'baz': {}\n", .{map.contains("baz")});

    _ = map.remove("foo");
    std.debug.print("After removing 'foo', contains: {}\n", .{map.contains("foo")});
}
Build and Run
Shell
$ zig build-exe hashmap_basic.zig && ./hashmap_basic
Output
Shell
Value for 'foo': 42
Contains 'bar': true
Contains 'baz': false
After removing 'foo', contains: false

Use put to insert or update, get to retrieve (returns ?V), and remove to delete. Check existence with contains without retrieving the value.

StringHashMap for String Keys

When keys are []const u8, use StringHashMap(V) for optimized string hashing. Remember: the map doesn’t copy key memory—you must ensure strings outlive the map or use an arena allocator.

Zig
const std = @import("std");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    var population = std.StringHashMap(u32).init(allocator);
    defer population.deinit();

    try population.put("Seattle", 750_000);
    try population.put("Austin", 950_000);
    try population.put("Boston", 690_000);

    var iter = population.iterator();
    while (iter.next()) |entry| {
        std.debug.print("City: {s}, Population: {d}\n", .{ entry.key_ptr.*, entry.value_ptr.* });
    }
}
Build and Run
Shell
$ zig build-exe hashmap_string.zig && ./hashmap_string
Output
Shell
City: Seattle, Population: 750000
City: Austin, Population: 950000
City: Boston, Population: 690000

String keys are not duplicated by the map—if you pass stack-allocated or temporary strings, they must remain valid. Use an arena allocator or dupe to manage key lifetimes.

Custom Hash and Equality

For types not supported by autoHash, define a context with custom hash and eql functions.

Zig
const std = @import("std");

const Point = struct {
    x: i32,
    y: i32,
};

const PointContext = struct {
    pub fn hash(self: @This(), p: Point) u64 {
        _ = self;
        var hasher = std.hash.Wyhash.init(0);
        std.hash.autoHash(&hasher, p.x);
        std.hash.autoHash(&hasher, p.y);
        return hasher.final();
    }

    pub fn eql(self: @This(), a: Point, b: Point) bool {
        _ = self;
        return a.x == b.x and a.y == b.y;
    }
};

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    var map = std.HashMap(Point, []const u8, PointContext, 80).init(allocator);
    defer map.deinit();

    try map.put(.{ .x = 10, .y = 20 }, "Alice");
    try map.put(.{ .x = 30, .y = 40 }, "Bob");

    var iter = map.iterator();
    while (iter.next()) |entry| {
        std.debug.print("Point({d}, {d}): {s}\n", .{ entry.key_ptr.x, entry.key_ptr.y, entry.value_ptr.* });
    }

    std.debug.print("Contains (10, 20): {}\n", .{map.contains(.{ .x = 10, .y = 20 })});
}
Build and Run
Shell
$ zig build-exe hashmap_custom.zig && ./hashmap_custom
Output
Shell
Point(10, 20): Alice
Point(30, 40): Bob
Contains (10, 20): true

The context parameter in HashMap(K, V, Context, max_load_percentage) allows stateful hashing (e.g., salted hashes). For stateless contexts, pass void.

PriorityQueue: Heap-Based Priority Structures

PriorityQueue(T, Context, compareFn) implements a binary min-heap or max-heap depending on your comparison function. It supports add, peek, remove (pop the top element), and removeIndex.

Min-Heap Example

A min-heap pops the smallest element first. The comparison function returns .lt when the first argument should come before the second.

Zig
const std = @import("std");

fn lessThan(context: void, a: i32, b: i32) std.math.Order {
    _ = context;
    return std.math.order(a, b);
}

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    var queue = std.PriorityQueue(i32, void, lessThan).init(allocator, {});
    defer queue.deinit();

    try queue.add(10);
    try queue.add(5);
    try queue.add(20);
    try queue.add(1);

    while (queue.removeOrNull()) |item| {
        std.debug.print("Popped: {d}\n", .{item});
    }
}
Build and Run
Shell
$ zig build-exe priorityqueue_min.zig && ./priorityqueue_min
Output
Shell
Popped: 1
Popped: 5
Popped: 10
Popped: 20

For a max-heap, reverse the comparison logic: return .gt when a < b.

Priority Queue for Task Scheduling

Priority queues excel at scheduling: add tasks with priorities, then always process the highest-priority task first.

Zig
const std = @import("std");

const Task = struct {
    name: []const u8,
    priority: u32,
};

fn compareTasks(context: void, a: Task, b: Task) std.math.Order {
    _ = context;
    // Higher priority comes first (max-heap behavior)
    return std.math.order(b.priority, a.priority);
}

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    var queue = std.PriorityQueue(Task, void, compareTasks).init(allocator, {});
    defer queue.deinit();

    try queue.add(.{ .name = "Documentation", .priority = 1 });
    try queue.add(.{ .name = "Feature request", .priority = 5 });
    try queue.add(.{ .name = "Critical bug", .priority = 10 });

    while (queue.removeOrNull()) |task| {
        std.debug.print("Processing: {s} (priority {d})\n", .{ task.name, task.priority });
    }
}
Build and Run
Shell
$ zig build-exe priorityqueue_tasks.zig && ./priorityqueue_tasks
Output
Shell
Processing: Critical bug (priority 10)
Processing: Feature request (priority 5)
Processing: Documentation (priority 1)

PriorityQueue uses a heap internally, so add is O(log n), peek is O(1), and remove is O(log n).

Sorting

Zig’s std.sort module provides multiple algorithms: insertion (stable, O(n²)), heap (unstable, O(n log n)), pdq (pattern-defeating quicksort, O(n log n) worst-case), and block (stable, O(n log n) with extra memory). The default recommendation is pdq for most use cases.

Basic Sorting

Call std.sort.pdq with a slice, a context, and a lessThan function.

Zig
const std = @import("std");

fn lessThan(context: void, a: i32, b: i32) bool {
    _ = context;
    return a < b;
}

fn greaterThan(context: void, a: i32, b: i32) bool {
    _ = context;
    return a > b;
}

pub fn main() !void {
    var numbers = [_]i32{ 5, 2, 8, 1, 10 };

    std.sort.pdq(i32, &numbers, {}, lessThan);
    std.debug.print("Sorted ascending: {any}\n", .{numbers});

    std.sort.pdq(i32, &numbers, {}, greaterThan);
    std.debug.print("Sorted descending: {any}\n", .{numbers});
}
Build and Run
Shell
$ zig build-exe sort_basic.zig && ./sort_basic
Output
Shell
Sorted ascending: [1, 2, 5, 8, 10]
Sorted descending: [10, 8, 5, 2, 1]

pdq is unstable but fast. Use block if you need stability (equal elements retain their original order).

Sorting Structs

Sort by struct fields by providing a custom comparison function.

Zig
const std = @import("std");

const Person = struct {
    name: []const u8,
    age: u32,
};

fn byAge(context: void, a: Person, b: Person) bool {
    _ = context;
    return a.age < b.age;
}

pub fn main() !void {
    var people = [_]Person{
        .{ .name = "Alice", .age = 30 },
        .{ .name = "Bob", .age = 25 },
        .{ .name = "Charlie", .age = 35 },
    };

    std.sort.pdq(Person, &people, {}, byAge);

    std.debug.print("Sorted by age:\n", .{});
    for (people) |person| {
        std.debug.print("{s}, age {d}\n", .{ person.name, person.age });
    }
}
Build and Run
Shell
$ zig build-exe sort_structs.zig && ./sort_structs
Output
Shell
Sorted by age:
Alice, age 30
Bob, age 25
Charlie, age 35

The context parameter in sorting functions can hold state (e.g., sort direction flags or comparison modifiers). Use anytype for flexibility.

MultiArrayList: Structure-of-Arrays Layout

MultiArrayList(T) stores structs in a structure-of-arrays (SoA) format: each field is stored in its own contiguous array, improving cache locality when accessing individual fields across many elements.

Zig
const std = @import("std");

const Entity = struct {
    id: u32,
    x: f32,
    y: f32,
};

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    var entities = std.MultiArrayList(Entity){};
    defer entities.deinit(allocator);

    try entities.append(allocator, .{ .id = 1, .x = 10.5, .y = 20.3 });
    try entities.append(allocator, .{ .id = 2, .x = 30.1, .y = 40.7 });

    for (0..entities.len) |i| {
        const entity = entities.get(i);
        std.debug.print("Entity {d}: id={d}, x={d}, y={d}\n", .{ i, entity.id, entity.x, entity.y });
    }

    // Access a single field slice for efficient iteration
    const x_coords = entities.items(.x);
    var sum: f32 = 0;
    for (x_coords) |x| {
        sum += x;
    }
    std.debug.print("Sum of x coordinates: {d}\n", .{sum});
}
Build and Run
Shell
$ zig build-exe multiarraylist.zig && ./multiarraylist
Output
Shell
Entity 0: id=1, x=10.5, y=20.3
Entity 1: id=2, x=30.1, y=40.7
Sum of x coordinates: 40.6

Use MultiArrayList when you frequently iterate over a single field (e.g., positions in a game engine) but rarely need the entire struct. This layout maximizes CPU cache efficiency.

SegmentedList: Stable Pointers

SegmentedList(T, prealloc_item_count) grows by allocating fixed-size segments rather than reallocating a single contiguous array. This ensures pointers to elements remain valid across insertions.

Zig
const std = @import("std");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    var list = std.SegmentedList(i32, 4){};
    defer list.deinit(allocator);

    try list.append(allocator, 10);
    try list.append(allocator, 20);

    // Take a pointer to the first element
    const first_ptr = list.at(0);
    std.debug.print("First item: {d}\n", .{first_ptr.*});

    // Append more items - pointer remains valid!
    try list.append(allocator, 30);

    std.debug.print("First item (after append): {d}\n", .{first_ptr.*});
    std.debug.print("List length: {d}\n", .{list.len});
}
Build and Run
Shell
$ zig build-exe segmentedlist.zig && ./segmentedlist
Output
Shell
First item: 10
First item (after append): 10
List length: 3

Unlike ArrayList, pointers to SegmentedList elements remain valid even as you add more items. Use this when you need stable addressing (e.g., storing pointers in other data structures).

Linked Lists

Zig provides DoublyLinkedList(T) and SinglyLinkedList(T) as intrusive linked lists: nodes embed the link pointers directly (see DoublyLinkedList.zig and SinglyLinkedList.zig). This avoids allocator overhead per node and integrates naturally with existing structs.

Zig
const std = @import("std");

const Node = struct {
    data: i32,
    link: std.DoublyLinkedList.Node = .{},
};

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    var list = std.DoublyLinkedList{};

    var node1 = try allocator.create(Node);
    node1.* = .{ .data = 10 };
    list.append(&node1.link);

    var node2 = try allocator.create(Node);
    node2.* = .{ .data = 20 };
    list.append(&node2.link);

    var node3 = try allocator.create(Node);
    node3.* = .{ .data = 30 };
    list.append(&node3.link);

    var it = list.first;
    while (it) |node| : (it = node.next) {
        const data_node: *Node = @fieldParentPtr("link", node);
        std.debug.print("Node: {d}\n", .{data_node.data});
    }

    // Clean up
    allocator.destroy(node1);
    allocator.destroy(node2);
    allocator.destroy(node3);
}
Build and Run
Shell
$ zig build-exe linkedlist.zig && ./linkedlist
Output
Shell
Node: 10
Node: 20
Node: 30

Intrusive lists don’t own node memory—you allocate and manage nodes yourself. This is powerful but requires discipline to avoid use-after-free bugs.

Specialized Maps

ArrayHashMap

ArrayHashMap stores keys and values in separate arrays, preserving insertion order and enabling iteration by index (see array_hash_map.zig).

StaticStringMap

StaticStringMap(V) is a compile-time perfect hash map for string keys—fast lookups with zero runtime allocation or hashing overhead (see static_string_map.zig).

Zig
const std = @import("std");

const status_codes = std.StaticStringMap(u32).initComptime(.{
    .{ "ok", 200 },
    .{ "created", 201 },
    .{ "not_found", 404 },
    .{ "server_error", 500 },
});

pub fn main() !void {
    std.debug.print("Status code for 'ok': {d}\n", .{status_codes.get("ok").?});
    std.debug.print("Status code for 'not_found': {d}\n", .{status_codes.get("not_found").?});
    std.debug.print("Status code for 'server_error': {d}\n", .{status_codes.get("server_error").?});
}
Build and Run
Shell
$ zig build-exe static_string_map.zig && ./static_string_map
Output
Shell
Status code for 'ok': 200
Status code for 'not_found': 404
Status code for 'server_error': 500

Use StaticStringMap for compile-time constant mappings (e.g., keyword tables, command parsers). It compiles to optimal switch statements or lookup tables.

Allocator Impact on Collections

Every collection requires an allocator, either passed at initialization (ArrayList(T).init(allocator)) or per operation (unmanaged variants). Growth strategies trigger reallocations, and failure returns error.OutOfMemory (see 10).

Arena Pattern for Bulk-Free

When building temporary collections that live for a single scope, use an arena allocator to free everything at once.

Zig
const std = @import("std");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    var arena = std.heap.ArenaAllocator.init(allocator);
    defer arena.deinit();
    const arena_allocator = arena.allocator();

    var list: std.ArrayList(i32) = .empty;
    for (0..1000) |i| {
        try list.append(arena_allocator, @intCast(i));
    }
    std.debug.print("List has {d} items\n", .{list.items.len});

    var map = std.AutoHashMap(u32, []const u8).init(arena_allocator);
    for (0..500) |i| {
        try map.put(@intCast(i), "value");
    }
    std.debug.print("Map has {d} entries\n", .{map.count()});

    // No need to call list.deinit() or map.deinit()
    // arena.deinit() frees everything at once
    std.debug.print("All freed at once via arena.deinit()\n", .{});
}
Build and Run
Shell
$ zig build-exe collections_arena.zig && ./collections_arena
Output
Shell
List has 1000 items
Map has 500 entries
All freed at once via arena.deinit()

The arena doesn’t call individual collection deinit() methods. It frees all memory at once. Use this pattern when you know collections won’t outlive the arena’s scope (see 10).

Performance Considerations

  • ArrayList growth: Doubling capacity amortizes reallocation cost, but large allocations may fail. Pre-allocate if size is known.
  • HashMap load factor: Default max_load_percentage is 80%. Higher values save memory but increase collision chains.
  • Sort stability: pdq is fastest but unstable. Use block or insertion when order of equal elements matters.
  • MultiArrayList cache: SoA layout shines when iterating single fields but adds indirection overhead for full-struct access.
  • SegmentedList segments: Smaller prealloc_item_count means more segments (more allocations); larger values waste memory if lists stay small.

Exercises

  • Implement a FrequencyMap using StringHashMap(u32) that counts word occurrences in a text file, then print the top-10 most frequent words using a PriorityQueue.
  • Compare ArrayList vs SegmentedList performance: create 10,000 items, take pointers to the first 100, then append 10,000 more. Verify pointers remain valid with SegmentedList but may invalidate with ArrayList.
  • Write an LRU cache using HashMap for lookups and DoublyLinkedList for eviction order. When capacity is reached, remove the least-recently-used item.
  • Sort an ArrayList of structs by multiple keys (e.g., sort by age, then by name for ties) using a custom comparator and std.sort.pdq.

Caveats, Alternatives, Edge Cases

  • Unmanaged variants: Most collections have unmanaged counterparts (e.g., ArrayListUnmanaged(T)) for manual allocator threading, useful in generic code or when embedding collections in structs.
  • HashMap key lifetimes: Maps don’t duplicate keys. Ensure key memory outlives the map, or use an arena allocator to manage key storage collectively.
  • Iterator invalidation: Like C++, modifying a collection (append, remove) may invalidate iterators or pointers to elements. Always check documentation for each operation.
  • Stable vs unstable sort: If your data has equal elements that must maintain relative order (e.g., sorting a table by column but preserving row order for ties), use std.sort.block or insertion, not pdq.
  • Treap: Zig also provides std.Treap, a tree-heap hybrid for ordered maps with probabilistic balancing, useful when you need both sorted iteration and O(log n) operations (see treap.zig).

Help make this chapter better.

Found a typo, rough edge, or missing explanation? Open an issue or propose a small improvement on GitHub.