Chapter 49. Compression and Archives

Overview

Zig trims its compression APIs down to a pragmatic core: high-quality decompressors that plug into the new std.Io.Reader/Writer interfaces and feed formats like TAR and ZIP without hidden side effects. Bringing these pieces together lets you recover compressed logs, unpack bundled assets, or pull registry payloads straight into memory while keeping the same explicit resource-management discipline used everywhere else in Zig.

Because Zig treats archives as plain byte streams, the challenge shifts from finding a magic helper function to composing the right iterators, buffers, and metadata checks. Mastering the decompression building blocks here prepares you for package pipelines and deployment tooling later on.

Learning Goals

  • Drive std.compress.flate.Decompress, std.compress.lzma2.decompress, and friends directly against std.Io.Reader/Writer endpoints.
  • Choose history buffers, streaming limits, and allocators that keep decompression memory-safe in both debug and release builds.
  • Generate small TAR archives on the fly and iterate over them without touching the filesystem.
  • Inspect and extract ZIP central directory entries while enforcing filesystem hygiene and compression-method constraints.

Streaming Decompression Interfaces

Zig’s decompressors speak the same streaming dialect: you hand them any reader, optionally supply a scratch buffer, and they emit their payload into a writer you already own. That design leaves you in full control of allocation, error propagation, and flushing behavior.

Flate Containers in Practice

Deflate-style payloads (raw, zlib, gzip) rely on a history window of up to 32 KiB. Zig 0.15.2 lets you skip allocating that window when you pipe data straight into another writer: pass &.{} as the buffer and drain the decoder’s reader with streamRemaining, which streams the payload through with minimal buffering.

Zig
const std = @import("std");

pub fn main() !void {
    var stdout_buffer: [4096]u8 = undefined;
    var stdout_writer = std.fs.File.stdout().writer(&stdout_buffer);
    const stdout = &stdout_writer.interface;

    const compressed = [_]u8{
        0x78, 0x9c, 0x0b, 0x2e, 0x29, 0x4a, 0x4d, 0xcc, 0xcd, 0xcc, 0x4b, 0x57, 0x48, 0x49,
        0x4d, 0xce, 0xcf, 0x2d, 0x28, 0x4a, 0x2d, 0x2e, 0xce, 0xcc, 0xcf, 0x53, 0xc8, 0x4e,
        0x4d, 0x2d, 0x28, 0x56, 0x28, 0xc9, 0xcf, 0xcf, 0x29, 0x56, 0x00, 0x0a, 0xa6, 0x64,
        0x26, 0x97, 0x24, 0x26, 0xe5, 0xa4, 0xea, 0x71, 0x01, 0x00, 0xdf, 0xba, 0x12, 0xa6,
    };

    var source: std.Io.Reader = .fixed(&compressed);
    var inflater = std.compress.flate.Decompress.init(&source, .zlib, &.{});

    var plain_buf: [128]u8 = undefined;
    var sink = std.Io.Writer.fixed(&plain_buf);

    const decoded_len = try inflater.reader.streamRemaining(&sink);
    const decoded = plain_buf[0..decoded_len];

    try stdout.print("decoded ({d} bytes): {s}\n", .{ decoded.len, decoded });
    try stdout.flush();
}
Run
Shell
$ zig run inflate_greeting.zig
Output
Shell
decoded (49 bytes): Streaming decompression keeps tools predictable.

std.Io.Writer.fixed provides a caller-owned sink with deterministic capacity; always flush manually buffered stdout writers afterwards so output is not lost when the process exits.
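
The same decoder type also handles gzip framing. Here is a minimal sketch that reads a .gz file from disk; the helper name and path handling are illustrative, and the assumption that std.fs.File.Reader exposes its generic std.Io.Reader via an .interface field should be checked against your Zig version.

Zig
const std = @import("std");

// Minimal sketch: decode a gzip file from disk with the same Decompress type.
// The helper name is illustrative, and the `.interface` access is an assumption
// about the current std.fs.File.Reader API.
fn streamGzipFile(path: []const u8, sink: *std.Io.Writer) !usize {
    var file = try std.fs.cwd().openFile(path, .{});
    defer file.close();

    var read_buffer: [4096]u8 = undefined;
    var file_reader = file.reader(&read_buffer);

    // &.{} skips the private history window, exactly as in the zlib example above.
    var inflater = std.compress.flate.Decompress.init(&file_reader.interface, .gzip, &.{});
    return inflater.reader.streamRemaining(sink);
}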

LZMA2 Without External Tooling

Some registries still ship LZMA2 frames for deterministic, byte-for-byte payloads. Zig wraps the decoder behind a single helper; pointing it at an std.Io.Writer.Allocating grows the output buffer for you, which is perfect for short configuration bundles or firmware blocks.

Zig
const std = @import("std");

pub fn main() !void {
    var stdout_buffer: [4096]u8 = undefined;
    var stdout_writer = std.fs.File.stdout().writer(&stdout_buffer);
    const stdout = &stdout_writer.interface;

    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer std.debug.assert(gpa.deinit() == .ok);
    const allocator = gpa.allocator();

    const compressed = [_]u8{
        0x01, 0x00, 0x05, 0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x0a, 0x02, 0x00, 0x06, 0x57, 0x6f,
        0x72, 0x6c, 0x64, 0x21, 0x0a, 0x00,
    };

    var stream = std.io.fixedBufferStream(&compressed);
    var collector = std.Io.Writer.Allocating.init(allocator);
    defer collector.deinit();

    try std.compress.lzma2.decompress(allocator, stream.reader(), &collector.writer);
    const decoded = collector.writer.buffer[0..collector.writer.end];

    try stdout.print("lzma2 decoded ({d} bytes):\n{s}\n", .{ decoded.len, decoded });
    try stdout.flush();
}
Run
Shell
$ zig run lzma2_memory_decode.zig
Output
Shell
lzma2 decoded (13 bytes):
Hello
World!

std.heap.GeneralPurposeAllocator now reports leaks via an enum; assert on .ok during teardown so leaks caused by corrupted archives fail loudly under debug builds.

Window Sizing Across zstd, xz, and Friends

std.compress.zstd.Decompress defaults to an 8 MiB window, while std.compress.xz.Decompress performs checksum validation as part of stream finalization. When wiring unfamiliar data sources, start with empty scratch buffers to minimize peak memory, then profile with ReleaseFast builds before opting into persistent ring buffers.
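
The sketch below wires a zstd frame through the same streaming pattern. It assumes std.compress.zstd.Decompress.init takes the same (source reader, scratch buffer, options) arguments as the flate decoder above and exposes a .reader field; the helper name and the explicit 8 MiB window are illustrative, so verify against std/compress/zstd.zig for your Zig version before relying on it.

Zig
const std = @import("std");

// Sketch only: assumes the zstd decoder mirrors the (source, buffer, options)
// shape of the flate decoder shown earlier. The 8 MiB figure matches the
// default window mentioned in the text; owning it explicitly makes the peak
// memory cost visible and reusable.
fn decodeZstdFile(
    allocator: std.mem.Allocator,
    path: []const u8,
    sink: *std.Io.Writer,
) !usize {
    var file = try std.fs.cwd().openFile(path, .{});
    defer file.close();

    var read_buffer: [4096]u8 = undefined;
    var file_reader = file.reader(&read_buffer);

    const window = try allocator.alloc(u8, 8 * 1024 * 1024);
    defer allocator.free(window);

    var decompress = std.compress.zstd.Decompress.init(&file_reader.interface, window, .{});
    return decompress.reader.streamRemaining(sink);
}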

Archive Workflows

With decompression primitives in hand, archives become composition exercises: format-specific iterators hand you metadata, and you decide whether to buffer, discard, or stream to disk.

TAR Roundtrip Entirely in Memory

std.tar.Writer emits deterministic 512-byte blocks, so you can assemble small bundles in RAM, inspect them, and only then decide whether to persist them.

Zig
const std = @import("std");

pub fn main() !void {
    var stdout_buffer: [4096]u8 = undefined;
    var stdout_writer = std.fs.File.stdout().writer(&stdout_buffer);
    const stdout = &stdout_writer.interface;

    var archive_storage: [4096]u8 = undefined;
    var archive_writer = std.Io.Writer.fixed(&archive_storage);
    var tar_writer = std.tar.Writer{ .underlying_writer = &archive_writer };

    try tar_writer.writeDir("reports", .{ .mode = 0o755 });
    try tar_writer.writeFileBytes(
        "reports/summary.txt",
        "cpu=28%\nmem=512MiB\n",
        .{ .mode = 0o644 },
    );

    const archive = archive_writer.buffer[0..archive_writer.end];

    try stdout.print("tar archive is {d} bytes and holds:\n", .{archive.len});

    var source: std.Io.Reader = .fixed(archive);
    var name_buf: [std.fs.max_path_bytes]u8 = undefined;
    var link_buf: [std.fs.max_path_bytes]u8 = undefined;
    var iter = std.tar.Iterator.init(&source, .{
        .file_name_buffer = &name_buf,
        .link_name_buffer = &link_buf,
    });

    while (try iter.next()) |entry| {
        try stdout.print("- {s} ({s}, {d} bytes)\n", .{ entry.name, @tagName(entry.kind), entry.size });
        if (entry.kind == .file) {
            var file_buf: [128]u8 = undefined;
            var file_writer = std.Io.Writer.fixed(&file_buf);
            try iter.streamRemaining(entry, &file_writer);
            const written = file_writer.end;
            const payload = file_buf[0..written];
            try stdout.print("  contents: {s}\n", .{payload});
        }
    }

    try stdout.flush();
}
Run
Shell
$ zig run tar_roundtrip.zig
Output
Shell
tar archive is 1536 bytes and holds:
- reports (directory, 0 bytes)
- reports/summary.txt (file, 19 bytes)
  contents: cpu=28%
mem=512MiB

After calling Iterator.next on a regular file, you must drain the payload with streamRemaining; otherwise the next header read is misaligned and the iterator returns error.UnexpectedEndOfStream.

Peeking Into ZIP Central Directories Safely

ZIP support exposes the central directory via std.zip.Iterator, leaving extraction policy to you. Routing entries through std.testing.tmpDir keeps artifacts isolated while you validate compression methods and inspect contents.

Zig
const std = @import("std");

pub fn main() !void {
    var stdout_buffer: [4096]u8 = undefined;
    var stdout_writer = std.fs.File.stdout().writer(&stdout_buffer);
    const stdout = &stdout_writer.interface;

    const archive_bytes = @embedFile("demo.zip");

    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer std.debug.assert(gpa.deinit() == .ok);
    const allocator = gpa.allocator();

    var tmp = std.testing.tmpDir(.{});
    defer tmp.cleanup();

    var zip_file = try tmp.dir.createFile("demo.zip", .{ .read = true, .truncate = true });
    defer {
        zip_file.close();
        tmp.dir.deleteFile("demo.zip") catch {};
    }

    try zip_file.writeAll(archive_bytes);
    try zip_file.seekTo(0);

    var read_buffer: [4096]u8 = undefined;
    var archive_reader = zip_file.reader(&read_buffer);
    var iter = try std.zip.Iterator.init(&archive_reader);

    var name_buf: [std.fs.max_path_bytes]u8 = undefined;

    try stdout.print("zip archive contains:\n", .{});

    while (try iter.next()) |entry| {
        try entry.extract(&archive_reader, .{}, &name_buf, tmp.dir);
        const name = name_buf[0..entry.filename_len];
        try stdout.print(
            "- {s} ({s}, {d} bytes)\n",
            .{ name, @tagName(entry.compression_method), entry.uncompressed_size },
        );

        if (name.len != 0 and name[name.len - 1] == '/') continue;

        var file = try tmp.dir.openFile(name, .{});
        defer file.close();
        const info = try file.stat();
        const size: usize = @intCast(info.size);
        const contents = try allocator.alloc(u8, size);
        defer allocator.free(contents);
        const read_len = try file.readAll(contents);
        const slice = contents[0..read_len];

        if (std.mem.endsWith(u8, name, ".txt")) {
            try stdout.print("  text: {s}\n", .{slice});
        } else {
            try stdout.print("  bytes:", .{});
            for (slice, 0..) |byte, idx| {
                const prefix = if (idx % 16 == 0) "\n    " else " ";
                try stdout.print("{s}{X:0>2}", .{ prefix, byte });
            }
            try stdout.print("\n", .{});
        }
    }

    try stdout.flush();
}
Run
Shell
$ zig run zip_iterator_preview.zig
Output
Shell
zip archive contains:
- demo/readme.txt (store, 34 bytes)
  text: Decompression from Zig streaming.

- demo/raw.bin (store, 4 bytes)
  bytes:
    00 01 02 03

std.zip.Entry.extract only supports store and deflate; reject other methods up front or shell out to a third-party library when interoperability demands it.
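
A guard like the following keeps that policy explicit. This is a sketch that assumes the method enum in std.zip exposes .store and .deflate tags, as the @tagName output above suggests; the helper name and error value are illustrative.

Zig
const std = @import("std");

// Sketch: refuse anything the standard extractor cannot decode.
// Assumes std.zip.CompressionMethod carries .store and .deflate tags;
// the helper name and error value are illustrative.
fn ensureExtractable(method: std.zip.CompressionMethod) !void {
    switch (method) {
        .store, .deflate => {},
        else => return error.UnsupportedCompressionMethod,
    }
}

Call it right after Iterator.next and before entry.extract so unsupported entries fail with a descriptive error instead of a partial extraction.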

Pattern Catalog for Mixed Sources

Blend these techniques to hydrate manifests from package registries, decompress release artifacts before signature checks, or stage binary blobs for GPU uploads, all without leaving Zig’s standard toolbox.

Notes & Caveats

  • Passing a zero-length buffer to std.compress.flate.Decompress.init disables history reuse, but large archives benefit from reusing a [flate.max_window_len]u8 scratch array (see the sketch after this list).
  • TAR iterators keep state about unread file bytes; always stream or discard them before advancing to the next header.
  • ZIP extraction normalizes backslashes only when allow_backslashes = true; enforce forward slashes to avoid directory-traversal bugs on Windows.
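
The reuse pattern from the first bullet can look like the sketch below, which assumes flate.max_window_len names the 32 KiB constant referred to above; the helper name is illustrative.

Zig
const std = @import("std");
const flate = std.compress.flate;

// Sketch of the reuse pattern: one long-lived history window instead of
// passing &.{} on every call. Assumes flate.max_window_len is the constant
// the caveat above refers to.
fn inflateZlibWithWindow(
    window: *[flate.max_window_len]u8,
    source: *std.Io.Reader,
    sink: *std.Io.Writer,
) !usize {
    var decompress = flate.Decompress.init(source, .zlib, window);
    return decompress.reader.streamRemaining(sink);
}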

Exercises

  • Rework the flate example to stream directly into std.fs.File.stdout().writer without a fixed buffer and profile the difference across build modes.
  • Extend the TAR roundtrip demo to attach a generated checksum footer file summarizing every entry length.
  • Add a verify_checksums pass to the ZIP iterator by computing CRC32 over extracted data and comparing it to the central directory record.

Caveats, Alternatives, Edge Cases

  • Compression backends (especially zstd) may require larger buffers on older CPUs without BMI2; detect builtin.cpu.features before choosing lean windows (a probe sketch follows this list).
  • LZMA2 decoding still allocates internal state; reuse a shared decoder if you process many small frames to avoid heap churn.
  • For reproducible release archives, pin file ordering and timestamps explicitly; otherwise host filesystem metadata leaks into the output.
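
A probe along the lines of the first bullet might look like the sketch below; the lean and roomy sizes are illustrative placeholders, not measured recommendations.

Zig
const std = @import("std");
const builtin = @import("builtin");

// Sketch of the CPU probe mentioned above: prefer the lean window only when
// the target is x86_64 with BMI2. Both sizes are illustrative.
pub fn pickWindowLen() usize {
    const lean: usize = 32 * 1024;
    const roomy: usize = 1024 * 1024;
    if (builtin.cpu.arch == .x86_64 and
        std.Target.x86.featureSetHas(builtin.cpu.features, .bmi2))
    {
        return lean;
    }
    return roomy;
}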
