Overview
Zig trims its compression APIs down to the pragmatic core: high-quality decompressors that plug into the new std.Io.Reader/Writer interfaces and feed formats like TAR and ZIP without hidden side effects. Combining these pieces lets you revive logs, package assets, or slurp registries straight into memory while keeping the same explicit resource-management discipline.
Because Zig treats archives as plain byte streams, the challenge shifts from magic helper functions to composing the right iterators, buffers, and metadata checks. Mastering the decompression building blocks here prepares you for building package pipelines and deployment tooling.
Learning Goals
- Drive std.compress.flate.Decompress, std.compress.lzma2.decompress, and friends directly against std.Io.Reader/Writer endpoints.
- Choose history buffers, streaming limits, and allocators that keep decompression memory-safe under both debug and release builds.
- Generate small TAR archives on the fly and iterate them without touching disk state.
- Inspect and extract ZIP central directory entries while enforcing filesystem hygiene and compression-method constraints.
Streaming Decompression Interfaces
Zig’s decompressors speak the same streaming dialect: you hand them any reader, optionally supply a scratch buffer, and they emit their payload into a writer you already own. That design leaves you full control over allocation, error propagation, and flushing behavior.
Flate Containers in Practice
Deflate-style payloads (raw, zlib, gzip) rely on a history window of up to 32 KiB. Zig 0.15.2 lets you skip allocating that window when you pipe data straight into another writer: pass &.{} as the decompressor's buffer and drain it with streamRemaining, which writes the decoded bytes directly into the destination writer.
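Before decoding, it can help to see where the container declares that window. The sketch below is not part of std.compress; ZlibHeader and parseZlibHeader are hypothetical names. It applies the checks RFC 1950 defines for the two-byte zlib header: the method must be 8 (deflate), CINFO encodes the window as 2^(CINFO + 8) with 7 (32 KiB) as the maximum, and CMF * 256 + FLG must be divisible by 31.

```zig
const std = @import("std");

/// Fields recovered from the two-byte zlib header (RFC 1950).
const ZlibHeader = struct {
    window_len: u32, // LZ77 window declared by CINFO
    has_dict: bool, // FDICT: a preset-dictionary id follows the header
};

/// Hypothetical pre-flight check: validate CMF/FLG before handing the
/// stream to std.compress.flate.Decompress.
fn parseZlibHeader(bytes: []const u8) !ZlibHeader {
    if (bytes.len < 2) return error.Truncated;
    const cmf = bytes[0];
    const flg = bytes[1];
    if ((cmf & 0x0f) != 8) return error.NotDeflate; // CM must be 8 (deflate)
    const cinfo = cmf >> 4;
    if (cinfo > 7) return error.WindowTooLarge; // 7 means 32 KiB, the deflate maximum
    // Read as a big-endian u16, CMF * 256 + FLG must be a multiple of 31.
    if (((@as(u16, cmf) << 8) | flg) % 31 != 0) return error.BadHeaderCheck;
    return .{
        .window_len = @as(u32, 1) << @intCast(cinfo + 8),
        .has_dict = (flg & 0x20) != 0,
    };
}

pub fn main() !void {
    // First two bytes of the zlib sample used in this section.
    const header = try parseZlibHeader(&.{ 0x78, 0x9c });
    std.debug.print("window: {d} bytes, preset dictionary: {}\n", .{ header.window_len, header.has_dict });
}
```

For the 0x78 0x9c header in the example that follows, CINFO is 7, so the stream declares the full 32 KiB window.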
const std = @import("std");

pub fn main() !void {
    // Buffered stdout; the buffer must be flushed before main returns.
    var stdout_buffer: [4096]u8 = undefined;
    var stdout_writer = std.fs.File.stdout().writer(&stdout_buffer);
    const stdout = &stdout_writer.interface;

    // zlib-wrapped deflate data (0x78 0x9c header, Adler-32 trailer).
    const compressed = [_]u8{
        0x78, 0x9c, 0x0b, 0x2e, 0x29, 0x4a, 0x4d, 0xcc, 0xcd, 0xcc, 0x4b, 0x57, 0x48, 0x49,
        0x4d, 0xce, 0xcf, 0x2d, 0x28, 0x4a, 0x2d, 0x2e, 0xce, 0xcc, 0xcf, 0x53, 0xc8, 0x4e,
        0x4d, 0x2d, 0x28, 0x56, 0x28, 0xc9, 0xcf, 0xcf, 0x29, 0x56, 0x00, 0x0a, 0xa6, 0x64,
        0x26, 0x97, 0x24, 0x26, 0xe5, 0xa4, 0xea, 0x71, 0x01, 0x00, 0xdf, 0xba, 0x12, 0xa6,
    };

    // Read from memory; &.{} gives the decompressor no history buffer of its own.
    var source: std.Io.Reader = .fixed(&compressed);
    var inflater = std.compress.flate.Decompress.init(&source, .zlib, &.{});

    // Decode everything into a fixed 128-byte sink; output that does not fit is an error.
    var plain_buf: [128]u8 = undefined;
    var sink = std.Io.Writer.fixed(&plain_buf);
    const decoded_len = try inflater.reader.streamRemaining(&sink);
    const decoded = plain_buf[0..decoded_len];

    try stdout.print("decoded ({d} bytes): {s}\n", .{ decoded.len, decoded });
    try stdout.flush();
}
$ zig run inflate_greeting.zig
decoded (49 bytes): Streaming decompression keeps tools predictable.

std.Io.Writer.fixed wraps a caller-provided buffer (here a stack array) with a fixed, deterministic capacity. Always flush manually buffered stdout writers before returning; otherwise buffered output is lost when the process exits.
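When the output size is not known in advance, a fixed sink is awkward. A minimal sketch of the alternative, assuming std.compress.flate.max_window_len is the buffer size Decompress.init expects when you give it a window: hand the decompressor its own history buffer and pull the result through its std.Io.Reader with allocRemaining. inflateAlloc and sample_zlib are illustrative names; the bytes are the same zlib sample as in the example above.

```zig
const std = @import("std");

// Same zlib sample as the fixed-sink example.
const sample_zlib = [_]u8{
    0x78, 0x9c, 0x0b, 0x2e, 0x29, 0x4a, 0x4d, 0xcc, 0xcd, 0xcc, 0x4b, 0x57, 0x48, 0x49,
    0x4d, 0xce, 0xcf, 0x2d, 0x28, 0x4a, 0x2d, 0x2e, 0xce, 0xcc, 0xcf, 0x53, 0xc8, 0x4e,
    0x4d, 0x2d, 0x28, 0x56, 0x28, 0xc9, 0xcf, 0xcf, 0x29, 0x56, 0x00, 0x0a, 0xa6, 0x64,
    0x26, 0x97, 0x24, 0x26, 0xe5, 0xa4, 0xea, 0x71, 0x01, 0x00, 0xdf, 0xba, 0x12, 0xa6,
};

/// Hypothetical helper: decode a zlib stream through the decompressor's own
/// Reader interface, backed by a caller-owned history window instead of &.{}.
fn inflateAlloc(gpa: std.mem.Allocator, compressed: []const u8) ![]u8 {
    var source: std.Io.Reader = .fixed(compressed);
    // History window for back-references; a long-lived caller could reuse one buffer.
    var window: [std.compress.flate.max_window_len]u8 = undefined;
    var inflater = std.compress.flate.Decompress.init(&source, .zlib, &window);
    // Collect all remaining output into a new heap slice owned by the caller.
    return inflater.reader.allocRemaining(gpa, .unlimited);
}

pub fn main() !void {
    const gpa = std.heap.page_allocator;
    const plain = try inflateAlloc(gpa, &sample_zlib);
    defer gpa.free(plain);
    std.debug.print("decoded {d} bytes: {s}", .{ plain.len, plain });
}
```

Here the window lives on the stack and is paid for on every call; a long-running tool could keep one window per worker and reuse it across streams, which is the trade-off the zero-length-buffer mode avoids.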
LZMA2 Without External Tooling
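Some registries still ship LZMA2 frames for deterministic byte-for-byte payloads. Zig exposes the decoder as a single helper, std.compress.lzma2.decompress, which takes an allocator for its internal state and writes the decoded bytes into the std.Io.Writer you pass. Handing it an std.Io.Writer.Allocating collects the output in a heap buffer that grows as needed, which suits short configuration bundles or firmware blocks.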
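Because an LZMA2 frame is a sequence of self-describing chunks, you can inspect one before decoding it. The hypothetical summarizeLzma2 below is a sketch, not a std.compress API: it walks the control bytes defined by the LZMA2 format (0x00 end of stream, 0x01/0x02 uncompressed chunks, 0x80 to 0xFF LZMA chunks) and totals the declared output size without decompressing anything. Run against the 20-byte frame in the example that follows, it reports two uncompressed chunks holding 13 bytes, which is why "Hello" and "World!" are readable directly in the hex.

```zig
const std = @import("std");

/// What a frame declares, gathered without decoding any LZMA data.
const ChunkSummary = struct {
    chunks: usize = 0,
    unpacked_bytes: usize = 0,
};

/// The 20-byte frame from the std.compress.lzma2 example.
const sample_frame = [_]u8{
    0x01, 0x00, 0x05, 0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x0a, 0x02, 0x00, 0x06, 0x57, 0x6f,
    0x72, 0x6c, 0x64, 0x21, 0x0a, 0x00,
};

/// Hypothetical helper: walk LZMA2 control bytes and total the declared output size.
fn summarizeLzma2(frame: []const u8) !ChunkSummary {
    var summary: ChunkSummary = .{};
    var i: usize = 0;
    while (i < frame.len) {
        const control = frame[i];
        switch (control) {
            0x00 => return summary, // end-of-stream marker
            0x01, 0x02 => {
                // Uncompressed chunk (0x01 also resets the dictionary): a big-endian
                // u16 holding size - 1, then that many raw bytes.
                if (frame.len - i < 3) return error.Truncated;
                const size = @as(usize, std.mem.readInt(u16, frame[i + 1 ..][0..2], .big)) + 1;
                summary.unpacked_bytes += size;
                i += 3 + size;
            },
            0x80...0xff => {
                // LZMA chunk: the low 5 bits extend (unpacked size - 1); reset modes
                // 2 and 3 (bits 5-6) add a properties byte to the 5-byte header.
                const header_len: usize = if (((control >> 5) & 0x03) >= 2) 6 else 5;
                if (frame.len - i < header_len) return error.Truncated;
                const unpacked = (@as(usize, control & 0x1f) << 16) +
                    std.mem.readInt(u16, frame[i + 1 ..][0..2], .big) + 1;
                const packed_len = @as(usize, std.mem.readInt(u16, frame[i + 3 ..][0..2], .big)) + 1;
                summary.unpacked_bytes += unpacked;
                i += header_len + packed_len;
            },
            else => return error.InvalidControlByte,
        }
        summary.chunks += 1;
    }
    return error.Truncated; // input ended before the 0x00 marker
}

pub fn main() !void {
    const summary = try summarizeLzma2(&sample_frame);
    std.debug.print("{d} chunks, {d} bytes once decoded\n", .{ summary.chunks, summary.unpacked_bytes });
}
```

A pre-pass like this lets you reject frames whose declared size exceeds a budget before any allocator is involved.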
const std = @import("std");

pub fn main() !void {
    var stdout_buffer: [4096]u8 = undefined;
    var stdout_writer = std.fs.File.stdout().writer(&stdout_buffer);
    const stdout = &stdout_writer.interface;

    // GeneralPurposeAllocator reports leaks at deinit; the assert turns a leak into a failure.
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer std.debug.assert(gpa.deinit() == .ok);
    const allocator = gpa.allocator();

    // Two uncompressed LZMA2 chunks ("Hello\n", then "World!\n") and the 0x00 end marker.
    const compressed = [_]u8{
        0x01, 0x00, 0x05, 0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x0a, 0x02, 0x00, 0x06, 0x57, 0x6f,
        0x72, 0x6c, 0x64, 0x21, 0x0a, 0x00,
    };
    // In-memory reader over the compressed bytes.
    var stream = std.io.fixedBufferStream(&compressed);

    // Growable output buffer; deinit frees whatever the decoder wrote.
    var collector = std.Io.Writer.Allocating.init(allocator);
    defer collector.deinit();

    // The allocator also backs the decoder's internal state.
    try std.compress.lzma2.decompress(allocator, stream.reader(), &collector.writer);
    const decoded = collector.writer.buffer[0..collector.writer.end];

    try stdout.print("lzma2 decoded ({d} bytes):\n{s}\n", .{ decoded.len, decoded });
    try stdout.flush();
}
$ zig run lzma2_memory_decode.zig
lzma2 decoded (13 bytes):
Hello
World!

Window Sizing Across zstd, xz, and Friends
std.compress.zstd.Decompress defaults to an 8 MiB window, while std.compress.xz.Decompress validates the stream checksum as part of stream finalization. When wiring up unfamiliar data sources, start with empty scratch buffers to minimize peak memory, then profile ReleaseFast builds before opting into persistent ring buffers.
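One way to act on the window size before committing memory is to read it from the frame header yourself. The zstd format (RFC 8878) stores it in a Window_Descriptor byte: the top five bits are an exponent and the low three a mantissa, giving a window of 2^(10 + exponent) plus mantissa eighths of that. The sketch below is a hypothetical pre-flight check, not part of std.compress.zstd; it returns null for single-segment frames, whose window equals the frame content size instead.

```zig
const std = @import("std");

/// Hypothetical helper: read the declared window size from a zstd frame header.
/// Returns null for single-segment frames (no Window_Descriptor byte).
fn zstdWindowSize(frame: []const u8) !?u64 {
    if (frame.len < 5) return error.Truncated;
    // Magic number 0xFD2FB528, stored little-endian.
    if (std.mem.readInt(u32, frame[0..4], .little) != 0xFD2FB528) return error.NotZstd;
    const descriptor = frame[4]; // Frame_Header_Descriptor
    if ((descriptor & 0x20) != 0) return null; // Single_Segment_flag set
    if (frame.len < 6) return error.Truncated;
    const exponent: u6 = @intCast(frame[5] >> 3);
    const mantissa: u64 = frame[5] & 0x07;
    const base = @as(u64, 1) << (10 + exponent);
    return base + (base / 8) * mantissa;
}

pub fn main() !void {
    const budget: u64 = 8 * 1024 * 1024; // hypothetical memory budget
    // Magic number, descriptor 0x00 (multi-segment), Window_Descriptor 0x69.
    const header = [_]u8{ 0x28, 0xB5, 0x2F, 0xFD, 0x00, 0x69 };
    if (try zstdWindowSize(&header)) |window| {
        std.debug.print("window {d} bytes, within budget: {}\n", .{ window, window <= budget });
    }
}
```

A descriptor of 0x68 corresponds to exactly 8 MiB; 0x69 adds one eighth, so a strict 8 MiB budget would refuse that frame before any decoder state exists.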
Archive Workflows
With decompression primitives in hand, archives become composition exercises: format-specific iterators hand you metadata, and you decide whether to buffer, discard, or stream to disk.
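Choosing which iterator or decoder to compose often starts with the first few bytes. The sketch below dispatches on well-known signatures: gzip 1F 8B, zstd 28 B5 2F FD, xz FD 37 7A 58 5A 00, ZIP "PK\x03\x04" (or "PK\x05\x06" for an empty archive), and ustar's "ustar" at offset 257. zlib has no magic number, so it falls back to the RFC 1950 header check. Format and sniff are hypothetical names, and signature matching is only a heuristic: raw deflate streams and unusual layouts report unknown.

```zig
const std = @import("std");

const Format = enum { gzip, zlib, zstd, xz, zip, tar, unknown };

/// Hypothetical dispatcher: pick a decoder or archive iterator from leading signature bytes.
fn sniff(bytes: []const u8) Format {
    if (std.mem.startsWith(u8, bytes, "\x1f\x8b")) return .gzip;
    if (std.mem.startsWith(u8, bytes, "\x28\xb5\x2f\xfd")) return .zstd;
    if (std.mem.startsWith(u8, bytes, "\xfd7zXZ\x00")) return .xz;
    if (std.mem.startsWith(u8, bytes, "PK\x03\x04") or std.mem.startsWith(u8, bytes, "PK\x05\x06")) return .zip;
    // POSIX ustar (and GNU tar) put "ustar" at offset 257 of the first header block.
    if (bytes.len >= 262 and std.mem.eql(u8, bytes[257..262], "ustar")) return .tar;
    // zlib has no magic number; accept a header that passes the RFC 1950 check.
    if (bytes.len >= 2 and (bytes[0] & 0x0f) == 8 and (bytes[0] >> 4) <= 7 and
        ((@as(u16, bytes[0]) << 8) | bytes[1]) % 31 == 0) return .zlib;
    return .unknown;
}

pub fn main() void {
    const samples = [_][]const u8{
        &.{ 0x78, 0x9c, 0x0b },
        "PK\x03\x04",
        &.{ 0x1f, 0x8b, 0x08 },
        "plain text",
    };
    for (samples) |s| std.debug.print("{s}\n", .{@tagName(sniff(s))});
}
```

Ordering matters: the explicit signatures are tested before the zlib heuristic, because the zlib check alone can accept unrelated byte pairs.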
TAR Roundtrip Entirely in Memory
std.tar.Writer emits deterministic 512-byte blocks, so you can assemble small bundles in RAM, inspect them, and only then decide whether to persist them.
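The block arithmetic explains the archive size the example below prints: each entry costs one 512-byte header plus its payload rounded up to a multiple of 512, so a directory (0 bytes) and a 19-byte file take 512 + 1024 = 1536 bytes. The sketch below spells that out and adds two ustar header checks you might run on untrusted input: octal field parsing and the header checksum, which is the byte sum of the header with the 8-byte checksum field counted as spaces. parseOctal, headerSum, checksumMatches, and entryBlocksLen are hypothetical helpers, not std.tar APIs.

```zig
const std = @import("std");

/// Parse a NUL- or space-terminated octal field from a ustar header.
fn parseOctal(field: []const u8) !u64 {
    var i: usize = 0;
    while (i < field.len and field[i] == ' ') : (i += 1) {} // tolerate leading spaces
    var value: u64 = 0;
    while (i < field.len) : (i += 1) {
        switch (field[i]) {
            '0'...'7' => |c| value = value * 8 + (c - '0'),
            0, ' ' => break, // NUL or space terminates the field
            else => return error.InvalidOctal,
        }
    }
    return value;
}

/// Byte sum of a 512-byte header with the checksum field (offset 148, 8 bytes)
/// counted as ASCII spaces, as ustar specifies.
fn headerSum(header: *const [512]u8) u64 {
    var sum: u64 = 0;
    for (header, 0..) |byte, idx| {
        sum += if (idx >= 148 and idx < 156) ' ' else byte;
    }
    return sum;
}

fn checksumMatches(header: *const [512]u8) !bool {
    return headerSum(header) == try parseOctal(header[148..156]);
}

/// Bytes an entry occupies: one header block plus its payload rounded up to 512.
fn entryBlocksLen(size: u64) u64 {
    return 512 + (size + 511) / 512 * 512;
}

pub fn main() void {
    // A directory plus the 19-byte reports/summary.txt from the example.
    std.debug.print("entries occupy {d} bytes\n", .{entryBlocksLen(0) + entryBlocksLen(19)});
}
```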
const std = @import("std");

pub fn main() !void {
    var stdout_buffer: [4096]u8 = undefined;
    var stdout_writer = std.fs.File.stdout().writer(&stdout_buffer);
    const stdout = &stdout_writer.interface;

    // Build the archive in a fixed 4 KiB stack buffer.
    var archive_storage: [4096]u8 = undefined;
    var archive_writer = std.Io.Writer.fixed(&archive_storage);
    var tar_writer = std.tar.Writer{ .underlying_writer = &archive_writer };
    try tar_writer.writeDir("reports", .{ .mode = 0o755 });
    try tar_writer.writeFileBytes(
        "reports/summary.txt",
        "cpu=28%\nmem=512MiB\n",
        .{ .mode = 0o644 },
    );
    const archive = archive_writer.buffer[0..archive_writer.end];
    try stdout.print("tar archive is {d} bytes and holds:\n", .{archive.len});

    // Iterate the same bytes; the iterator copies entry names and link targets into these buffers.
    var source: std.Io.Reader = .fixed(archive);
    var name_buf: [std.fs.max_path_bytes]u8 = undefined;
    var link_buf: [std.fs.max_path_bytes]u8 = undefined;
    var iter = std.tar.Iterator.init(&source, .{
        .file_name_buffer = &name_buf,
        .link_name_buffer = &link_buf,
    });
    while (try iter.next()) |entry| {
        try stdout.print("- {s} ({s}, {d} bytes)\n", .{ entry.name, @tagName(entry.kind), entry.size });
        if (entry.kind == .file) {
            // Copy the payload out before the iterator moves to the next header.
            var file_buf: [128]u8 = undefined;
            var file_writer = std.Io.Writer.fixed(&file_buf);
            try iter.streamRemaining(entry, &file_writer);
            const written = file_writer.end;
            const payload = file_buf[0..written];
            try stdout.print(" contents: {s}\n", .{payload});
        }
    }
    try stdout.flush();
}
$ zig run tar_roundtrip.zig
tar archive is 1536 bytes and holds:
- reports (directory, 0 bytes)
- reports/summary.txt (file, 19 bytes)
 contents: cpu=28%
mem=512MiB

After Iterator.next returns a regular file, call streamRemaining before advancing if you need its contents: the iterator tracks how much of the entry's payload is still unread, and once you call next again those bytes are no longer available.
Peeking Into ZIP Central Directories Safely
ZIP support exposes the central directory via std.zip.Iterator, leaving extraction policy to you. Routing entries through std.testing.tmpDir keeps artifacts isolated while you validate compression methods and inspect contents.
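The central directory is located from the end of the file: an End of Central Directory record (signature "PK\x05\x06", 22 bytes plus an optional comment of up to 65535 bytes) stores the entry count and the directory's size and offset. The sketch below finds that record in an in-memory archive. EndRecord and findEndRecord are hypothetical names, and ZIP64 archives, which move these fields into a separate record, are not handled.

```zig
const std = @import("std");

const EndRecord = struct {
    entry_count: u16,
    cd_size: u32,
    cd_offset: u32,
};

/// Hypothetical helper: scan backwards for the End of Central Directory record,
/// which starts at most 22 + 65535 bytes before the end of the archive.
fn findEndRecord(archive: []const u8) !EndRecord {
    if (archive.len < 22) return error.NotZip;
    const lowest = archive.len -| (22 + 0xffff);
    var pos: usize = archive.len - 22;
    while (true) : (pos -= 1) {
        if (std.mem.eql(u8, archive[pos..][0..4], "PK\x05\x06")) {
            const rec = archive[pos..][0..22];
            const comment_len = std.mem.readInt(u16, rec[20..22], .little);
            // A genuine record's comment ends exactly at the end of the archive.
            if (pos + 22 + comment_len == archive.len) return .{
                .entry_count = std.mem.readInt(u16, rec[10..12], .little),
                .cd_size = std.mem.readInt(u32, rec[12..16], .little),
                .cd_offset = std.mem.readInt(u32, rec[16..20], .little),
            };
        }
        if (pos == lowest) return error.NotZip;
    }
}

pub fn main() !void {
    // Smallest valid ZIP: an empty archive is just the 22-byte end record.
    const empty = [_]u8{ 'P', 'K', 5, 6 } ++ [_]u8{0} ** 18;
    const end = try findEndRecord(&empty);
    std.debug.print("{d} entries, directory at offset {d}\n", .{ end.entry_count, end.cd_offset });
}
```

Checking that the comment length lands exactly on the end of the file guards against a "PK\x05\x06" sequence that merely appears inside compressed data or a comment.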
const std = @import("std");

pub fn main() !void {
    var stdout_buffer: [4096]u8 = undefined;
    var stdout_writer = std.fs.File.stdout().writer(&stdout_buffer);
    const stdout = &stdout_writer.interface;

    // Embedded at compile time; demo.zip must sit next to this source file.
    const archive_bytes = @embedFile("demo.zip");

    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer std.debug.assert(gpa.deinit() == .ok);
    const allocator = gpa.allocator();

    // Isolated scratch directory; cleanup() removes it along with everything extracted.
    var tmp = std.testing.tmpDir(.{});
    defer tmp.cleanup();

    // Materialize the archive as a real file: std.zip reads through a seekable File.Reader.
    var zip_file = try tmp.dir.createFile("demo.zip", .{ .read = true, .truncate = true });
    defer {
        zip_file.close();
        tmp.dir.deleteFile("demo.zip") catch {};
    }
    try zip_file.writeAll(archive_bytes);
    try zip_file.seekTo(0);

    var read_buffer: [4096]u8 = undefined;
    var archive_reader = zip_file.reader(&read_buffer);
    var iter = try std.zip.Iterator.init(&archive_reader);
    var name_buf: [std.fs.max_path_bytes]u8 = undefined;

    try stdout.print("zip archive contains:\n", .{});
    while (try iter.next()) |entry| {
        // Extract into tmp.dir; the entry's name is copied into name_buf.
        try entry.extract(&archive_reader, .{}, &name_buf, tmp.dir);
        const name = name_buf[0..entry.filename_len];
        try stdout.print(
            "- {s} ({s}, {d} bytes)\n",
            .{ name, @tagName(entry.compression_method), entry.uncompressed_size },
        );
        // Directory entries end in '/'; there is no payload to read back.
        if (name.len != 0 and name[name.len - 1] == '/') continue;

        // Read the extracted file back from disk.
        var file = try tmp.dir.openFile(name, .{});
        defer file.close();
        const info = try file.stat();
        const size: usize = @intCast(info.size);
        const contents = try allocator.alloc(u8, size);
        defer allocator.free(contents);
        const read_len = try file.readAll(contents);
        const slice = contents[0..read_len];

        // Print text files verbatim; hex-dump everything else, 16 bytes per row.
        if (std.mem.endsWith(u8, name, ".txt")) {
            try stdout.print(" text: {s}\n", .{slice});
        } else {
            try stdout.print(" bytes:", .{});
            for (slice, 0..) |byte, idx| {
                const prefix = if (idx % 16 == 0) "\n " else " ";
                try stdout.print("{s}{X:0>2}", .{ prefix, byte });
            }
            try stdout.print("\n", .{});
        }
    }
    try stdout.flush();
}
$ zig run zip_iterator_preview.zig
zip archive contains:
- demo/readme.txt (store, 34 bytes)
 text: Decompression from Zig streaming.
- demo/raw.bin (store, 4 bytes)
 bytes:
 00 01 02 03

std.zip.Entry.extract only supports the store and deflate methods; reject other methods up front, or fall back to an external tool or third-party library when interoperability demands it.
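Rejecting methods up front can happen before any bytes are extracted. std.zip.Iterator already surfaces the method as entry.compression_method; the sketch below instead reads raw central directory records so it runs without an archive on disk. Offsets and method codes follow the PKWARE APPNOTE (0 = store, 8 = deflate, 12 = bzip2). checkMethods and demoRecord are hypothetical helpers.

```zig
const std = @import("std");

/// Hypothetical helper: walk `entry_count` central directory file headers
/// (signature "PK\x01\x02") and reject methods other than store and deflate.
fn checkMethods(central_dir: []const u8, entry_count: u16) !void {
    var pos: usize = 0;
    var n: u16 = 0;
    while (n < entry_count) : (n += 1) {
        if (central_dir.len < pos or central_dir.len - pos < 46) return error.Truncated;
        const hdr = central_dir[pos..][0..46];
        if (!std.mem.eql(u8, hdr[0..4], "PK\x01\x02")) return error.BadSignature;
        switch (std.mem.readInt(u16, hdr[10..12], .little)) {
            0, 8 => {}, // store, deflate
            else => return error.UnsupportedCompressionMethod,
        }
        // Skip the fixed header plus the variable-length name, extra field, and comment.
        const name_len = std.mem.readInt(u16, hdr[28..30], .little);
        const extra_len = std.mem.readInt(u16, hdr[30..32], .little);
        const comment_len = std.mem.readInt(u16, hdr[32..34], .little);
        pos += 46 + @as(usize, name_len) + extra_len + comment_len;
    }
}

/// Builds a bare 46-byte central directory header (empty name) for demonstration.
fn demoRecord(method: u16) [46]u8 {
    var rec = [_]u8{0} ** 46;
    @memcpy(rec[0..4], "PK\x01\x02");
    std.mem.writeInt(u16, rec[10..12], method, .little);
    return rec;
}

pub fn main() !void {
    const stored = demoRecord(0);
    try checkMethods(&stored, 1);
    std.debug.print("store: accepted\n", .{});
    const bzip2 = demoRecord(12);
    checkMethods(&bzip2, 1) catch |err| std.debug.print("bzip2: {s}\n", .{@errorName(err)});
}
```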
Pattern Catalog for Mixed Sources
Blend these techniques to hydrate manifests from package registries, decompress release artifacts before signature checks, or stage binary blobs for GPU uploads, all without leaving Zig’s standard toolbox.
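As one illustration of the "decompress, then verify" ordering, the sketch below gates decoded bytes on a SHA-256 digest published next to an artifact. A digest comparison stands in here for a real signature check, and digestMatches is a hypothetical helper; the hashing itself uses std.crypto.hash.sha2.Sha256 from the standard library.

```zig
const std = @import("std");
const Sha256 = std.crypto.hash.sha2.Sha256;

/// Hypothetical gate: accept decoded bytes only if their SHA-256 digest
/// equals the expected hex string (64 hex characters).
fn digestMatches(decoded: []const u8, expected_hex: []const u8) !bool {
    if (expected_hex.len != 2 * Sha256.digest_length) return error.InvalidLength;
    var expected: [Sha256.digest_length]u8 = undefined;
    _ = try std.fmt.hexToBytes(&expected, expected_hex);
    var actual: [Sha256.digest_length]u8 = undefined;
    Sha256.hash(decoded, &actual, .{});
    return std.mem.eql(u8, &actual, &expected);
}

pub fn main() !void {
    // Standard SHA-256 test vector for the three-byte message "abc".
    const ok = try digestMatches("abc", "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad");
    std.debug.print("digest ok: {}\n", .{ok});
}
```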
Notes & Caveats
- Passing a zero-length buffer to std.compress.flate.Decompress.init disables history reuse, but large archives benefit from reusing a [flate.max_window_len]u8 scratch array.
- TAR iterators keep state about unread file bytes; always stream or discard them before advancing to the next header.
- ZIP extraction normalizes backslashes only when allow_backslashes = true; enforce forward slashes to avoid directory traversal bugs on Windows.
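A conservative policy is to validate every member name yourself before extracting, regardless of library options. The hypothetical isSafeMemberPath below rejects empty names, absolute paths, backslashes, drive-letter prefixes, and any ".." component. It is a sketch of one reasonable policy, not the check std.zip performs internally.

```zig
const std = @import("std");

/// Hypothetical policy check for archive member names: relative,
/// forward-slash paths that never climb out of the destination directory.
fn isSafeMemberPath(name: []const u8) bool {
    if (name.len == 0) return false;
    if (name[0] == '/') return false; // absolute POSIX path
    if (std.mem.indexOfScalar(u8, name, '\\') != null) return false; // Windows separator
    if (name.len >= 2 and name[1] == ':') return false; // drive letter such as C:
    var parts = std.mem.splitScalar(u8, name, '/');
    while (parts.next()) |part| {
        if (std.mem.eql(u8, part, "..")) return false; // parent traversal
    }
    return true;
}

pub fn main() void {
    const names = [_][]const u8{ "demo/readme.txt", "../etc/passwd", "C:/Windows/x", "a\\b" };
    for (names) |name| std.debug.print("{s}: {}\n", .{ name, isSafeMemberPath(name) });
}
```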
Exercises
- Rework the flate example to stream directly into std.fs.File.stdout().writer without a fixed buffer and profile the difference across build modes.
- Extend the TAR roundtrip demo to attach a generated checksum footer file summarizing every entry length.
- Add a verify_checksums pass to the ZIP iterator by computing CRC32 over extracted data and comparing it to the central directory record.
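For the checksum exercise, one starting point is std.hash.Crc32, which implements the same CRC-32 that ZIP records in its central directory and can be updated incrementally as you read an extracted file chunk by chunk. verifyCrc and the CrcMismatch error are hypothetical names.

```zig
const std = @import("std");

/// Hypothetical helper: checksum data chunk by chunk, as you would while
/// reading an extracted file, and compare with the recorded CRC-32.
fn verifyCrc(chunks: []const []const u8, expected: u32) !void {
    var crc = std.hash.Crc32.init();
    for (chunks) |chunk| crc.update(chunk);
    if (crc.final() != expected) return error.CrcMismatch;
}

pub fn main() !void {
    // 0xCBF43926 is the standard CRC-32 check value for "123456789".
    try verifyCrc(&.{ "1234", "56789" }, 0xCBF43926);
    std.debug.print("checksum ok\n", .{});
}
```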
Caveats, Alternatives, Edge Cases
- Compression backends (especially zstd) may require larger buffers on older CPUs without BMI2; check builtin.cpu.features, which describes the compilation target rather than the machine that runs the binary, before choosing lean windows.
- LZMA2 decoding still allocates internal state; stash a shared decoder if you process many small frames to avoid heap churn.
- For reproducible release archives, pin file ordering and timestamps explicitly; otherwise host filesystem metadata leaks into the archive.
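The BMI2 check mentioned above can be written against the standard target feature sets. Because builtin.cpu is fixed at build time (native by default, or whatever -target/-mcpu selects), the result is a compile-time property of the build, not a runtime probe of the host. targetHasBmi2 is a hypothetical helper.

```zig
const std = @import("std");
const builtin = @import("builtin");

/// Hypothetical helper: true when the compilation target enables BMI2.
/// This reads the build-time CPU model and does not probe the running machine.
fn targetHasBmi2() bool {
    if (!builtin.cpu.arch.isX86()) return false;
    return std.Target.x86.featureSetHas(builtin.cpu.features, .bmi2);
}

pub fn main() void {
    std.debug.print("{s} target, BMI2 enabled: {}\n", .{ @tagName(builtin.cpu.arch), targetHasBmi2() });
}
```

Since the value is comptime-known, it can also select a buffer-size constant at compile time; binaries meant for unknown hosts should be built for a baseline CPU instead of native.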