Overview
This project turns raw bytes into a tidy, alignment-aware hex view. We’ll read a file incrementally, format each line as OFFSET: HEX ASCII, and keep output stable across platforms. The writer interface uses buffered stdout via std.fs.File.writer and std.Io.Writer, as described in File.zig and Io.zig.
The formatter prints 16 bytes per line by default and can be configured with --width N (4..32). Bytes are grouped 8|8 to ease scanning, and non-printable ASCII becomes a dot in the right-hand gutter, as described in fmt.zig and #Command-line-flags.
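The width rule above (a decimal number in the range 4..32) can be isolated into a small helper. This is a minimal sketch; `parseWidth` and `WidthError` are hypothetical names that do not appear in the chapter's listing, which performs the same checks inline and exits instead of returning an error.

```zig
const std = @import("std");

// Hypothetical error set for illustration; the chapter's code prints a message
// and exits instead of returning errors.
const WidthError = error{ InvalidWidth, WidthOutOfRange };

// Parse a decimal width and enforce the documented 4..32 range.
fn parseWidth(text: []const u8) WidthError!usize {
    const width = std.fmt.parseInt(usize, text, 10) catch return error.InvalidWidth;
    if (width < 4 or width > 32) return error.WidthOutOfRange;
    return width;
}
```

Returning an error rather than exiting keeps the check testable; the caller can still map each error to a message and an exit code.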
Learning Goals
- Parse CLI flags and validate numbers with std.fmt.parseInt.
- Stream a file with a fixed buffer and assemble exact-width output lines.
- Use the non-deprecated File.Writer + Io.Writer to buffer stdout and flush cleanly.
Building the Dump
We’ll wire three pieces: a tiny CLI parser, a line formatter, and a loop that feeds the formatter in exact-width chunks. The implementation leans on Zig’s slices and explicit lifetimes (dup the path before freeing args) to stay robust; see process.zig and #Error-Handling.
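Before the full listing, the "exact-width chunks" idea can be looked at in isolation. The sketch below is an illustration only: `LineAssembler` and `countLines` are hypothetical names, and instead of printing it merely counts the lines a dump would emit when input arrives in arbitrarily sized pieces.

```zig
const std = @import("std");

// Hypothetical helper, not part of the chapter's listing.
const LineAssembler = struct {
    width: usize, // assumed to be within 4..32, as the CLI enforces
    carry: [32]u8 = undefined,
    carry_len: usize = 0,

    // Copy as many bytes as the current line still needs; return the count consumed.
    fn push(self: *LineAssembler, chunk: []const u8) usize {
        const take = @min(self.width - self.carry_len, chunk.len);
        @memcpy(self.carry[self.carry_len..][0..take], chunk[0..take]);
        self.carry_len += take;
        return take;
    }

    fn isFull(self: *const LineAssembler) bool {
        return self.carry_len == self.width;
    }
};

// Count the lines produced for input that arrives split across `chunks`.
fn countLines(chunks: []const []const u8, width: usize) usize {
    var assembler: LineAssembler = .{ .width = width };
    var lines: usize = 0;
    for (chunks) |chunk| {
        var idx: usize = 0;
        while (idx < chunk.len) {
            idx += assembler.push(chunk[idx..]);
            if (assembler.isFull()) {
                lines += 1; // a real dump would format carry[0..carry_len] here
                assembler.carry_len = 0;
            }
        }
    }
    if (assembler.carry_len > 0) lines += 1; // trailing short line
    return lines;
}
```

Because leftover bytes are carried between chunks, line boundaries depend only on the width, never on where a read happened to stop.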
const std = @import("std");

// Chapter 9 – Project: Hexdump
//
// A small, alignment-aware hexdump that prints:
//   OFFSET: 16 hex bytes (grouped 8|8)  ASCII
// Default width is 16 bytes per line; override with --width N (4..32).
//
// Usage:
//   zig run hexdump.zig -- <path>
//   zig run hexdump.zig -- --width 8 <path>

const Cli = struct {
    width: usize = 16,
    path: []const u8 = &[_]u8{},
};

fn printUsage() void {
    std.debug.print("usage: hexdump [--width N] <path>\n", .{});
}

fn parseArgs(allocator: std.mem.Allocator) !Cli {
    var cli: Cli = .{};
    const args = try std.process.argsAlloc(allocator);
    defer std.process.argsFree(allocator, args);

    if (args.len == 1 or (args.len == 2 and std.mem.eql(u8, args[1], "--help"))) {
        printUsage();
        std.process.exit(0);
    }

    var i: usize = 1;
    while (i + 1 < args.len and std.mem.eql(u8, args[i], "--width")) : (i += 2) {
        const val = args[i + 1];
        cli.width = std.fmt.parseInt(usize, val, 10) catch {
            std.debug.print("error: invalid width '{s}'\n", .{val});
            std.process.exit(2);
        };
        if (cli.width < 4 or cli.width > 32) {
            std.debug.print("error: width must be between 4 and 32\n", .{});
            std.process.exit(2);
        }
    }

    if (i >= args.len) {
        std.debug.print("error: expected <path>\n", .{});
        printUsage();
        std.process.exit(2);
    }

    // Duplicate the path so it remains valid after freeing args.
    cli.path = try allocator.dupe(u8, args[i]);
    return cli;
}

fn isPrintable(c: u8) bool {
    // Printable ASCII (space through tilde)
    return c >= 0x20 and c <= 0x7E;
}

fn dumpLine(stdout: *std.Io.Writer, offset: usize, bytes: []const u8, width: usize) !void {
    // OFFSET (8 hex digits), colon and space
    try stdout.print("{X:0>8}: ", .{offset});

    // Hex bytes with grouping at the midpoint (8|8 for the default width)
    var i: usize = 0;
    while (i < width) : (i += 1) {
        if (i < bytes.len) {
            try stdout.print("{X:0>2} ", .{bytes[i]});
        } else {
            // pad absent bytes with three spaces (the width of "XX ")
            // to keep the ASCII column aligned
            try stdout.print("   ", .{});
        }
        if (i + 1 == width / 2) {
            try stdout.print(" ", .{}); // extra gap between 8|8
        }
    }

    // One more space, so two spaces separate the hex region from the ASCII gutter
    try stdout.print(" ", .{});
    i = 0;
    while (i < width) : (i += 1) {
        if (i < bytes.len) {
            const ch: u8 = if (isPrintable(bytes[i])) bytes[i] else '.';
            try stdout.print("{c}", .{ch});
        } else {
            try stdout.print(" ", .{});
        }
    }
    try stdout.print("\n", .{});
}

pub fn main() !void {
    const allocator = std.heap.page_allocator;
    const cli = try parseArgs(allocator);

    var file = std.fs.cwd().openFile(cli.path, .{ .mode = .read_only }) catch {
        std.debug.print("error: unable to open '{s}'\n", .{cli.path});
        std.process.exit(1);
    };
    defer file.close();

    // Buffered stdout using the modern File.Writer + Io.Writer interface.
    var out_buf: [16 * 1024]u8 = undefined;
    var file_writer = std.fs.File.writer(std.fs.File.stdout(), &out_buf);
    const stdout = &file_writer.interface;

    var offset: usize = 0;
    var carry: [64]u8 = undefined; // enough for max width 32
    var carry_len: usize = 0;
    var buf: [64 * 1024]u8 = undefined;

    while (true) {
        const n = try file.read(buf[0..]);
        if (n == 0 and carry_len == 0) break;

        var idx: usize = 0;
        while (idx < n) {
            // fill a line from carry + buffer bytes
            const need = cli.width - carry_len;
            const take = @min(need, n - idx);
            @memcpy(carry[carry_len .. carry_len + take], buf[idx .. idx + take]);
            carry_len += take;
            idx += take;
            if (carry_len == cli.width) {
                try dumpLine(stdout, offset, carry[0..carry_len], cli.width);
                offset += carry_len;
                carry_len = 0;
            }
        }

        // End of file with a partial line pending: emit it padded.
        if (n == 0 and carry_len > 0) {
            try dumpLine(stdout, offset, carry[0..carry_len], cli.width);
            offset += carry_len;
            carry_len = 0;
        }
    }
    try file_writer.end();
}
$ zig run hexdump.zig -- sample.txt
00000000: 48 65 6C 6C 6F 2C 20 48  65 78 64 75 6D 70 21 0A  Hello, Hexdump!.
The ASCII gutter replaces non-printable bytes with a dot (.); the newline at the end of the file shows up as 0A in the hex region and as a dot on the right.
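The gutter rule can be exercised on its own, without any I/O. In this minimal sketch `renderGutter` is a hypothetical helper (the chapter's dumpLine prints each character directly); it writes the gutter text for a byte slice into a caller-supplied buffer.

```zig
const std = @import("std");

// Same predicate as the chapter's listing: printable ASCII is space through tilde.
fn isPrintable(c: u8) bool {
    return c >= 0x20 and c <= 0x7E;
}

// Hypothetical helper: fill `out` with the gutter text for `bytes` and return
// the filled part. Bytes beyond out.len are ignored.
fn renderGutter(out: []u8, bytes: []const u8) []const u8 {
    const n = @min(out.len, bytes.len);
    for (bytes[0..n], out[0..n]) |b, *slot| {
        slot.* = if (isPrintable(b)) b else '.';
    }
    return out[0..n];
}
```

Note that DEL (0x7F) and every byte at or above 0x80 fall outside the printable range, so they render as dots just like control characters.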
Width and Grouping
Pass --width N to change bytes per line. Grouping still splits the line in half (N/2) to keep the eye anchored.
$ zig run hexdump.zig -- --width 8 sample.txt
00000000: 48 65 6C 6C  6F 2C 20 48  Hello, H
00000008: 65 78 64 75  6D 70 21 0A  exdump!.
The line formatter pads both the hex and ASCII regions so that the columns align nicely on the last line, where bytes may not fill a complete width.
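To see why padding keeps the columns aligned, the hex region can be built into a buffer and measured. This is a sketch under assumptions: `hexRegion` is a hypothetical helper that mirrors the cell layout of dumpLine (three characters per byte slot plus one gap after slot width/2), and it uses std.fmt.bufPrint rather than a writer so it stays self-contained.

```zig
const std = @import("std");

// Hypothetical helper: format the hex region of one line into `buf`.
// Absent bytes are padded with three spaces, the width of one "XX " cell.
fn hexRegion(buf: []u8, bytes: []const u8, width: usize) ![]const u8 {
    var pos: usize = 0;
    var i: usize = 0;
    while (i < width) : (i += 1) {
        const cell = if (i < bytes.len)
            try std.fmt.bufPrint(buf[pos..], "{X:0>2} ", .{bytes[i]})
        else
            try std.fmt.bufPrint(buf[pos..], "   ", .{});
        pos += cell.len;
        if (i + 1 == width / 2) {
            // extra gap at the midpoint
            const gap = try std.fmt.bufPrint(buf[pos..], " ", .{});
            pos += gap.len;
        }
    }
    return buf[0..pos];
}
```

For any line, the region is 3 * width + 1 characters long, regardless of how many bytes are present, which is what lets the ASCII gutter start in the same column every time.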
Notes & Caveats
- Avoid deprecated I/O surfaces; this example uses File.writer plus an Io.Writer buffer and calls end() to flush and set the final position.
- Hex formatting is kept simple: no -C-style index columns beyond the offset. Extending the formatter is an easy follow-on exercise.
- Argument lifetimes matter: duplicate the path string if you free args before using cli.path.
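The lifetime note can be demonstrated with a small stand-in for the parseArgs pattern. In this sketch `copyThenFree` is a hypothetical function, and `source` plays the role of the argument storage that std.process.argsFree would release.

```zig
const std = @import("std");

// Hypothetical stand-in for the parseArgs pattern: copy the argument, then
// release the storage it came from. The caller owns the returned copy.
fn copyThenFree(allocator: std.mem.Allocator, source: []u8) ![]u8 {
    // The deferred free runs after the return value has been computed,
    // so the copy is made while `source` is still valid.
    defer allocator.free(source);
    return allocator.dupe(u8, source);
}
```

Returning `source` itself instead of the copy would hand the caller a dangling slice, which is the bug the duplication in parseArgs avoids.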
Exercises
- Add --group N to control the extra space position (currently N = width/2).
- Support --offset 0xNN to start addresses at a base other than zero.
- Include a right-hand hex checksum per line and a final footer (e.g., total bytes).
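As a starting point for the --offset exercise, std.fmt.parseInt accepts a base of 0, which auto-detects the 0x, 0o, and 0b prefixes and otherwise falls back to decimal. The helper names below (`parseBaseOffset`, `formatAddress`) are hypothetical and only sketch one possible approach.

```zig
const std = @import("std");

// Hypothetical helper: parse a base address such as "0x10" or "4096".
// Base 0 asks parseInt to detect the radix from the prefix.
fn parseBaseOffset(text: []const u8) !usize {
    return std.fmt.parseInt(usize, text, 0);
}

// Hypothetical helper: format the address column for a line that starts
// `consumed` bytes into the input, shifted by the chosen base.
fn formatAddress(buf: []u8, base: usize, consumed: usize) ![]const u8 {
    return std.fmt.bufPrint(buf, "{X:0>8}: ", .{base + consumed});
}
```

In the chapter's program, the simplest integration would be to initialize the running offset with the parsed base instead of zero.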
Alternatives & Edge Cases
- Large files: the code streams in fixed-size blocks and assembles lines; adjust buffer sizes to match your I/O environment.
- Non-ASCII encodings: the ASCII gutter is deliberately crude. For UTF-8 awareness, you’d need a more careful renderer; see unicode.zig.
- Binary pipes: read from stdin when no path is provided; adapt the open/loop accordingly if you want to support pipelines.
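For the pipeline case, one possible shape is a helper that returns either the opened file or standard input. This is a sketch, not the chapter's code: `openInput` is a hypothetical name, the path would have to become optional in Cli, and std.fs.File.stdin() is assumed to be available as the counterpart of the std.fs.File.stdout() call used in the listing.

```zig
const std = @import("std");

// Hypothetical helper: open `path` for reading, or fall back to stdin
// when no path was given on the command line.
fn openInput(path: ?[]const u8) !std.fs.File {
    if (path) |p| {
        return std.fs.cwd().openFile(p, .{ .mode = .read_only });
    }
    return std.fs.File.stdin();
}
```

The read loop can stay as it is, since it only calls read on the returned file. Keep in mind that a deferred close in main would then also close stdin, which is usually harmless for a short-lived tool but worth knowing.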