Overview
Last chapter we explored semantic inlining and SIMD to shape hotspots (see 39); this time we go hands-on with the measurement loop that tells you whether those tweaks actually paid off. We will combine lightweight timers, build-mode comparisons, and hardened error guards to turn experimental code into a reliable toolchain. Each technique leans on recent CLI improvements, such as zig build --time-report, to keep feedback fast (see v0.15.2).
By the end of this chapter you will have a repeatable recipe: collect timing baselines, choose a release strategy (speed versus size), and run safeguards across optimization levels so regressions surface before deployment.
Learning Goals
- Instrument hot paths with std.time.Timer and interpret the relative deltas (see time.zig).
- Compare ReleaseFast and ReleaseSmall artifacts, understanding the trade-off between diagnostics and binary size (see #releasefast).
- Harden parsing and throttling code with error guards that hold under every optimization setting (see testing.zig).
Profiling Baselines with Monotonic Timers
std.time.Timer samples a monotonic clock, making it ideal for quick "is it faster?" experiments without touching global state. Paired with deterministic input data, it keeps microbenchmarks honest when you repeat them under different build modes.
Example: Sorting Strategies Under a Single Timer Harness
We reuse the dataset for three algorithms—block sort, heap sort, and insertion sort—to illustrate how timing ratios guide further investigation. The dataset is regenerated for each run so cache effects stay consistent (see sort.zig).
// This program demonstrates performance measurement and comparison of different
// sorting algorithms using Zig's built-in Timer for benchmarking.
const std = @import("std");
// Number of elements to sort in each benchmark run
const sample_count = 1024;
/// Generates a deterministic array of random u32 values for benchmarking.
/// Uses a fixed seed to ensure reproducible results across multiple runs.
/// @return: Array of 1024 pseudo-random u32 values
fn generateData() [sample_count]u32 {
var data: [sample_count]u32 = undefined;
// Initialize PRNG with fixed seed for deterministic output
var prng = std.Random.DefaultPrng.init(0xfeed_beef_dead_cafe);
const random = prng.random();
// Fill each array slot with a random 32-bit unsigned integer
for (&data) |*slot| {
slot.* = random.int(u32);
}
return data;
}
/// Measures the execution time of a sorting function on a copy of the input data.
/// Creates a scratch buffer to avoid modifying the original data, allowing
/// multiple measurements on the same dataset.
/// @param sortFn: Compile-time sorting function to benchmark
/// @param source: Source data to sort (remains unchanged)
/// @return: Elapsed time in nanoseconds
fn measureSort(
comptime sortFn: anytype,
source: []const u32,
) !u64 {
// Create scratch buffer to preserve original data
var scratch: [sample_count]u32 = undefined;
std.mem.copyForwards(u32, scratch[0..], source);
// Start high-resolution timer immediately before sort operation
var timer = try std.time.Timer.start();
// Execute the sort with ascending comparison function
sortFn(u32, scratch[0..], {}, std.sort.asc(u32));
// Capture elapsed nanoseconds
return timer.read();
}
pub fn main() !void {
// Generate shared dataset for all sorting algorithms
var dataset = generateData();
// Benchmark each sorting algorithm on identical data
const block_ns = try measureSort(std.sort.block, dataset[0..]);
const heap_ns = try measureSort(std.sort.heap, dataset[0..]);
const insertion_ns = try measureSort(std.sort.insertion, dataset[0..]);
// Display raw timing results along with build mode
std.debug.print("optimize-mode={s}\n", .{@tagName(@import("builtin").mode)});
std.debug.print("block sort : {d} ns\n", .{block_ns});
std.debug.print("heap sort : {d} ns\n", .{heap_ns});
std.debug.print("insertion sort : {d} ns\n", .{insertion_ns});
// Calculate relative performance metrics using block sort as baseline
const baseline = @as(f64, @floatFromInt(block_ns));
const heap_speedup = baseline / @as(f64, @floatFromInt(heap_ns));
const insertion_slowdown = @as(f64, @floatFromInt(insertion_ns)) / baseline;
// Display comparative analysis showing speedup/slowdown factors
std.debug.print("heap speedup over block: {d:.2}x\n", .{heap_speedup});
std.debug.print("insertion slowdown vs block: {d:.2}x\n", .{insertion_slowdown});
}
$ zig run 01_timer_probe.zig -OReleaseFast
optimize-mode=ReleaseFast
block sort : 43753 ns
heap sort : 75331 ns
insertion sort : 149541 ns
heap speedup over block: 0.58x
insertion slowdown vs block: 3.42x
Follow up with zig build --time-report -Doptimize=ReleaseFast on the same module when you need attribution for longer stages like hashing or parsing.
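When a single reading looks noisy, one refinement is to repeat the measurement and keep the fastest sample. The sketch below assumes the measureSort and generateData helpers from the listing above; the helper name and repetition count are illustrative, not part of the example file.
// Hypothetical helper: repeat a measurement and keep the fastest sample so
// scheduler noise does not dominate a single reading. Assumes runs > 0.
fn measureBestOf(runs: usize, comptime sortFn: anytype, source: []const u32) !u64 {
    var best: u64 = std.math.maxInt(u64);
    var i: usize = 0;
    while (i < runs) : (i += 1) {
        const elapsed = try measureSort(sortFn, source);
        best = @min(best, elapsed);
    }
    return best;
}
// Usage inside main, next to the single-shot calls:
// const block_best = try measureBestOf(16, std.sort.block, dataset[0..]);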
Trading Binary Size for Diagnostics
Switching between ReleaseFast and ReleaseSmall is more than a compiler flag: both modes drop runtime safety checks, and ReleaseSmall additionally prunes code aggressively to shrink the final binary. When you profile on laptops but deploy on embedded devices, build both variants and confirm that the size difference justifies the lost diagnostics.
Example: Tracing Logic That Disappears in ReleaseSmall
Tracing is compiled in only for Debug builds, so both release modes eliminate it as dead code. Measuring binary sizes provides a tangible signal that ReleaseSmall is doing its job.
// This program demonstrates how compile-time configuration affects binary size
// by conditionally enabling debug tracing based on the build mode.
const std = @import("std");
const builtin = @import("builtin");
// Compile-time flag that enables tracing only in Debug mode
// This demonstrates how dead code elimination works in release builds
const enable_tracing = builtin.mode == .Debug;
// Computes a FNV-1a hash for a given word
// FNV-1a is a fast, non-cryptographic hash function
// @param word: The input byte slice to hash
// @return: A 64-bit hash value
fn checksumWord(word: []const u8) u64 {
// FNV-1a 64-bit offset basis
var state: u64 = 0xcbf29ce484222325;
// Process each byte of the input
for (word) |byte| {
// XOR with the current byte
state ^= byte;
// Multiply by FNV-1a 64-bit prime (with wrapping multiplication)
state = state *% 0x100000001b3;
}
return state;
}
pub fn main() !void {
// Sample word list to demonstrate the checksum functionality
const words = [_][]const u8{ "profiling", "optimization", "hardening", "zig" };
// Accumulator for combining all word checksums
var digest: u64 = 0;
// Process each word and combine their checksums
for (words) |word| {
const word_sum = checksumWord(word);
// Combine checksums using XOR
digest ^= word_sum;
// Conditional tracing that will be compiled out in release builds
// This demonstrates how build mode affects binary size
if (enable_tracing) {
std.debug.print("trace: {s} -> {x}\n", .{ word, word_sum });
}
}
// Output the final result along with the current build mode
// Shows how the same code behaves differently based on compilation settings
std.debug.print(
"mode={s} digest={x}\n",
.{
@tagName(builtin.mode),
digest,
},
);
}
$ zig build-exe 02_binary_size.zig -OReleaseFast -femit-bin=perf-releasefast
$ zig build-exe 02_binary_size.zig -OReleaseSmall -femit-bin=perf-releasesmall
$ ls -lh perf-releasefast perf-releasesmall
-rwxrwxr-x 1 zkevm zkevm 876K Nov 6 13:12 perf-releasefast
-rwxrwxr-x 1 zkevm zkevm 11K Nov 6 13:12 perf-releasesmall
Keep both artifacts around: ReleaseFast for symbol-rich profiling sessions, ReleaseSmall for production handoff. Share them via zig build --artifact or package manager hashes to keep CI deterministic.
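To see how much of the ReleaseFast binary is debug information rather than code, a quick follow-up is to build a stripped variant and compare again. The commands below omit output, since sizes vary by machine, and the extra output file name is illustrative:
$ zig build-exe 02_binary_size.zig -OReleaseFast -fstrip -femit-bin=perf-releasefast-stripped
$ ls -lh perf-releasefast perf-releasefast-stripped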
Hardening Across Optimization Modes
After tuning performance and size, wrap the pipeline with tests that assert guard rails across build modes. This is vital because ReleaseFast and ReleaseSmall disable runtime safety checks by default (see #setruntimesafety). Running the same test suite in ReleaseSafe ensures diagnostics still fire when safety remains enabled.
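If one function must keep its checks even when the rest of the binary runs without them, @setRuntimeSafety(true) re-enables safety for that scope. A minimal sketch, with an illustrative function name:
// Keeps bounds checking alive in this scope regardless of the global
// optimize mode, at a small local cost.
fn guardedIndex(items: []const u32, index: usize) u32 {
    @setRuntimeSafety(true);
    // An out-of-bounds access still panics here, even in ReleaseFast or ReleaseSmall.
    return items[index];
}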
Example: Validating Input Parsing and Throttling in Every Mode
The pipeline parses limits, clamps workloads, and defends against empty input. The final test loops through values inline, mirroring the real application path while staying cheap to execute.
// This example demonstrates input validation and error handling patterns in Zig,
// showing how to create guarded data processing pipelines with proper bounds checking.
const std = @import("std");
// Custom error set for parsing and validation operations
const ParseError = error{
EmptyInput, // Returned when input contains only whitespace or is empty
InvalidNumber, // Returned when input cannot be parsed as a valid number
OutOfRange, // Returned when parsed value is outside acceptable bounds
};
/// Parses and validates a text input as a u32 limit value.
/// Ensures the value is between 1 and 10,000 inclusive.
/// Whitespace is automatically trimmed from input.
fn parseLimit(text: []const u8) ParseError!u32 {
// Remove leading and trailing whitespace characters
const trimmed = std.mem.trim(u8, text, " \t\r\n");
if (trimmed.len == 0) return error.EmptyInput;
// Attempt to parse as base-10 unsigned 32-bit integer
const value = std.fmt.parseInt(u32, trimmed, 10) catch return error.InvalidNumber;
// Enforce bounds: reject zero and values exceeding maximum threshold
if (value == 0 or value > 10_000) return error.OutOfRange;
return value;
}
/// Applies a throttling limit to a work queue, ensuring safe processing bounds.
/// Returns the actual number of items that can be processed, which is the minimum
/// of the requested limit and the available work length.
fn throttle(work: []const u8, limit: u32) ParseError!usize {
// Precondition: limit must be positive (checked only in safety-enabled builds)
std.debug.assert(limit > 0);
// Guard against empty work queues
if (work.len == 0) return error.EmptyInput;
// Calculate the safe processing limit by taking the minimum of the requested
// limit (widened to usize) and the work size; widening avoids a truncating cast
const safe_limit = @min(@as(usize, limit), work.len);
return safe_limit;
}
// Test: Verify that valid numeric strings are correctly parsed
test "valid limit parses" {
try std.testing.expectEqual(@as(u32, 750), try parseLimit("750"));
}
// Test: Ensure whitespace-only input is properly rejected
test "empty input rejected" {
try std.testing.expectError(error.EmptyInput, parseLimit(" \n"));
}
// Test: Verify throttling respects the parsed limit and work size
test "in-flight throttling respects guard" {
const limit = try parseLimit("32");
// Work length (4) is less than limit (32), so expect work length
try std.testing.expectEqual(@as(usize, 4), try throttle("hard", limit));
}
// Test: Validate multiple inputs meet the maximum threshold requirement
// Demonstrates compile-time iteration for testing multiple scenarios
test "validate release configurations" {
const inputs = [_][]const u8{ "8", "9999", "500" };
// Compile-time loop unrolls test cases for each input value
inline for (inputs) |value| {
const parsed = try parseLimit(value);
// Ensure parsed values never exceed the defined maximum
try std.testing.expect(parsed <= 10_000);
}
}
$ zig test 03_guarded_pipeline.zig -OReleaseFast
All 4 tests passed.
Repeat the command with -OReleaseSafe and plain zig test to make sure guard clauses work identically in safety-on builds. The inline loop proves the compiler can still unroll checks without sacrificing correctness.
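To exercise the whole matrix in one go, a small shell loop works (output omitted; file name as above):
$ for mode in Debug ReleaseSafe ReleaseFast ReleaseSmall; do zig test 03_guarded_pipeline.zig -O"$mode"; done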
Notes & Caveats
- Use deterministic data when microbenchmarking so timer noise reflects algorithm changes, not PRNG drift (see Random.zig).
- ReleaseSmall disables error return traces and many assertions; pair it with a ReleaseSafe smoke test before shipping so missing diagnostics surface.
- std.debug.assert remains active only in Debug and ReleaseSafe. Because ReleaseFast and ReleaseSmall remove it, compensate with integration tests or explicit error handling, as sketched below (see debug.zig).
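One way to compensate, sketched against the ParseError set from the pipeline example (the function name is hypothetical):
// Instead of std.debug.assert, which disappears in ReleaseFast/ReleaseSmall,
// return an explicit error so the guard survives every optimize mode.
fn throttleChecked(work: []const u8, limit: u32) ParseError!usize {
    if (limit == 0) return error.OutOfRange; // explicit, mode-independent guard
    if (work.len == 0) return error.EmptyInput;
    return @min(@as(usize, limit), work.len);
}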
Exercises
- Add a --sort flag to select the algorithm at runtime, then capture zig build --time-report snapshots for each choice.
- Extend the size example with a --metrics flag that turns tracing back on; document the binary delta using zig build-exe -fstrip for extra savings.
- Parameterize parseLimit to accept hexadecimal input and tighten the tests so they run under zig test -OReleaseSmall without triggering UB.
Alternatives & Edge Cases
- Microbenchmarks that funnel results through Debug-only std.debug.print tracing will skew ReleaseSmall comparisons because those calls are compiled out. Consider logging into ring buffers instead (see the sketch after this list).
- Use zig build run --watch -fincremental when iterating on instrumentation. Threaded codegen in 0.15.2 keeps rebuilds responsive even after large edits (see v0.15.2).
- If your tests mutate data structures with undefined behavior in ReleaseFast, isolate the risky code behind @setRuntimeSafety(true) for the duration of the hardening exercise.
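A minimal fixed-size ring log for the first bullet, assuming the traced values are plain u64 checksums like the digest example above (type and field names are illustrative):
const TraceRing = struct {
    entries: [64]u64 = [_]u64{0} ** 64,
    index: usize = 0,

    // Recording is a couple of stores, so the instrumentation cost stays
    // roughly constant across build modes; dump the buffer after the run.
    fn record(self: *TraceRing, value: u64) void {
        self.entries[self.index % self.entries.len] = value;
        self.index += 1;
    }
};
// Usage: var ring = TraceRing{}; ring.record(checksumWord("zig"));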