Chapter 40: Profiling, Optimization, Hardening

Overview

Last chapter we explored semantic inlining and SIMD to shape hotspots (see 39); this time we go hands-on with the measurement loop that tells you whether those tweaks actually paid off. We will combine lightweight timers, build-mode comparisons, and hardened error guards to turn experimental code into a reliable toolchain. Each technique leans on recent CLI improvements, such as zig build --time-report, to keep feedback fast (see v0.15.2).

By the end of this chapter you will have a repeatable recipe: collect timing baselines, choose a release strategy (speed versus size), and run safeguards across optimization levels so regressions surface before deployment.

Learning Goals

  • Instrument hot paths with std.time.Timer and interpret the relative deltas (see time.zig).
  • Compare ReleaseFast and ReleaseSmall artifacts, understanding the trade-off between diagnostics and binary size (see #releasefast).
  • Harden parsing and throttling code with error guards that hold under every optimization setting (see testing.zig).

Profiling Baselines with Monotonic Timers

std.time.Timer samples a monotonic clock, making it ideal for quick "is it faster?" experiments without touching global state. Paired with deterministic input data, it keeps microbenchmarks honest when you repeat them under different build modes.

Example: Sorting Strategies Under a Single Timer Harness

We run three algorithms (block sort, heap sort, and insertion sort) over the same dataset to illustrate how timing ratios guide further investigation. The data comes from a fixed seed and each measurement sorts a fresh copy of it, so input order and cache effects stay consistent between runs (see sort.zig).

Zig
// This program demonstrates performance measurement and comparison of different
// sorting algorithms using Zig's built-in Timer for benchmarking.
const std = @import("std");

// Number of elements to sort in each benchmark run
const sample_count = 1024;

/// Generates a deterministic array of random u32 values for benchmarking.
/// Uses a fixed seed to ensure reproducible results across multiple runs.
/// @return: Array of 1024 pseudo-random u32 values
fn generateData() [sample_count]u32 {
    var data: [sample_count]u32 = undefined;
    // Initialize PRNG with fixed seed for deterministic output
    var prng = std.Random.DefaultPrng.init(0xfeed_beef_dead_cafe);
    var random = prng.random();
    // Fill each array slot with a random 32-bit unsigned integer
    for (&data) |*slot| {
        slot.* = random.int(u32);
    }
    return data;
}

/// Measures the execution time of a sorting function on a copy of the input data.
/// Creates a scratch buffer to avoid modifying the original data, allowing
/// multiple measurements on the same dataset.
/// @param sortFn: Compile-time sorting function to benchmark
/// @param source: Source data to sort (remains unchanged)
/// @return: Elapsed time in nanoseconds
fn measureSort(
    comptime sortFn: anytype,
    source: []const u32,
) !u64 {
    // Create scratch buffer to preserve original data
    var scratch: [sample_count]u32 = undefined;
    std.mem.copyForwards(u32, scratch[0..], source);

    // Start high-resolution timer immediately before sort operation
    var timer = try std.time.Timer.start();
    // Execute the sort with ascending comparison function
    sortFn(u32, scratch[0..], {}, std.sort.asc(u32));
    // Capture elapsed nanoseconds
    return timer.read();
}

pub fn main() !void {
    // Generate shared dataset for all sorting algorithms
    var dataset = generateData();

    // Benchmark each sorting algorithm on identical data
    const block_ns = try measureSort(std.sort.block, dataset[0..]);
    const heap_ns = try measureSort(std.sort.heap, dataset[0..]);
    const insertion_ns = try measureSort(std.sort.insertion, dataset[0..]);

    // Display raw timing results along with build mode
    std.debug.print("optimize-mode={s}\n", .{@tagName(@import("builtin").mode)});
    std.debug.print("block sort     : {d} ns\n", .{block_ns});
    std.debug.print("heap sort      : {d} ns\n", .{heap_ns});
    std.debug.print("insertion sort : {d} ns\n", .{insertion_ns});

    // Calculate relative performance metrics using block sort as baseline
    const baseline = @as(f64, @floatFromInt(block_ns));
    const heap_speedup = baseline / @as(f64, @floatFromInt(heap_ns));
    const insertion_slowdown = @as(f64, @floatFromInt(insertion_ns)) / baseline;

    // Display comparative analysis showing speedup/slowdown factors
    std.debug.print("heap speedup over block: {d:.2}x\n", .{heap_speedup});
    std.debug.print("insertion slowdown vs block: {d:.2}x\n", .{insertion_slowdown});
}
Run
Shell
$ zig run 01_timer_probe.zig -OReleaseFast
Output
Shell
optimize-mode=ReleaseFast
block sort     : 43753 ns
heap sort      : 75331 ns
insertion sort : 149541 ns
heap speedup over block: 0.58x
insertion slowdown vs block: 3.42x

Follow up with zig build --time-report -Doptimize=ReleaseFast on the same module when you need attribution for longer stages like hashing or parsing.
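
For --time-report to attribute time to named steps, the file has to be part of a build graph. Below is a minimal build.zig sketch wiring the benchmark in; the file and step names are illustrative and the API shown targets the 0.15.x std.Build interface.

Zig
// Minimal build script so `zig build --time-report -Doptimize=ReleaseFast`
// has a named compile step to attribute; adjust paths and names to your layout.
const std = @import("std");

pub fn build(b: *std.Build) void {
    const target = b.standardTargetOptions(.{});
    const optimize = b.standardOptimizeOption(.{});

    const exe = b.addExecutable(.{
        .name = "timer_probe",
        .root_module = b.createModule(.{
            .root_source_file = b.path("01_timer_probe.zig"),
            .target = target,
            .optimize = optimize,
        }),
    });
    b.installArtifact(exe);

    // `zig build run` compiles and then executes the benchmark.
    const run_cmd = b.addRunArtifact(exe);
    const run_step = b.step("run", "Run the timer probe benchmark");
    run_step.dependOn(&run_cmd.step);
}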

Trading Binary Size for Diagnostics

Switching between ReleaseFast and ReleaseSmall is more than a compiler flag: both modes disable runtime safety checks, but ReleaseSmall also prunes code aggressively and optimizes for size to shrink the final binary. When you profile on laptops but deploy on embedded devices, build both variants and confirm the size win justifies the lost diagnostics.

Example: Tracing Logic That Disappears in ReleaseSmall

Tracing is compiled in only for Debug builds; in every release mode the branch is dead code that the optimizer eliminates. Measuring binary sizes provides a tangible signal that ReleaseSmall is doing its job.

Zig

// This program demonstrates how compile-time configuration affects binary size
// by conditionally enabling debug tracing based on the build mode.
const std = @import("std");
const builtin = @import("builtin");

// Compile-time flag that enables tracing only in Debug mode
// This demonstrates how dead code elimination works in release builds
const enable_tracing = builtin.mode == .Debug;

// Computes a FNV-1a hash for a given word
// FNV-1a is a fast, non-cryptographic hash function
// @param word: The input byte slice to hash
// @return: A 64-bit hash value
fn checksumWord(word: []const u8) u64 {
    // FNV-1a 64-bit offset basis
    var state: u64 = 0xcbf29ce484222325;
    
    // Process each byte of the input
    for (word) |byte| {
        // XOR with the current byte
        state ^= byte;
        // Multiply by FNV-1a 64-bit prime (with wrapping multiplication)
        state = state *% 0x100000001b3;
    }
    return state;
}

pub fn main() !void {
    // Sample word list to demonstrate the checksum functionality
    const words = [_][]const u8{ "profiling", "optimization", "hardening", "zig" };
    
    // Accumulator for combining all word checksums
    var digest: u64 = 0;
    
    // Process each word and combine their checksums
    for (words) |word| {
        const word_sum = checksumWord(word);
        // Combine checksums using XOR
        digest ^= word_sum;
        
        // Conditional tracing that will be compiled out in release builds
        // This demonstrates how build mode affects binary size
        if (enable_tracing) {
            std.debug.print("trace: {s} -> {x}\n", .{ word, word_sum });
        }
    }

    // Output the final result along with the current build mode
    // Shows how the same code behaves differently based on compilation settings
    std.debug.print(
        "mode={s} digest={x}\n",
        .{
            @tagName(builtin.mode),
            digest,
        },
    );
}
Run
Shell
$ zig build-exe 02_binary_size.zig -OReleaseFast -femit-bin=perf-releasefast
$ zig build-exe 02_binary_size.zig -OReleaseSmall -femit-bin=perf-releasesmall
$ ls -lh perf-releasefast perf-releasesmall
Output
Shell
-rwxrwxr-x 1 zkevm zkevm 876K Nov  6 13:12 perf-releasefast
-rwxrwxr-x 1 zkevm zkevm  11K Nov  6 13:12 perf-releasesmall

Keep both artifacts around: ReleaseFast for symbol-rich profiling sessions, ReleaseSmall for production handoff. Publish them through your build script's install step or pin them with package-manager hashes to keep CI deterministic.

Hardening Across Optimization Modes

After tuning performance and size, wrap the pipeline with tests that assert guard rails across build modes. This is vital because ReleaseFast and ReleaseSmall disable runtime safety checks by default (see #setruntimesafety). Running the same test suite in ReleaseSafe ensures diagnostics still fire when safety remains enabled.

Example: Validating Input Parsing and Throttling in Every Mode

The pipeline parses limits, clamps workloads, and defends against empty input. The final test loops through values inline, mirroring the real application path while staying cheap to execute.

Zig

// This example demonstrates input validation and error handling patterns in Zig,
// showing how to create guarded data processing pipelines with proper bounds checking.

const std = @import("std");

// Custom error set for parsing and validation operations
const ParseError = error{
    EmptyInput,      // Returned when input contains only whitespace or is empty
    InvalidNumber,   // Returned when input cannot be parsed as a valid number
    OutOfRange,      // Returned when parsed value is outside acceptable bounds
};

/// Parses and validates a text input as a u32 limit value.
/// Ensures the value is between 1 and 10,000 inclusive.
/// Whitespace is automatically trimmed from input.
fn parseLimit(text: []const u8) ParseError!u32 {
    // Remove leading and trailing whitespace characters
    const trimmed = std.mem.trim(u8, text, " \t\r\n");
    if (trimmed.len == 0) return error.EmptyInput;

    // Attempt to parse as base-10 unsigned 32-bit integer
    const value = std.fmt.parseInt(u32, trimmed, 10) catch return error.InvalidNumber;
    
    // Enforce bounds: reject zero and values exceeding maximum threshold
    if (value == 0 or value > 10_000) return error.OutOfRange;
    return value;
}

/// Applies a throttling limit to a work queue, ensuring safe processing bounds.
/// Returns the actual number of items that can be processed, which is the minimum
/// of the requested limit and the available work length.
fn throttle(work: []const u8, limit: u32) ParseError!usize {
    // Precondition: limit must be positive (asserted in Debug and ReleaseSafe; compiled out in ReleaseFast/ReleaseSmall)
    std.debug.assert(limit > 0);

    // Guard against empty work queues
    if (work.len == 0) return error.EmptyInput;

    // Calculate safe processing limit by taking minimum of requested limit and work size
    // Cast assumes work.len fits in u32; the result is then clamped to the requested limit
    const safe_limit = @min(limit, @as(u32, @intCast(work.len)));
    return safe_limit;
}

// Test: Verify that valid numeric strings are correctly parsed
test "valid limit parses" {
    try std.testing.expectEqual(@as(u32, 750), try parseLimit("750"));
}

// Test: Ensure whitespace-only input is properly rejected
test "empty input rejected" {
    try std.testing.expectError(error.EmptyInput, parseLimit("   \n"));
}

// Test: Verify throttling respects the parsed limit and work size
test "in-flight throttling respects guard" {
    const limit = try parseLimit("32");
    // Work length (4) is less than limit (32), so expect work length
    try std.testing.expectEqual(@as(usize, 4), try throttle("hard", limit));
}

// Test: Validate multiple inputs meet the maximum threshold requirement
// Demonstrates compile-time iteration for testing multiple scenarios
test "validate release configurations" {
    const inputs = [_][]const u8{ "8", "9999", "500" };
    // Compile-time loop unrolls test cases for each input value
    inline for (inputs) |value| {
        const parsed = try parseLimit(value);
        // Ensure parsed values never exceed the defined maximum
        try std.testing.expect(parsed <= 10_000);
    }
}
Run
Shell
$ zig test 03_guarded_pipeline.zig -OReleaseFast
Output
Shell
All 4 tests passed.

Repeat the command with -OReleaseSafe and plain zig test to make sure guard clauses work identically in safety-on builds. The inline loop proves the compiler can still unroll checks without sacrificing correctness.
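
If you want to sweep every optimize mode in one command, a small shell loop works; this is just a convenience sketch, so adjust the file name to your layout.

Shell
$ for mode in Debug ReleaseSafe ReleaseFast ReleaseSmall; do
>   zig test 03_guarded_pipeline.zig -O"$mode"
> done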

Notes & Caveats

  • Use deterministic data when microbenchmarking so timer noise reflects algorithm changes, not PRNG drift (see Random.zig).
  • ReleaseSmall disables error return traces and many assertions; pair it with a ReleaseFast smoke test before shipping to catch missing diagnostics.
  • std.debug.assert remains active in Debug and ReleaseSafe. Because ReleaseFast and ReleaseSmall compile it out, compensate with integration tests or explicit error handling (see debug.zig); one possible rewrite is sketched after this list.
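
As mentioned above, one way to keep the throttle guard alive in every build mode is to replace the assertion with an explicit error. This is a minimal sketch; the ZeroLimit error and throttleChecked helper are hypothetical names, not part of the earlier example.

Zig
// Sketch: an explicit error survives ReleaseFast/ReleaseSmall, unlike std.debug.assert.
const std = @import("std");

const ThrottleError = error{ZeroLimit};

fn throttleChecked(work: []const u8, limit: u32) ThrottleError!usize {
    // Instead of std.debug.assert(limit > 0), report the bad input in every mode.
    if (limit == 0) return error.ZeroLimit;
    // Clamp the requested limit to the amount of available work.
    return @min(@as(usize, limit), work.len);
}

test "zero limit is rejected in every mode" {
    try std.testing.expectError(error.ZeroLimit, throttleChecked("data", 0));
}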

Exercises

  • Add a --sort flag to select the algorithm at runtime, then capture zig build --time-report snapshots for each choice.
  • Extend the size example with a --metrics flag that turns tracing back on; document the binary delta using zig build-exe -fstrip for extra savings.
  • Parameterize parseLimit to accept hexadecimal input and tighten the tests so they run under zig test -OReleaseSmall without triggering UB (see 37).

Alternatives & Edge Cases

  • Microbenchmarks that call std.debug.print inside the timed region skew results because the I/O dominates the measurement, and Debug-only trace branches (like enable_tracing above) disappear entirely in release builds. Consider logging into ring buffers instead.
  • Use zig build run --watch -fincremental when iterating on instrumentation. Threaded codegen in 0.15.2 keeps rebuilds responsive even after large edits (see v0.15.2).
  • If your tests exercise code that would hit undefined behavior in ReleaseFast, isolate the risky section behind @setRuntimeSafety(true) for the duration of the hardening exercise; a sketch follows this list.
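
A minimal sketch of that last pattern, assuming a simple accumulation loop stands in for the risky code; guardedSum is a hypothetical helper, not part of the examples above.

Zig
// Sketch: force runtime safety back on inside one function so overflow and
// bounds checks still panic even when the surrounding build is ReleaseFast/ReleaseSmall.
const std = @import("std");

fn guardedSum(values: []const u32) u32 {
    @setRuntimeSafety(true); // keep safety checks for this scope regardless of build mode
    var total: u32 = 0;
    for (values) |v| {
        total += v; // overflow here panics instead of silently wrapping
    }
    return total;
}

test "guarded sum stays checked" {
    try std.testing.expectEqual(@as(u32, 6), guardedSum(&.{ 1, 2, 3 }));
}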
