Chapter 59. Advanced Inline Assembly

Appendix E. Advanced Inline Assembly

Overview

Inline assembly grants you the power to reach below Zig’s abstractions when you need one-off instructions, interoperability with legacy ABIs, or access to processor features not yet wrapped by the standard library. 33 Zig 0.15.2 hardened inline assembly by enforcing alignment checks for pointer casts and providing clearer constraint diagnostics, making it both safer and easier to debug than previous releases. v0.15.2

Learning Goals

  • Recognize the structure of Zig’s GNU-style inline assembly blocks and map operands to registers or memory.
  • Apply register and clobber constraints to orchestrate data flow between Zig variables and machine instructions.
  • Guard architecture-specific snippets with compile-time checks so your build fails fast on unsupported targets.

Shaping Assembly Blocks

Zig adopts the familiar GCC/Clang inline assembly layout: a template string followed by colon-separated outputs, inputs, and clobbers. Start with simple arithmetic to get comfortable with operand binding before you reach for more exotic instructions. The first example uses addl to combine two 32-bit values, binding both operands to registers without touching memory. The "0" constraint on rhs ties that input to the same register as output 0, so addl (AT&T order: source first, destination second) leaves the sum in result. x86_64.zig

Zig
//! Minimal inline assembly example that adds two integers.
const std = @import("std");

pub fn addAsm(a: u32, b: u32) u32 {
    var result: u32 = undefined;
    asm volatile ("addl %[lhs], %[rhs]\n\t"
        : [out] "=r" (result),
        : [lhs] "r" (a),
          [rhs] "0" (b),
    );
    return result;
}

test "addAsm produces sum" {
    try std.testing.expectEqual(@as(u32, 11), addAsm(5, 6));
}
Run
Shell
$ zig test chapters-data/code/59__advanced-inline-assembly/01_inline_add.zig
Output
Shell
All 1 tests passed.

Operand placeholders such as %[lhs] reference the symbolic names you assign in the constraint list; keeping those names mnemonic pays off once your templates grow beyond a single instruction. 58
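
As a variation on addAsm, the sketch below pins each operand to a named register with `{reg}` constraints and returns the result through a `-> u32` output instead of a local variable. The function name addAsmPinned is hypothetical, and the typed clobber `.{ .cc = true }` assumes the Zig 0.15 clobber syntax; treat it as an illustration rather than the chapter's reference code.

```zig
//! Variation on addAsm that pins operands to named registers (illustrative sketch).
const std = @import("std");
const builtin = @import("builtin");

/// Hypothetical helper, not part of the chapter's source files.
pub fn addAsmPinned(a: u32, b: u32) u32 {
    if (builtin.cpu.arch != .x86_64) @compileError("addAsmPinned requires x86_64");

    // `{eax}` and `{ecx}` force specific registers; `%%` emits a literal `%`.
    // The sum is returned through the `-> u32` output rather than a variable.
    return asm ("addl %%ecx, %%eax"
        : [sum] "={eax}" (-> u32),
        : [lhs] "{eax}" (a),
          [rhs] "{ecx}" (b),
        : .{ .cc = true }); // addl updates the flags register
}

test "addAsmPinned produces sum" {
    try std.testing.expectEqual(@as(u32, 11), addAsmPinned(5, 6));
}
```

Pinning registers is mostly useful when an instruction or ABI dictates where values must live; for plain arithmetic, letting the compiler choose with "r" usually leaves it more freedom.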

Register Choreography Without Footguns

More complex snippets often need bidirectional operands (read/write) or additional bookkeeping once the instruction finishes. The xchg sequence below swaps two integers entirely in registers, then writes the updated values back to Zig-managed memory. 4 Guarding the function with @compileError prevents accidental use on targets other than x86_64, while the +r constraint indicates that each operand is both read and written. pie.zig

Zig
//! Swaps two words using the x86 xchg instruction with read-write register constraints.
const std = @import("std");
const builtin = @import("builtin");

pub fn swapXchg(a: *u32, b: *u32) void {
    if (builtin.cpu.arch != .x86_64) @compileError("swapXchg requires x86_64");

    var lhs = a.*;
    var rhs = b.*;
    asm volatile ("xchgl %[left], %[right]"
        : [left] "+r" (lhs),
          [right] "+r" (rhs),
    );
    a.* = lhs;
    b.* = rhs;
}

test "swapXchg swaps values" {
    var lhs: u32 = 1;
    var rhs: u32 = 2;
    swapXchg(&lhs, &rhs);
    try std.testing.expectEqual(@as(u32, 2), lhs);
    try std.testing.expectEqual(@as(u32, 1), rhs);
}
Run
Shell
$ zig test chapters-data/code/59__advanced-inline-assembly/02_xchg_swap.zig
Output
Shell
All 1 tests passed.

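Because the swap operates only on registers, you stay clear of tricky memory constraints; when you do need to touch memory directly, add an explicit memory clobber (written .{ .memory = true } in the typed clobber syntax of Zig 0.15, which replaces the older "memory" string) so Zig’s optimizer does not reorder surrounding loads or stores across the block. 36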
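
A minimal sketch of the memory case, assuming Zig 0.15's typed clobbers: the hypothetical helper incrementInPlace modifies a value through a pointer, so it declares both the memory and cc clobbers. The pointer is pinned to rdi here purely to keep the template simple; that register choice is an assumption of this example, not a requirement.

```zig
//! Increments a u32 in memory through a pointer operand (illustrative sketch).
const std = @import("std");
const builtin = @import("builtin");

/// Hypothetical helper, not part of the chapter's source files.
pub fn incrementInPlace(ptr: *u32) void {
    if (builtin.cpu.arch != .x86_64) @compileError("incrementInPlace requires x86_64");

    // The instruction writes to memory the compiler cannot see in the operand
    // list, so the memory clobber tells it not to cache `ptr.*` across the block.
    asm volatile ("incl (%%rdi)"
        :
        : [p] "{rdi}" (ptr),
        : .{ .memory = true, .cc = true });
}

test "incrementInPlace updates memory" {
    var counter: u32 = 41;
    incrementInPlace(&counter);
    try std.testing.expectEqual(@as(u32, 42), counter);
}
```

Without the memory clobber, an optimizing build would be free to assume counter still holds its old value after the call.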

Observability and Guard Rails

Once you trust the syntax, inline assembly becomes a precision tool for hardware-provided counters or instructions not yet surfaced elsewhere. Reading the x86 time-stamp counter with rdtsc gives you cycle-level timing while demonstrating multi-output constraints. 39 The example bundles the low and high halves of the counter into a u64 and fails with a compile error on non-x86_64 targets.

Zig
//! Reads the x86 time stamp counter using inline assembly outputs.
const std = @import("std");
const builtin = @import("builtin");

pub fn readTimeStampCounter() u64 {
    if (builtin.cpu.arch != .x86_64) @compileError("rdtsc example requires x86_64");

    var lo: u32 = undefined;
    var hi: u32 = undefined;
    asm volatile ("rdtsc"
        : [low] "={eax}" (lo),
          [high] "={edx}" (hi),
    );
    return (@as(u64, hi) << 32) | @as(u64, lo);
}

test "readTimeStampCounter does not go backwards" {
    const a = readTimeStampCounter();
    const b = readTimeStampCounter();
    // The counter advances monotonically; allow equality in case calls land in the same cycle.
    try std.testing.expect(b >= a);
}
Run
Shell
$ zig test chapters-data/code/59__advanced-inline-assembly/03_rdtsc.zig
Output
Shell
All 1 tests passed.

The CPU may execute rdtsc out of order relative to the surrounding instructions; when precise measurement matters, pair it with a fence such as lfence (or use rdtscp), and add a memory clobber so the compiler does not move loads or stores across the read either. 39
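
One way to combine both ideas is sketched below, assuming Zig 0.15's typed clobber syntax. The name readTimeStampCounterFenced is hypothetical; the lfence keeps earlier instructions from still being in flight when the counter is read, and the memory clobber constrains the compiler rather than the CPU.

```zig
//! Fenced variant of the rdtsc reader (illustrative sketch).
const std = @import("std");
const builtin = @import("builtin");

/// Hypothetical helper, not part of the chapter's source files.
pub fn readTimeStampCounterFenced() u64 {
    if (builtin.cpu.arch != .x86_64) @compileError("fenced rdtsc example requires x86_64");

    var lo: u32 = undefined;
    var hi: u32 = undefined;
    // lfence waits for prior instructions to complete before rdtsc executes.
    asm volatile ("lfence\n\trdtsc"
        : [low] "={eax}" (lo),
          [high] "={edx}" (hi),
        :
        : .{ .memory = true });
    return (@as(u64, hi) << 32) | @as(u64, lo);
}

test "readTimeStampCounterFenced does not go backwards" {
    const a = readTimeStampCounterFenced();
    const b = readTimeStampCounterFenced();
    try std.testing.expect(b >= a);
}
```

The fence adds latency of its own, so it is worth measuring an empty region first and subtracting that baseline from later readings.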

Patterns to Keep on Hand

  • Wrap architecture-specific blocks in if (builtin.cpu.arch != …) @compileError guards so cross-compilation fails early. 41
  • Prefer register-only operands when prototyping—once the logic is correct, introduce memory operands and clobbers deliberately. 33
  • Treat inline assembly as an escape hatch; if the standard library (or builtins) exposes the instruction, prefer that higher-level API to stay portable. mem.zig
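
The first and last patterns can be sketched together. The hypothetical spinHint below selects an instruction per architecture and rejects every other target at compile time; std.atomic.spinLoopHint is the existing standard-library call that covers the same need and is usually the better choice.

```zig
//! Architecture-guarded spin-wait hint (illustrative sketch).
const std = @import("std");
const builtin = @import("builtin");

/// Hypothetical helper, not part of the chapter's source files.
pub fn spinHint() void {
    // builtin.cpu.arch is known at compile time, so only the matching
    // prong is analyzed and unsupported targets fail the build early.
    switch (builtin.cpu.arch) {
        .x86, .x86_64 => asm volatile ("pause"),
        .aarch64 => asm volatile ("isb"),
        else => @compileError("spinHint: unsupported architecture"),
    }
}

test "spinHint and the std equivalent both run" {
    spinHint();
    // Prefer this portable call in real code.
    std.atomic.spinLoopHint();
}
```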

Notes & Caveats

  • Inline assembly is target-specific; always document the minimum CPU features required and consider feature probes before executing the block. 29
  • Clobber lists matter: forgetting the cc or memory clobber (written .{ .cc = true, .memory = true } since Zig 0.15) may lead to miscompilations that only surface under optimization. 36
  • When mixing Zig and foreign ABIs, double-check the calling convention and register preservation rules; the compiler will not save registers for you. builtin.zig
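
A compile-time feature probe might look like the sketch below. It checks the target's feature set with std.Target.x86.featureSetHas and uses the popcnt instruction only when the build enables it, falling back to the @popCount builtin otherwise. The names has_popcnt and popCount32 are hypothetical, and the probe covers the build target only; it does not detect the CPU the binary later runs on.

```zig
//! Compile-time CPU feature probe guarding an inline assembly path (illustrative sketch).
const std = @import("std");
const builtin = @import("builtin");

/// True only when the build target is x86_64 with POPCNT enabled.
const has_popcnt = builtin.cpu.arch == .x86_64 and
    std.Target.x86.featureSetHas(builtin.cpu.features, .popcnt);

/// Hypothetical helper, not part of the chapter's source files.
pub fn popCount32(x: u32) u32 {
    if (has_popcnt) {
        // Only analyzed when the feature is available at compile time.
        return asm ("popcntl %[in], %[out]"
            : [out] "=r" (-> u32),
            : [in] "r" (x),
            : .{ .cc = true });
    }
    // Portable fallback: the builtin works on every target.
    return @popCount(x);
}

test "popCount32 counts set bits" {
    try std.testing.expectEqual(@as(u32, 3), popCount32(0b1011));
}
```

In practice @popCount alone already lowers to the hardware instruction when the target supports it, which is exactly why the builtin is the preferred route.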

Exercises

  • Add an lfence instruction before rdtsc and measure the impact on stability; compare results in Debug and ReleaseFast builds. 39
  • Extend swapXchg with a "memory" clobber and benchmark the difference when swapping values in a tight loop. time.zig
  • Rewrite addAsm using a compile-time format string that emits add or sub based on a boolean parameter. 15

Alternatives & Edge Cases

  • Some instructions (e.g., privileged instructions such as hlt or wrmsr) fault outside kernel mode; wrap them in runtime checks so they never execute inadvertently. 48
  • On microarchitectures with out-of-order execution, pair timing reads with fences to avoid skewed measurements. 39
  • For portable timing, prefer std.time.Timer or platform APIs and reserve inline assembly for truly architecture-specific hot paths.
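
For comparison, a portable measurement with std.time.Timer needs no assembly at all. The helpers measureNanos and busyWork below are hypothetical names used only for this sketch, which assumes the Timer API as shipped with Zig 0.15.

```zig
//! Portable timing with std.time.Timer (illustrative sketch).
const std = @import("std");

/// Hypothetical helper: returns the elapsed nanoseconds for one call to `work`.
pub fn measureNanos(comptime work: fn () void) !u64 {
    var timer = try std.time.Timer.start();
    work();
    return timer.read();
}

/// Hypothetical workload so the timer has something to measure.
fn busyWork() void {
    var sum: u64 = 0;
    for (0..1000) |i| sum +%= i;
    // Keep the optimizer from deleting the loop.
    std.mem.doNotOptimizeAway(sum);
}

test "measureNanos returns an elapsed time" {
    _ = try measureNanos(busyWork);
}
```

Timer reports nanoseconds from a monotonic clock, so results are comparable across machines in a way raw time-stamp counter ticks are not.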
