Overview
Inline assembly grants you the power to reach below Zig’s abstractions when you need one-off instructions, interoperability with legacy ABIs, or access to processor features not yet wrapped by the standard library. Zig 0.15.2 hardened inline assembly by enforcing alignment checks for pointer casts and providing clearer constraint diagnostics, making it both safer and easier to debug than previous releases.
Learning Goals
- Recognize the structure of Zig’s GNU-style inline assembly blocks and map operands to registers or memory.
- Apply register and clobber constraints to orchestrate data flow between Zig variables and machine instructions.
- Guard architecture-specific snippets with compile-time checks so your build fails fast on unsupported targets.
Shaping Assembly Blocks
Zig adopts the familiar GCC/Clang inline assembly layout: a template string followed by colon-separated outputs, inputs, and clobbers. Start with simple arithmetic to get comfortable with operand binding before you reach for more exotic instructions. The first example uses addl to combine two 32-bit values, binding both operands to registers without touching memory.
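//! Minimal inline assembly example that adds two integers.
const std = @import("std");

pub fn addAsm(a: u32, b: u32) u32 {
    var result: u32 = undefined;
    // AT&T operand order: `addl src, dst` computes dst += src.
    // The "0" constraint ties `rhs` to the register chosen for output 0 (`out`),
    // so the sum accumulated in %[rhs] is the value that lands in `result`.
    asm volatile ("addl %[lhs], %[rhs]\n\t"
        : [out] "=r" (result),
        : [lhs] "r" (a),
          [rhs] "0" (b),
    );
    return result;
}

test "addAsm produces sum" {
    try std.testing.expectEqual(@as(u32, 11), addAsm(5, 6));
}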
$ zig test chapters-data/code/59__advanced-inline-assembly/01_inline_add.zig
All 1 tests passed.
Operand placeholders such as %[lhs] reference the symbolic names you assign in the constraint list; keeping those names mnemonic pays off once your templates grow beyond a single instruction.
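Zig can also hand an output back as the value of the asm expression by writing -> T in place of a variable. The sketch below is illustrative only: addLea is a hypothetical helper that is not part of the chapter's sources, and it assumes an x86_64 target. It uses leaq, which computes the sum as an address and leaves the flags register untouched, so no clobber list is needed.

```zig
//! Sketch: returning an inline assembly output as an expression value.
const std = @import("std");
const builtin = @import("builtin");

/// Hypothetical helper (an assumption for illustration, not from the chapter).
/// Adds two 64-bit values; the result wraps modulo 2^64 like any address computation.
pub fn addLea(a: u64, b: u64) u64 {
    if (builtin.cpu.arch != .x86_64) @compileError("addLea requires x86_64");
    // `-> u64` makes the output the value of the whole asm expression.
    // No `volatile` is needed: the block has an output and no side effects.
    return asm ("leaq (%[lhs],%[rhs]), %[sum]"
        : [sum] "=r" (-> u64),
        : [lhs] "r" (a),
          [rhs] "r" (b),
    );
}

test "addLea produces sum" {
    try std.testing.expectEqual(@as(u64, 11), addLea(5, 6));
}
```

Returning the value directly avoids the undefined temporary used in addAsm, at the cost of allowing only one such output per block.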
Register Choreography Without Footguns
More complex snippets often need bidirectional operands (read/write) or additional bookkeeping once the instruction finishes. The xchg sequence below swaps two integers entirely in registers, then writes the updated values back to Zig-managed memory. Guarding the function with @compileError prevents accidental use on non-x86 platforms, while the +r constraint indicates that each operand is both read and written.
//! Swaps two words using the x86 xchg instruction with read/write register constraints.
const std = @import("std");
const builtin = @import("builtin");

pub fn swapXchg(a: *u32, b: *u32) void {
    if (builtin.cpu.arch != .x86_64) @compileError("swapXchg requires x86_64");

    // Load both values into locals so the exchange happens purely in registers.
    var lhs = a.*;
    var rhs = b.*;
    asm volatile ("xchgl %[left], %[right]"
        : [left] "+r" (lhs),
          [right] "+r" (rhs),
    );
    // Write the swapped values back through the pointers.
    a.* = lhs;
    b.* = rhs;
}

test "swapXchg swaps values" {
    var lhs: u32 = 1;
    var rhs: u32 = 2;
    swapXchg(&lhs, &rhs);
    try std.testing.expectEqual(@as(u32, 2), lhs);
    try std.testing.expectEqual(@as(u32, 1), rhs);
}
$ zig test chapters-data/code/59__advanced-inline-assembly/02_xchg_swap.zig
All 1 tests passed.
Because the swap operates only on registers, you stay clear of tricky memory constraints; when you do need to touch memory directly, add an explicit "memory" clobber so Zig’s optimizer does not reorder surrounding loads or stores.
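As a rough illustration of the memory case, the sketch below exchanges a register with a memory location and declares a memory clobber. The exchange function is a hypothetical helper, not one of the chapter's listings. It also assumes the struct-style clobber syntax used by Zig 0.15.x, .{ .memory = true }; earlier releases spelled the same clobber as the string "memory".

```zig
//! Sketch: exchanging a register with memory, guarded by a memory clobber.
const std = @import("std");
const builtin = @import("builtin");

/// Hypothetical helper (an assumption for illustration).
/// Stores `new` into `ptr.*` and returns the value that was there before.
pub fn exchange(ptr: *u32, new: u32) u32 {
    if (builtin.cpu.arch != .x86_64) @compileError("exchange requires x86_64");
    return asm volatile ("xchgl %[val], (%[addr])"
        : [val] "=r" (-> u32),
        : [addr] "r" (ptr),
          // "0" ties this input to the register of output 0, so the register
          // holds `new` going in and the old memory contents coming out.
          [seed] "0" (new),
          // Assumed Zig 0.15.x clobber syntax: the block writes memory the
          // compiler cannot see through the operand list.
        : .{ .memory = true });
}

test "exchange returns the previous value" {
    var slot: u32 = 1;
    const old = exchange(&slot, 2);
    try std.testing.expectEqual(@as(u32, 1), old);
    try std.testing.expectEqual(@as(u32, 2), slot);
}
```

Without the clobber, the optimizer would be free to assume that slot still holds its earlier value after the block runs.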
Observability and Guard Rails
Once you trust the syntax, inline assembly becomes a precision tool for hardware-provided counters or instructions not yet surfaced elsewhere. Reading the x86 time-stamp counter with rdtsc gives you cycle-level timing while demonstrating multi-output constraints that pin results to specific registers. The example bundles the low and high halves of the counter into a u64 and falls back to a compile error on non-x86_64 targets.
//! Reads the x86 time stamp counter using inline assembly outputs.
const std = @import("std");
const builtin = @import("builtin");

pub fn readTimeStampCounter() u64 {
    if (builtin.cpu.arch != .x86_64) @compileError("rdtsc example requires x86_64");

    var lo: u32 = undefined;
    var hi: u32 = undefined;
    asm volatile ("rdtsc"
        : [low] "={eax}" (lo),
          [high] "={edx}" (hi),
    );
    return (@as(u64, hi) << 32) | @as(u64, lo);
}

test "readTimeStampCounter returns non-zero" {
    const a = readTimeStampCounter();
    const b = readTimeStampCounter();
    // The counter advances monotonically; allow equality in case calls land in the same cycle.
    try std.testing.expect(b >= a);
}
$ zig test chapters-data/code/59__advanced-inline-assembly/03_rdtsc.zig
All 1 tests passed.
Instructions like rdtsc can reorder around other operations; consider pairing them with serializing instructions (e.g. lfence) or explicit memory clobbers when precise measurement matters.
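One possible shape for the fenced variant is sketched here. readTscFenced is a hypothetical name, the code assumes an x86_64 target, and the clobber uses the Zig 0.15.x struct syntax. Treat it as a starting point for experiments, not a calibrated benchmark harness.

```zig
//! Sketch: rdtsc preceded by lfence, with a memory clobber as a compiler barrier.
const std = @import("std");
const builtin = @import("builtin");

/// Hypothetical helper (an assumption for illustration).
pub fn readTscFenced() u64 {
    if (builtin.cpu.arch != .x86_64) @compileError("readTscFenced requires x86_64");

    var lo: u32 = undefined;
    var hi: u32 = undefined;
    // lfence waits for earlier instructions to complete before rdtsc runs.
    // The memory clobber (assumed 0.15.x syntax) additionally stops the
    // compiler from moving loads and stores across the block.
    asm volatile ("lfence\n\trdtsc"
        : [low] "={eax}" (lo),
          [high] "={edx}" (hi),
        :
        : .{ .memory = true });
    return (@as(u64, hi) << 32) | @as(u64, lo);
}

test "readTscFenced does not go backwards" {
    const a = readTscFenced();
    const b = readTscFenced();
    try std.testing.expect(b >= a);
}
```

The fence addresses reordering inside the CPU, while the clobber addresses reordering by the compiler; a careful measurement usually needs both.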
Patterns to Keep on Hand
- Wrap architecture-specific blocks in if (builtin.cpu.arch != …) @compileError guards so cross-compilation fails early.
- Prefer register-only operands when prototyping; once the logic is correct, introduce memory operands and clobbers deliberately.
- Treat inline assembly as an escape hatch; if the standard library (or builtins) exposes the instruction, prefer that higher-level API to stay portable.
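The guard and escape-hatch patterns above can be combined in one place. This is a minimal sketch under stated assumptions: popCount32 and has_popcnt are hypothetical names, the feature test relies on std.Target.x86.featureSetHas, and the flags clobber uses the Zig 0.15.x struct syntax. In real code the @popCount builtin alone would be the better choice; the assembly branch exists only to show the compile-time dispatch.

```zig
//! Sketch: compile-time feature probe with a portable builtin fallback.
const std = @import("std");
const builtin = @import("builtin");

// Container-level constants are evaluated at compile time, so the untaken
// branch in popCount32 is never emitted. `and` short-circuits, which keeps
// the x86 feature query away from non-x86 targets.
const has_popcnt = builtin.cpu.arch == .x86_64 and
    std.Target.x86.featureSetHas(builtin.cpu.features, .popcnt);

/// Hypothetical helper (an assumption for illustration).
pub fn popCount32(x: u32) u32 {
    if (has_popcnt) {
        // popcnt writes the flags register, hence the cc clobber
        // (assumed Zig 0.15.x syntax).
        return asm ("popcntl %[src], %[dst]"
            : [dst] "=r" (-> u32),
            : [src] "r" (x),
            : .{ .cc = true });
    }
    // Portable path: the builtin works on every target.
    return @popCount(x);
}

test "popCount32 counts set bits" {
    try std.testing.expectEqual(@as(u32, 3), popCount32(0b1011));
}
```

Because the probe is a compile-time check of the build target, it says nothing about the machine the binary eventually runs on; a binary built for a baseline CPU simply takes the builtin path.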
Notes & Caveats
- Inline assembly is target-specific; always document the minimum CPU features required and consider feature probes before executing the block.
- Clobber lists matter: forgetting "cc" or "memory" may lead to miscompilations that only surface under optimization.
- When mixing Zig and foreign ABIs, double-check the calling convention and register preservation rules; the compiler will not save registers for you.
Exercises
- Add an lfence instruction before rdtsc and measure the impact on stability; compare results in Debug and ReleaseFast builds.
- Extend swapXchg with a "memory" clobber and benchmark the difference when swapping values in a tight loop.
- Rewrite addAsm using a compile-time format string that emits add or sub based on a boolean parameter.
Alternatives & Edge Cases
- Some instructions (e.g., privileged system calls) require elevated privileges; wrap them in runtime checks so they never execute inadvertently.
- On microarchitectures with out-of-order execution, pair timing reads with fences to avoid skewed measurements.
- For portable timing, prefer std.time.Timer or platform APIs and reserve inline assembly for truly architecture-specific hot paths.
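For comparison, a portable measurement with std.time.Timer might look like the sketch below. elapsedNs and spin are hypothetical names introduced for illustration, and the API shape shown (Timer.start returning an error union, read taking a pointer) is assumed to match the 0.15.x standard library.

```zig
//! Sketch: portable elapsed-time measurement without inline assembly.
const std = @import("std");

/// Hypothetical helper (an assumption for illustration).
/// Runs `work` once and returns the elapsed wall-clock time in nanoseconds.
pub fn elapsedNs(comptime work: fn () void) !u64 {
    // start() fails only if the platform has no usable monotonic clock.
    var timer = try std.time.Timer.start();
    work();
    return timer.read();
}

/// Hypothetical workload: a short loop the optimizer is asked not to delete.
fn spin() void {
    var i: u32 = 0;
    while (i < 1000) : (i += 1) {
        std.mem.doNotOptimizeAway(i);
    }
}

test "elapsedNs measures a workload" {
    const ns = try elapsedNs(spin);
    // Only the fact that a measurement was produced is checked here;
    // the actual duration depends on the machine.
    _ = ns;
}
```

The result is in nanoseconds, not cycles, so it is not directly comparable to rdtsc output, but it behaves the same way on every target Zig supports.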