Overview
After mastering collections for structured data, you now turn to text, the fundamental medium of human-computer interaction. This chapter explores std.fmt for formatting and parsing, std.ascii for ASCII character operations, std.unicode for UTF-8/UTF-16 handling, and encoding utilities like base64.
Unlike high-level languages that hide encoding complexities, Zig exposes the mechanics: you choose between []const u8 (byte slices) and proper Unicode code point iteration, control number formatting precision, and handle encoding errors explicitly.
Text processing in Zig demands awareness of byte vs. character boundaries, allocator usage for dynamic formatting, and the performance implications of different string operations. By chapter's end, you'll format numbers with custom precision, parse integers and floats safely, manipulate ASCII efficiently, navigate UTF-8 sequences, and encode binary data for transport, all with Zig's characteristic explicitness and zero hidden costs.
Learning Goals
- Format values with Writer.print() using format specifiers for integers, floats, and custom types.
- Parse strings into integers (parseInt) and floats (parseFloat) with proper error handling.
- Use std.ascii for character classification (isDigit, isAlphabetic) and case conversion (toUpper, toLower).
- Navigate UTF-8 sequences with std.unicode and understand code point vs. byte distinctions.
- Encode and decode Base64 data for binary-to-text transformations.
- Implement custom formatters for user-defined types using the {f} specifier in Zig 0.15.2.
Formatting with std.fmt
Zig’s formatting revolves around Writer.print(fmt, args), which writes formatted output to any Writer implementation. Format strings use {} placeholders with optional specifiers: {d} for decimal, {x} for hex, {s} for strings, {any} for debug representation, and {f} for custom formatters.
Basic Formatting
The simplest pattern: capture a buffer with std.io.fixedBufferStream, then print into it.
const std = @import("std");

pub fn main() !void {
    var buffer: [100]u8 = undefined;
    var fbs = std.io.fixedBufferStream(&buffer);
    const writer = fbs.writer();
    try writer.print("Answer={d}, pi={d:.2}", .{ 42, 3.14159 });
    std.debug.print("Formatted: {s}\n", .{fbs.getWritten()});
}

$ zig build-exe format_basic.zig && ./format_basic
Formatted: Answer=42, pi=3.14

std.io.fixedBufferStream provides a Writer backed by a fixed buffer. No allocation needed. For dynamic output, use std.ArrayList(u8).writer().
Format Specifiers
Zig’s format specifiers control number bases, precision, alignment, and padding.
const std = @import("std");

pub fn main() !void {
    const value: i32 = 255;
    const pi = 3.14159;
    const large = 123.0;
    std.debug.print("Decimal: {d}\n", .{value});
    std.debug.print("Hexadecimal (lowercase): {x}\n", .{value});
    std.debug.print("Hexadecimal (uppercase): {X}\n", .{value});
    std.debug.print("Binary: {b}\n", .{value});
    std.debug.print("Octal: {o}\n", .{value});
    std.debug.print("Float with 2 decimals: {d:.2}\n", .{pi});
    std.debug.print("Scientific notation: {e}\n", .{large});
    std.debug.print("Padded: {d:0>5}\n", .{42});
    std.debug.print("Right-aligned: {d:>5}\n", .{42});
}

$ zig build-exe format_specifiers.zig && ./format_specifiers
Decimal: 255
Hexadecimal (lowercase): ff
Hexadecimal (uppercase): FF
Binary: 11111111
Octal: 377
Float with 2 decimals: 3.14
Scientific notation: 1.23e2
Padded: 00042
Right-aligned:    42

Use {d} for decimal, {x} for hex, {b} for binary, {o} for octal. Precision (.N) controls the number of decimals printed for floats; width, alignment, and fill apply to floats and integers alike. Padding with 0 creates zero-filled fields.
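Width, alignment, and fill can be combined with any specifier. The following is a minimal sketch, assuming the placeholder form {specifier:fill alignment width.precision}; formatRow is a hypothetical helper introduced only for illustration, not part of std.

```zig
const std = @import("std");

/// Hypothetical helper: renders one table row into `buf`.
/// The name is left-aligned in 8 columns, the count right-aligned in 5,
/// and the ratio printed with 2 decimals.
fn formatRow(buf: []u8, name: []const u8, count: u32, ratio: f64) ![]u8 {
    return std.fmt.bufPrint(buf, "|{s:<8}|{d:>5}|{d:.2}|", .{ name, count, ratio });
}

pub fn main() !void {
    var buf: [64]u8 = undefined;
    std.debug.print("{s}\n", .{try formatRow(&buf, "zig", 42, 1.5)});
    // Center a string in 9 columns using '*' as the fill character.
    std.debug.print("{s:*^9}\n", .{"zig"});
}
```

Left alignment (<) suits text columns, right alignment (>) suits numbers, and ^ centers the value between fill characters.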
Parsing Strings
Zig provides parseInt and parseFloat for converting text to numbers, returning errors for invalid input rather than crashing or silently failing.
Parsing Integers
parseInt(T, buf, base) converts a string to an integer of type T in the specified base (2-36, or 0 for auto-detection).
const std = @import("std");

pub fn main() !void {
    const decimal = try std.fmt.parseInt(i32, "42", 10);
    std.debug.print("Parsed decimal: {d}\n", .{decimal});

    const hex = try std.fmt.parseInt(i32, "FF", 16);
    std.debug.print("Parsed hex: {d}\n", .{hex});

    const binary = try std.fmt.parseInt(i32, "111", 2);
    std.debug.print("Parsed binary: {d}\n", .{binary});

    // Auto-detect base with prefix
    const auto = try std.fmt.parseInt(i32, "0x1234", 0);
    std.debug.print("Auto-detected (0x): {d}\n", .{auto});

    // Error handling
    const result = std.fmt.parseInt(i32, "not_a_number", 10);
    if (result) |_| {
        std.debug.print("Unexpected success\n", .{});
    } else |err| {
        std.debug.print("Parse error: {}\n", .{err});
    }
}

$ zig build-exe parse_int.zig && ./parse_int
Parsed decimal: 42
Parsed hex: 255
Parsed binary: 7
Auto-detected (0x): 4660
Parse error: InvalidCharacter

parseInt returns error{Overflow, InvalidCharacter}. Always handle these explicitly or propagate with try. Base 0 auto-detects 0x (hex), 0o (octal), 0b (binary) prefixes.
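Because the error set is small and closed, a switch can map each case to something meaningful for the caller. This is a sketch under stated assumptions: parsePort and the PortError set are hypothetical names chosen for the example, not part of std.

```zig
const std = @import("std");

// Hypothetical domain errors for this example.
const PortError = error{ NotANumber, OutOfRange };

/// Hypothetical helper: parse a decimal TCP port, translating
/// parseInt's errors into domain-specific ones.
fn parsePort(text: []const u8) PortError!u16 {
    return std.fmt.parseInt(u16, text, 10) catch |err| switch (err) {
        error.InvalidCharacter => error.NotANumber,
        error.Overflow => error.OutOfRange,
    };
}

pub fn main() void {
    const inputs = [_][]const u8{ "8080", "http", "70000" };
    for (inputs) |input| {
        if (parsePort(input)) |port| {
            std.debug.print("{s} -> port {d}\n", .{ input, port });
        } else |err| {
            std.debug.print("{s} -> {s}\n", .{ input, @errorName(err) });
        }
    }
}
```

The switch is exhaustive, so if a future parseInt added an error case, this code would stop compiling rather than silently ignore it.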
Parsing Floats
parseFloat(T, buf) converts a string to a floating-point number, handling scientific notation and special values (nan, inf).
const std = @import("std");

pub fn main() !void {
    const pi = try std.fmt.parseFloat(f64, "3.14159");
    std.debug.print("Parsed: {d}\n", .{pi});

    const scientific = try std.fmt.parseFloat(f64, "1.23e5");
    std.debug.print("Scientific: {d}\n", .{scientific});

    const infinity = try std.fmt.parseFloat(f64, "inf");
    std.debug.print("Special (inf): {d}\n", .{infinity});
}

$ zig build-exe parse_float.zig && ./parse_float
Parsed: 3.14159
Scientific: 123000
Special (inf): inf

parseFloat supports decimal notation (3.14), scientific notation (1.23e5), hexadecimal floats (0x1.8p3), and special values (nan, inf, -inf).
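Since nan and inf parse successfully, input that must be an ordinary number needs an extra check. A minimal sketch follows; parseFinite and error.NotFinite are hypothetical names introduced for this example.

```zig
const std = @import("std");

/// Hypothetical helper: parse a float but reject nan and +/-inf,
/// which parseFloat itself accepts as valid input.
fn parseFinite(text: []const u8) !f64 {
    const value = try std.fmt.parseFloat(f64, text);
    if (!std.math.isFinite(value)) return error.NotFinite;
    return value;
}

pub fn main() void {
    const inputs = [_][]const u8{ "2.5", "1e3", "inf", "nan", "abc" };
    for (inputs) |input| {
        if (parseFinite(input)) |value| {
            std.debug.print("{s} -> {d}\n", .{ input, value });
        } else |err| {
            std.debug.print("{s} -> {s}\n", .{ input, @errorName(err) });
        }
    }
}
```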
ASCII Character Operations
std.ascii provides fast character classification and case conversion for 7-bit ASCII. Functions gracefully handle values outside the ASCII range by returning false or leaving them unchanged.
Character Classification
Test whether characters are digits, letters, whitespace, etc.
const std = @import("std");

pub fn main() void {
    const chars = [_]u8{ 'A', '5', ' ' };
    for (chars) |c| {
        std.debug.print("'{c}': alpha={}, digit={}, ", .{ c, std.ascii.isAlphabetic(c), std.ascii.isDigit(c) });
        // Report whitespace for the space character, uppercase status otherwise.
        if (c == ' ') {
            std.debug.print("whitespace={}\n", .{std.ascii.isWhitespace(c)});
        } else {
            std.debug.print("upper={}\n", .{std.ascii.isUpper(c)});
        }
    }
}

$ zig build-exe ascii_classify.zig && ./ascii_classify
'A': alpha=true, digit=false, upper=true
'5': alpha=false, digit=true, upper=false
' ': alpha=false, digit=false, whitespace=true

ASCII functions operate on bytes (u8). Non-ASCII bytes (>127) return false for classification checks.
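Classification functions compose naturally into small validators. As an illustrative sketch, the hypothetical helper isIdentifier below (not part of std) checks whether a slice looks like a C-style identifier.

```zig
const std = @import("std");

/// Hypothetical helper: true if `s` looks like a C-style identifier
/// (ASCII letter or '_' first, then letters, digits, or '_').
fn isIdentifier(s: []const u8) bool {
    if (s.len == 0) return false;
    if (!std.ascii.isAlphabetic(s[0]) and s[0] != '_') return false;
    for (s[1..]) |c| {
        if (!std.ascii.isAlphanumeric(c) and c != '_') return false;
    }
    return true;
}

pub fn main() void {
    const samples = [_][]const u8{ "foo_bar1", "_tmp", "9lives", "héllo" };
    for (samples) |s| {
        std.debug.print("{s}: {}\n", .{ s, isIdentifier(s) });
    }
}
```

The last sample is rejected because the bytes of é are outside the ASCII range, so isAlphanumeric returns false for them.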
Case Conversion
Convert between uppercase and lowercase for ASCII characters.
const std = @import("std");

pub fn main() void {
    const text = "Hello, World!";
    var upper_buf: [50]u8 = undefined;
    var lower_buf: [50]u8 = undefined;
    // Both functions return the written portion of the output buffer.
    const upper = std.ascii.upperString(&upper_buf, text);
    const lower = std.ascii.lowerString(&lower_buf, text);
    std.debug.print("Original: {s}\n", .{text});
    std.debug.print("Uppercase: {s}\n", .{upper});
    std.debug.print("Lowercase: {s}\n", .{lower});
}

$ zig build-exe ascii_case.zig && ./ascii_case
Original: Hello, World!
Uppercase: HELLO, WORLD!
Lowercase: hello, world!

std.ascii functions operate byte-by-byte and only affect ASCII characters. For full Unicode case mapping, use dedicated Unicode libraries or manually handle UTF-8 sequences.
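When the goal is comparison rather than conversion, no buffer is needed at all. The sketch below uses std.ascii.eqlIgnoreCase; isContentType is a hypothetical helper named for this example only.

```zig
const std = @import("std");

/// Hypothetical helper: match an HTTP-style header name regardless of case.
fn isContentType(name: []const u8) bool {
    return std.ascii.eqlIgnoreCase(name, "content-type");
}

pub fn main() void {
    std.debug.print("{}\n", .{isContentType("Content-Type")});
    std.debug.print("{}\n", .{isContentType("Content-Length")});
    // Single bytes can be converted without a buffer.
    std.debug.print("{c}\n", .{std.ascii.toUpper('z')});
}
```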
Unicode and UTF-8
Zig strings are []const u8 byte slices, typically UTF-8 encoded. std.unicode provides utilities for validating UTF-8, decoding code points, and converting between UTF-8 and UTF-16.
UTF-8 Validation
Check whether a byte sequence is valid UTF-8.
const std = @import("std");

pub fn main() void {
    const valid = "Hello, 世界";
    const invalid = "\xff\xfe";
    if (std.unicode.utf8ValidateSlice(valid)) {
        std.debug.print("Valid UTF-8: {s}\n", .{valid});
    }
    if (!std.unicode.utf8ValidateSlice(invalid)) {
        std.debug.print("Invalid UTF-8 detected\n", .{});
    }
}

$ zig build-exe utf8_validate.zig && ./utf8_validate
Valid UTF-8: Hello, 世界
Invalid UTF-8 detected

Use std.unicode.utf8ValidateSlice to verify entire strings. Invalid UTF-8 can cause undefined behavior in code that assumes well-formed sequences.
Iterating Code Points
Decode UTF-8 byte sequences into Unicode code points using std.unicode.Utf8View.
const std = @import("std");

pub fn main() !void {
    const text = "Hello, 世界";
    var view = try std.unicode.Utf8View.init(text);
    var iter = view.iterator();
    var byte_count: usize = 0;
    var codepoint_count: usize = 0;
    while (iter.nextCodepoint()) |codepoint| {
        // Recover the bytes that encoded this code point.
        const len: usize = std.unicode.utf8CodepointSequenceLength(codepoint) catch unreachable;
        const c = iter.bytes[iter.i - len .. iter.i];
        std.debug.print("Code point: U+{X:0>4} ({s})\n", .{ codepoint, c });
        byte_count += c.len;
        codepoint_count += 1;
    }
    std.debug.print("Byte count: {d}, Code point count: {d}\n", .{ byte_count, codepoint_count });
}

$ zig build-exe utf8_iterate.zig && ./utf8_iterate
Code point: U+0048 (H)
Code point: U+0065 (e)
Code point: U+006C (l)
Code point: U+006C (l)
Code point: U+006F (o)
Code point: U+002C (,)
Code point: U+0020 ( )
Code point: U+4E16 (世)
Code point: U+754C (界)
Byte count: 13, Code point count: 9

UTF-8 is variable-width: ASCII characters are 1 byte, but many Unicode characters require 2-4 bytes. Always iterate code points when character semantics matter, not bytes.
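One practical consequence is truncation: cutting a string at a byte index can split a multi-byte sequence. The following sketch, assuming the iterator's nextCodepointSlice method, cuts at a code point boundary instead; truncateCodepoints is a hypothetical helper, not part of std.

```zig
const std = @import("std");

/// Hypothetical helper: returns the longest prefix of `text` holding at
/// most `max` code points, never splitting a multi-byte sequence.
/// Fails with error.InvalidUtf8 if `text` is not valid UTF-8.
fn truncateCodepoints(text: []const u8, max: usize) ![]const u8 {
    var iter = (try std.unicode.Utf8View.init(text)).iterator();
    var end: usize = 0;
    var count: usize = 0;
    while (count < max) : (count += 1) {
        const slice = iter.nextCodepointSlice() orelse break;
        end += slice.len;
    }
    return text[0..end];
}

pub fn main() !void {
    const short = try truncateCodepoints("Hello, 世界", 8);
    std.debug.print("{s} ({d} bytes)\n", .{ short, short.len });
}
```

Eight code points of this string occupy 10 bytes, because the final character 世 takes three bytes on its own.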
Base64 Encoding
Base64 encodes binary data as printable ASCII, useful for embedding binary in text formats (JSON, XML, URLs). Zig provides standard, URL-safe, and custom Base64 variants.
Encoding and Decoding
Encode binary data to Base64 and decode it back.
const std = @import("std");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    const original = "Hello, World!";

    // Encode
    const encoded_len = std.base64.standard.Encoder.calcSize(original.len);
    const encoded = try allocator.alloc(u8, encoded_len);
    defer allocator.free(encoded);
    _ = std.base64.standard.Encoder.encode(encoded, original);
    std.debug.print("Original: {s}\n", .{original});
    std.debug.print("Encoded: {s}\n", .{encoded});

    // Decode
    var decoded_buf: [100]u8 = undefined;
    const decoded_len = try std.base64.standard.Decoder.calcSizeForSlice(encoded);
    try std.base64.standard.Decoder.decode(&decoded_buf, encoded);
    std.debug.print("Decoded: {s}\n", .{decoded_buf[0..decoded_len]});
}

$ zig build-exe base64_basic.zig && ./base64_basic
Original: Hello, World!
Encoded: SGVsbG8sIFdvcmxkIQ==
Decoded: Hello, World!

std.base64.standard.Encoder and .Decoder provide encode/decode methods. The == padding comes from the standard codec; to omit it, use an unpadded variant such as std.base64.standard_no_pad.
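The difference between the variants shows up only for inputs whose encoding uses the last two alphabet characters or needs padding. The sketch below encodes the same two bytes with both codecs; encodeUrlSafe is a hypothetical wrapper introduced for illustration.

```zig
const std = @import("std");

/// Hypothetical helper: URL-safe, unpadded encoding into a caller-provided
/// buffer. `dest` must hold at least `Encoder.calcSize(src.len)` bytes.
fn encodeUrlSafe(dest: []u8, src: []const u8) []const u8 {
    return std.base64.url_safe_no_pad.Encoder.encode(dest, src);
}

pub fn main() void {
    // These bytes map to alphabet indices 62 and 63, where the variants differ.
    const bytes = [_]u8{ 0xfb, 0xff };
    var std_buf: [8]u8 = undefined;
    var url_buf: [8]u8 = undefined;
    const standard = std.base64.standard.Encoder.encode(&std_buf, &bytes);
    const url_safe = encodeUrlSafe(&url_buf, &bytes);
    std.debug.print("standard: {s}\n", .{standard}); // +/8=
    std.debug.print("url-safe: {s}\n", .{url_safe}); // -_8
}
```

Data encoded with one variant must be decoded with the matching decoder, since the other will reject the characters it does not know.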
Custom Formatters
Implement the format function for your types to control how they’re printed with Writer.print().
const std = @import("std");

const Point = struct {
    x: i32,
    y: i32,

    pub fn format(self: @This(), writer: *std.Io.Writer) std.Io.Writer.Error!void {
        try writer.print("({d}, {d})", .{ self.x, self.y });
    }
};

pub fn main() !void {
    const p = Point{ .x = 10, .y = 20 };
    std.debug.print("Point: {f}\n", .{p});
}

$ zig build-exe custom_formatter.zig && ./custom_formatter
Point: (10, 20)

In Zig 0.15.2, the format method signature is simplified to: pub fn format(self: @This(), writer: *std.Io.Writer) std.Io.Writer.Error!void. Use the {f} format specifier to invoke custom formatters (e.g., "{f}", not "{}").
Formatting to Buffers
To format into a stack buffer without heap allocation, use std.fmt.bufPrint.
const std = @import("std");

pub fn main() !void {
    var buffer: [100]u8 = undefined;
    const result = try std.fmt.bufPrint(&buffer, "x={d}, y={d:.2}", .{ 42, 3.14159 });
    std.debug.print("Formatted: {s}\n", .{result});
}

$ zig build-exe bufprint.zig && ./bufprint
Formatted: x=42, y=3.14

bufPrint returns error.NoSpaceLeft if the buffer is too small. Always size buffers appropriately or handle the error.
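Handling error.NoSpaceLeft can be as simple as substituting a placeholder. This is a minimal sketch of that approach; formatId and its placeholder text are hypothetical choices made for the example.

```zig
const std = @import("std");

/// Hypothetical helper: format an id into `buf`, or return a fixed
/// placeholder when the buffer cannot hold the result.
fn formatId(buf: []u8, id: u32) []const u8 {
    if (std.fmt.bufPrint(buf, "id={d}", .{id})) |text| {
        return text;
    } else |err| switch (err) {
        error.NoSpaceLeft => return "<too long>",
    }
}

pub fn main() void {
    var small: [4]u8 = undefined;
    std.debug.print("{s}\n", .{formatId(&small, 7)}); // fits exactly in 4 bytes
    std.debug.print("{s}\n", .{formatId(&small, 12345)}); // needs 8 bytes
}
```

Whether to fall back, truncate, or propagate the error depends on the caller; the point is that the failure is visible in the type and cannot be ignored by accident.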
Dynamic Formatting with Allocation
For dynamically sized output, use std.fmt.allocPrint which allocates and returns a formatted string.
const std = @import("std");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    const result = try std.fmt.allocPrint(allocator, "The answer is {d}", .{42});
    defer allocator.free(result);
    std.debug.print("Dynamic: {s}\n", .{result});
}

$ zig build-exe allocprint.zig && ./allocprint
Dynamic: The answer is 42

allocPrint returns a slice you must free with allocator.free(result). Use this when output size is unpredictable.
Exercises
- Write a CSV parser using std.mem.splitScalar and parseInt to read rows of numbers from a comma-separated file.
- Implement a hex dump utility that formats binary data as hexadecimal with ASCII representation (similar to hexdump -C).
- Create a string validation function that checks if a string contains only ASCII printable characters, rejecting control codes and non-ASCII bytes.
- Build a simple URL encoder/decoder using Base64 for the encoding portion and custom logic for percent-encoding special characters.
Caveats, alternatives, edge cases
- UTF-8 vs. bytes: Zig strings are []const u8. Always clarify whether you're working with bytes (indexing) or code points (semantic characters). Mismatched assumptions cause bugs with multi-byte characters.
- Locale-sensitive operations: std.ascii and std.unicode don't handle locale-specific case mapping or collation. For Turkish i vs. I or locale-aware sorting, you need external libraries.
- Float formatting precision: Round-tripping floats through text (format, then parseFloat) may lose precision for very large or very small numbers. For exact decimal representation, use fixed-point arithmetic or dedicated decimal libraries.
- Base64 variants: Standard Base64 uses +/, URL-safe uses -_. Choose the correct encoder/decoder for your use case (std.base64.standard vs. std.base64.url_safe_no_pad).
- Format string safety: Format strings must be comptime-known and are checked at compile time, so they cannot be assembled at run time. To vary output dynamically, choose between fixed format strings or pass the varying text as an argument.
- Writer interface: Formatting targets a Writer, allowing output to files, sockets, ArrayLists, or custom destinations. In Zig 0.15.2, custom format methods take a concrete *std.Io.Writer (as shown under Custom Formatters) rather than an anytype writer.