I have recently been exploring some systems programming languages, with a particular focus on Zig. It seems like there’s been a lot of buzz about the language in the past year or so, and I wanted to get some hands-on experience with it to see what that buzz was all about. In general, I’ve quite enjoyed building a couple of small projects on Codecrafters using Zig. Whether or not I end up using Zig on a regular basis, it has some interesting design choices that I think will help me be a better developer in general - not least of which is keeping every memory allocation top-of-mind.
An interesting thing about Zig is that it doesn’t hide much from the developer. There is no syntactic sugar for interfaces, for instance, but interface-like patterns are supported and common. Likewise, generics don’t have special syntax, but they are also very common. You, as the developer, just have to use the language, as it is, to implement these things on your own. (Thankfully, there are lots of common patterns available in the standard library and in the growing ecosystem, so you’re not starting from scratch here!)
I really appreciate this explicitness, especially as I’m learning the language.
But, I’ll admit that some of the syntax it provides for unions seemed “magical”
to me. I wasn’t quite sure why, sometimes, it was okay to use .foo in a
union assignment, whereas other times, it might need to be .{ .bar = 42 }. I
felt like there was just this bizarre, inconsistent syntax I needed to remember
for different types of union values.
I’ve since learned a bit about how Zig type coercion works and why the syntax is consistent as long as you realize coercion is happening and recognize some of the bits of sugar the language does provide. This post describes my (new) mental model, and I hope it helps clear up some potential confusion for other newcomers to the language.
NOTE: Zig is still evolving, and syntax could change over time. All examples and links in this post target Zig 0.15.2.
A Motivating Example
Let’s start with a small tagged union type:
const MyUnion = union(enum) {
    one,
    two,
    three: u8,
};
Any instance of MyUnion can have one of three tags: one and two have no
payload (their payload type is void), while three carries an unsigned 8-bit integer.
We’ll come back to MyUnion later on, but it’s worth knowing that this isn’t
a toy example. Consider the ResetMode type that you can
provide to reset Arena Allocators. In 0.15.2, it is defined as follows:
pub const ResetMode = union(enum) {
    /// Releases all allocated memory in the arena.
    free_all,
    /// This will pre-heat the arena for future allocations by allocating a
    /// large enough buffer for all previously done allocations.
    /// Preheating will speed up the allocation process by invoking the backing allocator
    /// less often than before. If `reset()` is used in a loop, this means that after the
    /// biggest operation, no memory allocations are performed anymore.
    retain_capacity,
    /// This is the same as `retain_capacity`, but the memory will be shrunk to
    /// this value if it exceeds the limit.
    retain_with_limit: usize,
};
When you call reset on an arena, you must provide an instance
of this tagged union, which has two fields whose payload type is void and one
with a payload of type usize. If you read Zig code, you might come across calls
like arena.reset(.free_all) or arena.reset(.{ .retain_with_limit = 1024 }).
It seems clear here that there’s something different between the void and non-void
union variants. I originally mistook this for just a difference in syntax for
different payload types. But in reality, it isn’t special syntax - it’s type
coercion made possible by information the compiler already has.
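To make those two call shapes concrete, here is a minimal sketch of resetting an arena with each ResetMode variant. The test name and the allocation sizes are my own choices for illustration, and I’m assuming std.testing.allocator as the backing allocator so the snippet can run under zig test.

```zig
const std = @import("std");

test "ArenaAllocator.reset accepts each ResetMode variant" {
    var arena = std.heap.ArenaAllocator.init(std.testing.allocator);
    defer arena.deinit();

    // Allocate something so the arena has capacity to retain or free.
    _ = try arena.allocator().alloc(u8, 256);

    // Void variants: an enum literal is all that is needed.
    _ = arena.reset(.retain_capacity);
    _ = arena.reset(.free_all);

    // Payload variant: both the tag name and its usize payload are required.
    _ = try arena.allocator().alloc(u8, 256);
    _ = arena.reset(.{ .retain_with_limit = 1024 });
}
```

Note that reset returns a bool, which is why each call’s result is explicitly discarded.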
Type Coercion
The Zig Language Reference says (emphasis mine):
Type coercion occurs when one type is expected, but [a] different type is provided. Type coercions are only allowed when it is completely unambiguous how to get from one type to another, and the transformation is guaranteed to be safe.
So, when Zig expects one data type, but is given another one that it can reliably translate into the expected type, it will do so at compile time. What does it mean for Zig to “expect” a type? Well, this is where the compiler relies on context from the source code. Let’s consider a brief example:
// The type of the number 42, here, is a comptime-known integer
// literal. Because the compiler has no additional context, it gives
// `a` the type comptime_int.
const a = 42;
// 42 is _still_ a comptime-known integer literal. However, we have told
// Zig that the type of the variable b is an 8-bit, unsigned integer using
// a type annotation. Zig knows how to reliably and unambiguously change
// the comptime_int value 42 into a u8. So, it does this for us during
// compilation, coercing the type into what is required for variable b.
const b: u8 = 42;
Context is critical here for the compiler. The example above shows a type annotation, but other annotations also provide context - function parameter types, return types, struct fields, and union fields.
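Here is a small sketch of those other sources of context. The Point struct and the addOne and answer functions are hypothetical names I made up for illustration; the point is only that each typed position tells the compiler what an integer literal should become.

```zig
const std = @import("std");

// Hypothetical struct: the field types give context to the literals used
// when a Point is initialized.
const Point = struct {
    x: u8,
    y: u8,
};

// Hypothetical function: the parameter type gives context to the argument.
fn addOne(n: u8) u8 {
    return n + 1;
}

// Hypothetical function: the return type gives context to the literal 42.
fn answer() u8 {
    return 42;
}

test "typed positions give integer literals a concrete type" {
    // 41 is a comptime_int literal that is coerced to the u8 parameter.
    try std.testing.expect(addOne(41) == 42);
    try std.testing.expect(@TypeOf(answer()) == u8);

    // 1 and 2 are coerced to the u8 fields of Point.
    const p = Point{ .x = 1, .y = 2 };
    try std.testing.expect(@TypeOf(p.x) == u8);
    try std.testing.expect(p.y == 2);
}
```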
It’s worth being very clear here: for a narrowing coercion like this one, the
compiler needs to know the value at compile time to verify that it fits. If the
literal value were out of bounds for a u8 (let’s say 300, as an example),
compilation would fail. Similarly, if you were assigning from a runtime variable
of a wider type (a u16, say), the compiler would not be able to guarantee that
the value fits, so compilation would fail.
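A rough sketch of those failure cases follows. The lines that would not compile are left commented out, and narrow and widen are hypothetical helpers I’m using only to get values that are not comptime-known.

```zig
const std = @import("std");

// Hypothetical helper: `wide` is a runtime value here, so Zig will not
// implicitly coerce it to a u8. An explicit cast is required instead.
fn narrow(wide: u16) u8 {
    // return wide; // compile error: a u16 cannot be coerced to a u8
    return @intCast(wide); // explicit, safety-checked narrowing
}

// Hypothetical helper: going the other way is always safe, so a runtime u8
// coerces to a u16 without any cast.
fn widen(small: u8) u16 {
    return small;
}

test "narrowing needs a value the compiler can check" {
    // const too_big: u8 = 300; // compile error: 300 does not fit in a u8
    const fits: u8 = 255; // comptime-known and in range, so this coerces
    try std.testing.expect(widen(fits) == 255);
    try std.testing.expect(narrow(200) == 200);
}
```

The explicit @intCast is the escape hatch when you know more than the compiler does; it moves the check from compile time to runtime in safe build modes.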
This same idea - comptime-known values plus context - is exactly what enables the union syntax we started with.
Enum Literals
We’ve seen that an integer literal can be coerced into a particular type given that the compiler has the right context. But, Zig has other literal types. Consider the case of enum literals:
Enum literals allow specifying the name of an enum field without specifying the enum type.
Consider the following example, which shows enum literal syntax and typing.
const MyEnum = enum {
    foo,
    bar,
};
const YourEnum = enum {
    foo,
};
// The .foo syntax represents an enum literal, "foo" - however, it does not
// have a more precise type. It is an enum tag name, waiting for context.
// It is neither `MyEnum.foo` nor `YourEnum.foo`. Nor is it any other concrete
// type!
const a = .foo;
// `.foo` is still an enum literal. However, the Zig compiler knows that b
// is of type `MyEnum`. Therefore, it is safely and unambiguously able to
// determine that the tag `foo` is a valid member of `MyEnum` and coerce
// this into the proper type for the variable.
const b: MyEnum = .foo;
You may wonder what the point of this syntax is. Looking only at the above, you don’t really save any typing compared with the alternative way to write the same value:
const b = MyEnum.foo;
But there’s a bit more than just saving keystrokes going on. Enum literals allow values to be written without naming the concrete type, letting context determine it later.
And, remember that type annotations are not the only source of context for the compiler. Function parameters and union fields also include types. Both of those will be relevant for our discussion.
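As a quick sketch of enum literals picking up context from places other than a type annotation, consider the following. The describe and pick functions are hypothetical, written only for this illustration.

```zig
const std = @import("std");

const MyEnum = enum {
    foo,
    bar,
};

// Hypothetical function: the switch operand is a MyEnum, so each case can
// be written as an enum literal.
fn describe(value: MyEnum) []const u8 {
    return switch (value) {
        .foo => "foo",
        .bar => "bar",
    };
}

// Hypothetical function: the return type tells Zig what .foo and .bar mean.
fn pick(first: bool) MyEnum {
    return if (first) .foo else .bar;
}

test "enum literals take their type from parameters and return types" {
    // The parameter type of describe gives .foo its concrete type.
    try std.testing.expect(std.mem.eql(u8, describe(.foo), "foo"));
    try std.testing.expect(pick(false) == .bar);
}
```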
Union and Enum Coercion
The language reference provides additional information about coercion between various data types, including between unions and enums. The core rule relevant here is simple:
Tagged unions can be coerced to enums, and enums can be coerced to tagged unions when they are comptime-known to be a field of the union that has only one possible value, such as void:
(This isn’t a main point of this post, but the first rule provides a
mental model for why you can do something like switch (a_union) with cases
that are its enum tags.)
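A small sketch of that first rule, the union-to-enum direction, might look like this. I’m using std.meta.Tag to name the enum type Zig generates for union(enum); MyTag, tagOf, and describe are hypothetical names for illustration.

```zig
const std = @import("std");

const MyUnion = union(enum) {
    one,
    two,
    three: u8,
};

// The tag enum that Zig generated for `union(enum)`.
const MyTag = std.meta.Tag(MyUnion);

// Hypothetical helper: the return type is the tag enum, so the tagged
// union is coerced to its active tag.
fn tagOf(value: MyUnion) MyTag {
    return value;
}

// Hypothetical helper: switching on the union uses its tags as cases.
fn describe(value: MyUnion) []const u8 {
    return switch (value) {
        .one => "one",
        .two => "two",
        .three => "three",
    };
}

test "a tagged union coerces to its tag enum" {
    const u = MyUnion{ .three = 7 };
    try std.testing.expect(tagOf(u) == .three);
    try std.testing.expect(std.mem.eql(u8, describe(u), "three"));
}
```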
The second rule demonstrates exactly why enum literals can be coerced to tagged
unions when the corresponding union variant has no payload. Let’s go back to the
MyUnion example introduced earlier and see how this could look.
const MyUnion = union(enum) {
    one,
    two,
    three: u8,
};
// This is still an enum literal, as described earlier. The tag "one" doesn't
// mean anything without extra context.
const a = .one;
// Aha! Zig knows that the concrete type of `b` is `MyUnion`, and that
// `MyUnion` has a variant with the tag `one`. Further, that variant's payload
// type is `void`, which has only one possible value. So, at compile-time, Zig
// can unambiguously and safely coerce this into the correct variant of
// `MyUnion`.
const b: MyUnion = .one;
Anywhere a MyUnion is expected, at compile-time, Zig will be able to coerce
the enum literals .one and .two into that type. This extends to function
parameters as well, explaining why it’s fine to write code like
function_that_takes_my_union(.one) instead of
function_that_takes_my_union(MyUnion.one). Or, corresponding to the example
of the arena reset method that takes a ResetMode argument,
arena.reset(.free_all).
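Sketched out with a couple of hypothetical helpers (hasPayload and defaultValue are my own names, not anything from the standard library), the function-parameter case looks like this:

```zig
const std = @import("std");

const MyUnion = union(enum) {
    one,
    two,
    three: u8,
};

// Hypothetical helper: the parameter type tells Zig what an enum literal
// argument should be coerced into.
fn hasPayload(value: MyUnion) bool {
    return switch (value) {
        .one, .two => false,
        .three => true,
    };
}

// Hypothetical helper: the return type provides the same kind of context.
fn defaultValue() MyUnion {
    return .two;
}

test "enum literals coerce to void union variants in calls and returns" {
    try std.testing.expect(!hasPayload(.one));
    try std.testing.expect(!hasPayload(defaultValue()));
}
```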
Let’s take a moment to consider what would happen if we tried something like this:
// THIS CODE IS NOT VALID!
const c: MyUnion = .three;
It seems reasonable that Zig would be able to identify that the enum literal
.three has a corresponding tag in MyUnion with no issue. But, what value
should it assign to that tag’s payload? It’s a u8 - if Zig had zero values,
as Go does, maybe the zero value (probably 0) would make
sense? Or should it assign undefined perhaps? It’s ambiguous, and Zig will
only coerce types when doing so is unambiguous.
Anonymous Struct Literals
It’s time to introduce the final type that brings the shorthand syntax for union assignment together. Similar to how Zig provides enum literals, which require context to coerce to concrete types, it also provides anonymous struct literals.
Let’s take a look at how this works for coercion to named structs first, with fields of a few different types. This may feel like a detour, but anonymous struct literals are the missing piece that explains the remaining union syntax.
const MyEnum = enum {
    one,
    two,
};
const MyStruct = struct {
    foo: u8,
    bar: bool,
    baz: MyEnum,
};
// The right-hand side of this assignment is an anonymous struct literal. It
// is enclosed in a `.{}`, with the dot taking the place of the struct name.
// Zig infers the names and types of the fields as best as possible, based
// on the assignments found inside. We'll see how it does at assigning types
// in a moment.
const a = .{
    .foo = 2,
    .bar = false,
    .baz = .one,
};
// Here, we've given Zig context about what `b` is - it's a `MyStruct`. The
// right-hand side is still an anonymous struct literal. However, now the
// compiler knows how to coerce any fields within the anonymous struct so that
// they match the expectations for `MyStruct`.
const b: MyStruct = .{
    .foo = 42,
    .bar = true,
    .baz = .two,
};
It’s worth noting that the .foo, .bar, and .baz identifiers above have
nothing to do with enum literals, even though they also start with .. The
.id = <value> syntax is just how fields are assigned in Zig.
Continuing from above, let’s see how Zig has identified types for the fields of
a and b.
std.debug.print("type of a: {s}\n", .{@typeName(@TypeOf(a))});
std.debug.print("type of a.foo: {s}\n", .{@typeName(@TypeOf(a.foo))});
std.debug.print("type of a.bar: {s}\n", .{@typeName(@TypeOf(a.bar))});
std.debug.print("type of a.baz: {s}\n", .{@typeName(@TypeOf(a.baz))});
std.debug.print("\n", .{});
std.debug.print("type of b: {s}\n", .{@typeName(@TypeOf(b))});
std.debug.print("type of b.foo: {s}\n", .{@typeName(@TypeOf(b.foo))});
std.debug.print("type of b.bar: {s}\n", .{@typeName(@TypeOf(b.bar))});
std.debug.print("type of b.baz: {s}\n", .{@typeName(@TypeOf(b.baz))});
This gives us something like the following:
type of a: main.main__struct_19019
type of a.foo: comptime_int
type of a.bar: bool
type of a.baz: @Type(.enum_literal)
type of b: main.main.MyStruct
type of b.foo: u8
type of b.bar: bool
type of b.baz: main.main.MyEnum
For a, Zig was able to identify that bar is a boolean because false is
always a bool - there aren’t multiple “flavors” of boolean. But, it did not
have context to coerce the fields foo or baz to concrete types.
On the other hand, based on the type of b, Zig is able to do the type coercions
necessary on each field to assign the proper types.
In essence, when converting the anonymous struct type to a concrete type, Zig looks at each field to confirm that the anonymous field has a corresponding concrete field, and then it coerces the value of the anonymous field to the concrete type.
Note: This post is already quite long, but the mechanism Zig uses for this field-by-field writing is called Result Location Semantics. This is a fascinating core feature of the language that has performance implications, since it lets Zig write values directly into their destination rather than through intermediate copies, and I may write about it more in the future.
Anonymous Struct Literals and Tagged Unions
So, now we have the background to look at coercion from anonymous struct literals to tagged unions. The anonymous struct literal fields need names and values. And union fields with more than one possible value require a value to be set. So, we have a good alignment of features here!
const MyUnion = union(enum) {
    one,
    two,
    three: u8,
};
const b: MyUnion = .{
    .three = 42,
};
Just like with structs, Zig looks for a corresponding field when coercing the
anonymous struct literal above into the MyUnion type. It finds a matching field,
three. So, then, it checks whether the assigned value can be coerced to the
type expected by the union, and it finds that - yes, indeed - 42 can be
unambiguously coerced into an unsigned 8-bit integer. So, this is how it
is able to create a MyUnion from an anonymous struct literal!
A property of unions in Zig is that only one “field” can be active at a time. If the anonymous struct literal has more than one field, it becomes unclear what to do with the “extra” field, so coercion will fail, and you’ll get a compile error.
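Here is a short sketch of the same coercion in a function-argument position, along with the multi-field case left commented out because it does not compile. The payloadOrZero helper is a hypothetical name for illustration.

```zig
const std = @import("std");

const MyUnion = union(enum) {
    one,
    two,
    three: u8,
};

// Hypothetical helper: returns the payload of `three`, or 0 for the
// variants that carry no payload.
fn payloadOrZero(value: MyUnion) u8 {
    return switch (value) {
        .three => |n| n,
        .one, .two => 0,
    };
}

test "anonymous struct literals coerce to payload-carrying variants" {
    // The parameter type supplies the context, just as an annotation would.
    try std.testing.expect(payloadOrZero(.{ .three = 42 }) == 42);
    try std.testing.expect(payloadOrZero(.one) == 0);

    // Not valid: a union can only have one active field, so a literal
    // with two fields cannot be coerced.
    // const bad: MyUnion = .{ .three = 42, .one = {} };
}
```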
Could you use anonymous struct literals to instantiate unions that have their
void fields active? Yes, but the syntax is a little verbose.
const MyUnion = union(enum) {
    one,
    two,
    three: u8,
};
const b: MyUnion = .{
    .one = {},
};
In the anonymous struct, the field must have a value assigned. In Zig, a block
with no labeled break evaluates to void. So we assign the result of the
expression {} here.
So, while this is absolutely possible, it’s a little briefer (and arguably clearer -
the purpose of the {} might not be obvious) when assigning a void variant
to just use the const b: MyUnion = .one; syntax, relying on enum-to-union
coercion.
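To convince myself these spellings really do produce the same value, a small check like the following works. The sameTag helper is hypothetical; it uses std.meta.activeTag to compare which variant is active.

```zig
const std = @import("std");

const MyUnion = union(enum) {
    one,
    two,
    three: u8,
};

// Hypothetical helper: compares the active tags of two MyUnion values.
fn sameTag(a: MyUnion, b: MyUnion) bool {
    return std.meta.activeTag(a) == std.meta.activeTag(b);
}

test "three spellings of the same void variant" {
    // Enum literal, coerced through enum-to-union coercion.
    const from_enum_literal: MyUnion = .one;
    // Anonymous struct literal with an explicit void payload.
    const from_anon_struct: MyUnion = .{ .one = {} };
    // Fully named union initialization, with no coercion needed.
    const fully_named = MyUnion{ .one = {} };

    try std.testing.expect(sameTag(from_enum_literal, from_anon_struct));
    try std.testing.expect(sameTag(from_anon_struct, fully_named));
}
```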
Conclusion
As I dug into what originally seemed like “magic” Zig syntax, I came to realize that it is actually quite consistent - you just need to know a little bit about some of the language’s features to understand it. I am a big fan of language consistency, so making these connections really gave me a much better feel and appreciation for the language. As I mentioned, I’m still pretty new to Zig, but this is the kind of signal that makes me look forward to using it more.
If you’ve stuck around with me for this long, thanks! I hope you’ve come away with a better mental model for Zig’s type coercion and how it enables a minimal, consistent syntax.