Programmatic Comptime Type Creation in Zig

I’ve recently been learning the Zig programming language. I’ve found it to be a great language as I experiment with systems programming in more depth. It doesn’t hide much from developers, even when it seems magical at first. I appreciate this because it forces me to consider the system holistically.

One of Zig’s flashiest features is comptime. Comptime allows Zig to execute regular Zig code at the time that programs are compiled. If you’re familiar with languages like Rust, C, or C++, you are likely familiar with macros. However, whereas macros often involve a secondary syntax, comptime Zig is Zig, meaning you can get metaprogramming capabilities in a statically compiled language without the overhead of remembering two languages.

In a compiled language, some information is known at compile-time (when the program is turned into an executable), while some is known at runtime (while the program is running inside a process). Comptime allows Zig to use the compile-time information to alter its compilation. This is how Zig supports generics. It also allows the compiler to evaluate code up front rather than at runtime, and it even allows creating whole new data types during compilation!

To get more familiar with comptime, I recently created a small library for creating bitfields from schemas. In this post, I’ll describe some of the things I learned along the way, so that you might feel more comfortable doing something similar if and when you ever need to!

This post assumes you’ll have a basic understanding of Zig syntax and types, but no other experience with comptime metaprogramming is required.

NOTE: Zig is still evolving, and syntax could change over time. All examples and links in this post target Zig 0.15.2. In particular, it looks like @Type will be replaced with some more ergonomic built-in functions like @Struct beginning in 0.16.0. I plan to keep bitfield.zig up to date with language changes, but I might not update this blog post, so please be aware of this!

A Brief Aside on Bitfields

Because I’ll refer to them throughout the post, it’s worth knowing a little about bitfields and why I thought a library might be useful. I started my career as a network engineer, where interoperability requires well-defined protocols, which you could think of as types in a sense. For example, a DNS request needs to be formatted according to standards, with certain fields set in order for two systems to communicate with each other.

But also, protocols are designed not to waste space. A single bit can convey whether a feature should be “on” or “off,” for instance. To avoid wasting bytes of memory for bits of information, protocols commonly use bitfields - many of these smaller components packed together into a single, larger block. So, for instance, whereas the number 255 could be represented in binary as 11111111, you could also consider this to be eight different little toggles, all set to 1, conveying totally different meanings.
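To make the “little toggles” idea concrete, here is a minimal sketch of reading individual bits out of a byte by hand, without any bitfield type at all. The isSet helper is a hypothetical name I made up for illustration.

```zig
const std = @import("std");

// Hypothetical helper: reports whether the bit at position `bit`
// (0 = least significant) is set in `byte`.
fn isSet(byte: u8, bit: u3) bool {
    return ((byte >> bit) & 1) == 1;
}

pub fn main() void {
    // Three of the eight toggles, two of them switched on.
    const flags: u8 = 0b0000_0101;
    std.debug.print("{} {} {}\n", .{
        isSet(flags, 0),
        isSet(flags, 1),
        isSet(flags, 2),
    });
    // prints "true false true"
}
```

This works, but every flag is an anonymous bit position, which is the problem that named bitfields solve.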

Although network protocols use bitfields frequently, so too do lots of other computer systems. You’ll find them in binary data formats and controls for processor registers too. Since this is an area where Zig fits great, I thought a library that could generate bitfield types from schemas might be useful.

Type Functions: Functions That Return Types

If you’ve worked with statically typed languages in the past, the following code snippet probably seems intuitive even if you’ve never worked with Zig:

var foo: u32 = 42;

Although the placement of the type, u32 (an unsigned, 32-bit integer), might not be quite what you’d see in other languages, it would make sense that the type would be defined. Otherwise, it would be unclear how much space to give the variable and even what operations were valid for it.

Now, how about this?

var foo: MaybeSigned(false) = 42;

Wait, what?

In Zig, types are first-class values at compile time. You can pass them as function parameters, and you can derive them from functions evaluated at compile time. Although the type annotation above calls a poorly-named function, as long as MaybeSigned returns a type - it is a so-called “type function” - this is absolutely valid.

Consider a potential definition for the function below.

fn MaybeSigned(comptime is_signed: bool) type {
    if (is_signed) {
        return i32;
    }
    return u32;
}

Notice a few things here. First, the return type is literally type. This means that the function is going to return a value that can be used as a type. Additionally, the is_signed parameter is explicitly identified as a value that must be known at compile-time. (Technically, Zig will enforce type function evaluation only at compile-time, but I find the comptime qualifier to be a nice bit of extra clarification for human readers.) Finally, note that even though the function’s logic is kind of bizarre, it’s still just normal code. The only difference is that the compiler will evaluate this during compilation instead of sticking it into the executable to evaluate at runtime.

Returning types isn’t just a party trick. (What kind of parties are you going to where this would even be a trick?) It is actually quite a common pattern in Zig. Want a new DebugAllocator from a default configuration? Here you go:

var dba: std.heap.DebugAllocator(.{}) = .init;

The .init on the right-hand side here is a declaration literal - a topic for another day - but note that even the value .init relies on the type returned by the function.

Try the following based on our earlier example:

// This isn't going to work!
var foo: MaybeSigned(false) = -42;

Compilation will fail with this example because the function returns an unsigned integer type, and the compiler will kindly inform you that type 'u32' cannot represent integer value '-42'. So, not only do we get compile-time types, but we still also get compile-time type safety. Not too bad, huh?
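Type functions can also take types as parameters, which is how generics fall out of the same mechanism. Here is a minimal sketch, using a hypothetical Pair container that is not part of bitfield.zig or the standard library:

```zig
const std = @import("std");

// Hypothetical generic container, for illustration only.
// It takes a type and returns a new struct type built around it.
fn Pair(comptime T: type) type {
    return struct {
        first: T,
        second: T,

        fn sum(self: @This()) T {
            return self.first + self.second;
        }
    };
}

pub fn main() void {
    const p: Pair(u8) = .{ .first = 40, .second = 2 };
    std.debug.print("{d}\n", .{p.sum()});
    // prints "42"
}
```

Pair(u8) and Pair(i64) would be two distinct types, each produced by running the same ordinary function during compilation.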

Reflection and Reification

Sometimes, it’s useful to introspect data types and change program behavior based on what you find. This is known as reflection. Reflection goes from type to data about the type. Reification is the inverse: it constructs a type from data.

Let’s see how Zig supports both.

Reflection with @typeInfo

Type information in Zig only exists at compile-time. It is not carried into the executable, so there is no runtime reflection of the kind some languages provide. But, because of comptime, you can still use reflection to alter program behavior, as long as that behavior doesn’t rely on any information that would only be available at runtime. For example, if behavior relied on a constant or a comptime variable, that would be fine; if behavior relied on user input while the program runs, that’s not possible.

Let’s take a look at comptime reflection to evaluate a type, using the @typeInfo builtin. @typeInfo takes a type as a parameter and returns a std.builtin.Type. This is essentially a mapping from a type to data about the type. The definition of std.builtin.Type looks like this:

// std.builtin.Type (partial)
pub const Type = union(enum) {
  type: void,
  void: void,
  bool: void,
  noreturn: void,
  int: Int,
  float: Float,
  pointer: Pointer,
  array: Array,
  @"struct": Struct,
  // several more members below
};

It’s a tagged union, where each tag corresponds to one kind of type in the language. Some of these tags carry payloads, which further describe the type. Zooming in on Int, we see this:

// std.builtin.Type.Int
pub const Int = struct {
    signedness: Signedness,
    bits: u16,
};

So, if the tag of Type is "int", it will also contain a payload with an indication of signedness (signed or unsigned) and the bit width. (Not really the point here, but bits being of type u16 means that, yes, integers in Zig can be 65,535 bits wide!)
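As a quick sanity check of the Int payload, here is a small sketch that reads those two fields directly. The bitWidth helper is a hypothetical name of mine; the direct .int access assumes the type really is an integer, and it is a compile error otherwise.

```zig
const std = @import("std");

// Hypothetical helper: reads the bit width out of the Int payload.
// Passing a non-integer type fails at compile time, because the
// .int field of the union would not be the active one.
fn bitWidth(comptime T: type) u16 {
    return @typeInfo(T).int.bits;
}

pub fn main() void {
    const info = @typeInfo(i7);
    std.debug.print("{s}, {d} bits\n", .{
        @tagName(info.int.signedness),
        info.int.bits,
    });
    // prints "signed, 7 bits"

    std.debug.print("{d}\n", .{bitWidth(u65535)});
    // prints "65535"
}
```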

You can use @typeInfo to go from a type to a value that allows you to inspect it, then. Consider the following function:

fn printInfo(comptime T: type) void {
    const info = @typeInfo(T);
    switch (info) {
        .int => |i| {
            std.debug.print(
                "(un)signed: {s}, bits: {d}\n",
                .{
                    @tagName(i.signedness),
                    i.bits,
                },
            );
        },
        else => |t| {
            @compileError("not an expected type: " ++ @tagName(t));
        },
    }
}

This function uses @typeInfo to get type information and then alters its behavior based on what it finds. Specifically, if it is an .int instance, it prints the signedness and bit width. Because the switch is resolved at comptime, though, if this function is ever called with a non-int type, compilation will fail with a message naming the kind of type it was actually given. This is comptime reflection. If you call it with the following, here’s what happens:

const foo: MaybeSigned(false) = 42;
printInfo(@TypeOf(foo));
// prints "(un)signed: unsigned, bits: 32" at runtime

const bar = true;
printInfo(@TypeOf(bar));
// compilation fails with: "not an expected type: bool"

Reification with @Type

As mentioned earlier, @Type will be replaced with more ergonomic alternatives in 0.16.0.

Whereas @typeInfo gives you a std.builtin.Type from a type, @Type gives you a type from a std.builtin.Type. Using a little type coercion, this allows the following:

printInfo(@Type(
    std.builtin.Type{
        .int = .{
            .bits = 16,
            .signedness = .signed,
        },
    },
));
// prints "(un)signed: signed, bits: 16" at runtime

And, this same thing can be done in type functions as well. Let’s make a convenience function for producing arbitrarily-sized unsigned integers, for instance:

fn UInt(comptime bits: u16) type {
    return @Type(.{
        .int = .{
            .bits = bits,
            .signedness = .unsigned,
        },
    });
}

Now, we can do the following:

const bar: UInt(4) = 0;
printInfo(@TypeOf(bar));
// prints "(un)signed: unsigned, bits: 4" at runtime

You could make a similar function that also took a signedness parameter and produced any integer type. In fact, this is precisely what the upcoming 0.16.0 ergonomic @Type alternatives (like @Int) will do!
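Here is a minimal sketch of what that signedness-aware version might look like in 0.15.2. The name AnyInt is hypothetical; I believe std.meta.Int in the standard library has the same shape, so in real code you would likely reach for that instead.

```zig
const std = @import("std");

// Hypothetical helper (AnyInt is my own name, not from std):
// builds any integer type from a signedness and a bit width.
fn AnyInt(
    comptime signedness: std.builtin.Signedness,
    comptime bits: u16,
) type {
    return @Type(.{
        .int = .{
            .signedness = signedness,
            .bits = bits,
        },
    });
}

pub fn main() void {
    // Types are comptime values, so they can be compared directly.
    comptime std.debug.assert(AnyInt(.signed, 12) == i12);

    const x: AnyInt(.unsigned, 3) = 7;
    std.debug.print("{d} fits in {s}\n", .{ x, @typeName(@TypeOf(x)) });
    // prints "7 fits in u3"
}
```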

The key insight here is that if you can describe a type as data, you can construct that type. And, this isn’t limited to “simple” types like numbers - you can do the same thing with compound types like structs.
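One way to convince yourself of this is to round-trip a type through both builtins. The sketch below assumes an integer type; RoundTrip is a hypothetical name, and I have only reasoned about it for integers, not for every kind of type.

```zig
const std = @import("std");

// Hypothetical helper: reflect a type into data, then reify that
// data back into a type. Intended here for integer types only.
fn RoundTrip(comptime T: type) type {
    return @Type(@typeInfo(T));
}

pub fn main() void {
    // For an integer, the reified type is the very same type.
    comptime std.debug.assert(RoundTrip(u12) == u12);
    std.debug.print("{s}\n", .{@typeName(RoundTrip(u12))});
    // prints "u12"
}
```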

Building a Struct at Comptime

So, let’s see what this looks like. In the DNS header, there’s a run of four bits that we’ll use as an example. They are the AA (Authoritative Answer), TC (TrunCation), RD (Recursion Desired), and RA (Recursion Available) bits. The meaning of these bits is not important to the discussion - just know that each one signals whether something is “true” or “false” essentially.

Here’s how we could define this in Zig. Note that this is a packed struct, rather than a standard (or auto) struct. Packed structs have the property that the layout is well-defined (Zig reserves the right to order fields in standard structs however the compiler sees fit), and there is no extra padding between fields (an extension of the previous property).

const simple_dns_bitfield = packed struct {
    aa: bool,
    tc: bool,
    rd: bool,
    ra: bool,
};
const message: simple_dns_bitfield = .{
    .aa = true,
    .tc = false,
    .rd = false,
    .ra = false,
};
std.debug.print(
    "message representation: {b:04}\n",
    .{
        @as(
            u4,
            @bitCast(message),
        ),
    },
);

If you run the code above, you might expect that the printed representation would be 1000 because the first bit is true, while the others are false. One quirk of packed structs is that they actually map the first field to the least significant bits. So, what you’ll actually see printed is: "message representation: 0001". This may just be how my brain works, but this is easy to get wrong.
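The same ordering shows up when going the other way, from an integer to the struct. This sketch uses a stand-in type named Flags with the same four fields, and decodes the nibble 1000 to show which field the most significant bit lands in:

```zig
const std = @import("std");

// Stand-in for the struct above, with the same field order.
const Flags = packed struct {
    aa: bool,
    tc: bool,
    rd: bool,
    ra: bool,
};

pub fn main() void {
    // Four bools pack into exactly four bits.
    comptime std.debug.assert(@bitSizeOf(Flags) == 4);

    // 0b1000 sets only the most significant bit, which belongs to
    // the LAST declared field (ra), not the first (aa).
    const decoded: Flags = @bitCast(@as(u4, 0b1000));
    std.debug.print("aa={} ra={}\n", .{ decoded.aa, decoded.ra });
    // prints "aa=false ra=true"
}
```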

This works fine for a small struct, but imagine doing this for dozens of different protocol formats. Let’s see how we can generate this struct instead.

As we know from the previous section, if we can construct a std.builtin.Type, then we can construct an actual type. For structs, the union variant to look at is @"struct". (This syntax looks a little weird, but the enclosing @"" just allows you to re-use reserved language keywords as identifiers.) The type of this variant is std.builtin.Type.Struct. It and some of its useful subfield types are below:

// Member of std.builtin.Type
pub const Struct = struct {
    layout: ContainerLayout,
    /// Only valid if layout is .@"packed"
    backing_integer: ?type = null,
    fields: []const StructField,
    decls: []const Declaration,
    is_tuple: bool,
};

// Member of std.builtin.Type
pub const ContainerLayout = enum(u2) {
    auto,
    @"extern",
    @"packed",
};

// Member of std.builtin.Type
pub const StructField = struct {
    name: [:0]const u8,
    type: type,
    /// The type of the default value is the type of this struct field, which
    /// is the value of the `type` field in this struct. However there is no
    /// way to refer to that type here, so we use `*const anyopaque`.
    default_value_ptr: ?*const anyopaque,
    is_comptime: bool,
    alignment: comptime_int,
};

Let’s use this newfound knowledge to reflect on our simple bitfield from above, adding a new switch prong in printInfo to print useful information about structs.

// The same printInfo from above, now with some new code in the middle
fn printInfo(comptime T: type) void {
  const info = @typeInfo(T);
  switch (info) {
    // same int case as earlier above here...

    .@"struct" => |s| {
        std.debug.print(
            "layout: {s}\n",
            .{
                @tagName(s.layout),
            },
        );
        // inline unrolls this loop at comptime, which is
        // required because the field list is type information
        // and is not available at runtime
        inline for (s.fields) |field| {
            std.debug.print(
                "field name:     {s}\n",
                .{field.name},
            );
            std.debug.print(
                "field type:     {s}\n",
                .{@typeName(field.type)},
            );
            std.debug.print(
                "field align:    {d}\n",
                .{field.alignment},
            );
            std.debug.print(
                "comptime only:  {}\n",
                .{field.is_comptime},
            );
            std.debug.print(
                "default ptr:    {?}\n",
                .{field.default_value_ptr},
            );
            std.debug.print("\n", .{});
        }
    },

    // same else case as earlier below here...
  }
}

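There is a comment in the snippet above, but just to reiterate: an inline loop is unrolled at compile time, so each iteration can use comptime-only values such as field, whereas a standard loop iterates at runtime. The std.debug.print calls inside each unrolled iteration still execute at runtime. At the risk of belaboring the point, type information is not available at runtime in Zig, so if you want to loop over type information inside code that otherwise runs at runtime, the loop needs to be inline.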
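To show the mix of comptime and runtime inside one inline loop, here is a small sketch that counts how many flags are set on a runtime value. Flags and countSet are hypothetical names for illustration:

```zig
const std = @import("std");

const Flags = packed struct { aa: bool, tc: bool, rd: bool, ra: bool };

// Hypothetical helper: counts the fields of `flags` that are true.
fn countSet(flags: Flags) usize {
    var n: usize = 0;
    // The field list is comptime-only data, so the loop is unrolled
    // with `inline for`. Each unrolled body then reads a runtime
    // value from `flags` using the comptime-known field name.
    inline for (@typeInfo(Flags).@"struct".fields) |field| {
        if (@field(flags, field.name)) n += 1;
    }
    return n;
}

pub fn main() void {
    const n = countSet(.{ .aa = true, .tc = false, .rd = true, .ra = false });
    std.debug.print("{d}\n", .{n});
    // prints "2"
}
```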

Now, let’s use it to inspect our manually constructed struct type.

printInfo(simple_dns_bitfield);
// prints the following:
//  layout: packed
//  field name:     aa
//  field type:     bool
//  field align:    0
//  comptime only:  false
//  default ptr:    null
//
//  field name:     tc
//  field type:     bool
//  field align:    0
//  comptime only:  false
//  default ptr:    null
//
//  field name:     rd
//  field type:     bool
//  field align:    0
//  comptime only:  false
//  default ptr:    null
//
//  field name:     ra
//  field type:     bool
//  field align:    0
//  comptime only:  false
//  default ptr:    null

Some of the information printed might not seem useful at first. However, we have to provide values for every field of the StructField type, so by printing the values, we can see a little more about what is done using normal struct definition syntax.

Let’s see if we can recreate the simple_dns_bitfield type - or, at least, a type that is identical to it - using reification, based on what we’ve explored in the std.builtin.Type.Struct type.

const another_simple_dns_bitfield = @Type(
    .{
        .@"struct" = .{
            .layout = .@"packed",
            .fields = &.{
                .{
                    .name = "aa",
                    .type = bool,
                    .alignment = 0,
                    .is_comptime = false,
                    .default_value_ptr = null,
                },
                .{
                    .name = "tc",
                    .type = bool,
                    .alignment = 0,
                    .is_comptime = false,
                    .default_value_ptr = null,
                },
                .{
                    .name = "rd",
                    .type = bool,
                    .alignment = 0,
                    .is_comptime = false,
                    .default_value_ptr = null,
                },
                .{
                    .name = "ra",
                    .type = bool,
                    .alignment = 0,
                    .is_comptime = false,
                    .default_value_ptr = null,
                },
            },
            .is_tuple = false,
            .decls = &.{},
        },
    },
);

printInfo(another_simple_dns_bitfield);
// prints the following:
//  layout: packed
//  field name:     aa
//  field type:     bool
//  field align:    0
//  comptime only:  false
//  default ptr:    null
//
//  field name:     tc
//  field type:     bool
//  field align:    0
//  comptime only:  false
//  default ptr:    null
//
//  field name:     rd
//  field type:     bool
//  field align:    0
//  comptime only:  false
//  default ptr:    null
//
//  field name:     ra
//  field type:     bool
//  field align:    0
//  comptime only:  false
//  default ptr:    null

I glossed over some of the required parameters, such as is_tuple and decls. The first simply indicates whether the struct is a tuple (values accessed by index rather than named fields). The second is a little more interesting: it refers to the set of declarations attached to the struct, such as constants, associated functions, and methods. This is actually one thing that the language explicitly will not allow you to create at comptime using reification, however. Andrew Kelley has said this was an intentional decision in the language. I won’t be covering this in this post, but there are probably ways to accomplish the goal of returning types with behaviors for whatever your use case is - it might just be that the way to do it isn’t exactly what you first think of.
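Just to hint at one possible direction (this is a sketch of my own, not necessarily what bitfield.zig does): a hand-written generic struct can hold the generated type as a field and supply the methods itself. WithMethods and asInt are hypothetical names.

```zig
const std = @import("std");

// Hypothetical wrapper: pairs a packed struct type (reified or not)
// with ordinary declarations by nesting it inside a normal struct.
fn WithMethods(comptime Bits: type) type {
    return struct {
        bits: Bits,

        // An unsigned integer exactly as wide as the packed struct.
        const Backing = std.meta.Int(.unsigned, @bitSizeOf(Bits));

        fn asInt(self: @This()) Backing {
            return @bitCast(self.bits);
        }
    };
}

pub fn main() void {
    const Flags = packed struct { aa: bool, tc: bool };
    const w: WithMethods(Flags) = .{
        .bits = .{ .aa = false, .tc = true },
    };
    std.debug.print("{d}\n", .{w.asInt()});
    // prints "2" (tc is the second field, so it is bit 1)
}
```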

So, now we’ve created a new struct type, similar to the one we defined using the typical struct type definition syntax, but we’ve done it from an abstract description of the struct! But, wow, wasn’t that a lot of typing?! Well, yes, it was. But, remember how we created a UInt function earlier that took a parameter and then produced a consistent type using only the parameter to differentiate? We can do the same thing here!

A Simple Bitfield Function

Let’s get back to the essence of a bitfield. For our purposes here, we’re only going to concern ourselves with on/off flags, just like we looked at in the DNS example above. I contend that the primary requirements will be: 1) each flag is accessible by a name; and, 2) the bitfield can be represented as a collection, numerically (i.e., castable to an unsigned integer).

This is, admittedly, an oversimplification, but I think it’s a useful set of requirements for the current discussion. And, if you’d like to see a more thorough implementation, be sure to check out bitfield.zig.

To produce a bitfield with these properties, what if we had a function that took a slice of field names and gave us back a packed struct? Remember, too, that the order in which Zig lays out packed struct fields is the opposite of how most protocols are defined (Zig puts the first field at the least-significant bit, whereas protocols start from the most-significant bit). Let’s write a function like this:

/// A type function that produces a bitfield, comprised of a
/// set of flags.
fn BitField(comptime flags: []const [:0]const u8) type {
    if (flags.len == 0) {
        @compileError("cannot produce a bitfield with zero flags");
    }

    var struct_fields: [flags.len]std.builtin.Type.StructField = undefined;
    // Reverse the fields so that they get into the right order for Zig packed
    // structs.
    inline for (0..flags.len) |i| {
        const flag = flags[i];
        const order = flags.len - i - 1;
        struct_fields[order] = .{
            .name = flag,
            .type = bool,
            .is_comptime = false,
            .alignment = 0,
            .default_value_ptr = null,
        };
    }

    return @Type(
        .{
            .@"struct" = .{
                .layout = .@"packed",
                .fields = &struct_fields,
                .is_tuple = false,
                .decls = &.{},
            },
        },
    );
}

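Note that we have compile-time parameter checking: if no flags are provided, then the bitfield is useless, so this is a compilation error. We also use an inline for loop to create each StructField and to store the fields in reverse, so that the first flag passed in ends up in the most significant bit. (Because BitField returns type, its whole body is evaluated at comptime, so a plain for loop should work here too; inline is only strictly needed when looping over comptime-only values inside code that runs at runtime.)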

With this, we now have the infrastructure in place to produce any flag-based bitfield much more simply. Let’s see how this comes together in our DNS example.

Putting it Together: Creating a DNS Bitfield

Let’s create a DNS bitfield using our new type function:

const dynamic_dns_bitfield = BitField(
    &.{
        "aa",
        "tc",
        "rd",
        "ra",
    },
);
printInfo(dynamic_dns_bitfield);
// prints the following:
//  layout: packed
//  field name:     ra
//  field type:     bool
//  field align:    0
//  comptime only:  false
//  default ptr:    null
//
//  field name:     rd
//  field type:     bool
//  field align:    0
//  comptime only:  false
//  default ptr:    null
//
//  field name:     tc
//  field type:     bool
//  field align:    0
//  comptime only:  false
//  default ptr:    null
//
//  field name:     aa
//  field type:     bool
//  field align:    0
//  comptime only:  false
//  default ptr:    null

That’s not too difficult to create now! And, note that it mostly produces the same type as before, but now we’ve even flipped the bits around so that they are in the right order. Let’s create a message and verify this:

const dynamic_message: dynamic_dns_bitfield = .{
    .aa = true,
    .tc = false,
    .rd = false,
    .ra = false,
};
std.debug.print(
    "dynamic representation: {b:04}\n",
    .{@as(u4, @bitCast(dynamic_message))},
);
// prints the following:
// dynamic representation: 1000

Nice!
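The reverse direction works as well: given four bits in protocol order, we can decode them into named flags. The sketch below repeats a condensed copy of BitField so that it stands alone; the Dns name and the sample nibble are made up for illustration.

```zig
const std = @import("std");

// Condensed copy of the BitField function from this post.
fn BitField(comptime flags: []const [:0]const u8) type {
    if (flags.len == 0) {
        @compileError("cannot produce a bitfield with zero flags");
    }
    var fields: [flags.len]std.builtin.Type.StructField = undefined;
    inline for (flags, 0..) |flag, i| {
        // Store in reverse so the first flag is the most significant bit.
        fields[flags.len - i - 1] = .{
            .name = flag,
            .type = bool,
            .is_comptime = false,
            .alignment = 0,
            .default_value_ptr = null,
        };
    }
    return @Type(.{
        .@"struct" = .{
            .layout = .@"packed",
            .fields = &fields,
            .is_tuple = false,
            .decls = &.{},
        },
    });
}

pub fn main() void {
    const Dns = BitField(&.{ "aa", "tc", "rd", "ra" });

    // 0100 in protocol order: aa=0, tc=1, rd=0, ra=0.
    const decoded: Dns = @bitCast(@as(u4, 0b0100));
    std.debug.print(
        "aa={} tc={} rd={} ra={}\n",
        .{ decoded.aa, decoded.tc, decoded.rd, decoded.ra },
    );
    // prints "aa=false tc=true rd=false ra=false"
}
```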

Conclusion

In this post, we’ve explored one corner of Zig’s comptime capabilities, focusing on comptime reflection and type reification. Using comptime, we can programmatically generate new data types that follow common patterns while a program is being compiled. This allows for flexible data type creation, and because all of the work happens in the compiler, generating these types adds no cost at runtime!

I learned a lot while working on bitfield.zig. I encourage you to look at that repo for more detailed and flexible bitfield construction. But at the very least, I hope you now have a better understanding of programmatic comptime type creation in Zig!