C / C++ data type specification

I see many questions on Stack Overflow about how to decipher C and C++ type definitions. I also used to get asked about them a fair bit back at university. Needless to say, plenty of people see them as some weird messy voodoo.

Personally, I find them easy as they are precise, terse, and consistent with the language syntax. I’m not some hyper-intelligent demi-god with a superpower for deciphering syntax though, so why do I find them easy?

The parsing rules

There are three rules to follow for parsing a C/C++ type declaration:

  • Start at the type name (or where the name should be in the case of anonymous types)
  • Move right when you can (a closing parenthesis or the end of the declaration stops you)
  • Move left when you must

And that’s all there is to it. Like a really basic Turing machine.
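To see how much the "right first, then left" order matters, here is a minimal sketch (assuming C++11 or later) of two declarations that differ only in their parentheses. The names ptrs, ptr_to_arr and readings_match are made up for illustration.

```cpp
#include <cassert>
#include <type_traits>

// Start at ptrs, move right: [3] (array of 3), then left: * (pointer), int.
// "ptrs is an array of 3 pointers to int".
int *ptrs[3];

// Start at ptr_to_arr; the closing parenthesis blocks the move right,
// so move left: * (pointer). Then move right: [3], then left: int.
// "ptr_to_arr is a pointer to an array of 3 ints".
int (*ptr_to_arr)[3];

static_assert(std::is_array<decltype(ptrs)>::value, "ptrs is an array");
static_assert(std::is_pointer<decltype(ptr_to_arr)>::value,
              "ptr_to_arr is a pointer");

// Returns true when the element and pointee types match the readings above.
bool readings_match() {
    return std::is_same<std::remove_extent<decltype(ptrs)>::type, int *>::value
        && std::is_same<std::remove_pointer<decltype(ptr_to_arr)>::type,
                        int[3]>::value;
}
```

The static_asserts are checked at compile time, so if either reading were wrong the file would not build.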

Examples

int *var

int *var;

Start at the variable name. We can’t move right, so we move left: *. We still can’t move right, so we move left again: int.

So we have the expression var * int – “var is a pointer to an int”.

When I first moved to C from Pascal, I used to write this kind of type as int* var, in line with the Pascal style (var: ^Integer;) where you have “variable-name: type”. In C, however, variable declarations are expressed a little differently: int *var says that *var is an int. For more complex types, this way of thinking is quite important, so I stopped using the Pascal style for denoting types in C.
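One place where the "*var is an int" reading pays off is when several variables are declared on one line. A small sketch, assuming C++11 or later; the names p, n and store_through_pointer are hypothetical:

```cpp
#include <cassert>
#include <type_traits>

// Written Pascal-style, this looks like it declares two pointers.
// It does not: the * belongs to the declarator p, not to the type int.
int* p, n;

static_assert(std::is_same<decltype(p), int *>::value, "p is a pointer to int");
static_assert(std::is_same<decltype(n), int>::value, "n is a plain int");

// Stores a value through p and reads it back from n.
int store_through_pointer() {
    p = &n;   // p points at n
    *p = 42;  // *p is an int, exactly as the declaration says
    return n;
}
```

Writing it as int *p, n; makes the same declaration harder to misread.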

char * const (*)(long (*a[])(int), const int * b, int const * c, int * const d)

char * const (*)(long (*a[])(int), const int * b, int const * c, int * const d)
              ^
         start here

Start just to the right of the asterisk marked above (where the variable name would go if the declaration weren’t anonymous). A closing parenthesis blocks the way, so we can’t move right and must move left: *. That completes the parenthesized part (*), so we can move right again: (...), the function’s parameter list. Now we’ve reached the end of the declaration and can’t move right, so we must move left, consuming const, *, and char in turn.

Now we have:

* (...) const * char – “pointer to function that returns constant pointer to char”
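As a rough check of that reading, the sketch below (assuming C++11 or later) uses the same outer shape with an empty parameter list, which is a simplification of mine to keep it short. The names Callback, first_char_of_buffer and call_and_modify are hypothetical.

```cpp
#include <cassert>
#include <type_traits>

// Same outer shape as the example type, with the parameter list left empty.
using Callback = char * const (*)(void);

static_assert(std::is_pointer<Callback>::value, "outermost layer: pointer");
static_assert(std::is_function<std::remove_pointer<Callback>::type>::value,
              "next layer: function");

// Hypothetical function with a matching signature.
char * const first_char_of_buffer(void) {
    static char buffer[] = "abc";
    return buffer;  // a pointer to (mutable) char
}

// Calls through the pointer and writes through the returned char pointer.
// That is allowed: the returned pointer is const, the char it points to is not.
char call_and_modify() {
    Callback cb = first_char_of_buffer;
    *cb() = 'x';
    return *cb();
}
```

Some compilers warn that the const on a returned pointer value has no practical effect; it is kept here only to mirror the type being parsed.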

To parse the types of the function arguments, re-apply the process for each type:

  • a – long (*a[])(int) parses to a [] * (int) long – “a is array of pointers to function of int that returns long”.
  • b – const int * b parses to b * int const – “b is pointer to int that is constant”.
  • c – int const * c parses to c * const int – “c is pointer to constant int” (same as type of argument “b”).
  • d – int * const d parses to d const * int – “d is constant pointer to int”.

“b” and “c” point to constant int, so the ints *b and *c cannot be modified through these pointers, but the pointers b and c themselves can be reassigned.

“d” is immutable, but the int that it points to (*d) is mutable.
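The difference between the three const placements can be checked with a small sketch, assuming C++11 or later. The function name adjust is made up; the parameters are declared exactly like b, c and d in the example.

```cpp
#include <cassert>
#include <type_traits>

// Same parameter declarations as b, c and d in the example signature.
int adjust(const int * b, int const * c, int * const d) {
    static_assert(std::is_same<decltype(b), decltype(c)>::value,
                  "b and c have the same type");
    static_assert(std::is_const<std::remove_pointer<decltype(b)>::type>::value,
                  "the int that b points to is const");
    static_assert(!std::is_const<decltype(b)>::value, "b itself is not const");
    static_assert(std::is_const<decltype(d)>::value, "d itself is const");

    b = c;          // fine: b is a mutable pointer
    *d = *b + 1;    // fine: the int that d points to is mutable
    // *b = 0;      // would not compile: *b is a const int
    // d = nullptr; // would not compile: d is a const pointer
    return *d;
}
```

Uncommenting either of the two commented-out lines should produce a compile error, which is the quickest way to confirm the reading.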

Putting it all together:

char * const (*)(long (*a[])(int), const int * b, int const * c, int * const d)

“pointer to function of (a, b, c, d) that returns a constant pointer to char”, where:

  • “a” is “array of pointers to function of int that returns long”
  • “b” is “pointer to constant int”
  • “c” is “pointer to constant int”
  • “d” is “constant pointer to int”

Sure, it seems complicated, but the type expression was complicated. To represent that same type in Pascal, you would need several separate type definitions to compose the final data type. In C and C++, it’s a single expression, which can be typedef’d (or, in C++, aliased with using) as needed.
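For completeness, here is a sketch of naming that type and using it, assuming C++11 or later. Handler, HandlerAlias, twice, sample_handler and run_handler are all hypothetical names of mine, not part of the original example.

```cpp
#include <cassert>
#include <type_traits>

// The full anonymous type from the example, given a name with typedef.
typedef char * const (*Handler)(long (*a[])(int), const int * b,
                                int const * c, int * const d);

// The same type spelled with a C++11 alias declaration.
using HandlerAlias = char * const (*)(long (*a[])(int), const int * b,
                                      int const * c, int * const d);

static_assert(std::is_same<Handler, HandlerAlias>::value,
              "both spellings name one type");

// In a parameter list, "array of T" is adjusted to "pointer to T" and
// top-level const is dropped, so this shorter spelling is the same type.
static_assert(std::is_same<Handler,
                           char * const (*)(long (**)(int), const int *,
                                            const int *, int *)>::value,
              "parameter adjustments apply");

long twice(int x) { return 2L * x; }

// A function matching the Handler signature: calls a[0] on *b + *c
// and stores the result in *d.
char * const sample_handler(long (*a[])(int), const int * b,
                            int const * c, int * const d) {
    static char result[] = "?";
    *d = static_cast<int>(a[0](*b + *c));
    return result;
}

// Calls sample_handler through a Handler pointer and returns what it
// stored in d.
int run_handler() {
    Handler h = sample_handler;
    long (*table[])(int) = { twice };
    int b = 1, c = 2, d = 0;
    h(table, &b, &c, &d);
    return d;
}
```

The second static_assert also shows a wrinkle the parsing rules don't cover: in a parameter list, the "array of pointers" a is silently adjusted to a "pointer to pointer".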