Of all the reasons to love metaprogramming,1 probably the most compelling is that it lets us embed “little languages” in our programs. For example, we might want to express a parser for complex numbers this way, rather than explicitly writing a recursive descent parser ourselves:
auto parse_complex = '(' >> double_ >> -(',' >> double_) >> ')' | double_;
Or, instead of hand-coding the state machine for a CD player, we might want to express its transition table like this:
auto player = Stopped + play [some_guard] / (some_action , start_playback) == Playing , Stopped + open_close/ open_drawer == Open , Stopped + stop == Stopped , Open + open_close / close_drawer == Empty , Empty + open_close / open_drawer == Open , Empty + cd_detected [good_disk_format] / store_cd_info == Stopped
These two examples, using Boost’s Spirit and Meta State Machine libraries respectively, show how one can write a parser and a state machine. The metaprograms in each of these libraries accept a user’s high-level declaration of the desired result and generate super-efficient executable code. The “little language” used to express each declaration is specific to the library’s domain (parsing or state machines) and is embedded in the host language, C++, so it is commonly called an Embedded Domain Specific Language (EDSL).
Expression Templates
Both of the EDSLs shown above use a technique called “expression templates.” In expression templates, operators and functions are overloaded so that instead of doing what you’d normally think of as immediate computation, they build up a compile-time parse tree that the metaprogram eventually uses to generate the desired code. This “delayed evaluation” allows libraries to perform optimizations that would be impossible if expressions were evaluated immmediately, because they can take into account the whole expression.
Expression templates have proven to be broadly applicable and remarkably effective, to the point where Boost has a library, Proto, just for building expression template libraries. However, they come with one severe limitation: the languages they can express have to be valid C++ expressions. One of the most important strengths of these little languages is that often, they already exist, are broadly known, and have evolved a syntax that is appropriate for human consumption in their domain. In some cases, this evolution occurred centuries before the language was ever processed by a computer (consider music notation for example). As a result, few DSLs can be faithfully captured as valid C++ expressions; we can at best come up with a near approximation. Even some very “computer-y” languages, such as regular expressions, don’t render directly in C++:
auto all_caps = [A-Z]+; // SYNTAX ERROR
What About Good Old Strings?
Instead of C++ expressions, of course, strings could be used to represent the DSL code snippet in the embedding C++ code. In fact, that’s an approach used by many “traditional” libraries:
auto all_caps = std::regex("[A-Z]+"); // OK
The only problem is, the contents of that string are only known at runtime, which means the language has to be interpreted at runtime, and we lose the benefit of being able to translate the structure into compiled code. The “delayed evaluation” has been delayed too long.
If, however, these strings could be parsed at compile-time, the resulting programs could have optimal performance again. In order to be able to do this, one has to be able to pass a string to a template metaprogram. The template metaprogram has to be able to process it and it has to remain easily readable in the C++ code at the same time. An ideal way of doing it would be being able to write something like
tmp_string<"Hello World!">
but the string literal "Hello World!" is a character array with
internal linkage and therefore can not be a template argument.
Strings in Metaprograms
So how do we represent a string so that a template metaprogram can access the sequence of characters?
MPL provides various sequences for template metaprograms, with operations for accessing the individual elements or iterating over them at compile-time. For example:
boost::mpl::vector_c< char, 'H','e','l','l','o',' ','W','o','r','l','d','!' > s; char const fifth_char // 'o' = boost::mpl::at_c<s,4>::type::value; // like s[4]
Once we have a string represented as this way, processing it is straightforward.2 But how can we get there? We could write it out as above, but that format is hard to read and write. MPL does provide a special wrapper template accepting multi-character constants:
boost::mpl::string<'Hell','o Wo','rld!'>
That makes it a bit shorter and quicker to type, but it’s arguably gotten harder to read. Also, changing the string – inserting and removing characters – is extremely difficult now because it has to be broken into four-character chunks, and the boundaries are always shifting as we edit.
A Better Way
What we want is a way of transforming a real C++ string literal like
"Hello, World!" into an MPL sequence automatically. To get there,
we’re going to build the string up one character at a time, using
MPL’s push_back metafunction, like this:
boost::mpl::push_back< boost::mpl::string<'Hell'>, boost::mpl::char_<'o'> >::type // approximately boost::mpl::string<'Hell','o'>
Because MPL sequence elements are actually all types, we use
mpl::char_ wrapper to turn the 'o' character into a
type. The above expression constructs the string "Hello" as a
compile-time sequence of mpl::char_s.
We can use this approach to construct the entire string, character-by-character:
boost::mpl::push_back< boost::mpl::push_back< // ... boost::mpl::push_back< boost::mpl::string<>, // The empty string boost::mpl::char_<'H'> >::type, // ... boost::mpl::char_<'d'> >::type, boost::mpl::char_<'!'> >::type
This type-expression starts with the empty string and appends the
characters of "Hello World!" one by one—but naturally, we don’t
want to write this out by hand; the trick is to get the compiler to
generate it for us automatically. To do that, we’ll need to exploit
some new language features.
Generalized Constant Expressions
For C++11, we standardized “generalized constant expressions”3, which basically did two things:
Newly blessed many things as compile-time constants, including string literals:
enum { o = "Hello"[4] }; // OK in C++11
Added
constexprfunctions, whose results are compile-time constants if all their arguments are, too.constexpr unsigned factorial(unsigned x) { return x == 0 ? 1 : x * factorial(x-1); } int a[factorial(6)]; // OK in C++11
At first blush it probably seems like these two features give us
everything we need to straightforwardly operate on string literals at
compile-time. Unfortunately, constexpr functions are crippled by a
rule that says they have to compile as ordinary functions. If the
arguments aren’t all constant expressions, the function’s work will
simply be done at runtime, and the result won’t be a compile-time
constant. As a result, inside the function—and in its return
type—we have to treat the parameters as though they are regular
runtime values. That prevents the argument values from ever
affecting types, which as we all know are the bread and butter of C++
metaprogramming.
constexpr std::array<int,n> // ERROR: n is not a constant expression f(unsigned n); constexpr void g(unsigned m) { std::array<int, m> a; // ERROR: m is not a constant expression }
Sidestepping Limitation
Fortunately, all is not lost. We may not be able to do much with
constexpr functions, but as our earlier example showed, a string
literal is a constant expression, and the indexing operator on an
array that is a constant expression is also a constant expression.
Thus, "Hello World!"[0] is a constant expression, and our type
sequence could also be constructed the following way:
boost::mpl::push_back< boost::mpl::push_back< // ... boost::mpl::push_back< boost::mpl::string<>, boost::mpl::char_<"Hello World!"[0]> >::type, // ... boost::mpl::char_<"Hello World!"[10]> >::type, boost::mpl::char_<"Hello World!"[11]> >::type
The above expression is just like the previous version of the
sequence, but instead of naming individual characters, it pulls them
out of "Hello World!" by index.
We know, complexity builds on complexity, and it seems like things are only getting worse! But wait: the above expression can be generated by a preprocessor macro taking the string literal as a macro argument:
#define _S(s) \
boost::mpl::push_back< \
boost::mpl::push_back< \
// ... \
boost::mpl::push_back< \
boost::mpl::string<>, \
boost::mpl::char_<s[0]> \
>::type, \
// ... \
boost::mpl::char_<s[10]> \
>::type, \
boost::mpl::char_<s[11]> \
>::typeNow the string "Hello World!" can be passed as a template
argument, like this:
f<_S("Hello World!")>
Cleaning Up The Macro
Had we not abbreviated the macro above, it would have been four times as long and a stellar example of DRY violation. To get this code under control, we can use Boost’s preprocessor metaprogramming library to generate the repetition for us:
#define PRE(z, n, u) boost::mpl::push_back< #define POST(z, n, u) , boost::mpl::char_<s[n]>>::type #define _S(s) \ BOOST_PP_REPEAT(12, PRE, ~) \ boost::mpl::string<>, \ BOOST_PP_REPEAT(12, POST, ~)
When BOOST_PP_REPEAT expands PRE or POST, it passes an index to it
as the n macro argument – the position of the expansion in the list.
POST uses this index to generate the index of the actual character in
s.4
Finally, constexpr is Useful
The value 12 was chosen because it matches the length of the string
used in the example, but our macro can’t handle shorter strings. For
example, _S("foo") would expand to the following expression:
boost::mpl::push_back< // ... boost::mpl::push_back< boost::mpl::push_back< boost::mpl::push_back< boost::mpl::push_back< boost::mpl::string<>, boost::mpl::char_<"foo"[0]> >::type, boost::mpl::char_<"foo"[1]> >::type, boost::mpl::char_<"foo"[2]> >::type, boost::mpl::char_<"foo"[3]> >::type, // ... boost::mpl::char_<"foo"[11]> >::type
Since {'f', 'o', 'o', '\0'} is only four elements long, we index
past the end of the array 9 times. Fortunately for us, this is all
happening at compile-time and the compiler issues an error. Being
safe isn’t enough, though; we want to be capable, too! To get around
this problem, we can write our own function template for indexing the
array, and make it constexpr:
template <int N> constexpr char at(char const(&s)[N], int i) { return i >= N ? '\0' : s[i]; }
The form char const(&s)[N] declares s to be a reference to an
array of N const chars, allowing us to deduce the array length
and simply return zero when the index exceeds it. So now we can
replace the indexing operator used by the _S macro:
#define PRE(z, n, u) boost::mpl::push_back< #define POST(z, n, u) , boost::mpl::char_<at(s, n)>>::type #define _S(s) \ BOOST_PP_REPEAT(12, PRE, ~) \ boost::mpl::string<>, \ BOOST_PP_REPEAT(12, POST, ~)
Those Pesky Zeroes
The above implementation allows us to compile _S("foo"), but now
produces an invalid result. Instead of building a sequence
corresponding to "foo", it builds one corresponding to
"foo\0\0\0\0\0\0\0\0\0". Since _S is not aware of the length of
the string, it just keeps appending null characters to the end of the
string. To avoid this fate, we can replace our use of
boost::mpl::push_back with a wrapper that stops extending the string
when it’s long enough:
template <class S, char C, bool EOS> struct push_back_if_c : boost::mpl::push_back<S, boost::mpl::char_<C>> {}; template <class S, char C> struct push_back_if_c<S, C, true> : S {};
This wrapper takes the original string, the character to append and an
extra boolean argument that, when true, indicates that we’re past the
end of the string. In that case, we just return the previous result;
otherwise, we forward the argument on to push_back.
Using this function, our macro can properly handle strings shorter than 12 characters:
#define PRE(z, n, u) push_back_if_c< #define POST(z, n, u) , at(s, n), (n >= sizeof(s)-1)>::type #define _S(s) \ BOOST_PP_REPEAT(12, PRE, ~) \ boost::mpl::string<>, \ BOOST_PP_REPEAT(12, POST, ~)
Configurable Limits
What about strings longer than 12 characters? Well, our macro breaks. However, if we replace the hard-coded 12 with a symbol, we can allow users to set the limit themselves:
#ifndef STRING_MAX_LENGTH #define STRING_MAX_LENGTH 32 #endif #define PRE(z, n, s) push_back_if_c< #define POST(z, n, s) , at(s, n), (n >= sizeof(s)-1)>::type #define _S(s) \ BOOST_PP_REPEAT(STRING_MAX_LENGTH, PRE, ~) \ boost::mpl::string<> \ BOOST_PP_REPEAT(STRING_MAX_LENGTH, POST, s)
This code introduces the macro STRING_MAX_LENGTH for representing
the maximal string length _S can handle. If the user doesn’t
#define STRING_MAX_LENGTH before including the definition of _S,
the default value 32 is used.
Given that push_back_if is used STRING_MAX_LENGTH times for every
usage of _S, it makes sense not to choose a higher limit than
necessary. Note, however, that for each unused character, the same set of arguments
is passed repeatedly to push_back_if_c:
push_back_if_c< // The final string, '\0', true >
Therefore, unused length does not contribute to the number of template instantiations, which are a major contributor to long compile times.
Conclusion
Using a combination of the preprocessor, constexpr, and template
metaprogramming, we have demonstrated how to turn string literals into
types that can be processed by template metaprograms. The solution
presented in this article has been implemented and is available to
download as part of the
Metaparse
library – a parser construction kit for compile-time strings.
-
For a discussion of legitimate reasons to hate metaprogramming, we suggest a different forum
↩ -
About as straightforward as template metaprogramming gets, anyway ↩
-
This example takes advantage of the fact that the C++ preprocessor can generate code fragments that aren’t syntactically valid on their own. Instead of using nested macro invocations to generate a nested type structure, we simply generate two flat repetitions: one for the opening delimiters and one for the closing delimiters. This is often the most effective way to work with the preprocessor. ↩

I found an elusive bug in your code. In the POST(z, n, u) macro, it should be (n >= sizeof(s)-1) instead of (n >= sizeof(s)). Without this fix, you are appending a ‘\0′ char to the mpl::string which is problematic when you try to query its size (unless its size % 4 == 0) or append another mpl_string to it (in which case it has a null terminator embedded in the string).
I checked the Metaparse library to see if it has this important fix, and it does (https://github.com/sabel83/mpllibs/blob/master/mpllibs/metaparse/string.hpp) but you should fix the code on this page as well. Cheers.
There are a few errors in your pp macros which prevents compiling the demo code on this page… 1) there shouldn’t be a comma after ‘boost::mpl:string<>’ in the _S macro, … 2) third arg to BOOST_PP_REPEAT should be ‘s’ instead of ‘~’, and … 3) third param for PRE/POST macros should be ‘s’ instead of ‘u’.
To be clear, here’s code w/ corrections (which I was able to compile with g++/clang++): http://codepad.org/Vmua2RTh
-etherice
Thank you for the fixes.
When is your next post? I am eagerly awaiting. I check your site every day for updates.
I am glad that my approach more than a year ago wasn’t too far away from this experts solution
http://stackoverflow.com/a/10287598/34509
How embarassing.. I pasted the wrong link
Fixed: http://stackoverflow.com/a/4709240/34509
Building an EDSL with this approach would be neat, but one problem I see is the inability to produce simple error messages. The problem is that you’d likely want to use static_assert for error reporting with the message refering to individual pieces of the string, I.E. “identifiers” in the EDSL. This, unfortunately, cannot be done.
A compile-time parser generator library, Metaparse offers some limited tools for this problem. It assumes that your parser for the EDSL can either successfully parse the code and construct the result or finds an error and constructs an error message (a string for example). It doesn’t break the compilation, even if it finds an error. If you fail to compile your EDSL and the rest of your code tries using it, you’ll probably get an error message and it will probably not tell you anything about your EDSL. But with some special metafunctions, you can generate code that takes the error result returned by the parser and print it to the standard output or standard error at runtime. So you can copy your EDSL into a separate program that does the parsing at compile-time and generate code that prints the error at runtime. You compile this small program and run it to get the error message. You can find an example for this here:
https://github.com/sabel83/mpllibs/blob/master/libs/metaparse/example/parsing_error/main.cpp
If you compile and run it, you get:
Compile-time parsing resultsInput text: aaac
Parsing failed: line 1, col 4: expected: b
This is extremely inconvenient to do this for getting an error message, but at least it offers a way to get a better error message than a long output from the compiler.
Right. What would be really powerful is to be able to use constexpr char arrays as a static_assert message (that’s not that much to ask for standards committee people, is it???). Then you could do all of this and also get pretty much the same quality of error messages that you would expect from a non-embedded language.
This is something I’ve also run into while I was working on error reporting. Error messages can be propagated out of metaprograms, but there is no way I’m aware of to make the compiler display them. Being able to generate error strings for static_assert from a metaprogram could make a difference.
How very “IO monad”-ish of you!
Perhaps I am missing something obvious, but why is it that push_back is used at all here? Why not just have the macro expand out to:
vector_c< char, at(“Hello”,0), at(“Hello”,1), at(“Hello”,2), // etc. with trailing nulls >
This certainly gets rid of a lot of template instantiations and also simplifies the macro definition.
You are right, doing that would be more efficient. However, vector_c (or any other container being used) would have to be prepared for the fact that the trailing nulls don’t belong to the string and it would have to remove or ignore them. This is the same problem why a custom version of push_back had to be developed.
You could solve this without further instantiations by providing a size parameter. string_c< sizeof( “Hello” ) – 1, at(“Hello”,0), at(“Hello”,1), at(“Hello”,2), // etc. with trailing nulls >
Yes, you can do that. One thing to be careful with is that you need to generate multi-character constant values instead of single characters from the constexpr functions or change the template metaprogramming string implementation and prepare it for this thing.
Love the article. You teach me something every time.
And I found an error where it says “have to compile as an ordinary functions” (take out the an).
Thanks for the fix and for the kind words!
In C++98 the size of an array could also be determined at compile-time, without using constexpr, e.g. by
template < typename T, size_t N > char (&array(const T(&)[N]))[N];
int a[100];
double b[sizeof array(a)];
Hi mm,
Yes, even though it is
constexpr, theatfunction above uses that C++98 deduction capability in a way that’s not affected byconstexpr. In fact, the statement in the article about what cripplesconstexprfunctions implies that anything they do has to work withoutconstexpr; it’s only in the usage ofconstexprfunctions that there’s a new capability, because they may return a compile-time constant.On the similar topic of parsing strings at compile-time, I wanted to share a technique that allows, at compile-time, the evaluation of an arithmetic expression (in Reverse Polish notation) encoded in a string literal. No macros required:
http://akrzemi1.wordpress.com/2011/05/11/parsing-strings-at-compile-time-part-i/
http://akrzemi1.wordpress.com/2011/05/20/parsing-strings-at-compile-time-part-ii/
Regards, &rzej
You can parse string literals using constexpr parsers and build small DSLs with them. They are simpler (easier to write and understand later) than parsers processing the metaprogramming strings presented here. However, template metaprogramming parsers seem to be more expressive. For example how could you build a type-safe printf (mentioned in another comment) using constexpr functions?
Yep, you can do stuff like this. What you can’t do without macros, as far as we can tell, is to generate code that depends on the contents of the string.
In practice, I found it useful to automatically stringize macro arguments before passing them to _S.
template <typename String> unspecified-functor-type compile_python_function(); #define PYTHON(...) compile_python_function<_S(#__VA_ARGS__)>() auto factorial = PYTHON( def fact(n): if n == 0: return 1 else return n * fact(n - 1) );Compare with explicitly quoted string literal:
@Roman, I have a lot of multiline code that looks like that
Can you be more specific what is your trick? What is the “unspecified-functor-type”?
unspecified-functor-type is a functor that executes the python function when called. compile_python_function() creates such functors from compile-time strings containing python code.
Here’s how you can format multi-line string literals.
This is equivalent to the following:
std::string code = "\n" " line1\n" " line2\n" ... " more lines\n";If you don’t like the leading new line and whitespaces in the beginning of each line, you can remove them programmatically.
Note that you can’t use this approach with all strings. For example, it won’t work with unbalanced parentheses.
Nifty! Raw string literals would give us another way to handle it, but right now I can’t tell whether they are supposed to interact usefully with the preprocessor. This program is accepted by clang but rejected by gcc and comeau:
I’m sad…
When I first read about the new user defined literals in C++11, especially the raw form, I thought finally we have everything to avoid such hackery like this. So I was little surprised why don’t you use them. And I just learned, the string literals can be processed in cooked form only. I think it’s a big hit and miss, and makes this new feature nearly useless…
I agree, it’s very disappointing. Unfortunately, that restriction on user-defined literals appeared in the proposals between N2282 and N2378, in 2007.
I recently wrote up my experience trying to use C++11 features to do compile-time things with strings and how disappointing it was. I posted it to /r/cpp but unfortunately it never showed up in anyone’s view of /r/cpp so no one ever saw it. It’s a more novice version of this blog post, so I’m glad to see a more thorough treatment of the subject.
http://www.reddit.com/r/cpp/comments/10f7r5/c11_missed_opportunity_strings/
There is a type-safe printf implementation based on these strings (and a template metaprogramming parser-generator library): http://abel.web.elte.hu/mpllibs/safe_printf/index.html Example using it: https://github.com/sabel83/mpllibs/blob/master/libs/safe_printf/example/safe_printf/main.cpp
You may find it interesting.
I find what you’ve managed to do amazing but disappointing that you had to resort to preprocessor macros and jump through so many hoops to get to a usable syntax. C++11 did a lot of good things, but this is an area it fell short. At this point I am most interested in learning how C++ committee members plan on addressing the problems so libraries like mpl don’t have to resort to the preprocessor and maybe don’t even need to exist because writing compile-time parsers with templates and constexpr will be so easy and natural. I want attribute((format(printf, 1, 2))) to be moved from the realm of compiler authors to compiler users.
I always thought that the C++11 standards committee made a really grave error in not special-casing string literals when passed as a template argument. Variadic templates seemed to offer all the support needed for useful string manipulation (hashing, for instance) at compile time.
The fact that we still need this level of hackery to achieve something as simple as compile-time string evaluation in C++ is a bit embarrassing. And with C++11 published the window of opportunity is gone. Any corrections to the behavior will not find widespread compiler support.
I agree with you about everything but the last statement. I think Microsoft in particular will be a long time catching up to the C++11 standard anyway, but once they get there, extensions like the one you propose would be easy and fast to implement, and other vendors already seem to be able to add language features at a pretty good clip.
@bobstevens Not that I object to having variadic-form user defined string literals, but compile-time string hashing is quite doable in today’s C++11 without the need for a single macro.
I have an implementation which enables the following (IMO quite elegant) syntax:
constexpr std::uint32_t id = "Hash me"_hash;Have a look at: https://gist.github.com/3839813
This is pretty cool. It’s annoying that such PP voodoo needs to be resorted to. I’m curious, though, since the post is tagged as “c++11″, wouldn’t variadic templates & macros allowed by C++11 greatly simplify this implementation, without even needing to use Boost?
Also, it may be worth noting there is another way to use strings as template arguments: even in C++03, extern variables may be used as template arguments. (It obviously will not allow for quite the same usage/manipulation as you’re indicating here). But, you can do, as a contrived example:
This approach uses Boost’s compile-time string implementation. It could be replaced by a custom one, which is based on variadic templates – what is important for the string creation/manipulation is that you have a template-metaprogramming sequence implementation to use.
This approach uses Boost’s preprocessor iteration utilities as well. The point there is to reduce manual code repetition and to make it configurable how many times the code is repeated by the preprocessor. I can’t see how variadic macros could help there.
We could make strings template arguments this way and pass them across template metafunctions, but we can’t look into them until runtime. It doesn’t allow any manipulation at compile-time. They are black boxes passed around at compile-time and used at runtime – as your example does it as well. For example, we can’t query the length of the string or have access to the individual characters while we can do these things with this template metaprogram/constexpr/preprocessor metaprogram combination.
You have an error here:
constexpr unsigned factorial(unsigned x)
{
return x == 0 ? 0 : x * factorial(x-1);}
int a[factorial(6)]; // OK in C++11
The above is not valid C++11, because your factorial definition is incorrect and factorial(6) is equivalent to 0, and zero length arrays are still not allowed.
The corrected version:
constexpr unsigned factorial(unsigned x)
{
return x == 0 ? 1 : x * factorial(x-1);}
int a[factorial(6)]; // OK in C++11
Thanks for the fix; applied!