Welcome to the second in a series of articles devoted Domain-Specific Embedded Languages in C++. In the intro I discussed the benefits of DSELs (“dee-sells”), and talked up the possibilities offered by Boost.Proto, a DSEL compiler-construction toolkit. I also threw in examples like Boost.Spirit, a large and complex DSEL that approximates EBNF in C++.
After all that talk, you might be forgiven for thinking that all DSELs are big and complicated, and that this whole DSEL thing is not very practical. Not so. In this article, I’ll step through the design and implementation of a fairly simple DSEL. By the end, you should be able to write your own small DSELs that are actually useful.
Mad Libs1 Formatting
Here’s a little problem. You have a Mad Libs-like string formatting API that takes a format string like "There are more things in {place}, Horatio, than are dreamt of in your {thing}.\n" that contains placeholders, and a std::map containing substitutions.
// Declare and initialize a map containing substitutions std::map<std::string, std::string> subst; subst["place"] = "heaven and earth"; subst["thing"] = "philosophy"; // Prints: There are more things in heaven and earth, Horatio, // than are dreamt of in your philosophy. std::cout << format( "There are more things in {place}, Horatio, " "than are dreamt of in your {thing}.\n", subst );
Calling this API is a pain because declaring and filling a std::map is a multi-line proposition. Recognizing that laziness is one of the traits of a great programmer2, you’d like to change this multi-liner into a one-liner. It’s easily done, and the best part is: you get to design your own language.
DSEL Language Design
Now comes the fun part: getting to decide what you mini-language looks like. The first step is to understand the peculiar rules of C++ expression-based DSELs: your language must consist of valid C++ expressions. You can only use C++ identifiers, functions and operators, and your language must obey the precendence and associativity rules of C++. So for instance, as nice as it may be, this is right out:
// WRONG: not a valid C++ expression format( "There are...", {"place": "heaven and earth", "thing": "philosophy"} );
These however, are valid and reasonable approaches (for some appropriately defined map object):
// OK, valid C++ expressions: format( "There are...", map("place", "heaven and earth") ("thing", "philosophy") ); format( "There are...", map["place"]("heaven and earth") ["thing"]("philosophy") ); format( "There are...", map["place"] = "heaven and earth" + map["thing"] = "philosophy" ); format( "There are...", (map["place"] = "heaven and earth", map["thing"] = "philosophy") );
Notice how in each case, we’ve used expressions that obey the syntax of the C++ language. Take a moment to think about the operator overloads that would make each expression compile. For instance, the first expression requires the map object to have an overloaded function call operator that takes two arguments and returns something that itself has a function call operator, etc.
The last one with the overloaded comma operator is interesting. Notice how I had to wrap the expression in an extra set of parentheses. The comma’s dual nature as an operator and as a separator in argument lists makes it persnickety to work with in DSELs.For Extra Credit»
Operators Are Standing By
Perhaps you’re thinking, “How will I even get that to compile?” That’s actually the easy part, and it’s where Proto shines. With just the following, the above code compiles and runs (and does nothing):
#include <string> #include <boost/proto/proto.hpp> struct map_ {}; boost::proto::terminal<map_>::type map = {}; template< class Expr > void format( std::string fmt, Expr const & expr ) {} int main() { // OK, valid C++ expressions: format( "There are...", map("place", "heaven and earth") ("thing", "philosophy") ); format( "There are...", map["place"]("heaven and earth") ["thing"]("philosophy") ); format( "There are...", map["place"] = "heaven and earth" + map["thing"] = "philosophy" ); format( "There are...", (map["place"] = "heaven and earth", map["thing"] = "philosophy") ); }
We included a header, defined a symbol and that’s it. Proto defines operator overloads so that any Proto expression, like the map terminal above, can be combined into larger Proto expression trees. You get that for free.
Heterogeneous Trees
The expression trees Proto builds are not what a Java programmer or an OO advocate might expect. There are no dynamic allocations, no abstract base classes, and no virtual member functions. It is a heterogeneous tree; each node has full static type information about what operation created it and what the types of its child nodes are. It is also built without any copying, which makes it essentially free at runtime (assuming the optimizer is doing its job).
Visualizing Expression Trees
Let’s take a closer look at two of the expressions above and view the structure of the tree Proto builds for us. Here are the trees for the last two expressions, the one with the comma operator, and the one with plus:
Wow, what a different an operator makes! The expression with the comma gives a very sensible tree where each placeholder is near its associated substitution string. That will make the tree easier to interpret as a map later. The expression with the plus has a totally different structure where the placeholder "place" is nowhere near its substitution string "heaven and earth" (which is what you’d need to move to make any use of this expression). The fact that the substitution node doesn’t have a simple relative location to its associated placeholder would make it very difficult to interpret this tree as a map. Kudos»
The reason for the difference is the precedence of the + operator. It has higher precedence than the = operator, so that (a = b + c = d) groups as (a = (b + c) = d). The comma operator has very low precedence, so (a = b, c = d) groups as ((a = b), (c = d)). That’s more like what we want. The lesson is that the rules of C++ are fixed and immutable; ignore them at your own peril!
Boost.Proto gives you a handy tool for exploring the structure of expressions. Just pass them to display_expr to pretty-print them and get a view of their structure. For instance, this:
proto::display_expr( (map["place"]="heaven and earth", map["thing"]="philosophy") );
writes the following to std::cout3:
comma( assign( subscript( terminal(struct map_) , terminal(place) ) , terminal(heaven and earth) ) , assign( subscript( terminal(struct map_) , terminal(thing) ) , terminal(philosophy) ) )
The upshot is that Boost.Proto makes it painless to explore the design space for your domain-specific language. Experiment and have fun, commitment-free.
Walk a Tree, Build a Map
We’ve explored our design space, and it’s time to settle on a syntax for our little language. Since the goal of this exercise was to save typing, let’s go with the most concise syntax:
format( "There are...", map("place", "heaven and earth") ("thing", "philosophy") );
If we pass the map expression to proto::display_expr, we see something like this:
function( function( terminal(struct map_) , terminal(place) , terminal(heaven and earth) ) , terminal(thing) , terminal(philosophy) )
That’s really a very simple tree; there are only two non-terminal nodes and they’re both function call invocations. These nodes each have three child nodes: the “function”, the key, and the value. The “function” in this case is either the map terminal or another function node, making this a recursive data structure.
The natural way to process a recursive data structure is with recursive functions. We can easily walk this tree and fill in a std::map as we go. The following two functions do just that. It is explained below:
typedef std::map<std::string, std::string> string_map; // Recursive function used to fill the map template< class Expr > void fill_map( Expr const & expr, string_map & subs ) { using boost::proto::value; // read a value from a terminal using boost::proto::child_c; // get the Nth child of a non-terminal subs[ value(child_c<1>(expr)) ] = value(child_c<2>(expr)); fill_map( child_c<0>(expr), subs ); } // The 'map' terminal ends the recursion void fill_map( boost::proto::terminal<map_>::type const &, string_map & ) {}
We’ll use fill_map by passing it both the expression tree and a std::map to fill. It works by peeling off one layer of the tree at a time, using the two terminals at each layer to insert a new key/value pair into the map.
Fill_map introduces two of Proto’s accessor functions: value and child_c. Value is used to access the value in a Proto terminal node. Child_c retreives the Nth child of a non-terminal.
Putting the Pieces Together
Now that we have a way to walk the tree and build a std::map, the rest is cake. We simply offer a new format API that forwards to the old one and voila! we’re done. For those keeping score, the completed solution looks like this:
//////////////////////////////////////////////////////////////////////////// // // A simple DSEL for creating map-like associations // //////////////////////////////////////////////////////////////////////////// #include <map> #include <string> #include <iostream> #include <boost/proto/proto.hpp> #include <boost/xpressive/xpressive.hpp> #include <boost/xpressive/regex_actions.hpp> struct map_ {}; boost::proto::terminal<map_>::type map = {}; typedef std::map<std::string, std::string> string_map; // Recursive function used to fill the map template< class Expr > void fill_map( Expr const & expr, string_map & subs ) { using boost::proto::value; // read a value from a terminal using boost::proto::child_c; // get the Nth child of a non-terminal subs[ value(child_c<1>( expr )) ] = value(child_c<2>(expr)); fill_map( child_c<0>(expr), subs ); } // The 'map' terminal ends the recursion void fill_map( boost::proto::terminal<map_>::type const &, string_map & ) {} // The old format API that accepts a map of string substitutions std::string format( std::string fmt, string_map & subs ) { namespace xp = boost::xpressive; using namespace xp; sregex const rx = '{' >> (s1= +_w) >> '}'; // like "{(\\w+)}" return regex_replace(fmt, rx, xp::ref(subs)[s1]); } // The new format API that forwards to the old one template< class Expr > std::string format( std::string fmt, Expr const & expr ) { string_map subs; fill_map( expr, subs ); return format( fmt, subs ); } int main() { std::cout << format("There are more things in {place}, Horatio, " "than are dreamt of in your {thing}.\n" , map("place", "heaven and earth") ("thing", "philosophy") ); }
As expected, the code prints out:
There are more things in heaven and earth, Horatio, than are dreamt of in your philosophy.
What’s boost::xpressive?
I snuck some extra code in there to see if you were paying attention. It’s Boost.Xpressive, a regular expression and string manipulation library. The first format overload builds a regular expression:
sregex const rx = '{' >> (s1= +_w) >> '}'; // like "{(\\w+)}"
The next line passes the regex to regex_replace to actually do the substitution:
return regex_replace(fmt, rx, xp::ref(subs)[s1]);
The third parameter is a kind of anonymous substitution function. It says: take the part of the string that matched the pattern +_w (a whole word), use it to index into the subs map, and use the result as the substitution string. xp::ref is needed to delay the evaluation of the index operation until after the match has occured. The regex and the anonymous function are from two different DSELs, but notice how they collaborate with each other through the shared use of the s1 placeholder. This kind of DSEL composition is one of the most powerful and exciting features of DSELs.
Counting the DSEL we just defined, this source code uses a total of 3 DSELs. They’re all built with Proto.
Conclusions and What’s to Come
In this article, we built a little DSEL for expressing map-like relationships inline. With the help of Proto, it was pretty easy. In fact, one could argue that the term “DSEL” doesn’t even apply to such a simple library interface. It’s true: the boundary between DSELs and merely clever library interfaces is a fuzzy one. But even for the most borderline cases like this, Proto comes with a collection of tools to make the job easier.
In the next article, we’ll firm up the specification of this library interface, add some parameter checking, and use some nifty techniques to improve template error reporting.
Until next time!
- Expressive C++: Introduction
- Expressive C++: Playing with Syntax
- Expressive C++: Why Template Errors Suck and What You Can Do About It
- Expressive C++: A Lambda Library in 30 Lines (Part 1 of 2)
- Expressive C++: A Lambda Library in 30 Lines (Part 2 of 2)
- Expressive C++: Fun With Function Composition
- Expressive C++: Trouble With Tuples
- Expressive C++: Expression Optimization


If you don’t know the arity of each expression, so that this is valid:
format( “There are…”, map(“place”, “heaven “, “and “, “earth”) (“thing”, “philosophy”) );
and you want to implement fill_map so that it maps the first string to the concatenation of the rest.
Is that possible? I am trying to do something similar with proto_arity, but not sure how.
Thank you.
I likely have done something wrong, but to compile your example of display_expr(…) I needed to define an ostream& operator<<( ostream&, map_ const & );
Perhaps newer versions of boost do not need them (though you mentioned it has been in since 1.37), or perhaps it is just my compiler (g++4.4).
I continue to enjoy this series
Yes, newer versions of Boost have dropped that requirement. If Proto can’t find an appropriate
ostreaminsertion operator for a terminal typeT, it prints the result oftypeid(T).name(). That’s new. I can’t remember exactly when I added that. Sorry for setting you up for that fall.There is an opposite problem now (at least with Boost 1.44.0).
Consider this:
struct A {}; struct B:A {}; std::ostream& operator<<( std::ostream& out, const A& ) { return out << "this is A!"; }If you do
it works, but if you do
it doesn't compile due to ambiguity introduced by hidden_detail_::operator<< that does this nice typeid output.
Should probably be easy to fix with yes/no ellipsis technique… Maybe it's already fixed in 1.45.0?
Hmm, wasn’t that simple as operators can’t have an ellipsis parameter…
Anyway, it’s solved, the patch for boost/proto/debug.hpp against 1.44.0 (works for me, gcc4.4) and last revision 66440 (not tried to compile, just by analogy) is here: https://svn.boost.org/trac/boost/ticket/4910
Please let me know if it works for you.
Thanks, Maxim! Sorry for the delay. I ended up going with a simpler fix. You can see it here. If the tests look good, this will be in 1.46.
Considering there’s industry precedent for dee-ess-ee-el (or more commonly EDSL, letters pronounced), and the possible verbal confusion with DSLs, I’d like to see the “dee-sells” pronunciation stemmed before it becomes more widespread.
Okay, so let’s just settle this once and for all. Should we use “EDSL,” despite the fact that my book uses “DSEL?” Seems like a good idea to me to go with the most common usage.
I’m in favor of using EDSL exclusively. On the other hand, I do recognize the cuteness of embedding the E between the S and L.
Let’s make everybody happy and call it Edsel. Kidding. I blow with the wind. EDSL it shall be from here on out.
Or you can just use String.Format from .Net
Wonderful series. I would be interested to understand how xp::ref is used in the implementation to delay the index operation.
There’s nothing magical about
xp::ref—it just creates a lazy reference so that any operations on it are delayed until the lambda is evaluated. It’s just likevarfrom the Boost Lambda Library, if you’re familiar with that. All this will become more clear when I get around to talking about the design an implementation of Phoenix3.Hello,
I just wanted to encourage you to continue this series
I don’t see the point of constructing the map inside the format call. You can’t reuse it in other format calls when you do that. You might as well skip the map and embed the actual arguments in the format string. Of course, you could construct the std::map separately which leads to a different question…
Since all the strings are known at compile time, can you make the map construction and lookup happen entirely at compile time? Instead of std::map, you would use the Proto expression tree. Then in the format, can you search the Proto tree at compile time and return the correct string literal?
It’s not hard to imagine that you might use something like this for replacing
"{USER}"and"{HOME}"with strings read from environment variables. So in general, the substitution strings would not be known at compile time, and the inline map associations could actually be useful.The lookup cannot be done entirely statically, though, because you can’t evaluate string comparisons at compile time.
Thanks for writing this excellent series, Eric. Proto is an amazing software library.
There are two things I like about DSELs: (1) their ability to express a solution clearly and concisely and (2) their potential to improve run-time performance. It’s the performance improvement aspect that I was trying to get at with my question.
I know DSELs are great for improving run-time performance by reorganizing computations, like matrix operations, so they run faster. But I’m not so sure about DSELs ability to shift static computations from run-time to compile-time. Do you see much potential for compile-time computation using DSELs or are language/compiler limitations, such as no compile-time string comparisons, too big a problem? Could you use your quirky but brilliant mpl::string to do the compile-time string comparisons for the map lookup problem?
This is one of the reasons I loathe C++. When you’re looking at a piece of code, you have no idea what you’re really looking at, and end up having to trawl through 17 include files to try to work out the meaning of some wonderful new syntax. Ugh.
I won’t argue with you—different strokes—but there are plenty of people who would actually prefer to work with a syntax that more closely matches their problem domain.
I’ve done most of these tricks and more to make an embedded DSL. The code is horrible to maintain and still doesn’t really give me the expressiveness I want. So I wrote a proper grammar instead and it turns out, I had forgotten just how easy lex and yacc really are.
IMHO: DSLs like this are fine if you have a very small “language” to express and if your users are not producing large volumes of this “fake” language. If you need to go beyond simple expressions like map() etc, you’re better off defining (or using) a language that actually captures your requirements.
Case in point : boost.lambda. It’s cute for very small things, but it’s frigging impossible to use in a larger context. And hence, C++1x will give us real lambdas.
Proto lets you write proper grammars for your DSELs and more. I haven’t shown that yet. Soon. I developed Proto because I was writing and maintaining one of the horrible, ad-hoc DSEL you’re speaking of, and I agree, it’s a mess. Proto really makes this much easier.
As for Boost.Lambda—it’s very old technology, to be replaced soon with Boost.Phoenix3, which is being built with Proto. Time will tell, but I think it’ll make lambdas more usable and bring other benefits besides those in C++0x.
I’d love to see more examples of C++ DS(E)Ls that don’t have silly artificial hoops put in place just to dodge C++ rules (and while macros solve some of the hoops, yuck!)
boost lambda was done in part to prove “that it could be done” and that C++ needed real lambdas, and if there were real lambdas how handy they would be.
Well, I’ll argue with Mike
. This is where the declarative nature of most DSLs becomes important. See for example, Boost.Python’s operator binding syntax. In this case you don’t want to know “what’s actually going on,” which is all you’re going to learn from trawling headers. Mike even said it: you want to know what the code means, and a well-designed DSL says what it means on its face. Yes, you may need to learn the library’s idioms, but that’s what documentation is for. Trying to understand the use of a library by reading its implementation like trying to learn C++ by reading the implementation of GCC.
Boost.MSM’s eUML is another interesting example.
Great piece again. Why didn’t you go with the map_list_of example from the boostcon slides. I found it more easy to follow.
This essentially is the map_list_of example, with the simplification that no implicit conversion to
std::mapis needed. That means I can put off explaining expression extension for now.I kinda found the similarity and didn’t think of the point you raised.