Expressive C++: Playing with Syntax

This entry is part of a series, Expressive C++»

Welcome to the second in a series of articles devoted Domain-Specific Embedded Languages in C++. In the intro I discussed the benefits of DSELs (“dee-sells”), and talked up the possibilities offered by Boost.Proto, a DSEL compiler-construction toolkit. I also threw in examples like Boost.Spirit, a large and complex DSEL that approximates EBNF in C++.

After all that talk, you might be forgiven for thinking that all DSELs are big and complicated, and that this whole DSEL thing is not very practical. Not so. In this article, I’ll step through the design and implementation of a fairly simple DSEL. By the end, you should be able to write your own small DSELs that are actually useful.

Mad Libs1 Formatting

Here’s a little problem. You have a Mad Libs-like string formatting API that takes a format string like "There are more things in {place}, Horatio, than are dreamt of in your {thing}.\n" that contains placeholders, and a std::map containing substitutions.

// Declare and initialize a map containing substitutions
std::map<std::string, std::string> subst;
subst["place"] = "heaven and earth";
subst["thing"] = "philosophy";
 
// Prints: There are more things in heaven and earth, Horatio,
//         than are dreamt of in your philosophy.
std::cout << format( "There are more things in {place}, Horatio, "
                     "than are dreamt of in your {thing}.\n", subst );

Calling this API is a pain because declaring and filling a std::map is a multi-line proposition. Recognizing that laziness is one of the traits of a great programmer2, you’d like to change this multi-liner into a one-liner. It’s easily done, and the best part is: you get to design your own language.

DSEL Language Design

Now comes the fun part: getting to decide what you mini-language looks like. The first step is to understand the peculiar rules of C++ expression-based DSELs: your language must consist of valid C++ expressions. You can only use C++ identifiers, functions and operators, and your language must obey the precendence and associativity rules of C++. So for instance, as nice as it may be, this is right out:

// WRONG: not a valid C++ expression
format( "There are...", {"place": "heaven and earth",
                         "thing": "philosophy"} );

These however, are valid and reasonable approaches (for some appropriately defined map object):

// OK, valid C++ expressions:
format( "There are...", map("place", "heaven and earth")
                           ("thing", "philosophy") );
 
format( "There are...", map["place"]("heaven and earth")
                           ["thing"]("philosophy") );
 
format( "There are...", map["place"] = "heaven and earth" +
                        map["thing"] = "philosophy" );
 
format( "There are...", (map["place"] = "heaven and earth",
                         map["thing"] = "philosophy") );

Notice how in each case, we’ve used expressions that obey the syntax of the C++ language. Take a moment to think about the operator overloads that would make each expression compile. For instance, the first expression requires the map object to have an overloaded function call operator that takes two arguments and returns something that itself has a function call operator, etc.

The last one with the overloaded comma operator is interesting. Notice how I had to wrap the expression in an extra set of parentheses. The comma’s dual nature as an operator and as a separator in argument lists makes it persnickety to work with in DSELs.For Extra Credit»

Operators Are Standing By

Perhaps you’re thinking, “How will I even get that to compile?” That’s actually the easy part, and it’s where Proto shines. With just the following, the above code compiles and runs (and does nothing):

#include <string>
#include <boost/proto/proto.hpp>
 
struct map_ {};
boost::proto::terminal<map_>::type map = {};
 
template< class Expr >
void format( std::string fmt, Expr const & expr )
{}
 
int main()
{
    // OK, valid C++ expressions:
    format( "There are...", map("place", "heaven and earth")
                               ("thing", "philosophy") );
 
    format( "There are...", map["place"]("heaven and earth")
                               ["thing"]("philosophy") );
 
    format( "There are...", map["place"] = "heaven and earth" +
                            map["thing"] = "philosophy" );
 
    format( "There are...", (map["place"] = "heaven and earth",
                             map["thing"] = "philosophy") );
}

We included a header, defined a symbol and that’s it. Proto defines operator overloads so that any Proto expression, like the map terminal above, can be combined into larger Proto expression trees. You get that for free.

Heterogeneous Trees

The expression trees Proto builds are not what a Java programmer or an OO advocate might expect. There are no dynamic allocations, no abstract base classes, and no virtual member functions. It is a heterogeneous tree; each node has full static type information about what operation created it and what the types of its child nodes are. It is also built without any copying, which makes it essentially free at runtime (assuming the optimizer is doing its job).

Visualizing Expression Trees

Let’s take a closer look at two of the expressions above and view the structure of the tree Proto builds for us. Here are the trees for the last two expressions, the one with the comma operator, and the one with plus:

Figure 1: Comparison of two expression trees

Wow, what a different an operator makes! The expression with the comma gives a very sensible tree where each placeholder is near its associated substitution string. That will make the tree easier to interpret as a map later. The expression with the plus has a totally different structure where the placeholder "place" is nowhere near its substitution string "heaven and earth" (which is what you’d need to move to make any use of this expression). The fact that the substitution node doesn’t have a simple relative location to its associated placeholder would make it very difficult to interpret this tree as a map. Kudos»

The reason for the difference is the precedence of the + operator. It has higher precedence than the = operator, so that (a = b + c = d) groups as (a = (b + c) = d). The comma operator has very low precedence, so (a = b, c = d) groups as ((a = b), (c = d)). That’s more like what we want. The lesson is that the rules of C++ are fixed and immutable; ignore them at your own peril!

Boost.Proto gives you a handy tool for exploring the structure of expressions. Just pass them to display_expr to pretty-print them and get a view of their structure. For instance, this:

proto::display_expr( (map["place"]="heaven and earth", map["thing"]="philosophy") );

writes the following to std::cout3:

comma(
    assign(
        subscript(
            terminal(struct map_)
          , terminal(place)
        )
      , terminal(heaven and earth)
    )
  , assign(
        subscript(
            terminal(struct map_)
          , terminal(thing)
        )
      , terminal(philosophy)
    )
)

The upshot is that Boost.Proto makes it painless to explore the design space for your domain-specific language. Experiment and have fun, commitment-free.

Walk a Tree, Build a Map

We’ve explored our design space, and it’s time to settle on a syntax for our little language. Since the goal of this exercise was to save typing, let’s go with the most concise syntax:

format( "There are...", map("place", "heaven and earth")
                           ("thing", "philosophy") );

If we pass the map expression to proto::display_expr, we see something like this:

function(
    function(
        terminal(struct map_)
      , terminal(place)
      , terminal(heaven and earth)
    )
  , terminal(thing)
  , terminal(philosophy)
)

That’s really a very simple tree; there are only two non-terminal nodes and they’re both function call invocations. These nodes each have three child nodes: the “function”, the key, and the value. The “function” in this case is either the map terminal or another function node, making this a recursive data structure.

The natural way to process a recursive data structure is with recursive functions. We can easily walk this tree and fill in a std::map as we go. The following two functions do just that. It is explained below:

typedef std::map<std::string, std::string> string_map;
 
// Recursive function used to fill the map
template< class Expr >
void fill_map( Expr const & expr, string_map & subs )
{
    using boost::proto::value;      // read a value from a terminal
    using boost::proto::child_c;    // get the Nth child of a non-terminal
    subs[ value(child_c<1>(expr)) ] = value(child_c<2>(expr));
    fill_map( child_c<0>(expr), subs );
}
 
// The 'map' terminal ends the recursion
void fill_map( boost::proto::terminal<map_>::type const &, string_map & )
{}

We’ll use fill_map by passing it both the expression tree and a std::map to fill. It works by peeling off one layer of the tree at a time, using the two terminals at each layer to insert a new key/value pair into the map.

Fill_map introduces two of Proto’s accessor functions: value and child_c. Value is used to access the value in a Proto terminal node. Child_c retreives the Nth child of a non-terminal.

Putting the Pieces Together

Now that we have a way to walk the tree and build a std::map, the rest is cake. We simply offer a new format API that forwards to the old one and voila! we’re done. For those keeping score, the completed solution looks like this:

////////////////////////////////////////////////////////////////////////////
//
// A simple DSEL for creating map-like associations
//
////////////////////////////////////////////////////////////////////////////
 
#include <map>
#include <string>
#include <iostream>
#include <boost/proto/proto.hpp>
#include <boost/xpressive/xpressive.hpp>
#include <boost/xpressive/regex_actions.hpp>
 
struct map_ {};
boost::proto::terminal<map_>::type map = {};
 
typedef std::map<std::string, std::string> string_map;
 
// Recursive function used to fill the map
template< class Expr >
void fill_map( Expr const & expr, string_map & subs )
{
    using boost::proto::value;      // read a value from a terminal
    using boost::proto::child_c;    // get the Nth child of a non-terminal
    subs[ value(child_c<1>( expr )) ] = value(child_c<2>(expr));
    fill_map( child_c<0>(expr), subs );
}
 
// The 'map' terminal ends the recursion
void fill_map( boost::proto::terminal<map_>::type const &, string_map & )
{}
 
// The old format API that accepts a map of string substitutions
std::string format( std::string fmt, string_map & subs )
{
    namespace xp = boost::xpressive;
    using namespace xp;
    sregex const rx = '{' >> (s1= +_w) >> '}';        // like "{(\\w+)}"
    return regex_replace(fmt, rx, xp::ref(subs)[s1]);
}
 
// The new format API that forwards to the old one
template< class Expr >
std::string format( std::string fmt, Expr const & expr )
{
    string_map subs;
    fill_map( expr, subs );
    return format( fmt, subs );
}
 
int main()
{
    std::cout << format("There are more things in {place}, Horatio, "
                        "than are dreamt of in your {thing}.\n"
                      , map("place", "heaven and earth")
                           ("thing", "philosophy") );
}

As expected, the code prints out:

There are more things in heaven and earth, Horatio, than are dreamt of in your philosophy.

What’s boost::xpressive?

I snuck some extra code in there to see if you were paying attention. It’s Boost.Xpressive, a regular expression and string manipulation library. The first format overload builds a regular expression:

sregex const rx = '{' >> (s1= +_w) >> '}';  // like "{(\\w+)}"

The next line passes the regex to regex_replace to actually do the substitution:

return regex_replace(fmt, rx, xp::ref(subs)[s1]);

The third parameter is a kind of anonymous substitution function. It says: take the part of the string that matched the pattern +_w (a whole word), use it to index into the subs map, and use the result as the substitution string. xp::ref is needed to delay the evaluation of the index operation until after the match has occured. The regex and the anonymous function are from two different DSELs, but notice how they collaborate with each other through the shared use of the s1 placeholder. This kind of DSEL composition is one of the most powerful and exciting features of DSELs.

Counting the DSEL we just defined, this source code uses a total of 3 DSELs. They’re all built with Proto.

Conclusions and What’s to Come

In this article, we built a little DSEL for expressing map-like relationships inline. With the help of Proto, it was pretty easy. In fact, one could argue that the term “DSEL” doesn’t even apply to such a simple library interface. It’s true: the boundary between DSELs and merely clever library interfaces is a fuzzy one. But even for the most borderline cases like this, Proto comes with a collection of tools to make the job easier.

In the next article, we’ll firm up the specification of this library interface, add some parameter checking, and use some nifty techniques to improve template error reporting.

Until next time!


  1. Mad Libs is a registered trademark of Penguin Group (USA) Inc. 

  2. “We will encourage you to develop the three great virtues of a programmer: laziness, impatience, and hubris.”, Larry Wall, Programming Perl (1st edition), O’Reilly And Associates. 

  3. Tested on Visual C++ 10 2010 

One of these expressions is not like the others. Think about operator precedence. Can you spot the odd man out?Powered by Hackadelic Sliding Notes 1.6.5
If you took the “Extra Credit” quiz above and correctly guessed that the expression with the plus was not like the others, give yourself a pat on the back. Because of its structure, it’s largely useless for our purposes.Powered by Hackadelic Sliding Notes 1.6.5
Posted Saturday, September 11th, 2010 under Boost.

27 Responses to “Expressive C++: Playing with Syntax”

  1. Irit Katriel says:

    If you don’t know the arity of each expression, so that this is valid:

    format( “There are…”, map(“place”, “heaven “, “and “, “earth”) (“thing”, “philosophy”) );

    and you want to implement fill_map so that it maps the first string to the concatenation of the rest.

    Is that possible? I am trying to do something similar with proto_arity, but not sure how.

    Thank you.

      Quote
  2. Michal Mocny says:

    I likely have done something wrong, but to compile your example of display_expr(…) I needed to define an ostream& operator<<( ostream&, map_ const & );

    Perhaps newer versions of boost do not need them (though you mentioned it has been in since 1.37), or perhaps it is just my compiler (g++4.4).

    I continue to enjoy this series :)

      Quote
    • Eric Niebler says:

      Yes, newer versions of Boost have dropped that requirement. If Proto can’t find an appropriate ostream insertion operator for a terminal type T, it prints the result of typeid(T).name(). That’s new. I can’t remember exactly when I added that. Sorry for setting you up for that fall.

        Quote
      • Maxim Yanchenko says:

        There is an opposite problem now (at least with Boost 1.44.0).

        Consider this:

        struct A {};
        struct B:A {};
        std::ostream& operator<<( std::ostream& out, const A& )
          { return out << "this is A!"; }
        

        If you do

        proto::display_expr( map( A(), 1 ) );

        it works, but if you do

        proto::display_expr( map( B(), 1 ) );

        it doesn't compile due to ambiguity introduced by hidden_detail_::operator<< that does this nice typeid output.

        Should probably be easy to fix with yes/no ellipsis technique… Maybe it's already fixed in 1.45.0?

          Quote
  3. David Sankel says:
    In the intro I discussed the benefits of DSELs (“dee-sells”) . . .

    Considering there’s industry precedent for dee-ess-ee-el (or more commonly EDSL, letters pronounced), and the possible verbal confusion with DSLs, I’d like to see the “dee-sells” pronunciation stemmed before it becomes more widespread.

      Quote
  4. aoeu says:

    Or you can just use String.Format from .Net

      Quote
  5. Amit Kumar says:

    Wonderful series. I would be interested to understand how xp::ref is used in the implementation to delay the index operation.

      Quote
    • Eric Niebler says:

      There’s nothing magical about xp::ref—it just creates a lazy reference so that any operations on it are delayed until the lambda is evaluated. It’s just like var from the Boost Lambda Library, if you’re familiar with that. All this will become more clear when I get around to talking about the design an implementation of Phoenix3.

        Quote
  6. Marc Glisse says:

    Hello,

    I just wanted to encourage you to continue this series :-)

      Quote
  7. Dave Jenkins says:

    I don’t see the point of constructing the map inside the format call. You can’t reuse it in other format calls when you do that. You might as well skip the map and embed the actual arguments in the format string. Of course, you could construct the std::map separately which leads to a different question…

    Since all the strings are known at compile time, can you make the map construction and lookup happen entirely at compile time? Instead of std::map, you would use the Proto expression tree. Then in the format, can you search the Proto tree at compile time and return the correct string literal?

      Quote
    • Eric Niebler says:

      It’s not hard to imagine that you might use something like this for replacing "{USER}" and "{HOME}" with strings read from environment variables. So in general, the substitution strings would not be known at compile time, and the inline map associations could actually be useful.

      The lookup cannot be done entirely statically, though, because you can’t evaluate string comparisons at compile time.

        Quote
      • Dave Jenkins says:

        Thanks for writing this excellent series, Eric. Proto is an amazing software library.

        There are two things I like about DSELs: (1) their ability to express a solution clearly and concisely and (2) their potential to improve run-time performance. It’s the performance improvement aspect that I was trying to get at with my question.

        I know DSELs are great for improving run-time performance by reorganizing computations, like matrix operations, so they run faster. But I’m not so sure about DSELs ability to shift static computations from run-time to compile-time. Do you see much potential for compile-time computation using DSELs or are language/compiler limitations, such as no compile-time string comparisons, too big a problem? Could you use your quirky but brilliant mpl::string to do the compile-time string comparisons for the map lookup problem?

          Quote
  8. Mike says:

    This is one of the reasons I loathe C++. When you’re looking at a piece of code, you have no idea what you’re really looking at, and end up having to trawl through 17 include files to try to work out the meaning of some wonderful new syntax. Ugh.

      Quote
    • Eric Niebler says:

      I won’t argue with you—different strokes—but there are plenty of people who would actually prefer to work with a syntax that more closely matches their problem domain.

        Quote
      • Tim H says:

        I’ve done most of these tricks and more to make an embedded DSL. The code is horrible to maintain and still doesn’t really give me the expressiveness I want. So I wrote a proper grammar instead and it turns out, I had forgotten just how easy lex and yacc really are.

        IMHO: DSLs like this are fine if you have a very small “language” to express and if your users are not producing large volumes of this “fake” language. If you need to go beyond simple expressions like map() etc, you’re better off defining (or using) a language that actually captures your requirements.

        Case in point : boost.lambda. It’s cute for very small things, but it’s frigging impossible to use in a larger context. And hence, C++1x will give us real lambdas.

          Quote
        • Eric Niebler says:

          Proto lets you write proper grammars for your DSELs and more. I haven’t shown that yet. Soon. I developed Proto because I was writing and maintaining one of the horrible, ad-hoc DSEL you’re speaking of, and I agree, it’s a mess. Proto really makes this much easier.

          As for Boost.Lambda—it’s very old technology, to be replaced soon with Boost.Phoenix3, which is being built with Proto. Time will tell, but I think it’ll make lambdas more usable and bring other benefits besides those in C++0x.

            Quote
          • Tim H says:

            I’d love to see more examples of C++ DS(E)Ls that don’t have silly artificial hoops put in place just to dodge C++ rules (and while macros solve some of the hoops, yuck!)

              Quote
        • Gary Powell says:

          boost lambda was done in part to prove “that it could be done” and that C++ needed real lambdas, and if there were real lambdas how handy they would be.

            Quote
      • I won’t argue with you

        Well, I’ll argue with Mike :-) . This is where the declarative nature of most DSLs becomes important. See for example, Boost.Python’s operator binding syntax. In this case you don’t want to know “what’s actually going on,” which is all you’re going to learn from trawling headers. Mike even said it: you want to know what the code means, and a well-designed DSL says what it means on its face. Yes, you may need to learn the library’s idioms, but that’s what documentation is for. Trying to understand the use of a library by reading its implementation like trying to learn C++ by reading the implementation of GCC.

        Boost.MSM’s eUML is another interesting example.

          Quote
  9. Joel Falcou says:

    Great piece again. Why didn’t you go with the map_list_of example from the boostcon slides. I found it more easy to follow.

      Quote

Leave a Comment (post replies using links below individual comments)