Expressive C++: A Lambda Library in 30 Lines (Part 2 of 2)

This entry is part of a series, Expressive C++»

Welcome to the latest installment of Expressive C++, a series of articles devoted Embedded Domain-Specific Languages (EDSLs) and Boost.Proto, a library for implementing them in C++. In the last installment, we started developing a very simple library for creating inline, anonymous function objects: lambdas. In this installment we’ll finish the job. By the end of this article, you’ll know how make your EDSL come alive by giving expressions domain-specific behaviors. When we’re done, we’ll have a very tiny lambda library that is actually useful.

Quick Recap

Last week we wrote out the very beginnings of a lambda library. Since we’ll be referring back to it, I’ll reproduce the complete example here:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
#include <cassert>
#include <boost/proto/proto.hpp>
using namespace boost;
 
struct arg1_tag {};
proto::terminal<arg1_tag>::type const arg1 = {};
 
// A Proto "algorithm": a grammar with embedded transforms for
// evaluating lambda expressions (explained in the previous article):
struct Lambda
  : proto::or_<
        // When evaluating a placeholder terminal, return the state.
        proto::when< proto::terminal<arg1_tag>, proto::_state >
        // Otherwise, do the "default" thing.
      , proto::otherwise< proto::_default< Lambda > >
    >
{};
 
int main()
{
    // Evaluate the lambda arg1 + 42, replacing arg1 with 1
    int i = Lambda()( arg1 + 42, 1 );
    assert( i == 43 );
}

With this simple program, we can evaluate arbitrarily complicated expression trees involving the arg1 placeholder. Not too bad for such a small program.

But as a lambda library, this solution leaves much to be desired. The biggest problem is that the expression arg1 + 42 is a dumb tree with no particular meaning; it just sits there until it is passed to the Lambda algorithm for evaluation. It’d be way better if we could evaluate the expression tree by passing arguments to the lambda directly, like (arg1 + 42)(1). That way, we could pass arg1 + 42 to algorithms like std::transform. We’ll see how in a minute; but first, I have to head off some potential confusion.

Expressions: What’s In A Name?

I’ve been carelessly throwing this word “expression” around a lot. “Expression” can mean many different things, and its important to keep it all straight, so forgive me while I make a brief digression. At the lowest level is syntactic conformance to the rules for valid C++ expressions. 1 << 8 is a C++ expression in the trivial sense that a C++ compiler can parse it. Here, << is just a binary operator with certain associativity and precedence.

Things get more interesting when we think about semantics. What does << mean? It takes the left operand and left-shifts it by the amount of the right operand, right? Interestingly, nobody thinks to wonder what it means to left-shift std::cout by the amount "hello world". In the context of output stream operations, << means something else entirely. (To this day, some folks have a hard time accepting that. Those folks should read no further.)

Are there other ways to understand expressions? Consider the C++ type system and object model. A C++ expression has a type and a value that can be completely independent of its syntax and semantics. std::cout << "hello world" has type std::ostream & and value std::cout. That’s mostly orthogonal to the fact that it has the effect of writing "hello world" to an output stream.

More? How about the conceptual level? If cont is an STL container, then we know (A) that cont.begin() parses as valid C++, (B) that it means “get the begin iterator”, and (C) that it surely has a type and a value; but we also know that its type models a concept like ForwardIterator or RandomAccessIterator. In short, expressions are not as simple as one might think!

Back to EDSL Design

What does this have to do with EDSL design? Writers and readers of EDSLs in C++ must be able to move throught these different levels when reading and writing domain-specific code. arg1 + 42 parses as C++, has a type and a value (a tree that represents the expression), and its type models a concept called ProtoExpression. So when I casually say that arg1 + 42 is a “Proto expression”, I’m really saying all of the above.

There’s not much to say about the semantics of Proto expressions besides the fact that they tend to clump together to form larger and larger Proto expressions. In particular, arg1 + 42 does not yet have “lambda” semantics in and of itself; you must pass it to the Lambda algorithm which interprets it as a lambda expression. We’d like to define a LambdaExpression concept that refines ProtoExpression by adding operator(), and we’d like arg1 + 42 to model that new concept. Now, when I say “lambda expression”, all that information—syntax, semantics, types, values, concepts—comes along for the ride.

Figure 1: Relationships between different kinds of expressions.

There’s a simple relationship between the expressions I’ve been discussing: a lambda expression is a kind of Proto expression, which is a kind of C++ expression. These relationships are shown in Figure 1. At the lowest level—syntax—they’re all the same thing. At higher levels, they diverge, and it’s up to the EDSL user to keep it straight. This doesn’t have to be hard; after all, when was the last time you looked at std::cout << "hello world" and wondered what it meant?

Conceptual Difficulties

As defined above, arg1 + 42 is a Proto expression. But we’re building a lambda library, dammit—we don’t want no stinkin’ Proto expressions, we want lambda expressions! How do we create a type that models the concept LambdaExpression? We can easily add operator() to arg1 such that arg1(1) returns 1. But we haven’t told anybody about our operator(), so as far as Proto is concerned arg1 is still just a ProtoExpression. Worse, arg1 + 42 is also just a ProtoExpression, without our special operator() member. Proto has effectively sliced it off!

There needs to be a way to “sub-class” Proto expressions to create lambda expressions in such a way that they combine into larger lambdas and not degenerate back into boring old Proto expressions. In Proto, the process of sub-classing Proto expressions is called expression extension, and it involves telling Proto a little something about your domain.

Proto Domains

In Proto, you can extend expressions by defining a domain and assigning properties (like extra member functions) to expresions within that domain. What’s a domain? In the abstract domain-specific-language sense, a domain is a kind of intelectual scope; problems within a domain all tend to share characteristics, as do their solutions. Linear algebra is a domain; text manipulation is a domain; etc. A Proto domain is similar: all Proto expressions that share a Proto domain have the same properties, like extra member functions.

A Proto domain is just a C++ type. Like a trait, a Proto domain tells you things about an EDSL. All Proto expressions have an associated Proto domain that describes what makes expressions in that domain special. You can query for it with proto::domain_of, but your rarely need to. Since it’s awkward to say, “expression types have associated domain types,” we just say “expressions are in domains.”

The most important job of a Proto domain is to communicate to Proto how it should create new expressions in that domain. By default, expressions are in proto::default_domain, and there’s nothing special Proto needs to do when building new expressions. But if you assign a custom domain to an expression (we’ll see how in a minute), then you can tell Proto to wrap all new expressions with a custom expression wrapper where you can add extra behaviors—like an operator() member function, for instance.

The domain-specific expression wrappers are known as expression extension classes. You tell Proto about them via a domain with a type called a generator, described below. So in short, we have:

Domain
A type associated with a Proto expression. A Proto domain is a trait with a number of associated types that tell Proto about an EDSL. Larger expressions share the domain of the smaller ones from which they’re composed. One of the types associated with a Proto domain is a generator.
Generator
A function object that accepts accepts a Proto expression object and returns a new object—a lambda, perhaps—that extends the expression. Usually, this simply involves wrapping it in a custom wrapper.
Expression Extensions
Custom wrappers are called expression extensions, and they are where all the domain-specific stuff like custom member functions live.

Figure 2: Relationship between Proto expressions, domains and generators.

There’s a loopy relationship between these three concepts: for instance, lambda expressions are in the lambda domain, which has an associated lambda generator, which creates lambda expressions, closing the circle. The relationship between expressions, domains and generators is shown in Figure 2 to the right.

Some code will make this a little more real.

The Lambda Domain, Generator, and Extension

In the code below, you will see how the three concepts described above—domains, generators, and expression extensions—play together to create lambda expression. First, we define lambda_domain:

1
2
3
4
5
6
7
8
// Forward-declare our custom expression wrapper
template<typename ProtoExpression> struct lambda_expr;
 
// Define a lambda domain and its generator, which
// simply wraps all new expressions our custom wrapper
struct lambda_domain
  : proto::domain< proto::generator<lambda_expr> >
{};

On line 2 we forward-declare our lambda expression wrapper. That lets us use it on line 7 when defining lambda_domain. The lambda_domain struct is how we tell Proto how we would like it to post-process all new lambda expressions; namely, to pass them through proto::generator<lambda_expr>. That’s a function object that wraps expressions in the (yet to be defined) lambda_expr.

Now we define lambda_expr, our custom expression wrapper. This is where we put our domain-specific goodies (explained below):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
// A lambda is an expression with an operator() that
// evaluates the lambda.
template<typename ProtoExpression>
struct lambda_expr
  : proto::extends<ProtoExpression, lambda_expr<ProtoExpression>, lambda_domain>
{
    lambda_expr(ProtoExpression const &expr = ProtoExpression())
      : lambda_expr::proto_extends(expr)
    {}
 
    // So that boost::result_of can be used to calculate
    // the return type of this lambda expression.
    template<typename Sig> struct result;
 
    template<typename This, typename Arg>
    struct result<This(Arg)>
      : boost::result_of<Lambda(This const &, Arg const &)>
    {};
 
    // Evaluate the lambda expressions
    template<typename Arg1>
    typename result<lambda_expr(Arg1)>::type
    operator()(Arg1 const & arg1) const
    {
        return Lambda()(*this, arg1);
    }
};

Line 5—inheritance from proto::extends—is where the associations between the lambda domain, generator, and extension class are established. The three template parameters are: the Proto expression we’re extending, the lambda expression we’re defining, and the domain with which all lambda expressions are associated. The constructor on line 7 is some necessary boilerplate, and the rest is up to you.

Line 23 is where we (finally!) give lambda expressions their operator(). On line 25, we use the Lambda algorithm that we defined last time. Note that we pass *this as the first parameter. The Lambda algorithm expects to be passed a model of the ProtoExpression concept. Since *this refers to an instantiation of lambda_expr which models LambdaExpression, and LambdaExpression refines ProtoExpression, this works.

The only other interesting thing is the nested result class template and the use of boost::result_of to calculate the return type of the Lambda algorithm. Lambda is a valid TR1-style function object (inheritance from proto::or_ makes it so). And with the addition of the nested result class template, all lambda_expr objects are valid TR1-style function objects as well.1

Making arg1 A Lambda Expression

We’re almost done. All that’s left is to make arg1 a lambda expression so that the expressions in which it appears are also lambda expressions. To do that, we just wrap it in lambda_expr:

// Define arg1 as before, but wrapped in lambda_expr
typedef lambda_expr<proto::terminal<arg1_tag>::type> arg1_type;
arg1_type const arg1 = arg1_type();

With this last change, every expression involving arg1 gets infected with lambda-ness and has an operator() that evaluates the lambda expression. Now we’re done. (For now.)

Complete Solution

When all the parts are assembled, our little lambda library looks like this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
#include <boost/proto/proto.hpp>
using namespace boost;
 
struct arg1_tag {};
 
// Forward-declare our custom expression wrapper
template<typename ProtoExpression> struct lambda_expr;
 
// Define a lambda domain and its generator, which
// simply wraps all new expressions our custom wrapper
struct lambda_domain
  : proto::domain< proto::generator<lambda_expr> >
{};
 
// A Proto "algorithm": a grammar with embedded transforms for
// evaluating lambda expressions (explained in the previous article):
struct Lambda
  : proto::or_<
        // When evaluating a placeholder terminal, return the state.
        proto::when< proto::terminal<arg1_tag>, proto::_state >
        // Otherwise, do the "default" thing.
      , proto::otherwise< proto::_default< Lambda > >
    >
{};
 
// A lambda is an expression with an operator() that
// evaluates the lambda.
template<typename ProtoExpression>
struct lambda_expr
  : proto::extends<ProtoExpression, lambda_expr<ProtoExpression>, lambda_domain>
{
    lambda_expr(ProtoExpression const &expr = ProtoExpression())
      : lambda_expr::proto_extends(expr)
    {}
 
    // So that boost::result_of can be used to calculate
    // the return type of this lambda expression.
    template<typename Sig> struct result;
 
    template<typename This, typename Arg>
    struct result<This(Arg)>
      : boost::result_of<Lambda(This const &, Arg const &)>
    {};
 
    // Evaluate the lambda expressions
    template<typename Arg1>
    typename result<lambda_expr(Arg1)>::type
    operator()(Arg1 const & arg1) const
    {
        return Lambda()(*this, arg1);
    }
};
 
// Define arg1 as before, but wrapped in lambda_expr
typedef lambda_expr<proto::terminal<arg1_tag>::type> arg1_type;
arg1_type const arg1 = arg1_type();
 
// End lambda library here. Begin test code.
#include <cassert>
 
int main()
{
    int i = (arg1 + 42)(1);
    assert( i == 43 );
}

Excluding comments, blank lines and the test code (which isn’t part of the library), this is 34 lines by my count. Now that we have it working, let’s take it for a spin:

Figure 3: Computing the Fibonacci Numbers

The above code uses our 34-line lambda library to fill an array with the first ten Fibonacci numbers. Not shabby. If you’re confused about how this lambda works, perhaps you’re thrown by the expression &arg1. Proto has overloaded unary operator& so that &arg1 builds a Proto expression tree. The tree for (&arg1)[-2] + (&arg1)[-1] looks like Figure 4.

Figure 4: Expression tree for (&arg1)[-2] + (&arg1)[-1]

Order of Initialization

There is one small problem with our solution, but unless you were a language lawyer, I wouldn’t expect you to spot it: the arg1 placeholder requires dynamic initialization. This means that it must be constructed at runtime. The problem is that there could be initialization order dependencies that cause arg1 to be used before its constructor is executed. That’s not good.

We can avoid the order-of-initialization problem by making arg1 POD and using static initialization. In C++03, that means lambda_expr can’t have a constructor or use inheritance.3 No problem; Proto offers another mechanism for extending expressions: BOOST_PROTO_EXTENDS and friends. We can redefine lambda_expr as follows:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
// A lambda is an expression with an operator() that
// evaluates the lambda.
template<typename ProtoExpression>
struct lambda_expr
{
    BOOST_PROTO_BASIC_EXTENDS(ProtoExpression, lambda_expr, lambda_domain)
    BOOST_PROTO_EXTENDS_ASSIGN()
    BOOST_PROTO_EXTENDS_SUBSCRIPT()
 
    // So that boost::result_of can be used to calculate
    // the return type of this lambda expression.
    template<typename Sig> struct result;
 
    template<typename This, typename Arg>
    struct result<This(Arg)>
      : boost::result_of<Lambda(This const &, Arg const &)>
    {};
 
    // Evaluate the lambda expressions
    template<typename Arg1>
    typename result<lambda_expr(Arg1)>::type
    operator()(Arg1 const & arg1) const
    {
        return Lambda()(*this, arg1);
    }
};

We had to replace inheritance from proto::extends and the lambda_expr constructor with three macros. The first on line 6 establishes the relationships between the lambda domain, generator and extension class. The second and third macros define tree-building operator= and operator[] member functions, respectively. (Admittedly, the macros are distasteful but necessary with C++03. The relaxed rules for POD types and constexpr constructors in C++0x will eliminate the need for this hoop-jumping.)

Now that we have changed lambda_expr to allow static initialization, we can use it when initializing the arg1 placeholder as follows:

lambda_expr<proto::terminal<arg1_tag>::type> const arg1 = {};

The curly brace initialization (called aggregate initialization) reflects the fact that lambda_expr no longer has a constructor. And since initializing lambda_expr objects now requires a different syntax, we must change lambda’s generator when defining lambda_domain as follows:

// Define a lambda domain and its generator, which
// simply wraps all new expressions our custom wrapper
struct lambda_domain
  : proto::domain< proto::pod_generator<lambda_expr> >
{};

Notice that we replaced proto::generator with proto::pod_generator. The only purpose of this change was to get Proto to use curly-braces when initializing new lambda_expr objects instead of the regular constructor-call syntax.

Click here to view the complete solution.

Conclusions and What’s To Come

With the above changes to fix the order-of-initialization problem (and the qualifications above), our mini lambda library now has 32 lines of code by my count. OK, so it’s not quite 30 as promised. Bygones? I hope you’ve enjoyed this introduction to Proto grammars, transforms, and extension classes. We’ll be seeing a lot more about grammars and transforms in future articles, but at this point, you have most of what you need to start building your own embedded languages with Proto.

In the next article, we’ll address the one major shortcoming of this lambda EDSL: the fact that it only accepts a single argument. Accepting additional arguments is a simple matter, and along the way we’ll see how to compose Proto transforms, making them far more flexible and powerful.

See you then!

Acknowledgements

Thanks to Bartosz Milewski and David Abrahams for their valuable feedback about this post.


  1. On C++0x compilers, the nested result class template is unnecessary because std::result_of uses decltype to deduce the return type of operator()

  2. Since this lambda works by using pointer manipulation, it only works on contiguous data. Careful! 

  3. In C++0x, the situation is different. Some types that use inheritance and constructors still qualify for static initialization (see standard layout types). But if you want your EDSL to be portable to C++03 compilers, you need to respect the stricter C++03 rules. 

Posted Sunday, October 17th, 2010 under Boost.

11 Responses to “Expressive C++: A Lambda Library in 30 Lines (Part 2 of 2)”

  1. Tristan Cragnolini says:

    Hello, just to mention that there is a mistake in the complete solution at the end of this article, at http://cpp-next.com/wp-content/uploads/2010/10/05-mini-lambda.txt

    the first line should read “#include <boost/proto/proto.hpp>”, I think.

    Btw, great serie of articles !

      Quote
  2. Serious question: have you used this in production? Where is it actually useful? All the stated examples seem better off being written using straight-up old-school C/C++ idioms.

    The cost of this kind of metaprogramming seems immense, both from a build time perspective, and from actually reading the code. Quite frankly, it doesn’t exactly roll off the tongue and it’s actually hard to quickly read the code and understand what it’s doing.

    And I don’t want to even begin to imagine what it’s like to step through in the debugger.

      Quote
    • Eric Niebler says:

      First off: domain-specific languages are a good way to approach a lot of problems. But if you’re not accustomed to approaching problems in this way, it probably seems like object-orientation did when you first started learning it: extra complexity for little return.

      Now, is this being used in production? YES! In fact, more people are using Proto than realize it. It’s a foundational library that is used to build other libraries. Boost.Spirit, Xpressive, Phoenix3, and MSM are all built on top of Proto, just within Boost; Boost.Spirit in particular is very popular. Xpressive is used in the Alioth language shootout to put C++ on the map for fastest C++ regex engine. Outside of Boost there is the forthcoming version of NT2, a MatLAB-like matrix manipulation EDSL that is 4x faster than parallel MatLAB itself, and ~40x faster than regular MatLAB.

      Is Proto hard? Not fundamentally; it’s just unlike regular C++ and the syntax is weird. It’s easier if you’re already familiar with functional programming. The examples I’ve shown so far in this series—initializing a map, calculating the Fibonocci numbers—are trivial. They’re examples, and they’ll get more interesting as I go.

      Does it help at all to know all those popular, powerful, high-performance libraries have been built with Proto’s help?

        Quote
    • Thomas Heller says:
      And I don’t want to even begin to imagine what it’s like to step through in the debugger.   

      I agree, traditional debuggers do not play along nice with EDSLs. An easy way to debug your Expression Templates should be provided along with the EDSL library.

        Quote
  3. Jimmy Karlsson says:

    Wow, my head is spinning wildly right now. Though, I’m not too familiar with either C++ templating nor Boost in general, so I guess it’s understandable.

    Will this piece of code go into your toolkit or do you have a more direct application for it (other then for the ‘Expressive C++’ series that is)?

      Quote
    • Eric Niebler says:

      This code is just an example of the sorts of things you can do with Boost.Proto. I will revisit this example in future articles and flesh it out to illustrate some implementation techniques used by the forthcoming Phoenix3 library, the successor to Boost.Lambda.

        Quote
  4. Minor Nit: << is the left shift operator.

      Quote
  5. Sohail says:

    The static/dynamic initialization problem is not a language lawyer problem IMHO. Run-of-the-mill C++ programmers need to know and understand it too. I think the serialization library had a significant problem with this for some time.

    Great post, as usual!

    Question out of curiosity: how many iterations of the Proto design did you go through?

      Quote
    • Eric Niebler says:

      Thanks, Sohail. Hopefully, not many people are declaring global objects (I can hope, right?), but if they do, then yes they should know about static initialization.

      And the version of Proto that I finally committed to Boost was v4. I have a lot of people to thank for feedback on early designs.

        Quote

Leave a Comment (post replies using links below individual comments)