Want Speed? Pass by Value.

Be honest: how does the following code make you feel?

std::vector<std::string> get_names();
…
std::vector<std::string> const names = get_names();

Frankly, even though I should know better, it makes me nervous. In principle, when get_names() returns, we have to copy a vector of strings. Then, we need to copy it again when we initialize names, and we need to destroy the first copy. If there are N strings in the vector, each copy could require as many as N+1 memory allocations and a whole slew of cache-unfriendly data accesses as the string contents are copied.

Rather than confront that sort of anxiety, I’ve often fallen back on pass-by-reference to avoid needless copies:

get_names(std::vector<std::string>& out_param );
…
std::vector<std::string> names;
get_names( names );

Unfortunately, this approach is far from ideal.

  • The code grew by 150%
  • We’ve had to drop const-ness because we’re mutating names.
  • As functional programmers like to remind us, mutation makes code more complex to reason about by undermining referential transparency and equational reasoning.
  • We no longer have strict value semantics1 for names.

But is it really necessary to mess up our code in this way to gain efficiency? Fortunately, the answer turns out to be no (and especially not if you are using C++0x). This article is the first in a series that explores rvalues and their impliciations for efficient value semantics in C++.

RValues

Rvalues are expressions that create anonymous temporary objects. The name rvalue refers to the fact that an rvalue expression of builtin type can only appear on the right-hand side of an assignment. Unlike lvalues, which can always be used on the left-hand-side of an assignment, rvalue expressions yield objects without any persistent identity to assign into.2

The important thing about anonymous temporaries for our purposes, though, is that they can only be used once in an expression. How could you possibly refer to such an object a second time? It doesn’t have a name (thus, “anonymous”); and after the full expression is evaluated, the object is destroyed (thus, “temporary”)!

Once you know you are copying from an rvalue, then, it should be possible to “steal” the expensive-to-copy resources from the source object and use them in the target object without anyone noticing. In this case that would mean transferring ownership of the source vector’s dynamically-allocated array of strings to the target vector. If we could somehow get the compiler to execute that “move” operation for us, it would be cheap–almost free–to initialize names from a vector returned by-value.

That would take care of the second expensive copy, but what about the first? When get_names returns, in principle, it has to copy the function’s return value from the inside of the function to the outside. Well, it turns out that return values have the same property as anonymous temporaries: they are about to be destroyed, and won’t be used again. So, we could eliminate the first expensive copy in the same way, transferring the resources from the return value on the inside of the function to the anonymous temporary seen by the caller.

Copy Elision and the RVO

The reason I kept writing above that copies were made “in principle” is that the compiler is actually allowed to perform some optimizations based on the same principles we’ve just discussed. This class of optimizations is known formally as copy elision. For example, in the Return Value Optimization (RVO), the calling function allocates space for the return value on its stack, and passes the address of that memory to the callee. The callee can then construct a return value directly into that space, which eliminates the need to copy from inside to outside. The copy is simply elided, or “edited out,” by the compiler. So in code like the following, no copies are required:

std::vector<std::string> names = get_names();

Also, although the compiler is normally required to make a copy when a function parameter is passed by value (so modifications to the parameter inside the function can’t affect the caller), it is allowed to elide the copy, and simply use the source object itself, when the source is an rvalue.

1
2
3
4
5
6
7
8
9
10
11
12
std::vector<std::string> 
sorted(std::vector<std::string> names)
{
    std::sort(names);
    return names;
}
 
// names is an lvalue; a copy is required so we don't modify names
std::vector<std::string> sorted_names1 = sorted( names );
 
// get_names() is an rvalue expression; we can omit the copy!
std::vector<std::string> sorted_names2 = sorted( get_names() );

This is pretty remarkable. In principle, in line 12 above, the compiler can eliminate all the worrisome copies, making sorted_names2 the same object as the one created in get_names(). In practice, though, the principle won’t take us quite that far, as I’ll explain later.

Implications

Although copy elision is never required by the standard, recent versions of every compiler I’ve tested do perform these optimizations today. But even if you don’t feel comfortable returning heavyweight objects by value, copy elision should still change the way you write code.

Consider this cousin of our original sorted(…) function, which takes names by const reference and makes an explicit copy:

std::vector<std::string> 
sorted2(std::vector<std::string> const& names) // names passed by reference
{
    std::vector<std::string> r(names);        // and explicitly copied
    std::sort(r);
    return r;
}

Although sorted and sorted2 seem at first to be identical, there could be a huge performance difference if a compiler does copy elision. Even if the actual argument to sorted2 is an rvalue, the source of the copy, names, is an lvalue,3 so the copy can’t be optimized away. In a sense, copy elision is a victim of the separate compilation model: inside the body of sorted2, there’s no information about whether the actual argument to the function is an rvalue; outside, at the call site, there’s no indication that a copy of the argument will eventually be made.

That realization leads us directly to this guideline:

Guideline: Don’t copy your function arguments. Instead, pass them by value and let the compiler do the copying.

At worst, if your compiler doesn’t elide copies, performance will be no worse. At best, you’ll see an enormous performance boost.

One place you can apply this guideline immediately is in assignment operators. The canonical, easy-to-write, always-correct, strong-guarantee, copy-and-swap assignment operator is often seen written this way:

T& T::operator=(T const& x) // x is a reference to the source
{ 
    T tmp(x);          // copy construction of tmp does the hard work
    swap(*this, tmp);  // trade our resources for tmp's
    return *this;      // our (old) resources get destroyed with tmp 
}

but in light of copy elision, that formulation is glaringly inefficient! It’s now “obvious” that the correct way to write a copy-and-swap assignment is:

T& operator=(T x)    // x is a copy of the source; hard work already done
{
    swap(*this, x);  // trade our resources for x's
    return *this;    // our (old) resources get destroyed with x
}

Reality Bites

Of course, lunch is never really free, so I have a couple of caveats.

First, when you pass parameters by reference and copy in the function body, the copy constructor is called from one central location. However, when you pass parameters by value, the compiler generates calls to the copy constructor at the site of each call where lvalue arguments are passed. If the function will be called from many places and code size or locality are serious considerations for your application, it could have a real effect.

On the other hand, it’s easy to build a wrapper function that localizes the copy:

std::vector<std::string> 
sorted3(std::vector<std::string> const& names)
{
    // copy is generated once, at the site of this call
    return sorted(names);
}

Since the converse doesn’t hold—you can’t get back a lost opportunity for copy elision by wrapping—I recommend you start by following the guideline, and make changes only as you find them to be necessary.

Second, I’ve yet to find a compiler that will elide the copy when a function parameter is returned, as in our implementation of sorted. When you think about how these elisions are done, it makes sense: without some form of inter-procedural optimization, the caller of sorted can’t know that the argument (and not some other object) will eventually be returned, so the compiler must allocate separate space on the stack for the argument and the return value.

If you need to return a function parameter, you can still get near-optimal performance by swapping into a default-constructed return value (provided default construction and swap are cheap, as they should be):

std::vector<std::string> 
sorted(std::vector<std::string> names)
{
    std::sort(names);
    std::vector<std::string> ret;
    swap(ret, names);
    return ret;
}

More To Come

Hopefully you now have the ammunition you need to stave off anxiety about passing and returning nontrivial objects by value. But we’re not done yet: now that we’ve covered rvalues, copy elision, and the RVO, we have all the background we need to attack move semantics, rvalue references, perfect forwarding, and more as we continue this article series. See you soon!

Follow this link to the next installment.

Acknowledgements

Howard Hinnant is responsible for key insights that make this article series possible. Andrei Alexandrescu was posting on comp.lang.c++.moderated about how to leverage copy elision years before I took it seriously. Most of all, though, thanks in general to all readers and reviewers!


  1. Googling for a good definition of value semantics turned up nothing for me. Unless someone else can point to one (and maybe even if they can), we’ll be running an article on that topic—in which I promise you a definition—soon. 

  2. For a detailed treatment of rvalues and lvalues, please see this excellent article by Dan Saks 

  3. Except for enums, every value with a name is an lvalue. 

Posted Saturday, August 15th, 2009 under Value Semantics.

59 Responses to “Want Speed? Pass by Value.”

  1. Hendrik Schober says:

    So what’s wrong with the following (note the &)?

    std::vector<std::string>& const names = get_names();
       (Quote)
    • SG says:

      I think you meant

      vector<string> const  names = get_names(); // #1
      vector<string> const& names = get_names(); // #2

      Assuming get_names returns by value (and not a reference) and you’re dealing with a “good” compiler there should not be a difference between the two. In the first case (#1) the compiler can make the function construct its return value directly in that space that will be referred to by the name “names”. In #2 the return value that lives on the stack gets an extension of its life-time and a reference to it is created. A good compiler doesn’t even need to allocate space for the reference but I’m not sure if most compilers are that smart (even though they could be).

      If get_names returns a reference it makes a big difference, of course. Is the following code safe?

      string const& blah = string("123") + "456";

      Currently, it is safe. It’s not safe C++0x (according to the current draft) because operator+(string&&,char const*) returns an rvalue reference –> You get a dangling reference.

      In my opinion, people should not write #2 instead of #1 just because they think they can safe a copy. Many current compilers successfully elide the copy in #1. Also, they should not declare functions that return rvalue references (I really hope operator+(string&&,char const*) and others will be fixed) for “temporary recycling” because it opens up the possibility of dangling references.

      Cheers! SG

         (Quote)
    • Trick question? It’s a syntax error: you can’t cv-qualify the reference itself.

         (Quote)
  2. Niels Dekker says:

    Thanks for the article, Dave!

    A very minor remark: your recommended copy-and-swap assignment implementation cannot do a fast assignment to itself. Fast self-assignment could be achieved by adding an extra check to the “canonical” version:

    T& T::operator=(T const& x) {

    // Self-assignment check:
    if (&x == this) return *this;
    T tmp(x);
    swap(*this, tmp);
    return *this;

    }

    Of course, the speed of self-assignment is rarely relevant. But I find it slightly counter-intuitive, having a self-assignment that might fail! But if a user doesn’t even have enough memory to assign something to itself, she’s probably in deep trouble anyway! ;-)

       (Quote)
    • Niels Dekker: Fast self-assignment could be achieved by adding an extra check to the “canonical” version

      …which would penalize the usual case and complicate the code in order to optimize a rare case, which is almost always a bad idea.

      Niels Dekker: I find it slightly counter-intuitive, having a self-assignment that might fail!

      These self-assignments never have the form x = x anyway (nobody knowingly does a self-assignment except in test suites). They’re almost always x = y cases, where x and y may or may not refer to the same object. That means we’re in code that has to cope with an exception anyhow. There’s really zero advantage in making self-assignment a no-throw operation.

         (Quote)
  3. Great article, but could you also include a conclusion with an example of what we should do and not do (i.e. best practices?). Your article seems more like a discussion than a “good rule to follow”, which makes it difficult to pick out the important parts. But definitely thanks for the article. Keep posting more.

       (Quote)
    • Thanks for the feedback! Speaking generally, I think the popular C++ literature is long on rules and short on insight, and I’m not naturally inclined to boil things down to a prescription, so it’s great to know when the important stuff doesn’t stand out.

      The guideline is: Pass by value any arguments that you would otherwise copy explicitly.

      But it’s just a guideline; as I explain in this comment, you’ll end up generating bigger code if you often pass lvalues that way, and bigger code, in some circumstances can be slower code—or it can just be unacceptable because of its size. So maybe you can see why I don’t dispense a lot of rules.

         (Quote)
      • Mathias Gaunard says:

        That guideline appears a bit wrong. For example, with that kind of code

        void vector::push_back(const T& t)
        {
            ...
            new(buffer) T(t);
            ...
        }

        It would be silly to pass t by value.

        So I believe it should be something like:

        Pass by value any arguments that you would otherwise copy explicitly, and for which you do not need to control the storage of the copy.

           (Quote)
      • Mathias Gaunard says:

        Or rather, pass by value any arguments that you would otherwise copy explicitly and which you don’t want to control the storage of the copy of.

        Indeed, for something like vector::push_back, passing by value is useless, even though you want to explicitly copy the argument.

           (Quote)
  4. Thomas Petit says:

    It seems that your new style copy assignment operator don’t mix well with rvalue reference in MSVC10 and in gcc 4.4 :

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    
    struct X
    {
       X(){}
       X& operator=(X x){return *this;}
       X& operator=(X&& x){return *this;}
    };
    X foo()
    {
       X x;
       return x;
    }
    int main()
    {
       X x;
       x = foo();
       return 0;
    }
    At line 15 (gcc error) :
    In function ‘int main()’:|
    error: ambiguous overload for ‘operator=’ in ‘x = foo()’|
    note: candidates are: X& X::operator=(X)|
    note:                 X& X::operator=(X&&)|
    
    I don’t understand. The copy assignment operator takes an lvalue, the move assignment operator take an rvalue and foo returns an rvalue. Where is the ambiguity ? Is this a bug ?    (Quote)
  5. Andrzej Krzemienski says:

    Hi, This is a couple of unconnected thoughts that hauted me after reading your article.

    (1) The copy-and-swap idiom. It implements a copy in terms of swap. Then, the default swap function is implemented in terms of three copies. It is all fine if you always implement customized swap for your classes, but if you don’t you are risking infinite recursion. Wouldn’t it be safer if the idiom was always accompained by the note saying that you need to implement swap too?

    (2) In C++0x we will have lval references, rval references, and values (no-references). The Cartesian product with const/non-const qualifier gives 6 possible ways in which we can define function arguments: 1. fun( YourClass v ); // by value 2. fun( const YourClass v ); // const copy. useless?? 3. fun( YourClass & v ); // output parameter 4. fun( YourClass const& v ); // sort of by value, bot no copying 5. fun( YourClass && v ); // temporary that I can change 6. fun( YourClass const&& v ); // useless – #4 would do

    Numbers 2 and 4 are probably useless, but it still leaves us with four. Even if we do not want to talk about rval refs right now it leaves us with three, which is one too much. Having spent a number of years programming in C++ I do not find in strange any more, but if you look at it from a new commer perspective there should be only two: either I will change your object, or not. I think Andrei Alexandrescu pointed that out somewhere in the discussion groups. It could be only #1 (interested in value) and #3 (interested in an object – memory location). The other two (4 and 5) are just for performance tweeks, aren’t they? Well, #5 is also about unique ownership, but half of it’s job is still performance, isn’t it? It is troublesome that you have four choices, and after your article, it is clear that it is not clear which one to choose. We may think taht we optimize, but in fact we inadvertently pessimize. If we have only two options and teh support of copy elision, move semantics, and perhaps something newer and even more powerful, we could just write:

    fun( set<vector> data );

    and be sure that we never add any slowdown. I am not surewhat point I am trying to make here, but Im pretty sure I want to make some point. The two functions below have different argument type. Trying to pick one when matching overload candidates would be ambiguous, but they are still two types. 1. fun( YourClass v ); 2. fun( YourClass const& v ); Why do we need #2? Because it is sometimes faster. Why do we need #1? Not sure. Because it is sometimes faster? Perhaps we do not need #1 at all? If we discard it we can change the syntax of #2 to

    fun( YourClass v );

    This is the same as #1 used to be, but since we discarded it there is no ambiguity. If there is some programmer’s knowlege required to perform optimization, shouldn’t it rather be provided via attributes:

    fun( YourClass [[copy]] v );

    But do we need even that? Is the compiler simply not smarter that us?

    (4) Value semantics. It has a great suppot in C++, e.g. in form of a implicitly defined copy constructor/assignment. A couple of things were suggested to the Standards Committe to make it even better. I just wanted to list them here: 1. Implicitly defined comparison operator (that is a logical conjunction of member-wise comparisons). It was mentioned in N2326, but never really proposed 2. The definition of “the same”. In N2479. It has a status “Outstanding issues” – not sure what it means. 3. Not generating copy operations implicitly for classes with non-trivial destructors. Proposed in N2904. No idea what its status is.

    Regards.

       (Quote)
    • Andrzej Krzemienski: Hi, This is a couple of unconnected thoughts that hauted me after reading your article.

      Heh, just a couple? ;-)

      This is a whole article unto itself! Thanks for your contribution; I may have to respond in pieces.

      (1)The copy-and-swap idiom. It implements a copy in terms of swap.

      An assignment, I think.

      Then, the default swap function is implemented in terms of three copies.

      A copy and two assignments, but…

      It is all fine if you always implement customized swap for your classes, but if you don’t you are risking infinite recursion. Wouldn’t it be safer if the idiom was always accompained by the note saying that you need to implement swap too?

      …point taken.

      I’ll try to get to the rest of your material soon, but let me say now that a lot of what you’ve written sounds a lot like thoughts I‘ve been having lately. I ask the question this way: what would a language that was designed to support mutable value semantics look like?

         (Quote)
    • Sebastian Redl says:

      (1) That’s why it’s customary to use the member swap in the copy-and-swap idiom:

      T& operator =(T t) { t.swap(*this); return *this; }

      No chance of infinite recursion, unless you implemented member swap in the canonical way, and that would be stupid.

      (2) Const rvalue references are useless to the programmer and should never appear written in a program. (They can appear through template deduction.) Const by-value arguments are also useless, so as you say, we’re down to 4 variants. Let’s leave rvalue references out for the moment. We deal with by-ref, by-val and by-const-ref. Combining the usual C++ wisdom with this article pretty much leads to these guidelines: - Unconditionally use by-ref is for out or in-out parameters. However, reconsider whether you need out parameters, because returning might be just as efficient. - Unconditionally use by-val for arguments that are cheap to copy (primitives) and by-const-ref for arguments that are not copyable. - This leaves arguments that are copyable, but it is expensive to do so. This article essentially says that you should pass these by-val iff you plan to modify them inside the function, but don’t want the modifications to be visible outside. The downside of this approach is that you leak an implementation detail into the interface: if you have a traditional assignment operator, you should pass the argument by-const-ref, but if you convert it to a copy-and-swap assignment operator, you should change it to by-val.

         (Quote)
      • Andrzej Krzemienski says:

        Your guidelines are clear and fine. But what I was writng about was more fantasizing how a more perfect language could look like. In fact, one aspect could be achieved only by even more advanced compiler optimization technique. As I have little (i.e. none at all) familiarity with compilers, I may still be really fantasizing, but just consider:

        FatCopy fc = prepare(); read1( fc ); read2( fc );

        Two read functions do not alter the parameter; we are used to declaring:

        void read1( FatCopy const & fc );

        This is in order to avoid copying. This, in turn, is because we are used to think that:

        void read1( FatCopy fc );

        means copying. The way I have been taught C++, one thinks that the above line means passing data by copying, unless copy elision is employed. How about changing our thinking to “passing data by value”: there will be no copying, unless the function really changes value fc and copying is really unavoidable. The compiler will decide to use your copy constructor only if necessary, and in situations where you would need a copy anyways. This could be achieved as follows: The compiler always compiles

        void read1( FatCopy fc );

        as passing by reference, and if it finds that fc is modified by read1, it marks it “somehow” this fact in that function’s meta data, and later, it adds a copy at the call site while linking, so the calling function would be compiled to:

        FatCopy fc = prepare(); FatCopy __copy = fc; read1( __copy ); read2( fc );

        The copy is only if read1 modifies fc. Otherwise there is no copy whatsoever.

        Regards.

           (Quote)
    • Andrzej Krzemienski: The copy-and-swap idiom. It implements a copy in terms of swap. Then, the default swap function is implemented in terms of three copies. It is all fine if you always implement customized swap for your classes, but if you don’t you are risking infinite recursion. Wouldn’t it be safer if the idiom was always accompained by the note saying that you need to implement swap too?

      Maybe, but it’s probably not as scary as you think. Outside namespace std, an unqualified call to swap won’t find std::swap unless you’ve explicitly brought it into scope with a using-declaration.

         (Quote)
  6. Brad says:

    Dave, et al. Thanks much for this site and the material so far.

       (Quote)
  7. Maxim Yanchenko says:

    One consideration about the following:

    Consider this cousin of our original sorted(…) function, which takes names by const reference and makes an explicit copy:
    std::vector 
    sorted2(std::vector const& names) // names passed by reference
    {
        std::vector r(names);        // and explicitly copied
        std::sort(r);
        return r;
    }

    In principle, the fact that the function makes a copy is an implementation detail, which is invisible if you use const vector&. When you switch to pass-by-value, you are effectively exposing your implementation to the interface of the function, so in case you later come up with another solution that doesn’t involve copying, you’ll switch to reference with the obvious signature change, that at least will require your clients to recompile (fortunately, their code won’t change unless they are using the signature explicitly, say as a function pointer). Without this, they would just replace .so/.dll with the new version.

    It’s not a major problem, just a point to consider. After all, almost no explicit optimization comes at zero price. (Btw, if elimination of reference-bound temporaries was allowed, it would work in this case as well!)

       (Quote)
    • First, yeah this is just another manifestation of the code size issue explained in this comment.

      However #1, I have a hard time seeing a switch to pass-by-value as exposing an implementation detail. I’m not sure why; it may just be a gut reaction, but my orientation is toward pass-by-value as a default, and pass-by-reference as an optimization. Nothing obliges the function to modify or steal resources from the copied value, and the copy ought to be side-effect-free. Okay, I’ll admit I’m flailing about in the dark hoping to hit the right explanation for why it’s not an implementation detail. Let me give that some thought.

      However #2, I totally disagree that legalizing your optimization—which I won’t even call “elimination of reference-bound temporaries” because you surely would not want that to be allowed in all cases—would help with the recompilation problem. It’s like I said earlier: your optimization depends on being able to see inside the called function, which is in conflict with the separate compilation model.

         (Quote)
      • Andrzej Krzemienski says:

        Hi, Your “However #1″ is somehow very inspiring. When I type the function:

        int double1( int const& i ) { return 2*i; }

        and then change it to:

        int double2( int i ) { return 2*i; }

        No-one will say taht I exposed any imlementation detail. I just want a value. In fact I would probably never write double1, because double2 is so natural: “just give me this integral value”. double1, on the other hand says: “I will take it by reference”. Imagine a function call:

        return double2(5);

        Who is interested in knowing that you will have yet another name to refer to the 5? I just want to double it. Now, ‘const’ means “I will not try to change it, so pas me literals and temporaries as well”. This is “weird” too. I wasn’t asking whether you would be changing it or not, I just wanted to double the number, but now I have to puzzle about whether someone will mutate my value or not. And in fact “int const&” somehow isn’t as much mutation-proof as “int” alone. You can cast away const, but you cannot cast the copy back onto the original.

        Also the change from double1 to double2 only exposes a copy operation (and the destructor). Not any other function. Copy operation is special to that extent that compilers implement it for you for your own classes. Function:

        void fun( YourClass cc );

        Doesn’t expose any function of YourClass. It simply requires a value. Well, I know it is just some loose thoughts.

        Regards.

           (Quote)
  8. Sebastian says:

    It seems you’re going to “blogify” the “RValue Reference 101″ article which is nice because it is currently inaccessible to users who don’t have a trac account at boostpro.

    Cheers! Sebastian

       (Quote)
  9. Maxim Yanchenko says:

    Hi Dave,

    Could you please comment on this note in the Standard (12.8/15) “when a temporary class object that has not been bound to a reference (12.2) …” Why there is this “has not been bound to a reference” constraint? Consider this (no idea how to get syntax-highlighted code here, a “how to” link would be great):

    struct string
    {
        const char* str;
        string(const char* str) : str(str) { puts("constructor"); }
        string(const string& other) : str(other.str) { puts("copy constructor"); }
    };
    struct holder_1 { string s; holder_1(const string& s) : s(s) {} };
    int main()
    {
        holder_1("some string");
    }
    

    Due to the constraint, holder_1::s can’t be initialized directly from “some string”, and a temporary string will always be created.

    And this constraint appears to be in the latest draft as well.

    Thanks

       (Quote)
    • Maxim Yanchenko: Consider this (no idea how to get syntax-highlighted code here, a “how to” link would be great):

      Please see the new “posting” tab.

      1
      2
      3
      4
      5
      6
      7
      8
      9
      10
      11
      
      struct string
      {
          const char* str;
          string(const char* str) : str(str) { puts("constructor"); }
          string(const string& other) : str(other.str) { puts("copy constructor"); }
      };
      struct holder_1 { string s; holder_1(const string& s) : s(s) {} };
      int main()
      {
          holder_1("some string");
      }
      Due to the constraint, holder_1::s can’t be initialized directly from “some string”, and a temporary string will always be created.

      I don’t think I agree with your analysis of this example. The type of "some string" is char[12], and it must be converted to a string in order to match the signature of holder_1’s ctor in line 7 long before holder_1::s is initialized. There’s no opportunity to use the ctor in line 4 as far as I can tell. Am I missing something?

         (Quote)
      • Maxim Yanchenko says:

        Well, to me, copyctor elision is (conceptually) something like “get rid of creating a temporary if it’s being used only to initialize an object of the same type” (please correct me right here if I’m wrong)? and the wording in the standard is, well, just wording, which is subject to change (e.g. there are two more allowed cases in the latest C++0x draft comparing to 2003). To do the elision, compiler should look ahead anyway? to check the usage of the temporary. E.g. here:

        string s = "some string";
        The sequence of calls should be string(const char*)->string(const string&), but the compiler “looks a bit further and sees” that the copyctor initializes from a temporary, and gets rid of it (of course, making all necessary checks about copyctor availability).

        I don’t see why it can’t be done here as well, as the compiler has all code, everything is inline etc. It’s an optimization, and the compiler should be clever enough to look ahead (and we know they are in many cases, like link-time optimization). But here it’s simply forbidden by the standard because the temporary has been bound to a reference. That’s why I’m asking why do we have this constraint.

           (Quote)
        • Maxim Yanchenko: It’s an optimization, and the compiler should be clever enough to look ahead (and we know they are in many cases, like link-time optimization). But here it’s simply forbidden by the standard because the temporary has been bound to a

          OK, I understand what you’re asking, though I’m not at all sure that eliminating the text in question would be enough to allow that particular optimization. For what it’s worth, your idea is in a completely different class from today’s copy elision, because the current optimizations only “look a bit further” at the call site, and don’t require looking inside the callee as yours does, which in principle is possible though it may be too late by link time, practically speaking (too much high-level information missing by then).

          I don’t really know why we have that constraint (“see D&E” is my stock answer), but if I had to guess I’d say it was there to prevent lvalues from being mutated in scenarios like

          1
          2
          3
          4
          5
          
          X produce();
          int consume(X);
           
          X& a = produce();
          int b = consume(a);

          If line 5 mutated a that would be pretty surprising.

             (Quote)
          • Rodrigo says:

            I read somewhere that it is because the committee agreed (for C++98) that some use-cases exist where monitoring the number of copies is useful. I’m completely opposed to this; the programmer that needs to monitor the number of copies created also needs to know the strange cases where the copy is allowed to be avoided.

               (Quote)
            • Rodrigo: I read somewhere that it is because the committee agreed (for C++98) that some use-cases exist where monitoring the number of copies is useful.

              I would be very surprised if the particular optimization Maxim was asking for was ever considered, and even more surprised if it were ruled out on those grounds. That seems completely inconsistent with the intent of copy elision.

                 (Quote)
              • Rodrigo says:

                Yes sorry Dave, I misread it.

                However, Alex Stepanov in his notes and his latest book smartly points that if we could force construction, copying and equality to retain their expected semantics, the compiler could apply that optimization.

                The committee, prefering freedom, didn’t enforce any semantics. It means that

                struct s { s(const char*); s(const s&); };

                bool operator==(const s&, const s&);

                s a1 = “123″; s a2 = a1;

                (a1 == a2); // could be false, the programmer could rely on it being false!

                then, we simply aren’t allowed to rewrite s a2 = a1; to s a2 = “123″;

                   (Quote)
                • I think you mean “the programmer couldn’t rely on it being true!” In any case, yes, Elements of Programming and regular types will definitely be topics in upcoming articles.

                     (Quote)
                • Maxim Yanchenko says:
                  Rodrigo: s a1 = “123″; s a2 = a1;

                  a1 is not a temporary here, so it doesn’t apply. Here is my conceptual understanding for the copy elision:

                  “If a temporary object a1 is only used to initialize another object a2 of the same type, a1 can be eliminated and a2 can be directly initialized using a1’s initializer”

                  In your example a1 is also used later in comparison (and it’s not a temporary object at all), so it can’t be eliminated.

                  Rodrigo, Dave, do you agree with this conceptual definition for the copy elision?

                     (Quote)
                  • Rodrigo says:

                    In any case yes.

                    One thing I hate from C++ is the copy-ellision rules. It would be easier if the problem you point were surely caused by a weak compiler instead of a C++ rule.

                    While I’m not completely sure if your optimization is currently allowed in C++, for me it makes sense.

                    Btw my point was that a copy from an object could be different from an object created using the same initializer from the source.

                       (Quote)
          • Maxim Yanchenko says:
            Dave Abrahams:it may be too late by link time, practically speaking (too much high-level information missing by then).

            Well, I mentioned link time just to emphasize the power of present day optimizers. For the code in question, all analysis can be done in compile time. It’s optimization, which is never mandatory, so we should be ready for the cases when it doesn’t work (e.g. it falls to link time).

            Dave Abrahams: If line 5 mutated a that would be pretty surprising.

            It looks like you have meant const X& a = produce();, right? Something strange with the markup.

            Well (all the latter is not strictly according to the standard definitions), a is not a “real temporary” here: the object returned by produce is, but when you bind it to a reference that extends its lifetime, it’s not a temporary anymore in terms of elision, as you explicitly say “I want this object to live beyond the full expression it was created in”. OTOH, it applies to all other references as well (e.g. references in parameters), so probably they should be also subject to elision.

            As long as a is used (and only used) to initialize a temporary (I believe this is the necessary precondition for the elision) in the consume’s parameter, I see no problem with the a optimized out.

            But this should probably also mean that we don’t rely on the destructor of X (obvious usage is the Guard idiom).

            I need to think more about it, I still don’t have clear picture.

               (Quote)
          • litb says:

            I think there is another version of that. Consider:

            void f() {
              try {
                prolly_failing();
              } catch(Exception &e) {
                Exception e1(e);
                e1.modify();
                  // oops, e is modified potentially
              }
            }

            Since exception objects are temporaries too, the restriction about reference binding makes the irritating case of above not possible.

               (Quote)
          • litb says:

            Hi there. A similar example where we would mutate a temporary that has got a name:

            void f() {
              try {
                dangerous();
              } catch(Exception &e) {
                // (seemingly) copy it then (seemingly) modify 
                // only the copy
                Exception copy(e);
                copy.enjoy();
                throw; 
              }
            }

            We would accidentally modify the exception object. I think the restriction of the reference binding will generally make it so we can’t refer to the temporary object a second time.

               (Quote)
  10. Joel Falcou says:
    Dave Abrahams It looks vaguely as though you’re trying to show something about const vs. non-const member functions, but that distinction doesn’t make any difference to copy elision, so maybe I’m misunderstanding.

    Well I was trying to have a function like the sort in the article and see what happens when I call it. The memory allocation is here so i have costly things going on when copy ellision is not done.

    Basically, is there a 2-3 classes examples that when compile tells : look here ellision, there no ellision.

       (Quote)
  11. Joel Falcou says:

    Ok. So I wanted to see if I can “checj” this things work with my current g++. So, I wrote that : http://codepad.org/craBkxXL

    The output is, for gcc 4.3 : non-const call A new : 4 A delete : 6 A copy : 3

    non-const call B new : 4 B delete : 6 B copy : 3

    The non-const call is there to “emulate” the sort function from the article.

    So, does this means the copy-stuff is done or do I don’t call the proper thing and, hence, not trigger the mecanism or is gcc 4.3 not copy-ellision aware (which I doubt) ?

    I think I’m just not doing the correct thing to check this. So how should I butcher this so I can validate the use of copy ellision and show this to unbeliever co-worker ?

       (Quote)
    • Joel, it’s a little hard to tell what you’re trying to demonstrate with this example. It looks vaguely as though you’re trying to show something about const vs. non-const member functions, but that distinction doesn’t make any difference to copy elision, so maybe I’m misunderstanding. Try cutting it down to the absolute minimum (e.g. remove all that memory allocation stuff) and if you’re testing several different things, separate those tests as well.

         (Quote)
  12. Thomas Petit says:

    Very interesting article !

    Sure, copy elision really mess with our C++ programmer’s “common sense”. It’s quite surprising to find a better way of writing assignment operator after all these years. There is certainly a lot of textbook to correct. :)

    However, exploiting copy elision further than that, especially for return value, seems a bit fragile to me : 1) It’s hard to check if RVOs really take place, unless adding I/O in constructors. 2) Turn on debug mode and all these nice copy elisions disappear. (at least with MSVC)

    “we have all the background we need to attack move semantics, rvalue references, perfect forwarding, and more as we continue this article series. See you soon!”

    Great ! I’m really looking forward to seeing that. There is a lot of resource on move semantic out there, but it’s still very confusing to me. For example, I’ve yet to see a straightforward explanation of what is the correct and efficient way of writing functions like “sorted” in presence of move semantic.

    std::vector//&& ? 
    sorted(std::vector/&& ?/ names)
    {
        std::sort(names);
        return /std::move ?/names;
    }
    

       (Quote)
    • Thomas Petit: However, exploiting copy elision further than that, especially for return value, seems a bit fragile to me : 1) It’s hard to check if RVOs really take place, unless adding I/O in constructors.

      Any side-effect will do; you can increment a counter.

      2) Turn on debug mode and all these nice copy elisions disappear. (at least with MSVC)

      I just did a quick test with the MSVC 2010 beta, and for that version you’re half right (elisions of arguments passed by value still happen in debug mode) update: see below . That’s a bit surprising, actually: a copied return value doesn’t help make debugging much easier (especially when argument copies are still getting elided), is likely to be misleading for those people looking for release-mode performance, could actually make debug-mode performance unacceptable even for testing, and it doesn’t take significantly more compile-time resources to do the elision. Once you implement RVO it seems like more work to leave a branch in the compiler where it’s disabled.

      It is true that copy elision isn’t guaranteed by the standard, so vendors are free not to implement it, or to turn it off depending on compiler options. But is “fragile” really the right word? It’s not as though any vendor who implements copy elision can afford to break it in their next release. Also, given the lack of standard guarantees, I don’t see why you’d be more worried about exploiting return value elisions than argument copy elisions.

         (Quote)
  13. Hi Dave,

    Thanks for the great article. I however have a slightly deeper question regarding rvalues, move semantics, and copy elision:

    How do you ensure that the object passed by value is definitely copied and the copy is not elided by the compiler? Also, how then do you make sure that you really do have a copied object instead of a move-constructed object as a function parameter?

    void foo(T t); // how to make sure t is copy constructed, instead of move-constructed?

       (Quote)
    • Dean Michael Berris: How do you ensure that the object passed by value is definitely copied and the copy is not elided by the compiler?

      If you want to be sure to avoid copy elision (why would you want to do that?) then you need to pass an lvalue. Copies of lvalue function arguments are never elided.

      Also, how then do you make sure that you really do have a copied object instead of a move-constructed object as a function parameter? void foo(T t); // how to make sure t is copy constructed, instead of move-constructed?

      Well, we haven’t even touched move construction yet, but as long as you’re bringing it up here, again I wonder why you’d want to do that? The answer is the same: pass an lvalue.

         (Quote)
  14. Very interesting article. I tend to “const &” by default and this article reminds me that this is not as optimal as I thought.

       (Quote)
  15. Joel Falcou says:

    I’m intrigued on how we can check, for a given compiler at had, if it’s actually doing copy ellision and generate the proper code when we’re using such idioms. I guess adding I/O in the constructor don’t help as it’ll force the ctor to be called. Do you just check the assmebly output or what ?

    Moreover, does this means that we can completely and forever dump the old pass-by-const-reference & copy operator= ?

       (Quote)
  16. dvi says:

    Could you elaborate further on this paragraph?

    “First, when you pass parameters by reference and copy in the function body, the copy constructor is called from one central location. However, when you pass parameters by value, the compiler generates calls to the copy constructor at the site of each call where lvalue arguments are passed. If the function will be called from many places and code size or locality are serious considerations for your application, it could have a real effect.”

    I don’t quite follow it. Good article though, thanks!

       (Quote)
    • @dvi: first, you’re most welcome, and thanks for asking for clarification; it helps to know when I fail to connect.

      So here, f takes its argument by value, and does whatever it does… but it doesn’t copy a because (copy elision aside) it already has a copy of whatever was actually passed.

      1
      2
      3
      4
      5
      6
      7
      8
      9
      10
      11
      
      void f(X a) { … modify(a);}
      void g()
      {
          X b;
          f(b); // call f with an lvalue
      }
      void h()
      {
          X c;
          f(c);  // call f with an lvalue
      }

      In this case, the compiler has to generate calls to X’s copy constructor in the body of g and h at lines 5 and 10. That’s a total of two calls. Probably not a big deal, but in some embedded applications, for example, there’s limited space available for code.

      Now compare with what happens when f takes its argument by reference and copies it:

      void f(X const&amp; r)
      {
          X a(r);    // explicit copy
          …
          modify(a);}

      Now there’s just one call to X’s copy constructor, in the body of f. The exact same definitions of g and h still work, but calling f no longer involves copying at the call site.

      Does that help?

         (Quote)

Leave a Comment (post replies using links below individual comments)

Spam Protection by WP-SpamFree

Subscribe without commenting