This is the third article in a series about efficient value types in C++. In the previous installment, we introduced C++0x rvalue references, described how to build a movable type, and showed how to explicitly take advantage of that movability. Now we’ll look at another opportunity for move optimization and explore some new areas of the move landscape.
Resurrecting an Rvalue
Before we can discuss our next optimization, you need to know that an unnamed rvalue reference is an rvalue, but a named rvalue reference is an lvalue. I’ll write that again so you can let it sink in:
Important
A named rvalue reference is an lvalueI realize that’s counterintuitive, but consider the following:
int g(X const&); // logically non-mutating int g(X&&); // ditto, but moves from rvalues int f(X&& a) { g(a); g(a); }
if a were treated as an rvalue inside f, the first call to g
would move from a, and the second would see a modified a. That is
not just counter-intuitive; it violates the guarantee that calling g
doesn’t visibly modify anything. So a named rvalue reference is just
like any other reference, and only unnamed rvalue references are
treated specially. To give the second call to g a chance to move,
we’d have to rewrite f as follows:
#include <utility> // for std::move int f(X&& a) { g(a); g( std::move(a) ); } |
Recall that std::move doesn’t itself do any moving. It merely
converts its argument into an unnamed rvalue reference so that move
optimizations can kick in.
Binary Operators
Move semantics can be especially great for optimizing the use of binary operators. Consider the following code:
class Matrix { … std::vector<double> storage; }; Matrix operator+(Matrix const& left, Matrix const& right) { Matrix result(left); result += right; // delegates to += return result; } Matrix a, b, c, d; … Matrix x = a + b + c + d;
The Matrix copy constructor gets invoked every time operator+ is
called, to create result. Therefore, even if RVO elides the copy of
result when it is returned, the expression above makes three
Matrix copies (one for each + in the expression), each of which
constructs a large vector. Copy elision allows one of these result
matrices to be the same object as x, but the other two will need to
be destroyed, which adds further expense.
Now, it is possible to write operator+ so that it does better on our
expression, even in C++03:
// Guess that the first argument is more likely to be an rvalue Matrix operator+(Matrix x, Matrix const& y) { x += y; // x was passed by value, so steal its vector Matrix temp; // Compiler cannot RVO x, so swap(x, temp); // make a new Matrix and swap return temp; } Matrix x = a + b + c + d;
A compiler that elides copies wherever possible will do a near-optimal
job with that implementation, making only one temporary and moving its
contents directly into x. However, aside from being ugly, it’s easy
to foil our optimization:
Matrix x = a + (b + (c + d));
This is actually worse than we’d have done with a naive
implementation: now the rvalues always appear on the right-hand side
of the + operator, and are copied explicitly. Lvalues always appear
on the left-hand side, but are passed by value, and thus are copied
implicitly with no hope of elision, so we make six expensive copies.
With rvalue references, though, we can do a reliably optimal1 job by adding overloads to the original implementation:
// The "usual implementation" Matrix operator+(Matrix const& x, Matrix const& y) { Matrix temp = x; temp += y; return temp; } // --- Handle rvalues --- Matrix operator+(Matrix&& temp, const Matrix& y) { temp += y; return std::move(temp); } Matrix operator+(const Matrix& x, Matrix&& temp) { temp += x; return std::move(temp); } Matrix operator+(Matrix&& temp, Matrix&& y) { temp += y; return std::move(temp); }
Move-Only Types
Some types really shouldn’t be copied, but passing them by value,
returning them from functions, and storing them in containers makes
perfect sense. One example you might be familiar with is
std::auto_ptr<T>: you can invoke its copy constructor, but that
doesn’t produce a copy. Instead… it moves! Now, moving from an
lvalue with copy syntax is even worse for equational reasoning than
reference semantics is. What would it mean to sort a container of
auto_ptrs if copying a value out of the container altered the
original sequence?
Because of these issues, the original standard explicitly outlawed the
use of auto_ptr in standard containers, and it has been deprecated
in C++0x. Instead, we have a new type of smart pointer that can’t be
copied, but can still move:
template <class T> struct unique_ptr { private: unique_ptr(const unique_ptr& p); unique_ptr& operator=(const unique_ptr& p); public: unique_ptr(unique_ptr&& p) : ptr_(p.ptr_) { p.ptr_ = 0; } unique_ptr& operator=(unique_ptr&& p) { delete ptr_; ptr_ = p.ptr_; p.ptr_ = 0; return *this; } private: T* ptr_; };
A unique_ptr can be placed in a standard container and can do all
the things auto_ptr can do, except implicitly move from an
lvalue. If you want to move from an lvalue, you simply pass it
through std::move:
int f(std::unique_ptr<T>); // accepts a move-only type by value unique_ptr<T> x; // has a name so it's an lvalue int a = f( x ); // error! (requires a copy of x) int b = f( std::move(x) ); // OK, explicit move
Other types that will be move-only in C++0x include stream types, threads and locks (from new mulithreading support), and any standard container holding move-only types.
C++Next Up
There’s still lots to cover. Among other topics in this series, we’ll touch on exception safety, move assignment (again), perfect forwarding, and how to move in C++03. Stay tuned!
Please follow this link to the next installment.
-
Technically, you can do still better with expression templates, by delaying evaluation of the whole expression until assignment and adding all the matrices “in parallel,” making only one pass over the result. It would be interesting to know if there is a problem that has a truly optimal solution with rvalue references; one that can’t be improved upon by expression templates. ↩
- Want Speed? Pass by Value.
- Making Your Next Move
- Your Next Assignment...
- Exceptionally Moving!
- Onward, Forward!

I’m a bit confused with one of the examples. You write that when operator+ is declared like this: Matrix operator+(Matrix x, Matrix const& y) and the following is evaluated: Matrix x = a + (b + (c + d)); “the rvalues always appear on the right-hand side of the + operator, and are copied explicitly.” Why is this? You earlier wrote that rvalues bind to const references, so why would they need to be copied? Or is this the effect of parenthesis? Do the parenthesis force a non-temporary to be created?
I was playing around with RValues for a good explanation what they are. Your article series helps me a lot!
I wanted to figure out an example to explain RValue References and was starting an example with the code (similar to your vector factory):
struct BigThing { explicit BigThing(int i); BigThing(BigThing&&); //move BigThing(BigThing&) = deleted; //no copy }; BigThing create_big_thing(int i) { BigThing ver1(i/2); // prevent construction BigThing ver2(i*3+1); //... in place of 'bt' return i%2==0 ? ver1 : ver2; } int main() { BigThing bt = create_big_thing(66); }I was a bit surprised that the compiler accepted that, but pleased. Move is better then Copy. I looked up [class.copy] and found in paragraph 20 the explanation of “overload resolution with an rvalue first”. Great — so it’s safe to delete Copy but implement Move to force high-perf code only.
But then I remembered your example and Howards explanation:
Matrix operator+(Matrix&& temp, Matrix&& y) { temp += y; return std::move(temp); }Here the “move” is necessary because
ymust be treated as an lvalue inside the function. Ok, I understood that. But whats the difference? Ah, paragraph 19: “[only] when the expression is the name of an automatic object”. That fits,yis not an automatic object.Ok, complicated, but I think I got it. I just want to make sure here, I don’t want to use examples and doing the explanations wrong.
Did I get that right?
I think you got it!
Surely it would be good to specify just 1 (or at worst 2) of the 4 “commutative binary operation” implementations and get an equivalent outcome. Is there a preprocessor macro that manages?
I also agree that it’s too bad that the compiler won’t put move(a) around the last use of a for me.
Not breaking old code is a nice ideal, but in my code I use similar to:
string const& b = "1"+string("2");And apparently that will no longer work, so it’s not like old code is actually sacred.
Hi Dave! You wrote the following C++03 code in your article:
Matrix operator+(Matrix x, Matrix const& y) { x += y; // x was passed by value, so steal its vector Matrix temp; // Compiler cannot RVO x, so swap(x, temp); // make a new Matrix and swap return temp; }So I tested the following, using Microsoft Visual C++ 2008 SP1, in debug mode:
Matrix a; Matrix b = Matrix() + a;Unfortunately, it does call the copy-constructor, because – as you reported – MSVC does no NRVO in debug mode.
MSVC does perform copy elision, even in debug mode, when the test is compiled with a slightly different operator+ implementation:
Matrix operator+(Matrix x, Matrix const& y) { x += y; return Matrix(&x); // Moving constructor! }The class would of course need an extra constructor, in order to get this working:
// Moves an lvalue Matrix into this object. explicit Matrix(Matrix* arg) { assert(arg != NULL); storage.swap(arg->storage); }The signature of this “explicit” constructor may look a bit unusual, but I think it would be more risky to pass the argument by reference instead (as it would look like a copy-constructor). I know it doesn’t look extremely elegant, but do you agree that this approach is a reasonable alternative to your C++03 operator+ (given that particular compiler limitation)?
A page saying something like: “your message looks a bit spammy, try again”, and the message doesn’t appear. The “quote” functionality also gave an invalid html to begin with (a closing div marker without any opening one). But that’s ok, I managed to post eventually
I see. I’ll have to check the status of the side-effect-free marker proposals…
? I don’t understand how that is going to allow something like an in-place change of sign, but that’s probably not what you are saying.
Sure Joel,
the refed ASTs through expression templates works very well. But they don’t expand an expression, they store it so that they can element-wise evaluate it as it was given. By expanding here I mean real algebraic expansion (i.e. 2(a+b)+c = 2a+2b+c), so that the “rvalue” pattern can work with an approach similar to Dave’s in an optimised manner. In this case the above would be evaluated differently than ASTs. I just wonder how fast you can go with that approach (hence the “write” and “read” comment on the bottom of my post).
A problem that you can solve with rvalue refs vs straight expression templates is for example the temporary dumping in cases like: A = inv(B) + C;
I see. The A = inv(B) + C is indeed somethign rvalue may solve. Have to dig into that
Isn’t it what building a ref-enabled AST using expression template and delay the evaluation all together at the end of the assignement do ? I do that for years and performance were always optimal and vectorization or other HPC pattern was trivially added.
The matrix example is exactly what I am currently working on! so I will testify my bit of experience:
I have tried an implementation similar to Dave Abraham’s and it indeed works nicely, it although has a few problems as you would expect: Let’s assume that you have a multiplication operator in the spirit of the addition operator and you want to evaluate the expression:
What happens here is that you will get two temporaries, one from 2a and one from 3b, one of which is unnecessarily computed.
To counter this I tested an approach of using expression templates (that are move-evaluated for whole matrices, not just elements) with rvalue references for the multiplication and it goes around the problem, but doesn’t entirely eliminate it.
I am now working on also providing the same for the addition and I am soberly optimistic that rvalue tehchniques can provide near-optimal performance, but there are challenges yet to come. Ideally for this approach to work,a meta-programming pattern that can expand expressions would be the optimum. Some help from the compiler would also do towards that (in the very far future). One thing I noticed is that enabling vectorisation on matrix operations under those principles is trivial and provides significant boost over straight expression template implementations. Another feeling I get is that although expression templates are “write” optimized (in terms of cache locality) the approach with the rvalues appears “read” optimized and their hybrid appears trying to do them both.
Thanks for a nice article again!
Hi Dave, good to see another “moving” article of yours! Is it really necessary (or otherwise useful) to have the std::move in there, when you do return std::move(temp), inside the implementations of your operator+ examples?
Yes, it really is necessary. Since
tempdoesn’t refer to a local value with automatic storage duration, there’s no implicit cast to rvalue when it is returned. However, you make a good point, that when the lvalue’s declared type is an rvalue reference, I think we could safely implicitly cast back to rvalue reference upon return. I’m just hedging a tiny bit because history is littered with the failures of those who hastily “improved” on details of the rvalue reference proposal. Let’s see what Howard Hinnant thinks.Ah, yes, now I remember. Andrei Alexandrescu proposed something similar on comp.std.c++ but Niklas Matthies demonstrated why it was a bad idea. This is subtle territory.
Hmm, the new rule that lvalues never bind implicitly to rvalue references may change the picture a bit. I’ll have to think this over.
Do you think it makes sense to treat local named rvalue references as rvalues — just like other local automatic objects — for the purpose of returning from a function? Now, after N2812 and N2844, such a reference really only refers to a temporary (or some object we don’t care about anymore) and it seems it’s worth reconsidering the rules. Rvalue references act like non-reference types in the sense that they logically refer to a unique object. Implementation-wise there’s already little difference between functions that take objects of nontrivial classes by value and by rvalue-reference. In both cases the name refers to a unique object and the parameter is probably implemented as pointer at the lower level. The only difference is that rvalue references can only bind to rvalues whereas lvalues are copied in a pass-by-value case. So, why are we still excluding local rvalue references in treating local things as rvalue while returning them? I havn’t yet come up with a sensible example where this treatment would break things. If there really are none I’d prefer to make
Matrix operator+(Matrix && l, Matrix const& r) { l += m; return l; }as efficient as
Matrix operator+(Matrix && l, Matrix const& r) { l += m; return std::move(l); }Cheers, Sebastian
Heh, great minds think alike! See my reply to Niels Dekker.
Thanks for continuing with these articles. Reading this causes a lot of “why?”. Why can’t the compiler automatically add std::move to the last use of an lvalue? When and why use std::move in a return statement? The near optimal version for Matrix with a swap looks highly artificial, why can’t the RVO be cleverer? Is a=std::move(a); legal? The answers are not that hard, but it still takes a bit of time thinking about it.
Mainly because this would have been dangerous before a very recent change we had to make to the langauge, and since then, no one has implemented, gained experience with, and proposed this change. It takes a lot of time and work to push something through the standardization process.
Use std::move when the argument is not eligible for RVO, and you do want to move from it.
It is advisable to allow self-move assignment in only very limited circumstances. Namely when “a” has already been moved from (is resourceless). It is my hope that the move assignment operator need not go to the trouble or expense of checking for self move assignment:
http://home.roadrunner.com/~hinnant/issue_review/lwg-active.html#1204
I.e. A::operator(A&& a) should be able to assume that “a” really does refer to a temporary.
Howard, thanks for chiming in! For anyone who doesn’t know, Howard and I co-authored many of the early rvalue reference proposals along with Peter Dimov, and Howard is responsible for most of the follow-through with standardese and shepherding the proposals through the committee. He’s probably also the person in the world with the most experience actually using rvalue references in the field. In other words, he knows whereof he speaks
!
You’re welcome. I’ve been hoping people would ask some of these questions. While it sounds like you may have the answers all figured out, I’ll write my answers for everyone else’s benefit.
In general, the answer is that it could break existing code—see the scopeguard example in this posting by Niklas Matthies. If I was considering the design of a new language focused on value semantics (let’s call it V++), I might consider loosening those rules and using an explicit construct for scopeguard-ish things instead, but I would want to be sure not to break cases like this one:
struct X { … }; struct Y { Y(X& x) : x_(x) {} ~Y() { do_something_with(x_); } X& x_; }; void f() { X a; Y b(a); … }No matter what else happens,
Y::~Y()should not encounter anx_that has been implicitly moved from.When you need to move from an lvalue that doesn’t name a local value with automatic storage duration.
The need for the swap is explained here.
It’s legal, but not a good idea. Unlike with copy assignment, it’s fairly hard to manufacture a case where a self-move-assignment occurs by mistake, and move assignment is often so fast already that an extra test-and-branch to handle self-assignment can account for a significant fraction of its overall cost, so it’s probably a better practice not to do it. I’ll have more to say about this in the series’ next article.
(Argh the comment software is painful, and I was rejected as a spammer this morning. I’ll try a simpler form)
Oh, I don’t actually.
About allowing the compiler to automatically add std::move to the last use of an lvalue, I meant only allowing it when the compiler can determine that this is safe (no other reference). Although it is possibly too hard for the compiler to prove in many cases.
So basically there will be std::move in the return statement if we are returning an rvalue reference argument. Although from the previous article, it looks like
{ Matrix temp = x; temp += y; return std::move(temp); }is harmlessly longer thanreturn temp;.About the need for swap, if I understand correctly, compilers could actually do the proper optimizations without the swap (if they inline enough of the code to see all the copies for instance). That’s cool (even if they don’t currently do it).
The reason I asked about
a=std::move(a);is that I am wondering what happens toa=-a;. To avoid a useless copy and without expression templates, I could write-std::move(a), but that will only work for types that have a move operator-, hence the idea ofa=-std::move(a);(or something similar with swap). Howard’s link to 1204 shows this is problematic at best. I guess I’ll have to settle for a useless copy, or an overloaded negate function (not as nice a syntax asa=-a;).I forgot one question: what kind of function can have an rvalue reference as its return type?
Thanks to Howard and you for your answers. Rvalue references are really crucial in C++0X and getting users to understand how to use them properly as early as possible is great.
Oh, very strange. What was the symptom? I don’t see another comment from you in the spam bucket.
Almost impossible, actually. The required ordering of side-effects may not be possible to determine just by following pointers. Think mutexes and locks.
Among other cases, yes
That would certainly be true had I written that anywhere
Yes, it’s an allowed optimization.
…or with an
operator-taking its argument by value, with copy elision, I think.std::moveis one example, but in general I think Howard advises against it, IIRC. Howard?You’re very welcome!
My general answer to this is “yes” – here I and Howard disagree.
Since this article series is intended to be a discussion starter, it might be a good idea if somebody actually tried to defend the strict position that “a=std::move(a)” is not legal for a general “a”. This is different from Howard’s relaxed position “It is advisable to allow self-move assignment in only very limited circumstances. Namely when “a” has already been moved from (is resourceless).”, because general “a” refers to the type of “a”, not the circumstances where “a” occurs.
One reason for Howard’s relaxed position seems to be the swap example:
template <class T> swap(T& a, T& b) { T tmp(std::move(a)); a = std::move(b); b = std::move(tmp); }Since “a” and “b” might actually refer to the same object, the strict position must consider this code as buggy, even so it is quite unlikely that this code will ever go wrong in practice.
However, it’s easy to fix the swap example:
template <class T> swap(T& a, T& b) { if (&a != &b) { T tmp(std::move(a)); a = std::move(b); b = std::move(tmp); } }The fixed version of the swap example will probably not incur any noticeable slowdown in practice. And even a normally skilled developer can verify that the fixed swap example doesn’t do any self-move-assignment. This is important, because the argument against forbidding self-move-assignment was that it might be difficult to verify the absence of self-move-assignment.
Forbidding self-move-assignment is a prerequisite for more general simplifications like: “If a function argument binds to an rvalue reference parameter, the C++ standard library may assume that this parameter is a unique reference to this argument.”
Could you elaborate on the very last statement of your posting? I don’t get it.
Thanks!
If self-move assignment would be allowed, no move-assignment operator of the C++ standard library would not be able to assume that the (rhs) parameter is a unique reference to the (rhs) argument.
Of course you can invent twists to get around this simple logic conclusion. I reread
http://home.roadrunner.com/~hinnant/issue_review/lwg-active.html#1204
And indeed, it uses a twist to get around my simple logic conclusion by writing: “Each of the following statements applies to all arguments to functions defined in the C++ standard library, UNLESS EXPLICITLY STATED OTHERWISE.”
So, IIUC, you’re saying that the proposed language allows for some standard functions, which state otherwise, to accomodate self-move-assignment?
If that’s what you’re saying, I guess I don’t see how it relates to your earlier point. Or maybe I didn’t get your earlier point.
No, I say that I overlooked the twist in Howard’s text when I first read it. I don’t know how Howard intents to use the back door he left by using proposed language, but it could clearly be used to allow self-move-assignment under certain circumstances (i.e. when the moved from object is empty).
My initial point was that somebody should defend the strict position that “a=std::move(a)” is not legal for a general “a”. I don’t think that this position is any more wrong or right than the opposite position that “a=std::move(a)” is legal for a general “a”.
From my point of view, it is a question of responsibility assignment. Does the caller has to ensure that “std::move(a)” can be treated like a true temporary object? Or does the called function has to ensure that nothing goes wrong? I think the advantages of clear responsibilities are more important than some lost optimization opportunities.
You may think that the unfixed swap example will always work in practice, but what about a class that actually asserts on self-move-assignment, even if the moved from object is empty? Sure this would be easy to fix by excluding the empty case from the assertion, but the question would remain, whether it is the calling function that is buggy or the called function. Is the called function allowed to assert at all, even for a self-move-assignment from a non-empty object? The caller might use an old std algorithm with a move_iterator and don’t care about the behavior in case of self-move-assignment, as long as the program doesn’t crash (or stops on an assert).
The interesting thing about this is that it’s not some boost-library that has to write the move member functions, but me as the individual programmer. Adding support for automatic generation of swap member functions to the c++ language was always a no-go, but now that the move member functions get so twisted, adding automatic generation for them to the c++ language is suddenly welcomed as a good idea. The funny thing is that automatically generated swap member functions would probably be much less error prone than automatically generated move member functions.
Sounds like that somebody might be you
Agreed
That’s a very good point. Of course, saying that doesn’t help us choose one answer or the other. So we’re still left with efficiency and resilience as the two useful criteria.
I don’t really understand anything you’re saying after this part (especially not your use of the word “twisted”), but the following gives me the sense you think the committee is being somehow inconsistent in supporting automatic move generation.
Dave Abrahams: Of course, saying that doesn’t help us choose one answer or the other. So we’re still left with efficiency and resilience as the two useful criteria.
If it would just be about efficiency and resilience, I would go for resilience (because a simple check for self-assignment won’t loose too much efficiency anyway). What I don’t like about the resilience option is that it might provoke even more non-trivial boilerplate code in my classes, if I’m unwilling to give up even more efficiency (by implementing move assignment with a call to the move constructor followed by a call to swap).
The other part of the problem is to formulate the decision in precise words. And it would be nice if even the non-native speaking reader…
But one final honest word: Move support will soon be widely available, the currently agreed upon interpretation of moving is quite different from ‘“std::move(a)” can be treated like a true temporary object’, so going with the resilient option and allow self-move-assignment would at least be consequent.
I’m not a native speaker, so I don’t know about the connotations of “twisted”. I used it with the following meaning in mind:
“twist” = “trick”
“twisted” = “not completely trivial”
The committee is consistent in supporting automatic move generation, because the copy constructor and assignment operator also get generated automatically. And it will help indeed for the problems that it tries to address.
I’m just a bit unhappy that moving turns into something significantly more complicated than swapping. And I have the impression that moving will inherit some of the problems swapping has:
(a) “swap” should be relatively cheap and don’t throw, otherwise it will be much less useful.
(b) A reasonable swap can’t be added to an existing class without modifying the source code of the class definition.
(c) calling “swap” with a pointer to a derived object is a really bad idea (worse than slicing), but it is impossible to forbid it.
Problems (a) is already actively being discussed for moving, and the automatic move generation tries to address (b). With respect to problem (c), someone once told me that it’s a good idea to only inherit from abstract classes. So as long as the abstract classes take care to declare the swap member function as protected, everything is fine. I don’t know whether the automatically generated move member functions will take care of this detail, or whether such details are important after all.
I think move is fundamentally more complicated because it is asymmetric.
To avoid move problems, we can be less restrictive than requiring inheritance only of abstract classes in OO hierarchies. Scott Meyers’ advice (Effective C++) that bases not have public copy operations is sufficient if it’s extended to move operations (including swap).
Another case to consider is when public inheritance is used to compose value types (not for polymorphism). Here slicing is possible, and perhaps occasionally desirable. Move operations seem to work naturally here too. ‘Move slicing’ might steal one resource but leave another to be cleaned up by the original object.
Great article! Boost rocks.
Thanks, Chila, and welcome!