Your Next Assignment…

This entry is part of a series, RValue References: Moving Forward»

This is the fourth article in a series about efficient value types in C++. In the previous installment, we discussed how to deal with rvalue reference function paramters and introduced move-only types. Here, we’ll revisit move assignment and see how to write it both correctly and efficiently.

There’s a subtle problem hiding in the vector move assignment implementation we showed in this series’ second installment. Take this code for example:

mutex m1, m2;
std::vector<shared_ptr<lock> > v1, v2;
v1.push_back(shared_ptr<lock>(new lock(m1)));
v2.push_back(shared_ptr<lock>(new lock(m2)));
v2 = v1;              // #1
…
v2 = std::move(v1);   // #2 - Done with v1 now, so I can move it

Assignment #1 above releases all the locks held exclusively by v2 (and makes v2 co-owner of everything v1 owns). Assignment #2, however, has no immediate effect other than swapping lock ownership. In case #2, the locks originally held by v2 aren’t released until v1 goes out of scope, which could be much, much later. Getting locking and unlocking order right can make the difference between a correct multithreaded program and a deadlock, so this is a serious issue, and our previous description description of the responsibilities of move assignment needs to be amended:

The Semantics of Move Assignment

A move assignment operator “steals” the value of its argument, leaving that argument in a destructible and assignable state, and preserves any user-visible side-effects on the left-hand-side.

Armed with that guideline, we can now correct our implementation of move assignment for std::vector:

vector& operator=(vector&& rhs)
{ 
    this->clear();
    std::swap(*this, rhs);
    return *this;
}

It turns out that in practice, the eager clear() tends to have no effect (and thus no cost) because a target of move assignment is often already empty, having itself just been the source object in an earlier move. That is certainly true of most of the standard algorithms and of the insertion sort we’ve been using as an example. However, adding the clear() saves our butts when move assigning into a nonempty left-hand side.

Canonical Assignment?

As noted in the previous article, there is a pretty good “canonical implementation” of copy assignment based on a copy construction and a swap:

T& operator=(T rhs) { swap(*this, rhs); return *this; }

Among its nice properties, it:

  • is easy to write correctly
  • hugely reduces complexity compared with “doing it manually”
  • takes advantage of copy elision
  • offers the strong guarantee if swap is non-throwing

Provided you’ve implemented move construction and a cheap, non-throwing swap, the above also looks like a decent move assignment operator. An rvalue argument will be move-constructed into x and then swapped into *this. Slick! If you’ve used the canonical copy assignment operator, you might not need to write a move assignment operator at all.

That said, even back in the C++03 world, the “canonical implementation” often copies from lvalues too eagerly. In the case of std::vector, there may be enough capacity on the left-hand side of the assignment operator to simply destroy the existing elements and copy all the elements from the right-hand side, thus avoiding an expensive allocation and, if the source vector is large, a serious spike in memory usage.

So, std::vector uses a more economical assignment operator with the signature vector& operator=(vector const&), which allows it to delay copying lvalue arguments until it can be sure a copy is needed. Unfortunately, if we also tried to keep the canonical copy assignment signature around for use with rvalue arguments, the overloads would be ambiguous. Instead, we’d need to resort to something like this, which is equivalent but moves the creation of the temporary inside the assignment operator:

1
2
3
4
5
6
vector& operator=(vector&& rhs)
{
    vector(std::move(rhs))
      .swap(*this);
    return *this;
}

The above actually does seem to be a generic implementation of the semantics of move assignment, but there’s another problem here. Let’s count the total cost of the operations:

  • line 3: three loads and four to six stores to memory (depending on implementation)
  • line 4: six loads and six stores to memory
  • line 6: destruction of original contents of *this

Now compare with the “clear, then swap” implementation:

1
2
3
4
5
6
vector& operator=(vector&& rhs)
{ 
    this->clear();
    std::swap(*this, rhs);
    return *this;
}
  • line 3: destruction of original contents of *this
  • line 4: six loads and six stores to memory

Recall that earlier we noted that the left-hand side of many (perhaps even most) move assignments is an object that has just been moved from. In that common case, the cost of destroying the original contents of *this—whether via clear or via the destruction of a temporary—is just the cost of a single test and branch. Therefore, the other operations account for most of the cost, and the “clear, then swap” implementation could be nearly twice as fast as the other one.

In fact, we might actually do better. Yes, the swap theoretically allows us to recycle the old capacity of the left-hand side rather than eagerly disposing of it, but when will that actually matter? Only when assigning from an lvalue that’s been std::move‘d, since real rvalues are about to be destroyed. And how often will the left-hand side actually have any capacity to offer? Not often, since it’s usually an object that has just been moved-from. So an implementation like this might actually be the most efficient of all:

vector& operator=(vector&& rhs)
{ 
    if (this->begin_)
    {
        this->destroy_all(); // destroy all elements
        this->deallocate();  // deallocate memory buffer
    }
    this->begin_ = rhs.begin_;
    this->end_ = rhs.end_;
    this->cap_ = rhs.cap_;
    rhs.begin_ = rhs.end_ = rhs.cap_ = 0;
    return *this;
}

Enough Speculation. Show Me Numbers!

Vector Move Assignment Implementations

Vector Move Assignment Implementations

In case you’re wondering how this stuff plays out in practice, I used this test file to probe the additional speedups available not just by implementing move assignment, but by relentlessly optimizing it. The test uses a move-enabled std::rotate on a sequence of simplified std::vectors and boost::arrays. As you can see, for std::vector, the “canonical” move assignment implementation is actually a bit better than the “clear, swap” implementation, but assiduously minimizing the operations involved produces move assignment that’s faster still.

Array move assignment implementations

Array move assignment implementations

For boost::array (and, one would assume, similar structures like boost::tuple), it turns out that using swap is almost three times slower than even the simplest element-by-element move implementation, and with careful optimization we can do better still.

So, the moral of this story is: be wary of formulaic implementations of move operations; remember that move semantics is all about optimization, so move operations tend to be very fast, and a few cycles here and there can make a significant difference.

Coming Up Next

In the series’ next installment, we’ll explore the gnarly world of exception-safe move constructors. Stay tuned…

Posted Monday, September 28th, 2009 under Value Semantics.

24 Responses to “Your Next Assignment…”

  1. Michal Mocny says:

    First, a minor point: calling std::swap(*this, rhs); inside the move assignment operator of the “clear, and swap” will likely lead to a infinite call loop, since the default std::swap is defined using move assignment. I just changed this to this->swap(rhs); for a type I was writing.

    More-so than performance issues, I think that it is too bad that the canonical move assignment operator relies on a specialized definition of swap, since the promise of “if your type is movable, you can just use the default std::swap” was a big benefit. Now its a chicken and egg thing for canonical swap (need swap for move, need move for swap).

    Hopefully most of the time the “= default” move operators will suffice, and when they don’t, hopefully writing a specialized move operator is warranted anyway.

    As a side note, the only other reason I can think of to still write a swap member is to allow swap-with-temporary. So, could swap-with-temporary be replaced with move assignment? T().swap(t) <=?replaceable?=> t = T(); ? The only times I used swap-with-temp before was for non copyable objects or when copy semantics were insufficient (ie, shrinking vector capacity). Seems move assignment covers those basis?

      Quote
    • calling std::swap(*this, rhs); inside the move assignment operator of the “clear, and swap” will likely lead to a infinite call loop, since the default std::swap is defined using move assignment. I just changed this to this->swap(rhs); for a type I was writing.

      Not in this case. You forget that there’s a std::swap overload for std::vector.

      More-so than performance issues, I think that it is too bad that the canonical move assignment operator relies on a specialized definition of swap, since the promise of “if your type is movable, you can just use the default std::swap” was a big benefit. Now its a chicken and egg thing for canonical swap (need swap for move, need move for swap).

      It would make sense for adapting a legacy type that had already implemented swap, but one of the real points of this article is that there really isn’t a suitable “canonical move assignment.”

      Hopefully most of the time the “= default” move operators will suffice, and when they don’t, hopefully writing a specialized move operator is warranted anyway. As a side note, the only other reason I can think of to still write a swap member is to allow swap-with-temporary. So, could swap-with-temporary be replaced with move assignment? T().swap(t) t = T(); ? The only times I used swap-with-temp before was for non copyable objects or when copy semantics were insufficient (ie, shrinking vector capacity). Seems move assignment covers those basis?

      Probably.

        Quote
      • Michal Mocny says:

        Thanks. I agree that there isn’t a canonical move assignment, and just noted that I tried to ignore the performance aspect entirely and came to the same conclusion regardless.

        Thank goodness default move constructor/assignment was added to the (draft) standard.

        Great read, thanks for the series.

          Quote
  2. Scott M says:

    What’s supposed to happen with move-assign-to-self? Like x = std::move(x);

      Quote
    • IIRC Howard’s answer was “don’t do that,” but I could be misremembering

        Quote
      • Sean Parent says:

        I think “don’t do that” is a bad answer – self move, like self assignment, should be a no-op. Otherwise writing permutation algorithms (like sort) get considerably more complex. That’s one attraction to the formulaic swap – it gets self move correct. In your optimal version, I don’t think a check for self move will cost you much.

        Generally, I think it is likely worth the effort to squeeze cycles out of std::vector but I think it is much better to teach the formulaic “pass by value and swap” assignment because it is so much easier to get correct. I’ve come across a lot of code that blows the self-assignment case.

          Quote
  3. Thomas Petit says:
    Dave Abrahams: I don’t remember the details of how the optimization works, but I think it just had a special overload of copy for when the inputs were move_iterators; you can crawl the code yourself to find out.

    Thanks.

    I finally look at the source, and it turns out that I misunderstood the comment “libstdc++ (the GCC standard lib) has optimizations that can…” and was confused about what a move iterator is. :)

    So, actually :

    1) make_move_iterator have nothing to with that memmove optimization. memmove is called when the iterator value type is a pod, nothing else.

    2) A move iterator is an iterator adaptor. (I missed that part :) )

    3 )

    std::copy(std::make_move_iterator(rhs.begin()), std::make_move_iterator(rhs.end()), begin());

    is a verbose way to write :

    std::move(rhs.begin(), rhs.end(), begin());
      Quote
  4. Joe Gottman says:

    Another problem with the “canonical” implementation of move-copy assignment is that it fails for a converting move-copy assignment operator. Consider the class

    template <class X> class shared_ptr;

    This class has a converting assignment operator

    template <class Y> shared_ptr &operator=(shared_ptr<Y> &&r);

    that converts a shared_ptr<Y> to a shared_ptr<X> whenever Y * is convertible to X * (for instance when X is a base class of Y). Since shared_ptr<X> and shared_ptr<Y> are different classes it is impossible to swap one with the other.

      Quote
    • I don’t think that’s a problem for the canonical move assignment:

      template <class X>
      template <class Y> shared_ptr<X>&
      shared_ptr<X>::operator=(shared_ptr<Y> &&r)
      {
           shared_ptr( std::move(r) )
               .swap(*this);
           return *this;
      }
        Quote
  5. Thomas Petit says:
    // libstdc++ (the GCC standard lib) has optimizations that can
    // make this up to 2x faster than the simple-minded
    // implementation, e.g. when possible this call becomes a
    // memmove
    std::copy(std::make_move_iterator(rhs.begin()), std::make_move_iterator(rhs.end()), begin());

    I don’t understand this line. Why do you have to pass move iterator to help std::copy to take advantage of memmove ? How does that optimization work ?

    And by the way, the test file canonical-assign.cpp doesn’t compile on MSVC 10. There is an error at this line :

    vbase(vbase&& rhs)
        : A( std::move(static_cast<A const&>(rhs)) )
    int std::move function :
     
    // TEMPLATE FUNCTION move
    template<class _Ty> inline
       typename tr1::_Remove_reference<_Ty>::_Type&&
       move(_Ty&& _Arg)
       {    // forward _Arg as moveable
          return (_Arg);              // <- error here
       }

    The error is : “error C2440: ‘return’ : cannot convert from ‘const alloc’ to ‘alloc &&’”

      Quote
    • I don’t remember the details of how the optimization works, but I think it just had a special overload of copy for when the inputs were move_iterators; you can crawl the code yourself to find out.

      As for MSVC 10, that would be a bug in their implementation. rhs is supposed to be treated in expressions as a non-const lvalue, so _Ty should be deduced to be alloc, not const alloc.

        Quote
  6. Hoang Vu says:

    “So, std::vector uses a more economical assignment operator with the signature vector& operator=(vector const&), which allows it to delay copying lvalue arguments until it can be sure a copy is needed. Unfortunately, if we also tried to keep the canonical copy assignment signature around for use with rvalue arguments, the overloads would be ambiguous.” Can you clarify on that? And what is the fastest implementation of both the copy assignment and move assignment ?

      Quote
    • Let me try to clarify.

      1. An implementation like

        vector const& operator=(vector rhs) { swap(*this,rhs); return *this; }
        relies on copy elision for efficiency when the right-hand side of the assignment is an rvalue. However, when the right-hand side is an lvalue, it must be copied at the call site—those are the standard rules.
      2. Instead, std::vector’s copy assignment operator looks something like:

        vector& operator=(vector const& rhs)
        {
            this->clear();
            if ( this->capacity() < rhs.size() )
            {
                // deallocate existing storage
                // allocate new capacity
            }
            // …assign / copy elements into existing storage
        }
        Leaving aside the issue of peak resource usage, when the “if” condition is satisified here, the cost is equivalent to that of the full copy-and-swap implementation in item 1. But when it is not satisfied, substantial savings are available; that’s what I mean by delaying the copy.
      3. If we try to overload implementation 2 for copy assignment and implementation 1 for move assignment, any attempts to assign vectors will result in an ambiguity error.

      As for the fastest implementation of copy assignment and move assignment, I guess I can’t say for sure. How can one ever prove one knows the fastest way to implement something?

        Quote
      • Hoang Vu says:

        I want to check if I understand it now. In the ideal situation only implementation 1 is needed for both lvalues and rvalues. In situation like std::vector, where implementation 2 exists, a move assignment like clear and swap of minimal is added. Is that right ?

          Quote
        • No, I don’t think so. As my tests demonstrate, implementation 1 isn’t even an optimal implementation of move assignment (i.e. assign from rvalues) in most cases. In the case of boost::array, it is dramatically suboptimal. It also has the wrong semantics for std::vector in that it doesn’t preserve user-visible side effects of assignment to the left-hand side.

            Quote
          • Hoang Vu says:

            Thank you for explaining. I understand it now.

              Quote
          • Ben Kerby says:

            Hi Dave,

            I know this is comment is ancient by now, but the page still comes up on google searches.

            I think the statement “It also has the wrong semantics for std::vector in that it doesn’t preserve user-visible side effects of assignment to the left-hand side.” is misleading.

            That statement is true for:

            
            vector& operator=(vector&& rhs)
            { 
                std::swap(*this, rhs);
                return *this;
            }
            

            but it is not true for:

            
            vector& operator=(vector rhs)
            { 
                std::swap(*this, rhs);
                return *this;
            }
            

            since in the second case the ‘rhs’ variable’s scope is limited to the scope of the function. (i.e. any resources which are swapped into rhs must be released at the end of the function).

              Quote
          • Ah, but the second implementation isn’t “swap;” it’s “move construct into a temporary (which might be elided) and swap.”

            Consider the following

            #include <iostream>
            #include <utility>
             
            struct X
            {
                static unsigned count;
                unsigned id;
             
                X() : id(count++) {
                    std::cout << "construct " << id << std::endl;
                }
                X(X const& rhs) : id(count++) {
                    std::cout << "copy " << rhs.id << " -> " << id << std::endl;
                }
                X(X&& rhs) : id(count++) {
                    std::cout << "move " << rhs.id << " -> " << id << std::endl;
                }
                X& operator=(X rhs) {
                    std::cout << "assign " << rhs.id << " -> " << id << std::endl;
                }
                ~X() {
                    std::cout << "die " << id << std::endl;
                }
            };
             
            unsigned X::count = 0;
             
            int main()
            {
                X a, b;
                a = std::move(b);
            }

            The output is

            construct 0
            construct 1
            move 1 -> 2
            assign 2 -> 0
            die 2
            die 1
            die 0
            
              Quote
          • Ben Kerby says:

            Yes, but the important point is that “die 2″ will release the resources which were swapped into it, and this will happen before the next statement, thus preserving “the user-visible side effects of assignment to the left hand side.” Any resources held by variable “0″ will be swapped into the temporary “2″ and released there.

            See:

            
            #include <iostream>
            #include <utility>
            #include <memory>
             
            struct X
            {
                static unsigned count;
                unsigned id;
                static char res;
                std::unique_ptr resource;
             
                X() : id(count++), resource(new char(res++)) {
                    std::cout << "construct " << id << " (created " << *resource << ')' << std::endl;
                }
                X(X const& rhs) : id(count++), resource(new char(*rhs.resource)) {
                    std::cout << "copy " << rhs.id < " << id << " (copied " << *resource << ')' << std::endl;
                }
                X(X&& rhs) : id(count++), resource(std::move(rhs.resource)) {
                    std::cout << "move " << rhs.id < " << id << " (transferred " << *resource << ')' << std::endl;
                }
                X& operator=(X rhs) {
                    std::cout << "assign " << rhs.id << " (holding " << *rhs.resource < "
                        << id << " (holding " << *resource << ')' << std::endl;
                    resource.swap(rhs.resource);
                    return *this;
                }
                ~X() {
                    std::cout << "die " << id;
                    if (resource)
                    {
                        std::cout << " (release " << *resource << ')';
                    }
                    else
                    {
                        std::cout << " (empty)";
                    }
                    std::cout << std::endl;
                }
            };
             
            unsigned X::count = 0;
            char X::res = 'a';
             
            int main()
            {
                X a, b;
                a = std::move(b);
                std::cout << "move complete" << std::endl;
                return 0;
            }
            

            which outputs:

            
            construct 0 (created a)
            construct 1 (created b)
            move 1 -> 2 (transferred b)
            assign 2 (holding b) -> 0 (holding a)
            die 2 (release a)
            move complete
            die 1 (empty)
            die 0 (release b)
            

            the important point being that ‘a’ is released before “move complete” is printed.

              Quote
          • That wasn’t lost on me. What was lost on me was the context of my comment. Once I went back and looked at it, I see that you’re almost right. It wasn’t misleading; it was flat-out wrong. Thanks for the correction!

              Quote
      • Jeffrey Bosboom says:
        Dave Abrahams:
      • However, when the right-hand side is an lvalue, it must be copied at the call site—those are the standard rules.
      • Why can’t a compiler transform Foo& operator=(Foo x) { … } into Foo& operator=(const Foo& __x) { Foo x(__x); … } when asked to optimize for size (-Os)?

          Quote
        • It can. Sorry, I wrote imprecisely. The point I was trying to make is that there’s no opportunity to avoid copying an lvalue. In your example code, the lvalue is copied explicitly.

            Quote
          • Jeffrey Bosboom says:

            The point relates more to one of the earlier columns where increased code size was listed as a drawback of the pass-by-value form of assignment compared to the canonical by-const-reference version. I can’t seem to comment on older articles, so I asked here.

              Quote
          • You are right; code size doesn’t have to increase if you’re willing to forego copy elision for rvalues.

              Quote

Leave a Comment (post replies using links below individual comments)