Lambda vs Bind

Consider the following function:

template <typename Function>
void do_test_loop(Function func, const uint64_t upper_limit = 1000000000ULL)
{
    for (uint64_t i = 0; i < upper_limit; ++i)
        func(i);
}

This rather useless higher-order function calls func some upper_limit amount of times (default of one billion). There a multiple methods of giving this function a function to call repeatedly; here, we will discuss two of them: std::bind to generate a polymorphic std::function<void (uint64_t)> and a lambda expression.

void test_accumulate_bind_function(uint64_t& x, uint64_t i)
{
    x += i;
}

uint64_t test_accumulate_bind()
{
    namespace arg = std::placeholders;
    
    uint64_t x = 0;
    std::function<void (uint64_t)> accumulator = std::bind(&test_accumulate_bind_function, std::ref(x), arg::_1);
    do_test_loop(accumulator);
    return x;
}

That is a pretty ugly function. The biggest problem I have had with boost::bind is that it requires you to seperate your functions (and therefore your logic), which can lead to hard-to-follow code. For larger functions, that is not a big deal, but for small functions like this one, the context switching is more obnoxious than anything.

The equivalent lambda expression looks like this:

uint64_t test_accumulate_lambda()
{
    uint64_t x = 0;
    auto accumulator = [&x] (uint64_t i) { x += i; };
    do_test_loop(accumulator);
    return x;
}

The lambda syntax keeps everything tight — no context switching. Of course, we are losing the major advantage the polymorphic std::function has, which is polymorphism. The type of a lambda expression is a statically compiler-generated unpronouncable name, which is why you must use auto to name it. The type of accumulator is specific to the result of that expression — no other expression (lambda or otherwise) can generate the same type. Even if the contents of two lambda expressions are identical, they will not have the same type. If do_test_loop was a non-trivial function hidden within some cpp file, we would not get to pass the lambda expression's type around.

Luckily, some very smart people thought of this potential problem and made assignment from a lambda expression to an std::function not only possible, but downright trivial:

uint64_t test_accumulate_bound_lambda()
{
    uint64_t x = 0;
    std::function<void (uint64_t)> accumulator = [&x] (uint64_t i) { x += i; };
    do_test_loop(accumulator);
    return x;
}

By using lambda syntax instead of std::bind, we get all the power of the polymorphic std::function with the convenience and expressiveness of C++ lambdas. This sounds like a win-win to me.

We can do a fairly simple comparison of these three functions (using the timer class):

template <typename Function>
void run_test(const std::string& name, Function func)
{
    std::cout << name;
    timer t;
    volatile_write(func());
    timer::duration duration = t.elapsed();
    std::cout << '\t' << duration.count() << std::endl;
}

int main()
{
    run_test("Accumulate (lambda)      ", &test_accumulate_lambda);
    run_test("Accumulate (bind)        ", &test_accumulate_bind);
    run_test("Accumulate (bound lambda)", &test_accumulate_bound_lambda);
}

Without further ado, here are the results on my Intel Core i7 Q740 (compiled with GCC 4.6.2 -O3):

Accumulate (lambda)         7
Accumulate (bind)           4401849
Accumulate (bound lambda)   4379315

Whenever I am doing performance tests and see an absurdly small time like this, I disassemble the program to see what the compiler did.

(gdb) disassemble test_accumulate_lambda
Dump of assembler code for function _Z22test_accumulate_lambdav:
   0x0000000000400e70 <+0>:     movabs $0x6f05b59b5e49b00,%rax
   0x0000000000400e75 <+5>:     retq
End of assembler dump.

After optimization, the whole function just moves 0x6f05b59b5e49b00 into rax and returns it. In decimal, that number is 499999999500000000. Whoa! The compiler is smart enough to realize that we are just looking for the value of sum(xrange(1000000000)) and does that for us. I find it rather impressive that the compiler can do this, but it makes sense. The contents of the function are statically known the by the instantiated do_test_loop, so the compiler unrolls the contents to something like:

uint64_t test_accumulate_lambda()
{
    uint64_t x = 0;
    // do_test_loop:
    for (uint64_t i = 0; i < 1000000000; ++i)
        x += i;
    return x;
}

Any compiler worth its salt will optimize this right out. I think the important thing to take away from this simple example is to know that the compiler is aware of the static-ness of lambda functions. You can use lambda functions as a great way to avoid repeating yourself without worrying about the overhead.

So what about our calls with std::function? Here, it is the polymorphism that kills us. When the function do_test_loop is instantiated with Function = std::function<void (uint64_t)>, the compiler does not know the behavior of func; it could do anything (which is the entire point of std::function).

The difference between the std::bind and bound lambda expression are insignificant. If you run the test a bunch of times, the bound lambda seems to always be a bit faster on my computer, but it does not look statistically significant. This performance will probably change in the future and on different machines, but if I had to guess, I would say it has to do with the std::reference_wrapper. Just for grins, let's look at the stack traces from the two functions.

std::bind
#0  test_accumulate_bind_function (x=@0x7fffffffe5d0, i=0) at lambda_vs_bind.cpp:106
#1  0x0000000000401111 in operator() (__args#0=0, this=<optimized out>) at /usr/local/include/gcc-4.6.2/functional:2161
#2  do_test_loop<std::function<void(long unsigned int)> > (func=<optimized out>, upper_limit=<optimized out>) at lambda_vs_bind.cpp:93
#3  test_accumulate_bind () at lambda_vs_bind.cpp:115
#4  0x0000000000401304 in run_test<unsigned long (*)()> (name=<optimized out>, func=0x401080 <test_accumulate_bind()>) at lambda_vs_bind.cpp:84
#5  0x0000000000401411 in main () at lambda_vs_bind.cpp:136
Lambda Expression
#0  std::_Function_handler<void(long unsigned int), test_accumulate_bound_lambda()::<lambda(uint64_t)> >::_M_invoke(const std::_Any_data &, unsigned long) (__functor=..., __args#0=0) at /usr/local/include/gcc-4.6.2/functional:1778
#1  0x0000000000400fa9 in operator() (__args#0=0, this=<optimized out> at /usr/local/include/gcc-4.6.2/functional:2161
#2  do_test_loop<std::function<void(long unsigned int)> > (func=<optimized out>, upper_limit=<optimized out>) at lambda_vs_bind.cpp:93
#3  test_accumulate_bound_lambda () at lambda_vs_bind.cpp:126
#4  0x0000000000401304 in run_test<unsigned long (*)()> (name=<optimized out>, func=0x400f20 <test_accumulate_bound_lambda()>) at lambda_vs_bind.cpp:84
#5  0x000000000040143e in main () at lambda_vs_bind.cpp:140

So the only real difference is the function called by std::function's operator(). To make sense of why (and how) this happens, let's take a quick look at the functional implementation as of g++ version 4.6.2.

template<typename _Res, typename... _ArgTypes>
class function<_Res(_ArgTypes...)>
    : public _Maybe_unary_or_binary_function<_Res, _ArgTypes...>,
      private _Function_base
{
    // a whole bunch of implementation details

private:
    typedef _Res (*_Invoker_type)(const _Any_data&, _ArgTypes...);
    _Invoker_type _M_invoker;
};

The interesting thing to me is that std::function does not use virtual; instead, it uses a function pointer to get the job done. There are a few advantages to doing it this way. This allows you to use an std::function without dealing with pointers or references — that complexity is internal to the object (a Good Thing).

boost::bind

What about the old method: boost::bind? To test this out, we can use the same code for std::bind, but replace std with boost.

Accumulate (boost bind)         3223174
Accumulate (boost bound lambda) 4255098

Curiously, boost::bind is consistently about 25% faster than the equivalent std::bind. The stack trace for calling the function looks about the same as std::bind:

#0  test_accumulate_bind_function (x=@0x7fffffffe600, i=0) at lambda_vs_bind.cpp:114
#1  0x00000000004018a3 in operator() (a0=0, this=<optimized out>) at /usr/local/include/boost/function/function_template.hpp:1013
#2  do_test_loop<boost::function<void(long unsigned int)> > (upper_limit=<optimized out>, func=<optimized out>) at lambda_vs_bind.cpp:101
#3  test_accumulate_boost_bind () at lambda_vs_bind.cpp:144
#4  0x0000000000401f44 in run_test<unsigned long (*)()> (name=<optimized out>, func=0x401800 <test_accumulate_boost_bind()>) at lambda_vs_bind.cpp:92
#5  0x000000000040207e in main () at lambda_vs_bind.cpp:161

I could probably write an entire article on why boost::bind is so fast...

functional
template<typename _Functor, typename... _ArgTypes>
inline
typename _Bind_helper<_Functor, _ArgTypes...>::type
bind(_Functor&& __f, _ArgTypes&&... __args)
{
    typedef _Bind_helper<_Functor, _ArgTypes...> __helper_type;
    typedef typename __helper_type::__maybe_type __maybe_type;
    typedef typename __helper_type::type __result_type;
    return __result_type(__maybe_type::__do_wrap(std::forward<_Functor>(__f)),
                        std::forward<_ArgTypes>(__args)...);
}
boost/bind/bind.hpp (with the macros expanded)
template<class F, class A1, class A2>
    _bi::bind_t<_bi::unspecified, F, typename _bi::list_av_2<A1, A2>::type>
    bind(F f, A1 a1, A2 a2)
{
    typedef typename _bi::list_av_2<A1, A2>::type list_type;
    return _bi::bind_t<_bi::unspecified, F, list_type> (f, list_type(a1, a2));
}

More Info

1: Source Code

Get the source code for this program here. It has been compiled and tested with g++ 4.6.2, but any compiler with good support for the C++11 standard should be perfectly fine with it. My Boost version is 1.47, but earlier and later versions will work just fine, since the boost::bind semantics have not changed for quite some time (and probably will not in the future). If you wish to compile and run without Boost, then change the value of USE_BOOST to 0.

2: volatile_write

The volatile_write function is a simple function I made to force the system to actually write data somewhere in memory. In this case, it prevents the optimizer from seeing that we are not doing anything with the result of func in run_test.

template <typename T>
void volatile_write(const T& x)
{
    volatile T* p = new T;
    *p = x;
    delete p;
}