Why std::binary_search of std::list works, sorta ... (repost)

http://www.umich.edu/~eecs381/handouts/binary_search_std_list.pdf

 

template <class ForwardIterator, class T>
  bool binary_search ( ForwardIterator first, ForwardIterator last,
                       const T& value );

template <class ForwardIterator, class T, class Compare>
  bool binary_search ( ForwardIterator first, ForwardIterator last,
                       const T& value, Compare comp );

As you can see, binary_search is declared in terms of forward iterators, not random-access iterators. Among the standard sequence containers, only vector and deque provide random-access iterators; to keep the algorithm generic and usable with every container, it is written against forward iterators instead.
The article below explains what actually happens when binary_search is applied to std::list, and benchmarks it with both vector and list.

Why std::binary_search of std::list works, sorta ...
David Kieras, University of Michigan
Prepared for EECS 381, 2/7/2012
You can indeed apply the Standard Library binary_search and lower_bound algorithms to
any sorted sequence container, including std::list, and it will produce a correct result. However, if
you look at any normal code for binary search (e.g. as in Kernighan and Ritchie), it is written to
use array subscripting, which runs in constant time, independent of the size of the array. A linked
list has the property that the only way you can find a node is to start at one end of the list and follow the links from one node to the next, checking them one at a time. Unlike with array subscripting, there is no way to compute the location of a list node directly from knowing its numerical position in the list. So how is it that you can do a binary_search of a std::list?
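Before looking at the implementation, here is a minimal usage sketch (the container contents are made up for illustration) showing that the very same call compiles and returns the correct answer for both a sorted std::vector and a sorted std::list:

    #include <algorithm>
    #include <iostream>
    #include <list>
    #include <vector>

    int main()
    {
        std::list<int>   lst = {1, 3, 5, 7, 9};   // already sorted
        std::vector<int> vec = {1, 3, 5, 7, 9};   // already sorted

        // Both calls return true; only the iterator category differs.
        std::cout << std::boolalpha
                  << std::binary_search(lst.begin(), lst.end(), 7) << '\n'
                  << std::binary_search(vec.begin(), vec.end(), 7) << '\n';
    }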
How binary_search is implemented. Below is a somewhat simplified copy of the Metrowerks
Standard Library version of lower_bound; the binary_search algorithm just calls lower_bound
and checks the result (other implementations might differ, but only in details).

    template <class ForwardIterator, class T>
    ForwardIterator
    lower_bound(ForwardIterator first, ForwardIterator last, const T& value)
    {
        typedef typename iterator_traits<ForwardIterator>::difference_type difference_type;
        difference_type len = distance(first, last);   // count the elements in the range
        while (len > 0)
        {
            ForwardIterator i = first;
            difference_type len2 = len / 2;
            advance(i, len2);                          // move i to the midpoint of the range
            if (*i < value)
            {
                first = ++i;                           // continue in the upper half
                len -= len2 + 1;
            }
            else
                len = len2;                            // continue in the lower half
        }
        return first;
    }
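For reference, a simplified sketch of how binary_search can then be written on top of lower_bound, matching the description above (not the actual Metrowerks source), is:

    template <class ForwardIterator, class T>
    bool binary_search(ForwardIterator first, ForwardIterator last, const T& value)
    {
        // lower_bound returns the first position whose element is not less
        // than value; value is present exactly when that position is in
        // range and the element there is also not greater than value.
        ForwardIterator i = lower_bound(first, last, value);
        return i != last && !(value < *i);
    }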

First, see how the lower_bound algorithm is written in terms of iterators, so that it can apply to any sequence container that supports the standard iterator interface. The two iterators, first and last, mark the beginning and end of the range to be searched; in the type system, they can be iterators pointing into any type of container. The basic binary search algorithm involves calculating the midpoint of a range of values and then checking the value at that midpoint. The code calls std::distance, which returns the numerical distance between the first and last iterators. This distance is divided by two, and then std::advance is called to move the first iterator forward by that amount to get to the midpoint of the range. The distance and advance functions are also function templates that are defined so that they work with iterators into any type of container. Template magic is used to specialize them for different iterator types. For random-access iterators (which behave like pointers or subscripts), which std::vector and std::deque supply, the definition that is used is:

    template <class RandomAccessIterator>
    inline
        typename iterator_traits<RandomAccessIterator>::difference_type
        __distance(RandomAccessIterator first, RandomAccessIterator last, random_access_iterator_tag)
    {
        return last - first;
    }

The subtraction operator is defined for these iterators because the internal pointers can simply
be subtracted to get the distance directly via pointer arithmetic. However, for the more general
input iterators, which can only move forward or back by one step at a time, the definition is:

    template <class InputIterator>
    inline
        typename iterator_traits<InputIterator>::difference_type
        __distance(InputIterator first, InputIterator last, input_iterator_tag)
    {
        typename iterator_traits<InputIterator>::difference_type result = 0;
        for (; first != last; ++first)
            ++result;
        return result;
    }

In other words, compute the distance between two general iterators by incrementing the first
until it equals the last, and count how many increments are required.
The advance function has a similar pair of specializations. Advancing a random-access iterator simply adds the number of steps to the iterator, corresponding directly to pointer arithmetic,
but advancing an input iterator requires incrementing the iterator the supplied number of times.
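Sketched in the same style as the __distance overloads above (the exact names and details vary between implementations), the two advance specializations look roughly like this:

    template <class RandomAccessIterator, class Distance>
    inline void
    __advance(RandomAccessIterator& i, Distance n, random_access_iterator_tag)
    {
        i += n;                 // a single pointer-style addition
    }

    template <class InputIterator, class Distance>
    inline void
    __advance(InputIterator& i, Distance n, input_iterator_tag)
    {
        for (; n > 0; --n)      // follow the links one node at a time
            ++i;
    }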
When these functions are applied to a built-in array or a std::vector container, the distance
and advance functions will compile down to simple subtraction and addition of the subscript values or pointers, taking almost no time. But if applied to a list, whose iterators support only moving forward or back by one step at a time, then the binary search will require using the distance
function to repeatedly count the nodes between the ends of the narrowing range and then advance with increment and count again to position at the midpoint. Surely all this link-following
will add up to a substantial amount of time, won’t it?
The Standard in fact states that lower_bound and binary_search will run in O(log n) time
when applied to a container with random access iterators. When applied to a container that lacks
random-access iterators, like std::list, the Standard states that the search will be logarithmic with
the number of comparisons, but linear with the number of nodes visited. So whether it runs faster
than a linear search depends on how much time it takes to do the comparisons compared to counting the nodes over and over again.
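To make the Standard's wording concrete, here is a rough count based on the lower_bound code shown earlier: the initial distance call walks the whole range once, and the successive advance calls then move by about n/2, n/4, ... nodes in total, so

    \[
      \text{node visits} \approx n + \tfrac{n}{2} + \tfrac{n}{4} + \cdots \approx 2n = O(n),
      \qquad
      \text{comparisons} \approx \log_2 n = O(\log n).
    \]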
Big-O doesn’t tell the whole story. Sometimes you need benchmarks or profiling to see how
things work out in practice. I defined two classes of objects which contain an ID value used in
operator< and operator==, and with constructors that give each object a unique value. One class,
Cheap, uses a single integer for the comparisons. The other, Expensive, uses a string containing
31 characters, the first 28 of which are identical. The comparison is done with std::strcmp, which
will have to test the first 28 characters before finding the different ones. My benchmark code
filled containers with 10, 50, 100, 500, or 1000 objects, and then ran a binary search looking for
each one of the objects, so each possible position in the container was searched for. This search
sequence was repeated more times for shorter containers so that the final results show the total time for
50,000 searches distributed evenly over all the positions in the container, giving reasonably stable average run times. I compared run times for the eight combinations of std::vector vs. std::list
containers, linear search with std::find vs. binary_search, and Cheap vs. Expensive objects.
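The handout does not reproduce the benchmark code itself; a hypothetical sketch of what the two classes could look like, based purely on the description above (a single-int ID for Cheap, and for Expensive a 31-character string whose first 28 characters are identical), is:

    #include <cstdio>
    #include <cstring>

    // Hypothetical reconstruction; names and details are assumptions based
    // on the description in the text, not the author's actual code.
    class Cheap {
    public:
        explicit Cheap(int id_) : id(id_) {}
        friend bool operator<(const Cheap& a, const Cheap& b)  { return a.id < b.id; }
        friend bool operator==(const Cheap& a, const Cheap& b) { return a.id == b.id; }
    private:
        int id;       // comparison is a single integer compare
    };

    class Expensive {
    public:
        explicit Expensive(int id_)
        {
            // 31 digits, zero-padded: for IDs below 1000 the first 28
            // characters are identical, so strcmp must scan most of the
            // string before it can tell two objects apart.
            std::snprintf(id, sizeof id, "%031d", id_);
        }
        friend bool operator<(const Expensive& a, const Expensive& b)
            { return std::strcmp(a.id, b.id) < 0; }
        friend bool operator==(const Expensive& a, const Expensive& b)
            { return std::strcmp(a.id, b.id) == 0; }
    private:
        char id[32];  // 31 characters plus the terminating null
    };

Here std::find would use operator== for the linear searches, while binary_search would use operator< for the comparisons being counted.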
The graphs, summarized at the end of this post, show the results. First, the Expensive object searches required
about ten times longer than the corresponding Cheap object searches. Searching vector linearly
is the big loser, while using binary_search on vector is so fast that on the plotting scale it looks
almost flat - this really is the best combination. A surprise is that for Cheap objects, the linear
search of a list ran substantially faster than the linear search of a vector. Apparently, the constant-time arithmetic for doing the subscripting is slower than merely following a pointer to the next
node. This effect is almost swamped by the time required for the comparisons in the Expensive
objects; the two linear searches have similar speeds.
The key result for this discussion is how, for the std::list container, the relative performance of linear vs. binary search depends on the cost of the comparisons. Compare the two curves plotted with
triangles in the graphs.
For Cheap objects, the linear list search is about twice as fast as the binary list search - using
binary_search would be a serious mistake in this case. In fact, it is almost as slow as the linear
vector search. The binary list search requires counting those nodes over and over again, and this
can really add up.
For Expensive objects, the binary list search is quite a bit faster than the linear list search, which is almost as slow as the linear vector search. The way in which binary search minimizes the number of comparisons really pays off. However, the binary list search looks pretty much linear, rather than logarithmic, and is still substantially slower than the binary vector search. Counting nodes back and forth in the list still hurts substantially, even if we save a lot by doing only a
logarithmic number of comparisons.
The bottom line: Of course, your mileage will vary depending on the cost of comparing your
objects and the lengths of your lists and the distributions of your searches. However, together
with purely theoretical considerations, these results show that the only reason to search a list with
binary search is that the objects in the list are extremely expensive to compare. But it is still substantially better to do a binary search of a vector of the same objects. Therefore, using the power of the Standard Library to easily do a binary search of a linked list is just a curiosity with little practical value.
[Figure: two panels plotting Time (s) for 50K Searches, All positions, against Container Length (0 to 1000), with curves for list+binary, list+linear, vector+binary, and vector+linear; left panel: Cheap Comparisons, right panel: Expensive Comparisons.]

 
