The functions
std::lower_bound
and
std::upper_bound
are useful any time you want to do a binary search on a sorted range. They
operate on any sorted container, which may contain duplicates. These includes
containers like std::vector that don't inherently have some ordering property,
but that you may know to be sorted (e.g. because you called std::sort). Unlike
the regular find functions in C++ they can return iterators within the range if
an element is not found, where the iterators correspond to the logical start or
end positions for a query. The problem is, I always found the naming for these
functions and the boundary conditions confusing.
The documentation for std::lower_bound states about the returned value:
Iterator to the first element of the range
[first, last)not ordered beforevalue, orlastif no such element is found.
And for std::upper_bound:
Iterator to the first element of the range
[first, last)ordered after value, orlastif no such element is found.
The phrasing of "ordered after" for std::upper_bound is relatively clear---the
returned iterator will be after the search term---but the phrase "not ordered
before" for std::lower_bound is more confusing. Even if you spend some time to
think through what it means, it's hard to remember intuitively which function
does what, and why they're named the way they are.
I recently learned a way that makes the names of the functions make sense, and makes it easy to remember exactly what they each do and why they're named the way they are. The key insight is to consider these functions in the case where the sorted range of values contains duplicates, and then to understand how the functions behave.
Consider a sorted vector of values std::vector<int> v = {1, 2, 4, 4, 4, 8, 10}. If we search the vector using std::lower_bound with a search term of 4,
the returned iterator will be to the first 4 in the vector, i.e. v.begin() + 2. If we search the vector using std::upper_bound with a search term of 4,
the returned iterator will be to the first value in the vector that is
greater than 4, which is the value 8, or v.begin() + 5. If we consider the
span of the vector that contains all of the elements equal to 4,
std::lower_bound() returns the .begin() of that span, and
std::upper_bound() returns the .end() of that span.
Using this insight, we can understand also how the functions behave if we search
for say element 2. Conceptually there is a span of length 1 that starts at the
index of the element equal to 2 and has length 1 (which is at offset 1 into the
vector), so std::lower_bound() will return the iterator corresponding to 2
(i.e. the .begin() of this imaginary span) and std::upper_bound() will
return the iterator corresponding to the next element (i.e. the .end() of this
imaginary span).
Now consider the cases where we search for a value before the first element in
the vector, after the last element, or a vector between the first and last
elements but that doesn't exist within the range. If we call std::lower_bound()
or std::upper_bound() with a search term of -1 then the span containing -1
would be to the left of the first element, so both will return v.begin().
If we search for value 7, the span of 7s would be between the last 4 and the
8, so both functions will return the iterator to 8. If we search for value 11,
the span of 11s would be to the right of the last element, so both functions
will return v.end().
Queries that match search terms can also return v.begin() or v.end()
depending on which method we're using, so you still always need to check which
iterator is returned and whether it matches the search term. But the point is
that thinking about how these functinos behave in terms of specifying spans of
duplicate values also helps with understanding the behavior when the sorted
range doesn't contain the search term or doesn't actually contain duplicates.