std::lower_bound and std::upper_bound

The functions std::lower_bound and std::upper_bound are useful any time you want to do a binary search on a sorted range. They operate on any sorted container, which may contain duplicates. These includes containers like std::vector that don't inherently have some ordering property, but that you may know to be sorted (e.g. because you called std::sort). Unlike the regular find functions in C++ they can return iterators within the range if an element is not found, where the iterators correspond to the logical start or end positions for a query. The problem is, I always found the naming for these functions and the boundary conditions confusing.

The documentation for std::lower_bound states about the returned value:

Iterator to the first element of the range [first, last) not ordered before value, or last if no such element is found.

And for std::upper_bound:

Iterator to the first element of the range [first, last) ordered after value, or last if no such element is found.

The phrasing of "ordered after" for std::upper_bound is relatively clear---the returned iterator will be after the search term---but the phrase "not ordered before" for std::lower_bound is more confusing. Even if you spend some time to think through what it means, it's hard to remember intuitively which function does what, and why they're named the way they are.

I recently learned a way that makes the names of the functions make sense, and makes it easy to remember exactly what they each do and why they're named the way they are. The key insight is to consider these functions in the case where the sorted range of values contains duplicates, and then to understand how the functions behave.

Consider a sorted vector of values std::vector<int> v = {1, 2, 4, 4, 4, 8, 10}. If we search the vector using std::lower_bound with a search term of 4, the returned iterator will be to the first 4 in the vector, i.e. v.begin() + 2. If we search the vector using std::upper_bound with a search term of 4, the returned iterator will be to the first value in the vector that is greater than 4, which is the value 8, or v.begin() + 5. If we consider the span of the vector that contains all of the elements equal to 4, std::lower_bound() returns the .begin() of that span, and std::upper_bound() returns the .end() of that span.

Using this insight, we can understand also how the functions behave if we search for say element 2. Conceptually there is a span of length 1 that starts at the index of the element equal to 2 and has length 1 (which is at offset 1 into the vector), so std::lower_bound() will return the iterator corresponding to 2 (i.e. the .begin() of this imaginary span) and std::upper_bound() will return the iterator corresponding to the next element (i.e. the .end() of this imaginary span).

Now consider the cases where we search for a value before the first element in the vector, after the last element, or a vector between the first and last elements but that doesn't exist within the range. If we call std::lower_bound() or std::upper_bound() with a search term of -1 then the span containing -1 would be to the left of the first element, so both will return v.begin(). If we search for value 7, the span of 7s would be between the last 4 and the 8, so both functions will return the iterator to 8. If we search for value 11, the span of 11s would be to the right of the last element, so both functions will return v.end().

Queries that match search terms can also return v.begin() or v.end() depending on which method we're using, so you still always need to check which iterator is returned and whether it matches the search term. But the point is that thinking about how these functinos behave in terms of specifying spans of duplicate values also helps with understanding the behavior when the sorted range doesn't contain the search term or doesn't actually contain duplicates.