More on Binary Search: Variants
1/Jun 2019In the last binary search article, we discussed how to write a correct binary search. But sometimes we are not asked to find any position of the target. If we need the first or the last occurrence, can we do it?
The classic binary search
In Python, the basic binary search looks like this:
lo = 0
hi = N-1
while (lo < hi):
mid = (lo+hi)//2
if A[mid] == target:
return mid
elif A[mid] < target:
lo = mid + 1
else:
hi = mid
return lo if A[lo] == target else -1
And the invariant of the algorithm is that A[lo] <= target <= A[hi]
.
Find first occurrence
Finding first occurrence is similar to the classic binary search.
- We should use
if A[mid] >= target
,hi = mid
because it discards other occurrences of target to the right, while keepingA[mid]
. (Given target exists) hi = mid
works withlo = mid + 1
.- Since
lo = mid + 1
and floor division is used,hi - lo
is strictly decreasing in each pass. The algorithm will always terminate.
lo = 0
hi = N-1
while (lo < hi):
mid = (lo+hi)//2
if A[mid] >= target:
hi = mid
else:
lo = mid + 1
return lo if A[lo] == target else -1
Find last occurrence
if A[mid] <= target
,lo = mid
guarantees at least one target is kept while discarding everything to the left. (Given target exists)lo = mid
works withhi = mid - 1
.- Here’s the tricky part: to ensure that
hi - lo
decreases in each pass, the floor truncation should have bias towards the higher index. i.e.mid = (lo+hi+1)//2
. Without this tweak, there will be dead loop.
lo = 0
hi = N-1
while (lo < hi):
mid = (lo+hi+1)//2
if A[mid] <= target:
lo = mid
else:
hi = mid - 1
return lo if A[lo] == target else -1
The algorithm will terminate with lo == hi
and A[lo]
is the first occurrence if it exists.
Conclusion
The keys to quickly writing a correct algorithm for a binary search variant:
- The condition containing
<
should setlo
and>
should sethi
. - The condition containing
==
should guarantee that probable solutions are not discarded. Under this condition, thelo
orhi
will be set tomid
and there will be no-1
or+1
. - To determine the condition of the test, imagine the input is a list of equal elements. If
A[mid] <= target
and setlo = mid
, it discards some target occurrences to the left, therefore this is for finding last occurrence. The reasoning for finding first occurrence is similar. - In case of
lo = mid
, we need to avoid dead loop by adding 1 before the floor division, i.e.mid = (lo+hi+1)//2
.
Here’s a leetcode problem which requires finding both first and last occurrences: 34. Find First and Last Position of Element in Sorted Array
More Reading