More on Binary Search: Variants

In the last binary search article, we discussed how to write a correct binary search. But sometimes we are not asked to find any position of the target. If we need the first or the last occurrence, can we do it?

In Python, the basic binary search looks like this:

lo = 0
hi = N-1
while (lo < hi):
    mid = (lo+hi)//2
    if A[mid] == target:
        return mid
    elif A[mid] < target:
        lo = mid + 1
    else:
        hi = mid
return lo if A[lo] == target else -1

And the invariant of the algorithm is that A[lo] <= target <= A[hi].

Find first occurrence

Finding first occurrence is similar to the classic binary search.

  1. We should use if A[mid] >= target, hi = mid because it discards other occurrences of target to the right, while keeping A[mid]. (Given target exists)
  2. hi = mid works with lo = mid + 1.
  3. Since lo = mid + 1 and floor division is used, hi - lo is strictly decreasing in each pass. The algorithm will always terminate.
lo = 0
hi = N-1
while (lo < hi):
    mid = (lo+hi)//2
    if A[mid] >= target:
        hi = mid
    else:
        lo = mid + 1
return lo if A[lo] == target else -1

Find last occurrence

  1. if A[mid] <= target, lo = mid guarantees at least one target is kept while discarding everything to the left. (Given target exists)
  2. lo = mid works with hi = mid - 1.
  3. Here’s the tricky part: to ensure that hi - lo decreases in each pass, the floor truncation should have bias towards the higher index. i.e. mid = (lo+hi+1)//2. Without this tweak, there will be dead loop.
lo = 0
hi = N-1
while (lo < hi):
    mid = (lo+hi+1)//2
    if A[mid] <= target:
        lo = mid
    else:
        hi = mid - 1
return lo if A[lo] == target else -1

The algorithm will terminate with lo == hi and A[lo] is the first occurrence if it exists.

Conclusion

The keys to quickly writing a correct algorithm for a binary search variant:

  1. The condition containing < should set lo and > should set hi.
  2. The condition containing == should guarantee that probable solutions are not discarded. Under this condition, the lo or hi will be set to mid and there will be no -1 or +1.
  3. To determine the condition of the test, imagine the input is a list of equal elements. If A[mid] <= target and set lo = mid, it discards some target occurrences to the left, therefore this is for finding last occurrence. The reasoning for finding first occurrence is similar.
  4. In case of lo = mid, we need to avoid dead loop by adding 1 before the floor division, i.e. mid = (lo+hi+1)//2.

Here’s a leetcode problem which requires finding both first and last occurrences: 34. Find First and Last Position of Element in Sorted Array

Tags// , ,