Searching through data is painful, whether it’s in a database, a spreadsheet, or even on a piece of paper. Even in code, searching data tends to be a fairly slow process, at least when compared to other programmatic actions you might be carrying out.
The classic method of course is with a loop. To find data with a loop, we simply look through every element until we find what we’re looking for. This sounds great in principle, and is probably similar to how we’d look through a list of data in real life, but it’s not terribly efficient. However, if the data is randomly inserted and unordered, there’s not much we can do about this.
If on the other hand, the data is sorted, this opens us up for some other options, the primary of which is a binary search. We can think of a binary search as a way of chopping the data to be searched in half, until we reach the answer. To continue the real life example; imagine you have a list of 1,000 forenames in alphabetical order, and you’re looking for the name John. Rather than go through every individual name looking for John, what if we instead looked at entry 500 first? Well, if entry 500 was Lucy, then we’d know our answer lies in the first 500 entries, and so we can discard entries 500–1,000. We’ve just discarded 500 entries in a single check, pretty efficient right? So now all we do is repeat this process, until we are eventually left with just 1 entry.
To give a more practical example, consider this list of 10 names:
Let’s now try and search for Nancy. First off we’ll check entry 5 (Lucy), and see that our name comes after that, so we’ll discard the first half of the list and be left with the following:
Now let’s check the middle point again; Terry. Well we know Nancy comes before Terry, so we’ll discard the latter section, leaving us with:
This time when we check the middle value, we get a match! We’ve found the answer with just 3 checks, instead of the 7 it would have taken for a conventional loop.
More importantly, this approach scales. If we have a list of 10 entries and we’re looking for a value, we must do up to 10 checks. If we apply that same algorithm to 100,000,000 entries, we must do up to 100,000,000 checks. If instead we utilise a binary search, we will only need to do around 27 checks, depending on the target and the exact approach we use. That’s a pretty significant saving!
Let’s look at some of this in code. We’ll look at an ordered array of 10 numbers
[1, 3, 4, 7, 8, 12, 16, 17, 18, 20], and search for a target number of
16. To achieve this, we’ll use the following binary search implementation:
First off we’ll establish our middle index of 5, which gives us a value in the above array of 12. We then compare that with the target and realise that the number we’re looking for is higher. So, we discard the first half of the data by moving the left-cursor to the middle point plus 1 (as we know the value at the middle point is not the target, having just checked it). This then reduces the array values we’re checking to
[16, 17, 18, 20].
Now we’ll establish a new middle index of 2, which gives us a value in the new array of 18. We compare this with our target of 12 and see that it’s higher than our target, so we discard the second half of the new array, leaving us with
Next we pick a new middle index of 1, which gives us a value of 17, and see that this is still above our target value. So we chop off the right-side of the latest array and leave ourselves with
, which is of course our answer.
It’s worth pointing out that the above implementation is just one, fairly classic implementation of a binary search. There are additional minor tweaks that can be done, such as using the full length of the array, or having a
left <= right check rather than
left < right. Some of these aid readability and personal understanding, others give very different results, but most follow the same basic concept and thus give the same performance.
Where you are most likely to need to make these sorts of changes is when what you’re looking for is a little more complicated, such as when you need to find not just the first occurrence of a value, but the last occurrence of it, and thus need to do a right-biased search. Or maybe your data isn’t ordered quite the way you expect, and so you need to account for that. In all cases, the fundamentals of a binary search remain the same, but the way in which you traverse the provided values may need to change slightly.
Finally I’d also like to mention a recursive form of binary searching. Once again the principles remain the same, but instead of a while loop running after we shrink the window of inspection (by moving left and right pointers closer together), we simply re-call the function with the smaller window. Personally, I prefer the iterative approach, but I will include it here for completeness: