Sorting algorithms 101
In the world of programming, there are many ways to sort data. For the average engineer, you might just use .sort()
, perhaps with a slightly modified sorting function to return your data in a specific way. But that’s probably about the most you do.
If however you’ve ever taken an algorithms class, used LeetCode, or just dived down a little deeper into sorting performance of a platform, you’ve probably heard of sorting algorithms.
Now, make no mistake, your favourite language’s .sort()
(or equivalent) already uses one of these algorithms. In fact, if we’re being honest, it probably uses it in a way that’s faster than what you’re going to write anyway. But that doesn’t mean you won’t occasionally come up against a situation where writing a custom-sorting algorithm wouldn’t benefit you.
So, in this article, I’m going to give a quick overview of the most common sorting-algorithms out there. Over time, I’ll be posting more implementation articles for sorting, and will add links to them here.
Stable vs unstable sorting
Before we dive in, a quick note on stable and unstable sorting. In the sorting world, we’ll often come up against multiple instances of the same value. In the array [1, 5, 7, 5, 3, 5]
for example, we’ll encounter 5
three times, and so our algorithm may be doing wasted work on comparing such values together and (potentially) rearranging them.
In a stable sort, our duplicate values will remain in their original order, with no time spent rearranging them for no reason.
In an unstable sort, our duplicate values will be rearranged, with time having been spent on this during the sort.
Below is an example of this, with letters added to the duplicate values to indicate the order pre and post-sort (these letters would not be in the real data and would thus have no bearing on the actual sort):
Stable sort: [1, 5a, 7, 5b, 3, 5c]
→ [1, 3, 5a, 5b, 5c, 7]
Unstable sort: [1, 5a, 7, 5b, 3, 5c]
→ [1, 3, 5c, 5a, 5b, 7]
1. Quick sort
This algorithm relies on a simple premise; we sort a single value in an array of data, and then another one, and another one, until we sort the entire array. Each time we do this, we mark the now sorted point as a “pivot”, move all numbers lower than it to its left, move all numbers higher than it to its right, and then repeat the algorithm on either side of that point.
Quick sorts are great if you want to do an in-memory sort, as we are constantly swapping existing data, but picking the “right” pivot point can be the difference between a fast and slow sort. In addition, this algorithm is an unstable one.
This algorithm has a typical time complexity of O(n log n), but a worst-case of O(n²). The space complexity however is just O(log n).
Note: You can see my post on quick-sort implementations here.
2. Merge sort
Here we split our to-be-sorted data into 2, right down the middle. Then, we run each half through the merge-sorting algorithm again, where it will be split recursively until we have just 2 pieces of data (1 on each side). From there, we re-combine each side in sorted order, until we eventually return to the top level with a sorted dataset.
This algorithm is fast, but as it involves re-creating data and lots of moving around, its perhaps not as fast as some others. It is however highly customisable and a personal favourite of mine. Oh, and it’s a stable sort, too!
Merge sorts have a time complexity of O(n log n), but a space complexity of O(n).
Note: You can see my post on merge-sort implementations here.
3. Bucket sort
In this algorithm, we divide our dataset into ordered buckets, sort each bucket, and then concatenate the buckets to give us a fully sorted set. This can make for a very elegant solution, especially if we pick the right number of buckets, however just like with quick-sort, picking the wrong number of buckets can slow things down significantly.
It’s also worth mentioning that once we’ve split our data into buckets, we still have to sort each bucket. This may be done with a recursive bucket sort, or we may combine bucket sort with another algorithm (common implementations seem to involve combining bucket and insertion sorts).
Bucket sorts can be fast if we pick the right number of buckets, and are also very easy to understand and implement. In addition, they are a stable sorting method.
Bucket sorts have a typical time complexity of O(n + k) (where k is the number of buckets), but a worst case of O(n²). The space complexity if O(n*k) in the worst case.
Note: GeeksforGeeks does a great 3 minute video on bucket sort here.
4. Bubble sort
In this sorting algorithm, we are simply comparing values with their immediate neighbours. The way we do this is that we loop through the dataset, and we compare the value at i
with the value at i + 1
. If i
is higher, than we swap the two values and increment i
. After the first run, we will have identified the biggest value in the data, and moved it all the way to the right. Now on the next run, we will identify the second biggest value, and can ignore the already correctly placed value at the end of the array.
Bubble sort is an easy and intuitive algorithm to implement. It performs the sort in-place, and is stable.
In terms of complexity, the worst case for time is O(n²), however the space complexity is a wonderful O(1).
Note: Michael Sambol does a great 2 minute video on bubble sort here.
5. Insertion sort
Much like with bubble sort, insertion sort loops through the values of an array, but this time it compares it with the number to its left. If the left-hand number is lower, then those two values are sorted (for now), and we move onto the next one. Once we find a value that is not lower than the one to its left, we loop through the values on its left until we can find (and place it in) its correct position.
Insertion sort is once again a very easy algorithm to implement, and is also a stable one.
For complexity, insertion sort has a worst-case time complexity of O(n²) (where the original data was presented in descending order and thus every value has to be swapped), but has a space complexity of O(1).
Note: Michael Sambol does a great 2 minute video on insertion sort here.
And beyond
There are a huge number of additional sorting algorithms out there, and I hope to add more to this article in the future.