Introsort algorithm in .NET 4.5 – internal implementation details

Even though introspective sort was invented in 1997 (by David Musser) and was taken on  by many frameworks no too long after, Microsoft decided to add it to the .NET platform only in 2012. However, it worth having a look at the implementation, it’s quite smartly written.

Introsort

Currently the fastest sorting algorithm is quicksort but it’s got a couple of edge cases when it performs really poorly – in O(n^2) time instead of the O(n*log(n)) typical runtime. Even though a smart pivot selection makes it practically impossible to have an O(n^2) runtime, “evil” input can force the sorting to fall into this, making the running time much longer than expected.

Musser had two ideas to speed up the typical quicksort algorithm:

  • If the algorithm on any sub-array runs longer than expected (recursion is deeper than log(array size)) then let’s switch to heapsort which is guaranteed to finish in O(n*log(n)) time
  • If the number of elements is small enough don’t use quicksort, simply do an insertion sort which is faster on small data sets 

These two optimizations allow introsort to outperform the “dumb” quicksort implementation on most arrays (even ~200 times on artificial “evil” arrays consisting of 100.000 elements).

Microsoft implementation

Interestingly Microsoft decided not to use the fully optimized introsort in every case: the framework checks the current runtime version which has to be at least .NET 4.5 desktop (full). This means that for some reason on mobile devices and older platforms the framework will use a “limited depth quicksort” but will not optimize the sorting of small arrays. My guess is that Microsoft wanted to show a huge performance gain with 4.5 but from engineering perspective it’s a bit hard to sell.

Let’s have a look at the pseudo code of the fully optimized implementation:

Introsort()
{
    while(subarray)
    {
        if (length <= 16)
        {
            if (length == 1)
            {
                already sorted, return
            }
            if (length == 2)
            {
                SwapIfGreater left, right then return
            }
            if (length == 3)
            {
                SwapIfGreater left, middle
                SwapIfGreater left, right
                SwapIfGreater middle, right
                return
            }
            InsertionSort(subarray)
            return
        }
        else
        {
            if (depthLimit == 0)
            {
                Heapsort(subarray)
                return
            }
            depthLimit--;
            pivot = QuickSortPartition(subarray)
            recurse IntroSort(right of pivot)
            Set subarray to left of pivot
        }
    }
}

There is a couple of interesting parts to look at. The first is the way the algorithm sorts the 2 and 3 size arrays. The 2 is pretty trivial (swap is not in order), but the 3 is bit more hidden. The first list makes sure that [0] <= [1] holds. The second makes sure [0] <= [2] is true while the third makes sure [1] <= [2] holds as well, ultimately sorting the array.

The other key thing is the way the recursion is implemented. Unlike the textbook based implementations which use recurse(left), recurse(right), this uses a single recursion. The reason is that before it recurses to the left side, it makes sure the sub array is still larger than 16 elements by letting the upper part of the code run on it again. Please note that this is not a tail recursion because there is still executable code after the recursion so the compiler cannot optimize it.

Heapsort

Finally it’s really worth to have a look at the heapsort implementation as it’s beautifully written:

Heapsort()
{
    for middle down to 1
    {
        DownHeap element
    }

    for end down to 0
    {
        Swap first element with current
        DownHeap fist element
    }
}

DownHeap()
{
    while
    {
        int parentPos = elementPos * 2
        if arr[parentPos + 1] > arr[parentPos]
        {
            parentPos = parentPos + 1
        }

        if arr[parentPos] > element
        {
            break
        }
        Swap arr[elementPos], arr[parentPos]
        elementPos = parentPos
    }
    arr[parentPos] = element
}


The DownHeap is a standard heap correcting implementation, pushing the newly inserted item down as long as its parent node is larger than itself – as a side effect, it will build a max heap, meaning the first element in the array will be the largest element and its children are always smaller or equal to itself ([0] >= [1], [0] >= [2] but there is no known between [1] and [2]).

However, the HeapSort implementation is very concise: the first loop from middle to first element will heapify the array, meaning the array will store a max heap where the two children of an element can be found in arr[pos*2] and arr[pos*2+1] position with the properties described above.

The second loop relies on this property: it will keep looping from the end and keep swapping the current item with the first in the array. As we know it’s a max heap, so arr[0] will always be the largest element. Whatever is put on arr[0] from the last position will be shifted down to its proper position by the DownHeap call, maintaining a max heap all the way through.

It means that the arr[0..pos] subarray always will be a max heap while arr[pos+1..end] will be a sorted subarray with the largest element at the end.

Conclusion

Since 2012 Microsoft slightly changed the default sorting algorithm on the .NET platform that might give better performance even on average arrays. However, the biggest change is more around artificially crafted “evil” arrays that previously could have caused O(n^2) run time: the new implementation is guaranteed to finish in O(n*log(n)) time. To have a look at the actual implementation, you can use ILSpy.

Comments

  1. u have mentined, "unlike the text book...". can u tell me which text book that was?

    ReplyDelete

Post a Comment

Popular posts from this blog

MurMurHash3, an ultra fast hash algorithm for C# / .NET

Quick select algorithm - find the Kth element in a list in linear time

Octoprint as a systemd service - running it with high priority