Code unwinding - performance is far away

Code unwinding – don’t get too excited

As a general idea, many compilers (and human beings) rely on code unwinding (aka unrolling), so that their loops (while, for and other cycles) might be a little bit faster than in a standard loop.
A very simple code unwinding would be:


From
for(int i = 0; i < 5; i++){ j+= arr[i]; }

To
{
j+= arr[0];
j+= arr[1];
j+= arr[2];
j+= arr[3];
j+= arr[4];
}

The idea is to reduce the jumps and branches from the code. For instance, the .NET framework String.Compare uses loop unwinding when checking long strings, making it a little bit faster (at least in theory).
The problem is that if your unrolled code is too long and still contains some branches (if-else), the performance loss due to mispredicted branches and cache memory waste will be definitely bigger than the expected gain.

Sorting 

So, looking at the loop unwinding examples I had an idea: why not create a ‘supersort’ which does not use recursion or cycles at all! Based on the values of a smaller array it just puts everything in place. In theory it would start with larger arrays using the generic QuickSort then switch back to the IF-ELSE based sort to finish up – no recursion, so in theory it could be really fast.

To understand the concept, have a look at the following example where the sorting is ‘unrolled’ into simple IF-ELSE branches to sort three variables (a,b,c):




It is great, no cycle, no recursion, and simple branches so we are winning – at least this is what I thought.

The problem is, that for 4, 5 or even bigger sorts, the amount of branches are too high, so I wrote a generator that accepts the size as a parameter and generates the sorting code for you.

Factorial is not your friend

As everyone who has ever dealt with algorithms knows: factorial is definitely not your friend. In sorting however, the number of possible item-orders is factorial.

In the example above, 3! = 6 possible combinations.
With 4 items, it’s 24 branches.
With 5 items, it’s 120 branches.
With only 6 items it’s 720 already.

Generating code to sort just 6 items based on IF-ELSE in theory would require more than 1000 lines of source code. Bad, but not as bad as in reality: because of the massive IF-ELSE-IF-ELSE… blocks, the generated code will be more than 200.000 lines of C# mass.

Before I got into code generation, it would have been wise to use the calculator: thousand lines of code just cannot be faster than the couple lines of quicksort – especially if it’s 200.000.

The result

The ‘SuperSort’ interestingly beats the recursive quicksort in a single case, when only 3 items are in the array. Let’s see the comparison:

Supersort Quicksort relative time
(<1 means QS is faster)
Supersort LOC Average LOC/branch
3 items 1.128 85 14
4 items 0.927 556 23
5 items 0.501 7 781 64
6 items 233 710 325

An interesting side result is that a single function with 233.710 lines of code just cannot be executed on the .NET platform – it never ran, not even once.
Please note that the generated code was not optimal, it did not rely on the transitiveness of the ‘<=’ operator (instead of checking a < b, b < c, it checked a < b, a < c, b < c). However, apart from the sorted case, it did not really cause performance issues, as an unsorted array needs many more comparisons to decide which element belongs where.

A sample of generated code (sort3):

public static void Sort3<T>(IList<T> arr, int from, int to) where T : IComparable
{
T a = arr[from + 0];
T b = arr[from + 1];
T c = arr[from + 2];
if (a.CompareTo(b) <= 0)
{
    if (a.CompareTo(c) <= 0)
    {
        if (b.CompareTo(c) <= 0)
        {
            //abc

        }
        else
        {
            //acb
            arr[from + 1] = c;
            arr[from + 2] = b;

        }
    }
    else
    {
        if (b.CompareTo(c) <= 0)
        {
            // invalid branch
        }
        else
        {
            //cab
            arr[from + 0] = c;
            arr[from + 1] = a;
            arr[from + 2] = b;

        }
    }
}
else
{
    if (a.CompareTo(c) <= 0)
    {
        if (b.CompareTo(c) <= 0)
        {
            //bac
            arr[from + 0] = b;
            arr[from + 1] = a;

        }
        else
        {
            // invalid branch
        }
    }
    else
    {
        if (b.CompareTo(c) <= 0)
        {
            //bca
            arr[from + 0] = b;
            arr[from + 1] = c;
            arr[from + 2] = a;

        }
        else
        {
            //cba
            arr[from + 0] = c;
            arr[from + 2] = a;

        }
    }
}
}

Conclusion

Code unwinding might give you some performance benefit on a very simple cycle with only a couple of non-branched operations but in real life you are better off with even using recursion.

Comments

Popular posts from this blog

MurMurHash3, an ultra fast hash algorithm for C# / .NET

ESP32 - send a push notification from the Arduino ESP32 device to your phone

Octoprint as a systemd service - running it with high priority