Code unwinding - performance is far away
As a general idea, many compilers (and human beings) rely on code unwinding (aka unrolling), so that their loops (while, for and other cycles) might be a little bit faster than in a standard loop.
A very simple code unwinding would be:
From
for(int i = 0; i < 5; i++){ j+= arr[i]; }
To
{ j+= arr[0]; j+= arr[1]; j+= arr[2]; j+= arr[3]; j+= arr[4]; }
The idea is to reduce the jumps and branches from the code. For instance, the .NET framework String.Compare uses loop unwinding when checking long strings, making it a little bit faster (at least in theory).
The problem is that if your unrolled code is too long and still contains some branches (if-else), the performance loss due to mispredicted branches and cache memory waste will be definitely bigger than the expected gain.
Sorting
So, looking at the loop unwinding examples I had an idea: why not create a ‘supersort’ which does not use recursion or cycles at all! Based on the values of a smaller array it just puts everything in place. In theory it would start with larger arrays using the generic QuickSort then switch back to the IF-ELSE based sort to finish up – no recursion, so in theory it could be really fast.
To understand the concept, have a look at the following example where the sorting is ‘unrolled’ into simple IF-ELSE branches to sort three variables (a,b,c):
It is great, no cycle, no recursion, and simple branches so we are winning – at least this is what I thought.
The problem is, that for 4, 5 or even bigger sorts, the amount of branches are too high, so I wrote a generator that accepts the size as a parameter and generates the sorting code for you.
Factorial is not your friend
As everyone who has ever dealt with algorithms knows: factorial is definitely not your friend. In sorting however, the number of possible item-orders is factorial.
In the example above, 3! = 6 possible combinations.
With 4 items, it’s 24 branches.
With 5 items, it’s 120 branches.
With only 6 items it’s 720 already.
Generating code to sort just 6 items based on IF-ELSE in theory would require more than 1000 lines of source code. Bad, but not as bad as in reality: because of the massive IF-ELSE-IF-ELSE… blocks, the generated code will be more than 200.000 lines of C# mass.
Before I got into code generation, it would have been wise to use the calculator: thousand lines of code just cannot be faster than the couple lines of quicksort – especially if it’s 200.000.
The result
The ‘SuperSort’ interestingly beats the recursive quicksort in a single case, when only 3 items are in the array. Let’s see the comparison:
Supersort | Quicksort relative time (<1 means QS is faster) |
Supersort LOC | Average LOC/branch |
---|---|---|---|
3 items | 1.128 | 85 | 14 |
4 items | 0.927 | 556 | 23 |
5 items | 0.501 | 7 781 | 64 |
6 items | ∞ | 233 710 | 325 |
An interesting side result is that a single function with 233.710 lines of code just cannot be executed on the .NET platform – it never ran, not even once.
Please note that the generated code was not optimal, it did not rely on the transitiveness of the ‘<=’ operator (instead of checking a < b, b < c, it checked a < b, a < c, b < c). However, apart from the sorted case, it did not really cause performance issues, as an unsorted array needs many more comparisons to decide which element belongs where.
A sample of generated code (sort3):
public static void Sort3<T>(IList<T> arr, int from, int to) where T : IComparable { T a = arr[from + 0]; T b = arr[from + 1]; T c = arr[from + 2]; if (a.CompareTo(b) <= 0) { if (a.CompareTo(c) <= 0) { if (b.CompareTo(c) <= 0) { //abc } else { //acb arr[from + 1] = c; arr[from + 2] = b; } } else { if (b.CompareTo(c) <= 0) { // invalid branch } else { //cab arr[from + 0] = c; arr[from + 1] = a; arr[from + 2] = b; } } } else { if (a.CompareTo(c) <= 0) { if (b.CompareTo(c) <= 0) { //bac arr[from + 0] = b; arr[from + 1] = a; } else { // invalid branch } } else { if (b.CompareTo(c) <= 0) { //bca arr[from + 0] = b; arr[from + 1] = c; arr[from + 2] = a; } else { //cba arr[from + 0] = c; arr[from + 2] = a; } } } }
Conclusion
Code unwinding might give you some performance benefit on a very simple cycle with only a couple of non-branched operations but in real life you are better off with even using recursion.
Comments
Post a Comment