Posts

Showing posts from February, 2012

Code unwinding - performance is far away

Image
Code unwinding – don’t get too excited As a general idea, many compilers (and human beings) rely on code unwinding (aka unrolling), so that their loops (while, for and other cycles) might be a little bit faster than in a standard loop. A very simple code unwinding would be: From for(int i = 0; i < 5; i++){ j+= arr[i]; } To { j+= arr[0]; j+= arr[1]; j+= arr[2]; j+= arr[3]; j+= arr[4]; } The idea is to reduce the jumps and branches from the code. For instance, the .NET framework String.Compare uses loop unwinding when checking long strings, making it a little bit faster (at least in theory). The problem is that if your unrolled code is too long and still contains some branches (if-else), the performance loss due to mispredicted branches and cache memory waste will be definitely bigger than the expected gain. Sorting  So, looking at the loop unwinding examples I had an idea: why not create a ‘supersort’ which does not use recursion or cycles at all! Based on the va

Quicksort, binary search and linear search performance - far from what you believe

Image
When to sort then binary search and when to do linear search? We can all agree that binary search is faster than linear search – at least in general and if the searched array size is large enough. This means that everyone who learned the following equation will prefer to use binary search to linear search: log(N) < N/2 And here is a good example when a nice education background may distort your decision: there are ample cases when in practice the above does not hold true! In theory To compare linear search to binary search, it makes sense to compare the sorting then searching to simply linear search. Let’s assume number of searches on an N sized array is K. In this case the following equations should be compared: To get K results from an array:   With sort and binary search time: N * log(N) + K * log(N)   With linear search time: K * (N/2) Looking at the above the sort/binary search is definitely not always smaller than linear search. Generally speaking, if you

Why AutoResetEvent is slow and how to synchronize threads in .NET

Image
All of the modern processors are multi-cored even the low-end embedded ones. This means that software have to be written in a different manner to take advantage of the currently underutilized extra cores. The problem is generally very simple: I have one book but two people are trying to read/write from it – how to solve the problem so the overall performance is faster than with a single guy (many processors, one shared memory). There are many different ways to synchronize these workers and some of them are really efficient and others are just terribly slow. Nehalem The good news is, that from the first generation of i7 processors (Nehalem) the processor level synchronization primitives became really fast – Intel did a good job, so it’s on average 60% faster than it used to be on the P4 processors. These synchronization primitives are XCHG and CMPXCHG commands: the first one replaces a value with another one in a single step, the second one replaces the value if it equa