Dutch national flag problem - performance in Python and C (two pivot partition, three way partition)

One of the typical interview questions is the three way partitioning, also known as the Dutch national flag problem: given an array with three different values, sort it in a way that all values are grouped together (like a three colored flag) in linear time without extra memory. The problem was first described by Edsger Dijkstra.

As it's a typical sorting problem, any sorting would do but they would need n*log(n) time. There are two good solutions to it: do a counting sort which requires two passes over the array; place the elements to their correct location using two pointers.

The latter uses a reader, lower, and an upper pointer to do the sorting. Reader and lower starts from one end of the array, upper pointer starts at the other end. The algorithms goes like this:

- If it's a small value, swap it with the lower pointer and step lower pointer one up
- If it's a middle value, do nothing with it and step reader pointer one up
- If it's a larger value, swap it with the upper pointer and step upper pointer one down

After the reader reaches the upper pointer, we are done: all smaller values have been swapped to the lower end of the array, all larger values pushed to the upper end so middle values necessarily occupy the middle of the array.

Two pivot partitioning

Java 7 introduced an idea with it's traditional quicksort implementation: it uses the above method to create three partitions instead of the two. The new partitioning phase is still linear time but provides three subproblems to solve instead of the two like the traditional quicksort. This little modification in practise is usually faster than the two partitioning quicksort.

When we don't have predefined values like the flag problem above, we can simply take two random values (a, b) from the array and sort it like this:

  <a   a-b   b<

I've created two simple implementation in Python and C to compare them:

Python implementation C implementation
def partition(array):
  if array is None or
   len(array) == 0:
    raise TypeError

  small = array[0]
  large = array[len(array)-1]

  if small > large:
    small, large = large, small

  wr = len(array) - 1
  wl = 0
  reader = 0

  while reader <= wr:
    v = array[reader]

    if v < small:
      array[wl], array[reader] =
       v, array[wl]
      wl += 1
    elif v > large:
      array[wr], array[reader] =
       v, array[wr]
      wr -= 1
    else:
      reader += 1

  return (wl, wr)
int partition(int array[], int n, 
  int* left, int* right)
{
  if(!array || !n)
  {
    return 0;
  }

  int small = array[0];
  int large = array[n-1];
  if(small > large)
  {
    int tmp = large;
    large = small;
    small = tmp;
  }

  int wr = n-1;
  int wl = 0;

  for(int reader = 0; reader <= wr; )
  {
    int v = array[reader];

    if(v < small)
    {
      array[reader] = array[wl];
      array[wl] = v;
      wl++;
    }
    else if(v > large)
    {
      array[reader] = array[wr];
      array[wr] = v;
      wr--;
    }
    else
    {
      reader++;
    }
  }
  *left = wl;
  *right = wr;

  return 1;
}
Runner code for PythonRunner code for C
start = time.time()

for j in range(1000 * 1000):
  arr = [0] * 10
  for i in range(10):
    # much slower, ~20 sec        
    # arr[i] = random.randint(0,10)
    arr[i] = int(random.random()*10) 
    pass
  l, r = partition(arr)

end = time.time()
int n = 10;
int n = 10;
int arr[n];
srand(time(NULL));

int start = clock();
for(int j = 0; j < 1000 * 1000; j++)
{
    for(int i = 0; i < n; i++)
    {
        arr[i] = rand() % n;
    }
    int l, r;
    partition(arr, n, &l, &r);
}
int end = clock();
cout << ((float)end - start) / 
  CLOCKS_PER_SEC;
// Compiled with 
// gcc threads.cc -lstdc++ -std=c++11


Both implementations are readable but Python seems much less "noisy", easier to read, so I really wanted it to be as good and fast as the C implementation. To my disappointment, the Python implementation is about 50 times slower than the C counterpart (and even slower if randint() is used instead of random()). This is a bit surprising as only very basic array arithmetic is involved so the slowdown is somewhat unjustified.

Performance of the Python and C implementation of the two pivot partition, ran 10^6 times on a random integer array with 10 elements.



Comments

Popular posts from this blog

MurMurHash3, an ultra fast hash algorithm for C# / .NET

ESP32 - send a push notification from the Arduino ESP32 device to your phone

Octoprint as a systemd service - running it with high priority