Posts

Showing posts from 2014

Taming Zookeeper - introduction and “how to” for Java and Python

Image
Zookeeper is one of the most versatile projects out there but still somewhat underutilized and feared; however, almost every software project could benefit from using it as a distributed, fault tolerant, hierarchical data storage. What really is Zookeeper? Zookeeper is a free and open source, distributed, fault tolerant, hierarchical value storage - pretty much like a network share that stores small files in directories and subdirectories. These “files”, which are called nodes in Zookeeper, are small, so the platform is not really a BLOB storage (actually the size of a node is limited to 1mb). Storing and reading these nodes are extremely fast, so any information can be stored in Zookeeper without seconds thoughts – active keys in a cache, currently active sessions, list of active servers, and so on. A Zookeeper cluster can consist of multiple servers and there is no predefined master in this topology – the platform will make sure that the storage and retrieval of the nodes

Beating the binary search algorithm – interpolation search, galloping search

Image
Binary search is one of the simplest yet most efficient algorithms out there for looking up data in sorted arrays. The question is, can it be beaten by more sophisticated methods? Let’s have a look at the alternatives. In some cases hashing the whole dataset is not feasible or the search needs to find the location as well as the data itself. In these cases the O(1) runtime cannot be achieved with hash tables, but O(log(n)) worst case runtime is generally available on sorted arrays with different divide and conquer approaches. Before jumping to conclusions, it is worth to note that there are a lot of ways to “beat” an algorithm: required space, required runtime, and required accesses to the underlying data structure can be different priorities. For the following runtime and comparison tests different random arrays between 10,000 and 81,920,000 items were created with 4 byte integer elements. The keys were evenly distributed with an average increment of 5 between them. Binary se

Java 8 parallel sort - internals

Image
Java 8 is a quite big update to the platform and luckily they focused on multicore systems too. The new Arrays.parallelSort() implementation is quite interesting: it is using quicksort, merge sort, and Timsort to achieve the best performance, depending on the underlying data type and size. Arrays.parallelSort() – not always parallel The first thing that needs to be noted is that it’s not always a parallel sort. If the array size is small enough (<= 8192 elements) or the number of reported logical CPUs is 1, it always falls back to a dual pivot quick sort. Note, that the “logical CPU” here refers to hyper-threading, so an average i7 dual core CPU will report 4 logical CPUs to the Java environment. To override the number of reported CPUs for the sort algorithm, we simply need to pass in any value to the following system property when starting the JVM: java.util.concurrent.ForkJoinPool.common.parallelism Sorting the chunks and chunk sizes The algorithm ( Arrays.java ) wi

Git cheat sheet - the most useful commands

After a little practise, Git can be fully utilised from the command line without any GUI. Adding, committing, pulling, and pushing is part of the daily work; however, there are other commands that are useful but not that frequently used. Here is my collection of those: # Create and checkout new branch based on the current branch git checkout -b <newBranch> # Undo all local pending changes git reset --hard HEAD # Erase the last 3 commits git reset --hard HEAD~3 # Undo a single file change git checkout -- <fileName> # Push new branch to server git push origin <branchName> # Delete remote branch git push origin --delete <branchName> # List all local and remote branches git branch -a # Remove remotely deleted branches - will not delete local branches, only copies of remotes git remote prune origin # Pull remote master branch and merge it into current branch git pull origin master # Undo last commit but keep changes unstaged git reset H