Wednesday, July 2, 2014

Taming Zookeeper - introduction and “how to” for Java and Python

Zookeeper is one of the most versatile projects out there but still somewhat underutilized and feared; however, almost every software project could benefit from using it as a distributed, fault tolerant, hierarchical data storage.

What really is Zookeeper?

Zookeeper is a free and open source, distributed, fault tolerant, hierarchical value storage - pretty much like a network share that stores small files in directories and subdirectories. These “files”, which are called nodes in Zookeeper, are small, so the platform is not really a BLOB storage (actually the size of a node is limited to 1mb).

Storing and reading these nodes are extremely fast, so any information can be stored in Zookeeper without seconds thoughts – active keys in a cache, currently active sessions, list of active servers, and so on.

A Zookeeper cluster can consist of multiple servers and there is no predefined master in this topology – the platform will make sure that the storage and retrieval of the nodes are synchronized properly.

Moreover, Zookeeper guarantees consistency, which means that it doesn’t matter which Zookeeper server the client is talking to, the returned values are always consistent. However, it’s good to keep in mind that consistency is reached on the order of tens of seconds scale - usually it’s much faster but Zookeeper doesn’t consider the service to be down within that time period.

Cross platform

The core of the Zookeeper server is written in pure Java and consists only of 190 classes, and about 42k lines of code. It is reasonably small for such a system and as the code is nicely written it is far from hard to read it and understand the internals of the system.

Zookeeper uses a protocol name “Jute” to communicate with the clients, which is somewhat similar to Protobuf that was designed by Google. Interestingly the Jute compiler, which can turn the zookeeper.jute definition into the Java classes, is included in the Zookeeper source code too.

The protocol is quite simple, so even if there is no direct compilation to other platforms, sending the required data across the wire should be straightforward on most systems. For instance, the Python Kazoo implementation simply recreated the required bindings manually.

Ephemeral Nodes

One of the most interesting features of Zookeeper is called Ephemeral Node. While permanent nodes are stored forever, an Ephemeral Node will be immediately deleted when the client who created it disconnects from the server. It is one of the best ways to keep track of actual “live” data, like which services or servers are up and running, who are online on our site, or who joined/left a chat room.

Storage and cleanup

Zookeeper stores the data on disk, so even if the server is restarted, all nodes will available again. As the system is more optimized for performance than storage, cleaning old and deleted data is not part of the Zookeeper server’s normal job.

Earlier versions required a scheduled job to be run to purge old data, newer versions can do it periodically, but not enabled by default. To enable purging the old data from log files, add/uncomment the following in the zoo.cfg file:

# keeps the 3 most recent snapshots of the data
autopurge.snapRetainCount=3
# runs every 1 hour
autopurge.purgeInterval=1 

Watching data changes

Zookeeper can notify the clients if a watched node has been changed, but every watcher fire only once. If we need to keep watching the changes, we need to create a new watcher; however, it is important to keep in mind that while we will receive the latest change to the node, we may not receive all intermittent changes when we are re-creating the watch, or temporarily get disconnected from the server.

Starting the server

For development purposes it’s really easy to start using Zookeeper: after downloading the binaries, simply run
./bin/zkServer.sh start-foreground
In production mode it might be a good idea to use it as a service or run though supervisord to make sure it is restarted if it crashes for any reason.

Java client

The Curator framework provides high-level access for accessing the Zookeeper server. Even though the protocol is simple so it could be quickly re-implemented, the frameworks typically give extra features like automatic reconnection to the server with a predefined exponential back off time.

To use the Curator framework, simply add the following dependency to the Maven pom.xml file:

<dependency>
  <groupId>org.apache.curator</groupId>
  <artifactId>curator-framework</artifactId>
  <version>2.5.0</version>
</dependency>

To read string data from an existing node, only the following is required:

final RetryPolicy retryPolicy =
        new ExponentialBackoffRetry(BASE_SLEEP_TIME_MS, MAX_RETRIES);
final CuratorFramework client =
        CuratorFrameworkFactory.newClient(CONNECTION_STRING, retryPolicy);
client.start();

final GetDataBuilder getDataBuilder = client.getData();
final byte[] configBytes = getDataBuilder.forPath(CONFIG_PATH);
final String config = new String(configBytes);

System.out.println("Data: " + config);

For a full fledged demo with watchers, deletion, and Zookeeper ACLs, check out the java source code here.

Python client

The easiest way to access Zookeeper is to use the Kazoo client library, which is written purely in Python. It can be installed system wide or just used as a module in a solution and deployed as part of the application.

The steps are very similar to the Java client, so we need to create and start a connection and read the data we are looking for:
from kazoo.client import KazooClient
zk = KazooClient(hosts='127.0.0.1:2181')
zk.start()

print('Data in node: %s' % zk.get('/rabbit/config')[0])
As Kazoo is a high level framework, we do not need to recreate the watcher after every change, it will stay active.

For the full source code with an embedded Kazoo client, check out the python source code here.

Performance

The Zookeeper server are surprisingly fast, reaching above 20k operations / second on a singe server setup with 2 cores and 10 simulated clients. The average latency is typically less than 1 millisecond so in most circumstances Zookeeper wont’ be the bottleneck in the system.

For a detailed performance review of the system under different configurations (cores and machines), check out this Zookeeper performance article.

Conclusion

Even though Zookeeper is usually used as part of a larger software package like Apache Hadoop or Apache Storm, it is a great standalone product.

However, unfortunately it is a little bit mystified with articles going in great depth of the election algorithm of the server nodes, which doesn’t really help to understand how to and when to use Zookeeper.

Storing simple hierarchical information in a fault tolerant, distributed fashion with the live tracking capability of ephemeral nodes make Zookeeper a really handy tool across a lot of different software projects. Give it a go, get Zookeeper binaries!

1 comment:

  1. Hi Adam, if I want to automate server additions and dynamically add them to Zookeeper's ensemble, then measure the time it takes for all servers to recognize each other, is Curator or Kazoo the right tools to use?

    ReplyDelete