Managing configuration of a distributed system with Apache ZooKeeper
One of the steps towards building a successful distributed software system is establishing effective configuration management. It is a complex engineering process responsible for planning, identifying, tracking and verifying changes to the software and its configuration, as well as maintaining configuration integrity throughout the life cycle of the system.
Let's consider how to store and manage configuration settings of the entire system and its components using Apache ZooKeeper, a high-performance coordination service for distributed applications.
Configuration of a distributed system
In this article we will describe a concept of configuration settings management for the following types of distributed systems or their combination:
- a server application in a cluster (several instances of the same application are deployed to a clustered environment for load-balancing and/or high-availability support);
- a set of services with different functionality that communicate with each other via a common protocol and form a custom software platform.
Generally, configuration items could be arranged by scope in the following groups:
- global, which are the same for the entire system in any sub-configuration (system name, company website URL, etc.)
- environment-specific, which may differ between environments: development, test, production (security settings, database server URLs, backup settings, etc.)
- service-specific, which hold settings related to the functionality of a service (database constants, timeouts, links to external resources, etc.)
- instance-specific, which usually identify a specific instance in a cluster (host, role in the ensemble, recovery options and so on)
However, the groups described above do not have clear boundaries. Where a particular setting belongs depends on the system's architecture, size and complexity.
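The four scopes can be thought of as layers that are combined into the effective configuration of one instance. The following is a minimal sketch, assuming that narrower scopes override broader ones (global, then environment, then service, then instance); the object name, keys and values are hypothetical and serve only as an illustration.

```scala
// Hypothetical model of the four configuration scopes.
// Assumption: a narrower scope overrides a broader one on key collisions.
object ConfigLayers {
  type Settings = Map[String, String]

  // `++` keeps the right-hand value on duplicate keys, so the order of
  // operands defines precedence: global < environment < service < instance.
  def resolve(global: Settings,
              environment: Settings,
              service: Settings,
              instance: Settings): Settings =
    global ++ environment ++ service ++ instance
}

object ConfigLayersDemo extends App {
  val effective = ConfigLayers.resolve(
    global      = Map("system.name" -> "demo-platform"),
    environment = Map("db.url" -> "jdbc:postgresql://test-db/demo",
                      "request.timeout.ms" -> "5000"),
    service     = Map("request.timeout.ms" -> "1500"),
    instance    = Map("host" -> "node-1")
  )

  // Print the merged settings in a stable order.
  effective.toSeq.sortBy(_._1).foreach { case (k, v) => println(s"$k = $v") }
}
```

In this sketch the service-level timeout replaces the environment-level one, while settings defined in only one scope pass through unchanged.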
The default way to manage settings is to use configuration files, which usually have some common and some individual sections. As the system grows in scale and complexity, the volume of unique configuration data increases. At the same time, common configuration entries are copied between components, and the risk of inconsistency across the system grows. The situation is often aggravated by the presence of several environments (development, test, production, etc.), each of which requires its own runtime configuration for both system-wide and service-specific settings.
The continuously increasing volume and variability of configuration data kept in configuration files make ensuring its integrity, scalability and security complex and resource-consuming. In this article, I'm going to show how to use Apache ZooKeeper to design centralized configuration storage for distributed systems as an alternative to file-based solutions.
Apache ZooKeeper as a centralized configuration storage
If you're new to ZooKeeper, you are strongly encouraged to read about its design concepts and architecture in the Overview section of the official documentation. If you're already familiar with ZooKeeper's fundamentals and are looking for a starting point for developing real-world applications, the Programmer's Guide is the best place to start.
As the documentation and articles about ZooKeeper explain, it works best as a coordination service for distributed applications. In our concept, ZooKeeper is used as centralized storage for configuration data. Although ZooKeeper's data model looks like a UNIX file system (a ZNode can be thought of as a "directory" that can also have data associated with it), several constraints limit its use as an ordinary file system:
- The default ZNode size limit is 1 MB. It can be increased by changing the jute.maxbuffer Java system property on all communicating ZooKeeper servers and clients. However, doing so is strongly discouraged because it may degrade performance and seriously disrupt the operation of the whole system. Also remember that the one-megabyte limit covers not only the ZNode value, but also its key and the list of its child node names.
- All data is stored in memory and replicated on each server in the ZooKeeper ensemble. This must be taken into account when planning the system. Each server machine should have enough RAM and correct JVM heap settings (at most 3/4 of total memory) to operate properly under the heaviest expected load. Swapping to disk degrades ZooKeeper performance significantly.
- Each write operation is flushed to disk. Operations that require very low latency or that write large amounts of data intensively are therefore a poor fit for ZooKeeper. Also be aware that write throughput cannot be increased by adding servers to the ensemble; on the contrary, write performance may drop slightly.
On the other hand, there are several advantages of using ZooKeeper for our use case:
- All required configuration data is kept in centralized storage, which helps to avoid data integrity issues. At the same time, ZooKeeper does not become a "single point of failure", because several instances can run as an ensemble, which survives failures as long as a majority of servers are active.
- Centralized configuration storage that is always online makes it possible to maintain dynamic configuration of the distributed system. Settings can be adjusted without restarting the system, and the system can even tune itself depending on environment metrics, for example by re-adjusting the settings of a specific component when it runs under high load.
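- Flexible access control on ZNodes. For instance, it makes it possible to restrict the access of an employee or a service to specific environments or services, to protect a configuration entry from unapproved changes by making it read-only, etc. Note that a ZooKeeper ACL applies only to the ZNode it is set on and is not inherited by child ZNodes, so each node under a protected path needs its own ACL.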
- Out-of-the-box scalable and reliable tool for synchronizing various runtime system metrics.
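Dynamic configuration, mentioned in the list above, can be sketched with Curator's NodeCache recipe, which keeps a local copy of one ZNode and notifies a listener when it changes. This is only an illustration under several assumptions: the curator-recipes module is on the classpath in addition to the Curator Framework, a ZooKeeper server is reachable at localhost:2181, and the ZNode path is hypothetical.

```scala
import org.apache.curator.framework.CuratorFrameworkFactory
import org.apache.curator.framework.recipes.cache.{NodeCache, NodeCacheListener}
import org.apache.curator.retry.ExponentialBackoffRetry

object DynamicSettingSketch extends App {
  // Assumed connection string of a locally running ZooKeeper instance.
  val client = CuratorFrameworkFactory.newClient(
    "localhost:2181", new ExponentialBackoffRetry(1000, 3))
  client.start()

  // Hypothetical configuration entry to observe.
  val path = "/system/dev/http-service/request.timeout.ms"

  val cache = new NodeCache(client, path)
  cache.getListenable.addListener(new NodeCacheListener {
    override def nodeChanged(): Unit = {
      // getCurrentData returns null when the node does not exist,
      // and the node may exist without a payload; Option covers both.
      for {
        node  <- Option(cache.getCurrentData)
        bytes <- Option(node.getData)
      } println(s"$path is now ${new String(bytes, "UTF-8")}")
    }
  })
  cache.start()

  // Keep the process alive until Enter is pressed, then release resources.
  System.in.read()
  cache.close()
  client.close()
}
```

A real service would replace the println with code that applies the new value to the running component.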
As you know, building a complex all-purpose system that can do "almost everything" often leads to worse outcomes than a set of specialized tools. Even if you're sure of the final result, the development process becomes very resource-consuming and yields a poor ROI. Considering all the pros and cons, we have to strictly define the boundaries of our system:
- Individual configuration entries should be at most a few kilobytes in size: numeric and text constants, XML/JSON configurations, etc.; large binary resources have to be avoided.
- System components should avoid time-critical reads and writes of configuration variables and constants at runtime.
- ZooKeeper does not keep a history of changes to a node, so you need separate backup and version control systems.
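The size boundary can be enforced in code before a value is ever written. The sketch below assumes a hypothetical budget of 64 KB per entry, chosen here only for illustration and well below ZooKeeper's default limit; the object and method names are likewise assumptions.

```scala
import java.nio.charset.StandardCharsets

object EntrySizeGuard {
  // Hypothetical per-entry budget in bytes (an assumption, not a ZooKeeper setting).
  val MaxEntryBytes: Int = 64 * 1024

  // Encodes the value as UTF-8 and returns the bytes if they fit the budget,
  // or a description of the violation otherwise.
  def encode(path: String, value: String): Either[String, Array[Byte]] = {
    val bytes = value.getBytes(StandardCharsets.UTF_8)
    if (bytes.length <= MaxEntryBytes) Right(bytes)
    else Left(s"$path: ${bytes.length} bytes exceeds the $MaxEntryBytes byte budget")
  }
}
```

A caller would pass the Right payload to the ZooKeeper client and reject or log the Left case, so oversized resources never reach the ensemble.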
Example
Let's build an application that shows basic examples of communicating with a ZooKeeper ensemble to coordinate configuration information. The system will consist of one running instance of Apache ZooKeeper and a simple HTTP service that uses remote configuration to initialize itself on startup and serves several dummy requests. The structure and functionality of the system can easily be extended to suit your needs thanks to the scalability of ZooKeeper-based solutions.
Structure of configuration data
The structure of the ZooKeeper storage for system configuration data resembles the file system structure of any UNIX-like operating system. In this case, we have several root paths for different kinds of initial and runtime configuration data.
The root structure of the storage for configuration constants is as follows:
/system/<environment>/
- child ZNodes in this path should store global configuration constants which are common for all services and nodes of the system for a target environment.
/system/<environment>/<service>/
- child ZNodes in this path should store service-specific configuration constants for a target environment.
The placeholders used above are:
- <environment> is the environment identifier: dev, test, prod, etc.
- <service> is the name of a service (system component).
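The path layout above can be captured in a small helper so that services never assemble paths by hand. This is a sketch with hypothetical names. It omits the trailing slash shown in the layout, because ZooKeeper does not accept paths that end with "/".

```scala
// Hypothetical helper that builds ZNode paths for the layout described above.
object ConfigPaths {
  val Root = "/system"

  // /system/<environment>: parent of the global constants of an environment.
  def global(environment: String): String = s"$Root/$environment"

  // /system/<environment>/<service>: parent of the service-specific constants.
  def service(environment: String, serviceName: String): String =
    s"${global(environment)}/$serviceName"

  // Path of a single configuration entry (a child ZNode) under either parent.
  def entry(parent: String, key: String): String = s"$parent/$key"
}
```

For example, `ConfigPaths.entry(ConfigPaths.service("prod", "http-service"), "port")` yields `/system/prod/http-service/port`.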
Technology Stack
The following software is used to build and run the example:
- Apache ZooKeeper (3.4.6) - A high-performance coordination service for distributed applications.
- Scala (2.10.4) - A general purpose programming language, which smoothly integrates features of object-oriented and functional programming paradigms.
- Apache Curator Framework (2.7.0) - A high level API library that simplifies using Apache ZooKeeper.
- Spray (1.3.2) - An open-source toolkit for building REST/HTTP-based integration layers on top of Scala and Akka.
- Akka (2.3.8) - An asynchronous event-driven middleware framework for building high performance and reliable distributed applications.
- SBT (0.13.5) - An interactive build tool.
Initializing ZooKeeper client
The first step in getting remote configuration data from ZooKeeper in our HTTP service is creating a client. We use the Apache Curator Framework for this purpose. It is a high-level API that adds many features built on top of ZooKeeper and handles the complexity of managing connections to the ZooKeeper cluster and retrying operations.
Below is the code that initializes a client.
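A minimal sketch of such initialization with Curator might look as follows. The connection string, the retry settings and the ZNode path are assumptions made for illustration, and the sketch needs a running ZooKeeper server with that ZNode already created.

```scala
import org.apache.curator.framework.{CuratorFramework, CuratorFrameworkFactory}
import org.apache.curator.retry.ExponentialBackoffRetry

object ZkClientSketch extends App {
  // Assumed address of the ZooKeeper instance (comma-separated host:port list).
  val connectString = "localhost:2181"

  // Retry failed operations up to 3 times, starting with a 1000 ms base sleep
  // that grows exponentially between attempts.
  val retryPolicy = new ExponentialBackoffRetry(1000, 3)

  val client: CuratorFramework =
    CuratorFrameworkFactory.newClient(connectString, retryPolicy)
  client.start()

  try {
    // Hypothetical entry; forPath throws NoNodeException if it does not exist.
    val path = "/system/dev/http-service/port"
    val value = new String(client.getData.forPath(path), "UTF-8")
    println(s"$path = $value")
  } finally {
    // Always release the connection, even if the read fails.
    client.close()
  }
}
```

The client is created once and shared; Curator re-establishes the connection and retries operations according to the retry policy, so the calling code does not handle connection loss itself.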