ZooKeeper Study Notes: Introduction (1)

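Preface

While learning Kafka I ran into ZooKeeper, so I decided to look into it. I suggest readers bring a little patience and read the original English as well. I try to translate as accurately as I can, but I cannot guarantee that every detail is right.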

Main Text

Introduction

Reading the official site

ZooKeeper: A Distributed Coordination Service for Distributed Applications

ZooKeeper is a distributed, open-source coordination service for distributed applications. It exposes a simple set of primitives that distributed applications can build upon to implement higher level services for synchronization, configuration maintenance, and groups and naming. It is designed to be easy to program to, and uses a data model styled after the familiar directory tree structure of file systems. It runs in Java and has bindings for both Java and C.

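This says that ZooKeeper is a distributed coordination service for distributed applications. The second sentence is the key one: it exposes a simple set of primitives on which distributed applications can build higher-level services. Which services? [synchronization], [configuration maintenance], [group services], and [naming]. It is designed to be easy to program against, and it uses a data model styled after the directory tree of a file system.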

Below is the introduction from the official wiki. My rendering of it started from Google Translate, with almost no changes:

ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. All of these kinds of services are used in some form or another by distributed applications. Each time they are implemented there is a lot of work that goes into fixing the bugs and race conditions that are inevitable. Because of the difficulty of implementing these kinds of services, applications initially usually skimp on them ,which make them brittle in the presence of change and difficult to manage. Even when done correctly, different implementations of these services lead to management complexity when the applications are deployed.

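ZooKeeper is a centralized service (I suspect this is a mistake; why would it be described as centralized?) for maintaining configuration information, naming, providing distributed synchronization, and providing group services. Distributed applications use all of these kinds of services in one form or another. Every time they are implemented, a lot of work goes into fixing the bugs and race conditions that inevitably appear. Because these services are hard to implement, applications usually skimp on them at first, which makes the applications brittle when things change and difficult to manage. Even when done correctly, different implementations of these services add management complexity once the applications are deployed.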

ZooKeeper aims at distilling the essence of these different services into a very simple interface to a centralized coordination service. The service itself is distributed and highly reliable. Consensus, group management, and presence protocols will be implemented by the service so that the applications do not need to implement them on their own. Application specific uses of these will consist of a mixture of specific components of Zoo Keeper and application specific conventions. ZooKeeper Recipes shows how this simple service can be used to build much more powerful abstractions.

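ZooKeeper aims to distill the essence of these different services into a very simple interface to a centralized coordination service. The service itself is distributed and highly reliable. Consensus, group management, and presence protocols are implemented by the service, so applications do not need to implement them on their own. Application-specific uses of these consist of a mixture of specific ZooKeeper components and application-specific conventions. ZooKeeper Recipes shows how this simple service can be used to build much more powerful abstractions.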

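After reading the passage above and looking at quite a few other web pages, which all say roughly the same thing, it is fairly clear that ZooKeeper mainly provides distributed applications with the four basic services, or primitives, listed above.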

The Origin of ZooKeeper

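Here is an excerpt of some history. From my old reading habits, I always like to read the preface carefully: knowing the history and origin of something usually helps a great deal in understanding it. I remember a classmate who never read the preface of technical books and teased me for reading it, because he felt prefaces were useless. Yet today you will find that in those technical books the preface may be the only part that still has value, and it may keep that value, while the main content has long been out of date. Liang Dong makes the same point in 《东吴相对论》.
The following passage is taken from 《从 Paxos 到 ZooKeeper》 (From Paxos to ZooKeeper). From a quick skim, the book is quite good: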

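ZooKeeper originated in a research group at Yahoo Research. At the time, the researchers found that many large systems inside Yahoo depended on a similar system for distributed coordination, but those systems often suffered from a single point of failure. So Yahoo's developers set out to build a general-purpose distributed coordination framework with no single point of failure, so that developers could concentrate on business logic.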

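There is also an anecdote behind the name "ZooKeeper". Early in the project, since many earlier internal projects had been named after animals (the well-known Pig project, for example), Yahoo's engineers wanted to give this project an animal name too. Raghu Ramakrishnan, then chief scientist of the research lab, joked: "If this goes on, this place will turn into a zoo!" At that, everyone agreed to call it the zookeeper: with all the animal-named distributed components put together, Yahoo's whole distributed system looked like a large zoo, and ZooKeeper was exactly what would coordinate that distributed environment. That is how the name ZooKeeper came about.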

Design Goals

ZooKeeper is simple.

ZooKeeper allows distributed processes to coordinate with each other through a shared hierarchal namespace which is organized similarly to a standard file system. The name space consists of data registers - called znodes, in ZooKeeper parlance - and these are similar to files and directories. Unlike a typical file system, which is designed for storage, ZooKeeper data is kept in-memory, which means ZooKeeper can acheive high throughput and low latency numbers.

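ZooKeeper lets distributed processes coordinate with each other through a shared hierarchical namespace, organized much like a standard file system. The namespace consists of data registers, called znodes in ZooKeeper terminology, which are very similar to files and directories. Unlike a typical file system, which is designed for storage, ZooKeeper keeps its data in memory, which means it can achieve high throughput and low latency. (The original passage even contains misspellings, such as "acheive"!)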
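To make the "files and directories" comparison concrete, here is a minimal sketch of a hierarchical namespace held entirely in memory. It is a toy model and not the ZooKeeper client API: the class name `ZNodeTree`, its methods, and the example paths are all made up for illustration.

```python
class ZNodeTree:
    """Toy in-memory namespace (hypothetical; NOT the ZooKeeper API)."""

    def __init__(self):
        # The root node always exists and holds empty data.
        self._data = {"/": b""}

    @staticmethod
    def _parent(path):
        # "/app/config" -> "/app"; "/app" -> "/"
        head = path.rsplit("/", 1)[0]
        return head or "/"

    def create(self, path, data=b""):
        if path in self._data:
            raise KeyError(f"node exists: {path}")
        if self._parent(path) not in self._data:
            raise KeyError(f"no parent for: {path}")
        self._data[path] = data

    def get(self, path):
        return self._data[path]

    def children(self, path):
        # A child is any node whose parent is exactly `path`.
        return sorted(p.rsplit("/", 1)[1] for p in self._data
                      if p != "/" and self._parent(p) == path)


tree = ZNodeTree()
tree.create("/app", b"service settings")   # a node can hold data ...
tree.create("/app/config", b"timeout=30")  # ... and also have children
print(tree.children("/app"))
```

In this sketch "/app" behaves like a file (it has data) and like a directory (it has a child) at the same time, which is the sense in which znodes resemble both.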

The ZooKeeper implementation puts a premium on high performance, highly available, strictly ordered access. The performance aspects of ZooKeeper means it can be used in large, distributed systems. The reliability aspects keep it from being a single point of failure. The strict ordering means that sophisticated synchronization primitives can be implemented at the client.

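This all feels like empty talk, so I will not translate it. Only the last sentence interests me: strict ordering means that sophisticated synchronization primitives can be implemented at the client.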

ZooKeeper is replicated.

Like the distributed processes it coordinates, ZooKeeper itself is intended to be replicated over a sets of hosts called an ensemble.

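ZooKeeper itself is also meant to be replicated across a set of hosts, and this set of hosts is called an ensemble. I am not sure how best to translate the term. "Orchestra", perhaps?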

The servers that make up the ZooKeeper service must all know about each other. They maintain an in-memory image of state, along with a transaction logs and snapshots in a persistent store. As long as a majority of the servers are available, the ZooKeeper service will be available.

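Every server that makes up the ZooKeeper service must know about all the others. Each one maintains an in-memory image of the state, together with transaction logs and snapshots in a persistent store. As long as a majority of the servers are available, the ZooKeeper service is available.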
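The majority rule is easy to check with a little arithmetic. The sketch below is only an illustration of that rule; the function names `has_quorum` and `tolerated_failures` are my own and do not come from ZooKeeper.

```python
def has_quorum(available, ensemble_size):
    """True when a strict majority of the ensemble is available."""
    return available > ensemble_size // 2


def tolerated_failures(ensemble_size):
    """Most servers that can fail while a majority still remains."""
    return (ensemble_size - 1) // 2


for n in (3, 4, 5):
    print(n, "servers tolerate", tolerated_failures(n), "failure(s)")
```

Under this rule a 4-server ensemble tolerates no more failures than a 3-server one, which suggests why odd ensemble sizes are the natural choice.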

Clients connect to a single ZooKeeper server. The client maintains a TCP connection through which it sends requests, gets responses, gets watch events, and sends heart beats. If the TCP connection to the server breaks, the client will connect to a different server.
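The failover behaviour described here can be pictured with a small simulation. This is a sketch under my own assumptions, not the real client: `ToyClient`, its methods, and the host names are hypothetical, and no network connection is made.

```python
import random


class ToyClient:
    """Tracks one current server and moves to another when it fails.

    Hypothetical model for illustration; it opens no real connection.
    """

    def __init__(self, servers, seed=None):
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = list(servers)
        self._rng = random.Random(seed)
        # A client talks to a single server at a time.
        self.current = self._rng.choice(self._servers)

    def on_connection_lost(self):
        # Pick any server other than the one that just failed.
        others = [s for s in self._servers if s != self.current]
        if others:
            self.current = self._rng.choice(others)
        return self.current


servers = ["zk1:2181", "zk2:2181", "zk3:2181"]  # made-up host names
client = ToyClient(servers, seed=0)
before = client.current
after = client.on_connection_lost()
print(before, "->", after)
```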

ZooKeeper is ordered.

ZooKeeper stamps each update with a number that reflects the order of all ZooKeeper transactions. Subsequent operations can use the order to implement higher-level abstractions, such as synchronization primitives.

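ZooKeeper stamps every update with a number that reflects the order of all ZooKeeper transactions. Subsequent operations can use that order to implement higher-level abstractions, such as synchronization primitives. This should be the strict ordering mentioned above.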
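As a rough illustration of how a global order can turn into a synchronization primitive, the sketch below stamps each update with an increasing number and then uses the smallest stamp to decide who holds a lock. `ToyLog` and the client names are hypothetical; this models the idea only and is not how ZooKeeper is implemented.

```python
import itertools


class ToyLog:
    """Stamps each update with an increasing number (hypothetical model)."""

    def __init__(self):
        self._next = itertools.count(1)
        self.updates = []  # (stamp, description), in the order applied

    def apply(self, description):
        stamp = next(self._next)
        self.updates.append((stamp, description))
        return stamp


log = ToyLog()

# Three clients ask for the same lock; each request receives a stamp.
requests = {client: log.apply(f"lock request from {client}")
            for client in ("alice", "bob", "carol")}

# Everyone sees the same order, so everyone agrees on the owner:
# the request with the smallest stamp holds the lock.
owner = min(requests, key=requests.get)
print(owner)  # alice
```

Because the stamps form a single total order, no extra negotiation between clients is needed to agree on the owner.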

ZooKeeper is fast. It is especially fast in “read-dominant” workloads. ZooKeeper applications run on thousands of machines, and it performs best where reads are more common than writes, at ratios of around 10:1.

Summary

Understanding and translating all this was hard work. I need a rest and will continue tomorrow.

References

https://zookeeper.apache.org/doc/current/zookeeperOver.html

posted on 2019-12-15 17:06  chaiyu2002