Authors:
Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach
Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber
{fay,jeff,sanjay,wilsonh,kerr,m3b,tushar,fikes,gruberg}@google.com
Original paper: http://labs.google.com/papers/bigtable.html
Translated by: eaglet
Abstract
Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving). Despite these varied demands, Bigtable has successfully provided a flexible, high-performance solution for all of these Google products. In this paper we describe the simple data model provided by Bigtable, which gives clients dynamic control over data layout and format, and we describe the design and implementation of Bigtable.
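Overview
Bigtable is a distributed storage system for managing structured data at very large scale: petabytes of data spread across thousands of commodity servers. Many Google projects keep their data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications make very different demands on Bigtable, both in data size (from URLs to web pages to satellite imagery) and in latency requirements (from backend batch processing to real-time data serving). Despite these widely varying demands, Bigtable has successfully given these Google products a flexible, high-performance solution. In this paper we describe Bigtable's simple data model, which lets clients dynamically control data layout and format, as well as the design and implementation of Bigtable.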
1 Introduction
Over the last two and a half years we have designed, implemented, and deployed a distributed storage system for managing structured data at Google called Bigtable. Bigtable is designed to reliably scale to petabytes of data and thousands of machines. Bigtable has achieved several goals: wide applicability, scalability, high performance, and high availability. Bigtable is used by more than sixty Google products and projects, including Google Analytics, Google Finance, Orkut, Personalized Search, Writely, and Google Earth. These products use Bigtable for a variety of demanding workloads, which range from throughput-oriented batch-processing jobs to latency-sensitive serving of data to end users. The Bigtable clusters used by these products span a wide range of configurations, from a handful to thousands of servers, and store up to several hundred terabytes of data.
In many ways, Bigtable resembles a database: it shares many implementation strategies with databases. Parallel databases [14] and main-memory databases [13] have achieved scalability and high performance, but Bigtable provides a different interface than such systems. Bigtable does not support a full relational data model; instead, it provides clients with a simple data model that supports dynamic control over data layout and format, and allows clients to reason about the locality properties of the data represented in the underlying storage. Data is indexed using row and column names that can be arbitrary strings. Bigtable also treats data as uninterpreted strings, although clients often serialize various forms of structured and semi-structured data into these strings. Clients can control the locality of their data through careful choices in their schemas. Finally, Bigtable schema parameters let clients dynamically control whether to serve data out of memory or from disk.
Section 2 describes the data model in more detail, and Section 3 provides an overview of the client API. Section 4 briefly describes the underlying Google infrastructure on which Bigtable depends. Section 5 describes the fundamentals of the Bigtable implementation, and Section 6 describes some of the refinements that we made to improve Bigtable's performance. Section 7 provides measurements of Bigtable's performance. We describe several examples of how Bigtable is used at Google in Section 8, and discuss some lessons we learned in designing and supporting Bigtable in Section 9. Finally, Section 10 describes related work, and Section 11 presents our conclusions.
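The data model outlined in this introduction (row and column names as arbitrary strings, values as uninterpreted strings into which clients serialize their own structures) can be illustrated with a minimal Python sketch. This is not Bigtable's API, which Section 3 covers. The `TinyTable` class, its `put` and `get` methods, and the `user:42` / `profile` names are all hypothetical, and a plain in-memory dictionary stands in for the distributed storage.

```python
import json
from typing import Dict, Optional, Tuple


class TinyTable:
    """Toy in-memory stand-in for the data model; NOT the real Bigtable API."""

    def __init__(self) -> None:
        # Cells are keyed by (row name, column name); both are arbitrary strings.
        self._cells: Dict[Tuple[str, str], bytes] = {}

    def put(self, row: str, column: str, value: bytes) -> None:
        # The table stores opaque bytes and never interprets them.
        if not isinstance(value, bytes):
            raise TypeError("values must be bytes")
        self._cells[(row, column)] = value

    def get(self, row: str, column: str) -> Optional[bytes]:
        # Returns None when the cell does not exist.
        return self._cells.get((row, column))


# The client, not the table, decides how structured data is serialized.
table = TinyTable()
profile = {"name": "example", "visits": 3}
table.put("user:42", "profile", json.dumps(profile).encode("utf-8"))

raw = table.get("user:42", "profile")
decoded = json.loads(raw.decode("utf-8"))
print(decoded["visits"])
```

The table never parses the stored bytes. Choosing JSON here is the client's decision, and any other serialization format would work equally well.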
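1 Introduction
Over the past two and a half years we have designed, implemented, and deployed Bigtable, a distributed storage system for managing structured data at Google. Bigtable is designed to scale reliably to petabytes of data and thousands of machines. Bigtable has achieved the following goals:
- Wide applicability
- Scalability
- High performance and high availability
Bigtable is used by more than sixty Google products and projects, including Google Analytics, Google Finance, Orkut, Personalized Search, Writely, and Google Earth. These products use Bigtable for a variety of workloads, ranging from throughput-oriented batch-processing jobs to latency-sensitive serving of data to end users (translator's note: this probably refers to services with strong real-time requirements, such as news). The Bigtable clusters behind these products range from a handful of servers to thousands, and store up to several hundred terabytes of data. In many ways Bigtable resembles a database, and it shares many implementation strategies with databases. Parallel databases and main-memory databases have achieved scalability and high performance, but Bigtable provides a different interface from such systems. Bigtable does not support a full relational data model. Instead, it provides clients with a simple data model that supports dynamic control over data layout and format, and lets clients reason about the locality properties of the data as represented in the underlying storage. Data is indexed by row and column names, which can be arbitrary strings. Bigtable also treats data as uninterpreted strings, although clients often serialize various forms of structured and semi-structured data into these strings (translator's note: semi-structured data means data such as XML). Clients can control the locality of their data through careful choices in their schemas. Finally, Bigtable schema parameters let clients dynamically control whether data is served out of memory or from disk.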
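The remaining sections are organized as follows:
Section 2 describes the data model in detail.
Section 3 gives an overview of the client API.
Section 4 briefly describes the underlying Google infrastructure on which Bigtable depends.
Section 5 describes the fundamentals of the Bigtable implementation.
Section 6 describes some of the refinements we made to improve Bigtable's performance.
Section 7 provides measurements of Bigtable's performance.
Section 8 describes examples of how Bigtable is used at Google.
Section 9 discusses some lessons we learned in designing and supporting Bigtable.
Section 10 describes related work.
Section 11 presents the conclusions.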
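Translator's note
This paper, written by Jeffrey Dean and the other designers of Bigtable, is a fairly authoritative introduction to the system. Because it is long, I will translate it section by section. The translation presents the English and Chinese side by side, so that readers can easily point out any discrepancies caused by the limits of my English. Bigtable's design ideas are a very valuable reference for building large distributed storage and data retrieval systems, and they have been applied successfully at Google and other world-renowned Internet companies. The design goals of my open-source full-text database project, HubbleDotNet, are similar to Bigtable's: to build a scalable, large distributed full-text search database system with a simple model (that is, one that does not provide a full relational model). In later versions I will gradually draw on Bigtable's design and architecture to improve HubbleDotNet, and I hope to build a reliable, high-performance distributed full-text search database system like Bigtable on the .NET platform. A great deal of research work may lie ahead. The Cnblogs (博客园) search engine group now has more than 600 members, but so far its activity has been limited to feedback on implementation problems in search applications. How to get more people involved in this kind of research in the future is something I would welcome everyone's ideas on.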