
HBase-TDG ClientAPI The Basics

Summary: General Notes. The primary client interface to HBase is the HTable class in the org.apache.hadoop.hbase.client package. It provides the user with all the functionality needed to store and retrieve...
posted @ 2012-09-26 15:34 fxjwind Views (792) Comments (0) Recommended (0)

HBase-TDG Introduction

Summary: Before we start looking into all the moving parts of HBase, let us pause to think about why there was a need to come up with yet another storage architecture. Relational database management systems (RDBMS) have been around since the early 1970s, and have helped countless companies and organizations...
posted @ 2012-09-25 11:38 fxjwind Views (379) Comments (0) Recommended (0)

Hadoop TDG 3 – MR Features

Summary: Counters. There are often things you would like to know about the data you are analyzing but that are peripheral to the analysis you are performing. For example, if you were counting invalid records a...
posted @ 2012-09-17 18:19 fxjwind Views (385) Comments (0) Recommended (0)

Hadoop TDG 3 – MR Job

Summary: Anatomy of a MapReduce Job Run. Classic MapReduce (MapReduce 1). A job run in classic MapReduce is illustrated in Figure 6-1. At the highest level, there are four independent entities: • The clien...
posted @ 2012-09-12 11:35 fxjwind Views (526) Comments (0) Recommended (0)

Hadoop TDG 2 – Development Environment

Summary: GenericOptionsParser, Tool, and ToolRunner. Hadoop comes with a few helper classes for making it easier to run jobs from the command line. GenericOptionsParser is a class that interprets common Hadoop command-line options and sets them on a Configuration object for your application to use as desired.
posted @ 2012-09-08 14:48 fxjwind Views (337) Comments (0) Recommended (0)

Hadoop TDG 2 – I/O

Summary: Data Integrity. HDFS transparently checksums all data written to it and by default verifies checksums when reading data. A separate checksum is created for every io.bytes.per.checksum bytes of data. T...
posted @ 2012-09-06 11:27 fxjwind Views (395) Comments (0) Recommended (0)

Hadoop TDG 2 – HDFS

Summary: The Hadoop Distributed Filesystem. The Design of HDFS. HDFS is a filesystem designed for storing very large files with streaming data access patterns, running on clusters of commodity hardware. Let...
posted @ 2012-08-27 17:39 fxjwind Views (1142) Comments (0) Recommended (0)

Lars George: Blog on Hadoop and HBase

Summary: http://www.oreillynet.com/pub/au/4685 Author of HBase: The Definitive Guide. HBase Architecture 101 - Storage: http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html One of the most hidden aspects of HBase is how its data is...
posted @ 2012-08-21 17:51 fxjwind Views (905) Comments (0) Recommended (0)

A Detailed Look at SSTable Structure and LSMTree Indexes

Summary: http://www.igvita.com/2012/02/06/sstable-and-log-structured-storage-leveldb/, SSTable and Log Structured Storage: LevelDB. The Sorted String Table (SSTable) is one of the most popular outputs for s...
posted @ 2012-08-14 17:19 fxjwind Views (15830) Comments (0) Recommended (1)

Cassandra - A Decentralized Structured Storage System

Summary: http://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf (English), http://www.dbthink.com/?p=372 (Chinese). I have not studied Cassandra in depth. On the data server side it copies Bigtable, while for managing distributed nodes it copies Dynamo's decentralized architecture, so it can...
posted @ 2012-08-10 14:32 fxjwind Views (811) Comments (0) Recommended (1)