Post category - Hadoop Ecosystem

Hadoop, HBase, Bigtable, GFS
Apache Tez Design
Summary: http://tez.incubator.apache.org/ http://dongxicheng.org/mapreduce-nextgen/apache-tez/ http://dongxicheng.org/mapreduce-nextgen/apache-tez-newest-progress/ Tez aims to be a general purpose execut...

posted @ 2013-10-19 11:45 fxjwind Views (1817) Comments (0) Recommended (0)

YARN - Yet Another Resource Negotiator
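Summary: http://www.socc2013.org/home/program http://www.ibm.com/developerworks/cn/opensource/os-cn-hadoop-yarn/ Problems with Hadoop V1.0: Hadoop was originally invented to index massive web crawls, so it suits that scenario well, but Hadoop is now treated as a general-purpose computing platform, and this has already...

posted @ 2013-10-18 11:11 fxjwind Views (930) Comments (0) Recommended (1)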

HBase-TDG Schema Design
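Summary: This chapter mainly describes how to design an HBase schema. On this topic, I strongly recommend the presentation below, which is very clearly written. First, to stress it once more: NoSQL cannot replace SQL, and for data that is not big data, SQL is without doubt easier to use. For a given system or scenario, we should not insist on replacing SQL with NoSQL, but instead move to NoSQL only the part of the big data that SQL cannot handle (which is often only weakly relational). ...

posted @ 2012-10-24 15:39 fxjwind Views (2448) Comments (3) Recommended (1)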

HBase-TDG Architecture
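Summary: Seek vs. Transfer. I previously wrote a dedicated comparison of B+ trees and LSM trees: http://www.cnblogs.com/fxjwind/archive/2012/06/09/2543357.html The last blog post referenced there gives a good analysis of the essence of using B+ trees versus LSM trees (Log-Structured Merge-Trees): the balance between read and write efficiency, global ordering and local...

posted @ 2012-10-12 16:56 fxjwind Views (504) Comments (0) Recommended (0)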

HBase-TDG ClientAPI Advanced Features
Summary: Advanced Features Filters HBase filters are a powerful feature that can greatly enhance your effectiveness working with data stored in tables. You will find predefined filters, already provided by ...

posted @ 2012-10-10 11:06 fxjwind Views (475) Comments (0) Recommended (0)

HBase-TDG ClientAPI The Basics
Summary: General Notes The primary client interface to HBase is the HTable class in the org.apache.hadoop.hbase.client package. It provides the user with all the functionality needed to store and retrieve...

posted @ 2012-09-26 15:34 fxjwind Views (799) Comments (0) Recommended (0)

HBase-TDG Introduction
Summary: Before we start looking into all the moving parts of HBase, let us pause to think about why there was a need to come up with yet another storage architecture. Relational database management systems (RDBMS) have been around since the early 1970s, and have helped countless companies and organizations ...

posted @ 2012-09-25 11:38 fxjwind Views (400) Comments (0) Recommended (0)

Hadoop TDG 3 – MR Features
Summary: Counters There are often things you would like to know about the data you are analyzing but that are peripheral to the analysis you are performing. For example, if you were counting invalid records a...

posted @ 2012-09-17 18:19 fxjwind Views (387) Comments (0) Recommended (0)

Hadoop TDG 3 – MR Job
Summary: Anatomy of a MapReduce Job Run Classic MapReduce (MapReduce 1) A job run in classic MapReduce is illustrated in Figure 6-1. At the highest level, there are four independent entities: • The clien...

posted @ 2012-09-12 11:35 fxjwind Views (529) Comments (0) Recommended (0)

Hadoop TDG 2 – Development Environment
Summary: GenericOptionsParser, Tool, and ToolRunner Hadoop comes with a few helper classes for making it easier to run jobs from the command line. GenericOptionsParser is a class that interprets common Hadoop command-line options and sets them on a Configuration object for your application to use as desired.

posted @ 2012-09-08 14:48 fxjwind Views (339) Comments (0) Recommended (0)

Hadoop TDG 2 – I/O
Summary: Data Integrity HDFS transparently checksums all data written to it and by default verifies checksums when reading data. A separate checksum is created for every io.bytes.per.checksum bytes of data. T...

posted @ 2012-09-06 11:27 fxjwind Views (398) Comments (0) Recommended (0)

Hadoop TDG 2 -- HDFS
Summary: The Hadoop Distributed Filesystem The Design of HDFS HDFS is a filesystem designed for storing very large files with streaming data access patterns, running on clusters of commodity hardware. Let...

posted @ 2012-08-27 17:39 fxjwind Views (1154) Comments (0) Recommended (0)

Lars George: Blog on Hadoop and HBase
Summary: http://www.oreillynet.com/pub/au/4685 Author of HBase: The Definitive Guide. HBase Architecture 101 - Storage http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html One of the most hidden aspects of HBase is that its data is...

posted @ 2012-08-21 17:51 fxjwind Views (915) Comments (0) Recommended (0)

GFS - The Google File System
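Summary: The Google File System http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.125.789&rep=rep1&type=pdf http://www.dbthink.com/?p=501 (Chinese translation) Google is a place full of brilliant engineers, yet when designing systems it is very pragmatic and does not adopt complex or fashionable algorithms and mechanisms ...

posted @ 2012-07-17 17:00 fxjwind Views (9357) Comments (0) Recommended (0)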

Bigtable: A Distributed Storage System for Structured Data
Summary: Bigtable: A Distributed Storage System for Structured Data http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//archive/bigtable-osdi06.pdf http://www.dbthink....

posted @ 2012-07-07 17:46 fxjwind Views (2574) Comments (0) Recommended (0)

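Hadoop: The Big Data Tool You Have to Know
Summary: Reposting this blog post because its diagrams are good and this makes it easy to find later. http://cloud.csdn.net/a/20120220/312061.html Apache Hadoop has become the driving force behind the growth of the big data industry. Technologies such as Hive and Pig are also mentioned frequently, but what do they actually do, and why do they need such odd names (such as Oozie, ZooKeeper, and Flume)? Hadoop brought inexpensive processing of big data (big data whose data...

posted @ 2012-02-21 11:50 fxjwind Views (397) Comments (0) Recommended (0)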

Hadoop: The Definitive Guide, Notes 2
Summary: The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing, including: Hadoop Core, our flagship sub-project, provides a distributed filesystem (HDFS) and su...

posted @ 2011-07-04 21:00 fxjwind Views (617) Comments (0) Recommended (0)

Hadoop TDG 2 -- introduction
Summary: First, why do we need Hadoop? The good news is that Big Data is here. The bad news is that we are struggling to store and analyze it. Faced with massive amounts of data, we need to analyze and store it efficiently, and Hadoop can do exactly that. This, in a nutshell, is what Hadoop p...

posted @ 2011-07-04 20:57 fxjwind Views (735) Comments (0) Recommended (0)