Hadoop生态框架介绍

一、总览

1.1 参考论文

引领大数据前进的三驾马车分别是:

1.2 生态组件

Components of the Hadoop Ecosystem

num item img
1 HDFS
2 MapReduce
3 YARN
4 HBase
5 Pig
6 Hive
7 Sqoop
8 Flume
9 Kafka
10 Zookeeper
11 Spark

总体框架图:
生态框架图

2. 各个框架

1.1 HDFS (Hadoop Distributed File System)

item content
主页 http://hadoop.apache.org/
文档 https://hadoop.apache.org/docs/current/
介绍 The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.

1.2 MapReduce

item content
1 2

1.3 YARN

item content
主页 https://classic.yarnpkg.com/lang/en/docs/
介绍 Yarn is a package manager for your code. It allows you to use and share (e.g. JavaScript) code with other developers from around the world. Yarn does this quickly, securely, and reliably so you don’t ever have to worry.

Yarn allows you to use other developers’ solutions to different problems, making it easier for you to develop your software. If you have problems, you can report issues or contribute back, and when the problem is fixed, you can use Yarn to keep it all up to date.

Code is shared through something called a package (sometimes referred to as a module). A package contains all the code being shared as well as a package.json file which describes the package.

1.4 HBase

item content
官网 http://hbase.apache.org/
介绍 Apache HBase™ is the Hadoop database, a distributed, scalable, big data store.Use Apache HBase™ when you need random, realtime read/write access to your Big Data. This project's goal is the hosting of very large tables -- billions of rows X millions of columns -- atop clusters of commodity hardware. Apache HBase is an open-source, distributed, versioned, non-relational database modeled after Google's Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS.

1.5 Pig

...

1.6 Hive

...

1.7 Sqoop

...

1.8 Flume

...

1.9 Kafka

...

1.10 Zookeeper

item content
主页 http://zookeeper.apache.org/
介绍 Apache ZooKeeper is an effort to develop and maintain an open-source server which enables highly reliable distributed coordination.

What is ZooKeeper?

ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. All of these kinds of services are used in some form or another by distributed applications. Each time they are implemented there is a lot of work that goes into fixing the bugs and race conditions that are inevitable. Because of the difficulty of implementing these kinds of services, applications initially usually skimp on them, which make them brittle in the presence of change and difficult to manage. Even when done correctly, different implementations of these services lead to management complexity when the applications are deployed.Learn more about ZooKeeper on the ZooKeeper Wiki.

1.11 Spark

item content
主页 http://spark.apache.org/
文档 http://spark.apache.org/docs/latest/
介绍 Apache Spark™ is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or cluster
posted @   那个白熊  阅读(87)  评论(0编辑  收藏  举报
编辑推荐:
· 从 HTTP 原因短语缺失研究 HTTP/2 和 HTTP/3 的设计差异
· AI与.NET技术实操系列:向量存储与相似性搜索在 .NET 中的实现
· 基于Microsoft.Extensions.AI核心库实现RAG应用
· Linux系列:如何用heaptrack跟踪.NET程序的非托管内存泄露
· 开发者必知的日志记录最佳实践
阅读排行:
· TypeScript + Deepseek 打造卜卦网站:技术与玄学的结合
· Manus的开源复刻OpenManus初探
· AI 智能体引爆开源社区「GitHub 热点速览」
· 从HTTP原因短语缺失研究HTTP/2和HTTP/3的设计差异
· 三行代码完成国际化适配,妙~啊~
点击右上角即可分享
微信分享提示