Introduction to the Hadoop Ecosystem Frameworks
1. Overview
1.1 Reference Papers
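The "troika" of papers that drove big data forward is usually taken to be Google's: The Google File System (2003), MapReduce: Simplified Data Processing on Large Clusters (2004), and Bigtable: A Distributed Storage System for Structured Data (2006).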
1.2 Ecosystem Components
Components of the Hadoop Ecosystem
num | item |
---|---|
1 | HDFS |
2 | MapReduce |
3 | YARN |
4 | HBase |
5 | Pig |
6 | Hive |
7 | Sqoop |
8 | Flume |
9 | Kafka |
10 | ZooKeeper |
11 | Spark |
Overall architecture diagram:
2. Individual Frameworks
2.1 HDFS (Hadoop Distributed File System)

item | content |
---|---|
Homepage | http://hadoop.apache.org/ |
Documentation | https://hadoop.apache.org/docs/current/ |
Introduction | The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing. The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures. |
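
The idea of detecting and handling failures in software rather than relying on hardware can be illustrated with a very small simulation. This is only a sketch under stated assumptions: `make_node`, `run_with_failover`, and the node names are hypothetical and are not part of any Hadoop API; real Hadoop uses heartbeats, replication, and task re-execution.

```python
from typing import Callable, Dict, List


class NodeFailure(Exception):
    """Raised by a simulated node that is down."""


def make_node(name: str, healthy: bool) -> Callable[[int], str]:
    """Return a hypothetical worker: a function that processes one data block."""
    def run(block_id: int) -> str:
        if not healthy:
            raise NodeFailure(f"{name} is unreachable")
        return f"block {block_id} processed on {name}"
    return run


def run_with_failover(block_id: int, nodes: Dict[str, Callable[[int], str]]) -> str:
    """Try each node in turn; a failure is caught in software and the work moves on."""
    errors: List[str] = []
    for name, node in nodes.items():
        try:
            return node(block_id)
        except NodeFailure as exc:
            errors.append(str(exc))  # record the failure, then try the next node
    raise RuntimeError(f"all nodes failed for block {block_id}: {errors}")


# Hypothetical two-node cluster in which the first node is down.
nodes = {
    "node-a": make_node("node-a", healthy=False),
    "node-b": make_node("node-b", healthy=True),
}
result = run_with_failover(7, nodes)
print(result)
```

The caller never sees the failure of `node-a`; the work simply completes on `node-b`, which is the behavior the introduction describes at a much larger scale.
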
2.2 MapReduce

item | content |
---|---|
1 | 2 |
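
As an illustration of the MapReduce programming model, the classic word-count job can be sketched in plain Python. This is a single-process approximation, not Hadoop's Java API: the function names `map_phase`, `shuffle_phase`, and `reduce_phase` and the sample input are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(line: str) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values of one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["hadoop stores data", "spark and hadoop process data"])
print(counts)
```

In a real cluster the map and reduce calls run in parallel on different machines and the shuffle moves data over the network; here all three phases run sequentially to keep the structure visible.
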
2.3 YARN

item | content |
---|---|
Homepage | https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html |
Introduction | Apache Hadoop YARN (Yet Another Resource Negotiator) is Hadoop's cluster resource management layer; it is unrelated to the JavaScript package manager of the same name. The fundamental idea of YARN is to split up the functionalities of resource management and job scheduling/monitoring into separate daemons: a global ResourceManager (RM) and a per-application ApplicationMaster (AM). An application is either a single job or a DAG of jobs. |
2.4 HBase

item | content |
---|---|
Official site | http://hbase.apache.org/ |
Introduction | Apache HBase™ is the Hadoop database, a distributed, scalable, big data store. Use Apache HBase™ when you need random, realtime read/write access to your Big Data. This project's goal is the hosting of very large tables -- billions of rows X millions of columns -- atop clusters of commodity hardware. Apache HBase is an open-source, distributed, versioned, non-relational database modeled after Google's Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS. |
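
The "versioned, non-relational" table model can be pictured as a sparse map from row key and column to timestamped values. The sketch below is a toy in-memory illustration only: `ToyTable`, its methods, and the `info:name` style column names are assumptions for this example and are not the HBase client API.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class ToyTable:
    """Hypothetical in-memory table: row key -> column -> [(timestamp, value), ...]."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, List[Tuple[int, str]]]] = defaultdict(dict)

    def put(self, row: str, column: str, value: str, ts: int) -> None:
        """Store a new version of a cell; older versions are kept."""
        versions = self._rows[row].setdefault(column, [])
        versions.append((ts, value))
        versions.sort(reverse=True)  # newest version first

    def get(self, row: str, column: str, ts: Optional[int] = None) -> Optional[str]:
        """Return the newest value, or the newest one at or before `ts`."""
        for version_ts, value in self._rows.get(row, {}).get(column, []):
            if ts is None or version_ts <= ts:
                return value
        return None  # rows are sparse: a missing cell simply has no entry


table = ToyTable()
table.put("user1", "info:name", "Ann", ts=1)
table.put("user1", "info:name", "Anna", ts=2)
table.put("user2", "info:email", "bob@example.com", ts=1)

latest = table.get("user1", "info:name")
older = table.get("user1", "info:name", ts=1)
missing = table.get("user2", "info:name")
print(latest, older, missing)
```

Each row stores only the columns it actually has, which is what makes very wide, sparse tables practical, and every write adds a version instead of overwriting the previous value.
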
2.5 Pig

...
2.6 Hive

...
2.7 Sqoop

...
2.8 Flume

...
2.9 Kafka

...
2.10 ZooKeeper

item | content |
---|---|
Homepage | http://zookeeper.apache.org/ |
Introduction | Apache ZooKeeper is an effort to develop and maintain an open-source server which enables highly reliable distributed coordination. What is ZooKeeper? ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. All of these kinds of services are used in some form or another by distributed applications. Each time they are implemented there is a lot of work that goes into fixing the bugs and race conditions that are inevitable. Because of the difficulty of implementing these kinds of services, applications initially usually skimp on them, which make them brittle in the presence of change and difficult to manage. Even when done correctly, different implementations of these services lead to management complexity when the applications are deployed. Learn more about ZooKeeper on the ZooKeeper Wiki. |
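
To make "configuration information" and "group services" concrete, here is a toy, single-process stand-in for a centralized coordination service. It is a sketch only: `ToyCoordinator` and its methods are hypothetical names invented for this example, it is not the ZooKeeper API, and it provides none of ZooKeeper's replication or consistency guarantees.

```python
from typing import Dict, List, Optional, Set


class ToyCoordinator:
    """Hypothetical in-memory stand-in for a coordination service."""

    def __init__(self) -> None:
        self._config: Dict[str, str] = {}
        self._groups: Dict[str, Set[str]] = {}

    def set_config(self, key: str, value: str) -> None:
        """Publish one shared configuration value."""
        self._config[key] = value

    def get_config(self, key: str) -> Optional[str]:
        """Read a shared configuration value, or None if it is unset."""
        return self._config.get(key)

    def join(self, group: str, member: str) -> None:
        """Register a member in a named group."""
        self._groups.setdefault(group, set()).add(member)

    def leave(self, group: str, member: str) -> None:
        """Remove a member from a group; unknown members are ignored."""
        self._groups.get(group, set()).discard(member)

    def members(self, group: str) -> List[str]:
        """List the current members of a group in sorted order."""
        return sorted(self._groups.get(group, set()))


coord = ToyCoordinator()
coord.set_config("db.url", "jdbc:example")  # made-up key and value
coord.join("workers", "worker-1")
coord.join("workers", "worker-2")
coord.leave("workers", "worker-1")

live = coord.members("workers")
print(coord.get_config("db.url"), live)
```

Every application that needs shared settings or a live member list would otherwise re-implement something like this; centralizing it in one well-tested service is the motivation the introduction gives.
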
2.11 Spark

item | content |
---|---|
Homepage | http://spark.apache.org/ |
Documentation | http://spark.apache.org/docs/latest/ |
Introduction | Apache Spark™ is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters. |
Once you have a plan, remember to push it forward; don't stand still.