Distributed Systems Study Materials: Distributed System Prerequisite List

 

The material below is grouped into several broad categories:
1. File Systems
a. GFS – The Google File System
b. HDFS
1) The Hadoop Distributed File System
2) The Hadoop Distributed File System: Architecture And Design
c. XFS – The Tencent File System

2. Database Systems
a. BigTable – BigTable: A Distributed Storage System for Structured Data
b. HBase – The Apache HBase Reference Guide
c. Dynamo – Dynamo: Amazon’s Highly Available Key-Value Store
d. Megastore – Megastore: Providing Scalable, Highly Available Storage for Interactive Services 
e. Spanner – Spanner: Google’s Globally-Distributed Database
f. Azure – Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency
g. Percolator – Large-scale Incremental Processing Using Distributed Transactions and Notifications

3. Cluster/Resource Management Systems
a. Omega – Omega: Flexible, Scalable Schedulers for Large Compute Clusters
b. Autopilot – Autopilot: Automatic Data Center Management
c. YARN
1) Architecture of Next Generation Apache Hadoop MapReduce Framework
2) The Next Generation of Apache Hadoop MapReduce
3) Introducing Apache Hadoop YARN
d. Mesos – Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center

4. Computing Frameworks
a. MapReduce – MapReduce: Simplified Data Processing on Large Clusters
b. Storm – Storm: Distributed and Fault-Tolerant Realtime Computation
c. Spark – Spark: Cluster Computing with Working Sets
d. Impala – Cloudera Impala: Real-Time Queries in Apache Hadoop
e. Dremel – Dremel: Interactive Analysis of Web-Scale Datasets
f. Hive/Stinger
1) Hive: A Warehousing Solution Over a MapReduce Framework
2) Hive: A Petabyte Scale Data Warehouse Using Hadoop
3) The Stinger Initiative: Making Apache Hive 100 Times Faster
4) Stinger, Interactive Query for Apache Hive
g. FlumeJava/Crunch
1) FlumeJava: Easy, Efficient Data-Parallel Pipelines
2) Introducing Crunch: Easy MapReduce Pipelines for Apache Hadoop
h. Tez
1) Apache Hadoop Tez
2) Apache Tez: A New Chapter in Hadoop Data Processing
i. Presto – Presto: Interacting with petabytes of data at Facebook

5. Distributed Consistency
a. Paxos – Paxos Made Simple
b. Zookeeper
1) Zookeeper: A Distributed Coordination Service for Distributed Applications
2) Zookeeper: Wait-free Coordination for Internet-scale Systems
c. Chubby – The Chubby Lock Service for Loosely-coupled Distributed Systems
d. Raft – In Search of an Understandable Consensus Algorithm

6. Miscellaneous
a. SequenceFile – Sequence File Format
b. SSTable
1) SSTable and Log Structured Storage: LevelDB
2) SSTable Storage Format
c. RCFile – RCFile: A Fast and Space-efficient Data Placement Structure in MapReduce-based Warehouse Systems
d. ORCFile – ORC File Format
e. Parquet – Parquet: Columnar Storage for The People

posted @ 2015-12-22 14:47  ThinkDiff