Distributed Systems Reading List (via jobbole)

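Introduction

I often argue that the hardest part of studying distributed systems is changing the way you think. Below are some reading materials I have found useful for prompting that change.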

Thought Provokers

Essays that make you reconsider how you design. Not everything can be solved with big servers, databases, and transactions.

Amazon

Some of this is about the technology, but what is more interesting is the culture and organization they built to go with it.

Google

The current “rocket science” of distributed systems (that is, the field's most demanding work).

MapReduce
Chubby Lock Manager
Google File System
BigTable
Data Management for Internet-Scale Single-Sign-On
Dremel: Interactive Analysis of Web-Scale Datasets
Large-scale Incremental Processing Using Distributed Transactions and Notifications
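Megastore: Providing Scalable, Highly Available Storage for Interactive Services – a clever design for running Paxos across data centers with low latency.
Spanner – Google's scalable, multi-version, globally distributed, and synchronously replicated database.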
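Photon – fault-tolerant and scalable joining of continuous data streams. Joins are hard, especially with clock skew, high availability, and distribution.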
Mesa: Geo-Replicated, Near Real-Time, Scalable Data Warehousing – a data warehouse system that stores critical measurement data for Google's Internet advertising business.

eBay

Interestingly, they abandoned most of J2EE and rely heavily on database partitioning. Also take a look at their site upgrade tool.

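Consistency Models

The key to building systems that suit their environment is finding the right trade-off between consistency and availability.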

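CAP Conjecture – consistency, availability, and partition tolerance cannot all be satisfied at once
Consistency, Availability, and Convergence – proves an upper bound on the consistency achievable in a typical system
CAP Twelve Years Later: How the “Rules” Have Changed – Eric Brewer expands on his original description of the trade-offs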
Consistency and Availability – Vogels
Eventual Consistency – Vogels
Avoiding Two-Phase Commit – approaches to avoiding two-phase commit
2PC or not 2PC, Wherefore Art Thou XA – two-phase commit is not a silver bullet
Life Beyond Distributed Transactions – Helland
If you have too much data, then ‘good enough’ is good enough – NoSQL and the future of data theory – Pat Helland
Starbucks doesn’t do two phase commit – asynchronous mechanisms at work
You Can’t Sacrifice Partition Tolerance – additional commentary on CAP
Optimistic Replication – weakly consistent approaches to data replication

Theory

Papers that describe various important considerations in designing distributed systems.

Distributed Computing Economics – Jim Gray
Rules of Thumb in Data Engineering – Jim Gray and Prashant Shenoy
Fallacies of Distributed Computing – Peter Deutsch
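Impossibility of distributed consensus with one faulty process – also known as FLP [access requires an account or payment; a free version is available: here]
Unreliable Failure Detectors for Reliable Distributed Systems – one approach to handling the FLP problem
Lamport Clocks – how to build a global view of time when every computer's clock is independent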
The Byzantine Generals Problem
Lazy Replication: Exploiting the Semantics of Distributed Services
Scalable Agreement – Towards Ordering as a Service
Scalable Eventually Consistent Counters over Unreliable Networks – scalable counting is hard in an unreliable world
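
The Lamport Clocks entry in the list above asks how to order events when every machine keeps its own clock. As a rough illustration only (not code taken from any of the listed papers), the basic logical-clock rules can be sketched in Python. The class and method names here are hypothetical, chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class LamportClock:
    """Logical clock for one process; `time` counts events, not seconds."""
    time: int = 0

    def tick(self) -> int:
        # A local event advances the clock by one.
        self.time += 1
        return self.time

    def send(self) -> int:
        # Sending is itself an event; the resulting timestamp
        # travels with the outgoing message.
        return self.tick()

    def receive(self, msg_time: int) -> int:
        # Move past both the local clock and the sender's timestamp,
        # so the receive is ordered after the matching send.
        self.time = max(self.time, msg_time) + 1
        return self.time


a, b = LamportClock(), LamportClock()
a.tick()                      # local event on a
stamp = a.send()              # a sends a message stamped with its clock
b.tick()                      # unrelated local event on b
received = b.receive(stamp)   # b's clock jumps past the stamp
```

The point of the sketch is that a receive always gets a larger timestamp than the send that caused it, which gives an ordering consistent with causality without relying on synchronized physical clocks.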

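Languages and Tools

Issues in building distributed systems with specific technologies.

Programming Distributed Erlang Applications: Pitfalls and Recipes – building reliable distributed applications takes more than simply choosing Erlang and OTP.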

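Infrastructure

Storage

Paxos Consensus

Understanding this algorithm is a challenge. I suggest reading “Paxos Made Simple” before the other papers, and then reading it again after you have finished them.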

The Part-Time Parliament – Leslie Lamport
Paxos Made Simple – Leslie Lamport
Paxos Made Live – An Engineering Perspective – Chandra et al.
Revisiting the Paxos Algorithm – Lynch et al.
How to build a highly available system with consensus – Butler Lampson
Reconfiguring a State Machine – Lamport et al. – changing cluster membership
Implementing Fault-Tolerant Services Using the State Machine Approach: a Tutorial – Fred Schneider

Other Consensus Papers

Gossip Protocols (Epidemic Behavior)

Epidemic Routing Bibliography
How robust are gossip-based communication protocols
Astrolabe: A Robust and Scalable Technology For Distributed Systems Monitoring, Management, and Data Mining
Epidemic Computing at Cornell
Fighting Fire With Fire: Using Randomized Gossip To Combat Stochastic Scalability Limits
Bi-Modal Multicast
ACM SIGOPS Operating Systems Review – Gossip-based computer networking
SWIM: Scalable Weakly-consistent Infection-style Process Group Membership Protocol

P2P

posted @ 2015-11-23 11:57 scott_h