[Architecture]Facebook Chat
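The article about Facebook Chat has been on InfoQ for a long time. Piaoger recently came across a slide deck that a Facebook engineer presented at Erlang Factory, and reading the two together turned out to be fairly useful.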
# Keywords
Realtime messaging, C++, Erlang, Long-polling, Thrift
# Challenges
▪ How does synchronous messaging work on the Web?
▪ “Presence” is hard to scale
▪ Need a system to queue and deliver messages
▪ Millions of connections, mostly idle
▪ Need logging, at least between page loads
▪ Make it work in Facebook’s environment
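Notifying all of a user's friends every time that user comes online or goes offline is a naive approach. Its cost is O(average friend count × peak concurrent users × churn rate) messages per second, where churn rate is the average number of times per second that a user comes online or goes offline. With a few hundred friends per user on average and peak concurrent users in the millions, this approach is unbearably inefficient.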
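To get a feel for the numbers, here is a rough back-of-the-envelope calculation of that formula. The inputs are hypothetical values picked only for illustration (200 friends, one million concurrent users, one online/offline transition per user every ten minutes); they are not figures from Facebook.

```python
def push_presence_cost(avg_friends: int,
                       peak_online_users: int,
                       churn_per_user_per_sec: float) -> float:
    """Notifications per second if every online/offline
    transition is pushed to every friend of that user."""
    return avg_friends * peak_online_users * churn_per_user_per_sec


# Hypothetical inputs, not measured values.
cost = push_presence_cost(
    avg_friends=200,
    peak_online_users=1_000_000,
    churn_per_user_per_sec=1 / 600,  # one transition every 10 minutes
)
print(f"{cost:,.0f} notifications/second")
```

Even with these modest assumptions the push model has to deliver hundreds of thousands of notifications per second, which is why the design below pulls presence instead.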
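Piaoger:
When will the online product I work on run into the problem below? That would be pain and joy at the same time:
When a product's user base can grow from zero to seventy million overnight, scalability has to be considered from the very beginning.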
# System Overview
System overview (Front-end)
▪ Mix of client-side Javascript and server-side PHP
▪ Regular AJAX for sending messages, fetching conversation history
▪ Periodic AJAX polling for list of online friends
▪ AJAX long-polling for messages (Comet)
System overview (Back-end)
▪ Discrete responsibilities for each service
- Communicate via Thrift
▪ Channel (Erlang): message queuing and delivery
- Queue messages in each user’s “channel”
- Deliver messages as responses to long-polling HTTP requests
▪ Presence (C++): aggregates online info in memory (pull-based presence)
▪ Chatlogger (C++): stores conversations between page loads
▪ Web tier (PHP): serves our vanilla web requests
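For the clustered and partitioned subsystems, Facebook chose a combination of C++ and Erlang. The C++ module logs chat messages, while the Erlang module "holds online users' conversations in memory and serves the long-polled requests".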
# Realtime Messaging
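Facebook has the client "pull" new messages directly from the server, a process quite similar to Comet's XHR long polling.
A Facebook page loads an iframe that is used for passing messages between users. The JavaScript in this iframe issues an HTTP GET request, which establishes a persistent connection to the server that stays open until a message is returned to the user.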
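The client side of this loop can be sketched as follows. This is only an illustration in Python, not Facebook's JavaScript: `fetch` is a hypothetical stand-in for the held HTTP GET (it returns a list of messages, or `None` if the request timed out), and the 30-second timeout is an assumed value.

```python
from typing import Callable, List, Optional


def long_poll_loop(fetch: Callable[[int], Optional[List[str]]],
                   on_message: Callable[[str], None],
                   max_requests: int) -> int:
    """Keep exactly one request outstanding; re-issue it as soon
    as the previous one returns. Returns the number of messages
    delivered."""
    delivered = 0
    for _ in range(max_requests):
        messages = fetch(30)  # assumed 30 s hold time
        if messages is None:
            continue  # timed out with nothing to deliver: reconnect
        for message in messages:
            on_message(message)
            delivered += 1
    return delivered


# Scripted fake server responses: timeout, one message, timeout, two messages.
scripted = iter([None, ["hi"], None, ["how are you?", "still there?"]])
received: List[str] = []
count = long_poll_loop(lambda timeout: next(scripted),
                       received.append,
                       max_requests=4)
print(count, received)
```

A real client would loop for as long as the page is open; `max_requests` exists only so the sketch terminates.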
# Channel Server Architecture
Overview
▪ One channel per user
▪ Web tier delivers messages for that user
▪ Channel State: short queue of sequenced messages
▪ Long poll for streaming (Comet)
▪ Clients make an HTTP request
- Server replies when a message is ready
- One active request per browser tab
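The channel behaviour listed above (a short queue of sequenced messages, with the server replying only when a message is ready) might look roughly like this. It is a minimal single-process sketch using Python threads, not the Erlang implementation; the class name `Channel`, the queue length of 50, and the method names are all assumptions made for the example.

```python
import threading
from collections import deque
from typing import Deque, List, Tuple


class Channel:
    """One user's channel: a short queue of sequenced messages."""

    def __init__(self, max_len: int = 50) -> None:
        # Bounded deque: the oldest messages fall off the queue.
        self._messages: Deque[Tuple[int, str]] = deque(maxlen=max_len)
        self._next_seq = 1
        self._cond = threading.Condition()

    def publish(self, text: str) -> int:
        """Queue a message and wake any held long-poll request."""
        with self._cond:
            seq = self._next_seq
            self._next_seq += 1
            self._messages.append((seq, text))
            self._cond.notify_all()
            return seq

    def poll(self, after_seq: int, timeout: float) -> List[Tuple[int, str]]:
        """Hold the request until a message newer than after_seq
        exists or the timeout elapses, then return what is queued."""
        with self._cond:
            self._cond.wait_for(lambda: self._next_seq - 1 > after_seq,
                                timeout=timeout)
            return [(s, t) for s, t in self._messages if s > after_seq]


channel = Channel()
# A message arrives 50 ms after the poll starts; the poll is held until then.
threading.Timer(0.05, channel.publish, args=("hello",)).start()
first = channel.poll(after_seq=0, timeout=2.0)
# Nothing newer than sequence 1 exists, so this poll times out empty.
second = channel.poll(after_seq=1, timeout=0.1)
print(first, second)
```

Having the client send the last sequence number it saw lets it reconnect after a timeout or page load without missing or duplicating messages, as long as they are still in the short queue.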
Details
▪ Distributed design
- User id space is partitioned (division of labor)
- Each partition is serviced by a cluster (availability)
▪ Presence aggregation
- Channel servers are authoritative
- Periodically shipped to presence servers
▪ Open source: Erlang, Mochiweb, Thrift, Scribe, fb303, et al.
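The partitioning and presence aggregation described above can be sketched in a few lines. Everything here is assumed for illustration: the modulo partitioning function, the partition count of 4, and the class names are hypothetical, and the slides do not say how Facebook actually maps user ids to partitions.

```python
from typing import Dict, Iterable, List, Set

NUM_PARTITIONS = 4  # hypothetical partition count


def partition_for(user_id: int) -> int:
    """Assumed partitioning scheme: user id modulo partition count."""
    return user_id % NUM_PARTITIONS


class ChannelPartition:
    """Channel servers are authoritative for who is online."""

    def __init__(self, index: int) -> None:
        self.index = index
        self.online: Set[int] = set()

    def connect(self, user_id: int) -> None:
        if partition_for(user_id) != self.index:
            raise ValueError("user belongs to another partition")
        self.online.add(user_id)

    def disconnect(self, user_id: int) -> None:
        self.online.discard(user_id)

    def snapshot(self) -> Set[int]:
        return set(self.online)


class PresenceServer:
    """Holds the last shipped snapshots and answers pull queries."""

    def __init__(self) -> None:
        self._by_partition: Dict[int, Set[int]] = {}

    def receive_snapshot(self, index: int, online: Iterable[int]) -> None:
        self._by_partition[index] = set(online)

    def online_friends(self, friend_ids: Iterable[int]) -> List[int]:
        return [f for f in friend_ids
                if f in self._by_partition.get(partition_for(f), set())]


partitions = [ChannelPartition(i) for i in range(NUM_PARTITIONS)]
for uid in (7, 10, 13):
    partitions[partition_for(uid)].connect(uid)

presence = PresenceServer()
for p in partitions:  # in production this would run on a timer
    presence.receive_snapshot(p.index, p.snapshot())

online_now = presence.online_friends([7, 8, 13])
print(online_now)
```

Because presence is shipped periodically rather than on every transition, the presence servers can be slightly stale, which is the price paid for avoiding the per-friend notification fan-out.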
Channel Servers
Channel Applications
# Dark launch
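The way the service was launched is also interesting: it used a so-called "dark launch".
The secret to going from zero to seventy million users overnight is to avoid doing it all in one step. We first simulated the load of many users hitting the service, in a phase called the "dark launch". During this phase, Facebook pages connected to the chat servers without showing any UI elements, querying presence information and simulating message sends.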
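One way such a dark launch could be wired up is sketched below. This is an assumption about the mechanism, not a description of Facebook's code: the percentage-based ramp, the CRC32 bucketing, and the names `in_dark_launch` and `page_load_plan` are all hypothetical.

```python
import zlib
from typing import Dict, List, Union


def in_dark_launch(user_id: int, rollout_percent: int) -> bool:
    """Stable bucketing: the same user always lands in the same
    bucket, so raising rollout_percent only ever adds users."""
    bucket = zlib.crc32(str(user_id).encode("utf-8")) % 100
    return bucket < rollout_percent


def page_load_plan(user_id: int,
                   rollout_percent: int) -> Dict[str, Union[bool, List[str]]]:
    """What a page load does during the dark launch: no chat UI,
    but selected users silently generate backend traffic."""
    active = in_dark_launch(user_id, rollout_percent)
    return {
        "render_chat_ui": False,
        "background_requests": (
            ["poll_presence", "send_simulated_message"] if active else []
        ),
    }


# Ramp the simulated load up gradually over a sample of 10,000 users.
for percent in (0, 10, 50, 100):
    loaded = sum(in_dark_launch(uid, percent) for uid in range(10_000))
    print(f"{percent:>3}% rollout: {loaded} of 10000 users generating load")
```

The point of the ramp is that backend capacity problems show up while the feature is still invisible, so they can be fixed before any user depends on it.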
Piaoger: Isn't this rather comparable to our Warmup?
# References
[Facebook Chat Architecture (Chinese)](http://www.infoq.com/cn/news/2008/05/facebookchatarchitecture)
[Facebook Chat Architecture (English)](http://www.infoq.com/news/2008/05/facebookchatarchitecture)
[Facebook Chat](http://www.facebook.com/note.php?note_id=14218138919&id=9445547199&index=0)
[Erlang at Facebook](http://www.erlang-factory.com/upload/presentations/31/EugeneLetuchy-ErlangatFacebook.pdf)