System design interview: how to design a chat system (e.g., Facebook Messenger, WeChat or WhatsApp)
Methodology: READ MF!
For system design interview questions, we should normally follow the "READ MF!" steps. You can easily remember this as "Read! Mother Fucker!", similar to RTFM, with some attitude toward your interviewer (because if I can design this complicated system in 60 minutes, why would I waste my time interviewing here? JK).
READ MF! (Requirement, Estimation, Architecture, Details, Miscellaneous, Future)
- Requirement clarification: keep it simple first and make some assumptions to make your life easier; we can drill down or improve later.
- Estimation of scale (QPS, storage size, network bandwidth per second): the scale determines the final architecture, since we should not over-engineer things. Keep it simple first.
- Design the system Architecture and layers: major services and their responsibilities, the front-end/edge layer, the service layer and the storage layer.
- Drill down into the Details of each component, e.g., data models in the storage layer and API interfaces between microservices.
- Miscellaneous: talk about design trade-offs, bottlenecks, peak traffic handling, failover plans, monitoring and alerting, and security concerns. Talk about the things you are most comfortable with.
- Future work: if new features are added, how would the system be extended to support them?
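The Estimation step can be sketched as a few lines of arithmetic. This is only a rough illustration: the field names and every number in the example run are hypothetical assumptions, not figures from any real chat system, and the peak factor is a simple multiplier rather than a measured value.

```python
import math
from dataclasses import dataclass

SECONDS_PER_DAY = 24 * 60 * 60


@dataclass
class TrafficAssumptions:
    # All fields are illustrative inputs that you would agree on with the interviewer.
    daily_active_users: int
    messages_per_user_per_day: int
    avg_message_bytes: int
    peak_to_average: float        # how much busier the peak is than the average
    concurrent_connections: int   # open sockets at the same time
    connections_per_host: int     # sockets one Chat Service host can hold


def estimate(a: TrafficAssumptions) -> dict:
    """Back-of-envelope numbers for write QPS, storage, bandwidth and host count."""
    messages_per_day = a.daily_active_users * a.messages_per_user_per_day
    avg_qps = messages_per_day / SECONDS_PER_DAY
    return {
        "messages_per_day": messages_per_day,
        "avg_write_qps": avg_qps,
        "peak_write_qps": avg_qps * a.peak_to_average,
        "storage_bytes_per_day": messages_per_day * a.avg_message_bytes,
        "avg_ingress_bytes_per_sec": avg_qps * a.avg_message_bytes,
        # Round up: a partially filled host is still a host.
        "chat_hosts_for_connections": math.ceil(
            a.concurrent_connections / a.connections_per_host
        ),
    }


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to show the calculation.
    example = TrafficAssumptions(
        daily_active_users=100_000_000,
        messages_per_user_per_day=40,
        avg_message_bytes=200,
        peak_to_average=3.0,
        concurrent_connections=20_000_000,
        connections_per_host=1_000_000,
    )
    for name, value in estimate(example).items():
        print(f"{name}: {value:,.0f}")
```

Stating the inputs explicitly makes it easy to change one assumption during the interview and see which number (QPS, storage or host count) actually drives the architecture.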
Key designs and terms
- WebSocket (OSI layer 7, the application layer) is more suitable than plain HTTP for real-time chat because it is bidirectional (HTTP long polling is also inefficient).
- Maintaining connections efficiently. We should reuse socket connections, since recreating them is inefficient (they still run over IP and TCP), and make each machine in the Chat Service host as many connections as possible. In the real world, WhatsApp's tech stack of Erlang + FreeBSD can host 1 million connections per commodity host.
- Data storage solutions. For large-volume writes and range queries (for the latest messages), we might not want a relational database (writes are not very efficient at large scale), and we also want to read from memory. We could put a cache in front of a relational DB, but we can also choose HBase (LSM tree), Cassandra (SSTables) or even Redis.
- Sent, Delivered and Seen states are easy given the above graph. Sent is when the Chat Service returns success. Delivered is when the message is written into the main conversation storage. Seen is when the user pulls the latest chat history or, if the connection is open, when the Chat Service successfully pushes the message. Typing is simply a listener on the client side.
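The Sent, Delivered and Seen transitions and the "latest messages" range query can be sketched with an in-memory stand-in for the conversation storage. This is a minimal sketch under assumptions: `ChatService`, `accept`, `persist` and `pull_latest` are hypothetical names invented for illustration, and a real system would use a store such as HBase or Cassandra instead of a Python dict.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import IntEnum
from itertools import count


class Status(IntEnum):
    # Ordered so that a status can only move forward.
    SENT = 1
    DELIVERED = 2
    SEEN = 3


@dataclass
class Message:
    message_id: int
    sender: str
    text: str
    status: Status = Status.SENT


class ChatService:
    """Hypothetical in-memory model of the chat service plus conversation storage."""

    def __init__(self):
        self._next_id = count(1)         # monotonically increasing message ids
        self._store = defaultdict(list)  # conversation_id -> messages in append order

    def accept(self, sender, text):
        # Sent: the service acknowledged the message to the sender.
        return Message(next(self._next_id), sender, text, Status.SENT)

    def persist(self, conversation_id, message):
        # Delivered: the message is written into the main conversation storage.
        self._store[conversation_id].append(message)
        self._advance(message, Status.DELIVERED)

    def pull_latest(self, conversation_id, reader, limit):
        # Range query for the newest `limit` messages of one conversation.
        if limit <= 0:
            return []
        latest = self._store.get(conversation_id, [])[-limit:]
        for message in latest:
            # Seen: a recipient (not the sender) pulled the message.
            if message.sender != reader:
                self._advance(message, Status.SEEN)
        return latest

    @staticmethod
    def _advance(message, new_status):
        # Never move a status backwards, e.g. from SEEN to DELIVERED.
        if new_status > message.status:
            message.status = new_status


if __name__ == "__main__":
    service = ChatService()
    msg = service.accept("alice", "hello")
    service.persist("alice:bob", msg)
    service.pull_latest("alice:bob", reader="bob", limit=20)
    print(msg.status.name)
```

The sketch covers only the pull path for Seen; the push path over an open connection would call the same forward-only status update once the push succeeds.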
Baozi YouTube Video
Existing Resources (Credits to original authors)
- Rick Reed on F8 about WhatsApp Design [Video]
- Culture is important! 57 engineers in total, covering both the client and server sides, to reach 1B WhatsApp users
- Design principle: just enough engineering. Only provide the most essential features, e.g., simple layers between services, go as native as possible for performance gains, and simple data replication by keeping a hot copy
- Small number of servers, each supporting 1M connections. This is not just about cost savings but also about easier maintenance (since you have fewer servers). How to scale to Millions of Simultaneous Connections: Video, Slide
- Transient messages when users are offline: WhatsApp won't delete those messages until all recipients have received them
- High Scalability: The WhatsApp Architecture Facebook Bought For $19 Billion
- Tushar Roy on YouTube