System design (how to design Twitter)

Catalog

Clarify the requirements
Capacity Estimation
System APIs
High-level System Design
Data Storage
Scalability

Step1: Clarify the requirements

Clarify the requirements and goals of the system

Requirements
Traffic size (e.g. daily active users)

Nobody expects you to design a complete system in 30-40 minutes.

Discuss the functionality and agree with the interviewer on which components to focus on.

Type1: Functional Requirement

Tweet
- a. Create
- b. Delete
Timeline/Feed
- a. Home
- b. User
Follow a user
Like a tweet
Search tweets
...

Type2: Non-Functional Requirement

Consistency
- Every read receives the most recent write or an error
- Trade-off: can be relaxed to eventual consistency
Availability
- Every request receives a response, without the guarantee that it contains the most recent write
- Scalable
  - Performance: low latency
Partition tolerance (fault tolerance)
- The system continues to operate despite an arbitrary number of messages being dropped by the network between nodes

Step2: Capacity Estimation

Assumptions:
- 200 million DAU, 100 million new tweets per day
- Each user visits the home timeline 5 times and other users' timelines 3 times per day
- Each timeline/page has 20 tweets
- Each tweet is 280 bytes, plus 30 bytes of metadata
- Per photo: 200 KB; 20% of tweets have images
- Per video: 2 MB; 10% of tweets have video; 30% of videos will be watched

Storage Estimate

Write size daily:
- Text:
  - 100M new tweets * (280 + 30) bytes/tweet = 31 GB/day
- Image:
  - 100M * 20% * 200 KB = 4 TB/day
- Video:
  - 100M * 10% * 2 MB = 20 TB/day
Total:
- ≈ 24 TB/day
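
The storage arithmetic above can be checked with a few lines of Python. This is only a sketch that plugs in the assumptions listed earlier; the variable names are illustrative, and decimal units (1 KB = 1000 bytes) are assumed.

```python
# Back-of-the-envelope daily write volume, using the assumptions above.
# Assumed units: 1 KB = 10**3 bytes, 1 GB = 10**9 bytes, 1 TB = 10**12 bytes.
NEW_TWEETS_PER_DAY = 100_000_000
TEXT_BYTES = 280 + 30        # tweet body + metadata
PHOTO_BYTES = 200 * 10**3    # 200 KB per photo
VIDEO_BYTES = 2 * 10**6      # 2 MB per video
PHOTO_PERCENT = 20           # 20% of tweets carry an image
VIDEO_PERCENT = 10           # 10% of tweets carry a video

# Integer arithmetic keeps the results exact.
text_per_day = NEW_TWEETS_PER_DAY * TEXT_BYTES
image_per_day = NEW_TWEETS_PER_DAY * PHOTO_PERCENT // 100 * PHOTO_BYTES
video_per_day = NEW_TWEETS_PER_DAY * VIDEO_PERCENT // 100 * VIDEO_BYTES
total_per_day = text_per_day + image_per_day + video_per_day

print(f"text : {text_per_day / 10**9:.0f} GB/day")
print(f"image: {image_per_day / 10**12:.0f} TB/day")
print(f"video: {video_per_day / 10**12:.0f} TB/day")
print(f"total: {total_per_day / 10**12:.1f} TB/day")
```

Text is negligible next to media, which is why the total is dominated by the image and video terms.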

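Bandwidth Estimate (Social Networking => read heavy)

Daily Read Tweets Volume:
- 200M * (5 home visits + 3 user visits) * 20 tweets/page = 32B tweets/day

Daily Read Bandwidth:
- Text: 32B * 280 bytes / 86400 s ≈ 100 MB/s
- Image: 32B * 20% * 200 KB / 86400 s ≈ 15 GB/s
- Video: 32B * 10% * 30% * 2 MB / 86400 s ≈ 22 GB/s
- Total: ≈ 37 GB/s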
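
The read-side numbers can be reproduced the same way. The sketch below assumes the inputs listed under the assumptions (decimal units, 280 bytes of text per tweet read); the names are illustrative. Computed exactly, the values come to roughly 104 MB/s, 14.8 GB/s and 22.2 GB/s, so any rounded figures in an estimate like this should be read as order-of-magnitude.

```python
# Back-of-the-envelope read bandwidth, using the assumptions above.
SECONDS_PER_DAY = 86_400
DAU = 200_000_000
VISITS_PER_USER = 5 + 3      # home timeline + other users' timelines
TWEETS_PER_PAGE = 20

tweets_read_per_day = DAU * VISITS_PER_USER * TWEETS_PER_PAGE

# Bytes per second for each content type.
text_bps = tweets_read_per_day * 280 / SECONDS_PER_DAY
# 20% of the tweets read carry a 200 KB image.
image_bps = tweets_read_per_day * 20 // 100 * 200_000 / SECONDS_PER_DAY
# 10% carry a 2 MB video, and 30% of those videos are actually watched.
video_bps = tweets_read_per_day * 10 // 100 * 30 // 100 * 2_000_000 / SECONDS_PER_DAY
total_bps = text_bps + image_bps + video_bps

print(f"tweets read: {tweets_read_per_day / 10**9:.0f}B/day")
print(f"text : {text_bps / 10**6:.1f} MB/s")
print(f"image: {image_bps / 10**9:.1f} GB/s")
print(f"video: {video_bps / 10**9:.1f} GB/s")
print(f"total: {total_bps / 10**9:.1f} GB/s")
```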

Step3: System APIs

postTweet(userToken, string tweet)

deleteTweet(userToken, string tweetId)

likeOrUnlikeTweet(userToken, string tweetId, bool like)

readHomeTimeLine(userToken, int pageSize, opt string pageToken)

readUserTimeLine(userToken, int pageSize, opt string pageToken)
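
To make the API surface above concrete, here is a toy in-memory sketch. Everything beyond the method names and parameters is an assumption: the `TweetService` class, treating `userToken` as the user's identity (a real service would authenticate it), and using a simple offset as the page token (a production system would more likely use an opaque cursor that stays stable when tweets are deleted).

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Tweet:
    tweet_id: str
    author: str
    text: str
    liked_by: Set[str] = field(default_factory=set)


class TweetService:
    """Hypothetical in-memory stand-in for the APIs listed above."""

    def __init__(self) -> None:
        self._next_id = itertools.count(1)
        self._tweets: Dict[str, Tweet] = {}  # insertion order = creation order

    def post_tweet(self, user_token: str, tweet: str) -> str:
        if len(tweet.encode("utf-8")) > 280:
            raise ValueError("tweet exceeds 280 bytes")
        tweet_id = str(next(self._next_id))
        self._tweets[tweet_id] = Tweet(tweet_id, user_token, tweet)
        return tweet_id

    def delete_tweet(self, user_token: str, tweet_id: str) -> None:
        tweet = self._tweets.get(tweet_id)
        if tweet is None or tweet.author != user_token:
            raise PermissionError("only the author can delete a tweet")
        del self._tweets[tweet_id]

    def like_or_unlike_tweet(self, user_token: str, tweet_id: str, like: bool) -> None:
        liked_by = self._tweets[tweet_id].liked_by
        if like:
            liked_by.add(user_token)
        else:
            liked_by.discard(user_token)

    def read_user_timeline(
        self, user_token: str, page_size: int, page_token: Optional[str] = None
    ) -> Tuple[List[Tweet], Optional[str]]:
        # Newest tweets first; the page token is just an offset in this sketch.
        own = [t for t in self._tweets.values() if t.author == user_token]
        own.reverse()
        start = int(page_token) if page_token else 0
        page = own[start:start + page_size]
        has_more = start + page_size < len(own)
        return page, (str(start + page_size) if has_more else None)
```

The caller passes the returned token back in to fetch the next page and stops when the token is `None`.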

Step4: High-Level System Design:

Post tweets

User timeline (push/pull mode)

https://medium.com/@winapp/read-fast-with-fan-out-write-f25257117297

Home Timeline (cont'd)

Fan out on write

Not efficient for users with a huge number of followers (like Taylor Swift)

Hybrid Solution

Non-hot users:
- fan out on write(push)
Hot users:
- fan out on read (pull): at timeline request time, read their tweets from the tweets cache and merge them with the precomputed results from non-hot users
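
The hybrid approach above (push for non-hot users, pull for hot users) can be sketched as follows. This is a minimal in-memory illustration, not how any real service implements it: the `HybridTimeline` class, the logical clock, and the tiny `HOT_FOLLOWER_THRESHOLD` are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical cutoff; a real system would use a far larger value.
HOT_FOLLOWER_THRESHOLD = 3


class HybridTimeline:
    def __init__(self):
        self.followers = defaultdict(set)          # author -> users following them
        self.following = defaultdict(set)          # user -> authors they follow
        self.home_cache = defaultdict(list)        # user -> [(ts, tweet_id)] pushed on write
        self.tweets_by_author = defaultdict(list)  # author -> [(ts, tweet_id)]
        self._clock = 0                            # logical timestamp for ordering

    def follow(self, follower, author):
        self.followers[author].add(follower)
        self.following[follower].add(author)

    def is_hot(self, author):
        return len(self.followers[author]) >= HOT_FOLLOWER_THRESHOLD

    def post(self, author, tweet_id):
        self._clock += 1
        entry = (self._clock, tweet_id)
        self.tweets_by_author[author].append(entry)
        # Push: only non-hot authors fan out to their followers' home caches.
        if not self.is_hot(author):
            for follower in self.followers[author]:
                self.home_cache[follower].append(entry)

    def home_timeline(self, user, page_size=20):
        # Start from the precomputed (pushed) entries; a set removes duplicates
        # if an author became hot after some tweets were already pushed.
        entries = set(self.home_cache[user])
        # Pull: merge in tweets from hot authors at read time.
        for author in self.following[user]:
            if self.is_hot(author):
                entries.update(self.tweets_by_author[author])
        newest_first = sorted(entries, reverse=True)
        return [tweet_id for _, tweet_id in newest_first[:page_size]]
```

The write cost for a hot author is a single append regardless of follower count, at the price of extra merge work on every read by their followers.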

Step5: Data Storage

Principles

SQL database:
- e.g. user table
NoSQL database:
- e.g. timelines
File system:
- media files: image, audio, video

Step6: Scalability

Identify potential bottlenecks
Discuss solutions, focusing on trade-offs
- Data sharding
  - data store, cache
- Load balancing
  - user <-> application server
  - application server <-> cache server
  - application server <-> db
- Data caching
  - read heavy

Sharding

Why?

It is impossible to store/process all the data on a single machine

How?

Break large tables into smaller shards on multiple servers

Pros

Horizontal scaling

Cons

Complexity (distributed queries, resharding...)

Option 1: shard by tweets' creation time

Pros:

Limited shards to query

Cons:

Hot/Cold data issue
New shards fill up quickly
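
A small sketch of Option 1 shows both sides of the trade-off. It assumes one shard per UTC day, which is an arbitrary granularity chosen for illustration; the function and shard names are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def shard_for_time(created_at: datetime) -> str:
    """Route a tweet to the shard named after its UTC creation day."""
    return created_at.astimezone(timezone.utc).strftime("tweets_%Y%m%d")


def shards_for_range(start: datetime, end: datetime) -> list:
    """List the shards a query over [start, end] has to touch."""
    day = start.astimezone(timezone.utc).date()
    last = end.astimezone(timezone.utc).date()
    names = []
    while day <= last:
        names.append(day.strftime("tweets_%Y%m%d"))
        day += timedelta(days=1)
    return names


now = datetime(2021, 8, 5, 12, 0, tzinfo=timezone.utc)

# Pro: a "last 24 hours" timeline query touches only two shards.
recent_shards = shards_for_range(now - timedelta(days=1), now)

# Con: every write in the last ten hours lands on the same (hot) shard.
writes = Counter(shard_for_time(now - timedelta(minutes=m)) for m in range(600))

print(recent_shards)
print(writes)
```

All current writes concentrate on the newest shard while older shards sit mostly idle, which is the hot/cold problem listed above.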

Option 2: Shard by hash(userId): store all of a user's data on a single shard

Pros:

Simple
Querying a user timeline is straightforward

Cons:

Home timeline still needs to query multiple shards
Non-uniform distribution of storage
Hot users
Availability: if a shard is down, all of that user's data is unavailable

Option 3: Shard by hash(tweetId)

Pros:

uniform distribution
high availability

Cons:

Need to query all shards to generate a user/home timeline (mitigated by caching)
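
Options 2 and 3 can be compared with a short experiment. The sketch below is an illustration under stated assumptions: 8 shards, MD5 as the hash, simple modulo placement, and a synthetic workload in which one "celebrity" account writes half of all tweets. None of these choices come from the original notes.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 8  # hypothetical shard count


def shard_of(key: str) -> int:
    """Map a key to a shard with a stable hash (Python's hash() is salted per process)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


# Synthetic workload: (tweet_id, user_id) pairs, half written by one heavy user.
tweets = [
    (f"t{i}", "celebrity" if i % 2 == 0 else f"user{i}")
    for i in range(10_000)
]

# Option 2: shard by hash(userId). The heavy user's tweets pile onto one shard.
by_user = Counter(shard_of(user_id) for _, user_id in tweets)

# Option 3: shard by hash(tweetId). Tweets spread roughly evenly.
by_tweet = Counter(shard_of(tweet_id) for tweet_id, _ in tweets)

# Shards that must be queried to build the heavy user's timeline.
user_shards_opt2 = {shard_of(u) for _, u in tweets if u == "celebrity"}
user_shards_opt3 = {shard_of(t) for t, u in tweets if u == "celebrity"}

print("largest shard, by userId :", max(by_user.values()))
print("largest shard, by tweetId:", max(by_tweet.values()))
print("shards per user timeline :", len(user_shards_opt2), "vs", len(user_shards_opt3))
```

Modulo placement also makes resharding expensive, since changing `NUM_SHARDS` moves most keys; consistent hashing is a common way to reduce that cost.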

Caching

Why?

Social networks have heavy read traffic
Queries can be slow and costly

How?

Store hot/precomputed data in memory so reads can be much faster

Timeline service

user timeline: user_id ->
home timeline: user_id ->
tweets: tweet_id -> tweet

Topics:

caching policy
sharding
performance
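
As one possible caching policy for the `tweet_id -> tweet` mapping, the sketch below combines a least-recently-used (LRU) cache with a cache-aside read path. LRU is an assumption chosen for illustration (the notes only list "caching policy" as a topic), and the plain dict standing in for the database is hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # drop the least recently used entry


def get_tweet(tweet_id, cache, db):
    """Cache-aside read: try memory first, fall back to the store and populate."""
    tweet = cache.get(tweet_id)
    if tweet is None:
        tweet = db[tweet_id]  # slow path: the backing data store
        cache.put(tweet_id, tweet)
    return tweet


# Usage: a dict plays the role of the tweets database.
db = {"1": "hello", "2": "world", "3": "again"}
cache = LRUCache(capacity=2)
for tweet_id in ["1", "2", "1", "3"]:
    get_tweet(tweet_id, cache, db)
```

After this sequence, tweet "2" is the one evicted, because "1" was read again before "3" arrived.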

ref

https://www.youtube.com/watch?v=PMCdWr6ejpw&list=PLLuMmzMTgVK4RuSJjXUxjeUt3-vSyA1Or&index=1

posted @ 2021-08-05 11:39 zhangyu63
