News feed

2015-01-10 14:26  李涛的技术博客 (Li Tao's Tech Blog)

1. Level 1.0

Database Schema:

a. User

UserID  Name     Age
1       Jason    25
2       Michael  26

b. Friendship

FriendshipID  SourceID  TargetID
1             1         2
2             2         1

c. News

NewsID  AuthorID  Content  Timestamp
1       2         "Hello"  2015-01-10
2       1         "Hi"     2015-01-10

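Why is this bad?

Assume a user has 100+ friends. Building the feed takes two queries:

a. 1 query --> get the friends list

b. 1 query --> get the friends' recent news:

   SELECT * FROM news
   WHERE timestamp > xxx
   AND authorID IN (friendsList)
   ORDER BY timestamp DESC
   LIMIT 1000

IN with a long ID list is slow, and the list grows with every friend added.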
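The two-query approach can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the column types, the in-memory database, and the helper name `naive_feed` are hypothetical and not part of the original design.

```python
import sqlite3

# In-memory database holding the three Level 1.0 tables and their sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (UserID INTEGER PRIMARY KEY, Name TEXT, Age INTEGER);
    CREATE TABLE friendship (FriendshipID INTEGER PRIMARY KEY,
                             SourceID INTEGER, TargetID INTEGER);
    CREATE TABLE news (NewsID INTEGER PRIMARY KEY, AuthorID INTEGER,
                       Content TEXT, Timestamp TEXT);
    INSERT INTO user VALUES (1, 'Jason', 25), (2, 'Michael', 26);
    INSERT INTO friendship VALUES (1, 1, 2), (2, 2, 1);
    INSERT INTO news VALUES (1, 2, 'Hello', '2015-01-10'),
                            (2, 1, 'Hi', '2015-01-10');
""")


def naive_feed(conn, user_id, since, limit=1000):
    """Hypothetical helper: build a feed with the two queries described above."""
    # Query 1: get the friends list.
    friends = [row[0] for row in conn.execute(
        "SELECT TargetID FROM friendship WHERE SourceID = ?", (user_id,))]
    if not friends:
        return []
    # Query 2: one placeholder per friend, so the IN list grows with the friends list.
    placeholders = ",".join("?" for _ in friends)
    sql = (
        "SELECT NewsID, AuthorID, Content, Timestamp FROM news "
        f"WHERE Timestamp > ? AND AuthorID IN ({placeholders}) "
        "ORDER BY Timestamp DESC LIMIT ?"
    )
    return conn.execute(sql, (since, *friends, limit)).fetchall()


feed = naive_feed(conn, 1, "2015-01-01")
print(feed)
```

With 100+ friends the second query carries 100+ bound parameters on every feed request, which is the cost the later levels try to avoid.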

 

2. Level 2.0

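a. Pull: Fetch news from each friend and merge the results. The news feed is generated when the user requests it.

b. Push: The news feed is generated when the news is created. A separate table stores each user's news feed, so the same news item is duplicated across followers' feeds.

Push disadvantage: news delay, since a post reaches a feed only after the write for that follower completes.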
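The difference between the two models can be sketched with in-memory dictionaries. Everything here is an assumption for illustration: the names `friends`, `posts`, `inbox`, `publish`, `pull_feed`, and `push_feed` are hypothetical, and a real system would use database tables instead of dicts.

```python
from collections import defaultdict

friends = {1: [2, 3], 2: [1], 3: [1]}   # user -> users they follow (sample data)
posts = defaultdict(list)               # author -> [(timestamp, news_id)]
inbox = defaultdict(list)               # user -> [(timestamp, news_id)], push model only


def followers_of(author):
    """Users whose friends list contains the author."""
    return [user for user, followed in friends.items() if author in followed]


def publish(author, ts, news_id):
    """Store the post once, then fan it out to every follower's inbox (push)."""
    posts[author].append((ts, news_id))
    for follower in followers_of(author):
        inbox[follower].append((ts, news_id))   # one duplicate per follower


def pull_feed(user, limit=10):
    """Pull: gather each friend's posts at request time and merge them."""
    items = [post for friend in friends.get(user, []) for post in posts[friend]]
    return sorted(items, reverse=True)[:limit]


def push_feed(user, limit=10):
    """Push: the feed was built at publish time; just read the inbox."""
    return sorted(inbox[user], reverse=True)[:limit]


publish(2, 100, "n1")
publish(3, 105, "n2")
publish(1, 110, "n3")
print(pull_feed(1), push_feed(1))
```

Both functions return the same feed here. The trade-off is where the work happens: pull does it on every read, push does it on every write and stores one copy per follower.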

 

3. Level 3.0

a. Popular star (Justin Bieber)

  Followers: 13M+

  An async push may take over 30 minutes (13M+ insertions; the delay is too long)

b. Push + Pull

  For popular stars, don't push news.

  For every news feed request, merge the news pushed from non-popular users with the news pulled from popular users.
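A minimal sketch of the hybrid approach follows. It assumes a follower-count threshold decides who counts as popular; `POPULAR_THRESHOLD` and all function names are hypothetical, and the threshold is set to 3 only so the example stays small.

```python
from collections import defaultdict

POPULAR_THRESHOLD = 3          # hypothetical cutoff; a real one would be far higher

followers = defaultdict(set)   # author -> users following the author
following = defaultdict(set)   # user -> authors the user follows
posts = defaultdict(list)      # author -> [(timestamp, news_id)]
inbox = defaultdict(list)      # user -> pushed [(timestamp, news_id)]


def follow(user, author):
    followers[author].add(user)
    following[user].add(author)


def is_popular(author):
    return len(followers[author]) >= POPULAR_THRESHOLD


def publish(author, ts, news_id):
    """Always store the post; fan out only for non-popular authors."""
    posts[author].append((ts, news_id))
    if is_popular(author):
        return                 # popular star: skip the expensive push
    for follower in followers[author]:
        inbox[follower].append((ts, news_id))


def feed(user, limit=10):
    """Merge pushed items with items pulled from popular authors at read time."""
    pulled = [post for author in following[user] if is_popular(author)
              for post in posts[author]]
    # The set union drops items that are present in both sources.
    return sorted(set(inbox[user]) | set(pulled), reverse=True)[:limit]


for fan in (1, 2, 3):
    follow(fan, "star")
follow(1, "friend")
publish("star", 100, "s1")
publish("friend", 101, "f1")
print(feed(1))
```

The star's post never touches any inbox, so publishing it costs one write regardless of follower count. The cost moves to read time, where each follower pulls from the few popular authors they follow.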

 

4. Level 4.0

Push disadvantages:

a. Realtime: pushed news arrives late

b. Storage: every news item is duplicated per follower

c. Edit: an edited news item must be updated in every copy

Go back to Pull:

a. Cache each user's latest (14 days) news

b. Broadcast the request to multiple servers (sharded by userID)

c. Merge and sort the news feed (by time, forward frequency, and friends' forwards; dedup, then sort)

d. Cache the news feed for this user with a timestamp (user_login)
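The pull steps above can be sketched as follows. This is a simplified illustration under several assumptions: shards are plain in-process objects rather than servers, ranking uses time only (forward frequency is left out), a forward is modeled as a friend re-publishing the same news ID, and `NUM_SHARDS`, `Shard`, `publish`, and `build_feed` are hypothetical names.

```python
import heapq
from collections import defaultdict

NUM_SHARDS = 4                    # hypothetical shard count
CACHE_WINDOW = 14 * 24 * 3600     # 14 days, in seconds


class Shard:
    """Caches the recent news of the authors assigned to this shard."""

    def __init__(self):
        self.recent = defaultdict(list)   # author -> [(ts, news_id)], newest first

    def add(self, author, ts, news_id, now):
        items = self.recent[author] + [(ts, news_id)]
        items.sort(reverse=True)
        # Keep only items inside the 14-day window.
        self.recent[author] = [it for it in items if it[0] >= now - CACHE_WINDOW]

    def fetch(self, authors, since):
        found = [it for a in authors for it in self.recent.get(a, []) if it[0] > since]
        return sorted(found, reverse=True)


shards = [Shard() for _ in range(NUM_SHARDS)]
feed_cache = {}                   # user -> (built_at, feed)


def publish(author, ts, news_id):
    shards[author % NUM_SHARDS].add(author, ts, news_id, now=ts)


def build_feed(user, friend_ids, now, since=0, limit=10):
    # Group friends by shard so each shard receives a single request.
    by_shard = defaultdict(list)
    for friend in friend_ids:
        by_shard[friend % NUM_SHARDS].append(friend)
    partials = [shards[i].fetch(authors, since) for i, authors in by_shard.items()]
    # Merge the per-shard lists (each newest first) and drop duplicate news IDs.
    merged, seen = [], set()
    for ts, news_id in heapq.merge(*partials, reverse=True):
        if news_id not in seen:
            seen.add(news_id)
            merged.append((ts, news_id))
        if len(merged) == limit:
            break
    feed_cache[user] = (now, merged)      # cached together with the build timestamp
    return merged


publish(1, 1000, "a")
publish(2, 1005, "b")
publish(5, 1003, "c")
publish(2, 1006, "a")             # user 2 forwards news "a"
print(build_feed(9, [1, 2, 5], now=1010))
```

The cached timestamp lets a later request pass it as `since` and fetch only newer items, which is one plausible reading of step d.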