How Twitter uses open-source software to tweet your tweets

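  In the previous blog post (The open source technology behind Twitter), Chris Aniszczyk, the manager in charge of open source at Twitter, described how Twitter uses open-source software. That post also served as a preview of his LinuxCon keynote, "The open source technology behind a Tweet." Now that LinuxCon is over, we can take a fuller look at Twitter's use of open-source software and at how a tweet completes its life cycle.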

  

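  What follows are excerpts from the article "How Twitter tweets your tweets with open source", each followed by a short paraphrase (the original post links to the source article).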

  

  1. Twitter's philosophy is to open-source almost all things. We take our software inspiration from Red Hat's development philosophy: "default to open."

     Twitter open-sources by default, an approach it borrows from Red Hat.

  

  2. Twitter hosts its open-source software on GitHub (the original post links to the address).

  

  3. If Unix and Linux are operating systems made of many loosely coupled utilities, then Twitter is a social network made up of many open-source programs loosely coupled together. Some parts will be familiar to anyone in Linux or Web development circles.

    Twitter is to social networking what Unix and Linux are to operating systems: an assembly of loosely coupled open-source parts, several of which any Linux or Web developer will recognize.

  

  4. Twitter runs on the Linux 2.6.39 kernel, uses MySQL as its database, and uses Git for version control.

  

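  5. Twitter's scale: the post cites 2.8 billion tweets handled per year and an average of 5,000 per second, although at 5,000 per second 2.8 billion is closer to one week's traffic than one year's. The rate climbs sharply during major events; one of the challenges Twitter has faced was a peak of 25,088 TPS.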
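  As a rough sanity check on these figures, the quoted average rate can be converted into daily, weekly, and yearly totals. The sketch below uses only the two rates given in the post; the variable names are illustrative, and the conclusion about which period the 2.8 billion figure covers is an inference from the arithmetic, not something the source states.

```python
SECONDS_PER_DAY = 24 * 60 * 60

average_tps = 5_000   # average tweets per second quoted in the post
peak_tps = 25_088     # peak tweets per second quoted in the post

# Totals implied by the average rate alone.
tweets_per_day = average_tps * SECONDS_PER_DAY
tweets_per_week = tweets_per_day * 7
tweets_per_year = tweets_per_day * 365

# How far the quoted peak sits above the average.
peak_to_average = peak_tps / average_tps

print(f"per day:  {tweets_per_day:,}")
print(f"per week: {tweets_per_week:,}")
print(f"per year: {tweets_per_year:,}")
print(f"peak / average: {peak_to_average:.2f}x")
```

  The weekly total comes to about 3.0 billion and the yearly total to about 158 billion, and the quoted peak is roughly five times the average rate.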

     

  What happens with each of these tweets is that they are registered as a status update. Then each one is given a unique ID using a program called snowflake. Next, its geolocation data is noted by Rockdove, a program that hasn't been made open-source yet.

    So a tweet's first three stops are: registration as a status update, a unique ID from snowflake, and geolocation handling by Rockdove (not yet open-sourced).
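  To make the unique-ID step concrete, here is a minimal sketch of a Snowflake-style generator. It assumes the commonly documented layout of a 64-bit ID (a millisecond timestamp, a 10-bit worker number, and a 12-bit per-millisecond sequence) and the custom epoch published with the open-source project. The class name and the injectable clock are hypothetical, and this is an illustration of the idea, not Twitter's code.

```python
import threading
import time

EPOCH_MS = 1288834974657   # custom epoch, assumed from the open-source Snowflake project
WORKER_BITS = 10           # assumed width of the worker field
SEQUENCE_BITS = 12         # assumed width of the per-millisecond sequence
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class SnowflakeLikeGenerator:
    """Hypothetical generator producing time-ordered 64-bit IDs."""

    def __init__(self, worker_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self._clock = clock        # returns the current time in milliseconds
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                # Same millisecond: bump the sequence number.
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # Sequence exhausted: wait for the next millisecond.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            timestamp = now - EPOCH_MS
            return (
                (timestamp << (WORKER_BITS + SEQUENCE_BITS))
                | (self.worker_id << SEQUENCE_BITS)
                | self._sequence
            )


generator = SnowflakeLikeGenerator(worker_id=7)
print(generator.next_id())
```

  Because the timestamp occupies the high bits, IDs from one worker sort by creation time, and no coordination between workers is needed as long as each has a distinct worker number.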

  

  Each tweet is then checked by a combination URL shortener and spam detector called t.co. Once past this stage, each tweet is stored in MySQL by Gizzard, a flexible sharding framework for creating eventually-consistent distributed datastores. Now, and only now, is an HTTP 200 signal, meaning all has gone well, sent to your Web browser.

    The order matters here: t.co shortens links and screens for spam, Gizzard writes the tweet to MySQL, and only after that write completes does your browser receive the HTTP 200 response.
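  The storage step can be sketched as a range-based shard lookup followed by a write, with the success status returned only once the write is done. The example below is a simplified illustration under stated assumptions: in-memory dictionaries stand in for MySQL shards, the table of lower bounds is a stand-in for a forwarding table, and the names `ShardedTweetStore`, `hash_key`, and `handle_post` are hypothetical. It does not model replication or eventual consistency, and it is not Gizzard's API.

```python
import bisect
import hashlib

HASH_SPACE = 1 << 64  # keys are hashed into a 64-bit space


def hash_key(key):
    """Hash any key to an integer in [0, 2**64)."""
    digest = hashlib.sha1(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class ShardedTweetStore:
    """Routes each tweet ID to one shard via a sorted table of lower bounds."""

    def __init__(self, shard_count):
        step = HASH_SPACE // shard_count
        # lower_bounds[i] is the smallest hash value served by shards[i].
        self.lower_bounds = [i * step for i in range(shard_count)]
        # Plain dicts stand in for the MySQL databases behind each shard.
        self.shards = [dict() for _ in range(shard_count)]

    def _shard_for(self, tweet_id):
        index = bisect.bisect_right(self.lower_bounds, hash_key(tweet_id)) - 1
        return self.shards[index]

    def put(self, tweet_id, text):
        self._shard_for(tweet_id)[tweet_id] = text

    def get(self, tweet_id):
        return self._shard_for(tweet_id).get(tweet_id)


def handle_post(store, tweet_id, text):
    """Store the tweet first, then report success with an HTTP-style status."""
    store.put(tweet_id, text)
    return 200


store = ShardedTweetStore(shard_count=4)
print(handle_post(store, 1001, "hello world"))
print(store.get(1001))
```

  Keeping the routing in a table of ranges, rather than in a fixed modulo, means a range can later be reassigned to a different shard without rehashing every key.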

  

  Of course at this point your tweet hasn't gone out to the world. First, your tweets get started on their way to Bing and other search programs using the Firehose application programming interface (API). Finally, your tweets are ready for fanout, that is, heading to your friends, family, and fans.

    Distribution starts with the Firehose API feeding Bing and other partner search engines; after that, the tweet is ready to be sent to your friends, family, and fans.

  

  The actual process is handled by FlockDB. This is an open-source graph database that sits on Gizzard and pulls data from MySQL. FlockDB contains all of Twitter's users and their relationships to one another. Now, armed with your followers' addresses, your tweets are finally on their way.

     FlockDB supplies the list of followers for each tweet, and delivery then goes to exactly those people.
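  The fanout step amounts to looking up an author's followers in a graph and pushing the tweet ID onto each follower's timeline. The following is a minimal in-memory sketch of that idea; `FollowGraph` and `fan_out` are hypothetical names, the user names are made up, and nothing here reflects FlockDB's actual interface or storage.

```python
from collections import defaultdict, deque


class FollowGraph:
    """Stores who follows whom as an adjacency map keyed by followee."""

    def __init__(self):
        self._followers = defaultdict(set)

    def follow(self, follower, followee):
        self._followers[followee].add(follower)

    def followers_of(self, user):
        # Sorted only to make the delivery order deterministic.
        return sorted(self._followers.get(user, ()))


def fan_out(graph, timelines, author, tweet_id):
    """Push tweet_id to the front of every follower's timeline."""
    delivered = 0
    for follower in graph.followers_of(author):
        timelines[follower].appendleft(tweet_id)
        delivered += 1
    return delivered


graph = FollowGraph()
graph.follow("bob", "alice")
graph.follow("carol", "alice")

timelines = defaultdict(deque)
count = fan_out(graph, timelines, "alice", tweet_id=1)
print(count, dict(timelines))
```

  The cost of this write-time fanout grows with the number of followers, which is why the follower lookup needs to be fast.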

    

  The average time all this takes? About 350 milliseconds. Not bad for a system handling 5,000 TPS every day, 24 hours a day.

    The whole process takes about 350 ms.

    

  Twitter may be causing some of its would-be partners grief with tighter API rules, but the company itself does an exceptional job of delivering thousands of messages every moment of the day with open-source software.

    Partners may be unhappy about the tightening API policy, but the engineering team is delivering thousands of messages per second on open-source software.

posted @ 2012-09-02 19:45  刘浩de技术博客