Tencent SOSO Is Keeping a Pack of Useless, Idiot Developers

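I really didn't want to swear, but whoever the hell wrote the soso spider is unbelievably stupid!! Seven or eight months ago I noticed an anonymous spider in the 124.115.0.* IP range fetching pages like crazy. At its peak it was pulling more than 20 pages per second, and during those days several of my servers kept getting knocked over, so I ended up blocking the whole range. Back then I could never find out which company the range belonged to, so I posted reports about it on techweb and elsewhere, hoping whoever developed the thing would fix it. The spider's behavior never changed.

The dumbest part: every page includes the same .js, .css and other files, yet no matter which page it has just read, the spider fetches those same JS and CSS files all over again. It re-reads them after every single page, so however many pages you have, that is how many times it fetches your JS and CSS files. Later, reports about this IP range's rogue behavior gradually piled up on the search engines. Someone on Google reported that the 124.115.4.* range belongs to Tencent, so I guessed that 124.115.0.* was Tencent's too, but without direct evidence I could never confirm it.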
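The complaint here comes down to a crawler that does not remember which shared files it has already fetched. As a minimal sketch of the usual remedy (a "seen" set keyed by URL), and not a description of how Sosospider is actually built, something like the following would fetch each shared asset only once per crawl. The names `FetchPlanner` and `plan_crawl` and the example URLs are hypothetical, chosen for illustration only.

```python
from urllib.parse import urldefrag


class FetchPlanner:
    """Remembers which URLs were already scheduled during one crawl."""

    def __init__(self):
        self._seen = set()

    def should_fetch(self, url):
        # Drop any "#fragment": it never reaches the server, so two URLs
        # that differ only by fragment are the same resource.
        key, _fragment = urldefrag(url)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


def plan_crawl(pages):
    """Given {page_url: [asset_url, ...]}, return each URL at most once."""
    planner = FetchPlanner()
    to_fetch = []
    for page_url, asset_urls in pages.items():
        for url in [page_url, *asset_urls]:
            if planner.should_fetch(url):
                to_fetch.append(url)
    return to_fetch


if __name__ == "__main__":
    # Hypothetical site layout: two pages sharing one stylesheet and one script.
    site = {
        "http://www.domain.com/tl/583845.htm": [
            "http://www.domain.com/css/tr.css",
            "http://www.domain.com/p/p.js",
        ],
        "http://www.domain.com/tl/423817.htm": [
            "http://www.domain.com/css/tr.css",
            "http://www.domain.com/p/p.js",
        ],
    }
    for url in plan_crawl(site):
        print(url)
```

With this kind of bookkeeping, the two pages above cost four requests instead of six, and the saving grows with every additional page that shares the same files.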

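Just now, going through my logs, I found that 124.115.4.* and 124.115.0.* are now crawling under the official name sosospider. Since I had only blocked 124.115.0.* and not the 124.115.4.* range, that is how I discovered these IPs all belong to Tencent. Here is a log excerpt. See for yourselves whether this developer isn't dumb as a rock: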

124.115.4.208 - - [31/Dec/2007:15:31:40 +0800] "GET /tl/583845.htm HTTP/1.1" 200 2476 "http://www.domain.com/tl/583845.htm" "Sosospider+(+http://help.soso.com/webspider.htm)"
124.115.4.208 - - [31/Dec/2007:15:31:41 +0800] "GET /css/tr.css HTTP/1.1" 200 742 "http://www.domain.com/css/tr.css" "Sosospider+(+http://help.soso.com/webspider.htm)"
124.115.4.208 - - [31/Dec/2007:15:31:42 +0800] "GET /tl/423817.htm HTTP/1.1" 200 2375 "http://www.domain.com/tl/423817.htm" "Sosospider+(+http://help.soso.com/webspider.htm)"
124.115.4.208 - - [31/Dec/2007:15:31:43 +0800] "GET /p/p.js HTTP/1.1" 200 460 "http://www.domain.com/p/p.js" "Sosospider+(+http://help.soso.com/webspider.htm)"
124.115.4.208 - - [31/Dec/2007:15:31:43 +0800] "GET /css/tr.css HTTP/1.1" 200 742 "http://www.domain.com/css/tr.css" "Sosospider+(+http://help.soso.com/webspider.htm)"
124.115.4.208 - - [31/Dec/2007:15:31:44 +0800] "GET /js/s.js HTTP/1.1" 200 358 "http://www.domain.com/js/s.js" "Sosospider+(+http://help.soso.com/webspider.htm)"
124.115.4.208 - - [31/Dec/2007:15:31:44 +0800] "GET /p/p.js HTTP/1.1" 200 460 "http://www.domain.com/p/p.js" "Sosospider+(+http://help.soso.com/webspider.htm)"
124.115.4.208 - - [31/Dec/2007:15:31:45 +0800] "GET /p/free.js HTTP/1.1" 200 334 "http://www.domain.com/p/free.js" "Sosospider+(+http://help.soso.com/webspider.htm)"
124.115.4.208 - - [31/Dec/2007:15:31:45 +0800] "GET /js/s.js HTTP/1.1" 200 358 "http://www.domain.com/js/s.js" "Sosospider+(+http://help.soso.com/webspider.htm)"
124.115.4.208 - - [31/Dec/2007:15:31:46 +0800] "GET /js/search.js HTTP/1.1" 200 791 "http://www.domain.com/js/search.js" "Sosospider+(+http://help.soso.com/webspider.htm)"
124.115.4.208 - - [31/Dec/2007:15:31:47 +0800] "GET /p/free.js HTTP/1.1" 200 334 "http://www.domain.com/p/free.js" "Sosospider+(+http://help.soso.com/webspider.htm)"
124.115.4.208 - - [31/Dec/2007:15:31:48 +0800] "GET /js/search.js HTTP/1.1" 200 791 "http://www.domain.com/js/search.js" "Sosospider+(+http://help.soso.com/webspider.htm)"
124.115.0.164 - - [31/Dec/2007:15:31:48 +0800] "GET /css/tr.css HTTP/1.1" 200 742 "http://www.domain.com/css/tr.css" "Sosospider+(+http://help.soso.com/webspider.htm)"
124.115.0.164 - - [31/Dec/2007:15:31:50 +0800] "GET /p/p.js HTTP/1.1" 200 460 "http://www.domain.com/p/p.js" "Sosospider+(+http://help.soso.com/webspider.htm)"
124.115.0.164 - - [31/Dec/2007:15:31:52 +0800] "GET /js/s.js HTTP/1.1" 200 358 "http://www.domain.com/js/s.js" "Sosospider+(+http://help.soso.com/webspider.htm)"
124.115.0.164 - - [31/Dec/2007:15:31:52 +0800] "GET /css/tr.css HTTP/1.1" 200 742 "http://www.domain.com/css/tr.css" "Sosospider+(+http://help.soso.com/webspider.htm)"
124.115.0.164 - - [31/Dec/2007:15:31:53 +0800] "GET /p/free.js HTTP/1.1" 200 334 "http://www.domain.com/p/free.js" "Sosospider+(+http://help.soso.com/webspider.htm)"
124.115.0.164 - - [31/Dec/2007:15:31:54 +0800] "GET /p/p.js HTTP/1.1" 200 460 "http://www.domain.com/p/p.js" "Sosospider+(+http://help.soso.com/webspider.htm)"
124.115.0.164 - - [31/Dec/2007:15:31:55 +0800] "GET /js/search.js HTTP/1.1" 200 791 "http://www.domain.com/js/search.js" "Sosospider+(+http://help.soso.com/webspider.htm)"
124.115.0.164 - - [31/Dec/2007:15:31:55 +0800] "GET /js/s.js HTTP/1.1" 200 358 "http://www.domain.com/js/s.js" "Sosospider+(+http://help.soso.com/webspider.htm)"
124.115.0.164 - - [31/Dec/2007:15:31:57 +0800] "GET /p/free.js?ref=http%3A%2F%2Fwww.domain.com%2Ftl%2F184016.htm HTTP/1.1" 200 334 "http://www.domain.com/p/free.js" "Sosospider+(+http://help.soso.com/webspider.htm)"
124.115.0.164 - - [31/Dec/2007:15:31:59 +0800] "GET /js/search.js HTTP/1.1" 200 791 "http://www.domain.com/js/search.js" "Sosospider+(+http://help.soso.com/webspider.htm)"

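In just these few log lines, the same js and css files are fetched over and over. This bug has been there for the seven or eight months since sosospider first showed up. I don't know how much money young Mr. Ma is spending to keep his soso developers, but have they really never tested it, never once looked at their spider's crawl logs???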
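The repetition in the excerpt can be counted rather than eyeballed. Below is a rough sketch that tallies how many times each client IP requested each path, assuming the log lines follow the same combined format as the excerpt. The function names, the regular expression, and the choice to ignore query strings when grouping are my own assumptions, not anything taken from the server or from Sosospider.

```python
import re
from collections import Counter

# Assumed layout, matching the excerpt:
# ip - - [timestamp] "METHOD path PROTO" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def count_fetches(lines, agent_substring="Sosospider"):
    """Count requests per (ip, path) for one user agent, ignoring query strings."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and agent_substring in match.group("agent"):
            path = match.group("path").split("?", 1)[0]
            counts[(match.group("ip"), path)] += 1
    return counts


def repeated_fetches(lines, agent_substring="Sosospider"):
    """Keep only the (ip, path) pairs that were requested more than once."""
    counts = count_fetches(lines, agent_substring)
    return {key: n for key, n in counts.items() if n > 1}


if __name__ == "__main__":
    import sys

    # Usage: python count_repeats.py < access.log
    repeats = repeated_fetches(sys.stdin)
    for (ip, path), n in sorted(repeats.items(), key=lambda kv: -kv[1]):
        print(f"{n:6d}  {ip}  {path}")
```

Run against a full day's log, the output lists the most frequently re-fetched files first, which makes it easy to show how many of the spider's requests were wasted on files it already had.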

And to Tencent's developers, let me just say: well done, go right on fooling your boss Ma Huateng.

posted @ 2008-03-11 08:50  荖K