robots.txt files of common blog sites
Sites change over time, so each robots.txt below is a snapshot taken at one particular moment.
CSDN
http://www.csdn.net/robots.txt
Sitemap: http://www.csdn.net/article/sitemap.txt
Disallow: /article_preview.html*
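The Disallow line in this snapshot is not preceded by any User-agent line. As a rough illustration of how one parser treats such a file, the sketch below feeds the two lines to Python's standard urllib.robotparser (site_maps() needs Python 3.8 or later). The helper name inspect_csdn_snapshot is made up for this example, and other crawlers may interpret the same file differently.

```python
from urllib.robotparser import RobotFileParser

# The two lines from the CSDN snapshot, copied verbatim.
CSDN_SNAPSHOT = """\
Sitemap: http://www.csdn.net/article/sitemap.txt
Disallow: /article_preview.html*
"""

def inspect_csdn_snapshot():
    """Return (sitemaps, allowed) as seen by urllib.robotparser."""
    parser = RobotFileParser()
    parser.parse(CSDN_SNAPSHOT.splitlines())
    # Sitemap lines are collected regardless of any User-agent group.
    sitemaps = parser.site_maps()
    # Rules that appear before any User-agent line are skipped by this
    # parser, so the preview page is still reported as fetchable.
    allowed = parser.can_fetch("*", "http://www.csdn.net/article_preview.html")
    return sitemaps, allowed
```

With this parser the sitemap is picked up, but the ungrouped Disallow rule has no effect.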
cnblogs (博客园)
http://www.cnblogs.com/robots.txt
User-Agent: *
Allow: /
Blogchina (中国博客网)
http://www.blogchina.com/robots.txt
User-agent: *
Disallow: /
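The cnblogs and Blogchina snapshots are the two extremes: one allows everything, the other disallows everything. A minimal sketch of checking this with Python's urllib.robotparser follows; the helper can_crawl and the post URLs are hypothetical, chosen only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Rule sets copied from the cnblogs and Blogchina snapshots.
CNBLOGS = "User-Agent: *\nAllow: /\n"
BLOGCHINA = "User-agent: *\nDisallow: /\n"

def can_crawl(robots_text, url, agent="*"):
    """Parse robots.txt text and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(agent, url)

if __name__ == "__main__":
    # Hypothetical URLs; any path gives the same answer for these rule sets.
    print(can_crawl(CNBLOGS, "http://www.cnblogs.com/some-post.html"))
    print(can_crawl(BLOGCHINA, "http://www.blogchina.com/some-post.html"))
```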
NetEase Blog (网易博客)
http://blog.163.com/robots.txt
User-agent: *
Disallow: /apps/
Disallow: /settings
Disallow: /dwr/
Disallow: /*/dwr/
Disallow: /unblock.do
Disallow: /feedback.do
Disallow: /*\${*}*
Disallow: *jsessionid=*
Disallow: /login.do
Disallow: /qiangbao
Disallow: /error.do
Sitemap: http://blog.163.com/sitemap.xml
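Several of the NetEase rules use * as a wildcard, which Python's urllib.robotparser does not expand. The sketch below is one possible way to evaluate such rules, assuming the common convention that * matches any run of characters and that a trailing $ anchors the end of the URL. The function names, the subset of rules, and the test paths are assumptions made for this example.

```python
import re

# A subset of the Disallow patterns from the NetEase snapshot.
NETEASE_RULES = [
    "/apps/",
    "/settings",
    "/dwr/",
    "/*/dwr/",
    "*jsessionid=*",
    "/login.do",
]

def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal pieces and join them with '.*' where '*' appeared.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_disallowed(path, patterns):
    """True if any pattern matches `path` starting from its first character."""
    return any(pattern_to_regex(p).match(path) is not None for p in patterns)
```

For example, is_disallowed("/someuser/dwr/call", NETEASE_RULES) matches the /*/dwr/ rule, while a path such as /blog/static/123 matches none of the listed rules.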
Sina Blog (新浪博客)
# User-Agent code of the search engine being restricted; * means all ##############
User-agent: *
# Directories that must not be crawled; an empty Disallow: opens every directory ######
Allow: /admin/blogmove/
Disallow: /admin/
Disallow: /include/
Disallow: /html/
Disallow: /queue/
Disallow: /config/
# Directories open to crawling ####################################
# /
# /advice/
# /help/
# /lm/
# /main/
# /myblog/
# Search engine to User-Agent code lookup table ########################
# Search engine User-Agent code
# AltaVista Scooter
# Infoseek Infoseek
# Hotbot Slurp
# AOL Search Slurp
# Excite ArchitextSpider
# Google Googlebot
# Goto Slurp
# Lycos Lycos
# MSN MSNBOT
# Netscape Googlebot
# NorthernLight Gulliver
# WebCrawler ArchitextSpider
# Iwon Slurp
# Fast Fast
# DirectHit Grabber
# Yahoo Web Pages Googlebot
# Looksmart Web Pages Slurp
# Baiduspider Baidu
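The Sina rules place Allow: /admin/blogmove/ ahead of Disallow: /admin/, which carves one subdirectory out of an otherwise blocked tree. A small sketch with Python's urllib.robotparser, which applies the first matching rule, shows the effect. The file names under each directory are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Active (non-comment) rules from the Sina Blog snapshot.
SINA_RULES = """\
User-agent: *
Allow: /admin/blogmove/
Disallow: /admin/
Disallow: /include/
Disallow: /html/
Disallow: /queue/
Disallow: /config/
"""

def sina_can_fetch(path, agent="*"):
    """Check `path` against the Sina snapshot using first-match semantics."""
    parser = RobotFileParser()
    parser.parse(SINA_RULES.splitlines())
    return parser.can_fetch(agent, path)

if __name__ == "__main__":
    for path in ("/admin/blogmove/index.html", "/admin/login.php", "/myblog/"):
        print(path, sina_can_fetch(path))
```

A parser that prefers the longest matching rule would reach the same verdict for these paths, since /admin/blogmove/ is also the more specific rule.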
1.csdn