Nginx Anti-Crawler Optimization
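We can use the client's User-Agent request header to stop specific crawlers from crawling our site. Any request whose User-Agent matches one of the listed crawler names is rejected with a 403 response: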
The virtual host configuration is shown below. The addition for this step is the if block that tests $http_user_agent:
[root@Nginx www_date]# cat brian.conf
server {
    listen       80;
    server_name  www.brian.com;

    # Reject any client whose User-Agent matches one of these crawlers (~* is a case-insensitive regex match)
    if ($http_user_agent ~* "qihoobot|Baiduspider|Googlebot|Googlebot-Mobile|Googlebot-Image|Mediapartners-Google|Adsbot-Google|Yahoo! Slurp China|YoudaoBot|Sosospider|Sogou spider|Sogou web spider|MSNBot") {
        return 403;
    }

    location / {
        root   html/brian;
        index  index.html index.htm;
        #limit_conn addr 1;
        limit_conn perserver 2;
        auth_basic "brian training";
        auth_basic_user_file /opt/nginx/conf/htpasswd;
    }

    # Do not write access log entries for static assets
    location ~ .*\.(js|jpg|JPG|jpeg|JPEG|css|bmp|gif|GIF)$ {
        access_log off;
    }

    access_log logs/brian.log main gzip buffer=128k flush=5s;
    error_page 500 502 503 504 /50x.html;
    location = /50x.html {
        root html;
    }
}
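The same User-Agent check can also be written with a map block, which keeps the crawler list in one place and leaves only a simple flag test inside the server block. This is a minimal sketch of an alternative, not the configuration used above: the variable name $blocked_crawler is a hypothetical name chosen for illustration, and the map block is assumed to sit in the http context of the main nginx.conf, since map is not allowed inside server.

```nginx
# http {} context: set $blocked_crawler (hypothetical name) to 1 for listed crawlers
map $http_user_agent $blocked_crawler {
    default 0;
    # "~*" makes this a case-insensitive regular expression, as in the if-based version
    "~*(qihoobot|Baiduspider|Googlebot|Mediapartners-Google|Adsbot-Google|Yahoo! Slurp China|YoudaoBot|Sosospider|Sogou spider|Sogou web spider|MSNBot)" 1;
}

server {
    listen       80;
    server_name  www.brian.com;

    # An if condition is false when the variable is empty or "0"
    if ($blocked_crawler) {
        return 403;
    }

    # Remaining location, access_log and error_page directives as in brian.conf
}
```

Either variant can be checked after reloading Nginx by sending a request with a crawler User-Agent, for example curl -I -A "Baiduspider" http://www.brian.com/, which should come back as 403. A request with an ordinary User-Agent is expected to reach the location block instead, where the auth_basic setting would answer 401 unless credentials are supplied.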
Zhu Jingzhi (brian): success is not something that arrives in the future; it accumulates steadily from the moment you decide to act.