如何利用python模仿浏览器进行网页爬取?
简单讲可以利用mechanize库来实现这一功能:
import mechanize URL = 'http://yoururl.com' br = mechanize.Browser() br.set_handle_robots() #这一句是用来绕过那些防止机器爬虫的网站的设置 response = br.open(URL) sourcecode = response.read()
import mechanize URL = 'http://yoururl.com' br = mechanize.Browser() br.set_handle_robots() #这一句是用来绕过那些防止机器爬虫的网站的设置 response = br.open(URL) sourcecode = response.read()