Web Crawlers: Using Proxies
Using a Proxy IP
1. Using a Proxy with requests
To use a proxy with requests, build a dictionary that maps each URL scheme to a proxy URL and pass it through the proxies parameter.
    import requests

    # Proxy host; append ':port' when the proxy listens on a specific port.
    proxy = '60.186.9.233'
    # One entry per scheme of the target URL.
    proxies = {
        'http': 'http://' + proxy,
        'https': 'https://' + proxy,
    }
    try:
        res = requests.get('http://httpbin.org/get', proxies=proxies)
        print(res.text)
    except requests.exceptions.ConnectionError as e:
        print('error', e.args)
Output:
    {
      "args": {},
      "headers": {
        "Accept": "*/*",
        "Accept-Encoding": "gzip, deflate",
        "Host": "httpbin.org",
        "User-Agent": "python-requests/2.18.4"
      },
      "origin": "60.186.9.233",
      "url": "https://httpbin.org/get"
    }
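The origin field in the result is the proxy's IP, which shows that the proxy was set successfully. If the proxy requires authentication, put the user name and password in front of the proxy address:
    proxy = 'username:password@60.186.9.233'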
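Credentials that contain characters such as '@' or ':' would break a proxy URL built by plain string concatenation. The following is a minimal sketch of one way to handle that, not part of the original code: build_proxies is a hypothetical helper name, and it keeps the same 'http://' and 'https://' prefixes as the example above while percent-encoding the user name and password.

```python
from urllib.parse import quote


def build_proxies(host, username=None, password=None):
    """Return a requests-style proxies dict for the proxy at `host`.

    `host` is the proxy address, optionally followed by ':port'.
    """
    auth = ''
    if username is not None:
        # Percent-encode so that '@' or ':' inside the credentials
        # cannot be mistaken for URL delimiters.
        auth = quote(username, safe='')
        if password is not None:
            auth += ':' + quote(password, safe='')
        auth += '@'
    return {
        'http': 'http://' + auth + host,
        'https': 'https://' + auth + host,
    }
```

The returned dictionary can be passed as proxies=build_proxies('60.186.9.233', 'username', 'password') in the requests.get call shown earlier.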
2. Using a Proxy with Selenium
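Selenium can use a proxy as well. There are two cases: a browser with a visible window, with Chrome as the example, and a headless browser, with PhantomJS as the example.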
Configuring Chrome
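Set the proxy on a ChromeOptions object and pass that object in when creating the Chrome object. Running the code opens a Chrome window; once it has loaded the link, the page shows the following result.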
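    # Chrome proxy settings
    from selenium import webdriver

    proxy = '60.186.9.233'
    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument('--proxy-server=http://' + proxy)
    # Recent Selenium releases take `options=`; the older
    # `chrome_options=` keyword is deprecated.
    browser = webdriver.Chrome(options=chrome_options)
    # get() returns None, so its result is not assigned.
    browser.get('http://httpbin.org/get')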
    {
      "args": {},
      "headers": {
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        "Accept-Encoding": "gzip, deflate",
        "Accept-Language": "zh-CN,zh;q=0.9",
        "Host": "httpbin.org",
        "Upgrade-Insecure-Requests": "1",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36"
      },
      "origin": "60.186.9.233",
      "url": "https://httpbin.org/get"
    }
Configuring PhantomJS
Put the command-line options in a list and pass it as the service_args parameter when initializing PhantomJS.
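    # PhantomJS proxy settings
    # Note: webdriver.PhantomJS is only available in older Selenium
    # releases; support was removed in Selenium 4.
    from selenium import webdriver

    service_args = [
        '--proxy=60.186.9.233',
        '--proxy-type=http',
    ]
    browser = webdriver.PhantomJS(service_args=service_args)
    browser.get('http://httpbin.org/get')
    print(browser.page_source)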
Output:
    {
      "args": {},
      "headers": {
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        "Accept-Encoding": "gzip, deflate",
        "Accept-Language": "zh-CN,zh;q=0.9",
        "Host": "httpbin.org",
        "Upgrade-Insecure-Requests": "1",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36"
      },
      "origin": "60.186.9.233",
      "url": "https://httpbin.org/get"
    }
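With Selenium the response arrives as page source rather than as a plain string, so checking the origin field takes one extra step. The sketch below is one possible way to do it and is not from the original post: origin_from_page_source is a hypothetical helper, and it assumes the browser may wrap httpbin's JSON in some HTML markup (for example a pre element), which it skips by taking the text between the first '{' and the last '}'.

```python
import json
import re
from html import unescape


def origin_from_page_source(page_source):
    """Return the 'origin' field of httpbin JSON embedded in a page.

    Returns None when no parsable JSON object is found.
    """
    # Greedy match from the first '{' to the last '}' so that any
    # surrounding markup is ignored.
    match = re.search(r'\{.*\}', page_source, re.DOTALL)
    if match is None:
        return None
    try:
        # Undo HTML entity escaping before parsing the JSON text.
        data = json.loads(unescape(match.group(0)))
    except ValueError:
        return None
    return data.get('origin')
```

Comparing origin_from_page_source(browser.page_source) with the proxy's IP then gives the same check that was done by eye on the output above.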
If the proxy requires authentication, add the --proxy-auth option to service_args.
    service_args = [
        '--proxy=60.186.9.233',
        '--proxy-type=http',
        '--proxy-auth=username:password',
    ]
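The two service_args lists above differ only in the optional --proxy-auth entry, so they can be produced by one small function. This is an illustrative sketch; phantomjs_proxy_args is a hypothetical name, and it uses only the three options that appear in the lists above.

```python
def phantomjs_proxy_args(proxy, proxy_type='http', username=None, password=None):
    """Build the service_args list for a PhantomJS proxy."""
    args = ['--proxy=' + proxy, '--proxy-type=' + proxy_type]
    # Add credentials only when both parts are supplied.
    if username is not None and password is not None:
        args.append('--proxy-auth={}:{}'.format(username, password))
    return args
```

The result is passed as before: webdriver.PhantomJS(service_args=phantomjs_proxy_args('60.186.9.233')).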