Python Crawlers: Using Proxies

Setting up a proxy

To use a proxy with the urllib library, the code is as follows:

from urllib.request import ProxyHandler, build_opener
from urllib.error import URLError

proxy = "113.116.50.182:808"
# Route both HTTP and HTTPS requests through the same proxy.
proxy_handler = ProxyHandler({
    "http": "http://" + proxy,
    "https": "https://" + proxy,
})
opener = build_opener(proxy_handler)
try:
    response = opener.open("http://httpbin.org/ip")
    print(response.read().decode())
except URLError as e:
    print("Proxy IP is unusable:", e.reason)

If the output looks like the following, the proxy has been set up successfully:

{
  "origin": "113.116.50.182, 113.116.50.182"
}
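The check above can also be done in code rather than by eye. The following is a minimal sketch, assuming the response body has the same JSON shape as the output shown above; the helper name proxy_in_origin is hypothetical and not part of any library.

```python
import json


def proxy_in_origin(body: str, proxy_ip: str) -> bool:
    """Return True if proxy_ip appears in the "origin" field of an httpbin /ip body."""
    origin = json.loads(body).get("origin", "")
    # The field may list several comma-separated addresses.
    addresses = [part.strip() for part in origin.split(",")]
    return proxy_ip in addresses


# Sample body copied from the output shown above.
sample = '{\n  "origin": "113.116.50.182, 113.116.50.182"\n}'
print(proxy_in_origin(sample, "113.116.50.182"))
```

If the function returns False, the request most likely went out directly instead of through the proxy.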

 

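For a proxy that requires authentication, only the proxy variable needs to change: put the username and password in front of the proxy address, as in "username:password@113.116.50.182:808".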

from urllib.request import ProxyHandler, build_opener
from urllib.error import URLError

# Credentials go in front of the host, separated from it by "@".
proxy = "username:password@113.116.50.182:808"
proxy_handler = ProxyHandler({
    "http": "http://" + proxy,
    "https": "https://" + proxy,
})
opener = build_opener(proxy_handler)
try:
    response = opener.open("http://httpbin.org/ip")
    print(response.read().decode())
except URLError as e:
    print("Proxy IP is unusable:", e.reason)
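One practical wrinkle with the "username:password@host:port" form is that a password containing characters such as "@" or ":" makes the URL ambiguous. A minimal sketch of one way to handle that, assuming percent-encoding the credentials is acceptable to the proxy client (urllib decodes them before sending); build_proxy_url is a hypothetical helper name, and the password below is made up for illustration.

```python
from urllib.parse import quote


def build_proxy_url(host, port, username=None, password=None, scheme="http"):
    """Build a proxy URL, percent-encoding the credentials when they are given."""
    if username is None:
        return f"{scheme}://{host}:{port}"
    user = quote(username, safe="")
    pwd = quote(password or "", safe="")
    return f"{scheme}://{user}:{pwd}@{host}:{port}"


# Hypothetical credentials; "@" and ":" in the password are escaped.
proxy_url = build_proxy_url("113.116.50.182", 808, "username", "p@ss:word")
proxies = {"http": proxy_url, "https": proxy_url}
print(proxy_url)
```

The resulting dictionary has the same shape as the one passed to ProxyHandler above.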

 

If you run into a SOCKS proxy server:

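A proxy server that speaks the SOCKS protocol is called a SOCKS server; it is a general-purpose proxy. SOCKS is a low-level, circuit-level gateway developed by David Koblas in the early 1990s, and it has since been published as an open standard in the Internet RFC series (SOCKS5 is RFC 1928). SOCKS does not tie applications to a particular operating system platform. Unlike application-layer proxies such as HTTP proxies, a SOCKS proxy simply relays the data and does not care which application protocol it carries (FTP, HTTP, or NNTP requests, for example). As a result, a SOCKS proxy usually adds less processing overhead than an application-layer proxy.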
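To make the "does not care about the application protocol" point concrete, here is a small sketch of the first two client messages in SOCKS5 as laid out in RFC 1928. It only builds the bytes and sends nothing; the function names are hypothetical, and the sketch assumes the "no authentication" method and a destination given as a domain name.

```python
import struct


def socks5_greeting() -> bytes:
    """Client greeting: version 5, one method offered, method 0x00 (no authentication)."""
    return bytes([0x05, 0x01, 0x00])


def socks5_connect_request(host: str, port: int) -> bytes:
    """Build a SOCKS5 CONNECT request that addresses the target by domain name."""
    name = host.encode("idna")
    if len(name) > 255:
        raise ValueError("domain name too long for a SOCKS5 request")
    # VER=5, CMD=1 (CONNECT), RSV=0, ATYP=3 (domain name), then the name length.
    header = bytes([0x05, 0x01, 0x00, 0x03, len(name)])
    # The port is a 2-byte integer in network byte order.
    return header + name + struct.pack("!H", port)


request = socks5_connect_request("httpbin.org", 80)
print(request.hex())
```

The request names only a host and a port. Nothing in it says whether HTTP, FTP, or anything else will follow, which is why the same proxy can carry any TCP-based protocol.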

The code is set up as follows (the socks module is provided by the PySocks package):

import socks
import socket
from urllib import request
from urllib.error import URLError

# Send every new socket in this process through the SOCKS5 proxy.
socks.set_default_proxy(socks.SOCKS5, "113.116.50.182", 807)
socket.socket = socks.socksocket

try:
    response = request.urlopen("http://httpbin.org/ip")
    print(response.read().decode())
except URLError as e:
    print("Proxy IP is unusable:", e.reason)


 

Setting a proxy in the requests library

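import requests

proxy = "113.116.50.182:808"
# The URL scheme says how to reach the proxy itself, so a plain HTTP proxy
# is addressed with http:// even for the "https" entry.
proxies = {
    "http": "http://" + proxy,
    "https": "http://" + proxy,
}
try:
    response = requests.get("http://httpbin.org/ip", proxies=proxies)
    print(response.text)
except requests.exceptions.ConnectionError as e:
    print("Error", e.args)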

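This is much simpler than setting a proxy in urllib. For a proxy that requires authentication, likewise use proxy = "username:password@113.116.50.182:808"; that case is not demonstrated again here.

To use a SOCKS5 proxy with the requests library, set it up as follows: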

import requests
import socks
import socket

socks.set_default_proxy(socks.SOCKS5,"113.116.50.182",807)
socket.socket = socks.socksocket

try:
        response = requests.get("http://httpbin.org/ip")
        print(response.text)
except requests.exceptions.ConnectionError as e:
        print("Error",e.args)
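Patching socket.socket as above affects every connection in the process. As an alternative sketch, requests also accepts socks5:// and socks5h:// URLs in the proxies dictionary when the SOCKS extra is installed (pip install requests[socks]). The helper names socks_proxies and fetch_ip below are hypothetical, and the sketch assumes the proxy needs no authentication.

```python
def socks_proxies(host, port, remote_dns=True):
    """Build a requests-style proxies dict for a SOCKS5 proxy.

    With the socks5h scheme, host names are resolved by the proxy rather than locally.
    """
    scheme = "socks5h" if remote_dns else "socks5"
    url = f"{scheme}://{host}:{port}"
    return {"http": url, "https": url}


def fetch_ip(proxies, timeout=10):
    """Fetch http://httpbin.org/ip through the given proxies and return the body."""
    # Imported here so that building the dict above does not require requests.
    import requests

    response = requests.get("http://httpbin.org/ip", proxies=proxies, timeout=timeout)
    return response.text


proxies = socks_proxies("113.116.50.182", 807)
print(proxies)
```

Calling fetch_ip(proxies) performs the actual request; only calls that pass the proxies argument go through the SOCKS server, and the rest of the program keeps its normal sockets.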


 

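Setting a proxy in Selenium

Since the headless browser PhantomJS is no longer maintained, only Chrome, a browser with a visible window, is demonstrated here.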

from selenium import webdriver

proxy = "113.116.50.182:808"
chromeOptions = webdriver.ChromeOptions()
chromeOptions.add_argument('--proxy-server=http://'+proxy)
driver = webdriver.Chrome(executable_path=r"C:\Users\Administrator\Downloads\chromedriver.exe",options=chromeOptions)

driver.get("http://httpbin.org/ip")
print(driver.page_source)

The page source that comes back looks like this:

<html xmlns="http://www.w3.org/1999/xhtml"><head></head><body><pre style="word-wrap: break-word; white-space: pre-wrap;">{
  "origin": "113.116.50.182, 113.116.50.182"
}
</pre></body></html>

Note: pass the options object through the options keyword argument; the older chrome_options keyword argument is deprecated.
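The example above targets the Selenium release that was current when this post was written. In Selenium 4 the driver path is passed through a Service object instead of executable_path. The following is a rough sketch under that assumption; proxy_argument and make_driver are hypothetical helper names, and starting the browser still requires Chrome and a matching chromedriver.

```python
def proxy_argument(proxy, scheme="http"):
    """Build the Chrome command-line switch that selects a proxy server."""
    return f"--proxy-server={scheme}://{proxy}"


def make_driver(proxy, driver_path=None):
    """Start Chrome behind the given proxy (assumes Selenium 4 is installed)."""
    # Imported here so proxy_argument can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    options = webdriver.ChromeOptions()
    options.add_argument(proxy_argument(proxy))
    # Without an explicit path, Selenium tries to locate the driver itself.
    service = Service(executable_path=driver_path) if driver_path else Service()
    return webdriver.Chrome(service=service, options=options)


print(proxy_argument("113.116.50.182:808"))
```

A call such as make_driver("113.116.50.182:808") followed by driver.get("http://httpbin.org/ip") would then mirror the example above.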

 

Using an authenticated proxy in Selenium is a little more involved; in future, the code below can simply be adapted:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import zipfile

ip = '113.116.50.182'
port = 808
username = 'xxxx'
password = 'xxxx'

manifest_json = """
{
    "version": "1.0.0",
    "manifest_version": 2,
    "name": "Chrome Proxy",
    "permissions": [
        "proxy",
        "tabs",
        "unlimitedStorage",
        "storage",
        "<all_urls>",
        "webRequest",
        "webRequestBlocking"
    ],
    "background": {
        "scripts": ["background.js"]
    }
}
"""

background_js = """
var config = {
        mode: "fixed_servers",
        rules: {
          singleProxy: {
            scheme: "http",
            host: "%(ip)s",
            port: %(port)s
          }
        }
      }
chrome.proxy.settings.set({value: config, scope: "regular"}, function() {});
function callbackFn(details) {
    return {
        authCredentials: {
            username: "%(username)s",
            password: "%(password)s"
        }
    }
}
chrome.webRequest.onAuthRequired.addListener(
            callbackFn,
            {urls: ["<all_urls>"]},
            ['blocking']
)
""" % {'ip': ip, 'port': port, 'username': username, 'password': password}

plugin_file = 'proxy_auth_plugin.zip'
with zipfile.ZipFile(plugin_file, 'w') as zp:
    zp.writestr("manifest.json", manifest_json)
    zp.writestr("background.js", background_js)
chrome_options = Options()
chrome_options.add_argument("--start-maximized")
chrome_options.add_extension(plugin_file)
browser = webdriver.Chrome(executable_path=r"C:\Users\Administrator\Downloads\chromedriver.exe",options=chrome_options)
browser.get('http://httpbin.org/ip')
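The script above fills in the JavaScript with plain string formatting, which breaks if the username or password contains a double quote or a backslash. A minimal sketch of one way around that, assuming the same extension layout as above: json.dumps produces properly escaped string literals. The function name write_proxy_plugin is hypothetical.

```python
import json
import zipfile

# Same manifest as in the script above, written as a Python dict.
MANIFEST = {
    "version": "1.0.0",
    "manifest_version": 2,
    "name": "Chrome Proxy",
    "permissions": ["proxy", "tabs", "unlimitedStorage", "storage",
                    "<all_urls>", "webRequest", "webRequestBlocking"],
    "background": {"scripts": ["background.js"]},
}

BACKGROUND_TEMPLATE = """
var config = {
    mode: "fixed_servers",
    rules: {singleProxy: {scheme: "http", host: %(host)s, port: %(port)d}}
};
chrome.proxy.settings.set({value: config, scope: "regular"}, function() {});
chrome.webRequest.onAuthRequired.addListener(
    function(details) {
        return {authCredentials: {username: %(username)s, password: %(password)s}};
    },
    {urls: ["<all_urls>"]},
    ["blocking"]
);
"""


def write_proxy_plugin(path, host, port, username, password):
    """Write the proxy-auth extension zip to path and return the path."""
    background_js = BACKGROUND_TEMPLATE % {
        # json.dumps adds the surrounding quotes and escapes special characters.
        "host": json.dumps(host),
        "port": int(port),
        "username": json.dumps(username),
        "password": json.dumps(password),
    }
    with zipfile.ZipFile(path, "w") as zp:
        zp.writestr("manifest.json", json.dumps(MANIFEST, indent=4))
        zp.writestr("background.js", background_js)
    return path
```

The returned path can be handed to chrome_options.add_extension exactly as in the script above.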


 

posted @ 2019-07-11 11:38  nikecode