Python Crawlers: Basic Operations (Sending GET and POST Requests, Imitating a Browser, Adding Cookie Information)

Send a GET request to a specified URL (the examples use Python 2's urllib2 module):

# -*- coding: utf-8 -*-
import urllib2
url = "http://localhost:80/webtest/test?name=xuejianbest"
req = urllib2.Request(url)
response = urllib2.urlopen(req)
page_html = response.read()
print page_html
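The query string above is written by hand. As a minimal sketch, assuming Python 3 (where `urllib2.Request` is available as `urllib.request.Request` and `urllib.urlencode` as `urllib.parse.urlencode`), the same GET request can be assembled from a dictionary. The names `base_url` and `query` are illustrative, and the request is only built here, not sent:

```python
# Python 3 sketch: build (but do not send) the GET request from the example above.
from urllib.parse import urlencode
from urllib.request import Request

base_url = "http://localhost:80/webtest/test"   # same test endpoint as above
query = urlencode({"name": "xuejianbest"})      # percent-encodes keys and values
url = base_url + "?" + query

req = Request(url)           # no data argument, so the method is GET
print(req.get_method())
print(req.full_url)

# To actually send it (requires the test server to be running):
# from urllib.request import urlopen
# page_html = urlopen(req).read()
```

Building the query with `urlencode` avoids mistakes when a parameter value contains spaces or non-ASCII characters.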

If a data argument is passed to urlopen, the request is sent as a POST instead:

# -*- coding: utf-8 -*-
import urllib2
import urllib
url = "http://localhost:80/webtest/test?name=xuejianbest"
req = urllib2.Request(url)
values = {}
values["age"] = "23"
values["sex"] = "男"
data = urllib.urlencode(values)
print data   # age=23&sex=%E7%94%B7
response = urllib2.urlopen(req, data)
page_html = response.read()
print page_html
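For comparison, a hedged Python 3 sketch of the same POST request. Two things differ from the Python 2 listing: the body must be `bytes`, and here the data is attached to the `Request` object rather than passed to `urlopen` (both forms are accepted). The variable name `body` is illustrative, and nothing is sent over the network:

```python
# Python 3 sketch: attaching data to a Request switches its method to POST.
from urllib.parse import urlencode
from urllib.request import Request

url = "http://localhost:80/webtest/test?name=xuejianbest"
values = {"age": "23", "sex": "男"}

body = urlencode(values)        # non-ASCII text is percent-encoded as UTF-8 by default
data = body.encode("ascii")     # Python 3 expects the request body as bytes

req = Request(url, data=data)
print(body)
print(req.get_method())         # the presence of data makes this a POST
```

Note that the query string in the URL is still sent along with the POST body, so the server receives `name` as well as `age` and `sex`.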

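If the backend reads the sex parameter as garbled text (typically because the server decoded the UTF-8 bytes as ISO-8859-1), it can be converted back as follows (Java):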

System.out.println(new String(req.getParameter("sex").getBytes("iso8859-1"), "UTF-8"));
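The Java line re-encodes the garbled string as ISO-8859-1 to recover the original bytes, then decodes those bytes as UTF-8. A small Python 3 sketch of the same round trip, assuming the garbling really comes from an ISO-8859-1 decode on the server side (the variable names are illustrative):

```python
# Python 3 sketch of why the value is garbled and why the Java fix works.
original = "男"
utf8_bytes = original.encode("utf-8")        # the three bytes the client actually sends

# A server that assumes ISO-8859-1 turns each byte into one character.
garbled = utf8_bytes.decode("iso-8859-1")

# Same steps as the Java line: back to the raw bytes, then decode as UTF-8.
repaired = garbled.encode("iso-8859-1").decode("utf-8")

print(repr(garbled))
print(repaired)
```

This only works because ISO-8859-1 maps every byte to exactly one character, so no information is lost in the wrong decode.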

A browser identifier (the User-Agent header) can be added to the request headers to imitate browser access:

# -*- coding: utf-8 -*-
import urllib2
user_agent = r'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.104 Safari/537.36 Core/1.53.2669.400 QQBrowser/9.6.10990.400'
headers = {r'User-Agent': user_agent}
url = "http://localhost:80/webtest/test"
req = urllib2.Request(url, headers = headers)
response = urllib2.urlopen(req)
page_html = response.read()
print page_html
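To check which headers a request will carry before sending it, the `Request` object can be inspected. The following is a minimal Python 3 sketch with a shortened User-Agent string used purely for illustration. One detail worth knowing is that `Request` stores header names in capitalized form, so the lookup key is `"User-agent"`:

```python
# Python 3 sketch: inspect the headers attached to a Request without sending it.
from urllib.request import Request

# Shortened, illustrative User-Agent string (not the full one from the listing above).
user_agent = ("Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/53.0.2785.104 Safari/537.36")

req = Request("http://localhost:80/webtest/test",
              headers={"User-Agent": user_agent})

# Header names are normalized with str.capitalize(), giving "User-agent".
print(req.header_items())
print(req.get_header("User-agent"))
```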

To make multiple requests share one session, add the cookie information to the request headers:

#  -*- coding: utf-8 -*-
import urllib2
user_agent = r'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.104 Safari/537.36 Core/1.53.2669.400 QQBrowser/9.6.10990.400'
headers = {r'User-Agent': user_agent}
url = "http://localhost:80/webtest/test"
req = urllib2.Request(url, headers = headers)
response = urllib2.urlopen(req)
cookie = response.headers.get('Set-Cookie')    # get the cookie from the response to the first request
print cookie        # str; value: JSESSIONID=B66F6A96B2FBC7D9A7591293E28DEEE3; Path=/webtest/; HttpOnly
page_html = response.read()
print page_html

req.add_header('cookie', cookie)    # add the cookie to the headers of later requests so that they belong to the same session
response = urllib2.urlopen(req)
page_html = response.read()
print page_html
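The Set-Cookie value printed above also contains attributes such as `Path` and `HttpOnly`, which are instructions for the client rather than part of the cookie itself. A server may tolerate receiving them, but a stricter variant sends back only the leading `name=value` pair. A minimal Python 3 sketch, where the helper `cookie_pair` is hypothetical and assumes a single cookie in the header value:

```python
# Python 3 sketch: keep only the name=value part of a Set-Cookie header value.
from urllib.request import Request


def cookie_pair(set_cookie):
    """Return the leading name=value pair, dropping attributes like Path."""
    return set_cookie.split(";", 1)[0].strip()


# Sample value copied from the output shown above.
set_cookie = "JSESSIONID=B66F6A96B2FBC7D9A7591293E28DEEE3; Path=/webtest/; HttpOnly"

req = Request("http://localhost:80/webtest/test")
req.add_header("Cookie", cookie_pair(set_cookie))
print(req.get_header("Cookie"))
```

When a site sets several cookies, the standard library can track them automatically: `cookielib.CookieJar` with `urllib2.HTTPCookieProcessor` in Python 2, or `http.cookiejar.CookieJar` with `urllib.request.HTTPCookieProcessor` in Python 3.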

posted @ 2018-12-28 09:04 xuejianbest
