The requests module: a simple web page collector

#!/usr/bin/env python
 
import requests

# Search URL as seen in the browser: https://www.sogou.com/web?query=python

# UA spoofing: put the User-Agent the crawler should present into a dictionary
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36",
}
 
url = 'https://www.sogou.com/web'
 
# Collect the parameters carried by the URL into a dictionary
kw = input('enter a word:')
 
param = {
    'query': kw,
}
 
# Send a GET request to the URL; requests encodes the params dict into the query string
response = requests.get(url=url, params=param, headers=headers)
 
page_text = response.text
filename = kw + '.html'
with open(filename, 'w', encoding='utf-8') as fp:
    fp.write(page_text)
print(filename, 'saved.')
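
To see what passing params does to the URL, here is a minimal sketch that uses only the standard library. The helper name build_search_url is hypothetical and not part of requests. For a simple dictionary like the one above, the result should match the URL that requests ends up sending, but treat that as an assumption rather than a description of the library's internals.

```python
from urllib.parse import urlencode


def build_search_url(base_url, params):
    """Return base_url with params appended as a percent-encoded query string.

    Hypothetical helper for illustration only; assumes base_url has no
    query string of its own.
    """
    return base_url + '?' + urlencode(params)


# Same shape as the address observed in the browser
print(build_search_url('https://www.sogou.com/web', {'query': 'python'}))

# Non-ASCII keywords and spaces are escaped, so the URL stays plain ASCII
print(build_search_url('https://www.sogou.com/web', {'query': '网页 采集'}))
```

The first call prints https://www.sogou.com/web?query=python, the same address noted in the comment at the top of the script. This is why the script only needs the base URL plus a dictionary: the keyword never has to be escaped by hand.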
posted @ 2021-04-27 22:05  SRE运维充电站