Troubleshooting an Error in a Python Crawler for Youdao Translate
Hello everyone, I'm Pipi.
1. Preface
A few days ago, in the Python Silver exchange group, 【斌】 asked a question about a Python web crawler. The question screenshot is as follows:
The error screenshot is as follows:
The data the reader needed is as follows:
2. Implementation
Youdao Translate has been covered many times before, and it really is a good practice target; the main work is finding the right request. Here, 【dcpeng】 worked from the reader's code and provided a corrected version, as shown below:
import requests

# Request headers copied from the browser's developer tools
headers = {
    "Accept": "application/json, text/javascript, */*; q=0.01",
    "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6",
    "Connection": "keep-alive",
    "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
    "Origin": "https://fanyi.youdao.com",
    "Referer": "https://fanyi.youdao.com/",
    "Sec-Fetch-Dest": "empty",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Site": "same-origin",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.102 Safari/537.36 Edg/104.0.1293.70",
    "X-Requested-With": "XMLHttpRequest",
    "sec-ch-ua": "\"Chromium\";v=\"104\", \" Not A;Brand\";v=\"99\", \"Microsoft Edge\";v=\"104\"",
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": "\"Windows\""
}
# Cookies copied from the same browser session
cookies = {
    "OUTFOX_SEARCH_USER_ID": "-835551069@223.104.228.2",
    "OUTFOX_SEARCH_USER_ID_NCOO": "242914410.9668874",
    "P_INFO": "pdcfighting",
    "_ga": "GA1.2.1404336446.1645147264",
    "ANTICSRF": "cleared",
    "NTES_OSESS": "cleared",
    "S_OINFO": "",
    "___rl__test__cookies": "1662539503369"
}
url = "https://fanyi.youdao.com/translate_o"
params = {
    "smartresult": "rule"
}
# Form data: salt, sign, lts and bv are generated by the site's JavaScript
# and must all be sent together for the request to succeed
data = {
    "i": "dog",
    "from": "AUTO",
    "to": "AUTO",
    "smartresult": "dict",
    "client": "fanyideskweb",
    "salt": "16625395033719",
    "sign": "2a0056b7249263308d07a3fce52c065c",
    "lts": "1662539503371",
    "bv": "6f1d3ad76bcde34b6b6745e8ab9dc20a",
    "doctype": "json",
    "version": "2.1",
    "keyfrom": "fanyi.web",
    "action": "FY_BY_REALTlME"  # note the lowercase "l" in "REALTlME": this is the literal value the site sends
}
response = requests.post(url, headers=headers, cookies=cookies, params=params, data=data)
print(response.json())
print(response)
After running it, the corresponding result is obtained, as shown below:
It turned out that one of the constructed parameters had been left out of the request, which is why no data came back!
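When a parameter is missing or the sign does not match, the endpoint tends to answer with an error payload rather than a translation, so it is worth checking the JSON before digging into it. Below is a minimal sketch of such a check; it assumes the payload shape Youdao's `translate_o` endpoint used at the time (an `errorCode` field, and a `translateResult` list of lists of `{"src": ..., "tgt": ...}` segments), which may have changed since:

```python
def extract_translation(payload):
    """Pull the translated text out of a translate_o-style JSON payload.

    Returns None when the payload signals an error (the endpoint
    historically answered {"errorCode": 50} when sign/salt were wrong).
    """
    if payload.get("errorCode", 0) != 0:
        return None
    # translateResult is a list of lines, each a list of {"src", "tgt"} segments
    return "".join(
        seg["tgt"]
        for line in payload.get("translateResult", [])
        for seg in line
    )
```

A check like this makes the failure mode obvious instead of raising a `KeyError` deep in the parsing code.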
The reader eventually found the root cause; even without fully understanding it, what matters is that the problem got solved!
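The hard-coded `salt`, `lts` and `sign` values above only work for the exact word they were captured for. For reference, here is a sketch of how the site's JavaScript appeared to generate them at the time: `lts` is a millisecond timestamp, `salt` is `lts` with one random digit appended (note how the captured `salt` "16625395033719" is exactly `lts` "1662539503371" plus a trailing "9"), and `sign` is an MD5 over client, word, salt and a secret string. The `SECRET` constant below is a hypothetical placeholder; the real value lives in the site's JS and changes over time:

```python
import hashlib
import random
import time

# Hypothetical placeholder: the real constant must be read out of the site's JS
SECRET = "Ygy_4c=r#e#4EX^NUGUc5"

def build_sign_params(word):
    """Build the lts / salt / sign trio the way the site's JS appeared to."""
    lts = str(int(time.time() * 1000))          # millisecond timestamp
    salt = lts + str(random.randint(0, 9))      # timestamp plus one random digit
    raw = "fanyideskweb" + word + salt + SECRET
    sign = hashlib.md5(raw.encode("utf-8")).hexdigest()
    return {"lts": lts, "salt": salt, "sign": sign}
```

Merging the returned dict into `data` before each request would let the crawler translate arbitrary words instead of only the captured one, assuming the secret is correct.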
3. Summary
Hello everyone, I'm Pipi. This article reviewed a Python web-crawler problem. For this problem, the fix was to send the complete set of constructed request parameters to the translation interface; the article gives a detailed analysis and a working code implementation that helped the reader solve it.
Finally, thanks to 【斌】 for asking the question, to 【dcpeng】 and 【猫药师Kelly】 for the ideas and code walkthrough, and to 【Python狗】 and others for joining the discussion.