Web Scraping in Practice with the Requests Library
Fetching the content of a product page on JD.com:
import requests

def getHTMLText(url):
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()  # raise HTTPError on a 4xx/5xx status
        r.encoding = r.apparent_encoding  # guess the encoding from the body
        return r.text[:1000]
    except requests.RequestException:
        return "An exception occurred"

if __name__ == "__main__":
    url = "https://item.jd.com/4304372.html"
    print(getHTMLText(url))
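The script above relies on raise_for_status() to turn a bad HTTP status into an exception that the except branch can catch. A minimal sketch of that behavior follows; it builds a Response object by hand purely for illustration, so no request is sent, and the helper name status_is_error is hypothetical.

```python
import requests

def status_is_error(status_code):
    """Report whether raise_for_status() raises for the given status code."""
    r = requests.Response()        # hand-built response, never sent over the network
    r.status_code = status_code
    try:
        r.raise_for_status()       # does nothing for a 2xx/3xx status
        return False
    except requests.HTTPError:     # raised for 4xx and 5xx statuses
        return True

print(status_is_error(200))
print(status_is_error(404))
```

Without the raise_for_status() call, a 404 page would be returned as if it were the product page.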
Fetching the content of a product page on Amazon:
import requests

url = "https://www.amazon.cn/gp/product/B01M8L5Z3Y"
try:
    # Send a browser-like User-Agent instead of the Requests default
    kv = {'user-agent': 'Mozilla/5.0'}
    r = requests.get(url, headers=kv)
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    print(r.text[1000:2000])
except requests.RequestException:
    print("Scraping failed")
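The Amazon script differs from the JD one mainly in the headers argument. As a rough illustration of what that changes, the sketch below prepares two requests locally (nothing is sent) and compares the User-Agent header each would carry. Presumably the site rejects the default value, which is why the script overrides it; that reason is an assumption, not something stated above.

```python
import requests

url = "https://www.amazon.cn/gp/product/B01M8L5Z3Y"
session = requests.Session()

# Prepare the requests without sending them, so the headers can be inspected.
default_req = session.prepare_request(requests.Request("GET", url))
custom_req = session.prepare_request(
    requests.Request("GET", url, headers={"user-agent": "Mozilla/5.0"})
)

default_ua = default_req.headers["User-Agent"]  # Requests identifies itself here
custom_ua = custom_req.headers["User-Agent"]    # overridden by the headers dict

print(default_ua)
print(custom_ua)
```

Header names are case-insensitive in Requests, so the lowercase 'user-agent' key replaces the default 'User-Agent' entry.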
Submitting search keywords to Baidu and 360:
For now these are only a few simple features; the details still need further study.
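Baidu:
import requests

keyword = 'Python'
try:
    kv = {'wd': keyword}  # Baidu takes the search keyword in the 'wd' parameter
    r = requests.get("http://www.baidu.com/s", params=kv)
    print(r.request.url)  # the final URL, including the query string
    r.raise_for_status()
    print(len(r.text))
except requests.RequestException:
    print('Scraping failed')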
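360:
import requests

keyword = 'Python'
try:
    kv = {'q': keyword}  # 360 search takes the keyword in the 'q' parameter
    r = requests.get("http://www.so.com/s", params=kv)
    print(r.request.url)  # the final URL, including the query string
    r.raise_for_status()
    print(len(r.text))
except requests.RequestException:
    print('Scraping failed')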
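Both search scripts pass the keyword through params and then print r.request.url. A small sketch of how that URL is assembled is given below; it prepares the request locally instead of sending it, and the helper name preview_url is hypothetical.

```python
import requests

def preview_url(base_url, params):
    """Return the URL Requests would send, without any network I/O."""
    prepared = requests.Request("GET", base_url, params=params).prepare()
    return prepared.url

baidu_url = preview_url("http://www.baidu.com/s", {"wd": "Python"})
so_url = preview_url("http://www.so.com/s", {"q": "Python"})

print(baidu_url)  # http://www.baidu.com/s?wd=Python
print(so_url)     # http://www.so.com/s?q=Python
```

Letting Requests build the query string this way also takes care of percent-encoding keywords that contain spaces or non-ASCII characters.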
Downloading an image from the web:
import requests
import os

url = 'https://imgsa.baidu.com/forum/pic/item/5882b2b7d0a20cf42e83994176094b36acaf9918.jpg'
root = 'e://pice//'
path = root + url.split('/')[-1]  # keep the original file name
try:
    if not os.path.exists(root):
        os.mkdir(root)
    if not os.path.exists(path):
        r = requests.get(url)
        r.raise_for_status()  # avoid saving an error page as an image
        with open(path, 'wb') as f:  # the with block closes the file
            f.write(r.content)
        print("File saved successfully")
    else:
        print("File already exists")
except (requests.RequestException, OSError):
    print("Scraping failed")
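The image script derives the local file name with url.split('/')[-1], which would also keep any query string that follows the file name. A slightly more defensive variant might look like the sketch below; it uses only the standard library, and the helper name local_path_for and the 'pics' directory are hypothetical.

```python
import os
import posixpath
from urllib.parse import urlsplit

def local_path_for(url, root):
    """Build a local file path from the last segment of the URL path."""
    # urlsplit separates the path from any ?query or #fragment part.
    name = posixpath.basename(urlsplit(url).path)
    if not name:
        raise ValueError("URL does not end in a file name: " + url)
    return os.path.join(root, name)

image_url = ('https://imgsa.baidu.com/forum/pic/item/'
             '5882b2b7d0a20cf42e83994176094b36acaf9918.jpg')
print(local_path_for(image_url, 'pics'))
```

os.makedirs(root, exist_ok=True) could then replace the exists/mkdir pair, since it creates missing parent directories and does not fail when the directory is already there.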
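Automatically looking up the location of an IP address:
import requests

url = "http://m.ip138.com/ip.asp?ip="
try:
    r = requests.get(url + '202.204.80.112')  # append the IP address to query
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    print(r.text[-500:])  # the result sits near the end of the page
except requests.RequestException:
    print("Scraping failed")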
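Automatically looking up the location of a mobile phone number:
import requests

url = "http://m.ip138.com/mobile.asp?mobile="
try:
    r = requests.get(url + '18879970722')  # append the phone number to query
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    print(r.text[-500:])  # the result sits near the end of the page
except requests.RequestException:
    print("Scraping failed")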