Web Scraping in Practice with the Requests Library

Fetching the page of a JD.com product:

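    import requests

    def getHTMLText(url):
        """Return the first 1000 characters of the page, or an error message."""
        try:
            r = requests.get(url, timeout=30)
            r.raise_for_status()  # raise HTTPError for 4xx/5xx responses
            r.encoding = r.apparent_encoding  # guess the encoding from the body
            return r.text[:1000]
        except requests.RequestException:
            return "An exception occurred"

    if __name__ == "__main__":
        url = "https://item.jd.com/4304372.html"
        print(getHTMLText(url))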

Fetching the page of an Amazon product:

    import requests

    url = "https://www.amazon.cn/gp/product/B01M8L5Z3Y"
    try:
        # Send a browser-like User-Agent instead of the default python-requests one
        kv = {'user-agent': 'Mozilla/5.0'}
        r = requests.get(url, headers=kv, timeout=30)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        print(r.text[1000:2000])
    except requests.RequestException:
        print("Scraping failed")
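The Amazon example differs from the JD one mainly in the custom `user-agent` header, presumably because some sites refuse clients that identify themselves as a script. As a minimal sketch, the request can be prepared without being sent, which lets you compare the default User-Agent with the one actually attached. The variable names `default_ua`, `prepared`, and `sent_ua` are illustrative only.

```python
import requests

url = "https://www.amazon.cn/gp/product/B01M8L5Z3Y"

# User-Agent that requests sends when no header is supplied,
# of the form "python-requests/<version>".
default_ua = requests.utils.default_user_agent()

# Build the request without sending it, so the headers can be inspected offline.
prepared = requests.Request("GET", url, headers={"user-agent": "Mozilla/5.0"}).prepare()

# Header lookup is case-insensitive, so "User-Agent" finds the 'user-agent' key.
sent_ua = prepared.headers["User-Agent"]

print(default_ua)
print(sent_ua)
```

After a real request, the same information is available as `r.request.headers`.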

Submitting search keywords to Baidu / 360:

For now these are only a few simple features; the details still need further study.

Baidu:

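    import requests

    keyword = 'Python'
    try:
        kv = {'wd': keyword}  # Baidu takes the search term in the "wd" parameter
        r = requests.get("http://www.baidu.com/s", params=kv, timeout=30)
        print(r.request.url)  # the URL that was actually requested
        r.raise_for_status()
        print(len(r.text))
    except requests.RequestException:
        print('Scraping failed')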

360:

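    import requests

    keyword = 'Python'
    try:
        kv = {'q': keyword}  # 360 takes the search term in the "q" parameter
        r = requests.get("http://www.so.com/s", params=kv, timeout=30)
        print(r.request.url)  # the URL that was actually requested
        r.raise_for_status()
        print(len(r.text))
    except requests.RequestException:
        print('Scraping failed')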
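Both search examples rely on `params` to turn a dictionary into a query string. The sketch below shows that step in isolation by preparing the request without sending it; `build_search_url` is a hypothetical helper name, not part of Requests.

```python
import requests

def build_search_url(base, key, keyword):
    """Return the URL requests would fetch for base?key=keyword (nothing is sent)."""
    return requests.Request("GET", base, params={key: keyword}).prepare().url

baidu_url = build_search_url("http://www.baidu.com/s", "wd", "Python")
so_url = build_search_url("http://www.so.com/s", "q", "Python")

print(baidu_url)  # http://www.baidu.com/s?wd=Python
print(so_url)     # http://www.so.com/s?q=Python
```

Letting Requests build the query string also takes care of URL-encoding keywords that contain spaces or non-ASCII characters.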

 

Downloading an image from the web:

 

    import os
    import requests

    url = 'https://imgsa.baidu.com/forum/pic/item/5882b2b7d0a20cf42e83994176094b36acaf9918.jpg'
    root = 'e://pice//'
    path = root + url.split('/')[-1]  # name the file after the last URL segment
    try:
        if not os.path.exists(root):
            os.mkdir(root)
        if not os.path.exists(path):
            r = requests.get(url, timeout=30)
            r.raise_for_status()  # do not save an error page as an image
            with open(path, 'wb') as f:  # the with block closes the file itself
                f.write(r.content)
            print("File saved")
        else:
            print("File already exists")
    except (requests.RequestException, OSError):
        print("Scraping failed")
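The example above reads the whole image into memory through `r.content`. For larger files, one possible variation is to stream the body in chunks. The following is only a sketch under that assumption: `local_path_for`, `download`, and the `chunk_size` default are hypothetical names and values, not taken from the original post.

```python
import os
import requests

def local_path_for(url, root):
    """Build the local file path from the last segment of the URL."""
    return os.path.join(root, url.split("/")[-1])

def download(url, root, chunk_size=8192):
    """Stream url into root and return the local path; skip files that already exist."""
    os.makedirs(root, exist_ok=True)  # also creates missing parent directories
    path = local_path_for(url, root)
    if os.path.exists(path):
        return path
    with requests.get(url, stream=True, timeout=30) as r:
        r.raise_for_status()
        with open(path, "wb") as f:
            for chunk in r.iter_content(chunk_size=chunk_size):
                f.write(chunk)
    return path

# Only the path computation runs here; no network access is needed.
print(local_path_for("https://example.com/pics/cat.jpg", "pice"))
```

Calling `download(url, root)` with the `url` and `root` from the example above would perform the actual transfer and may raise `requests.RequestException` or `OSError`.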

 

Automatic lookup of an IP address's location:

 

    import requests

    url = "http://m.ip138.com/ip.asp?ip="
    try:
        r = requests.get(url + '202.204.80.112', timeout=30)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        print(r.text[-500:])  # the result sits near the end of the page
    except requests.RequestException:
        print("Scraping failed")

 

Automatic lookup of a mobile phone number's location:

 

    import requests

    url = "http://m.ip138.com/mobile.asp?mobile="
    try:
        r = requests.get(url + '18879970722', timeout=30)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        print(r.text[-500:])  # the result sits near the end of the page
    except requests.RequestException:
        print("Scraping failed")
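All of the examples above repeat the same pattern: send a GET request, check the status, fix the encoding, and catch failures. As a rough sketch, that pattern could be factored into one helper. `fetch_text` is a hypothetical name, and returning `None` on failure is a design choice made here, not something from the original post.

```python
import requests

def fetch_text(url, params=None, headers=None, timeout=30):
    """Return the decoded page text, or None if the request fails."""
    try:
        r = requests.get(url, params=params, headers=headers, timeout=timeout)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException:
        # Covers connection errors, timeouts, HTTP error statuses, and invalid URLs.
        return None

# A URL without a scheme is rejected before any network access happens.
print(fetch_text("not-a-url"))
```

With this helper, the IP lookup would become `fetch_text("http://m.ip138.com/ip.asp", params={"ip": "202.204.80.112"})`, and the phone lookup would pass `params={"mobile": ...}` to `mobile.asp` in the same way.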

 

posted @ 2017-03-16 00:19  starry_sky