种太阳

March 9, 2021

Summary: print(bs.head.contents)# returns a list, e.g. [xxxx,xxxxx,xxxx] print(bs.head.contents[1])# gets the second element of the list
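A minimal sketch of why `contents[1]` (not `contents[0]`) is often the first real tag: `.contents` keeps text nodes, so the newline right after `<head>` is element 0. The sample HTML is hypothetical, standing in for a saved page; `find_all(recursive=False)` is shown as one way to keep only the child tags.

```python
from bs4 import BeautifulSoup

# Hypothetical page standing in for a saved file such as baidu.html
SAMPLE_HTML = (
    "<html><head>\n"
    "<title>Example</title>\n"
    "<script>var x = 1;</script>\n"
    "</head><body></body></html>"
)

def head_contents(html):
    """Return bs.head.contents: a plain list of direct children, text nodes included."""
    bs = BeautifulSoup(html, "html.parser")
    return bs.head.contents

def head_child_tags(html):
    """Names of the direct child tags of <head>, skipping whitespace strings."""
    bs = BeautifulSoup(html, "html.parser")
    # find_all() with no name returns tags only; recursive=False stops at direct children
    return [tag.name for tag in bs.head.find_all(recursive=False)]

# head_contents(SAMPLE_HTML) is ["\n", <title>, "\n", <script>, "\n"]
print(head_contents(SAMPLE_HTML)[1])
print(head_child_tags(SAMPLE_HTML))
```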

posted @ 2021-03-09 09:40 种太阳

Summary: from bs4 import BeautifulSoup file=open("./baidu.html","rb") html=file.read() bs=BeautifulSoup(html,"html.parser") print(bs.title) print(bs.a)# prints the first…
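The summary opens the file with a bare `open()` and never closes it. Here is a small sketch that reads the file inside `with` and contrasts `bs.a` (first `<a>` only) with `find_all("a")` (every `<a>`). The helper names `load_soup` and `describe` are hypothetical; `./baidu.html` is the path from the post.

```python
from bs4 import BeautifulSoup

def load_soup(path="./baidu.html"):
    """Parse a saved page; reading bytes ("rb") lets bs4 detect the encoding."""
    with open(path, "rb") as file:  # `with` closes the file, unlike a bare open()
        return BeautifulSoup(file.read(), "html.parser")

def describe(bs):
    """Collect a few common lookups on a parsed page."""
    return {
        "title": bs.title,              # the whole <title> tag
        "title_text": bs.title.string,  # just the text inside it
        "first_a": bs.a,                # only the FIRST <a> tag in the document
        "all_hrefs": [a.get("href") for a in bs.find_all("a")],  # every <a>
    }
```

Usage: `describe(load_soup())["all_hrefs"]` lists every link on the saved page, while `bs.a` alone would stop at the first one.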

posted @ 2021-03-09 09:38 种太阳

7. Simulating a real browser to visit a website: GET request

Summary: url="https://www.douban.com" headers={"User-Agent":"copy this value from a real browser"}# some sites are stricter, so you can add several more header key-value pairs req=urllib.request.Request(url=url,headers=headers) respon…
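A minimal sketch of the GET-with-headers pattern, split so the request can be inspected without touching the network. The User-Agent string is a placeholder (copy a real one from your browser's developer tools), and the extra `Accept-Language` header is only an illustrative example of "adding more header pairs"; `build_get_request` and `fetch` are hypothetical names.

```python
import urllib.request

# Placeholder values: replace the User-Agent with one copied from a real browser.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Accept-Language": "zh-CN,zh;q=0.9",  # illustrative extra header
}

def build_get_request(url, headers=HEADERS):
    # No data argument, so urllib will send this as a GET
    return urllib.request.Request(url=url, headers=headers)

def fetch(url):
    req = build_get_request(url)
    with urllib.request.urlopen(req, timeout=10) as response:
        return response.read().decode("utf-8")
```

One detail worth knowing: `Request` stores header names via `str.capitalize()`, so `"User-Agent"` is looked up afterwards as `req.get_header("User-agent")`.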

posted @ 2021-03-09 09:37 种太阳

6. Simulating a real browser to visit a website: POST request

Summary: url="http://httpbin.org/post" headers={"User-Agent":"copy this value from a real browser"}# some sites are stricter, so you can add several more header key-value pairs data=bytes(urllib.parse.urlencode({"name":"erick"}),encod…
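A sketch of the POST variant under the same assumptions (placeholder User-Agent, hypothetical helper names). The form is encoded to bytes exactly as in the summary; passing `method="POST"` is optional because urllib already switches to POST whenever `data` is not `None`.

```python
import urllib.parse
import urllib.request

def build_post_request(url, form, user_agent):
    # Form fields -> "name=erick" style string -> UTF-8 bytes for the request body
    data = bytes(urllib.parse.urlencode(form), encoding="utf-8")
    return urllib.request.Request(
        url=url,
        data=data,
        headers={"User-Agent": user_agent},
        method="POST",  # explicit, though data alone would already imply POST
    )

def post_form(url, form, user_agent):
    req = build_post_request(url, form, user_agent)
    with urllib.request.urlopen(req, timeout=10) as response:
        return response.read().decode("utf-8")
```

Usage: `post_form("http://httpbin.org/post", {"name": "erick"}, "<your browser's UA>")` returns httpbin's JSON echo of the form and headers.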

posted @ 2021-03-09 09:35 种太阳

5. Some useful features of the response object

Summary: print(response.status)# get the status code print(response.getheaders())# get all response headers print(response.getheader('Server'))# get a single header value
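To try `status`, `getheaders()` and `getheader()` without depending on an outside site, this sketch starts a tiny throwaway HTTP server on localhost; the handler, its `hello` body and the helper names are all hypothetical. The local request goes through an opener with an empty `ProxyHandler` so proxy environment variables do not interfere; against a real site you would simply use `urllib.request.urlopen`.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    """Local stand-in for a real website."""
    def do_GET(self):
        body = b"hello"
        self.send_response(200)  # also adds the Server and Date headers
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet

def inspect_response(url, open_url=urllib.request.urlopen):
    with open_url(url, timeout=5) as response:
        return (
            response.status,               # e.g. 200
            response.getheaders(),         # list of (name, value) pairs
            response.getheader("Server"),  # one header value, or None if absent
        )

def demo():
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    no_proxy = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        port = server.server_address[1]
        return inspect_response(f"http://127.0.0.1:{port}/", open_url=no_proxy.open)
    finally:
        server.shutdown()
        server.server_close()
```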

posted @ 2021-03-09 09:34 种太阳

4. Testing timeout exception handling

Summary: try: response=urllib.request.urlopen("http://httpbin.org/get",timeout=0.01)# times out if no response arrives within 0.01 seconds print(response.read().decode("utf-8")) except urllib.err…
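A sketch of timeout handling that covers both ways urllib reports a timeout: a timeout while connecting arrives wrapped in `urllib.error.URLError` (with the timeout in `err.reason`), while a timeout while waiting for the reply is raised as `socket.timeout` directly. The demo fakes a slow server with a local socket that listens but never replies; the helper names are hypothetical.

```python
import socket
import urllib.error
import urllib.request

def is_timeout(err):
    """True if err is a timeout, raised directly or wrapped in URLError."""
    return isinstance(err, socket.timeout) or isinstance(
        getattr(err, "reason", None), socket.timeout
    )

def fetch_with_timeout(url, timeout, open_url=urllib.request.urlopen):
    try:
        with open_url(url, timeout=timeout) as response:
            return response.read().decode("utf-8")
    except (urllib.error.URLError, socket.timeout) as err:
        if is_timeout(err):
            print("request timed out")
            return None
        raise  # other failures (DNS error, refused connection, ...) are not timeouts

def demo(timeout=0.01):
    # The OS completes the TCP handshake for a listening socket even though
    # we never call accept() or send anything, so the read must time out.
    no_proxy = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as silent:
        silent.bind(("127.0.0.1", 0))
        silent.listen(1)
        port = silent.getsockname()[1]
        return fetch_with_timeout(
            f"http://127.0.0.1:{port}/get", timeout, open_url=no_proxy.open
        )
```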

posted @ 2021-03-09 09:32 种太阳

3. Sending a POST request

Summary: import urllib.parse data=bytes(urllib.parse.urlencode({"hello":"world"}),encoding="utf-8") response=urllib.request.urlopen("http://httpbin.org/post",d…
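What does `bytes(urllib.parse.urlencode(...), encoding="utf-8")` actually produce? This small sketch (hypothetical helper names) shows the form body and decodes it back with `parse_qs`, which is a convenient way to check special characters and non-ASCII values before sending.

```python
import urllib.parse

def encode_form(fields):
    """dict -> application/x-www-form-urlencoded body as bytes."""
    # urlencode turns spaces into "+" and escapes reserved characters like "&" as %XX
    return bytes(urllib.parse.urlencode(fields), encoding="utf-8")

def decode_form(body):
    """Inverse check: bytes body -> {name: [values]}, as a server would parse it."""
    return urllib.parse.parse_qs(body.decode("utf-8"))

print(encode_form({"hello": "world"}))
print(encode_form({"q": "a b&c"}))
```

Passing the result as `data=` to `urllib.request.urlopen` is what turns the request into a POST.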

posted @ 2021-03-09 09:30 种太阳

2. Sending a GET request

Summary: import urllib.request response =urllib.request.urlopen("http://www.baidu.com") print(response.read().decode('utf-8'))# decode the fetched page source as UTF-8
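Hard-coding `'utf-8'` works for pages that are UTF-8, but some sites declare another charset (GBK, for example). A hedged variation: read the charset from the `Content-Type` header and fall back to UTF-8 only when none is declared. `fetch_text` is a hypothetical name, and the checks use `data:` URLs so they run offline.

```python
import urllib.request

def fetch_text(url, default_charset="utf-8"):
    """GET a URL and decode the body with the charset the server declares."""
    with urllib.request.urlopen(url, timeout=10) as response:
        # e.g. "utf-8" from "Content-Type: text/html; charset=utf-8"; None if absent
        charset = response.headers.get_content_charset() or default_charset
        return response.read().decode(charset)

# Offline check: a data: URL carrying the UTF-8 bytes of "你好"
print(fetch_text("data:text/plain;charset=utf-8,%E4%BD%A0%E5%A5%BD"))
```

Usage: `fetch_text("http://www.baidu.com")` behaves like the original snippet when the page declares UTF-8.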

posted @ 2021-03-09 09:29 种太阳

1. The three steps of a web scraper

Summary: 1. Crawl the web page 2. Parse the data item by item 3. Save the data you want
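The three steps map naturally onto three functions. This is one possible skeleton, not the author's code: it uses only the standard library (`html.parser` instead of BeautifulSoup), extracts links as a stand-in for "the data you want", and writes CSV. All function names and the placeholder User-Agent are assumptions.

```python
import csv
import io
import urllib.request
from html.parser import HTMLParser

def crawl(url):
    """Step 1: fetch the raw page (placeholder User-Agent)."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=10) as response:
        return response.read().decode("utf-8")

class LinkParser(HTMLParser):
    """Collects text and href for every <a> tag that has an href."""
    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append({"text": "".join(self._text).strip(), "href": self._href})
            self._href = None

def parse(html):
    """Step 2: turn the page into a list of records, one per link."""
    parser = LinkParser()
    parser.feed(html)
    parser.close()
    return parser.links

def to_csv_text(rows):
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["text", "href"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

def save(rows, path):
    """Step 3: persist the records as a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        f.write(to_csv_text(rows))
```

Usage: `save(parse(crawl("http://www.baidu.com")), "links.csv")` runs all three steps in order.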

posted @ 2021-03-09 09:27 种太阳

17. Brand list example

Summary: <!DOCTYPE html> <html lang="en"> <head> <meta charset="utf-8"> <meta name="viewport" content="width=device-width,initial-scale=1.0"> <meta http-equiv= …

posted @ 2021-03-09 08:44 种太阳
