Converting a bs4.element.ResultSet to bs4.BeautifulSoup

First, let's look at the type of the value returned from the requests call, so we know what type of argument the BeautifulSoup constructor expects:

requests return type: <class 'str'>

We can see that the value coming back from requests is a String. In other words, we can first convert the bs4.element.ResultSet to a String, then pass that String to the BeautifulSoup constructor to get a BeautifulSoup object back, and from there we can keep using find_all(). The code is as follows:

data = getHtmlText(url=url)  # the return value here is actually r.text from requests
print('requests return type:', type(data))

soup = BeautifulSoup(data, "html.parser")
print('BeautifulSoup type:', type(soup))
page = soup.find_all('div', class_='more-page')  # find_all() returns a bs4.element.ResultSet
data2 = str(page)  # convert the ResultSet to a String

soup2 = BeautifulSoup(data2, "html.parser")  # re-parse the String into a new BeautifulSoup object
page_count = soup2.script.string
# print(page_count)
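
The snippet above depends on the live page it was scraped from. As a quick self-contained check of the same round trip, here is a minimal sketch using a made-up inline HTML string (the markup and the more-page class are placeholders for illustration only):

from bs4 import BeautifulSoup

html = '<div class="more-page"><script>var pageCount = 12;</script></div>'  # made-up sample markup
soup = BeautifulSoup(html, "html.parser")

page = soup.find_all('div', class_='more-page')
print(type(page))            # <class 'bs4.element.ResultSet'>

soup2 = BeautifulSoup(str(page), "html.parser")  # ResultSet -> String -> BeautifulSoup
print(type(soup2))           # <class 'bs4.BeautifulSoup'>
print(soup2.script.string)   # var pageCount = 12;

Note that each element inside a ResultSet is a Tag, and Tag objects support find_all() directly, so the str() round trip is just one way to get back to a searchable object; it simply re-parses the markup.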

The code for the getHtmlText function is as follows:

import requests

def getHtmlText(url):
    headers = {
        'Accept': '*/*',
        'Accept-Encoding': 'gzip, deflate',
        'Accept-Language': 'zh-CN,zh;q=0.9',
        'Connection': 'keep-alive',
        'Cookie': 'widget_dz_id=54511; widget_dz_cityValues=,; timeerror=1; defaultCityID=54511; defaultCityName=%u5317%u4EAC; Hm_lvt_a3f2879f6b3620a363bec646b7a8bcdd=1516245199; Hm_lpvt_a3f2879f6b3620a363bec646b7a8bcdd=1516245199; addFavorite=clicked',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3236.0 Safari/537.36'
    }
    try:
        r = requests.get(url, timeout=30, headers=headers)
        r.raise_for_status()  # raise an HTTPError if the status is not 200 (200 means the url is reachable)
        r.encoding = r.apparent_encoding
        return r.text  # return the page content as a string
    except requests.RequestException:
        return "request failed"