Python_Hands-On Web Scraper

# -*- coding: utf-8 -*-
__author__ = "jiachaojun"
__time__ = '2020/1/12 11:03'
import requests
from bs4 import BeautifulSoup
# Decode the response with the same encoding the page was written in
# 1. Use Python to act as a browser and send a GET request to https://www.autohome.com.cn/news/

r1 = requests.get('https://www.autohome.com.cn/news/')
print(r1.content)

# 2. Find what we want in the string (first convert the bytes to a string)
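# GBK is a superset of GB2312, so it also decodes characters GB2312 lacks
data = r1.content.decode('gbk', errors='replace')
soup = BeautifulSoup(data, features='html.parser')
container = soup.find(id='auto-channel-lazyload-article')
# find() returns None if the page layout changed, so fall back to an empty list
li_list = container.find_all(name='li') if container else []
for item in li_list:
    tag = item.find(name='h3')
    img = item.find(name='img')
    # Skip list items that are not articles (no title or no image)
    if not tag or not img:
        continue
    img_url = "https:" + img.get('src', '')
    print(tag.text, img_url)
    print('===============================================')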
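
The two steps in the script (decode the bytes, then pull the title and image out of each list item) can also be tried without hitting the site. The following is a minimal offline sketch, not the author's code: `extract_articles`, `SAMPLE_HTML`, and the `img.example.com` URLs are hypothetical names made up for illustration, and the `gbk` default is an assumption about the page encoding.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical sample markup, shaped like the list the script walks through.
SAMPLE_HTML = """
<ul id="auto-channel-lazyload-article">
  <li><a href="#"><img src="//img.example.com/a.jpg"><h3>第一篇文章</h3></a></li>
  <li class="ad">no title here</li>
  <li><a href="#"><img src="//img.example.com/b.jpg"><h3>Second article</h3></a></li>
</ul>
"""


def extract_articles(raw_bytes, encoding="gbk",
                     base_url="https://www.autohome.com.cn/news/"):
    """Return (title, image_url) pairs found in the article list of raw_bytes."""
    # Step 1: bytes -> str, using the encoding the page was written in (assumed GBK).
    html = raw_bytes.decode(encoding, errors="replace")
    soup = BeautifulSoup(html, features="html.parser")
    container = soup.find(id="auto-channel-lazyload-article")
    if container is None:
        return []
    articles = []
    # Step 2: keep only list items that carry both a title and an image.
    for item in container.find_all(name="li"):
        title = item.find(name="h3")
        img = item.find(name="img")
        if not title or not img or not img.get("src"):
            continue
        # urljoin resolves protocol-relative URLs ("//host/path") against base_url.
        articles.append((title.text.strip(), urljoin(base_url, img["src"])))
    return articles


# Simulate the bytes a server would send by encoding the sample as GBK.
page_bytes = SAMPLE_HTML.encode("gbk")
articles = extract_articles(page_bytes)
for title, img_url in articles:
    print(title, img_url)
```

To run it against the live page, pass `r1.content` from the script above in place of `page_bytes`. Returning a list rather than printing inside the loop keeps the parsing step easy to check on its own.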

posted on 2020-01-12 11:32 九酒馆
