Web Crawler - Cc01 - cnblogs

Web Crawler

import os

import requests
from bs4 import BeautifulSoup


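# Directory containing this script; downloaded images are saved beneath it.
BASE_DIR = os.path.dirname(os.path.abspath(__file__))
print(BASE_DIR)
url = 'https://www.autohome.com.cn/news/'
# A timeout keeps the script from hanging if the server does not answer.
res = requests.get(url, timeout=10)
# Decode the page as GBK before reading res.text.
res.encoding = 'GBK'
soup_obj = BeautifulSoup(res.text, 'html.parser')
# The article list (and its thumbnails) is looked up by this div id.
image_obj = soup_obj.find(name='div', attrs={'id': 'auto-channel-lazyload-article'})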

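img_list = image_obj.find_all(name='img')
# Make sure the target folder exists; open() would fail otherwise.
os.makedirs(os.path.join(BASE_DIR, 'images'), exist_ok=True)
for item in img_list:
    src = item.get('src')
    if not src:
        # Skip <img> tags that have no src attribute.
        continue
    # This code assumes src is protocol-relative ("//host/path"), so add the scheme.
    item_url = 'https:' + src
    print(item_url)
    img_response = requests.get(item_url, timeout=10)
    file_name = os.path.join(BASE_DIR, 'images', item_url.split('/')[-1])
    with open(file_name, 'wb') as f:
        f.write(img_response.content)
    print(file_name, 'download done')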
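The loop above builds each image URL by prefixing "https:" to the src attribute and takes the file name from the last "/"-separated piece. That works only while every src is protocol-relative and carries no query string. A minimal sketch of a more tolerant approach, using the standard library's urllib.parse, might look like the following. The helper names resolve_image_url and file_name_from_url are hypothetical, and the example src value is made up for illustration.

```python
import os
from urllib.parse import urljoin, urlparse


def resolve_image_url(page_url, src):
    """Turn a raw src attribute into an absolute URL (hypothetical helper)."""
    # urljoin copes with protocol-relative ("//host/a.jpg"), root-relative
    # ("/a.jpg") and already-absolute URLs in a single call.
    return urljoin(page_url, src)


def file_name_from_url(image_url):
    """Return the last path segment, ignoring any query string (hypothetical helper)."""
    return os.path.basename(urlparse(image_url).path)


page_url = 'https://www.autohome.com.cn/news/'
example_src = '//img.example.com/news/demo.jpg?x=1'  # made-up src value
full_url = resolve_image_url(page_url, example_src)
print(full_url)
print(file_name_from_url(full_url))
```

In the download loop, item_url could then come from resolve_image_url(url, src) and the saved name from file_name_from_url(item_url), assuming the page keeps serving ordinary image paths.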

posted on 2021-05-12 22:52 Cc01
