Python爬虫实战系列3：今日BBNews编程新闻采集

一、分析页面

打开今日BBNews网址 https://news.bicido.com ，下拉选择【编程】栏目

1.1、分析请求

F12打开开发者模式，然后点击Network后点击任意一个请求，Ctrl+F开启搜索，输入标题Apache Doris 2.1.0 版本发布 ，开始搜索

搜索结果显示直接返回的json格式，那就so easy了，直接copy curl，然后将curl 转换为Python代码，运行。

推荐个curl转Python代码的在线工具：https://curlconverter.com/

二、代码实现

直接将curl 转换后的Python代码做下修改，然后调试运行即可。

完整代码

# -*- coding: utf-8 -*-
import os
import sys
from datetime import datetime

import requests

opd = os.path.dirname
curr_path = opd(os.path.realpath(__file__))
proj_path = opd(opd(opd(curr_path)))
sys.path.insert(0, proj_path)

from app.conf.conf_base import USERAGENT

spider_config = {
    "name_en": "https://news.bicido.com",
    "name_cn": "今日BBNews"
}


class Bbnews:
    def __init__(self):
        self.headers = {
            'referer': 'https://news.bicido.com/',
            'user-agent': USERAGENT
        }

    def get_group(self):
        url = 'https://news.bicido.com/api/config/news_group/'
        content = requests.get(url=url, headers=self.headers)
        content = content.json()
        return content

    def get_news(self):
        groups = self.get_group()
        news_type = []
        for group in groups:
            if group['name'] == '编程':
                news_type = group['news_types']
        result = []
        for news_type in news_type:
            type_id = news_type['id']
            url = f'https://news.bicido.com/api/news/?type_id={type_id}'
            content = requests.get(url, headers=self.headers)
            news_list = content.json()
            for new in news_list:
                result.append({
                    "news_title": str(new['title']),
                    "news_date": datetime.now(),
                    "source_en": spider_config['name_en'],
                    "source_cn": spider_config['name_cn'],
                })
        return result


def main():
    bbnews = Bbnews()
    results = bbnews.get_news()
    print(results)


if __name__ == '__main__':
    main()

总结

今日BBNews页面没反爬策略，比较简单，拿来即用
本文介绍了curl to Python的工具，方便好用。

本文章代码只做学习交流使用，作者不负责任何由此引起的法律责任。

各位看官，如对你有帮助欢迎点赞，收藏，转发，关注公众号【Python魔法师】获取更多Python魔法~

posted @ 2024-03-15 08:34 Python魔法师阅读(417) 评论(0) 收藏举报

刷新页面返回顶部

Python魔法师

一名老程序员，为你分享Python技巧，Python爬虫，Python兼职接单，大数据，Java，副业搞钱的心得。公众号【Python魔法师】

Python爬虫实战系列3：今日BBNews编程新闻采集

一、分析页面

1.1、分析请求

二、代码实现

总结

公告