Python crawler: scraping data from the 污言污语 ("dirty talk") website

Code:

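import time

import requests
from lxml import etree

URL = "https://www.nihaowua.com/"
headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36 Edg/96.0.1054.62"
}


def get_text(max_count=100, delay=1.0):
    """Fetch the page up to max_count times and append each sentence to nihaowua.txt."""
    # The original version looped forever; max_count and delay are added safeguards.
    count = 0
    # Open the file once, with an explicit encoding so Chinese text is written correctly.
    with open("nihaowua.txt", "a", encoding="utf-8") as file:
        for _ in range(max_count):
            resp = requests.get(URL, headers=headers, timeout=10)
            resp.raise_for_status()  # stop on HTTP errors instead of parsing an error page
            html = etree.HTML(resp.text)
            # The sentence is the first text node matched under //section/div/*.
            texts = html.xpath("//section/div/*/text()")
            if texts:
                file.write(texts[0].strip() + "\n")
                count += 1
            time.sleep(delay)  # pause between requests to avoid hammering the site
    return count


get_text()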
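
The loop above requests the same URL over and over, which suggests the page serves a different sentence on each load. If that is the case, the same sentence can be collected more than once. Below is a minimal sketch of order-preserving deduplication that could be applied to the collected lines. The helper name `unique_sentences` and the sample strings are hypothetical and not part of the original post.

```python
def unique_sentences(sentences):
    """Yield each non-empty sentence once, in first-seen order."""
    seen = set()
    for sentence in sentences:
        cleaned = sentence.strip()
        # Skip blank lines and anything already yielded.
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            yield cleaned


# Made-up placeholder strings standing in for scraped text.
sample = ["first line", "second line", "first line ", "", "third line"]
result = list(unique_sentences(sample))
print(result)
```

One possible use is to read nihaowua.txt after a run, pass its lines through the helper, and write the result back, so the scraping loop itself stays unchanged.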

Posted @ 2021-12-23 15:15 by 睡觉不困
