Web Scraping in Practice (10): Sending the Daily News
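I. Introduction
1. Overview
Keeping up with current affairs is something young people today ought to do. So how can we get the news quickly? This project does two things:
- Automatically fetches the news from the web every day
- Arranges the news into an HTML page and sends it to a mailbox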
2. Environment Setup
requests = "*"        # sends the HTTP requests
fake-useragent = "*"  # generates a random User-Agent header
pyquery = "*"         # edits the HTML template to build the daily briefing
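3. Configuration File
{
    "status": 200,
    "data": [
        {"name": "A.L.Kun", "email": "3500515050@qq.com"}
    ],
    "temp": [ ]
}
data: the list of recipients the email is sent to
status: indicates whether the settings were read successfully; the sending script exits unless it is 200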
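The check on the settings file can be sketched as a small function. This is only an illustration under the structure shown above: `load_settings` and its error messages are hypothetical names of mine, not part of the original project.

```python
import json


def load_settings(text):
    """Parse the settings JSON and return the list of recipients.

    Hypothetical helper: raises ValueError when the settings look unusable.
    """
    info = json.loads(text)
    if info.get("status") != 200:
        raise ValueError("settings status is not 200")
    recipients = info.get("data", [])
    for entry in recipients:
        # Every recipient needs both a display name and an address.
        if "name" not in entry or "email" not in entry:
            raise ValueError(f"recipient entry is missing a field: {entry!r}")
    return recipients


sample = (
    '{"status": 200, '
    '"data": [{"name": "A.L.Kun", "email": "3500515050@qq.com"}], '
    '"temp": []}'
)
recipients = load_settings(sample)
print(recipients[0]["email"])
```

Validating up front means a malformed entry is reported before any SMTP connection is opened.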
II. Front-End Page
The page is laid out with plain HTML and CSS. My code is as follows:
<!doctype html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, user-scalable=no, initial-scale=1.0, maximum-scale=1.0, minimum-scale=1.0">
    <meta http-equiv="X-UA-Compatible" content="ie=edge">
    <title>每日简报</title>
    <style>
        /* reset.css */
        html, body, div, span, applet, object, iframe, h1, h2, h3, h4, h5, h6, p,
        blockquote, pre, a, abbr, acronym, address, big, cite, code, del, dfn, em,
        img, ins, kbd, q, s, samp, small, strike, strong, sub, sup, tt, var, b, u,
        i, center, dl, dt, dd, ol, ul, li, fieldset, form, label, legend, table,
        caption, tbody, tfoot, thead, tr, th, td, article, aside, canvas, details,
        embed, figure, figcaption, footer, header, hgroup, menu, nav, output, ruby,
        section, summary, time, mark, audio, video {
            margin: 0;
            padding: 0;
            border: 0;
            font-size: 100%;
            font: inherit;
            vertical-align: baseline;
        }
        * { color: black; }
        input { padding: 0; margin: 0; }
        /* HTML5 display-role reset for older browsers */
        article, aside, details, figcaption, figure, footer, header, hgroup, menu, nav, section { display: block; }
        body { line-height: 1; }
        ol, ul { list-style: none; }
        blockquote, q { quotes: none; }
        blockquote:before, blockquote:after, q:before, q:after { content: ''; content: none; }
        table { border-collapse: collapse; border-spacing: 0; }
        * { text-decoration: none; margin: 0; }
    </style>
    <style>
        .main {
            /* size of the main container, centered on the page */
            height: auto;
            width: 500px;
            margin: 100px auto;
            border: #a5a5a5 4px solid;
            border-radius: 10px;
            padding-bottom: 10px;
        }
        .main > h1 {
            /* page title */
            font: normal 500 33px "KaiTi_GB2312";
            text-align: center;
            margin-bottom: 30px;
            margin-top: 10px;
            color: #436d7b;
        }
        .img_des {
            /* image caption */
            font: normal 500 9px "Microsoft YaHei";
            text-align: center;
            color: #b647a6;
        }
        img {
            /* image */
            padding-left: 2px;
            border-radius: 10px;
            margin-bottom: 4px;
        }
        .big_line {
            height: 5px;
            border-radius: 3px;
            width: 496px;
            background-color: #48745b;
            margin: 15px 0 5px 2px;
        }
        .small_line {
            height: 3px;
            border-radius: 1px;
            width: 496px;
            background-color: #48745b;
            margin-left: 2px;
        }
        .body { overflow: hidden; }
        .body .left {
            float: left;
            background-color: #48745b;
            height: 100px;
            width: 300px;
            margin-left: 4px;
        }
        .body span {
            margin-top: 10px;
            border-radius: 4px;
        }
        .body .right {
            background-color: #48745b;
            height: 100px;
            width: 170px;
            float: right;
            margin-right: 4px;
        }
        .left h1 {
            color: white;
            font: normal 500 33px "KaiTi_GB2312";
            text-align: center;
            padding-top: 2px;
        }
        .left h2 {
            color: white;
            font: normal 500 20px "KaiTi_GB2312";
            text-align: center;
            border-top: white 2px solid;
            margin-top: 10px;
            padding-top: 6px;
        }
        .right h1 {
            text-align: center;
            border-top: #48745b 4px solid;
            border-left: #48745b 4px solid;
            border-right: #48745b 4px solid;
            border-radius: 4px;
            background-color: white;
            font: normal 600 18px "KaiTi_GB2312";
            color: #48745b;
            margin-top: 11px;
            margin-bottom: 11px;
        }
        .right h3 {
            text-align: center;
            border: #48745b 4px solid;
            border-radius: 4px;
            background-color: white;
            font: normal 600 18px "KaiTi_GB2312";
            color: #48745b;
        }
        /* container that holds the news items */
        .content div {
            /* border: 4px solid #48745b; border-radius: 3px; (optional border style) */
            margin: 5px 2px 0 2px;
            padding: 10px 0 10px 10px;
        }
        .content p {
            margin-top: 5px;
            font: normal 500 18px "KaiTi_GB2312";
        }
    </style>
</head>
<body>
<div class="main">
    <h1>每日简报</h1>
    <!-- image link -->
    <a href="https://www.bing.com/search?q=%E6%82%89%E5%B0%BC%E5%A5%A5%E6%9E%97%E5%8C%B9%E5%85%8B%E5%85%AC%E5%9B%AD&form=hpcapt&mkt=zh-cn" target="_blank">
        <img src="https://www.bing.com/th?id=OHR.BarcelonaPop_ZH-CN3687855585_1920x1080.jpg&rf=LaDigue_1920x1080.jpg&pid=hp" alt="Bing每日一图" width="496">
    </a>
    <p class="img_des">悉尼奥林匹克公园里的湾标瞭望台,澳大利亚 (© ai_yoshi/Getty Images)</p>
    <div>
        <div class="header">
            <div class="big_line"></div>
            <div class="small_line"></div>
        </div>
        <div class="body">
            <span class="left">
                <h1>每日早报</h1>
                <h2>NEWS TODAY</h2>
            </span>
            <span class="right">
                <h1><!-- date goes here -->2022年7月12日</h1>
                <h3><!-- day of the week goes here -->星期二</h3>
            </span>
        </div>
        <div class="content">
            <div></div>
        </div>
    </div>
</div>
</body>
</html>
III. Fetching the Data
1. Fetching the Image
Call the Bing image-of-the-day API to get the picture.
#!/usr/bin/python3
# -*- coding: UTF-8 -*-
"""Fetch the Bing image of the day."""
__author__ = "A.L.Kun"
__file__ = "getImg.py"
__time__ = "2022/7/12 11:20"

from requests import get
from fake_useragent import UserAgent


def getResp():
    url = "https://cn.bing.com/HPImageArchive.aspx"  # Bing image-of-the-day endpoint
    resp = get(url, headers={  # send the request
        "user-agent": UserAgent().random,
    }, params={
        "format": "js",  # ask for JSON
        "idx": 1,  # offset in days: 1 is the previous day's image
        "n": 1  # number of images to return
    }, timeout=10)  # do not hang forever if the server is slow
    resp.raise_for_status()  # fail loudly on an HTTP error
    return resp.json()


def main1():
    src = getResp()
    url_img = "https://www.bing.com" + src["images"][0]["url"]
    title = src["images"][0]["copyright"]
    url_title = src["images"][0]["copyrightlink"]
    return {
        "url_img": url_img,
        "title": title,
        "url_title": url_title
    }  # the image data used by the template


if __name__ == '__main__':
    print(main1())
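To make the shape of the returned data easier to see without touching the network, here is a minimal sketch that applies the same field extraction to a hand-written payload. The helper name `parse_bing_image` and every value in `sample` are made up for illustration; only the keys `images`, `url`, `copyright`, and `copyrightlink` come from the code above.

```python
def parse_bing_image(payload, host="https://www.bing.com"):
    """Pick the fields the template needs out of the API response."""
    first = payload["images"][0]
    return {
        "url_img": host + first["url"],  # the API returns a site-relative path
        "title": first["copyright"],
        "url_title": first["copyrightlink"],
    }


# Invented sample payload, shaped like the fields read in main1().
sample = {
    "images": [
        {
            "url": "/th?id=OHR.Example_1920x1080.jpg",
            "copyright": "Example caption (© Someone)",
            "copyrightlink": "https://www.bing.com/search?q=example",
        }
    ]
}
img = parse_bing_image(sample)
print(img["url_img"])
```

Keeping the parsing separate from the request makes it possible to test the extraction offline.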
2. Fetching the News
Call the news API to get the day's headlines.
#!/usr/bin/python3
# -*- coding: UTF-8 -*-
"""Fetch the daily news."""
__author__ = "A.L.Kun"
__file__ = "getInfo.py"
__time__ = "2022/7/12 10:08"

from requests import get
from fake_useragent import UserAgent


def get_resp():
    url = "https://news.topurl.cn/api"  # news API endpoint
    resp = get(url, headers={
        "user-agent": UserAgent().random
    }, params={
        "count": 20,  # ask for 20 news items
    }, timeout=10)  # do not hang forever if the server is slow
    resp.raise_for_status()  # fail loudly on an HTTP error
    resp.encoding = resp.apparent_encoding  # use the detected encoding
    return resp.json()  # return the JSON data


def main2():
    data_ = get_resp()
    temp = []
    for i in data_["data"]['newsList']:  # keep only the title of each item
        temp.append({
            "content": i["title"],
        })
    return temp


if __name__ == '__main__':
    print(main2())
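The cleaning step can also be written so that it tolerates a missing key or an empty title. The following is a sketch of that idea, not the project's code: `extract_titles` is a hypothetical helper, and the sample payload is invented to match the `data` / `newsList` / `title` keys read above.

```python
def extract_titles(payload, limit=20):
    """Return at most `limit` news items as {"content": title} dicts."""
    news_list = payload.get("data", {}).get("newsList", [])
    cleaned = []
    for item in news_list[:limit]:
        title = (item.get("title") or "").strip()
        if title:  # skip items without a usable title
            cleaned.append({"content": title})
    return cleaned


# Invented sample payload for illustration.
sample = {
    "data": {
        "newsList": [
            {"title": " First headline "},
            {"title": ""},
            {"title": "Second headline"},
        ]
    }
}
items = extract_titles(sample)
print(items)
```

With this version an unexpected response produces an empty list instead of a `KeyError`, so the caller can decide whether to skip the day's email.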
3. Building the Page
#!/usr/bin/python3
# -*- coding: UTF-8 -*-
"""Generate the HTML page."""
__author__ = "A.L.Kun"
__file__ = "GenHtml.py"
__time__ = "2022/7/12 14:21"

from getImg import main1
from getInfo import main2  # the two data sources
import datetime  # date handling
from pyquery import PyQuery  # used to modify the HTML template


def get_timer():
    """Return today's date and day of the week as display strings."""
    # strftime("%w") yields 0 for Sunday, so the list starts with Sunday
    week = ["星期天", "星期一", "星期二", "星期三", "星期四", "星期五", "星期六"]
    now = datetime.datetime.now()
    time_ = now.strftime("%Y年%m月%d日")
    week_ = week[int(now.strftime("%w"))]
    return time_, week_


# These run at import time, because main.py imports time_ from this module
img = main1()
data = main2()
time_, week_ = get_timer()


def main3():
    html = PyQuery(filename="./templates/index.html")
    # date block
    html(".right h1").text(time_)  # set the date
    html(".right h3").text(week_)  # set the day of the week
    # image block
    html(".img_des").text(img["title"])  # image caption
    html(".main a").attr("href", img["url_title"])  # link behind the image
    html(".main a img").attr("src", img["url_img"])  # image source
    # news list
    cont = html(".main .content div")
    for index, item in enumerate(data):
        str_ = f"{index + 1}. {item['content']}"
        p = " <p>%s</p>" % str_
        cont.append(p)
    return html.outer_html()  # return the finished page as a string


if __name__ == '__main__':
    print(main3())
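Two details of this step can be shown in isolation: the date and weekday strings, and the numbered `<p>` lines. The sketch below uses only the standard library. `format_date` and `render_items` are hypothetical helpers of mine, and escaping the titles with `html.escape` is an addition the original code does not make; it is there so that a headline containing `<` or `&` cannot break the markup.

```python
import datetime
from html import escape

WEEK = ["星期天", "星期一", "星期二", "星期三", "星期四", "星期五", "星期六"]


def format_date(day):
    """Return (date string, weekday string) for a datetime.date."""
    # isoweekday() is 1..7 for Monday..Sunday; % 7 maps Sunday to index 0,
    # which matches the ordering used with strftime("%w").
    date_text = f"{day.year}年{day.month:02d}月{day.day:02d}日"
    return date_text, WEEK[day.isoweekday() % 7]


def render_items(items):
    """Turn the cleaned news items into numbered <p> elements."""
    return [
        f"<p>{number}. {escape(item['content'])}</p>"
        for number, item in enumerate(items, start=1)
    ]


date_text, week_text = format_date(datetime.date(2022, 7, 12))
lines = render_items([{"content": "A <b> & B"}, {"content": "Plain title"}])
print(date_text, week_text)
print(lines)
```

The strings returned by `render_items` could be passed to `cont.append(...)` in place of the hand-built `<p>` string.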
IV. Sending the Email
#!/usr/bin/python3
# -*- coding: UTF-8 -*-
"""Send the generated page to every recipient by email."""
__author__ = "A.L.Kun"
__file__ = "main.py"
__time__ = "2022/7/12 15:12"

import sys
import json
from functools import wraps
from smtplib import SMTP
from email.mime.text import MIMEText  # builds the text/HTML message body
from email.header import Header  # encodes non-ASCII header values such as the subject
from email.utils import formataddr  # formats a (name, address) pair

from GenHtml import time_, week_, main3

with open("settings.json", "r", encoding="utf-8") as f:  # closed automatically
    info = json.load(f)
if info["status"] != 200:
    print("Failed to read the JSON settings!")
    sys.exit(1)  # stop if the settings are not usable

subject = f'{time_}新闻(建议使用电脑查看)'  # subject: "<date> news (best viewed on a computer)"
html = main3()


def decorate(fun_):
    username = 'liu.zhong.kun@foxmail.com'  # sender's address
    password_pass = 'sadfadsg'  # authorization code; some providers use the login password instead

    @wraps(fun_)
    def func_mail(*args, **kwargs):
        # Connect when the function is called, not when it is decorated
        smtp = SMTP('smtp.qq.com', 587)  # QQ Mail's SMTP server
        try:
            smtp.starttls()  # switch the connection to TLS
            smtp.login(username, password_pass)  # log in
            fun_(smtp, username, *args, **kwargs)  # call the function that sends the mail
        finally:
            smtp.quit()  # end the session and close the connection
    return func_mail


@decorate
def mail(smtp, username):
    for receiver_ in info["data"]:
        msgRoot = MIMEText(html, "html", "utf-8")  # the HTML page is the message body
        msgRoot["Subject"] = Header(subject, "utf-8")  # subject line
        msgRoot['From'] = formataddr(("A.L.Kun", username))  # sender
        msgRoot['To'] = formataddr((receiver_["name"], receiver_["email"]))  # recipient
        smtp.sendmail(username, receiver_["email"], msgRoot.as_string())  # send the mail
        print(receiver_["email"], ': sent')


if __name__ == '__main__':
    mail()
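The message construction can be tried on its own, without logging in to any server. This is a minimal sketch using the same `MIMEText`, `Header`, and `formataddr` calls as above; `build_message` is a hypothetical helper, and the `example.com` addresses are placeholders.

```python
from email.header import Header
from email.mime.text import MIMEText
from email.utils import formataddr


def build_message(html, subject, sender, receiver):
    """Build one HTML email; sender and receiver are (name, address) pairs."""
    msg = MIMEText(html, "html", "utf-8")
    msg["Subject"] = Header(subject, "utf-8")  # encodes the non-ASCII subject
    msg["From"] = formataddr(sender)
    msg["To"] = formataddr(receiver)
    return msg


msg = build_message(
    "<p>hello</p>",
    "2022年07月12日新闻",
    ("A.L.Kun", "sender@example.com"),      # placeholder address
    ("Reader", "reader@example.com"),        # placeholder address
)
print(msg["To"])
print(msg.get_content_type())
```

Printing `msg.as_string()` shows exactly what would be handed to `smtp.sendmail`, which is a convenient way to check the headers before sending anything. In a real deployment the authorization code is better read from an environment variable or the settings file than written into the script.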
Full code: https://github.com/liuzhongkun1/spider_/tree/master/autosend
This article is from cnblogs (博客园), written by Kenny_LZK. When reposting, please credit the original link: https://www.cnblogs.com/liuzhongkun/p/16470843.html