Python .Net Morning Briefing

  • Source: 365资讯简报 ("365 News Briefing" WeChat official account)

Python demo

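'''
Morning briefing scraper
Briefing page: https://www.163.com/dy/media/T1603594732083.html
'''
import requests
from lxml import etree


def main():
    url = "https://www.163.com/dy/media/T1603594732083.html"
    # Add request headers so the request looks like a normal browser visit
    headers = {
        "Host": "www.163.com",
        "User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.190 Safari/537.36',
    }
    # Step 1: fetch the account's article list and take the newest briefing link
    rsp = requests.get(url, headers=headers, timeout=10)
    rsp.raise_for_status()
    html = etree.HTML(rsp.text)
    today_url = html.xpath("//div[@class='tab_content']//div/h4//@href")[0]
    # Step 2: fetch that article and collect every text node of its second paragraph
    rsp = requests.get(today_url, headers=headers, timeout=10)
    rsp.raise_for_status()
    html = etree.HTML(rsp.text)
    news_list = html.xpath("//div[@class='post_body']/p[2]//text()")
    # Skip the first text node (presumably the briefing's header line)
    news_list = news_list[1:]
    for news in news_list:
        # Remove the WeChat account tag from each line before printing
        print(news.replace("[公众号:365资讯简报]", ""))


if __name__ == "__main__":
    main()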
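
The two XPath steps in `main()` can be tried offline before hitting the live site. The sketch below is an approximation under stated assumptions: it swaps lxml for the standard library's `xml.etree.ElementTree`, whose limited XPath subset can express the element-selecting parts of both queries (the `@href` and `text()` steps become `.get("href")` and `itertext()`). It runs against hypothetical, well-formed snippets; `LIST_PAGE`, `ARTICLE_PAGE`, the `EXAMPLE1`/`EXAMPLE2` URLs and the helper names `first_article_url` and `briefing_lines` are all made up for illustration. Real 163.com pages are not valid XML, so the live script still needs lxml.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

# Hypothetical, well-formed stand-ins for the two pages. The real 163.com
# markup is only assumed to have this shape and is not valid XML.
LIST_PAGE = """
<html><body>
  <div class="tab_content">
    <div><h4><a href="https://www.163.com/dy/article/EXAMPLE1.html">Briefing 1</a></h4></div>
    <div><h4><a href="https://www.163.com/dy/article/EXAMPLE2.html">Briefing 2</a></h4></div>
  </div>
</body></html>
"""

ARTICLE_PAGE = """
<html><body>
  <div class="post_body">
    <p>Cover image</p>
    <p>Header line<br/>1. First item<br/>2. Second item[公众号:365资讯简报]</p>
  </div>
</body></html>
"""

WATERMARK = "[公众号:365资讯简报]"


def first_article_url(list_html: str) -> Optional[str]:
    """Approximates //div[@class='tab_content']//div/h4//@href (hypothetical helper)."""
    root = ET.fromstring(list_html.strip())
    # ElementTree cannot select attributes directly, so find the <a> and read href
    link = root.find(".//div[@class='tab_content']//div/h4//a[@href]")
    return None if link is None else link.get("href")


def briefing_lines(article_html: str) -> List[str]:
    """Approximates //div[@class='post_body']/p[2]//text() (hypothetical helper)."""
    root = ET.fromstring(article_html.strip())
    # [2] is a 1-based position among sibling <p> elements, as in XPath
    para = root.find(".//div[@class='post_body']/p[2]")
    if para is None:
        return []
    # itertext() yields each text and tail string under <p> in document order
    texts = list(para.itertext())[1:]  # skip the first node, as the demo does
    cleaned = (t.replace(WATERMARK, "").strip() for t in texts)
    return [t for t in cleaned if t]  # drop whitespace-only strings


if __name__ == "__main__":
    print(first_article_url(LIST_PAGE))
    for line in briefing_lines(ARTICLE_PAGE):
        print(line)
```

Unlike the loop in `main()`, this sketch also discards whitespace-only strings, which the original would print as blank lines; drop the final filter to match the demo exactly.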

.Net demo

using System;
using System.Linq;
using HtmlAgilityPack;
using RestSharp;

var client = new RestClient("https://www.163.com/dy/media/T1603594732083.html");

var request = new RestRequest();
request.AddHeader("Host", "www.163.com");
request.AddHeader("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.190 Safari/537.36");
var response = client.Get(request);
//Console.WriteLine(response.Content);
var doc = new HtmlDocument();
doc.LoadHtml(response.Content); // parse the HTML string
// Select the first briefing link; HtmlAgilityPack resolves an @href step to its
// owning element, so select the <a> itself and read the attribute from it
var todayNode = doc.DocumentNode.SelectSingleNode("//div[@class='tab_content']//div/h4//a[@href]");
var todayUrl = todayNode.GetAttributeValue("href", string.Empty);
// Request the detail page
client = new RestClient(todayUrl);
response = client.Get(request);
doc.LoadHtml(response.Content); // parse the HTML string
// Select all text nodes of the second paragraph and iterate over them
var textNodeList = doc.DocumentNode.SelectNodes("//div[@class='post_body']/p[2]//text()");
// Skip the first text node without mutating the DOM (RemoveAt relinks siblings)
foreach (var item in textNodeList.Skip(1))
{
    // InnerText keeps HTML entities such as &nbsp;, so decode them first
    Console.WriteLine(HtmlEntity.DeEntitize(item.InnerText).Replace("[公众号:365资讯简报]", ""));
}
posted @ 2021-11-24 09:11  Alex_Mercer