Python Web Crawler: Scraping NetEase Cloud Music Comments

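I. Background

  With the growth of the internet and social media, leaving comments online has become a common form of social interaction. In music and other creative fields, listeners use a platform's ratings and comments to express their tastes and opinions. Crawling, processing, and analyzing the comments on a music platform therefore helps analysts and platform operators understand listener feedback and needs in more depth, and can inform product strategy and improve user satisfaction. NetEase Cloud Music is a good subject for this kind of work: its comments are numerous and generally of high quality, so crawling, processing, and visualizing them has practical value.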

 

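II. Design of the Topic-Focused Web Crawler

1. Crawler name: NetEase Cloud Music comment crawler

2. URL: https://music.163.com/?from=wsdh#/song?id=2052441038

3. Technologies used:

  • requests: sends HTTP requests to the target site and receives the response data.
  • BeautifulSoup: parses HTML documents and extracts the required fields.
  • execjs: runs the JavaScript encryption routine from Python.
  • xlwt: writes the data to an Excel (.xls) workbook; xlrd and xlutils are used alongside it to reopen the workbook and append rows.
  • Matplotlib: draws the bar chart and the time-series line chart.
  • jieba: segments Chinese text into words for word-frequency counting.
  • wordcloud: generates the word cloud image.
  • os: works with files on the file system, here to check whether the workbook already exists.
  • Counter: counts word frequencies.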

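4. Steps for crawling NetEase Cloud Music comments:

  1) Use the requests library to send a request to the target address and receive the response data.

  2) Parse the response data to obtain the full list of comments.

  3) Iterate over the comment list and extract the fields of each comment: user ID, user name, comment text, and comment time.

  4) Build a dictionary for each record and write it to the output file, then draw the bar chart and the line chart to visualize the data.

  5) The code consists of several functions: confrim_form_data() builds and pre-processes the request parameters, show_image() generates the word cloud, and main() controls the overall flow.

  6) The crawler sends its requests to the comment API with the requests library and reads the required fields from the JSON response.

  7) The collected comments are cleaned by word segmentation, stop-word removal, and word-frequency counting, and the result is shown as a word cloud.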
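The request, parse, and iterate flow in steps 1) to 3) can be sketched without touching the network. The code below is only an illustration under assumptions: `fetch_page(page_no, cursor)` is a hypothetical callable that returns a dict shaped like the API response used later in this post (`data.comments`, `data.cursor`), and `crawl_comments` and `fake_fetch` are illustrative names, not part of the original program.

```python
def crawl_comments(fetch_page, max_pages):
    """Collect comments page by page, passing the cursor from one page to the next."""
    cursor = -1  # the spider in this post also starts with cursor = -1
    collected = []
    for page_no in range(1, max_pages + 1):
        payload = fetch_page(page_no, cursor)
        data = payload.get('data') or {}
        comments = data.get('comments') or []
        if not comments:
            break  # nothing more to read
        for item in comments:
            collected.append({
                'user_id': item['user']['userId'],
                'user_name': item['user']['nickname'],
                'content': item['content'],
                'time_ms': item['time'],
            })
        cursor = data.get('cursor', cursor)
    return collected


def fake_fetch(page_no, cursor):
    """Hypothetical stand-in for the HTTP call: two pages with one comment each."""
    if page_no > 2:
        return {'data': {'comments': [], 'cursor': cursor}}
    return {'data': {
        'comments': [{
            'user': {'userId': page_no, 'nickname': 'user{}'.format(page_no)},
            'content': 'comment {}'.format(page_no),
            'time': 1678233600000,
        }],
        'cursor': str(page_no),
    }}


rows = crawl_comments(fake_fetch, 5)
```

Because the fetcher is passed in as a function, the loop can be exercised with fake data first and later pointed at the real request function.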

III. Data Analysis Steps

1. Initialization

import os
import xlrd
import xlwt
import execjs
import requests
import time
import jieba
import random
from xlutils.copy import copy
from collections import Counter
from wordcloud import WordCloud
from matplotlib import image

class WyplSpider(object):
    def __init__(self):
        '''
            1. Initialization
        '''
        self.number = 1    # running number of the next comment to save
        self.cursor = -1   # pagination cursor, updated from every response
        self.song_id = r'2013972674'
        self.start_url = r'https://music.163.com/weapi/comment/resource/comments/get?csrf_token='

 

2. Reverse-engineering the JS encryption

    def Rervese_JS(self, index):
        '''
            2. JS reverse-engineering: encrypt the request parameters
        '''
        # Plain-text parameters for page `index` of the comment thread
        js_params = {
            "rid": "R_SO_4_{}".format(self.song_id),
            "threadId": "R_SO_4_{}".format(self.song_id),
            "pageNo": "{}".format(index),
            "pageSize": "20",
            "cursor": "{}".format(self.cursor),
            "offset": "0",
            "orderType": "1",
            "csrf_token": ""
        }
        # wyypl.js is the JS file listed at the end of this post
        with open(r'./wyypl.js', 'r', encoding='utf-8') as f:
            js_code = f.read()
        # print(js_code)
        # Call the Encrypt() function defined in the JS file
        Encrypt_Data = execjs.compile(js_code).call('Encrypt', js_params)
        # print(Encrypt_Data)
        return Encrypt_Data

3. Building the request parameters

    def confrim_form_data(self):
        '''
            3. Build the request parameters
        '''
        # Pages 1 to 10, 20 comments per page
        for index in range(1, 11):
            Encrypt_Data = self.Rervese_JS(index)
            form_data = {
                'params': Encrypt_Data['encText'],
                'encSecKey': Encrypt_Data['encSecKey'],
            }
            self.requests_start_url(form_data)
            # Pause 1 or 2 seconds between requests
            time.sleep(random.randint(1, 2))

4. Requesting the start URL and getting the response

    def requests_start_url(self, form_data):
        '''
            4. Request the start URL and get the response
        '''
        # content-length is left out: requests computes it from the form data
        headers = {
            'content-type': 'application/x-www-form-urlencoded',
            'origin': 'https://music.163.com',
            'referer': 'https://music.163.com/song?id={}'.format(self.song_id),
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36',
            # Replace with the Cookie header copied from your own browser session
            'cookie': '<your cookie>',
        }
        response_first = requests.post(self.start_url, data=form_data, headers=headers).json()
        # print(response_first)
        self.parse_response_first(response_first)
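`requests_start_url` passes the decoded JSON straight to the parser, so a blocked or malformed response only surfaces later as a `KeyError`. A minimal sketch of a defensive check follows; `extract_comments` is a hypothetical helper, and the only thing it assumes about the payload is the `data.comments` / `data.cursor` layout that the parser in this post already relies on.

```python
def extract_comments(payload):
    """Return (comments, cursor) from a decoded response, or raise ValueError."""
    if not isinstance(payload, dict):
        raise ValueError('response is not a JSON object')
    data = payload.get('data')
    if not isinstance(data, dict) or not isinstance(data.get('comments'), list):
        raise ValueError('response has no data.comments list: {!r}'.format(payload))
    return data['comments'], data.get('cursor')


comments, cursor = extract_comments(
    {'data': {'comments': [{'content': 'hi'}], 'cursor': '42'}}
)
```

Raising one explicit error at the boundary makes it easier to tell an expired cookie or a changed API apart from a bug in the parsing code.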

5. Parsing the response and extracting the comment fields

    def parse_response_first(self, response_first):
        '''
            5. Parse the response and extract the comment fields
        '''
        # A. Get the full list of comments and the cursor for the next page
        comment_infos = response_first['data']['comments']
        self.cursor = response_first['data']['cursor']
        print(self.cursor)
        for comment_info in comment_infos:
            # B. Extract the fields of one comment
            # 1. User ID (userId)
            user_id = comment_info['user']['userId']
            # 2. User name
            user_name = comment_info['user']['nickname']
            # 3. Comment text
            content = comment_info['content']
            # 4. Comment time: millisecond timestamp converted to a local date
            pub_time = comment_info['time']
            pub_time = time.strftime("%Y-%m-%d", time.localtime(pub_time / 1000))
            # print(user_id, user_name, content, pub_time, self.cursor)
            # C. Build the record to save, keyed by the sheet name
            data = {
                '数据': [self.number, user_id, user_name, content, pub_time]
            }
            # One comment per line, so that words are not joined across comments
            with open('评论文本.txt', 'a+', encoding='utf-8') as f:
                f.write(content + '\n')
            self.save_data(data, user_name)
            self.number += 1
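The `time` field is divided by 1000 before conversion, so it is a timestamp in milliseconds, and `time.localtime` interprets it in the time zone of the machine running the script. If a fixed zone is preferred, the conversion can be sketched with `datetime`. In this sketch `format_comment_time` is a hypothetical helper, and UTC+8 (Beijing time) is an assumption about the zone the reader wants.

```python
from datetime import datetime, timedelta, timezone

BEIJING = timezone(timedelta(hours=8))  # assumed target time zone


def format_comment_time(time_ms, fmt='%Y-%m-%d %H:%M:%S'):
    """Convert a millisecond Unix timestamp to a formatted Beijing-time string."""
    moment = datetime.fromtimestamp(time_ms / 1000, tz=BEIJING)
    return moment.strftime(fmt)


print(format_comment_time(1678233600000))  # midnight UTC on 2023-03-08
```

Keeping hours and minutes in the stored value, instead of the date only, also leaves enough detail for time-of-day charts later on.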

6. Saving the data to Excel

    def save_data(self, data, user_name):
        '''
            6. Save the data to Excel
        '''
        if not os.path.exists(r'./网易云音乐评论内容.xls'):
            # 1. Create the Excel workbook
            wb = xlwt.Workbook(encoding='utf-8')
            # 2. Create a new sheet
            sheet = wb.add_sheet('数据', cell_overwrite_ok=True)
            # 3. Set the border style
            borders = xlwt.Borders()
            borders.left = xlwt.Borders.THIN
            borders.right = xlwt.Borders.THIN
            borders.top = xlwt.Borders.THIN
            borders.bottom = xlwt.Borders.THIN
            borders.left_colour = 0x40
            borders.right_colour = 0x40
            borders.top_colour = 0x40
            borders.bottom_colour = 0x40
            style = xlwt.XFStyle()  # Create the style
            style.borders = borders  # Add the borders to the style
            # 4. Center the cell contents
            align = xlwt.Alignment()
            align.horz = 0x02  # horizontal centering
            align.vert = 0x01  # vertical centering
            style.alignment = align
            # 5. Write the header row, then save the workbook once
            header = ('序号', '用户ID', '用户名称', '评论内容', '评论时间')
            for i in range(0, len(header)):
                sheet.col(i).width = 2560 * 3
                #           row, column, value, style
                sheet.write(0, i, header[i], style)
            wb.save(r'./网易云音乐评论内容.xls')

 

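Visualization:

7. Bar chart of the number of comments per user

from collections import Counter

import matplotlib.pyplot as plt
import xlrd

# Count comments per user from the '用户名称' column (index 2), skipping the header row
sheet = xlrd.open_workbook(r'./网易云音乐评论内容.xls').sheet_by_name('数据')
user_count = Counter(sheet.col_values(2, 1))

# Assumes the Microsoft YaHei font is installed, so Chinese nicknames render
plt.rcParams['font.sans-serif'] = ['Microsoft YaHei']

plt.bar(list(user_count.keys()), list(user_count.values()))

plt.xlabel('Users')
plt.xticks(rotation=45)
plt.ylabel('Count')
plt.title('Number of Comments from Each User')

plt.show()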
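The bar chart is driven by a `user_count` mapping from user name to number of comments. A minimal sketch of deriving it from rows held in memory is shown below, assuming each row has the layout `[number, user_id, user_name, content, pub_time]` that the spider saves. `count_users` and the `top_n` cut-off are illustrative additions; the cut-off only keeps the x-axis readable when many users appear once.

```python
from collections import Counter


def count_users(rows, top_n=10):
    """Count comments per user name and keep the top_n most active users."""
    counts = Counter(row[2] for row in rows)  # index 2 is the user name
    return dict(counts.most_common(top_n))


# Made-up sample rows in the spider's layout
rows = [
    [1, 101, 'alice', 'nice song', '2023-03-08'],
    [2, 102, 'bob', 'on repeat', '2023-03-08'],
    [3, 101, 'alice', 'still great', '2023-03-09'],
]
user_count = count_users(rows)
```

The resulting dict can be passed to `plt.bar` as `list(user_count.keys())` and `list(user_count.values())`.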

 

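8. Counting the messages in each time slot and drawing a line chart

from collections import defaultdict
from datetime import datetime, timedelta

import matplotlib.pyplot as plt

# chat_list is assumed to be a list of BeautifulSoup tags, each containing a
# <span class="time"> element; it is not produced by the spider in this post.
time_count = defaultdict(int)

start_time = datetime(2023, 6, 9, 0, 0, 0)  # start of the period to count
for chat in chat_list:
    time_str = chat.find('span', {'class': 'time'}).text

    # Parse the timestamp and add 8 hours to convert UTC to Beijing time
    chat_time = datetime.strptime(time_str, '%Y-%m-%d %H:%M:%S') + timedelta(hours=8)

    if chat_time >= start_time:
        time_count[chat_time.strftime('%H:%M')] += 1  # one bucket per hour:minute

# Sort the buckets so that the x-axis runs in time order
x = sorted(time_count)
y = [time_count[key] for key in x]

plt.plot(x, y)

plt.xlabel('Time')
plt.xticks(rotation=45)
plt.ylabel('Count')
plt.title('Number of Chats by Time')

plt.show()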
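The same kind of per-slot count can also be computed from the millisecond `time` values returned by the comment API instead of from parsed HTML. The sketch below is an illustration under assumptions: `count_by_hour` is a hypothetical helper, timestamps are interpreted as UTC+8, and the slots are one hour wide. Sorting the labels keeps the x-axis chronological.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

BEIJING = timezone(timedelta(hours=8))  # assumed time zone


def count_by_hour(timestamps_ms):
    """Return (labels, counts) with one bucket per hour, in chronological order."""
    buckets = Counter(
        datetime.fromtimestamp(ts / 1000, tz=BEIJING).strftime('%Y-%m-%d %H:00')
        for ts in timestamps_ms
    )
    labels = sorted(buckets)
    return labels, [buckets[label] for label in labels]


base = 1678233600000  # 2023-03-08 08:00:00 in UTC+8
labels, counts = count_by_hour([base + 3600 * 1000, base, base + 30 * 60 * 1000])
```

`labels` and `counts` can then be handed to `plt.plot(labels, counts)` in place of the `x` and `y` lists.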

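9. Checking for the workbook and appending the row

This block is the second half of save_data() from step 6: once the workbook exists, it reopens the file and appends the current record after the last row.

        # Check whether the workbook exists
        if os.path.exists(r'./网易云音乐评论内容.xls'):
            # Open the workbook
            wb = xlrd.open_workbook(r'./网易云音乐评论内容.xls')
            # Get the names of all sheets in the workbook
            sheets = wb.sheet_names()
            for i in range(len(sheets)):
                for name in data.keys():
                    worksheet = wb.sheet_by_name(sheets[i])
                    # Compare the sheet name with the key of the record
                    if worksheet.name == name:
                        # Number of rows already in the sheet
                        rows_old = worksheet.nrows
                        # Convert the xlrd workbook into a writable xlwt copy
                        new_workbook = copy(wb)
                        # Get the i-th sheet of the writable copy
                        new_worksheet = new_workbook.get_sheet(i)
                        for num in range(0, len(data[name])):
                            new_worksheet.write(rows_old, num, data[name][num])
                        new_workbook.save(r'./网易云音乐评论内容.xls')
        print(r'*** Saving NetEase Cloud Music comment #{}: {}'.format(self.number, user_name))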
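The create-then-append pattern above reopens, copies, and rewrites the whole `.xls` file for every comment. As a plainly different technique, the same pattern can be sketched with the standard-library `csv` module, which appends a row without rereading the file. `append_row` and the file path are hypothetical; the header matches the one used for the Excel sheet.

```python
import csv
import os

HEADER = ('序号', '用户ID', '用户名称', '评论内容', '评论时间')


def append_row(path, row):
    """Append one record to a CSV file, writing the header first if the file is new."""
    is_new = not os.path.exists(path)
    # newline='' lets the csv module control line endings itself
    with open(path, 'a', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(HEADER)
        writer.writerow(row)
```

A call such as `append_row('comments.csv', [1, 101, 'alice', 'nice song', '2023-03-08'])` would create the file with the header on the first call and add one line per later call.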

 

 

10. Reading the text, counting word frequencies, and drawing the word cloud

    def show_image(self):
        # Word cloud
        # Read the raw comment text
        with open('评论文本.txt', 'r', encoding='utf-8') as f:
            data = f.read()

        # Segment the text with jieba and drop whitespace-only tokens
        source_list = [i for i in jieba.cut(data) if i.strip()]
        # print(source_list)

        # Read the stop-word list
        stop_words = set()
        with open('stop_words.txt', 'r', encoding='utf-8') as f:
            for line in f:
                stop_words.add(line.strip().lower())
        # print(stop_words)

        # Remove stop words (case-insensitive)
        result_words = [word for word in source_list if word.lower() not in stop_words]
        # print(result_words)

        # Count word frequencies and keep the 100 most common words
        word_count = Counter(result_words)
        # print(word_count)
        top_words = word_count.most_common(100)
        # print(top_words)
        # for w, c in top_words:
        #     print(w, c)

        # Draw the word cloud
        # A. Load the mask image
        mask_pic = image.imread('star.jpg')
        # B. Configure the word cloud style
        wd = WordCloud(
            font_path="msyh.ttc",
            background_color="white",
            scale=4,
            mask=mask_pic,
            max_words=100,
            contour_width=1,
            contour_color='steelblue',
        )
        # C. Build the cloud from the filtered word frequencies
        wd.generate_from_frequencies(dict(top_words))
        wd.to_file('词云图.png')
        print('=================== Word cloud created ================')

    def main(self):
        '''
            Control flow
        '''
        self.confrim_form_data()
        self.show_image()


if __name__ == '__main__':
    wypl = WyplSpider()
    wypl.main()
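The filtering and counting steps of `show_image` can be tried on a small, already-tokenized list without jieba or wordcloud installed. In this sketch `top_word_frequencies` is a hypothetical helper, and the token and stop-word lists are made-up sample data.

```python
from collections import Counter


def top_word_frequencies(tokens, stop_words, limit=100):
    """Drop blank tokens and stop words (case-insensitively), then count the rest."""
    stop = {word.strip().lower() for word in stop_words}
    kept = [t for t in tokens if t.strip() and t.lower() not in stop]
    return dict(Counter(kept).most_common(limit))


tokens = ['好听', '的', '歌', ' ', '好听', '\n', 'The', 'song']
freqs = top_word_frequencies(tokens, ['的', 'the'])
```

The returned dict maps each word to its count, which is the shape that `WordCloud.generate_from_frequencies` takes.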

 

11. Complete program source code

import os
import xlrd
import xlwt
import execjs
import requests
import time
import jieba
import random
import matplotlib.pyplot as plt
from datetime import datetime, timedelta
from xlutils.copy import copy
from collections import Counter, defaultdict
from wordcloud import WordCloud
from matplotlib import image

class WyplSpider(object):
    def __init__(self):
        '''
            1. Initialization
        '''
        self.number = 1    # running number of the next comment to save
        self.cursor = -1   # pagination cursor, updated from every response
        self.song_id = r'2013972674'
        self.start_url = r'https://music.163.com/weapi/comment/resource/comments/get?csrf_token='

    def Rervese_JS(self, index):
        '''
            2. JS reverse-engineering: encrypt the request parameters
        '''
        js_params = {
            "rid": "R_SO_4_{}".format(self.song_id),
            "threadId": "R_SO_4_{}".format(self.song_id),
            "pageNo": "{}".format(index),
            "pageSize": "20",
            "cursor": "{}".format(self.cursor),
            "offset": "0",
            "orderType": "1",
            "csrf_token": ""
        }
        with open(r'./wyypl.js', 'r', encoding='utf-8') as f:
            js_code = f.read()
        # Call the Encrypt() function defined in the JS file
        Encrypt_Data = execjs.compile(js_code).call('Encrypt', js_params)
        return Encrypt_Data

    def confrim_form_data(self):
        '''
            3. Build the request parameters
        '''
        # Pages 1 to 10, 20 comments per page
        for index in range(1, 11):
            Encrypt_Data = self.Rervese_JS(index)
            form_data = {
                'params': Encrypt_Data['encText'],
                'encSecKey': Encrypt_Data['encSecKey'],
            }
            self.requests_start_url(form_data)
            time.sleep(random.randint(1, 2))

    def requests_start_url(self, form_data):
        '''
            4. Request the start URL and get the response
        '''
        # content-length is left out: requests computes it from the form data
        headers = {
            'content-type': 'application/x-www-form-urlencoded',
            'origin': 'https://music.163.com',
            'referer': 'https://music.163.com/song?id={}'.format(self.song_id),
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36',
            # Replace with the Cookie header copied from your own browser session
            'cookie': '<your cookie>',
        }
        response_first = requests.post(self.start_url, data=form_data, headers=headers).json()
        self.parse_response_first(response_first)

    def parse_response_first(self, response_first):
        '''
            5. Parse the response and extract the comment fields
        '''
        # A. Get the full list of comments and the cursor for the next page
        comment_infos = response_first['data']['comments']
        self.cursor = response_first['data']['cursor']
        print(self.cursor)
        for comment_info in comment_infos:
            # B. Extract the fields of one comment
            user_id = comment_info['user']['userId']
            user_name = comment_info['user']['nickname']
            content = comment_info['content']
            # Millisecond timestamp converted to a local date
            pub_time = comment_info['time']
            pub_time = time.strftime("%Y-%m-%d", time.localtime(pub_time / 1000))
            # C. Build the record to save, keyed by the sheet name
            data = {
                '数据': [self.number, user_id, user_name, content, pub_time]
            }
            # One comment per line, so that words are not joined across comments
            with open('评论文本.txt', 'a+', encoding='utf-8') as f:
                f.write(content + '\n')
            self.save_data(data, user_name)
            self.number += 1

    def save_data(self, data, user_name):
        '''
            6. Save the data to Excel
        '''
        if not os.path.exists(r'./网易云音乐评论内容.xls'):
            # 1. Create the Excel workbook
            wb = xlwt.Workbook(encoding='utf-8')
            # 2. Create a new sheet
            sheet = wb.add_sheet('数据', cell_overwrite_ok=True)
            # 3. Set the border style
            borders = xlwt.Borders()
            borders.left = xlwt.Borders.THIN
            borders.right = xlwt.Borders.THIN
            borders.top = xlwt.Borders.THIN
            borders.bottom = xlwt.Borders.THIN
            borders.left_colour = 0x40
            borders.right_colour = 0x40
            borders.top_colour = 0x40
            borders.bottom_colour = 0x40
            style = xlwt.XFStyle()  # Create the style
            style.borders = borders  # Add the borders to the style
            # 4. Center the cell contents
            align = xlwt.Alignment()
            align.horz = 0x02  # horizontal centering
            align.vert = 0x01  # vertical centering
            style.alignment = align
            # 5. Write the header row, then save the workbook once
            header = ('序号', '用户ID', '用户名称', '评论内容', '评论时间')
            for i in range(0, len(header)):
                sheet.col(i).width = 2560 * 3
                #           row, column, value, style
                sheet.write(0, i, header[i], style)
            wb.save(r'./网易云音乐评论内容.xls')

        # Check whether the workbook exists
        if os.path.exists(r'./网易云音乐评论内容.xls'):
            # Open the workbook
            wb = xlrd.open_workbook(r'./网易云音乐评论内容.xls')
            # Get the names of all sheets in the workbook
            sheets = wb.sheet_names()
            for i in range(len(sheets)):
                for name in data.keys():
                    worksheet = wb.sheet_by_name(sheets[i])
                    # Compare the sheet name with the key of the record
                    if worksheet.name == name:
                        # Number of rows already in the sheet
                        rows_old = worksheet.nrows
                        # Convert the xlrd workbook into a writable xlwt copy
                        new_workbook = copy(wb)
                        # Get the i-th sheet of the writable copy
                        new_worksheet = new_workbook.get_sheet(i)
                        for num in range(0, len(data[name])):
                            new_worksheet.write(rows_old, num, data[name][num])
                        new_workbook.save(r'./网易云音乐评论内容.xls')
        print(r'*** Saving NetEase Cloud Music comment #{}: {}'.format(self.number, user_name))

    def show_image(self):
        # Word cloud
        # Read the raw comment text
        with open('评论文本.txt', 'r', encoding='utf-8') as f:
            data = f.read()

        # Segment the text with jieba and drop whitespace-only tokens
        source_list = [i for i in jieba.cut(data) if i.strip()]

        # Read the stop-word list
        stop_words = set()
        with open('stop_words.txt', 'r', encoding='utf-8') as f:
            for line in f:
                stop_words.add(line.strip().lower())

        # Remove stop words (case-insensitive)
        result_words = [word for word in source_list if word.lower() not in stop_words]

        # Count word frequencies and keep the 100 most common words
        word_count = Counter(result_words)
        top_words = word_count.most_common(100)

        # Draw the word cloud
        # A. Load the mask image
        mask_pic = image.imread('star.jpg')
        # B. Configure the word cloud style
        wd = WordCloud(
            font_path="msyh.ttc",
            background_color="white",
            scale=4,
            mask=mask_pic,
            max_words=100,
            contour_width=1,
            contour_color='steelblue',
        )
        # C. Build the cloud from the filtered word frequencies
        wd.generate_from_frequencies(dict(top_words))
        wd.to_file('词云图.png')
        print('=================== Word cloud created ================')

    def main(self):
        '''
            Control flow
        '''
        self.confrim_form_data()
        self.show_image()


# Bar chart of the number of comments per user
def plot_user_counts():
    # Count comments per user from the '用户名称' column (index 2), skipping the header row
    sheet = xlrd.open_workbook(r'./网易云音乐评论内容.xls').sheet_by_name('数据')
    user_count = Counter(sheet.col_values(2, 1))

    # Assumes the Microsoft YaHei font is installed, so Chinese nicknames render
    plt.rcParams['font.sans-serif'] = ['Microsoft YaHei']

    plt.bar(list(user_count.keys()), list(user_count.values()))

    plt.xlabel('Users')
    plt.xticks(rotation=45)
    plt.ylabel('Count')
    plt.title('Number of Comments from Each User')

    plt.show()


# Line chart of the number of messages per time slot
def plot_time_counts(chat_list):
    # chat_list is assumed to be a list of BeautifulSoup tags, each containing a
    # <span class="time"> element; it is not produced by the spider above.
    time_count = defaultdict(int)

    start_time = datetime(2023, 6, 9, 0, 0, 0)  # start of the period to count
    for chat in chat_list:
        time_str = chat.find('span', {'class': 'time'}).text

        # Parse the timestamp and add 8 hours to convert UTC to Beijing time
        chat_time = datetime.strptime(time_str, '%Y-%m-%d %H:%M:%S') + timedelta(hours=8)

        if chat_time >= start_time:
            time_count[chat_time.strftime('%H:%M')] += 1  # one bucket per hour:minute

    # Sort the buckets so that the x-axis runs in time order
    x = sorted(time_count)
    y = [time_count[key] for key in x]

    plt.plot(x, y)

    plt.xlabel('Time')
    plt.xticks(rotation=45)
    plt.ylabel('Count')
    plt.title('Number of Chats by Time')

    plt.show()


if __name__ == '__main__':
    wypl = WyplSpider()
    wypl.main()
    plot_user_counts()

 

 

 

JS file (wyypl.js):

var window = global;
var CryptoJS = require('crypto-js')

function RSAKeyPair(a, b, c) {
this.e = biFromHex(a),
this.d = biFromHex(b),
this.m = biFromHex(c),
this.chunkSize = 2 * biHighIndex(this.m),
this.radix = 16,
this.barrett = new BarrettMu(this.m)
}
function twoDigit(a) {
return (10 > a ? "0" : "") + String(a)
}
function encryptedString(a, b) {
for (var f, g, h, i, j, k, l, c = new Array, d = b.length, e = 0; d > e; )
c[e] = b.charCodeAt(e),
e++;
for (; 0 != c.length % a.chunkSize; )
c[e++] = 0;
for (f = c.length,
g = "",
e = 0; f > e; e += a.chunkSize) {
for (j = new BigInt,
h = 0,
i = e; i < e + a.chunkSize; ++h)
j.digits[h] = c[i++],
j.digits[h] += c[i++] << 8;
k = a.barrett.powMod(j, a.e),
l = 16 == a.radix ? biToHex(k) : biToString(k, a.radix),
g += l + " "
}
return g.substring(0, g.length - 1)
}
function decryptedString(a, b) {
var e, f, g, h, c = b.split(" "), d = "";
for (e = 0; e < c.length; ++e)
for (h = 16 == a.radix ? biFromHex(c[e]) : biFromString(c[e], a.radix),
g = a.barrett.powMod(h, a.d),
f = 0; f <= biHighIndex(g); ++f)
d += String.fromCharCode(255 & g.digits[f], g.digits[f] >> 8);
return 0 == d.charCodeAt(d.length - 1) && (d = d.substring(0, d.length - 1)),
d
}
function setMaxDigits(a) {
maxDigits = a,
ZERO_ARRAY = new Array(maxDigits);
for (var b = 0; b < ZERO_ARRAY.length; b++)
ZERO_ARRAY[b] = 0;
bigZero = new BigInt,
bigOne = new BigInt,
bigOne.digits[0] = 1
}
function BigInt(a) {
this.digits = "boolean" == typeof a && 1 == a ? null : ZERO_ARRAY.slice(0),
this.isNeg = !1
}
function biFromDecimal(a) {
for (var d, e, f, b = "-" == a.charAt(0), c = b ? 1 : 0; c < a.length && "0" == a.charAt(c); )
++c;
if (c == a.length)
d = new BigInt;
else {
for (e = a.length - c,
f = e % dpl10,
0 == f && (f = dpl10),
d = biFromNumber(Number(a.substr(c, f))),
c += f; c < a.length; )
d = biAdd(biMultiply(d, lr10), biFromNumber(Number(a.substr(c, dpl10)))),
c += dpl10;
d.isNeg = b
}
return d
}
function biCopy(a) {
var b = new BigInt(!0);
return b.digits = a.digits.slice(0),
b.isNeg = a.isNeg,
b
}
function biFromNumber(a) {
var c, b = new BigInt;
for (b.isNeg = 0 > a,
a = Math.abs(a),
c = 0; a > 0; )
b.digits[c++] = a & maxDigitVal,
a >>= biRadixBits;
return b
}
function reverseStr(a) {
var c, b = "";
for (c = a.length - 1; c > -1; --c)
b += a.charAt(c);
return b
}
function biToString(a, b) {
var d, e, c = new BigInt;
for (c.digits[0] = b,
d = biDivideModulo(a, c),
e = hexatrigesimalToChar[d[1].digits[0]]; 1 == biCompare(d[0], bigZero); )
d = biDivideModulo(d[0], c),
digit = d[1].digits[0],
e += hexatrigesimalToChar[d[1].digits[0]];
return (a.isNeg ? "-" : "") + reverseStr(e)
}
function biToDecimal(a) {
var c, d, b = new BigInt;
for (b.digits[0] = 10,
c = biDivideModulo(a, b),
d = String(c[1].digits[0]); 1 == biCompare(c[0], bigZero); )
c = biDivideModulo(c[0], b),
d += String(c[1].digits[0]);
return (a.isNeg ? "-" : "") + reverseStr(d)
}
function digitToHex(a) {
var b = 15
, c = "";
for (i = 0; 4 > i; ++i)
c += hexToChar[a & b],
a >>>= 4;
return reverseStr(c)
}
function biToHex(a) {
var d, b = "";
for (biHighIndex(a),
d = biHighIndex(a); d > -1; --d)
b += digitToHex(a.digits[d]);
return b
}
function charToHex(a) {
var h, b = 48, c = b + 9, d = 97, e = d + 25, f = 65, g = 90;
return h = a >= b && c >= a ? a - b : a >= f && g >= a ? 10 + a - f : a >= d && e >= a ? 10 + a - d : 0
}
function hexToDigit(a) {
var d, b = 0, c = Math.min(a.length, 4);
for (d = 0; c > d; ++d)
b <<= 4,
b |= charToHex(a.charCodeAt(d));
return b
}
function biFromHex(a) {
var d, e, b = new BigInt, c = a.length;
for (d = c,
e = 0; d > 0; d -= 4,
++e)
b.digits[e] = hexToDigit(a.substr(Math.max(d - 4, 0), Math.min(d, 4)));
return b
}
function biFromString(a, b) {
var g, h, i, j, c = "-" == a.charAt(0), d = c ? 1 : 0, e = new BigInt, f = new BigInt;
for (f.digits[0] = 1,
g = a.length - 1; g >= d; g--)
h = a.charCodeAt(g),
i = charToHex(h),
j = biMultiplyDigit(f, i),
e = biAdd(e, j),
f = biMultiplyDigit(f, b);
return e.isNeg = c,
e
}
function biDump(a) {
return (a.isNeg ? "-" : "") + a.digits.join(" ")
}
function biAdd(a, b) {
var c, d, e, f;
if (a.isNeg != b.isNeg)
b.isNeg = !b.isNeg,
c = biSubtract(a, b),
b.isNeg = !b.isNeg;
else {
for (c = new BigInt,
d = 0,
f = 0; f < a.digits.length; ++f)
e = a.digits[f] + b.digits[f] + d,
c.digits[f] = 65535 & e,
d = Number(e >= biRadix);
c.isNeg = a.isNeg
}
return c
}
function biSubtract(a, b) {
var c, d, e, f;
if (a.isNeg != b.isNeg)
b.isNeg = !b.isNeg,
c = biAdd(a, b),
b.isNeg = !b.isNeg;
else {
for (c = new BigInt,
e = 0,
f = 0; f < a.digits.length; ++f)
d = a.digits[f] - b.digits[f] + e,
c.digits[f] = 65535 & d,
c.digits[f] < 0 && (c.digits[f] += biRadix),
e = 0 - Number(0 > d);
if (-1 == e) {
for (e = 0,
f = 0; f < a.digits.length; ++f)
d = 0 - c.digits[f] + e,
c.digits[f] = 65535 & d,
c.digits[f] < 0 && (c.digits[f] += biRadix),
e = 0 - Number(0 > d);
c.isNeg = !a.isNeg
} else
c.isNeg = a.isNeg
}
return c
}
function biHighIndex(a) {
for (var b = a.digits.length - 1; b > 0 && 0 == a.digits[b]; )
--b;
return b
}
function biNumBits(a) {
var e, b = biHighIndex(a), c = a.digits[b], d = (b + 1) * bitsPerDigit;
for (e = d; e > d - bitsPerDigit && 0 == (32768 & c); --e)
c <<= 1;
return e
}
function biMultiply(a, b) {
var d, h, i, k, c = new BigInt, e = biHighIndex(a), f = biHighIndex(b);
for (k = 0; f >= k; ++k) {
for (d = 0,
i = k,
j = 0; e >= j; ++j,
++i)
h = c.digits[i] + a.digits[j] * b.digits[k] + d,
c.digits[i] = h & maxDigitVal,
d = h >>> biRadixBits;
c.digits[k + e + 1] = d
}
return c.isNeg = a.isNeg != b.isNeg,
c
}
function biMultiplyDigit(a, b) {
var c, d, e, f;
for (result = new BigInt,
c = biHighIndex(a),
d = 0,
f = 0; c >= f; ++f)
e = result.digits[f] + a.digits[f] * b + d,
result.digits[f] = e & maxDigitVal,
d = e >>> biRadixBits;
return result.digits[1 + c] = d,
result
}
function arrayCopy(a, b, c, d, e) {
var g, h, f = Math.min(b + e, a.length);
for (g = b,
h = d; f > g; ++g,
++h)
c[h] = a[g]
}
function biShiftLeft(a, b) {
var e, f, g, h, c = Math.floor(b / bitsPerDigit), d = new BigInt;
for (arrayCopy(a.digits, 0, d.digits, c, d.digits.length - c),
e = b % bitsPerDigit,
f = bitsPerDigit - e,
g = d.digits.length - 1,
h = g - 1; g > 0; --g,
--h)
d.digits[g] = d.digits[g] << e & maxDigitVal | (d.digits[h] & highBitMasks[e]) >>> f;
return d.digits[0] = d.digits[g] << e & maxDigitVal,
d.isNeg = a.isNeg,
d
}
function biShiftRight(a, b) {
var e, f, g, h, c = Math.floor(b / bitsPerDigit), d = new BigInt;
for (arrayCopy(a.digits, c, d.digits, 0, a.digits.length - c),
e = b % bitsPerDigit,
f = bitsPerDigit - e,
g = 0,
h = g + 1; g < d.digits.length - 1; ++g,
++h)
d.digits[g] = d.digits[g] >>> e | (d.digits[h] & lowBitMasks[e]) << f;
return d.digits[d.digits.length - 1] >>>= e,
d.isNeg = a.isNeg,
d
}
function biMultiplyByRadixPower(a, b) {
var c = new BigInt;
return arrayCopy(a.digits, 0, c.digits, b, c.digits.length - b),
c
}
function biDivideByRadixPower(a, b) {
var c = new BigInt;
return arrayCopy(a.digits, b, c.digits, 0, c.digits.length - b),
c
}
function biModuloByRadixPower(a, b) {
var c = new BigInt;
return arrayCopy(a.digits, 0, c.digits, 0, b),
c
}
function biCompare(a, b) {
if (a.isNeg != b.isNeg)
return 1 - 2 * Number(a.isNeg);
for (var c = a.digits.length - 1; c >= 0; --c)
if (a.digits[c] != b.digits[c])
return a.isNeg ? 1 - 2 * Number(a.digits[c] > b.digits[c]) : 1 - 2 * Number(a.digits[c] < b.digits[c]);
return 0
}
function biDivideModulo(a, b) {
var f, g, h, i, j, k, l, m, n, o, p, q, r, s, c = biNumBits(a), d = biNumBits(b), e = b.isNeg;
if (d > c)
return a.isNeg ? (f = biCopy(bigOne),
f.isNeg = !b.isNeg,
a.isNeg = !1,
b.isNeg = !1,
g = biSubtract(b, a),
a.isNeg = !0,
b.isNeg = e) : (f = new BigInt,
g = biCopy(a)),
new Array(f,g);
for (f = new BigInt,
g = a,
h = Math.ceil(d / bitsPerDigit) - 1,
i = 0; b.digits[h] < biHalfRadix; )
b = biShiftLeft(b, 1),
++i,
++d,
h = Math.ceil(d / bitsPerDigit) - 1;
for (g = biShiftLeft(g, i),
c += i,
j = Math.ceil(c / bitsPerDigit) - 1,
k = biMultiplyByRadixPower(b, j - h); -1 != biCompare(g, k); )
++f.digits[j - h],
g = biSubtract(g, k);
for (l = j; l > h; --l) {
for (m = l >= g.digits.length ? 0 : g.digits[l],
n = l - 1 >= g.digits.length ? 0 : g.digits[l - 1],
o = l - 2 >= g.digits.length ? 0 : g.digits[l - 2],
p = h >= b.digits.length ? 0 : b.digits[h],
q = h - 1 >= b.digits.length ? 0 : b.digits[h - 1],
f.digits[l - h - 1] = m == p ? maxDigitVal : Math.floor((m * biRadix + n) / p),
r = f.digits[l - h - 1] * (p * biRadix + q),
s = m * biRadixSquared + (n * biRadix + o); r > s; )
--f.digits[l - h - 1],
r = f.digits[l - h - 1] * (p * biRadix | q),
s = m * biRadix * biRadix + (n * biRadix + o);
k = biMultiplyByRadixPower(b, l - h - 1),
g = biSubtract(g, biMultiplyDigit(k, f.digits[l - h - 1])),
g.isNeg && (g = biAdd(g, k),
--f.digits[l - h - 1])
}
return g = biShiftRight(g, i),
f.isNeg = a.isNeg != e,
a.isNeg && (f = e ? biAdd(f, bigOne) : biSubtract(f, bigOne),
b = biShiftRight(b, i),
g = biSubtract(b, g)),
0 == g.digits[0] && 0 == biHighIndex(g) && (g.isNeg = !1),
new Array(f,g)
}
function biDivide(a, b) {
return biDivideModulo(a, b)[0]
}
function biModulo(a, b) {
return biDivideModulo(a, b)[1]
}
function biMultiplyMod(a, b, c) {
return biModulo(biMultiply(a, b), c)
}
function biPow(a, b) {
for (var c = bigOne, d = a; ; ) {
if (0 != (1 & b) && (c = biMultiply(c, d)),
b >>= 1,
0 == b)
break;
d = biMultiply(d, d)
}
return c
}
function biPowMod(a, b, c) {
for (var d = bigOne, e = a, f = b; ; ) {
if (0 != (1 & f.digits[0]) && (d = biMultiplyMod(d, e, c)),
f = biShiftRight(f, 1),
0 == f.digits[0] && 0 == biHighIndex(f))
break;
e = biMultiplyMod(e, e, c)
}
return d
}
function BarrettMu(a) {
this.modulus = biCopy(a),
this.k = biHighIndex(this.modulus) + 1;
var b = new BigInt;
b.digits[2 * this.k] = 1,
this.mu = biDivide(b, this.modulus),
this.bkplus1 = new BigInt,
this.bkplus1.digits[this.k + 1] = 1,
this.modulo = BarrettMu_modulo,
this.multiplyMod = BarrettMu_multiplyMod,
this.powMod = BarrettMu_powMod
}
function BarrettMu_modulo(a) {
var i, b = biDivideByRadixPower(a, this.k - 1), c = biMultiply(b, this.mu), d = biDivideByRadixPower(c, this.k + 1), e = biModuloByRadixPower(a, this.k + 1), f = biMultiply(d, this.modulus), g = biModuloByRadixPower(f, this.k + 1), h = biSubtract(e, g);
for (h.isNeg && (h = biAdd(h, this.bkplus1)),
i = biCompare(h, this.modulus) >= 0; i; )
h = biSubtract(h, this.modulus),
i = biCompare(h, this.modulus) >= 0;
return h
}
function BarrettMu_multiplyMod(a, b) {
var c = biMultiply(a, b);
return this.modulo(c)
}
function BarrettMu_powMod(a, b) {
var d, e, c = new BigInt;
for (c.digits[0] = 1,
d = a,
e = b; ; ) {
if (0 != (1 & e.digits[0]) && (c = this.multiplyMod(c, d)),
e = biShiftRight(e, 1),
0 == e.digits[0] && 0 == biHighIndex(e))
break;
d = this.multiplyMod(d, d)
}
return c
}

var maxDigits, ZERO_ARRAY, bigZero, bigOne, dpl10, lr10, hexatrigesimalToChar, hexToChar, highBitMasks, lowBitMasks, biRadixBase = 2, biRadixBits = 16, bitsPerDigit = biRadixBits, biRadix = 65536, biHalfRadix = biRadix >>> 1, biRadixSquared = biRadix * biRadix, maxDigitVal = biRadix - 1, maxInteger = 9999999999999998;
setMaxDigits(20),
dpl10 = 15,
lr10 = biFromNumber(1e15),
hexatrigesimalToChar = new Array("0","1","2","3","4","5","6","7","8","9","a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z"),
hexToChar = new Array("0","1","2","3","4","5","6","7","8","9","a","b","c","d","e","f"),
highBitMasks = new Array(0,32768,49152,57344,61440,63488,64512,65024,65280,65408,65472,65504,65520,65528,65532,65534,65535),
lowBitMasks = new Array(0,1,3,7,15,31,63,127,255,511,1023,2047,4095,8191,16383,32767,65535);

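// Request-signing helpers. The closure exposes two entry points on window:
// asrsea (used by Encrypt below) and ecnonasr.
!function() {
    // a(n): build a random n-character alphanumeric string (the one-off AES key).
    function a(a) {
        var d, e, b = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789", c = "";
        for (d = 0; a > d; d += 1)
            e = Math.random() * b.length,
            e = Math.floor(e),
            c += b.charAt(e);
        return c
    }
    // b(text, key): AES-CBC encrypt text with key and a fixed IV; returns the ciphertext as a string.
    function b(a, b) {
        var c = CryptoJS.enc.Utf8.parse(b)
          , d = CryptoJS.enc.Utf8.parse("0102030405060708")
          , e = CryptoJS.enc.Utf8.parse(a)
          , f = CryptoJS.AES.encrypt(e, c, {
            iv: d,
            mode: CryptoJS.mode.CBC
        });
        return f.toString()
    }
    // c(text, exponent, modulus): RSA-encrypt text with the given public key (hex strings).
    function c(a, b, c) {
        var d, e;
        return setMaxDigits(131),
        d = new RSAKeyPair(b,"",c),
        e = encryptedString(d, a)
    }
    // d(payload, exponent, modulus, presetKey): AES-encrypt the payload twice
    // (preset key first, then a random 16-character key), and RSA-encrypt the random key.
    function d(d, e, f, g) {
        var h = {}
          , i = a(16);
        return h.encText = b(d, g),
        h.encText = b(h.encText, i),
        h.encSecKey = c(i, e, f),
        h
    }
    // e(text, exponent, modulus, suffix): RSA-only variant; not called by Encrypt below.
    function e(a, b, d, e) {
        var f = {};
        return f.encText = c(a + e, b, d),
        f
    }
    window.asrsea = d,
    window.ecnonasr = e
}();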

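// Encrypt(i0x): JSON-serialize the request parameters and sign them with window.asrsea.
// Arguments to asrsea: payload, RSA public exponent (hex), RSA modulus (hex), preset AES key.
// Returns an object with the fields encText and encSecKey.
function Encrypt(i0x) {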
var TH5M = '00e0b509f6259df8642dbc35662901477df22677ec152b5ff68ace615bb7b725152b3ab17a876aea8a5aa76d2e417629ec4ee341f56135fccf695280104e0312ecbda92557c93870114af6c9d05c4f7f0c3685b7a46bee255932575cce10b424d813cfe4875d3e82047b97ddef52741d546b8e289dc6935b3ece0462db0a22b8e7'
var bMr1x = window.asrsea(JSON.stringify(i0x), '010001', TH5M, '0CoJUm6Qyw8W8jud');
return bMr1x
}

//console.log(Encrypt())
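
To make the signing flow above easier to follow, here is a minimal Python sketch of two of its pieces: the random key generator `a(16)` and the RSA step `c(...)`. The function names `random_key` and `rsa_encrypt_hex` are hypothetical and do not appear in the original code. The sketch assumes that `encryptedString` performs unpadded RSA with the first character of the text as the least significant byte, which is why the text is reversed before conversion. The AES-CBC step is left out because it needs a third-party library.

```python
import random
import string

# Same 62 characters, in the same order, as the alphabet in the JS function a().
ALPHABET = string.ascii_letters + string.digits


def random_key(length=16):
    """Return a random alphanumeric string, like a(16) in the JS listing."""
    return "".join(random.choice(ALPHABET) for _ in range(length))


def rsa_encrypt_hex(text, exponent_hex, modulus_hex, width=256):
    """Unpadded RSA on the reversed text, returned as zero-padded hex.

    Assumption: the first character is treated as the least significant
    byte, hence the reversal before converting the text to an integer.
    """
    message = int.from_bytes(text[::-1].encode("utf-8"), "big")
    cipher = pow(message, int(exponent_hex, 16), int(modulus_hex, 16))
    return format(cipher, "x").zfill(width)
```

With the real exponent `'010001'` and the modulus from `Encrypt`, `rsa_encrypt_hex(random_key(), '010001', modulus)` would play the role of `encSecKey`. Whether the result matches the JS output byte for byte should be checked against `wyypl.js` before relying on it.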

IV. Summary

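  First, learning to write crawlers requires some programming background and a working knowledge of computer networking. You need to be familiar with front-end technologies such as HTML, CSS, and JavaScript, and be able to develop in a language such as Python or Java. The code in this article crawls the comments of a NetEase Cloud Music song, then processes and visualizes the data. It uses the requests library to fetch the comments for a specific song and the BeautifulSoup library to parse the HTML document, producing a list of comment records. Each record is then parsed further to extract the needed fields, such as user ID, user name, comment text, and comment time. The xlwt library saves this information to an Excel sheet, and the data is also written to a text file for later analysis. Matplotlib is used to draw bar charts and curve plots for visualization. For keyword analysis, the comments are likewise fetched with requests, and regular expressions extract the keywords from the HTML document. jieba then segments the text into words, Counter tallies how often each word occurs, and the result is rendered as a word cloud. The os library handles the file-system operations needed to save the word cloud image.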
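
The word-frequency step described above can be sketched roughly as follows. This is a simplified illustration, not the article's actual code: the helper names `tokenize` and `count_keywords` and the `min_len` parameter are hypothetical, and the whitespace fallback exists only so the sketch runs without jieba installed.

```python
from collections import Counter


def tokenize(text):
    """Segment text with jieba when available; otherwise split on whitespace."""
    try:
        import jieba  # third-party; assumed to be installed in the article's setup
    except ImportError:
        return text.split()
    return jieba.lcut(text)


def count_keywords(tokens, stopwords=frozenset(), min_len=2):
    """Count token frequencies, skipping stop words and very short tokens."""
    cleaned = (t.strip() for t in tokens)
    return Counter(t for t in cleaned if len(t) >= min_len and t not in stopwords)
```

A call such as `count_keywords(tokenize(comment_text), stopwords).most_common(50)` would give the most frequent words, which is the kind of frequency table a word cloud is usually built from.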

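   Overall, this article shows how to write a crawler in Python to collect data from the web, and how a series of processing and visualization steps make the collected data easier to present and understand. Learning to write crawlers takes patience and attention to detail. Information on the internet comes in many forms, so crawling runs into problems such as anti-crawler mechanisms and complex page layouts; each case has to be analyzed carefully to find a workable solution. Finally, crawling requires sound legal awareness: follow the relevant laws, regulations, and ethical norms, respect the rights and interests of data owners, and avoid placing unnecessary load on the target site's servers. In short, crawling is a challenging and enjoyable skill that improves your data collection and processing abilities and deepens your understanding of how the internet is structured and operates, provided it is used lawfully and not abused.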
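
One common way to avoid putting unnecessary load on a server is to wait between requests and back off after failures. The sketch below illustrates the idea under stated assumptions: `backoff_delays` and `fetch_with_retry` are hypothetical helpers, not part of the article's code, and `fetch` stands for any zero-argument callable, for example a wrapped `requests.post` call.

```python
import random
import time


def backoff_delays(attempts, base=1.0, factor=2.0, jitter=0.5):
    """Yield one wait time per attempt: exponential growth plus random jitter."""
    for i in range(attempts):
        yield base * (factor ** i) + random.uniform(0, jitter)


def fetch_with_retry(fetch, attempts=3, base=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds, pausing before each retry."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt, delay in enumerate(backoff_delays(attempts, base=base), start=1):
        try:
            return fetch()
        except Exception as exc:  # a real crawler would catch requests.RequestException
            last_error = exc
            if attempt < attempts:
                sleep(delay)  # wait longer after each consecutive failure
    raise last_error
```

The `sleep` parameter is injectable so the waiting behavior can be checked without actually pausing. In a crawler, the default `time.sleep` would be used.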

posted @ 作业逆流成河