Scraping the Kugou Soaring Chart and Writing It to a CSV File
Scraping the top ten song titles, singers, and durations from the Kugou Music Soaring Chart is a good exercise in extracting content from a web page; readers new to web scraping can use this example to get a feel for how a crawler pulls content out of a page.
Libraries used: the requests library and the BeautifulSoup library (from bs4), together with the lxml parser.
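If they are not installed yet, they can typically be installed with pip install requests beautifulsoup4 lxml.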
Request header: 'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'
URL: https://www.kugou.com/yy/rank/home/1-6666.html?from=rank
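Before running the full script, it can help to confirm that the page is actually reachable with this header, since some sites return an error page to requests that do not look like a browser. A minimal sketch (the variable names here are illustrative):

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'
}
url = 'https://www.kugou.com/yy/rank/home/1-6666.html?from=rank'

# Fetch the chart page with the browser-like User-Agent
resp = requests.get(url, headers=headers)
print(resp.status_code)    # expect 200 on success
print(resp.text[:200])     # peek at the start of the returned HTML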
The complete, runnable code:
import requests
from bs4 import BeautifulSoup

# Request headers: a browser-like User-Agent so the site serves the normal page
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'
}

def requests_list(url):
    wb_data = requests.get(url, headers=headers)
    soup = BeautifulSoup(wb_data.text, 'lxml')
    nums = soup.select('span.pc_temp_num')                       # rank
    titles = soup.select('div.pc_temp_songlist > ul > li > a')   # "singer - song" title
    times = soup.select('span.pc_temp_tips_r > span')            # song duration
    # Counter n stops the loop after the top ten songs on the chart
    n = 0
    # Each row scraped in the loop is appended to this list
    data = []
    data.append(['num', 'singer', 'song', 'time'])
    for num, title, song_time in zip(nums, titles, times):
        data.append([
            num.get_text().strip(),
            title.get_text().split('-', 1)[0].strip(),   # split "singer - song" on the first "-"
            title.get_text().split('-', 1)[1].strip(),
            song_time.get_text().strip()
        ])
        n = n + 1
        if n >= 10:
            break
    print(data)
    return data

def save_to_csv(data):
    # Open kugou.csv and write the scraped rows into it
    with open("kugou.csv", "w", encoding="utf-8") as fr:
        for s in data:
            fr.write(",".join(s) + "\n")

if __name__ == '__main__':
    urls = "https://www.kugou.com/yy/rank/home/1-6666.html?from=rank"
    save_to_csv(requests_list(urls))
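One caveat with joining fields by commas: if a singer or song name itself contains a comma, the row will be split incorrectly when the CSV is read back. A safer sketch of save_to_csv using Python's standard csv module (same function name and data layout as above):

import csv

def save_to_csv(data):
    # newline='' avoids blank lines on Windows; utf-8-sig helps Excel
    # display Chinese text correctly
    with open("kugou.csv", "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerows(data)   # csv automatically quotes fields containing commas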
Note: if you run into any problems while scraping, leave a comment below this post and I will answer it.