QQ Music crawler (original work)
A record of how I implemented keyword search on QQ Music and downloaded the matching songs.
1. On the song playback page, open the Network panel in Chrome and sort the requests by Size. The largest request is the song's download URL.
For example: http://113.215.13.161/amobile.music.tc.qq.com/C400004cJXYC3jQv6N.m4a?guid=8103905332&vkey=170F71B1676EA8140FE63DDF496E8BBC48010A9F805BF512D0228DB0DEC54A103D181076E06AE96EAD221A16FAC571D7BEDEED866046154D&uin=0&fromtag=66
2. Searching in Chrome's Network panel shows that the whole string C400004cJXYC3jQv6N.m4a?guid=8103905332&vkey=170F71B1676EA8140FE63DDF496E8BBC48010A9F805BF512D0228DB0DEC54A103D181076E06AE96EAD221A16FAC571D7BEDEED866046154D&uin=0&fromtag=66 is stored in the JSON string returned by another URL.
For example: https://u.y.qq.com/cgi-bin/musicu.fcg?g_tk=5381&loginUin=0&hostUin=0&format=json&inCharset=utf8&outCharset=utf-8&notice=0&platform=yqq.json&needNewCode=0&data=%7B%22req%22:%7B%22module%22:%22CDN.SrfCdnDispatchServer%22,%22method%22:%22GetCdnDispatch%22,%22param%22:%7B%22guid%22:%228103905332%22,%22calltype%22:0,%22userip%22:%22%22%7D%7D,%22req_0%22:%7B%22module%22:%22vkey.GetVkeyServer%22,%22method%22:%22CgiGetVkey%22,%22param%22:%7B%22guid%22:%228103905332%22,%22songmid%22:%5B%22004cJXYC3jQv6N%22%5D,%22songtype%22:%5B0%5D,%22uin%22:%220%22,%22loginflag%22:1,%22platform%22:%2220%22%7D%7D,%22comm%22:%7B%22uin%22:0,%22format%22:%22json%22,%22ct%22:24,%22cv%22:0%7D%7D
3. Building the URL in step 2 requires only the songMID, which can be obtained from a search request.
For example: http://c.y.qq.com/soso/fcgi-bin/client_search_cp?ct=24&qqmusic_ver=1298&new_json=1&remoteplace=txt.yqq.center&searchid=42773248656208759&t=0&aggr=1&cr=1&catZhida=1&lossless=0&flag_qc=0&p=1&n=10&w=MV%20%E5%AE%8C%E7%BE%8E%E4%B8%96%E7%95%8C&g_tk=5381&loginUin=0&hostUin=0&format=json&inCharset=utf8&outCharset=utf-8&notice=0&platform=yqq.json&needNewCode=0
The JSON string returned by this search request contains the songMID.
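The data parameter in step 2 is simply a JSON document with the songMID embedded in it, so it can be built with json.dumps rather than by hand-formatting a string. The sketch below is illustrative only: the helper names build_vkey_params, dig and extract_purls are hypothetical, the field values are copied from the example URL in step 2, and the key path req_0 / data / midurlinfo / purl is the one observed in this write-up and may change on the server side. Nothing here touches the network.

```python
import json


def build_vkey_params(songmid, guid="8103905332"):
    """Return the musicu.fcg query parameters for one songMID (hypothetical helper)."""
    data = {
        "req": {
            "module": "CDN.SrfCdnDispatchServer",
            "method": "GetCdnDispatch",
            "param": {"guid": guid, "calltype": 0, "userip": ""},
        },
        "req_0": {
            "module": "vkey.GetVkeyServer",
            "method": "CgiGetVkey",
            "param": {
                "guid": guid,
                "songmid": [songmid],
                "songtype": [0],
                "uin": "0",
                "loginflag": 1,
                "platform": "20",
            },
        },
        "comm": {"uin": 0, "format": "json", "ct": 24, "cv": 0},
    }
    return {
        "g_tk": "5381", "loginUin": "0", "hostUin": "0",
        "format": "json", "inCharset": "utf8", "outCharset": "utf-8",
        "notice": "0", "platform": "yqq.json", "needNewCode": "0",
        # json.dumps takes care of quoting, unlike '%s' string formatting
        "data": json.dumps(data, separators=(",", ":")),
    }


def dig(obj, *keys):
    """Follow a chain of dict keys; return None if any level is missing."""
    for key in keys:
        if not isinstance(obj, dict):
            return None
        obj = obj.get(key)
    return obj


def extract_purls(response_json):
    """Collect the non-empty purl values from a parsed musicu.fcg response."""
    infos = dig(response_json, "req_0", "data", "midurlinfo") or []
    return [info["purl"] for info in infos if info.get("purl")]
```

The dictionary returned by build_vkey_params can be passed as params= to requests.get, which URL-encodes it. extract_purls returns an empty list when the expected keys are missing, so the caller can skip a song instead of crashing on None.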
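Aside: this assignment did not draw on anyone else's work. The site's cookie is generated automatically by JavaScript code, and I spent a good deal of effort trying to work through that code. Later, while debugging the requests, I found that none of the requests above actually needs a cookie.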
Search for music by keyword and download every song on the first page of results

import time

import requests

USER_AGENT = ("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36")


def download_music(name, purls):
    # purls is the list returned by get_download_url; only the first entry is used
    if not purls or not purls[0]:
        print('No download URL for %s, skipping' % name)
        return
    # URL of the audio file that the player loads
    url = 'http://113.215.13.161/amobile.music.tc.qq.com/{}'.format(purls[0])
    headers = {
        'Accept-Encoding': "identity;q=1, *;q=0",
        'chrome-proxy': "frfr",
        'Range': "bytes=0-",
        'User-Agent': USER_AGENT,
    }
    response = requests.get(url, headers=headers)
    response.raise_for_status()  # do not save an error page as an .m4a file
    with open('%s.m4a' % name, 'wb') as f:
        print('Downloading %s.m4a' % name)
        f.write(response.content)


def get_download_url(songMID):
    url = "https://u.y.qq.com/cgi-bin/musicu.fcg"
    querystring = {
        "g_tk": "5381", "loginUin": "0", "hostUin": "0", "format": "json",
        "inCharset": "utf8", "outCharset": "utf-8", "notice": "0",
        "platform": "yqq.json", "needNewCode": "0",
        "data": ('{"req":{"module":"CDN.SrfCdnDispatchServer","method":"GetCdnDispatch",'
                 '"param":{"guid":"8103905332","calltype":0,"userip":""}},'
                 '"req_0":{"module":"vkey.GetVkeyServer","method":"CgiGetVkey",'
                 '"param":{"guid":"8103905332","songmid":["%s"],"songtype":[0],'
                 '"uin":"0","loginflag":1,"platform":"20"}},'
                 '"comm":{"uin":0,"format":"json","ct":24,"cv":0}}') % songMID,
    }
    headers = {
        'accept': "application/json, text/javascript, */*; q=0.01",
        'accept-encoding': "gzip, deflate, br",
        'accept-language': "zh-CN,zh;q=0.9",
        'origin': "https://y.qq.com",
        'referer': "https://y.qq.com/portal/player.html",
        'user-agent': USER_AGENT,
        'cache-control': "no-cache",
        'Postman-Token': "b6c72c9a-43b6-4eb4-916e-60c1f9dadb2b",
    }
    response = requests.request("GET", url, headers=headers, params=querystring)
    result = response.json()  # the response body is a JSON string
    for key in ['req_0', 'data', 'midurlinfo']:
        result = result.get(key)
        if not result:
            return []  # expected key missing: no download URL available
    # Return a list containing the songs' download URLs
    return [info.get('purl') for info in result]


# Yields (songMID, song title) pairs
def get_songMID(keyword):
    url = "http://c.y.qq.com/soso/fcgi-bin/client_search_cp"
    querystring = {
        "ct": "24", "qqmusic_ver": "1298", "new_json": "1",
        "remoteplace": "txt.yqq.center", "searchid": "42773248656208759",
        "t": "0", "aggr": "1", "cr": "1", "catZhida": "1", "lossless": "0",
        "flag_qc": "0", "p": "1", "n": "10", "w": keyword, "g_tk": "5381",
        "loginUin": "0", "hostUin": "0", "format": "json", "inCharset": "utf8",
        "outCharset": "utf-8", "notice": "0", "platform": "yqq.json",
        "needNewCode": "0",
    }
    headers = {
        'accept': "application/json, text/javascript, */*; q=0.01",
        'accept-encoding': "gzip, deflate, br",
        'accept-language': "zh-CN,zh;q=0.9",
        'origin': "https://y.qq.com",
        'referer': "https://y.qq.com/portal/search.html",
        'user-agent': USER_AGENT,
    }
    response = requests.request("GET", url, headers=headers, params=querystring)
    result = response.json()
    for key in ['data', 'song', 'list']:
        result = result.get(key)
        if not result:
            return  # no search results
    for song in result:
        mid = song.get('mid')
        name = song.get('name') or ''
        lyric = song.get('lyric') or ''
        singers = song.get('singer') or [{}]
        singer = singers[0].get('name') or ''
        title = name + '_' + singer + '_' + lyric
        print(mid, title)  # e.g. 003h3CYS3UxDB4 小酒窝_林俊杰_《爱情睡醒了》电视剧插曲
        yield mid, title


# Final result: search for a keyword and download every song on the first page of results
if __name__ == '__main__':
    for songmid, song_name in get_songMID('小酒窝'):
        try:
            time.sleep(3)
            purls = get_download_url(songmid)
            time.sleep(3)
            download_music(song_name, purls)
        except Exception as exc:
            # Report the failure and move on to the next song
            print('Failed to download %s: %s' % (song_name, exc))
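The script uses name + '_' + singer + '_' + lyric directly as the file name. Song titles and lyric notes can contain characters such as / or ? that are not valid in file names on some systems, in which case open() fails for that song. A minimal sketch of one way to guard against this is shown below. The helper name safe_filename and the 150-character limit are assumptions made for illustration, not part of the original script.

```python
import re


def safe_filename(name, max_length=150):
    """Replace characters that are invalid in Windows/Unix file names (hypothetical helper)."""
    # Backslash, slash, colon, asterisk, question mark, quote, angle brackets,
    # pipe and ASCII control characters are each replaced with an underscore.
    cleaned = re.sub(r'[\\/:*?"<>|\x00-\x1f]', "_", name)
    # Leading/trailing spaces and dots cause problems on Windows.
    cleaned = cleaned.strip(" .")
    # Fall back to a fixed name if nothing usable is left.
    return cleaned[:max_length] or "untitled"
```

It could be applied just before the file is opened, for example open('%s.m4a' % safe_filename(name), 'wb'). Non-ASCII characters such as Chinese titles and 《》 are left untouched.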