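(7) Web Scraping: Downloading Video and Audio Files
My earlier crawlers only scraped text from web pages; I had never downloaded video or audio files. So I tried scraping Bilibili and NetEase Cloud Music, and I am writing down the whole process here for future reference.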
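1. Scraping Bilibili videos
1.1 Page analysis
Machine learning with Python is popular at the moment, so let's scrape some machine learning videos. Open the Bilibili site and search for "python机器" ("python machine"). Inspecting the elements of the results page shows that every video series has a unique ID. As shown in the figure below, av28879057 is the ID of one such video.
Knowing that each video has a unique ID, click into a video and look at its address. Video URLs come in two main forms:
1. https://www.bilibili.com/video/av28879057 (the video has a single episode; the URL ends with the ID observed above)
2. https://www.bilibili.com/video/av30292394/?p=3 (the video is a series; the trailing parameter p=3 selects the third episode under this ID)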
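The two URL forms can be produced by one small helper. This is only a sketch: the function name build_video_url and its page argument are made up for illustration, and the URL layout is simply the one observed above.

```python
def build_video_url(av_id, page=None):
    """Return the video page URL for a numeric av ID and an optional episode number."""
    url = "https://www.bilibili.com/video/av{}".format(av_id)
    if page is not None:
        # Series videos select an episode with the p query parameter
        url += "/?p={}".format(page)
    return url


print(build_video_url(28879057))
print(build_video_url(30292394, page=3))
```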
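At this point the URL structure of each video page is basically clear, so the next step is to find the video's download address. Refresh the page, start playback, and look at the network requests sorted by size. One large transfer in x-flv format stands out, and it should be the video download. As shown in the figure below, the request takes 7 parameters. After comparing with other videos, I found that two of them change dynamically: ssig and trid. I looked through the other JSON responses but neither parameter appeared in them, so in the end I searched the page source for a function that might generate them. It turned out that the page source contains the video download addresses directly, inside a JSON dictionary assigned as window.__playinfo__={}. A regular expression match on it is all that is needed, which makes things simple.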
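To see the 7 parameters without reading the URL by eye, the query string can be split with the standard library. A minimal sketch for Python 3: SAMPLE_URL is the first segment URL from the __playinfo__ data listed in this post, and query_params is a hypothetical helper name.

```python
from urllib.parse import urlsplit, parse_qsl

# First segment URL copied from the __playinfo__ data in this post
SAMPLE_URL = (
    "http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-1-32.flv"
    "?expires=1554535500&platform=pc&ssig=tz7ktrLd7bdj8qukIG9cjQ&oi=3083732713"
    "&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1"
)


def query_params(url):
    """Return the query parameters of a URL as a dict (each name occurs once here)."""
    return dict(parse_qsl(urlsplit(url).query))


params = query_params(SAMPLE_URL)
print(len(params), sorted(params))
```

Running the same helper on the URLs of two different segments makes it easy to confirm which values differ between requests.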
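Matching this dictionary and inspecting it gives the result below. The whole video is split into several smaller segments numbered in order: order is the sequence number and url is the download address of that segment. So all that is needed is to download each segment and then join them together.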
{ "code": 0, "message": "0", "ttl": 1, "data": { "from": "local", "result": "suee", "message": "", "quality": 32, "format": "flv480", "timelength": 7121936, "accept_format": "flv720,flv480,flv360", "accept_description": ["高清 720P", "清晰 480P", "流畅 360P"], "accept_quality": [64, 32, 16], "video_codecid": 7, "seek_param": "start", "seek_type": "offset", "durl": [{ "order": 1, "length": 363246, "size": 24653145, "ahead": "EZA=", "vhead": "AWQAH//hAB5nZAAfrNlAvD3m//DQEM/xAAADAAEAAAMAPA8YMZYBAAVo6+zyPA==", "url": "http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-1-32.flv?expires=1554535500&platform=pc&ssig=tz7ktrLd7bdj8qukIG9cjQ&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-1-32.flv?expires=1554535500&platform=pc&ssig=tz7ktrLd7bdj8qukIG9cjQ&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-1-32.flv?e=ig8euxZM2rNcNbRj7zUVhoM17buBhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=732e5ee7aad2a9a08406b92aa0bb2ca3&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"] }, { "order": 2, "length": 330944, "size": 23865726, "ahead": "", "vhead": "", "url": "http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-2-32.flv?expires=1554535500&platform=pc&ssig=LemBQ8rVic-aAAN9iXwWGg&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "backup_url": 
["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-2-32.flv?expires=1554535500&platform=pc&ssig=LemBQ8rVic-aAAN9iXwWGg&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-2-32.flv?e=ig8euxZM2rNcNbR3hwdVhoM1nwdVhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=bb0c67342e48e1a8b438dcc9606f9e91&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"] }, { "order": 3, "length": 352981, "size": 25848758, "ahead": "", "vhead": "", "url": "http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-3-32.flv?expires=1554535500&platform=pc&ssig=vSDeETHYfUOLYf8caLiW5Q&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-3-32.flv?expires=1554535500&platform=pc&ssig=vSDeETHYfUOLYf8caLiW5Q&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-3-32.flv?e=ig8euxZM2rNcNbR3hbUVhoM1nwNBhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=30faa351c57a559f7b69654809418da9&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"] }, { "order": 4, "length": 394413, "size": 26565740, "ahead": "", "vhead": "", "url": 
"http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-4-32.flv?expires=1554535500&platform=pc&ssig=uaupgm_tbgSyVbou66oO-A&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-4-32.flv?expires=1554535500&platform=pc&ssig=uaupgm_tbgSyVbou66oO-A&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-4-32.flv?e=ig8euxZM2rNcNbRj7zUVhoM17buBhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=2bb21503e670b1a82769ed6524ea7c25&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"] }, { "order": 5, "length": 388312, "size": 26901267, "ahead": "", "vhead": "", "url": "http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-5-32.flv?expires=1554535500&platform=pc&ssig=DM7BjFfnFGzoux7NA7Ix5g&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-5-32.flv?expires=1554535500&platform=pc&ssig=DM7BjFfnFGzoux7NA7Ix5g&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", 
"http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-5-32.flv?e=ig8euxZM2rNcNbRahbUVhoM17zNBhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=68a9f6b8213285eb7fba15736e2c683b&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"] }, { "order": 6, "length": 239979, "size": 15473865, "ahead": "", "vhead": "", "url": "http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-6-32.flv?expires=1554535500&platform=pc&ssig=KGQ7DIH2XeAfW0QU4C7X7w&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-6-32.flv?expires=1554535500&platform=pc&ssig=KGQ7DIH2XeAfW0QU4C7X7w&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-6-32.flv?e=ig8euxZM2rNcNbRjhwdVhoM17bdVhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=4e27dfa3076edd399b0e6ee547f1dd51&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"] }, { "order": 7, "length": 426645, "size": 29245686, "ahead": "", "vhead": "", "url": "http://cn-hbwh-cmcc-v-04.acgvideo.com/upgcxcode/45/83/52808345/52808345-7-32.flv?expires=1554535500&platform=pc&ssig=X_NsbB2FEjaE4W2yGI2YMQ&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "backup_url": 
["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-7-32.flv?expires=1554535500&platform=pc&ssig=X_NsbB2FEjaE4W2yGI2YMQ&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-7-32.flv?e=ig8euxZM2rNcNbRahwdVhoM17zdVhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=c3ef5ea3bdd2ab1ac310970d85341c80&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"] }, { "order": 8, "length": 423211, "size": 30372670, "ahead": "", "vhead": "", "url": "http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-8-32.flv?expires=1554535500&platform=pc&ssig=rU90cc9rkqn--2je747LAQ&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-8-32.flv?expires=1554535500&platform=pc&ssig=rU90cc9rkqn--2je747LAQ&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-8-32.flv?e=ig8euxZM2rNcNbRa7zUVhoM17zuBhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=98d5301937834486e0bd9c2996cd73f4&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"] }, { "order": 9, "length": 291178, "size": 19475045, "ahead": "", "vhead": "", "url": 
"http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-9-32.flv?expires=1554535500&platform=pc&ssig=sMfGnyjVuKCsOzIp9EAanQ&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-9-32.flv?expires=1554535500&platform=pc&ssig=sMfGnyjVuKCsOzIp9EAanQ&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-9-32.flv?e=ig8euxZM2rNcNbRj7WdVhoM17bUVhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=08439132f1831b423be6577c7bd5ef89&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"] }, { "order": 10, "length": 370880, "size": 25219151, "ahead": "", "vhead": "", "url": "http://cn-hbwh-cmcc-v-04.acgvideo.com/upgcxcode/45/83/52808345/52808345-10-32.flv?expires=1554535500&platform=pc&ssig=kKqhofi4ayRRMoquCxz-pw&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-10-32.flv?expires=1554535500&platform=pc&ssig=kKqhofi4ayRRMoquCxz-pw&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", 
"http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-10-32.flv?e=ig8euxZM2rNcNbRj7zUVhoM17buBhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=3d5d140e0dd02a83245ae86da23eb8b9&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"] }, { "order": 11, "length": 381612, "size": 26624914, "ahead": "", "vhead": "", "url": "http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-11-32.flv?expires=1554535500&platform=pc&ssig=HFFhsFFGyXOV8Q3QmF8sJQ&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-11-32.flv?expires=1554535500&platform=pc&ssig=HFFhsFFGyXOV8Q3QmF8sJQ&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-11-32.flv?e=ig8euxZM2rNcNbRahbUVhoM17zNBhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=63f2b71981080c752eed5166a9a85332&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"] }, { "order": 12, "length": 361344, "size": 25254786, "ahead": "", "vhead": "", "url": "http://cn-hbwh-cmcc-v-04.acgvideo.com/upgcxcode/45/83/52808345/52808345-12-32.flv?expires=1554535500&platform=pc&ssig=UuAqqNbr1xC5gMlu5FUYdQ&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "backup_url": 
["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-12-32.flv?expires=1554535500&platform=pc&ssig=UuAqqNbr1xC5gMlu5FUYdQ&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-12-32.flv?e=ig8euxZM2rNcNbRahbUVhoM17zNBhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=1705e8d3f1075a717c6a91ae018396fe&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"] }, { "order": 13, "length": 334912, "size": 24639608, "ahead": "", "vhead": "", "url": "http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-13-32.flv?expires=1554535500&platform=pc&ssig=MQbcDgFo8iqQ2Uf4yO-L0A&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-13-32.flv?expires=1554535500&platform=pc&ssig=MQbcDgFo8iqQ2Uf4yO-L0A&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-13-32.flv?e=ig8euxZM2rNcNbR3hbUVhoM1nwNBhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=a5f1be479528b8a92a462acab849af46&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"] }, { "order": 14, "length": 365845, "size": 24930389, "ahead": "", "vhead": "", "url": 
"http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-14-32.flv?expires=1554535500&platform=pc&ssig=bpVSp4oDvkaLf1HTlWl5xA&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-14-32.flv?expires=1554535500&platform=pc&ssig=bpVSp4oDvkaLf1HTlWl5xA&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-14-32.flv?e=ig8euxZM2rNcNbRahwdVhoM17zdVhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=31a76487e1d32acd5b573f45a4169997&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"] }, { "order": 15, "length": 338347, "size": 23943047, "ahead": "", "vhead": "", "url": "http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-15-32.flv?expires=1554535500&platform=pc&ssig=ieioDVAxcZLksQ55egulgg&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-15-32.flv?expires=1554535500&platform=pc&ssig=ieioDVAxcZLksQ55egulgg&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", 
"http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-15-32.flv?e=ig8euxZM2rNcNbRa7WdVhoM17zUVhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=36cdd93257a88fdc90a0c85f2b9babe3&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"] }, { "order": 16, "length": 475181, "size": 34293360, "ahead": "", "vhead": "", "url": "http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-16-32.flv?expires=1554535500&platform=pc&ssig=Ps_lae8ZoX800sJZh-eRRA&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-16-32.flv?expires=1554535500&platform=pc&ssig=Ps_lae8ZoX800sJZh-eRRA&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-16-32.flv?e=ig8euxZM2rNcNbR3hwdVhoM1nwdVhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=fe34135d841548c79f78f687282c6bc3&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"] }, { "order": 17, "length": 204846, "size": 13746922, "ahead": "", "vhead": "", "url": "http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-17-32.flv?expires=1554535500&platform=pc&ssig=mzbEJYcCFWAO0ioYePxG_Q&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "backup_url": 
["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-17-32.flv?expires=1554535500&platform=pc&ssig=mzbEJYcCFWAO0ioYePxG_Q&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-17-32.flv?e=ig8euxZM2rNcNbRj7zUVhoM17buBhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=915e7dc2c91a4072e91bd43988379c8b&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"] }, { "order": 18, "length": 469078, "size": 32875195, "ahead": "", "vhead": "", "url": "http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-18-32.flv?expires=1554535500&platform=pc&ssig=gdm21_hyrHYWZfsmgPkMDA&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-18-32.flv?expires=1554535500&platform=pc&ssig=gdm21_hyrHYWZfsmgPkMDA&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-18-32.flv?e=ig8euxZM2rNcNbRa7WdVhoM17zUVhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=e644f16f487b7bd5625326a550716479&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"] }, { "order": 19, "length": 328213, "size": 21350561, "ahead": "", "vhead": "", "url": 
"http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-19-32.flv?expires=1554535500&platform=pc&ssig=3LoFiUwUGXFRJHBpigewOw&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-19-32.flv?expires=1554535500&platform=pc&ssig=3LoFiUwUGXFRJHBpigewOw&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-19-32.flv?e=ig8euxZM2rNcNbRjhbUVhoM17bNBhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=ec0d506311d0efdbfbf576d297c3ebba&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"] }, { "order": 20, "length": 280769, "size": 19777669, "ahead": "", "vhead": "", "url": "http://cn-hbwh-cmcc-v-04.acgvideo.com/upgcxcode/45/83/52808345/52808345-20-32.flv?expires=1554535500&platform=pc&ssig=r8NbvnHMQ58qfdYJHoD4kw&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-20-32.flv?expires=1554535500&platform=pc&ssig=r8NbvnHMQ58qfdYJHoD4kw&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", 
"http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-20-32.flv?e=ig8euxZM2rNcNbRa7WdVhoM17zUVhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=1991fbc0d72dfe2aaac05943f26d54e4&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"] }] }, "session": "e5c0e030d13633062a9889d1390010d9", "videoFrame": {} }
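1.2 Video download
Based on the analysis above, scraping a video takes the following steps:
1. Build the video page URL from the video ID
2. Request the video page URL and run a regular expression over the returned HTML to get the download address and sequence number of every segment
3. Save each segment locally from its download address (remember to add Referer and Origin to the request headers, otherwise the server returns HTTP 458)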
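Step 2 can be tried in isolation, without any network access. The sketch below applies the same regular expression idea to a made-up miniature page; extract_segments and sample_html are illustrative names, and the example.com URLs are placeholders rather than real data.

```python
import json
import re

# Same pattern idea as in the post: capture the JSON object assigned to window.__playinfo__
PLAYINFO_RE = re.compile(r"<script>window\.__playinfo__=(\{.*?\})</script>", re.S)


def extract_segments(html):
    """Return (order, url) pairs sorted by order, or [] if the page embeds no playinfo."""
    match = PLAYINFO_RE.search(html)
    if match is None:
        return []
    durl = json.loads(match.group(1))["data"]["durl"]
    return sorted((item["order"], item["url"]) for item in durl)


# Made-up page for illustration only
sample_html = (
    '<html><script>window.__playinfo__={"data": {"durl": ['
    '{"order": 2, "url": "http://example.com/2.flv"}, '
    '{"order": 1, "url": "http://example.com/1.flv"}]}}</script></html>'
)
print(extract_segments(sample_html))
```

Sorting by order is a small safety measure, so that the segments are in playback order even if the list were to arrive shuffled.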
The code is as follows:
# coding: utf-8
import requests
import re
import json
import os
import time
import subprocess


# Takes the URL of the video page
def down_video(video_url, path="temp_videos"):
    """
    video_url: URL of the video page to download
    path: directory where the downloaded segments are saved
    """
    # video_url = "https://www.bilibili.com/video/av30292394?p=3"
    # video_url = "https://www.bilibili.com/video/av28879057"
    headers = {
        # The User-Agent below is truncated in the original post; use your browser's full string
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1; r…) Gecko/20100101 Firefox/66.0",
    }
    response = requests.get(video_url, headers=headers)
    # Match the video address information in the page source.
    # re.S lets "." match newlines, so the JSON is matched as a whole even if it spans lines.
    match_text = re.search(r'<script>window.__playinfo__=(\{.*?\})</script>', response.text, re.S)
    json_data = json.loads(match_text.group(1))  # match_text.group(1) is a unicode string
    urls = json_data["data"]["durl"]  # the video has several parts; this is the list of their URLs
    content_size = sum([item["size"] for item in urls])  # total size of the video
    print("Total video size: %0.2f MB" % (content_size / (1024.0 * 1024)))
    if not os.path.exists(path):
        os.mkdir(path)
    header = {
        "Origin": "https://www.bilibili.com",
        "Referer": video_url,  # the Referer header is required
    }
    headers.update(header)
    size = 0
    start = time.time()
    for i, item in enumerate(urls):
        url = item["url"]
        try:
            result = requests.get(url, headers=headers, stream=True, verify=False)
            print(result.status_code)
            video_path = os.path.join(path, "{}.mp4".format(i))
            with open(video_path, "wb") as f:
                for chunk in result.iter_content(1024):
                    f.write(chunk)
                    f.flush()  # flush the write buffer
                    size = size + len(chunk)
                    # print("Downloaded: %0.2f MB" % (size / (1024.0 * 1024)))
        except Exception as e:
            print("Download failed for url: %s" % url)
            print(e)
    stop = time.time()
    print("Download finished in %0.2f seconds" % (stop - start))
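The download loop above only uses each segment's url field, yet every entry in durl also carries a backup_url list. A possible refinement is to fall back to those addresses when the primary one fails. This is a sketch under the assumption that the backup addresses serve the same segment; candidate_urls and first_success are hypothetical helpers, and fetch stands for any function that downloads one URL and raises on failure.

```python
def candidate_urls(segment):
    """List the primary URL first, then any backup URLs, skipping duplicates."""
    urls = [segment["url"]]
    for backup in segment.get("backup_url") or []:
        if backup not in urls:
            urls.append(backup)
    return urls


def first_success(urls, fetch):
    """Call fetch(url) on each URL in turn and return the first result that does not raise."""
    if not urls:
        raise ValueError("no URLs to try")
    last_error = None
    for url in urls:
        try:
            return fetch(url)
        except Exception as error:  # remember the failure and try the next address
            last_error = error
    raise last_error


# Stand-in for a real download function, for illustration only
def fake_fetch(url):
    if url.startswith("http://a/"):
        raise IOError("primary host unavailable")
    return "ok:" + url


segment = {"url": "http://a/1.flv", "backup_url": ["http://b/1.flv", "http://a/1.flv"]}
print(first_success(candidate_urls(segment), fake_fetch))
```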
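1.3 Joining the video segments
The downloaded segments can be played directly, but playing them one by one is tedious, so they can be joined with ffmpeg.
First download ffmpeg (https://ffmpeg.zeranoe.com/builds/), unpack it into a folder of your choice, and add the bin directory that contains ffmpeg.exe to the PATH environment variable. Run ffmpeg -version on the command line; if it prints version information, the installation succeeded.
The ffmpeg command for joining videos is: ffmpeg -f concat -safe 0 -i path.txt -c copy output.mp4
Here path.txt lists the paths of the videos to join, one per line, in the following format (the first line refers to v_1.mp4 in the video directory):
file 'video/v_1.mp4'
file 'video/v_2.mp4'
file 'video/v_3.mp4'
output.mp4 is where the joined video is stored; it can also be written as video/output.mp4 to save the result in the video folder.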
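The order of the lines in path.txt decides the order of the joined video, so the file names should be sorted by number rather than alphabetically (otherwise v_10.mp4 would come before v_2.mp4). A minimal sketch of building the list text; build_concat_list is a made-up helper name, and it assumes each file name contains its sequence number.

```python
import os
import re


def build_concat_list(directory, filenames):
    """Return the text of a concat list file with entries ordered by their number."""
    def sequence_number(name):
        # Use the first run of digits in the name without its extension
        match = re.search(r"\d+", os.path.splitext(name)[0])
        return int(match.group()) if match else float("inf")

    ordered = sorted(filenames, key=lambda name: (sequence_number(name), name))
    return "".join("file '{}/{}'\n".format(directory, name) for name in ordered)


print(build_concat_list("video", ["v_10.mp4", "v_2.mp4", "v_1.mp4"]))
```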
The final joining code is as follows:
import os
import shutil
import subprocess


# Join the downloaded segments into one complete video
def concatenate(path, title, output="videos"):
    """
    path: directory that holds the segments to join
    title: name of the joined video
    output: directory where the joined video is saved
    """
    video_exts = [".flv", ".mkv", ".mp4"]
    with open("path.txt", 'w') as f:
        for root, dirs, files in os.walk(path):
            videos = [name for name in files if os.path.splitext(name)[1] in video_exts]
            # down_video names the segments 0.mp4, 1.mp4, ...; sort them by number
            videos.sort(key=lambda name: int(os.path.splitext(name)[0]))
            for name in videos:
                v_path = os.path.join(root, name)
                f.write("file '{}'\n".format(v_path))
    if os.path.exists("path.txt"):
        if not os.path.exists(output):
            os.mkdir(output)
        try:
            print("Start merging videos")
            path_name = os.path.join(output, title + ".mp4")
            # If D:\ffmpeg-win32-static\bin is on PATH, the first element can simply be "ffmpeg"
            ffmpeg_command = [r"D:\ffmpeg-win32-static\bin\ffmpeg", "-f", "concat", "-safe", "0",
                              "-i", "path.txt", "-c", "copy", path_name]
            # print(ffmpeg_command)
            subprocess.call(ffmpeg_command)
            shutil.rmtree(path)    # delete the directory of temporary segments
            os.remove("path.txt")  # delete the list file
        except Exception as e:
            print(e)
The complete script consists of the imports and the two functions above, down_video and concatenate, followed by this entry point:
if __name__ == "__main__":
    # down_video("https://www.bilibili.com/video/av28879057")
    # concatenate("temp_videos", title="python")
    down_video("https://www.bilibili.com/video/av30292394?p=3")
    concatenate("temp_videos", title="python机器学习与量化分析")
References:
https://amberwest.github.io/2018/09/11/%E7%94%A8python%E4%B8%8B%E8%BD%BD%E5%93%94%E5%93%A9%E5%93%94%E5%93%A9%E8%A7%86%E9%A2%91/
https://github.com/Henryhaohao/Bilibili_video_download
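2. Scraping NetEase Cloud Music
2.1 Page analysis
In the web version of NetEase Cloud Music each song also has an ID, and the page URL has the form https://music.163.com/song?id=1353372483 (on request the site automatically inserts a "#", turning it into https://music.163.com/#/song?id=1352541009).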
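Because the browser shows the "#" form, an ID copied from the address bar sits in the URL fragment rather than in the query string. The sketch below (Python 3) handles both forms; song_page_url and extract_song_id are hypothetical helper names.

```python
from urllib.parse import urlsplit, parse_qs


def song_page_url(song_id):
    """Build the plain song page URL for an ID."""
    return "https://music.163.com/song?id={}".format(song_id)


def extract_song_id(url):
    """Return the id parameter from either URL form, or None if there is none."""
    parts = urlsplit(url)
    query = parts.query
    if not query and parts.fragment:
        # In "https://music.163.com/#/song?id=..." the query lives inside the fragment
        query = urlsplit(parts.fragment).query
    values = parse_qs(query).get("id")
    return values[0] if values else None


print(extract_song_id("https://music.163.com/#/song?id=1352541009"))
print(extract_song_id(song_page_url(1353372483)))
```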
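Then refresh the page and look at the network requests, again sorted by size. A fairly large mp3 transfer shows up, as shown in the figure below. Its URL is the download URL of the song, and requesting it directly downloads the song. What remains is how to obtain the download URL for each song.
Looking through the responses of the other XHR requests, I found one that contains information about the song, including the URL we need. It is a POST request that submits form data with two main parameters, 'params' and 'encSecKey'. Both are encrypted, as the second figure below shows, so the encryption method has to be worked out.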
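To sum up, downloading a song can be divided into three steps:
1. Send a GET request to https://music.163.com/song?id=1353372483 to get basic information such as the song's name and lyrics
2. Send a POST request with the two parameters 'params' and 'encSecKey' to https://music.163.com/weapi/song/enhance/player/url/v1?csrf_token= and read the song's download address, size, and other information from the returned JSON
3. Request the song's download address (http://m10.music.126.net/20190407154531/74f897c9d014dede19a0905644433907/ymusic/035c/5458/530f/46ebf59083c2f04cc090de3b1e0beaf0.mp3) and write the response to a local file, which completes the download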
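Reading the download address out of the step 2 response can be sketched as follows. The field names data, url and size are the ones the script later in this post reads; the sample response is a hand-written stand-in with a hypothetical size value, and treating a missing url as "not downloadable" is a defensive assumption, not documented behavior.

```python
def pick_download_info(response_json):
    """Return (url, size) from the first entry of "data", or None if no url is present."""
    entries = response_json.get("data") or []
    if not entries or not entries[0].get("url"):
        return None
    return entries[0]["url"], entries[0].get("size")


# Hand-written stand-in for a response; the size value is hypothetical
sample_response = {
    "data": [{
        "url": "http://m10.music.126.net/20190407154531/74f897c9d014dede19a0905644433907"
               "/ymusic/035c/5458/530f/46ebf59083c2f04cc090de3b1e0beaf0.mp3",
        "size": 3000000,
    }]
}
print(pick_download_info(sample_response))
```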
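So what remains is how to construct the two encrypted parameters 'params' and 'encSecKey'. Open the Sources tab of the browser's developer tools and search each JS file for encSecKey (or press Ctrl+Shift+F for a global search). The relevant code turns up in the JS file shown below, and it contains exactly the two parameters we need.
Analyzing that code, the actual work is done by the call var bYl2x = window.asrsea(). Searching for this function turns up the statement window.asrsea = d, so it is function d, and d calls function a once, function b twice, and function c once.
Function a generates a random string; here a(16) produces a random string of 16 characters. The JS code and a corresponding Python implementation are as follows: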
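// function a (JavaScript)
function a(a) {
    var d, e, b = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789", c = "";
    for (d = 0; a > d; d += 1)
        e = Math.random() * b.length,
        e = Math.floor(e),
        c += b.charAt(e);
    return c
}

# Python counterpart: generate a random string
import binascii
import os

def random_str(size):
    # binascii.hexlify() takes a byte string and returns its hex digits as ASCII characters,
    # so the result uses only 0-9 and a-f, a subset of the JavaScript alphabet
    return binascii.hexlify(os.urandom(size))[:16]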
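For comparison, here is a version that draws from the same 62-character alphabet as the JavaScript function instead of hex digits only. It is a sketch, not the approach used in this post; random_key is a made-up name, and random.SystemRandom is used so the characters come from the operating system's random source.

```python
import random
import string

# Same characters, in the same order, as the string b in the JavaScript function a
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits


def random_key(size=16):
    """Return a random string of `size` characters drawn from ALPHABET."""
    rng = random.SystemRandom()
    return "".join(rng.choice(ALPHABET) for _ in range(size))


key = random_key()
print(key, len(key))
```

The result is a text string; call key.encode("utf-8") before handing it to functions that expect bytes.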
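Function b encrypts data with AES, a symmetric cipher. The JS code and a corresponding Python implementation are as follows:
The Python version needs the Crypto module. Installing it with pip install crypto runs into problems; install it as follows instead (this succeeded on Windows 7 with Python 2.7; on Python 3 the maintained pycryptodome package, installed with pip install pycryptodome, provides the same Crypto.Cipher.AES module):
python -m pip install pycrypto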
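// function b (JavaScript)
function b(a, b) {
    var c = CryptoJS.enc.Utf8.parse(b)
      , d = CryptoJS.enc.Utf8.parse("0102030405060708")
      , e = CryptoJS.enc.Utf8.parse(a)
      , f = CryptoJS.AES.encrypt(e, c, {
        iv: d,
        mode: CryptoJS.mode.CBC
    });
    return f.toString()
}

# Python implementation of function b
from Crypto.Cipher import AES
import base64

def get_params(text, key):
    # AES symmetric encryption in CBC mode with a fixed IV
    iv = b'0102030405060708'
    if not isinstance(text, bytes):
        text = text.encode('utf-8')
    if not isinstance(key, bytes):
        key = key.encode('utf-8')
    # Pad to a multiple of 16 bytes; each padding byte holds the padding length
    pad = 16 - len(text) % 16
    text = text + bytes(bytearray([pad] * pad))
    encryptor = AES.new(key, AES.MODE_CBC, iv)
    result = encryptor.encrypt(text)
    result_str = base64.b64encode(result).decode('utf-8')
    return result_str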
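The padding step in get_params is PKCS#7 padding, which is also what CryptoJS applies by default. The sketch below isolates it so that it can be checked without installing a crypto library; pkcs7_pad and pkcs7_unpad are illustrative names, and the code targets Python 3.

```python
def pkcs7_pad(data, block_size=16):
    """Append N bytes of value N so that the length becomes a multiple of block_size."""
    pad = block_size - len(data) % block_size
    return data + bytes([pad]) * pad


def pkcs7_unpad(data):
    """Remove the padding added by pkcs7_pad."""
    pad = data[-1]
    return data[:-pad]


padded = pkcs7_pad(b"abc")
print(len(padded), padded)
```

Input that is already a multiple of 16 bytes still gains a full block of padding, which is what keeps the scheme reversible.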
Function c encrypts data with RSA, an asymmetric cipher. The JS code and a corresponding Python implementation are as follows:
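// function c (JavaScript)
function c(a, b, c) {
    var d, e;
    return setMaxDigits(131),
    d = new RSAKeyPair(b, "", c),
    e = encryptedString(d, a)
}

# Python implementation of function c
import binascii

def get_encSecKey(text, pubkey, modulus):
    # RSA asymmetric encryption: reverse the byte string, read it as one big integer,
    # and raise it to the public exponent modulo the modulus
    text = text[::-1]
    rs = pow(int(binascii.hexlify(text), 16), int(pubkey, 16), int(modulus, 16))
    return format(rs, 'x').zfill(256)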
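What get_encSecKey computes is plain modular exponentiation. The sketch below shows the same operation with a deliberately tiny textbook key (modulus 3233 = 61 * 53, public exponent 17, private exponent 2753), which is far too small for real use and is not the key from this post; rsa_raw and bytes_to_int_reversed are made-up names for Python 3.

```python
def rsa_raw(message_int, exponent, modulus):
    """Modular exponentiation: the core RSA operation, with no padding."""
    return pow(message_int, exponent, modulus)


def bytes_to_int_reversed(text):
    """Reverse a byte string and read it as a big-endian integer, as get_encSecKey does."""
    return int.from_bytes(text[::-1], "big")


# Toy key for illustration only
N, E, D = 3233, 17, 2753

cipher = rsa_raw(65, E, N)
print(cipher, rsa_raw(cipher, D, N))
```

Only the party holding the private exponent can undo the operation, which is why the client can safely send the random AES key in this form.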
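Next, analyze the four arguments passed to window.asrsea(). This requires a breakpoint. As shown in the figure, click a line to set a breakpoint on it, then play a song. Once execution stops at the breakpoint, use the two buttons circled in red on the right (the first steps over one call, the second executes a single statement). Selecting one of the four arguments, as if to copy it, displays its value.
The figure below shows the second argument selected, with the value "010001", which means the second argument is a constant. Checking the others shows that the second, third, and fourth arguments are all constants, and the first is JSON data that depends on the song ID. Examples of the four arguments are given below:
Example values of the four arguments:
first_param = {"ids":"[1353194608]","level":"standard","encodeType":"aac","csrf_token":""}
second_param = "010001"
third_param = "00e0b509f6259df8642dbc35662901477df22677ec152b5ff68ace615bb7b725152b3ab17a876aea8a5aa76d2e417629ec4ee341f56135fccf695280104e0312ecbda92557c93870114af6c9d05c4f7f0c3685b7a46bee255932575cce10b424d813cfe4875d3e82047b97ddef52741d546b8e289dc6935b3ece0462db0a22b8e7"
fourth_param = "0CoJUm6Qyw8W8jud"
So the whole process only needs the song's ID and the three constants above to build the final encrypted data. All that remains is to write the code.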
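2.2 Song download
Based on the analysis above, the code follows these steps:
1. Using the song ID, request https://music.163.com/song?id=1353372483 and match the page content with a regular expression to get the song title
2. Compute the encrypted parameters 'params' and 'encSecKey', then send a POST request to https://music.163.com/weapi/song/enhance/player/url/v1?csrf_token= to get the song's url and size
3. Request the song's download address and write the result to a local file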
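Step 1 relies on the page's title tag. The sketch below runs the same regular expression on a made-up page; parse_title is an illustrative name, and the "title - artist - ..." layout of the sample is an assumption inferred from the pattern, not a guarantee about the live site.

```python
import re

# Capture everything in <title> up to the first whitespace followed by a hyphen
TITLE_RE = re.compile(r"<title>(.*?)\s-")


def parse_title(html):
    """Return the song title from the page HTML, or None if the pattern does not match."""
    match = TITLE_RE.search(html)
    return match.group(1) if match else None


# Made-up page for illustration only
sample_page = "<html><head><title>Some Song - Some Artist - Example</title></head></html>"
print(parse_title(sample_page))
```

Returning None instead of raising makes it easy for the caller to fall back to the song ID as a file name when the title cannot be found.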
The complete code is as follows:
# coding: utf-8
import os
import binascii
from Crypto.Cipher import AES
import base64
import json
import requests
import re

first_param = {"ids": "[1353194608]", "level": "standard", "encodeType": "aac", "csrf_token": ""}
second_param = "010001"
third_param = "00e0b509f6259df8642dbc35662901477df22677ec152b5ff68ace615bb7b725152b3ab17a876aea8a5aa76d2e417629ec4ee341f56135fccf695280104e0312ecbda92557c93870114af6c9d05c4f7f0c3685b7a46bee255932575cce10b424d813cfe4875d3e82047b97ddef52741d546b8e289dc6935b3ece0462db0a22b8e7"
fourth_param = "0CoJUm6Qyw8W8jud"

headers = {
    "Referer": "https://music.163.com/",
    "User-Agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.81 Mobile Safari/537.36"
}


def random_str(size):
    # binascii.hexlify() takes a byte string and returns its hex digits as ASCII characters
    return binascii.hexlify(os.urandom(size))[:16]


def get_params(text, key):
    # AES symmetric encryption in CBC mode with a fixed IV
    iv = b'0102030405060708'
    if not isinstance(text, bytes):
        text = text.encode('utf-8')
    if not isinstance(key, bytes):
        key = key.encode('utf-8')
    pad = 16 - len(text) % 16
    text = text + bytes(bytearray([pad] * pad))
    encryptor = AES.new(key, AES.MODE_CBC, iv)
    result = encryptor.encrypt(text)
    result_str = base64.b64encode(result).decode('utf-8')
    return result_str


def get_encSecKey(text, pubkey, modulus):
    # RSA asymmetric encryption of the reversed random key
    text = text[::-1]
    rs = pow(int(binascii.hexlify(text), 16), int(pubkey, 16), int(modulus, 16))
    return format(rs, 'x').zfill(256)


def encrypt_data(first_param, second_param, third_param, fourth_param):
    data = {}
    i = random_str(16)
    # AES is applied twice: first with the constant key, then with the random key
    temp = get_params(json.dumps(first_param), fourth_param)
    params = get_params(temp, i)
    encSecKey = get_encSecKey(i, second_param, third_param)
    data['params'] = params
    data['encSecKey'] = encSecKey
    return data


# Get the song title
def get_song_title(id):
    url = "https://music.163.com/song?id=%s" % (id)
    response = requests.get(url, headers=headers)
    title = re.search(r'<title>(.*?)\s-', response.text).group(1)  # match the song title
    # print(title)
    return title


# Get the song's download address, size, and other information
def get_song_info(id):
    first_param['ids'] = "[%s]" % id
    data = encrypt_data(first_param, second_param, third_param, fourth_param)
    url = "https://music.163.com/weapi/song/enhance/player/url/v1?csrf_token="
    response = requests.post(url, headers=headers, data=data)
    # print(response.status_code)
    json_data = json.loads(response.text)
    return json_data


# Download the song
def down_song(id, down_url, song_title, size):
    filename = song_title + str(id) + ".mp3"
    print("Song size: %0.2f MB" % (size / (1024.0 * 1024)))
    try:
        result = requests.get(down_url, headers=headers, stream=True)
        with open(filename, "wb") as f:
            for chunk in result.iter_content(1024):
                f.write(chunk)
                f.flush()
        print("Download finished")
    except Exception as e:
        print("Download failed for id: %s" % id)
        print(e)


if __name__ == "__main__":
    id = input("Enter the song id, e.g. 1353194608: ")
    song_title = get_song_title(id)
    song_info = get_song_info(id)
    down_url = song_info["data"][0]["url"]
    size = song_info["data"][0]["size"]
    # print(down_url, size)
    down_song(id, down_url, song_title, size)
References:
https://blog.csdn.net/qq_38282706/article/details/80251666
https://github.com/Jack-Cherish/python-spider/blob/master/Netease/Netease.py