Resolving the Real Address of Music on Xiami
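Lately I have been listening to music on Xiami a lot, and some of the songs are quite good. Yesterday, heading back to Shanghai, I wanted to download a few to listen to on the way, only to find that downloads have to be paid for with Xiami coins. My first thought was to press F12 in Chrome and look at the requests in the Network panel, where I easily found a link like http://f3.xiami.net/78926/417559/08%201769939716_1875663.mp3. That is the real address of the music, and it can be downloaded directly. As an aside, many people ask how to save online video or music locally, and the web offers all sorts of answers: some use sniffing tools, some dig through the browser cache. In fact, the packet-capture feature built into Chrome or any other browser finds the address easily.
The above is the simplest method, but it takes a lot of manual work. Below I resolve the address programmatically; more importantly, I want to offer a way of approaching this kind of problem.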
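First, let's analyze this song. Its page is http://www.xiami.com/song/1769939716. The page shows the song title Rainbow Trees, the artist Robert de Boron, and the album Diaspora. In the page source a few numbers stand out: 1769939716, 417559, and 78926. Looking back at the real MP3 address http://f3.xiami.net/78926/417559/08%201769939716_1875663.mp3, 1769939716 is the song ID, 417559 is the album ID, and 78926 is the artist ID, so the URL has the form http://f3.xiami.net/<artist ID>/<album ID>/08%20<song ID>_1875663.mp3.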
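Some pieces are still missing: what is 08, and what is 1875663? We know that %20 is an encoded space. Going back to the album page http://www.xiami.com/album/417559 shows that Rainbow Trees is the eighth track, which explains the 08. But what is 1875663? I went through every request Chrome sent and the source of every related page and could not find what this number means. With no other option, I found an SWF decompiler online, decompiled the player, and located some relevant source code.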
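The URL pattern pieced together so far can be written down as a small Java sketch. The class and method names here are hypothetical, the host f3.xiami.net is simply copied from the captured request, and the trailing number is passed in as an opaque value because its meaning is still unknown at this point.

```java
import java.util.Locale;

// Hypothetical helper: rebuilds the observed URL shape from its parts.
final class ObservedUrlPattern {
    static String build(long artistId, long albumId, int trackNo, long songId, String suffix) {
        // %02d zero-pads the track number ("08"); "%%20" emits a literal "%20",
        // the encoded space that sits between the track number and the song ID.
        return String.format(Locale.ROOT,
                "http://f3.xiami.net/%d/%d/%02d%%20%d_%s.mp3",
                artistId, albumId, trackNo, songId, suffix);
    }

    public static void main(String[] args) {
        // Values taken from the Rainbow Trees example in the text.
        System.out.println(build(78926, 417559, 8, 1769939716L, "1875663"));
    }
}
```

This only reproduces the shape of one captured URL; it cannot generate addresses for other songs until the origin of the trailing number is understood.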
The code below looks like the code that obtains the song's location; following it further leads to the getLocation method.
    var dataStr:* = evt.target.data;
    dataStr = dataStr.replace(" xmlns=\"http://xspf.org/ns/0/\"", "");
    var xmlData:* = new XML(dataStr);
    xmlData.ignoreWhitespace = true;
    uid = xmlData.uid;
    clearList = xmlData.clearlist;
    var songArr:* = xmlData.trackList.track;
    var tLoadArr:* = [];
    var backgroundStr:* = "";
    var firstSongId:* = 0;
    var addSongTmpArr:* = [];
    var oldDataArr:* = [];
    if (songArr[0] != undefined){
        for (i in songArr) {
            tData = songArr[i];
            songLocation = "";
            thisLocation = tData.location;
            if (thisLocation.indexOf("http://") < 0){
                try {
                    songLocation = locationDec.getLocation(tData.location);
                } catch(e) {
                };
            } else {
                songLocation = thisLocation;
            };
Here is the getLocation method:
    public function getLocation(_arg1:String):String{
        var _local10:*;
        var _local2:* = Number(_arg1.charAt(0));
        var _local3:* = _arg1.substring(1);
        var _local4:* = Math.floor((_local3.length / _local2));
        var _local5:* = (_local3.length % _local2);
        var _local6:* = new Array();
        var _local7:* = 0;
        while (_local7 < _local5) {
            if (_local6[_local7] == undefined){
                _local6[_local7] = "";
            };
            _local6[_local7] = _local3.substr(((_local4 + 1) * _local7), (_local4 + 1));
            _local7++;
        };
        _local7 = _local5;
        while (_local7 < _local2) {
            _local6[_local7] = _local3.substr(((_local4 * (_local7 - _local5)) + ((_local4 + 1) * _local5)), _local4);
            _local7++;
        };
        var _local8:* = "";
        _local7 = 0;
        while (_local7 < _local6[0].length) {
            _local10 = 0;
            while (_local10 < _local6.length) {
                _local8 = (_local8 + _local6[_local10].charAt(_local7));
                _local10++;
            };
            _local7++;
        };
        _local8 = unescape(_local8);
        var _local9:* = "";
        _local7 = 0;
        while (_local7 < _local8.length) {
            if (_local8.charAt(_local7) == "^"){
                _local9 = (_local9 + "0");
            } else {
                _local9 = (_local9 + _local8.charAt(_local7));
            };
            _local7++;
        };
        _local9 = _local9.replace("+", " ");
        return (_local9);
    }
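This looks very much like the key code for obtaining the address. Following the lines that read tData.location back up through the code leads to an XML file, which should contain a location tag. Finding that XML file is the crucial step, so I went back to the browser, captured the traffic again, and found this link: http://www.xiami.com/song/playlist/id/1769939716/object_name/default/object_id/0 (the number after /id/ is the song ID). Its content is as follows: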
    <?xml version="1.0" encoding="utf-8"?>
    <playlist version="1" xmlns="http://xspf.org/ns/0/">
      <trackList>
        <track>
          <title><![CDATA[Rainbow Trees]]></title>
          <song_id>1769939716</song_id>
          <album_id>417559</album_id>
          <album_name><![CDATA[Diaspora]]></album_name>
          <object_id>1</object_id>
          <object_name>default</object_name>
          <insert_type>1</insert_type>
          <background>http://img.xiami.com/res/player/bimg/bg-5.bak.jpg</background>
          <grade>-1</grade>
          <artist><![CDATA[Robert de Boron]]></artist>
          <location>4h%2Fxit7645F8219186pt3Ffi.%8%19%%%736733tA%3an2927%52569_5.p%2.meF2F52E5E9716m</location>
          <ms></ms>
          <lyric>http://www.xiami.com/song/lyrictxt/id/1769939716</lyric>
          <pic>http://img.xiami.com/images/album/img26/78926/4175591312340942_1.jpg</pic>
        </track>
      </trackList>
      <uid>12390378</uid>
      <type>default</type>
      <type_id>1</type_id>
      <clearlist></clearlist>
    </playlist>
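For a program, the first step is to build the playlist URL from the song ID and pull the location text out of the returned XML. The following is a minimal Java sketch of that step using the JDK's built-in DOM parser; the class and method names are hypothetical, the download itself is left out, and a shortened copy of the response above stands in for the real one.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper for the playlist XML shown above.
final class PlaylistLocation {
    /** Builds the playlist URL observed in the captured traffic. */
    static String playlistUrl(long songId) {
        return "http://www.xiami.com/song/playlist/id/" + songId
                + "/object_name/default/object_id/0";
    }

    /** Returns the text of the first location element in the playlist XML. */
    static String extractLocation(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // The factory is not namespace-aware by default, so the plain
            // tag name matches despite the xmlns attribute on <playlist>.
            NodeList nodes = doc.getElementsByTagName("location");
            if (nodes.getLength() == 0) {
                throw new IllegalArgumentException("no <location> element found");
            }
            return nodes.item(0).getTextContent().trim();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse playlist XML", e);
        }
    }

    public static void main(String[] args) {
        // Shortened copy of the response above; only the relevant tags are kept.
        String xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
                + "<playlist version=\"1\" xmlns=\"http://xspf.org/ns/0/\"><trackList><track>"
                + "<title><![CDATA[Rainbow Trees]]></title>"
                + "<location>4h%2Fxit7645F8219186pt3Ffi.%8%19%%%736733tA%3an2927%52569_5.p%2.meF2F52E5E9716m</location>"
                + "</track></trackList></playlist>";
        System.out.println(playlistUrl(1769939716L));
        System.out.println(extractLocation(xml));
    }
}
```

The value returned by extractLocation is still scrambled; it is the input to the player's getLocation routine.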
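It contains the location tag I was after. With the source code and the location value in hand, everything becomes clear. Take the string 4h%2Fxit7645F8219186pt3Ffi.%8%19%%%736733tA%3an2927%52569_5.p%2.meF2F52E5E9716m. The first character, 4, is the number of pieces. Remove it and split the rest of the string into four pieces. If the length divides evenly, all pieces have the same length; otherwise, with remainder r, the first r pieces get one extra character and the last 4 - r pieces are one character shorter. Here the rest of the string has 78 characters and 78 % 4 = 2, so the first two pieces have 20 characters and the last 4 - 2 = 2 pieces have 19. The lengths are [20, 20, 19, 19] and the pieces are:
    h%2Fxit7645F8219186p
    t3Ffi.%8%19%%%736733
    tA%3an2927%52569_5.
    p%2.meF2F52E5E9716m
Next, concatenate the characters starting from the first character of the first piece. Treating the array of pieces as a two-dimensional character array indexed as [piece][position], the reading order is [0][0], [1][0], [2][0], [3][0], [0][1], [1][1], [2][1], [3][1], and so on, column by column; the shorter pieces simply have nothing to contribute in the last column. The result is http%3A%2F%2Ff3.xiami.net%2F78926%2F417559%2F%5E8%252%5E1769939716_1875663.mp3, which URL-decodes to http://f3.xiami.net/78926/417559/^8%2^1769939716_1875663.mp3. Finally, replace every ^ with the character 0 to get http://f3.xiami.net/78926/417559/08%201769939716_1875663.mp3.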
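The split-and-interleave arithmetic described above can be checked with a short Java sketch. The class and method names are hypothetical, and the sketch deliberately stops before the URL-decoding and ^ replacement steps so that the intermediate string can be compared with the one quoted in the text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo of the row split and column-wise read described above.
final class RowSplitDemo {
    /** Splits everything after the leading digit into that many rows. */
    static List<String> splitRows(String location) {
        int rows = location.charAt(0) - '0';   // leading digit = number of rows
        String payload = location.substring(1);
        int base = payload.length() / rows;    // length of the shorter rows
        int extra = payload.length() % rows;   // this many leading rows get one more character
        List<String> result = new ArrayList<>();
        int pos = 0;
        for (int r = 0; r < rows; r++) {
            int len = base + (r < extra ? 1 : 0);
            result.add(payload.substring(pos, pos + len));
            pos += len;
        }
        return result;
    }

    /** Reads the rows column by column: [0][0], [1][0], ..., then [0][1], [1][1], ... */
    static String interleave(List<String> rows) {
        StringBuilder sb = new StringBuilder();
        int longest = rows.get(0).length();    // the first row is never shorter than the others
        for (int col = 0; col < longest; col++) {
            for (String row : rows) {
                if (col < row.length()) {      // short rows have no character in the last column
                    sb.append(row.charAt(col));
                }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String location = "4h%2Fxit7645F8219186pt3Ffi.%8%19%%%736733tA%3an2927%52569_5.p%2.meF2F52E5E9716m";
        List<String> rows = splitRows(location);
        for (String row : rows) {
            System.out.println(row.length() + "  " + row);
        }
        System.out.println(interleave(rows));
    }
}
```

For the sample value this prints four rows of lengths 20, 20, 19, and 19, followed by the percent-encoded address beginning with http%3A%2F%2Ff3.xiami.net.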
I normally work in Java, so I translated this code into Java:
    import java.io.UnsupportedEncodingException;
    import java.net.URLDecoder;

    public static String getLocation(String location) throws UnsupportedEncodingException {
        // The first character is the number of rows; the rest is the scrambled payload.
        int _local2 = Integer.parseInt(location.substring(0, 1));
        String _local3 = location.substring(1);
        int _local4 = _local3.length() / _local2; // length of a short row
        int _local5 = _local3.length() % _local2; // number of rows with one extra character
        String[] _local6 = new String[_local2];

        // The first _local5 rows are (_local4 + 1) characters long.
        int _local7 = 0;
        while (_local7 < _local5) {
            int start = (_local4 + 1) * _local7;
            _local6[_local7] = _local3.substring(start, start + _local4 + 1);
            _local7++;
        }
        // The remaining rows are _local4 characters long.
        _local7 = _local5;
        while (_local7 < _local2) {
            int start = (_local4 * (_local7 - _local5)) + ((_local4 + 1) * _local5);
            _local6[_local7] = _local3.substring(start, start + _local4);
            _local7++;
        }

        // Read the rows column by column to undo the scrambling.
        String _local8 = "";
        _local7 = 0;
        while (_local7 < _local6[0].length()) {
            int _local10 = 0;
            while (_local10 < _local6.length) {
                if (_local7 >= _local6[_local10].length()) {
                    break; // the shorter rows have no character in the last column
                }
                _local8 = _local8 + _local6[_local10].charAt(_local7);
                _local10++;
            }
            _local7++;
        }

        // Percent-decode, then turn every '^' back into '0'.
        _local8 = URLDecoder.decode(_local8, "UTF-8");
        String _local9 = "";
        _local7 = 0;
        while (_local7 < _local8.length()) {
            if (_local8.charAt(_local7) == '^') {
                _local9 = _local9 + "0";
            } else {
                _local9 = _local9 + _local8.charAt(_local7);
            }
            _local7++;
        }
        _local9 = _local9.replace("+", " ");
        return _local9;
    }
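With the content of the location tag as input, the output is exactly the real MP3 address I wanted.
Here is my general approach to this kind of problem; it applies to resolving the real addresses of both video and music. Start by capturing the browser's traffic, which usually yields the real address directly. That is not enough for a program that fetches addresses automatically, though: you need to know how the address is generated. Tudou video, for example, fetches an XML document with a single request, and the video address is right there in the XML; that is the simplest case. With Youku, on the other hand, packet capture alone does not reveal how the real address is computed. In that case you need to decompile the Flash player and translate its code into whatever language you prefer.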