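I listen to VOK (Voice of Korea) a lot, but clicking through the links one by one is very inconvenient.
So I plan to write a crawler that automatically fetches VOK's Chinese-language broadcasts and then uses open-source tools to merge them into MP3 files.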
The value of the data query parameter is encoded with encodeURI.
Before encoding, the data looks like this:
{"databaseName":"cbc","tableName1":"pd","tableName2":"pro","FIXTITLEID":"34","CHANNEL":"4"}
{"databaseName":"cbc","tableName1":"pd","tableName2":"pro","FIXTITLEID":"13","CHANNEL":"4","LIMIT":"2"}
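FIXTITLEID is the news ID.
CHANNEL selects the language; 4 is Chinese.
The curl commands below fetch the current news content: the first calls view.php, the second calls viewlimit.php with an extra LIMIT field. They appear to have been copied from the browser in Windows cmd quoting, where "%"22 stands for %22, the URL-encoded double quote.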
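As a rough sketch of how the data parameter could be built in a crawler: the helper below serializes the fields to compact JSON and percent-encodes them. The name build_url and the safe-character list are my own assumptions; the list approximates what JavaScript's encodeURI leaves unescaped, so braces are escaped here as well, which should decode to the same payload on the server.

```python
import json
from urllib.parse import quote

# Punctuation that JavaScript's encodeURI leaves unescaped (approximation).
ENCODE_URI_SAFE = ";,/?:@&=+$-_.!~*'()#"

# Base path taken from the curl commands in this post.
BASE = "http://www.vok.rep.kp/model/"


def build_url(endpoint, fixtitleid, channel="4", limit=None):
    """Return the request URL for view.php or viewlimit.php (hypothetical helper)."""
    params = {
        "databaseName": "cbc",
        "tableName1": "pd",
        "tableName2": "pro",
        "FIXTITLEID": str(fixtitleid),
        "CHANNEL": str(channel),
    }
    if limit is not None:
        params["LIMIT"] = str(limit)
    # Compact separators reproduce the payload without any spaces.
    raw = json.dumps(params, separators=(",", ":"))
    return BASE + endpoint + "?data=" + quote(raw, safe=ENCODE_URI_SAFE)


print(build_url("viewlimit.php", 13, limit=2))
```

Passing limit=None leaves the LIMIT field out, which matches the first payload shown above.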
curl "http://www.vok.rep.kp/model/view.php?data={"%"22databaseName"%"22:"%"22cbc"%"22,"%"22tableName1"%"22:"%"22pd"%"22,"%"22tableName2"%"22:"%"22pro"%"22,"%"22FIXTITLEID"%"22:"%"2234"%"22,"%"22CHANNEL"%"22:"%"224"%"22}" -H "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0" -H "Accept: application/json, text/javascript, */*; q=0.01" -H "Accept-Language: zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2" --compressed -H "X-Requested-With: XMLHttpRequest" -H "Connection: keep-alive" -H "Referer: http://www.vok.rep.kp/index.php?CHANNEL=4&lang=" -H "Cookie: PHPSESSID=mh0uh8gdhbh68gf3v7f59cdmc3"
curl "http://www.vok.rep.kp/model/viewlimit.php?data={"%"22databaseName"%"22:"%"22cbc"%"22,"%"22tableName1"%"22:"%"22pd"%"22,"%"22tableName2"%"22:"%"22pro"%"22,"%"22FIXTITLEID"%"22:"%"2213"%"22,"%"22CHANNEL"%"22:"%"224"%"22,"%"22LIMIT"%"22:"%"222"%"22}" -H "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0" -H "Accept: application/json, text/javascript, */*; q=0.01" -H "Accept-Language: zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2" --compressed -H "X-Requested-With: XMLHttpRequest" -H "Connection: keep-alive" -H "Referer: http://www.vok.rep.kp/index.php?CHANNEL=4&lang=" -H "Cookie: PHPSESSID=mh0uh8gdhbh68gf3v7f59cdmc3"
The viewlimit.php request returns:
[{"EDATE":"2019-10-06 00:00:00","FIXTITLEID":"13","ORD":"0","REGID":"ice190906039","TITLE":"조중친선의 잊지 못할 자욱들","CHANNEL":"4","FLAG":"0","PDID":"ice190906039","CONTENT":"","CONTENTKIND":"4","FTITLE":"中朝友谊的光辉历史篇章","CTITLE":"中朝友谊的光辉历史篇章","LTITLE":"中朝友谊的光辉历史篇章","KINDID":"50"},
{"EDATE":"2019-10-06 00:00:00","FIXTITLEID":"13","ORD":"1","REGID":"ice190809024","TITLE":"조선의 현실을 보다\n-제2회-","CHANNEL":"4","FLAG":"0","PDID":"ice190809024","CONTENT":"","CONTENTKIND":"10","FTITLE":"看到了朝鲜现实（第二回）","CTITLE":"看到了朝鲜现实（第二回）","LTITLE":"看到了朝鲜现实（第二回）","KINDID":"60"}]
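A minimal sketch of pulling the useful fields out of such a response. The function name list_programs is made up for illustration, and treating CTITLE as the Chinese title is an assumption based only on the sample above, where FTITLE, CTITLE and LTITLE all hold the same text.

```python
import json


def list_programs(body):
    """Parse a response body into (REGID, title) pairs (hypothetical helper)."""
    # The raw body starts with a space; json.loads tolerates leading whitespace.
    items = json.loads(body)
    # Assumption: CTITLE carries the Chinese title shown on the site.
    return [(item["REGID"], item["CTITLE"]) for item in items]


# Trimmed-down sample with only the two fields used here.
sample = (
    ' [{"REGID":"ice190906039","CTITLE":"中朝友谊的光辉历史篇章"},'
    '{"REGID":"ice190809024","CTITLE":"看到了朝鲜现实（第二回）"}]'
)

for regid, title in list_programs(sample):
    print(regid, title)
```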
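In the response, REGID is the MP3 file name.
Substitute it for the ID in the URL below (it appears twice: in the directory name after cbc_ and in the file name itself):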
http://175.45.176.83/vod/media/cbc_pddata/cbc_ice190906039/ice190906039.mp3
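The substitution can be sketched as a one-line helper. The URL pattern is inferred from the single example above, so treat it as an assumption rather than a documented scheme; mp3_url is a name I made up.

```python
# Host and path copied from the example URL in this post.
MEDIA_BASE = "http://175.45.176.83/vod/media/cbc_pddata"


def mp3_url(regid):
    """Build the MP3 URL for a REGID, assuming the pattern of the example URL."""
    # The REGID is used twice: in the cbc_ directory and as the file name.
    return f"{MEDIA_BASE}/cbc_{regid}/{regid}.mp3"


print(mp3_url("ice190809024"))
```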
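Note: in my testing, the VOK site has a bug, and some MP3 files cannot be downloaded.
My hybrid-app experiments have been going reasonably well lately, so when I have time I will put together a VOK Android app and see how it goes.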
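Because some files fail, a crawler probably should not abort on the first error. Below is one possible sketch that records failures and carries on; download_all and fetch_bytes are hypothetical names, and the fetch argument exists only so the loop can be exercised without touching the network.

```python
import os
import urllib.error
import urllib.request


def fetch_bytes(url, timeout=30):
    """Download a URL and return its body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def download_all(urls, dest_dir, fetch=fetch_bytes):
    """Save each URL into dest_dir; return (saved_paths, failed) instead of raising."""
    os.makedirs(dest_dir, exist_ok=True)
    saved, failed = [], []
    for url in urls:
        # Use the last path segment (e.g. ice190906039.mp3) as the file name.
        path = os.path.join(dest_dir, url.rsplit("/", 1)[-1])
        try:
            data = fetch(url)
        except (urllib.error.URLError, OSError) as exc:
            # HTTP errors and connection problems land here; note them and move on.
            failed.append((url, str(exc)))
            continue
        with open(path, "wb") as fh:
            fh.write(data)
        saved.append(path)
    return saved, failed
```

The failed list can be printed or retried later, which makes it easy to see which broadcasts are affected by the bug.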