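Fetching iTunes page information with PHP
Goal: use PHP curl to fetch the iTunes data for a given package and print it in a formatted form.
Steps: build the curl request (the demo is written for Laravel; adapt it yourself for other frameworks)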
// $url must be a valid iTunes URL
public static function creativeGet($url, $post_data = false, $ignore_ssl = true, $dataType = 'text')
{
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_USERAGENT, 'Chrome 42.0.2311.135 Pentamob');
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($curl, CURLOPT_URL, $url);
    if ($ignore_ssl) {
        // Skipping certificate checks is convenient for local testing but unsafe in production
        curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false); // trust any certificate
        curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, 0); // 0 = do not check that the certificate matches the host name
    }
    // My config entry is 'proxy' => ['host' => '127.0.0.1', 'port' => '1080', 'type' => CURLPROXY_SOCKS5],
    $proxy = config('app.proxy');
    if ($proxy) {
        curl_setopt($curl, CURLOPT_HTTPPROXYTUNNEL, true);
        curl_setopt($curl, CURLOPT_PROXYAUTH, CURLAUTH_BASIC);
        curl_setopt($curl, CURLOPT_PROXYTYPE, $proxy['type']);
        curl_setopt($curl, CURLOPT_PROXY, $proxy['host']);
        curl_setopt($curl, CURLOPT_PROXYPORT, $proxy['port']);
    }
    if ($post_data) {
        curl_setopt($curl, CURLOPT_POST, 1);
        curl_setopt($curl, CURLOPT_POSTFIELDS, $post_data);
    }
    $data = curl_exec($curl);
    $status = curl_getinfo($curl);
    $error_info = [ // collect the error details before the handle is closed
        'error_no' => curl_errno($curl),
        'error_info' => $status,
        'error_msg' => curl_error($curl),
        'result' => $data
    ];
    curl_close($curl);
    // On HTTP 200 return the body (decoded when $dataType is 'json'); otherwise return the error array
    if (isset($status['http_code']) && $status['http_code'] == 200) {
        if ($dataType == 'json') {
            $data = json_decode($data, true);
        }
        return $data;
    } else {
        return $error_info;
    }
}
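Because creativeGet returns the response body on HTTP 200 and an error array otherwise, a caller has to tell the two apart before using the result. The following is a minimal sketch of one way to do that. The helper names isCurlFailure and describeCurlFailure are hypothetical (they are not part of the original demo), and the sample error array is made up to mirror the shape of $error_info above.

```php
<?php
// Hypothetical helper: true when $response looks like the error array
// that creativeGet returns for a non-200 response.
function isCurlFailure($response): bool
{
    return is_array($response)
        && array_key_exists('error_no', $response)
        && array_key_exists('error_msg', $response);
}

// Hypothetical helper: turns the error array into a one-line message for logging.
function describeCurlFailure(array $error): string
{
    $httpCode = $error['error_info']['http_code'] ?? 0;
    return sprintf('cURL error %d: %s (HTTP %d)', $error['error_no'], $error['error_msg'], $httpCode);
}

// Made-up example of a failed request, shaped like $error_info.
$failed = [
    'error_no'   => 28,
    'error_info' => ['http_code' => 0],
    'error_msg'  => 'Operation timed out',
    'result'     => false,
];

if (isCurlFailure($failed)) {
    echo describeCurlFailure($failed), PHP_EOL;
}
```

Note that with $dataType set to 'json' a successful call also returns an array, which is why the check looks at the error keys and not only at the type.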
Build the formatted output:
public static function dealiTune($package_name = '')
{
    // The id I tested with is 297606951
    if (!$package_name) {
        return 'package_name must not be empty';
    }
    $url = 'https://itunes.apple.com/us/lookup?id=' . urlencode($package_name);
    $html_doc = self::creativeGet($url);
    if (!is_string($html_doc)) {
        // creativeGet returns an error array (not a string) when the request fails
        return 'Request failed: ' . $html_doc['error_msg'];
    }
    $html_json_data = json_decode($html_doc, true);
    if (!is_array($html_json_data) || empty($html_json_data['resultCount'])) {
        return 'No app was found for ' . $package_name;
    }
    $app = $html_json_data['results'][0]; // the first match
    $result = [
        'name' => $app['trackCensoredName'],
        'icon' => $app['artworkUrl100'],
        'description' => $app['description'],
        'min_os_vs' => $app['minimumOsVersion'],
        'category' => $app['primaryGenreName'],
    ];
    return response()->json($result); // Laravel's response()->json helper
}
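The URL building and field extraction in dealiTune can also be written as plain functions, which makes them easy to try without Laravel or a network call. This is only a sketch under stated assumptions: buildLookupUrl and extractAppInfo are hypothetical names, the country parameter is my own addition (the demo hard-codes 'us'), and the sample payload is made up to imitate the fields the demo reads.

```php
<?php
// Hypothetical helper: builds the lookup URL used in the demo for a given app id.
function buildLookupUrl(string $appId, string $country = 'us'): string
{
    return 'https://itunes.apple.com/' . rawurlencode($country)
        . '/lookup?' . http_build_query(['id' => $appId]);
}

// Hypothetical helper: picks the fields the demo outputs from a decoded
// lookup response. Returns null when the response has no results.
function extractAppInfo(?array $payload): ?array
{
    if (empty($payload['resultCount']) || empty($payload['results'][0])) {
        return null;
    }
    $app = $payload['results'][0];
    return [
        'name'        => $app['trackCensoredName'] ?? null,
        'icon'        => $app['artworkUrl100'] ?? null,
        'description' => $app['description'] ?? null,
        'min_os_vs'   => $app['minimumOsVersion'] ?? null,
        'category'    => $app['primaryGenreName'] ?? null,
    ];
}

// Made-up payload with the same field names the demo reads.
$sample = json_decode(
    '{"resultCount":1,"results":[{"trackCensoredName":"Example App",'
    . '"artworkUrl100":"https://example.com/icon.jpg","description":"Demo text",'
    . '"minimumOsVersion":"9.0","primaryGenreName":"Shopping"}]}',
    true
);

echo buildLookupUrl('297606951'), PHP_EOL;
echo json_encode(extractAppInfo($sample)), PHP_EOL;
```

Using the null-coalescing operator for each field means a missing key yields null instead of an undefined-index notice, which is a design choice of this sketch and not something the original demo does.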
Result:
{
    "name": "Amazon - Shopping made easy",
    "icon": "https://is3-ssl.mzstatic.com/image/thumb/Purple118/v4/43/5e/87/435e87d8-d948-1678-7027-21f1570a1b41/source/100x100bb.jpg",
    "description": "International Shopping \nBrowse.............",
    "min_os_vs": "9.0",
    "category": "Shopping"
}
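Closing note: this is a simple way to fetch an app's data from iTunes with curl. The next post will do the same for the Google Play store.
Note: if you repost this article, please credit the source.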