Scraping iTunes page information with PHP

 

Goal: use PHP curl to fetch the iTunes data for a given package and output it in a formatted form.

 

Steps: build the curl request (the demo is written for Laravel; adapt it yourself for other frameworks)

 

// $url is a valid iTunes URL
public static function creativeGet($url, $post_data = false, $ignore_ssl = true, $dataType = 'text')
{
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_USERAGENT, 'Chrome 42.0.2311.135 Pentamob');
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1); // return the body instead of printing it
    curl_setopt($curl, CURLOPT_URL, $url);

    if ($ignore_ssl) {
        curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false); // trust any certificate
        curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, 0);     // 0: do not check that the certificate matches the host name
    }

    // My example config: 'proxy' => ['host' => '127.0.0.1', 'port' => '1080', 'type' => CURLPROXY_SOCKS5],
    $proxy = config('app.proxy');
    if ($proxy) {
        curl_setopt($curl, CURLOPT_HTTPPROXYTUNNEL, true);
        curl_setopt($curl, CURLOPT_PROXYAUTH, CURLAUTH_BASIC);
        curl_setopt($curl, CURLOPT_PROXYTYPE, $proxy['type']);
        curl_setopt($curl, CURLOPT_PROXY, $proxy['host']);
        curl_setopt($curl, CURLOPT_PROXYPORT, $proxy['port']);
    }

    if ($post_data) {
        curl_setopt($curl, CURLOPT_POST, 1);
        curl_setopt($curl, CURLOPT_POSTFIELDS, $post_data);
    }

    $data = curl_exec($curl);
    $status = curl_getinfo($curl);
    $error_info = [ // assemble the error details
        'error_no' => curl_errno($curl),
        'error_info' => $status,
        'error_msg' => curl_error($curl),
        'result' => $data
    ];

    curl_close($curl);
    if (isset($status['http_code']) && $status['http_code'] == 200) {
        if ($dataType == 'json') {
            $data = json_decode($data, true);
        }
        return $data; // response body (decoded when $dataType is 'json')
    } else {
        return $error_info; // anything other than HTTP 200 is reported as an error array
    }
}
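Because creativeGet returns the response body on success and an error array otherwise, the caller has to tell the two apart. A minimal sketch of such a check follows; `isCreativeGetError` is a hypothetical helper name, not part of the original demo, and it assumes the error array keeps the keys shown above.

```php
<?php
// Hypothetical helper (not in the original demo): returns true when $result
// has the shape of the error array that creativeGet() builds on failure.
function isCreativeGetError($result): bool
{
    return is_array($result)
        && array_key_exists('error_no', $result)
        && array_key_exists('error_msg', $result)
        && array_key_exists('error_info', $result);
}

// Hand-written values for illustration only.
$body = '{"resultCount":1}';
$failure = [
    'error_no'   => 7,
    'error_info' => ['http_code' => 0],
    'error_msg'  => 'Failed to connect',
    'result'     => false,
];

var_dump(isCreativeGetError($body));    // bool(false)
var_dump(isCreativeGetError($failure)); // bool(true)
```

Note that a decoded JSON response which happens to contain these three keys would be misclassified, so this check is only safe for text responses or for APIs whose payloads are known.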

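Build the formatted output (the lookup URL returns JSON, so the response only needs to be decoded and mapped):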

public static function dealiTune($package_name = '')
{
    // The id I tested with is 297606951
    if (!$package_name) {
        return 'package_name must not be empty';
    }

    $url = 'https://itunes.apple.com/us/lookup?id='.$package_name;
    $html_doc = self::creativeGet($url);
    if (!is_string($html_doc)) {
        // creativeGet returns an error array instead of the body when the request fails
        return 'Request for '.$package_name.' failed';
    }
    $html_json_data = json_decode($html_doc, true);

    if (empty($html_json_data['resultCount'])) {
        return 'Package '.$package_name.' was not found';
    }

    $app = $html_json_data['results'][0];
    $result = [];
    $result['name'] = $app['trackCensoredName'];
    $result['icon'] = $app['artworkUrl100'];
    $result['description'] = $app['description'];
    $result['min_os_vs'] = $app['minimumOsVersion'];
    $result['category'] = $app['primaryGenreName'];

    return response()->json($result); // Laravel's response()->json helper
}
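The field mapping can also be pulled out into a pure function, which makes it testable without a network call or Laravel. This is only a sketch: `extractAppInfo` is a hypothetical name, and the sample input is hand-written in the shape of a lookup response, with values borrowed from the result in this post.

```php
<?php
// Hypothetical helper (not in the original demo): map a decoded lookup
// response to the fields used in dealiTune(). Returns null when the
// response contains no results; missing fields become null.
function extractAppInfo(array $lookup): ?array
{
    if (empty($lookup['resultCount']) || empty($lookup['results'][0])) {
        return null;
    }

    $app = $lookup['results'][0];

    return [
        'name'        => $app['trackCensoredName'] ?? null,
        'icon'        => $app['artworkUrl100'] ?? null,
        'description' => $app['description'] ?? null,
        'min_os_vs'   => $app['minimumOsVersion'] ?? null,
        'category'    => $app['primaryGenreName'] ?? null,
    ];
}

// Hand-written sample, trimmed to three fields for brevity.
$sample = json_decode(
    '{"resultCount":1,"results":[{"trackCensoredName":"Amazon - Shopping made easy",'
    . '"minimumOsVersion":"9.0","primaryGenreName":"Shopping"}]}',
    true
);

$info = extractAppInfo($sample);
echo $info['name'], "\n";
```

Using `??` means a field that is absent from the response yields null instead of an "undefined index" warning.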
Result:

  {
    "name": "Amazon - Shopping made easy",
    "icon": "https://is3-ssl.mzstatic.com/image/thumb/Purple118/v4/43/5e/87/435e87d8-d948-1678-7027-21f1570a1b41/source/100x100bb.jpg",
    "description": "International Shopping \nBrowse.............",
    "min_os_vs": "9.0",
    "category": "Shopping"
  }
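Outside Laravel, similar human-readable output can be produced with plain `json_encode`. By default `json_encode` escapes `/` as `\/` and prints everything on one line, so the sketch below passes flags to change both. The array values are placeholders, and the icon URL is a made-up example address.

```php
<?php
// Placeholder data in the same shape as the result above.
$result = [
    'name'      => 'Amazon - Shopping made easy',
    'icon'      => 'https://example.com/icon.jpg', // made-up URL for illustration
    'min_os_vs' => '9.0',
    'category'  => 'Shopping',
];

// JSON_PRETTY_PRINT adds line breaks and indentation,
// JSON_UNESCAPED_SLASHES keeps "/" readable in URLs,
// JSON_UNESCAPED_UNICODE keeps non-ASCII text as-is.
$json = json_encode(
    $result,
    JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE
);

echo $json, "\n";
```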


Closing note: this is a simple way to fetch an app's data from iTunes with curl. The next post will do the same for the Google Play store.

Note: if you repost this article, please credit the source.
posted @ 2018-08-22 17:51  一七令