PHP采集淘宝商品
项目需求:
1.通过PHP程序更新所采集淘宝商品的价格以及是否停售
数据表:
CREATE TABLE `goods` ( `id` int(11) NOT NULL AUTO_INCREMENT , `type` varchar(30) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT '' , `keyid` varchar(60) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT '' , `shop_id` int(11) NULL DEFAULT 0 , `cid` smallint(6) NULL DEFAULT 0 , `img_id` int(11) NULL DEFAULT 0 , `imgs` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL , `name` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT '' , `url` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT '' , `taoke_url` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT '' , `price` decimal(10,2) NULL DEFAULT 0.00 , `sellerid` int(11) UNSIGNED NULL DEFAULT NULL , `is_off_sale` tinyint(1) UNSIGNED NULL DEFAULT 0 , `delist_time` int(11) NULL DEFAULT 0 , `create_time` int(11) NULL DEFAULT 0 , `ctime` int(11) NULL DEFAULT NULL , `cache_data` text CHARACTER SET utf8 COLLATE utf8_general_ci NULL , `create_day` int(11) NULL DEFAULT 0 , `commission` decimal(10,2) NULL DEFAULT 0.00 , `comment_collect_time` int(11) NULL DEFAULT 0 , `color` smallint(6) NULL DEFAULT 0 , `content` text CHARACTER SET utf8 COLLATE utf8_general_ci NULL , `taobao_desc_url` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL DEFAULT '\'\'' COMMENT '淘宝商品详细介绍地址' , PRIMARY KEY (`id`), UNIQUE INDEX `keyid` (`keyid`) USING BTREE , INDEX `shop_id` (`shop_id`) USING BTREE , INDEX `delist_time` (`delist_time`) USING BTREE , INDEX `cid` (`cid`) USING BTREE , INDEX `create_day` (`create_day`) USING BTREE ) ENGINE=InnoDB DEFAULT CHARACTER SET=utf8 COLLATE=utf8_general_ci AUTO_INCREMENT=40614 ROW_FORMAT=COMPACT ;
PHP文件:
<?php $con=mysql_connect('localhost','root',''); mysql_select_db("fanwe", $con); $start_time=microtime(1); $sql="SELECT id,price,url FROM s_goods WHERE url!='' AND keyid like 'taobao%' AND is_off_sale=0 ORDER BY id LIMIT 10";//更新前10个 $rs=mysql_query($sql); echo 'COUNT:'.mysql_num_rows($rs)."\n"; $error=array(); $i=0; $h=fopen('d:/mydomain/updateprice.log','a+'); while ($r=mysql_fetch_array($rs)){ $s=microtime(1); echo $i++.':'.$r['id']."\t"; $price=getPrice($r['url']); $t=microtime(1)-$s; echo (ceil($t*1000)/1000)."\t"; if($price===false) echo 'FALSE'; elseif(bccomp($price,$r['price'])==0) echo 'EQUAL'; else{ echo "UPDATE\t".$r['price']."\t".$price; mysql_query("UPDATE s_goods SET price=".$price." WHERE id=".$r['id']); fputcsv($h,array(date('Y-m-d H:i:s'),$r['id'],$r['price'],$price)); } echo "\n"; } fclose($h); echo 'COUNT:'.mysql_num_rows($rs)."\tTIME:".ceil(microtime(1)-$start_time); function getPrice($url,$time=1){ $des_url=''; $ch = curl_init(); curl_setopt($ch, CURLOPT_USERAGENT,'Mozilla/5.0 (Windows NT 6.2; WOW64; rv:26.0) Gecko/20100101 Firefox/26.0'); curl_setopt($ch, CURLOPT_REFERER,'http://www.tmall.com/'); curl_setopt($ch, CURLOPT_FOLLOWLOCATION,1); curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);//设置输出方式, 0为自动输出返回的内容, 1为返回输出的内容,但不自动输出. curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30); //timeout on connect curl_setopt($ch, CURLOPT_TIMEOUT, 30); //timeout on response curl_setopt($ch, CURLOPT_HEADER, 1);//是否输出头信息,0为不输出,非零则输出 curl_setopt($ch, CURLOPT_MAXREDIRS, 10 ); curl_setopt($ch, CURLOPT_URL, $url); $content = curl_exec($ch); curl_close($ch); if(preg_match("/'reservePrice'\s*:\s*'([\d\.]+?)',/",$content,$price)){ $price = (float)$price[1]; }elseif(preg_match('/price:([\d\.]+?),/',$content,$price)){ $price = (float)$price[1]; } if(!$price&&preg_match('/tmall/',$url)){//天猫促销价 add LiuYang 2014-02-24 15:10 preg_match('/id=(\d+)+/',$url,$temp); $url2="http://mdskip.taobao.com/core/initItemDetail.htm?itemId=".$temp[1]; $ch = curl_init(); curl_setopt( $ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3) Gecko/20041001 Firefox/0.10.1" ); curl_setopt( $ch, CURLOPT_URL, $url2 ); curl_setopt( $ch, CURLOPT_FOLLOWLOCATION, true ); curl_setopt( $ch, CURLOPT_ENCODING, "" ); curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true ); curl_setopt( $ch, CURLOPT_REFERER, 'http://www.tmall.com' ); curl_setopt( $ch, CURLOPT_AUTOREFERER, true ); curl_setopt( $ch, CURLOPT_SSL_VERIFYPEER, false ); curl_setopt( $ch, CURLOPT_CONNECTTIMEOUT, 10 ); curl_setopt( $ch, CURLOPT_TIMEOUT, 10 ); curl_setopt( $ch, CURLOPT_MAXREDIRS, 10 ); $price_content = curl_exec( $ch ); $response = curl_getinfo( $ch ); curl_close ( $ch ); $price_content=json_decode(iconv('gbk','utf-8',preg_replace('/(\d{10,}):/','"${1}":',$price_content)),true); $priceinfo=$price_content['defaultModel']['itemPriceResultDO']['priceInfo']; $price=array(); foreach ($priceinfo as $v){ $price[]=$v['price']; if(is_array($v['promotionList'])){ foreach ($v['promotionList'] as $v2){ $price[]=$v2['extraPromPrice']?$v2['extraPromPrice']:$v2['price']; } } if(is_array($v['suggestivePromotionList'])){ foreach ($v['suggestivePromotionList'] as $v2){ $price[]=$v2['extraPromPrice']?$v2['extraPromPrice']:$v2['price']; } } } $price=count($price)>0?min($price):false; } if($price) return $price; elseif($time==1) return getPrice($url,2); else return false; } ?>
执行方式如果采用apache或nginx等服务器,会因为各个服务器的最大响应时间而受影响.如果只更新10个那可能会完成,如果是上百个肯定是不能完全更新的.
可以采用PHP命令的等式来执行.
D:\mydomain>php updateprice.php COUNT:10 0:36599 0.232 EQUAL 1:36600 1.091 EQUAL 2:36601 1.039 EQUAL 3:36603 1.08 EQUAL 4:36604 0.984 EQUAL 5:36605 0.972 EQUAL 6:36610 1.019 EQUAL 7:36611 0.971 EQUAL 8:36612 1.048 EQUAL 9:36613 1.149 EQUAL COUNT:10 TIME:10
如此既清晰又明了,而且会一直执行到程序完成,不会担心服务器没有响应.
上面是在windows下面执行,如果输入php提示:
'php' 不是内部或外部命令,也不是可运行的程序
或批处理文件。
就必须把php.exe所在的地址补全
D:\mydomain>D:\AppServ\php5\php updatePrice.php
或者把php.exe所在的地址加入全局变量
此方法在Linux下同样有用只用修改对应的地址即可,在linux中php命令是可以直接用的.
[root@liu ~]# php updatePrice.php
此方法有一个缺点,就是执行效率问题.一个商品采集平均需要0.8秒.那10000个商品采集完需要2个半小时.
如想继续了解,请看下篇.