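Garbled text when fetching web page content
So far I have found two causes:
1. Character encoding mismatch. The page is served in one encoding (GBK here) but is output as if it were UTF-8.
Fix: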
<?php
$url = "http://news.ef360.com/Articles/2013-3-8/299954.html";
$contents = file_get_contents($url);
// Convert from GBK to UTF-8; //IGNORE drops characters that cannot be converted
$contents = iconv("GBK", "UTF-8//IGNORE", $contents);
echo $contents;
?>
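The snippet above hard-codes GBK as the source encoding. When the encoding is not known in advance, one option is to read it from the page's <meta> tag before converting. The following is only a minimal sketch: guess_charset and to_utf8 are hypothetical helper names that do not appear in the original post, and the sketch assumes the page declares its charset in a <meta> tag and that the iconv extension is available.

```php
<?php
// Hypothetical helper: guess the charset of an HTML document from its
// <meta> tag, falling back to a caller-supplied default.
function guess_charset(string $html, string $default = "UTF-8"): string {
    // Matches both <meta charset="gbk"> and
    // <meta http-equiv="Content-Type" content="text/html; charset=gb2312">
    if (preg_match('/<meta[^>]+charset\s*=\s*["\']?\s*([\w-]+)/i', $html, $m)) {
        return strtoupper($m[1]);
    }
    return $default;
}

// Hypothetical helper: convert a page to UTF-8 based on the guessed charset.
function to_utf8(string $html): string {
    $charset = guess_charset($html);
    if ($charset === "UTF-8") {
        return $html; // nothing to convert
    }
    // GB2312 is a subset of GBK, so decoding it as GBK is safe
    if ($charset === "GB2312") {
        $charset = "GBK";
    }
    $converted = iconv($charset, "UTF-8//IGNORE", $html);
    // If iconv rejects the charset name, return the input unchanged
    return $converted === false ? $html : $converted;
}
```

Usage would look like echo to_utf8(file_get_contents($url));. A charset sent in the HTTP Content-Type header is not consulted here and would need separate handling.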
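2. The target page is served with Gzip compression, so the raw response body is compressed bytes rather than text.
Fix:
@ When fetching with curl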
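<?php
function curl_get($url, $gzip = false) {
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);  // return the body instead of printing it
    curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 10); // give up connecting after 10 seconds
    if ($gzip) {
        // This is the key line: request gzip and let curl decompress the response
        curl_setopt($curl, CURLOPT_ENCODING, "gzip");
    }
    $content = curl_exec($curl);
    curl_close($curl);
    return $content;
}
?>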
@ When fetching with file_get_contents
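<?php
// The compress.zlib:// wrapper decompresses the stream as it is read (requires the zlib extension)
$contents = file_get_contents("compress.zlib://" . $url);
?>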
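Both fixes above assume you already know the page is gzipped. If the body has been fetched as raw bytes and you are not sure, one possible fallback is to check for the gzip magic bytes and decompress only when they are present. This is a rough sketch under that assumption: maybe_gunzip is a hypothetical name that is not part of the original post, and it relies on the zlib extension for gzdecode.

```php
<?php
// Hypothetical helper: decompress a response body only if it looks gzipped.
function maybe_gunzip(string $body): string {
    // Every gzip stream begins with the two magic bytes 0x1f 0x8b
    if (strncmp($body, "\x1f\x8b", 2) !== 0) {
        return $body; // not gzip, leave it alone
    }
    // Suppress the warning gzdecode raises on corrupt data and fall back instead
    $decoded = @gzdecode($body);
    return $decoded === false ? $body : $decoded;
}
```

Usage would look like $contents = maybe_gunzip(file_get_contents($url));. Plain responses pass through untouched, so the call is safe to apply to every page.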