Several Ways to Fetch Web Page Content in PHP

Method 1: Fetch content with a GET request using file_get_contents
 
<?php
$url = 'http://www.domain.com/?para=123';
// file_get_contents() returns false if the request fails
$html = file_get_contents($url);
if ($html !== false) {
    echo $html;
}
?>
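Hand-writing the query string as in `?para=123` breaks as soon as a value contains spaces or `&`. A minimal sketch, assuming a hypothetical helper `build_get_url()`, lets `http_build_query()` do the encoding; a second hypothetical helper `fetch_get()` adds a read timeout through the `timeout` option of the `http` stream context (it needs network access, so it is defined but not called here):

```php
<?php
// Hypothetical helper: append URL-encoded query parameters to a base URL.
function build_get_url(string $base, array $params): string
{
    if ($params === []) {
        return $base;
    }
    // Use '&' if the base URL already carries a query string, '?' otherwise.
    $separator = (strpos($base, '?') === false) ? '?' : '&';
    return $base . $separator . http_build_query($params);
}

// Hypothetical helper: GET a URL with a read timeout in seconds.
// Returns the body as a string, or false if the request failed.
function fetch_get(string $url, float $timeout = 5.0)
{
    $context = stream_context_create([
        'http' => ['method' => 'GET', 'timeout' => $timeout],
    ]);
    return @file_get_contents($url, false, $context);
}

// Prints http://www.domain.com/?para=123&q=a+b (the space is encoded as '+').
echo build_get_url('http://www.domain.com/', ['para' => 123, 'q' => 'a b']), "\n";
```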
 
Method 2: Request a URL with POST using file_get_contents
 
<?php
$url = 'http://www.domain.com/test.php?id=123';
$data = array('foo' => 'bar');
$data = http_build_query($data);
$opts = array(
    'http' => array(
        'method'  => 'POST',
        'header'  => "Content-type: application/x-www-form-urlencoded\r\n" .
                     "Content-Length: " . strlen($data) . "\r\n",
        'content' => $data
    )
);
$ctx = stream_context_create($opts);
// The second argument is use_include_path (a boolean); the context goes third
$html = @file_get_contents($url, false, $ctx);
?>
 
To send cookie data as well, change

'header' => "Content-type: application/x-www-form-urlencoded\r\n" .
            "Content-Length: " . strlen($data) . "\r\n",

to

'header' => "Content-type: application/x-www-form-urlencoded\r\n" .
            "Content-Length: " . strlen($data) . "\r\n" .
            "Cookie: cookie1=c1; cookie2=c2\r\n",

and that is all. Because this line is an element of the options array, it ends with a comma, not a semicolon.
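Concatenating header strings by hand is where a missing `\r\n` or a stray semicolon creeps in. A minimal sketch with a hypothetical helper `post_options()` builds the same `http` context options from arrays, including an optional Cookie header. Percent-encoding cookie values with `rawurlencode()` is an assumption about what the server expects:

```php
<?php
// Hypothetical helper: build the 'http' stream context options for a POST
// request, optionally adding a Cookie header from an associative array.
function post_options(array $fields, array $cookies = []): array
{
    $body = http_build_query($fields);
    $headers = [
        'Content-Type: application/x-www-form-urlencoded',
        'Content-Length: ' . strlen($body),
    ];
    if ($cookies !== []) {
        $pairs = [];
        foreach ($cookies as $name => $value) {
            $pairs[] = $name . '=' . rawurlencode((string) $value);
        }
        $headers[] = 'Cookie: ' . implode('; ', $pairs);
    }
    return [
        'http' => [
            'method'  => 'POST',
            // Every header line ends with CRLF.
            'header'  => implode("\r\n", $headers) . "\r\n",
            'content' => $body,
        ],
    ];
}

$opts = post_options(['foo' => 'bar'], ['cookie1' => 'c1', 'cookie2' => 'c2']);
$ctx  = stream_context_create($opts);
// $html = @file_get_contents('http://www.domain.com/test.php?id=123', false, $ctx);
echo $opts['http']['header'];
```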
 
Method 3: Open the URL with fopen and read the content with GET
 
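<?php
$url = 'http://www.domain.com/?para=123'; // same example URL as Method 1
$fp = fopen($url, 'r');
if ($fp === false) {
    exit('Failed to open ' . $url);
}
// Stream metadata; for http:// the response header lines are in $header['wrapper_data']
$header = stream_get_meta_data($fp);
$result = '';
while (!feof($fp)) {
    $result .= fgets($fp, 1024);
}
echo "url header: " . print_r($header, true) . "<br>";
echo "url body: $result";
fclose($fp);
?>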
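For the http:// wrapper, the header lines returned by `stream_get_meta_data()` sit in its `wrapper_data` element, and after a redirect that list can hold more than one status line. A sketch, assuming a hypothetical helper `last_status_code()`, pulls out the final status code:

```php
<?php
// Hypothetical helper: return the status code of the last "HTTP/x.y NNN" line
// in a list of header lines (e.g. stream_get_meta_data($fp)['wrapper_data']),
// or null if no status line is present.
function last_status_code(array $headerLines): ?int
{
    $code = null;
    foreach ($headerLines as $line) {
        // Later status lines (after redirects) overwrite earlier ones.
        if (is_string($line) && preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $code = (int) $m[1];
        }
    }
    return $code;
}

// With a live stream (not run here):
// $meta = stream_get_meta_data($fp);
// $status = last_status_code($meta['wrapper_data'] ?? []);

$sample = ['HTTP/1.1 301 Moved Permanently', 'Location: /new', 'HTTP/1.1 200 OK', 'Content-Type: text/html'];
var_dump(last_status_code($sample)); // int(200)
```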
 
Method 4: Open the URL with fopen and send a POST request
 
<?php
$data = array('foo2' => 'bar2', 'foo3' => 'bar3');
$data = http_build_query($data);
$opts = array(
    'http' => array(
        'method'  => 'POST',
        'header'  => "Content-type: application/x-www-form-urlencoded\r\nCookie: cook1=c3; cook2=c4\r\n" .
                     "Content-Length: " . strlen($data) . "\r\n",
        'content' => $data
    )
);
$context = stream_context_create($opts);
$html = fopen('http://www.test.com/zzzz.php?id=i3&id2=i4', 'rb', false, $context);
if ($html !== false) {
    // fread($html, 1024) would return at most 1024 bytes; read the whole response instead
    $w = stream_get_contents($html);
    fclose($html);
    echo $w;
}
?>
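A single `fread()` call returns at most the number of bytes requested, and on network streams often fewer, so one `fread($handle, 1024)` can cut a page short. A sketch of the usual read loop, using a hypothetical helper `read_all()` and an in-memory `php://memory` stream so it runs without network access; `stream_get_contents()` does the same job in one call:

```php
<?php
// Hypothetical helper: read a stream to the end in fixed-size chunks.
function read_all($handle, int $chunkSize = 8192): string
{
    $buffer = '';
    while (!feof($handle)) {
        $chunk = fread($handle, $chunkSize);
        if ($chunk === false) {
            break; // read error
        }
        $buffer .= $chunk;
    }
    return $buffer;
}

// Simulate a 3000-byte response with an in-memory stream.
$handle = fopen('php://memory', 'r+');
fwrite($handle, str_repeat('x', 3000));
rewind($handle);

$firstRead = fread($handle, 1024); // only the first 1024 bytes
$rest      = read_all($handle);    // everything that is left
fclose($handle);

echo strlen($firstRead), ' + ', strlen($rest), "\n"; // prints "1024 + 1976"
```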
 
Method 5: Open the URL with fsockopen and fetch the complete response (header and body) with GET
 
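<?php
function get_url($url, $cookie = false)
{
    $url = parse_url($url);
    // Request path plus query string; default to "/" when the URL has no path
    $query = (isset($url['path']) ? $url['path'] : '/') .
             (isset($url['query']) ? '?' . $url['query'] : '');
    // echo "Query: " . $query; // debug output
    $fp = fsockopen($url['host'], isset($url['port']) ? $url['port'] : 80, $errno, $errstr, 30);
    if (!$fp) {
        return false;
    } else {
        // HTTP/1.0 keeps the server from replying with chunked transfer encoding
        $request  = "GET $query HTTP/1.0\r\n";
        $request .= "Host: {$url['host']}\r\n";
        $request .= "Connection: Close\r\n";
        if ($cookie) $request .= "Cookie: $cookie\r\n";
        $request .= "\r\n";
        fwrite($fp, $request);
        $result = '';
        while (!feof($fp)) {
            $result .= fgets($fp, 1024);
        }
        fclose($fp);
        return $result;
    }
}

// Get the HTML part of the URL, stripping the header
function GetUrlHTML($url, $cookie = false)
{
    $rowdata = get_url($url, $cookie);
    if ($rowdata)
    {
        // The header ends at the first blank line ("\r\n\r\n")
        $body = strstr($rowdata, "\r\n\r\n");
        if ($body !== false) {
            return substr($body, 4);
        }
    }
    return false;
}
?>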
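`GetUrlHTML()` throws the header away. When the header is needed too, the raw response can be split at the first blank line and the header lines parsed into an array. A minimal sketch with hypothetical helpers `split_http_response()` and `decode_chunked()`; the second one matters if the request line says HTTP/1.1, because an HTTP/1.1 server may send the body with `Transfer-Encoding: chunked`, interleaving hex chunk sizes with the data. Repeated header names keep only the last value, a deliberate simplification:

```php
<?php
// Hypothetical helper: decode a body sent with "Transfer-Encoding: chunked".
// Each chunk is "<hex size>[;extensions]\r\n<data>\r\n"; a size of 0 ends the body.
function decode_chunked(string $body): string
{
    $out = '';
    $offset = 0;
    $length = strlen($body);
    while ($offset < $length) {
        $lineEnd = strpos($body, "\r\n", $offset);
        if ($lineEnd === false) {
            break; // incomplete chunk header
        }
        // Drop any chunk extensions after ';' and parse the hex size.
        $sizeField = trim(explode(';', substr($body, $offset, $lineEnd - $offset), 2)[0]);
        if (!preg_match('/^[0-9a-fA-F]+$/', $sizeField)) {
            break; // malformed chunk header
        }
        $size = (int) hexdec($sizeField);
        if ($size === 0) {
            break; // last chunk
        }
        $out .= substr($body, $lineEnd + 2, $size);
        $offset = $lineEnd + 2 + $size + 2; // skip the data and its trailing CRLF
    }
    return $out;
}

// Hypothetical helper: split a raw response into status code, headers and body.
function split_http_response(string $raw): array
{
    $pos = strpos($raw, "\r\n\r\n");
    if ($pos === false) {
        return ['status' => null, 'headers' => [], 'body' => ''];
    }
    $lines = explode("\r\n", substr($raw, 0, $pos));
    $statusLine = array_shift($lines);
    $status = preg_match('#^HTTP/\S+\s+(\d{3})#', $statusLine, $m) ? (int) $m[1] : null;

    $headers = [];
    foreach ($lines as $line) {
        $parts = explode(':', $line, 2);
        if (count($parts) === 2) {
            $headers[strtolower(trim($parts[0]))] = trim($parts[1]);
        }
    }
    $body = substr($raw, $pos + 4);
    if (stripos($headers['transfer-encoding'] ?? '', 'chunked') !== false) {
        $body = decode_chunked($body);
    }
    return ['status' => $status, 'headers' => $headers, 'body' => $body];
}

// With a live request (not run here): $parts = split_http_response(get_url('http://www.domain.com/'));
$raw = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nTransfer-Encoding: chunked\r\n\r\n"
     . "4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n";
$parsed = split_http_response($raw);
var_dump($parsed['status'], $parsed['body']); // int(200) and string(9) "Wikipedia"
```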
 
Method 6: Open the URL with fsockopen and fetch the complete response (header and body) with POST
 
<?php
function HTTP_Post($URL, $data, $cookie, $referrer = "")
{
    // parsing the given URL
    $URL_Info = parse_url($URL);

    // Building referrer
    if ($referrer == "") // if none is given, fall back to a placeholder value
        $referrer = "111";

    // making string from $data
    $values = array();
    foreach ($data as $key => $value)
        $values[] = "$key=" . urlencode($value);
    $data_string = implode("&", $values);

    // Find out which port is needed - if not given use standard (=80)
    if (!isset($URL_Info["port"]))
        $URL_Info["port"] = 80;

    // Request path plus query string; default to "/" when the URL has no path
    $path = (isset($URL_Info["path"]) ? $URL_Info["path"] : "/") .
            (isset($URL_Info["query"]) ? "?" . $URL_Info["query"] : "");

    // building POST-request (CRLF line endings; HTTP/1.0 avoids a chunked reply)
    $request  = "POST " . $path . " HTTP/1.0\r\n";
    $request .= "Host: " . $URL_Info["host"] . "\r\n";
    $request .= "Referer: $referrer\r\n";
    $request .= "Content-type: application/x-www-form-urlencoded\r\n";
    $request .= "Content-length: " . strlen($data_string) . "\r\n";
    $request .= "Connection: close\r\n";
    if ($cookie) $request .= "Cookie: $cookie\r\n";
    $request .= "\r\n";
    $request .= $data_string; // the body must be exactly Content-length bytes

    $fp = fsockopen($URL_Info["host"], $URL_Info["port"], $errno, $errstr, 30);
    if (!$fp) {
        return false;
    }
    fputs($fp, $request);
    $result = '';
    while (!feof($fp)) {
        $result .= fgets($fp, 1024);
    }
    fclose($fp);
    return $result;
}
?>
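Keeping request construction separate from the socket I/O makes the byte-exact parts (CRLF line endings, a `Content-Length` that matches the body) easy to check without a server. A minimal sketch with a hypothetical `build_post_request()` that produces the same kind of request as `HTTP_Post()`, using `http_build_query()` for the body; the socket part would stay as in `HTTP_Post()`:

```php
<?php
// Hypothetical helper: build a raw HTTP/1.0 POST request for a URL and form fields.
function build_post_request(string $url, array $fields, string $cookie = '', string $referrer = ''): string
{
    $parts = parse_url($url);
    $path  = ($parts['path'] ?? '/') . (isset($parts['query']) ? '?' . $parts['query'] : '');
    $host  = $parts['host'] ?? '';
    if (isset($parts['port'])) {
        $host .= ':' . $parts['port']; // an explicit port goes into the Host header
    }
    $body = http_build_query($fields);

    $lines = [
        "POST $path HTTP/1.0",
        "Host: $host",
        'Content-Type: application/x-www-form-urlencoded',
        'Content-Length: ' . strlen($body),
        'Connection: close',
    ];
    if ($referrer !== '') {
        $lines[] = "Referer: $referrer";
    }
    if ($cookie !== '') {
        $lines[] = "Cookie: $cookie";
    }
    // Header lines end with CRLF; a blank line separates the headers from the body.
    return implode("\r\n", $lines) . "\r\n\r\n" . $body;
}

echo build_post_request('http://www.test.com/zzzz.php?id=i3', ['foo' => 'bar'], 'cook1=c3');
```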
 
Method 7: Use the cURL library. Before using it, you may need to check php.ini to make sure the curl extension is enabled.
 
<?php
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, 'http://www.domain.com/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);        // return the page instead of printing it
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout); // connection timeout in seconds
$file_contents = curl_exec($ch);
curl_close($ch);
echo $file_contents;
?>
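Method 7 echoes whatever `curl_exec()` returns, which is `false` on failure. A sketch of a hypothetical wrapper `curl_fetch()` that also reports `curl_errno()`/`curl_error()` and the HTTP status from `curl_getinfo()`. The `CURLOPT_TIMEOUT` value (twice the connect timeout) is an arbitrary assumption, and `ok` only means the transfer completed, so `status` still has to be checked for HTTP errors such as 404:

```php
<?php
// Hypothetical wrapper: fetch $url with cURL and report failures explicitly.
// Returns ['ok' => bool, 'status' => int, 'error' => string, 'body' => string].
function curl_fetch(string $url, int $timeout = 5): array
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,         // return the body instead of printing it
        CURLOPT_CONNECTTIMEOUT => $timeout,     // seconds allowed for connecting
        CURLOPT_TIMEOUT        => $timeout * 2, // assumed cap on the whole transfer
    ]);
    $body  = curl_exec($ch);
    $errno = curl_errno($ch);
    $result = [
        'ok'     => $body !== false && $errno === 0,
        'status' => (int) curl_getinfo($ch, CURLINFO_HTTP_CODE), // 0 if there was no HTTP response
        'error'  => $errno === 0 ? '' : curl_error($ch),
        'body'   => $body === false ? '' : $body,
    ];
    curl_close($ch);
    return $result;
}

// Usage (network access required, so not run here):
// $r = curl_fetch('http://www.domain.com/');
// echo $r['ok'] ? $r['body'] : 'cURL error: ' . $r['error'];
```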

Here are three ways to get a web page's source code with PHP; pick whichever suits your needs.

1. Get the page source with file_get_contents

This is the most common method. It takes only two lines of code and is simple and convenient.

Reference code:

<?php
$fh = file_get_contents('http://www.webkaka.com/');
echo $fh;
?>
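The two-line version prints nothing useful when the request fails, because `file_get_contents()` then returns `false` and emits a warning. A sketch with a hypothetical `fetch_or_throw()` turns the failure into an exception carrying the warning text from `error_get_last()`. It is demonstrated with a `data://` URL, which PHP's data wrapper reads locally (assuming `allow_url_fopen` is on, as the http:// examples here already require), so it runs without network access:

```php
<?php
// Hypothetical helper: return the content of $url, or throw if it cannot be read.
function fetch_or_throw(string $url): string
{
    $content = @file_get_contents($url); // '@' silences the warning; it is reported below instead
    if ($content === false) {
        $error = error_get_last();
        throw new RuntimeException(
            'Could not read ' . $url . ': ' . ($error['message'] ?? 'unknown error')
        );
    }
    return $content;
}

echo fetch_or_throw('data://text/plain,hello'), "\n"; // prints "hello"

try {
    fetch_or_throw('/path/that/does/not/exist.html');
} catch (RuntimeException $e) {
    echo $e->getMessage(), "\n";
}
```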

2. Get the page source with fopen

This method is also widely used, although it needs a little more code.

Reference code:

<?php
$fh = fopen('http://www.webkaka.com/', 'r');
if ($fh) {
    while (!feof($fh)) {
        echo fgets($fh);
    }
    fclose($fh);
}
?>
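Reading through a `fopen()` handle has one advantage over `file_get_contents()`: the loop can stop early. A sketch with a hypothetical `read_limited()` caps how many bytes are read, which can keep a very large page from filling memory. The `data://` URLs below are stand-ins for a real page (`allow_url_fopen` assumed on):

```php
<?php
// Hypothetical helper: read at most $maxBytes from $url.
// Returns the bytes read, or false if the URL cannot be opened.
function read_limited(string $url, int $maxBytes)
{
    $fh = @fopen($url, 'r');
    if ($fh === false) {
        return false;
    }
    $buffer = '';
    while (!feof($fh) && strlen($buffer) < $maxBytes) {
        // Never ask for more than the remaining budget.
        $chunk = fread($fh, min(8192, $maxBytes - strlen($buffer)));
        if ($chunk === false || $chunk === '') {
            break;
        }
        $buffer .= $chunk;
    }
    fclose($fh);
    return $buffer;
}

echo read_limited('data://text/plain,abcdefghij', 4), "\n"; // prints "abcd"
```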

3. Get the page source with cURL

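Fetching the page source with cURL is usually for more demanding use cases, for example when you need the page's header information along with its content, or need to set the content encoding (ENCODING), the user agent (USERAGENT), and so on.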

Reference code 1:

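<?php
// Create a new cURL resource
$ch = curl_init();
// Set the URL and other options
curl_setopt($ch, CURLOPT_URL, "http://www.webkaka.com/");
curl_setopt($ch, CURLOPT_HEADER, false);
// Fetch the URL and pass it to the browser: without CURLOPT_RETURNTRANSFER,
// curl_exec() prints the page itself and returns true, so there is nothing to echo
curl_exec($ch);
// Close the cURL resource and free system resources
curl_close($ch);
?>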

Reference code 2:

<?php
$szUrl = "http://www.webkaka.com/";
$UserAgent = 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; SLCC1; .NET CLR 2.0.50727; .NET CLR 3.0.04506; .NET CLR 3.5.21022; .NET CLR 1.0.3705; .NET CLR 1.1.4322)';
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $szUrl);
curl_setopt($curl, CURLOPT_HEADER, 0); // 0: do not include the header in the output, 1: include it
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
// These two lines turn off certificate verification; this is insecure outside testing
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($curl, CURLOPT_ENCODING, ''); // '' = accept every encoding cURL supports
curl_setopt($curl, CURLOPT_USERAGENT, $UserAgent);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1);
$data = curl_exec($curl);
echo $data;
// echo curl_errno($curl); // 0 means the request succeeded
curl_close($curl);
?>
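Setting `CURLOPT_HEADER` to 1 mixes the response headers into the returned string. An alternative sketch, with a hypothetical helper `curl_fetch_with_headers()`, collects them through `CURLOPT_HEADERFUNCTION`, whose callback receives each header line and must return its length. The default user agent string is a made-up placeholder, and because of `CURLOPT_FOLLOWLOCATION`, headers from redirect responses are collected too, with later values overwriting earlier ones:

```php
<?php
// Hypothetical helper: fetch a page with cURL and collect the response headers
// through CURLOPT_HEADERFUNCTION, so the returned body contains no header text.
function curl_fetch_with_headers(string $url, string $userAgent = 'Mozilla/5.0 (compatible; ExampleFetcher/1.0)'): array
{
    $headers = [];
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_ENCODING       => '',         // advertise every encoding cURL supports and decode the reply
        CURLOPT_USERAGENT      => $userAgent, // placeholder UA string (hypothetical)
        // Called once per header line; must return the number of bytes it handled.
        CURLOPT_HEADERFUNCTION => function ($handle, string $line) use (&$headers): int {
            $parts = explode(':', $line, 2);
            if (count($parts) === 2) {
                // Lower-case names; with redirects, later values overwrite earlier ones.
                $headers[strtolower(trim($parts[0]))] = trim($parts[1]);
            }
            return strlen($line);
        },
    ]);
    $body  = curl_exec($ch);
    $error = curl_error($ch); // '' when the transfer succeeded
    curl_close($ch);
    return ['body' => $body === false ? '' : $body, 'headers' => $headers, 'error' => $error];
}

// Usage (network access required, so not run here):
// $page = curl_fetch_with_headers('http://www.webkaka.com/');
// echo $page['headers']['content-type'] ?? 'no Content-Type', "\n";
```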

posted @ 2018-04-17 15:01 brady-wang

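Notice: Most articles here were reposted from around the web while I was looking for solutions to problems, and are recorded on this blog for my own reference and for anyone with similar problems, so the original sources are hard to trace. If anything infringes your rights, please send me a private message with the article and its original source address and I will remove it. Thanks.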
