Java Bits and Pieces: Downloading Network Resources
Without further ado, today let's look at how to download network resources in Java.
Downloading in Java is usually done with the HttpURLConnection class, which is quite powerful. Let's see how to use it.
First, a quick note: this article covers two kinds of downloads: HTML source code, and resource files on the network. They are downloaded in slightly different ways.
1 Downloading HTML source code
First, a word about the two kinds of URL requests: GET and POST.
POST differs from GET in that its parameters are not placed in the URL string but in the body of the HTTP request.
The screenshot below shows a POST request to a certain website, inspected with Firebug.
You can see that the POST data consists of five parameters; concatenated into one string they read q=%E4%B8%AD%E6%96%87&t=dict&ut=default&ulang=ZH-CN&tlang=EN-US. This string is the POST payload that goes into the HTTP request body and is sent to the server.
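Incidentally, q=%E4%B8%AD%E6%96%87 is just the URL-encoded UTF-8 form of the word 中文. A minimal sketch of building such a query string with java.net.URLEncoder (the parameter names simply mirror the screenshot above):

import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class PostParamDemo {
    public static void main(String[] args) throws UnsupportedEncodingException {
        // URL-encode each value; "中文" becomes %E4%B8%AD%E6%96%87 in UTF-8
        String body = "q=" + URLEncoder.encode("中文", "UTF-8")
                + "&t=dict&ut=default&ulang=ZH-CN&tlang=EN-US";
        System.out.println(body);
        // prints: q=%E4%B8%AD%E6%96%87&t=dict&ut=default&ulang=ZH-CN&tlang=EN-US
    }
}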
Now, on to the code.
Requesting the URL connection (GET):
// connection and pageUrl are fields of the enclosing class (an HttpURLConnection
// and a java.net.URL); CommonValues holds the header and timeout constants below.
connection = getValidConnection();

/**
 * Open the connection and validate that it is OK.
 */
private HttpURLConnection getValidConnection()
{
    HttpURLConnection httpurlconnection = null;

    try
    {
        URLConnection urlconnection = pageUrl.openConnection();

        if (!(urlconnection instanceof HttpURLConnection))
        {
            return null;
        }
        httpurlconnection = (HttpURLConnection) urlconnection;
        // Configure all request headers and timeouts before connecting
        httpurlconnection.setRequestProperty("User-Agent", CommonValues.User_Agent);
        httpurlconnection.setRequestProperty("Accept", CommonValues.Accept);
        httpurlconnection.setRequestProperty("Accept-Charset", CommonValues.Accept_Charset);
        httpurlconnection.setRequestProperty("Accept-Language", CommonValues.Accept_Language);
        httpurlconnection.setRequestProperty("Connection", CommonValues.Connection);
        httpurlconnection.setRequestProperty("Keep-Alive", CommonValues.Keep_Alive);
        httpurlconnection.setConnectTimeout(CommonValues.ConnectionTimeOut);
        httpurlconnection.setReadTimeout(CommonValues.ReadTimeOut);

        httpurlconnection.connect();

        int responsecode = httpurlconnection.getResponseCode();

        switch (responsecode)
        {
            // valid response codes go here
            case HttpURLConnection.HTTP_OK:
                break;
            default:
                httpurlconnection.disconnect();
                return null;
        }
    }
    catch (IOException ioexception)
    {
        if (httpurlconnection != null)
        {
            httpurlconnection.disconnect();
        }
        return null;
    }

    return httpurlconnection;
}
For POST, all you need to do is set the request method before the connect() call in the GET code above and write the corresponding request body:
httpurlconnection.setRequestMethod("POST");
httpurlconnection.setDoInput(true);
httpurlconnection.setDoOutput(true);
// Build the POST body (keyWord should already be URL-encoded, as shown earlier)
StringBuilder sb = new StringBuilder();
sb.append("q=" + keyWord);
sb.append("&t=para");
sb.append("&ut=sent");
sb.append("&sc=all");
sb.append("&ss=all");
sb.append("&sd=all");
sb.append("&ofst=" + this.pageCount * 10);
sb.append("&tlang=EN-US");
sb.append("&ulang=ZH-CN");
// Write the POST body into the request
OutputStream os = httpurlconnection.getOutputStream();
OutputStreamWriter out = new OutputStreamWriter(os);
out.write(sb.toString());
out.close();
Note:
a. HttpURLConnection's connect() only establishes a TCP connection to the server; it does not actually send the HTTP request.
b. An HTTP request consists of two parts: the header, where all the configuration for the request is defined, and the body (content). connect() generates the header from the HttpURLConnection object's settings, so all configuration must be completed before connect() is called.
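In other words, the request is only sent over the wire once you ask for the response. A minimal sketch of the correct ordering (http://example.com/ is just a placeholder URL):

import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class OrderingDemo {
    public static void main(String[] args) throws IOException {
        URL url = new URL("http://example.com/"); // placeholder URL
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        // 1. Configure first: connect() turns these settings into the header
        conn.setRequestProperty("User-Agent", "Mozilla/5.0");
        conn.setConnectTimeout(5000);
        // 2. Establish the TCP connection (no HTTP traffic yet)
        conn.connect();
        // 3. getResponseCode()/getInputStream() actually send the request
        System.out.println("HTTP " + conn.getResponseCode());
        try (InputStream in = conn.getInputStream()) {
            while (in.read() != -1) { /* drain the response */ }
        }
        conn.disconnect();
    }
}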
Downloading the content
// Read the page content from the connection and return it as a string
// (method name and signature assumed; the original snippet began mid-method).
public String downloadContent() {
    StringBuffer content;
    InputStream inputstream = getSafeInputStream(connection);
    if (inputstream == null) {
        return "";
    }
    // Load the stream into a StringBuffer, decoding with the page's charset
    InputStreamReader isr = null;
    try {
        isr = new InputStreamReader(inputstream, this.charSet);
        CheckMethods.PrintInfoMessage("InputStreamReader is created");
    } catch (Exception e) {
        CheckMethods.PrintInfoMessage(Thread.currentThread().getName() + e.getMessage());
        e.printStackTrace();
        return "";
    }
    content = new StringBuffer();
    CheckMethods.PrintInfoMessage("StringBuffer is created");
    try {
        // Read in 4 KB chunks until the stream is exhausted
        char buf[] = new char[4096];
        int cnt = 0;
        while ((cnt = isr.read(buf, 0, 4096)) != -1) {
            content.append(buf, 0, cnt);
            System.out.print(".");
        }
        isr.close();
        inputstream.close();
        connection.disconnect();
    } catch (IOException ioexception) {
        LogHelper.WriteLog(LogHelper.logger_Error, CommonValues.Log_Detail_Level, "error", "Exception while reading content of page " + this.pageUrl);
        LogHelper.WriteLog(LogHelper.logger_Downloader, CommonValues.Log_Detail_Level, "error", "Exception while reading content of page " + this.pageUrl);
        CheckMethods.PrintInfoMessage(Thread.currentThread().getName() + " exception while reading content of page " + this.pageUrl);
        try {
            isr.close();
            inputstream.close();
        } catch (IOException e) {
            e.printStackTrace();
            CheckMethods.PrintInfoMessage(Thread.currentThread().getName() + " exception occurred while closing inputstream and isr after a read error");
            LogHelper.WriteLog(LogHelper.logger_Error, CommonValues.Log_Detail_Level, "fatal", "exception occurred while closing inputstream and isr after a read error");
            connection.disconnect();
            return "";
        }
        connection.disconnect();
        return "";
    }
    return content.toString();
}
// Try to obtain the input stream, retrying up to three times on IOException
// (signature reconstructed; the original snippet omitted it).
private InputStream getSafeInputStream(HttpURLConnection connection)
{
    InputStream inputstream = null;
    for (int i = 0; i < 3; ) {
        try {
            inputstream = connection.getInputStream();
            break;
        } catch (IOException ioexception1) {
            i++; // count the failure and try again
        }
    }
    return inputstream;
}
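The close-and-disconnect bookkeeping above is easy to get wrong. On Java 7 and later, try-with-resources handles it automatically; here is a minimal sketch (readAll is a hypothetical helper, not part of the code above):

import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;

public class SafeReader {
    // Read the whole response body; the reader closes automatically,
    // even if an exception is thrown mid-read.
    static String readAll(HttpURLConnection connection, String charSet) throws IOException {
        StringBuilder content = new StringBuilder();
        try (InputStreamReader isr =
                 new InputStreamReader(connection.getInputStream(), charSet)) {
            char[] buf = new char[4096];
            int cnt;
            while ((cnt = isr.read(buf)) != -1) {
                content.append(buf, 0, cnt);
            }
        } finally {
            connection.disconnect();
        }
        return content.toString();
    }
}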
Of course, if you are building a crawler and worried about your IP address getting banned... use a proxy.
Setting a proxy works like this:
Proxy proxy = new Proxy(Proxy.Type.HTTP, proxyAddress);
urlconnection = pageUrl.openConnection(proxy);
// instead of the direct version: urlconnection = pageUrl.openConnection();
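Put together, a minimal sketch (the proxy address 127.0.0.1:8080 and the URL are placeholders):

import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;

public class ProxyDemo {
    public static void main(String[] args) throws Exception {
        // Placeholder proxy address; substitute a real HTTP proxy
        Proxy proxy = new Proxy(Proxy.Type.HTTP,
                new InetSocketAddress("127.0.0.1", 8080));
        URL pageUrl = new URL("http://example.com/"); // placeholder URL
        HttpURLConnection conn =
                (HttpURLConnection) pageUrl.openConnection(proxy);
        System.out.println("HTTP " + conn.getResponseCode());
        conn.disconnect();
    }
}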
2 Downloading resource files
Downloading a resource file works a bit differently: the content is copied as raw bytes instead of being decoded into characters.
// Download the resource at destUrl and save it to fileName
// (method name and signature assumed; the original snippet began mid-method).
public void saveToFile(String destUrl, String fileName) throws IOException {
    FileOutputStream fos = null;
    BufferedInputStream bis = null;
    HttpURLConnection httpUrl = null;
    URL url = null;
    byte[] buf = new byte[8096];
    int size = 0;

    // Establish the connection
    url = new URL(destUrl);
    httpUrl = (HttpURLConnection) url.openConnection();
    // Connect to the specified resource
    httpUrl.connect();
    // Get the network input stream
    bis = new BufferedInputStream(httpUrl.getInputStream());
    // Create the output file
    fos = new FileOutputStream(fileName);
    System.out.println("Fetching [" + destUrl + "]...\nSaving it as file [" + fileName + "]");
    // Copy the bytes to the file
    while ((size = bis.read(buf)) != -1)
        fos.write(buf, 0, size);

    fos.close();
    bis.close();
    httpUrl.disconnect();
}
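A usage example under the assumptions above (the URL and file name are placeholders, and saveToFile is the assumed name of the method):

try {
    saveToFile("http://example.com/logo.png", "logo.png"); // hypothetical call
} catch (IOException e) {
    e.printStackTrace();
}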
That's all for now. Give it a try!