Java Bits and Pieces: Downloading Network Resources
Without further ado, today let's look at how to download network resources in Java.
Downloading in Java is usually done with the HttpURLConnection class, which is quite powerful. Let's see how to use it.
First, a quick note: this article covers two kinds of downloads: HTML source code, and resource files on the network. They are downloaded in slightly different ways.
1 Downloading HTML source code
First, a word about the two kinds of URL requests: GET and POST.
POST differs from GET in that its parameters are not placed in the URL string but in the body of the HTTP request.
The screenshot below shows a POST request to a certain website, inspected with Firebug.
You can see that the POST data consists of five parameters; concatenated into one string they read q=%E4%B8%AD%E6%96%87&t=dict&ut=default&ulang=ZH-CN&tlang=EN-US. This string is the POST payload that goes into the HTTP request body and is sent to the server.
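Incidentally, q=%E4%B8%AD%E6%96%87 is just the URL-encoded UTF-8 form of the word 中文. A minimal sketch of building such a query string with java.net.URLEncoder (the parameter names simply mirror the screenshot above):

import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class PostParamDemo {
    public static void main(String[] args) throws UnsupportedEncodingException {
        // URL-encode each value; "中文" becomes %E4%B8%AD%E6%96%87 in UTF-8
        String body = "q=" + URLEncoder.encode("中文", "UTF-8")
                + "&t=dict&ut=default&ulang=ZH-CN&tlang=EN-US";
        System.out.println(body);
        // prints: q=%E4%B8%AD%E6%96%87&t=dict&ut=default&ulang=ZH-CN&tlang=EN-US
    }
}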
Now, on to the code.
Requesting the URL connection (GET):
// connection and pageUrl are fields of the enclosing class (an HttpURLConnection
// and a java.net.URL); CommonValues holds the header and timeout constants below.
connection = getValidConnection();

/**
 * Open the connection and validate that it is OK.
 */
private HttpURLConnection getValidConnection()
{
    HttpURLConnection httpurlconnection = null;

    try
    {
        URLConnection urlconnection = pageUrl.openConnection();

        if (!(urlconnection instanceof HttpURLConnection))
        {
            return null;
        }
        httpurlconnection = (HttpURLConnection) urlconnection;
        // Configure all request headers and timeouts before connecting
        httpurlconnection.setRequestProperty("User-Agent", CommonValues.User_Agent);
        httpurlconnection.setRequestProperty("Accept", CommonValues.Accept);
        httpurlconnection.setRequestProperty("Accept-Charset", CommonValues.Accept_Charset);
        httpurlconnection.setRequestProperty("Accept-Language", CommonValues.Accept_Language);
        httpurlconnection.setRequestProperty("Connection", CommonValues.Connection);
        httpurlconnection.setRequestProperty("Keep-Alive", CommonValues.Keep_Alive);
        httpurlconnection.setConnectTimeout(CommonValues.ConnectionTimeOut);
        httpurlconnection.setReadTimeout(CommonValues.ReadTimeOut);

        httpurlconnection.connect();

        int responsecode = httpurlconnection.getResponseCode();

        switch (responsecode)
        {
            // valid response codes go here
            case HttpURLConnection.HTTP_OK:
                break;
            default:
                httpurlconnection.disconnect();
                return null;
        }
    }
    catch (IOException ioexception)
    {
        if (httpurlconnection != null)
        {
            httpurlconnection.disconnect();
        }
        return null;
    }

    return httpurlconnection;
}
For POST, all you need to do is set the request method before the connect() call in the GET code above and write the corresponding request body:
httpurlconnection.setRequestMethod("POST");
httpurlconnection.setDoInput(true);
httpurlconnection.setDoOutput(true);
// Build the POST body (keyWord should already be URL-encoded, as shown earlier)
StringBuilder sb = new StringBuilder();
sb.append("q=" + keyWord);
sb.append("&t=para");
sb.append("&ut=sent");
sb.append("&sc=all");
sb.append("&ss=all");
sb.append("&sd=all");
sb.append("&ofst=" + this.pageCount * 10);
sb.append("&tlang=EN-US");
sb.append("&ulang=ZH-CN");
// Write the POST body into the request
OutputStream os = httpurlconnection.getOutputStream();
OutputStreamWriter out = new OutputStreamWriter(os);
out.write(sb.toString());
out.close();
Note:
a. HttpURLConnection's connect() only establishes a TCP connection to the server; it does not actually send the HTTP request.
b. An HTTP request consists of two parts: the header, where all the configuration for the request is defined, and the body (content). connect() generates the header from the HttpURLConnection object's settings, so all configuration must be completed before connect() is called.
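In other words, the request is only sent over the wire once you ask for the response. A minimal sketch of the correct ordering (http://example.com/ is just a placeholder URL):

import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class OrderingDemo {
    public static void main(String[] args) throws IOException {
        URL url = new URL("http://example.com/"); // placeholder URL
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        // 1. Configure first: connect() turns these settings into the header
        conn.setRequestProperty("User-Agent", "Mozilla/5.0");
        conn.setConnectTimeout(5000);
        // 2. Establish the TCP connection (no HTTP traffic yet)
        conn.connect();
        // 3. getResponseCode()/getInputStream() actually send the request
        System.out.println("HTTP " + conn.getResponseCode());
        try (InputStream in = conn.getInputStream()) {
            while (in.read() != -1) { /* drain the response */ }
        }
        conn.disconnect();
    }
}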
Downloading the content
// Read the page content from the connection and return it as a string
// (method name and signature assumed; the original snippet began mid-method).
public String downloadContent() {
    StringBuffer content;
    InputStream inputstream = getSafeInputStream(connection);
    if (inputstream == null) {
        return "";
    }
    // Load the stream into a StringBuffer, decoding with the page's charset
    InputStreamReader isr = null;
    try {
        isr = new InputStreamReader(inputstream, this.charSet);
        CheckMethods.PrintInfoMessage("InputStreamReader is created");
    } catch (Exception e) {
        CheckMethods.PrintInfoMessage(Thread.currentThread().getName() + e.getMessage());
        e.printStackTrace();
        return "";
    }
    content = new StringBuffer();
    CheckMethods.PrintInfoMessage("StringBuffer is created");
    try {
        // Read in 4 KB chunks until the stream is exhausted
        char buf[] = new char[4096];
        int cnt = 0;
        while ((cnt = isr.read(buf, 0, 4096)) != -1) {
            content.append(buf, 0, cnt);
            System.out.print(".");
        }
        isr.close();
        inputstream.close();
        connection.disconnect();
    } catch (IOException ioexception) {
        LogHelper.WriteLog(LogHelper.logger_Error, CommonValues.Log_Detail_Level, "error", "Exception while reading content of page " + this.pageUrl);
        LogHelper.WriteLog(LogHelper.logger_Downloader, CommonValues.Log_Detail_Level, "error", "Exception while reading content of page " + this.pageUrl);
        CheckMethods.PrintInfoMessage(Thread.currentThread().getName() + " exception while reading content of page " + this.pageUrl);
        try {
            isr.close();
            inputstream.close();
        } catch (IOException e) {
            e.printStackTrace();
            CheckMethods.PrintInfoMessage(Thread.currentThread().getName() + " exception occurred while closing inputstream and isr after a read error");
            LogHelper.WriteLog(LogHelper.logger_Error, CommonValues.Log_Detail_Level, "fatal", "exception occurred while closing inputstream and isr after a read error");
            connection.disconnect();
            return "";
        }
        connection.disconnect();
        return "";
    }
    return content.toString();
}
// Try to obtain the input stream, retrying up to three times on IOException
// (signature reconstructed; the original snippet omitted it).
private InputStream getSafeInputStream(HttpURLConnection connection)
{
    InputStream inputstream = null;
    for (int i = 0; i < 3; ) {
        try {
            inputstream = connection.getInputStream();
            break;
        } catch (IOException ioexception1) {
            i++; // count the failure and try again
        }
    }
    return inputstream;
}
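The close-and-disconnect bookkeeping above is easy to get wrong. On Java 7 and later, try-with-resources handles it automatically; here is a minimal sketch (readAll is a hypothetical helper, not part of the code above):

import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;

public class SafeReader {
    // Read the whole response body; the reader closes automatically,
    // even if an exception is thrown mid-read.
    static String readAll(HttpURLConnection connection, String charSet) throws IOException {
        StringBuilder content = new StringBuilder();
        try (InputStreamReader isr =
                 new InputStreamReader(connection.getInputStream(), charSet)) {
            char[] buf = new char[4096];
            int cnt;
            while ((cnt = isr.read(buf)) != -1) {
                content.append(buf, 0, cnt);
            }
        } finally {
            connection.disconnect();
        }
        return content.toString();
    }
}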
Of course, if you are building a crawler and worried about your IP address getting banned... use a proxy.
Setting a proxy works like this:
Proxy proxy = new Proxy(Proxy.Type.HTTP, proxyAddress);
urlconnection = pageUrl.openConnection(proxy);
// instead of the direct version: urlconnection = pageUrl.openConnection();
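Put together, a minimal sketch (the proxy address 127.0.0.1:8080 and the URL are placeholders):

import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;

public class ProxyDemo {
    public static void main(String[] args) throws Exception {
        // Placeholder proxy address; substitute a real HTTP proxy
        Proxy proxy = new Proxy(Proxy.Type.HTTP,
                new InetSocketAddress("127.0.0.1", 8080));
        URL pageUrl = new URL("http://example.com/"); // placeholder URL
        HttpURLConnection conn =
                (HttpURLConnection) pageUrl.openConnection(proxy);
        System.out.println("HTTP " + conn.getResponseCode());
        conn.disconnect();
    }
}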
2 Downloading resource files
Downloading a resource file works a bit differently: the content is copied as raw bytes instead of being decoded into characters.
// Download the resource at destUrl and save it to fileName
// (method name and signature assumed; the original snippet began mid-method).
public void saveToFile(String destUrl, String fileName) throws IOException {
    FileOutputStream fos = null;
    BufferedInputStream bis = null;
    HttpURLConnection httpUrl = null;
    URL url = null;
    byte[] buf = new byte[8096];
    int size = 0;

    // Establish the connection
    url = new URL(destUrl);
    httpUrl = (HttpURLConnection) url.openConnection();
    // Connect to the specified resource
    httpUrl.connect();
    // Get the network input stream
    bis = new BufferedInputStream(httpUrl.getInputStream());
    // Create the output file
    fos = new FileOutputStream(fileName);
    System.out.println("Fetching [" + destUrl + "]...\nSaving it as file [" + fileName + "]");
    // Copy the bytes to the file
    while ((size = bis.read(buf)) != -1)
        fos.write(buf, 0, size);

    fos.close();
    bis.close();
    httpUrl.disconnect();
}
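A usage example under the assumptions above (the URL and file name are placeholders, and saveToFile is the assumed name of the method):

try {
    saveToFile("http://example.com/logo.png", "logo.png"); // hypothetical call
} catch (IOException e) {
    e.printStackTrace();
}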
That's all for now. Give it a try!