Technology applied while running a Taobao shop: downloading images with the thread pool

2011-03-16 09:47 stubman

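While running a Taobao shop I needed to download images in bulk. I solved this with a multithreaded image downloader built on the thread pool, and this article walks through the implementation details.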

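The tool breaks down into three parts:
1. Read the data from Excel and extract the image URLs from it
2. Use the thread pool to request the URLs from multiple threads
3. Save the images returned by the HTTP requests to the local disk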

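1. Read the data from Excel and extract the image URLs from it

The data to process is stored in an Excel workbook. For convenience I put it all in the first column, so the program only has to read that column.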

Code

```
// Requires: using System.Data; using System.Data.OleDb;
static DataTable ExcelToDT(string Path, string tableName)
{
    try
    {
        // Jet 4.0 reads .xls workbooks; a worksheet is addressed as [SheetName$].
        string strConn = "Provider=Microsoft.Jet.OLEDB.4.0;" + "Data Source=" + Path + ";" + "Extended Properties='Excel 8.0;'";
        string strExcel = "select * from [" + tableName + "$]";

        // Pass the connection object to the adapter so the connection that is
        // opened here is the one actually used, and dispose both when done.
        using (OleDbConnection conn = new OleDbConnection(strConn))
        using (OleDbDataAdapter myCommand = new OleDbDataAdapter(strExcel, conn))
        {
            conn.Open();
            DataSet ds = new DataSet();
            myCommand.Fill(ds, "table1");
            return ds.Tables["table1"];
        }
    }
    catch
    {
        // Any failure (missing file, missing sheet, provider not installed) yields null.
        return null;
    }
}
```
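Since only the first column matters, the caller can reduce the `DataTable` to a plain list of cell strings before looking for URLs. The following is a minimal sketch of that step; `TableHelper` and `FirstColumn` are hypothetical names that do not appear in the original tool.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

static class TableHelper
{
    // Returns the non-empty text of column 0 for every row.
    // A null table (what ExcelToDT returns on failure) gives an empty list.
    public static List<string> FirstColumn(DataTable table)
    {
        var cells = new List<string>();
        if (table == null || table.Columns.Count == 0) return cells;

        foreach (DataRow row in table.Rows)
        {
            string text = row[0].ToString();   // DBNull.Value.ToString() is ""
            if (text.Length > 0) cells.Add(text);
        }
        return cells;
    }
}
```

Handling the null table here spares every caller from repeating the check.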

2. Use the thread pool to request the URLs from multiple threads

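To issue HTTP requests from multiple threads, the URLs are split across several List<string> objects, which are stored in a Dictionary<int, List<string>>. Each work item processes one List<string>. The number of URLs per list can be set in code, which in turn determines how many key/value pairs the dictionary holds and therefore how many work items can submit HTTP requests in parallel (up to the thread pool's limit).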
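The grouping on its own can be sketched as follows. This is an illustration under assumptions rather than the tool's actual code: `UrlBatcher`, `Split`, and `batchSize` are hypothetical names, and the sketch cuts batches at an exact size, whereas the tool checks the batch size once per Excel row.

```csharp
using System;
using System.Collections.Generic;

static class UrlBatcher
{
    // Splits urls into consecutive groups of at most batchSize items,
    // keyed 0, 1, 2, ... in order. The last group may be smaller.
    public static Dictionary<int, List<string>> Split(IList<string> urls, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException("batchSize");

        var batches = new Dictionary<int, List<string>>();
        for (int i = 0; i < urls.Count; i += batchSize)
        {
            int count = Math.Min(batchSize, urls.Count - i);
            var batch = new List<string>(count);
            for (int j = 0; j < count; j++) batch.Add(urls[i + j]);
            batches.Add(i / batchSize, batch);
        }
        return batches;
    }
}
```

With five URLs and a batch size of two, this produces three entries, and the last one holds a single URL.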

Code

```
// Requires: using System; using System.Collections.Generic; using System.Data; using System.Threading;
// PathDic is not declared in the original listing; it is assumed to be a
// static field like this one, shared with SavePicFromDic.
static Dictionary<int, List<string>> PathDic = new Dictionary<int, List<string>>();

static void SavePictureFromUrl()
{
    List<string> PathList = new List<string>();      // every URL seen so far, for de-duplication
    List<string> tempPathList = new List<string>();  // the batch currently being filled
    DataTable dt = ExcelToDT("C:/b.xls", "Sheet1");
    if (dt == null) return;                          // ExcelToDT returns null on failure
    int DicKey = 0;
    foreach (DataRow row in dt.Rows)
    {
        // Each cell holds HTML; split on the attributes that carry image URLs.
        string[] a = row[0].ToString().Split(new string[] { "src", "background" }, StringSplitOptions.None);
        foreach (string str in a)
        {
            int start = str.IndexOf("http");
            if (start < 0) continue;
            // Use whichever extension appears first after "http".
            int jpg = str.IndexOf("jpg", start);
            int gif = str.IndexOf("gif", start);
            int ext = (jpg < 0) ? gif : (gif < 0 ? jpg : Math.Min(jpg, gif));
            if (ext < 0) continue;
            string path = str.Substring(start, ext + 3 - start);
            if (PathList.IndexOf(path) < 0)
            {
                PathList.Add(path);
                tempPathList.Add(path);
            }
        }
        // Close the batch once it holds more than 100 URLs (checked once per row).
        if (tempPathList.Count > 100)
        {
            PathDic.Add(DicKey, tempPathList);
            DicKey++;
            tempPathList = new List<string>();
        }
    }
    // The final, partially filled batch has to be registered as well.
    if (tempPathList.Count > 0)
    {
        PathDic.Add(DicKey, tempPathList);
    }

    ThreadPool.SetMaxThreads(100, 100);
    foreach (int key in PathDic.Keys)
    {
        ThreadPool.QueueUserWorkItem(new WaitCallback(SavePicFromDic), key);
    }
}
```
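One thing the listing above does not show is how the program waits for the downloads. Thread pool threads are background threads, so if `Main` returns right after queuing the work, the process can exit before the images are saved. A minimal sketch of one way to wait, using `CountdownEvent` (available since .NET 4), is shown below. `BatchRunner` and `RunAll` are hypothetical names, and whether the original tool waited in some other way (for example with `Console.ReadLine()`) is not stated in the article.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

static class BatchRunner
{
    // Queues one work item per key and blocks until all of them have finished.
    public static void RunAll(IEnumerable<int> keys, Action<int> work)
    {
        var keyList = new List<int>(keys);
        if (keyList.Count == 0) return;   // nothing to queue, nothing to wait for

        using (var done = new CountdownEvent(keyList.Count))
        {
            foreach (int key in keyList)
            {
                ThreadPool.QueueUserWorkItem(state =>
                {
                    try { work((int)state); }
                    catch (Exception ex) { Console.Error.WriteLine(ex.Message); }
                    finally { done.Signal(); }   // count this item as finished either way
                }, key);
            }
            done.Wait();   // returns once every work item has signaled
        }
    }
}
```

In the tool this could be called as `BatchRunner.RunAll(PathDic.Keys, key => SavePicFromDic(key));` in place of the final `foreach` loop.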

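3. Save the images returned by the HTTP requests to the local disk

Each image is saved under a name derived from its full URL, so that different images cannot end up with the same file name. File names cannot contain "/", so it is replaced with "@" (I searched and none of the links in the file use that character).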
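The naming rule can be sketched as a small helper. This version goes slightly beyond what the article describes, as an assumption of what a more general tool would need: it strips any scheme rather than exactly seven characters, and it replaces every character that Windows rejects in file names, not only "/". `FileNames` and `FromUrl` are hypothetical names.

```csharp
using System;

static class FileNames
{
    // Characters Windows rejects in file names, listed explicitly so the
    // result does not depend on the platform the code runs on.
    private static readonly char[] Invalid = { '/', '\\', ':', '*', '?', '"', '<', '>', '|' };

    // Turns a URL into a single file name: drops the scheme ("http://",
    // "https://") and replaces each invalid character with '@'.
    public static string FromUrl(string url)
    {
        int schemeEnd = url.IndexOf("://", StringComparison.Ordinal);
        string rest = schemeEnd >= 0 ? url.Substring(schemeEnd + 3) : url;
        foreach (char c in Invalid) rest = rest.Replace(c, '@');
        return rest;
    }
}
```

For example, `FileNames.FromUrl("http://img.example.com/a/b.jpg")` gives `img.example.com@a@b.jpg` (the host name is a placeholder).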

Code

```
// Requires: using System; using System.IO; using System.Net;
static void SavePicFromDic(object DicKey)
{
    foreach (string path in PathDic[Convert.ToInt32(DicKey)])
    {
        Console.WriteLine(DicKey.ToString() + path);
        // Drop the leading "http://" (7 characters) and replace '/' so the URL can serve as a file name.
        SavePictureFromHTTP(path, @"G:\淘宝相关\图片\图片备份\" + path.Substring(7).Replace('/', '@'));
    }
}

static void SavePictureFromHTTP(string url, string path)
{
    try
    {
        WebRequest webReq = WebRequest.Create(url);
        using (WebResponse webRes = webReq.GetResponse())
        using (Stream srm = webRes.GetResponseStream())
        {
            long fileLength = webRes.ContentLength;
            if (fileLength <= 0)
            {
                // ContentLength is -1 when the server does not report it; the buffer cannot be sized.
                Console.WriteLine(url + ": unknown or empty content length, skipped");
                return;
            }
            byte[] bufferbyte = new byte[fileLength];
            int allByte = bufferbyte.Length;   // bytes still to read
            int startByte = 0;                 // bytes read so far
            while (allByte > 0)
            {
                int downByte = srm.Read(bufferbyte, startByte, allByte);
                if (downByte == 0) { break; }
                startByte += downByte;
                allByte -= downByte;
            }
            // If the target already exists, insert a GUID before the extension instead of overwriting.
            if (File.Exists(path))
            {
                path = path.Insert(path.LastIndexOf('.'), Guid.NewGuid().ToString());
            }
            using (FileStream fs = new FileStream(path, FileMode.Create, FileAccess.Write))
            {
                fs.Write(bufferbyte, 0, startByte);   // write only the bytes actually received
            }
        }
    }
    catch (WebException ex)
    {
        // The original swallowed the error silently; log it so failed URLs are visible.
        Console.WriteLine(url + " failed: " + ex.Message);
    }
}
```
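Sizing the buffer from `ContentLength` only works when the server reports a length. An alternative that avoids the dependency is to copy the response stream to the file in fixed-size chunks. The sketch below is one possible variant, not the article's code. `Downloader`, `CopyToFile`, and `Download` are hypothetical names, and `WebRequest` is kept only because the article uses it (recent .NET versions mark it obsolete).

```csharp
using System;
using System.IO;
using System.Net;

static class Downloader
{
    // Copies everything from source into a new file at path, 8 KB at a time,
    // and returns the number of bytes written.
    public static long CopyToFile(Stream source, string path)
    {
        long total = 0;
        byte[] buffer = new byte[8192];
        using (FileStream fs = new FileStream(path, FileMode.Create, FileAccess.Write))
        {
            int read;
            while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
            {
                fs.Write(buffer, 0, read);
                total += read;
            }
        }
        return total;
    }

    // Downloads url to path without consulting ContentLength.
    public static void Download(string url, string path)
    {
        WebRequest request = WebRequest.Create(url);
        using (WebResponse response = request.GetResponse())
        using (Stream body = response.GetResponseStream())
        {
            CopyToFile(body, path);
        }
    }
}
```

Because only one small buffer is held per download, memory use stays flat regardless of image size.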

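Summary: at first I did not consider multithreading. To keep things simple I downloaded everything on a single thread, which took more than 20 minutes. As the number of images kept growing I switched to the thread pool. The improvement was obvious: I did not measure it precisely, but it was certainly several times faster.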

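A note on the thread pool versus creating threads by hand: in my view the thread pool is the more efficient choice. It reuses a bounded set of worker threads, which avoids the cost of creating and destroying a thread for every task and limits the context switching that too many threads would cause. That is why I chose the thread pool here rather than plain multithreading.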
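The reuse can be observed directly by counting how many distinct threads serve a batch of short work items. The sketch below is only an illustration, with `PoolReuseDemo` and `CountDistinctThreads` as hypothetical names. The count varies from run to run and from machine to machine, so no particular output is claimed.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

static class PoolReuseDemo
{
    // Runs `items` short work items on the thread pool and returns how many
    // distinct threads ended up executing them.
    public static int CountDistinctThreads(int items)
    {
        var ids = new HashSet<int>();
        using (var done = new CountdownEvent(items))
        {
            for (int i = 0; i < items; i++)
            {
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    lock (ids) { ids.Add(Thread.CurrentThread.ManagedThreadId); }
                    done.Signal();
                });
            }
            done.Wait();   // every work item has recorded its thread id by now
        }
        return ids.Count;
    }
}
```

Calling `PoolReuseDemo.CountDistinctThreads(200)` typically reports far fewer than 200 threads, whereas starting a `new Thread` per item would create one for each.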
