UIUC System Programming Assignment 2: Multithreaded Sorting

Problem requirements:

For example, given the following input:

1.

./mp2.1 a1.txt a2.txt a3.txt a4.txt a5.txt a6.txt a7.txt

Each file, such as a1.txt, contains many unsorted int values. One thread is started per file to sort it.

The sorted files are stored as

a1.txt.sorted … a7.txt.sorted

2.

Then adjacent pairs of files are merged.

Each merge starts a new thread and produces a larger sorted file.

 

 

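The assignment hints at using temp files so that no intermediate files are left behind. I planned to use C++ streams,

but I don't know whether they offer a temp file that is deleted automatically when it is closed.

For now, the intermediate files are created and then deleted at the end, leaving a single sorted file.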
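Standard C++ streams have no self-deleting temp file. However, C's `tmpfile()` is available in C++ through `<cstdio>`. It creates an anonymous file, opened for update in binary mode, that is removed automatically when it is closed or when the program terminates normally. The sketch below shows the behaviour. It assumes the merge stages would hand each other `FILE*` handles instead of file names, which is a design change and not what the program below does. The function name `RoundTripThroughTmpfile` is illustrative.

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// Write the values to an anonymous temporary file, rewind it, and read them back.
// The file never appears under a name in the directory and is deleted by fclose().
std::vector<int> RoundTripThroughTmpfile(const std::vector<int> &values)
{
    std::vector<int> result;
    std::FILE *tmp = std::tmpfile();
    if (tmp == nullptr)
        return result;              // could not create the temp file

    for (int v : values)
        std::fprintf(tmp, "%d\n", v);

    std::rewind(tmp);               // back to the start before reading
    int v;
    while (std::fscanf(tmp, "%d", &v) == 1)
        result.push_back(v);

    std::fclose(tmp);               // the temporary file is removed here
    return result;
}
```

`RoundTripThroughTmpfile({3, 1, 2})` returns the three values in their original order, and nothing is left in the working directory afterwards.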

 

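No single initial file (a0.txt, a1.txt, …) contains duplicate numbers, but different files may share numbers.

The assignment requires removing duplicates so that each value is kept only once.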
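Each input file is already duplicate-free, so a merge only has to collapse values that appear in both inputs. For sorted ranges, the standard library's `std::set_union` does exactly that: if an element occurs m times in one range and n times in the other, it is copied max(m, n) times. With m, n ≤ 1 that means once. The sketch below works on in-memory vectors, and the name `UniqueMerge` is illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Merge two sorted, individually duplicate-free vectors into one sorted vector
// in which a value present in both inputs appears only once.
std::vector<int> UniqueMerge(const std::vector<int> &a, const std::vector<int> &b)
{
    assert(std::is_sorted(a.begin(), a.end()));
    assert(std::is_sorted(b.begin(), b.end()));
    std::vector<int> out;
    out.reserve(a.size() + b.size());
    std::set_union(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(out));
    return out;
}
```

`UniqueMerge({1, 4, 7}, {2, 4, 9})` yields `{1, 2, 4, 7, 9}`. Swapping `std::set_union` for `std::merge` would keep both copies of 4, which corresponds to the "duplicates allowed" mode used in some of the tests below.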

 

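The assignment seems to contain an error. It says the numbers fit in an int, while the number of lines in a file fits in an unsigned int.

However, the gen.c it provides for generating the input files produces at most RAND_MAX + 1 = 2^31 lines, so the line count is essentially int-sized.

The random numbers are drawn from 0 to 2^31 - 1.

Since duplicates must be removed, no file produced during merging ever has more than 2^31 lines (about 2G).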
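The bound is a counting argument. Values lie in [0, 2^31 - 1], so a duplicate-free file holds at most 2^31 distinct values. Note that 2^31 is one more than the largest 32-bit signed int. A line counter that can actually reach this bound therefore needs an unsigned or 64-bit type. The checks below verify the arithmetic and assume the usual 32-bit `int`. The constant name is illustrative.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Number of distinct values in the closed range [0, 2^31 - 1].
constexpr std::uint64_t kDistinctValues = std::uint64_t{1} << 31;

// Compile-time checks of the counting argument (assumes a 32-bit int).
static_assert(INT_MAX == 2147483647, "assumes a 32-bit int");
static_assert(kDistinctValues == std::uint64_t{INT_MAX} + 1,
              "2^31 is one more than INT_MAX, so it does not fit in int");
static_assert(kDistinctValues <= UINT_MAX,
              "but it does fit in a 32-bit unsigned int");
```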

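With many threads running at once, could memory run out? The assignment suggests qsort for step 1, i.e. an in-memory sort. If there are many input

files, and correspondingly many threads sorting at the same time, could that exhaust memory?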
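For scale, each merge thread in the code below allocates three vectors of `MaxLen = 5 * 1024 * 1024` ints. With a 4-byte int, that is 3 × 20 MiB = 60 MiB per merge thread. Each sort thread additionally needs memory proportional to its whole file. If this becomes a problem, one option is to cap how many threads run at once instead of starting one per file immediately. The sketch below uses `std::thread` and does this in waves. The limit `max_threads` and the caller-supplied job `work(i)` are hypothetical, and the original program uses raw pthreads with no cap.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Run work(0) .. work(n - 1) with at most max_threads jobs alive at any time.
// Jobs start in waves, and each wave is joined before the next one starts,
// so at most max_threads in-memory sorts (and their buffers) coexist.
void RunInWaves(std::size_t n, std::size_t max_threads,
                const std::function<void(std::size_t)> &work)
{
    assert(max_threads > 0);
    for (std::size_t begin = 0; begin < n; begin += max_threads) {
        std::size_t end = std::min(n, begin + max_threads);
        std::vector<std::thread> wave;
        for (std::size_t i = begin; i < end; ++i)
            wave.emplace_back(work, i);
        for (std::thread &t : wave)
            t.join();
    }
}
```

Waves are the simplest option, but each wave waits for its slowest file. A counting semaphore or a small worker pool would keep every slot busy at the cost of a little more code.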

 

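In addition, the assignment fixes the merge order: files are merged in input-file order. So a1.txt is always merged with a2.txt, rather than

following the order in which the threads finish (threads for small files finish first). Likewise, temp1 is always merged with temp2.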

 

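Finally, since the files keep growing, merging follows the external-sort approach with two input buffers and one output buffer.

All three buffers have the same size. When the output buffer is full, it is written to the output file. When an input buffer is used up, more data is read

from its file into that buffer and merging continues. When an input file has no data left (one file is finished), the remaining data

(the output buffer, the other input buffer, and whatever is left in the other input file) is written to the output file.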
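The buffering scheme just described can be sketched more compactly over generic streams. This is a simplified illustration, not the author's `Merge2Vec`. The helper names `Refill`, `Flush` and `BufferedMerge` and the small `buf_size` are assumptions. Instead of copying the tail of the surviving input in a separate step, it keeps looping until both inputs are exhausted. It keeps duplicates (the `<=` comparison), like the program's non-unique mode.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>   // std::istringstream / std::ostringstream also work as streams here
#include <vector>

// Refill buf from in with up to buf.size() ints; returns how many were read.
static std::size_t Refill(std::istream &in, std::vector<int> &buf)
{
    std::size_t n = 0;
    int v;
    while (n < buf.size() && in >> v)
        buf[n++] = v;
    return n;
}

// Write the first n ints of buf to out, one per line.
static void Flush(std::ostream &out, const std::vector<int> &buf, std::size_t n)
{
    for (std::size_t k = 0; k < n; ++k)
        out << buf[k] << '\n';
}

// Two-way merge of sorted streams a and b into out, using two input buffers
// and one output buffer of buf_size ints each. Returns the number of lines written.
std::size_t BufferedMerge(std::istream &a, std::istream &b, std::ostream &out,
                          std::size_t buf_size)
{
    assert(buf_size > 0);
    std::vector<int> in1(buf_size), in2(buf_size), outbuf(buf_size);
    std::size_t n1 = Refill(a, in1), n2 = Refill(b, in2);
    std::size_t i = 0, j = 0, o = 0, written = 0;

    while (true) {
        if (i == n1 && (n1 = Refill(a, in1)) > 0) i = 0;   // input buffer 1 used up: refill
        if (j == n2 && (n2 = Refill(b, in2)) > 0) j = 0;   // input buffer 2 used up: refill
        bool has1 = i < n1, has2 = j < n2;
        if (!has1 && !has2)
            break;                                         // both inputs exhausted
        // Take from input 1 unless it is exhausted or input 2's head is smaller.
        if (has1 && (!has2 || in1[i] <= in2[j]))
            outbuf[o++] = in1[i++];
        else
            outbuf[o++] = in2[j++];
        if (o == buf_size) {                               // output buffer full: write it out
            Flush(out, outbuf, o);
            written += o;
            o = 0;
        }
    }
    Flush(out, outbuf, o);                                 // write whatever is left
    return written + o;
}
```

With `buf_size = 2`, merging the streams `1 3 5 7 9` and `2 3 4 10` refills each input buffer several times and writes `1 2 3 3 4 5 7 9 10`.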

 

 

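Merge control uses two queues: the current queue, and a second queue built up for the next round, with the two swapped between rounds.

Each step takes two file names from the current queue, merges them, and pushes the resulting file name onto the other queue. When the current queue is used up

(an odd file out is moved to the other queue unchanged), one level of merging is complete. The queues are then swapped, and the next level of merging starts. This repeats until only one file remains

in the current queue.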
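The level-by-level pairing can be checked without any threads or files by simulating the two queues. The sketch below records which pair is merged into which `tempN`. It uses the same naming and the same rule as the program: an odd file out is carried to the next level. The names `MergeStep` and `PlanMerges` are illustrative, and the loop condition is `> 1` rather than `!= 1` so that an empty input cannot loop forever.

```cpp
#include <cassert>
#include <queue>
#include <string>
#include <utility>
#include <vector>

struct MergeStep {
    std::string in1, in2, out;
};

// Simulate the two-queue merge schedule: pair files in queue order, push each
// result "tempN" onto the other queue, carry a leftover file forward, then swap.
std::vector<MergeStep> PlanMerges(const std::vector<std::string> &files)
{
    std::queue<std::string> q[2];
    int now = 0, other = 1, merge_times = 0;
    std::vector<MergeStep> plan;
    for (const std::string &f : files)
        q[now].push(f);

    while (q[now].size() > 1) {
        while (q[now].size() >= 2) {
            MergeStep step;
            step.in1 = q[now].front(); q[now].pop();
            step.in2 = q[now].front(); q[now].pop();
            step.out = "temp" + std::to_string(merge_times++);
            q[other].push(step.out);
            plan.push_back(step);
        }
        if (!q[now].empty()) {            // odd file out moves to the next level
            q[other].push(q[now].front());
            q[now].pop();
        }
        std::swap(now, other);            // this level is finished
    }
    return plan;
}
```

For seven inputs `a0` … `a6` this yields six merges. At the second level, `temp2` is paired with `a6` into `temp4`, the same schedule as the large-file run in the tests below. Each merge turns two files into one, so n input files always need exactly n - 1 merges.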

 

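 One problem that came up: the threads worked and the sorting was correct, but the file names printed at the end were wrong.

It turned out to be a resource-lifetime problem. It cannot occur with a single thread, but it shows up with multiple threads. The merge loop looked like this: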

   while(queue_files[now].size() != 1) {
        int l = 0;
        while(queue_files[now].size() >= 2) {
            file_name1 = queue_files[now].front();
            queue_files[now].pop();
            file_name2 = queue_files[now].front();
            queue_files[now].pop();
            string s;
            Int2String(s, merge_times);
            queue_files[other].push(string("temp")+s);
            args_thread.file_name1 = file_name1.c_str();
            args_thread.file_name2 = file_name2.c_str();
            args_thread.merge_times = merge_times++;
            pthread_create(&tid[l++], NULL, MergeNumOf2Files, (void *)&args_thread);
        }
        if (!queue_files[now].empty()) {
            file_name = queue_files[now].front();
            queue_files[now].pop();
            queue_files[other].push(file_name);
        }

        for (int i = 0; i < l; i++)
            pthread_join(tid[i], NULL);
        swap(now, other);
    }

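Look at file_name1 and file_name2. They hold the strings, but the threads only receive pointers obtained from c_str(). While one thread is still running,

the while loop carries on and assigns new values to file_name1 and file_name2. A pointer returned by c_str() is only valid until the string is next modified, so the memory the earlier pointer referred to may be freed or reused

while the earlier thread still points there. On top of that, every thread is given the address of the same args_thread, and the next iteration overwrites it, possibly before the previous thread has read it.

The symptoms were names such as a3.txt.sortex, or even stranger ones.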
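One fix is to give every thread its own argument block, with strings that own their characters and stay alive until the thread is joined. The pthreads sketch below shows the pattern. The struct `MergeArgs`, the functions `Worker` and `RunPairs`, and the `seen` field (used only to record what each thread received) are illustrative names.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Each thread gets its own copy of the file names, so later loop iterations
// cannot overwrite or free what an already-running thread is reading.
struct MergeArgs {
    std::string file_name1;
    std::string file_name2;
    int merge_times;
    std::string seen;   // filled in by the thread, for demonstration only
};

static void *Worker(void *p)
{
    MergeArgs *args = static_cast<MergeArgs *>(p);
    args->seen = args->file_name1 + "+" + args->file_name2;
    return NULL;
}

// Start one thread per pair of names (an odd name out is ignored here) and join
// them all. args must not be resized while threads run: that would move the
// elements the threads point to.
std::vector<MergeArgs> RunPairs(const std::vector<std::string> &files)
{
    std::vector<MergeArgs> args(files.size() / 2);
    std::vector<pthread_t> tid(args.size());
    for (std::size_t l = 0; l < args.size(); ++l) {
        args[l].file_name1 = files[2 * l];
        args[l].file_name2 = files[2 * l + 1];
        args[l].merge_times = static_cast<int>(l);
        int rc = pthread_create(&tid[l], NULL, Worker, &args[l]);
        assert(rc == 0);
        (void)rc;
    }
    for (pthread_t &t : tid)
        pthread_join(t, NULL);
    return args;
}
```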

 

The original assignment text is as follows:

//one example

[Part 1] User enters the following file names at the command line:
   ./mp2.1 a1.txt a2.txt a3.txt a4.txt a5.txt a6.txt a7.txt

[Part 2] Sort the numbers contained in each of the files to form
the following new files, without altering the original files:
   "a1.txt.sorted" is a sorted copy of "a1.txt"
   "a2.txt.sorted" is a sorted copy of "a2.txt"
   "a3.txt.sorted" is a sorted copy of "a3.txt"
   "a4.txt.sorted" is a sorted copy of "a4.txt"
   "a5.txt.sorted" is a sorted copy of "a5.txt"
   "a6.txt.sorted" is a sorted copy of "a6.txt"
   "a7.txt.sorted" is a sorted copy of "a7.txt"

And each thread prints the total number of integers encountered as
they terminate:
    This worker thread writes XXXXX lines to "YYYYY".

Therefore your output will be similar to -
    This worker thread writes 10 lines to "a7.txt.sorted".
    This worker thread writes 20 lines to "a3.txt.sorted".
    This worker thread writes 30 lines to "a2.txt.sorted".
    This worker thread writes 40 lines to "a4.txt.sorted".
    This worker thread writes 100000 lines to "a5.txt.sorted".
    This worker thread writes 2000000 lines to "a6.txt.sorted".
    This worker thread writes 30000000 lines to "a1.txt.sorted".
It's important to note that worker threads may exit in a different order than
the order they were created. This is because they're running as threads,
in parallel. Therefore, small files will sort very quickly while multi-million
line files may take a few seconds to sort.

[Part 3] Start merging the files, while still maintaining the sorted
order:

    a1.txt.sorted      a3.txt.sorted       a5.txt.sorted       a7.txt.sorted
       v    a2.txt.sorted   v   a4.txt.sorted   v   a6.txt.sorted   v               /* 7 threads to sort */
       |         v          |        v          |        v          |
       v         -          v        |          v        |          v
       \---------/          \--------/          \--------/          |               /* 3 threads to merge the six files
               v                   v                   v               v                  that can be merged at this layer */
          temp1               temp2               temp3             |
            v                   v                   v               v
            |                   |                   |               |
            \-------------------/                   \---------------/               /* 2 threads to merge the four files
                     v                                     v                           that can be merged at this layer */
                   temp4                                 temp5
                     v                                     v
                     |                                     |
                     \-------------------------------------/                        /* 1 thread to merge the two files
                                       v                                               that can be merged at this layer */
                                  sorted.txt

Each merge thread will display the name of the files merged and the name
of the new file created with the total number of lines in it. Therefore,
your output will look similar to -

    Merged 100 lines and 1000 lines into 1050 lines.
    Merged 10000 lines and 300 lines into 10300 lines.
    Merged 10 lines and 800 lines into 801 lines.
    Merged 1050 lines and 10300 lines into 10345 lines.
    Merged 801 lines and 1 lines into 802 lines.
    Merged 10345 lines and 802 lines into 11111 lines.

At the end of the execution of the program, your directory must
only contain the following NEWLY created files:
    a1.txt.sorted, a2.txt.sorted, ..., a8.txt.sorted, and sorted.txt

As well as the unmodified orignal files:
    a1.txt, a2.txt, ..., a8.txt

//Test

allen:~/study/system_programming/uiuc_assignment/smp2$ ./mp2.1 a2.txt a3.txt a4.txt a5.txt a6.txt a7.txt
The worker thread writes 66797lines to a3.txt.sorted
The worker thread writes 114550lines to a4.txt.sorted
The worker thread writes 103026lines to a5.txt.sorted
The worker thread writes 150172lines to a2.txt.sorted
The worker thread writes 118376lines to a7.txt.sorted
The worker thread writes 174951lines to a6.txt.sorted
Merged file a2.txt.sorted 150172 lines and a3.txt.sorted 66797 lines into the file temp0 with 216967 lines!
Merged file a4.txt.sorted 114550 lines and a5.txt.sorted 103026 lines into the file temp1 with 217572 lines!
Merged file a6.txt.sorted 174951 lines and a7.txt.sorted 118376 lines into the file temp2 with 293318 lines!
Merged file temp0 216967 lines and temp1 217572 lines into the file temp3 with 434515 lines!
Merged file temp3 434515 lines and temp2 293318 lines into the file temp4 with 727791 lines!
allen:~/study/system_programming/uiuc_assignment/smp2$ ls
a0.txt2         a12.txt  a17.txt         a1.txt_bak      a3.txt         a5.txt.sorted  a8.txt  gen2.py        mergea0a1.txt2  README.pdf        tags
a0.txt2.sorted  a13.txt  a18.txt         a1.txt.sorted   a3.txt.sorted  a6.txt         a9.txt  gen.c          mergea3a6.txt   smp2.zip
a0.txt.sorted   a14.txt  a19.txt         a2.txt          a4.txt         a6.txt.sorted  ge2.c   gen.py         mp2.1           sorted.txt
a10.txt         a15.txt  a1.txt2         a2.txt2.sorted  a4.txt.sorted  a7.txt         gen     Makefile       mp2.1.c         sortok.py
a11.txt         a16.txt  a1.txt2.sorted  a2.txt.sorted   a5.txt         a7.txt.sorted  gen2    mergea0a1.txt  mp2.1.cc        sortok_unique.py
allen:~/study/system_programming/uiuc_assignment/smp2$ wc -l a2.txt a3.txt a4.txt a5.txt a6.txt a7.txt
 150172 a2.txt
  66797 a3.txt
 114550 a4.txt
 103026 a5.txt
 174951 a6.txt
 118376 a7.txt
 727872 total
allen:~/study/system_programming/uiuc_assignment/smp2$ ./sortok_unique.py sorted.txt
OK! The file is sorted without duplicate num
allen:~/study/system_programming/uiuc_assignment/smp2$ wc -l sorted.txt
727791 sorted.txt

//Test with duplicate numbers allowed
allen:~/study/system_programming/uiuc_assignment/smp2$ ./mp2.1 a2.txt a3.txt a4.txt a5.txt a6.txt a7.txt
The worker thread writes 150172lines to a2.txt.sorted
The worker thread writes 66797lines to a3.txt.sorted
The worker thread writes 103026lines to a5.txt.sorted
The worker thread writes 114550lines to a4.txt.sorted
The worker thread writes 118376lines to a7.txt.sorted
The worker thread writes 174951lines to a6.txt.sorted
Merged file a2.txt.sorted 150172 lines and a3.txt.sorted 66797 lines into the file temp0 with 216969 lines!
Merged file a4.txt.sorted 114550 lines and a5.txt.sorted 103026 lines into the file temp1 with 217576 lines!
Merged file a6.txt.sorted 174951 lines and a7.txt.sorted 118376 lines into the file temp2 with 293327 lines!
Merged file temp0 216969 lines and temp1 217576 lines into the file temp3 with 434545 lines!
Merged file temp3 434545 lines and temp2 293327 lines into the file temp4 with 727872 lines!
allen:~/study/system_programming/uiuc_assignment/smp2$ ./sortok.py sorted.txt
OK! The file is sorted
allen:~/study/system_programming/uiuc_assignment/smp2$ ./sortok_unique.py sorted.txt
71025869
Error! The file is not sorted without duplicate num
// Duplicates allowed, large-file test
allen:~/study/system_programming/uiuc_assignment/smp2$ ./gen2.py 7
./gen2 5242880 > a0.txt2
./gen2 9024839 > a1.txt2
./gen2 5381893 > a2.txt2
./gen2 6127276 > a3.txt2
./gen2 7494756 > a4.txt2
./gen2 9574136 > a5.txt2
./gen2 6526798 > a6.txt2
allen:~/study/system_programming/uiuc_assignment/smp2$ ./mp2.1 a0.txt2 a1.txt2 a2.txt2 a3.txt2 a4.txt2 a5.txt2 a6.txt2
The worker thread writes 5242880lines to a0.txt2.sorted
The worker thread writes 5381893lines to a2.txt2.sorted
The worker thread writes 6127276lines to a3.txt2.sorted
The worker thread writes 6526798lines to a6.txt2.sorted
The worker thread writes 7494756lines to a4.txt2.sorted
The worker thread writes 9024839lines to a1.txt2.sorted
The worker thread writes 9574136lines to a5.txt2.sorted
Merged file a4.txt2.sorted 7494756 lines and a5.txt2.sorted 9574136 lines into the file temp2 with 17068892 lines!
Merged file a0.txt2.sorted 5242880 lines and a1.txt2.sorted 9024839 lines into the file temp0 with 14267719 lines!
Merged file a2.txt2.sorted 5381893 lines and a3.txt2.sorted 6127276 lines into the file temp1 with 11509169 lines!
Merged file temp0 14267719 lines and temp1 11509169 lines into the file temp3 with 25776888 lines!
Merged file temp2 17068892 lines and a6.txt2.sorted 6526798 lines into the file temp4 with 23595690 lines!
Merged file temp3 25776888 lines and temp4 23595690 lines into the file temp5 with 49372578 lines!
allen:~/study/system_programming/uiuc_assignment/smp2$ wc -l a0.txt2 a1.txt2 a2.txt2 a3.txt2 a4.txt2 a5.txt2 a6.txt2
  5242880 a0.txt2
  9024839 a1.txt2
  5381893 a2.txt2
  6127276 a3.txt2
  7494756 a4.txt2
  9574136 a5.txt2
  6526798 a6.txt2
49372578 total
allen:~/study/system_programming/uiuc_assignment/smp2$ ./sortok.py sorted.txt
OK! The file is sorted
allen:~/study/system_programming/uiuc_assignment/smp2$ wc -l sorted.txt
49372578   sorted.txt
allen:~/study/system_programming/uiuc_assignment/smp2$ du -ha sorted.txt
495M       sorted.txt

//code 
/*
 Small Machine Problem #2
 CS 241, Spring 2009
 */

#include <stdio.h>              /* Standard buffered input/output        */
#include <stdlib.h>             /* Standard library functions            */
#include <string.h>             /* String operations                     */
#include <pthread.h>            /* Thread related functions              */

#include <algorithm>            /* sort, copy, swap                      */
#include <iostream>
#include <fstream>
#include <sstream>
#include <iterator>
#include <vector>
#include <string>
#include <queue>

using namespace std;
const int MaxLen = 5 * 1024 * 1024;  // 5M ints = 20 MB; each merge allocates 3 such vectors: two input buffers and one output buffer
//const int MaxLen = 5 * 1024;    // small buffer for testing
struct ArgSet {
    const char *file_name1;
    const char *file_name2;
    int merge_times;
    bool unique_merge;
};

struct LineNumInfo {
    int file1_line_num;
    int file2_line_num;
    int file_out_num;
};
/*
 * Step 1: sort a single file. All the numbers in one file are
 * assumed to fit in memory, so they are read in and sorted there.
 */
template <typename T>
void SortNumOfOneFile(const char *file_name)
{
    // read the input data into vec
    ifstream data_file(file_name);
    istream_iterator<T> data_begin(data_file);
    istream_iterator<T> data_end;
    vector<T> vec(data_begin, data_end);
    data_file.close();

    // sort vec
    sort(vec.begin(), vec.end());

    // write the result to file_name.sorted; first build the output file name
    string out_file_name = string(file_name) + string(".sorted");
    ofstream out_file(out_file_name.c_str());
    copy(vec.begin(), vec.end(), ostream_iterator<T>(out_file, "\n"));
    out_file.close();

    cout << "The worker thread writes " << vec.size()
         << " lines to " << out_file_name << endl;
}

void *SortNumOfOneFile(void *f)
{
    char *file_name = (char *) f;
    SortNumOfOneFile<int>(file_name);
    return NULL;
}

/*
 * Step 2: merge the sorted files further; the files grow at each level.
 * This is an external merge: two fixed-size input buffers and one output
 * buffer. Numbers from the two files are read into the input buffers and
 * merged into the output buffer. When the output buffer is full it is
 * written to the output file; when an input buffer is used up it is
 * refilled from its input file.
 *
 * As the assignment requires, files are merged in input-file order, not
 * in the order in which the previous level finished.
 */

// Read up to num numbers from a file stream into vec;
// returns how many were actually read.
template <typename T>
int ReadFile(ifstream &data_file, int num, T &vec)
{
    int i = 0;
    typename T::value_type val;
    while (i < num && data_file >> val)
        vec[i++] = val;

    return i;
}

// Write the first num elements of vec_out to out_file, one per line.
template <typename T>
void WriteToOutputBuffer(ofstream &out_file, const T &vec_out, int num)
{
    typedef typename T::value_type ValueType;
    copy(vec_out.begin(), vec_out.begin() + num, ostream_iterator<ValueType>(out_file, "\n"));
}

/*
 * vec1, vec2: input buffers
 * vec_out: output buffer
 * s1: start index in vec1; num1: number of valid entries in vec1 (likewise s2, num2)
 * cur_out: current position in the output buffer (must be < MaxLen on entry)
 * unique = true: drop duplicates (a value present in both input files is kept once)
 */
template <typename T>
void Merge2Vec(T &vec1, T &vec2, T &vec_out,
               int s1, int num1, int s2, int num2,
               int cur_out, ofstream &out_file,
               ifstream &data_file1, ifstream &data_file2,
               LineNumInfo &line_num_info,
               bool unique)
{
    int i = s1;
    int j = s2;
    int end1 = i + num1;
    int end2 = j + num2;
    int num;
    // The end-of-buffer checks run before each comparison, so an empty input
    // file is handled and, in unique mode, i and j may reach end1 and end2
    // at the same time.
    while (1) {
        if (i == end1) {
            // Buffer 1 is used up. If the last read did not fill it, file 1 is
            // exhausted; or the buffer was full but nothing is left (num == 0).
            // Either way, output what remains in buffer 2 and in file 2.
            if (end1 != MaxLen || !(num = ReadFile(data_file1, MaxLen, vec1))) {
                while (j < end2 && cur_out < MaxLen)   // drain buffer 2
                    vec_out[cur_out++] = vec2[j++];
                if (cur_out == MaxLen) {    // output full: flush; the rest of buffer 2 then fits
                    WriteToOutputBuffer(out_file, vec_out, MaxLen);
                    cur_out = 0;
                    line_num_info.file_out_num += MaxLen;
                    while (j < end2)
                        vec_out[cur_out++] = vec2[j++];
                }
                WriteToOutputBuffer(out_file, vec_out, cur_out);
                line_num_info.file_out_num += cur_out;

                if (end2 == MaxLen) {   // copy whatever is left in file 2
                    while ((num = ReadFile(data_file2, MaxLen, vec_out))) {
                        WriteToOutputBuffer(out_file, vec_out, num);
                        line_num_info.file2_line_num += num;
                        line_num_info.file_out_num += num;
                    }
                }
                return;
            } else {    // file 1 still has numbers: refill buffer 1 and keep merging
                i = 0;
                end1 = num;
                line_num_info.file1_line_num += num;
            }
        }

        if (j == end2) {   // same as the i == end1 case, with the roles swapped
            if (end2 != MaxLen || !(num = ReadFile(data_file2, MaxLen, vec2))) {
                while (i < end1 && cur_out < MaxLen)
                    vec_out[cur_out++] = vec1[i++];
                if (cur_out == MaxLen) {
                    WriteToOutputBuffer(out_file, vec_out, MaxLen);
                    cur_out = 0;
                    line_num_info.file_out_num += MaxLen;
                    while (i < end1)
                        vec_out[cur_out++] = vec1[i++];
                }
                WriteToOutputBuffer(out_file, vec_out, cur_out);
                line_num_info.file_out_num += cur_out;

                if (end1 == MaxLen) {
                    while ((num = ReadFile(data_file1, MaxLen, vec_out))) {
                        WriteToOutputBuffer(out_file, vec_out, num);
                        line_num_info.file1_line_num += num;
                        line_num_info.file_out_num += num;
                    }
                }
                return;
            } else {    // file 2 still has numbers
                j = 0;
                end2 = num;
                line_num_info.file2_line_num += num;
            }
        }

        if (!unique) {
            if (vec1[i] <= vec2[j])
                vec_out[cur_out++] = vec1[i++];
            else
                vec_out[cur_out++] = vec2[j++];
        } else {    // drop duplicates
            if (vec1[i] < vec2[j]) {
                vec_out[cur_out++] = vec1[i++];
            } else if (vec1[i] == vec2[j]) {
                vec_out[cur_out++] = vec1[i++];
                j++;
            } else {
                vec_out[cur_out++] = vec2[j++];
            }
        }

        if (cur_out == MaxLen) {    // output buffer full: write it out
            WriteToOutputBuffer(out_file, vec_out, MaxLen);
            cur_out = 0;
            line_num_info.file_out_num += MaxLen;
        }
    }
}

// Merge two input files. Duplicates are removed when unique == true (no single
// input file contains duplicates). For easier debugging the default is a merge
// that keeps duplicates. The const is needed, e.g. when the argument is
// a.c_str(), which returns a const char *.
// While merging, record the line counts of both input files and of the output file.
void MergeNumOf2Files(const char *file_name1, const char *file_name2, const char *file_name_out, bool unique = false)
{
    ifstream data_file1(file_name1);
    ifstream data_file2(file_name2);
    ofstream out_file(file_name_out);

    LineNumInfo line_num_info;

    // TODO: could many threads running at once exhaust memory?
    vector<int> vec1(MaxLen);       // input buffer 1
    vector<int> vec2(MaxLen);       // input buffer 2
    vector<int> vec_out(MaxLen);    // output buffer

    int num1 = ReadFile(data_file1, MaxLen, vec1);
    int num2 = ReadFile(data_file2, MaxLen, vec2);

    line_num_info.file1_line_num = num1;
    line_num_info.file2_line_num = num2;
    line_num_info.file_out_num = 0;     // take care, not to forget

    Merge2Vec(vec1, vec2, vec_out,
              0, num1, 0, num2,
              0, out_file,
              data_file1, data_file2,
              line_num_info,
              unique);

    data_file1.close();
    data_file2.close();
    out_file.close();

    cout << "Merged file " << file_name1 << " " << line_num_info.file1_line_num
         << " lines and " << file_name2 << " " << line_num_info.file2_line_num
         << " lines into the file " << file_name_out << " with "
         << line_num_info.file_out_num << " lines!" << endl;
}

void Int2String(string &s, int input)
{
    std::stringstream ss;
    ss << input;
    ss >> s;
}

void *MergeNumOf2Files(void *f)
{
    // C++ (unlike C) has no implicit void* -> ArgSet* conversion, so g++ rejects:
    //ArgSet *args = f;
    ArgSet *args = static_cast<ArgSet *>(f);

    string s;
    Int2String(s, args->merge_times);
    s = string("temp") + s;
    MergeNumOf2Files(args->file_name1, args->file_name2, s.c_str(), args->unique_merge);
    return NULL;
}

/* MAIN PROCEDURE SECTION */
int main(int argc, char **argv)
{
    if (argc == 1) {
        cout << "You should give at least one file" << endl;
        return -1;
    }
    // Step 1: start one thread per file to sort its numbers,
    // saving the result in the corresponding .sorted file
    vector<pthread_t> tid(argc - 1);    // std::vector instead of a variable-length array
    queue<string> queue_files[2];
    int now = 0;
    int other = 1;

    string file_name;
    for (int i = 0; i < argc - 1; i++) {
        file_name = string(argv[i + 1]) + string(".sorted");
        queue_files[now].push(file_name);
        pthread_create(&tid[i], NULL, SortNumOfOneFile, (void *) argv[i + 1]);
    }

    for (int i = 0; i < argc - 1; i++)
        pthread_join(tid[i], NULL);

    // Next, start threads that merge two sorted files into a new temp file,
    // and repeat until a single file holds all the numbers in sorted order.
    // This may take several levels; a level starts only after every thread
    // of the previous level has finished.
    // Each merge turns two files into one, so there are exactly argc - 2 merges.
    int merge_times = 0;

    while (queue_files[now].size() > 1) {
        int l = 0;
        // Each thread gets its own names and its own ArgSet. They live until the
        // joins at the end of this level, so no thread reads data that a later
        // iteration has overwritten.
        vector<string> file_name1(argc - 1);
        vector<string> file_name2(argc - 1);
        vector<ArgSet> args_thread(argc - 1);
        while (queue_files[now].size() >= 2) {
            file_name1[l] = queue_files[now].front();
            queue_files[now].pop();
            file_name2[l] = queue_files[now].front();
            queue_files[now].pop();

            string s;
            Int2String(s, merge_times);
            queue_files[other].push(string("temp") + s);

            args_thread[l].file_name1 = file_name1[l].c_str();
            args_thread[l].file_name2 = file_name2[l].c_str();
            args_thread[l].merge_times = merge_times++;
            args_thread[l].unique_merge = false;  // false allows duplicate numbers
            pthread_create(&tid[l], NULL, MergeNumOf2Files, (void *)&args_thread[l]);
            l++;
        }
        if (!queue_files[now].empty()) {   // an odd file out moves on to the next level
            file_name = queue_files[now].front();
            queue_files[now].pop();
            queue_files[other].push(file_name);
        }

        for (int i = 0; i < l; i++)
            pthread_join(tid[i], NULL);

        swap(now, other);
    }

    // Delete all temp files and rename the last one to sorted.txt.
    // Alternatively, once a merge has finished and closed its files, it could
    // check whether an input file name contains "temp" and delete it.
    for (int i = 0; i < merge_times; i++) {
        string s;
        Int2String(s, i);
        s = string("temp") + s;
        if (i != merge_times - 1)
            remove(s.c_str());
        else
            rename(s.c_str(), "sorted.txt");
    }

    return 0;
} /* end main() */

posted @ 2009-07-24 10:42 阁子