2013年东三省数模A题第而问-(Matlab读取xls文件并根据关键字检索)

题目:

  1. 从这些数据中能否找出某些规律性的东西:如食品产地与食品质量的关系;食品销售地点(即抽检地点)与食品质量的关系;季节因素等等;

  一看到这个题目瞬间就感觉到eggache了,这怎么找啊,跟食品安全有关的东西,这不一大堆嘛,难道这道题要海底捞针?有地方捞针到也行啊,但是一看那一百九十多个几乎可以称之为没有规律的数据,顿时感觉不会再爱了。。

  虽然感觉不会再爱了,但还要做题的不是。其实冷静下来分析一下,看破一个点,这个题就迎刃而解了。这一点就是:从题目所给的表格中,我们能获得什么数据?

然后我们根据获取的数据进行规律总结就可以了,这样这道题就可以变成了一道数据略复杂的统计题。下面是解题过程:

  由分析数据可得,我们从数据中获得以下关系:

  食品-季度;食品-种类;食品-环节;食品-抽检单位;食品-生产单位;

  得到这五个对应的关系(个人分析角度不同,或许不止这五个关系),得到关系就好说了,挨个统计作图分析就可以了。至此,题意分析结束。

  知道做什么以后就轮到怎么去做了。其中食品-季度;食品-种类;食品-环节;还是比较好说的,数据量不大,直接手动分析就可以了。但是食品-抽检单位;食品-生产单位;就让人蛋疼了,那可是190多个数据文件啊(xls、word and html),手动分析,估计等数模结束了你都未必能分出来。那怎么办呢?好吧,我们是程序猿~~~

首先我们对于这个两个关系进行分析,其中抽检单位都是在深圳市内,而有一些生产单位是不在深圳室内的。很明显东三省的数模是给深圳市做城市建设的,所以为了减少工作量,对于这两个关系我们选择食品-抽检单位进行统计。

方法为matlab读取xls文件(其他文件已手动转化为xls文件),并根据关键字检索;

(1)首先是获取合格和不合格食品的受检单位,代码如下:

 1 clear all;
 2 
 3 Files = dir(fullfile('unqualified','*.xls'));
 4 
 5 str1 = '受检单位名称';
 6 str2 = '被检单位';
 7 str3 = '受检单位';
 8 
 9 str4 = '受检单位详细地址';
10 str5 = '受检单位地址';
11 count = 1;
12 name = {str1,str4};
13 sign = 0;
14 sign_address = 0;
15 
16 LengthFiles = length(Files);
17 for l = 1:LengthFiles;
18     %%读取xls文件内容
19     [N, T, rawdata] = xlsread(strcat('unqualified/',Files(l).name));
20     %%获得文件内容大小,为二维矩阵
21     data_size = size(rawdata);
22     %%初始化公司名称元胞数组
23     name_size = size(name);
24     % 循环遍历xls文件中的数据,相当与二维数组
25     for row = 1 : data_size(1)
26         for col = 1 : data_size(2)
27             %%查找适合的那一列
28             if( isequal((rawdata{row, col}), (rawdata{row, col})) && ((strcmp(rawdata{row, col}, str1) == 1) || ...
29                 (strcmp(rawdata{row, col}, str2) == 1) || (strcmp(rawdata{row, col}, str3) == 1) ))
30                 %%判断是否有受检单位详细地址这一列,如果有则将 sign_address 设为 1
31                 if(strcmp(rawdata{row, col+1}, str4) == 1 || strcmp(rawdata{row, col+1}, str5) == 1)
32                     sign_address = 1;
33                 end
34                 %得到受检单位名称一列后,遍历获取那一列的数据
35                 for i = (row + 1) : data_size(1)
36                     % 查询名称是否已存在
37                     for j = 2 : name_size(1)
38                         if(strcmp(rawdata{i, col}, name{j}) == 1)
39                             %如果存在,则将标记位标记为 1
40                             sign = 1;
41                             break;
42                         end
43                     end   
44                     %%如果公司名不存在,则将公司名加入元胞数组中
45                     if(sign == 0 && isequal((rawdata{i, col}), (rawdata{i, col})))
46                         count = count + 1;
47                         name{count,1} = rawdata{i, col};
48                         name_size = size(name);
49                         if(sign_address == 1 && isempty(name{count,2}) )
50                             name{count,2} = rawdata{i, col+1};
51                         end
52                     end
53                     %%如果公司存在但所遍历的这一项有受监单位详细地址这一项,则将受检单位详细地址这一项加入元胞数组中
54                     if(sign == 1 && isequal((rawdata{i, col}), (rawdata{i, col})) && sign_address == 1 && isempty(name{count,2})) 
55                         name{count,2} = rawdata{i, col+1};
56                     end
57                     
58                     sign = 0;
59                     
60                     %end
61                 end
62                 break;
63             end
64         end
65     end
66     sign_address = 0;
67 end

运行后会得到所有公司的一个mat文件,得到公司名后,也不能全手动分地区啊(按地区进行分类,而且不算重复,一共得到了7528了公司)。

然后我们还是程序猿~~读取上面获得的mat文件,进行关键字检索,怎么做呢?直接上代码~

  1 clear all;
  2 
  3 
  4 %load('D:/数模/A题/data/qualified/matlab.mat');
  5 load('D:/数模/A题/data/unqualified/unqualified.mat');
  6 
  7 strFuTianQu = {'福田区 ','园岭','南园','华富','莲花','福田','沙头','香蜜湖','福保','华强北', '梅林',...
  8     '福田','园岭街道','南园街道','华富街道','莲花街道','福田街道','沙头街道','香蜜湖街道','福保街道','华强北街道', '梅林街道'};
  9 strFuTianQuSize = size(strFuTianQu);
 10 
 11 strLuoHuQu = {'罗湖区','黄贝','东门','南湖','桂园','笋岗','清水河','翠竹','东湖','东晓','莲塘'...
 12     '罗湖','黄贝街道','东门街道','南湖街道','桂园街道','笋岗街道','清水河街道','翠竹街道','东湖街道','东晓街道','莲塘街道'};
 13 strLuoHuQuSize = size(strLuoHuQu);
 14 
 15 strNanShanQu = {'南山区','南头','南山','招商','蛇口','粤海','沙河','西丽','桃源', ...
 16     '南山','南头街道','南山街道','招商街道','蛇口街道','粤海街道','沙河街道','西丽街道','桃源街道'};
 17 strNanShanQuSize = size(strNanShanQu);
 18 
 19 strYanTianQu = {'盐田区','沙头角','梅沙','盐田','海山', ...
 20     '盐田','沙头角街道','梅沙街道','盐田街道','海山街道'};
 21 strYanTianQuSize = size(strYanTianQu);
 22 
 23 strBaoAnQu = {'宝安区','新安','西乡','福永','沙井','松岗','石岩','中心区' , ...
 24     '宝安','新安街道','西乡街道','福永街道','沙井街道','松岗街道','石岩街道','中心区街道'};
 25 strBaoAnQuSize = size(strBaoAnQu);
 26 
 27 strGuangMingQu = {'光明区','公明','光明', '公明街道', '光明街道'};
 28 strGuangMingQuSize = size(strGuangMingQu);
 29 
 30 strLongGuangQu = {'龙岗区','布吉','坂田','南湾','平湖','横岗','龙岗','龙城','坪地', ... 
 31     '龙岗','布吉街道','坂田街道','南湾街道','平湖街道','横岗街道','龙岗街道','龙城街道','坪地街道'};
 32 strLongGuangQuSize = size(strLongGuangQu);
 33 
 34 strPingShanQu = {'坪山区','坪山','坑梓', ...
 35     '坪山','坪山街道','坑梓街道'};
 36 strPingShanQuSize = size(strPingShanQu);
 37 
 38 FuShanQu = {'受检单位名称','受检单位详细地址','编号'};
 39 LuoHuQu = {'受检单位名称','受检单位详细地址','编号'};
 40 NanShanQu = {'受检单位名称','受检单位详细地址','编号'};
 41 YanTianQu = {'受检单位名称','受检单位详细地址','编号'};
 42 BaoAnQu = {'受检单位名称','受检单位详细地址','编号'};
 43 GuangMingQu = {'受检单位名称','受检单位详细地址','编号'};
 44 LongGuangQu = {'受检单位名称','受检单位详细地址','编号'};
 45 PingShanQu = {'受检单位名称','受检单位详细地址','编号'};
 46 Others = {'受检单位名称'};
 47 
 48 countFuShanQu = 1;
 49 countLuoHuQu = 1;
 50 countNanShanQu = 1;
 51 countYanTianQu = 1;
 52 countBaoAnQu = 1;
 53 countGuangMingQu = 1;
 54 countLongGuangQu = 1;
 55 countPingShanQu = 1;
 56 countOthers = 1;
 57 
 58 %%互斥信号量
 59 mutex = 1;
 60 
 61 %%mat文件内的元胞数组名为 name
 62 data_size = size(name);
 63 col = 1;
 64 for row = 2 : data_size(1)
 65     %%福山区
 66     if(mutex == 1)
 67         for i1 = 1 : strFuTianQuSize(2)
 68 
 69             %%用strfind函数判断指定字符串是否存在于改字符串中,并将结果分别存储在out1和out2中
 70             out1 = strfind(name{row, col}, strFuTianQu{1, i1});  
 71             out2 = strfind(name{row, col + 1}, strFuTianQu{1, i1});
 72 
 73             %%如果out1和out2中有一个不为空,则分类
 74             if(~isempty(out1) || ~isempty(out2))
 75                 FuShanQu{countFuShanQu, 1} = name{row, col};
 76                 FuShanQu{countFuShanQu, 2} = name{row, col + 1};
 77                 FuShanQu{countFuShanQu, 3} = countFuShanQu;
 78                 countFuShanQu = countFuShanQu + 1;
 79                 mutex = 0;
 80                 break;
 81             end
 82         end
 83     end
 84     %%罗湖区
 85     if(mutex == 1)
 86         for i1 = 1 : strLuoHuQuSize(2)
 87 
 88             out1 = strfind(name{row, col}, strLuoHuQu{1, i1});  
 89             out2 = strfind(name{row, col + 1}, strLuoHuQu{1, i1});
 90             %%如果out1和out2中有一个不为空,则分类
 91             if(~isempty(out1) || ~isempty(out2))
 92                 LuoHuQu{countLuoHuQu, 1} = name{row, col};
 93                 LuoHuQu{countLuoHuQu, 2} = name{row, col + 1};
 94                 LuoHuQu{countLuoHuQu, 3} = countLuoHuQu;
 95                 countLuoHuQu = countLuoHuQu + 1;
 96                 mutex = 0;
 97                 break;
 98             end
 99         end
100     end
101 
102     if(mutex == 1)
103         for i1 = 1 : strNanShanQuSize(2)
104 
105             out1 = strfind(name{row, col}, strNanShanQu{1, i1});  
106             out2 = strfind(name{row, col + 1}, strNanShanQu{1, i1});
107             %%如果out1和out2中有一个不为空,则分类
108             if(~isempty(out1) || ~isempty(out2))
109                 NanShanQu{countNanShanQu, 1} = name{row, col};
110                 NanShanQu{countNanShanQu, 2} = name{row, col + 1};
111                 NanShanQu{countNanShanQu, 3} = countNanShanQu;
112                 countNanShanQu = countNanShanQu + 1;
113                 mutex = 0;
114                 break;
115             end
116         end
117     end
118 
119     if(mutex == 1)
120         for i1 = 1 : strYanTianQuSize(2)
121 
122             out1 = strfind(name{row, col}, strYanTianQu{1, i1});  
123             out2 = strfind(name{row, col + 1}, strYanTianQu{1, i1});
124             %%如果out1和out2中有一个不为空,则分类
125             if(~isempty(out1) || ~isempty(out2))
126                 YanTianQu{countYanTianQu, 1} = name{row, col};
127                 YanTianQu{countYanTianQu, 2} = name{row, col + 1};
128                 YanTianQu{countYanTianQu, 3} = countYanTianQu;
129                 countYanTianQu = countYanTianQu + 1;
130                 mutex = 0;
131                 break;
132             end
133         end
134     end
135 
136     if(mutex == 1)
137         for i1 = 1 : strBaoAnQuSize(2)
138             
139             out1 = strfind(name{row, col}, strBaoAnQu{1, i1});  
140             out2 = strfind(name{row, col + 1}, strBaoAnQu{1, i1});
141             %%如果out1和out2中有一个不为空,则分类
142             if(~isempty(out1) || ~isempty(out2))
143                 BaoAnQu{countBaoAnQu, 1} = name{row, col};
144                 BaoAnQu{countBaoAnQu, 2} = name{row, col + 1};
145                 BaoAnQu{countBaoAnQu, 3} = countBaoAnQu;
146                 countBaoAnQu = countBaoAnQu + 1;
147                 mutex = 0;
148                 break;
149             end
150         end
151     end
152 
153     if(mutex == 1)
154         for i1 = 1 : strGuangMingQuSize(2)
155             
156             out1 = strfind(name{row, col}, strGuangMingQu{1, i1});  
157             out2 = strfind(name{row, col + 1}, strGuangMingQu{1, i1});
158             %%如果out1和out2中有一个不为空,则分类
159             if(~isempty(out1) || ~isempty(out2))
160                 GuangMingQu{countGuangMingQu, 1} = name{row, col};
161                 GuangMingQu{countGuangMingQu, 2} = name{row, col + 1};
162                 GuangMingQu{countGuangMingQu, 3} = countGuangMingQu;
163                 countGuangMingQu = countGuangMingQu + 1;
164                 mutex = 0;
165                 break;
166             end
167         end
168     end
169 
170     if(mutex == 1)
171         for i1 = 1 : strLongGuangQuSize(2)
172             
173             out1 = strfind(name{row, col}, strLongGuangQu{1, i1});  
174             out2 = strfind(name{row, col + 1}, strLongGuangQu{1, i1});
175             %%如果out1和out2中有一个不为空,则分类
176             if(~isempty(out1) || ~isempty(out2))
177                 LongGuangQu{countLongGuangQu, 1} = name{row, col};
178                 LongGuangQu{countLongGuangQu, 2} = name{row, col + 1};
179                 LongGuangQu{countLongGuangQu, 3} = countLongGuangQu;
180                 countLongGuangQu = countLongGuangQu + 1;
181                 mutex = 0;
182                 break;
183             end
184         end
185     end
186 
187     if(mutex == 1)
188         for i1 = 1 : strPingShanQuSize(2)
189             
190             out1 = strfind(name{row, col}, strPingShanQu{1, i1});  
191             out2 = strfind(name{row, col + 1}, strPingShanQu{1, i1});
192             %%如果out1和out2中有一个不为空,则分类
193             if(~isempty(out1) || ~isempty(out2))
194                 PingShanQu{countPingShanQu, 1} = name{row, col};
195                 PingShanQu{countPingShanQu, 2} = name{row, col + 1};
196                 PingShanQu{countPingShanQu, 3} = countPingShanQu;
197                 countPingShanQu = countPingShanQu + 1;
198                 mutex = 0;
199                 break;
200             end
201         end
202     end
203 
204     if(mutex == 1)
205         
206         Others{countOthers, 1} = name{row, col};
207         Others{countOthers, 2} = name{row, col + 1};
208         Others{countOthers, 3} = countOthers;
209         countOthers = countOthers + 1;
210         mutex = 0;
211     end
212     mutex = 1;
213 end

原理为根据深圳市各个区名和区内街道名对公司名进行匹配,原理很简单的,就看你能不能想到了~~,至此,第二题结束

注:由于做数模这个东西确实恶心了,所以这篇博客在思路和代码方面我只是说了个大概,不想费那么多时间去总结那么细了,所以如果您对我的博客感兴趣并有疑问的话,欢迎评论或者发邮件给我(lityangweiguang@gmail.com)

posted on 2013-05-21 00:20  LitLeo  阅读(488)  评论(0编辑  收藏  举报

导航