2013年东三省数模A题第而问-(Matlab读取xls文件并根据关键字检索)
题目:
- 从这些数据中能否找出某些规律性的东西:如食品产地与食品质量的关系;食品销售地点(即抽检地点)与食品质量的关系;季节因素等等;
一看到这个题目瞬间就感觉到eggache了,这怎么找啊,跟食品安全有关的东西,这不一大堆嘛,难道这道题要海底捞针?有地方捞针到也行啊,但是一看那一百九十多个几乎可以称之为没有规律的数据,顿时感觉不会再爱了。。
虽然感觉不会再爱了,但还要做题的不是。其实冷静下来分析一下,看破一个点,这个题就迎刃而解了。这一点就是:从题目所给的表格中,我们能获得什么数据?
然后我们根据获取的数据进行规律总结就可以了,这样这道题就可以变成了一道数据略复杂的统计题。下面是解题过程:
由分析数据可得,我们从数据中获得以下关系:
食品-季度;食品-种类;食品-环节;食品-抽检单位;食品-生产单位;
得到这五个对应的关系(个人分析角度不同,或许不止这五个关系),得到关系就好说了,挨个统计作图分析就可以了。至此,题意分析结束。
知道做什么以后就轮到怎么去做了。其中食品-季度;食品-种类;食品-环节;还是比较好说的,数据量不大,直接手动分析就可以了。但是食品-抽检单位;食品-生产单位;就让人蛋疼了,那可是190多个数据文件啊(xls、word and html),手动分析,估计等数模结束了你都未必能分出来。那怎么办呢?好吧,我们是程序猿~~~
首先我们对于这个两个关系进行分析,其中抽检单位都是在深圳市内,而有一些生产单位是不在深圳室内的。很明显东三省的数模是给深圳市做城市建设的,所以为了减少工作量,对于这两个关系我们选择食品-抽检单位进行统计。
方法为matlab读取xls文件(其他文件已手动转化为xls文件),并根据关键字检索;
(1)首先是获取合格和不合格食品的受检单位,代码如下:
1 clear all; 2 3 Files = dir(fullfile('unqualified','*.xls')); 4 5 str1 = '受检单位名称'; 6 str2 = '被检单位'; 7 str3 = '受检单位'; 8 9 str4 = '受检单位详细地址'; 10 str5 = '受检单位地址'; 11 count = 1; 12 name = {str1,str4}; 13 sign = 0; 14 sign_address = 0; 15 16 LengthFiles = length(Files); 17 for l = 1:LengthFiles; 18 %%读取xls文件内容 19 [N, T, rawdata] = xlsread(strcat('unqualified/',Files(l).name)); 20 %%获得文件内容大小,为二维矩阵 21 data_size = size(rawdata); 22 %%初始化公司名称元胞数组 23 name_size = size(name); 24 % 循环遍历xls文件中的数据,相当与二维数组 25 for row = 1 : data_size(1) 26 for col = 1 : data_size(2) 27 %%查找适合的那一列 28 if( isequal((rawdata{row, col}), (rawdata{row, col})) && ((strcmp(rawdata{row, col}, str1) == 1) || ... 29 (strcmp(rawdata{row, col}, str2) == 1) || (strcmp(rawdata{row, col}, str3) == 1) )) 30 %%判断是否有受检单位详细地址这一列,如果有则将 sign_address 设为 1 31 if(strcmp(rawdata{row, col+1}, str4) == 1 || strcmp(rawdata{row, col+1}, str5) == 1) 32 sign_address = 1; 33 end 34 %得到受检单位名称一列后,遍历获取那一列的数据 35 for i = (row + 1) : data_size(1) 36 % 查询名称是否已存在 37 for j = 2 : name_size(1) 38 if(strcmp(rawdata{i, col}, name{j}) == 1) 39 %如果存在,则将标记位标记为 1 40 sign = 1; 41 break; 42 end 43 end 44 %%如果公司名不存在,则将公司名加入元胞数组中 45 if(sign == 0 && isequal((rawdata{i, col}), (rawdata{i, col}))) 46 count = count + 1; 47 name{count,1} = rawdata{i, col}; 48 name_size = size(name); 49 if(sign_address == 1 && isempty(name{count,2}) ) 50 name{count,2} = rawdata{i, col+1}; 51 end 52 end 53 %%如果公司存在但所遍历的这一项有受监单位详细地址这一项,则将受检单位详细地址这一项加入元胞数组中 54 if(sign == 1 && isequal((rawdata{i, col}), (rawdata{i, col})) && sign_address == 1 && isempty(name{count,2})) 55 name{count,2} = rawdata{i, col+1}; 56 end 57 58 sign = 0; 59 60 %end 61 end 62 break; 63 end 64 end 65 end 66 sign_address = 0; 67 end
运行后会得到所有公司的一个mat文件,得到公司名后,也不能全手动分地区啊(按地区进行分类,而且不算重复,一共得到了7528了公司)。
然后我们还是程序猿~~读取上面获得的mat文件,进行关键字检索,怎么做呢?直接上代码~
1 clear all; 2 3 4 %load('D:/数模/A题/data/qualified/matlab.mat'); 5 load('D:/数模/A题/data/unqualified/unqualified.mat'); 6 7 strFuTianQu = {'福田区 ','园岭','南园','华富','莲花','福田','沙头','香蜜湖','福保','华强北', '梅林',... 8 '福田','园岭街道','南园街道','华富街道','莲花街道','福田街道','沙头街道','香蜜湖街道','福保街道','华强北街道', '梅林街道'}; 9 strFuTianQuSize = size(strFuTianQu); 10 11 strLuoHuQu = {'罗湖区','黄贝','东门','南湖','桂园','笋岗','清水河','翠竹','东湖','东晓','莲塘'... 12 '罗湖','黄贝街道','东门街道','南湖街道','桂园街道','笋岗街道','清水河街道','翠竹街道','东湖街道','东晓街道','莲塘街道'}; 13 strLuoHuQuSize = size(strLuoHuQu); 14 15 strNanShanQu = {'南山区','南头','南山','招商','蛇口','粤海','沙河','西丽','桃源', ... 16 '南山','南头街道','南山街道','招商街道','蛇口街道','粤海街道','沙河街道','西丽街道','桃源街道'}; 17 strNanShanQuSize = size(strNanShanQu); 18 19 strYanTianQu = {'盐田区','沙头角','梅沙','盐田','海山', ... 20 '盐田','沙头角街道','梅沙街道','盐田街道','海山街道'}; 21 strYanTianQuSize = size(strYanTianQu); 22 23 strBaoAnQu = {'宝安区','新安','西乡','福永','沙井','松岗','石岩','中心区' , ... 24 '宝安','新安街道','西乡街道','福永街道','沙井街道','松岗街道','石岩街道','中心区街道'}; 25 strBaoAnQuSize = size(strBaoAnQu); 26 27 strGuangMingQu = {'光明区','公明','光明', '公明街道', '光明街道'}; 28 strGuangMingQuSize = size(strGuangMingQu); 29 30 strLongGuangQu = {'龙岗区','布吉','坂田','南湾','平湖','横岗','龙岗','龙城','坪地', ... 31 '龙岗','布吉街道','坂田街道','南湾街道','平湖街道','横岗街道','龙岗街道','龙城街道','坪地街道'}; 32 strLongGuangQuSize = size(strLongGuangQu); 33 34 strPingShanQu = {'坪山区','坪山','坑梓', ... 35 '坪山','坪山街道','坑梓街道'}; 36 strPingShanQuSize = size(strPingShanQu); 37 38 FuShanQu = {'受检单位名称','受检单位详细地址','编号'}; 39 LuoHuQu = {'受检单位名称','受检单位详细地址','编号'}; 40 NanShanQu = {'受检单位名称','受检单位详细地址','编号'}; 41 YanTianQu = {'受检单位名称','受检单位详细地址','编号'}; 42 BaoAnQu = {'受检单位名称','受检单位详细地址','编号'}; 43 GuangMingQu = {'受检单位名称','受检单位详细地址','编号'}; 44 LongGuangQu = {'受检单位名称','受检单位详细地址','编号'}; 45 PingShanQu = {'受检单位名称','受检单位详细地址','编号'}; 46 Others = {'受检单位名称'}; 47 48 countFuShanQu = 1; 49 countLuoHuQu = 1; 50 countNanShanQu = 1; 51 countYanTianQu = 1; 52 countBaoAnQu = 1; 53 countGuangMingQu = 1; 54 countLongGuangQu = 1; 55 countPingShanQu = 1; 56 countOthers = 1; 57 58 %%互斥信号量 59 mutex = 1; 60 61 %%mat文件内的元胞数组名为 name 62 data_size = size(name); 63 col = 1; 64 for row = 2 : data_size(1) 65 %%福山区 66 if(mutex == 1) 67 for i1 = 1 : strFuTianQuSize(2) 68 69 %%用strfind函数判断指定字符串是否存在于改字符串中,并将结果分别存储在out1和out2中 70 out1 = strfind(name{row, col}, strFuTianQu{1, i1}); 71 out2 = strfind(name{row, col + 1}, strFuTianQu{1, i1}); 72 73 %%如果out1和out2中有一个不为空,则分类 74 if(~isempty(out1) || ~isempty(out2)) 75 FuShanQu{countFuShanQu, 1} = name{row, col}; 76 FuShanQu{countFuShanQu, 2} = name{row, col + 1}; 77 FuShanQu{countFuShanQu, 3} = countFuShanQu; 78 countFuShanQu = countFuShanQu + 1; 79 mutex = 0; 80 break; 81 end 82 end 83 end 84 %%罗湖区 85 if(mutex == 1) 86 for i1 = 1 : strLuoHuQuSize(2) 87 88 out1 = strfind(name{row, col}, strLuoHuQu{1, i1}); 89 out2 = strfind(name{row, col + 1}, strLuoHuQu{1, i1}); 90 %%如果out1和out2中有一个不为空,则分类 91 if(~isempty(out1) || ~isempty(out2)) 92 LuoHuQu{countLuoHuQu, 1} = name{row, col}; 93 LuoHuQu{countLuoHuQu, 2} = name{row, col + 1}; 94 LuoHuQu{countLuoHuQu, 3} = countLuoHuQu; 95 countLuoHuQu = countLuoHuQu + 1; 96 mutex = 0; 97 break; 98 end 99 end 100 end 101 102 if(mutex == 1) 103 for i1 = 1 : strNanShanQuSize(2) 104 105 out1 = strfind(name{row, col}, strNanShanQu{1, i1}); 106 out2 = strfind(name{row, col + 1}, strNanShanQu{1, i1}); 107 %%如果out1和out2中有一个不为空,则分类 108 if(~isempty(out1) || ~isempty(out2)) 109 NanShanQu{countNanShanQu, 1} = name{row, col}; 110 NanShanQu{countNanShanQu, 2} = name{row, col + 1}; 111 NanShanQu{countNanShanQu, 3} = countNanShanQu; 112 countNanShanQu = countNanShanQu + 1; 113 mutex = 0; 114 break; 115 end 116 end 117 end 118 119 if(mutex == 1) 120 for i1 = 1 : strYanTianQuSize(2) 121 122 out1 = strfind(name{row, col}, strYanTianQu{1, i1}); 123 out2 = strfind(name{row, col + 1}, strYanTianQu{1, i1}); 124 %%如果out1和out2中有一个不为空,则分类 125 if(~isempty(out1) || ~isempty(out2)) 126 YanTianQu{countYanTianQu, 1} = name{row, col}; 127 YanTianQu{countYanTianQu, 2} = name{row, col + 1}; 128 YanTianQu{countYanTianQu, 3} = countYanTianQu; 129 countYanTianQu = countYanTianQu + 1; 130 mutex = 0; 131 break; 132 end 133 end 134 end 135 136 if(mutex == 1) 137 for i1 = 1 : strBaoAnQuSize(2) 138 139 out1 = strfind(name{row, col}, strBaoAnQu{1, i1}); 140 out2 = strfind(name{row, col + 1}, strBaoAnQu{1, i1}); 141 %%如果out1和out2中有一个不为空,则分类 142 if(~isempty(out1) || ~isempty(out2)) 143 BaoAnQu{countBaoAnQu, 1} = name{row, col}; 144 BaoAnQu{countBaoAnQu, 2} = name{row, col + 1}; 145 BaoAnQu{countBaoAnQu, 3} = countBaoAnQu; 146 countBaoAnQu = countBaoAnQu + 1; 147 mutex = 0; 148 break; 149 end 150 end 151 end 152 153 if(mutex == 1) 154 for i1 = 1 : strGuangMingQuSize(2) 155 156 out1 = strfind(name{row, col}, strGuangMingQu{1, i1}); 157 out2 = strfind(name{row, col + 1}, strGuangMingQu{1, i1}); 158 %%如果out1和out2中有一个不为空,则分类 159 if(~isempty(out1) || ~isempty(out2)) 160 GuangMingQu{countGuangMingQu, 1} = name{row, col}; 161 GuangMingQu{countGuangMingQu, 2} = name{row, col + 1}; 162 GuangMingQu{countGuangMingQu, 3} = countGuangMingQu; 163 countGuangMingQu = countGuangMingQu + 1; 164 mutex = 0; 165 break; 166 end 167 end 168 end 169 170 if(mutex == 1) 171 for i1 = 1 : strLongGuangQuSize(2) 172 173 out1 = strfind(name{row, col}, strLongGuangQu{1, i1}); 174 out2 = strfind(name{row, col + 1}, strLongGuangQu{1, i1}); 175 %%如果out1和out2中有一个不为空,则分类 176 if(~isempty(out1) || ~isempty(out2)) 177 LongGuangQu{countLongGuangQu, 1} = name{row, col}; 178 LongGuangQu{countLongGuangQu, 2} = name{row, col + 1}; 179 LongGuangQu{countLongGuangQu, 3} = countLongGuangQu; 180 countLongGuangQu = countLongGuangQu + 1; 181 mutex = 0; 182 break; 183 end 184 end 185 end 186 187 if(mutex == 1) 188 for i1 = 1 : strPingShanQuSize(2) 189 190 out1 = strfind(name{row, col}, strPingShanQu{1, i1}); 191 out2 = strfind(name{row, col + 1}, strPingShanQu{1, i1}); 192 %%如果out1和out2中有一个不为空,则分类 193 if(~isempty(out1) || ~isempty(out2)) 194 PingShanQu{countPingShanQu, 1} = name{row, col}; 195 PingShanQu{countPingShanQu, 2} = name{row, col + 1}; 196 PingShanQu{countPingShanQu, 3} = countPingShanQu; 197 countPingShanQu = countPingShanQu + 1; 198 mutex = 0; 199 break; 200 end 201 end 202 end 203 204 if(mutex == 1) 205 206 Others{countOthers, 1} = name{row, col}; 207 Others{countOthers, 2} = name{row, col + 1}; 208 Others{countOthers, 3} = countOthers; 209 countOthers = countOthers + 1; 210 mutex = 0; 211 end 212 mutex = 1; 213 end
原理为根据深圳市各个区名和区内街道名对公司名进行匹配,原理很简单的,就看你能不能想到了~~,至此,第二题结束
注:由于做数模这个东西确实恶心了,所以这篇博客在思路和代码方面我只是说了个大概,不想费那么多时间去总结那么细了,所以如果您对我的博客感兴趣并有疑问的话,欢迎评论或者发邮件给我(lityangweiguang@gmail.com)