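Crawling Aliyun map boundaries based on the National Bureau of Statistics' administrative divisions
I recently worked on a large-screen data dashboard that needed a map of China with drill-down. I had written about map drill-down before ( https://www.cnblogs.com/weijiutao/p/13977011.html ), using the map plugin bundled with ECharts together with a very old boundary dataset, and the administrative divisions did not line up with the boundaries. So I decided to crawl an up-to-date set of administrative divisions and boundaries.
The approach is to first fetch the names and codes of all divisions at county level and above from http://www.mca.gov.cn/article/sj/xzqh/2020/20201201.html (the page is hosted on the Ministry of Civil Affairs site, although this post refers to it as the National Bureau of Statistics list), and then crawl the boundary data from Aliyun's DataV data visualization platform at http://datav.aliyun.com/portal/school/atlas/area_selector .
First, the code that crawls the county-level-and-above administrative divisions: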
    # Administrative divisions at county level and above
    import requests
    from lxml import html

    etree = html.etree
    import json

    area_url = 'http://www.mca.gov.cn/article/sj/xzqh/2020/20201201.html'
    response = requests.get(area_url)
    result = etree.HTML(response.text)
    areacode_list = result.xpath('//tr/td[2]/text()')
    areaname_list = result.xpath('//tr/td[3]/text()')
    arr = []
    for index, areacode in enumerate(areacode_list):
        obj = {
            'areacode': areacode,
            'areaname': areaname_list[index]
            # 'areaname': areaname_list[index+2]
        }
        arr.append(obj)
    # Uncomment the next two lines to export the result to area.json
    # jsonfile = open('area.json', 'w', encoding='utf-8')
    # json.dump(arr, jsonfile, indent=2, ensure_ascii=False)
    print(arr)
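I won't walk through the code in detail; if you are interested, see my earlier crawler tutorials.
The XPath expressions in the code map onto the source page as shown below.
After running the code, a look at the last few printed regions reveals a problem, as shown in the screenshot below.
820000 should be the administrative division code of the Macao Special Administrative Region, but it ends up paired with Taiwan. Tracing back step by step shows that Xisha District and Nansha District in Hainan Province have no administrative division code in the table, as shown in the screenshot below.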
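To make the misalignment concrete, here is a minimal sketch on toy data. The two lists stand in for the scraped code and name columns; the helper `pair_by_index` and the English names are illustrative assumptions, not part of the original script.

```python
# Toy stand-ins for the two scraped columns (a short, illustrative excerpt)
names = ['Sansha', 'Xisha District', 'Nansha District', 'Danzhou',
         'Taiwan', 'Hong Kong', 'Macao']
# Two rows have an empty code cell, so this list is two entries shorter
codes = ['460300', '460400', '710000', '810000', '820000']


def pair_by_index(codes, names, offset=0):
    """Pair each code with the name found at the same index plus an offset."""
    return [(code, names[i + offset]) for i, code in enumerate(codes)]


naive = pair_by_index(codes, names)
shifted = pair_by_index(codes, names, offset=2)

print(naive[-1])    # ('820000', 'Taiwan')  -> wrong after the gap
print(shifted[-1])  # ('820000', 'Macao')   -> correct after the gap
print(naive[0])     # ('460300', 'Sansha')  -> correct before the gap
```

Pairing by plain index is right only before the two rows without codes, and pairing with an offset of two is right only after them.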
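Because those two rows contribute a name but no code, the code list is two entries shorter than the name list, and every pairing after them is off by exactly two. To fix this, comment out line 17 of the code above (`'areaname': areaname_list[index]`) and enable line 18 (`'areaname': areaname_list[index+2]`), which shifts the names by two rows. Export the result of each variant to JSON, then merge the two files at Sansha City: entries up to Sansha come from the first export and the entries after it from the second. That gives a complete list of administrative division codes.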
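The merge can also be done in code rather than by hand. The following is a rough sketch only: the function name `merge_exports` is hypothetical, and it assumes that Sansha City's code is `460300` and that both exports list the same codes in the same order (which holds when both come from the loop above).

```python
def merge_exports(by_index, by_index_plus_2, split_code='460300'):
    """Keep rows up to and including split_code from the first export,
    and all later rows from the second (shifted) export."""
    codes = [row['areacode'] for row in by_index]
    cut = codes.index(split_code) + 1
    return by_index[:cut] + by_index_plus_2[cut:]


# Toy exports: the first is correct up to Sansha, the second after it
first = [
    {'areacode': '460300', 'areaname': 'Sansha'},
    {'areacode': '460400', 'areaname': 'Xisha District'},   # mispaired
    {'areacode': '710000', 'areaname': 'Nansha District'},  # mispaired
]
second = [
    {'areacode': '460300', 'areaname': 'Nansha District'},  # mispaired
    {'areacode': '460400', 'areaname': 'Danzhou'},
    {'areacode': '710000', 'areaname': 'Taiwan'},
]

merged = merge_exports(first, second)
print([row['areaname'] for row in merged])
```

With real data, `first` and `second` would be the two lists loaded from the exported JSON files via `json.load`.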
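Next, we take the administrative divisions obtained above to Aliyun DataV to fetch the map boundaries.
The link on the right-hand side of the screenshot above returns the boundary data for the currently selected administrative division, as shown below.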
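The link follows a simple pattern that can be built from a division code. A small sketch, with the caveat that `boundary_url` and its `with_children` parameter are names made up for illustration; the two URL forms are the ones used in this post, and the `_full` form appears to be the variant that also contains the sub-divisions.

```python
BASE_URL = 'https://geo.datav.aliyun.com/areas_v3/bound/'


def boundary_url(areacode, with_children=False):
    """Build the DataV boundary URL for a six-digit division code.

    with_children=True selects the '_full' file (assumed here to include
    the sub-divisions); otherwise the plain outline file is used.
    """
    suffix = '_full.json' if with_children else '.json'
    return BASE_URL + areacode + suffix


print(boundary_url('110000'))
print(boundary_url('110000', with_children=True))
```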
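So, by combining the county-level-and-above divisions crawled earlier with the Aliyun DataV URLs, we can obtain an up-to-date set of map boundaries. The code is as follows: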
    # Look up the Aliyun DataV boundary for each administrative division
    import json
    import os
    import time

    import requests

    BASE_URL = 'https://geo.datav.aliyun.com/areas_v3/bound/'
    # Codes that are skipped at the city and county level
    SKIP_CITY = ['441900', '442000', '460400', '620200']
    SKIP_COUNTY = ['320611', '330103', '330104', '350402', '410322']


    def fetch(url):
        """Download a URL and parse the response body as JSON."""
        response = requests.get(url)
        return json.loads(response.text)


    def save(obj, path):
        """Write obj to path as UTF-8 JSON, creating the directory if needed."""
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, 'w', encoding='utf-8') as jsonfile:
            json.dump(obj, jsonfile, indent=2, ensure_ascii=False)


    def save_all(obj, folder, level, areacode, areaname):
        """Save three copies, named by code, by name, and by code_name."""
        save(obj, folder + level + '/' + areacode + '.json')
        save(obj, folder + level + '1/' + areaname + '.json')
        save(obj, folder + level + '2/' + areacode + '_' + areaname + '.json')


    def crawl(level, areacode, areaname):
        bound_url = BASE_URL + areacode + '.json'
        full_url = BASE_URL + areacode + '_full.json'
        try:
            if level == 'county':
                # Counties are fetched from the plain outline file only
                save_all(fetch(bound_url), 'data/', level, areacode, areaname)
            else:
                # 710000 is fetched without the _full suffix
                first_url = bound_url if areacode == '710000' else full_url
                save_all(fetch(first_url), 'data/', level, areacode, areaname)
                # Also keep the plain outline under data/<level>_json/
                save_all(fetch(bound_url), 'data/' + level + '_json/', level, areacode, areaname)
        except Exception:
            # Record the failed code and URL so it can be retried later
            print('\033[41m ' + level + ' error')
            save({'areacode': areacode, 'url': bound_url},
                 'data/' + level + '_error/' + areacode + '.json')


    # Load the division list exported by the first script
    with open('./area.json', encoding='utf-8') as data:
        json_result = json.load(data)

    for area in json_result:
        areacode = area['areacode']
        areaname = area['areaname']
        # Skip rows whose code does not start with a positive two-digit prefix
        if not areacode[:2].isdigit() or int(areacode[:2]) <= 0:
            continue
        if areacode[-4:] == '0000':
            level, color, skip = 'province', '\033[31m', []
        elif areacode[-2:] == '00':
            level, color, skip = 'city', '\033[34m', SKIP_CITY
        else:
            level, color, skip = 'county', '\033[35m', SKIP_COUNTY
        print(color + ' ' + level + '---------->', areacode)
        time.sleep(1)
        if areacode not in skip:
            crawl(level, areacode, areaname)
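The branching in the script relies on the shape of the six-digit code: a code ending in `0000` is treated as a province, one ending in `00` as a city, and anything else as a county. As a rough illustration, the rule can be isolated in a small function; `area_level` is a hypothetical name, not something taken from the repository.

```python
def area_level(areacode):
    """Classify a six-digit division code by its trailing zeros,
    mirroring the suffix checks used in the crawl script."""
    if areacode[-4:] == '0000':
        return 'province'
    if areacode[-2:] == '00':
        return 'city'
    return 'county'


for code in ['130000', '130100', '130102']:
    print(code, area_level(code))
```

This is purely a suffix rule. It says nothing about whether DataV actually serves a file for a given code, which is why the script still needs its skip lists and error handling.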
Finally, the code is available at https://gitee.com/vijtor/areacode-datav-py