根据经纬度计算距离半径内网点数量

Untitled

In [9]:

# this article is original, some technical details refer to https://yeahexp.com/how-to-quickly-calculate-the-distance-between-all-points-in-arrays/


import pandas as pd
banknet = pd.read_excel("金融机构网点.xlsx")
info = pd.read_excel("STK_LISTEDCOINFOANL.xlsx")

在这篇文档中，我会介绍如何使用CSMAR的公司信息和银行分支机构的经纬度数据计算公司附近5km、10km、15km以内的银行网点数。核心的小技巧，可查找查阅：

reove_others 在dataframe中只保留选出的列
rename 重命名dataframe
pop insert 将dataframe某一列位置调换到第一列
dist.pairwise 计算n个地点到n个地点的距离矩阵

In [25]:

from pandas import DataFrame
from typing import Set, Any
def remove_others(df: DataFrame, columns: Set[Any]):
    cols_total: Set[Any] = set(df.columns)
    diff: Set[Any] = cols_total - columns
    df.drop(diff, axis=1, inplace=True)

In [12]:

banknet = banknet.rename(columns={"Longitude": "Lng"})
banknet = banknet.rename(columns={"Latitude": "Lat"})
banknet

Out[12]:

	SgnDate	BranchName	InstituionID	Lng	Lat	ProvinceName	CityName
0	2018-08-29	中国建设银行股份有限公司双辽大市场支行	103400.0	123.514728	43.506425	吉林省	四平市
1	2018-08-29	中原银行股份有限公司许昌府前街支行	10279284.0	113.830570	34.030894	河南省	许昌市
2	2018-08-29	中国农业银行股份有限公司花垣边城广场分理处	10842.0	109.474963	28.580041	湖南省	湘西土家族苗族自治州
3	2018-08-29	中国农业银行股份有限公司花垣县支行	10842.0	109.479785	28.581781	湖南省	湘西土家族苗族自治州
4	2018-08-29	中国农业银行股份有限公司吉首红旗门支行	10842.0	109.746330	28.310907	湖南省	湘西土家族苗族自治州
...	...	...	...	...	...	...	...
199788	2023-01-13	中国邮政储蓄银行股份有限公司邢台市信都支行	10550.0	114.463711	37.082751	河北省	邢台市
199789	2023-01-13	重庆农村商业银行股份有限公司云阳江口支行	103242.0	108.796737	31.222060	重庆市	重庆市
199790	2023-01-13	洪雅县农村信用合作联社洪川信用社	10406756.0	103.370584	29.906721	四川省	眉山市
199791	2023-01-13	洪雅县农村信用合作联社人民路信用社	10406756.0	103.372998	29.903975	四川省	眉山市
199792	2023-01-13	洪雅县农村信用合作联社止戈信用社	10406756.0	103.330061	29.891578	四川省	眉山市

199793 rows × 7 columns

In [11]:

info
#haversine(lyon, paris)

Out[11]:

	Symbol	ShortName	EndDate	Lng	Lat
0	1	平安银行	2018-12-31	114.107966	22.540578
1	2	万科A	2018-12-31	114.302975	22.596841
2	4	国农科技	2018-12-31	113.940950	22.511309
3	5	世纪星源	2018-12-31	114.128178	22.544323
4	6	深振业A	2018-12-31	113.952117	22.535293
...	...	...	...	...	...
4898	900948	伊泰B股	2018-12-31	109.978348	39.830321
4899	900951	ST 大化B	2018-12-31	121.750003	39.400731
4900	900953	凯马B股	2018-12-31	121.432808	31.249545
4901	900956	东贝B股	2018-12-31	114.997710	30.154867
4902	900957	凌云B股	2018-12-31	121.537186	31.225409

4903 rows × 5 columns

In [26]:

banknet_LatLng = banknet
remove_others(banknet_LatLng, {"Lat","Lng"})

In [28]:

first_column = banknet_LatLng.pop('Lat')
banknet_LatLng.insert(0, 'Lat', Lat)
banknet_LatLng

Out[28]:

	Lat	Lng
0	30.260042	123.514728
1	30.260042	113.830570
2	30.260042	109.474963
3	30.260042	109.479785
4	30.260042	109.746330
...	...	...
199788	30.260042	114.463711
199789	30.260042	108.796737
199790	30.260042	103.370584
199791	30.260042	103.372998
199792	30.260042	103.330061

199793 rows × 2 columns

In [29]:

info_LatLng = info
remove_others(info_LatLng, {"Lat","Lng"})
first_column = info_LatLng.pop('Lat')
info_LatLng.insert(0, 'Lat', Lat)
info_LatLng

Out[29]:

	Lat	Lng
0	30.260042	114.107966
1	30.260042	114.302975
2	30.260042	113.940950
3	30.260042	114.128178
4	30.260042	113.952117
...	...	...
4898	30.260042	109.978348
4899	30.260042	121.750003
4900	30.260042	121.432808
4901	30.260042	114.997710
4902	30.260042	121.537186

4903 rows × 2 columns

In [30]:

import numpy as np
import pandas as pd
from sklearn.neighbors import DistanceMetric

# http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.DistanceMetric.html
dist = DistanceMetric.get_metric('haversine')

earth_radius = 6371
D = dist.pairwise(np.radians(info_LatLng), np.radians(banknet_LatLng)) * earth_radius

In [34]:

D = pd.DataFrame(D)

In [36]:

# so now we calculate how many banks are there 5km around the company
fivekmcount = (D <= 5).sum(axis=1)
fivekmcount

Out[36]:

0       1458
1       1818
2       1376
3       1327
4       1348
        ... 
4898     936
4899     523
4900    2218
4901    1025
4902    2122
Length: 4903, dtype: int64

In [37]:

type(fivekmcount)

Out[37]:

pandas.core.series.Series

In [38]:

# append this 
info['fivekmcount'] = fivekmcount

In [39]:

tenkmcount = (D <= 10).sum(axis=1)
info['tenkmcount'] = tenkmcount
fifteenkmcount = (D <= 15).sum(axis=1)
info['fifteenkmcount'] = fifteenkmcount

info.to_excel("bankcounts.xlsx")

In [ ]:

posted @ 2023-01-16 08:20 热爱工作的宁致桑阅读(134) 评论(0) 编辑收藏举报

会员力量，点亮园子希望

刷新页面返回顶部

Eva's Notes

根据经纬度计算距离半径内网点数量