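Visual Analysis of Medical Data
I. Background
Medical insurance plays an important role in safeguarding residents' level of medical care, improving living standards, and supporting economic development. Since the founding of the People's Republic, China's medical insurance system has become markedly more complete and more widely available. As coverage expands, however, controlling and allocating medical insurance funds inevitably becomes a problem. This project therefore performs a visual analysis of a medical insurance dataset.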
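II. Big data analysis design
1. Data content and feature analysis of the dataset
2. Overview of the course design for the data analysis
(1) Obtaining the dataset: the dataset was downloaded from CSDN.
(2) Preprocessing the data: the main steps are dropping missing values, removing duplicate rows, converting the time fields, binning ages into groups, and dropping irrelevant columns. Skipping preprocessing could compromise the accuracy of the analysis results.
(3) Analysis and visualization: the data is analyzed and then converted into charts, so that it is presented in a more intuitive form.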
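The preprocessing steps listed in (2) can be sketched on a toy table as follows. This is a minimal illustration, not the project's actual pipeline: the column names `XB` (sex), `NL` (age) and `RYRQ` (admission time as a Unix timestamp) are taken from the source code in this post, while the sample rows and the English age-group labels are made up for the example. It also assumes `pd.to_datetime(..., unit="s")` is an acceptable stand-in for the timestamp conversion; that call interprets the values as UTC, whereas the post's code converts to local time.

```python
import pandas as pd

# Toy records: column names follow the post's source code; the values are invented.
raw = pd.DataFrame({
    "XB":   ["男", "女", "女", "男", None],   # sex; the last row is missing it
    "NL":   [5, 30, 30, 70, 50],             # age in years
    "RYRQ": [1420070400, 1420156800, 1420156800, 1420243200, 1420329600],  # Unix seconds
})

# 1) Drop rows with missing values, then 2) drop exact duplicate rows.
clean = raw.dropna().drop_duplicates(keep="first").copy()

# 3) Convert Unix seconds to timestamps (interpreted as UTC here).
clean["RYRQ"] = pd.to_datetime(clean["RYRQ"], unit="s")

# 4) Bin ages into groups using the same bin edges as the post's code.
#    The English labels are illustrative translations.
clean["NL_CUT"] = pd.cut(
    clean["NL"],
    bins=[0, 6, 12, 17, 45, 69, 10000],
    labels=["infant", "child", "teen", "young adult", "middle-aged", "senior"],
)

print(clean)
```

Chaining `dropna()` and `drop_duplicates()` returns a new frame instead of modifying `raw` in place, which makes it easier to compare the data before and after cleaning.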
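III. Data analysis steps
1. Data source
The data for this course project comes from CSDN.
2. Data cleaning
III. Data analysis and visualization steps
1. Top 10 diagnoses among male and female patients
2. Top 10 diagnoses by average pooled medical insurance cost
3. Composition of patients by insured-person type
4. Composition of patients by sex and age
5. Trends in admission, discharge, and insurance settlement dates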
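The first two analyses in this list reduce to a frequency count and a grouped mean. The sketch below shows both on invented data, assuming the column names used in the post's source code (`XB` for sex, `ZDMC` for diagnosis name, `TCFY` for pooled insurance cost); the diagnosis values and costs are hypothetical.

```python
import pandas as pd

# Toy visit records: column names follow the post's source code; values are invented.
visits = pd.DataFrame({
    "XB":   ["男", "男", "男", "女", "女", "女"],           # sex
    "ZDMC": ["hypertension", "hypertension", "diabetes",
             "diabetes", "diabetes", "hypertension"],      # diagnosis name
    "TCFY": [100.0, 300.0, 500.0, 700.0, 900.0, 200.0],    # pooled insurance cost
})

# Most frequent diagnoses among male patients (the post keeps the top 10).
top_male = visits.loc[visits["XB"] == "男", "ZDMC"].value_counts().head(2)

# Average pooled cost per diagnosis, highest first.
mean_cost = (
    visits.groupby("ZDMC")["TCFY"]
    .mean()
    .round(2)
    .sort_values(ascending=False)
    .head(2)
)

print(top_male)
print(mean_cost)
```

`value_counts()` already sorts by frequency, so `head(n)` is enough for a Top-N list; the grouped mean needs an explicit `sort_values` first.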
IV. Source code
## Import third-party packages
import time
import pandas as pd
import numpy as np
from pyecharts.charts import *
import pyecharts.options as opts
from pyecharts.commons.utils import JsCode
import warnings
warnings.filterwarnings('ignore')
## Load the data
data = pd.read_csv('医疗数据.csv',encoding='gbk')
## Drop rows with missing values
data.dropna(axis=0,inplace=True)
## Handle duplicate rows
if data.duplicated().sum() == 0:
    print('No duplicate rows found')
else:
    num = data.duplicated().sum()
    data.drop_duplicates(inplace=True,keep='first')
    print('Removed {} duplicate rows'.format(num))
## Convert the time fields
def trans_time(time_stamp):
    # Convert a Unix timestamp (seconds) to a 'YYYY-MM-DD HH:MM:SS' string in local time
    timeArray = time.localtime(time_stamp)
    return time.strftime("%Y-%m-%d %H:%M:%S", timeArray)
# RYRQ = admission date, CYRQ = discharge date, JSRQ = settlement date
data['RYRQ'] = data['RYRQ'].map(trans_time)
data['CYRQ'] = data['CYRQ'].map(trans_time)
data['JSRQ'] = data['JSRQ'].map(trans_time)
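## Bin ages into groups (infant, child, teen, young adult, middle-aged, senior)
# include_lowest=True keeps age 0 in the first bin instead of turning it into NaN
data['NL_CUT'] = pd.cut(data['NL'],bins=[0,6,12,17,45,69,10000],labels=['婴幼儿','少儿','青少年','青年','中年','老年'],right=True,include_lowest=True)
data['NL_CUT'] = data['NL_CUT'].astype(str)
## Drop the irrelevant ID column
data.drop('ID',axis=1,inplace=True)
## Preview the data
data.head()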
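## Top 10 diagnoses among male and female patients
# SVG path icons used as legend symbols; keys: 男 = male, 女 = female, 医院 = hospital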
icon_dict = {
'男':'path://M358.032 624.101a30.801 30.801 0 1 0 63.039 0 30.801 30.801 0 1 0-63.038 0z m262.99 0a30.801 30.801 0 1 0 63.037 0 30.801 30.801 0 1 0-63.037 0z m281.821-218.718s-9.282-79.594-51.964-125.765c0 0 30.007-16.164-2.303-43.842 0 0-28.825-6.924-34.614-4.603 0 0-14.973-50.786-44.979-63.477 0 0-25.401-6.16-31.188 18.465 0 0-67.462-34.322-116.248-42.988 0 0-124.576-23.941-158.028-81.908 0 0-7.225-14.703-30.847-6.064 0 0-22.54 17.877-15.63 89.413 0 0 1.926 14.629-94.235-46.91 0 0-23.83-6.15-30.738 12.298 0 0-12.277 43.84 3.07 76.158 0 0-55.372 9.988-79.258 33.846 0 0-26.88 34.621 18.46 40.003 0 0-67.635 38.459-79.22 146.909 0 0-1.353 76.092 23.042 160.503a96.063 96.063 0 0 0-5.623 44.326c4.764 44.065 38.812 79.014 82.893 85.201 13.907 42.315 37.644 81.083 70.055 113.493 54.438 54.44 126.819 84.42 203.807 84.42s149.37-29.98 203.808-84.42c33.104-33.104 57.143-72.847 70.917-116.22 42.516-5.446 76.437-37.905 83.273-80.361 3.315-20.598-0.159-40.963-9.03-58.588 15.168-44.548 28.314-99.574 24.58-149.89zM249.33 682.28c-34.119-8.148-59.748-36.671-63.563-71.965a83.181 83.181 0 0 1 0.088-18.627c6.146 17.761 13.512 35.604 22.355 52.917 0 0 18.251 18.252 37.067 11.517a287.165 287.165 0 0 0 2.71 13.735l1.343 12.423z m279.966 199.28c-130.877 0-240.674-91.933-268.243-214.609l-2.244-20.76c2.158-2.56 4.264-5.613 6.286-9.26 0 0 2.377-85.405 53.146-136.172 0 0 44.606 12.345 101.533-19.235 0 0-23.867 29.238-45.412 44.62 0 0-12.319 30.005 15.386 33.094 0 0 85.379 16.157 170.757-40.003 0 0-4.605 28.464 36.185 19.984 0 0 33.02-0.757 86.127-41.538 0 0 11.55 19.23 54.978-4.597 0 0-1.124 38.055 33.49 96.9 0 0 15.01 24.478 15.01 67.614 0 0 3.941 3.251 9.851 6.1l-3.555 22.083C758.52 798.913 653.387 881.559 529.296 881.559zM874.16 611.745c-5.58 34.649-32.11 61.567-66.044 68.327a288.21 288.21 0 0 0 2.913-12.22c10.411 0.582 21.853-4.293 29.481-23.616 0 0 15.437-28.599 31.024-70.149a82.776 82.776 0 0 1 2.626 37.658z',
'女':'path://M358.032 624.101a30.801 30.801 0 1 0 63.039 0 30.801 30.801 0 1 0-63.038 0z m262.99 0a30.801 30.801 0 1 0 63.037 0 30.801 30.801 0 1 0-63.037 0z m226.663-95.005c-0.045-4.295-0.15-8.621-0.363-12.993 0.133-2.305 0.268-4.625 0.407-6.967 0.378-6.357-0.679-13.249-2.849-20.492-13.886-107.69-80.21-226.507-234.402-257.258a68.878 68.878 0 0 0 2.94-19.874c0-41.821-37.568-75.724-83.91-75.724s-83.912 33.903-83.912 75.724a68.86 68.86 0 0 0 2.995 20.048c-172.024 35.076-236.31 182.03-237.282 298.37-24 17.796-39.005 46.12-39.005 77.406 0 39.933 24.435 75.057 60.84 89.588 34.625 219.769 49.248 160.392 49.248 134.317 0-11.919-5.146-45.876-6.574-87.158 13.104 24.085 29.74 46.427 49.67 66.358 54.44 54.44 126.82 84.42 203.808 84.42s149.37-29.98 203.808-84.42a291.336 291.336 0 0 0 34.072-40.95c-1.344 36.572-2.364 68.07-2.364 79.698 0 26.203 25.241-6.882 51.847-118.423 2.764-11.59 5.24-22.466 7.48-32.766 37.966-13.806 63.677-49.704 63.677-90.665 0-31.771-15.472-60.5-40.131-78.239z m-662.078 78.24c0-23.657 9.966-45.358 26.389-60.692 0.219 1.86 0.482 3.585 0.799 5.143l-0.006 0.01c6.465 50.727 12.43 93.471 17.918 129.408-27.278-14.076-45.1-42.222-45.1-73.869z m582.582 135.24c-47.387 82.949-136.716 138.982-238.893 138.982-114.103 0-212.184-69.879-253.723-169.083 1.255-58.505 12.952-123.566 55.943-160.625 46.229-39.85 68.721-105.324 79.592-155.71 10.06 44.267 32.988 97.67 85.256 109.898 88.687 20.747 127.991 21.788 168.505 22.453 19.19 0.315 100.625-3.07 108.812 78.843 0.001 0.001-2.985 69.876-5.492 135.24z m59.298-60.401c11.065-53.957 15.005-90.706 17.404-125.484 1.016-2.847 1.764-6.48 2.238-10.963 17.017 15.377 27.385 37.475 27.385 61.608-0.001 32.388-18.664 61.113-47.027 74.839z',
'医院':'path://M597.333333 85.333333H187.733333a102.4 102.4 0 0 0-102.4 102.4v682.666667a68.266667 68.266667 0 0 0 68.266667 68.266667h512a34.133333 34.133333 0 0 0 34.133333-34.133334V187.733333a102.4 102.4 0 0 0-102.4-102.4z m2.56 68.352A34.133333 34.133333 0 0 1 631.466667 187.733333v682.666667H153.6V187.733333a34.133333 34.133333 0 0 1 34.133333-34.133333h409.6l2.56 0.085333z M870.4 426.666667H665.6a34.133333 34.133333 0 0 0-34.133333 34.133333v443.733333a34.133333 34.133333 0 0 0 34.133333 34.133334h204.8a68.266667 68.266667 0 0 0 68.266667-68.266667V494.933333a68.266667 68.266667 0 0 0-68.266667-68.266666zM699.733333 870.4V494.933333h170.666667v375.466667H699.733333z M426.666667 290.133333v136.533334h136.533333v68.266666h-136.533333v136.533334h-68.266667v-136.533334h-136.533333v-68.266666h136.533333v-136.533334h68.266667z'
}
# Rich-text styles for the bar labels
rich = {
    "a": {"color": "#fff", "fontSize":18, "lineHeight": 22, "align": "center","fontFamily":"KaiTi"},
    "c": {"color": "#fff","fontSize": 12, "align": "center", "fontFamily":"Adobe",},
    "d": {"fontSize": 12}
}
# Male patients (XB == '男'): the 10 most frequent diagnoses
tmp = data[data['XB'] == '男']['ZDMC'].value_counts().head(10)
attrs = tmp.index.tolist()
value = tmp.values.tolist()
bar = (Bar()
    .add_xaxis(attrs[::-1])
    .add_yaxis(' ',value[::-1],
        itemstyle_opts={
            'barBorderRadius': [10, 0, 0, 10]}
    )
    .set_series_opts(
        # Label: diagnosis name plus case count (例 = cases)
        label_opts=opts.LabelOpts(
            position='insideRight',
            formatter="{a|{b}} {c|{c}例} ",
            rich=rich
        )
    )
    .set_global_opts(
        # Inverted axis so the male bars grow leftwards from the centre
        xaxis_opts=opts.AxisOpts(
            is_show=False,
            is_inverse=True
        ),
        yaxis_opts=opts.AxisOpts(
            is_show=False
        ),
        legend_opts=opts.LegendOpts(
            item_width=40,
            item_height=40,
            legend_icon=icon_dict['男'],
            pos_left='8%',
            pos_top='2%'
        ),
    )
    .reversal_axis()
)
# Female patients (XB == '女'): the 10 most frequent diagnoses
tmp = data[data['XB'] == '女']['ZDMC'].value_counts().head(10)
attrs = tmp.index.tolist()
value = tmp.values.tolist()
bar1 = (Bar()
    .add_xaxis(attrs[::-1])
    .add_yaxis(' ',value[::-1],
        itemstyle_opts={
            'barBorderRadius': [0, 10, 10, 0]}
    )
    .set_series_opts(
        # Label: diagnosis name plus case count (例 = cases)
        label_opts=opts.LabelOpts(
            position='insideLeft',
            formatter="{a|{b}} {c|{c}例} ",
            rich=rich
        )
    )
    .set_global_opts(
        xaxis_opts=opts.AxisOpts(
            is_show=False,
            is_inverse=False
        ),
        yaxis_opts=opts.AxisOpts(
            is_show=False
        ),
        legend_opts=opts.LegendOpts(
            item_width=40,
            item_height=40,
            legend_icon=icon_dict['女'],
            pos_right='8%',
            pos_top='2%'
        ),
    )
    .reversal_axis()
)
# Place the two charts side by side: male on the left half, female on the right half
grid = Grid(init_opts=opts.InitOpts(width='980px',theme='light'))
grid.add(bar,grid_opts = opts.GridOpts(pos_left='0%',pos_right='50.5%'))
grid.add(bar1,grid_opts = opts.GridOpts(pos_left='50.5%',pos_right='0%'))
grid.render_notebook()
## Top 10 diagnoses by average pooled medical insurance cost
tmp = data.groupby('ZDMC').agg({'TCFY':'mean'}).reset_index().sort_values('TCFY',ascending=False).head(10)
tmp['TCFY'] = tmp['TCFY'].map(lambda x:round(x,2))
attrs = tmp['ZDMC'].tolist()
value = tmp['TCFY'].tolist()
bar = (Bar(init_opts=opts.InitOpts(width='980px',height='550px'))
    .add_xaxis(attrs[::-1])
    .add_yaxis('',value[::-1],
        itemstyle_opts={
            'shadowBlur': 2,
            'shadowColor': 'rgba(0, 0, 0, 0.2)',
            'shadowOffsetY': 5,
            'shadowOffsetX': 5,
            'barBorderRadius': [10, 10, 10, 10]},
    )
    .set_series_opts(
        label_opts=opts.LabelOpts(
            position='insideLeft',
            formatter='{b|{b}} {c|¥{c}}',
            # 'bolder' and 'lighter' are fontWeight values, not fontStyle values
            rich={
                'b':{'color':'#fff','fontSize':20,'fontFamily':'KaiTi','fontWeight':'bolder'},
                'c':{'color':'#fff','fontSize':16,'fontFamily':'Adobe','fontWeight':'lighter'}
            }
        )
    )
    .set_global_opts(
        xaxis_opts=opts.AxisOpts(
            position='top',
            is_show=True,
            axislabel_opts=opts.LabelOpts(
                font_size=18,
                color='#ed1941',
                font_style='italic',
                font_weight='bolder'),
            splitline_opts=opts.SplitLineOpts(
                is_show=True,
                linestyle_opts=opts.LineStyleOpts(type_='dashed')),
            axisline_opts=opts.AxisLineOpts(
                is_show=False,
                linestyle_opts=opts.LineStyleOpts(width=2, color='#DB7093'))
        ),
        yaxis_opts=opts.AxisOpts(is_show=False),
        # Colour range is hard-coded for this dataset; adjust min_/max_ to the costs in your data
        visualmap_opts=opts.VisualMapOpts(
            is_show=False,
            max_=47773,
            min_=10000,
            dimension=0,
            range_color=['#57BF80','#13884E']
        )
    )
    .reversal_axis()
)
bar.render_notebook()
## Composition of patients by insured-person type
# Count patients per person type (RYLB) and sex (XB)
tmp = data.groupby(['RYLB','XB']).agg({'NL':'count'}).reset_index()
# Build the nested [{name, children: [{name, value}]}] structure that TreeMap takes
data1 = []
item = []
for idx, row in tmp.iterrows():
    if row['RYLB'] in item:
        data1[-1]['children'].append(dict(name=row['XB'], value=int(row['NL'])))
    else:
        data1.append(dict(name=row['RYLB'], children=[dict(name=row['XB'], value=int(row['NL']))]))
        item.append(row['RYLB'])
rich={
    "b": {"color": "white", "fontSize":20, "align": "center", "fontFamily":"KaiTi"},
    "c": {"color":"white","fontSize":14,"fontFamily":"Adobe", "fontStyle":"italic"},
}
tree = (TreeMap(init_opts=opts.InitOpts(width='980px',height='600px',theme='light'))
    .add(
        "人员数量",  # series name: number of patients
        data1,
        leaf_depth=1,
        roam=False,
        label_opts=opts.LabelOpts(
            position="inside",
            formatter='{b|{b}}\n{c|{c}} people(s)',
            rich=rich
        ),
        levels=[
            opts.TreeMapLevelsOpts(
                treemap_itemstyle_opts=opts.TreeMapItemStyleOpts(
                    border_color="white", border_width=4, gap_width=4
                )
            ),
            opts.TreeMapLevelsOpts(
                color_saturation=[0.8, 0.5],
                treemap_itemstyle_opts=opts.TreeMapItemStyleOpts(
                    border_color="white",border_color_saturation=0.7, gap_width=4, border_width=8
                ),
            )
        ],
    )
    .set_global_opts(
        legend_opts=opts.LegendOpts(
            is_show=False
        )
    )
)
tree.render_notebook()
## Composition of patients by sex and age
# Left pie: patients by sex
attrs = data['XB'].value_counts().index.tolist()
value = data['XB'].value_counts().values.tolist()
pie = (Pie(init_opts=opts.InitOpts(width='980px'))
    .add('',[list(z) for z in zip(attrs,value)],radius=['0%','60%'],center=['20%','50%'])
    .set_colors(['#428675','#63AA83','#92CD8A','#CCEF8F'])
    .set_series_opts(
        label_opts=opts.LabelOpts(
            position='inside',
            # The stray closing brace after {d}% in the original formatter has been removed
            formatter="{b|{b}}\n {c|{c} people(s)}\n {d|Percentage {d}%} ",
            rich = {
                "b":{"color":'#fff',"fontSize":22,"fontFamily":"KaiTi"},
                "c":{"color":"#fff","fontSize":12,"fontFamily":"Adobe","fontWeight":"lighter"},
                "d":{"color":"#fff","fontSize":12,"fontFamily":"Adobe","fontWeight":"lighter"}
            }
        ),
        itemstyle_opts=opts.ItemStyleOpts(
            border_color='#fff',
            border_width=4
        )
    )
    .set_global_opts(
        legend_opts=opts.LegendOpts(
            is_show=False
        ),
    )
)
# Right pie: patients by age group; the largest group gets a highlighted label
attrs = data['NL_CUT'].value_counts().index.tolist()
value = data['NL_CUT'].value_counts().values.tolist()
max_value = max(value)
data_pair = []
for k, v in zip(attrs,value):
    if v == max_value:
        chart_item = opts.PieItem(
            name = k,
            value = v,
            label_opts=opts.LabelOpts(
                position='outside',
                formatter="{b|{b}}\n {c|NUMBER {c}}\n {d|PROPORTION:{d}%}",
                rich={
                    "b": {"color": "red", "fontSize":32, "fontFamily":"KaiTi", "align": "center"},
                    "c": {"color": "red", "fontSize":16, "align": "center", "fontFamily":"Adobe"},
                    "d": {"color":"red","fontSize":14,"fontFamily":"Adobe"}},
                font_weight='lighter',
                font_family='KaiTi',
                font_size=18
            ),
        )
    else:
        chart_item = opts.PieItem(
            name = k,
            value = v,
            label_opts=opts.LabelOpts(
                position='outside',
                # Label: group name and head count (人 = people)
                formatter = '{b|{b}}:{c|{c}人}',
                rich = {
                    "b": {"color":"#000","fontSize":14, "fontWeight":"bolder", "fontFamily":"KaiTi", "align": "center"},
                    "c": {"color":"#000","fontSize":10, "fontWeight":"bolder","align": "center",},
                },
                font_weight='lighter'
            ),
        )
    data_pair.append(chart_item)
pie1 = (Pie()
    .add('',data_pair=data_pair,radius=['40%','55%'],center=['60%','50%'])
    .set_series_opts(
        itemstyle_opts=opts.ItemStyleOpts(
            border_color='#000',
            border_width=2
        )
    )
    .set_global_opts(
        legend_opts=opts.LegendOpts(
            is_show=False
        ),
        visualmap_opts=opts.VisualMapOpts(
            max_=15,
            min_=0,
            is_show=False,
            range_color=['#5dbe8a','#207f4c']
        ),
    )
)
grid = Grid(init_opts=opts.InitOpts(width='980px'))
grid.add(pie,grid_opts = opts.GridOpts(pos_left='10%'))
grid.add(pie1,grid_opts = opts.GridOpts(pos_left='70%'))
grid.render_notebook()
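## Trends in admission / discharge / insurance settlement dates
tmp = data[['RYRQ','CYRQ','JSRQ']]
def pro1(tag):
    # Count records per calendar day: keep only 'YYYY-MM-DD' before counting,
    # so each date appears once on the x-axis.
    # Naming the columns explicitly yields ['index', tag] regardless of pandas version.
    tmp1 = tmp[tag].str[:10].value_counts().rename_axis('index').reset_index(name=tag)
    return tmp1
# Outer-join the three daily counts on the date; days with no records become 0
tmp1 = pro1('RYRQ').merge(pro1('CYRQ'),how='outer',on='index').merge(pro1('JSRQ'),how='outer',on='index')
tmp1.fillna(0,inplace=True)
tmp1.sort_values('index',inplace=True)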
# Series names: 入院 = admission, 出院 = discharge, 结算 = settlement
line = (Line(init_opts=opts.InitOpts(width='1100px',theme='light'))
    .add_xaxis(tmp1['index'].tolist())
    .add_yaxis('入院',tmp1['RYRQ'].tolist(),is_symbol_show=False)
    .add_yaxis('出院',tmp1['CYRQ'].tolist(),is_symbol_show=False)
    .add_yaxis('结算',tmp1['JSRQ'].tolist(),is_symbol_show=False)
    .set_global_opts(
        yaxis_opts=opts.AxisOpts(
            axisline_opts=opts.AxisLineOpts(
                is_show=False
            ),
            splitline_opts=opts.SplitLineOpts(
                is_show=True,
                linestyle_opts=opts.LineStyleOpts(
                    type_='dotted'
                )
            )
        ),
        xaxis_opts=opts.AxisOpts(
            axislabel_opts=opts.LabelOpts(
                rotate=90,
                font_size=8,
                font_family='Adobe',
                font_weight='bolder'
            )
        ),
        legend_opts=opts.LegendOpts(
            pos_top='8%'
        )
    )
)
line.render_notebook()
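The per-day counting behind this trend chart can also be expressed with `pd.concat`, which aligns the counts on their date index and so avoids the intermediate `index` column. The following is only a sketch under assumed inputs: the column names `RYRQ` and `CYRQ` come from the code above, the date strings are invented, and `daily_counts` is a hypothetical helper name.

```python
import pandas as pd

# Toy date strings in the 'YYYY-MM-DD HH:MM:SS' format produced earlier; values are invented.
dates = pd.DataFrame({
    "RYRQ": ["2015-01-01 08:00:00", "2015-01-01 15:30:00", "2015-01-02 09:00:00"],
    "CYRQ": ["2015-01-02 10:00:00", "2015-01-03 11:00:00", "2015-01-03 12:00:00"],
})

def daily_counts(frame, column):
    """Count rows per calendar day for one date column (hypothetical helper)."""
    return frame[column].str[:10].value_counts().rename(column)

# Concatenating along axis=1 aligns the counts on the date index (an outer join);
# days that are missing from one column are filled with 0.
trend = (
    pd.concat([daily_counts(dates, c) for c in dates.columns], axis=1)
    .fillna(0)
    .astype(int)
    .sort_index()
)

print(trend)
```

The resulting frame has one row per day and one column per event type, so `trend.index.tolist()` and each column's `tolist()` can be passed to `add_xaxis` and `add_yaxis` respectively.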
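V. Summary
Working through this series of analyses of the medical data helped me consolidate my data analysis skills and also gave me some insight into recent medical data. For example, coronary heart disease, diabetes, hypertension, outpatient radiotherapy and chemotherapy for malignant tumors, and dialysis for uremia patients are the conditions with the most insured visits, and "resident" is the most common patient type. The project met its expected goals. I ran into some problems during the analysis, but found solutions to them by searching on Baidu.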