最近在自己做车险的项目,做到数仓建模搞得一头雾水,理论知识为零=_=决定在这里好好做记录,顺便也可以做以后的参考。
1 Operation Data Store 原始数据层ODS
存放最原始的数据,结构与数据源结构一致,属于准备区
- 基于从业务系统同步过来的数据结构
- 可以保存所有历史数据,应选择高压缩格式,如gzip
- 命名规则应为ods_tableName_inc/full
这里创建的都是业务表,简单来说就是把mysql里的表全部同步到数仓里并作细节处更改,表名以inc/full来区分全量/增量同步,分区就是更新日期
全量表
- ods_administrative_region_full
|
|
|
|
|
|
|
id |
create_time |
update_time |
level |
name |
superior_region |
zip_code |
|
|
|
|
|
|
|
- ods_product_full
|
|
|
|
|
|
|
|
id |
create_time |
update_time |
base_premium |
coverage_amount |
premium_rate |
premium_type |
type |
|
|
|
|
|
|
|
|
- ods_insurance_agent_full
|
|
|
|
|
|
|
id |
create_time |
update_time |
birthday |
gender |
name |
telephone |
|
|
|
|
|
|
|
增量表
- ods_policyholder_inc
|
|
|
|
|
|
|
|
|
|
id |
create_time |
update_time |
birthday |
gender |
identification_number |
identification_type |
name |
telephone |
old:map<string, string> |
|
|
|
|
|
|
|
|
|
|
- ods_vehicle_inc
|
|
|
|
type |
ts |
data: struct<...> |
old: map<string, string> |
|
|
|
|
'data' STRUCT<
id :STRING,
create_time :STRING,
update_time :STRING,
brand :STRING,
purchase_date :STRING,
purchase_price :STRING,
real_value :STRING,
vehicle_model :STRING,
vehicle_number :STRING
>
- ods_insurance_order_inc
|
|
|
|
type |
ts |
data: struct<...> |
old: map<string, string> |
|
|
|
|
`data` STRUCT<
id :STRING,
create_time :STRING,
update_time :STRING,
total_amount :DECIMAL(16, 2),
administrative_region_id :STRING,
insurance_agent_id :STRING,
policyholder_id :STRING,
renew_from_id :STRING,
vehicle_id :STRING>
- ods_insurance_policy_inc
|
|
|
|
type |
ts |
data: struct<...> |
old: map<string, string> |
|
|
|
|
`data` STRUCT<
id :STRING,
create_time :STRING,
update_time :STRING,
amount :DECIMAL(16, 2),
coverage_amount :DECIMAL(16, 2),
expiry_date :STRING,
start_date :STRING,
insurance_order_id :STRING,
policyholder_id :STRING,
product_id :STRING,
vehicle_id :STRING>
- ods_installment_inc
|
|
|
|
type |
ts |
data: struct<...> |
old: map<string, string> |
|
|
|
|
`data` STRUCT<
id :STRING,
create_time :STRING,
update_time :STRING,
amount :DECIMAL(16, 2),
due_time :STRING,
payment_time :STRING,
insurance_order_id :STRING>
- ods_payment_inc
|
|
|
|
type |
ts |
data: struct<...> |
old: map<string, string> |
|
|
|
|
`data` STRUCT<
id :STRING,
create_time :STRING,
update_time :STRING,
amount :DECIMAL(16, 2),
payment_status :STRING,
installment_id :STRING,
policyholder_id :STRING>
- ods_insurance_claim_reporting_inc
|
|
|
|
type |
ts |
data: struct<...> |
old: map<string, string> |
|
|
|
|
`data` STRUCT<
id :STRING,
create_time :STRING,
update_time :STRING,
address :STRING,
reporting_telephone :STRING,
status :STRING,
administrative_region_id :STRING,
vehicle_id :STRING>
- ods_insurance_claim_inc
|
|
|
|
type |
ts |
data: struct<...> |
old: map<string, string> |
|
|
|
|
`data` STRUCT<
id :STRING,
create_time :STRING,
update_time :STRING,
amount :DECIMAL(16,2),
insurance_agent_id :STRING,
insurance_claim_reporting_id :STRING,
insurance_policy_id :STRING,
vehicle_id :STRING>
2a Data Warehouse Detail 明细数据层DWD
基于维度建模理论构建,存放维度模型中的事实表,保存业务过程中最小粒度的操作记录
- 存储维度模型的事实表
- orc列式存储 + snappy压缩
- 命名为dwd_field_tableName_inc/full/acc
- inc: 事务性事实表
- full:周期快照事实表
- acc:累积快照事实表
- 交易域投保事务表 dwd_trade_order_inc: 14个属性
|
|
|
|
|
|
|
|
|
|
|
|
|
|
id |
order_time |
policy_amount |
coverage_amount |
expiry_date |
start_date |
insurance_order_id |
policyholder_id |
product_id |
vehicle_id |
administrative_region_id |
insurance_agent_id |
renew_from_id |
installments:array<struct<>> |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
`installments` ARRAY<
STRUCT<
id: STRING,
install_amount: DECIMAL(16, 2),
due_time: STRING>
>
- 交易域缴费成功事务表 dwd_trade_pay_suc_inc
|
|
|
|
|
id |
pay_suc_time |
payment_amount |
installment_id |
policyholder_id |
|
|
|
|
|
- 理赔域报案事务表 dwd_claim_report_inc
|
|
|
|
|
id |
report_time |
report_telephone |
administrative_region_id |
vehicle_id |
|
|
|
|
|
- 理赔域报案理赔流程累积快照事实表 dwd_claim_flow_inc
|
|
|
|
|
|
|
|
|
|
|
id |
report_time |
report_telephone |
administrative_region_id |
vehicle_id |
claim_id |
claim_time |
claim_amount |
insurance_agent_id |
insurance_policy_id |
closed_time |
|
|
|
|
|
|
|
|
|
|
|
2b Dimension 公共维度层DIM
基于维度建模理论构建,存放维度模型中的维度表,保存一致性维度信息
有五张维度表
- dim_administrative_region_full
|
|
|
|
|
id |
level |
name |
superior_region |
zip_code |
|
|
|
|
|
- dim_product_full
|
|
|
|
|
|
|
id |
base_premium |
coverage_amount |
name |
premium_rate |
premium_type |
type |
|
|
|
|
|
|
|
- dim_insurance_agent_full
|
|
|
|
|
id |
birthday |
gender |
name |
telephone |
|
|
|
|
|
- dim_policyholder_zip
|
|
|
|
|
|
|
id |
birthday |
gender |
name |
telephone |
start_date |
end_date |
|
|
|
|
|
|
|
- dim_vehicle_zip
|
|
|
|
|
|
|
|
|
id |
brand |
purchase_date |
purchase_price |
real_value |
vehicle_model |
vehicle_number |
start_date |
end_date |
|
|
|
|
|
|
|
|
|
2c Data Warehouse Summary 汇总数据层DWS
基于指标需求,以分析的主题为对象作为建模驱动,构建公共统计力度的汇总表(?
example
需求1:统计各省份退货率 = 退货次数 / 下单次数
需求2:统计各省份支付率 = 支付次数 / 下单次数
两个需求中都涉及对一张表(下单表)的利用,可以将该表处理好的数据结果放入汇总层,这样后续的其他数据处理操作中直接引用汇总好的结果即可
3 Application Data Service 数据应用层ADS
存放各项统计指标结果