Using PyODPS to Collect Table Statistics in an ODPS Project
    CREATE TABLE `table_statistics` (
        `table_name` string COMMENT 'Table name',
        `partition_name` string COMMENT 'Latest partition',
        `chinese_name` string COMMENT 'Chinese table name',
        `column_count` bigint COMMENT 'Number of columns',
        `column_comment_null_count` bigint COMMENT 'Number of columns missing a comment',
        `pt_count` bigint COMMENT 'Number of partitions',
        `data_count` bigint COMMENT 'Row count of the latest partition'
    )
    COMMENT 'Table data statistics'
    PARTITIONED BY (dt string)
    LIFECYCLE 180;
--------------------------------------------------------
    # -*- coding: utf-8 -*-
    import datetime

    from odps import ODPS

    # In a DataWorks PyODPS node the entry object `o` is predefined.
    # Elsewhere, create it first, for example:
    # o = ODPS(access_id, secret_access_key, project, endpoint=endpoint)

    # Statistics are written to yesterday's partition.
    dt_str = (datetime.datetime.now() + datetime.timedelta(days=-1)).strftime('%Y%m%d')

    rid_list = []  # names of legacy tables to skip
    wd = []        # rows to write to table_statistics

    for t in o.list_tables():
        table_name = t.name
        # Skip non-business tables
        if table_name[:3] not in ('stg', 'dim'):
            continue
        # Skip reused legacy tables
        if table_name in rid_list:
            continue
        chinese_name = t.comment.encode('utf-8')
        cs = [c for c in t.schema.columns]
        column_count = len(cs)
        column_comment_null_count = 0
        for c in cs:
            # Count columns whose comment is missing, blank, or the literal 'null'
            if not c.comment or c.comment.strip() in ('', 'null'):
                column_comment_null_count += 1

        cnt_sql = ''
        new_pt = ''
        if t.schema.partitions:
            ps = [p for p in t.iterate_partitions()]
            pt_count = len(ps)
            if pt_count == 0:
                continue
            # The last partition returned is treated as the latest one
            new_pt = str(ps[-1])
            if ',' in new_pt:  # multi-level partition
                new_pt = new_pt.replace(',', ' and ')
            cnt_sql = "select count(1) from %s where %s" % (table_name, new_pt)
            print(cnt_sql)
        else:
            pt_count = 1
            cnt_sql = "select count(1) from %s " % (table_name)

        with o.execute_sql(cnt_sql).open_reader() as reader:
            data_count = reader[0][0]
        wd.append([table_name, str(new_pt), chinese_name, column_count,
                   column_comment_null_count, pt_count, data_count])

    # Rewrite yesterday's partition so the job can be rerun safely
    sta_table = o.get_table("table_statistics")
    sta_table.delete_partition('dt=%s' % dt_str, if_exists=True)
    with sta_table.open_writer(partition=('dt=%s' % dt_str), create_partition=True) as writer:
        writer.write(wd)
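The script above turns a partition spec into a filter by replacing commas with `and`. That step can be isolated into a small helper that runs without an ODPS connection, which makes it easier to check. This is only a sketch: the name `build_count_sql` and the sample table and partition values are hypothetical, and it assumes partition values never contain commas themselves.

```python
def build_count_sql(table_name, partition_spec=""):
    """Build a row-count query, optionally limited to one partition.

    partition_spec is a string such as "dt='20240101',region='cn'".
    Assumption: partition values do not contain commas.
    """
    if not partition_spec:
        # Non-partitioned table: count every row
        return "select count(1) from %s" % table_name
    # Multi-level specs separate levels with commas; SQL needs AND between them
    conditions = [part.strip() for part in partition_spec.split(",")]
    return "select count(1) from %s where %s" % (table_name, " and ".join(conditions))


if __name__ == "__main__":
    print(build_count_sql("stg_orders"))
    print(build_count_sql("stg_orders", "dt='20240101',region='cn'"))
```

Splitting on commas and joining with `and` gives the same result as the string replacement in the script, while keeping the query construction in one place.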
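Author: 苏su
Copyright of this article is shared by the author and cnblogs (博客园). Reposting is welcome, but unless the author agrees otherwise, this notice must be retained and a link to the original article must be placed prominently on the page; otherwise the author reserves the right to pursue legal liability.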