Observing and cleaning tick data
Programmatic screening
The contracts under test are listed below.
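'''
# All of the following share the same day session: 9:00-10:15, 10:30-11:30, 13:30-15:00
CZCE  day and night sessions  'FG401'  21:00-23:00
CZCE  no night session        'AP310'
DCE   day and night sessions  'i2401'  21:00-23:00
DCE   no night session        'jd2310'
SHFE  day and night sessions  'rb2310' 21:00-23:00
SHFE  day and night sessions  'cu2309' 21:00-01:00
SHFE  day and night sessions  'au2310' 21:00-02:30
SHFE  no night session        'wr2310'
GFEX  no night session        'si2310'
INE   day and night sessions  'sc2310' 21:00-02:30
INE   day and night sessions  'bc2310' 21:00-01:00
INE   day and night sessions  'lu2311' 21:00-23:00
INE   no night session        'ec2404'
# CFFEX: no product has a night session
'IF2309' 9:30-11:30, 13:00-15:00
'TL2312' 9:15-11:30, 13:00-15:15
# Call auction times
Day-only products (those without a night session) hold the call auction at 8:55-8:59.
Night-session products hold the call auction at 20:55-20:59 and have no separate call auction for the day session.
# Settlement time:
Settlement runs from 16:30 to 19:30 on every trading day.
'''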
subID = ['FG401', 'AP310', 'i2401', 'jd2310', 'rb2310', 'cu2309', 'au2310', 'wr2310', 'si2310', 'sc2310', 'bc2310', 'lu2311', 'ec2404', 'IF2309', 'TL2312']
The script below loads the tick data previously downloaded to CSV and checks each row against a few rules to decide whether it is valid.
import os, csv
from datetime import datetime

# Read every test-contract file in the directory
dpath = './ticktest'
fpaths = [os.path.join(dpath, x) for x in os.listdir(dpath)]
data_list = []
for fpath in fpaths:
    with open(fpath, encoding='utf-8') as f:
        reader = csv.reader(f)
        for row in reader:
            data_list.append(row)
print(f'totals:{len(data_list)}')

# Collect the invalid rows
invalid_list1 = []  # rows received on the weekend
invalid_list2 = []  # rows received outside trading hours
invalid_list3 = []  # rows received outside trading hours whose tick time is inside them
for row in data_list:
    dt = datetime.strptime(row[8], '%Y-%m-%d %H:%M:%S')     # actual date + tick time
    nowdt = datetime.strptime(row[9], '%Y-%m-%d %H:%M:%S')  # system time on receipt
    dtstamp = int(dt.timestamp())
    nowdtstamp = int(nowdt.timestamp())
    dthm = dt.strftime('%H%M')
    nowdthm = nowdt.strftime('%H%M')
    if nowdt.day in [2, 3]:  # Sept 2 and 3 are Saturday and Sunday: check whether data was pushed
        invalid_list1.append(row)
    if not ((nowdthm >= '0900' and nowdthm <= '1500') or nowdthm >= '2100' or (nowdthm >= '0000' and nowdthm <= '0230')):  # system time is outside trading hours
        invalid_list2.append(row)
        if (dthm >= '0900' and dthm <= '1500') or dthm >= '2100' or (dthm >= '0000' and dthm <= '0230'):  # yet the tick time is inside trading hours
            invalid_list3.append(row)

with open('result_ticktest.csv', 'w', newline='') as f:
    csv.writer(f).writerows(invalid_list1)
with open('result_ticktest2.csv', 'w', newline='') as f:
    csv.writer(f).writerows(invalid_list2)
with open('result_ticktest3.csv', 'w', newline='') as f:
    csv.writer(f).writerows(invalid_list3)
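Data observations
There is data on Saturday, but it is really Friday's night session, which ends at 02:30 at the latest. There is no data on Sunday. Everything is as expected.
Row layout: trading day, tick update time, contract code, exchange code, last price, volume, open interest, today's close, actual date combined with the tick time, actual (system) date and time.
au2310 (SHFE)
The call-auction ticks arrive at 8:59 and 20:59. Wenhua Finance shows a 9:00 open of 461.5 and a 21:00 open of 463.68 (463.67999 in the tick data). So the opening price is the price at the end of the call auction, not the first tick at 9:00 or 21:00.
Wenhua's 1-minute bar starting at 14:59 opens at 462.42, whereas going by tick time alone it would open at 462.4. The difference comes from volume: a new bar only starts once the volume changes.
Wenhua's 1-minute bar starting at 23:01 opens at 462.14. The first tick at 23:01:00 is 462.16, and 462.14 does not appear until 23:01:03, because that is when the volume first changes.
Wenhua's 1-minute bar starting at 10:14 closes at 464.66 and the bar starting at 10:30 opens at 464.66. So the tick at 10:15 (464.65999), although it falls in a new minute and its volume has changed, still belongs to the bar starting at 10:14.
Wenhua's 1-minute bar starting at 22:59 closes at 461.72 and the bar starting at 23:00 opens at 461.74. Because this product does not close at 23:00, the first volume change after 23:00 starts a new bar.
No ticks between 10:15 and 10:30, as expected.
No ticks between 11:30 and 13:30, as expected.
At 15:00 today's close takes a valid value and the day is over. Until then the field holds the maximum double, 1.7976931348623157e+308, as its invalid value.
The tick pushed at 15:19 has a changed volume but no change in open interest. In Wenhua Finance, the last bar's volume is the 15:00 volume minus the previous 1-minute bar's volume, so the volume here is meaningless. Everything after today's close appears is invalid data.
The 16:01 row is identical to the 15:19 row.
The night session ends at 02:30, and the night-session data is loaded once more after 7:30 in the morning. Any row whose system time and tick time are far apart is invalid.
Rows received after 7:30 and after 19:30 have a large gap between tick time and system time and are invalid.
Prices with very long decimal tails (floating-point noise such as 461.96000000000004) can usually be cut to 2 decimal places, which saves storage. A few products need 3, for example treasury bond futures. I have not seen one that needs 4 so far, so everything can be cut to 3 decimal places. Rounding is safer than truncating here, since 463.67999999999995 would truncate to 463.679.
The maximum double can be replaced with -1 as the invalid value to save storage. CZCE uses 0.0 as its invalid value, which can also be replaced with -1 so that all exchanges are handled the same way.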
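The two clean-up steps described above (shortening the decimals and unifying the "no close yet" marker) could look roughly like the sketch below. The function names `clean_price` and `clean_close` are made up for illustration, and the choice of `round` over truncation is an assumption based on the 463.67999 example.

```python
import sys

# Largest double; the feed uses it for "today's close not available yet".
DBL_MAX = sys.float_info.max  # 1.7976931348623157e+308


def clean_price(value, decimals=3):
    """Round away floating-point noise (hypothetical helper)."""
    return round(value, decimals)


def clean_close(value, exchange, decimals=3):
    """Return -1 when today's close is still the invalid marker.

    Assumption: every exchange uses DBL_MAX as the marker except CZCE,
    which uses 0.0, as observed in the sample rows.
    """
    if value == DBL_MAX or (exchange == 'CZCE' and value == 0.0):
        return -1
    return round(value, decimals)


print(clean_price(463.67999999999995))               # 463.68
print(clean_close(1.7976931348623157e+308, 'SHFE'))  # -1
```

This sketch only covers the two markers seen in the data. It does not handle a CZCE row whose today's close carries an earlier session's price instead of 0.0.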
20230901,02:29:57,au2310,SHFE,461.96000000000004,50190,74982.0,1.7976931348623157e+308,2023-09-01 02:29:57,2023-09-01 02:29:58
20230901,02:30:00,au2310,SHFE,461.96000000000004,50190,74982.0,1.7976931348623157e+308,2023-09-01 02:30:00,2023-09-01 02:30:01
20230901,02:30:00,au2310,SHFE,461.96000000000004,50190,74982.0,1.7976931348623157e+308,2023-09-01 02:30:00,2023-09-01 07:44:13
20230901,08:59:00,au2310,SHFE,461.5,50261,74999.0,1.7976931348623157e+308,2023-09-01 08:59:00,2023-09-01 08:59:01
20230901,09:00:00,au2310,SHFE,461.52000000000004,50271,75007.0,1.7976931348623157e+308,2023-09-01 09:00:00,2023-09-01 09:00:01
20230904,10:14:56,au2310,SHFE,464.61999999999995,67518,67332.0,1.7976931348623157e+308,2023-09-04 10:14:56,2023-09-04 10:14:57
20230904,10:15:00,au2310,SHFE,464.65999999999997,67521,67330.0,1.7976931348623157e+308,2023-09-04 10:15:00,2023-09-04 10:15:00
20230904,10:30:00,au2310,SHFE,464.65999999999997,67596,67333.0,1.7976931348623157e+308,2023-09-04 10:30:00,2023-09-04 10:30:01
20230901,14:58:59,au2310,SHFE,462.40000000000003,87970,69206.0,1.7976931348623157e+308,2023-09-01 14:58:59,2023-09-01 14:59:00
20230901,14:59:00,au2310,SHFE,462.40000000000003,87970,69206.0,1.7976931348623157e+308,2023-09-01 14:59:00,2023-09-01 14:59:01
20230901,14:59:00,au2310,SHFE,462.42,87989,69192.0,1.7976931348623157e+308,2023-09-01 14:59:00,2023-09-01 14:59:01
20230901,14:59:01,au2310,SHFE,462.42,87995,69191.0,1.7976931348623157e+308,2023-09-01 14:59:01,2023-09-01 14:59:02
20230901,14:59:01,au2310,SHFE,462.42,87996,69191.0,1.7976931348623157e+308,2023-09-01 14:59:01,2023-09-01 14:59:02
20230901,14:59:02,au2310,SHFE,462.44,88002,69193.0,1.7976931348623157e+308,2023-09-01 14:59:02,2023-09-01 14:59:03
20230901,14:59:59,au2310,SHFE,462.46000000000004,88305,69172.0,1.7976931348623157e+308,2023-09-01 14:59:59,2023-09-01 15:00:00
20230901,15:00:00,au2310,SHFE,462.46000000000004,88305,69172.0,462.46000000000004,2023-09-01 15:00:00,2023-09-01 15:00:01
20230901,15:19:25,au2310,SHFE,462.46000000000004,88338,69182.0,462.46000000000004,2023-09-01 15:19:25,2023-09-01 15:19:26
20230901,16:01:26,au2310,SHFE,462.46000000000004,88338,69182.0,462.46000000000004,2023-09-01 16:01:26,2023-09-01 16:01:27
20230904,19:15:06,au2310,SHFE,462.46,0,69182.0,1.7976931348623157e+308,2023-09-01 19:15:06,2023-09-01 19:33:16
20230904,20:59:00,au2310,SHFE,463.67999999999995,70,69219.0,1.7976931348623157e+308,2023-09-01 20:59:00,2023-09-01 20:59:00
20230904,21:00:00,au2310,SHFE,463.52,80,69225.0,1.7976931348623157e+308,2023-09-01 21:00:00,2023-09-01 21:00:01
20230904,21:00:01,au2310,SHFE,463.47999999999996,137,69221.0,1.7976931348623157e+308,2023-09-01 21:00:01,2023-09-01 21:00:01
20230901,22:59:59,au2310,SHFE,461.72,37315,76011.0,1.7976931348623157e+308,2023-08-31 22:59:59,2023-08-31 23:00:00
20230901,23:00:00,au2310,SHFE,461.72,37315,76011.0,1.7976931348623157e+308,2023-08-31 23:00:00,2023-08-31 23:00:00
20230901,23:00:00,au2310,SHFE,461.74,37319,76013.0,1.7976931348623157e+308,2023-08-31 23:00:00,2023-08-31 23:00:01
20230901,23:00:01,au2310,SHFE,461.74,37321,76013.0,1.7976931348623157e+308,2023-08-31 23:00:01,2023-08-31 23:00:01
20230901,23:00:01,au2310,SHFE,461.74,37321,76013.0,1.7976931348623157e+308,2023-08-31 23:00:01,2023-08-31 23:00:02
20230901,23:00:03,au2310,SHFE,461.74,37321,76013.0,1.7976931348623157e+308,2023-08-31 23:00:03,2023-08-31 23:00:04
20230901,23:00:04,au2310,SHFE,461.74,37322,76013.0,1.7976931348623157e+308,2023-08-31 23:00:04,2023-08-31 23:00:04
20230901,23:00:04,au2310,SHFE,461.74,37323,76013.0,1.7976931348623157e+308,2023-08-31 23:00:04,2023-08-31 23:00:05
20230901,23:00:05,au2310,SHFE,461.74,37323,76013.0,1.7976931348623157e+308,2023-08-31 23:00:05,2023-08-31 23:00:05
20230901,23:00:06,au2310,SHFE,461.72,37326,76011.0,1.7976931348623157e+308,2023-08-31 23:00:06,2023-08-31 23:00:06
20230831,23:00:59,au2310,SHFE,462.16,37919,86228.0,1.7976931348623157e+308,2023-08-30 23:00:59,2023-08-30 23:00:59
20230831,23:01:00,au2310,SHFE,462.16,37919,86228.0,1.7976931348623157e+308,2023-08-30 23:01:00,2023-08-30 23:01:00
20230831,23:01:00,au2310,SHFE,462.16,37919,86228.0,1.7976931348623157e+308,2023-08-30 23:01:00,2023-08-30 23:01:01
20230831,23:01:01,au2310,SHFE,462.16,37919,86228.0,1.7976931348623157e+308,2023-08-30 23:01:01,2023-08-30 23:01:02
20230831,23:01:02,au2310,SHFE,462.16,37919,86228.0,1.7976931348623157e+308,2023-08-30 23:01:02,2023-08-30 23:01:02
20230831,23:01:02,au2310,SHFE,462.16,37919,86228.0,1.7976931348623157e+308,2023-08-30 23:01:02,2023-08-30 23:01:03
20230831,23:01:03,au2310,SHFE,462.14,37920,86228.0,1.7976931348623157e+308,2023-08-30 23:01:03,2023-08-30 23:01:03
20230831,23:01:03,au2310,SHFE,462.16,37921,86229.0,1.7976931348623157e+308,2023-08-30 23:01:03,2023-08-30 23:01:04
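The volume rule seen in these rows (a tick opens a new bar only when it falls in a new minute and the cumulative volume has changed) can be sketched as follows. `build_bars` is a hypothetical helper, not Wenhua's actual algorithm. It ignores the session-break and call-auction cases and simply skips any tick whose volume equals the previous tick's.

```python
from datetime import datetime


def build_bars(ticks):
    """ticks: (tick_datetime, last_price, cumulative_volume) in arrival order."""
    bars = []
    last_volume = None
    for dt, price, volume in ticks:
        if volume == last_volume:
            continue  # no trade since the previous tick, nothing to add
        last_volume = volume
        minute = dt.replace(second=0, microsecond=0)
        if not bars or minute > bars[-1]['start']:
            # New minute and the volume changed: open a new bar.
            bars.append({'start': minute, 'open': price, 'high': price,
                         'low': price, 'close': price})
        else:
            bar = bars[-1]
            bar['high'] = max(bar['high'], price)
            bar['low'] = min(bar['low'], price)
            bar['close'] = price
    return bars


# Four of the au2310 sample ticks around 23:01.
sample = [
    (datetime(2023, 8, 30, 23, 0, 59), 462.16, 37919),
    (datetime(2023, 8, 30, 23, 1, 0), 462.16, 37919),
    (datetime(2023, 8, 30, 23, 1, 3), 462.14, 37920),
    (datetime(2023, 8, 30, 23, 1, 3), 462.16, 37921),
]
bars = build_bars(sample)
print(bars[1]['open'])  # 462.14
```

With these ticks the 23:01 bar opens at 462.14 rather than 462.16, which matches the Wenhua bar described earlier.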
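Summary:
The price at the end of the call auction is the open of the first bar, for both the day and the night session.
A row whose tick time and system time differ by more than 3 minutes is invalid.
A new bar requires not only a new minute but also a change in volume. If the volume has not changed in the new minute, the tick still belongs to the previous bar.
Data pushed at 15:00 is valid, and both price and volume may change. In other words, the bar starting at 14:59 only ends once today's close takes a valid value.
At every break (10:15, 11:30, 15:00) the last tick belongs to the previous minute's bar. The awkward case is the night close: some products close at 23:00 and others at 02:30. For the former, the 23:00 tick belongs to the bar starting at 22:59; for the latter, it opens a new bar starting at 23:00. Bars can only be split correctly if the exact trading hours of each product are known.
Once today's close takes a valid value, bar building is finished and everything after it is invalid data.
Large gaps between tick time and local time all occur after 7:30 and 19:30 local time, while the invalid data after the daily close has no gap, so dropping everything outside trading hours is the simplest approach.
For example, with the time formatted as 0900, trading hours can be written as: at least 0855 and at most 1500, or at least 2055, or at most 0230.
Addendum: some products also receive lagged invalid data after 23:00. That is outside their own trading hours but inside the trading hours of many night-session products, so rows whose system time and tick time differ by more than 3 minutes must always be removed.
Rows with a gap of more than 3 minutes are certainly invalid data. Invalid rows without a gap generally appear after 15:00, so the 15:00 to 16:30 window has to be blocked as well.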
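Put together, the two filters from this summary (the 3-minute gap and the trading-hours window) might be sketched like this. `in_session` and `is_valid` are hypothetical names. Applying the window to the system time is an assumption, and the window is the union over all products rather than per-product hours.

```python
from datetime import datetime, timedelta

MAX_GAP = timedelta(minutes=3)  # threshold taken from the summary above


def in_session(hhmm):
    """Union trading window on a zero-padded 'HHMM' string."""
    return '0855' <= hhmm <= '1500' or hhmm >= '2055' or hhmm <= '0230'


def is_valid(tick_dt, sys_dt):
    """tick_dt: actual date + tick time; sys_dt: local time on receipt."""
    if abs(sys_dt - tick_dt) > MAX_GAP:
        return False  # replayed or lagged row
    return in_session(sys_dt.strftime('%H%M'))


# Night-session tick replayed in the morning: rejected by the gap rule.
print(is_valid(datetime(2023, 9, 1, 2, 30, 0), datetime(2023, 9, 1, 7, 44, 13)))
# Post-close tick with no gap: rejected by the window.
print(is_valid(datetime(2023, 9, 1, 15, 19, 25), datetime(2023, 9, 1, 15, 19, 26)))
```

Both calls print `False`. Because the window ends at 1500, the 15:00 to 16:30 rows are rejected without a separate rule.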
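rb2310 (SHFE)
The call-auction ticks arrive at 8:59 and 20:59.
The last tick of the night session is pushed again at 7:42.
Ticks pushed after 7:30 and after 19:30 have a large time gap and are invalid.
Wenhua's 1-minute bar starting at 14:59 closes at 3713, so the tick timed 15:00 is assigned to the bar starting at 14:59.
Wenhua's 1-minute bar starting at 22:59 closes at 3771, so the tick timed 23:00 is assigned to the bar starting at 22:59. For a product that does not close at 23:00, the 23:00 tick opens a new bar instead.
The night session closes at 23:00 with system time and tick time in agreement, so how can we tell that this bar is complete?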
20230830,14:59:59,rb2310,SHFE,3714.0,1307489,1039420.0,1.7976931348623157e+308,2023-08-30 14:59:59,2023-08-30 15:00:00
20230830,15:00:00,rb2310,SHFE,3713.0,1307564,1039387.0,1.7976931348623157e+308,2023-08-30 15:00:00,2023-08-30 15:00:01
20230830,15:00:00,rb2310,SHFE,3713.0,1307564,1039387.0,3713.0,2023-08-30 15:00:00,2023-08-30 15:00:02
20230830,15:16:18,rb2310,SHFE,3713.0,1307564,1039387.0,3713.0,2023-08-30 15:16:18,2023-08-30 15:16:19
20230831,18:34:43,rb2310,SHFE,3713.0,0,1039387.0,1.7976931348623157e+308,2023-08-30 18:34:43,2023-08-30 19:30:18
20230831,20:59:00,rb2310,SHFE,3716.0,13904,1036877.0,1.7976931348623157e+308,2023-08-30 20:59:00,2023-08-30 20:59:01
20230831,21:00:00,rb2310,SHFE,3717.0,14719,1037067.0,1.7976931348623157e+308,2023-08-30 21:00:00,2023-08-30 21:00:01
20230905,22:59:59,rb2310,SHFE,3772.0,186587,683807.0,1.7976931348623157e+308,2023-09-04 22:59:59,2023-09-04 23:00:00
20230905,23:00:00,rb2310,SHFE,3771.0,186601,683796.0,1.7976931348623157e+308,2023-09-04 23:00:00,2023-09-04 23:00:00
20230905,23:00:00,rb2310,SHFE,3771.0000000000005,186601,683796.0,1.7976931348623157e+308,2023-09-05 23:00:00,2023-09-05 07:42:16
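The break rule illustrated by these rows (a tick stamped exactly at a break or close belongs to the bar that started one minute earlier) could be sketched as below. `bar_start` is a hypothetical helper. The break times are the commodity day-session breaks from the contract list, and `NIGHT_CLOSE` holds only two of the listed contracts as an example.

```python
from datetime import datetime, time, timedelta

# Commodity day-session breaks and close (CFFEX products differ).
DAY_BREAKS = {time(10, 15), time(11, 30), time(15, 0)}
# Night-session close per contract, taken from the contract list.
NIGHT_CLOSE = {'rb2310': time(23, 0), 'au2310': time(2, 30)}


def bar_start(symbol, tick_dt):
    """Return the start minute of the 1-minute bar a tick belongs to."""
    minute = tick_dt.replace(second=0, microsecond=0)
    closes = set(DAY_BREAKS)
    if symbol in NIGHT_CLOSE:
        closes.add(NIGHT_CLOSE[symbol])
    if minute.time() in closes:
        # Last tick before a pause: fold it into the previous minute.
        return minute - timedelta(minutes=1)
    return minute


print(bar_start('rb2310', datetime(2023, 9, 4, 23, 0, 0)))   # 2023-09-04 22:59:00
print(bar_start('au2310', datetime(2023, 8, 31, 23, 0, 0)))  # 2023-08-31 23:00:00
```

The same 23:00 tick lands in different bars for rb2310 and au2310, which is why the per-product night close has to be known.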
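AP310 (CZCE)
The call auction starts at 8:55. The volume changes at 8:59, which marks the end of the call auction. Between the start and the end of the auction the volume is 0.
Before the close, the last tick carries a tick time of 14:59:59 while the actual system time is already past 15:00. The invalid value for today's close is 0.0, and a valid value means the day's market has ended.
There is invalid data after 19:30 in the evening and after 7:30 in the morning, and even after 23:00. Note that 23:00 is still trading time for some products, so filtering by trading hours alone is not enough: rows whose tick time and system time disagree must be removed.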
20230830,14:59:59,AP310,CZCE,8930.0,68772,97754.0,0.0,2023-08-30 14:59:59,2023-08-30 14:59:58
20230830,14:59:59,AP310,CZCE,8928.0,68773,97754.0,0.0,2023-08-30 14:59:59,2023-08-30 14:59:59
20230830,14:59:59,AP310,CZCE,8928.0,68773,97754.0,8928.0,2023-08-30 14:59:59,2023-08-30 15:00:04
20230830,19:08:41,AP310,CZCE,8928.0,0,97754.0,0.0,2023-08-30 19:08:41,2023-08-30 19:30:18
20230830,19:08:41,AP310,CZCE,8928.0,0,97754.0,0.0,2023-08-30 19:08:41,2023-08-30 23:05:16
20230831,19:08:41,AP310,CZCE,8928.0,0,97754.0,0.0,2023-08-31 19:08:41,2023-08-31 07:40:31
20230831,08:55:00,AP310,CZCE,8928.0,0,97754.0,0.0,2023-08-31 08:55:00,2023-08-31 08:55:00
20230831,08:58:59,AP310,CZCE,8928.0,0,97754.0,0.0,2023-08-31 08:58:59,2023-08-31 08:58:59
20230831,08:59:00,AP310,CZCE,8928.0,679,97192.0,0.0,2023-08-31 08:59:00,2023-08-31 08:59:00
20230831,09:00:00,AP310,CZCE,8930.0,729,97194.0,0.0,2023-08-31 09:00:00,2023-08-31 09:00:00
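sc2310 (INE)
There is no tick data between the end of the night session and the next day's open.
Invalid data appears after 15:00, and a lagged invalid row appears at system time 19:57. So besides dropping data outside trading hours, rows with a large gap between tick time and local time must also be dropped.
A system time of 15:00 is not necessarily the end of the day: this product pushes its last tick at 15:00:01, with a tick time of 15:00:00.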
20230831,02:30:00,sc2310,INE,644.5,104033,32560.0,1.7976931348623157e+308,2023-08-31 02:30:00,2023-08-31 07:40:31
20230831,08:59:00,sc2310,INE,644.8,104051,32558.0,1.7976931348623157e+308,2023-08-31 08:59:00,2023-08-31 08:59:01
20230831,09:00:00,sc2310,INE,644.7,104065,32558.0,1.7976931348623157e+308,2023-08-31 09:00:00,2023-08-31 09:00:00
20230831,14:59:59,sc2310,INE,643.5,128109,30722.0,1.7976931348623157e+308,2023-08-31 14:59:59,2023-08-31 15:00:00
20230831,15:00:00,sc2310,INE,643.4,128111,30724.0,1.7976931348623157e+308,2023-08-31 15:00:00,2023-08-31 15:00:01
20230831,15:00:00,sc2310,INE,643.4,128111,30724.0,643.4,2023-08-31 15:00:00,2023-08-31 15:00:01
20230831,15:16:21,sc2310,INE,643.4,128124,30727.0,643.4,2023-08-31 15:16:21,2023-08-31 15:16:22
20230901,19:01:50,sc2310,INE,643.4,0,30727.0,1.7976931348623157e+308,2023-08-31 19:01:50,2023-08-31 19:57:11
20230901,20:59:00,sc2310,INE,649.6,82,30757.0,1.7976931348623157e+308,2023-08-31 20:59:00,2023-08-31 20:59:00
20230901,21:00:00,sc2310,INE,649.3000000000001,125,30773.0,1.7976931348623157e+308,2023-08-31 21:00:00,2023-08-31 21:00:00
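FG401 (CZCE)
Here today's close does not show the invalid value during the session but the night session's closing price instead, so today's close cannot be used to detect the end of the day. In this sample the final value differs from the previous row, but there is no guarantee the two will never be equal.
The last tick is timed 14:59:59 but its system time is 15:00:04, so the end of the day still has to be determined from the system time.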
20230830,14:59:58,FG401,CZCE,1716.0,2412736,951372.0,1661.0,2023-08-30 14:59:58,2023-08-30 14:59:58
20230830,14:59:59,FG401,CZCE,1716.0,2412761,951353.0,1661.0,2023-08-30 14:59:59,2023-08-30 14:59:58
20230830,14:59:59,FG401,CZCE,1716.0,2412764,951353.0,1661.0,2023-08-30 14:59:59,2023-08-30 14:59:59
20230830,14:59:59,FG401,CZCE,1716.0,2412774,951352.0,1661.0,2023-08-30 14:59:59,2023-08-30 14:59:59
20230830,14:59:59,FG401,CZCE,1716.0,2412774,951352.0,1716.0,2023-08-30 14:59:59,2023-08-30 15:00:04
20230830,19:08:41,FG401,CZCE,1716.0,0,951352.0,0.0,2023-08-30 19:08:41,2023-08-30 19:30:18
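Bar-generation questions
No more data is pushed after 14:59:59, so how do we know that this bar is complete?
Likewise, for products that close at 23:00 in the night session, can the last bar only be confirmed as complete at the next day's open?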
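One possible answer, following the FG401 observation that the end of the day has to come from the system time, is to close the last bar on a clock check instead of waiting for another tick. This is only a sketch: `session_bar_complete` is a hypothetical helper, and the 10-second grace period is an assumption chosen to exceed the latest arrivals in the samples (15:00:04 for FG401 and AP310).

```python
from datetime import datetime, timedelta


def session_bar_complete(session_end, now, grace=timedelta(seconds=10)):
    """True once the system clock is `grace` past the session end."""
    return now >= session_end + grace


end = datetime(2023, 8, 30, 15, 0, 0)
print(session_bar_complete(end, datetime(2023, 8, 30, 15, 0, 4)))   # False
print(session_bar_complete(end, datetime(2023, 8, 30, 15, 0, 10)))  # True
```

The same check works for a 23:00 night close by passing that time as `session_end`, provided the per-product session ends are known.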
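Call auction
CZCE AP310: no night session. The call auction starts at 8:55 with an initial volume of 0, and the volume changes at 8:59, which marks the end of the auction.
CZCE FG401: has a night session. The call auction starts at 8:55 and at 20:55. The night-session volume starts at 0, and the day-session volume starts at the volume reached at the night close.
SHFE rb2310: has a night session. The call-auction ticks arrive at 8:59 and 20:59, and the volume starts from the volume at the previous close.
Note: the day's opening price is neither the first tick at the start of the call auction nor the first tick at 9:00, but the tick at the end of the call auction. The 8:59 and 9:00 minutes are merged into one bar.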
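Merging the auction minute into the opening bar can be sketched by shifting the 8:59 and 20:59 ticks forward by one minute before bars are built. `bar_start_with_auction` is a hypothetical helper, and the two auction-end times are the commodity ones observed above (CFFEX products open at different times).

```python
from datetime import datetime, time, timedelta

# Minutes in which the auction result tick arrives in the samples.
AUCTION_END = {time(8, 59), time(20, 59)}


def bar_start_with_auction(tick_dt):
    """Map the auction result tick onto the first bar of the session."""
    minute = tick_dt.replace(second=0, microsecond=0)
    if minute.time() in AUCTION_END:
        return minute + timedelta(minutes=1)
    return minute


print(bar_start_with_auction(datetime(2023, 9, 1, 8, 59, 0)))   # 2023-09-01 09:00:00
print(bar_start_with_auction(datetime(2023, 9, 1, 20, 59, 0)))  # 2023-09-01 21:00:00
```

Since the auction tick comes first in the 9:00 bar, its price becomes the bar's open. Using the au2310 rows, that gives 461.5 rather than 461.52.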
No trades within a minute
In Wenhua Finance's 1-minute bars, a minute with no trades gets no bar of its own, so 01:29 is followed directly by 01:31.
2023-09-06 01:29:00,au2310,463.34,463.34,463.34,463.34,4,55811.0
2023-09-06 01:31:00,au2310,463.36,463.36,463.36,463.36,1,55812.0
2023-09-06 01:32:00,au2310,463.34,463.34,463.3,463.3,6,55807.0
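On the 5-minute chart, however, the bars are still 01:25-01:30 and 01:30-01:35. So 5-minute bar times are not derived from the 1-minute bar times, otherwise the 5-minute bar would start at 01:31.
This also causes a problem when building 5-minute bars from 1-minute bars. If the 1-minute bars are continuous, a 5-minute bar is complete as soon as the bar's minute plus 1 is divisible by 5. But what if the 1-minute bars jump straight from minute 03 to minute 06?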
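One way around the missing-minute problem is to assign each 1-minute bar to a fixed 5-minute bucket by flooring its start minute, instead of counting bars. The sketch below uses that approach. `five_min_start` and `to_five_min` are hypothetical helpers, and the column order (open, high, low, close, volume) is assumed from the sample bars above.

```python
from datetime import datetime


def five_min_start(bar_start):
    """Floor a bar's start time to the 5-minute bucket it falls in."""
    floored = bar_start.minute - bar_start.minute % 5
    return bar_start.replace(minute=floored, second=0, microsecond=0)


def to_five_min(bars):
    """bars: (start, open, high, low, close, volume), sorted by start."""
    out = {}
    for start, o, h, l, c, v in bars:
        key = five_min_start(start)
        if key not in out:
            out[key] = [o, h, l, c, v]
        else:
            agg = out[key]
            agg[1] = max(agg[1], h)
            agg[2] = min(agg[2], l)
            agg[3] = c
            agg[4] += v
    return out


one_min = [
    (datetime(2023, 9, 6, 1, 29), 463.34, 463.34, 463.34, 463.34, 4),
    (datetime(2023, 9, 6, 1, 31), 463.36, 463.36, 463.36, 463.36, 1),
    (datetime(2023, 9, 6, 1, 32), 463.34, 463.34, 463.3, 463.3, 6),
]
five = to_five_min(one_min)
print(sorted(five))  # buckets starting at 01:25 and 01:30
```

With this bucketing a jump from minute 03 to minute 06 is harmless: the 03 bar falls in the 00 bucket and the 06 bar in the 05 bucket. A bucket can be treated as finished once a bar from a later bucket arrives or the session ends.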