Observing and Cleaning Tick Data

Filtering with a Program

The following contracts were tested:

'''
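# All of the following share the same day-session hours: 9:00-10:15, 10:30-11:30, 13:30-15:00
CZCE, day and night sessions 'FG401' 21:00-23:00
CZCE, no night session 'AP310'
DCE, day and night sessions 'i2401' 21:00-23:00
DCE, no night session 'jd2310'
SHFE, day and night sessions 'rb2310' 21:00-23:00
SHFE, day and night sessions 'cu2309' 21:00-01:00
SHFE, day and night sessions 'au2310' 21:00-02:30
SHFE, no night session 'wr2310'
GFEX, no night session 'si2310'
INE, day and night sessions 'sc2310' 21:00-02:30
INE, day and night sessions 'bc2310' 21:00-01:00
INE, day and night sessions 'lu2311' 21:00-23:00
INE, no night session 'ec2404'
# CFFEX: no night sessions
'IF2309' 9:30-11:30, 13:00-15:00
'TL2312' 9:15-11:30, 13:00-15:15

# Call auction times
Contracts without a night session hold their call auction at 8:55-8:59.
Contracts with a night session hold their call auction at 20:55-20:59 and have no separate call auction before the day session.

# Settlement time
Settlement runs from 16:30 to 19:30 on every trading day.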
'''
subID = ['FG401', 'AP310', 'i2401', 'jd2310', 'rb2310', 'cu2309', 'au2310', 'wr2310', 'si2310', 'sc2310', 'bc2310', 'lu2311', 'ec2404', 'IF2309', 'TL2312']

Load the tick data previously downloaded to CSV files and check its validity against a set of rules.

import os, csv
from datetime import datetime

# Read every test-contract file in the directory
dpath = './ticktest'
fpaths = [os.path.join(dpath, x) for x in os.listdir(dpath)]
data_list = []
for fpath in fpaths:
    with open(fpath, encoding='utf-8') as f:
        reader = csv.reader(f)
        for row in reader:
            data_list.append(row)
print(f'totals:{len(data_list)}')

# Collect invalid rows
invalid_list1 = []
invalid_list2 = []
invalid_list3 = []
for row in data_list:
    dt = datetime.strptime(row[8], '%Y-%m-%d %H:%M:%S')  # actual date + tick time
    nowdt = datetime.strptime(row[9], '%Y-%m-%d %H:%M:%S')  # local (system) receive time
    dthm = dt.strftime('%H%M')
    nowdthm = nowdt.strftime('%H%M')

    if nowdt.day in [2, 3]:  # Sept 2 and 3 were a Saturday and Sunday: check whether any data was pushed
        invalid_list1.append(row)
    if not ('0900' <= nowdthm <= '1500' or nowdthm >= '2100' or nowdthm <= '0230'):  # received outside trading hours
        invalid_list2.append(row)
        if '0900' <= dthm <= '1500' or dthm >= '2100' or dthm <= '0230':  # yet the tick time falls inside trading hours
            invalid_list3.append(row)

with open('result_ticktest.csv', 'w', newline='') as f:
    csv.writer(f).writerows(invalid_list1)
with open('result_ticktest2.csv', 'w', newline='') as f:
    csv.writer(f).writerows(invalid_list2)
with open('result_ticktest3.csv', 'w', newline='') as f:
    csv.writer(f).writerows(invalid_list3)

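Observations

There is data on Saturday, but it is really Friday's night session, which ends at 2:30 a.m. at the latest. There is no data on Sunday, so everything is as expected.

Row layout: trading day, tick update time, contract code, exchange code, last price, volume, open interest, today's close, actual date combined with the tick time, actual (system) date and time.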
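To make the column positions easier to follow, here is a minimal sketch that parses one CSV row into a named record. The `Tick` class, its field names, and `parse_row` are hypothetical names chosen for illustration; only the column order comes from the layout described above.

```python
from datetime import datetime
from typing import NamedTuple

FMT = '%Y-%m-%d %H:%M:%S'

class Tick(NamedTuple):
    # Field names are illustrative; the order follows the row layout above.
    trading_day: str
    update_time: str
    symbol: str
    exchange: str
    last_price: float
    volume: int
    open_interest: float
    close_price: float   # "today's close"; a sentinel value until the day ends
    tick_dt: datetime    # actual date combined with the tick time
    recv_dt: datetime    # system time when the tick was received

def parse_row(row):
    """Convert one list of CSV strings into a Tick."""
    return Tick(row[0], row[1], row[2], row[3],
                float(row[4]), int(row[5]), float(row[6]), float(row[7]),
                datetime.strptime(row[8], FMT), datetime.strptime(row[9], FMT))

sample = ('20230901,09:00:00,au2310,SHFE,461.52000000000004,50271,75007.0,'
          '1.7976931348623157e+308,2023-09-01 09:00:00,2023-09-01 09:00:01')
tick = parse_row(sample.split(','))
print(tick.symbol, tick.last_price, tick.recv_dt - tick.tick_dt)
```

Keeping `tick_dt` and `recv_dt` as `datetime` objects makes the later time-gap checks a simple subtraction.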

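au2310 (SHFE)

The call-auction ticks arrive at 8:59 and 20:59. Wenhua Finance (文华财经) shows an opening price of 461.5 at 9:00 and 463.68 at 21:00 (463.67999 in the tick data). So the opening price is the price at the end of the call auction, not the first tick at 9:00 or 21:00.

Wenhua's 1-minute bar starting at 14:59 opens at 462.42, whereas going by tick time alone it would open at 462.4. The reason is volume: only a change in volume signals that a new bar has started.

Wenhua's 1-minute bar starting at 23:01 opens at 462.14. The first tick at 23:01:00 is 462.16, and the price only becomes 462.14 at 23:01:03, because that is the first tick whose volume changes.

In Wenhua's 1-minute bars, the bar starting at 10:14 closes at 464.66 and the bar starting at 10:30 opens at 464.66. So the tick stamped 10:15 (464.65999) still belongs to the 10:14 bar, even though it falls in a new minute and its volume changed.

In Wenhua's 1-minute bars, the bar starting at 22:59 closes at 461.72 and the bar starting at 23:00 opens at 461.74. This contract does not close at 23:00, so the first volume change after 23:00 starts a new bar.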

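No ticks between 10:15 and 10:30, which is normal.

No ticks between 11:30 and 13:30, which is normal.

At 15:00 today's close takes a valid value and the day is over. An invalid value is represented by the maximum double, 1.7976931348623157e+308.

The tick pushed at 15:19 has a changed volume but unchanged open interest. In Wenhua Finance the volume of the last bar is simply the cumulative volume at 15:00 minus that of the previous 1-minute bar, so the volume here is meaningless. Everything after today's close appears is invalid data.

The 16:01 data is identical to the 15:19 data.

The night session ends at 2:30 a.m., and the night-session data was reloaded once after 7:30 a.m. Any row whose system time is far from its tick time is invalid.

Data pushed after 7:30 and after 19:30 has a large gap between tick time and system time, so it is invalid.

Prices with very long decimal tails can generally be rounded to 2 decimal places, which saves storage. Rounding is safer than truncating here: 463.67999999999995 truncated to 2 places would become 463.67 instead of 463.68. A few contracts need 3 decimal places, such as treasury bond futures. I have not seen 4 yet, so rounding everything to 3 decimal places works.

The maximum double can be replaced with -1 to mark an invalid value and save storage. CZCE uses 0.0 as its invalid value, which can also be replaced with -1 so that everything is handled the same way.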
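The two clean-up rules above (shorten long decimal tails, replace the invalid-value sentinels with -1) can be sketched as follows. The function names and the `INVALID` constant are hypothetical. The sketch uses `round` rather than literal truncation, which is an assumption on my part: truncating 463.67999999999995 would yield 463.67, while Wenhua Finance shows 463.68.

```python
import sys

INVALID = -1.0                 # replacement marker for "no value"
DBL_MAX = sys.float_info.max   # 1.7976931348623157e+308

def clean_price(value, ndigits=3):
    """Round a price to ndigits decimals to drop float noise."""
    return round(value, ndigits)

def clean_close(value, exchange, ndigits=3):
    """Map the 'today's close' sentinels to INVALID, otherwise round.

    Assumes DBL_MAX is the sentinel everywhere and that CZCE also uses 0.0.
    """
    if value == DBL_MAX or (exchange == 'CZCE' and value == 0.0):
        return INVALID
    return round(value, ndigits)

print(clean_price(463.67999999999995))               # 463.68
print(clean_close(1.7976931348623157e+308, 'SHFE'))  # -1.0
print(clean_close(0.0, 'CZCE'))                      # -1.0
```

Note that for CZCE contracts with a night session the field may hold the night-session close instead of 0.0, so a cleaned value other than -1 does not by itself prove that the day has ended.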

20230901,02:29:57,au2310,SHFE,461.96000000000004,50190,74982.0,1.7976931348623157e+308,2023-09-01 02:29:57,2023-09-01 02:29:58
20230901,02:30:00,au2310,SHFE,461.96000000000004,50190,74982.0,1.7976931348623157e+308,2023-09-01 02:30:00,2023-09-01 02:30:01
20230901,02:30:00,au2310,SHFE,461.96000000000004,50190,74982.0,1.7976931348623157e+308,2023-09-01 02:30:00,2023-09-01 07:44:13
20230901,08:59:00,au2310,SHFE,461.5,50261,74999.0,1.7976931348623157e+308,2023-09-01 08:59:00,2023-09-01 08:59:01
20230901,09:00:00,au2310,SHFE,461.52000000000004,50271,75007.0,1.7976931348623157e+308,2023-09-01 09:00:00,2023-09-01 09:00:01

20230904,10:14:56,au2310,SHFE,464.61999999999995,67518,67332.0,1.7976931348623157e+308,2023-09-04 10:14:56,2023-09-04 10:14:57
20230904,10:15:00,au2310,SHFE,464.65999999999997,67521,67330.0,1.7976931348623157e+308,2023-09-04 10:15:00,2023-09-04 10:15:00
20230904,10:30:00,au2310,SHFE,464.65999999999997,67596,67333.0,1.7976931348623157e+308,2023-09-04 10:30:00,2023-09-04 10:30:01

20230901,14:58:59,au2310,SHFE,462.40000000000003,87970,69206.0,1.7976931348623157e+308,2023-09-01 14:58:59,2023-09-01 14:59:00
20230901,14:59:00,au2310,SHFE,462.40000000000003,87970,69206.0,1.7976931348623157e+308,2023-09-01 14:59:00,2023-09-01 14:59:01
20230901,14:59:00,au2310,SHFE,462.42,87989,69192.0,1.7976931348623157e+308,2023-09-01 14:59:00,2023-09-01 14:59:01
20230901,14:59:01,au2310,SHFE,462.42,87995,69191.0,1.7976931348623157e+308,2023-09-01 14:59:01,2023-09-01 14:59:02
20230901,14:59:01,au2310,SHFE,462.42,87996,69191.0,1.7976931348623157e+308,2023-09-01 14:59:01,2023-09-01 14:59:02
20230901,14:59:02,au2310,SHFE,462.44,88002,69193.0,1.7976931348623157e+308,2023-09-01 14:59:02,2023-09-01 14:59:03

20230901,14:59:59,au2310,SHFE,462.46000000000004,88305,69172.0,1.7976931348623157e+308,2023-09-01 14:59:59,2023-09-01 15:00:00
20230901,15:00:00,au2310,SHFE,462.46000000000004,88305,69172.0,462.46000000000004,2023-09-01 15:00:00,2023-09-01 15:00:01
20230901,15:19:25,au2310,SHFE,462.46000000000004,88338,69182.0,462.46000000000004,2023-09-01 15:19:25,2023-09-01 15:19:26
20230901,16:01:26,au2310,SHFE,462.46000000000004,88338,69182.0,462.46000000000004,2023-09-01 16:01:26,2023-09-01 16:01:27
20230904,19:15:06,au2310,SHFE,462.46,0,69182.0,1.7976931348623157e+308,2023-09-01 19:15:06,2023-09-01 19:33:16
20230904,20:59:00,au2310,SHFE,463.67999999999995,70,69219.0,1.7976931348623157e+308,2023-09-01 20:59:00,2023-09-01 20:59:00
20230904,21:00:00,au2310,SHFE,463.52,80,69225.0,1.7976931348623157e+308,2023-09-01 21:00:00,2023-09-01 21:00:01
20230904,21:00:01,au2310,SHFE,463.47999999999996,137,69221.0,1.7976931348623157e+308,2023-09-01 21:00:01,2023-09-01 21:00:01

20230901,22:59:59,au2310,SHFE,461.72,37315,76011.0,1.7976931348623157e+308,2023-08-31 22:59:59,2023-08-31 23:00:00
20230901,23:00:00,au2310,SHFE,461.72,37315,76011.0,1.7976931348623157e+308,2023-08-31 23:00:00,2023-08-31 23:00:00
20230901,23:00:00,au2310,SHFE,461.74,37319,76013.0,1.7976931348623157e+308,2023-08-31 23:00:00,2023-08-31 23:00:01
20230901,23:00:01,au2310,SHFE,461.74,37321,76013.0,1.7976931348623157e+308,2023-08-31 23:00:01,2023-08-31 23:00:01
20230901,23:00:01,au2310,SHFE,461.74,37321,76013.0,1.7976931348623157e+308,2023-08-31 23:00:01,2023-08-31 23:00:02
20230901,23:00:03,au2310,SHFE,461.74,37321,76013.0,1.7976931348623157e+308,2023-08-31 23:00:03,2023-08-31 23:00:04
20230901,23:00:04,au2310,SHFE,461.74,37322,76013.0,1.7976931348623157e+308,2023-08-31 23:00:04,2023-08-31 23:00:04
20230901,23:00:04,au2310,SHFE,461.74,37323,76013.0,1.7976931348623157e+308,2023-08-31 23:00:04,2023-08-31 23:00:05
20230901,23:00:05,au2310,SHFE,461.74,37323,76013.0,1.7976931348623157e+308,2023-08-31 23:00:05,2023-08-31 23:00:05
20230901,23:00:06,au2310,SHFE,461.72,37326,76011.0,1.7976931348623157e+308,2023-08-31 23:00:06,2023-08-31 23:00:06

20230831,23:00:59,au2310,SHFE,462.16,37919,86228.0,1.7976931348623157e+308,2023-08-30 23:00:59,2023-08-30 23:00:59
20230831,23:01:00,au2310,SHFE,462.16,37919,86228.0,1.7976931348623157e+308,2023-08-30 23:01:00,2023-08-30 23:01:00
20230831,23:01:00,au2310,SHFE,462.16,37919,86228.0,1.7976931348623157e+308,2023-08-30 23:01:00,2023-08-30 23:01:01
20230831,23:01:01,au2310,SHFE,462.16,37919,86228.0,1.7976931348623157e+308,2023-08-30 23:01:01,2023-08-30 23:01:02
20230831,23:01:02,au2310,SHFE,462.16,37919,86228.0,1.7976931348623157e+308,2023-08-30 23:01:02,2023-08-30 23:01:02
20230831,23:01:02,au2310,SHFE,462.16,37919,86228.0,1.7976931348623157e+308,2023-08-30 23:01:02,2023-08-30 23:01:03
20230831,23:01:03,au2310,SHFE,462.14,37920,86228.0,1.7976931348623157e+308,2023-08-30 23:01:03,2023-08-30 23:01:03
20230831,23:01:03,au2310,SHFE,462.16,37921,86229.0,1.7976931348623157e+308,2023-08-30 23:01:03,2023-08-30 23:01:04

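Summary:

The price at the end of the call auction is the opening price of the first bar, for both the day and the night session.

If tick time and system time differ by more than 3 minutes, the row is invalid.

A new bar requires not only a new minute but also a change in volume. If volume has not changed in the new minute, the tick still belongs to the previous bar.

Data pushed at 15:00 is valid, and both price and volume may change. In other words, the bar starting at 14:59 only ends once today's close takes a valid value.

At every break (10:15, 11:30, 15:00) the last tick belongs to the bar of the preceding minute. The awkward case is the night-session close: some contracts close at 23:00 and others at 2:30. For the former, the 23:00 tick belongs to the bar starting at 22:59; for the latter, 23:00 starts a new bar. Bars can only be split correctly if each contract's exact trading hours are known.

Once today's close takes a valid value, bar construction for the day is finished and everything after it is invalid.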
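The bar rules in this summary can be sketched as a small aggregator. This is only an illustration under stated assumptions, not Wenhua Finance's actual algorithm: `build_minute_bars` and `break_times` are hypothetical names, ticks are assumed to be pre-filtered (no post-close or replayed rows), and the very first tick is assumed to open a bar because there is no earlier volume to compare against.

```python
from datetime import datetime, time

def build_minute_bars(ticks, break_times):
    """Group (tick_dt, price, cumulative_volume) tuples into 1-minute bars.

    break_times: datetime.time values at which this contract pauses or
    closes; a tick stamped exactly at one of them joins the previous bar.
    """
    bars = []
    last_volume = None
    for tick_dt, price, volume in ticks:
        changed = last_volume is None or volume != last_volume
        last_volume = volume
        if not changed:
            continue  # no new trade: the tick neither opens nor updates a bar
        start = tick_dt.replace(second=0, microsecond=0)
        joins_previous = bool(bars) and (tick_dt.time() in break_times
                                         or start == bars[-1]['start'])
        if joins_previous:
            bar = bars[-1]
            bar['high'] = max(bar['high'], price)
            bar['low'] = min(bar['low'], price)
            bar['close'] = price
        else:
            bars.append({'start': start, 'open': price, 'high': price,
                         'low': price, 'close': price})
    return bars

# au2310 closes its night session at 02:30, so 23:00 is not a break for it.
au_breaks = {time(10, 15), time(11, 30), time(15, 0), time(2, 30)}
ticks = [
    (datetime(2023, 9, 4, 10, 14, 56), 464.62, 67518),
    (datetime(2023, 9, 4, 10, 15, 0), 464.66, 67521),
    (datetime(2023, 9, 4, 10, 30, 0), 464.66, 67596),
]
for bar in build_minute_bars(ticks, au_breaks):
    print(bar['start'], bar['open'], bar['close'])
```

With the sample rows, the 10:15:00 tick is folded into the 10:14 bar (close 464.66) and the 10:30 bar opens at 464.66, which matches the Wenhua bars described earlier. For a contract such as rb2310, `time(23, 0)` would have to be added to the break set.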

 

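Large gaps between tick time and local time only show up after 7:30 and after 19:30 local time, and the invalid data after the day's close has no time gap, so simply dropping everything outside trading hours is the easiest approach.

For example, with the time formatted like 0900, trading hours can be expressed as: (>= 0855 and <= 1500) or >= 2055 or <= 0230.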
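The window test described above translates almost directly into a string comparison on zero-padded `HHMM` values. A minimal sketch, with `in_trading_window` as a hypothetical helper name:

```python
from datetime import datetime

def in_trading_window(dt):
    """True if dt falls inside the coarse, all-contract trading window.

    The window is the union used in the text: 08:55-15:00, 20:55-24:00
    and 00:00-02:30. It ignores per-contract hours and the midday breaks.
    """
    hm = dt.strftime('%H%M')  # zero-padded, so string comparison is safe
    return '0855' <= hm <= '1500' or hm >= '2055' or hm <= '0230'

print(in_trading_window(datetime(2023, 9, 1, 8, 59, 1)))    # True
print(in_trading_window(datetime(2023, 9, 1, 7, 44, 13)))   # False
print(in_trading_window(datetime(2023, 9, 1, 15, 19, 26)))  # False
```

Because the window is a union over all contracts, it is only a first pass. It would, for instance, cut off the 15:00-15:15 trading of TL2312 listed at the top, so per-contract hours are still needed for a precise filter.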

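Addendum: for some contracts, invalid data with a time gap also appears after 23:00. That is outside those contracts' own trading hours but inside the trading hours of many night-session contracts, so rows whose system time and tick time differ by more than 3 minutes must always be removed.

Rows with a time gap of more than 3 minutes are certainly invalid data. Invalid rows without a time gap are usually produced after 15:00, so the 15:00 to 16:30 window has to be masked out as well.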
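The 3-minute rule can be sketched as a comparison of the two timestamps in each row. `has_time_gap` and `MAX_GAP_SECONDS` are hypothetical names; the 180-second threshold is the one stated in the text.

```python
from datetime import datetime

MAX_GAP_SECONDS = 180  # the 3-minute threshold from the text

def has_time_gap(tick_dt, recv_dt, max_gap=MAX_GAP_SECONDS):
    """True if tick time and system receive time differ by more than max_gap."""
    return abs((recv_dt - tick_dt).total_seconds()) > max_gap

fmt = '%Y-%m-%d %H:%M:%S'
# Replayed night-session tick: stamped 02:30:00, received at 07:44:13.
replayed = has_time_gap(datetime.strptime('2023-09-01 02:30:00', fmt),
                        datetime.strptime('2023-09-01 07:44:13', fmt))
# Live auction tick: stamped 08:59:00, received at 08:59:01.
live = has_time_gap(datetime.strptime('2023-09-01 08:59:00', fmt),
                    datetime.strptime('2023-09-01 08:59:01', fmt))
print(replayed, live)  # True False
```

Taking the absolute difference also tolerates rows where the system clock is slightly behind the tick time, as in some of the CZCE samples further down.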

 

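rb2310 (SHFE)

Call-auction ticks arrive at 8:59 and 20:59.

The last night-session tick is pushed again at 7:42.

Ticks pushed after 7:30 and after 19:30 have a large time gap and are invalid data.

In Wenhua's 1-minute bars, the bar starting at 14:59 closes at 3713, so the tick stamped 15:00 is assigned to the 14:59 bar.

In Wenhua's 1-minute bars, the bar starting at 22:59 closes at 3771, so the tick stamped 23:00 is assigned to the 22:59 bar. For a contract whose session does not end at 23:00, the 23:00 tick opens a new bar.

The night session closes at 23:00 and system time matches tick time, so how can we tell that this bar is complete?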

20230830,14:59:59,rb2310,SHFE,3714.0,1307489,1039420.0,1.7976931348623157e+308,2023-08-30 14:59:59,2023-08-30 15:00:00
20230830,15:00:00,rb2310,SHFE,3713.0,1307564,1039387.0,1.7976931348623157e+308,2023-08-30 15:00:00,2023-08-30 15:00:01
20230830,15:00:00,rb2310,SHFE,3713.0,1307564,1039387.0,3713.0,2023-08-30 15:00:00,2023-08-30 15:00:02
20230830,15:16:18,rb2310,SHFE,3713.0,1307564,1039387.0,3713.0,2023-08-30 15:16:18,2023-08-30 15:16:19
20230831,18:34:43,rb2310,SHFE,3713.0,0,1039387.0,1.7976931348623157e+308,2023-08-30 18:34:43,2023-08-30 19:30:18
20230831,20:59:00,rb2310,SHFE,3716.0,13904,1036877.0,1.7976931348623157e+308,2023-08-30 20:59:00,2023-08-30 20:59:01
20230831,21:00:00,rb2310,SHFE,3717.0,14719,1037067.0,1.7976931348623157e+308,2023-08-30 21:00:00,2023-08-30 21:00:01

20230905,22:59:59,rb2310,SHFE,3772.0,186587,683807.0,1.7976931348623157e+308,2023-09-04 22:59:59,2023-09-04 23:00:00
20230905,23:00:00,rb2310,SHFE,3771.0,186601,683796.0,1.7976931348623157e+308,2023-09-04 23:00:00,2023-09-04 23:00:00
20230905,23:00:00,rb2310,SHFE,3771.0000000000005,186601,683796.0,1.7976931348623157e+308,2023-09-05 23:00:00,2023-09-05 07:42:16

 

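AP310 (CZCE)

The call auction starts at 8:55. Volume changes at 8:59, which marks the end of the auction. Between the start and the end of the auction, volume is 0.

Before the close, the last tick is stamped 14:59:59 while the actual system time is already past 15:00. The invalid value for today's close is 0.0; a valid value means the day's market is over.

There is invalid data after 19:30 and after 7:30, and even after 23:00. Note: 23:00 is still within trading hours for some contracts, so filtering by trading hours alone is not enough. Rows whose tick time differs substantially from system time must always be removed.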

20230830,14:59:59,AP310,CZCE,8930.0,68772,97754.0,0.0,2023-08-30 14:59:59,2023-08-30 14:59:58
20230830,14:59:59,AP310,CZCE,8928.0,68773,97754.0,0.0,2023-08-30 14:59:59,2023-08-30 14:59:59
20230830,14:59:59,AP310,CZCE,8928.0,68773,97754.0,8928.0,2023-08-30 14:59:59,2023-08-30 15:00:04
20230830,19:08:41,AP310,CZCE,8928.0,0,97754.0,0.0,2023-08-30 19:08:41,2023-08-30 19:30:18
20230830,19:08:41,AP310,CZCE,8928.0,0,97754.0,0.0,2023-08-30 19:08:41,2023-08-30 23:05:16
20230831,19:08:41,AP310,CZCE,8928.0,0,97754.0,0.0,2023-08-31 19:08:41,2023-08-31 07:40:31
20230831,08:55:00,AP310,CZCE,8928.0,0,97754.0,0.0,2023-08-31 08:55:00,2023-08-31 08:55:00

20230831,08:58:59,AP310,CZCE,8928.0,0,97754.0,0.0,2023-08-31 08:58:59,2023-08-31 08:58:59
20230831,08:59:00,AP310,CZCE,8928.0,679,97192.0,0.0,2023-08-31 08:59:00,2023-08-31 08:59:00
20230831,09:00:00,AP310,CZCE,8930.0,729,97194.0,0.0,2023-08-31 09:00:00,2023-08-31 09:00:00

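sc2310 (INE)

Between the night-session close and the next day's open there is no tick data.

Invalid data appears after 15:00, and at system time 19:57 there is invalid data with a time gap. So besides dropping data outside trading hours, rows whose tick time is far from local time must also be dropped.

A system time of 15:00 is not necessarily the end of the day: this contract pushes its last tick at 15:00:01, with a tick time of 15:00:00.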

20230831,02:30:00,sc2310,INE,644.5,104033,32560.0,1.7976931348623157e+308,2023-08-31 02:30:00,2023-08-31 07:40:31
20230831,08:59:00,sc2310,INE,644.8,104051,32558.0,1.7976931348623157e+308,2023-08-31 08:59:00,2023-08-31 08:59:01
20230831,09:00:00,sc2310,INE,644.7,104065,32558.0,1.7976931348623157e+308,2023-08-31 09:00:00,2023-08-31 09:00:00

20230831,14:59:59,sc2310,INE,643.5,128109,30722.0,1.7976931348623157e+308,2023-08-31 14:59:59,2023-08-31 15:00:00
20230831,15:00:00,sc2310,INE,643.4,128111,30724.0,1.7976931348623157e+308,2023-08-31 15:00:00,2023-08-31 15:00:01
20230831,15:00:00,sc2310,INE,643.4,128111,30724.0,643.4,2023-08-31 15:00:00,2023-08-31 15:00:01
20230831,15:16:21,sc2310,INE,643.4,128124,30727.0,643.4,2023-08-31 15:16:21,2023-08-31 15:16:22
20230901,19:01:50,sc2310,INE,643.4,0,30727.0,1.7976931348623157e+308,2023-08-31 19:01:50,2023-08-31 19:57:11
20230901,20:59:00,sc2310,INE,649.6,82,30757.0,1.7976931348623157e+308,2023-08-31 20:59:00,2023-08-31 20:59:00
20230901,21:00:00,sc2310,INE,649.3000000000001,125,30773.0,1.7976931348623157e+308,2023-08-31 21:00:00,2023-08-31 21:00:00

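FG401 (CZCE)

Today's close never shows the invalid value here. Instead it holds the closing price of the night session, so today's close cannot be used to detect the end of the day. In this sample the final value differs from the previous row, but there is no guarantee that the two values will never coincide.

The last tick is stamped 14:59:59 but its system time is 15:00:04, so the end of the day still has to be determined from system time.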

20230830,14:59:58,FG401,CZCE,1716.0,2412736,951372.0,1661.0,2023-08-30 14:59:58,2023-08-30 14:59:58
20230830,14:59:59,FG401,CZCE,1716.0,2412761,951353.0,1661.0,2023-08-30 14:59:59,2023-08-30 14:59:58
20230830,14:59:59,FG401,CZCE,1716.0,2412764,951353.0,1661.0,2023-08-30 14:59:59,2023-08-30 14:59:59
20230830,14:59:59,FG401,CZCE,1716.0,2412774,951352.0,1661.0,2023-08-30 14:59:59,2023-08-30 14:59:59
20230830,14:59:59,FG401,CZCE,1716.0,2412774,951352.0,1716.0,2023-08-30 14:59:59,2023-08-30 15:00:04
20230830,19:08:41,FG401,CZCE,1716.0,0,951352.0,0.0,2023-08-30 19:08:41,2023-08-30 19:30:18

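Bar Construction Issues

No data is pushed after 14:59:59, so how do we know this bar is complete?

Likewise, for contracts whose night session closes at 23:00, can the last bar only be confirmed as complete at the next day's open?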
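One possible answer, offered only as a sketch, is to stop waiting for another tick and close the last bar from the local clock once the session end plus a short grace period has passed. The function name and the 10-second default are assumptions; the grace period is loosely based on the samples above, where the final rows arrived at system times of up to 15:00:04.

```python
from datetime import datetime, timedelta

def last_bar_complete(now, session_end, grace_seconds=10):
    """True once the local clock is past session_end plus a grace period.

    grace_seconds is an assumed margin for late final ticks; tune it
    against the latest arrival times seen in your own data.
    """
    return now >= session_end + timedelta(seconds=grace_seconds)

day_close = datetime(2023, 8, 30, 15, 0, 0)
print(last_bar_complete(datetime(2023, 8, 30, 15, 0, 4), day_close))   # False
print(last_bar_complete(datetime(2023, 8, 30, 15, 0, 10), day_close))  # True
```

The same check would apply to a 23:00 night close by passing that time as `session_end`, which again requires knowing each contract's exact trading hours.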

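Call Auction

CZCE AP310: no night session. The call auction starts at 8:55 with volume initially 0. Volume changes at 8:59, which marks the end of the auction.

CZCE FG401: has a night session. Call auctions start at 8:55 and 20:55. Volume starts at 0 for the night session; for the day session it starts at the volume reached at the night-session close.

SHFE rb2310: has a night session. Call-auction ticks arrive at 8:59 and 20:59, and volume starts at the value from the previous close.

Note: the day's opening price is neither the first tick of the call auction nor the first tick at 9:00, but the tick at the end of the call auction. The 8:59 and 9:00 ticks are merged into one bar.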
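Merging the auction minute into the opening bar can be sketched as a small mapping applied before ticks are grouped by minute. `AUCTION_TO_OPEN` and `bar_start` are hypothetical names, and the mapping assumes the commodity-futures times discussed here (auction result at 8:59 or 20:59, open at 9:00 or 21:00); CFFEX contracts open at different times and would need their own entries.

```python
from datetime import datetime, time

# Assumed mapping: minute of the auction result -> minute of the session open.
AUCTION_TO_OPEN = {time(8, 59): time(9, 0), time(20, 59): time(21, 0)}

def bar_start(tick_dt):
    """Return the start minute of the bar a tick belongs to.

    Ticks in the auction-result minute are shifted to the opening minute,
    so the auction price becomes the open of the first bar.
    """
    minute = tick_dt.replace(second=0, microsecond=0)
    open_time = AUCTION_TO_OPEN.get(minute.time())
    if open_time is not None:
        minute = minute.replace(hour=open_time.hour, minute=open_time.minute)
    return minute

print(bar_start(datetime(2023, 9, 1, 8, 59, 0)))   # 2023-09-01 09:00:00
print(bar_start(datetime(2023, 9, 1, 9, 0, 0)))    # 2023-09-01 09:00:00
print(bar_start(datetime(2023, 9, 1, 20, 59, 0)))  # 2023-09-01 21:00:00
```

Since the 8:59:00 tick arrives first, its price (461.5 for au2310 on 2023-09-01) becomes the open of the 9:00 bar, in line with what Wenhua Finance displays.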

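Minutes With No Trades

In Wenhua Finance's 1-minute bars, a minute with no trades does not get a bar of its own; the bars are simply merged. After 01:29 the next bar is 01:31.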

2023-09-06 01:29:00,au2310,463.34,463.34,463.34,463.34,4,55811.0
2023-09-06 01:31:00,au2310,463.36,463.36,463.36,463.36,1,55812.0
2023-09-06 01:32:00,au2310,463.34,463.34,463.3,463.3,6,55807.0

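In the 5-minute bars, however, the buckets are still 01:25-01:30 and 01:30-01:35. So 5-minute bar times are not derived from the 1-minute bar times; otherwise the 5-minute bar would start at 01:31.

This also causes a problem when building 5-minute bars from 1-minute bars. If the 1-minute bars are continuous, a 5-minute bar is complete whenever the bar's minute plus 1 is divisible by 5. But what if the 1-minute bars jump from minute 03 straight to minute 06?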
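One way around the gap problem, sketched here as a suggestion rather than as Wenhua's method, is to assign each 1-minute bar to a fixed 5-minute bucket by flooring its start minute instead of counting consecutive bars. The function names are hypothetical, and the sketch assumes the 1-minute timestamps are bar start times and that the volume column is per-bar volume.

```python
from datetime import datetime

def five_minute_start(bar_start):
    """Floor a 1-minute bar's start time to its 5-minute boundary."""
    return bar_start.replace(minute=bar_start.minute - bar_start.minute % 5,
                             second=0, microsecond=0)

def merge_to_five_minutes(bars):
    """Merge time-ordered 1-minute bars (dicts) into 5-minute bars."""
    merged = []
    for bar in bars:
        start = five_minute_start(bar['start'])
        if merged and merged[-1]['start'] == start:
            cur = merged[-1]
            cur['high'] = max(cur['high'], bar['high'])
            cur['low'] = min(cur['low'], bar['low'])
            cur['close'] = bar['close']
            cur['volume'] += bar['volume']
        else:
            merged.append(dict(bar, start=start))
    return merged

one_minute = [
    {'start': datetime(2023, 9, 6, 1, 29), 'open': 463.34, 'high': 463.34,
     'low': 463.34, 'close': 463.34, 'volume': 4},
    {'start': datetime(2023, 9, 6, 1, 31), 'open': 463.36, 'high': 463.36,
     'low': 463.36, 'close': 463.36, 'volume': 1},
    {'start': datetime(2023, 9, 6, 1, 32), 'open': 463.34, 'high': 463.34,
     'low': 463.3, 'close': 463.3, 'volume': 6},
]
five_minute = merge_to_five_minutes(one_minute)
for bar in five_minute:
    print(bar['start'], bar['open'], bar['close'], bar['volume'])
```

With the three sample bars, 01:29 lands in the 01:25 bucket while 01:31 and 01:32 land in the 01:30 bucket, even though no 01:30 bar exists. A bucket can be treated as finished when a bar from a later bucket arrives or the session ends. Session breaks such as 10:15-10:30 are not handled in this sketch.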

posted @ C羽言