Tushare's get_k_data

挖地兔 recently released a new version of tushare. The main addition is a new function, get_k_data. This post walks through that function.

The function header:

def get_k_data(code=None, start='', end='',
                  ktype='D', autype='qfq', 
                  index=False,
                  retry_count=3,
                  pause=0.001):
    """
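    Get k-line (candlestick) data
    ---------
    Parameters:
      code:string
                  stock code, e.g. 600848
      start:string
                  start date, format: YYYY-MM-DD; the current date is used when empty
      end:string
                  end date, format: YYYY-MM-DD; this day last year is used when empty
      autype:string
                  adjustment type: qfq = forward-adjusted, hfq = backward-adjusted, None = unadjusted; default qfq
      ktype:string
                  data type: D = daily k-line, W = weekly, M = monthly, 5 = 5-minute, 15 = 15-minute, 30 = 30-minute, 60 = 60-minute; default D
      retry_count : int, default 3
                 number of times to retry on network or similar problems
      pause : int, default 0
                seconds to pause between repeated requests, to avoid problems caused by requests sent too close together
      drop_factor : bool, default True
                whether to drop the adjustment factor; it may mean little during analysis, but keeping it is more flexible if the data is stored in a database first and analysed later
    """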

 

Next, a line-by-line analysis of the body of get_k_data:

symbol = ct.INDEX_SYMBOL[code] if index else _code_to_symbol(code)
url = ''
dataflag = ''

If index is True, the symbol is looked up directly in a predefined dictionary (ct.INDEX_SYMBOL); if index is False, the function _code_to_symbol is called:

def _code_to_symbol(code):
    """
        Generate the symbol identifier for a code
    """
    if code in ct.INDEX_LABELS:
        return ct.INDEX_LIST[code]
    else:
        if len(code) != 6 :
            return ''
        else:
            return 'sh%s'%code if code[:1] in ['5', '6', '9'] else 'sz%s'%code

The definitions of INDEX_LABELS and INDEX_LIST are:

INDEX_LABELS = ['sh', 'sz', 'hs300', 'sz50', 'cyb', 'zxb', 'zx300', 'zh500']
INDEX_LIST = {'sh': 'sh000001', 'sz': 'sz399001', 'hs300': 'sz399300',
              'sz50': 'sh000016', 'zxb': 'sz399005', 'cyb': 'sz399006', 'zx300': 'sz399008', 'zh500':'sh000905'}

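For a six-digit code that is not one of the index labels: if it starts with '5', '6' or '9', sh is prepended; otherwise sz is prepended. A code of any other length yields an empty string.

So the main job of symbol is to prefix the code with sh or sz.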
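The mapping can be tried in isolation. The sketch below copies INDEX_LABELS, INDEX_LIST and the logic of _code_to_symbol from this post into a standalone function. The name code_to_symbol is used only for this illustration and is not part of the tushare API.

```python
# Standalone copy of the symbol-mapping logic, for illustration only.
INDEX_LABELS = ['sh', 'sz', 'hs300', 'sz50', 'cyb', 'zxb', 'zx300', 'zh500']
INDEX_LIST = {'sh': 'sh000001', 'sz': 'sz399001', 'hs300': 'sz399300',
              'sz50': 'sh000016', 'zxb': 'sz399005', 'cyb': 'sz399006',
              'zx300': 'sz399008', 'zh500': 'sh000905'}


def code_to_symbol(code):
    """Return the exchange-prefixed symbol for a code, or '' if it is invalid."""
    if code in INDEX_LABELS:
        # Index shortcuts map straight to a full symbol.
        return INDEX_LIST[code]
    if len(code) != 6:
        # Anything that is not a six-character code is rejected.
        return ''
    prefix = 'sh' if code[:1] in ('5', '6', '9') else 'sz'
    return prefix + code


print(code_to_symbol('600848'))  # sh600848
print(code_to_symbol('002792'))  # sz002792
print(code_to_symbol('hs300'))   # sz399300
```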

    if ktype.upper() in ct.K_LABELS:                                       # K_LABELS = ['D', 'W', 'M']
        fq = autype if autype is not None else ''                          # whether to adjust, and which adjustment type
        if code[:1] in ('1', '5') or index:                                # code starts with '1' or '5', or index is True (an index)
            fq = ''                                                        # no adjustment type in that case
        kline = '' if autype is None else 'fq'                             # only autype=None means unadjusted
        url = ct.KLINE_TT_URL%(ct.P_TYPE['http'], ct.DOMAINS['tt'],        # P_TYPE = {'http': 'http://', 'ftp': 'ftp://'}; DOMAINS is defined below
                                kline, fq, symbol,                         # '' or 'fq'; the adjustment type or ''; the code prefixed with sh or sz
                                ct.TT_K_TYPE[ktype.upper()], start, end,   # TT_K_TYPE = {'D': 'day', 'W': 'week', 'M': 'month'}
                                fq, _random(17))                           # the adjustment type or ''; a random integer between 10**16 and 10**17-1
        dataflag = '%s%s'%(fq, ct.TT_K_TYPE[ktype.upper()])                # the adjustment type (or '') followed by 'day', 'week' or 'month'
    elif ktype in ct.K_MIN_LABELS:                                         # K_MIN_LABELS = ['5', '15', '30', '60']
        url = ct.KLINE_TT_MIN_URL%(ct.P_TYPE['http'], ct.DOMAINS['tt'],    # essentially the same as above
                                    symbol, ktype, ktype,
                                    _random(16))
        dataflag = 'm%s'%ktype                                             # 'm5', 'm15', 'm30' or 'm60'
    else:
        raise TypeError('ktype input error.')

Definition of DOMAINS:
DOMAINS = {'sina': 'sina.com.cn', 'sinahq': 'sinajs.cn',
           'ifeng': 'ifeng.com', 'sf': 'finance.sina.com.cn',
           'vsf': 'vip.stock.finance.sina.com.cn', 
           'idx': 'www.csindex.com.cn', '163': 'money.163.com',
           'em': 'eastmoney.com', 'sseq': 'query.sse.com.cn',
           'sse': 'www.sse.com.cn', 'szse': 'www.szse.cn',
           'oss': '218.244.146.57', 'idxip':'115.29.204.48',
           'shibor': 'www.shibor.org', 'mbox':'www.cbooo.cn',
           'tt': 'gtimg.cn'}

  

Definitions of the two URL templates used above:

KLINE_TT_URL = '%sweb.ifzq.%s/appstock/app/%skline/get?_var=kline_day%s&param=%s,%s,%s,%s,320,%s&r=0.%s'
KLINE_TT_MIN_URL = '%sifzq.%s/appstock/app/kline/mkline?param=%s,m%s,,320&_var=m%s_today&r=0.%s'
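To see what these templates expand to, here is a minimal sketch that rebuilds the URL and dataflag outside tushare. The constants are copied from this post. The helper name build_kline_url and its rand parameter are hypothetical: rand is a fixed stand-in for _random(17) or _random(16) so the result is repeatable, and the special case that clears fq for indexes and for codes starting with '1' or '5' is left out.

```python
# Constants copied from the post; only the entries needed here.
P_TYPE = {'http': 'http://', 'ftp': 'ftp://'}
DOMAINS = {'tt': 'gtimg.cn'}
TT_K_TYPE = {'D': 'day', 'W': 'week', 'M': 'month'}
K_MIN_LABELS = ['5', '15', '30', '60']
KLINE_TT_URL = ('%sweb.ifzq.%s/appstock/app/%skline/get?_var=kline_day%s'
                '&param=%s,%s,%s,%s,320,%s&r=0.%s')
KLINE_TT_MIN_URL = ('%sifzq.%s/appstock/app/kline/mkline?param=%s,m%s,,320'
                    '&_var=m%s_today&r=0.%s')


def build_kline_url(symbol, ktype='D', autype='qfq', start='', end='',
                    rand='12345678901234567'):
    """Return (url, dataflag) for a symbol (hypothetical helper).

    `rand` replaces the random digits the real code appends after 'r=0.'.
    """
    if ktype.upper() in TT_K_TYPE:
        fq = autype if autype is not None else ''
        kline = '' if autype is None else 'fq'
        period = TT_K_TYPE[ktype.upper()]
        url = KLINE_TT_URL % (P_TYPE['http'], DOMAINS['tt'], kline, fq,
                              symbol, period, start, end, fq, rand)
        return url, fq + period
    if ktype in K_MIN_LABELS:
        url = KLINE_TT_MIN_URL % (P_TYPE['http'], DOMAINS['tt'],
                                  symbol, ktype, ktype, rand)
        return url, 'm' + ktype
    raise TypeError('ktype input error.')


url, dataflag = build_kline_url('sz002792', autype='hfq')
print(url)
print(dataflag)  # hfqday
```

With autype='hfq' the query string contains _var=kline_dayhfq, which is why the response shown later in this post begins with kline_dayhfq=.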

 

    for _ in range(retry_count):                                     # retry_count is the number of attempts; _ is just a loop variable, like i
        time.sleep(pause)                                            # pause between attempts
        try:
            request = Request(url)                                   # use the url built above
            lines = urlopen(request, timeout = 10).read()            # read the response
            if len(lines) < 100: #no data                            # a very short response means no data was returned
                return None
        except Exception as e:
            print(e)
        else:
            lines = lines.decode('utf-8') if ct.PY3 else lines      # PY3 = (sys.version_info[0] >= 3); a sample of the decoded lines is shown below
            lines = lines.split('=')[1]                             # split on '=' and take the piece at index 1, i.e. the text after the first '='
            reg = re.compile(r',{"nd.*?}') 
            lines = re.subn(reg, '', lines)                         # regex substitution on lines
            js = json.loads(lines[0])                               # lines[0] because subn returns a tuple; lines[1] is the number of substitutions
            df = pd.DataFrame(js['data'][symbol][dataflag], columns=ct.KLINE_TT_COLS)   # KLINE_TT_COLS holds the six column names: date, open, close, etc.
            df['code'] = symbol if index else code                                      # add a code column holding the index symbol or the stock code
            if ktype in ct.K_MIN_LABELS:                                                # for minute k-line data
                df['date'] = df['date'].map(lambda x: '%s-%s-%s %s:%s'%(x[0:4], x[4:6], 
                                                                        x[6:8], x[8:10], 
                                                                        x[10:12]))      # reformat date as YYYY-MM-DD HH:MM
            return df
    raise IOError(ct.NETWORK_URL_ERROR_MSG)
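The post-processing in the else branch can be tried offline on a hand-made response. The sketch below uses only the standard library, so the pd.DataFrame step is replaced by a list of dicts. Several things here are assumptions for illustration: the short raw string only imitates the shape of the server reply, its {"nd": ...} element is invented so that the regular expression has something to remove, the column names other than date, open and close are a guess at ct.KLINE_TT_COLS, and format_minute is a hypothetical helper wrapping the lambda from the listing.

```python
import json
import re

# Hand-made response in the same shape as the real one (illustrative data).
raw = ('kline_dayhfq={"code":0,"msg":"","data":{"sz002792":{"hfqday":['
       '["2016-10-26","84.635","82.541","85.268","82.149","27380.000"],'
       '["2016-10-27","82.707","82.556","83.038","80.748","22315.000",{"nd":"2016"}]'
       '],"prec":"22.940","version":"5"}}}')

symbol, dataflag = 'sz002792', 'hfqday'
# Assumed column names; the post only says there are six, starting with date.
cols = ['date', 'open', 'close', 'high', 'low', 'volume']

body = raw.split('=')[1]                            # text after the first '='
cleaned, count = re.subn(r',{"nd.*?}', '', body)    # subn returns (new_string, n_subs)
js = json.loads(cleaned)
rows = js['data'][symbol][dataflag]
records = [dict(zip(cols, row)) for row in rows]    # stand-in for pd.DataFrame


def format_minute(x):
    """Turn 'YYYYMMDDHHMM' into 'YYYY-MM-DD HH:MM', as the lambda above does."""
    return '%s-%s-%s %s:%s' % (x[0:4], x[4:6], x[6:8], x[8:10], x[10:12])


print(count)                           # 1
print(records[1]['volume'])            # 22315.000
print(format_minute('201611091500'))   # 2016-11-09 15:00
```

Note that the values stay strings after json.loads, exactly as they arrive in the response; any numeric conversion has to happen afterwards.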

A sample of the decoded lines:

kline_dayhfq={"code":0,"msg":"","data":{"sz002792":{"hfqday":[["2016-10-26","84.635","82.541","85.268","82.149","27380.000"],
["2016-10-27","82.707","82.556","83.038","80.748","22315.000"],["2016-10-28","82.903","82.571","83.731","78.428","22165.000"],
["2016-10-31","82.541","81.502","82.556","79.995","16437.000"],["2016-11-01","81.517","84.319","85.072","81.517","30741.000"],
["2016-11-02","84.349","82.873","85.268","82.707","30526.000"],["2016-11-03","81.200","81.984","83.611","81.200","24593.000"],
["2016-11-04","81.863","85.720","86.729","81.863","57996.000"],["2016-11-07","85.464","85.991","86.383","84.756","31572.000"],
["2016-11-08","86.292","84.801","86.322","79.845","29328.000"]],
"qt":{"sz002792":["51","\u901a\u5b87\u901a\u8baf","002792","55.91","56.29","56.25","36536","18510","18026","55.91","38","55.90","127",
"55.89","201","55.85","10","55.83","10","55.99","30","56.00","3","56.10","10","56.12","8","56.15","26",
"15:00:04\/55.91\/301\/S\/1682891\/15265|14:57:00\/55.89\/1\/B\/5589\/15163|14:56:52\/55.71\/90\/S\/503812\/15154|
14:56:45\/55.89\/18\/B\/100602\/15146|14:56:39\/55.82\/8\/S\/44544\/15140|14:56:36\/56.12\/12\/B\/67324\/15136","20161109150137",
"-0.38","-0.68","56.75","54.46","55.89\/36235\/201929177","36536","20361","8.12","56.40","","56.75","54.46","4.07","25.16",
"125.80","7.09","61.92","50.66","1.05"],"market":["2016-11-09 20:57:01|HK_close_\u5df2\u6536\u76d8|SH_close_\u5df2\u6536\u76d8|
SZ_close_\u5df2\u6536\u76d8|US_close_\u672a\u5f00\u76d8|SQ_close_\u5df2\u4f11\u5e02|DS_close_\u5df2\u4f11\u5e02|ZS_close_
\u5df2\u4f11\u5e02"],"zjlx":["sz002792","8206.89","10347.24","-2140.35","-10.51","12154.32","10013.97","2140.35","10.51",
"20361.21","41080.23","41732.96","\u901a\u5b87\u901a\u8baf","20161109","20161108^5889.20^7540.99","20161107^6888.64^7504.11",
"20161104^15471.59^10227.30","20161103^4623.91^6113.32"]},"mx_price":{"mx":{"data":[],"timeline":[]},"price":{"data":[]}},
"prec":"22.940","version":"5"}}}

  

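This is probably the only time I will pick through code in this much detail; in future I should work more efficiently and keep my notes briefer.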

posted on 2016-11-10 19:41  mickey_yzy