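Blog: https://www.cnblogs.com/zylyehuo/
- Stock analysis
  - Use the tushare package to fetch the historical price data of a stock.
  - Output every date on which the stock closed more than 3% above its open.
  - Output every date on which the stock opened more than 2% below the previous day's close.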
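Development environment
- anaconda
  - An integrated distribution that bundles the packages needed for data analysis and machine learning
  - The installation path must not contain Chinese characters or special symbols
- jupyter
  - A browser-based interactive development tool that ships with anaconda
tushare: a financial data interface package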
!pip install -i https://pypi.tuna.tsinghua.edu.cn/simple tushare
import tushare as ts
import pandas as pd
from pandas import DataFrame,Series
import numpy as np
Fetch the historical price data of a stock
# code: the stock code, passed as a string
df = ts.get_k_data(code='600519',start='2000-01-01')
df
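Save the stock data fetched from the internet to a local file
df.to_csv('./maotai.csv') # the to_xxx methods write the data in df to local storage
Load the locally stored data back into df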
df = pd.read_csv('./maotai.csv')
df.head()
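Process the loaded data
Delete a specified column from df
# 'Unnamed: 0' is the old row index, which to_csv wrote out as an unnamed first column
# In drop: axis=0 refers to rows, axis=1 refers to columns
# inplace=True applies the operation directly to the source data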
df.drop(labels='Unnamed: 0',axis=1,inplace=True)
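The extra column can also be avoided at write time. The following is a minimal sketch, assuming a small made-up frame (`sample`) and in-memory buffers in place of `./maotai.csv`; it compares the default `to_csv` behaviour with `index=False`.

```python
import io
import pandas as pd

# Hypothetical sample data standing in for the downloaded quotes
sample = pd.DataFrame({
    'date': ['2020-01-02', '2020-01-03'],
    'open': [100.0, 101.5],
    'close': [101.0, 100.5],
})

# Default behaviour: the RangeIndex is written as an unnamed first column
with_index = io.StringIO()
sample.to_csv(with_index)
with_index.seek(0)
reloaded_default = pd.read_csv(with_index)

# index=False: only the real columns are written
without_index = io.StringIO()
sample.to_csv(without_index, index=False)
without_index.seek(0)
reloaded_clean = pd.read_csv(without_index)

print(list(reloaded_default.columns))  # ['Unnamed: 0', 'date', 'open', 'close']
print(list(reloaded_clean.columns))    # ['date', 'open', 'close']
```

With `index=False` the later `drop` call becomes unnecessary, at the cost of not storing the index at all.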
Check the data type of each column
# df['date'].dtype
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5289 entries, 0 to 5288
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 date 5289 non-null object
1 open 5289 non-null float64
2 close 5289 non-null float64
3 high 5289 non-null float64
4 low 5289 non-null float64
5 volume 5289 non-null float64
6 code 5289 non-null int64
dtypes: float64(5), int64(1), object(1)
memory usage: 289.4+ KB
Convert the date column to a datetime type
df['date'] = pd.to_datetime(df['date'])
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5289 entries, 0 to 5288
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 date 5289 non-null datetime64[ns]
1 open 5289 non-null float64
2 close 5289 non-null float64
3 high 5289 non-null float64
4 low 5289 non-null float64
5 volume 5289 non-null float64
6 code 5289 non-null int64
dtypes: datetime64[ns](1), float64(5), int64(1)
memory usage: 289.4 KB
Use the date column as the row index of the source data
df.set_index('date',inplace=True)
df.head()
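Converting the dates and moving them into the index is what makes date-based selection possible later on. A small sketch of that effect, using a hypothetical four-row frame (`quotes`) rather than the real data:

```python
import pandas as pd

# Hypothetical quotes; the dates start out as plain strings (dtype object)
quotes = pd.DataFrame({
    'date': ['2019-12-31', '2020-01-02', '2020-01-03', '2020-02-03'],
    'open': [10.0, 10.5, 10.8, 11.0],
})

quotes['date'] = pd.to_datetime(quotes['date'])  # strings -> datetime values
quotes = quotes.set_index('date')                # dates become the row index

# Partial-string selection: every row that falls in January 2020
january = quotes.loc['2020-01']
print(january)
```

Selecting by `'2020-01'` only works because the index is a `DatetimeIndex`; on the original string column the same lookup would fail.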
Output every date on which the stock closed more than 3% above its open
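# Pseudocode: (close - open) / open > 0.03
(df['close'] - df['open']) / df['open'] > 0.03
# When a step of the analysis yields boolean values, use them right away as a row index on the source data
# With booleans as the row index of df, the rows marked True are selected and the rows marked False are ignored
df.loc[(df['close'] - df['open']) / df['open'] > 0.03] # rows marked True (the rows that meet the requirement)
df.loc[(df['close'] - df['open']) / df['open'] > 0.03].index # row index (dates) of the matching rows
# The index printed below appears to have been recorded with the operands reversed, (open - close) / open > 0.03; rerun the cell to list the dates for the condition above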
DatetimeIndex(['2006-12-14', '2006-12-18', '2007-01-12', '2007-01-18',
'2007-01-19', '2007-01-24', '2007-02-01', '2007-03-06',
'2007-04-12', '2007-04-13',
...
'2021-12-29', '2022-01-13', '2022-01-28', '2022-03-07',
'2022-10-10', '2022-10-19', '2022-10-24', '2022-10-27',
'2023-04-12', '2023-08-28'],
dtype='datetime64[ns]', name='date', length=798, freq=None)
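The direction of the subtraction decides whether rising or falling days are selected. A minimal sketch on three made-up trading days (the frame `prices` and its values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical prices for three trading days
prices = pd.DataFrame(
    {'open': [100.0, 100.0, 100.0], 'close': [104.0, 102.0, 96.0]},
    index=pd.to_datetime(['2020-01-02', '2020-01-03', '2020-01-06']),
)

# Intraday change relative to the open: (close - open) / open
intraday_change = (prices['close'] - prices['open']) / prices['open']

# Boolean masks used as row selectors
rose_over_3pct = prices.loc[intraday_change > 0.03].index
fell_over_3pct = prices.loc[intraday_change < -0.03].index

print(rose_over_3pct)  # only 2020-01-02 (+4%)
print(fell_over_3pct)  # only 2020-01-06 (-4%)
```

Naming the intermediate series once also avoids repeating the same expression in every `df.loc[...]` call.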
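Output every date on which the stock opened more than 2% below the previous day's close
# Pseudocode: (open - previous close) / previous close < -0.02
(df['open'] - df['close'].shift(1))/df['close'].shift(1) < -0.02
# Use the boolean values as the row index of the source data to select the rows marked True
# df['close'].shift(1) shifts the whole column down by one row and fills the gap with NaN
df.loc[(df['open'] - df['close'].shift(1))/df['close'].shift(1) < -0.02]
df.loc[(df['open'] - df['close'].shift(1))/df['close'].shift(1) < -0.02].index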
DatetimeIndex(['2001-09-12', '2002-06-26', '2002-12-13', '2004-07-01',
'2004-10-29', '2006-08-21', '2006-08-23', '2007-01-25',
'2007-02-01', '2007-02-06', '2007-03-19', '2007-05-21',
'2007-05-30', '2007-06-05', '2007-07-27', '2007-09-05',
'2007-09-10', '2008-03-13', '2008-03-17', '2008-03-25',
'2008-03-27', '2008-04-22', '2008-04-23', '2008-04-29',
'2008-05-13', '2008-06-10', '2008-06-13', '2008-06-24',
'2008-06-27', '2008-08-11', '2008-08-19', '2008-09-23',
'2008-10-10', '2008-10-15', '2008-10-16', '2008-10-20',
'2008-10-23', '2008-10-27', '2008-11-06', '2008-11-12',
'2008-11-20', '2008-11-21', '2008-12-02', '2009-02-27',
'2009-03-25', '2009-08-13', '2010-04-26', '2010-04-30',
'2011-08-05', '2012-03-27', '2012-08-10', '2012-11-22',
'2012-12-04', '2012-12-24', '2013-01-16', '2013-01-25',
'2013-09-02', '2014-04-25', '2015-01-19', '2015-05-25',
'2015-07-03', '2015-07-08', '2015-07-13', '2015-08-24',
'2015-09-02', '2015-09-15', '2017-11-17', '2018-02-06',
'2018-02-09', '2018-03-23', '2018-03-28', '2018-07-11',
'2018-10-11', '2018-10-24', '2018-10-25', '2018-10-29',
'2018-10-30', '2019-05-06', '2019-05-08', '2019-10-16',
'2020-01-02', '2020-02-03'],
dtype='datetime64[ns]', name='date', freq=None)
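To see what `shift(1)` does to the comparison, here is a small sketch on three hypothetical trading days. The first row has no previous close, so its gap is NaN and it can never satisfy the condition.

```python
import pandas as pd

# Hypothetical prices for three trading days
prices = pd.DataFrame(
    {'open': [100.0, 97.0, 99.5], 'close': [100.0, 99.0, 101.0]},
    index=pd.to_datetime(['2020-01-02', '2020-01-03', '2020-01-06']),
)

# Previous day's close aligned with the current row: NaN, 100.0, 99.0
prev_close = prices['close'].shift(1)

# Overnight gap relative to the previous close: NaN, -3%, about +0.5%
gap = (prices['open'] - prev_close) / prev_close

# NaN compares as False, so the first day is dropped automatically
gap_down_days = prices.loc[gap < -0.02].index
print(gap_down_days)  # only 2020-01-03
```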
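Requirement
============================================
Suppose that, starting from January 1, 2010, I buy one lot of the stock on the first trading day of every month and sell all the shares I hold on the last trading day of every year. What is my return to date?
============================================
- Analysis:
  - Time range: 2010-2020
  - One lot: 100 shares
  - Buy: one lot (100 shares) per month, i.e. 12 lots (1200 shares) in a full year
  - Sell: all 1200 shares bought during the year, once per year
  - Unit price for buying and selling: the opening price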
new_df = df.loc['2010-01':'2020-02'] # slicing by date strings only works because the row index is a DatetimeIndex
new_df
new_df.head(2)
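Buying the stock
# Find the row for the first trading day of each month (to get its opening price) ==> the first row of each month
# 'M': group the original data by calendar month (pandas 2.2 and later prefer the alias 'ME')
# Row for the first trading day of each month; the result is labelled with month-end dates, not the actual trading dates
df_monthly = new_df.resample('M').first() # resample the data
df_monthly
# Total amount spent buying the stock (100 shares per month)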
cost = df_monthly['open'].sum()*100
cost
2397463.1
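The buy step can also be written so that the actual trading dates are kept. The sketch below swaps `resample` for a `groupby` on monthly periods followed by `head(1)`; the frame `daily` and its prices are made up for illustration.

```python
import pandas as pd

# Hypothetical daily opening prices across three months
daily = pd.DataFrame(
    {'open': [10.0, 10.2, 11.0, 11.3, 12.0]},
    index=pd.to_datetime([
        '2020-01-02', '2020-01-03', '2020-02-03', '2020-02-04', '2020-03-02',
    ]),
)

# One group per calendar month; head(1) keeps the first row of each group
month_key = daily.index.to_period('M')
first_days = daily.groupby(month_key).head(1)

# One lot (100 shares) bought at the open of each first trading day
cost = first_days['open'].sum() * 100

print(first_days.index)  # 2020-01-02, 2020-02-03, 2020-03-02
print(cost)              # 3300.0
```

Unlike `resample(...).first()`, this keeps the original index, so each purchase can be traced back to the day it happened.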
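Selling the stock
# Proceeds from selling the stock
# Special case: the shares bought in 2020 cannot be sold yet (everything is sold only on the last trading day of each year)
new_df.resample('A').last()
# Drop the last row, which holds the 2020 data
# 'A': group the original data by year (pandas 2.2 and later prefer the alias 'YE')
df_yearly = new_df.resample('A').last()[:-1]
df_yearly
# Proceeds from selling the stock: 12 lots (1200 shares) per year, at the opening price
resv = df_yearly['open'].sum()*1200
resv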
2798833.1999999997
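# The shares still held at the end must be valued and included in the total return
# Use the most recent closing price as the unit price of the remaining shares
last_money = 200*new_df['close'].iloc[-1] # 200 shares: the two lots bought in January and February 2020
last_money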
190237.2
# Compute the total return: proceeds + value of the remaining shares - cost
resv+last_money-cost
591607.2999999998
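The whole calculation can be collected into one function. This is only a sketch under stated assumptions: the helper name `monthly_buy_yearly_sell` is hypothetical, trades execute at the opening price, the final calendar year in the data is treated as unfinished (its shares are valued at the last close instead of being sold), and the six rows of `toy` are made-up prices.

```python
import pandas as pd

def monthly_buy_yearly_sell(prices, lot=100):
    """Return (cost, proceeds, holding_value) for the buy-monthly, sell-yearly strategy.

    Assumes `prices` has a sorted DatetimeIndex and 'open'/'close' columns.
    """
    buys = prices.groupby(prices.index.to_period('M')).head(1)  # first trading day per month
    sells = prices.groupby(prices.index.year).tail(1)           # last trading day per year

    cost = proceeds = 0.0
    shares = 0
    last_year = prices.index[-1].year
    for year, year_buys in buys.groupby(buys.index.year):
        cost += year_buys['open'].sum() * lot
        shares = len(year_buys) * lot
        if year < last_year:  # the final year is treated as unfinished
            sell_price = sells.loc[sells.index.year == year, 'open'].iloc[0]
            proceeds += shares * sell_price
            shares = 0
    holding_value = shares * prices['close'].iloc[-1]
    return cost, proceeds, holding_value

# Hypothetical data: three purchases in 2019, two in 2020
toy = pd.DataFrame(
    {'open': [10.0, 12.0, 14.0, 15.0, 16.0, 18.0],
     'close': [10.0, 12.0, 14.0, 15.0, 17.0, 20.0]},
    index=pd.to_datetime([
        '2019-01-02', '2019-02-01', '2019-12-02', '2019-12-31',
        '2020-01-02', '2020-02-03',
    ]),
)

cost, proceeds, holding_value = monthly_buy_yearly_sell(toy)
print(cost, proceeds, holding_value)     # 7000.0 4500.0 4000.0
print(proceeds + holding_value - cost)   # 1500.0
```

In the toy data, 300 shares bought in 2019 for 3600 are sold at 15 for 4500, and the 200 shares bought in 2020 for 3400 are valued at the last close of 20, giving 4000.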