Official pandas documentation: https://pandas.pydata.org/pandas-docs/stable/?v=20190307135750
pandas is built on top of NumPy and is designed for working with text and tabular data. It provides two main data structures:
Series, which is similar to a one-dimensional NumPy array
DataFrame, which is a two-dimensional, table-like data structure
pandas is the core module for data analysis in Python. It is used for data analysis, data processing, and data visualization, and offers five main areas of functionality:
File I/O: reading and writing databases (SQL), HTML, JSON, pickle, CSV (txt, Excel), SAS, Stata, HDF, and more.
Single-table operations: create/read/update/delete, slicing, higher-order functions, group-by aggregation, and conversion to and from dict and list.
Joining and merging multiple tables.
Basic plotting.
Basic statistical analysis.
1. The Series data structure
A Series is a one-dimensional, array-like object consisting of a sequence of data (of any type) and an associated array of data labels (the index).
A Series behaves like a cross between a list (array) and a dictionary.
import numpy as np
import pandas as pd

df = pd.Series(0, index=['a', 'b', 'c', 'd'])
print(df)
a    0
b    0
c    0
d    0
dtype: int64
print(df.values)
print(df.index)
1.1 Three ways to create a Series
1. The simplest Series can be created from a plain list of data:
import pandas as pd

s1 = pd.Series([1, 'a', 5.2, 7])
print(s1)
'''
0      1
1      a
2    5.2
3      7
dtype: object
'''
print(s1.index)
print(s1.values)
2. Create a Series with a labeled index:
s2 = pd.Series([1, 'a', 5.2, 7], index=['d', 'b', 'a', 'c'])
print(s2)
print(s2.index)
print(s2.values)
d      1
b      a
a    5.2
c      7
dtype: object
Index(['d', 'b', 'a', 'c'], dtype='object')
[1 'a' 5.2 7]
3. Create a Series from a Python dict:
sdata = {'Ohio': 35000, 'Texas': 72000, 'Origon': 16000, 'Utah': 5000}
s3 = pd.Series(sdata)
print(s3)
Ohio      35000
Texas     72000
Origon    16000
Utah       5000
dtype: int64
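When a dict is combined with an explicit index, pandas aligns the values by key; labels missing from the dict become NaN. A minimal sketch ('California' is just an illustrative label not present in the dict):

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 72000, 'Origon': 16000, 'Utah': 5000}
# An explicit index aligns values by key; 'California' has no entry
# in sdata, so its value becomes NaN (and the dtype becomes float64).
s4 = pd.Series(sdata, index=['California', 'Ohio', 'Texas', 'Utah'])
print(s4)
```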
1.2 Series supports NumPy features (positional access)

Operation                        Example
Create a Series from an ndarray  Series(arr)
Arithmetic with a scalar         df*2
Arithmetic between two Series    df1+df2
Indexing                         df[0], df[[1,2,4]]
Slicing                          df[0:2]
Universal functions              np.abs(df)
Boolean filtering                df[df>0]
arr = np.array([1, 2, 3, 4, np.nan])
print(arr)
df = pd.Series(arr, index=['a', 'b', 'c', 'd', 'e'])
print(df)
a    1.0
b    2.0
c    3.0
d    4.0
e    NaN
dtype: float64
print(df**2)
a     1.0
b     4.0
c     9.0
d    16.0
e     NaN
dtype: float64
print(df[0])
print(df['a'])
print(df[[0, 1, 2]])
a    1.0
b    2.0
c    3.0
dtype: float64
print(df[0:2])
a    1.0
b    2.0
dtype: float64
print(np.sin(df))
a    0.841471
b    0.909297
c    0.141120
d   -0.756802
e         NaN
dtype: float64
print(df[df > 1])
b    2.0
c    3.0
d    4.0
dtype: float64
1.3 Series supports dict features (label access)

Operation                     Example
Create a Series from a dict   Series(dic)
`in` operator                 'a' in sr
Label indexing                sr['a'], sr[['a', 'b', 'd']]

df = pd.Series({'a': 1, 'b': 2})
print(df)
a    1
b    2
dtype: int64
print('a' in df)
print(df['a'])
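Like a dict, a Series also supports .get() with a default value for missing labels. A small sketch:

```python
import pandas as pd

sr = pd.Series({'a': 1, 'b': 2})
# .get() works like dict.get(): it returns the default instead of
# raising KeyError when the label is missing.
print(sr.get('a', 0))
print(sr.get('c', 0))
```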
1.4 Handling missing data in a Series

Method      Description
dropna()    drop entries whose value is NaN
fillna()    fill in missing values
isnull()    return a boolean array, True where values are missing
notnull()   return a boolean array, False where values are missing
df = pd.Series([1, 2, 3, 4, np.nan], index=['a', 'b', 'c', 'd', 'e'])
print(df)
a 1.0
b 2.0
c 3.0
d 4.0
e NaN
dtype: float64
print(df.dropna())
a 1.0
b 2.0
c 3.0
d 4.0
dtype: float64
print(df.fillna(5))
a 1.0
b 2.0
c 3.0
d 4.0
e 5.0
dtype: float64
df1 = pd.Series([1, 2, 3, 4, np.nan, np.nan], index=['a', 'b', 'c', 'd', 'e', 'f'])
print(df1.fillna(6))
a 1.0
b 2.0
c 3.0
d 4.0
e 6.0
f 6.0
dtype: float64
print(df.isnull())
a False
b False
c False
d False
e True
dtype: bool
print(df.notnull())
a True
b True
c True
d True
e False
dtype: bool
2. The DataFrame data structure
A DataFrame is a tabular data structure containing an ordered collection of columns.
A DataFrame can be thought of as a dict of Series that all share a common index.
2.1 Generating an array of timestamps: date_range
date_range parameters:

Parameter   Description
start       start time
end         end time
periods     number of periods to generate
freq        frequency; defaults to 'D' (calendar day). Other options include H (hour), W (week), B (business day), M (month end), SM (semi-month end), T or min (minute), S (second), A (year end), ...

dates = pd.date_range('20190101', periods=6, freq='M')
print(dates)
DatetimeIndex(['2019-01-31', '2019-02-28', '2019-03-31', '2019-04-30',
               '2019-05-31', '2019-06-30'],
              dtype='datetime64[ns]', freq='M')
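The same call also works with a start/end pair instead of periods, and with other frequencies. A small sketch:

```python
import pandas as pd

# Four business days starting 2019-01-01 (a Tuesday): Jan 1, 2, 3, 4
bdays = pd.date_range('20190101', periods=4, freq='B')
print(bdays)

# Every calendar day between two endpoints (both inclusive)
days = pd.date_range(start='2019-01-01', end='2019-01-07', freq='D')
print(days)
```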
np.random.seed(1)
arr = 10 * np.random.randn(6, 4)
print(arr)
[[ 16.24345364  -6.11756414  -5.28171752 -10.72968622]
 [  8.65407629 -23.01538697  17.44811764  -7.61206901]
 [  3.19039096  -2.49370375  14.62107937 -20.60140709]
 [ -3.22417204  -3.84054355  11.33769442 -10.99891267]
 [ -1.72428208  -8.77858418   0.42213747   5.82815214]
 [-11.00619177  11.4472371    9.01590721   5.02494339]]
df = pd.DataFrame(arr, index=dates, columns=['c1', 'c2', 'c3', 'c4'])
print(df)
                   c1         c2         c3         c4
2019-01-31  16.243454  -6.117564  -5.281718 -10.729686
2019-02-28   8.654076 -23.015387  17.448118  -7.612069
2019-03-31   3.190391  -2.493704  14.621079 -20.601407
2019-04-30  -3.224172  -3.840544  11.337694 -10.998913
2019-05-31  -1.724282  -8.778584   0.422137   5.828152
2019-06-30 -11.006192  11.447237   9.015907   5.024943
3. DataFrame attributes

Attribute     Description
dtypes        data type of each column
index         the row index
columns       the column labels
values        the data as a 2-D array, without the index and column labels
describe      summary statistics for each numeric column (count, mean, std, min/max, quartiles)
transpose     transpose; also available as T
sort_index    sort by row or column index
sort_values   sort by data values

print(df.dtypes)
c1    float64
c2    float64
c3    float64
c4    float64
dtype: object
print(df.index)
DatetimeIndex(['2019-01-31', '2019-02-28', '2019-03-31', '2019-04-30',
               '2019-05-31', '2019-06-30'],
              dtype='datetime64[ns]', freq='M')
print(df.columns)
print(df.values)
[[ 16.24345364  -6.11756414  -5.28171752 -10.72968622]
 [  8.65407629 -23.01538697  17.44811764  -7.61206901]
 [  3.19039096  -2.49370375  14.62107937 -20.60140709]
 [ -3.22417204  -3.84054355  11.33769442 -10.99891267]
 [ -1.72428208  -8.77858418   0.42213747   5.82815214]
 [-11.00619177  11.4472371    9.01590721   5.02494339]]
print(df.describe())
              c1         c2         c3         c4
count   6.000000   6.000000   6.000000   6.000000
mean    2.022213  -5.466424   7.927203  -6.514830
std     9.580084  11.107772   8.707171  10.227641
min   -11.006192 -23.015387  -5.281718 -20.601407
25%    -2.849200  -8.113329   2.570580 -10.931606
50%     0.733054  -4.979054  10.176801  -9.170878
75%     7.288155  -2.830414  13.800233   1.865690
max    16.243454  11.447237  17.448118   5.828152
print(df.T)
    2019-01-31  2019-02-28  2019-03-31  2019-04-30  2019-05-31  2019-06-30
c1   16.243454    8.654076    3.190391   -3.224172   -1.724282  -11.006192
c2   -6.117564  -23.015387   -2.493704   -3.840544   -8.778584   11.447237
c3   -5.281718   17.448118   14.621079   11.337694    0.422137    9.015907
c4  -10.729686   -7.612069  -20.601407  -10.998913    5.828152    5.024943
df.sort_index(axis=0)
                   c1         c2         c3         c4
2019-01-31  16.243454  -6.117564  -5.281718 -10.729686
2019-02-28   8.654076 -23.015387  17.448118  -7.612069
2019-03-31   3.190391  -2.493704  14.621079 -20.601407
2019-04-30  -3.224172  -3.840544  11.337694 -10.998913
2019-05-31  -1.724282  -8.778584   0.422137   5.828152
2019-06-30 -11.006192  11.447237   9.015907   5.024943
df.sort_index(axis=1)
                   c1         c2         c3         c4
2019-01-31  16.243454  -6.117564  -5.281718 -10.729686
2019-02-28   8.654076 -23.015387  17.448118  -7.612069
2019-03-31   3.190391  -2.493704  14.621079 -20.601407
2019-04-30  -3.224172  -3.840544  11.337694 -10.998913
2019-05-31  -1.724282  -8.778584   0.422137   5.828152
2019-06-30 -11.006192  11.447237   9.015907   5.024943
df.sort_values(by='c2')
                   c1         c2         c3         c4
2019-02-28   8.654076 -23.015387  17.448118  -7.612069
2019-05-31  -1.724282  -8.778584   0.422137   5.828152
2019-01-31  16.243454  -6.117564  -5.281718 -10.729686
2019-04-30  -3.224172  -3.840544  11.337694 -10.998913
2019-03-31   3.190391  -2.493704  14.621079 -20.601407
2019-06-30 -11.006192  11.447237   9.015907   5.024943
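sort_values also accepts ascending=False and a list of keys for multi-column sorts. A small sketch on a toy frame:

```python
import pandas as pd

df = pd.DataFrame({'a': [2, 1, 2], 'b': [9, 5, 3]})
# Sort by 'a' ascending, breaking ties with 'b' descending.
out = df.sort_values(by=['a', 'b'], ascending=[True, False])
print(out)
```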
4. Selecting values from a DataFrame
df
                   c1         c2         c3         c4
2019-01-31  16.243454  -6.117564  -5.281718 -10.729686
2019-02-28   8.654076 -23.015387  17.448118  -7.612069
2019-03-31   3.190391  -2.493704  14.621079 -20.601407
2019-04-30  -3.224172  -3.840544  11.337694 -10.998913
2019-05-31  -1.724282  -8.778584   0.422137   5.828152
2019-06-30 -11.006192  11.447237   9.015907   5.024943
4.1 Selecting columns by label
df['c2']
2019-01-31    -6.117564
2019-02-28   -23.015387
2019-03-31    -2.493704
2019-04-30    -3.840544
2019-05-31    -8.778584
2019-06-30    11.447237
Freq: M, Name: c2, dtype: float64
df[['c2', 'c3']]
                   c2         c3
2019-01-31  -6.117564  -5.281718
2019-02-28 -23.015387  17.448118
2019-03-31  -2.493704  14.621079
2019-04-30  -3.840544  11.337694
2019-05-31  -8.778584   0.422137
2019-06-30  11.447237   9.015907
4.2 loc (selecting rows by label)
df.loc['2019-01-31':'2019-03-31']
print(df[0:3])
                   c1         c2         c3         c4
2019-01-31  16.243454  -6.117564  -5.281718 -10.729686
2019-02-28   8.654076 -23.015387  17.448118  -7.612069
2019-03-31   3.190391  -2.493704  14.621079 -20.601407
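loc also takes a column selector as the second argument, so rows and columns can be selected by label in one call. A sketch that rebuilds the frame used above (same seed and shape):

```python
import pandas as pd
import numpy as np

dates = pd.to_datetime(['2019-01-31', '2019-02-28', '2019-03-31',
                        '2019-04-30', '2019-05-31', '2019-06-30'])
np.random.seed(1)
df = pd.DataFrame(10 * np.random.randn(6, 4), index=dates,
                  columns=['c1', 'c2', 'c3', 'c4'])
# With loc, label slices are inclusive on both ends, and a second
# argument selects columns by label.
sub = df.loc['2019-01-31':'2019-03-31', ['c1', 'c3']]
print(sub)
```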
4.3 iloc (positional indexing, like a NumPy array)
print(df.values)
array([[ 16.24345364,  -6.11756414,  -5.28171752, -10.72968622],
       [  8.65407629, -23.01538697,  17.44811764,  -7.61206901],
       [  3.19039096,  -2.49370375,  14.62107937, -20.60140709],
       [ -3.22417204,  -3.84054355,  11.33769442, -10.99891267],
       [ -1.72428208,  -8.77858418,   0.42213747,   5.82815214],
       [-11.00619177,  11.4472371 ,   9.01590721,   5.02494339]])
print(df.iloc[2, 1])
print(df.iloc[1:4, 1:4])
                   c2         c3         c4
2019-02-28 -23.015387  17.448118  -7.612069
2019-03-31  -2.493704  14.621079 -20.601407
2019-04-30  -3.840544  11.337694 -10.998913
4.4 Selecting with a boolean condition
df[df['c1'] > 0]
                   c1         c2         c3         c4
2019-01-31  16.243454  -6.117564  -5.281718 -10.729686
2019-02-28   8.654076 -23.015387  17.448118  -7.612069
2019-03-31   3.190391  -2.493704  14.621079 -20.601407
df[(df['c1'] > 0) & (df['c2'] > -8)]
                   c1        c2        c3         c4
2019-01-31  16.243454 -6.117564 -5.281718 -10.729686
2019-03-31   3.190391 -2.493704 14.621079 -20.601407
5. Replacing values in a DataFrame
df
                   c1         c2         c3         c4
2019-01-31  16.243454  -6.117564  -5.281718 -10.729686
2019-02-28   8.654076 -23.015387  17.448118  -7.612069
2019-03-31   3.190391  -2.493704  14.621079 -20.601407
2019-04-30  -3.224172  -3.840544  11.337694 -10.998913
2019-05-31  -1.724282  -8.778584   0.422137   5.828152
2019-06-30 -11.006192  11.447237   9.015907   5.024943
df.iloc[0:3, 0:2] = 0
df
                   c1         c2         c3         c4
2019-01-31   0.000000   0.000000  -5.281718 -10.729686
2019-02-28   0.000000   0.000000  17.448118  -7.612069
2019-03-31   0.000000   0.000000  14.621079 -20.601407
2019-04-30  -3.224172  -3.840544  11.337694 -10.998913
2019-05-31  -1.724282  -8.778584   0.422137   5.828152
2019-06-30 -11.006192  11.447237   9.015907   5.024943
df['c3'] > 10
2019-01-31    False
2019-02-28     True
2019-03-31     True
2019-04-30     True
2019-05-31    False
2019-06-30    False
Freq: M, Name: c3, dtype: bool
df[df['c3'] > 10] = 100
df
                    c1          c2          c3          c4
2019-01-31    0.000000    0.000000   -5.281718  -10.729686
2019-02-28  100.000000  100.000000  100.000000  100.000000
2019-03-31  100.000000  100.000000  100.000000  100.000000
2019-04-30  100.000000  100.000000  100.000000  100.000000
2019-05-31   -1.724282   -8.778584    0.422137    5.828152
2019-06-30  -11.006192   11.447237    9.015907    5.024943
df = df.astype(np.int32)
df[df['c3'].isin([100])] = 1000
df
              c1    c2    c3    c4
2019-01-31     0     0    -5   -10
2019-02-28  1000  1000  1000  1000
2019-03-31  1000  1000  1000  1000
2019-04-30  1000  1000  1000  1000
2019-05-31    -1    -8     0     5
2019-06-30   -11    11     9     5
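Besides boolean assignment, the replace method maps old values to new ones directly. A minimal sketch on a toy frame:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 100, 3], 'b': [100, 5, 6]})
# Map old values to new ones everywhere they occur in the frame.
out = df.replace({100: 0, 3: 30})
print(out)
```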
6. Reading CSV files
Two-dimensional data with rows and columns.

Data type       Description                               pandas reader
csv, tsv, txt   plain text separated by commas or tabs    pd.read_csv
excel           Microsoft xls or xlsx files               pd.read_excel
mysql           relational database                       pd.read_sql

import pandas as pd
from io import StringIO

test_data = '''
5.1,,1.4,0.2
4.9,3.0,1.4,0.2
4.7,3.2,,0.2
7.0,3.2,4.7,1.4
6.4,3.2,4.5,1.5
6.9,3.1,4.9,
,,,
'''
test_data = StringIO(test_data)
df = pd.read_csv(test_data, header=None)
df.columns = ['c1', 'c2', 'c3', 'c4']
df
    c1   c2   c3   c4
0  5.1  NaN  1.4  0.2
1  4.9  3.0  1.4  0.2
2  4.7  3.2  NaN  0.2
3  7.0  3.2  4.7  1.4
4  6.4  3.2  4.5  1.5
5  6.9  3.1  4.9  NaN
6  NaN  NaN  NaN  NaN
df.head()
df.shape
df.columns
df.index
df.dtypes
7. Handling missing data
df.isnull()
      c1     c2     c3     c4
0  False   True  False  False
1  False  False  False  False
2  False  False   True  False
3  False  False  False  False
4  False  False  False  False
5  False  False  False   True
6   True   True   True   True
print(df.isnull().sum())
c1    1
c2    2
c3    2
c4    2
dtype: int64
df.dropna(axis=0)
    c1   c2   c3   c4
1  4.9  3.0  1.4  0.2
3  7.0  3.2  4.7  1.4
4  6.4  3.2  4.5  1.5
df.dropna(axis=1)
Empty DataFrame
Columns: []
Index: [0, 1, 2, 3, 4, 5, 6]
df.dropna(how='all')
    c1   c2   c3   c4
0  5.1  NaN  1.4  0.2
1  4.9  3.0  1.4  0.2
2  4.7  3.2  NaN  0.2
3  7.0  3.2  4.7  1.4
4  6.4  3.2  4.5  1.5
5  6.9  3.1  4.9  NaN
df.dropna(thresh=4)
    c1   c2   c3   c4
1  4.9  3.0  1.4  0.2
3  7.0  3.2  4.7  1.4
4  6.4  3.2  4.5  1.5
df.dropna(subset=['c2'])
    c1   c2   c3   c4
1  4.9  3.0  1.4  0.2
2  4.7  3.2  NaN  0.2
3  7.0  3.2  4.7  1.4
4  6.4  3.2  4.5  1.5
5  6.9  3.1  4.9  NaN
df.fillna(value=10)
     c1    c2    c3    c4
0   5.1  10.0   1.4   0.2
1   4.9   3.0   1.4   0.2
2   4.7   3.2  10.0   0.2
3   7.0   3.2   4.7   1.4
4   6.4   3.2   4.5   1.5
5   6.9   3.1   4.9  10.0
6  10.0  10.0  10.0  10.0
8. Concatenating data
df1 = pd.DataFrame(np.zeros((3, 4)))
df1
     0    1    2    3
0  0.0  0.0  0.0  0.0
1  0.0  0.0  0.0  0.0
2  0.0  0.0  0.0  0.0
df2 = pd.DataFrame(np.ones((3, 4)))
df2
     0    1    2    3
0  1.0  1.0  1.0  1.0
1  1.0  1.0  1.0  1.0
2  1.0  1.0  1.0  1.0
pd.concat((df1, df2), axis=0)
     0    1    2    3
0  0.0  0.0  0.0  0.0
1  0.0  0.0  0.0  0.0
2  0.0  0.0  0.0  0.0
0  1.0  1.0  1.0  1.0
1  1.0  1.0  1.0  1.0
2  1.0  1.0  1.0  1.0
pd.concat((df1, df2), axis=1)
     0    1    2    3    0    1    2    3
0  0.0  0.0  0.0  0.0  1.0  1.0  1.0  1.0
1  0.0  0.0  0.0  0.0  1.0  1.0  1.0  1.0
2  0.0  0.0  0.0  0.0  1.0  1.0  1.0  1.0
# append appends rows, i.e. it concatenates along axis=0
# (DataFrame.append was deprecated in pandas 1.4 and removed in 2.0;
# use pd.concat instead)
df1.append(df2)
     0    1    2    3
0  0.0  0.0  0.0  0.0
1  0.0  0.0  0.0  0.0
2  0.0  0.0  0.0  0.0
0  1.0  1.0  1.0  1.0
1  1.0  1.0  1.0  1.0
2  1.0  1.0  1.0  1.0
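Because concatenation keeps the original row labels (note the repeated 0, 1, 2 above), ignore_index=True is often passed to renumber the result. A small sketch:

```python
import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.zeros((3, 4)))
df2 = pd.DataFrame(np.ones((3, 4)))
# ignore_index=True drops the old row labels and renumbers 0..5.
out = pd.concat((df1, df2), axis=0, ignore_index=True)
print(out.index.tolist())
```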
9. Importing and exporting data
Read a file with df = pd.read_excel(filename) and save one with df.to_excel(filename).
9.1 Reading a file to import data
Main parameters of the read functions:

Parameter     Description
sep           field separator; accepts regular expressions such as '\s+'
header=None   the file has no header row
names         column names to use
index_col     column to use as the index
skiprows      rows to skip
na_values     strings to treat as missing values
parse_dates   whether to parse certain columns as dates; boolean or list

df = pd.read_excel(filename)
df = pd.read_csv(filename)
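The parameters above can be combined. A sketch using an in-memory file (the column names qty, score, day are made up for the example):

```python
import pandas as pd
from io import StringIO

raw = StringIO('10,NA,2019-01-01\n20,5,2019-01-02\n')
df = pd.read_csv(
    raw,
    header=None,                    # the data has no header row
    names=['qty', 'score', 'day'],  # supply column names ourselves
    na_values=['NA'],               # treat the string 'NA' as missing
    parse_dates=['day'],            # parse the third column as dates
)
print(df.dtypes)
```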
9.2 Writing a file to export data
Main parameters of the write functions:

Parameter      Description
sep            field separator
na_rep         string to use for missing values; defaults to the empty string
header=False   do not write the column names
index=False    do not write the row index
columns        list of columns to write

df.to_excel(filename)
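A sketch of the write side, round-tripping through an in-memory buffer:

```python
import pandas as pd
from io import StringIO

df = pd.DataFrame({'a': [1, None], 'b': ['x', 'y']})
buf = StringIO()
# Skip the row index and render missing values as 'NULL'.
df.to_csv(buf, index=False, na_rep='NULL')
print(buf.getvalue())
```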
10. Reading JSON files with pandas
strtext = '[{"ttery":"min","issue":"20130801-3391","code":"8,4,5,2,9","code1":"297734529","code2":null,"time":1013395466000},\
{"ttery":"min","issue":"20130801-3390","code":"7,8,2,1,2","code1":"298058212","code2":null,"time":1013395406000},\
{"ttery":"min","issue":"20130801-3389","code":"5,9,1,2,9","code1":"298329129","code2":null,"time":1013395346000},\
{"ttery":"min","issue":"20130801-3388","code":"3,8,7,3,3","code1":"298588733","code2":null,"time":1013395286000},\
{"ttery":"min","issue":"20130801-3387","code":"0,8,5,2,7","code1":"298818527","code2":null,"time":1013395226000}]'
df = pd.read_json(strtext, orient='records')
df
        code      code1  code2          issue           time ttery
0  8,4,5,2,9  297734529    NaN  20130801-3391  1013395466000   min
1  7,8,2,1,2  298058212    NaN  20130801-3390  1013395406000   min
2  5,9,1,2,9  298329129    NaN  20130801-3389  1013395346000   min
3  3,8,7,3,3  298588733    NaN  20130801-3388  1013395286000   min
4  0,8,5,2,7  298818527    NaN  20130801-3387  1013395226000   min
df.to_excel('pandas处理json.xlsx',
            index=False,
            columns=["ttery", "issue", "code", "code1", "code2", "time"])
10.1 The five forms of the orient parameter
orient declares the expected format of the JSON string. It takes one of five values:
1. 'split': dict like {index -> [index], columns -> [columns], data -> [values]}
A format made up of an index, column labels, and a data matrix; the only allowed keys are index, columns, and data.
s = '{"index":[1,2,3],"columns":["a","b"],"data":[[1,3],[2,8],[3,9]]}'
df = pd.read_json(s, orient='split')
df
2. 'records': list like [{column -> value}, ... , {column -> value}]
A list whose members are dicts, as in the JSON data handled above. Each dict maps column names to values, and each dict becomes one row of the DataFrame.
strtext = '[{"ttery":"min","issue":"20130801-3391","code":"8,4,5,2,9","code1":"297734529","code2":null,"time":1013395466000},\
{"ttery":"min","issue":"20130801-3390","code":"7,8,2,1,2","code1":"298058212","code2":null,"time":1013395406000}]'
df = pd.read_json(strtext, orient='records')
df
        code      code1  code2          issue           time ttery
0  8,4,5,2,9  297734529    NaN  20130801-3391  1013395466000   min
1  7,8,2,1,2  298058212    NaN  20130801-3390  1013395406000   min
3. 'index': dict like {index -> {column -> value}}
The keys are index labels, and each value is a dict mapping column names to values. For example:
s = '{"0":{"a":1,"b":2},"1":{"a":9,"b":11}}'
df = pd.read_json(s, orient='index')
df
4. 'columns': dict like {column -> {index -> value}}
Here each key is a column name, and each value is a dict keyed by index labels with the cell values as its values. For example:
s = '{"a":{"0":1,"1":9},"b":{"0":2,"1":11}}'
df = pd.read_json(s, orient='columns')
df
5. 'values': just the values array.
The most familiar form: a two-level nested list whose inner lists are the rows.
s = '[["a",1],["b",2]]'
df = pd.read_json(s, orient='values')
df
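The same orient values work in the other direction with to_json, which allows a quick round-trip check. A minimal sketch:

```python
import pandas as pd
from io import StringIO

df = pd.DataFrame({'a': [1, 9], 'b': [2, 11]})
results = {}
for orient in ['split', 'records', 'index', 'columns', 'values']:
    s = df.to_json(orient=orient)
    # StringIO avoids the deprecation of passing literal JSON strings
    # to read_json in newer pandas versions.
    results[orient] = pd.read_json(StringIO(s), orient=orient)
    print(orient, results[orient].values.tolist())
```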
11. Reading SQL queries with pandas
import numpy as np
import pandas as pd
import pymysql

def conn(sql):
    conn = pymysql.connect(
        host="localhost",
        port=3306,
        user="root",
        passwd="123",
        db="db1",
    )
    try:
        data = pd.read_sql(sql, con=conn)
        return data
    except Exception as e:
        print("SQL is not correct!")
    finally:
        conn.close()

sql = "select * from test1 limit 0, 10"
data = conn(sql)
print(data.columns.tolist())
print(data)