python -- Common pandas row and column operations (follow along and practice if you're interested)

This post covers some common pandas row and column operations. It's fairly basic, so anyone interested is welcome to take a look.

First, we use "np.random.seed(i)" to generate a reproducible set of test data.

If you'd like to learn more about random.seed(), see the post python -- numpy.random.seed()

If you run into problems with "pip install pandas", see python -- Solving a few small problems with "pip install pandas" (including fixing pip upgrade issues)

import pandas as pd
import numpy as np

np.random.seed(1)  # seed value i = 1, so every run produces the same data
sample = pd.DataFrame(np.random.randn(8, 5), columns = list("abcde"))
print(sample)

Run it and look at the generated random data:

          a         b         c         d         e
0  1.624345 -0.611756 -0.528172 -1.072969  0.865408
1 -2.301539  1.744812 -0.761207  0.319039 -0.249370
2  1.462108 -2.060141 -0.322417 -0.384054  1.133769
3 -1.099891 -0.172428 -0.877858  0.042214  0.582815
4 -1.100619  1.144724  0.901591  0.502494  0.900856
5 -0.683728 -0.122890 -0.935769 -0.267888  0.530355
6 -0.691661 -0.396754 -0.687173 -0.845206 -0.671246
7 -0.012665 -1.117310  0.234416  1.659802  0.742044
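To check that seeding really makes the test data repeatable, here is a minimal sketch that builds the table twice with the same seed and compares the results with DataFrame.equals. The helper name make_sample is hypothetical, used only for illustration.

```python
import numpy as np
import pandas as pd

def make_sample(seed=1):
    # Hypothetical helper: re-seed NumPy's global generator,
    # then draw an 8 x 5 table of standard normal values
    np.random.seed(seed)
    return pd.DataFrame(np.random.randn(8, 5), columns=list("abcde"))

first = make_sample(1)
second = make_sample(1)
# Same seed -> identical values on every run
print(first.equals(second))   # True

other = make_sample(2)
# A different seed gives a different table
print(first.equals(other))    # False
```

Because np.random.seed resets global state, any other random calls made between the seed and randn would change the numbers, so keep the seed call directly before the data is generated.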

1. Selecting a single column (four ways)

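#Method 1: select by column name (returns a Series)
sample['a']

#Method 2: iloc, position-based indexing (the row slice :1 keeps only row 0; use iloc[:, 0] for the whole column)
sample.iloc[:1, 0]

#Method 3: loc, label-based indexing
sample.loc[:, 'a']

#Method 4: double brackets return a pandas DataFrame
sample[['a']]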
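The main practical difference between these forms is the type of object returned. As a small sketch, the snippet below checks the types and confirms that, when every row is selected, the Series forms hold the same data:

```python
import numpy as np
import pandas as pd

np.random.seed(1)
sample = pd.DataFrame(np.random.randn(8, 5), columns=list("abcde"))

col_series = sample['a']          # single brackets -> Series
col_frame = sample[['a']]         # list with one name -> DataFrame
by_pos = sample.iloc[:, 0]        # all rows, column position 0 -> Series
by_label = sample.loc[:, 'a']     # all rows, column label 'a' -> Series

print(type(col_series).__name__)  # Series
print(type(col_frame).__name__)   # DataFrame
print(col_frame.shape)            # (8, 1)
# With every row selected, the three Series forms contain the same data
print(col_series.equals(by_pos) and col_series.equals(by_label))  # True
```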

Let's run each of the four approaches and see what they return:

import pandas as pd
import numpy as np

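np.random.seed(1)  # set the random seed so every run produces the same data
sample = pd.DataFrame(np.random.randn(8, 5), columns = list("abcde"))

#Selecting a single column
#Method 1: select by column name
print(sample['a'])

#Method 2: iloc, position-based indexing; can also be written as iloc[0:1, 0]
print(sample.iloc[:1, 0])

#Method 3: loc, label-based indexing; can also be written as loc[0:1, 'a']
print(sample.loc[:1, 'a'])

#Method 4: returns a pandas DataFrame
print(sample[['a']])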

Run it and look at the results:

#Method 1: all rows of the column named a
0    1.624345
1   -2.301539
2    1.462108
3   -1.099891
4   -1.100619
5   -0.683728
6   -0.691661
7   -0.012665
Name: a, dtype: float64

#Method 2: iloc[:1, 0] takes 1 row starting at position 0, from column position 0
0    1.624345
Name: a, dtype: float64

#Method 3: loc[:1, 'a'] takes the rows labelled 0 through 1 (inclusive) from column 'a' (note the difference from Method 2)
0    1.624345
1   -2.301539
Name: a, dtype: float64

#Method 4: all rows of column 'a', returned as a DataFrame
          a
0  1.624345
1 -2.301539
2  1.462108
3 -1.099891
4 -1.100619
5 -0.683728
6 -0.691661
7 -0.012665

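Notes (tip: positions start at row 0 and column 0):

1. iloc is position-based. iloc[:1, 0] (which can also be written as iloc[0:1, 0]) starts at row position 0 and takes 1 row, because the end position 1 is excluded, from column position 0.

2. loc is label-based. loc[:1, 'a'] (which can also be written as loc[0:1, 'a']) takes the rows labelled 0 through 1, with the end label included, from the column labelled 'a'.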
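The inclusive/exclusive difference is easy to check directly. This sketch counts the rows each slice returns, then gives the table a non-default index (labels 10 to 17, chosen only for illustration) so that positions and labels no longer coincide:

```python
import numpy as np
import pandas as pd

np.random.seed(1)
sample = pd.DataFrame(np.random.randn(8, 5), columns=list("abcde"))

# iloc: the end position is excluded, so 0:1 covers position 0 only
print(len(sample.iloc[0:1, 0]))   # 1
# loc: the end label is included, so 0:1 covers labels 0 and 1
print(len(sample.loc[0:1, 'a']))  # 2

# With labels 10..17 the two indexers clearly diverge
relabelled = sample.copy()
relabelled.index = range(10, 18)
print(relabelled.iloc[:1, 0].index.tolist())      # [10]      (first position)
print(relabelled.loc[10:11, 'a'].index.tolist())  # [10, 11]  (labels 10 to 11)
```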

2. Selecting multiple columns (two ways)

#Using loc, label-based indexing
sample.loc[:3, :'c']

#Using iloc, position-based indexing
sample.iloc[1:, 1:]

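Notes:

In loc[r_begin:r_end, c_begin:c_end] and iloc[r_begin:r_end, c_begin:c_end], r_begin and c_begin give the starting row and column, and r_end and c_end give the ending row and column. For loc the end label is included; for iloc the end position is excluded.

If r_begin or c_begin is left blank, the slice starts from the first row or column; if r_end or c_end is left blank, it runs to the last row or column.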
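A quick way to confirm what a slice selects is to look at its shape. This sketch checks the two slices used below and shows the iloc equivalent of the loc slice, where each end position is one past the last wanted row or column:

```python
import numpy as np
import pandas as pd

np.random.seed(1)
sample = pd.DataFrame(np.random.randn(8, 5), columns=list("abcde"))

# loc: row labels from the start to 3 (inclusive), columns from the first up to 'c' (inclusive)
block_loc = sample.loc[:3, :'c']
print(block_loc.shape)          # (4, 3)
print(list(block_loc.columns))  # ['a', 'b', 'c']

# iloc: row and column positions from 1 to the end
block_iloc = sample.iloc[1:, 1:]
print(block_iloc.shape)         # (7, 4)

# Same block as block_loc, written with exclusive iloc end positions
print(block_loc.equals(sample.iloc[0:4, 0:3]))  # True
```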

Let's write the code and see:

import pandas as pd
import numpy as np

np.random.seed(1)  # set the random seed so every run produces the same data
sample = pd.DataFrame(np.random.randn(8, 5), columns = list("abcde"))
print(sample)

#loc, label-based: rows labelled 0 through 3, columns from the first through 'c'
print(sample.loc[:3, :'c'])
#iloc, position-based: rows from position 1 to the end, columns from position 1 to the end
print(sample.iloc[1:, 1:])

Run it and check the output:

#Full DataFrame
          a         b         c         d         e
0  1.624345 -0.611756 -0.528172 -1.072969  0.865408
1 -2.301539  1.744812 -0.761207  0.319039 -0.249370
2  1.462108 -2.060141 -0.322417 -0.384054  1.133769
3 -1.099891 -0.172428 -0.877858  0.042214  0.582815
4 -1.100619  1.144724  0.901591  0.502494  0.900856
5 -0.683728 -0.122890 -0.935769 -0.267888  0.530355
6 -0.691661 -0.396754 -0.687173 -0.845206 -0.671246
7 -0.012665 -1.117310  0.234416  1.659802  0.742044

#sample.loc[:3, :'c']
          a         b         c
0  1.624345 -0.611756 -0.528172
1 -2.301539  1.744812 -0.761207
2  1.462108 -2.060141 -0.322417
3 -1.099891 -0.172428 -0.877858


#sample.iloc[1:, 1:]
          b         c         d         e
1  1.744812 -0.761207  0.319039 -0.249370
2 -2.060141 -0.322417 -0.384054  1.133769
3 -0.172428 -0.877858  0.042214  0.582815
4  1.144724  0.901591  0.502494  0.900856
5 -0.122890 -0.935769 -0.267888  0.530355
6 -0.396754 -0.687173 -0.845206 -0.671246
7 -1.117310  0.234416  1.659802  0.742044

3. Adding columns

sample['new_col'] = sample['a'] + sample['b']
print(sample)

          a         b         c         d         e   new_col
0  1.624345 -0.611756 -0.528172 -1.072969  0.865408  1.012589
1 -2.301539  1.744812 -0.761207  0.319039 -0.249370 -0.556727
2  1.462108 -2.060141 -0.322417 -0.384054  1.133769 -0.598033
3 -1.099891 -0.172428 -0.877858  0.042214  0.582815 -1.272319
4 -1.100619  1.144724  0.901591  0.502494  0.900856  0.044105
5 -0.683728 -0.122890 -0.935769 -0.267888  0.530355 -0.806618
6 -0.691661 -0.396754 -0.687173 -0.845206 -0.671246 -1.088414
7 -0.012665 -1.117310  0.234416  1.659802  0.742044 -1.129975
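Column arithmetic like this is element-wise: each row's value in a is added to the same row's value in b. As a small check, the sketch below confirms where the new column lands and recomputes the first row by hand:

```python
import numpy as np
import pandas as pd

np.random.seed(1)
sample = pd.DataFrame(np.random.randn(8, 5), columns=list("abcde"))

# Element-wise sum of two columns, stored under a new column name
sample['new_col'] = sample['a'] + sample['b']

# A new column is appended at the right-hand end
print(list(sample.columns))                # ['a', 'b', 'c', 'd', 'e', 'new_col']
# Row 0: 1.624345 + (-0.611756)
print(round(sample.loc[0, 'new_col'], 6))  # 1.012589
```

Note that this form modifies sample directly, unlike the assign method below.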

Adding columns with the assign method

#Method 1: the result of assign is not stored
sample.assign(new_col2 = sample['a'] - sample['b'],
              new_col3 = sample['c'] - sample['d'])
print(sample)


#Method 2: assign the result back to sample
sample = sample.assign(new_col2 = sample['a'] - sample['b'],
                       new_col3 = sample['c'] - sample['d'])
print(sample)

#Method 1: the columns created by assign were not saved to the original DataFrame
          a         b         c         d         e   new_col
0  1.624345 -0.611756 -0.528172 -1.072969  0.865408  1.012589
1 -2.301539  1.744812 -0.761207  0.319039 -0.249370 -0.556727
2  1.462108 -2.060141 -0.322417 -0.384054  1.133769 -0.598033
3 -1.099891 -0.172428 -0.877858  0.042214  0.582815 -1.272319
4 -1.100619  1.144724  0.901591  0.502494  0.900856  0.044105
5 -0.683728 -0.122890 -0.935769 -0.267888  0.530355 -0.806618
6 -0.691661 -0.396754 -0.687173 -0.845206 -0.671246 -1.088414
7 -0.012665 -1.117310  0.234416  1.659802  0.742044 -1.129975

#Method 2: the result of assign was assigned back to sample, so the new columns are kept
          a         b         c  ...   new_col  new_col2  new_col3
0  1.624345 -0.611756 -0.528172  ...  1.012589  2.236102  0.544797
1 -2.301539  1.744812 -0.761207  ... -0.556727 -4.046350 -1.080246
2  1.462108 -2.060141 -0.322417  ... -0.598033  3.522249  0.061637
3 -1.099891 -0.172428 -0.877858  ... -1.272319 -0.927463 -0.920072
4 -1.100619  1.144724  0.901591  ...  0.044105 -2.245343  0.399096
5 -0.683728 -0.122890 -0.935769  ... -0.806618 -0.560838 -0.667881
6 -0.691661 -0.396754 -0.687173  ... -1.088414 -0.294907  0.158033
7 -0.012665 -1.117310  0.234416  ... -1.129975  1.104646 -1.425386

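Note: assign returns a new DataFrame and leaves the original unchanged. To keep the new columns, assign the result to a variable (for example, back to sample).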
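The sketch below makes the copy behaviour explicit. It also shows that assign accepts callables (here lambdas), which receive the DataFrame being built, so a later keyword can use a column created earlier in the same call; this relies on keyword ordering that, as far as I know, pandas supports from version 0.23 on.

```python
import numpy as np
import pandas as pd

np.random.seed(1)
sample = pd.DataFrame(np.random.randn(8, 5), columns=list("abcde"))

# assign returns a NEW DataFrame; sample itself is untouched
result = sample.assign(new_col2=sample['a'] - sample['b'])
print('new_col2' in sample.columns)  # False
print('new_col2' in result.columns)  # True

# Callables receive the intermediate DataFrame, so new_col3 can use new_col2
chained = sample.assign(new_col2=lambda df: df['a'] - df['b'],
                        new_col3=lambda df: df['new_col2'] * 2)
print(list(chained.columns))  # ['a', 'b', 'c', 'd', 'e', 'new_col2', 'new_col3']
```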

4. Deleting

(1) Deleting columns

#Drop a single column
sample = sample.drop(['new_col3'], axis=1, inplace=False)
print(sample)

#Drop multiple columns
sample = sample.drop(['a', 'b'], axis=1, inplace=False)
print(sample)

#Output after dropping a single column
          a         b         c         d         e   new_col  new_col2
0  1.624345 -0.611756 -0.528172 -1.072969  0.865408  1.012589  2.236102
1 -2.301539  1.744812 -0.761207  0.319039 -0.249370 -0.556727 -4.046350
2  1.462108 -2.060141 -0.322417 -0.384054  1.133769 -0.598033  3.522249
3 -1.099891 -0.172428 -0.877858  0.042214  0.582815 -1.272319 -0.927463
4 -1.100619  1.144724  0.901591  0.502494  0.900856  0.044105 -2.245343
5 -0.683728 -0.122890 -0.935769 -0.267888  0.530355 -0.806618 -0.560838
6 -0.691661 -0.396754 -0.687173 -0.845206 -0.671246 -1.088414 -0.294907
7 -0.012665 -1.117310  0.234416  1.659802  0.742044 -1.129975  1.104646

#Output after dropping multiple columns
          c         d         e   new_col  new_col2
0 -0.528172 -1.072969  0.865408  1.012589  2.236102
1 -0.761207  0.319039 -0.249370 -0.556727 -4.046350
2 -0.322417 -0.384054  1.133769 -0.598033  3.522249
3 -0.877858  0.042214  0.582815 -1.272319 -0.927463
4  0.901591  0.502494  0.900856  0.044105 -2.245343
5 -0.935769 -0.267888  0.530355 -0.806618 -0.560838
6 -0.687173 -0.845206 -0.671246 -1.088414 -0.294907
7  0.234416  1.659802  0.742044 -1.129975  1.104646
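Two related details are worth checking. drop(columns=[...]) is an equivalent way to write axis=1, and with inplace=True the DataFrame is changed directly while drop itself returns None, so that result must not be assigned back. A minimal sketch:

```python
import numpy as np
import pandas as pd

np.random.seed(1)
sample = pd.DataFrame(np.random.randn(8, 5), columns=list("abcde"))

# axis=1 with a list of labels ...
by_axis = sample.drop(['a', 'b'], axis=1)
# ... matches naming the labels with the columns= keyword
by_keyword = sample.drop(columns=['a', 'b'])
print(by_axis.equals(by_keyword))  # True

# inplace=True edits sample directly and returns None,
# so "sample = sample.drop(..., inplace=True)" would leave sample as None
returned = sample.drop(columns=['e'], inplace=True)
print(returned)              # None
print(list(sample.columns))  # ['a', 'b', 'c', 'd']
```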

(2) Deleting rows

#Drop a single row (by index label)
sample = sample.drop(0)
print(sample)

#Drop multiple rows (by index label)
sample = sample.drop([1, 5])
print(sample)

#After dropping the row labelled 0
          c         d         e   new_col  new_col2
1 -0.761207  0.319039 -0.249370 -0.556727 -4.046350
2 -0.322417 -0.384054  1.133769 -0.598033  3.522249
3 -0.877858  0.042214  0.582815 -1.272319 -0.927463
4  0.901591  0.502494  0.900856  0.044105 -2.245343
5 -0.935769 -0.267888  0.530355 -0.806618 -0.560838
6 -0.687173 -0.845206 -0.671246 -1.088414 -0.294907
7  0.234416  1.659802  0.742044 -1.129975  1.104646

#After dropping the rows labelled 1 and 5
          c         d         e   new_col  new_col2
2 -0.322417 -0.384054  1.133769 -0.598033  3.522249
3 -0.877858  0.042214  0.582815 -1.272319 -0.927463
4  0.901591  0.502494  0.900856  0.044105 -2.245343
6 -0.687173 -0.845206 -0.671246 -1.088414 -0.294907
7  0.234416  1.659802  0.742044 -1.129975  1.104646
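Because drop works on index labels, the labels stay fixed after earlier deletions: once row 0 is gone, drop(1) still means "the row labelled 1", not "the second row". If you need to drop by position instead, one option, sketched below, is to look the labels up through .index first:

```python
import numpy as np
import pandas as pd

np.random.seed(1)
sample = pd.DataFrame(np.random.randn(8, 5), columns=list("abcde"))

# drop() removes rows by index LABEL
after_first = sample.drop(0)
print(after_first.index.tolist())   # [1, 2, 3, 4, 5, 6, 7]

# Labels are unchanged, so [1, 5] still refers to the original rows 1 and 5
# (dropping label 0 again here would raise a KeyError)
after_second = after_first.drop([1, 5])
print(after_second.index.tolist())  # [2, 3, 4, 6, 7]

# To drop the first two rows by POSITION, convert positions to labels via .index
by_position = after_second.drop(after_second.index[[0, 1]])
print(by_position.index.tolist())   # [4, 6, 7]
```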

That's all for this post for now; more will be added later.

posted @ 2023-05-19 23:47 lmei
