【6】Comprehending data

I: Writing the program

We now have four sets of data recorded with a stopwatch:

  1. james.txt:2-34,3:21,2.34,2.45,3.01,2:01,2:01,3:10,2-22
  2. julie.txt:2.59,2.11,2:11,2:23,3-10,2-23,3:10,3.21,3-21
  3. mikey.txt:2:22,3.01,3:01,3.02,3:02,3.02,3:22,2.49,2:38
  4. sarah.txt:2:58,2.58,2:39,2-25,2-55,2:54,2.18,2:55,2:55

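1. Read the data from each file into its own list: write a small program that processes each file, creates a list for each file's data, and displays the lists on screen.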

james.txt

In [1]: with open('james.txt',"r") as jam:
   ...:     data=jam.readline()
   ...: james=data.strip().split(",") 
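# strip(): removes surrounding whitespace, including the trailing newline
# split(","): splits the string on commas and returns a list (also the quickest way to turn the items into a list)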
   ...: 
In [2]: james    # the variable james
Out[2]: ['2-34', '3:21', '2.34', '2.45', '3.01', '2:01', '2:01', '3:10', '2-22']
In [3]: cat james.txt        # contents of james.txt
2-34,3:21,2.34,2.45,3.01,2:01,2:01,3:10,2-22

julie.txt

In [4]: with open("julie.txt","r") as ju:
   ...:     data=ju.readline()
   ...: julie=data.strip().split(",")
   ...: 
In [5]: julie
Out[5]: ['2.59', '2.11', '2:11', '2:23', '3-10', '2-23', '3:10', '3.21', '3-21']
In [6]: cat julie.txt
2.59,2.11,2:11,2:23,3-10,2-23,3:10,3.21,3-21 

mikey.txt

In [7]: with open("mikey.txt","r") as mi:
   ...:     data=mi.readline()
   ...: mikey=data.strip().split(",")
   ...: 
In [8]: mikey
Out[8]: ['2:22', '3.01', '3:01', '3.02', '3:02', '3.02', '3:22', '2.49', '2:38']
In [9]: cat mikey.txt
2:22,3.01,3:01,3.02,3:02,3.02,3:22,2.49,2:38

sarah.txt

In [12]: with open("sarah.txt","r") as sa:
    ...:     data=sa.readline()
    ...: sarah=data.strip().split(",")
    ...: 
In [13]: sarah
Out[13]: ['2:58', '2.58', '2:39', '2-25', '2-55', '2:54', '2.18', '2:55', '2:55']
In [14]: cat sarah.txt
2:58,2.58,2:39,2-25,2-55,2:54,2.18,2:55,2:55
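Step 1 asks for a small program that processes each file, while the session above repeats the same cell four times. Below is a minimal sketch of one loop doing the same work. The helper name `read_times` and the temporary directory are assumptions made only so the example runs without the original .txt files.

```python
import os
import tempfile

# Hypothetical helper: reads the first line of a stopwatch file and
# returns its comma-separated items as a list of strings.
def read_times(path):
    with open(path, "r") as f:
        return f.readline().strip().split(",")

# The four records from the article, written to a temporary directory
# so this sketch does not need the original files.
records = {
    "james.txt": "2-34,3:21,2.34,2.45,3.01,2:01,2:01,3:10,2-22",
    "julie.txt": "2.59,2.11,2:11,2:23,3-10,2-23,3:10,3.21,3-21",
    "mikey.txt": "2:22,3.01,3:01,3.02,3:02,3.02,3:22,2.49,2:38",
    "sarah.txt": "2:58,2.58,2:39,2-25,2-55,2:54,2.18,2:55,2:55",
}

with tempfile.TemporaryDirectory() as tmp:
    for name, line in records.items():
        with open(os.path.join(tmp, name), "w") as f:
            f.write(line + "\n")

    # One loop replaces the four near-identical IPython cells.
    athletes = {}
    for name in records:
        athletes[name] = read_times(os.path.join(tmp, name))
        print(name, athletes[name])
```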

II: Two ways to sort

    • In-place sort: the sort() method, ascending
    • Descending: sort(reverse=True)
    • Copied sort: the sorted() BIF (built-in function), ascending
    • Descending: sorted(data, reverse=True)
In [15]: data=[2,3,4,543221,333,1,2,3,2]
In [16]: data    # original data
Out[16]: [2, 3, 4, 543221, 333, 1, 2, 3, 2]
In [17]: data.sort()    # in-place sort (ascending)
In [18]: data
Out[18]: [1, 2, 2, 2, 3, 3, 4, 333, 543221]
In [19]: data=[2,3,4,543221,333,1,2,3,2]
In [20]: data2=sorted(data)   # copied sort: data itself is left unchanged
In [21]: data
Out[21]: [2, 3, 4, 543221, 333, 1, 2, 3, 2]
In [22]: data2
Out[22]: [1, 2, 2, 2, 3, 3, 4, 333, 543221]
In [24]: data.sort(reverse=True) # in-place sort (descending)
In [25]: data
Out[25]: [543221, 333, 4, 3, 3, 2, 2, 2, 1]
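The transcript shows descending order only for sort(). A short sketch of the sorted() counterpart, assuming Python 3; note that sorted() needs the data as its first argument, and that sort() returns None rather than the list.

```python
data = [2, 3, 4, 543221, 333, 1, 2, 3, 2]

# sorted() takes the iterable first; reverse=True gives descending order.
desc = sorted(data, reverse=True)
print(desc)   # [543221, 333, 4, 3, 3, 2, 2, 2, 1]
print(data)   # still [2, 3, 4, 543221, 333, 1, 2, 3, 2]

# sorted() accepts any iterable and always returns a new list ...
print(sorted((3, 1, 2)))   # [1, 2, 3]
print(sorted("cab"))       # ['a', 'b', 'c']

# ... whereas sort() exists only on lists, changes the list in place, and returns None.
result = data.sort()
print(result)              # None
print(data)                # [1, 2, 2, 2, 3, 3, 4, 333, 543221]
```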

1. Sorting julie

In [32]: julie
Out[32]: ['2.59', '2.11', '2:11', '2:23', '3-10', '2-23', '3:10', '3.21', '3-21']
In [33]: julie2=sorted(julie)
In [34]: julie2
Out[34]: ['2-23', '2.11', '2.59', '2:11', '2:23', '3-10', '3-21', '3.21', '3:10']
In [35]: julie
Out[35]: ['2.59', '2.11', '2:11', '2:23', '3-10', '2-23', '3:10', '3.21', '3-21']
# PS: this code still needs fixing

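The result above shows that the inconsistent data formats produce a wrong ordering ('2-23' even comes before '2.11'). Strings are compared character by character, and '-' sorts before '.', which sorts before ':', so the separator rather than the time decides the order.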
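The wrong order comes from how Python compares strings: character by character, by code point. A quick check of the three separators (Python 3):

```python
# Code points of the three separators used in the stopwatch data.
for ch in "-.:":
    print(repr(ch), ord(ch))
# '-' 45
# '.' 46
# ':' 58

# '2-23' and '2.11' tie on the first character ('2'), so the second one
# decides: '-' (45) < '.' (46), hence '2-23' sorts first.
print("2-23" < "2.11")   # True
print("2:11" < "2.59")   # False, because ':' (58) > '.' (46)
```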

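Approach: 1. Write a function that takes one string from an athlete's list of stopwatch times, replaces any dash or colon it finds with a dot, and returns the cleaned string.

   2. Create an empty list, put the cleaned data into it, and then sort it.

Note: if a string already contains a dot, it needs no cleaning.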

2. The fix (sorting james correctly):

In [55]: james    # original data
Out[55]: ['2-34', '3:21', '2.34', '2.45', '3.01', '2:01', '2:01', '3:10', '2-22']
In [56]: clean_james=[]    # create an empty list
In [57]: clean_james
Out[57]: []
# define a function that normalizes the time format (turns ':' and '-' into '.')
In [58]: def sanitize(time_string):
    ...:     if '-' in time_string:
    ...:         splitter="-"
    ...:     elif ":" in time_string:
    ...:         splitter=":"
    ...:     else:
    ...:         return time_string
    ...:     (mins1,secs1)=time_string.split(splitter)
    ...:     return(mins1+"."+secs1)
    ...: 
# loop over james, convert each item to mins.secs form, and append it to clean_james
In [59]: for i in james:
    ...:     clean_james.append(sanitize(i))
    ...: print(clean_james)
    ...: print(sorted(clean_james))    # sort the list and print it
    ...: 
['2.34', '3.21', '2.34', '2.45', '3.01', '2.01', '2.01', '3.10', '2.22']
['2.01', '2.01', '2.22', '2.34', '2.34', '2.45', '3.01', '3.10', '3.21']
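Sorting the cleaned strings works here because every time has the same "m.ss" shape, so character order happens to match time order. A time with a two-digit minute would break that. A sketch with hypothetical data (the "10:05" entry is invented for illustration), using sorted()'s `key=float` to compare the values as numbers instead:

```python
def sanitize(time_string):
    # Same logic as above: turn "m-ss" or "m:ss" into "m.ss".
    if "-" in time_string:
        splitter = "-"
    elif ":" in time_string:
        splitter = ":"
    else:
        return time_string
    (mins, secs) = time_string.split(splitter)
    return mins + "." + secs

times = ["2-34", "10:05", "9.59"]   # hypothetical data with a two-digit minute
clean = []
for t in times:
    clean.append(sanitize(t))

print(sorted(clean))             # ['10.05', '2.34', '9.59']  (string order: '1' < '2' < '9')
print(sorted(clean, key=float))  # ['2.34', '9.59', '10.05']  (numeric order)
```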

3. Another way to replace the colons and dashes:

james=['2-34', '3:21', '2.34', '2.45', '3.01', '2:01', '2:01', '3:10', '2-22']
print(type(james))              # a list
tihuan=str(james)               # the whole list as one string
tihuan1=tihuan.replace("-",".")
tihuan2=tihuan1.replace(":",".")
print(tihuan2)
print(type(tihuan2))            # a str: the result is text, no longer a list
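The `str(james)` approach turns the whole list into one string, so the result can no longer be sorted item by item. A sketch that applies `replace()` to each item instead, so the result stays a list:

```python
james = ['2-34', '3:21', '2.34', '2.45', '3.01', '2:01', '2:01', '3:10', '2-22']

# Call replace() on each item instead of on str(james),
# so the result is still a list of separate time strings.
clean_james = []
for t in james:
    clean_james.append(t.replace("-", ".").replace(":", "."))

print(type(clean_james))   # <class 'list'>
print(clean_james)
# ['2.34', '3.21', '2.34', '2.45', '3.01', '2.01', '2.01', '3.10', '2.22']
```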

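List comprehensions

A list comprehension condenses these four steps into a single expression:

  1. Create a new list to hold the converted data
  2. Iterate over each item in the original list
  3. Convert the item on each iteration
  4. Append the converted item to the new list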
# convert minutes to seconds
In [60]: mins=[1,2,3]
In [61]: secs=[m*60 for m in mins]
In [62]: secs
Out[62]: [60, 120, 180]
# convert the lowercase words in name to uppercase
In [63]: name=["my","name","is","huahua"]
In [66]: upper=[s.upper() for s in name]
In [67]: upper
Out[67]: ['MY', 'NAME', 'IS', 'HUAHUA']
# convert the strings in data to floats
In [68]: data=['2.01','2.22','9.66']
In [69]: data1=[float(q) for q in data]
In [70]: data1
Out[70]: [2.01, 2.22, 9.66]
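To see that a list comprehension is the four steps listed earlier written as one line, here is a sketch that builds the same list both ways:

```python
mins = [1, 2, 3]

# The four steps written out as an explicit loop:
secs_loop = []                    # 1. create a new list
for m in mins:                    # 2. iterate over the original list
    converted = m * 60            # 3. convert the item
    secs_loop.append(converted)   # 4. append it to the new list

# The same work as a single list comprehension:
secs_comp = [m * 60 for m in mins]

print(secs_loop)                  # [60, 120, 180]
print(secs_loop == secs_comp)     # True
```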

4. Simplifying the code from step 2:

In [71]: james
Out[71]: ['2-34', '3:21', '2.34', '2.45', '3.01', '2:01', '2:01', '3:10', '2-22']
In [72]: def sanitize(time_string):
    ...:     if '-' in time_string:
    ...:         splitter="-"
    ...:     elif ":" in time_string:
    ...:         splitter=":"
    ...:     else:
    ...:         return time_string
    ...:     (mins1,secs1)=time_string.split(splitter)
    ...:     return(mins1+"."+secs1)
    ...: 
In [73]: print(sorted([sanitize(i) for i in james])) # list comprehension
['2.01', '2.01', '2.22', '2.34', '2.34', '2.45', '3.01', '3.10', '3.21']

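List slicing

5. Removing duplicates by iteration and printing the three fastest times

Approach:

  • Create a new, empty list
  • Fill it with the unique items found in james (using not in)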
In [76]: james    # original james data
Out[76]: ['2-34', '3:21', '2.34', '2.45', '3.01', '2:01', '2:01', '3:10', '2-22']
# function that normalizes the time format
In [77]: def sanitize(time_string):
    ...:     if '-' in time_string:
    ...:         splitter="-"
    ...:     elif ":" in time_string:
    ...:         splitter=":"
    ...:     else:
    ...:         return time_string
    ...:     (mins1,secs1)=time_string.split(splitter)
    ...:     return(mins1+"."+secs1)
    ...: 
# convert james to the normalized format and sort it
In [78]: james1=(sorted([sanitize(i)for i in james]))
# the sorted data
In [79]: james1
Out[79]: ['2.01', '2.01', '2.22', '2.34', '2.34', '2.45', '3.01', '3.10', '3.21']
# an empty list
In [80]: unique_james=[]
# loop over james1: if an item is already in unique_james, skip it; otherwise append it
In [81]: for s in james1:
    ...:     if s not in unique_james:
    ...:         unique_james.append(s)
    ...: print(unique_james[0:3])    # print the three fastest times
    ...: 
['2.01', '2.22', '2.34']

Removing duplicates with a set

Note: a set cannot contain duplicate elements.

In [82]: s={1,2,3,4,5,5,5,5,5,5,}
In [83]: s
Out[83]: {1, 2, 3, 4, 5}
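A short sketch of set-based de-duplication on the cleaned james data from step 2: the set drops the duplicates, sorted() restores a predictable order (a set itself has none), and the `not in` loop from step 5 gives the same list.

```python
clean = ['2.34', '3.21', '2.34', '2.45', '3.01', '2.01', '2.01', '3.10', '2.22']

unique = set(clean)              # duplicates are dropped; a set has no defined order
print(len(clean), len(unique))   # 9 7

# Because a set is unordered, sort it to get a list in a predictable order.
ordered = sorted(unique)
print(ordered)
# ['2.01', '2.22', '2.34', '2.45', '3.01', '3.10', '3.21']

# The "not in" loop from step 5, run on the sorted list, gives the same result.
unique_loop = []
for s in sorted(clean):
    if s not in unique_loop:
        unique_loop.append(s)
print(unique_loop == ordered)    # True
```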

6. Rewriting the code above with set and list slicing (printing the three fastest times)

In [86]: james
Out[86]: ['2-34', '3:21', '2.34', '2.45', '3.01', '2:01', '2:01', '3:10', '2-22']
In [87]: def sanitize(time_string):
    ...:     if '-' in time_string:
    ...:         splitter="-"
    ...:     elif ":" in time_string:
    ...:         splitter=":"
    ...:     else:
    ...:         return time_string
    ...:     (mins1,secs1)=time_string.split(splitter)
    ...:     return(mins1+"."+secs1)
    ...: 
# 1. turn the converted list into a set (removing duplicates)
# 2. then sort the set
In [88]: james1=(sorted(set([sanitize(i)for i in james])))
In [89]: james1
Out[89]: ['2.01', '2.22', '2.34', '2.45', '3.01', '3.10', '3.21']
# take the three fastest times
In [90]: james1=(sorted(set([sanitize(i)for i in james]))[0:3])
In [91]: james1
Out[91]: ['2.01', '2.22', '2.34']

Summary of key points:

sort()

  1. In-place sort (ascending)
  2. a.sort()
  3. Descending: a.sort(reverse=True)

sorted()

  1. Copied sort (ascending); the original list is left unchanged
  2. data1=sorted(data)
  3. Descending: sorted(data, reverse=True)

List comprehensions

  1. [expression for variable in list]    or    [expression for variable in list if condition]
  2. [m*60 for m in mins]
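A sketch of the `if` form, which the examples above do not use, applied to the raw james data (the filter conditions are only illustrative):

```python
james = ['2-34', '3:21', '2.34', '2.45', '3.01', '2:01', '2:01', '3:10', '2-22']

# Keep only the items that already use a dot (they need no cleaning).
dotted = [t for t in james if "." in t]
print(dotted)   # ['2.34', '2.45', '3.01']

# Expression and condition together: the dotted times under 3 minutes, as floats.
fast = [float(t) for t in dotted if float(t) < 3]
print(fast)     # [2.34, 2.45]
```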

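List slicing

  1. Under the slicing rules, list, tuple, and str (string) are all sequences, and all of them can be sliced the same way
  2. Index 0 is the first element counting from the front, and -1 is the first element counting from the back (the last element). A slice does not include its right bound: [0:3] means elements 0, 1, and 2, but not 3.
  3. james[0:3]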
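A few slices on the sorted, de-duplicated james times illustrate these rules:

```python
times = ['2.01', '2.22', '2.34', '2.45', '3.01', '3.10', '3.21']

print(times[0:3])    # ['2.01', '2.22', '2.34']  (indexes 0, 1, 2; index 3 is excluded)
print(times[:3])     # same result: an omitted start means 0
print(times[-1])     # 3.21  (-1 is the last element)
print(times[-3:])    # ['3.01', '3.10', '3.21']  (the three slowest)

# Tuples and strings slice the same way.
print(("a", "b", "c", "d")[1:3])   # ('b', 'c')
print("3:21"[2:])                  # 21  (the seconds part of the string)
```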

set

  1. A set is unordered
  2. A set contains no duplicate elements (so a set can be used to remove duplicates)
#coding=utf-8
"""
总需求:在4组秒表记录中取出最快的3个时间
"""
#获取文件中的内容
def get_filecontent(filename):
    try:
        with open(filename) as f:
            data=f.readline().strip().split(",")
        return data
    except IOError as e:
        raise e
# clean (normalize) a time string into mins.secs form
def sanitize(time_string):
    if "-" in time_string:
        splitter='-'
    elif ":" in time_string:
        splitter=':'
    else:
        return time_string
    (mins,secs)=time_string.strip().split(splitter)
    return(mins+"."+secs)
# sort and take the three fastest times
yssj=get_filecontent(r"D:\pydj\james.txt")    # raw string keeps the backslashes literal
print(yssj)
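# How the data is derived:
# 1. iterate over every element of yssj and clean it into mins.secs form
# 2. set: turn the cleaned data into a set; a set cannot hold duplicates, so this removes them
# 3. sorted: make a sorted copy of the set
# 4. [0:3]: take the first three items (the three fastest times)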
print(sorted(set([sanitize(i)for i in yssj]))[0:3])
#clean_sj=[]
#for i in yssj:
#    clean_sj.append(sanitize(i))
#print(clean_sj)
#print(sorted(set(clean_sj))[0:3])
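The program above handles only james.txt, while the overall goal covers all four athletes. A minimal sketch extending it to the four records listed at the start; the helper `top_three` is hypothetical, and the data is kept in memory instead of being read from D:\pydj so the example runs anywhere.

```python
def sanitize(time_string):
    # Turn "m-ss" or "m:ss" into "m.ss"; leave "m.ss" as it is.
    if "-" in time_string:
        splitter = "-"
    elif ":" in time_string:
        splitter = ":"
    else:
        return time_string
    (mins, secs) = time_string.split(splitter)
    return mins + "." + secs

def top_three(times):
    # Hypothetical helper: clean, de-duplicate, sort, and keep the first three.
    return sorted(set([sanitize(t) for t in times]))[0:3]

# The four records listed at the start of the article.
records = {
    "james": "2-34,3:21,2.34,2.45,3.01,2:01,2:01,3:10,2-22",
    "julie": "2.59,2.11,2:11,2:23,3-10,2-23,3:10,3.21,3-21",
    "mikey": "2:22,3.01,3:01,3.02,3:02,3.02,3:22,2.49,2:38",
    "sarah": "2:58,2.58,2:39,2-25,2-55,2:54,2.18,2:55,2:55",
}

results = {}
for name, line in records.items():
    results[name] = top_three(line.split(","))
    print(name, results[name])
```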

posted @ 2017-06-21 14:53  花花妹子。