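Python: Processing CSV Files
CSV (Comma-Separated Values) files store tabular data as plain text, with fields separated by commas, so any text editor can open them. The examples below process a CSV file in two ways: with the standard-library csv module and with pandas.
Contents of the original CSV file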
Supplier Name,Invoice Number,Part Number,Cost,Purchase Date
Supplier X,001-1001,2341,$500.00 ,1/20/14
Supplier X,001-1001,2341,$500.00 ,1/20/14
Supplier X,001-1001,5467,$750.00 ,1/20/14
Supplier X,001-1001,5467,$750.00 ,1/20/14
Supplier Y,50-9501,7009,$250.00 ,1/30/14
Supplier Y,50-9501,7009,$250.00 ,1/30/14
Supplier Y,50-9505,6650,$125.00 ,2002/3/14
Supplier Y,50-9505,6650,$125.00 ,2002/3/14
Supplier Z,920-4803,3321,$615.00 ,2002/3/14
Supplier Z,920-4804,3321,$615.00 ,2002/10/14
Supplier Z,920-4805,3321,"$6,015.00 ",2/17/14
Supplier Z,920-4806,3321,"$1,006,015.00 ",2/24/14
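The last two rows of the sample data quote the Cost field because the value itself contains commas. The following is a minimal sketch of why a CSV-aware parser is needed instead of a plain split on commas. It uses an in-memory copy of one row, and the variable names are illustrative rather than taken from the scripts in this post.

```python
import csv
import io

# One row from the sample data, held in memory for illustration
line = 'Supplier Z,920-4805,3321,"$6,015.00 ",2/17/14\n'

# Naive splitting cuts the quoted Cost field into two pieces
naive_fields = line.strip().split(",")

# csv.reader honors the quotes and keeps the field intact
parsed_fields = next(csv.reader(io.StringIO(line)))

print(len(naive_fields))   # 6
print(len(parsed_fields))  # 5
print(repr(parsed_fields[3]))
```

The parsed Cost value keeps its dollar sign, comma, and trailing space; only the surrounding quotes are removed.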
1. Working with CSV files using the csv module
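#coding=utf-8
import csv
import re
import sys

read_file = sys.argv[1]
write_file = sys.argv[2]

# newline="" lets the csv module manage line endings itself
# (this avoids blank rows in the output on Windows).
with open(read_file, "r", newline="") as readfile:
    with open(write_file, "w", newline="") as writefile:
        reader = csv.reader(readfile, delimiter=",")
        writer = csv.writer(writefile, delimiter=",")
        # Copy the header row unchanged.
        header = next(reader)
        writer.writerow(header)
        for rowlist in reader:
            # Use a regular expression to keep rows whose invoice number starts with "001-".
            if re.match(r"^001-", rowlist[1]):
                print(rowlist)
                writer.writerow(rowlist)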
>>> D:\Pystu>python parsecsvfile.py supplier_data.csv ceshi.csv
>>> Supplier Name,Invoice Number,Part Number,Cost,Purchase Date
>>> Supplier X,001-1001,2341,$500.00 ,1/20/14
>>> Supplier X,001-1001,2341,$500.00 ,1/20/14
>>> Supplier X,001-1001,5467,$750.00 ,1/20/14
>>> Supplier X,001-1001,5467,$750.00 ,1/20/14
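The same filter can also be written by column name instead of by position. The sketch below uses csv.DictReader and csv.DictWriter on in-memory files. The function name filter_by_invoice_prefix and the two-row sample are assumptions made for illustration, and a plain prefix check stands in for the regular expression.

```python
import csv
import io

def filter_by_invoice_prefix(source, target, prefix="001-"):
    """Copy the header and every row whose Invoice Number starts with prefix."""
    reader = csv.DictReader(source)
    writer = csv.DictWriter(target, fieldnames=reader.fieldnames)
    writer.writeheader()
    kept = 0
    for row in reader:
        # Look the column up by name rather than by index
        if row["Invoice Number"].startswith(prefix):
            writer.writerow(row)
            kept += 1
    return kept

# A shortened, in-memory version of the sample file
sample = io.StringIO(
    "Supplier Name,Invoice Number,Part Number,Cost,Purchase Date\n"
    "Supplier X,001-1001,2341,$500.00 ,1/20/14\n"
    "Supplier Y,50-9501,7009,$250.00 ,1/30/14\n"
)
output = io.StringIO()
kept_rows = filter_by_invoice_prefix(sample, output)
print(kept_rows)
print(output.getvalue())
```

Addressing columns by name keeps the filter working if the column order in the file changes.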
2. Working with CSV files using pandas
#coding=utf-8
'''Parse a CSV file with pandas.'''
import sys

import pandas

file_path = sys.argv[1]
write_path = sys.argv[2]

data_frame = pandas.read_csv(file_path)

# Note the .str accessor: it applies string methods to every value in the column.
# Remove the thousands separators and the "$" sign, then convert Cost to float.
data_frame["Cost"] = data_frame["Cost"].str.replace(",", "").str.strip("$").astype(float)

# Keep only the rows whose cost is greater than 600.
newa = data_frame.loc[data_frame["Cost"] > 600, :]

# Write the result without the DataFrame index column.
newa.to_csv(write_path, index = False)
>>> D:\Pystu>python parse_csv_file_by_pandas.py supplier_data.csv ceshi.csv
>>> Supplier Name,Invoice Number,Part Number,Cost,Purchase Date
>>> Supplier X,001-1001,5467,750.0,1/20/14
>>> Supplier X,001-1001,5467,750.0,1/20/14
>>> Supplier Z,920-4803,3321,615.0,2002/3/14
>>> Supplier Z,920-4804,3321,615.0,2002/10/14
>>> Supplier Z,920-4805,3321,6015.0,2/17/14
>>> Supplier Z,920-4806,3321,1006015.0,2/24/14
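To see what the replace, strip, and float conversion chain does to a single Cost value, here is a plain-Python sketch of the same steps. parse_cost is a hypothetical helper written for this illustration; it is not part of pandas or of the script above.

```python
def parse_cost(text):
    """Mirror the pandas chain on one string: drop commas, strip "$", convert to float."""
    cleaned = text.replace(",", "").strip("$")
    # float() tolerates the trailing space left in values such as "$500.00 "
    return float(cleaned)

# Three Cost values taken from the sample data
costs = ["$500.00 ", "$6,015.00 ", "$1,006,015.00 "]
values = [parse_cost(c) for c in costs]

print(values)
# Same threshold as the pandas script: keep costs greater than 600
print([v for v in values if v > 600])
```

The conversion to a number has to happen before the comparison, because the raw values are strings such as "$6,015.00 " and cannot be compared with 600 directly.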