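Python CSV Storage

Compared with many other languages, working with file handles in Python is both concise and convenient. Common storage formats include TXT, JSON, and CSV. This article covers storing data in CSV files.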

 

Writing:

Let's start with the simplest example:

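import csv

# newline='' lets the csv module control line endings (avoids blank lines on Windows)
with open('./data.csv', mode='w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['id', 'name', 'age'])
    writer.writerow(['1', 'ccdjun', '20'])
    writer.writerow(['2', 'bob', '33'])
    writer.writerow(['3', 'alex', '22'])

First, open data.csv with the mode set to w. Then create a writer object, passing in the file handle csvfile. Finally, call writerow() once per row. After the script runs, a data.csv file is generated with the following content:

id,name,age
1,ccdjun,20
2,bob,33
3,alex,22

 

You can also write several rows at once with writerows(). In that case the argument must be a two-dimensional list, that is, a list of rows:

import csv

with open('./data.csv', mode='w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['id', 'name', 'age'])
    writer.writerows([['1', 'ccdjun', '20'], ['2', 'bob', '33'], ['3', 'alex', '22']])

The result is the same as above.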
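One detail worth seeing in action is how the writer handles a field that itself contains a comma. The sketch below writes to an in-memory buffer instead of a file so that the exact text is easy to inspect; rows_to_csv_text is a hypothetical helper introduced only for this illustration, and the name "Smith, Bob" is made-up sample data.

```python
import csv
import io


def rows_to_csv_text(rows):
    """Serialize rows to CSV text in memory (hypothetical helper for illustration)."""
    buffer = io.StringIO(newline='')
    writer = csv.writer(buffer)
    writer.writerows(rows)
    return buffer.getvalue()


text = rows_to_csv_text([
    ['id', 'name', 'age'],
    ['1', 'ccdjun', '20'],
    ['2', 'Smith, Bob', '33'],  # contains a comma, so the writer quotes this field
])
print(text)
```

With the default settings, the writer quotes only the fields that need it and ends each row with \r\n, which is why the file examples above are opened with newline=''.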

 

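In practice, however, the data a crawler collects is usually structured, and we typically represent each record as a dictionary. The csv library also provides a way to write dictionaries: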

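import csv

with open('./content.csv', mode='w', newline='') as csvfile:
    fieldnames = ['id', 'name', 'age']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    # every key must appear in fieldnames, otherwise DictWriter raises ValueError
    writer.writerow({'id': 1, 'name': 'ccdjun', 'age': 22})
    writer.writerow({'id': 2, 'name': 'alex', 'age': 25})
    writer.writerow({'id': 3, 'name': 'bob', 'age': 32})

Here we first define three fields in fieldnames and pass them to DictWriter to create a dictionary writer. We then call writeheader() to write the header row, and call writerow() with one dictionary per record. The resulting file has the same layout as before: a header row followed by one line per record.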
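Scraped records are often incomplete or carry extra keys, so it may help to see how DictWriter can be told to tolerate both. This is a minimal sketch using an in-memory buffer; the sample records and the 'city' key are assumptions made up for the illustration, while restval and extrasaction are standard DictWriter parameters.

```python
import csv
import io

fieldnames = ['id', 'name', 'age']
records = [
    {'id': 1, 'name': 'ccdjun', 'age': 22},
    {'id': 2, 'name': 'alex'},                            # 'age' is missing
    {'id': 3, 'name': 'bob', 'age': 32, 'city': 'n/a'},   # 'city' is not a field
]

buffer = io.StringIO(newline='')
writer = csv.DictWriter(
    buffer,
    fieldnames=fieldnames,
    restval='',             # value written for missing keys
    extrasaction='ignore',  # silently drop keys that are not in fieldnames
)
writer.writeheader()
writer.writerows(records)

dict_output = buffer.getvalue()
print(dict_output)
```

Without extrasaction='ignore', the default is 'raise', and the third record would cause a ValueError. Whether ignoring or raising is the better choice depends on how strictly you want to validate the scraped data.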

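To append to the file rather than overwrite it, set mode to 'a'. To write Chinese text, specify the encoding explicitly by adding encoding='utf-8' after mode, because the default encoding depends on the platform:

with open('./data.csv', mode='a', encoding='utf-8', newline='') as csvfile:
    writer = csv.writer(csvfile)  # rows written here are added to the end of the file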
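One thing to watch for in append mode is the header: if writeheader() is called every time the file is opened, the header line is repeated in the middle of the data. A possible way around this, sketched below, is to write the header only when the file does not exist yet or is empty. The append_record helper and the temporary demo path are assumptions made for this illustration, not part of the csv library.

```python
import csv
import os
import tempfile


def append_record(path, record, fieldnames):
    """Append one record; write the header only if the file is new or empty."""
    needs_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, mode='a', encoding='utf-8', newline='') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        if needs_header:
            writer.writeheader()
        writer.writerow(record)


# Demo in a temporary directory so no existing file is touched
demo_path = os.path.join(tempfile.mkdtemp(), 'data.csv')
append_record(demo_path, {'id': 1, 'name': '小明'}, ['id', 'name'])
append_record(demo_path, {'id': 2, 'name': 'bob'}, ['id', 'name'])

# Read the file back to confirm there is exactly one header row
with open(demo_path, encoding='utf-8', newline='') as csvfile:
    appended_rows = list(csv.reader(csvfile))
print(appended_rows)
```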

 

Reading:

The csv library can likewise be used to read CSV files:

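import csv

# newline='' is the recommended way to open files for the csv module
with open('./data.csv', mode='r', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        print(row)  # each row is a list of strings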
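Since csv.reader yields plain lists of strings, it can be convenient to read rows back as dictionaries instead, mirroring the dictionary-based writing shown earlier. The sketch below uses csv.DictReader on an in-memory copy of the sample data; the sample string and the int() conversions are assumptions for the illustration.

```python
import csv
import io

# In-memory stand-in for the data.csv content shown earlier
sample = 'id,name,age\n1,ccdjun,20\n2,bob,33\n3,alex,22\n'

reader = csv.DictReader(io.StringIO(sample, newline=''))

# The first row is used as the keys; all values arrive as strings,
# so numeric columns are converted explicitly.
people = [
    {'id': int(row['id']), 'name': row['name'], 'age': int(row['age'])}
    for row in reader
]
print(people)
```

For a real file, the same DictReader call works with the file object returned by open() in place of the StringIO buffer.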

 

Below is an example of a crawler that stores its results in a CSV file:

import csv

import requests
from lxml import etree

headers = {
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.81 Safari/537.36',
}

# Open the file once and write the header once, rather than once per record;
# mode='w' starts a fresh file on every run
with open('./content.csv', mode='w', encoding='utf-8', newline='') as csvfile:
    fieldnames = ['author', 'content']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    for i in range(1, 11):
        url = f'https://www.qiushibaike.com/text/page/{i}/'  # build the URLs of the first 10 Qiushibaike pages
        response = requests.get(url=url, headers=headers).text
        tree = etree.HTML(response)
        div_list = tree.xpath('//*[@id="content"]/div/div[2]/div')
        for div in div_list:
            content = div.xpath('./a/div/span//text()')[0]
            author = div.xpath('./div/a[2]/h2/text()')[0]
            # print(author, content)
            writer.writerow({'author': author, 'content': content})

The crawler example is for learning and reference only and should not be used for any other purpose.

 

posted @ 2021-04-14 23:26  Ccdjun