Automate the Boring Stuff with Python: Working with CSV Files and JSON Data

The csv Module

Reader Objects

Representing a CSV file as a list of lists

>>> import csv
>>> exampleFile = open('example.csv')
>>> exampleReader = csv.reader(exampleFile)
>>> exampleData = list(exampleReader)
>>> exampleData
[['4/5/2014 13:34', 'Apples', '73'], ['4/5/2014 3:41', 'Cherries', '85'], ['4/6/2014 12:46', 'Pears', '14'], ['4/8/2014 8:59', 'Oranges', '52'], ['4/10/2014 2:07', 'Apples', '152'], ['4/10/2014 18:10', 'Bananas', '23'], ['4/10/2014 2:40', 'Strawberries', '98']]
>>>

Use the expression exampleData[row][col] to access the value at a particular row and column.

>>> exampleData[0][0]
'4/5/2014 13:34'
>>>
>>> exampleData[0][1]
'Apples'
>>> exampleData[0][2]
'73'
>>> exampleData[1][1]
'Cherries'
>>> exampleData[1][0]
'4/5/2014 3:41'
>>> exampleData[6][1]
'Strawberries'
>>>
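Every value that csv.reader() returns is a string, including numbers such as '73'. The sketch below illustrates indexing by row and column and converting a cell before doing arithmetic. It uses an in-memory io.StringIO buffer as a stand-in for example.csv, and the three sample rows and the variable names are assumptions made for illustration.

```python
import csv
import io

# In-memory stand-in for example.csv (first three rows only).
sample_text = (
    "4/5/2014 13:34,Apples,73\n"
    "4/5/2014 3:41,Cherries,85\n"
    "4/6/2014 12:46,Pears,14\n"
)

example_data = list(csv.reader(io.StringIO(sample_text, newline="")))

# Cells come back as strings, so convert before doing arithmetic.
cherry_cell = example_data[1][2]          # still the string '85'
cherry_quantity = int(cherry_cell)        # now the integer 85

# Sum the third column across all rows.
total_quantity = sum(int(row[2]) for row in example_data)
print(total_quantity)
```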

Reading Data from Reader Objects in a for Loop

For large CSV files, use the Reader object in a for loop. This avoids loading the entire file into memory at once.

>>> import csv
>>> exampleFile = open('example.csv')
>>> exampleReader = csv.reader(exampleFile)
>>> for row in exampleReader:
...     print('Row #' + str(exampleReader.line_num) + ' ' + str(row))
...
Row #1 ['4/5/2014 13:34', 'Apples', '73']
Row #2 ['4/5/2014 3:41', 'Cherries', '85']
Row #3 ['4/6/2014 12:46', 'Pears', '14']
Row #4 ['4/8/2014 8:59', 'Oranges', '52']
Row #5 ['4/10/2014 2:07', 'Apples', '152']
Row #6 ['4/10/2014 18:10', 'Bananas', '23']
Row #7 ['4/10/2014 2:40', 'Strawberries', '98']

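After you import the csv module and make a Reader object from the CSV file, you can loop through the rows in the Reader object. Each row is a list of values, with each value representing a cell.

The print() call prints the number of the current row along with its contents. To get the row number, use the Reader object's line_num attribute, which holds the number of the current line. A Reader object can be looped over only once; to read the CSV file again, you must call csv.reader() to create a new Reader object.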
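The single-pass behavior can be seen in a small sketch. It reads from an io.StringIO buffer rather than a real file (an assumption made so the example is self-contained), and the variable names are illustrative.

```python
import csv
import io

# In-memory stand-in for a CSV file on disk.
stream = io.StringIO(
    "4/5/2014 13:34,Apples,73\n4/5/2014 3:41,Cherries,85\n", newline=""
)

reader = csv.reader(stream)
first_pass = list(reader)    # consumes every row
second_pass = list(reader)   # the Reader is exhausted, so nothing is left

# To read the data again, rewind the stream and create a new Reader.
stream.seek(0)
third_pass = list(csv.reader(stream))

print(len(first_pass), len(second_pass), len(third_pass))
```

With a real file, the equivalent of stream.seek(0) is either calling seek(0) on the file object or reopening the file before calling csv.reader() again.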

Writer Objects

>>> outputFile = open('output.csv', 'w', newline='')
>>> outputWriter = csv.writer(outputFile)
>>> outputWriter.writerow(['spam', 'eggs', 'bacon', 'ham'])
21
>>> outputWriter.writerow(['Hello, world!', 'eggs', 'bacon', 'ham'])
32
>>> outputWriter.writerow([1, 2, 3.141592, 4])
16
>>> outputFile.close()
>>>
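The numbers printed after each writerow() call are the character counts written, including the line terminator. The sketch below repeats the same three calls against an in-memory buffer (an assumption, so that the written text can be inspected directly) and shows that a value containing a comma is wrapped in double quotes automatically.

```python
import csv
import io

buffer = io.StringIO(newline="")
writer = csv.writer(buffer)

counts = [
    writer.writerow(['spam', 'eggs', 'bacon', 'ham']),
    writer.writerow(['Hello, world!', 'eggs', 'bacon', 'ham']),
    writer.writerow([1, 2, 3.141592, 4]),
]

written = buffer.getvalue()
print(repr(written))

# Reading the text back returns every value as a string,
# and the quoted cell comes back as a single value.
rows_read_back = list(csv.reader(io.StringIO(written, newline="")))
```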

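The delimiter and lineterminator Keyword Arguments

By default, the delimiter in a CSV file is a comma. The line terminator is the character sequence that ends each row; csv.writer() uses '\r\n' by default. You can change both by passing the delimiter and lineterminator keyword arguments to csv.writer().

Passing delimiter='\t' and lineterminator='\n\n' changes the character between cells to a tab and the characters between rows to two newlines. writerow() is then called three times, producing three rows.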

>>> import csv
>>> csvFile = open('example.csv', 'w', newline='')
>>> csvWriter = csv.writer(csvFile, delimiter='\t', lineterminator='\n\n')
>>> csvWriter.writerow(['apples', 'oranges', 'grapes'])
23
>>> csvWriter.writerow(['eggs', 'bacon', 'ham'])
16
>>> csvWriter.writerow(['spam', 'spam', 'spam', 'spam', 'spam', 'spam'])
31
>>> csvFile.close()
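To see what these two arguments do to the output, the following sketch writes to an in-memory buffer instead of example.csv (an assumption for illustration) and then reads the text back with a matching delimiter. The doubled line terminator shows up as blank lines on reading, which the sketch filters out.

```python
import csv
import io

buffer = io.StringIO(newline="")
writer = csv.writer(buffer, delimiter='\t', lineterminator='\n\n')
writer.writerow(['apples', 'oranges', 'grapes'])
writer.writerow(['eggs', 'bacon', 'ham'])

tsv_text = buffer.getvalue()
print(repr(tsv_text))

# The reader must be told about the tab delimiter as well.
reader = csv.reader(io.StringIO(tsv_text, newline=""), delimiter='\t')
rows = [row for row in reader if row]  # drop the blank lines between rows
```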

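Removing the Header from CSV Files

The program does the following:

Find all the CSV files in the current directory.
Read in the full contents of each file.
Write out the contents, skipping the first row, to a new CSV file.

At the code level, this means the program needs to do the following:

Loop over the list of files returned by os.listdir(), skipping the non-CSV files.
Create a CSV Reader object and read in the contents of the file, using the line_num attribute to figure out which row to skip.
Create a CSV Writer object and write the read-in data to the new file.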

#!/usr/bin/env python
import csv, os
os.makedirs('headerRemoved', exist_ok=True)
# Loop through every file in the current working directory.
for csvFilename in os.listdir('.'):
  if not csvFilename.endswith('.csv'):
    continue  # skip non-CSV files
  print('Removing header from ' + csvFilename + '...')
  # TODO: Read the CSV file in (skipping first row).
  csvRows = []
  csvFileObj = open(csvFilename)
  readerObj = csv.reader(csvFileObj)
  for row in readerObj:
    if readerObj.line_num == 1:
      continue  #skip first row
    csvRows.append(row)
  csvFileObj.close()
  # TODO: Write out the CSV file.
  csvFileObj = open(os.path.join('headerRemoved', csvFilename), 'w', newline='')
  csvWriter = csv.writer(csvFileObj)
  for row in csvRows:
    csvWriter.writerow(row)
  csvFileObj.close()

C:\PyProjects>C:/Python37/python3.exe c:/PyProjects/removeCsvHeader.py
Removing header from example.csv...
Removing header from output.csv...

Output of the run

Step 1: Loop Through Each CSV File

Loop over the list of all CSV filenames in the current working directory.

#!/usr/bin/env python
import csv, os
os.makedirs('headerRemoved', exist_ok=True)
# Loop through every file in the current working directory.
for csvFilename in os.listdir('.'):
  if not csvFilename.endswith('.csv'):
    continue  # skip non-CSV files
  print('Removing header from ' + csvFilename + '...')
  # TODO: Read the CSV file in (skipping first row).
  # TODO: Write out the CSV file.

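The os.makedirs() call creates the headerRemoved folder, where all the headerless CSV files will be written. A for loop over os.listdir('.') gets part of the way there, but it iterates over every file in the working directory, so some code is needed at the start of the loop to skip filenames that do not end with .csv. When a non-CSV file is encountered, the continue statement moves the loop on to the next filename.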
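The filtering step can be isolated into a small helper, sketched below. The function name csv_filenames and the sample listing are assumptions for illustration; in the real program the list would come from os.listdir('.'), which returns folder names (such as headerRemoved) alongside file names.

```python
def csv_filenames(names):
    """Return only the names that end with the .csv extension."""
    return [name for name in names if name.endswith('.csv')]


# Hypothetical directory listing, standing in for os.listdir('.').
listing = ['example.csv', 'notes.txt', 'output.csv', 'headerRemoved']
selected = csv_filenames(listing)
print(selected)
```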

Step 2: Read in the CSV File

As it reads, the loop needs to check whether it is currently processing the first row.

#!/usr/bin/env python
--snip--
  # TODO: Read the CSV file in (skipping first row).
  csvRows = []
  csvFileObj = open(csvFilename)
  readerObj = csv.reader(csvFileObj)
  for row in readerObj:
    if readerObj.line_num == 1:
      continue  #skip first row
    csvRows.append(row)
  csvFileObj.close()
  # TODO: Write out the CSV file.

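The Reader object's line_num attribute can be used to determine which line of the CSV file is currently being read. Another for loop iterates over the rows returned by the CSV Reader object, and every row except the first is appended to csvRows.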
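One detail worth knowing is that line_num counts physical lines read from the source, not records returned. The two differ when a quoted cell contains a line break. The sketch below uses made-up sample text to show the difference; checking line_num == 1 still identifies the first row as long as the header itself fits on one line.

```python
import csv
import io

# The second record contains a quoted cell that spans two physical lines.
text = 'name,comment\r\n"Zophie","line one\nline two"\r\n'

reader = csv.reader(io.StringIO(text, newline=""))
pairs = [(reader.line_num, row) for row in reader]

for line_num, row in pairs:
    print(line_num, row)
```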

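Step 3: Write Out the CSV File Without the First Row

Now that csvRows contains every row except the first, the list needs to be written out to a CSV file in the headerRemoved folder.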

#!/usr/bin/env python
--snip--
# Loop through every file in the current working directory.
for csvFilename in os.listdir('.'):
  if not csvFilename.endswith('.csv'):
    continue  # skip non-CSV files
 
  --snip--
  # TODO: Write out the CSV file.
  csvFileObj = open(os.path.join('headerRemoved', csvFilename), 'w', newline='')
  csvWriter = csv.writer(csvFileObj)
  for row in csvRows:
    csvWriter.writerow(row)
  csvFileObj.close()

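The CSV Writer object writes the list to a CSV file in headerRemoved, using csvFilename as the name. Because the output goes into headerRemoved, the original file is left untouched; an existing file of the same name inside headerRemoved is overwritten.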
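As an alternative to collecting rows in a list and checking line_num, the header can be discarded with next() and the remaining rows streamed straight to the output. This is a minimal sketch, not the approach used above: the function name remove_header, the sample header row, and the temporary directory are all assumptions for illustration.

```python
import csv
import os
import tempfile


def remove_header(src_path, dst_path):
    """Copy src_path to dst_path without its first row; return rows written."""
    with open(src_path, newline="") as src, \
            open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        next(reader, None)  # discard the header; no error on an empty file
        count = 0
        for row in reader:
            writer.writerow(row)
            count += 1
    return count


# Demo in a throwaway directory so no real files are touched.
with tempfile.TemporaryDirectory() as workdir:
    src_path = os.path.join(workdir, "example.csv")
    with open(src_path, "w", newline="") as f:
        csv.writer(f).writerows([
            ["date", "fruit", "quantity"],        # hypothetical header row
            ["4/5/2014 13:34", "Apples", "73"],
        ])

    out_dir = os.path.join(workdir, "headerRemoved")
    os.makedirs(out_dir, exist_ok=True)
    dst_path = os.path.join(out_dir, "example.csv")

    rows_written = remove_header(src_path, dst_path)
    with open(dst_path, newline="") as f:
        result_rows = list(csv.reader(f))

print(rows_written, result_rows)
```

Streaming row by row keeps memory use flat for large files, and the with statements close both files even if an error occurs partway through.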

JSON and APIs

An example of JSON-formatted data:

{"name": "Zophie", "isCat": true,
"miceCaught": 0, "napsTaken": 37.5,
"felineIQ": null}

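Many websites make their data available in JSON format as a way for programs to interact with the site. This is known as providing an application programming interface (API). Accessing an API is the same as accessing any other web page through a URL. The difference is that the data an API returns is formatted for machines (for example, as JSON); it is not meant to be easy for people to read.

Using APIs, you could write programs that do the following:

Scrape raw data from websites (accessing an API is often more convenient than downloading web pages and parsing HTML with Beautiful Soup).
Automatically download new posts from one of your social network accounts and post them to another account.
Pull data from Wikipedia into a text file on your computer, for example to build a "movie encyclopedia" for your personal movie collection.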

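The json Module

JSON cannot store every kind of Python value. It can contain values of only the following data types: strings, integers, floats, Booleans, lists, dictionaries, and NoneType. JSON cannot represent Python-specific objects, such as File objects, CSV Reader or Writer objects, Regex objects, or Selenium WebElement objects.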
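Both halves of that statement can be checked with a short sketch: a dictionary built from the supported types survives a round trip through JSON unchanged, while a compiled Regex object is rejected with a TypeError. The "toys" key and the variable names are assumptions added for illustration.

```python
import json
import re

supported = {
    "name": "Zophie",          # string
    "isCat": True,             # Boolean
    "miceCaught": 0,           # integer
    "napsTaken": 37.5,         # float
    "felineIQ": None,          # NoneType
    "toys": ["ball", "yarn"],  # list (hypothetical extra key)
}

# Supported types come back equal after encoding and decoding.
round_tripped = json.loads(json.dumps(supported))

# A Python-specific object such as a compiled regex cannot be encoded.
try:
    json.dumps({"pattern": re.compile(r"\d+")})
    regex_rejected = False
except TypeError:
    regex_rejected = True

print(round_tripped == supported, regex_rejected)
```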

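Reading JSON with the loads() Function

After importing the json module, you can call json.loads() and pass it a string of JSON data. Note that JSON strings always use double quotes. Here the call returns the data as a Python dictionary.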

>>> stringOfJsonData = '{"name": "Zophie", "isCat": true, "miceCaught": 0, "felineIQ": null}'
>>> import json
>>> jsonDataAsPythonValue = json.loads(stringOfJsonData)
>>> jsonDataAsPythonValue
{'name': 'Zophie', 'isCat': True, 'miceCaught': 0, 'felineIQ': None}
>>>
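The double-quote rule is strict: text that looks like a Python dictionary literal with single quotes is not valid JSON. The sketch below shows the type conversions for a valid string and the json.JSONDecodeError raised for an invalid one. The variable names are illustrative.

```python
import json

valid_text = '{"name": "Zophie", "isCat": true, "miceCaught": 0, "felineIQ": null}'
cat = json.loads(valid_text)

# JSON true/false/null become Python True/False/None.
is_cat = cat["isCat"]
feline_iq = cat["felineIQ"]

# Single-quoted keys and strings are rejected by the parser.
invalid_text = "{'name': 'Zophie'}"
try:
    json.loads(invalid_text)
    single_quotes_rejected = False
except json.JSONDecodeError:
    single_quotes_rejected = True

print(is_cat, feline_iq, single_quotes_rejected)
```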

Writing JSON with the dumps() Function

The json.dumps() function converts a Python value into a string of JSON-formatted data.

>>> pythonValue = {'isCat': True, 'miceCaught': 0, 'name': 'Zophie', 'felineIQ': None}
>>> import json
>>> stringOfJsonData = json.dumps(pythonValue)
>>> stringOfJsonData
'{"isCat": true, "miceCaught": 0, "name": "Zophie", "felineIQ": null}'
>>>

The value can only be one of the following basic Python data types: dictionary, list, integer, float, string, Boolean, or None.
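json.dumps() also accepts optional keyword arguments that control the layout of the output. The sketch below contrasts the default compact form with a pretty-printed form using indent and sort_keys, which are standard parameters of json.dumps(); the variable names are illustrative.

```python
import json

python_value = {'isCat': True, 'miceCaught': 0, 'name': 'Zophie', 'felineIQ': None}

# Default output: one line, keys in insertion order.
compact = json.dumps(python_value)

# Pretty-printed output: two-space indentation, keys sorted alphabetically.
pretty = json.dumps(python_value, indent=2, sort_keys=True)

print(compact)
print(pretty)
```

The pretty-printed form is easier to read when saving JSON to a file that people will inspect, and it decodes to exactly the same Python value.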

Fetching Current Weather Data

Complete code

#!/usr/bin/env python
# quickWeather.py - Prints the weather for a location from the command line.

import json, requests, sys
# Compute location from command line arguments.
if len(sys.argv) < 2:
  print('Usage: quickWeather.py location')
  sys.exit()
location = ' '.join(sys.argv[1:])
# Download the JSON data from OpenWeatherMap.org's API.
url = 'http://api.openweathermap.org/data/2.5/forecast/daily?q=%s&cnt=3' % (location)
response = requests.get(url)
response.raise_for_status()
# Load JSON data into a Python variable.
weatherData = json.loads(response.text)
# Print weather descriptions.
w = weatherData['list']
print('Current weather in %s:' % (location))
print(w[0]['weather'][0]['main'], '-', w[0]['weather'][0]['description'])
print()
print('Tomorrow')
print(w[1]['weather'][0]['main'], '-', w[1]['weather'][0]['description'])
print()
print('Day after tomorrow: ')
print(w[2]['weather'][0]['main'], '-', w[2]['weather'][0]['description'])

quickWeather.py

Step 1: Get the Location from the Command Line Argument

#!/usr/bin/env python
# quickWeather.py - Prints the weather for a location from the command line.
import json, requests, sys
# Compute location from command line arguments.
if len(sys.argv) < 2:
  print('Usage: quickWeather.py location')
  sys.exit()
location = ' '.join(sys.argv[1:])
# TODO: Download the JSON data from OpenWeatherMap.org's API.
# TODO: Load JSON data into a Python variable.
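The shell splits command line arguments on spaces, so a location such as San Francisco, CA arrives as several items in sys.argv. The sketch below isolates the joining logic in a function so it can be tried without running the script; the function name location_from_args and the sample argument lists are assumptions for illustration.

```python
def location_from_args(argv):
    """Join everything after the script name into one location string.

    Returns None when no location was supplied.
    """
    if len(argv) < 2:
        return None
    return ' '.join(argv[1:])


# Simulated sys.argv values.
joined = location_from_args(['quickWeather.py', 'San', 'Francisco,', 'CA'])
missing = location_from_args(['quickWeather.py'])
print(joined, missing)
```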

Step 2: Download the JSON Data

#!/usr/bin/env python
# quickWeather.py - Prints the weather for a location from the command line.

--snip--
# TODO: Download the JSON data from OpenWeatherMap.org's API.
url = 'http://api.openweathermap.org/data/2.5/forecast/daily?q=%s&cnt=3' % (location)
response = requests.get(url)
response.raise_for_status()
# TODO: Load JSON data into a Python variable.
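Locations often contain spaces and commas, which need escaping before they go into a URL. One way to handle that, sketched here with the standard library's urllib.parse.urlencode(), is to build the query string from a dictionary instead of using % formatting. The function name build_url is an assumption, and the sketch makes no network request. The service may also require additional parameters such as an API key, which the original does not show and which are left out here.

```python
from urllib.parse import urlencode

BASE_URL = 'http://api.openweathermap.org/data/2.5/forecast/daily'


def build_url(location, days=3):
    """Return the forecast URL with the query parameters safely encoded."""
    query = urlencode({'q': location, 'cnt': days})
    return BASE_URL + '?' + query


url = build_url('San Francisco, US')
print(url)
```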

Step 3: Load JSON Data and Print Weather

#!/usr/bin/env python
# quickWeather.py - Prints the weather for a location from the command line.

--snip--
# Load JSON data into a Python variable.
weatherData = json.loads(response.text)
# Print weather descriptions.
w = weatherData['list']
print('Current weather in %s:' % (location))
print(w[0]['weather'][0]['main'], '-', w[0]['weather'][0]['description'])
print()
print('Tomorrow')
print(w[1]['weather'][0]['main'], '-', w[1]['weather'][0]['description'])
print()
print('Day after tomorrow: ')
print(w[2]['weather'][0]['main'], '-', w[2]['weather'][0]['description'])
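The indexing in the print() calls can be tried without a network connection. The sketch below runs the same lookups against a hand-written JSON string. The sample text is invented for illustration: it mirrors only the keys the program reads ('list', 'weather', 'main', 'description') and is not real API output. The function name describe_days is also an assumption.

```python
import json

# Made-up stand-in for response.text; not real API output.
sample_response_text = json.dumps({
    "list": [
        {"weather": [{"main": "Clear", "description": "clear sky"}]},
        {"weather": [{"main": "Clouds", "description": "few clouds"}]},
        {"weather": [{"main": "Rain", "description": "light rain"}]},
    ]
})


def describe_days(weather_data, days=3):
    """Return one 'main - description' string per forecast day."""
    lines = []
    for day in weather_data['list'][:days]:
        first = day['weather'][0]
        lines.append(first['main'] + ' - ' + first['description'])
    return lines


weather_data = json.loads(sample_response_text)
descriptions = describe_days(weather_data)
for label, text in zip(['Today', 'Tomorrow', 'Day after tomorrow'], descriptions):
    print(label + ': ' + text)
```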

posted on 2022-05-03 19:14 HelonTian