Python linecache and glob modules

Today I learned about two fun modules: linecache and glob.

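The linecache module

Python has a handy module called linecache. It lets you get any line from any file and uses a cache to optimize this; the common case is reading many lines from a single file.

# As the name linecache suggests, the module is built around a cache

# linecache first reads the file into the cache, so later accesses to the file do not have to go back to the disk

# That makes it a good fit for files that are read very frequently

See also: open()

linecache provides the following functions: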

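linecache.getlines(filename, module_globals=None)

Returns the entire content of the file named filename as a list with one element per line; line number linenum is stored at index linenum-1.

linecache.getline(filename, lineno, module_globals=None)

Returns line lineno from the file named filename. This function never raises an exception: on error it returns '' (an empty string). When the line is found, its trailing newline is included.

If the file is not found and filename is a relative path, the function also looks for it relative to the entries in sys.path.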
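To make the "never raises" behaviour concrete, here is a minimal sketch. The temporary directory and the file names getline_demo.txt and no_such_file.txt are made up for the demonstration and are not part of the original post.

```python
import linecache
import os
import tempfile

# Hypothetical sample file, created only for this demonstration.
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, 'getline_demo.txt')
with open(sample_path, 'w') as f:
    f.write('first\nsecond\n')

# An existing line comes back with its trailing newline.
second = linecache.getline(sample_path, 2)

# A line number past the end of the file gives '' instead of an error.
beyond = linecache.getline(sample_path, 99)

# A file that does not exist also gives '' instead of an error.
missing = linecache.getline(os.path.join(tmp_dir, 'no_such_file.txt'), 1)

print(repr(second))   # 'second\n'
print(repr(beyond))   # ''
print(repr(missing))  # ''
```

Because errors are reported as an empty string, a caller cannot tell a missing file from a line number that is out of range; if that difference matters, checking with os.path.exists() first is one option.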

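linecache.clearcache()

Clears the cache. Use it when you no longer need the lines previously read with getline().

linecache.checkcache(filename=None)

Checks the validity of the cache. Use this function if files in the cache may have changed on disk and you need the updated version. If filename is omitted, every entry in the cache is checked.

linecache.updatecache(filename, module_globals=None)

Updates the cache entry for filename and returns the file's lines. If the file has changed, calling this function brings the list returned by linecache.getlines(filename) up to date.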

Here is an example:

import linecache
import pprint

# Create a file with four lines
filename = 'linecacheTest.txt'
with open(filename, 'w') as myfile:
    for i in range(1, 5):
        myfile.write('This is the ' + str(i) + 'th line\n')

# Get all the lines
pprint.pprint(linecache.getlines(filename))

# Get any single line (here, line 3)
pprint.pprint(linecache.getline(filename, 3))

# Get lines 3 and 4
pprint.pprint(linecache.getlines(filename)[2:4])

# Release the cache
linecache.clearcache()

The result is:

['This is the 1th line\n',
 'This is the 2th line\n',
 'This is the 3th line\n',
 'This is the 4th line\n']
'This is the 3th line\n'
['This is the 3th line\n', 'This is the 4th line\n']

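Note: once linecache.getlines(filename) has read a file, later calls to linecache.getlines(filename) keep returning the cached content, not the latest content, even if the file has changed on disk in the meantime. There are two ways to get the latest content:

1. Call linecache.checkcache(filename) first. It discards the cache entry if the file on disk has changed, so the next linecache.getlines(filename) re-reads the file and returns its latest content.

2. Call linecache.updatecache(filename) directly, which re-reads the file and returns its latest content.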
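The two approaches can be sketched as follows. The file name refresh_demo.txt and its contents are invented for illustration. The rewritten file is deliberately given a different size each time, on the assumption that checkcache detects changes by comparing the file's size and modification time with what it recorded when caching.

```python
import linecache
import os
import tempfile

# Hypothetical file used only for this demonstration.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'refresh_demo.txt')

with open(path, 'w') as f:
    f.write('old\n')
before = linecache.getlines(path)      # the file is read and cached here

with open(path, 'w') as f:
    f.write('new content\n')           # the file changes on disk
stale = linecache.getlines(path)       # still answered from the cache

# Approach 1: invalidate the stale entry, then read again.
linecache.checkcache(path)
fresh = linecache.getlines(path)

# Approach 2: force a re-read; updatecache returns the new lines itself.
with open(path, 'w') as f:
    f.write('even newer content\n')
via_update = linecache.updatecache(path)

print(before, stale, fresh, via_update)
linecache.clearcache()
```

checkcache only drops entries that it finds out of date, so it is cheap to call before every read of a file that may change.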

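In addition:

1) When you have finished reading and no longer need the cached content, call linecache.clearcache() at the end to free the cache.

2) This module keeps file contents in memory, so it costs memory: the larger the files you read, the more memory the cache takes, and how large a file you can comfortably cache depends on how much memory you have.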

The glob module

Globbing means wildcard matching; this module finds the file path names that match a given pattern.

The commonly used wildcards are:

* matches everything (any sequence of characters, including none)

? matches any single character

[seq] matches any character in seq

[!seq] matches any character not in seq
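The four wildcards can be tried out on a small throwaway directory. In this sketch the file names and the helper match() are hypothetical, chosen only to show each pattern; the results are sorted because glob does not promise any particular order.

```python
import glob
import os
import tempfile

# Create a few empty, made-up files to match against.
tmp_dir = tempfile.mkdtemp()
for name in ('a1.txt', 'a2.txt', 'ab.txt', 'b1.txt', 'notes.md'):
    open(os.path.join(tmp_dir, name), 'w').close()

def match(pattern):
    """Return the sorted base names in tmp_dir that match pattern."""
    # glob.escape keeps any special characters in tmp_dir itself literal.
    paths = glob.glob(os.path.join(glob.escape(tmp_dir), pattern))
    return sorted(os.path.basename(p) for p in paths)

star = match('*.txt')            # any run of characters before '.txt'
question = match('?1.txt')       # exactly one character before '1.txt'
in_seq = match('a[0-9].txt')     # 'a' followed by one digit
not_in_seq = match('a[!0-9].txt')  # 'a' followed by one non-digit

print(star)        # ['a1.txt', 'a2.txt', 'ab.txt', 'b1.txt']
print(question)    # ['a1.txt', 'b1.txt']
print(in_seq)      # ['a1.txt', 'a2.txt']
print(not_in_seq)  # ['ab.txt']
```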

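glob provides the following functions:

glob.glob(pathname)

Returns a list of all matching file paths. Its pathname argument defines the matching rule; it can be an absolute or a relative path and may contain wildcards. (Python 3.5 and later also accept a recursive keyword argument.)

glob.iglob(pathname)

Returns an iterator that yields the matching path names one by one. The difference from glob.glob() is that glob.glob collects all the matching paths at once, while glob.iglob produces one matching path at a time; it is typically used to process each path in a loop.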
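A small sketch of the difference between the two calls, using made-up file names in a temporary directory: glob.glob hands back a finished list, while glob.iglob hands back an iterator that is consumed as you go.

```python
import glob
import os
import tempfile

# Hypothetical files for the demonstration.
tmp_dir = tempfile.mkdtemp()
for name in ('one.py', 'two.py', 'three.txt'):
    open(os.path.join(tmp_dir, name), 'w').close()

pattern = os.path.join(glob.escape(tmp_dir), '*.py')

as_list = glob.glob(pattern)    # a complete list of matches
as_iter = glob.iglob(pattern)   # an iterator over the same matches

first = next(as_iter)           # take one path
remaining = list(as_iter)       # the iterator continues where it left off

print(type(as_list).__name__, len(as_list))
print(os.path.basename(first), [os.path.basename(p) for p in remaining])
```

An iterator can only be walked once, so use glob.glob when the matches are needed several times and glob.iglob when each path is handled once in a loop.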

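glob.escape(pathname)

Escapes all wildcard characters. If a file name itself contains wildcard characters ('*', '?' or '[') and you do not want to escape them one by one by hand, use this function: it wraps each of them in brackets (for example, '*' becomes '[*]') so that the name is matched literally.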
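As an illustration, consider a file whose name contains brackets. The names data[1].txt and data1.txt below are invented for the example; brackets are used rather than '*' or '?' because some file systems do not allow those characters in file names.

```python
import glob
import os
import tempfile

# Two hypothetical files: one with literal brackets in its name.
tmp_dir = tempfile.mkdtemp()
for name in ('data[1].txt', 'data1.txt'):
    open(os.path.join(tmp_dir, name), 'w').close()

base = glob.escape(tmp_dir)

# Unescaped, '[1]' is read as "the single character 1".
unescaped = glob.glob(os.path.join(base, 'data[1].txt'))

# Escaped, the brackets are matched literally.
literal_pattern = glob.escape('data[1].txt')
escaped = glob.glob(os.path.join(base, literal_pattern))

print(literal_pattern)                            # data[[]1].txt
print([os.path.basename(p) for p in unescaped])   # ['data1.txt']
print([os.path.basename(p) for p in escaped])     # ['data[1].txt']
```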

Here is an example:

>>> import glob
>>> print(glob.glob('a*.*'))
['autohomehtml.html', 'autohomeParser.py', 'autotemp.txt', 'autotempfile1.txt']
>>> print(glob.glob('*.py'))
['autohomeParser.py', 'beautifulSoupTest.py', 'collectionsTest.py', 'itertoolsTest.py', 'linecacheTest.py', 'linecacheTest_forBlog.py', 'lxmlTest.py', 'myRe.py', 'PyQtTest.py', 'requestsTest.py', 'tablibTest.py', 'timeitTest.py', 'urllibtest.py']
>>> print(glob.glob('*[0-9]*.*'))
['autotempfile1.txt']
>>> print(glob.glob('*.txt'))
['autotemp.txt', 'autotempfile1.txt', 'linecacheTest.txt', 'linecachetext.txt', 'mypage.txt', '安装scrapy.txt']
>>> for i in glob.iglob('*.py'):
...     print(i)
...
autohomeParser.py
beautifulSoupTest.py
collectionsTest.py
itertoolsTest.py
linecacheTest.py
linecacheTest_forBlog.py
lxmlTest.py
myRe.py
PyQtTest.py
requestsTest.py
tablibTest.py
timeitTest.py
urllibtest.py

posted @ 2014-12-01 10:07 Callingwisdom
