Counting the lines of same-type files under a directory - oyzway

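I hadn't written code in a long time and had grown rusty. Over the past two days I wrote some, and I wanted to count how many lines of py code that came to. Opening the files one by one and counting by hand is obviously not realistic. Python!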

It is only about ten lines of code, yet it still cost me quite a bit of time. Sigh.

Let's get to work.

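1. First, list what I want to achieve:

1) get the line count of a given type of file under a directory (including subdirectories);

2) let the file types to count be customized;

3) catch invalid arguments;

4) wrap it up so it can be used as a module.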

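2. Start with the first goal. I remember this line from [可爱的Python]:

Files are the system's business; system -> operating system -> os module.

After that it is simple. import os; dir(os) lists every method and attribute available in the os module.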

os.py

>>> import os
>>> dir(os)
['F_OK', 'O_APPEND', 'O_BINARY', 'O_CREAT', 'O_EXCL', 'O_NOINHERIT', 'O_RANDOM', 'O_RDONLY', 'O_RDWR', 'O_SEQUENTIAL', 'O_SHORT_LIVED', 'O_TEMPORARY', 'O_TEXT', 'O_TRUNC', 'O_WRONLY', 'P_DETACH', 'P_NOWAIT', 'P_NOWAITO', 'P_OVERLAY', 'P_WAIT', 'R_OK', 'SEEK_CUR', 'SEEK_END', 'SEEK_SET', 'TMP_MAX', 'UserDict', 'W_OK', 'X_OK', '_Environ', '__all__', '__builtins__', '__doc__', '__file__', '__name__', '__package__', '_copy_reg', '_execvpe', '_exists', '_exit', '_get_exports_list', '_make_stat_result', '_make_statvfs_result', '_pickle_stat_result', '_pickle_statvfs_result', 'abort', 'access', 'altsep', 'chdir', 'chmod', 'close', 'closerange', 'curdir', 'defpath', 'devnull', 'dup', 'dup2', 'environ', 'errno', 'error', 'execl', 'execle', 'execlp', 'execlpe', 'execv', 'execve', 'execvp', 'execvpe', 'extsep', 'fdopen', 'fstat', 'fsync', 'getcwd', 'getcwdu', 'getenv', 'getpid', 'isatty', 'linesep', 'listdir', 'lseek', 'lstat', 'makedirs', 'mkdir', 'name', 'open', 'pardir', 'path', 'pathsep', 'pipe', 'popen', 'popen2', 'popen3', 'popen4', 'putenv', 'read', 'remove', 'removedirs', 'rename', 'renames', 'rmdir', 'sep', 'spawnl', 'spawnle', 'spawnv', 'spawnve', 'startfile', 'stat', 'stat_float_times', 'stat_result', 'statvfs_result', 'strerror', 'sys', 'system', 'tempnam', 'times', 'tmpfile', 'tmpnam', 'umask', 'unlink', 'unsetenv', 'urandom', 'utime', 'waitpid', 'walk', 'write']

Skimming through, I spotted two useful things: path and walk. Good, on to the next dir.

os.path

>>> dir(os.path)
['__all__', '__builtins__', '__doc__', '__file__', '__name__', '__package__', '_getfullpathname', 'abspath', 'altsep', 'basename', 'commonprefix', 'curdir', 'defpath', 'devnull', 'dirname', 'exists', 'expanduser', 'expandvars', 'extsep', 'genericpath', 'getatime', 'getctime', 'getmtime', 'getsize', 'isabs', 'isdir', 'isfile', 'islink', 'ismount', 'join', 'lexists', 'normcase', 'normpath', 'os', 'pardir', 'pathsep', 'realpath', 'relpath', 'sep', 'split', 'splitdrive', 'splitext', 'splitunc', 'stat', 'supports_unicode_filenames', 'sys', 'walk', 'warnings']

That's better: two familiar functions! os.path.split() and os.path.splitext(). I had forgotten how to use them, though, so: help!

help

>>> help(os.path.split)
Help on function split in module ntpath:

split(p)
    Split a pathname.
    
    Return tuple (head, tail) where tail is everything after the final slash.
    Either part may be empty.

>>> help(os.path.splitext)
Help on function splitext in module ntpath:

splitext(p)
    Split the extension from a pathname.
    
    Extension is everything from the last dot to the end, ignoring
    leading dots.  Returns "(root, ext)"; ext may be empty.
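To make the two docstrings concrete, here is a small sketch written for Python 3 (the transcripts in this post come from Python 2.6). The paths are made-up examples, not files from my_work.

```python
import os.path

# Split a path into (directory part, final component).
head, tail = os.path.split("my_work/tools/count_line.py")
print(head, tail)

# Split the final component into (root, extension); the extension keeps its dot.
root, ext = os.path.splitext(tail)
print(root, ext)

# Leading dots are ignored, so a dotfile has an empty extension.
dotfile = os.path.splitext(".bashrc")
print(dotfile)

# Only the last dot counts as the start of the extension.
double = os.path.splitext("archive.tar.gz")
print(double)
```

For this job the useful part is splitext(name)[1], which can be compared against '.py'.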

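There, finally something I can actually use! Time to experiment. So far I have os.walk, os.path.split and os.path.splitext. But help(os.walk) gave me a fright: a full 60 lines of pydoc. Playing to my strengths, I searched it for the usable bits: ``dirpath, dirnames, filenames`` and ``for root, dirs, files in os.walk('python/Lib/email'):``. There is an Example, so now I know how to use it! Let's get started.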

os.walk

>>> p = os.walk('G:\\Python26\\my_work')
>>> p
<generator object walk at 0x014B1698>
>>> print p
<generator object walk at 0x014B1698>
>>>

Hmm, apparently that is not how it is used. But "generator object" looks familiar. A for loop, perhaps?

for

>>> for f in p:
    print f,

... (a large amount of output omitted here) ...

That gives me what I wanted. And don't forget that help(os.walk) came with an Example.

for_2

for root, dirs, files in os.walk('G:\\Python26\\my_work'):
    print files,

... (a large amount of output omitted here) ...
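Since the real output is too long to show, here is a small Python 3 sketch of what the generator yields, run against a throwaway tree. The directory and file names (sub, a.py, b.py) are invented for the illustration.

```python
import os
import shutil
import tempfile

# Build a tiny throwaway tree: top/a.py and top/sub/b.py (hypothetical names).
top = tempfile.mkdtemp()
os.mkdir(os.path.join(top, "sub"))
for rel in ("a.py", os.path.join("sub", "b.py")):
    with open(os.path.join(top, rel), "w") as fh:
        fh.write("pass\n")

walker = os.walk(top)   # a generator: nothing has been traversed yet
first = next(walker)    # (dirpath, dirnames, filenames) for the top directory
rest = list(walker)     # the remaining directories, here only "sub"

print(first)
print(rest)

shutil.rmtree(top)      # remove the temporary tree again
```

Each step of the loop is one directory: its path, the subdirectories in it, and the plain files in it.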

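Now I can pull out the files under a directory, subdirectories included! But output printed like this is of no use; it has to be stored somewhere. A list?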

all_files

>>> all_files = []
>>> for root, dirs, files in os.walk('G:\\Python26\\my_work'):
    all_files.extend(files)

    
>>> all_files
... (a large amount of output omitted here) ...

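Good, now it is just a matter of opening each file and counting its lines. But another problem turns up. I have the file names but not their paths, and Python will not work out each file's path for you before opening it. So every file needs its directory path prepended.

Remember that the os.walk Example is written as for root, dirs, files; that root is the directory name. What a handy function.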

all_files_2

>>> all_files = []
>>> for root, dirs, files in os.walk('G:\\Python26\\my_work'):
    for f in files:
        all_files.append(root + '\\' + f)

        
>>> all_files
... (a large amount of output omitted here) ...
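Gluing the path together with '\\' only works on Windows. A portable variant, sketched here for Python 3, lets os.path.join pick the separator. The helper name list_files and the demo tree are my own invention for this illustration.

```python
import os
import shutil
import tempfile

def list_files(top):
    """Return the full path of every file under top, subdirectories included."""
    paths = []
    for root, dirs, files in os.walk(top):
        for name in files:
            # os.path.join inserts the separator of the current platform
            paths.append(os.path.join(root, name))
    return paths

# Demo on a throwaway tree: top/a.py and top/sub/b.txt (hypothetical names).
top = tempfile.mkdtemp()
os.mkdir(os.path.join(top, "sub"))
for rel in ("a.py", os.path.join("sub", "b.txt")):
    open(os.path.join(top, rel), "w").close()

found = sorted(list_files(top))
print(found)

shutil.rmtree(top)
```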

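By the way, in this output you may see things like ~\xbb\xf9\xb4. Don't worry: the directory simply contains files with Chinese names. You can ignore it, or fix it with an appropriate encoding. That is a matter for later, once the code lives in a file. Right. At this point I can get what I need: files with their paths.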
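Those \x.. escapes are just the raw bytes of the file name. A minimal Python 3 sketch of the idea, under the assumption that the names were stored in GBK (the usual code page on a Chinese Windows install); the file name below is a made-up example:

```python
# A hypothetical Chinese file name, used only for illustration.
name = "基础.py"

# Encode it the way a GBK code page would; each Chinese character becomes two bytes.
raw = name.encode("gbk")
print(repr(raw))            # the bytes are displayed with \x.. escapes

# Decoding with the same encoding brings the readable name back.
restored = raw.decode("gbk")
print(restored)
```

In Python 2 a plain str holds such bytes directly, which is why the interactive session shows the escapes.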

Next comes opening the files and counting the lines. That part is fairly simple (of course, this whole post is simple ^^).

line_num

>>> for f in all_files:
    pass

# I had forgotten to set up line_num to hold the count, hence the pass above

>>> line_num = 0
>>> for f in all_files:
    line_num += len(open(f).readlines())

    
>>> line_num
1288
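The session above never closes the files and reads each one fully into a list. A gentler variant, sketched for Python 3, iterates over the file inside a with block. The helper name count_file_lines and the sample content are assumptions made for this example.

```python
import os
import tempfile

def count_file_lines(path):
    """Count lines by iterating, so the whole file is never held in memory."""
    # Binary mode avoids decoding errors on files in an unknown encoding.
    with open(path, "rb") as fh:
        return sum(1 for _ in fh)

# Demo: three lines, the last one without a trailing newline.
fd, sample = tempfile.mkstemp(suffix=".py")
with os.fdopen(fd, "w") as fh:
    fh.write("import os\nprint(os.name)\n# no trailing newline")

n = count_file_lines(sample)
print(n)

os.remove(sample)
```

A final line without a newline still counts, which matches what len(open(f).readlines()) reports.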

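All right, half done.

The title of this post is about counting the lines of files under a directory. So far I have only counted lines in text files, on the assumption that the directory holds nothing but py files. Either way, there is a result. Let's put all the code that matters in one place.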

final

>>> all_files = []
>>> for root, dirs, files in os.walk('G:\\Python26\\my_work'):
    for f in files:
        all_files.append(root + '\\' + f)

        
>>> line_num = 0
>>> for f in all_files:
    line_num += len(open(f).readlines())

    
>>> line_num
1288

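Just as I finished this, I came across Hong Qiangning's slides from QCon: Python于Web 2.0网站的应用 (Python in Web 2.0 sites) - QCon Beijing 2010. Wow, these few lines of mine still have plenty of room for improvement.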

Highlights: 1) os.path.join(dirpath, filename); 2) len(list(open(path))); 3) ext.
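For the third highlight, ext, one possible way to test a file name against a set of wanted extensions is sketched below in Python 3. The helper has_ext is hypothetical, not something from the slides; it accepts extensions with or without the leading dot and ignores case.

```python
import os.path

def has_ext(filename, exts):
    """True if filename's extension is one of exts (dot optional, case-insensitive)."""
    # Normalize every wanted extension to the ".py" form that splitext returns.
    wanted = {"." + e.lower().lstrip(".") for e in exts}
    return os.path.splitext(filename)[1].lower() in wanted

print(has_ext("count_line.py", ["py"]))
print(has_ext("README", ["py"]))
```

Comparing whole extensions this way keeps a file such as a.pyc from being counted as py.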

To be continued... I have to keep working at it.

-------------20111217 18:00 updated-------------

count_line.py

import os

def count_lines(*args, **kwargs):
    # Ask for the directory to scan and the extension(s) to count, e.g. ".py"
    path = raw_input('Input the path of the file: ')
    ext = raw_input('Input the file ext: ')
    
    # Maps each matched extension to its total line count
    num_dict = {}

    for root, dirs, files in os.walk( path ):
        for f in files:
            ext_key = os.path.splitext(f)[1]
            # Skip files without an extension: '' is a substring of every string
            if ext_key and ext_key in ext:
                file_path = os.path.join(root, f)
                # dict.get supplies 0 the first time an extension is seen
                num_dict[ ext_key ] = num_dict.get(ext_key, 0) + len(list(open(file_path)))
                    
    return num_dict

if __name__ == '__main__':
    print count_lines()
    raw_input()  # keep the console window open until Enter is pressed
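The script above still reads its input with raw_input, so goals 3) and 4) from the list at the top are not really met. Below is a rough Python 3 sketch of a version that takes arguments and rejects bad ones. The signature count_lines(top, exts), the ValueError choice and the demo tree are all my own assumptions, not part of the original script.

```python
import os
import shutil
import tempfile
from collections import Counter

def count_lines(top, exts):
    """Return {extension: total lines} for files under top whose extension is in exts."""
    if not os.path.isdir(top):
        raise ValueError("not a directory: %r" % (top,))
    # Accept "py" or ".py"; drop blank entries.
    wanted = {"." + e.strip().lower().lstrip(".") for e in exts if e.strip().strip(".")}
    if not wanted:
        raise ValueError("no file extension given")

    totals = Counter()
    for root, dirs, files in os.walk(top):
        for name in files:
            ext = os.path.splitext(name)[1].lower()
            if ext in wanted:
                with open(os.path.join(root, name), "rb") as fh:
                    totals[ext] += sum(1 for _ in fh)
    return dict(totals)

# Demo on a throwaway tree (all names are hypothetical).
top = tempfile.mkdtemp()
os.mkdir(os.path.join(top, "sub"))
samples = {
    "a.py": "x = 1\ny = 2\n",
    os.path.join("sub", "b.py"): "a\nb\nc\n",
    "notes.txt": "hello\n",
    "README": "no extension\n",
}
for rel, text in samples.items():
    with open(os.path.join(top, rel), "w") as fh:
        fh.write(text)

py_only = count_lines(top, ["py"])
py_and_txt = count_lines(top, [".py", "txt"])
print(py_only, py_and_txt)

# An invalid path is reported with an exception instead of an empty result.
try:
    count_lines(os.path.join(top, "no_such_dir"), ["py"])
    rejected = False
except ValueError as exc:
    rejected = True
    print("rejected:", exc)

shutil.rmtree(top)
```

Saved as a module, this could be imported and called from other code, and a small wrapper could still prompt for the two values when it is run directly.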

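Screenshot of a run:

I packaged it as an exe file; for the packaging method, see here. Anyone who needs this but has no py development environment installed can give it a try. Just double-click it.

Attachment: https://files.cnblogs.com/way_testlife/count_lines.rar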

posted on 2011-12-17 13:01 oyzway