Python Standard Library: Text

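For Python, the most obvious text-processing tool is the string class (str), but the standard library also provides many other tools that make advanced text processing easy.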

I have recently been reading the book 《Python标准库》, so below I introduce several commonly used tools:

string: text constants

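  The string module currently still has two functions that have not been removed: capwords() and maketrans(). capwords() capitalizes the first letter of every word in a string. For example (the listings in this post use Python 2 syntax):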

import string

s = "hello! this is test...."

print s                   # the original string
print string.capwords(s)  # first letter of every word capitalized

  The output is:

       hello! this is test....
       Hello! This Is Test....
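
For readers on Python 3, the same call works once print is written as a function. The sketch below also spells out what capwords() does according to its documentation: split the string into words, capitalize each one, and join them back together. The variable names are only illustrative.

```python
import string

s = "hello! this is test...."

# capwords() capitalizes the first letter of every whitespace-separated word.
capitalized = string.capwords(s)
print(capitalized)  # Hello! This Is Test....

# Roughly equivalent manual version: split, capitalize each word, rejoin.
manual = " ".join(word.capitalize() for word in s.split())
print(manual == capitalized)  # True

# With the default separator, runs of whitespace collapse to a single space
# and leading/trailing whitespace is dropped.
print(string.capwords("  hello    world  "))  # Hello World
```

One side effect worth knowing: because each word goes through str.capitalize(), the remaining letters of every word are lowercased.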

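  maketrans() creates a translation table that can be used with the translate() method to change one set of characters into another. This is more efficient than calling replace() repeatedly. For example: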

import string

# Build a table that maps a->1, b->2, ..., i->9 (Python 2 API).
pass_code = string.maketrans("abcdefghi", "123456789")
s = "the quick brown fox jumped over the lazy dog."

print s
print s.translate(pass_code)

  The output is:

        the quick brown fox jumped over the lazy dog.
        t85 qu93k 2rown 6ox jump54 ov5r t85 l1zy 4o7.

  In this example, some letters are replaced by their leet-style digit counterparts, using a mapping prepared in advance ("abcdefghi" to "123456789").
  

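Note:  translate() is a one-to-one character mapping: every occurrence of a character is replaced by its counterpart.
       replace() substitutes whole substrings: a substring is replaced only where it appears in full, and the two string arguments may differ in length.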
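
The difference in the note can be checked directly. The sketch below uses Python 3, where the table is built with str.maketrans() because string.maketrans() is no longer available. The "ab" swap at the end is an illustrative case added here, not taken from the book.

```python
s = "the quick brown fox jumped over the lazy dog."

# Python 3: str.maketrans() builds the translation table.
table = str.maketrans("abcdefghi", "123456789")
translated = s.translate(table)
print(translated)  # t85 qu93k 2rown 6ox jump54 ov5r t85 l1zy 4o7.

# replace() works on whole substrings; the replacement may differ in length.
replaced = s.replace("the", "a")
print(replaced)  # a quick brown fox jumped over a lazy dog.

# Chained replace() calls see the result of the previous call, so swapping
# two characters fails; translate() maps each original character exactly once.
chained = "ab".replace("a", "b").replace("b", "a")
swapped = "ab".translate(str.maketrans("ab", "ba"))
print(chained, swapped)  # aa ba
```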

textwrap: formatting text paragraphs

  The examples below use a short paragraph whose lines are indented in the source as sample text. The example code is:

import textwrap

# Sample paragraph; every line is indented by four spaces.
sample_text = """
    the textwrap module can be used to format text for output in
    situations where pretty-printing is desired, it offers
    programmatic functionality similar to the paragraph wrapping
    or filling features found in many text editors
    """

print "Test:\n"
print textwrap.fill(sample_text, width=50)

  The output is:

  Test:

       the textwrap module can be used to format
  text for output in     situations where pretty-
  printing is desired, it offers     programmatic
  functionality similar to the paragraph wrapping
  or filling features found in many text editors

  fill() takes text as input and produces formatted text as output.

  Although the result is left-aligned, only the first line keeps its indentation. The leading spaces of the remaining source lines end up embedded in the middle of the paragraph.

  To get rid of the indentation, change the last line of the code to use dedent(), which removes the whitespace prefix common to all lines:

print textwrap.dedent(sample_text)

  The output is:

  Test:


  the textwrap module can be used to format text for output in
  situations where pretty-printing is desired, it offers
  programmatic functionality similar to the paragraph wrapping
  or filling features found in many text editors

  Note that dedent() only strips the common indentation. It does not rewrap the text, so the original line breaks remain.
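
As a quick check of how these two functions behave, here is a minimal Python 3 sketch. It assumes a sample paragraph indented by four spaces per line, and it relies on wrap(), the list-returning counterpart of fill(). The variable names are illustrative.

```python
import textwrap

# Assumed sample text: each line is indented by four spaces.
sample_text = """
    the textwrap module can be used to format text for output in
    situations where pretty-printing is desired, it offers
    programmatic functionality similar to the paragraph wrapping
    or filling features found in many text editors
    """

# dedent() strips the common four-space prefix but keeps the line breaks.
dedented = textwrap.dedent(sample_text)
print(dedented)

# wrap() returns a list of lines; fill() returns the same lines joined by "\n".
lines = textwrap.wrap(sample_text, width=50)
filled = textwrap.fill(sample_text, width=50)
print(filled == "\n".join(lines))  # True

# Without dedent(), the source indentation survives as runs of spaces
# inside the wrapped paragraph.
print("     " in filled)  # True
```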

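dedent() and fill() can also be combined: pass the dedented text to fill() and supply a set of different width values, so the text is displayed without indentation at each specified width. A hanging indent is possible as well. The two examples are: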

import textwrap

# Example 1: dedent the text, then fill it at two different widths.
sample_text = """
    the textwrap module can be used to format text for output in
    situations where pretty-printing is desired, it offers
    programmatic functionality similar to the paragraph wrapping
    or filling features found in many text editors
    """

dedented_text = textwrap.dedent(sample_text).strip()
for width in [45, 70]:
    print "%d columns:\n" % width
    print textwrap.fill(dedented_text, width=width)
    print

import textwrap

# Example 2: hanging indent (first line indented less than the rest).
sample_text = """
    the textwrap module can be used to format text for output in
    situations where pretty-printing is desired, it offers
    programmatic functionality similar to the paragraph wrapping
    or filling features found in many text editors
    """

dedented_text = textwrap.dedent(sample_text).strip()
print textwrap.fill(dedented_text,
                    initial_indent=" ",
                    subsequent_indent=" " * 4,
                    width=50,
                    )

Note: strip() removes whitespace (spaces, newlines, and so on) from the beginning and end of a string.
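
For Python 3 readers, the two combined examples port with little more than print() changes. The sketch below is one possible port, assuming a four-space-indented sample paragraph. The names dedented_text and hanging are illustrative.

```python
import textwrap

# Assumed sample text: each line is indented by four spaces.
sample_text = """
    the textwrap module can be used to format text for output in
    situations where pretty-printing is desired, it offers
    programmatic functionality similar to the paragraph wrapping
    or filling features found in many text editors
    """

# strip() drops the blank first line and trailing newline left by dedent().
dedented_text = textwrap.dedent(sample_text).strip()

# Fill the same paragraph at two different widths.
for width in [45, 70]:
    print("%d columns:\n" % width)
    print(textwrap.fill(dedented_text, width=width))
    print()

# Hanging indent: the first line is indented by one space,
# every following line by four.
hanging = textwrap.fill(
    dedented_text,
    initial_indent=" ",
    subsequent_indent=" " * 4,
    width=50,
)
print(hanging)
```

The width passed to fill() includes the indent strings, so the indented lines still fit within 50 columns.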

posted @ 2013-04-27 18:58  烤串的_