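In some scenarios we need to count the characters in PPT speaker notes, for example when recording narration for strictly timed project defenses or award applications. However, neither Microsoft PowerPoint nor WPS PPT offers a direct way to do this, and the officially documented workaround is slow and inefficient. Below is a way to quickly count the characters in Chinese notes with Python.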
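Method:
Use the python-pptx library (installable with pip install python-pptx) to read and analyze the PowerPoint document. The library provides an API for accessing, modifying, and creating PowerPoint .pptx files. We use it to read each slide's notes and count the total number of characters.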
Here is a basic example:
from pptx import Presentation


def count_characters_in_notes(ppt_file):
    """Print the note character count of each slide and return the total."""
    ppt = Presentation(ppt_file)
    total_characters = 0
    for k, slide in enumerate(ppt.slides):
        if slide.has_notes_slide:
            notes_slide = slide.notes_slide
            slide_character_count = 0
            for paragraph in notes_slide.notes_text_frame.paragraphs:
                for run in paragraph.runs:
                    # Count characters directly rather than splitting into words
                    slide_character_count += len(run.text)
            total_characters += slide_character_count
            print(f'slide {k}: {slide_character_count}')
    return total_characters


file = '法律智能答辩.pptx'
total = count_characters_in_notes(file)
print(f"Total word count in notes: {total}")
Output:
slide 0: 50
slide 1: 15
slide 2: 68
slide 3: 55
slide 4: 125
slide 5: 35
slide 6: 85
slide 7: 62
slide 8: 102
slide 9: 36
slide 10: 31
slide 11: 63
slide 12: 99
slide 13: 105
slide 14: 48
slide 15: 21
slide 16: 140
slide 17: 24
slide 18: 124
slide 19: 62
slide 20: 52
slide 21: 27
slide 22: 53
slide 23: 31
slide 24: 85
slide 25: 79
slide 26: 68
slide 27: 43
slide 28: 36
slide 29: 70
slide 30: 87
slide 31: 90
slide 32: 35
slide 33: 41
slide 34: 89
slide 35: 19
slide 36: 21
slide 37: 74
slide 38: 79
slide 39: 23
slide 40: 115
slide 41: 30
slide 42: 84
slide 43: 77
slide 44: 86
slide 45: 20
slide 46: 74
slide 47: 100
slide 48: 54
slide 49: 149
Total word count in notes: 3241
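Note that len(run.text) counts every character, including punctuation, spaces, digits, and Latin letters. If only the Chinese characters should count toward the total, one possible variation is sketched below. The helper name count_cjk_characters and the pattern CJK_PATTERN are hypothetical and not part of the original script, and the pattern covers only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), which is an assumption that may not suit every document.

```python
import re

# Assumption: "Chinese character" means the basic CJK Unified Ideographs block.
# Punctuation (including full-width punctuation) and extension blocks are excluded.
CJK_PATTERN = re.compile(r'[\u4e00-\u9fff]')


def count_cjk_characters(text):
    """Return the number of Chinese characters in text, ignoring everything else."""
    return len(CJK_PATTERN.findall(text))


sample = '你好,world! 这是备注。'
print(len(sample))                   # all characters: 15
print(count_cjk_characters(sample))  # Chinese characters only: 6
```

To use it in the script above, replace len(run.text) with count_cjk_characters(run.text).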