A Summary of String-Splitting Methods in Python

The basic split() method

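split() splits a string into a list on the given separator. If no separator is given, it splits on runs of any whitespace (spaces, tabs, newlines), not just single spaces, and omits empty strings from the result.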

text = "apple banana orange"
result = text.split()  # splits on whitespace by default; result: ['apple', 'banana', 'orange']

text = "apple,banana,orange"
result = text.split(",")  # splits on commas; result: ['apple', 'banana', 'orange']
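
The difference between the default call and an explicit separator is easiest to see on irregular spacing. The following is a small sketch; the variable names are only for illustration. It also shows the optional maxsplit argument and rsplit(), which counts splits from the right.

```python
text = "apple  banana   orange"  # irregular runs of spaces

# No argument: runs of whitespace collapse and no empty strings are produced
by_whitespace = text.split()
print(by_whitespace)  # ['apple', 'banana', 'orange']

# Explicit " ": every single space is a boundary, so empty strings appear
by_space = text.split(" ")
print(by_space)  # ['apple', '', 'banana', '', '', 'orange']

# maxsplit limits the number of splits, counted from the left
first_split = "a,b,c,d".split(",", 1)
print(first_split)  # ['a', 'b,c,d']

# rsplit() applies the same limit, counted from the right
last_split = "a,b,c,d".rsplit(",", 1)
print(last_split)  # ['a,b,c', 'd']
```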

The partition() and rpartition() methods

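partition() splits a string at the first occurrence of the separator and returns a 3-tuple: the part before the separator, the separator itself, and the part after it.
rpartition() searches from the right instead, so it splits at the last occurrence of the separator.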

text = "apple-banana-orange"
result = text.partition("-")  # ('apple', '-', 'banana-orange')

result = text.rpartition("-")  # ('apple-banana', '-', 'orange')
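
One common use of partition() is splitting a "key=value" line exactly once. The sketch below assumes a hypothetical helper named parse_setting, which is not part of any library. It also covers the case where the separator is missing: partition() then returns the whole string followed by two empty strings, while rpartition() returns two empty strings followed by the whole string.

```python
def parse_setting(line):
    """Split 'key=value' at the first '=' (hypothetical helper for illustration)."""
    key, sep, value = line.partition("=")
    if not sep:
        # Separator not found: partition() returned (line, '', '')
        return key.strip(), None
    return key.strip(), value.strip()


print(parse_setting("path = a=b"))  # ('path', 'a=b'); only the first '=' splits
print(parse_setting("verbose"))     # ('verbose', None)

# When the separator is absent, the two methods pad on opposite sides
print("abc".partition("-"))   # ('abc', '', '')
print("abc".rpartition("-"))  # ('', '', 'abc')
```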

Splitting on multiple separators: re.split()

re.split() splits a string wherever a regular expression matches, so a single pattern can cover several separators.

import re
text = "apple; banana, orange"
result = re.split(r"[;, ]+", text)  # ['apple', 'banana', 'orange']
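
Two details of re.split() are worth a quick sketch. First, a separator at the very start or end of the input produces empty strings, which usually need to be filtered out. Second, wrapping the pattern in a capturing group keeps the matched separators in the result. The sample strings here are made up for illustration.

```python
import re

text = ", apple; banana, orange;"

# Separators at the edges yield empty strings at both ends
parts = re.split(r"[;, ]+", text)
print(parts)  # ['', 'apple', 'banana', 'orange', '']

# Drop the empty strings
cleaned = [p for p in parts if p]
print(cleaned)  # ['apple', 'banana', 'orange']

# A capturing group keeps each matched separator in the output
with_seps = re.split(r"([;,])\s*", "apple; banana, orange")
print(with_seps)  # ['apple', ';', 'banana', ',', 'orange']
```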

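Splitting a string into fixed-length pieces
A list comprehension (among other approaches) can split a string into pieces of a fixed length.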

text = "abcdefgh"
n = 2
result = [text[i:i+n] for i in range(0, len(text), n)]  # ['ab', 'cd', 'ef', 'gh']
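
The same comprehension can be wrapped in a small function. The name chunk_string is a hypothetical choice for this sketch. When the length of the string is not a multiple of n, the last piece is simply shorter, and a non-positive n is rejected up front because range() cannot step by zero.

```python
def chunk_string(text, n):
    """Return consecutive pieces of text, each n characters long (the last may be shorter)."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    return [text[i:i + n] for i in range(0, len(text), n)]


print(chunk_string("abcdefg", 3))  # ['abc', 'def', 'g']
print(chunk_string("", 3))         # []
```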

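Using the csv module for quoted fields (commas inside quotes, etc.)
When a field may itself contain the delimiter, for example a comma inside a quoted value, the csv module parses the string correctly where a plain split(",") would not.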

import csv
text = 'apple,"banana, mango",orange'
reader = csv.reader([text])
result = next(reader)  # ['apple', 'banana, mango', 'orange']
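
To see why the csv module matters, it helps to compare it with a naive split on the same input. The sketch below also parses several lines at once with a different delimiter, using io.StringIO to present a string as a file-like object. The sample data is made up for illustration.

```python
import csv
import io

text = 'apple,"banana, mango",orange'

# A plain split cuts through the quoted field and keeps the quote characters
naive = text.split(",")
print(naive)  # ['apple', '"banana', ' mango"', 'orange']

# csv.reader respects the quotes
parsed = next(csv.reader([text]))
print(parsed)  # ['apple', 'banana, mango', 'orange']

# Several lines with ';' as the delimiter
data = 'name;note\napple;"red; sweet"\nbanana;yellow\n'
rows = list(csv.reader(io.StringIO(data), delimiter=";"))
print(rows)  # [['name', 'note'], ['apple', 'red; sweet'], ['banana', 'yellow']]
```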

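Tokenization for natural language processing (Tokenizer)
Libraries such as nltk and spaCy can split text into words or sentences. Note that nltk's word_tokenize needs the library's tokenizer data, which is downloaded once with nltk.download().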

from nltk.tokenize import word_tokenize
text = "This is a sample sentence."
result = word_tokenize(text)  # ['This', 'is', 'a', 'sample', 'sentence', '.']
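
When installing an NLP library is not an option, a regular expression gives a rough approximation for simple English text. This is only a standard-library stand-in, not nltk's algorithm, and the function name simple_tokenize is hypothetical. It treats every punctuation mark as its own token, so contractions and abbreviations are handled differently from a real tokenizer.

```python
import re


def simple_tokenize(text):
    """Rough word/punctuation tokenizer (illustrative stand-in, not nltk's algorithm)."""
    # \w+ matches a run of word characters; [^\w\s] matches one punctuation mark
    return re.findall(r"\w+|[^\w\s]", text)


print(simple_tokenize("This is a sample sentence."))
# ['This', 'is', 'a', 'sample', 'sentence', '.']

# Limitation: the apostrophe splits a contraction into three tokens
print(simple_tokenize("Don't stop"))
# ['Don', "'", 't', 'stop']
```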

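Splitting by width with the textwrap module
The textwrap module breaks a string into lines no longer than a given width, splitting at word boundaries.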

import textwrap
text = "This is a long string that needs to be wrapped."
result = textwrap.wrap(text, width=10)
# ['This is a', 'long', 'string', 'that needs', 'to be', 'wrapped.']
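
Two related behaviours are sketched below. textwrap.fill() returns the wrapped lines joined into a single string, and a word longer than the width is cut into pieces by default unless break_long_words=False is passed. The sample strings are made up for illustration.

```python
import textwrap

text = "This is a long string that needs to be wrapped."

# wrap() returns a list of lines, each at most `width` characters long
lines = textwrap.wrap(text, width=10)
print(lines)  # ['This is a', 'long', 'string', 'that needs', 'to be', 'wrapped.']

# fill() returns the same lines joined with newlines
paragraph = textwrap.fill(text, width=10)
print(paragraph)

# A word longer than the width is broken by default...
broken = textwrap.wrap("supercalifragilistic word", width=10)
print(broken)  # ['supercalif', 'ragilistic', 'word']

# ...unless break_long_words=False, in which case the line may exceed the width
kept = textwrap.wrap("supercalifragilistic word", width=10, break_long_words=False)
print(kept)  # ['supercalifragilistic', 'word']
```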

posted @ 2024-10-28 13:37 XieBuWan
