How to remove bad path characters in Python?
Unfortunately, the set of acceptable characters varies by OS and by filesystem.
- Windows:
  - Use almost any character in the current code page for a name, including Unicode characters and characters in the extended character set (128–255), except for the following:
    - The following reserved characters are not allowed: < > : " / \ | ? *
    - Characters whose integer representations are in the range from zero through 31 are not allowed.
    - Any other character that the target file system does not allow.
The list of accepted characters can vary depending on the OS and locale of the machine that first formatted the filesystem.
.NET has GetInvalidFileNameChars and GetInvalidPathChars, but I don't know how to call those from Python.
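As a rough illustration of the reserved characters and the 0 through 31 range listed above, here is a minimal sketch that replaces them with an underscore. The function name `replace_windows_invalid` and the choice of "_" as the replacement are assumptions made for this example; it does not handle the "any other character that the target file system does not allow" case.

```python
import re

# Reserved characters < > : " / \ | ? * plus control characters 0 through 31.
_WINDOWS_INVALID = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def replace_windows_invalid(name, replacement="_"):
    """Replace each listed character in name with replacement (hypothetical helper)."""
    return _WINDOWS_INVALID.sub(replacement, name)


print(replace_windows_invalid('report: "draft"?.txt'))
```

Each offending character is replaced one for one, so two different inputs can still end up with the same cleaned name.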
- Mac OS: NUL is always excluded, "/" is excluded from POSIX layer, ":" excluded from Apple APIs
  - HFS+: any sequence of non-excluded characters that is representable by UTF-16 in the Unicode 2.0 spec
  - HFS: any sequence of non-excluded characters representable in MacRoman (default) or other encodings, depending on the machine that created the filesystem
  - UFS: same as HFS+
- Linux:
  - native (UNIX-like) filesystems: any byte sequence excluding NUL and "/"
  - FAT, NTFS, other non-native filesystems: varies
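For the native UNIX-like case, the rule above is small enough to check directly. The sketch below assumes the name is a Python `str` rather than raw bytes, and the helper name `posix_invalid_chars` is made up for this example.

```python
def posix_invalid_chars(name):
    """Return the characters in name that native UNIX-like filesystems exclude."""
    # Only NUL and "/" are excluded; every other character is accepted.
    return {ch for ch in name if ch in ("\0", "/")}


print(posix_invalid_chars("logs/2024.txt"))
```

An empty set means the name passes this particular check; it says nothing about FAT, NTFS, or other non-native filesystems mounted on the same machine.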
Your best bet is probably either to be overly conservative on all platforms, or to just try creating the file and handle any errors.
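import re
filename = 'draft: v2?.txt'  # hypothetical input
safe_name = re.sub(r'[^\w\-. ]', '_', filename)  # anything but word characters, "-", "." and space becomes "_"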
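The other option mentioned above, trying to create the file and handling errors, might look roughly like this. The function name `try_create` and the decision to return `None` on failure are assumptions made for this sketch, not part of the original answer.

```python
import os


def try_create(directory, name):
    """Try to create an empty file; return its path, or None if that fails."""
    path = os.path.join(directory, name)
    try:
        # Mode "x" creates the file and raises FileExistsError if it already exists.
        with open(path, "x"):
            pass
    except (OSError, ValueError):
        # OSError covers names the OS or filesystem rejects (and existing files);
        # ValueError is raised for names containing an embedded NUL.
        return None
    return path
```

Note that this treats "already exists" the same as "invalid name"; catching `FileExistsError` separately would let the caller tell the two apart.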
Reference: https://stackoverflow.com/questions/1033424/how-to-remove-bad-path-characters-in-python
Turn a string into a valid filename?
import unicodedata
import re


def slugify(value, allow_unicode=False):
    """
    Taken from https://github.com/django/django/blob/master/django/utils/text.py
    Convert to ASCII if 'allow_unicode' is False. Convert spaces or repeated
    dashes to single dashes. Remove characters that aren't alphanumerics,
    underscores, or hyphens. Convert to lowercase. Also strip leading and
    trailing whitespace, dashes, and underscores.
    """
    value = str(value)
    if allow_unicode:
        value = unicodedata.normalize('NFKC', value)
    else:
        value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore').decode('ascii')
    value = re.sub(r'[^\w\s-]', '', value.lower())
    return re.sub(r'[-\s]+', '-', value).strip('-_')
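The ASCII branch of `slugify` is the least obvious step, so here it is in isolation. This is only a sketch of that one line; the helper name `to_ascii` is invented for this example.

```python
import unicodedata


def to_ascii(value):
    """Drop everything that has no ASCII equivalent after NFKD decomposition."""
    # NFKD splits an accented letter into its base letter plus a combining mark;
    # encoding with errors='ignore' then discards the marks and any other non-ASCII.
    decomposed = unicodedata.normalize("NFKD", value)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(to_ascii("Crème brûlée"))  # Creme brulee
```

Characters without a decomposition are removed outright rather than transliterated, so "Straße" becomes "Strae". If that loss matters, passing `allow_unicode=True` to `slugify` skips this step.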
https://stackoverflow.com/questions/295135/turn-a-string-into-a-valid-filename