Check and change the encoding of a file
Linux
https://www.shellhacks.com/linux-check-change-file-encoding/
Check
In a directory, simply run file *:
$ file *
chucklu.autoend.js: HTML document, UTF-8 Unicode text, with very long lines, with CRLF line terminators
custom.css: UTF-8 Unicode text, with CRLF line terminators
SimpleMemory.css: UTF-8 Unicode text, with CRLF line terminators
$ file *
chucklu.autoend.js: HTML document, Little-endian UTF-16 Unicode text, with very long lines, with CRLF line terminators
custom.css: UTF-8 Unicode text, with CRLF line terminators
SimpleMemory.css: UTF-8 Unicode text, with CRLF line terminators
$ file -bi chucklu.autoend.js
text/html; charset=utf-8
$ file -bi custom.css
text/plain; charset=utf-8
-b, --brief    Don't print the filename (brief mode)
-i, --mime     Print file type and encoding
file -i *
Daily Sales Report_2021_04_09.bad.csv: application/csv; charset=utf-8
Daily Sales Report_2021_04_09.good.csv: application/csv; charset=utf-16le
file *
Daily Sales Report_2021_04_09.bad.csv: CSV text
Daily Sales Report_2021_04_09.good.csv: CSV text
file * --mime-encoding --mime-type
Daily Sales Report_2021_04_09.bad.csv: application/csv; charset=utf-8
Daily Sales Report_2021_04_09.good.csv: application/csv; charset=utf-16le
Convert
iconv -f utf-16 -t ascii text.txt
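iconv writes the converted text to standard output, so redirect it into a new file if you want to keep the result. For comparison, here is a minimal C# sketch of the same kind of conversion; the source encoding (UTF-16 LE) and the method name are assumptions for illustration, not taken from the iconv example above.

using System.IO;
using System.Text;

public void ConvertToUtf8(string inputPath, string outputPath)
{
    // Assumption: the input file is UTF-16 LE (Encoding.Unicode); adjust this to the real source encoding.
    string text = File.ReadAllText(inputPath, Encoding.Unicode);
    // Write the same text back as UTF-8 without BOM.
    File.WriteAllText(outputPath, text, new UTF8Encoding(false));
}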
Windows
https://stackoverflow.com/questions/64860/best-way-to-convert-text-files-between-character-sets
On Windows with PowerShell (Jay Bazuzi):
PS C:\> gc -en utf8 in.txt | Out-File -en ascii out.txt
(No ISO-8859-15 support though; it says that supported charsets are unicode, utf7, utf8, utf32, ascii, bigendianunicode, default, and oem.)
Edit
Do you mean ISO-8859-1 support? Using "String" does this, e.g. for the reverse direction:
gc -en string in.txt | Out-File -en utf8 out.txt
Note: The possible enumeration values are "Unknown, String, Unicode, Byte, BigEndianUnicode, UTF8, UTF7, Ascii".
- CsCvt - Kalytta's Character Set Converter is another great command line based conversion tool for Windows.
How to detect the encoding of a file?
There is a pretty simple way using Firefox: open your file in Firefox, then View > Character Encoding.
Files generally indicate their encoding with a file header. However, even after reading the header you can never be sure what encoding a file is really using.
For example, a file whose first three bytes are 0xEF,0xBB,0xBF is probably a UTF-8 encoded file. However, it might be an ISO-8859-1 file that happens to start with the characters ï»¿. Or it might be a different file type entirely.
Notepad++ does its best to guess what encoding a file is using, and most of the time it gets it right. Sometimes it does get it wrong though - that's why that 'Encoding' menu is there, so you can override its best guess.
For the two encodings you mention:
- The "UCS-2 Little Endian" files are UTF-16 files (based on what I understand from the info here), so they probably start with 0xFF,0xFE as the first 2 bytes. From what I can tell, Notepad++ describes them as "UCS-2" since it doesn't support certain facets of UTF-16.
- The "UTF-8 without BOM" files don't have any header bytes. That's what the "without BOM" bit means.
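As a rough illustration of the BOM checks described above, a minimal C# sketch that only inspects the first bytes of a file (the method name is made up, and the absence of a BOM does not tell you whether the file is ASCII, UTF-8 without BOM, ANSI, or something else):

using System.IO;

public string GuessEncodingFromBom(string filePath)
{
    var bom = new byte[4];
    using (var file = File.OpenRead(filePath))
    {
        file.Read(bom, 0, 4);   // a file shorter than 4 bytes simply leaves the remaining entries as 0x00
    }
    if (bom[0] == 0xEF && bom[1] == 0xBB && bom[2] == 0xBF) return "UTF-8 with BOM";
    if (bom[0] == 0xFF && bom[1] == 0xFE) return "UTF-16 LE (what Notepad++ calls UCS-2 Little Endian)";
    if (bom[0] == 0xFE && bom[1] == 0xFF) return "UTF-16 BE";
    // No BOM: could be UTF-8 without BOM, ASCII, ANSI, or not text at all.
    return "Unknown (no BOM)";
}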
Detect the file encoding with Ude
https://www.nuget.org/packages/UDE.CSharp
public void GetEncoding2(string filePath)
{
    using (FileStream fs = File.OpenRead(filePath))
    {
        Ude.CharsetDetector cdet = new Ude.CharsetDetector();
        cdet.Feed(fs);
        cdet.DataEnd();
        if (cdet.Charset != null)
        {
            Console.WriteLine("Charset: {0}, confidence: {1}", cdet.Charset, cdet.Confidence);
        }
        else
        {
            Console.WriteLine("Detection failed.");
        }
    }
}
Charset: ASCII, confidence: 1; file * reports: ASCII text, with CRLF line terminators
Charset: UTF-8, confidence: 0.7525; file * reports: UTF-8 Unicode text, with CRLF line terminators
Charset: gb18030, confidence: 0.99; file * reports: ISO-8859 text, with CRLF line terminators
Read the first 4 bytes of a file
public string GetEncoding(string filePath)
{
    var bom = new byte[4];
    using (var file = new FileStream(filePath, FileMode.Open, FileAccess.Read))
    {
        file.Read(bom, 0, 4);
    }
    var str = string.Join(" ", bom.Select(x => x.ToString("X2")));
    Console.WriteLine($"{str}, {filePath}");
    return str;
}
Save a file as UTF-8 without BOM from C#
filename = "2019-04-23-001.txt"; filePath = Path.Combine(folder, filename); using (StreamWriter sw = new StreamWriter(File.Open(filePath, FileMode.Create), new UTF8Encoding(false))) { sw.WriteLine("hello"); } filename = "2019-04-23-002.txt"; filePath = Path.Combine(folder, filename); using (StreamWriter sw = new StreamWriter(File.Open(filePath, FileMode.Create), new UTF8Encoding(false))) { sw.WriteLine("你好"); }
2019-04-23-001.txt: ASCII text, with CRLF line terminators
2019-04-23-002.txt: UTF-8 Unicode text, with CRLF line terminators
When C# writes a file as UTF-8 without BOM and the content contains only ASCII characters, the resulting bytes are identical to plain ASCII, so file reports the file as ASCII rather than UTF-8.
filename = "2019-04-23-003.txt"; filePath = Path.Combine(folder, filename); using (StreamWriter sw = new StreamWriter(File.Open(filePath, FileMode.Create), Encoding.ASCII)) { sw.WriteLine("hello"); }
filename = "2019-04-23-004.txt"; filePath = Path.Combine(folder, filename); using (StreamWriter sw = new StreamWriter(File.Open(filePath, FileMode.Create), Encoding.ASCII)) { sw.WriteLine("你好"); }
2019-04-23-003.txt: ASCII text, with CRLF line terminators
2019-04-23-004.txt: ASCII text, with CRLF line terminators
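2019-04-23-004.txt still shows up as ASCII because Encoding.ASCII cannot represent "你好": by default the ASCII encoder replaces every character it cannot encode with '?' (0x3F), so only ASCII bytes ever reach the file. A quick way to see this (requires using System; using System.Linq; using System.Text;):

byte[] bytes = Encoding.ASCII.GetBytes("你好");
// The default encoder fallback substitutes '?' (0x3F) for each unencodable character,
// which is why file still reports 2019-04-23-004.txt as ASCII text.
Console.WriteLine(string.Join(" ", bytes.Select(b => b.ToString("X2")))); // expected output: 3F 3F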
Create new files with the built-in Notepad and save them as ANSI.
The first text file contains the Chinese text "你好":
2019-04-23-011.txt: ISO-8859 text, with no line terminators
The second text file contains the English text "hello":
2019-04-23-012.txt: ASCII text, with no line terminators
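To read such an ANSI file correctly from C#, the code page has to be specified explicitly. A sketch under two assumptions: the file was saved on a Chinese-locale Windows, where ANSI means GBK (code page 936), and the code runs on .NET Core / .NET 5+, where the code-pages provider from the System.Text.Encoding.CodePages package must be registered first (on .NET Framework, Encoding.GetEncoding(936) works directly).

using System.IO;
using System.Text;

// Needed on .NET Core / .NET 5+ only (System.Text.Encoding.CodePages package):
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
// Code page 936 is GBK, the ANSI code page of Chinese-locale Windows (an assumption about the machine above).
string text = File.ReadAllText("2019-04-23-011.txt", Encoding.GetEncoding(936));
Console.WriteLine(text); // should print 你好 if the code-page assumption is right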