ASCII or UTF-8?

Question

A long time ago, before world scripts came along, all text files were ASCII.
Nowadays, we have world scripts.
If I open a text file in a hex editor, is there a way to tell whether its encoding is ASCII or UTF-8?

 

Answer 1

UTF-8 is backwards compatible with ASCII: an ASCII text file is also a UTF-8 text file.

If a file contains any byte whose first hex digit is 8 through F (that is, a value of 0x80 or above), it's not ASCII.

If a file is not ASCII, it may be UTF-8 if every byte that starts with C, D, E, or F is followed by the right number of bytes that start with 8, 9, A, or B: one after C or D, two after E, and three after F. If any of these bytes appears in any other context, it's not UTF-8.
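The rules above can be sketched in a few lines of Python. This is only an illustration of the hex-editor check, not a full validator; the function name `classify` and its three return strings are made up for this example.

```python
def classify(data: bytes) -> str:
    """Apply the hex-editor rules: return 'ASCII', 'maybe UTF-8', or 'neither'."""
    # Continuation bytes expected after each lead byte, keyed by first hex digit.
    expected = {0xC: 1, 0xD: 1, 0xE: 2, 0xF: 3}
    saw_high_byte = False  # any byte at 0x80 or above means "not ASCII"
    pending = 0            # continuation bytes still owed to the current lead byte
    for byte in data:
        high = byte >> 4   # first hex digit of the byte
        if high <= 0x7:
            # Plain ASCII byte: not allowed in the middle of a sequence.
            if pending:
                return "neither"
        elif high <= 0xB:
            # Continuation byte (8, 9, A, B): only valid after a lead byte.
            if not pending:
                return "neither"
            pending -= 1
            saw_high_byte = True
        else:
            # Lead byte (C, D, E, F): must not interrupt another sequence.
            if pending:
                return "neither"
            pending = expected[high]
            saw_high_byte = True
    if pending:
        return "neither"   # file ended in the middle of a sequence
    return "maybe UTF-8" if saw_high_byte else "ASCII"


print(classify(b"hello"))              # ASCII
print(classify("é".encode("utf-8")))   # maybe UTF-8  (bytes C3 A9)
print(classify(b"\xe9"))               # neither      (Latin-1 "é", a lone E9)
```

The result is "maybe UTF-8" rather than "UTF-8" because this check only looks at the first hex digit of each byte, exactly as you would by eye in a hex editor.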

There are a few more requirements for valid UTF-8, but they are harder to glean with a hex editor. See https://en.m.wikipedia.org/wiki/UTF-8
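If you can run code instead of reading hex, a strict decoder checks those extra requirements for you. The sketch below uses Python's built-in UTF-8 codec; the helper name `is_valid_utf8` is made up for this example. The two rejected inputs follow the lead-byte and continuation-byte pattern described above but still break other rules (an overlong encoding and an encoded surrogate).

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")  # the default error handler is 'strict'
        return True
    except UnicodeDecodeError:
        return False


print(is_valid_utf8("€".encode("utf-8")))  # True  (bytes E2 82 AC)
print(is_valid_utf8(b"\xc0\x80"))          # False (overlong encoding of U+0000)
print(is_valid_utf8(b"\xed\xa0\x80"))      # False (encoded surrogate U+D800)
```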

Author: Chuck Lu