Character Encoding in Web Pages (HTML Unicode Entity Encoding)

1. Encoding conversion (to Unicode)

(The code below was collected from the web.)

 

JavaScript version

<script>
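      // Convert each UTF-16 code unit of the string into a \uXXXX escape.
      var test = "你好abc";
      var str = "";
      for (var i = 0; i < test.length; i++)
      {
       // charCodeAt returns a value from 0 to 65535, so at most 4 hex digits.
       var temp = test.charCodeAt(i).toString(16);
       // Left-pad with zeros to exactly 4 hex digits.
       str += "\\u" + new Array(5 - temp.length).join("0") + temp;
      }
      document.write(str); // \u4f60\u597d\u0061\u0062\u0063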
</script>

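The loop above only converts in one direction. As a minimal sketch of the reverse step, the helper below turns `\uXXXX` sequences back into characters. The name `decodeUnicodeEscapes` is made up for this illustration, and the sketch assumes every escape has exactly four hex digits, which is what the encoder above emits.

```javascript
// Hypothetical helper (not part of the original code): converts
// \uXXXX escape sequences back into text.
// Assumption: each escape has exactly four hex digits.
function decodeUnicodeEscapes(escaped) {
  return escaped.replace(/\\u([0-9a-fA-F]{4})/g, function (match, hex) {
    // Each escape is one UTF-16 code unit. Surrogate pairs recombine
    // on their own because both halves are emitted in order.
    return String.fromCharCode(parseInt(hex, 16));
  });
}

console.log(decodeUnicodeEscapes("\\u4f60\\u597d\\u0061\\u0062\\u0063")); // 你好abc
```

Text that contains no escapes passes through unchanged, so the function is safe to apply to mixed input.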

VBScript version

Function Unicode(str1)
      Dim str, temp, i
      str = ""
      For i = 1 To Len(str1)
       ' Hex code of the current character, left-padded with zeros to 4 digits.
       temp = Right("0000" & Hex(AscW(Mid(str1, i, 1))), 4)
       str = str & "\u" & temp
      Next
      Unicode = str
End Function


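Function htmlentities(str)
      Dim i, char, code
      htmlentities = ""
      For i = 1 To Len(str)
          char = Mid(str, i, 1)
          code = AscW(char)
          ' AscW returns a signed 16-bit value; map negatives back to 0..65535.
          If code < 0 Then code = code + 65536
          ' Anything outside 7-bit ASCII becomes a decimal numeric entity.
          If code > 127 Then
              htmlentities = htmlentities & "&#" & code & ";"
          Else
              htmlentities = htmlentities & char
          End If
      Next
End Function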
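
For comparison, the same idea can be sketched in JavaScript. The function name `toNumericEntities` is an assumption made for this illustration. It iterates by code point rather than by UTF-16 code unit, so a character outside the Basic Multilingual Plane becomes a single entity.

```javascript
// Hypothetical helper: replaces every non-ASCII character with a
// decimal numeric character reference such as &#20320;.
function toNumericEntities(text) {
  var out = "";
  // for...of walks the string by code point, not by UTF-16 code unit.
  for (var ch of text) {
    var code = ch.codePointAt(0);
    out += code > 127 ? "&#" + code + ";" : ch;
  }
  return out;
}

console.log(toNumericEntities("你好abc")); // &#20320;&#22909;abc
```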

 

ColdFusion version

 

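function nochaoscode(str)
{
      var new_str = "";
      var i = 1;
      // Copy ASCII characters as-is; replace the rest with numeric entities.
      for(i=1; i lte len(str); i=i+1){
          if(asc(mid(str,i,1)) lt 128){
              new_str = new_str & mid(str,i,1);
          }else{
              // "##" is an escaped "#" in a CFML string; the entity must end with ";".
              new_str = new_str & "&##" & asc(mid(str,i,1)) & ";";
          }
      }
      return new_str;
}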

 


 

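Appendix:

In PHP, the mbstring function mb_convert_encoding performs this conversion in both directions (note that the HTML-ENTITIES pseudo-encoding is deprecated as of PHP 8.2). For example: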

 

mb_convert_encoding("你好", "HTML-ENTITIES", "gb2312"); // returns: &#20320;&#22909;
mb_convert_encoding("&#20320;&#22909;", "gb2312", "HTML-ENTITIES"); // returns: 你好
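
The reverse direction shown in the second call can also be sketched in JavaScript for environments without PHP. `decodeNumericEntities` is a hypothetical name. The sketch handles only decimal (`&#20320;`) and hexadecimal (`&#x4F60;`) numeric references and leaves named entities such as `&amp;` alone.

```javascript
// Hypothetical helper: decodes decimal and hexadecimal numeric
// character references. Named entities are not handled.
function decodeNumericEntities(html) {
  return html.replace(/&#(x[0-9a-f]+|[0-9]+);/gi, function (match, ref) {
    var code = ref.charAt(0).toLowerCase() === "x"
      ? parseInt(ref.slice(1), 16)
      : parseInt(ref, 10);
    // Leave out-of-range references untouched instead of throwing.
    return code <= 0x10FFFF ? String.fromCodePoint(code) : match;
  });
}

console.log(decodeNumericEntities("&#20320;&#22909;")); // 你好
```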

 

To convert an entire page, add these three lines at the top of the PHP file:

 

mb_internal_encoding("gb2312"); // gb2312 here stands for your site's original encoding
mb_http_output("HTML-ENTITIES");
ob_start('mb_output_handler');


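If the mbstring extension is not enabled, see these two articles on coolcode.cn:
在任意字符集下正常显示网页的方法 (How to display web pages correctly under any character set)
在任意字符集下正常显示网页的方法(续) (How to display web pages correctly under any character set, continued)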


 

2. HTML entities

 

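HTML 4.01 supports the ISO 8859-1 (Latin-1) character set.

Tip: Entity names are case-sensitive.

Note: The same symbol can be referenced either by entity name or by entity number. Entity names are easier to remember, but not every browser is guaranteed to recognize every name. Entity numbers do not have that problem, but they are hard to remember.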


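Entity names for some ASCII characters

| Display | Description | Entity name | Entity number |
| --- | --- | --- | --- |
| " | quotation mark | `&quot;` | `&#34;` |
| ' | apostrophe | `&apos;` (not supported in IE) | `&#39;` |
| & | ampersand | `&amp;` | `&#38;` |
| < | less-than | `&lt;` | `&#60;` |
| > | greater-than | `&gt;` | `&#62;` |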

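These five characters are the ones that usually need escaping before text is placed into HTML. A minimal sketch follows; `escapeHtml` is a hypothetical helper name. It uses the numeric form `&#39;` for the apostrophe because, as noted in the table, `&apos;` is not recognized by IE.

```javascript
// Hypothetical helper: escapes the five ASCII characters that have
// special meaning in HTML text and attribute values.
function escapeHtml(text) {
  var map = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#39;" // numeric form, since &apos; does not work in IE
  };
  return text.replace(/[&<>"']/g, function (ch) {
    return map[ch];
  });
}

console.log(escapeHtml('<a href="x">Tom & Jerry\'s</a>'));
// &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&#39;s&lt;/a&gt;
```

The ampersand is handled in the same single pass as the other characters, so already-produced entities are not escaped a second time.
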
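ISO 8859-1 symbol entities

| Display | Description | Entity name | Entity number |
| --- | --- | --- | --- |
|   | non-breaking space | `&nbsp;` | `&#160;` |
| ¡ | inverted exclamation mark | `&iexcl;` | `&#161;` |
| ¤ | currency | `&curren;` | `&#164;` |
| ¢ | cent | `&cent;` | `&#162;` |
| £ | pound | `&pound;` | `&#163;` |
| ¥ | yen | `&yen;` | `&#165;` |
| ¦ | broken vertical bar | `&brvbar;` | `&#166;` |
| § | section | `&sect;` | `&#167;` |
| ¨ | spacing diaeresis | `&uml;` | `&#168;` |
| © | copyright | `&copy;` | `&#169;` |
| ª | feminine ordinal indicator | `&ordf;` | `&#170;` |
| « | angle quotation mark (left) | `&laquo;` | `&#171;` |
| ¬ | negation | `&not;` | `&#172;` |
|   | soft hyphen | `&shy;` | `&#173;` |
| ® | registered trademark | `&reg;` | `&#174;` |
| ™ | trademark | `&trade;` | `&#8482;` |
| ¯ | spacing macron | `&macr;` | `&#175;` |
| ° | degree | `&deg;` | `&#176;` |
| ± | plus-or-minus | `&plusmn;` | `&#177;` |
| ² | superscript 2 | `&sup2;` | `&#178;` |
| ³ | superscript 3 | `&sup3;` | `&#179;` |
| ´ | spacing acute | `&acute;` | `&#180;` |
| µ | micro | `&micro;` | `&#181;` |
| ¶ | paragraph | `&para;` | `&#182;` |
| · | middle dot | `&middot;` | `&#183;` |
| ¸ | spacing cedilla | `&cedil;` | `&#184;` |
| ¹ | superscript 1 | `&sup1;` | `&#185;` |
| º | masculine ordinal indicator | `&ordm;` | `&#186;` |
| » | angle quotation mark (right) | `&raquo;` | `&#187;` |
| ¼ | fraction 1/4 | `&frac14;` | `&#188;` |
| ½ | fraction 1/2 | `&frac12;` | `&#189;` |
| ¾ | fraction 3/4 | `&frac34;` | `&#190;` |
| ¿ | inverted question mark | `&iquest;` | `&#191;` |
| × | multiplication | `&times;` | `&#215;` |
| ÷ | division | `&divide;` | `&#247;` |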

ISO 8859-1 character entities

| Display | Description | Entity name | Entity number |
| --- | --- | --- | --- |
| À | capital a, grave accent | `&Agrave;` | `&#192;` |
| Á | capital a, acute accent | `&Aacute;` | `&#193;` |
| Â | capital a, circumflex accent | `&Acirc;` | `&#194;` |
| Ã | capital a, tilde | `&Atilde;` | `&#195;` |
| Ä | capital a, umlaut mark | `&Auml;` | `&#196;` |
| Å | capital a, ring | `&Aring;` | `&#197;` |
| Æ | capital ae | `&AElig;` | `&#198;` |
| Ç | capital c, cedilla | `&Ccedil;` | `&#199;` |
| È | capital e, grave accent | `&Egrave;` | `&#200;` |
| É | capital e, acute accent | `&Eacute;` | `&#201;` |
| Ê | capital e, circumflex accent | `&Ecirc;` | `&#202;` |
| Ë | capital e, umlaut mark | `&Euml;` | `&#203;` |
| Ì | capital i, grave accent | `&Igrave;` | `&#204;` |
| Í | capital i, acute accent | `&Iacute;` | `&#205;` |
| Î | capital i, circumflex accent | `&Icirc;` | `&#206;` |
| Ï | capital i, umlaut mark | `&Iuml;` | `&#207;` |
| Ð | capital eth, Icelandic | `&ETH;` | `&#208;` |
| Ñ | capital n, tilde | `&Ntilde;` | `&#209;` |
| Ò | capital o, grave accent | `&Ograve;` | `&#210;` |
| Ó | capital o, acute accent | `&Oacute;` | `&#211;` |
| Ô | capital o, circumflex accent | `&Ocirc;` | `&#212;` |
| Õ | capital o, tilde | `&Otilde;` | `&#213;` |
| Ö | capital o, umlaut mark | `&Ouml;` | `&#214;` |
| Ø | capital o, slash | `&Oslash;` | `&#216;` |
| Ù | capital u, grave accent | `&Ugrave;` | `&#217;` |
| Ú | capital u, acute accent | `&Uacute;` | `&#218;` |
| Û | capital u, circumflex accent | `&Ucirc;` | `&#219;` |
| Ü | capital u, umlaut mark | `&Uuml;` | `&#220;` |
| Ý | capital y, acute accent | `&Yacute;` | `&#221;` |
| Þ | capital THORN, Icelandic | `&THORN;` | `&#222;` |
| ß | small sharp s, German | `&szlig;` | `&#223;` |
| à | small a, grave accent | `&agrave;` | `&#224;` |
| á | small a, acute accent | `&aacute;` | `&#225;` |
| â | small a, circumflex accent | `&acirc;` | `&#226;` |
| ã | small a, tilde | `&atilde;` | `&#227;` |
| ä | small a, umlaut mark | `&auml;` | `&#228;` |
| å | small a, ring | `&aring;` | `&#229;` |
| æ | small ae | `&aelig;` | `&#230;` |
| ç | small c, cedilla | `&ccedil;` | `&#231;` |
| è | small e, grave accent | `&egrave;` | `&#232;` |
| é | small e, acute accent | `&eacute;` | `&#233;` |
| ê | small e, circumflex accent | `&ecirc;` | `&#234;` |
| ë | small e, umlaut mark | `&euml;` | `&#235;` |
| ì | small i, grave accent | `&igrave;` | `&#236;` |
| í | small i, acute accent | `&iacute;` | `&#237;` |
| î | small i, circumflex accent | `&icirc;` | `&#238;` |
| ï | small i, umlaut mark | `&iuml;` | `&#239;` |
| ð | small eth, Icelandic | `&eth;` | `&#240;` |
| ñ | small n, tilde | `&ntilde;` | `&#241;` |
| ò | small o, grave accent | `&ograve;` | `&#242;` |
| ó | small o, acute accent | `&oacute;` | `&#243;` |
| ô | small o, circumflex accent | `&ocirc;` | `&#244;` |
| õ | small o, tilde | `&otilde;` | `&#245;` |
| ö | small o, umlaut mark | `&ouml;` | `&#246;` |
| ø | small o, slash | `&oslash;` | `&#248;` |
| ù | small u, grave accent | `&ugrave;` | `&#249;` |
| ú | small u, acute accent | `&uacute;` | `&#250;` |
| û | small u, circumflex accent | `&ucirc;` | `&#251;` |
| ü | small u, umlaut mark | `&uuml;` | `&#252;` |
| ý | small y, acute accent | `&yacute;` | `&#253;` |
| þ | small thorn, Icelandic | `&thorn;` | `&#254;` |
| ÿ | small y, umlaut mark | `&yuml;` | `&#255;` |

Other entities supported by HTML

| Display | Description | Entity name | Entity number |
| --- | --- | --- | --- |
| Œ | capital ligature OE | `&OElig;` | `&#338;` |
| œ | small ligature oe | `&oelig;` | `&#339;` |
| Š | capital S with caron | `&Scaron;` | `&#352;` |
| š | small s with caron | `&scaron;` | `&#353;` |
| Ÿ | capital Y with diaeresis | `&Yuml;` | `&#376;` |
| ˆ | modifier letter circumflex accent | `&circ;` | `&#710;` |
| ˜ | small tilde | `&tilde;` | `&#732;` |
|   | en space | `&ensp;` | `&#8194;` |
|   | em space | `&emsp;` | `&#8195;` |
|   | thin space | `&thinsp;` | `&#8201;` |
|   | zero width non-joiner | `&zwnj;` | `&#8204;` |
|   | zero width joiner | `&zwj;` | `&#8205;` |
|   | left-to-right mark | `&lrm;` | `&#8206;` |
|   | right-to-left mark | `&rlm;` | `&#8207;` |
| – | en dash | `&ndash;` | `&#8211;` |
| — | em dash | `&mdash;` | `&#8212;` |
| ‘ | left single quotation mark | `&lsquo;` | `&#8216;` |
| ’ | right single quotation mark | `&rsquo;` | `&#8217;` |
| ‚ | single low-9 quotation mark | `&sbquo;` | `&#8218;` |
| “ | left double quotation mark | `&ldquo;` | `&#8220;` |
| ” | right double quotation mark | `&rdquo;` | `&#8221;` |
| „ | double low-9 quotation mark | `&bdquo;` | `&#8222;` |
| † | dagger | `&dagger;` | `&#8224;` |
| ‡ | double dagger | `&Dagger;` | `&#8225;` |
| … | horizontal ellipsis | `&hellip;` | `&#8230;` |
| ‰ | per mille | `&permil;` | `&#8240;` |
| ‹ | single left-pointing angle quotation | `&lsaquo;` | `&#8249;` |
| › | single right-pointing angle quotation | `&rsaquo;` | `&#8250;` |
| € | euro | `&euro;` | `&#8364;` |
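
The entity numbers in these tables are the decimal Unicode code points of the characters, and HTML also accepts a hexadecimal form (`&#x...;`). The sketch below derives both forms for a character; `entityForms` is a hypothetical helper name introduced only for this example.

```javascript
// Hypothetical helper: returns the decimal and hexadecimal numeric
// character references for the first character of a string.
function entityForms(ch) {
  var code = ch.codePointAt(0);
  return {
    decimal: "&#" + code + ";",
    hex: "&#x" + code.toString(16).toUpperCase() + ";"
  };
}

var euro = entityForms("€");
console.log(euro.decimal); // &#8364;
console.log(euro.hex);     // &#x20AC;
```
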
posted @ 2012-02-04 21:11  Ryan-CN