Web Designers Were All Broken-Winged Angels in a Past Life: Truly Auto-Detecting a Web Page's Encoding

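There are some seriously clever things in this world, and plenty of seriously clever people get done in by their own cleverness.

Let's start with a page: http://www.skxox.com/xxinfo_127691.html

It displays fine in a browser, with no garbled text.

Now view its source and you will find this line: <meta http-equiv="Content-Type" content="text/html; charset=gb2312" />

So you, being clever, confidently conclude that the page is encoded in gb2312...

Now force the browser to display the page as GB2312 and see what happens: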

涓�鍛ㄩ挗閾佽涓氫俊鎭嫨瑕�

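What on earth is this supposed to be...

So all those articles on cnblogs (博客园) that auto-detect a page's encoding by reading the charset out of the meta tag are fooling you...

Take a look with a packet capture tool: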

HTTP/1.1 200 OK
Date: Thu, 21 Apr 2011 07:36:27 GMT
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
X-AspNet-Version: 2.0.50727
X-Powered-By: UrlRewriter.NET 1.7.0
Cache-Control: private
Content-Type: text/html; charset=utf-8
Content-Length: 41750

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

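The charset in the Content-Type response header above, utf-8, is the page's real encoding. The browser trusts the HTTP header over the meta tag, which is why it renders the page correctly.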
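The garbled line earlier is what UTF-8 bytes look like when they are decoded as GB2312. Here is a minimal sketch that reproduces the effect. It assumes .NET Core 3.0 or later, where CodePagesEncodingProvider ships with the framework. The names MojibakeDemo and DecodeUtf8BytesAs are made up for illustration and are not part of the original code.

```csharp
using System;
using System.Text;

public static class MojibakeDemo
{
    // Encodes text as UTF-8, then decodes those same bytes with the
    // encoding named by encodingName (for example "gb2312").
    public static string DecodeUtf8BytesAs(string text, string encodingName)
    {
        // Assumption: running on .NET Core / .NET 5+, where legacy code
        // pages such as gb2312 are only available after registering this provider.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        return Encoding.GetEncoding(encodingName).GetString(utf8Bytes);
    }
}
```

Decoding with "utf-8" returns the original string. Decoding with "gb2312" returns a different, unreadable string, which is what happens when a crawler believes the meta tag on this page.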

So please, I'm begging you, stop parsing the charset out of the page's HTML.

Replace the statement that gets the encoding with:

string c = response.ContentType.Replace("text/html;", "").Replace("charset=", "").Trim();
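The one-liner above only works when the header is text/html and actually carries a charset. For a bare "text/html" it leaves the string "text/html", which is not an encoding name. A slightly more defensive sketch, offered as one possible alternative rather than the author's method, lets System.Net.Mime.ContentType do the parsing. The HeaderCharset class and its Parse method are hypothetical names introduced here.

```csharp
using System;
using System.Net.Mime;

public static class HeaderCharset
{
    // Returns the charset parameter of a Content-Type header value,
    // or null when the header is empty, malformed, or has no charset.
    public static string Parse(string contentTypeHeader)
    {
        if (string.IsNullOrWhiteSpace(contentTypeHeader))
        {
            return null;
        }

        try
        {
            // ContentType parses the "type/subtype; name=value" syntax,
            // so this is not tied to the text/html media type.
            string charset = new ContentType(contentTypeHeader).CharSet;
            return string.IsNullOrWhiteSpace(charset) ? null : charset.Trim();
        }
        catch (FormatException)
        {
            // The header value could not be parsed as a content type.
            return null;
        }
    }
}
```

With this, HeaderCharset.Parse(response.ContentType) gives "utf-8" for the response captured above, and null for a header with no charset, so the caller can decide what to do in that case instead of passing a bad name to Encoding.GetEncoding.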

The whole lump of code:

using System;
using System.IO;
using System.Net;
using System.Text;

        /// <summary>
        /// Fetches the HTML source of a remote URL.
        /// </summary>
        /// <param name="url">URL of the page to fetch</param>
        /// <param name="ucoid">Encoding name to force; pass null or empty to use the charset from the Content-Type response header</param>
        /// <returns>The HTML source, or an empty string on failure</returns>
        public static string GetHtml(string url, string ucoid)
        {
            try
            {
                HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
                request.UserAgent = "www.svnhost.cn";
                request.Timeout = 20000;
                request.AllowAutoRedirect = true;

                // using blocks close the response and the reader even if an exception is thrown.
                using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
                {
                    // Only read successful responses smaller than 1 MB.
                    if (response.StatusCode == HttpStatusCode.OK && response.ContentLength < 1024 * 1024)
                    {
                        // Take the charset from the response header: "text/html; charset=utf-8" becomes "utf-8".
                        string c = response.ContentType.Replace("text/html;", "").Replace("charset=", "").Trim();
                        if (string.IsNullOrEmpty(ucoid))
                        {
                            ucoid = c;
                        }

                        using (StreamReader reader = new StreamReader(response.GetResponseStream(), Encoding.GetEncoding(ucoid)))
                        {
                            return reader.ReadToEnd();
                        }
                    }
                }
            }
            catch
            {
                // Any failure (network error, or a header without a usable charset,
                // which makes Encoding.GetEncoding throw) falls through to the empty string.
            }
            return string.Empty;
        }

So, web designers, you just can't catch a break... In a past life they were all broken-winged angels who fell into a septic tank...

Source: http://www.njxsw.com/forum-viewthread-tid-1965-fromuid-4.html

posted on 2011-04-21 16:18 kuibono