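Preface:
In a comment on my previous post, skyaspnet asked me how to compress HTML. At the time I recommended gzip, but thinking about it afterwards, it would be no bad thing if every html and jsp (aspx) file could be squeezed into a single line before the application runs. We usually do not enable gzip for HTML: HTML today is mostly generated dynamically and is not served from the browser cache, so with gzip enabled every request has to be compressed again, which costs server resources. gzip works better for JS and CSS because those files are cached. In my view the biggest benefit of compressing HTML this way is that it pays for itself many times over: write it once, and every later project can use it with no extra development work.
In the post "Merging, Compressing and Cache Management of JS and CSS" I mentioned a component I wrote that automatically merges and compresses JS and CSS and appends version numbers. This time I have added HTML compression to that component. The flow is simple: when the application starts (contextInitialized or Application_Start), scan all html and jsp (aspx) files and compress them.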
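The startup scan described above could look roughly like the sketch below. It is not the component's actual code: the class name StartupPageCompressor, the choice to rewrite files in place, the UTF-8 encoding and the extension list are all assumptions made for illustration. The compressor is passed in as a function so that the sketch stays self-contained.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: compresses every page under a root directory in place. */
final class StartupPageCompressor {

    /** Walks the tree under root and returns the number of files that were rewritten. */
    static int compressAll(Path root, UnaryOperator<String> compressor) throws IOException {
        List<Path> pages;
        // Collect the candidates first so the directory stream is closed before writing.
        try (Stream<Path> paths = Files.walk(root)) {
            pages = paths.filter(Files::isRegularFile)
                         .filter(StartupPageCompressor::isPage)
                         .collect(Collectors.toList());
        }
        int changed = 0;
        for (Path page : pages) {
            // Assumption: pages are stored as UTF-8.
            String original = new String(Files.readAllBytes(page), StandardCharsets.UTF_8);
            String compressed = compressor.apply(original);
            if (!compressed.equals(original)) {
                Files.write(page, compressed.getBytes(StandardCharsets.UTF_8));
                changed++;
            }
        }
        return changed;
    }

    /** Assumption: only these three extensions count as pages. */
    private static boolean isPage(Path path) {
        String name = path.getFileName().toString().toLowerCase(Locale.ROOT);
        return name.endsWith(".html") || name.endsWith(".jsp") || name.endsWith(".aspx");
    }
}
```

In a Java web application a method like this would be called from contextInitialized, with the web root as root and the HTML compressor as the function argument.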
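Things to watch when compressing:
The implementation mainly uses regular expressions to find and replace. When compressing HTML, pay attention to the following points:
1. The formatting of content inside pre and textarea tags must be preserved, so it cannot be compressed.
2. When removing HTML comments, some comments must not be removed, for example: <!--[if IE 6]> ..... <![endif]-->
3. Be careful when stripping comments from inline JS, because comment markers can appear inside strings, for example: var url = "http://www.cnblogs.com"; // the // inside the string is not a comment
When removing JS line breaks, the next line cannot be appended directly; a space is needed in between. Consider this code:
else
return;
Without the space it becomes elsereturn.
4. jsp (aspx) files are very likely to embed server-side code in <% %>, which also has to be handled separately; comments inside it are handled the same way as in JS.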
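Points 2 and 3 can be illustrated in isolation with the simplified sketch below. The class name CompressionPitfalls, the MASK placeholder and the single-pass line joining are assumptions made for illustration; the comment and string expressions are the same ones used in the source code of this post.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Simplified illustration of the comment and line-break pitfalls. */
final class CompressionPitfalls {
    // A comment that starts with "<!--[" (an IE conditional comment) does not match.
    private static final Pattern HTML_COMMENT =
            Pattern.compile("<!--\\s*[^\\[].*?-->", Pattern.DOTALL);
    // Single- or double-quoted string literal on one line.
    private static final Pattern STRING =
            Pattern.compile("(\"[^\"\\n]*?\"|'[^'\\n]*?')");
    private static final Pattern LINE_COMMENT = Pattern.compile("//.*");
    // Hypothetical placeholder; assumed never to occur in real input.
    private static final String MASK = "%%%SKETCH~SLASHES&&&";

    static String stripHtmlComments(String html) {
        return HTML_COMMENT.matcher(html).replaceAll("");
    }

    static String stripJsLineComments(String js) {
        // 1. Hide "//" inside string literals so it is not mistaken for a comment.
        Matcher m = STRING.matcher(js);
        StringBuffer masked = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(masked, Matcher.quoteReplacement(m.group().replace("//", MASK)));
        }
        m.appendTail(masked);
        // 2. Remove the real comments.
        String noComments = LINE_COMMENT.matcher(masked).replaceAll("");
        // 3. Join lines with a space, never with nothing, so "else\nreturn" stays valid.
        String oneLine = noComments.replaceAll("\\s*\\n\\s*", " ").trim();
        // 4. Put the hidden "//" back.
        return oneLine.replace(MASK, "//");
    }

    public static void main(String[] args) {
        System.out.println(stripHtmlComments("<!-- nav --><!--[if IE 6]><p>old</p><![endif]-->"));
        System.out.println(stripJsLineComments(
                "var url = \"http://www.cnblogs.com\"; // home page\nelse\n  return;"));
    }
}
```

Joining with an empty string in step 3 would produce elsereturn, which is exactly the failure described in point 3.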
Source code:
Below is the Java source. You can also click here to download it. I trust everyone can follow it, and it is easy to port to .NET:
import java.util.*;
import java.util.regex.*;

/*******************************************
 * Compresses the code in JSP and HTML files by removing whitespace and line breaks
 * @author bearrui(ak-47)
 * @version 0.1
 * @date 2010-5-13
 *******************************************/
public class HtmlCompressor {
    private static final int FLAGS = Pattern.DOTALL | Pattern.CASE_INSENSITIVE | Pattern.MULTILINE;

    private static final String tempPreBlock = "%%%HTMLCOMPRESS~PRE&&&";
    private static final String tempTextAreaBlock = "%%%HTMLCOMPRESS~TEXTAREA&&&";
    private static final String tempScriptBlock = "%%%HTMLCOMPRESS~SCRIPT&&&";
    private static final String tempStyleBlock = "%%%HTMLCOMPRESS~STYLE&&&";
    private static final String tempJspBlock = "%%%HTMLCOMPRESS~JSP&&&";

    // HTML comments, except those starting with "<!--[" (IE conditional comments)
    private static final Pattern commentPattern = Pattern.compile("<!--\\s*[^\\[].*?-->", FLAGS);
    // whitespace between tags
    private static final Pattern itsPattern = Pattern.compile(">\\s+?<", FLAGS);
    private static final Pattern prePattern = Pattern.compile("<pre[^>]*?>.*?</pre>", FLAGS);
    private static final Pattern taPattern = Pattern.compile("<textarea[^>]*?>.*?</textarea>", FLAGS);
    // <% ... %> blocks, excluding <%-- comments and <%@ directives
    private static final Pattern jspPattern = Pattern.compile("<%([^-@][\\w\\W]*?)%>", FLAGS);
    // <script></script>
    private static final Pattern scriptPattern = Pattern.compile(
            "(?:<script\\s*>|<script type=['\"]text/javascript['\"]\\s*>)(.*?)</script>", FLAGS);
    // reluctant (.+?) so that several <style> blocks on one page are matched separately
    private static final Pattern stylePattern = Pattern.compile("<style[^>()]*?>(.+?)</style>", FLAGS);
    // single-line comments
    private static final Pattern singleCommentPattern = Pattern.compile("//.*");
    // string literals
    private static final Pattern stringPattern = Pattern.compile("(\"[^\"\\n]*?\"|'[^'\\n]*?')");
    // trim spaces and line breaks
    private static final Pattern trimPattern = Pattern.compile("\\n\\s*", Pattern.MULTILINE);
    private static final Pattern trimPattern2 = Pattern.compile("\\s*\\r", Pattern.MULTILINE);
    // multi-line comments
    private static final Pattern multiCommentPattern = Pattern.compile("/\\*.*?\\*/", FLAGS);

    private static final String tempSingleCommentBlock = "%%%HTMLCOMPRESS~SINGLECOMMENT&&&"; // placeholder for //
    private static final String tempMultiCommentBlock1 = "%%%HTMLCOMPRESS~MULTICOMMENT1&&&"; // placeholder for /*
    private static final String tempMultiCommentBlock2 = "%%%HTMLCOMPRESS~MULTICOMMENT2&&&"; // placeholder for */

    public static String compress(String html) throws Exception {
        if (html == null || html.length() == 0) {
            return html;
        }

        List<String> preBlocks = new ArrayList<String>();
        List<String> taBlocks = new ArrayList<String>();
        List<String> scriptBlocks = new ArrayList<String>();
        List<String> styleBlocks = new ArrayList<String>();
        List<String> jspBlocks = new ArrayList<String>();

        String result = html;

        // preserve inline java code
        Matcher jspMatcher = jspPattern.matcher(result);
        while (jspMatcher.find()) {
            jspBlocks.add(jspMatcher.group(0));
        }
        result = jspMatcher.replaceAll(tempJspBlock);

        // preserve PRE tags
        Matcher preMatcher = prePattern.matcher(result);
        while (preMatcher.find()) {
            preBlocks.add(preMatcher.group(0));
        }
        result = preMatcher.replaceAll(tempPreBlock);

        // preserve TEXTAREA tags
        Matcher taMatcher = taPattern.matcher(result);
        while (taMatcher.find()) {
            taBlocks.add(taMatcher.group(0));
        }
        result = taMatcher.replaceAll(tempTextAreaBlock);

        // preserve SCRIPT tags
        Matcher scriptMatcher = scriptPattern.matcher(result);
        while (scriptMatcher.find()) {
            scriptBlocks.add(scriptMatcher.group(0));
        }
        result = scriptMatcher.replaceAll(tempScriptBlock);

        // preserve STYLE tags (inline css is compressed separately)
        Matcher styleMatcher = stylePattern.matcher(result);
        while (styleMatcher.find()) {
            styleBlocks.add(styleMatcher.group(0));
        }
        result = styleMatcher.replaceAll(tempStyleBlock);

        // process pure html
        result = processHtml(result);

        // process preserved blocks
        result = processPreBlocks(result, preBlocks);
        result = processTextareaBlocks(result, taBlocks);
        result = processScriptBlocks(result, scriptBlocks);
        result = processStyleBlocks(result, styleBlocks);
        result = processJspBlocks(result, jspBlocks);

        return result.trim();
    }

    private static String processHtml(String html) {
        String result = html;
        // remove comments
        result = commentPattern.matcher(result).replaceAll("");
        // remove inter-tag spaces
        result = itsPattern.matcher(result).replaceAll("><");
        // remove multi whitespace characters
        result = result.replaceAll("\\s{2,}", " ");
        return result;
    }

    private static String processJspBlocks(String html, List<String> blocks) {
        String result = html;
        for (int i = 0; i < blocks.size(); i++) {
            blocks.set(i, compressJsp(blocks.get(i)));
        }
        // put preserved blocks back
        while (result.contains(tempJspBlock)) {
            result = result.replaceFirst(tempJspBlock, Matcher.quoteReplacement(blocks.remove(0)));
        }
        return result;
    }

    private static String processPreBlocks(String html, List<String> blocks) throws Exception {
        String result = html;
        // put preserved blocks back
        while (result.contains(tempPreBlock)) {
            result = result.replaceFirst(tempPreBlock, Matcher.quoteReplacement(blocks.remove(0)));
        }
        return result;
    }

    private static String processTextareaBlocks(String html, List<String> blocks) throws Exception {
        String result = html;
        // put preserved blocks back
        while (result.contains(tempTextAreaBlock)) {
            result = result.replaceFirst(tempTextAreaBlock, Matcher.quoteReplacement(blocks.remove(0)));
        }
        return result;
    }

    private static String processScriptBlocks(String html, List<String> blocks) throws Exception {
        String result = html;
        for (int i = 0; i < blocks.size(); i++) {
            blocks.set(i, compressJavaScript(blocks.get(i)));
        }
        // put preserved blocks back
        while (result.contains(tempScriptBlock)) {
            result = result.replaceFirst(tempScriptBlock, Matcher.quoteReplacement(blocks.remove(0)));
        }
        return result;
    }

    private static String processStyleBlocks(String html, List<String> blocks) throws Exception {
        String result = html;
        for (int i = 0; i < blocks.size(); i++) {
            blocks.set(i, compressCssStyles(blocks.get(i)));
        }
        // put preserved blocks back
        while (result.contains(tempStyleBlock)) {
            result = result.replaceFirst(tempStyleBlock, Matcher.quoteReplacement(blocks.remove(0)));
        }
        return result;
    }

    private static String compressJsp(String source) {
        // check if block is not empty
        Matcher jspMatcher = jspPattern.matcher(source);
        if (jspMatcher.find()) {
            String result = compressJspJs(jspMatcher.group(1));
            return new StringBuilder(source.substring(0, jspMatcher.start(1)))
                    .append(result)
                    .append(source.substring(jspMatcher.end(1)))
                    .toString();
        } else {
            return source;
        }
    }

    private static String compressJavaScript(String source) {
        // check if block is not empty
        Matcher scriptMatcher = scriptPattern.matcher(source);
        if (scriptMatcher.find()) {
            String result = compressJspJs(scriptMatcher.group(1));
            return new StringBuilder(source.substring(0, scriptMatcher.start(1)))
                    .append(result)
                    .append(source.substring(scriptMatcher.end(1)))
                    .toString();
        } else {
            return source;
        }
    }

    private static String compressCssStyles(String source) {
        // check if block is not empty
        Matcher styleMatcher = stylePattern.matcher(source);
        if (styleMatcher.find()) {
            // remove comments and line breaks
            String result = multiCommentPattern.matcher(styleMatcher.group(1)).replaceAll("");
            result = trimPattern.matcher(result).replaceAll("");
            result = trimPattern2.matcher(result).replaceAll("");
            return new StringBuilder(source.substring(0, styleMatcher.start(1)))
                    .append(result)
                    .append(source.substring(styleMatcher.end(1)))
                    .toString();
        } else {
            return source;
        }
    }

    private static String compressJspJs(String source) {
        String result = source;
        // Comment markers may appear inside strings, so first replace the
        // markers inside string literals with placeholders
        Matcher stringMatcher = stringPattern.matcher(result);
        while (stringMatcher.find()) {
            String tmpStr = stringMatcher.group(0);
            if (tmpStr.indexOf("//") != -1 || tmpStr.indexOf("/*") != -1 || tmpStr.indexOf("*/") != -1) {
                String blockStr = tmpStr.replaceAll("//", tempSingleCommentBlock)
                        .replaceAll("/\\*", tempMultiCommentBlock1)
                        .replaceAll("\\*/", tempMultiCommentBlock2);
                result = result.replace(tmpStr, blockStr);
            }
        }

        // remove comments
        result = singleCommentPattern.matcher(result).replaceAll("");
        result = multiCommentPattern.matcher(result).replaceAll("");
        // remove carriage returns, then turn each line break plus indentation into one space
        result = trimPattern2.matcher(result).replaceAll("");
        result = trimPattern.matcher(result).replaceAll(" ");

        // restore the markers that were replaced inside strings
        result = result.replaceAll(tempSingleCommentBlock, "//")
                .replaceAll(tempMultiCommentBlock1, "/*")
                .replaceAll(tempMultiCommentBlock2, "*/");

        return result;
    }
}
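The core idea in compress is to swap protected blocks for a placeholder, squeeze the remaining markup, and then put the blocks back in order. The sketch below isolates that idea for pre tags only. The class name PreserveBlocksSketch and the placeholder text are hypothetical, and the sketch assumes the placeholder never occurs in the input; it is a reduced illustration, not a replacement for the class above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Reduced illustration of the preserve / compress / restore steps. */
final class PreserveBlocksSketch {
    // Hypothetical marker; assumed not to appear in real pages.
    private static final String PLACEHOLDER = "%%%SKETCH~PRE&&&";
    private static final Pattern PRE =
            Pattern.compile("<pre[^>]*?>.*?</pre>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    static String compressKeepingPre(String html) {
        // 1. Cut every <pre> block out and remember it in document order.
        List<String> saved = new ArrayList<>();
        Matcher m = PRE.matcher(html);
        StringBuffer stripped = new StringBuffer();
        while (m.find()) {
            saved.add(m.group());
            m.appendReplacement(stripped, Matcher.quoteReplacement(PLACEHOLDER));
        }
        m.appendTail(stripped);

        // 2. Squeeze what is left: drop inter-tag whitespace, collapse whitespace runs.
        String squeezed = stripped.toString()
                .replaceAll(">\\s+<", "><")
                .replaceAll("\\s{2,}", " ")
                .trim();

        // 3. Restore the saved blocks, untouched, at the placeholder positions.
        StringBuilder result = new StringBuilder();
        int from = 0;
        for (String block : saved) {
            int at = squeezed.indexOf(PLACEHOLDER, from);
            result.append(squeezed, from, at).append(block);
            from = at + PLACEHOLDER.length();
        }
        return result.append(squeezed.substring(from)).toString();
    }
}
```

Because the blocks are restored in the same order in which they were found, a simple list is enough and no index has to be encoded in the placeholder.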
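Notes on usage:
After applying the method above and running the application again, you will find that the source of every page has become a single line when you view it. Not bad, but there are still some issues to keep in mind:
1. I originally wanted to compress inline JS with yuicompressor. Before compressing, yuicompressor parses the JS to check that it is valid. Our inline JS often contains server-side code, for example var now = <%=DateTime.now %> , and such code fails to parse, so yuicompressor cannot be used.
In the end I had to write the JS compression myself. It is fairly crude, so one problem remains unsolved: if a developer leaves out the semicolon at the end of a JS statement, compressing the code into one line is very likely to break it. To use this component, every statement must therefore end with a semicolon.
2. Because all jsp (aspx) files are compressed when the application starts, HTML that is generated dynamically while a user request is being served cannot be compressed.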
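The semicolon requirement in point 1 follows directly from how lines are joined. The small sketch below shows it; the class name MissingSemicolonDemo is hypothetical, and joinLines only reproduces the newline-to-space replacement, not the full compressor.

```java
/** Shows why statements without a trailing semicolon break once lines are joined. */
final class MissingSemicolonDemo {

    /** Replaces each line break plus the following indentation with one space. */
    static String joinLines(String js) {
        return js.replaceAll("\\n\\s*", " ");
    }

    public static void main(String[] args) {
        // Valid as two lines, because the line break ends the first statement.
        System.out.println(joinLines("var a = 1\n  var b = 2"));
        // Stays valid on one line, because each statement ends with a semicolon.
        System.out.println(joinLines("var a = 1;\n  var b = 2;"));
    }
}
```

The first call yields var a = 1 var b = 2, which a JS engine rejects as a syntax error; the second yields var a = 1; var b = 2; and runs as before.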
For more, see the High-Performance Web Development series.