Word Frequency Count: Web Version
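Requirement: migrate the program to the web platform and accept input by having the user upload a TXT file. Keeping and maintaining the Console version is recommended (but not required), because it makes testing easier.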
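One way to keep a Console version alive next to the web version is to have both call the same stream-based counting method. The following is only a minimal sketch under that assumption: the class `ConsoleWordCount` and the methods `countWords` and `countFile` are hypothetical names, and `countWords` is a simple stand-in that returns the total number of words rather than the project's real per-word statistics.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class ConsoleWordCount {
    // Hypothetical stand-in for the real counting code: returns the total
    // number of words, where a word is a maximal run of letters.
    static int countWords(InputStream input) throws IOException {
        Reader reader = new BufferedReader(new InputStreamReader(input, StandardCharsets.UTF_8));
        int words = 0;
        boolean inWord = false;
        int c;
        while ((c = reader.read()) != -1) {
            boolean letter = Character.isLetter(c);
            if (letter && !inWord) {
                words++; // a new word starts here
            }
            inWord = letter;
        }
        return words;
    }

    // Console entry point: open the file as a stream and reuse countWords,
    // the same method a servlet could call with the uploaded stream.
    static int countFile(Path path) throws IOException {
        try (InputStream in = Files.newInputStream(path)) {
            return countWords(in);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ConsoleWordCount <file.txt>");
            return;
        }
        System.out.println(countFile(Paths.get(args[0])));
    }
}
```

Because the counting method only sees an `InputStream`, it can be tested from the command line without starting a servlet container.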
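Put a file-upload control on the page and receive the upload in a servlet. What the servlet gets is a byte stream, which is converted to characters and then counted by the existing code.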
The code of the JSP upload page is as follows:
<%@ page language="java" contentType="text/html; charset=utf-8" pageEncoding="utf-8"%>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Insert title here</title>
</head>
<body>
<table>
  <tr>
    <td>
      <form action="server/CountWordServlet" method="post" enctype="multipart/form-data">
        Please upload the file to count: <input type="file" name="sourceFile"/>
        <input type="submit" value="Upload">
      </form>
    </td>
  </tr>
</table>
</body>
</html>
The page that displays the results is as follows:
<%@page import="com.server.servlet.Word"%>
<%@page import="java.util.ArrayList"%>
<%@ page language="java" contentType="text/html; charset=utf-8" pageEncoding="utf-8"%>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<%
  // The servlet stores the result under the request attribute "list".
  ArrayList<Word> list = (ArrayList<Word>) request.getAttribute("list");
%>
<title>Insert title here</title>
</head>
<body>
<table>
<% if (list != null && !list.isEmpty()) { %>
  <tr>
    <td>Word</td><td>Count</td>
  </tr>
<%   for (Word w : list) {
       String word = w.getWord();
       int num = w.getNum();
%>
  <tr>
    <td><%=word%></td>
    <td><%=num%></td>
  </tr>
<%   }
   } else { %>
  <tr>
    <td>The file contains no words, or the file does not exist.</td>
  </tr>
<% } %>
</table>
</body>
</html>
The servlet code is as follows:
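// The imports assume the javax.servlet API and Apache Commons FileUpload 1.x.
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.apache.commons.fileupload.FileItemIterator;
import org.apache.commons.fileupload.FileItemStream;
import org.apache.commons.fileupload.FileUploadException;
import org.apache.commons.fileupload.disk.DiskFileItemFactory;
import org.apache.commons.fileupload.servlet.ServletFileUpload;

public class CountWordServlet extends HttpServlet {
    private static final long serialVersionUID = 1L;

    protected void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        try {
            request.setCharacterEncoding("utf-8");
            DiskFileItemFactory factory = new DiskFileItemFactory();
            ServletFileUpload upload = new ServletFileUpload(factory);
            // Iterate over the parts of the multipart request.
            FileItemIterator iterator = upload.getItemIterator(request);
            while (iterator.hasNext()) {
                FileItemStream item = iterator.next();
                if (item.isFormField()) {
                    continue; // only uploaded files are counted
                }
                // Hand the uploaded byte stream to the existing counting code.
                try (InputStream input = item.openStream()) {
                    WordCountFreq wcf = new WordCountFreq();
                    ArrayList<Word> list = (ArrayList<Word>) wcf.sortAndOutput(input);
                    request.setAttribute("list", list);
                }
            }
            System.out.println("Success!");
        } catch (FileUploadException e) {
            // On failure no "list" attribute is set, so show.jsp prints its fallback message.
            e.printStackTrace();
        }
        response.setContentType("text/html;charset=utf-8");
        request.getRequestDispatcher("/show.jsp").forward(request, response);
    }
}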
The key method of the counting process, sortAndOutput(), is shown below:
public List<Word> sortAndOutput(InputStream input) throws IOException {
    BufferedInputStream bis = new BufferedInputStream(input);
    byte[] buf = new byte[1024];
    int len;
    String lastWord = ""; // partial word carried over from the previous chunk
    while ((len = bis.read(buf)) != -1) {
        // Convert the bytes just read into a string. UTF-8 is assumed here
        // (java.nio.charset.StandardCharsets); the original used the platform default.
        String str = new String(buf, 0, len, StandardCharsets.UTF_8);
        // Prepend the fragment left over from the previous chunk.
        String temp = lastWord + str;
        // If the text ends in letters, that word may continue in the next chunk:
        // cut it off and carry it over. The bounds check keeps the scan from
        // running past the start when the whole chunk consists of letters.
        int cut = temp.length();
        while (cut > 0 && Character.isLetter(temp.charAt(cut - 1))) {
            cut--;
        }
        lastWord = temp.substring(cut);
        temp = temp.substring(0, cut);
        root = generateCharTree(temp);
    }
    // The remainder of the method is not shown in the original post.
}
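The carry-over idea in the loop above (hold back a word that is cut off at the end of a buffer and prepend it to the next buffer) can be tried in isolation. The following is only a sketch: `ChunkCarry` and `completeParts` are hypothetical names, and the chunks are plain strings rather than decoded byte buffers. It scans the combined text (carry plus chunk), so a word that spans more than two chunks still stays in one piece, and it flushes the last fragment once the input ends.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ChunkCarry {
    // Returns the pieces of text that are safe to tokenize: each piece ends
    // at a non-letter, except possibly the final one (the flushed carry).
    static List<String> completeParts(List<String> chunks) {
        List<String> parts = new ArrayList<>();
        String carry = "";
        for (String chunk : chunks) {
            String text = carry + chunk;
            // Walk back over the trailing run of letters, if any.
            int cut = text.length();
            while (cut > 0 && Character.isLetter(text.charAt(cut - 1))) {
                cut--;
            }
            carry = text.substring(cut);           // possibly unfinished word
            if (cut > 0) {
                parts.add(text.substring(0, cut)); // complete text so far
            }
        }
        if (!carry.isEmpty()) {
            parts.add(carry); // end of input: the held-back word is complete
        }
        return parts;
    }

    public static void main(String[] args) {
        // "world" and "foo" are both split across chunk boundaries.
        List<String> chunks = Arrays.asList("hello wo", "rld fo", "o");
        System.out.println(completeParts(chunks)); // prints [hello , world , foo]
    }
}
```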
An example is shown below.
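Before the web version, the program simply took a file path as input. After the move to the web version, the one small difficulty was converting the uploaded byte stream into characters for processing; a quick search was enough to solve it.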
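For the byte-to-character conversion mentioned above, the standard library also offers `InputStreamReader`, which decodes a byte stream with an explicit charset and handles multi-byte characters that straddle a buffer boundary. The following is a minimal sketch and not the project's code: `StreamWordCounter` is a hypothetical class, UTF-8 input is assumed, and lower-casing the words is an added assumption.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

class StreamWordCounter {
    // Counts words (maximal runs of letters, lower-cased) read from a byte stream.
    static Map<String, Integer> count(InputStream input) throws IOException {
        Map<String, Integer> counts = new TreeMap<>();
        StringBuilder word = new StringBuilder();
        // The reader turns bytes into chars; the charset is stated explicitly
        // instead of relying on the platform default.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(input, StandardCharsets.UTF_8))) {
            int c;
            while ((c = reader.read()) != -1) {
                if (Character.isLetter(c)) {
                    word.append(Character.toLowerCase((char) c));
                } else {
                    flush(word, counts);
                }
            }
            flush(word, counts); // the input may end in the middle of a word
        }
        return counts;
    }

    // Records the word collected so far, if any, and resets the buffer.
    private static void flush(StringBuilder word, Map<String, Integer> counts) {
        if (word.length() > 0) {
            counts.merge(word.toString(), 1, Integer::sum);
            word.setLength(0);
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(
                "To be, or not to be".getBytes(StandardCharsets.UTF_8));
        System.out.println(count(in)); // prints {be=2, not=1, or=1, to=2}
    }
}
```

Reading characters through a reader also removes the need to carry partial words between byte buffers by hand, since the word is accumulated character by character.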
ssh:git@git.coding.net:muziliquan/GUIVersion.git
git:git://git.coding.net/muziliquan/GUIVersion.git