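Crawling Hot Words with Python and Analyzing Them by Category - [Hot-Word Classification + Directory Generation]
Date: 2020.02.04
Blog issue: 143
Tuesday
[If you want to use the code in this post, please leave a comment below first (just let me know)]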
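All related posts:
a. [Basic Preparation]
b. [Word Cloud + Data Import]
c. [Topology Data]
d. [Data Repair]
e. [Explanation Repair + Hot-Word Citations]
f. [JSP Demo + Page Navigation]
g. [Hot-Word Classification + Directory Generation] (this post)
h. [Hot-Word Relationship Graph + Report Generation]
i. [App Development]
j. [Security Hardening]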
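As the figure below shows, the requirements I have already finished are highlighted in yellow. Only four remain: hot-word classification, directory generation, hot-word relationship graph display, and data report export. These are the most urgent ones, so, phew, time to roll up my sleeves and get to work!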
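1. Hot-word classification
My teacher said to follow the categories used by the major platforms, so I will simply use the cnblogs categories (I really cannot follow how the machine-learning approaches work; I am nowhere near even the entry threshold)! As the figure below shows, cnblogs News divides its news into these categories: Internet, IT Industry, Software Development, Open Source, Computer Hardware, Games, Startups, Mobile, Science, and Other. I will assign the words crawled from each category's news to that category. (Looks like I have to crawl the data all over again.)
Before I start crawling, a few notes. This should be the last round of changes. I also found that every news category has 100 pages, which...相当于每一类都有, so I cannot guarantee the results are free of error. To cut down the amount of data, I plan to raise the condition "frequency of 15" to "frequency of 20"; otherwise the crawl would never finish. My rough plan is to write this post over today and tomorrow, then write a summary post tomorrow, and call this small goal done! I may add a WeChat mini program or an app at the end, but we will see.
Across these 10 news categories, what data do we need to crawl in total?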
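First, crawl the starting link https://news.cnblogs.com/ with a request that carries a header, to get the links of the 10 news categories above. Then work out how the links inside each category are constructed and start a new crawl. For each category, append a "hot-word type" tag to the hot-word information that was crawled. This means modifying the KeyWords class: add a kind attribute to it and rewrite its __toString() member function. After that, update every place that uses the KeyWords class. (News does not need to change.)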
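To make the KeyWords change concrete, here is a minimal sketch. It is not the real class: the fields word and num and the tab-separated output are assumptions made for illustration, and it uses Python's standard __str__ hook rather than the __toString() method named above.

```python
class KeyWords:
    """One hot-word record; only the fields needed for this sketch (hypothetical)."""

    def __init__(self, word, num, kind=""):
        self.word = word
        self.num = num
        # New attribute: the news category the word was crawled from.
        self.kind = kind

    def __str__(self):
        # Tab-separated line with the category appended as the last field.
        return "\t".join([self.word, str(self.num), self.kind])


record = KeyWords("Python", 25, "软件开发类")
print(record)
```

Putting kind last means existing readers of the file only need to handle one extra trailing field.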
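How the category page URLs are built:
First, the original news URL: https://news.cnblogs.com/
Next, taking "Internet" as the example: https://news.cnblogs.com/n/c1101
Then the address of page 100: https://news.cnblogs.com/n/c1101?page=100
It is easy to see that the category URL is the original URL plus the href of the a tag for Internet, and the page number is appended as a query parameter. These pieces have to be joined to form the crawl link!
But during the crawl I hit a problem: I could not crawl the category links themselves. So I had to collect them by hand. It is only 10 entries, so no big deal. I was being lazy and wanted the page to do it for me, but apparently cnblogs wants me to be diligent. Hahaha!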
The links:
Internet (互联网类): https://news.cnblogs.com/n/c1101
IT Industry (IT业界类): https://news.cnblogs.com/n/c1102
Software Development (软件开发类): https://news.cnblogs.com/n/c1103
Open Source (开源类): https://news.cnblogs.com/n/c1109
Computer Hardware (电脑硬件类): https://news.cnblogs.com/n/c1111
Games (游戏类): https://news.cnblogs.com/n/c1110
Startups (创业类): https://news.cnblogs.com/n/c1112
Mobile (手机相关类): https://news.cnblogs.com/n/c1113
Science (科学类): https://news.cnblogs.com/n/c1114
Other (其他类): https://news.cnblogs.com/n/c1199
In the Surapity class, create a dictionary that stores each category name and its link.
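The dictionary and the URL rule described above could look roughly like this. The names KIND_PATHS and kind_page_urls are made up for this sketch, and using ?page=1 for the first page is an assumption (the post only shows the bare category URL and the ?page=100 form).

```python
BASE_URL = "https://news.cnblogs.com"

# Category name -> path, copied from the hand-collected list above.
KIND_PATHS = {
    "互联网类": "/n/c1101",
    "IT业界类": "/n/c1102",
    "软件开发类": "/n/c1103",
    "开源类": "/n/c1109",
    "电脑硬件类": "/n/c1111",
    "游戏类": "/n/c1110",
    "创业类": "/n/c1112",
    "手机相关类": "/n/c1113",
    "科学类": "/n/c1114",
    "其他类": "/n/c1199",
}


def kind_page_urls(kind, pages=100):
    """Return the list-page URLs of one category, page 1 to `pages`."""
    base = BASE_URL + KIND_PATHS[kind]
    return [f"{base}?page={page}" for page in range(1, pages + 1)]


urls = kind_page_urls("互联网类")
print(urls[-1])
```

With 10 categories of 100 pages each, this gives 1000 list pages to visit before any article is opened.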
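The crawl took a long time: from 4:51 pm until now, 1:44 am the next day. The process was tortuous and hard to sum up briefly.
Along the way several pages caused the crawler to terminate, for example "Apple Watch UI动效解析" in the Other category. Argh, every attempt got stuck on it. There is no greater pain for a programmer!!!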
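One way to keep a single bad page from ending the whole run is to wrap each fetch and record the failures. The sketch below is an assumption about how that could be done, not what my crawler does; crawl_all and fake_fetch are hypothetical names, and fetch stands in for whatever function downloads and parses one page.

```python
def crawl_all(urls, fetch):
    """Fetch every URL; collect failures instead of stopping at the first one."""
    pages = {}
    failed = []
    for url in urls:
        try:
            pages[url] = fetch(url)
        except Exception as error:
            # Remember the page and the reason, then carry on with the next URL.
            failed.append((url, str(error)))
    return pages, failed


def fake_fetch(url):
    # Stand-in for a real downloader so the sketch runs without a network.
    if url.endswith("bad"):
        raise ValueError("cannot parse page")
    return "<html>" + url + "</html>"


pages, failed = crawl_all(["a", "bad", "b"], fake_fetch)
print(sorted(pages), failed)
```

This only helps when the page raises an error. If the request simply hangs, the download call also needs a timeout, for example the timeout argument of requests.get.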
The base data comes to 17469 records in total! The file is about 1.96 MB!
Now let's build the data table (first modify fileR.py):
import codecs


def makeSql():
    file_path = "../../testFile/frc/words_sql.txt"

    # Read all records; each line holds five tab-separated fields.
    with open("../../testFile/frc/word.txt", mode='r', encoding='utf-8') as fw:
        tmp = fw.readlines()

    # Mode "w" empties the old file, so it is opened once instead of once per line.
    with codecs.open(file_path, "w", 'utf-8') as f:
        for line in tmp:
            # Strip the line ending so it does not end up inside the last quoted value.
            group = line.rstrip("\r\n").split("\t")
            group[0] = "'" + group[0] + "'"
            # As in the original, the last character of the fourth field is dropped.
            group[3] = "'" + group[3][0:len(group[3]) - 1] + "'"
            # Values are not escaped: a quote inside a field will break the statement.
            f.write("Insert into words values (" + group[0] + "," + group[1] + ",'" + group[2] + "'," + group[3] + ",'" + group[4] + "');" + "\n")


makeSql()
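Run it and import the data the same way as before. Because I had used PC Manager (电脑管家) to clean up my C drive, Navicat crashed, and I mean really crashed (it can no longer create queries; if I find a fix, I will write another post about it). So, no frills: I imported directly from the text file!
Create the keywords table (or view) the same way as in the post before last, to get the total count of each hot word!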
CREATE TABLE keywords
AS
(
    SELECT
        word AS word,
        SUM(num) AS num
    FROM
        words
    GROUP BY word
    ORDER BY num DESC
)
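To see what this aggregation does without touching the real database, here is a small sketch against an in-memory SQLite table. The three-column layout (word, num, kind) and the sample rows are made up, and SQLite stands in for MySQL purely for illustration.

```python
import sqlite3


def total_per_word(rows):
    """Sum num per word, highest total first, as the keywords table does."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE words (word TEXT, num INTEGER, kind TEXT)")
    conn.executemany("INSERT INTO words VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT word, SUM(num) AS total FROM words "
        "GROUP BY word ORDER BY total DESC"
    ).fetchall()
    conn.close()
    return result


sample = [
    ("Python", 10, "软件开发类"),
    ("Python", 20, "开源类"),
    ("5G", 12, "手机相关类"),
]
print(total_per_word(sample))
```

A word that appears in several categories is merged into one row here, which is why the per-category pages later filter on kind before grouping.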
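Hahaha! The hot-word frequency count has passed ten thousand! I hope my computer can hold out; keep crawling! (It is already 2 am though, so I am setting a 2-hour alarm and letting the topology data crawl on its own.)
About the WebConnector class, one thing deserves emphasis: for this crawl I commented out the following code: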
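# After this call, every sentence containing "年", "月" or "日" is removed, along with everything after it.
# It was meant to drop unnecessary explanation text, but that now seems unnecessary. The more the better!
tpl = StrSpecialDealer.ut_date(tpl)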
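For readers who have not seen the earlier posts, the sketch below shows one way the behavior described in that comment could be implemented. It is a guess based only on the comment, not the real StrSpecialDealer.ut_date; the function name and the choice of sentence terminators are assumptions.

```python
import re

DATE_MARKERS = ("年", "月", "日")


def drop_from_first_dated_sentence(text):
    """Keep the sentences before the first one that mentions 年, 月 or 日."""
    # Split into sentences, keeping each terminator with its sentence.
    sentences = re.findall(r"[^。！？]*[。！？]?", text)
    kept = []
    for sentence in sentences:
        if any(marker in sentence for marker in DATE_MARKERS):
            break
        kept.append(sentence)
    return "".join(kept)


print(drop_from_first_dated_sentence("甲是一种语言。2019年发布。后来流行。"))
```

The check is a plain substring test, so in this sketch a word such as 日本 also counts as a date marker and cuts the text short.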
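I woke up in the morning to a big problem: the computer had put itself to sleep. Sigh, I hope this teaches me a lesson!
When the computer is crawling overnight, make sure sleep is turned off. In Settings it looks like this:
The topology data is done as well, after roughly another 5 hours. The catch is that I cannot use the computer for anything else while the crawler runs (especially the screenshot tool; start it and the crawler is sure to crash).
At last the data is complete, so now we can start processing it!
I aggregated and processed the data by category (which means there is no Python work left). That completes hot-word classification.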
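2. Hot-word directory generation
We need to show the top 10 entries of each category and build the first page from them.
You can create new views, or write one long SQL statement directly. I am lazy, so I went with the long statement.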
package com.servlet;

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.sql.SQLException;
import java.util.Arrays;
import java.util.List;

import javax.servlet.ServletException;
import javax.servlet.ServletOutputStream;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.json.JSONArray;
import org.json.JSONObject;

import com.dblink.basic.utils.SqlUtils;
import com.dblink.basic.utils.sqlKind.MySql_s;
import com.dblink.basic.utils.user.UserInfo;
import com.dblink.bean.BeanGroup;
import com.dblink.sql.DBLink;

@SuppressWarnings("unused")
public class ServletForMoreInfo extends HttpServlet{
    private static final long serialVersionUID = 1L;
    // The only kind values accepted. kind is pasted into the SQL text, so anything else gets an empty reply.
    private static final List<String> KINDS = Arrays.asList(
            "互联网类", "IT业界类", "软件开发类", "开源类", "电脑硬件类",
            "游戏类", "创业类", "手机相关类", "科学类", "其他类");
    //----------------------------------------------------------------------//
    public void doPost(HttpServletRequest request,HttpServletResponse response) throws ServletException, IOException
    {
        request.setCharacterEncoding("utf-8");
        response.setCharacterEncoding("utf-8");
        response.setContentType("application/json");
        response.setHeader("Cache-Control", "no-cache");
        response.setHeader("Pragma", "no-cache");

        String kind = request.getParameter("kind");

        // Element 0 is a header; the js reads Length and KindNess from it.
        JSONArray jsonArray = new JSONArray();
        JSONObject jsonObj = new JSONObject();
        jsonObj.put("Length",0);
        jsonObj.put("KindNess",kind);
        jsonArray.put(jsonObj);

        if(KINDS.contains(kind))
        {
            DBLink dbLink = new DBLink(new SqlUtils(new MySql_s("rc"),new UserInfo("root","123456")));
            try {
                // MySQL requires an alias on the derived table, hence "As t".
                BeanGroup bg = dbLink.getSelect("Select word As word , SUM(num) As num From ( Select * From words Where kind = '"+kind+"' ) As t Group By word Order By num DESC Limit 0,10 ").beans;

                int leng = bg.size();
                jsonObj.put("Length",leng);

                for(int i=0;i<leng;++i)
                {
                    JSONObject jsonObject = new JSONObject();
                    jsonObject.put("word",bg.get(i).get(0));
                    jsonObject.put("num",bg.get(i).get(1));
                    jsonArray.put(jsonObject);
                }
            } catch (SQLException e) {
                // The header with Length 0 is still sent; log the cause instead of hiding it.
                e.printStackTrace();
            } finally {
                dbLink.free();
            }
        }

        ServletOutputStream os = response.getOutputStream();
        // Encode explicitly as UTF-8 to match the declared response encoding.
        os.write(jsonArray.toString().getBytes(StandardCharsets.UTF_8));
        os.flush();
        os.close();
    }
    //---------------------------------------------------------------------------------//
}
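Since the servlet builds its query by pasting kind into the SQL text, it is worth seeing the same top-10 query with the value bound as a parameter instead. The sketch stays in Python to match the crawler side and uses an in-memory SQLite table with an assumed (word, num, kind) layout; it is an illustration, not a drop-in replacement for the DBLink call.

```python
import sqlite3


def top_words(conn, kind, limit=10):
    """Top words of one category; kind and limit are bound, never concatenated."""
    return conn.execute(
        "SELECT word, SUM(num) AS total FROM words "
        "WHERE kind = ? GROUP BY word ORDER BY total DESC LIMIT ?",
        (kind, limit),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT, num INTEGER, kind TEXT)")
conn.executemany("INSERT INTO words VALUES (?, ?, ?)", [
    ("Python", 10, "开源类"),
    ("Python", 5, "开源类"),
    ("Linux", 8, "开源类"),
    ("5G", 30, "手机相关类"),
])
print(top_words(conn, "开源类"))
```

With binding, a value such as x' OR '1'='1 is treated as an ordinary (non-matching) category name rather than as part of the statement.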
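If you have created views for the 10 categories, you can add the following Servlet (otherwise, replace the view name with the Select statement that defines the view):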
package com.servlet;

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.sql.SQLException;
import java.util.List;

import javax.servlet.ServletException;
import javax.servlet.ServletOutputStream;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.json.JSONArray;
import org.json.JSONObject;

import com.dblink.basic.utils.SqlUtils;
import com.dblink.basic.utils.sqlKind.MySql_s;
import com.dblink.basic.utils.user.UserInfo;
import com.dblink.bean.BeanGroup;
import com.dblink.sql.DBLink;

@SuppressWarnings("unused")
public class ServletForKindKeyWords extends HttpServlet{
    private static final long serialVersionUID = 1L;
    //----------------------------------------------------------------------//
    public void doPost(HttpServletRequest request,HttpServletResponse response) throws ServletException, IOException
    {
        request.setCharacterEncoding("utf-8");
        response.setCharacterEncoding("utf-8");
        response.setContentType("application/json");
        response.setHeader("Cache-Control", "no-cache");
        response.setHeader("Pragma", "no-cache");

        // Warning: both values are pasted into the SQL text unchecked, so only expose this to trusted clients.
        String table = request.getParameter("table");
        String sql_rest = request.getParameter("sql");

        // Element 0 is a header; the js reads Length, MaxSize, Page and KindNess from it.
        JSONArray jsonArray = new JSONArray();
        JSONObject jsonObj = new JSONObject();
        jsonObj.put("Length",0);
        jsonObj.put("MaxSize",0);
        jsonObj.put("Page",0);
        jsonObj.put("KindNess",table);
        jsonArray.put(jsonObj);

        DBLink dbLink = new DBLink(new SqlUtils(new MySql_s("rc"),new UserInfo("root","123456")));
        try {
            BeanGroup bg = dbLink.getSelect("Select * From "+table+" "+sql_rest).beans;

            int leng = bg.size();

            int maxSize = dbLink.getSelect("Select * From "+table+" ").beans.size();

            // 30 rows per page, rounded up. The original tested maxSize%leng, which
            // miscounts pages and divides by zero when the query returns no rows.
            int page = (maxSize+29)/30;

            jsonObj.put("Length",leng);
            jsonObj.put("MaxSize",maxSize);
            jsonObj.put("Page",page);

            for(int i=0;i<leng;++i)
            {
                JSONObject jsonObject = new JSONObject();
                jsonObject.put("word",bg.get(i).get(0));
                jsonObject.put("num",bg.get(i).get(1));
                jsonObject.put("exp",bg.get(i).get(2));
                jsonArray.put(jsonObject);
            }
        } catch (SQLException e) {
            // The header with Length 0 is still sent; log the cause instead of hiding it.
            e.printStackTrace();
        } finally {
            dbLink.free();
        }

        ServletOutputStream os = response.getOutputStream();
        // Encode explicitly as UTF-8 to match the declared response encoding.
        os.write(jsonArray.toString().getBytes(StandardCharsets.UTF_8));
        os.flush();
        os.close();
    }
    //---------------------------------------------------------------------------------//
}
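The paging arithmetic behind this servlet and the js that calls it is easy to get wrong, so here it is on its own as a small sketch. The page size of 30 comes from the Limit clause sent by the js; the function names are made up for illustration.

```python
PAGE_SIZE = 30


def page_count(total_rows, page_size=PAGE_SIZE):
    """Number of pages needed for total_rows rows, rounding up."""
    return (total_rows + page_size - 1) // page_size


def limit_clause(page, page_size=PAGE_SIZE):
    """The Limit clause for a 1-based page number."""
    return f"Limit {(page - 1) * page_size},{page_size}"


print(page_count(31), limit_clause(3))
```

The page count depends only on the total row count and the page size, never on how many rows the current page happened to return.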
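Next, the js part:
Show the categories first, then load the data into each category's block:
Click "获取本类更多热词..." (get more hot words in this category) to jump to that category's page!
Like this:
The additional js code: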
// Category directory and per-category pages (presumably wordkind.js, going by the script tags in the jsp page).
// Relies on wordPage and toSomeWhere() from the other js file.

// Category names (also the kind values and view names on the server) and the ids of their blocks.
var KIND_IDS = {
    "互联网类": "hlw",
    "IT业界类": "ityj",
    "软件开发类": "rjkf",
    "开源类": "ky",
    "电脑硬件类": "dnyj",
    "游戏类": "yx",
    "创业类": "cy",
    "手机相关类": "sjxg",
    "科学类": "kx",
    "其他类": "qt"
};
// POST a form-encoded body and pass the parsed JSON reply to onSuccess.
function postForm_Kind(url, body, onSuccess)
{
    var xmlHttp = new XMLHttpRequest();
    xmlHttp.onreadystatechange = function() {
        if (xmlHttp.readyState == 4 && xmlHttp.status == 200)
        {
            // JSON.parse replaces eval: the reply is plain JSON and should not be run as code.
            onSuccess(JSON.parse(xmlHttp.responseText));
        }
    };
    xmlHttp.open("POST", url, true);
    xmlHttp.setRequestHeader("Content-Type","application/x-www-form-urlencoded");
    xmlHttp.send(body);
}
function makePageToKind()
{
    var Area = '';
    Area += '<div class="row">';
    Area += '    <div class="col-md-12">';
    Area += '        <h2>热词目录</h2>';
    Area += '    </div>';
    Area += '</div>';
    Area += '<hr />';
    Area += '<br>';
    Area += '<br>';
    Area += '<div id="MessageArea">';
    Area += '</div>';
    document.getElementById("page-inner").innerHTML = Area;
    madeAllKindP();
}
function madeAllKindP()
{
    var Area = '';
    Area += '<div>';
    Area += '    <ul>';
    for (var name in KIND_IDS)
    {
        Area += '        <li>';
        // The original left every <b> unclosed.
        Area += '            <b>'+name+'</b>';
        Area += '            <div id="'+KIND_IDS[name]+'"></div>';
        Area += '        </li>';
    }
    Area += '    </ul>';
    Area += '</div>';
    document.getElementById("MessageArea").innerHTML = Area;
    // One request per category fills its block.
    for (var kind in KIND_IDS)
    {
        makeNextStepOfGroupK(kind);
    }
}
function getKindWordsByKindName(word)
{
    return KIND_IDS[word] || "";
}
function makeNextStepOfGroupK(word_t)
{
    var url ="../com/servlet/ServletForMoreInfo";
    var server = "kind="+encodeURIComponent(word_t);
    postForm_Kind(url, server, function(InformationSet) {
        var Area = " ";
        var leng = InformationSet[0].Length;
        // Fall back to the requested kind if the reply carries no KindNess field.
        var kindness = InformationSet[0].KindNess || word_t;

        for(var i=1;i<=leng;++i)
        {
            var word_s = InformationSet[i].word;
            var num = InformationSet[i].num;
            Area += " ";
            Area += "<a href='#' title='在本类型中引用次数:"+num+"' onclick='toSomeWhere(\""+word_s+"\")'>"+word_s+"</a>";
            Area += " ";
        }
        Area += " ";
        Area += "<a href='#' onclick='makePageToOneKind(\""+kindness+"\")'>获取本类更多热词...</a>";
        Area += " ";

        var id_t = getKindWordsByKindName(kindness);
        document.getElementById(id_t).innerHTML = Area;
    });
}
function makePageToOneKind(kind)
{
    var Area = '';
    Area += '<div class="row">';
    Area += '    <div class="col-md-12">';
    Area += '        <h2>'+kind+'</h2>';
    Area += '    </div>';
    Area += '</div>';
    Area += '<hr />';
    Area += '<br>';
    Area += '<div style="background:rgb(0,153,255);margin-left:20px;margin-right:20px;height:25px;">';
    Area += '    <div style="margin-left:10px;margin-right:10px;margin-top:5px;margin-bottom:5px;">';
    Area += '        <b style="float:left;">热词表</b>';
    Area += '        <div style="float:right;">';
    Area += '            <select id="sty" onchange="simpleReset_Kind(\''+kind+'\')">';
    Area += '                <option value="0" selected>按照词频顺序</option>';
    Area += '                <option value="1">按照字母表顺序</option>';
    Area += '            </select>';
    Area += '            <select id="order" onchange="simpleReset_Kind(\''+kind+'\')">';
    Area += '                <option value="0" selected>降序</option>';
    Area += '                <option value="1">增序</option>';
    Area += '            </select>';
    Area += '        </div>';
    Area += '    </div>';
    Area += '</div>';
    Area += '<br>';
    Area += '<br>';
    Area += '<div id="MessageArea">';
    Area += '</div>';
    document.getElementById("page-inner").innerHTML = Area;
    simpleReset_Kind(kind);
}
function simpleReset_Kind(kind)
{
    wordPage = 1;
    resetAndFresh_Kind(kind);
}
function XReset_Kind(p,kind)
{
    wordPage = parseInt(p, 10);
    resetAndFresh_Kind(kind);
}
function makeSurePage_Kind(kind)
{
    wordPage = parseInt(document.getElementById("selPage").value, 10);
    resetAndFresh_Kind(kind);
}
// Rows from..to of the reply as one floated table.
function makeWordTable_Kind(InformationSet, from, to, marginLeft)
{
    var Area = "";
    Area += "<table class='WhatATable' style='margin-left:"+marginLeft+"px;float:left;'>";
    Area += "<tr>";
    Area += "<th style='width:100px;'>热词</th>";
    Area += "<th style='width:100px;'>词频</th>";
    Area += "<th style='width:100px;'>详细信息链接</th>";
    Area += "</tr>";
    for (var i=from;i<=to;++i)
    {
        Area += "<tr>";
        Area += "    <td>"+InformationSet[i].word+"</td>";
        Area += "    <td>"+InformationSet[i].num+"</td>";
        Area += "    <td><a href='#' onclick='toSomeWhere(\""+InformationSet[i].word+"\")'>详细信息</a></td>";
        Area += "</tr>";
    }
    Area += "</table>";
    return Area;
}
function resetAndFresh_Kind(kind)
{
    var sty = document.getElementById("sty").value;
    var order = document.getElementById("order").value;

    // The server appends this fragment to "Select * From <view>".
    var sql = "";
    // Sort by frequency
    if(sty==0)
    {
        sql += " order by num ";
    }
    // Sort alphabetically
    else if(sty==1)
    {
        sql += " order by word ";
    }
    // Descending order
    if(order==0)
    {
        sql += " DESC ";
    }
    sql += (" Limit "+((wordPage-1)*30)+",30 ");

    var url ="../com/servlet/ServletForKindKeyWords";
    var server = "sql="+encodeURIComponent(sql)+"&table="+encodeURIComponent(kind);

    postForm_Kind(url, server, function(InformationSet) {
        var Area = "";
        var leng = InformationSet[0].Length;
        var pageNum = InformationSet[0].Page;
        // The original redeclared kind here, which hid the argument; fall back to it instead.
        var kindName = InformationSet[0].KindNess || kind;

        // Up to 30 rows, shown as three tables of 10 side by side.
        Area += makeWordTable_Kind(InformationSet, 1, Math.min(leng,10), 200);
        if(leng>10)
        {
            Area += makeWordTable_Kind(InformationSet, 11, Math.min(leng,20), 10);
        }
        if(leng>20)
        {
            Area += makeWordTable_Kind(InformationSet, 21, leng, 10);
        }
        Area += "<div style='clear:both;'></div>";
        Area += "<br><br><br><br>";
        Area += "<p style='margin-left:30px;margin-right:30px;'>";
        Area += " <button onclick='simpleReset_Kind(\""+kindName+"\")'>起始页</button> ";

        // Show at most 4 page buttons on each side of the current page.
        var start = ((wordPage-4)>=1)?wordPage-4:1;
        var end = ((wordPage+4)<=pageNum)?(wordPage+4):pageNum;

        if(start!=1)
        {
            Area += " ... ";
        }
        for(var i=start;i<=end;++i)
        {
            Area += " <button onclick='XReset_Kind("+i+",\""+kindName+"\")'>"+i+"</button> ";
        }
        if(end!=pageNum)
        {
            Area += " ... ";
        }

        Area += " <button onclick='XReset_Kind("+pageNum+",\""+kindName+"\")'>结束页</button> ";
        Area += " <b>选择页数跳转</b> ";
        Area += "<select id='selPage' onchange='makeSurePage_Kind(\""+kindName+"\")'>";
        for(var j=1;j<=pageNum;++j)
        {
            Area += "<option value='"+j+"'>"+j+"</option>";
        }
        Area += "</select>";
        Area += "</p>";
        document.getElementById("MessageArea").innerHTML = Area;
        surePage_Kind();
    });
}
function surePage_Kind()
{
    document.getElementById("selPage").selectedIndex = wordPage-1;
}
// Paged list of all hot words and the detail page (presumably word.js, going by the script tags in the jsp page).
var wordPage = 1;
// POST a form-encoded body and pass the parsed JSON reply to onSuccess.
function postForm(url, body, onSuccess)
{
    var xmlHttp = new XMLHttpRequest();
    xmlHttp.onreadystatechange = function() {
        if (xmlHttp.readyState == 4 && xmlHttp.status == 200)
        {
            // JSON.parse replaces eval: the reply is plain JSON and should not be run as code.
            onSuccess(JSON.parse(xmlHttp.responseText));
        }
    };
    xmlHttp.open("POST", url, true);
    xmlHttp.setRequestHeader("Content-Type","application/x-www-form-urlencoded");
    xmlHttp.send(body);
}
function makePageToWord()
{
    var Area = '';
    Area += '<div class="row">';
    Area += '<div class="col-md-12">';
    Area += '<h2>全部热词</h2>';
    Area += '</div>';
    Area += '</div>';
    Area += '<hr />';
    Area += '<br>';
    Area += '<div style="background:rgb(0,153,255);margin-left:20px;margin-right:20px;height:25px;">';
    Area += '    <div style="margin-left:10px;margin-right:10px;margin-top:5px;margin-bottom:5px;">';
    Area += '        <b style="float:left;">热词表</b>';
    Area += '        <div style="float:right;">';
    Area += '            <select id="sty" onchange="simpleReset()">';
    Area += '                <option value="0" selected>按照词频顺序</option>';
    Area += '                <option value="1">按照字母表顺序</option>';
    Area += '            </select>';
    Area += '            <select id="order" onchange="simpleReset()">';
    Area += '                <option value="0" selected>降序</option>';
    Area += '                <option value="1">增序</option>';
    Area += '            </select>';
    Area += '        </div>';
    Area += '    </div>';
    Area += '</div>';
    Area += '<br>';
    Area += '<br>';
    Area += '<div id="MessageArea">';
    Area += '</div>';
    document.getElementById("page-inner").innerHTML = Area;
    simpleReset();
}
function simpleReset()
{
    wordPage = 1;
    resetAndFresh();
}
function XReset(p)
{
    wordPage = parseInt(p, 10);
    resetAndFresh();
}
// Rows from..to of the reply as one floated table.
function makeWordTable(InformationSet, from, to, marginLeft)
{
    var Area = "";
    Area += "<table class='WhatATable' style='margin-left:"+marginLeft+"px;float:left;'>";
    Area += "<tr>";
    Area += "<th style='width:100px;'>热词</th>";
    Area += "<th style='width:100px;'>词频</th>";
    Area += "<th style='width:100px;'>详细信息链接</th>";
    Area += "</tr>";
    for (var i=from;i<=to;++i)
    {
        Area += "<tr>";
        Area += "    <td>"+InformationSet[i].word+"</td>";
        Area += "    <td>"+InformationSet[i].num+"</td>";
        Area += "    <td><a href='#' onclick='toSomeWhere(\""+InformationSet[i].word+"\")'>详细信息</a></td>";
        Area += "</tr>";
    }
    Area += "</table>";
    return Area;
}
function resetAndFresh()
{
    var sty = document.getElementById("sty").value;
    var order = document.getElementById("order").value;

    // The server appends this fragment to its Select statement.
    var sql = "";
    // Sort by frequency
    if(sty==0)
    {
        sql += " order by num ";
    }
    // Sort alphabetically
    else if(sty==1)
    {
        sql += " order by word ";
    }
    // Descending order
    if(order==0)
    {
        sql += " DESC ";
    }
    sql += (" Limit "+((wordPage-1)*30)+",30 ");

    var url ="../com/servlet/ServletForAllKeyWords";
    var server = "sql="+encodeURIComponent(sql);

    postForm(url, server, function(InformationSet) {
        var Area = "";
        var leng = InformationSet[0].Length;
        var pageNum = InformationSet[0].Page;

        // Up to 30 rows, shown as three tables of 10 side by side.
        Area += makeWordTable(InformationSet, 1, Math.min(leng,10), 200);
        if(leng>10)
        {
            Area += makeWordTable(InformationSet, 11, Math.min(leng,20), 10);
        }
        if(leng>20)
        {
            Area += makeWordTable(InformationSet, 21, leng, 10);
        }
        Area += "<div style='clear:both;'></div>";
        Area += "<br><br><br><br>";
        Area += "<p style='margin-left:30px;margin-right:30px;'>";
        Area += " <button onclick='simpleReset()'>起始页</button> ";

        // Show at most 4 page buttons on each side of the current page.
        var start = ((wordPage-4)>=1)?wordPage-4:1;
        var end = ((wordPage+4)<=pageNum)?(wordPage+4):pageNum;

        if(start!=1)
        {
            Area += " ... ";
        }
        for(var i=start;i<=end;++i)
        {
            Area += " <button onclick='XReset("+i+")'>"+i+"</button> ";
        }
        if(end!=pageNum)
        {
            Area += " ... ";
        }

        Area += " <button onclick='XReset("+pageNum+")'>结束页</button> ";
        Area += " <b>选择页数跳转</b> ";
        Area += "<select id='selPage' onchange='makeSurePage()'>";
        for(var j=1;j<=pageNum;++j)
        {
            Area += "<option value='"+j+"'>"+j+"</option>";
        }
        Area += "</select>";
        Area += "</p>";
        document.getElementById("MessageArea").innerHTML = Area;
        surePage();
    });
}
function toSomeWhere(word)
{
    var Area = '';
    Area += '<div class="row">';
    Area += '    <div class="col-md-12">';
    Area += '        <h2>'+word+'</h2>';
    Area += '    </div>';
    Area += '</div>';
    Area += '<hr />';
    Area += '<br>';
    Area += '<div id="MessageArea">';
    Area += '</div>';
    document.getElementById("page-inner").innerHTML = Area;

    var url ="../com/servlet/ServletForAllKeyWords";
    var server = "sql="+encodeURIComponent(" where word='"+word+"'");

    postForm(url, server, function(InformationSet) {
        var Area = "";
        // Element 0 is the header, so the single matching row is element 1.
        var word = InformationSet[1].word;
        var num = InformationSet[1].num;
        var exp = InformationSet[1].exp;

        Area += "<p><b id='word' style='font-size:120%;'>"+word+"</b></p>";
        Area += "<p style='color:rgb(200,200,200);'> 引用次数:"+num+"</p>";
        Area += "<p style='font-family:\"楷体\";font-size:90%;'> ";
        if(exp=="")
        {
            Area += "目前百度百科上并没有相关解释信息...";
        }
        else
        {
            Area += exp;
        }
        Area += "</p>";
        Area += "<br>";
        Area += "<div id='finalDIV'></div>";
        document.getElementById("MessageArea").innerHTML = Area;

        getLinksForKey(word);
    });
}
function getLinksForKey(word)
{
    var url ="../com/servlet/ServletForLinkData";
    var server = "word="+encodeURIComponent(word);

    postForm(url, server, function(InformationSet) {
        var Area = "";
        Area += "<br>";
        Area += "<br>";
        Area += "<b style='font-size:120%;'>引用网页:</b>";
        Area += "<br>";
        Area += "<br>";
        Area += "<ul>";
        var leng = InformationSet[0].Length;

        for(var i=1;i<=leng;++i)
        {
            var num = InformationSet[i].num;
            var title = InformationSet[i].title;
            var link = InformationSet[i].link;
            Area += "<li>";
            Area += "<a href='"+link+"' title='引用次数:"+num+"'>"+title+"</a>";
            Area += "</li>";
        }
        Area += "</ul>";

        document.getElementById("finalDIV").innerHTML = Area;
    });
}
function surePage()
{
    document.getElementById("selPage").selectedIndex = wordPage-1;
}
function makeSurePage()
{
    wordPage = parseInt(document.getElementById("selPage").value, 10);
    resetAndFresh();
}
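Both js files compute the same window of page buttons around the current page. Pulled out as a small Python sketch (the function name and return shape are made up for illustration), the rule is: show up to 4 pages on each side, and add a "..." marker on a side where pages were cut off.

```python
def pager_window(current, total, radius=4):
    """Return (page numbers to show, dots before?, dots after?)."""
    start = max(current - radius, 1)
    end = min(current + radius, total)
    pages = list(range(start, end + 1))
    return pages, start != 1, end != total


print(pager_window(10, 20))
```

For page 10 of 20 this yields pages 6 to 14 with a "..." on both sides; on the first or last page the window simply shrinks on that side.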
Update the references in web.xml:
<?xml version="1.0" encoding="UTF-8"?>
<web-app xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://xmlns.jcp.org/xml/ns/javaee" xsi:schemaLocation="http://xmlns.jcp.org/xml/ns/javaee http://xmlns.jcp.org/xml/ns/javaee/web-app_4_0.xsd" id="WebApp_ID" version="4.0">
  <display-name>HotWord</display-name>
  <servlet>
    <description>This is the description of my J2EE component</description>
    <display-name>This is the display name of my J2EE component</display-name>
    <servlet-name>ServletForWords</servlet-name>
    <servlet-class>com.servlet.ServletForWords</servlet-class>
  </servlet>
  <servlet-mapping>
    <servlet-name>ServletForWords</servlet-name>
    <url-pattern>/com/servlet/ServletForWords</url-pattern>
  </servlet-mapping>
  <servlet>
    <description>This is the description of my J2EE component</description>
    <display-name>This is the display name of my J2EE component</display-name>
    <servlet-name>ServletForAllKeyWords</servlet-name>
    <servlet-class>com.servlet.ServletForAllKeyWords</servlet-class>
  </servlet>
  <servlet-mapping>
    <servlet-name>ServletForAllKeyWords</servlet-name>
    <url-pattern>/com/servlet/ServletForAllKeyWords</url-pattern>
  </servlet-mapping>
  <servlet>
    <description>This is the description of my J2EE component</description>
    <display-name>This is the display name of my J2EE component</display-name>
    <servlet-name>ServletForLinkData</servlet-name>
    <servlet-class>com.servlet.ServletForLinkData</servlet-class>
  </servlet>
  <servlet-mapping>
    <servlet-name>ServletForLinkData</servlet-name>
    <url-pattern>/com/servlet/ServletForLinkData</url-pattern>
  </servlet-mapping>
  <servlet>
    <description>This is the description of my J2EE component</description>
    <display-name>This is the display name of my J2EE component</display-name>
    <servlet-name>ServletForMoreInfo</servlet-name>
    <servlet-class>com.servlet.ServletForMoreInfo</servlet-class>
  </servlet>
  <servlet-mapping>
    <servlet-name>ServletForMoreInfo</servlet-name>
    <url-pattern>/com/servlet/ServletForMoreInfo</url-pattern>
  </servlet-mapping>
  <servlet>
    <description>This is the description of my J2EE component</description>
    <display-name>This is the display name of my J2EE component</display-name>
    <servlet-name>ServletForKindKeyWords</servlet-name>
    <servlet-class>com.servlet.ServletForKindKeyWords</servlet-class>
  </servlet>
  <servlet-mapping>
    <servlet-name>ServletForKindKeyWords</servlet-name>
    <url-pattern>/com/servlet/ServletForKindKeyWords</url-pattern>
  </servlet-mapping>
  <welcome-file-list>
    <welcome-file>index.html</welcome-file>
    <welcome-file>index.htm</welcome-file>
    <welcome-file>index.jsp</welcome-file>
    <welcome-file>default.html</welcome-file>
    <welcome-file>default.htm</welcome-file>
    <welcome-file>default.jsp</welcome-file>
  </welcome-file-list>
</web-app>
Update the jsp page code:
<%@ page language="java" contentType="text/html; charset=utf-8"
    pageEncoding="utf-8"%>
<!DOCTYPE html>
<html><!-- xmlns="http://www.w3.org/1999/xhtml" -->
<head>
    <!--<meta charset="utf-8" />-->
    <meta name="viewport" content="width=device-width, initial-scale=1.0" charset="utf-8"/>
    <title>热词分析</title>
    <!-- BOOTSTRAP STYLES-->
    <link href="../assets/css/bootstrap.css" rel="stylesheet" />
    <!-- FONTAWESOME STYLES-->
    <link href="../assets/css/font-awesome.css" rel="stylesheet" />
    <!-- CUSTOM STYLES-->
    <link href="../assets/css/custom.css" rel="stylesheet" />
    <!-- PERSONAL FONTS-->
    <link href='../cssFiles/basic.css' rel='stylesheet' type='text/css' />
    <!-- GOOGLE FONTS-->
    <link href='http://fonts.googleapis.com/css?family=Open+Sans' rel='stylesheet' type='text/css' />
</head>
<script src="../jsFiles/jquery/jquery-3.4.1.min.js" charset="utf-8"></script>
<script src="../jsFiles/echarts/echarts.min.js" charset="utf-8"></script>
<script src="../jsFiles/echarts/echarts-wordcloud-master/dist/echarts-wordcloud.min.js" charset="utf-8"></script>
<!-- <script src="../jsFiles/echarts/echarts-wordcloud-master/dist/echarts-wordcloud.min.js" charset="utf-8"></script> -->
<script src="../jsFiles/basic.js" charset="utf-8"></script>
<script src='../jsFiles/echarts/echarts.simple.js'></script>
<script src="../jsFiles/word.js" charset="utf-8"></script>
<script src="../jsFiles/wordkind.js" charset="utf-8"></script>
<script src="../jsFiles/cloud.js" charset="utf-8"></script>
<body>
    <div id="wrapper">
        <div class="navbar navbar-inverse navbar-fixed-top">
            <div class="adjust-nav">
                <div class="navbar-header">
                    <button type="button" class="navbar-toggle" data-toggle="collapse" data-target=".sidebar-collapse">
                        <span class="icon-bar"></span>
                        <span class="icon-bar"></span>
                        <span class="icon-bar"></span>
                    </button>
                    <a class="navbar-brand"><i class="fa fa-square-o "></i> 欢迎您使用本热词分析系统</a>
                </div>
            </div>
        </div>
        <!-- /. NAV TOP -->
        <div class="navbar-default navbar-side"> <!-- nav role="navigation" -->
            <div class="sidebar-collapse">
                <ul class="nav" id="main-menu">
                    <li class="text-center user-image-back">
                        <img src="../assets/img/find_user.png" class="img-responsive" />
                    </li>
                    <li>
                        <a href="#" onclick="makePageToMain()"><i class="fa fa-table "></i>主页</a>
                    </li>
                    <li>
                        <a href="#" onclick="makePageToWord()"><i class="fa fa-key "></i>全部热词</a>
                    </li>
                    <li>
                        <a href="#" onclick="makePageToKind()"><i class="fa fa-key "></i>热词目录</a>
                    </li>
                    <li>
                        <a href="#"><i class="fa fa-edit "></i>热词需求<span class="fa arrow"></span></a>
                        <ul class="nav nav-second-level">
                            <li>
                                <a href="#" onclick="makePageToCl()">热词云图</a>
                            </li>
                            <li>
                                <a href="#" onclick="makePageToRe()">热词关系图</a>
                            </li>
                        </ul>
                    </li>
                </ul>
            </div>
        </div>
        <!-- /. NAV SIDE -->
        <div id="page-wrapper" >
            <div id="page-inner">
                <div class="row">
                    <div class="col-md-12">
                        <h2>主页</h2>
                    </div>
                </div>
                <!-- /. ROW -->
                <hr />
                <!-- /. ROW -->
                <br>
                <br>
                <div id="MessageArea">
                    <br>
                    <h3>欢迎您使用本热词分析系统</h3>
                </div>
            </div>
            <!-- /. PAGE INNER -->
        </div>
        <!-- /. PAGE WRAPPER -->
    </div>
    <!-- /. WRAPPER -->
    <!-- SCRIPTS -AT THE BOTOM TO REDUCE THE LOAD TIME-->
    <!-- JQUERY SCRIPTS -->
    <script src="../assets/js/jquery-1.10.2.js"></script>
    <!-- BOOTSTRAP SCRIPTS -->
    <script src="../assets/js/bootstrap.min.js"></script>
    <!-- METISMENU SCRIPTS -->
    <script src="../assets/js/jquery.metisMenu.js"></script>
    <!-- CUSTOM SCRIPTS -->
    <script src="../assets/js/custom.js"></script>
</body>
</html>
As for the remaining parts, I have thought it over and will write them up separately!