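Using a thread pool + CountDownLatch to coordinate multithreaded work and aggregate the results (suitable for data computation and analysis, database operations, and web crawling)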
Example: multithreaded data deduplication:
// Imports this code relies on. QueryWrapper is assumed to be MyBatis-Plus's;
// the source does not show where StringUtils, StringUtil and ThreadPoolManager come from.
import java.util.ArrayList;
import java.util.List;
import java.util.Vector;
import java.util.concurrent.CountDownLatch;
import java.util.stream.Collectors;
import org.jsoup.Jsoup;
import com.baomidou.mybatisplus.core.conditions.query.QueryWrapper;

public void obtainSimilarityRate() {
    // Head-office (总局) records: webGuid = 1
    List<FgTestR3> zjFg = list(new QueryWrapper<FgTestR3>()
            .select("autoid", "titles", "contents")
            .eq("webGuid", "1"));
    // All other records
    List<FgTestR3> otherFg = list(new QueryWrapper<FgTestR3>()
            .select("autoid", "titles", "contents")
            .ne("webGuid", "1"));

    // One count per head-office record, whether or not a task is submitted for it
    final CountDownLatch latch = new CountDownLatch(zjFg.size());
    ThreadPoolManager threadPoolManager = ThreadPoolManager.newInstance();
    // Vector is synchronized, so worker threads can add to it concurrently
    final Vector<Long> de = new Vector<>();
    int count = 1;

    for (FgTestR3 fgTestR3 : zjFg) {
        log.info("=========== Processing head-office record #" + count++);
        String titles = fgTestR3.getTitles();
        String contents = fgTestR3.getContents();
        if (StringUtils.isEmpty(titles) || StringUtils.isEmpty(contents)) {
            // A skipped record must still count down, or latch.await() below never returns
            latch.countDown();
            continue;
        }

        // Match the title against every non-head-office record
        List<FgTestR3> oneFgs = new ArrayList<>();
        for (FgTestR3 testR3 : otherFg) {
            String titles1 = testR3.getTitles();
            if (StringUtils.isEmpty(titles1)) {
                continue;
            }
            if (StringUtil.getSimilarityRatio(titles, titles1) > 0.5) {
                oneFgs.add(testR3);
            }
        }

        // Compare contents for records whose title similarity exceeds 50%.
        // The earlier single-threaded version did this inline with a 0.8 threshold;
        // it now runs in ThreadHandle on a pool thread.
        threadPoolManager.addExecuteTask(() -> ThreadHandle(latch, de, oneFgs, contents));
    }

    try {
        // Block until every record has been processed or skipped
        latch.await();
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt status
        e.printStackTrace();
    }

    // The same record can match several head-office records, so drop duplicate ids
    List<Long> deleteIds = de.stream().distinct().collect(Collectors.toList());
    log.info("Regulations removed by content-similarity dedup: " + deleteIds.size());
    batchRemoveByAutoId(deleteIds);
}

public void ThreadHandle(CountDownLatch latch, Vector<Long> de, List<FgTestR3> oneFgs, String contents) {
    try {
        log.info("Processing, latch count: " + latch.getCount());
        // Strip the HTML once instead of on every iteration
        String text = Jsoup.parse(contents).text();
        for (FgTestR3 autoIdFg : oneFgs) {
            String contents1 = autoIdFg.getContents();
            if (StringUtils.isEmpty(contents1)) {
                continue;
            }
            String text1 = Jsoup.parse(contents1).text();
            // Mark for removal when content similarity exceeds 90%
            if (StringUtil.getSimilarityRatio(text, text1) > 0.9) {
                de.add(autoIdFg.getAutoid());
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        // Always count down, even on failure, so the waiting thread is released
        latch.countDown();
    }
}
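Note: always make sure the latch counter reaches zero on every code path, including early exits and exceptions. Otherwise latch.await() never returns and the program hangs forever.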
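The note above can be illustrated with a small standalone sketch. It is not taken from the original project: the class name LatchDemo, the method squareEvens, the pool size of 4, and the 10-second timeout are all assumptions chosen for illustration. It shows two habits that keep the latch from blocking forever: calling countDown() in a finally block, and waiting with a timeout instead of an unbounded await().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class LatchDemo {

    /** Squares each even input on a worker thread; odd inputs fail on purpose. */
    static List<Integer> squareEvens(List<Integer> inputs) {
        ExecutorService pool = Executors.newFixedThreadPool(4); // pool size is an arbitrary choice
        CountDownLatch latch = new CountDownLatch(inputs.size());
        List<Integer> results = Collections.synchronizedList(new ArrayList<>());
        try {
            for (Integer n : inputs) {
                pool.execute(() -> {
                    try {
                        if (n % 2 != 0) {
                            throw new IllegalArgumentException("odd input: " + n);
                        }
                        results.add(n * n);
                    } catch (RuntimeException e) {
                        // A failed task is only reported; it must not block the others
                        System.err.println("task failed: " + e.getMessage());
                    } finally {
                        // Runs on success and on failure, so the count always reaches zero
                        latch.countDown();
                    }
                });
            }
            // A bounded wait turns a forgotten countDown() into an error instead of a hang
            if (!latch.await(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("timed out, " + latch.getCount() + " tasks pending");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            pool.shutdown();
        }
        // Completion order is nondeterministic, so sort a copy before returning
        List<Integer> sorted = new ArrayList<>(results);
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(squareEvens(List.of(1, 2, 3, 4, 5, 6))); // prints [4, 16, 36]
    }
}
```

The three odd inputs throw inside their tasks, yet the latch still reaches zero because each task counts down in finally.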
Note: the thread pool utility and the string similarity utility used above:
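The original utilities are not reproduced in this post, so the following is only a hypothetical sketch of what they might look like. PoolManagerSketch mirrors the call shape of ThreadPoolManager (newInstance() and addExecuteTask(Runnable)) using a shared fixed-size pool. similarityRatio assumes a Levenshtein-based measure, 1 - distance / max(length), which may differ from what StringUtil.getSimilarityRatio actually computes. All class and method names here are illustrative.

```java
import java.util.Locale;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class UtilitySketches {

    /** Hypothetical stand-in for ThreadPoolManager: one shared fixed-size pool. */
    static final class PoolManagerSketch {
        private static final PoolManagerSketch INSTANCE = new PoolManagerSketch();
        private final ExecutorService executor =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

        private PoolManagerSketch() { }

        /** Returns the shared instance (the name mirrors the call used in the post). */
        static PoolManagerSketch newInstance() { return INSTANCE; }

        void addExecuteTask(Runnable task) { executor.execute(task); }

        void shutdown() { executor.shutdown(); }
    }

    /** Assumed similarity measure: 1 - editDistance / longerLength, in [0, 1]. */
    static double similarityRatio(String a, String b) {
        int max = Math.max(a.length(), b.length());
        if (max == 0) {
            return 1.0; // two empty strings are treated as identical
        }
        return 1.0 - (double) levenshtein(a, b) / max;
    }

    /** Classic two-row dynamic-programming Levenshtein distance. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) throws InterruptedException {
        PoolManagerSketch pool = PoolManagerSketch.newInstance();
        CountDownLatch latch = new CountDownLatch(1);
        pool.addExecuteTask(() -> {
            try {
                // "kitten" -> "sitting" needs 3 edits, so the ratio is 1 - 3/7
                System.out.printf(Locale.ROOT, "%.2f%n", similarityRatio("kitten", "sitting")); // prints 0.57
            } finally {
                latch.countDown();
            }
        });
        latch.await();
        pool.shutdown(); // the pool threads are non-daemon, so shut down to let the JVM exit
    }
}
```

Note that edit distance is quadratic in the text lengths, so comparing long documents pairwise this way is expensive, which is one reason to spread the comparisons across a thread pool.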
⎛⎝官萧何⎠⎞ A happy Java programmer; email: 1570608034@qq.com