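Using a thread pool + CountDownLatch to coordinate multiple threads and aggregate their results (suitable for data analysis, database operations, and web crawlers)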

Example: multithreaded data deduplication

public void obtainSimilarityRate() {
        List<FgTestR3> zjFg = list((new QueryWrapper<FgTestR3>())
                .select("autoid","titles","contents")
                .eq("webGuid", "1")
        );

        List<FgTestR3> otherFg = list((new QueryWrapper<FgTestR3>())
                .select("autoid","titles","contents")
                .ne("webGuid", "1")
//                .eq("webGuid", "4")
        );
        List<Long> deleteIds = new ArrayList<>();
        int count = 1;

        final CountDownLatch latch = new CountDownLatch(zjFg.size());
        ThreadPoolManager threadPoolManager = ThreadPoolManager.newInstance();
        final Vector<Long> de = new Vector<>();

        for (FgTestR3 fgTestR3 : zjFg) {
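            log.info("=========== Processing head-office record #" + count++);
            List<FgTestR3> oneFgs = new ArrayList<>();
            String titles = fgTestR3.getTitles();
            String contents = fgTestR3.getContents();
            if (StringUtils.isEmpty(titles) || StringUtils.isEmpty(contents)) {
                // This record was counted when the latch was created but submits no task,
                // so count it down here; otherwise latch.await() never returns.
                latch.countDown();
                continue;
            }
            // Match titles against the non-head-office records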
            for (FgTestR3 testR3 : otherFg) {
                String titles1 = testR3.getTitles();
                if (StringUtils.isEmpty(titles1)) {
                    continue;
                }
                if (StringUtil.getSimilarityRatio(titles, titles1) > 0.5) {
                    oneFgs.add(testR3);
                }
            }
            // Single-threaded content comparison for records whose title similarity exceeds 50%
            // (kept for reference; the ThreadHandle task submitted below does this work on the pool)
//            for (FgTestR3 autoIdFg : oneFgs) {
//                String contents1 = autoIdFg.getContents();
//                if (StringUtils.isEmpty(contents1)) {
//                    continue;
//                }
//                String text = Jsoup.parse(contents).text();
//                String text1 = Jsoup.parse(contents1).text();
//                // Remove the record when content similarity exceeds 80%
//                if (StringUtil.getSimilarityRatio(text, text1) > 0.8) {
//                    deleteIds.add(autoIdFg.getAutoid());
//                }
//            }

            // Multithreaded version: compare contents on the thread pool
            threadPoolManager.addExecuteTask(() -> ThreadHandle(latch, de, oneFgs, contents));
            // End of multithreaded version
        }

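        try {
            // Block until every head-office record has been counted down
            latch.await();
        } catch (InterruptedException e) {
            // Restore the interrupt flag so callers can see the interruption
            Thread.currentThread().interrupt();
            log.error("Interrupted while waiting for the comparison tasks", e);
            // Tasks may still be writing results, so skip the deletion step
            return;
        }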

        // Several head-office records can flag the same ID, so deduplicate before deleting
        deleteIds = de.stream().distinct().collect(Collectors.toList());
        log.info("Regulations flagged as duplicates by content analysis: " + deleteIds.size());
        batchRemoveByAutoId(deleteIds);
    }

    public void ThreadHandle(CountDownLatch latch, Vector<Long> de, List<FgTestR3> oneFgs, String contents) {
        try {
            log.info("Processing, latch count: " + latch.getCount());
            // Strip the HTML once; this text is the same for every candidate
            String text = Jsoup.parse(contents).text();
            for (FgTestR3 autoIdFg : oneFgs) {
                String contents1 = autoIdFg.getContents();
                if (StringUtils.isEmpty(contents1)) {
                    continue;
                }
                String text1 = Jsoup.parse(contents1).text();
                // Flag the record for removal when content similarity exceeds 90%
                if (StringUtil.getSimilarityRatio(text, text1) > 0.9) {
                    de.add(autoIdFg.getAutoid());
                }
            }
        } catch (Exception e) {
            log.error("Content comparison failed", e);
        } finally {
            // Always count down, even on failure, so latch.await() can return
            latch.countDown();
        }
    }

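Note: make sure the latch count reaches zero on every code path. If it does not, latch.await() never returns and the program hangs indefinitely.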
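
The note above can be illustrated with a minimal, self-contained sketch. It is not the code from this post: `Executors.newFixedThreadPool` stands in for `ThreadPoolManager` (whose API is not shown here), the class name `LatchSketch` and the method `process` are hypothetical, and the 30-second timeout is an arbitrary choice. It shows three habits that keep the count honest: count down for items that are skipped, count down in `finally` inside each task, and wait with a timeout so that a mistake shows up as an error rather than a hang.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

class LatchSketch {

    // Computes the length of every non-empty input on a pool and
    // returns the distinct lengths in ascending order.
    static List<Integer> process(List<String> inputs) throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(inputs.size());
        List<Integer> results = Collections.synchronizedList(new ArrayList<>());
        ExecutorService pool = Executors.newFixedThreadPool(4); // stand-in for ThreadPoolManager
        try {
            for (String input : inputs) {
                if (input == null || input.isEmpty()) {
                    // Skipped items were included in the initial count,
                    // so they must be counted down here.
                    latch.countDown();
                    continue;
                }
                pool.execute(() -> {
                    try {
                        results.add(input.length());
                    } finally {
                        // Runs even if the task throws.
                        latch.countDown();
                    }
                });
            }
            // Bounded wait: a miscounted latch fails loudly instead of blocking forever.
            if (!latch.await(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException(
                        "Timed out with " + latch.getCount() + " tasks outstanding");
            }
        } finally {
            pool.shutdown();
        }
        return results.stream().distinct().sorted().collect(Collectors.toList());
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(process(Arrays.asList("a", "", "bbb", null, "cc", "dd")));
    }
}
```

An alternative that avoids the skip-path problem altogether is to collect the tasks first and size the latch by the number of tasks actually submitted.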

Reference: the thread pool utility and the string similarity utility used above:

String handling: https://www.cnblogs.com/guanxiaohe/p/13305598.html

Thread pool: https://www.cnblogs.com/guanxiaohe/p/12634306.html
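
The linked string utility is not reproduced here, so the exact behaviour of `StringUtil.getSimilarityRatio` is not confirmed by this post. One common definition, assumed in the sketch below, is 1 minus the Levenshtein edit distance divided by the length of the longer string. The class `SimilaritySketch` and its methods are hypothetical names used only for illustration.

```java
class SimilaritySketch {

    // Levenshtein edit distance, computed with two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1), // insertion, deletion
                        prev[j - 1] + cost);                    // substitution or match
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed definition: 1.0 for identical strings, 0.0 for completely different ones.
    static double similarityRatio(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 1.0; // two empty strings are treated as identical
        }
        return 1.0 - (double) editDistance(a, b) / maxLen;
    }

    public static void main(String[] args) {
        System.out.println(similarityRatio("kitten", "sitting"));
    }
}
```

Because the edit distance costs time proportional to the product of the two lengths, comparing full article bodies is the expensive step, which is why the example above filters by title first and runs the content comparison on the pool.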

posted @ 2020-07-15 15:07  官萧何