A data problem caused by two jobs running at the same time

job1: moves rows older than 24 hours to another table

    @Override
    @Transactional
    public void moveItemsTo48h() {
        List<CrawlItem> historyItems = super.baseMapper.getItemsAfter24Hours();
        historyItems.forEach(a -> {
            Long id = a.getId();
            CrawlItem48h item = BeanUtil.toBean(a, CrawlItem48h.class);

            QueryWrapper<CrawlItem48h> wrapper = new QueryWrapper<>();
            wrapper.eq("mall", a.getMall());
            wrapper.eq("goods_source_sn", a.getGoodsSourceSn());
            CrawlItem48h exist = crawlItem48hMapper.selectOne(wrapper);
            if (exist == null) {
                crawlItem48hMapper.insert(item);
            } else {
                BeanUtil.copyProperties(item, exist, "id");
                crawlItem48hMapper.updateById(exist);
            }
            removeById(id);
        });
    }
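The `selectOne` / `insert` / `updateById` branch above is a check-then-act sequence: between the existence check and the write, another transaction can slip in. Given a unique key on (mall, goods_source_sn), MySQL's `INSERT ... ON DUPLICATE KEY UPDATE` would collapse it into one atomic statement. As an in-memory analogy only (the class and key format below are hypothetical, not from the original code), `ConcurrentHashMap.merge` shows the same insert-or-update done in a single atomic step:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class UpsertSketch {
    static final Map<String, Integer> TABLE = new ConcurrentHashMap<>();

    // Atomic upsert: one call either inserts the entry or updates it in
    // place, with no window between the existence check and the write.
    static int upsert(String key, int delta) {
        return TABLE.merge(key, delta, Integer::sum);
    }

    public static void main(String[] args) {
        System.out.println(upsert("mall1:sn42", 1)); // first call inserts: 1
        System.out.println(upsert("mall1:sn42", 1)); // second call updates: 2
    }
}
```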


@Select("select id,goods_source_sn,goods_info_url,source,url_code," +
"thumb_url,zhi_count,buzhi_count,star_count,comments_count,mall,title,emphsis,detail,detail_brief," +
"label,category_text,item_create_time,item_update_time,main_image_url,big_image_urls,small_image_urls," +
"price_text,price,unit_price,actual_buy_link,transfer_link,transfer_result,transfer_remark,coupon_info,taobao_pwd," +
"score,score_minute,keywords,status,remark,creator," +
"creator_id,last_operator,last_operator_id from crawl_items where TIMESTAMPDIFF(HOUR,item_create_time,now()) > 24 for update")
List<CrawlItem> getItemsAfter24Hours();
 

job2: re-crawls the rows shown on the first two pages

    @Override
    @Transactional
    public void reCrawl() {
        List<String> urllist = crawlItemMapper.getFirstRecrawlItems();
        // String.join replaces the original StringBuffer-plus-substring
        // dance, which also threw StringIndexOutOfBoundsException on an
        // empty list; the original also dropped the ");" after put("url"...
        String urls = String.join(",", urllist);
        Map<String, Object> paramMap = new HashMap<>();
        paramMap.put("project", "smzdmCrawler");
        paramMap.put("spider", "smzdm_single");
        paramMap.put("url", urls);
        String response = HttpUtil.createRequest(Method.POST, "http://42.192.51.99:6801/schedule.json")
                .basicAuth("david_scrapyd", "david_2021")
                .form(paramMap)
                .execute()
                .body();
        log.info(response);
    }

    @Select("select goods_info_url from crawl_items where status=1 order by score_minute desc, id desc limit 0,40 for update")
    List<String> getFirstRecrawlItems();

 

Both queries end in `for update`, and both service methods are annotated with `@Transactional`. In theory, whichever statement locks the rows first should block the other until its transaction commits: if job1's SQL locks first, job2's `getFirstRecrawlItems()` waits until `moveItemsTo48h()` finishes and releases the locks, at which point the rows job1 deleted are no longer visible to job2, so they would never be re-crawled and re-inserted. In production, however, the outcome looks as if job2's SQL took the locks first, so the rows job1 deleted got re-crawled and inserted back. Unfortunately SQL debug logging wasn't enabled online and MySQL's general_log was off, so for now I can only assume that job2 acquired the locks first.
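The suspected interleaving can be reproduced in memory. Note also that the actual re-insert is not done by job2's transaction at all: job2 only snapshots URLs and POSTs them to scrapyd, and the spider writes the rows back later, outside both transactions, so once job2 has read the URLs no lock ordering can keep the deleted rows out. A minimal sketch of that sequence (all names hypothetical, a monitor standing in for the `for update` row locks):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class JobRaceSketch {
    static final Set<String> crawlItems = new HashSet<>();
    static final Object rowLock = new Object(); // stands in for SELECT ... FOR UPDATE

    static boolean runRace() {
        crawlItems.add("url-1"); // a row already older than 24 hours

        List<String> snapshot;
        // job2 acquires the lock first (the order suspected in the post),
        // snapshots the URLs, then releases the lock at "commit"
        synchronized (rowLock) {
            snapshot = new ArrayList<>(crawlItems);
        }

        // job1 now runs: moves the old row to the 48h table and deletes it
        synchronized (rowLock) {
            crawlItems.remove("url-1");
        }

        // the spider calls back later, outside any transaction, and writes
        // back whatever job2 snapshotted -- the deleted row reappears
        crawlItems.addAll(snapshot);

        return crawlItems.contains("url-1"); // true: the race re-inserted it
    }

    public static void main(String[] args) {
        System.out.println(runRace());
    }
}
```

The sketch makes the key point visible: the locks only serialize the two reads, while the write that resurrects the data happens after both "transactions" are gone.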

 

posted @ 2021-08-02 23:32 zjhgx