A Crawler Without Writing Code
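A crawler that needs no code: click around with the mouse and the data comes pouring in. Collecting data has never been this easy. Salespeople, website operators, marketing staff, web editors, SEO specialists, and others who don't know how to program can all easily scrape data from most common websites.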
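Below is an example that collects post data from the first 5 pages of cnblogs (博客园): title, URL, summary, read count, comment count, and date for each post.
I'm recording it here in case I need it again later.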
{"_id":"cnblogs","startUrl":["https://www.cnblogs.com/#p[1-5:1]"],"selectors":[{"id":"blog","type":"SelectorElement","parentSelectors":["_root"],"selector":"div.post_item","multiple":true,"delay":0},{"id":"title","type":"SelectorText","parentSelectors":["blog"],"selector":"a.titlelnk","multiple":false,"regex":"","delay":0},{"id":"url","type":"SelectorElementAttribute","parentSelectors":["blog"],"selector":"a.titlelnk","multiple":false,"extractAttribute":"href","delay":0},{"id":"desc","type":"SelectorText","parentSelectors":["blog"],"selector":"p","multiple":false,"regex":"","delay":0},{"id":"read","type":"SelectorText","parentSelectors":["blog"],"selector":".article_view a","multiple":false,"regex":"","delay":0},{"id":"pinglun","type":"SelectorText","parentSelectors":["blog"],"selector":".article_comment a","multiple":false,"regex":"","delay":0},{"id":"date","type":"SelectorText","parentSelectors":["blog"],"selector":"div.post_item_foot","multiple":false,"regex":"","delay":0}]}
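The `startUrl` above ends in `[1-5:1]`, which looks like the range syntax used by the Web Scraper browser extension (start, end, step). As a rough sketch of how such a token could be expanded into the five page URLs, here is a small Python helper. The function name `expand_start_url` and the regex are my own illustration, not part of any tool, and the sketch assumes plain integers with no zero padding.

```python
import re

# Matches range tokens such as [1-5] or [1-5:1] (start-end, optional :step).
# Assumption: plain integers only; zero-padded ranges are not handled here.
RANGE = re.compile(r"\[(\d+)-(\d+)(?::(\d+))?\]")


def expand_start_url(url):
    """Expand the first range token in url into a list of concrete URLs."""
    match = RANGE.search(url)
    if match is None:
        # No range token: the URL is already concrete.
        return [url]
    start, end = int(match.group(1)), int(match.group(2))
    step = int(match.group(3) or 1)  # step defaults to 1 when omitted
    prefix, suffix = url[:match.start()], url[match.end():]
    # The end value is inclusive, hence end + 1.
    return [prefix + str(n) + suffix for n in range(start, end + 1, step)]


urls = expand_start_url("https://www.cnblogs.com/#p[1-5:1]")
for u in urls:
    print(u)
```

Note that the page number sits in the URL fragment (`#p1`, `#p2`, ...). Fragments are not sent to the server, so the page switch happens in the browser; a plain HTTP fetch of these URLs would not by itself return different pages.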