Posts by category - Crawlers

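Summary: On Ubuntu, I use Scrapy to run crawl jobs on a schedule. Scrapy itself does not provide scheduled execution, so I use crontab to trigger the runs. First, write the command script to be executed, cron.sh: #! /bin/sh ...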

posted @ 2015-05-13 14:45 justinzhang Views (17591) Comments (1) Recommendations (0)
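
As a rough illustration of the crontab approach described in this summary, the sketch below builds a single crontab entry that changes into a Scrapy project directory and runs `scrapy crawl`. It is not the author's cron.sh, which is truncated here. The helper name `build_crontab_line`, the schedule, the project path, the spider name, and the log path are all hypothetical placeholders.

```python
import shlex


def build_crontab_line(schedule, project_dir, spider, log_file):
    """Return one crontab entry that runs a Scrapy spider on a schedule.

    Every argument is a hypothetical placeholder, not a value from the post.
    """
    if len(schedule.split()) != 5:
        raise ValueError("a crontab schedule needs five fields")
    # `scrapy crawl` must run inside the project directory, so cd there first.
    command = "cd {} && scrapy crawl {} >> {} 2>&1".format(
        shlex.quote(project_dir), shlex.quote(spider), shlex.quote(log_file)
    )
    return "{} {}".format(schedule, command)


# Assumed example values: run the spider every day at 02:00.
line = build_crontab_line(
    "0 2 * * *", "/home/user/myproject", "myspider", "/tmp/myspider.log"
)
print(line)
```

The printed line could be pasted into the editor opened by `crontab -e`. Because cron runs jobs with a minimal environment, the `scrapy` executable may need to be given by its absolute path.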

Summary: 1. Get all the text data under a given node: data = response.xpath('//div[@id="zoomcon"]') content = ''.join(data.xpath('string(.)').extract()) This code retrieves all the text inside the div with a specific id: http://www.nhfpc.gov.cn/fzs/s3576/200804/cdbda975...

posted @ 2015-05-06 15:29 justinzhang Views (2440) Comments (0) Recommendations (0)
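
To show what `string(.)` collects, here is a minimal sketch that avoids Scrapy altogether. It uses the standard library's `xml.etree.ElementTree` and `itertext()` as a stand-in, since the standard library does not evaluate the XPath `string()` function. The markup in `PAGE` is an invented example, not the page linked in the summary.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for a downloaded page.
PAGE = """<html><body>
<div id="zoomcon"><p>First <b>bold</b> part.</p><p>Second part.</p></div>
<div id="other">ignored</div>
</body></html>"""

root = ET.fromstring(PAGE)

# Select the div with the specific id, as the XPath in the summary does.
node = root.find(".//div[@id='zoomcon']")

# itertext() walks every descendant text node in document order,
# which is the text that XPath's string(.) would concatenate.
content = "".join(node.itertext())
print(content)
```

The text of nested tags such as `<b>` is included, which a plain `text()` step on the div alone would miss.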

justinzhang
