Post archive for April 10, 2019 - .Tang

April 10, 2019

Summary: items defines the field names. When scraped data is stored in MongoDB by a pipeline, it must first be converted to a dict.

posted @ 2019-04-10 18:34 .Tang Views (162) Comments (0) Recommendations (0)
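The conversion mentioned in the summary above can be sketched as follows. This is a minimal illustration, not the post's own code: `MongoPipeline`, `InMemoryCollection`, and the sample field values are hypothetical, and the in-memory stand-in replaces a real pymongo collection so the sketch runs without a MongoDB server. In a real project the collection would typically be opened when the spider starts.

```python
class MongoPipeline:
    """Item pipeline that stores each scraped item in a MongoDB collection."""

    def __init__(self, collection):
        # `collection` is anything with an insert_one() method, for example
        # pymongo.MongoClient()["demo_db"]["items"] (hypothetical names).
        self.collection = collection

    def process_item(self, item, spider):
        # A Scrapy Item behaves like a mapping but is not a plain dict,
        # so convert it before handing it to the MongoDB driver.
        self.collection.insert_one(dict(item))
        return item  # pass the item on to any later pipeline


class InMemoryCollection:
    """Stand-in for a pymongo collection (assumption, for demonstration)."""

    def __init__(self):
        self.documents = []

    def insert_one(self, document):
        self.documents.append(document)


collection = InMemoryCollection()
pipeline = MongoPipeline(collection)
pipeline.process_item({"title": "engineer", "location": "Shenzhen"}, spider=None)
```

Passing the collection in through the constructor keeps the conversion step easy to exercise in isolation.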

Summary:
    # -*- coding: utf-8 -*-
    import scrapy

    class HrSpider(scrapy.Spider):
        name = 'hr'
        allowed_domains = ['tencent.com']
        start_urls = ['https://hr.tencent.com/position.php']

        def parse(...

posted @ 2019-04-10 17:57 .Tang Views (547) Comments (0) Recommendations (0)

The logging module

Summary:
    import logging

    # Set the basic log style
    logging.basicConfig(level=logging.INFO,
                        format='levelname:%(levelname)s filename: %(filename)s '
                               'outputNumber: [%(lineno)...

posted @ 2019-04-10 16:04 .Tang Views (112) Comments (0) Recommendations (0)
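Because the summary above is cut off, here is a small self-contained sketch of the same idea: an INFO-level logger whose records carry the level name, file name, and line number. It attaches a formatter to a dedicated handler instead of calling `logging.basicConfig`, so the root logger is left alone. The format string, the `make_logger` helper, and the logger name `demo` are assumptions made for illustration.

```python
import io
import logging

# Assumed format string: the one in the summary is truncated, so this
# version is illustrative only.
LOG_FORMAT = (
    "levelname: %(levelname)s filename: %(filename)s "
    "line: [%(lineno)d] message: %(message)s"
)


def make_logger(name, stream):
    """Return a logger that writes INFO and above to `stream`."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(LOG_FORMAT))
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False  # keep records out of the root logger
    return logger


buffer = io.StringIO()
log = make_logger("demo", buffer)
log.debug("not shown")       # below INFO, so it is filtered out
log.info("spider started")   # written to the buffer
print(buffer.getvalue(), end="")
```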

scrapy-logging

Summary: settings

posted @ 2019-04-10 15:50 .Tang Views (121) Comments (0) Recommendations (0)
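The summary above only says "settings", so the fragment below is a guess at the kind of configuration involved rather than the post's actual content. `LOG_LEVEL` and `LOG_FILE` are standard Scrapy settings; the values chosen here, including the file path, are assumptions.

```python
# settings.py (fragment)

# Only log messages at WARNING level or above.
LOG_LEVEL = "WARNING"

# Write log output to a file instead of the console (hypothetical path).
LOG_FILE = "./scrapy.log"
```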

scrapy-pipeline,mysql

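Summary: Why use multiple pipelines in Scrapy: one project may need to crawl several websites, and because each site's data volume (and the way it is processed) differs, a separate pipeline can be created for each. Covers: pipeline, pipeline methods, mysql, mongodb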

posted @ 2019-04-10 15:28 .Tang Views (263) Comments (0) Recommendations (0)
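One common way to realise the per-site pipelines described above is to let every pipeline check `spider.name` and only act on items from its own spider. The sketch below assumes two spiders named `hr` and `other`; the class names, the `handled_by` field, the `myproject` module path, and the `run_pipelines` helper (which imitates Scrapy's chaining so the example runs without Scrapy) are all hypothetical.

```python
from types import SimpleNamespace


class TencentPipeline:
    def process_item(self, item, spider):
        # Only handle items produced by the 'hr' spider.
        if spider.name == "hr":
            item["handled_by"] = "TencentPipeline"
        return item  # always return the item so later pipelines see it


class OtherSitePipeline:
    def process_item(self, item, spider):
        # Only handle items produced by the (hypothetical) 'other' spider.
        if spider.name == "other":
            item["handled_by"] = "OtherSitePipeline"
        return item


# settings.py fragment: pipelines run in ascending order of this number.
ITEM_PIPELINES = {
    "myproject.pipelines.TencentPipeline": 300,
    "myproject.pipelines.OtherSitePipeline": 400,
}


def run_pipelines(item, spider):
    """Imitate Scrapy passing one item through each enabled pipeline."""
    for pipeline in (TencentPipeline(), OtherSitePipeline()):
        item = pipeline.process_item(item, spider)
    return item


hr_item = run_pipelines({}, SimpleNamespace(name="hr"))
other_item = run_pipelines({}, SimpleNamespace(name="other"))
```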

scrapy

Summary: Scrapy middleware: Downloader Middlewares and developing a proxy middleware
1. Create a Scrapy project: scrapy startproject SpiderAnything
2. Generate a spider (itcash is the spider name, itcash.cn is the crawl scope): scrapy genspider it

posted @ 2019-04-10 15:18 .Tang Views (197) Comments (0) Recommendations (0)
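A proxy middleware of the kind named in the summary above might look like the following sketch. It relies on the fact that Scrapy routes a request through whatever URL is stored in `request.meta["proxy"]`. The class name `RandomProxyMiddleware`, the proxy addresses, and the priority value are assumptions, and a `SimpleNamespace` stands in for a real Scrapy request so the example runs on its own.

```python
import random
from types import SimpleNamespace

# Hypothetical proxy addresses, for illustration only.
PROXIES = ["http://127.0.0.1:8888", "http://127.0.0.1:8889"]


class RandomProxyMiddleware:
    """Downloader middleware that assigns a random proxy to each request."""

    def process_request(self, request, spider):
        request.meta["proxy"] = random.choice(PROXIES)
        # Returning None tells Scrapy to keep processing this request.
        return None


# settings.py fragment that would enable the middleware in the
# SpiderAnything project (module path and priority are assumed).
DOWNLOADER_MIDDLEWARES = {
    "SpiderAnything.middlewares.RandomProxyMiddleware": 543,
}

# Stand-in for a Scrapy Request: only the `meta` dict matters here.
request = SimpleNamespace(meta={})
result = RandomProxyMiddleware().process_request(request, spider=None)
```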