小白Amir - 博客园

Summary:
# -*- coding: utf-8 -*-
import scrapy
from ..items import QutoutiaoItem
import json
import re
from ..settings import CATEGORY_INFO, LIST_LIMIT
class QutoutiaoSpider(scrapy.Spider):
    name = 'qu...

posted @ 2018-06-02 10:28 小白Amir Views (899) Comments (0) Recommendations (0)

[Pinned] Scraping Taobao with scrapy+selenium

Summary:
# -*- coding: utf-8 -*-
import scrapy
from scrapy import Request
from urllib.parse import quote
from ..items import ScrapyseleniumtestItem
class TaobaoSpider(scrapy.Spider):
    name = 'tao_bao'
    ...

posted @ 2018-05-15 18:45 小白Amir Views (2716) Comments (0) Recommendations (0)

December 29, 2019

Setting up a PyCharm remote development environment

Summary: Keep in mind that the project name on Linux should ideally match the one on Windows, and the environment name can be the same as well.

posted @ 2019-12-29 14:59 小白Amir Views (596) Comments (1) Recommendations (0)

June 8, 2018

Scraping Baidu Tieba with scrapy+lxml.etree

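Summary: Analysis: I first tried extracting the content with scrapy's built-in xpath, but the result was empty, so that approach does not work. Instead, use a regular expression (re) to match all the <li> tags, which hold everything that needs to be extracted. Then turn each li tag into an 'lxml.etree._Element' with resultTree = lxml.etree.HTML(articleBody), and then, via resu...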

posted @ 2018-06-08 16:58 小白Amir Views (406) Comments (0) Recommendations (0)
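
The approach outlined in the Baidu Tieba summary above (regex-match the <li> tags, then parse each one with lxml.etree.HTML) can be sketched roughly as follows. This is a minimal illustration, not the post's actual spider: the sample HTML, the class names, the LI_PATTERN regex, and the extract_threads helper are all hypothetical. The sample also assumes the list markup sits inside an HTML comment, which is one plausible reason an xpath query on the whole page comes back empty; the summary itself does not state the cause.

```python
import re

from lxml import etree

# Hypothetical stand-in for a downloaded list page. The <li> items are wrapped
# in an HTML comment (an assumption), so an xpath query on the page misses them.
SAMPLE_HTML = """
<html><body>
<!--
<ul id="thread_list">
<li class="j_thread_list"><a href="/p/1" class="j_th_tit">First thread</a></li>
<li class="j_thread_list"><a href="/p/2" class="j_th_tit">Second thread</a></li>
</ul>
-->
</body></html>
"""

# Non-greedy match of each <li>...</li> block; re.S lets "." span newlines.
LI_PATTERN = re.compile(r"<li\b.*?</li>", re.S)


def extract_threads(page_html):
    """Regex out every <li> block, then parse each block with lxml."""
    threads = []
    for article_body in LI_PATTERN.findall(page_html):
        # etree.HTML wraps the fragment in <html><body> and returns an
        # lxml.etree._Element that supports xpath queries.
        result_tree = etree.HTML(article_body)
        titles = result_tree.xpath("//a/text()")
        links = result_tree.xpath("//a/@href")
        if titles and links:
            threads.append({"title": titles[0].strip(), "link": links[0]})
    return threads


if __name__ == "__main__":
    # Querying the whole page finds no <li> elements in this sample.
    print(etree.HTML(SAMPLE_HTML).xpath("//li"))
    # The regex-then-parse route recovers them.
    for thread in extract_threads(SAMPLE_HTML):
        print(thread["title"], thread["link"])
```

Inside a scrapy callback, the same helper could be fed response.text. A regex over HTML is fragile with nested <li> elements, so it is best treated as a workaround for pages where normal selectors return nothing.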