Post archive for May 31, 2020 - ^更上一层楼$

May 31, 2020

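Summary: 1 Introduction to crawlers # 1 The essence: simulate sending HTTP requests (requests) → parse the returned data (re, bs4, lxml, json) → store it (redis, mysql, mongodb) # 2 App crawlers: exactly the same in essence # 3 Why Python is a popular choice for crawlers: many packages, plus crawler frameworks: scrapy, a high-performance crawler framework, in the crawler world…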
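The three stages named in this summary (request, parse, store) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the post's own code: it uses only the standard library, with `urllib` standing in for requests and `sqlite3` standing in for redis/mysql/mongodb. The names `LINK_RE`, `parse_links`, `store`, `crawl`, and `fetch_with_urllib` are hypothetical.

```python
import json
import re
import sqlite3
from typing import Callable, Dict, List
from urllib.request import Request, urlopen

# Hypothetical pattern: captures (href, inner HTML) from double-quoted anchor tags.
LINK_RE = re.compile(r'<a\s+[^>]*href="([^"]+)"[^>]*>(.*?)</a>', re.S)


def fetch_with_urllib(url: str) -> str:
    """Stage 1: send an HTTP GET request and return the body as text."""
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def parse_links(html: str) -> List[Dict[str, str]]:
    """Stage 2: parse the returned data (here with re) into records."""
    rows = []
    for href, inner in LINK_RE.findall(html):
        title = re.sub(r"<[^>]+>", "", inner).strip()  # drop nested tags
        rows.append({"url": href, "title": title})
    return rows


def store(rows: List[Dict[str, str]], conn: sqlite3.Connection) -> None:
    """Stage 3: persist the records (sqlite3 as a stand-in database)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS links (url TEXT PRIMARY KEY, payload TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO links VALUES (?, ?)",
        [(r["url"], json.dumps(r, ensure_ascii=False)) for r in rows],
    )
    conn.commit()


def crawl(url: str, fetch: Callable[[str], str], conn: sqlite3.Connection) -> int:
    """Run request -> parse -> store once and return the number of records."""
    rows = parse_links(fetch(url))
    store(rows, conn)
    return len(rows)
```

Passing `fetch` in as a parameter keeps the pipeline testable without network access; in real use one would call `crawl(url, fetch_with_urllib, conn)`.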

posted @ 2020-05-31 22:03 ^更上一层楼$

Crawlers ~ scrapy1

Summary: 1 Crawling the entire cnblogs site # 1 scrapy startproject cnblogs_crawl # 2 scrapy genspider cnblogs www.cnblogs.com 2 Passing parameters between scrapy requests # 1 Put: yield Request(url,callback=self.p…

posted @ 2020-05-31 22:01 ^更上一层楼$

Crawlers ~ scrapy

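Summary: 1 Installing and starting the scrapy framework # 1 A framework is not a module # 2 Billed as the Django of the crawler world (you will find it resembles Django in many places) # 3 Installation - macOS and Linux: pip3 install scrapy - Windows: pip3 install scrapy (works for most people) - If…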

posted @ 2020-05-31 21:58 ^更上一层楼$

Crawlers ~ Selectors

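Summary: 1 CSS selectors and XPath selectors # CSS selectors ####### #1 CSS selectors ####### # Key points # Tag object.select("css selector") # #id # .class-name # div>p: direct children; div p: all descendants # Find the last a tag under a div: div a:last-child (strictly, an a that is the last child of its parent)…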
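The selector forms listed in this summary can be tried on a small document with Beautiful Soup's `select`. This is a minimal sketch, assuming the `bs4` package is installed; the `HTML` snippet and the variable names are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, just large enough to exercise each selector.
HTML = """
<div id="main">
  <p class="intro">direct child</p>
  <section><p>nested</p></section>
  <a href="/first">first</a>
  <a href="/last">last</a>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

by_id = soup.select("#main")               # #id
by_class = soup.select(".intro")           # .class-name
children = soup.select("div > p")          # p elements that are direct children of a div
descendants = soup.select("div p")         # p elements anywhere below a div
last_link = soup.select("div a:last-child")  # a that is the last child of its parent

print([p.get_text() for p in children])
print([p.get_text() for p in descendants])
print([a["href"] for a in last_link])
```

`select` always returns a list; `select_one` returns the first match or `None`.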

posted @ 2020-05-31 21:56 ^更上一层楼$

Crawlers ~ bs4

Summary: Today's topics 1 Crawling Autohome news with requests+bs4 # Toutiao # https://www.autohome.com.cn/news/1/#liststart ###### #2 Crawl Autohome news ###### import requests # Send a GET request to Autohome and fetch the page ret=r…

posted @ 2020-05-31 21:53 ^更上一层楼$
