Post archive: April 2019 - 安智伟

005 Dynamic loading example

Summary: from selenium import webdriver from selenium.webdriver.chrome.options import Options from time import sleep # Create an object that makes Chrome open in headless mode chrome_options ...

posted @ 2019-04-22 17:47 安智伟 Views (129) Comments (0) Recommended (0)

Crawler implementation example

Summary: import requests from lxml import etree from 爬虫.old_boy.p3 import get_code_text session = requests.session() # A session does almost the same job as requests itself: both can send requests, and the request ...

posted @ 2019-04-18 23:22 安智伟 Views (157) Comments (0) Recommended (0)

004 Crawling with the Scrapy framework

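Summary: 0. Create the housePro Scrapy project 1. Crawl site information with Scrapy 2. For data parsing, Scrapy calls parse with a response argument; the response object can call the xpath method directly 3. Scrapy's persistent storage uses pipelines. The persistence flow: 1. Get the parsed data values 2. Take the parsed ...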
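The pipeline step in the summary above can be sketched roughly as follows. This is a minimal illustration, not the code from the post: the class name `HouseJsonLinesPipeline`, the output file `houses.jsonl`, and the JSON Lines format are all assumptions. A Scrapy item pipeline is an ordinary Python class, so the sketch needs no Scrapy import; Scrapy calls `open_spider`, `process_item`, and `close_spider` on it once the class is listed in `ITEM_PIPELINES` in `settings.py`.

```python
import json


class HouseJsonLinesPipeline:
    """Hypothetical pipeline that stores each parsed item as one JSON line."""

    def __init__(self, path="houses.jsonl"):
        # The output path is an assumption made for this sketch.
        self.path = path
        self.file = None

    def open_spider(self, spider):
        # Called once when the spider starts: open the output file.
        self.file = open(self.path, "w", encoding="utf-8")

    def process_item(self, item, spider):
        # Called for every item yielded by the spider's parse() method.
        line = json.dumps(dict(item), ensure_ascii=False)
        self.file.write(line + "\n")
        # Return the item so any later pipeline can still process it.
        return item

    def close_spider(self, spider):
        # Called once when the spider finishes: release the file handle.
        self.file.close()
```

Opening the file in `open_spider` rather than in `process_item` keeps a single handle for the whole crawl instead of reopening the file for every item.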

posted @ 2019-04-15 19:38 安智伟 Views (183) Comments (0) Recommended (0)

003 Python code for persisting crawler data to three different databases

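Summary: MongoDB import pymongo # 1. Connect to the MongoDB service mongo_py = pymongo.MongoClient() print(mongo_py) # 2. Database and collection names; both are created automatically once data is written # Database db = mongo_py['test2'] # Table, i.e. collection coll ...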

posted @ 2019-04-15 17:15 安智伟 Views (206) Comments (0) Recommended (0)

002 Using requests, and extracting data with xpath and beautifulsoup4

Summary: 1. Requesting the URL directly, without headers import requests url = 'http://www.baidu.com' # requests sends the request with the get method response = requests.get(url) # The returned content is bytes and needs decoding data = respon ...

posted @ 2019-04-08 13:46 安智伟 Views (442) Comments (0) Recommended (0)

001 Basic crawler concepts, and urllib's request and parse

Summary: from urllib import request def load_data(): url = "http://www.baidu.com/" # Send an HTTP GET request # response: the HTTP response object response = request.urlopen(url) # Read the content ...
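Since the excerpt above is cut off, here is a small sketch of how `urllib.request` and `urllib.parse` are commonly combined. It is an illustration under assumptions, not the post's code: the helper `build_request`, the search path `/s`, the `wd` parameter, and the `User-Agent` value are all chosen for the example. Only `load_data` touches the network, and it is not called here.

```python
from urllib import parse, request


def build_request(base_url, params):
    """Encode params into a query string and wrap the URL in a Request."""
    # urlencode percent-encodes non-ASCII values such as Chinese keywords.
    query = parse.urlencode(params)
    url = f"{base_url}?{query}"
    # Illustrative header; many sites reject urllib's default User-Agent.
    headers = {"User-Agent": "Mozilla/5.0"}
    return request.Request(url, headers=headers)


def load_data(req):
    """Send the GET request and return the decoded response body."""
    with request.urlopen(req, timeout=10) as response:
        # read() returns bytes, so decode before treating it as text.
        return response.read().decode("utf-8")


# Build the request only; nothing is sent until load_data(req) is called.
req = build_request("http://www.baidu.com/s", {"wd": "爬虫"})
print(req.full_url)
```

Splitting request construction from sending makes the URL encoding easy to inspect before any traffic leaves the machine.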

posted @ 2019-04-07 12:25 安智伟 Views (355) Comments (0) Recommended (0)
