icui4cu

April 16, 2022

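Summary: Interacting with Redis from Python. Install the redis library: pip3 install redis. Connect to the database: import redis # StrictRedis and Redis behave the same (since redis-py 3.0, StrictRedis is simply an alias of Redis) r = redis.StrictRedis(host='localhost',port=6379,db=0) r.s …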

posted @ 2022-04-16 18:55 icui4cu Views(66) Comments(0) Recommendations(0)

9. Redis Database Deployment and Operations

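Summary: Introduction to Redis. Redis is a NoSQL database that runs in memory, which makes it very fast. It supports distributed deployment, can in principle be scaled out almost without limit, offers APIs for many languages, and can be installed on many platforms. Relational databases: MySQL, SQL Server, Oracle. Non-relational (NoSQL) databases: e.g. Redis (key-value). Features: client/server (C/S) communication model (Client …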

posted @ 2022-04-16 18:54 icui4cu Views(32) Comments(0) Recommendations(0)

8. Middleware and Using CrawlSpider

Summary: Collecting several fields at once. items.py: import scrapy class Test1Item(scrapy.Item): # define the fields for your item here like: # name = scrapy.Field() # declare the fields in items titl …

posted @ 2022-04-16 18:53 icui4cu Views(33) Comments(0) Recommendations(0)
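The post covers Scrapy middleware. As an illustration, here is a hypothetical downloader middleware that rotates the User-Agent header, which is a common use of process_request. The class name and the USER_AGENTS list are assumptions, not the author's code. Scrapy calls process_request(request, spider) for every outgoing request. Returning None lets the request continue through the remaining middleware.

```python
import random

# Hypothetical pool of User-Agent strings; replace with real browser UAs
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
    "Mozilla/5.0 (X11; Linux x86_64)",
]


class RandomUserAgentMiddleware:
    """Downloader middleware that sets a random User-Agent on each request."""

    def __init__(self, user_agents=None):
        # Scrapy instantiates the class with no arguments, so fall back to the default pool
        self.user_agents = list(user_agents or USER_AGENTS)

    def process_request(self, request, spider):
        request.headers["User-Agent"] = random.choice(self.user_agents)
        # None means: keep processing this request normally
        return None
```

To enable it, register it in settings.py, for example DOWNLOADER_MIDDLEWARES = {"myproject.middlewares.RandomUserAgentMiddleware": 543}, where myproject is a placeholder for the project package.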

7. Wrapping Up Selenium and Getting Started with Scrapy

Summary: Simulating mobile operations. # import the touch actions class from selenium.webdriver.common.touch_actions import TouchActions mobileEmulation = {'deviceName':'iPhone 6/7/8'} options = webdriver.Ch …

posted @ 2022-04-16 18:52 icui4cu Views(35) Comments(0) Recommendations(0)

6. Cookie and Slider Element Operations

Summary: Cookie operations. # view cookies cookie = driver.get_cookies() print(cookie) # add a cookie (add_cookie requires the keys 'name' and 'value') driver.add_cookie({'name':'xiaoming','value':'9988'}) # delete all cookies driver.del …

posted @ 2022-04-16 18:51 icui4cu Views(51) Comments(0) Recommendations(0)
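Building on the cookie calls above, this sketch saves a logged-in session's cookies and restores them later. export_cookies and import_cookies are hypothetical helpers that use only get_cookies() and add_cookie(), so any driver object with those two methods works. Before calling import_cookies, open a page on the cookie's domain with driver.get(). Afterwards, call driver.refresh() so the site sees the restored cookies.

```python
import json


def export_cookies(driver):
    # get_cookies() returns a list of dicts with keys such as name, value, domain, path
    return json.dumps(driver.get_cookies(), ensure_ascii=False)


def import_cookies(driver, data):
    # add_cookie() requires at least "name" and "value"; the browser must already be
    # on a page from the cookie's domain, otherwise the driver rejects the cookie
    cookies = json.loads(data)
    for cookie in cookies:
        driver.add_cookie(cookie)
    return len(cookies)
```

The JSON string from export_cookies can be written to a file after logging in once. Feeding it back through import_cookies in a later session avoids repeating the login.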

5. Simulating Human Actions to Collect Information

Summary: Drag-and-drop. # drag-and-drop first_target = driver.find_element_by_xpath("//span[contains(text(),'喜羊羊与灰太狼之决战次时代')]") second_target = driver.find_element_by_xpath("//a[ …

posted @ 2022-04-16 18:50 icui4cu Views(34) Comments(0) Recommendations(0)

4. Complex Operations and Handling Special Cases

Summary: Selenium desktop mode and mobile mode. # use a Chrome installed at a specific location options = webdriver.ChromeOptions() # path to the Chrome browser's main executable location = r"F:\All_python_code\scrapy\chrome-win\chrome.exe …

posted @ 2022-04-16 18:49 icui4cu Views(38) Comments(0) Recommendations(0)

3. XPath and Using Selenium

Summary: XPath. Parsing web pages: pip install lxml from lxml import etree # page source html_doc = resp.content.decode('utf-8') # use etree to convert html_doc into an HTML element tree; the resulting element object can then be queried with xp …

posted @ 2022-04-16 18:48 icui4cu Views(30) Comments(0) Recommendations(0)
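Here is a self-contained sketch of the etree workflow from the summary. The HTML string stands in for resp.content.decode('utf-8') and is made up for illustration. etree.HTML() returns the root element, and .xpath() always returns a list: text nodes for text() and attribute values for @href.

```python
from lxml import etree

# Made-up HTML standing in for resp.content.decode('utf-8')
html_doc = """
<html><body>
  <ul id="list">
    <li><a href="/a/1">First</a></li>
    <li><a href="/a/2">Second</a></li>
  </ul>
</body></html>
"""

# etree.HTML() parses the string into an element tree that supports .xpath()
tree = etree.HTML(html_doc)

# text() selects text nodes, @href selects attribute values; both come back as lists
titles = tree.xpath('//ul[@id="list"]/li/a/text()')
links = tree.xpath('//ul[@id="list"]/li/a/@href')

for title, link in zip(titles, links):
    print(title, link)
```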

2. JSON Data Processing and Using BeautifulSoup (BS)

Summary: JSON. Online data generator: https://www.onlinedatagenerator.com/ Loading JSON data: import requests import json # from pprint import pprint def main(): url = "http://192.168.2 …

posted @ 2022-04-16 18:47 icui4cu Views(117) Comments(0) Recommendations(0)
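This sketch shows the JSON handling with an inline string instead of the post's truncated LAN URL. The sample records and names_by_city are hypothetical. With requests, resp.json() performs the same decoding as json.loads(resp.text).

```python
import json

# Made-up payload shaped like a generated test dataset
raw = '[{"id": 1, "name": "Alice", "city": "Beijing"}, {"id": 2, "name": "Bob", "city": "Shanghai"}]'


def parse_records(text):
    # json.loads turns a JSON string into Python lists and dicts
    return json.loads(text)


def names_by_city(records):
    # Group the "name" values by their "city" field
    grouped = {}
    for rec in records:
        grouped.setdefault(rec["city"], []).append(rec["name"])
    return grouped


records = parse_records(raw)
print(names_by_city(records))
# indent=2 pretty-prints; ensure_ascii=False keeps non-ASCII text readable
print(json.dumps(records, indent=2, ensure_ascii=False))
```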

1. Getting to Know Web Crawlers and the Basic Crawling Workflow

Summary: A crawler experiment that simply fetches a web page's source code. import requests def main(): url = "http://www.4399dmw.com/search/dh-1-0-0-0-0-{}-0/" url_list = [] headers = { "User-Agent": "User-Age …

posted @ 2022-04-16 18:44 icui4cu Views(202) Comments(0) Recommendations(0)
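This sketch covers the URL-list step. It assumes the {} placeholder in the post's URL template is the page number and that pages start at 1. build_page_urls and the User-Agent string are hypothetical, since the post's header value is truncated.

```python
# URL template from the post; {} is filled with the page number (assumption)
BASE_URL = "http://www.4399dmw.com/search/dh-1-0-0-0-0-{}-0/"

# Hypothetical User-Agent; the value in the original post is truncated
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}


def build_page_urls(template, first_page, last_page):
    # One URL per results page, including last_page
    return [template.format(page) for page in range(first_page, last_page + 1)]
```

Each URL in build_page_urls(BASE_URL, 1, 3) can then be fetched with requests.get(url, headers=HEADERS, timeout=10). Keeping the URL builder separate from the network calls makes it easy to check the page range before crawling.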
