Blog category - python
Summary: 1. Fix the problem of having to reinstall the verification app on every script run (see https://blog.csdn.net/hszxd479946/article/details/78900982) 2. Install the Appium client 3. Install the Appium Python library
Summary: Note: pip install pycryptodome
Summary: from lxml import etree import requests from urllib import request import time import os from queue import Queue import threading import re from multip
Summary: Ref: https://blog.csdn.net/weixin_43430036/article/details/84871624 # -*- coding: utf-8 -*- from urllib import request import scrapy import json from sel
Summary: chrome.exe --remote-debugging-port=9222 --user-data-dir="C:\selenum\AutomationProfile" Paste this command into the command line to open a browser listening on debugging port 9222, and do not close it (configure the PATH environment variable first, otherwise the chrome.exe command is not found) chr
Summary: from scrapy import signals import random class Test001UseragentMiddleware(object): USER_AGENT=[ "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.1
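The middleware excerpt above rotates user agents per request; a self-contained sketch of the same idea follows. The `FakeRequest` class is a stand-in invented here so the example runs without Scrapy — in the real middleware, `process_request` receives a `scrapy.Request`:

```python
import random

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
]

class RandomUserAgentMiddleware:
    """Assign a randomly chosen User-Agent to every outgoing request."""
    def process_request(self, request, spider):
        request.headers["User-Agent"] = random.choice(USER_AGENTS)

# Stand-in for a scrapy Request, just to show the effect:
class FakeRequest:
    def __init__(self):
        self.headers = {}

req = FakeRequest()
RandomUserAgentMiddleware().process_request(req, spider=None)
print(req.headers["User-Agent"])
```

In a real project the class is registered under `DOWNLOADER_MIDDLEWARES` in settings.py so Scrapy calls `process_request` for every request automatically.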
Summary: Commands: create a project with scrapy startproject [project name] You can start your first spider with: cd jxnsh scrapy genspider example example.com To generate the spider file, first change into the project directory, then normally run scrapy
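The command list above is truncated; the usual Scrapy workflow it describes (project name `jxnsh` taken from the excerpt) is roughly:

```shell
# create a new project, then cd into it
scrapy startproject jxnsh
cd jxnsh
# generate a spider skeleton ("example" is the spider name, example.com the allowed domain)
scrapy genspider example example.com
# run the spider by name
scrapy crawl example
```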
Summary: First version # -*- coding: utf-8 -*- import scrapy import requests from lxml import etree from selenium import webdriver from scrapy.http.response.html import H
Summary: # -*- coding: utf-8 -*- import scrapy import requests from lxml import etree from selenium import webdriver from scrapy.http.response.html
Summary: from lxml import etree import requests from urllib import request import time import os from queue import Queue import threading import re class Procu
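The truncated class name `Procu`... in the excerpt suggests a producer/consumer crawler built on `Queue` and `threading`; a minimal self-contained sketch of that pattern (with a dummy workload standing in for the real download-and-parse step) is:

```python
import threading
from queue import Queue

page_queue = Queue()
results = []
lock = threading.Lock()

def producer():
    # in the real spider this would push page URLs to crawl
    for page in range(1, 6):
        page_queue.put("https://example.com/page/%d" % page)

def consumer():
    while True:
        url = page_queue.get()
        if url is None:          # sentinel: no more work
            page_queue.task_done()
            break
        with lock:
            results.append(url)  # real code would download and parse here
        page_queue.task_done()

threads = [threading.Thread(target=consumer) for _ in range(2)]
for t in threads:
    t.start()
producer()
for _ in threads:
    page_queue.put(None)         # one sentinel per consumer thread
for t in threads:
    t.join()
print(len(results))              # 5
```

The blocking `Queue` handles the thread-safe hand-off; the lock only protects the shared `results` list.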
Summary: from selenium import webdriver from selenium.webdriver.common.action_chains import ActionChains from selenium.webdriver.common.by import By from selen
Summary: from lxml import etree import requests from urllib import request import time import os number = 0 def get_page(): for x in range(1,20): url = "https:
Summary: import re text = "apple is $20.09,orange is $100.99" #ret = re.findall(".*\$\d+\.*\d*", text) #finds every match and returns them as a list #ret = re.sub("\$","㊙", text,1) #replaces
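Completing the truncated regex snippet above, a runnable version showing both the `findall` and the `sub` calls (the pattern is tightened to match each price individually rather than the greedy `.*` prefix in the excerpt):

```python
import re

text = "apple is $20.09,orange is $100.99"

# match each price: $ followed by digits and an optional decimal part
prices = re.findall(r"\$\d+\.?\d*", text)
print(prices)       # ['$20.09', '$100.99']

# replace only the first $ (count=1), as in the excerpt's re.sub("\$","㊙", text, 1)
replaced = re.sub(r"\$", "㊙", text, count=1)
print(replaced)     # apple is ㊙20.09,orange is $100.99
```

Note the `$` must be escaped as `\$` because it is an anchor metacharacter in regular expressions.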
Summary: from bs4 import BeautifulSoup text = """ <ul id="navList" class="w1"> <li><a id="blog_nav_sitehome" class="menu" href="https://www.cnblogs.com/">博客园</
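A runnable sketch of the BeautifulSoup lookup the excerpt starts, using the same `navList` markup; `html.parser` is the stdlib parser, used here as a stand-in for whatever parser the full post chooses:

```python
from bs4 import BeautifulSoup

text = """
<ul id="navList" class="w1">
  <li><a id="blog_nav_sitehome" class="menu" href="https://www.cnblogs.com/">博客园</a></li>
</ul>
"""

soup = BeautifulSoup(text, "html.parser")
# look up the anchor by its id attribute
link = soup.find("a", id="blog_nav_sitehome")
print(link["href"])     # https://www.cnblogs.com/
print(link.get_text())  # 博客园
```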
Summary: from lxml import etree import requests #Fetching a page normally requires a request, and a request carries request headers; simply mimicking the headers is enough to reach the page content baseurl0 = "https://www.ygdy8.net" headers = { "User-Agent": "Moz
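The excerpt's point, that mimicking the request headers is usually enough, can be shown without hitting the network by preparing a request instead of sending it (`baseurl0` is kept from the excerpt; nothing is actually fetched here):

```python
import requests

baseurl0 = "https://www.ygdy8.net"
headers = {
    # mimic a normal browser User-Agent so the site serves the page
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
}

# build and prepare the request without sending it
req = requests.Request("GET", baseurl0, headers=headers).prepare()
print(req.headers["User-Agent"])

# to actually fetch the page:
#   resp = requests.get(baseurl0, headers=headers)
#   resp.text
```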
Summary: from lxml import etree text = "<div><p>nmsl</p><span>nmsl</span></div>" def htmlstree(text): html = etree.HTML(text) result = etree.tostring
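The numbered snippet above parses an HTML fragment with lxml; a cleaned-up runnable version of the same function, with the `tostring` call completed under the assumption that the post serializes back to text:

```python
from lxml import etree

text = "<div><p>nmsl</p><span>nmsl</span></div>"

def htmlstree(text):
    # etree.HTML wraps the fragment in <html><body> and repairs broken markup
    html = etree.HTML(text)
    result = etree.tostring(html, encoding="unicode", pretty_print=True)
    return result

print(htmlstree(text))
```

`encoding="unicode"` makes `tostring` return a `str` instead of bytes, which is the convenient form for printing.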
Summary: 1. A few basic methods of the urllib library from urllib import request,parse request.urlretrieve("http://www.baidu.com","index.html") #quickly saves the page source to a local file req=request.Request("http:/
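The urllib basics in the last excerpt can be shown offline: `urlretrieve` does need the network, but building a `Request` and encoding parameters with `parse` do not (the baidu URL is kept from the excerpt; nothing is sent):

```python
from urllib import request, parse

# encode query parameters for a GET url
params = parse.urlencode({"wd": "python"})
url = "http://www.baidu.com/s?" + params
print(url)                          # http://www.baidu.com/s?wd=python

# build a Request object with a custom User-Agent; nothing is sent yet
req = request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
print(req.get_header("User-agent")) # Mozilla/5.0

# sending it would be: request.urlopen(req).read()
# and request.urlretrieve(url, "index.html") saves the page to a local file
```

Note that urllib stores header names capitalized, so the lookup key is `"User-agent"`.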