February 2020 essay archive - myrj

webdriver xpath

Summary: aa=wd.find_elements_by_xpath('//a') for a in aa: print(a.text) # show the text of all a tags aa=wd.find_elements_by_xpath('//a') for a in aa: print(a.get_attribute("hre...

posted @ 2020-02-24 19:29 myrj Views(137) Comments(0) Recommended(0)

Database: show n random records

Summary: 1. sqlite3: select * from QG order by random() limit 6. The following show the first 10 records: 2. SQL Server: select top 10 * from table_name; 3. DB2: select * from table_name fet...

posted @ 2020-02-24 10:52 myrj Views(251) Comments(0) Recommended(0)
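
The sqlite3 query in the entry above can be tried from Python with the standard-library `sqlite3` module. This is a minimal sketch: the table name `QG` comes from the summary, while the `id` and `title` columns and the sample rows are assumptions made up for illustration.

```python
import sqlite3

# In-memory database; the QG columns and rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE QG (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO QG (title) VALUES (?)",
    [(f"row {i}",) for i in range(20)],
)

# ORDER BY random() shuffles the rows; LIMIT keeps only the first n of them.
n = 6
rows = conn.execute(
    "SELECT * FROM QG ORDER BY random() LIMIT ?", (n,)
).fetchall()
print(rows)
conn.close()
```

Each run prints a different set of 6 rows. On large tables this approach sorts the whole table, so it is best suited to small data sets.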

SELECT SQL

Summary: Replace line feeds: update qgnews set article_url=REPLACE(article_url,char(10),'') Replace carriage returns: update qgnews set article_url=REPLACE(article_url,char(13),'')

posted @ 2020-02-21 16:01 myrj Views(108) Comments(0) Recommended(0)
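
The two UPDATE statements in the entry above can be checked against SQLite, which also provides `REPLACE()` and `char()`. This is a sketch under assumptions: the summary does not say which database it targets, and the sample URL is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE qgnews (article_url TEXT)")
# Hypothetical value ending in a carriage return and a line feed.
conn.execute("INSERT INTO qgnews VALUES (?)", ("http://example.com/a\r\n",))

# char(10) is the line feed, char(13) is the carriage return.
conn.execute("UPDATE qgnews SET article_url = REPLACE(article_url, char(10), '')")
conn.execute("UPDATE qgnews SET article_url = REPLACE(article_url, char(13), '')")

cleaned = conn.execute("SELECT article_url FROM qgnews").fetchone()[0]
print(repr(cleaned))
conn.close()
```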

python chrome

Summary: from selenium.webdriver.chrome.options import Options from selenium import webdriver wd = webdriver.Chrome() # open a browser with a visible window wd.maximize_window() # maximize the browser wd.e...

posted @ 2020-02-19 10:04 myrj Views(800) Comments(0) Recommended(0)

python: importing modules

Summary: Method 1: import sys import os BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__))) sys.path.append(BASE_DIR) Method 2: import sys,os print(sys.path#...

posted @ 2020-02-18 16:53 myrj Views(121) Comments(0) Recommended(0)
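
Method 1 in the entry above can be wrapped in a small helper. This is only a sketch: the function name `add_project_root` and the sample path are hypothetical, and in a real script you would pass `__file__` instead.

```python
import os
import sys


def add_project_root(file_path):
    """Append the parent of file_path's directory to sys.path and return it."""
    base_dir = os.path.dirname(os.path.dirname(os.path.abspath(file_path)))
    if base_dir not in sys.path:
        sys.path.append(base_dir)
    return base_dir


# For project/pkg/module.py the computed base directory is .../project.
base = add_project_root(os.path.join("project", "pkg", "module.py"))
print(base)
```

After the call, modules located directly under the project directory can be imported from `pkg/module.py`.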

python pycharm: batch replacement with regular expressions

Summary: {accept:application/json, text/plain, */*,accept-encoding:gzip, deflate, br,accept-language:zh-CN,zh;q=0.9,cookie:WEIBOCN_WM=3349; H5_wentry=H5; backU...

posted @ 2020-02-18 11:32 myrj Views(2544) Comments(0) Recommended(0)

python: applying regular expressions

Summary: import re ab='''ms: [["", "\u7acb\u5373\u4e0b\u8f7d"], ["", "\u5207\u6362\u81f3\u4e2a\u4eba\u8d26\u53f7\u4e0b\u8f7d"],''' ab=re.sub(r' +','',ab) # in ab...

posted @ 2020-02-17 15:35 myrj Views(147) Comments(0) Recommended(0)
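
The `re.sub(r' +','',ab)` call in the entry above removes every run of spaces. A minimal sketch follows, using a shortened sample string of my own rather than the author's data.

```python
import re

# Hypothetical sample text with single and repeated spaces.
ab = 'ms: [["", "a  b"],   ["", "c d"],'

# ' +' matches one or more consecutive spaces; each run is replaced by ''.
compact = re.sub(r" +", "", ab)
print(compact)
```

Only the space character is removed here. To strip tabs and newlines as well, the pattern `\s+` could be used instead.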

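python: working with Word documents

Summary: pip install python-docx from docx import Document Doc = Document() Explanation: from the docx package, import something called Document. Document means a document, so it is the object used to operate on Word documents. In the following Doc = ...

posted @ 2020-02-17 07:14 myrj Views(186) Comments(0) Recommended(0)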

python requests

Summary: import requests rr=requests.get("https://api.github.com",auth=('user','pass')) print(rr.status_code) print(rr.headers['content-type']) Result: RESTART: D: ...

posted @ 2020-02-16 18:19 myrj Views(133) Comments(0) Recommended(0)

python "format"

Summary: urls=[f'https://www.baidu.com/?page={page}' for page in range(1,5)] # either F or f works as the prefix print(urls) page=10 url='https://www.baidu.com/?page={}'.format(page) p...

posted @ 2020-02-16 16:43 myrj Views(193) Comments(0) Recommended(0)
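
The two formatting styles from the entry above, f-strings and `str.format()`, produce the same kind of URL. A short sketch using the URL from the summary:

```python
# f-string inside a list comprehension: pages 1 through 4.
urls = [f"https://www.baidu.com/?page={page}" for page in range(1, 5)]
print(urls)

# str.format() fills the {} placeholder with the argument.
page = 10
url = "https://www.baidu.com/?page={}".format(page)
print(url)
```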

python "tuple comprehension" (generator expression)

Summary: >>> b=(page for page in range(10)) >>> print(b) <generator object <genexpr> at 0x0000000002EE61C8> >>> list(b) # can only be consumed once [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] >>> l...

posted @ 2020-02-16 16:32 myrj Views(446) Comments(0) Recommended(0)
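
As the entry above shows, parentheses around a comprehension create a generator rather than a tuple, and a generator can be consumed only once. A small sketch of that behavior (the variable names are my own):

```python
gen = (page for page in range(10))

first = list(gen)   # consumes the generator
second = list(gen)  # the generator is now exhausted, so this is empty
print(first, second)

# To get an actual tuple, pass a generator expression to tuple().
as_tuple = tuple(page for page in range(10))
print(as_tuple)
```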

python list comprehension

Summary: >>> a=[page for page in range(10)] >>> print (a) [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] >>> a=[page*2 for page in range(10)] >>> print(a) [0, 2, 4, 6, 8, 10, 12, 1...

posted @ 2020-02-16 16:24 myrj Views(151) Comments(0) Recommended(0)
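
Beyond the mapping form shown in the entry above, a list comprehension can also filter with a trailing `if`. A brief sketch; the filter example is my own addition, not from the post.

```python
# Map: double every value.
doubled = [page * 2 for page in range(10)]

# Filter: keep only the even values.
evens = [page for page in range(10) if page % 2 == 0]

print(doubled)
print(evens)
```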

python xpath

Summary: //img/@src gets the src value of every img tag. //a/text() gets the text of every a tag. name=response.xpath('//img/@src').getall() # getall() extracts the actual values from the object import requests,re from lxml import etre...

posted @ 2020-02-16 16:00 myrj Views(218) Comments(1) Recommended(0)

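python regular expressions

Summary: r'[\u4e00-\u9fa5]' matches Chinese characters; . matches any character except a newline; \d matches any digit, same as [0-9] or [0123456789]; \D matches non-digit characters, [^0-9]; \w matches digits, letters and underscore, [0-9a-zA-Z_]; \W matches anything that is not a digit, letter or underscore, [^0-9A-Za-z_]; \s matches any whitespace character (space, newline...

posted @ 2020-02-14 20:18 myrj Views(122) Comments(0) Recommended(0)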
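
The character classes listed in the entry above can be tried with `re.findall`. This is a sketch with a sample string of my own. Note that for Python 3 `str` patterns `\w` also matches Unicode word characters such as Chinese text unless the `re.ASCII` flag is given.

```python
import re

text = "Python3 正则 re_test 2020"  # hypothetical sample text

han = re.findall(r"[\u4e00-\u9fa5]", text)          # Chinese characters
digits = re.findall(r"\d", text)                     # single digits
words = re.findall(r"\w+", text)                     # Unicode-aware by default
ascii_words = re.findall(r"\w+", text, re.ASCII)     # [0-9a-zA-Z_] only

print(han, digits, words, ascii_words)
```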

python try except

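Summary: When an error occurs during execution, copy the error message, use try to catch that specific error, and handle it in a dedicated way: print(3/0) reports: Traceback (most recent call last): File "<pyshell#0>", line 1, in <module> print(3/0) ZeroDivisionE...

posted @ 2020-02-14 18:15 myrj Views(157) Comments(0) Recommended(0)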
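
The approach described in the entry above (read the exception name from the traceback, then catch exactly that exception) might look like this. The function name `safe_divide` and the choice to return `None` are assumptions for illustration.

```python
def safe_divide(a, b):
    """Return a / b, or None when b is zero."""
    try:
        return a / b
    except ZeroDivisionError as exc:
        # The exception name is taken from the last line of the traceback.
        print(f"caught: {exc}")
        return None


print(safe_divide(6, 3))
print(safe_divide(3, 0))
```

Catching the specific exception rather than a bare `except:` keeps unrelated errors visible.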

PYTHON ITERTOOLS

Summary: import itertools mylist=list(itertools.permutations([1,2,3,4],3)) # permutations print(mylist) print(len(mylist)) mylist=list(itertools.combinations([1,2,3,4],3)...

posted @ 2020-02-14 17:41 myrj Views(178) Comments(0) Recommended(0)
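
The difference between the two calls in the entry above is whether order matters. A short sketch with the same input list:

```python
import itertools

items = [1, 2, 3, 4]

# Permutations: ordered selections of 3 items, 4 * 3 * 2 = 24 results.
perms = list(itertools.permutations(items, 3))

# Combinations: unordered selections of 3 items, 4 results.
combos = list(itertools.combinations(items, 3))

print(len(perms), len(combos))
print(combos)
```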

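python: generating URLs that follow a given pattern with regular expressions

Summary: import os def file_name(file_dir): for root, dirs, files in os.walk(file_dir): print(root) # path of the current directory print(dirs) # all subdirectories under the current path print(files) # all non-directory files under the current path...

posted @ 2020-02-13 14:47 myrj Views(528) Comments(0) Recommended(0)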

scrapy 403

Summary: https://images.weserv.nl/?url=

posted @ 2020-02-12 21:21 myrj Views(98) Comments(0) Recommended(0)

python encode decode

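Summary: Python encode(): the encode() method encodes a string using the encoding given by encoding. The errors parameter selects the error-handling scheme. Syntax: str.encode(encoding='UTF-8',errors='strict') Parameters: encoding -- the encoding to use, such as "UTF-8". e...

posted @ 2020-02-11 21:04 myrj Views(197) Comments(0) Recommended(0)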
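
A round trip through `encode()` and `decode()`, together with the effect of the `errors` parameter described in the entry above, can be sketched as follows. The sample string is my own.

```python
text = "中文"

# str -> bytes, using the signature quoted in the summary.
data = text.encode(encoding="UTF-8", errors="strict")
print(data)

# bytes -> str
back = data.decode("UTF-8")

# errors="ignore" silently drops characters the codec cannot represent.
ascii_ignored = text.encode("ascii", errors="ignore")

# errors="strict" (the default) raises instead.
try:
    text.encode("ascii")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
```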

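PYTHON startswith (endswith is similar)

Summary: Python startswith() method: checks whether a string starts with the given substring and returns True if it does, otherwise False. If the beg and end parameters are given, the check is limited to that range. Syntax: str.startswith(str, beg=0,end=le...

posted @ 2020-02-11 20:37 myrj Views(229) Comments(0) Recommended(0)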
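
A few calls illustrate the behavior described in the entry above. This is a sketch with a sample string of my own. In CPython the start and end arguments (called beg and end in the summary) are passed positionally, and a tuple of prefixes is also accepted.

```python
s = "this is string example"

starts = s.startswith("this")             # checks the whole string
in_range = s.startswith("is", 2, 4)       # checks only s[2:4], which is "is"
out_of_range = s.startswith("this", 2, 4)
ends = s.endswith("example")

# A tuple tests several prefixes at once.
is_url = "https://www.baidu.com".startswith(("http://", "https://"))

print(starts, in_range, out_of_range, ends, is_url)
```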

PYTHON: crawling images with ImagePipeline

Summary: Define a custom file_path() function so that images are saved under their original file names and sorted into categories: def file_path(self, request, response=None, info=None): image_guid = request.url.split('/')[-2]+"/"+request.url...

posted @ 2020-02-10 19:51 myrj Views(363) Comments(0) Recommended(0)

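WIN7 WIN10: make PATH changes take effect without a reboot

Summary: Lately when installing python scrapy I often forgot to add it to PATH. Adding it afterwards required a reboot before it worked. The following method takes effect immediately: 1. Edit PATH: right-click "Computer" -- Advanced -- Environment Variables -- path. 2. Open Task Manager, end the "explorer.exe" process, then start it again. How: in the Task Manager process list, find "explorer. ...

posted @ 2020-02-08 20:01 myrj Views(1459) Comments(0) Recommended(0)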

scrapy::Max retries exceeded with url

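Summary: Running scrapy failed with this error: Max retries exceeded with url. Fix: img1=requests.get(url=aa,headers=header1,timeout=5,verify=False) The crawler now runs. An error is still reported, but it does not affect use.

posted @ 2020-02-08 19:54 myrj Views(588) Comments(0) Recommended(0)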

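python: traversing all files in a folder

Summary: '''Use the walk method to traverse a directory tree recursively. walk returns a 3-tuple: root, dirs and files. root is the path of the directory currently being traversed; dirs is a list of the names of all subdirectories in that directory, not including its files; files is also a list, containing all files in the directory currently being traversed, but not...

posted @ 2020-02-08 19:47 myrj Views(7735) Comments(0) Recommended(0)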
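
The traversal described in the entry above can be turned into a helper that returns full file paths. This is a sketch: the function name `list_files` and the temporary directory layout are assumptions for demonstration.

```python
import os
import tempfile


def list_files(file_dir):
    """Return the sorted full paths of all files below file_dir."""
    paths = []
    for root, dirs, files in os.walk(file_dir):
        # root: current directory; dirs: its subdirectories; files: its files
        for name in files:
            paths.append(os.path.join(root, name))
    return sorted(paths)


# Build a small throwaway tree: a.txt and sub/b.txt.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    for rel in ("a.txt", os.path.join("sub", "b.txt")):
        with open(os.path.join(tmp, rel), "w") as fh:
            fh.write("x")
    found = [os.path.relpath(p, tmp) for p in list_files(tmp)]

print(found)
```

Joining `root` with each file name is what turns the bare names in `files` into usable paths.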

Packet capture tool: charles

Summary: https://tools.zzzmode.com/mytools/charles/ https://www.charlesproxy.com/download/

posted @ 2020-02-07 06:48 myrj Views(101) Comments(0) Recommended(0)

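scrapy commands

Summary: 1. Create a crawler project (this creates a folder with the same name): scrapy startproject <project name> (all of the operations below are run inside that folder) 2. Create a spider: scrapy genspider [-t <template name>] <spider name> <domain to crawl> 3. Run a spider: scrapy crawl <...

posted @ 2020-02-07 06:30 myrj Views(93) Comments(0) Recommended(0)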

Using selector without a SCRAPY project

Summary: In a .py file: from scrapy.selector import Selector from scrapy.http import HtmlResponse url="https://m.mm131.net/" r=requests.get(url) r.encoding='gbk' # depending on the situation...

posted @ 2020-02-06 12:36 myrj Views(182) Comments(0) Recommended(0)

scrapy: setting your own headers referer field

Summary: 1. Add your own new class in middlewares: class Mylei(object): def process_request(self,request,spider): referer=request.url if referer: request.headers["referer"] = re...

posted @ 2020-02-06 12:06 myrj Views(5121) Comments(0) Recommended(0)

Step-by-step: creating your first SCRAPY project

Summary: 1. Install Scrapy. 2. Open CMD and run: scrapy. Output: Scrapy 1.8.0 - no active project Usage: scrapy <command> [options] [args] Available commands: bench Run quick benchmark...

posted @ 2020-02-05 06:45 myrj Views(491) Comments(0) Recommended(0)

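python: using locals() to get the value of the variable whose name is stored in another variable

Summary: >>> a="1" >>> b="a" >>> print(a,b) 1 a >>> print(a,locals()[b]) 1 1 >>> The locals() function returns all local variables at the current position as a dictionary. >>> print(locals()) {'__name__': '__main__', '__doc__' ...

posted @ 2020-02-03 17:00 myrj Views(515) Comments(0) Recommended(0)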
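
The lookup from the entry above also works inside a function, where `locals()` holds that function's variables. A sketch with hypothetical function names; the dictionary variant is my own suggestion, not part of the post.

```python
def lookup_by_name():
    a = "1"
    b = "a"              # b holds the name of another local variable
    return locals()[b]   # look up the variable named by b


def lookup_in_dict():
    # An explicit dictionary is often clearer than going through locals().
    values = {"a": "1"}
    key = "a"
    return values[key]


print(lookup_by_name(), lookup_in_dict())
```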

python scrapy

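Summary: Chinese help. Enter the folder: 1. scrapy startproject mingzi # create a crawler project 2. scrapy genspider -t crawl ygdy8 ygdy8.com # create a specific spider: ygdy8 is the spider name; ygdy8.com is the range the spider is allowed to crawl, meaning it only crawls within this domain 3. scrapy cra...

posted @ 2020-02-01 15:41 myrj Views(235) Comments(0) Recommended(0)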
