Bob__Zhang - 博客园 (cnblogs)

March 13, 2018

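Summary:
# Use the requests module
# 1. Log in to lagou
# 2. Log in to Renren and save the personal home page
import requests
from urllib import parse
# hashlib is a built-in Python module that provides MD5 hashing
# import the hashlib module
import hashlib
'''
Python provides a module for hashing: hashlib
The notes below mainly cover its MD5 usage
>>> impo...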
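As a minimal sketch of the MD5 usage this excerpt refers to, the snippet below hashes a string with hashlib. The helper name `md5_hex` and the sample input are assumptions for illustration and do not come from the original post. Note that MD5 is a one-way hash rather than encryption, so the digest cannot be decoded back into the input.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 digest of text as hex characters (hypothetical helper)."""
    # hashlib works on bytes, so the string is encoded first.
    return hashlib.md5(text.encode('utf-8')).hexdigest()


digest = md5_hex('hello')
print(digest)       # a 32-character hexadecimal string
print(len(digest))
```

Login forms that send a hashed password usually expect exactly this kind of hex digest, though the precise scheme depends on the site.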

posted @ 2018-03-13 21:27 Bob__Zhang Views (557) Comments (0) Recommendations (0)

Selenium and webdriver: simulating a browser to visit Baidu, basics 1

Summary: This is a fairly effective technique for getting around anti-scraping measures.

posted @ 2018-03-13 21:23 Bob__Zhang Views (273) Comments (0) Recommendations (0)

Scraping Douban's Python book list by simulating a browser with webdriver

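Summary: Scrape Douban's Python book list by simulating a browser with webdriver. The os module is used to create a folder for storing the scraped data, and etree is used to parse the content with XPath. The full code follows; the results can be saved to Excel using the approach from my previous post.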

posted @ 2018-03-13 20:52 Bob__Zhang Views (258) Comments (0) Recommendations (0)

March 11, 2018

Scraping all Python positions on Lagou and saving them to an Excel sheet (object-oriented approach)

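Summary:
# 1. Rework the previous examples to extract data with bs4, regular expressions, and XPath.
# 2. Scrape all Python positions on Lagou.
from urllib import request,parse
import json,random
# import xlsxwriter, used mainly to create the Excel workbook object
import xlsxwriter
# define a class for Python positions
class python_position: ...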

posted @ 2018-03-11 22:00 Bob__Zhang Views (418) Comments (0) Recommendations (0)

Scraping all Python positions on Lagou

Summary:
# 2. Scrape all Python positions on Lagou.
from urllib import request,parse
import json,random
def user_agent(page):
    # list of browser User-Agent strings, so each request can present itself as a different browser
    user_agent_list = [
        'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWeb...
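The User-Agent rotation described in this excerpt can be sketched roughly as follows. This is only an illustration under stated assumptions: the `USER_AGENTS` entries are placeholder strings, and `build_request` and the `example.com` URL are hypothetical names that do not appear in the original post. The request is built but never sent.

```python
import random
from urllib import parse, request

# Placeholder User-Agent strings (assumed for illustration, not the post's list).
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/1.0',
    'Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/2.0',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) ExampleBrowser/3.0',
]


def build_request(base_url: str, page: int) -> request.Request:
    """Build (but do not send) a request with a randomly chosen User-Agent."""
    query = parse.urlencode({'page': page})
    headers = {'User-Agent': random.choice(USER_AGENTS)}
    return request.Request(f'{base_url}?{query}', headers=headers)


req = build_request('https://example.com/jobs', 1)
print(req.full_url)
# urllib stores header names capitalized, hence 'User-agent' here.
print(req.get_header('User-agent'))
```

Picking the header per request keeps the rotation logic in one place; passing the result to `request.urlopen` would perform the actual fetch.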

posted @ 2018-03-11 19:10 Bob__Zhang Views (558) Comments (0) Recommendations (0)

March 10, 2018

Using finditer with regular expressions

Summary:
import re
# the \. is mandatory; \d+ requires one or more digits
pattern = re.compile(r'\d+\.\d*')
d = pattern.finditer('3.141592653 PI 100 10001.11 3. .8 0.9')
print(d)
for item in d:
    print(item)
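To make the result of the excerpt above easier to inspect, here is a small sketch that reuses the same pattern and input but collects the matched text and position from each match object. The variable names `text` and `matches` are assumptions added for illustration.

```python
import re

pattern = re.compile(r'\d+\.\d*')
text = '3.141592653 PI 100 10001.11 3. .8 0.9'

# finditer returns a lazy iterator of match objects rather than a list of strings,
# so group() and span() are used to pull out the text and its position.
matches = [(m.group(), m.span()) for m in pattern.finditer(text)]
for value, span in matches:
    print(value, span)
```

Only `3.141592653`, `10001.11`, `3.` and `0.9` match: `100` has no dot, and `.8` has no digit before the dot.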

posted @ 2018-03-10 21:49 Bob__Zhang Views (265) Comments (0) Recommendations (0)

Using findall with regular expressions

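Summary:
import re
title = 'hello, 你好,world'
print(title)
title = u'hello, 你好,world'
print(title)
# Matching Chinese characters: with +, consecutive characters are matched as one run and returned together
# without +, each Chinese character is returned separately
pattern = re.compile(u'[\u4e00-\u9fa5]+')
s = pat...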
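The difference the comments describe between matching with and without `+` can be seen in a short sketch. The variable names `runs` and `chars` are assumptions added for illustration; the input string is the one from the excerpt.

```python
import re

title = 'hello, 你好,world'

# With +, adjacent Chinese characters are returned as one run.
runs = re.findall('[\u4e00-\u9fa5]+', title)
# Without +, every Chinese character is a separate match.
chars = re.findall('[\u4e00-\u9fa5]', title)

print(runs)   # ['你好']
print(chars)  # ['你', '好']
```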

posted @ 2018-03-10 21:44 Bob__Zhang Views (114) Comments (0) Recommendations (0)

Using split with regular expressions

Summary:
import re
# \s matches whitespace; + means one or more occurrences
# path = 'C:\\Users\\cz\\Desktop\\py06\\PY6_Day01\\爬虫作业\\2018_03_07\\05_split.py'
# the r prefix marks a raw string
path = r'C:\Users\cz\Desktop\py06\PY6_Day01\爬虫作业\2018_03_07\05_split.py'
# use \ as...
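A minimal sketch of the two splits the comments mention, one on backslashes and one on runs of whitespace. The path and sentence below are hypothetical stand-ins chosen for illustration, not the values from the original post.

```python
import re

# Hypothetical Windows-style path; the r prefix keeps the backslashes literal.
path = r'C:\Users\demo\05_split.py'

# In the pattern, a literal backslash has to be written as \\ even in a raw string.
parts = re.split(r'\\', path)
print(parts)  # ['C:', 'Users', 'demo', '05_split.py']

# \s+ treats any run of whitespace as a single separator.
words = re.split(r'\s+', 'one two   three')
print(words)  # ['one', 'two', 'three']
```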

posted @ 2018-03-10 21:42 Bob__Zhang Views (200) Comments (0) Recommendations (0)

Using sub with regular expressions

Summary:
import re
# Unicode range for matching Chinese characters: [\u4e00-\u9fa5]
pattern = re.compile(r'(\w+) (\w+)')
s = 'hello 123,hello 456'
s_list = pattern.findall(s)
print(s_list)
s_list = pattern.sub('hello world',s)
print(s_list)...
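As a hedged illustration of what `sub` does with this pattern, the sketch below repeats the fixed-string replacement from the excerpt and adds a backreference variant. The names `fixed` and `swapped`, and the backreference example itself, are assumptions that go beyond what the excerpt shows.

```python
import re

pattern = re.compile(r'(\w+) (\w+)')
s = 'hello 123,hello 456'

# Each match of "word space word" is replaced by the same fixed text.
fixed = pattern.sub('hello world', s)
print(fixed)    # hello world,hello world

# \1 and \2 refer to the two captured groups, so this swaps their order.
swapped = pattern.sub(r'\2 \1', s)
print(swapped)  # 123 hello,456 hello
```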

posted @ 2018-03-10 21:42 Bob__Zhang Views (151) Comments (0) Recommendations (0)

Regular expression applications 2

Summary:
import re
# the \. is mandatory; \d+ requires one or more digits
pattern = re.compile(r'\d+\.\d*')
d = pattern.finditer('3.141592653 PI 100 10001.11 3. .8 0.9')
print(d)
for item in d:
    print(item)

posted @ 2018-03-10 21:41 Bob__Zhang Views (102) Comments (0) Recommendations (0)
