junrob

day9

Summary: ''' Home page: icon URL, https://www.wandoujia.com/category/6001 32 ''' import requests from bs4 import BeautifulSoup from pymongo import MongoClient # Connect the MongoDB client client = MongoClient('localh... Read more

posted @ 2019-06-22 10:02 junrob Views (137) Comments (0) Recommendations (0)

day8

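Summary: Class notes: 1. BeautifulSoup parsing library 2. MongoDB storage 3. requests-html request library. BeautifulSoup: 1. What is bs4, and why use it? It is a parsing library that works on top of an underlying parser such as html.parser or lxml and provides powerful parsing features, which makes data extraction and crawler development more efficient. 2. Installation and usage: pip3 install beautifulsoup4 # install bs4 pip3 inst... Read more

posted @ 2019-06-22 09:55 junrob Views (146) Comments (0) Recommendations (0)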

day7

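Summary: Today's topics: 1. Cracking the Geetest slider CAPTCHA 2. The BeautifulSoup parsing library ''' Cracking the Geetest slider CAPTCHA. cnblogs login url: https://account.cnblogs.com/signin?returnUrl=https%3A%2F%2Fwww.cnblogs.com%2F Code logic: 1. Enter the username and password, then click log in 2. When the slider CAPTCHA pops up, get the... Read more

posted @ 2019-06-22 09:43 junrob Views (92) Comments (0) Recommendations (0)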

day6

Summary: # Today's topics: # Note: the browser driven by selenium is a clean one, with no cache at all # 1. ##### from selenium import webdriver ## driver = webdriver.Chrome(r'D:\BaiduNetdisk\BaiduNetdiskDownload\chrom Read more

posted @ 2019-06-18 22:03 junrob Views (126) Comments (0) Recommendations (0)

day5

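Summary: Logging in to GitHub with a POST request ''' import requests import re # Step 1: visit the login page to obtain the token ''' Request URL: https://github.com/login Request method: GET Response headers: Set-Cookie Request headers: Cookie User-Agent ''' heade Read more

posted @ 2019-06-17 22:27 junrob Views (124) Comments (0) Recommendations (0)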
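
The day5 summary above starts by visiting the login page to obtain a token before any credentials are posted. A minimal sketch of that extraction step follows, assuming the token sits in a hidden input named authenticity_token. The sample HTML, the pattern, and the helper name extract_token are hypothetical illustrations, not the post's own code, and no network request is made.

```python
import re

# Hypothetical fragment of a login page. In a real run this text would come
# from requests.get('https://github.com/login').text.
SAMPLE_LOGIN_HTML = '''
<form action="/session" method="post">
  <input type="hidden" name="authenticity_token" value="abc123XYZ==" />
  <input type="text" name="login" />
</form>
'''

# Assumes the name attribute appears before the value attribute in the tag.
TOKEN_PATTERN = re.compile(
    r'<input[^>]*name="authenticity_token"[^>]*value="([^"]*)"'
)


def extract_token(html):
    """Return the hidden token value, or None if the field is absent."""
    match = TOKEN_PATTERN.search(html)
    return match.group(1) if match else None


token = extract_token(SAMPLE_LOGIN_HTML)
print(token)
```

The extracted value would then be sent back in the POST form data, together with the cookies returned by the first response.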

day4

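Summary: Review of yesterday. The full crawler workflow: 1. Send the request (request libraries): requests module, selenium module 2. Receive the response data (returned by the server) 3. Parse and extract the data (parsing libraries): bs4, XPath 4. Save the data (storage): MongoDB. Steps 1, 3, and 4 have to be written by hand. Scraping Pear Video: 1. Analyze the site to find the video source URL 2. Send a request to the video source URL with requests. Today's... Read more

posted @ 2019-06-17 22:20 junrob Views (141) Comments (0) Recommendations (0)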
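
The four-step workflow in the day4 summary above (request, response, parse, save) can be sketched as three small functions. This is only an illustration under stated assumptions: fetch returns a hypothetical inline page instead of calling requests, the standard-library html.parser stands in for bs4 or XPath, and a plain list stands in for MongoDB. All function names and URLs here are made up for the example.

```python
from html.parser import HTMLParser


def fetch(url):
    """Steps 1-2 stand-in: a real crawler would return requests.get(url).text."""
    # Hypothetical page content, so the sketch runs without network access.
    return (
        '<html><body>'
        '<video src="https://example.com/v/1.mp4"></video>'
        '<video src="https://example.com/v/2.mp4"></video>'
        '</body></html>'
    )


class VideoSrcParser(HTMLParser):
    """Step 3: collect the src attribute of every <video> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == 'video':
            src = dict(attrs).get('src')
            if src:
                self.sources.append(src)


def parse(html):
    parser = VideoSrcParser()
    parser.feed(html)
    return parser.sources


def save(urls, store):
    """Step 4 stand-in: append one record per URL to an in-memory list."""
    store.extend({'video_url': url} for url in urls)
    return store


store = []
save(parse(fetch('https://example.com/videos')), store)
print(store)
```

Keeping each step in its own function makes it straightforward to swap in the real request, parsing, and storage libraries later without touching the other steps.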

day3

Summary: 1. Remaining function topics 2. Built-in modules 3. Modules and packages Read more

posted @ 2019-06-13 22:47 junrob Views (74) Comments (0) Recommendations (0)

day2

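Summary: Today's topics: 1. Remaining built-in methods of the data types 2. Character encoding 3. File handling 4. Function basics. Homework 1. Homework 2. Read more

posted @ 2019-06-12 23:17 junrob Views (380) Comments (0) Recommendations (0)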

day1

Summary: In-class examples: print('hello world!') x = 10 y = 10 SCHOOL = '安徽工程大学' print(id(x)) print(id(y)) name = 'tank' print(type(x)) print(type(name)) print(x == y) print(x is y) name = input('input your name:') pwd = i... Read more

posted @ 2019-06-11 22:37 junrob Views (161) Comments (0) Recommendations (0)
