Web Scraping - Post Category - 杨鑫Zz

Web Scraping: 17k Novels

Summary: # coding=gbk import requests from lxml import etree url = 'https://www.17k.com/list/3357123.html' response = requests.get(url, headers={ 'User-Agent': …

posted @ 2021-10-25 16:07 杨鑫Zz Views(70) Comments(0) Recommended(0)
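The 17k post requests a chapter-list page with requests (sending a User-Agent header) and parses it with lxml's etree. A minimal sketch of that flow follows, under stated assumptions: the User-Agent string and the default `//a[@href]` XPath are placeholders (the post's own values are cut off), and `fetch_html` / `extract_links` are hypothetical helper names. Note that `# coding=gbk` only declares the encoding of the .py source file; the page's encoding is handled separately, here by letting requests guess it from the body.

```python
from __future__ import annotations

import requests
from lxml import etree

# Assumed browser-like header; the post's actual User-Agent value is truncated.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
}


def fetch_html(url: str) -> str:
    """Download a page with a browser-like User-Agent and return decoded text."""
    response = requests.get(url, headers=HEADERS, timeout=10)
    response.raise_for_status()
    # Guess the charset from the body so GBK or UTF-8 pages decode correctly.
    response.encoding = response.apparent_encoding
    return response.text


def extract_links(html: str, xpath: str = "//a[@href]") -> list[tuple[str, str]]:
    """Return (link text, href) pairs for every <a> element matched by `xpath`."""
    tree = etree.HTML(html)
    return [
        (a.xpath("string()").strip(), a.get("href"))
        for a in tree.xpath(xpath)
    ]
```

Usage (makes a network request): extract_links(fetch_html("https://www.17k.com/list/3357123.html")), then narrow the XPath to the chapter list after inspecting the page in the browser's developer tools.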

The Scrapy Web Scraping Framework

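Summary: - MongoDB - operations - Scrapy - installation: pip3 install scrapy (install Twisted first, then pywin32) - create a project: scrapy startproject <project name> - create a spider: switch into the project you created, then run scrapy genspider cnblogs www.cnblo…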

posted @ 2019-11-28 21:41 杨鑫Zz Views(114) Comments(0) Recommended(0)

Selenium

Summary: from selenium import webdriver import time bro=webdriver.Chrome() bro.get("http://www.baidu.com") bro.implicitly_wait(10) # 1. find_element_by_id: locate an element by its id…

posted @ 2019-11-27 21:27 杨鑫Zz Views(148) Comments(0) Recommended(0)

The Selenium Module

Summary: # Get an attribute: # tag.get_attribute('src') # Get the text content: # tag.text # Get the tag's ID, location, name, and size (for reference): # print(tag.id) # print(tag.location) # print(tag.tag_name) # print(tag.size…

posted @ 2019-11-27 21:24 杨鑫Zz Views(171) Comments(0) Recommended(0)
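The element properties listed in the summary can be gathered in one place. A minimal sketch; describe is a hypothetical helper that works on any Selenium WebElement (for example one returned by driver.find_element(By.TAG_NAME, "img")). One detail worth knowing: tag.id is the WebDriver's internal element reference, while the HTML id attribute comes from get_attribute("id").

```python
def describe(element) -> dict:
    """Collect the WebElement properties listed in the post into one dict."""
    return {
        "src": element.get_attribute("src"),  # attribute value, or None if absent
        "text": element.text,                 # visible text of the element
        "id": element.id,                     # Selenium's internal reference, not the HTML id
        "location": element.location,         # {'x': ..., 'y': ...} in pixels
        "tag_name": element.tag_name,         # e.g. 'img'
        "size": element.size,                 # {'height': ..., 'width': ...}
    }
```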

Scraping-Related Modules

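Summary: from requests_html import HTMLSession # request and parsing library import base64 # base64 encoding/decoding library from PIL import Image # image processing library import hmac # keyed-hash (HMAC) library from hashlib import sha1 # SHA-1 hash function import…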

posted @ 2019-11-27 21:20 杨鑫Zz Views(103) Comments(0) Recommended(0)
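Of the modules imported in this post, base64, hmac and hashlib.sha1 are often combined to reproduce a request signature that a site computes in its JavaScript. A minimal sketch of an HMAC-SHA1 signature in hex and in base64 form; the helper names are hypothetical, and which key, message and output format a given site uses is an assumption to verify per site. Neither step is encryption: base64 is a reversible encoding and HMAC-SHA1 is a keyed hash.

```python
import base64
import hmac
from hashlib import sha1


def hmac_sha1_hex(key: str, message: str) -> str:
    """HMAC-SHA1 of `message` under `key`, as a 40-character hex string."""
    return hmac.new(key.encode("utf-8"), message.encode("utf-8"), sha1).hexdigest()


def hmac_sha1_b64(key: str, message: str) -> str:
    """The same 20-byte HMAC-SHA1 digest, base64-encoded (28 characters)."""
    digest = hmac.new(key.encode("utf-8"), message.encode("utf-8"), sha1).digest()
    return base64.b64encode(digest).decode("ascii")
```

As a sanity check, the second HMAC-SHA1 test case in RFC 2202 (key "Jefe", message "what do ya want for nothing?") gives effcdf6ae5eb2fa2d27416d5f184df9c259a7c79 from hmac_sha1_hex.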

Web Scraping: requests_html

Summary: For details, see the documentation (Chinese edition): https://cncert.github.io/requests-html-doc-cn/#/?id=%e5%ae%89%e8%a3%85

posted @ 2019-11-27 21:18 杨鑫Zz Views(111) Comments(0) Recommended(0)

Web Scraping: XPath

Summary: doc=''' <html> <head> <base href='http://example.com/' /> <title>Example website</title> </head> <body> <div id='images'> <a href='image1.html' a="xxx…

posted @ 2019-11-27 21:14 杨鑫Zz Views(222) Comments(0) Recommended(0)
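The XPath post builds an HTML string and queries it. A minimal sketch with lxml's etree (the parser used in the 17k post); the sample HTML is a shortened stand-in modelled on the opening of the post's doc string, not the post's full document, and the variable names are illustrative.

```python
from lxml import etree

# Sample document: a stand-in, since the post's own string is truncated.
doc = """
<html>
 <head>
  <base href='http://example.com/' />
  <title>Example website</title>
 </head>
 <body>
  <div id='images'>
   <a href='image1.html'>Name: My image 1 <br /><img src='image1_thumb.jpg' /></a>
   <a href='image2.html'>Name: My image 2 <br /><img src='image2_thumb.jpg' /></a>
  </div>
 </body>
</html>
"""

html = etree.HTML(doc)

title = html.xpath("//title/text()")                            # ['Example website']
base_href = html.xpath("//base/@href")                          # ['http://example.com/']
links = html.xpath("//div[@id='images']/a/@href")               # ['image1.html', 'image2.html']
first_thumb = html.xpath("//div[@id='images']/a[1]/img/@src")   # ['image1_thumb.jpg']
by_text = html.xpath("//a[contains(text(), 'image 2')]/@href")  # ['image2.html']
```

Every xpath() call returns a list, even when only one node matches, so take [0] only after checking the list is non-empty.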

Selenium: Opening a Web Page from PyCharm

Summary: from selenium import webdriver from selenium.webdriver.common.keys import Keys # keyboard key operations import time # from selenium.webdriver.chrome.options import Optio…

posted @ 2019-11-26 20:24 杨鑫Zz Views(901) Comments(0) Recommended(0)

Web Scraping: bs4

Summary: import requests # res = requests.get('http://httpbin.org/get') # res1 = res.json() # parse the JSON response body into Python objects # import json # res1=json.loads(response.text) # too cumbersome # What is SSL? It is…

posted @ 2019-11-26 20:15 杨鑫Zz Views(150) Comments(0) Recommended(0)
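In the bs4 post, res.json() and json.loads(response.text) produce the same Python objects in the common case; .json() is just the shorter form. For the BeautifulSoup side, a minimal sketch using the built-in html.parser backend; summarize is a hypothetical helper and the test markup is made up.

```python
from __future__ import annotations

from bs4 import BeautifulSoup


def summarize(html: str) -> tuple[str | None, list[tuple[str, str]]]:
    """Return the page <title> text and (text, href) for each <a> with an href."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.get_text(strip=True) if soup.title else None
    # href=True keeps only <a> tags that actually carry an href attribute.
    links = [(a.get_text(strip=True), a["href"]) for a in soup.find_all("a", href=True)]
    return title, links
```

To feed it live HTML: summarize(requests.get(url, timeout=10).text); for a JSON endpoint such as http://httpbin.org/get, requests.get(...).json() returns the parsed dict directly.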

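You and I, Meeting at the Summit

You and I meet by chance upon the summit; people leave and return through the misty rain. I smile at today's fickle hearts: everyone in the world is only fooling themselves.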
