sgj191024 - 博客园

October 2023

Summary: movie.py import scrapy from movieProject.items import MovieprojectItem class MovieSpider(scrapy.Spider): name = 'movie' allowed_domains = ['www.ygdy8. [...]

posted @ 2023-10-05 09:48 sgj191024 Views (49) Comments (0) Recommendations (0)

Scrapy practice: Dangdang (当当网)

Summary: def parse(self, response): print('当当网') li = response.xpath('//ul[@id="component_59"]/li') # src, name, and price share a common parent element li, but the first li has no data-original, so iterate based on l [...]

posted @ 2023-10-04 16:13 sgj191024 Views (3) Comments (0) Recommendations (0)

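Getting started with the Scrapy framework

Summary: 1. Create a Scrapy project: in the terminal, run scrapy startproject <project name>. 2. Create a .py file under the spiders folder: scrapy genspider baidu http://www.baidu.com. 3. In settings.py, set ROBOTSTXT_OBEY = False. 4. Run the spider file [...]

posted @ 2023-10-04 00:53 sgj191024 Views (8) Comments (0) Recommendations (0)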

Gushiwen (古诗词网) login: handling the QR code

Summary: import requests from lxml import etree import urllib.request url = 'https://so.gushiwen.cn/user/login.aspx?from=http://so.gushiwen.cn/user/collect.asp [...]

posted @ 2023-10-03 12:15 sgj191024 Views (7) Comments (0) Recommendations (0)

Basic usage of requests

Summary: import requests url = 'http://www.baidu.com' res = requests.get(url) # fix garbled characters in the response res.encoding = 'utf-8' print(res.text) 3. Attributes and type of the response. Type: models.Re [...]

posted @ 2023-10-03 00:43 sgj191024 Views (15) Comments (0) Recommendations (0)

Selenium basic operations

Summary: from selenium import webdriver path = 'chromedriver.exe' broswer = webdriver.Chrome(path) url = 'http://www.baidu.com' broswer.get(url) Locating elements: 1. find_e [...]

posted @ 2023-10-02 02:17 sgj191024 Views (38) Comments (0) Recommendations (0)

Batch-scraping multiple images across multiple pages

Summary: import urllib.request from lxml import etree # https://sc.chinaz.com/tupian/siwameinvtupian.html url = 'https://sc.chinaz.com/tupian/siwameinvtupian_2 [...]

posted @ 2023-10-01 17:01 sgj191024 Views (40) Comments (0) Recommendations (0)

Parsing with XPath

Summary: from lxml import etree # load a local file tree = etree.parse('bendi.html') print(tree) # / selects child elements, // selects descendant elements li = tree.xpath('//body/ul/li') print(li) print(len(l [...]

posted @ 2023-10-01 00:49 sgj191024 Views (5) Comments (0) Recommendations (0)
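
The difference between `/` (child) and `//` (descendant) mentioned in the summary can be tried without a local file. This is a minimal sketch rather than the post's code: it swaps lxml for the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath, and the HTML snippet and its `id` values are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed HTML snippet (ElementTree requires valid XML).
HTML = """<html>
  <body>
    <ul>
      <li id="l1">Beijing</li>
      <li id="l2">Shanghai</li>
    </ul>
    <div>
      <ul>
        <li>Shenzhen</li>
      </ul>
    </div>
  </body>
</html>"""

root = ET.fromstring(HTML)  # root is the <html> element

# "/" steps go down one level at a time: only <li> directly under body/ul.
child_li = root.findall("./body/ul/li")

# "//" searches all descendants: every <li> anywhere below the root.
all_li = root.findall(".//li")

# Attribute predicate: the <li> whose id is "l1".
first = root.find(".//li[@id='l1']")

print(len(child_li), len(all_li), first.text)
```

With lxml, the same expressions would be passed to `tree.xpath(...)` as in the summary; ElementTree needs the leading `.` because it evaluates paths relative to an element.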

September 2023

Fetching personal-center information requires sending the cookie

Summary: import urllib.request import urllib.parse headers = { 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) [...]

posted @ 2023-09-30 22:05 sgj191024 Views (56) Comments (0) Recommendations (0)
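
A minimal sketch of the idea in the title, using `urllib.request` as the post does. The URL and the cookie value are placeholders and are not taken from the post; a real cookie would be copied from a logged-in browser session.

```python
import urllib.request

# Placeholder values for illustration only (assumptions, not from the post).
url = "https://example.com/profile"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    # Without this header a site will typically treat the request as logged out.
    "Cookie": "sessionid=PLACEHOLDER",
}

# Attach the headers, including the cookie, to the request object.
request = urllib.request.Request(url=url, headers=headers)


def fetch(req):
    """Send the request and return the decoded body (needs network access)."""
    with urllib.request.urlopen(req) as response:
        return response.read().decode("utf-8")


# Inspect the request locally; call fetch(request) to actually send it.
print(request.get_header("Cookie"))
```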

Scraping KFC restaurants with an AJAX POST request

Summary: import urllib.request import urllib.parse import json def getKenData(index): headers = { 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) Appl [...]

posted @ 2023-09-30 20:14 sgj191024 Views (11) Comments (0) Recommendations (0)
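
The summary cuts off before the request is built, so the following is only a sketch of how a paged POST request might be assembled with `urllib`. The endpoint, the form field names, and the helper names `build_request` and `fetch_json` are hypothetical; the real URL and fields are not shown in the summary.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint; the real one is not shown in the summary.
BASE_URL = "https://example.com/store/list"


def build_request(index):
    """Build a POST request for one page of results."""
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}
    # Hypothetical form fields for a paged search.
    form = {"keyword": "beijing", "pageIndex": index, "pageSize": 10}
    # POST bodies must be bytes: urlencode first, then encode.
    data = urllib.parse.urlencode(form).encode("utf-8")
    return urllib.request.Request(url=BASE_URL, data=data, headers=headers)


def fetch_json(req):
    """Send the request and parse the JSON body (needs network access)."""
    with urllib.request.urlopen(req) as response:
        return json.loads(response.read().decode("utf-8"))


req = build_request(1)
print(req.get_method(), req.data)
```

Passing `data` is what turns the request into a POST; calling `build_request` in a loop over page indices would cover several pages.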
