Post category - Python basics
Summary: Creating an empty Excel file: import pandas as pd # a DataFrame corresponds to an Excel sheet df = pd.DataFrame() df.to_excel("D:/pycode/output/output.xlsx") df = pd.DataFrame({"ID":[1,2,3],"Name": …
Read more
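The truncated snippet above can be sketched out in full; a minimal sketch assuming pandas plus an Excel writer engine such as openpyxl are installed, with a relative output path and made-up "Name" values substituted for the truncated originals:

```python
import pandas as pd

# An empty DataFrame corresponds to an empty Excel sheet
empty = pd.DataFrame()

# Columns are built from a dict of lists; the "Name" values are
# assumptions, since the original snippet is cut off after "Name":
df = pd.DataFrame({"ID": [1, 2, 3], "Name": ["a", "b", "c"]})

# index=False keeps the automatic 0..n-1 row index out of the sheet
df.to_excel("output.xlsx", index=False)
```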
Summary: def add(x,y): return x + y sum = add(3,5) #print(sum) dict = {"add":add} sum1 = dict.get("add")(4,6) Pass a list in as an argument and append elements inside the called function; the original list gets the new elements too: def …
Read more
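Both ideas in this post fit in one self-contained sketch (names like `dispatch` and `my_list` are illustrative, not from the original):

```python
def add(x, y):
    return x + y

# Functions are first-class objects: store one in a dict, look it up, call it
dispatch = {"add": add}
result = dispatch.get("add")(4, 6)   # same as add(4, 6)

# Lists are passed by reference: appending inside the function
# also mutates the caller's list
def append_item(items, value):
    items.append(value)

my_list = [1, 2]
append_item(my_list, 3)
# my_list now contains the appended element
```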
Summary: 1. Create the project: scrapy startproject dushuproject 2. Change to the spiders path: cd dushuproject\dushuproject\spiders 3. Create the spider class: scrapy genspider read www.dushu.com import scrapy …
Read more
Summary: import scrapy import json class TransferpostSpider(scrapy.Spider): name = 'transferPost' allowed_domains = ['fanyi.baidu.com'] # start_urls = ['http:/ …
Read more
Summary: settings.py DB_HOST = 'localhost' DB_PORT = 3306 DB_USER = 'root' DB_PWD = '1234' DB_NAME = 'guli' DB_CHARSET = 'utf8' # Configure item pipelines # Se …
Read more
Summary: movie.py import scrapy from movieProject.items import MovieprojectItem class MovieSpider(scrapy.Spider): name = 'movie' allowed_domains = ['www.ygdy8. …
Read more
Summary: def parse(self, response): print('当当网') li = response.xpath('//ul[@id="component_59"]/li') # src, name and price share a common parent li, but the first li has no data-original, so iterate based on the l …
Read more
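The data-original fallback described in the comment can be sketched with lxml on a made-up fragment: lazily loaded images keep the real address in data-original, while the first li (already rendered) only has src. The HTML below is invented for illustration:

```python
from lxml import etree

# Made-up fragment: first li has only src, the second carries the
# real address in data-original (lazy loading)
html = """
<ul id="component_59">
  <li><img src="real1.jpg"/></li>
  <li><img src="placeholder.gif" data-original="real2.jpg"/></li>
</ul>
"""
tree = etree.fromstring(html)

srcs = []
for li in tree.xpath('//ul[@id="component_59"]/li'):
    # Prefer data-original; fall back to src for the first item
    src = li.xpath('./img/@data-original')
    if not src:
        src = li.xpath('./img/@src')
    srcs.append(src[0])
# srcs holds the real image address for every li
```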
Summary: 1. Create a scrapy project: in the terminal run scrapy startproject <project name> Create a py file under the spiders folder: scrapy genspider baidu http://www.baidu.com settings.py: ROBOTSTXT_OBEY = False 4. Run the spider file …
Read more
Summary: import requests from lxml import etree import urllib.request url = 'https://so.gushiwen.cn/user/login.aspx?from=http://so.gushiwen.cn/user/collect.asp …
Read more
Summary: import requests url = 'http://www.baidu.com' res = requests.get(url) # fix mojibake in the response res.encoding = 'utf-8' print(res.text) 3. Attributes and type of response. Type: models.Re …
Read more
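The `res.encoding = 'utf-8'` line matters because turning raw bytes into text requires the right codec; decoding with the wrong one produces mojibake. A stdlib-only sketch of the same effect, with no network needed:

```python
# What a server actually sends is bytes, not text
raw = "百度一下".encode("utf-8")

garbled = raw.decode("latin-1")   # wrong codec: unreadable mojibake
correct = raw.decode("utf-8")     # right codec: the original text
```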
Summary: from selenium import webdriver path = 'chromedriver.exe' browser = webdriver.Chrome(path) url = 'http://www.baidu.com' browser.get(url) Element locating: 1.find_e …
Read more
Summary: import urllib.request from lxml import etree # https://sc.chinaz.com/tupian/siwameinvtupian.html url = 'https://sc.chinaz.com/tupian/siwameinvtupian_2 …
Read more
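When saving each scraped image, a local filename is usually derived from its URL; a stdlib sketch (the image URL below is made up for illustration):

```python
import os.path
import urllib.parse

img_url = 'https://scpic.chinaz.net/files/pic/pic9/202101/apic1234.jpg'

# Take the path component of the URL and keep only its last segment
path = urllib.parse.urlparse(img_url).path
filename = os.path.basename(path)

# urllib.request.urlretrieve(img_url, filename) would then download it
```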
Summary: from lxml import etree # load a local file tree = etree.parse('bendi.html') print(tree) # / selects child elements, // selects descendants at any depth li = tree.xpath('//body/ul/li') print(li) print(len(l …
Read more
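The child vs. descendant distinction can be checked without a local bendi.html by parsing an inline string instead (the fragment below is invented for illustration):

```python
from lxml import etree

html = """
<body>
  <ul>
    <li>a</li>
    <li>b</li>
    <div><li>nested</li></div>
  </ul>
</body>
"""
tree = etree.fromstring(html)

children = tree.xpath('//body/ul/li')   # / steps to direct children only
descendants = tree.xpath('//li')        # // matches li at any depth
# the nested li is counted only by //
```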
Summary: import urllib.request import urllib.parse headers = { 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) …
Read more
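The headers dict is typically attached by wrapping the URL in a Request object, with urllib.parse handling the percent-escaping of non-ASCII query values; a minimal stdlib sketch (the search term is illustrative, and nothing is actually sent):

```python
import urllib.request
import urllib.parse

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'
}

# urlencode percent-escapes the non-ASCII query value
base = 'https://www.baidu.com/s?'
query = urllib.parse.urlencode({'wd': '周杰伦'})
url = base + query

# The Request carries the custom UA header;
# urllib.request.urlopen(req) would send it
req = urllib.request.Request(url=url, headers=headers)
```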
Summary: import urllib.request import urllib.parse import json def getKenData(index): headers = { 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) Appl …
Read more
Summary: import urllib.request url = 'https://movie.douban.com/j/chart/top_list?type=5&interval_id=100%3A90&action=&start=0&limit=20' headers = { 'User-Agent': …
Read more
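Responses from an ajax interface like this are JSON text, so the usual last step is json.loads; a sketch with a made-up payload shaped like a top-list response (the titles and scores are invented):

```python
import json

# Made-up sample shaped like the endpoint's JSON array of movies
content = '[{"title": "A", "score": "9.1"}, {"title": "B", "score": "8.7"}]'

movies = json.loads(content)           # str -> list of dicts
titles = [m["title"] for m in movies]  # pick out one field per movie
```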
Summary: Raw data: from: en to: zh query: love transtype: realtime simple_means_flag: 3 sign: 198772.518981 token: 1b434ed1e595135ac1b2959f4430a51f domain: common …
Read more
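To POST a form like this, the key/value pairs are urlencoded and then encoded to bytes; a stdlib sketch using a subset of the fields listed above:

```python
import urllib.parse

form = {
    'from': 'en',
    'to': 'zh',
    'query': 'love',
    'simple_means_flag': '3',
}

# POST bodies must be bytes: dict -> urlencoded str -> utf-8 bytes;
# the result is suitable as the data= argument of urllib.request.Request
data = urllib.parse.urlencode(form).encode('utf-8')
```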
Summary: import urllib.request import urllib.parse headers = { 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) …
Read more
Summary: import urllib.request url = "http://www.baidu.com" response = urllib.request.urlopen(url) content = response.read().decode('utf-8') print(content) If you don't …
Read more
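read() returns bytes, which is why the decode('utf-8') step is needed before printing. This can be seen offline with a data: URL, which urlopen also supports, so no network is required:

```python
import urllib.request

# A data: URL carries its payload inline, so urlopen needs no network
response = urllib.request.urlopen('data:text/plain;charset=utf-8,hello%20world')

content_bytes = response.read()          # bytes, not str
content = content_bytes.decode('utf-8')  # bytes -> readable text
```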
Summary: JSON serialization and deserialization: file1 = open('test1.txt','r') content = file1.read() print(content) result = json.loads(content) print(result) print(type(result)) for i …
Read more
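A round trip of the two steps, written with a temp directory so it runs anywhere (the test1.txt path and the sample dict are stand-ins for the post's own file and data):

```python
import json
import os
import tempfile

data = {"name": "Tom", "age": 20}

# Serialize: Python object -> JSON string -> file
path = os.path.join(tempfile.mkdtemp(), "test1.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write(json.dumps(data))

# Deserialize: file -> JSON string -> Python object (a dict again)
with open(path, "r", encoding="utf-8") as f:
    result = json.loads(f.read())
```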