Tencent Cloud, Crawlers, and WeChat Scripts
A record of some fun code.
1. Upload to Tencent Cloud COS with Python, then write the result to a database table (install the Tencent Cloud dependency: pip install -U cos-python-sdk-v5)
import pymysql
from qcloud_cos import CosConfig
from qcloud_cos import CosS3Client
import sys
import logging
import time
import os

logging.basicConfig(level=logging.INFO, stream=sys.stdout)

secret_id = '????'
secret_key = '????'
region = '????'
token = None
scheme = 'https'

# Read this directory and collect the file names into a list
list1 = os.listdir('directory to upload from')

# Database connection settings
conn = pymysql.Connect(host='?', port=?, user='?', passwd='?', db='?',
                       charset='utf8')
cursor = conn.cursor()

config = CosConfig(Region=region, SecretId=secret_id, SecretKey=secret_key,
                   Token=token, Scheme=scheme)
client = CosS3Client(config)

for i in list1:
    file_name = str(time.time()) + i
    # Upload the video to Tencent Cloud COS
    with open('file path\\' + i, 'rb') as fp:
        client.put_object(
            Bucket='bucket name',
            Body=fp,
            Key='bucket folder name' + file_name,
            StorageClass='STANDARD',  # the original 'test/' is not a valid class; COS documents STANDARD / STANDARD_IA / ARCHIVE
            EnableMD5=False
        )
    logging.info("~~~~~~~~~%s uploaded~~~~~~~~~" % file_name)

    url = 'bucket base URL' + file_name
    cover_pic = "anything"
    sort = anything
    video_name = anything
    sql = f'''
        insert into table_name(col1, col2, col3, ...)
        values(val1, val2, ... '{url}', '{cover_pic}', '{sort}' ...)
    '''
    cursor.execute(sql)
    conn.commit()
    logging.info("~~~~~~~~~%s written to the table~~~~~~~~~" % file_name)
logging.info("~~~~~~~~done~~~~~~~~~")
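One caveat on the SQL above: building the statement with an f-string breaks as soon as a file name contains a quote, and it is open to injection. Here is a minimal sketch of the same insert using pymysql's parameter binding, with hypothetical table and column names (videos, url, cover_pic, sort):

import pymysql

# Hypothetical connection settings and table/column names, for illustration only
conn = pymysql.Connect(host='?', port=3306, user='?', passwd='?', db='?', charset='utf8')
cursor = conn.cursor()

url, cover_pic, sort = 'https://bucket-url/xxx.mp4', 'anything', 1
# %s placeholders let pymysql escape every value itself
sql = "insert into videos(url, cover_pic, sort) values(%s, %s, %s)"
cursor.execute(sql, (url, cover_pic, sort))
conn.commit()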
Back when I wrote the Tencent Cloud upload and download in Java it took me ages; doing it in Python now felt like familiar ground and was done in no time. The choice of language seems to matter surprisingly little.
2. Generate a formatted YAML file (install the dependency: pip install ruamel.yaml)
The feature is trivial, but when you need it in a hurry you won't be swearing at the yaml package.
from ruamel.yaml import YAML

yaml = YAML()
src_data = {
    'user': {
        'name': '可优',
        'age': 17,
        'money': None,
        'gender': True
    },
    'lovers': ['柠檬小姐姐', '橘子小姐姐', '小可可']
}
with open('aa.yaml', 'w', encoding='utf-8') as f:
    yaml.dump(src_data, f)

The resulting aa.yaml:

user:
  name: 可优
  age: 17
  money:
  gender: true
lovers:
- 柠檬小姐姐
- 橘子小姐姐
- 小可可
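Reading it back works through the same YAML object; a quick round-trip sketch against the aa.yaml written above:

from ruamel.yaml import YAML

yaml = YAML()
with open('aa.yaml', 'r', encoding='utf-8') as f:
    data = yaml.load(f)  # dict-like structure, key order preserved
print(data['user']['name'])  # 可优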
3. A simple crawler
There is far too much to crawling; I have only scratched the surface, just enough to grab the things I need.
A small regex trick for extracting data. Drop it onto real page source and it becomes genuinely useful; learn one .*? and you can do a lot.
import re

s = '''
<div class="jjs"><span id="1">大聪明</span></div>
<div class="jjss"><span id="2">大聪</span></div>
<div class="jjsss"><span id="3">大dada</span></div>
<div class="jjssss"><span id="4">大聪明a </span></div>
'''
obj = re.compile(r'<div class="(?P<class>.*?)"><span id="\d+">(?P<name>.*?)</span></div>', re.S)
ret = obj.finditer(s)
for i in ret:
    # print(i.group("class"))
    print(i.group("name"))
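If it is not obvious what the ? buys you, here is a tiny sketch contrasting greedy .* with lazy .*? on made-up markup:

import re

s = '<b>first</b><b>second</b>'
print(re.findall(r'<b>(.*)</b>', s))   # greedy: ['first</b><b>second']
print(re.findall(r'<b>(.*?)</b>', s))  # lazy:   ['first', 'second']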
Here is a Baidu Translate example.
import requests

# Baidu Translate
url = 'https://fanyi.baidu.com/sug'
word = input('Enter the English you want translated: ')
data = {"kw": word}
res = requests.post(url, data=data)
print(res.json())

url = 'https://fanyi.baidu.com/v2transapi?from=zh&to=en'
word2 = input('Enter the Chinese you want translated: ')
data2 = {"kw": word2}
res = requests.post(url, data=data2)
print(res.json())
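The sug endpoint answers with JSON. Assuming the usual {'errno': ..., 'data': [{'k': ..., 'v': ...}]} shape (this is Baidu's internal format and may change at any time), pulling out just the suggestions looks like:

import requests

res = requests.post('https://fanyi.baidu.com/sug', data={"kw": "dog"})
# Assumed shape: {"errno": 0, "data": [{"k": "dog", "v": "n. 狗; ..."}, ...]}
for item in res.json().get("data", []):
    print(item["k"], "->", item["v"])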
Here is a Pearvideo example. The video has anti-hotlinking, so studying a crawler really is a battle of wits with the page source.
In the headers, some sites need "User-Agent", some need "Referer"; you have to experiment.
import requests

url = "https://www.pearvideo.com/video_1756213"
cont_id = url.split('_')[1]
header = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.82 Safari/537.36",
    # The Referer is what gets us past the anti-hotlink check
    "Referer": url,
}
srcUrl = "https://www.pearvideo.com/videoStatus.jsp?contId=%s&mrd=0.389415005000727" % cont_id
resp = requests.get(srcUrl, headers=header)
dic = resp.json()
urll = dic["videoInfo"]["videos"]["srcUrl"]
systemTime = dic["systemTime"]
# The returned URL embeds a timestamp; swapping it for "cont-<id>" yields the real address
urll = urll.replace(systemTime, "cont-%s" % cont_id)
with open("video.mp4", 'wb') as f:
    f.write(requests.get(urll).content)
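For larger videos it is kinder to memory to stream the body to disk instead of buffering it whole; a sketch with requests' stream=True (urll and header stand in for the values recovered above):

import requests

urll = "the real video URL recovered above"  # placeholder
header = {"Referer": "https://www.pearvideo.com/video_1756213"}

with requests.get(urll, headers=header, stream=True, timeout=20) as r:
    with open("video.mp4", "wb") as f:
        for chunk in r.iter_content(chunk_size=1024 * 1024):  # 1 MB chunks
            f.write(chunk)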
An xpath example. xpath works off the page's hierarchy, which neatly sidesteps dynamic attributes inside tags, and the browser can copy an element's xpath from the right-click menu, which is very convenient.
import requests
from lxml import etree
import csv

url = "https://www.softwareadvice.com/categories/"
resp = requests.get(url, timeout=20)
resp.encoding = 'utf-8'
html = etree.HTML(resp.text)
divs = html.xpath(
    "/html/body/app-root/main/app-categories-container/div/section[2]/app-category-list/div")
for div in divs:
    first_name = div.xpath("./h2/a/text()")
    second_name = div.xpath("./ul/li/a/text()")
    second_url = div.xpath("./ul/li/a/@href")
    for i in range(len(second_url)):
        try:
            response = requests.get(second_url[i], timeout=20)
            child_html = etree.HTML(response.text)
            child_divs = child_html.xpath('//*[@id="product-catalog"]/div/section[2]/div/div')
            # Skip the first div (presumably a header row)
            for di in child_divs[1:]:
                product_logo = di.xpath("./a/div/img/@src")
                product_name = di.xpath("./div/a/h3/text()")
                product_score = di.xpath("./div/div[1]/p/strong/text()")
                args = {
                    "first_name": first_name,
                    "second_name": second_name[i],
                    "name": product_name[0],
                    "logo": product_logo[0],
                    "score": product_score[0]
                }
                with open('product_data.csv', 'a') as f:
                    csv.writer(f).writerow(args.values())
        except Exception as e:
            print(e)
When requests.get() cannot reach the address it can block forever, so pass a timeout; once the limit is exceeded an exception is raised, catch it with try and move on to the next one.
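That try/timeout pattern is easy to wrap once and reuse; a sketch of a small retry helper (the retry count and backoff are arbitrary choices, not anything requests prescribes):

import time
import requests

def get_with_retry(url, retries=3, timeout=20):
    # Try the GET a few times, backing off between attempts; re-raise the last error
    for attempt in range(retries):
        try:
            return requests.get(url, timeout=timeout)
        except requests.RequestException:
            if attempt == retries - 1:
                raise
            time.sleep(2 ** attempt)

# resp = get_with_retry("https://www.softwareadvice.com/categories/")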
4. Don't know how to confess your feelings on WeChat? I've got you.
Write a script that sends the messages for you. WeChat will block you if it detects non-stop messaging; you can probably only get a bit over 100 through, which should be enough.
import time
from pynput.keyboard import Controller as key_cl
from pynput.mouse import Button, Controller


def keyboard_input(string):
    keyboard = key_cl()
    keyboard.type(string)


def mouse_click():
    mouse = Controller()
    mouse.press(Button.left)
    mouse.release(Button.left)


def main(number, string):
    time.sleep(5)  # five seconds to switch focus to the WeChat input box
    for i in range(number):
        keyboard_input(string)
        mouse_click()  # the mouse should be hovering over the Send button
        time.sleep(0.2)


if __name__ == '__main__':
    main(4, "what you want to say")  # configure the message and how many times to send it
This is only a bare-bones version. If you are motivated you could build a file of sweet (or filthy) lines and then use random numbers... ahem, I will stop there. Please stay within the socialist core values!
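For what it is worth, the random-pick idea is just random.choice over the lines of a file (lines.txt is a hypothetical one-phrase-per-line file):

import random

with open('lines.txt', encoding='utf-8') as f:  # hypothetical phrase file
    phrases = [line.strip() for line in f if line.strip()]

print(random.choice(phrases))  # pass this to main() instead of a fixed string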
5. That nine-image grid in someone's WeChat Moments looks so sharp. How is it done?
How could a programmer ever let you stay envious of someone else?
from PIL import Image
import os

# Load the image
im = Image.open("122.jpg")
# One third of the width and height
width = im.size[0] // 3
height = im.size[1] // 3
# Make sure the output folder exists, or the save below fails
os.makedirs("images", exist_ok=True)
# Crop starting from the top-left corner
start_x = 0
start_y = 0
im_name = 1
for i in range(3):
    for j in range(3):
        crop = im.crop((start_x, start_y, start_x + width, start_y + height))
        crop.save("images/" + str(im_name) + '.jpg')
        start_x += width
        im_name += 1
    start_x = 0
    start_y += height
6. Can't use Photoshop? Resizing an image is a small matter.
from PIL import Image

img = Image.open("00.jpg")
out = img.resize((358, 441))
out.save('000.jpg')
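resize() stretches to exactly the size you give it. If you want to shrink an image while keeping its aspect ratio, Pillow's thumbnail() caps both dimensions in place:

from PIL import Image

img = Image.open("00.jpg")
img.thumbnail((358, 441))  # shrinks in place, aspect ratio preserved
img.save('000.jpg')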
That's all. Keep working, keep learning.