topass123 - 博客园 (cnblogs)

July 19, 2020

Summary: Install the packages: pip install pymysql, then pip install peewee. Create the ORM data model: from peewee import * db = MySQLDatabase("spider", host="127.0.0.1", port=3306, user="root", passw ...

posted @ 2020-07-19 15:57 topass123 Views (177) Comments (0) Recommended (0)

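July 17, 2020

Crawler - Data Storage (8)

Summary: Three common libraries for storing data from Python: pymysql (a database driver), SQLAlchemy, and peewee (both ORMs). Install: pip install pymysql (provides the MySQL driver that peewee depends on), then pip install peewee. The peewee implementation is as follows: from peewee import * db = MySQ ...

posted @ 2020-07-17 21:05 topass123 Views (135) Comments (0) Recommended (0)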

Crawler - CSS Selectors (7)

Summary: Basic syntax. Code implementation: html = """ <!DOCTYPE html> <html lang="en"> <head> <meta charset="UTF-8"> <title>bobby基本信息</title> <script src="//code.jquery.com/jquery-1 ...

posted @ 2020-07-17 15:12 topass123 Views (504) Comments (0) Recommended (0)

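Crawler - Using XPath (6)

Summary: What is XPath? 1) XPath uses path expressions to navigate XML and HTML documents. 2) XPath includes a standard function library. 3) XPath is a W3C standard. This post uses Scrapy's Selector, so the following dependencies are installed: pip install twisted, pip install lxml, pip insta ...

posted @ 2020-07-17 15:02 topass123 Views (130) Comments (0) Recommended (0)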
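
To make the idea of path expressions in the XPath entry above concrete, here is a minimal sketch. It deliberately swaps in the standard library's xml.etree.ElementTree, which supports only a limited subset of XPath, rather than the Scrapy Selector the post itself uses. The markup and the values in it are hypothetical sample data, not taken from the post.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup (not the HTML from the original post).
doc = ET.fromstring("""
<html>
  <body>
    <div id="info">
      <span class="name">bobby</span>
      <span class="age">33</span>
    </div>
  </body>
</html>
""")

# ".//span" searches all descendants; the predicate filters on an attribute value.
names = [el.text for el in doc.findall(".//span[@class='name']")]

# A step-by-step path from the root element; find() returns the first match.
first_span = doc.find("./body/div/span")

print(names, first_span.text)
```

With Scrapy installed, the same expressions would be passed to a Selector's xpath() method, which supports full XPath 1.0 instead of this subset.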

Crawler - Using BeautifulSoup (5)

Summary: Preparation: from bs4 import BeautifulSoup html = """ <!DOCTYPE html> <html lang="en"> <head> <meta charset="UTF-8"> <title>bobby基本信息</title> <script src="// ...

posted @ 2020-07-17 14:11 topass123 Views (86) Comments (0) Recommended (0)

Crawler - Regular Expressions (4)

Summary: import re # info = "姓名:bobby1987 生日:1987年10月1日本科:2005年9月1日" # # # print(re.findall("\d{4}", info)) # match_result = re.match(".*生日.*?(\d{4}).*本科.*?(\ ...

posted @ 2020-07-17 10:21 topass123 Views (79) Comments (0) Recommended (0)
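
The regular-expression entry above is cut off in the middle of its second pattern. The sketch below reuses the sample string from the summary and shows how findall and match behave on it. The tail of the pattern, the final (\d{4}) group, is an assumed completion for illustration and may differ from what the full post contains.

```python
import re

# Sample string taken from the summary (name, birthday, college enrollment date).
info = "姓名:bobby1987 生日:1987年10月1日本科:2005年9月1日"

# findall returns every non-overlapping run of exactly four digits.
years = re.findall(r"\d{4}", info)

# ASSUMPTION: the truncated pattern ends with a second (\d{4}) group.
# The lazy ".*?" stops at the first four-digit run after each keyword.
pattern = r".*生日.*?(\d{4}).*本科.*?(\d{4})"
match_result = re.match(pattern, info)
birth_year, college_year = match_result.groups()

print(years, birth_year, college_year)
```

Note that findall also picks up the 1987 embedded in the user name, which is why anchoring on the keywords 生日 and 本科 is the more reliable way to extract the two dates.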

July 16, 2020

Download photos of all New Oriental English teachers in Taiyuan

Summary: # import requests from bs4 import BeautifulSoup import shutil import string import os def get_img_url(url): img_url = [] while True: resp = requests.g ...

posted @ 2020-07-16 15:20 topass123 Views (126) Comments (0) Recommended (0)

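July 14, 2020

The throw and close methods of generators

Summary: throw does two things. First, it raises an exception inside the generator. Then, if the generator handles that exception, throw advances the generator by one iteration and returns the next yielded value. In the example above, an Exception was thrown into the generator and the generator handled it. Note that the exception is raised at the point where the previous iteration was suspended, because every time the generator runs, it resumes from where the last yield suspended ...

posted @ 2020-07-14 16:07 topass123 Views (179) Comments (0) Recommended (0)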
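
The behavior described in the generator entry above can be sketched as follows. The generator numbers() and the choice of ValueError are hypothetical and chosen only for illustration; the post's own example is not visible in this listing.

```python
def numbers():
    """Yield increasing integers; a ValueError thrown in resets the counter."""
    value = 0
    while True:
        try:
            yield value
            value += 1
        except ValueError:
            # The exception surfaces at the suspended yield. Handling it lets
            # the loop continue to the next yield, whose value throw() returns.
            value = -1


gen = numbers()
first = next(gen)    # runs to the first yield
second = next(gen)   # resumes after the yield, increments, yields again

# throw() raises inside the generator at the paused yield, then returns
# the next yielded value because the generator handled the exception.
after_throw = gen.throw(ValueError("reset"))

# close() raises GeneratorExit at the paused yield; it is not caught by
# "except ValueError", so the generator simply finishes.
gen.close()

try:
    next(gen)
    exhausted = False
except StopIteration:
    exhausted = True

print(first, second, after_throw, exhausted)
```

If the generator did not handle the thrown exception, it would propagate out of the throw() call to the caller instead.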

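The difference between blocking and non-blocking I/O, and I/O multiplexing

Summary: Reading: in blocking mode, if there is no data in the network buffer, read waits until data arrives and then copies it into the buffer supplied by the caller. If the amount of data available is smaller than the length requested, read does not keep waiting for more; it returns immediately. The rule for read is: up to the requested length, return whatever is available; if there is no data at all, keep waiting. So in general ...

posted @ 2020-07-14 14:42 topass123 Views (861) Comments (0) Recommended (0)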
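
The three modes named in the entry above can be contrasted in a small sketch. It assumes a local connected pair from socket.socketpair() standing in for a network connection; the variable names are hypothetical.

```python
import select
import socket

# A connected pair of sockets stands in for a network connection.
left, right = socket.socketpair()

# Non-blocking read with no data available: recv raises instead of waiting.
right.setblocking(False)
try:
    right.recv(1024)
    would_block = False
except BlockingIOError:
    would_block = True

# I/O multiplexing: select reports which sockets are readable right now.
ready_before, _, _ = select.select([right], [], [], 0)

left.sendall(b"hello")
ready_after, _, _ = select.select([right], [], [], 1.0)

# Blocking read: data is waiting, so recv returns immediately with the
# 5 available bytes even though up to 1024 were requested.
right.setblocking(True)
data = right.recv(1024)

left.close()
right.close()
print(would_block, len(ready_before), len(ready_after), data)
```

With select, a single thread can watch many sockets and only call recv on those that are ready, which avoids both blocking on an idle socket and busy-polling non-blocking ones.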

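July 12, 2020

Inter-thread communication in Python 3: understanding the queue's join() and task_done() methods

Summary: The difference between threading.Thread().join() and queue.join(): a thread's join() makes the main thread wait until the child thread has finished before continuing; a queue's join() makes the main thread wait until every task in the queue has been consumed (marked with task_done()) before continuing. Author: 747大雄. Link: https://www.jianshu.com/p/6c292fd34282 ...

posted @ 2020-07-12 22:00 topass123 Views (713) Comments (0) Recommended (0)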
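
The distinction drawn in the entry above can be sketched with one worker thread. The worker function, the None sentinel, and the doubling step are hypothetical choices for illustration, not code from the post.

```python
import queue
import threading

tasks = queue.Queue()
results = []


def worker():
    """Consume items until the None sentinel arrives."""
    while True:
        item = tasks.get()
        if item is None:
            tasks.task_done()
            break
        results.append(item * 2)
        # Each get() must be matched by task_done(), or tasks.join() never returns.
        tasks.task_done()


thread = threading.Thread(target=worker)
thread.start()

for n in range(5):
    tasks.put(n)

# Queue.join() returns once every item put so far has been marked done.
# The worker thread is still alive at this point, waiting on get().
tasks.join()
processed = list(results)

# Thread.join() returns only once the thread itself has exited.
tasks.put(None)
thread.join()
finished = not thread.is_alive()

print(processed, finished)
```

Forgetting a task_done() call is the usual reason a program hangs on the queue's join(): the queue's unfinished-task counter never reaches zero.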

The road is long and hard, but keep walking and you will arrive; keep going without stopping and the future is bright.