Python Daily Practice (4)

Introduction

Today's post continues the Python daily-practice series with two topics: simple sensitive-word detection and an image crawler.

Sensitive-Word Detection

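The sensitive-word detection here is fairly simple. Boiled down, it is just a substring match: for each word in the word list, check whether word in text.
Still, this exercise taught me a handy idiom. Previously, whenever I needed to build a list like this sensitive-word list, I would paste the text into Notepad++ and hand-edit it into list syntax.
Now the file can be turned into a list directly: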

with open('C:/Users/xxx/Desktop/filter_words.txt','r',encoding='utf-8') as f:
    # One item per line: strip trailing whitespace (including '\n') from each line and collect the results in a list
    filter_words = [line.rstrip() for line in f]
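The same idea can be wrapped in a small reusable function. This is a minimal sketch, not the post's code: load_word_list is a hypothetical name, and dropping blank lines is an added assumption (an empty entry would match every input, because '' in s is always True).

```python
from pathlib import Path


def load_word_list(path):
    """Return one stripped entry per non-blank line of a UTF-8 text file (hypothetical helper)."""
    text = Path(path).read_text(encoding='utf-8')
    # strip() removes the line ending plus any stray spaces;
    # blank lines are skipped because '' would match every input string
    return [line.strip() for line in text.splitlines() if line.strip()]
```

splitlines() also copes with Windows-style \r\n line endings, so the same function works for files saved in Notepad++ on Windows.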

Code for this exercise:

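# -*- coding:utf-8 -*-
# Author:Konmu
# Problem 0011: the sensitive-word file filtered_words.txt contains a list of words.
# When the user's input contains a sensitive word, print Freedom; otherwise print Human Rights.
# Problem 0012: using the same filtered_words.txt as problem 0011,
# replace each sensitive word in the user's input with asterisks (*).
# For example, the input 「北京是个好城市」 ("Beijing is a nice city") becomes 「**是个好城市」.

with open('C:/Users/xxx/Desktop/filter_words.txt','r',encoding='utf-8') as f:
    # Skip blank lines: an empty string would match every input
    filter_words = [line.rstrip() for line in f if line.strip()]

def client_Input():
    input_word = input("please input what you want to say:")
    found = False
    for i in filter_words:
        if i in input_word:
            found = True
            # Mask every occurrence of this word with the same number of asterisks
            input_word = input_word.replace(i, '*' * len(i))
    # Decide only after every word in the list has been checked
    if found:
        print("Freedom")
        return input_word
    return 'Human Rights'

if __name__ == "__main__":
    print(client_Input())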
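The loop above calls str.replace once per word, so the order of the word file matters when one sensitive word contains another: with 'ab' listed before 'abc', the input 'xabcx' becomes 'x**cx'. One alternative, a variation of my own rather than the exercise's reference solution, is to join all words into a single regular expression, longest first, and mask every match in one pass with re.subn. mask_sensitive and respond are hypothetical names.

```python
import re


def mask_sensitive(text, words):
    """Replace every sensitive word in text with '*' of the same length.

    Returns (masked_text, found), where found tells whether any word matched.
    """
    words = [w for w in words if w]  # ignore empty entries
    if not words:
        return text, False
    # Longest words first so that a longer word wins over a shorter word it contains;
    # re.escape keeps characters such as '.' or '+' from being read as regex syntax
    alternatives = sorted(words, key=len, reverse=True)
    pattern = re.compile('|'.join(re.escape(w) for w in alternatives))
    masked, count = pattern.subn(lambda m: '*' * len(m.group()), text)
    return masked, count > 0


def respond(text, words):
    """Hypothetical wrapper combining problems 0011 and 0012."""
    masked, found = mask_sensitive(text, words)
    if found:
        print("Freedom")
        return masked
    return 'Human Rights'
```

Sorting by length matters because Python's re tries the alternatives left to right at each position and keeps the first one that matches, not the longest one.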

Final result:

Image Crawler

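This exercise is a bit of a treat for the otaku crowd, just kidding (/ω＼)
First, here is the final result.
I have to admit I hadn't written an image crawler in a long time, and this exercise showed me how much I had forgotten. Clearly I need more practice.
Approach: the overall flow is the usual crawler pattern: download the page source, then match the image links with a regular expression, and finally download each image.
The code is as follows: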

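# -*- coding:utf-8 -*-
# Author:Konmu
# Write an image crawler in Python that downloads the Japanese girl pictures from this link :-)

import os
import re
import requests

url = 'https://tieba.baidu.com/p/2166231880?red_tag=0872956249'
save_dir = 'D:/py_tu'
os.makedirs(save_dir, exist_ok=True)  # create the output folder if it does not exist yet

session = requests.Session()
html = session.get(url, timeout=10).content.decode('utf-8')

# Capture the src value (without the surrounding quotes) of each image posted in the thread
pattern = r'<img pic_type="0" class="BDE_Image" src="(.*?)".*?>'
img_urls = re.findall(pattern, html)

for x, img_url in enumerate(img_urls):
    photo = session.get(img_url, timeout=10)
    # 'wb' rather than 'ab', so re-running the script overwrites files instead of appending to them
    with open('{}/output{}.jpg'.format(save_dir, x), 'wb') as f:
        f.write(photo.content)
    print("Saved image {} to {}".format(x, save_dir))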
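The regex in the crawler depends on the exact attribute order inside the img tag (pic_type, then class, then src), so findall silently returns an empty list if the page ever orders them differently. As an alternative, a swapped-in technique rather than the method used in this post, the standard-library html.parser can pick out the same tags regardless of attribute order. A minimal sketch; BDEImageParser and extract_image_urls are hypothetical names, and the class BDE_Image is taken from the post's regex.

```python
from html.parser import HTMLParser


class BDEImageParser(HTMLParser):
    """Collect the src of every <img> whose class list contains BDE_Image (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != 'img':
            return
        attrs = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        classes = (attrs.get('class') or '').split()
        if 'BDE_Image' in classes and attrs.get('src'):
            self.urls.append(attrs['src'])


def extract_image_urls(html):
    """Return the image URLs in document order."""
    parser = BDEImageParser()
    parser.feed(html)
    parser.close()
    return parser.urls
```

The returned list can stand in for img_urls in the download loop. Attribute values come back with HTML entities such as &amp; already decoded, and self-closing tags like <img ... /> are handled too.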

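Note: I originally wanted to download the images with urllib.request's urlretrieve(), but urllib failed to open the https URLs. It appears my Python had been compiled and installed without first building an SSL library such as OpenSSL, so it had no SSL support. Most fixes I found online target Linux; on Windows I tried installing and using Python's ssl library but still could not solve it, so I switched to fetching each image with requests and writing the bytes to a file myself, as in the code above.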
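If you run into the same problem, a quick check tells you whether your Python build lacks SSL support altogether (as suspected in the note above) or whether something else, such as certificate verification, is failing. A minimal sketch; has_ssl_support is a hypothetical helper, and it relies on urllib.request only defining HTTPSHandler when the ssl module imports successfully.

```python
import urllib.request


def has_ssl_support():
    """Return True if this Python build can open https:// URLs with urllib (hypothetical helper)."""
    try:
        import ssl  # noqa: F401  (raises ImportError on builds without SSL)
    except ImportError:
        return False
    # urllib.request only defines HTTPSHandler when SSL support is available
    return hasattr(urllib.request, 'HTTPSHandler')


if __name__ == '__main__':
    if has_ssl_support():
        import ssl
        print('SSL available:', ssl.OPENSSL_VERSION)
    else:
        print('This Python was built without SSL; https URLs will fail')
```

If it prints an OpenSSL version, SSL itself is present, and an https failure more likely has another cause, such as certificate verification.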

posted @ 2020-03-21 14:58 Konmu
