Post archive for 2019-11-22: Introduction to the Re Library ... - 武韵

Introduction to the Re Library

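Summary: 1. Regular expression syntax is built from characters and operators. . matches any single character (by default, any character except a newline). [] is a character set that gives the allowed values for a single character: [abc] matches a, b, or c, and [a-z] matches a single character from a to z. [^] is a negated character set that gives the excluded values for a single character: [^abc] matches any single character other than a, b, or c. * repeats the preceding character zero or more times: abc* matches ab, abc, ... Read more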
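The operators listed in this summary can be tried out with Python's built-in re module. The following is a minimal sketch, not code from the original post; the sample strings and variable names are made up for illustration.

```python
import re

# '.' matches any single character (except a newline by default).
any_char = re.findall(r"a.c", "abc a-c ac")

# '[abc]' matches one character that is a, b, or c.
char_set = re.findall(r"[abc]", "cab-xyz")

# '[a-z]' matches one lowercase character from a to z.
char_range = re.findall(r"[a-z]", "A1b2c3")

# '[^abc]' matches one character that is NOT a, b, or c.
negated_set = re.findall(r"[^abc]", "abcde")

# '*' repeats the preceding character zero or more times,
# so 'abc*' accepts "ab", "abc", "abcc", and so on.
star_matches = [s for s in ["ab", "abc", "abccc", "abd"]
                if re.fullmatch(r"abc*", s)]

print(any_char)      # ['abc', 'a-c']
print(char_set)      # ['c', 'a', 'b']
print(char_range)    # ['b', 'c']
print(negated_set)   # ['d', 'e']
print(star_matches)  # ['ab', 'abc', 'abccc']
```

Note that the range is written [a-z] with no spaces; inside a character set, spaces would be treated as literal characters to match.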

posted @ 2019-11-22 18:54 武韵 Views (257) Comments (0) Recommendations (0)

Example 2: Targeted Crawler for Taobao Product Price Comparison

Summary:
import requests
import re
def getHTMLText(url):
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.
... Read more

posted @ 2019-11-22 18:34 武韵 Views (494) Comments (0) Recommendations (0)

Introduction to the Beautiful Soup Library

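Summary: 1. Installation: pip install beautifulsoup4. Beautiful Soup is a library for parsing, traversing, and maintaining a "tag tree". 2. Import: (1) from bs4 import BeautifulSoup (2) import bs4. A BeautifulSoup object corresponds to the entire content of an HTML/XML document. ... Read more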
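To make the "tag tree" idea concrete, here is a minimal sketch that parses a small HTML string and walks it. It is not code from the original post: the HTML snippet and variable names are made up, and it assumes beautifulsoup4 is installed and that Python's built-in "html.parser" backend is used.

```python
from bs4 import BeautifulSoup

# A made-up document used only for illustration.
html = ("<html><head><title>Demo page</title></head>"
        "<body><p class='intro'>Hello, <b>soup</b>!</p></body></html>")

# The BeautifulSoup object represents the whole document.
soup = BeautifulSoup(html, "html.parser")

title_text = soup.title.string     # text inside the <title> tag
p_tag = soup.find("p")             # first <p> tag in the tree
p_class = p_tag.get("class")       # class is multi-valued, so this is a list
p_text = p_tag.get_text()          # all text under <p>, nested tags included
bold_parent = soup.b.parent.name   # walk upward from <b> to its parent tag

print(title_text)   # Demo page
print(p_class)      # ['intro']
print(p_text)       # Hello, soup!
print(bold_parent)  # p
```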

posted @ 2019-11-22 14:20 武韵 Views (159) Comments (0) Recommendations (0)

Notes from Learn Python the Hard Way (1)

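Summary: The three most important skills a beginning programmer needs: reading and writing, attention to detail, and spotting differences. Do not copy and paste! # -*- coding: utf-8 -*- (the script uses Unicode UTF-8). Writing habit: put a space on each side of an operator to make the code easier to read. Python format specifiers: %r prints the raw representation of any value; %c formats a character or its ASCII code; %s ... Read more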
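The format specifiers mentioned in this summary behave as in the short sketch below. It is an illustration rather than code from the original post; the sample value and variable names are made up, and Python 3 is assumed.

```python
# -*- coding: utf-8 -*-

value = "hi\n"

# %r inserts repr(value): quotes and escape sequences stay visible.
as_repr = "%r" % value

# %s inserts str(value): the newline is kept as a real newline.
as_str = "%s" % value

# %c accepts either an integer character code or a one-character string.
from_code = "%c" % 65
from_char = "%c" % "A"

print(as_repr)    # 'hi\n'
print(from_code)  # A
print(from_char)  # A
```

%r is mainly useful for debugging, since it shows exactly what a value contains, while %s is meant for output shown to users.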

posted @ 2019-11-22 12:53 武韵 Views (139) Comments (0) Recommendations (0)

Example 1: Crawling Chinese University Rankings

Summary:
import requests
from bs4 import BeautifulSoup
import bs4
def getHTMLText(url):
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        r.encoding =
... Read more

posted @ 2019-11-22 12:50 武韵 Views (236) Comments (0) Recommendations (0)

Requests Library Exercises

Summary: Example 1: Crawling a JD.com product page
import requests
url = "http://item.jd.com/2967929.html"
try:
    r = requests.get(url)
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    pri
... Read more

posted @ 2019-11-22 12:48 武韵 Views (366) Comments (0) Recommendations (0)