Python Web Crawling and Information Extraction

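Requests: automatically fetch HTML pages and submit network requests
robots.txt: the Robots Exclusion Standard for web crawlers, e.g. https://www.baidu.com/robots.txt
Beautiful Soup: parse HTML pages
Regular expressions (re)
Scrapy crawler framework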
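
A minimal sketch of how two of the topics above fit together: checking robots.txt rules before crawling, then pulling links out of a page with a regular expression. It uses only the standard library (urllib.robotparser and re) so it runs offline. The robots.txt rules, the HTML snippet, the base URL https://example.com, and the helper name allowed_links are all made up for illustration; in a real crawler the page body would typically come from Requests, and Beautiful Soup would usually be a more robust way to find links than a regex.

```python
import re
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs without network access.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# Hypothetical HTML page, standing in for a response body fetched with Requests.
HTML = """\
<html><body>
<a href="/index.html">Home</a>
<a href="/private/data.html">Private</a>
<a href="/docs/guide.html">Guide</a>
</body></html>
"""


def allowed_links(robots_txt, html, base="https://example.com", agent="*"):
    """Return the href values in html that robots_txt permits agent to fetch."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no HTTP request is made here.
    parser.parse(robots_txt.splitlines())
    # Simple regex extraction; assumes double-quoted href attributes.
    hrefs = re.findall(r'href="([^"]+)"', html)
    return [h for h in hrefs if parser.can_fetch(agent, base + h)]


links = allowed_links(ROBOTS_TXT, HTML)
print(links)
```

Against a live site, RobotFileParser.set_url() followed by read() would load the real robots.txt instead of the inlined string.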