熊咪 - cnblogs

Web Scraping Study: Regular Expressions

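Summary: 1. Python supports regular expressions through the re module. The general steps for using re are:
Step 1: compile the string form of the regular expression into a Pattern instance.
Step 2: use the Pattern instance to process the text and obtain the match result (a Match instance).
Step 3: use the Match instance to extract information and carry out further operations.
# -*- cod...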

posted @ 2015-05-15 09:08 熊咪 Views (134) Comments (0) Recommended (0)
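The three steps in the summary map onto the re API as follows: re.compile() returns a Pattern, Pattern.search() returns a Match (or None), and the Match exposes group() and span(). This is a minimal sketch; the "name: number" pattern and the sample text are made-up examples, not taken from the original post.

```python
import re

# Step 1: compile the regex string into a Pattern instance.
# The "name: number" pattern and the sample text are made-up examples.
pattern = re.compile(r'(\w+):\s*(\d+)')
text = 'views: 134, comments: 0'

# Step 2: apply the Pattern to the text; search() returns a Match or None.
match = pattern.search(text)

# Step 3: read information from the Match instance.
if match is not None:
    print(match.group(0))   # the whole match: views: 134
    print(match.group(1))   # first capture group: views
    print(match.group(2))   # second capture group: 134
    print(match.span())     # start and end offsets: (0, 10)

# findall() returns every non-overlapping match as a tuple of groups.
pairs = pattern.findall(text)
print(pairs)                # [('views', '134'), ('comments', '0')]
```

Module-level helpers such as re.search(pattern_string, text) compile the pattern internally and cache it, so the explicit compile step mainly pays off when one pattern is reused many times.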

Web Scraping Study: Baidu Tieba

Summary:
# -*- coding: utf-8 -*-
#---------------------------------------
# Program: Baidu Tieba crawler
# Version: 0.1
# Author: why
# Date: 2013-05-14
# Language: Python 2.7
# Usage: enter a paginated address, remove the last...

posted @ 2015-05-14 20:24 熊咪 Views (180) Comments (0) Recommended (0)
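The header comments describe a crawler that takes a paginated thread address and downloads each page. Below is a minimal Python 3 sketch of that idea (the original targets Python 2.7). The helper names, the "page number appended to the end of the URL" convention and the example.com address are assumptions for illustration; the usage instructions that are cut off in the summary are not reconstructed here.

```python
import urllib.request


def build_page_urls(base_url, begin_page, end_page):
    """Append page numbers begin_page..end_page (inclusive) to base_url.

    Hypothetical convention: base_url already ends with the page
    parameter, e.g. 'http://example.com/thread?page='.
    """
    return [base_url + str(page) for page in range(begin_page, end_page + 1)]


def save_pages(base_url, begin_page, end_page, encoding='utf-8'):
    """Download each page and save it as <page>.html in the current directory."""
    urls = build_page_urls(base_url, begin_page, end_page)
    for page, url in enumerate(urls, start=begin_page):
        with urllib.request.urlopen(url) as response:
            html = response.read().decode(encoding, errors='replace')
        with open('%d.html' % page, 'w', encoding=encoding) as f:
            f.write(html)
        print('saved page', page)
```

Usage (needs network access and a real paginated address): `save_pages('http://example.com/thread?page=', 1, 3)` writes 1.html, 2.html and 3.html. Separating URL construction from downloading keeps the page-numbering logic testable without a connection.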

Web Scraping Study: Getting Cookies

Summary: http://blog.csdn.net/samxx8/article/details/21535901
1. Getting cookies
import urllib.request
import http.cookiejar
cookie = http.cookiejar.CookieJar()
opener = urllib.re...

posted @ 2015-05-14 19:09 熊咪 Views (205) Comments (0) Recommended (0)
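The snippet pairs an http.cookiejar.CookieJar with a custom opener so that cookies set by the server are captured. A sketch of the complete pattern using only standard-library calls; the function names make_cookie_opener and fetch_cookies are illustrative, not from the original post.

```python
import http.cookiejar
import urllib.request


def make_cookie_opener():
    """Build an opener whose HTTPCookieProcessor stores cookies in a CookieJar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def fetch_cookies(url):
    """Open url through the cookie-aware opener and return {name: value}."""
    opener, jar = make_cookie_opener()
    with opener.open(url) as response:
        response.read()
    # Iterating a CookieJar yields http.cookiejar.Cookie objects.
    return {cookie.name: cookie.value for cookie in jar}
```

Usage (needs network access): `fetch_cookies('http://www.baidu.com')` returns a dict of the cookies the server set. Reusing the same opener for later requests sends those cookies back automatically.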

Web Scraping Study: Some Useful Functions

Summary: 1. geturl: get the real (final) URL
from urllib.request import Request, urlopen
from urllib.error import URLError, HTTPError
old_url = 'http://rrurl.cn/b1UZuP'
req...

posted @ 2015-05-14 18:31 熊咪 Views (162) Comments (0) Recommended (0)
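geturl() returns the URL that was actually retrieved. Because urlopen follows HTTP redirects automatically, this differs from the requested address when a short link redirects elsewhere, and info() gives the response headers. A sketch with standard calls; resolve_url is a hypothetical helper name.

```python
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen


def resolve_url(old_url, timeout=10):
    """Return (final_url, headers) for old_url, or (None, message) on failure.

    urlopen follows redirects, so geturl() reports where the request ended up.
    """
    req = Request(old_url)
    try:
        with urlopen(req, timeout=timeout) as response:
            return response.geturl(), response.info()
    except HTTPError as e:
        # The server answered, but with an error status (e.g. 404).
        return None, 'HTTP error %d' % e.code
    except URLError as e:
        # The resource could not be reached at all.
        return None, 'URL error: %s' % e.reason
```

Usage (needs network access): `final_url, headers = resolve_url('http://rrurl.cn/b1UZuP')`; if the short link is still live, final_url is the address it redirects to.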

Web Scraping Study: Catching Errors

Summary: http://blog.csdn.net/column/details/why-bug.html
1. Fetching data while posing as a browser
import urllib.request
req = urllib.request.Request('http://www.baidu.com')
response = urllib...

posted @ 2015-05-14 16:46 熊咪 Views (250) Comments (0) Recommended (0)
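Some sites reject requests that lack a browser-like User-Agent, so the usual approach is to attach headers to a Request and then catch failures. HTTPError is a subclass of URLError, so it must be caught first. A sketch under those assumptions; the User-Agent string and the names BROWSER_HEADERS, build_browser_request and fetch are illustrative, not values from the original post.

```python
import urllib.error
import urllib.request

# Example browser identification string; any realistic value works.
BROWSER_HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
}


def build_browser_request(url):
    """Create a Request that identifies itself like a browser."""
    return urllib.request.Request(url, headers=BROWSER_HEADERS)


def fetch(url, timeout=10):
    """Return the response body as bytes, or None if the request failed."""
    req = build_browser_request(url)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as response:
            return response.read()
    except urllib.error.HTTPError as e:
        # Checked first because HTTPError is a subclass of URLError.
        print('The server could not fulfill the request. Error code:', e.code)
    except urllib.error.URLError as e:
        print('Failed to reach the server. Reason:', e.reason)
    return None
```

Note that Request normalizes header names with str.capitalize(), so the header is stored and looked up as 'User-agent'.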

Python Loop Statements, etc.

Summary:
names = ['Michael', 'Bob', 'Tracy']
for name in names:
    print(name)
sum = 0
for x in [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]:
    sum = sum + x
print(sum)
sum = 0
n ...

posted @ 2015-05-14 11:04 熊咪 Views (108) Comments (0) Recommended (0)
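A Python 3 sketch of the loop forms the post covers: iterating a list with for, accumulating 1 through 10 with range(), and a while loop. The while-loop example (summing odd numbers below 10) is an illustrative choice, not the truncated code from the post; total is used instead of sum so the built-in sum() is not shadowed.

```python
names = ['Michael', 'Bob', 'Tracy']

# for ... in visits each element of the list in order.
for name in names:
    print(name)

# Accumulate 1 + 2 + ... + 10 with an explicit loop.
total = 0
for x in range(1, 11):      # range(1, 11) yields 1..10
    total = total + x
print(total)                # 55

# The built-in sum() gives the same result in one call.
print(sum(range(1, 11)))    # 55

# A while loop repeats as long as its condition is true:
# here it adds 9 + 7 + 5 + 3 + 1.
odd_total = 0
n = 9
while n > 0:
    odd_total = odd_total + n
    n = n - 2
print(odd_total)            # 25
```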

Python Variables

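Summary: 1. Python is a dynamic language: variables do not need to be declared before use. 2. Assigning a string in Python binds the name to an immutable string object in memory; reassigning or "modifying" the string makes the name refer to a new object rather than changing the old one.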

posted @ 2015-05-14 10:37 熊咪 Views (129) Comments (3) Recommended (0)
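Both points can be checked directly: a name can be rebound to a value of a different type without any declaration, and string operations produce new objects instead of altering an existing string. A small sketch; the operator `is` compares object identity.

```python
# 1. No declarations: the same name can refer to values of different types.
a = 123
print(type(a))      # <class 'int'>
a = 'ABC'
print(type(a))      # <class 'str'>

# 2. Strings are immutable: "changing" one builds a new string object.
s = 'abc'
t = s               # t and s now refer to the same object
print(t is s)       # True

s = s + 'd'         # creates a new string; the object t refers to is untouched
print(s, t)         # abcd abc
print(t is s)       # False

# Methods such as upper() also return a new string.
u = t.upper()
print(t, u)         # abc ABC
```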
