I just learned how to write a simple crawler in Python that saves the pages of a Baidu Tieba thread.

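# -*- coding: utf-8 -*-
#---------------------------------------
#   Author: chendn
#   Language: Python 2.7.10
#---------------------------------------

import string
import urllib2

def tieba(url, beginPage, endPage):
    for i in range(beginPage, endPage + 1):
        # Zero-pad the page number to 3 digits for the file name; i=1 gives 001.html
        htmlName = string.zfill(i, 3) + '.html'
        print 'Downloading page ' + str(i) + ' as ' + htmlName
        # As I understand it, this creates an empty file named htmlName;
        # 'w+' opens it for reading and writing and truncates any existing file
        createHtml = open(htmlName, 'w+')
        # Fetch the page to be scraped
        tiebaHtml = urllib2.urlopen(url + str(i)).read()
        # Write the fetched page into the empty file
        createHtml.write(tiebaHtml)
        # Close the file; this page is done
        createHtml.close()

url = 'http://tieba.baidu.com/p/3977277793?pn='
tieba(url, 1, 5)  # Save the first 5 pages of the thread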
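
For readers on Python 3, where `urllib2` and `string.zfill` no longer exist, the same idea might look roughly like the sketch below. This is not the original script: the helper names `page_filename`, `page_url`, and `download_pages`, and the optional `fetch` parameter, are my own assumptions added for illustration. The `fetch` hook only exists so the loop can be tried without hitting the network.

```python
import urllib.request


def page_filename(page, width=3):
    """Zero-pad the page number, e.g. 1 -> '001.html' (hypothetical helper)."""
    return str(page).zfill(width) + '.html'


def page_url(base_url, page):
    """Append the page number to a base URL ending in '?pn=' (hypothetical helper)."""
    return base_url + str(page)


def download_pages(base_url, begin_page, end_page, fetch=None):
    """Save pages begin_page..end_page (inclusive) as 001.html, 002.html, ...

    `fetch` is an assumed hook: any callable taking a URL and returning bytes.
    When omitted, the page is fetched with urllib.request.urlopen.
    Returns the list of file names written.
    """
    if fetch is None:
        def fetch(url):
            with urllib.request.urlopen(url) as response:
                return response.read()

    saved = []
    for page in range(begin_page, end_page + 1):
        name = page_filename(page)
        data = fetch(page_url(base_url, page))
        # 'wb' writes the raw bytes exactly as received
        with open(name, 'wb') as html_file:
            html_file.write(data)
        saved.append(name)
    return saved
```

Calling `download_pages('http://tieba.baidu.com/p/3977277793?pn=', 1, 5)` would then correspond to the `tieba(url,1,5)` call above. Using `with` closes each file even if the write fails, which replaces the explicit `close()` call.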

posted @ 2015-08-29 17:01 皮蛋的主人
