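<span style="font-family: Arial, Helvetica, sans-serif; background-color: rgb(255, 255, 255);">The whole crawler is very simple.</span>
<span style="font-family: Arial, Helvetica, sans-serif; background-color: rgb(255, 255, 255);">While writing it, though, I left the u prefix off the prompt string inside raw_input() (writing 'string' instead of u'string'), probably because I had skimmed the basics too quickly, and the prompts came out garbled.</span>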
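The post doesn't say which file or console encoding was involved, so what follows is an assumption: a common way to get garbled Chinese text is to encode it with one codec and display it with another, for example UTF-8 bytes shown on a Windows console that expects GBK. This Python 3 sketch reproduces that mismatch; `show_mismatch` is a hypothetical helper, not part of the original script.

```python
def show_mismatch(text, wrote_as="utf-8", read_as="gbk"):
    """Encode text with one codec, then decode it with another, like a mismatched console."""
    raw = text.encode(wrote_as)
    # errors="replace" turns undecodable bytes into U+FFFD instead of raising
    return raw.decode(read_as, errors="replace")


prompt = "请输入开始页数"  # the original script's "enter the start page" prompt
garbled = show_mismatch(prompt)
print(garbled)  # mojibake instead of the original prompt
print(show_mismatch(prompt, read_as="utf-8"))  # matching codecs round-trip cleanly
```

Pure ASCII text survives either way because GBK and UTF-8 agree on ASCII bytes, which is why the problem only shows up once the prompts contain Chinese characters.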
<span style="font-family: Arial, Helvetica, sans-serif; background-color: rgb(255, 255, 255);">After spending an afternoon on Python, I finally started writing a crawler.</span>
<span style="font-family: Arial, Helvetica, sans-serif; background-color: rgb(255, 255, 255);">The crawler I wrote this time is very simple:</span>
<span style="font-family: Arial, Helvetica, sans-serif; background-color: rgb(255, 255, 255);">it downloads the HTML of a specified range of pages from Baidu Tieba.</span>
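Downloading a range of pages boils down to pairing each page number with a URL and an output file name. The Python 3 sketch below shows just that mapping. It assumes, as the script does, that the page number is appended to the end of the base URL, and it treats the end page as included. `plan_downloads` and the example.com URL are hypothetical placeholders.

```python
def plan_downloads(base_url, start, end, width=5):
    """Return (file_name, url) pairs for pages start..end, end page included."""
    jobs = []
    for page in range(start, end + 1):
        file_name = str(page).zfill(width) + ".html"  # e.g. 3 -> "00003.html"
        jobs.append((file_name, base_url + str(page)))
    return jobs


for name, url in plan_downloads("https://example.com/thread?pn=", 1, 3):
    print(name, url)  # first line: 00001.html https://example.com/thread?pn=1
```

Using `end + 1` makes the end page inclusive, which matches what a user typing an "end page" would expect; the original `range(bgp, ep)` stops one page short.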
<span style="font-family: Arial, Helvetica, sans-serif; background-color: rgb(255, 255, 255);">Without further ado, let's get started. The only module that really matters here is urllib2.</span>
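urllib2 exists only in Python 2; in Python 3 its `urlopen` lives in `urllib.request`. A minimal Python 3 sketch of the same download loop might look like this, again assuming the page number is simply appended to the base URL. The `fetch` and `out_dir` parameters are hypothetical additions (not in the original script): `fetch` lets you swap in a stub for testing and defaults to a thin wrapper around `urllib.request.urlopen`.

```python
import os
import urllib.request


def fetch_url(url, timeout=10):
    """Fetch url with urllib.request and return the response body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def download_pages(base_url, start, end, out_dir=".", fetch=fetch_url):
    """Save pages start..end (end included) of base_url as 00001.html, 00002.html, ...

    Returns the list of file paths written.
    """
    saved = []
    for page in range(start, end + 1):
        path = os.path.join(out_dir, str(page).zfill(5) + ".html")
        print("downloading page", page)
        body = fetch(base_url + str(page))  # page number appended to the URL
        # Binary mode writes the bytes exactly as the server sent them,
        # so nothing is decoded or re-encoded on the way to disk.
        with open(path, "wb") as f:
            f.write(body)
        saved.append(path)
    return saved
```

With a real base URL, `download_pages("https://example.com/thread?pn=", 1, 3)` (example.com is a placeholder) would write 00001.html to 00003.html into the current directory. The interactive prompts port over directly by using Python 3's `input()` in place of `raw_input()`.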
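<span style="font-family: Arial, Helvetica, sans-serif; background-color: rgb(255, 255, 255);"></span><pre name="code" class="python"># -*- coding: utf-8 -*-
import urllib2

def download_pages(url, bgp, ep):
    """Save pages bgp..ep (end page included) of url as 0000X.html files."""
    for i in range(bgp, ep + 1):
        sName = str(i).zfill(5) + '.html'  # zero-pad to a five-digit name such as 00001.html
        print('downloading page ' + str(i))
        m = urllib2.urlopen(url + str(i)).read()
        # 'wb' writes the bytes exactly as received; "with" closes the file
        # (a bare f.close without parentheses never actually closes it)
        with open(sName, 'wb') as f:
            f.write(m)

# With Chinese prompts, leaving off the u'' prefix garbled the output (see above).
burl = raw_input(u'Enter the Baidu Tieba URL without the page number\n')
bgp1 = int(raw_input(u'Enter the start page: '))
ep1 = int(raw_input(u'Enter the end page: '))
download_pages(burl, bgp1, ep1)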