Python Web Scraping: From Getting Started to Giving Up (1)

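With nothing much to do over the summer break, I spent some time learning Python. Of the basics, regular expressions felt like the hardest part, with all their string-matching problems... blah blah...

So I picked a website and got my hands dirty with a simple scraper: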

import re
import urllib.request

import xlwt
from bs4 import BeautifulSoup

url = 'http://www.mp4ba.net/'
web = urllib.request.urlopen(url).read()
soup = BeautifulSoup(web, "html.parser")

# The thread list sits inside the element with this id.
table = soup.find(id="threadlisttableid")
# Each ordinary thread is a <tbody> whose id contains "normalthread".
threads = table.find_all('tbody', id=re.compile(r"normalthread"))

workbook = xlwt.Workbook()
sheet = workbook.add_sheet('sheet1')

row = 0
for thread in threads:
    addr = thread.em                   # first <em> tag in this thread
    name = thread.find('a', "s xst")   # title link, matched by its class attribute
    if addr is None or name is None:
        continue                       # skip threads missing either field
    sheet.write(row, 0, addr.get_text())
    sheet.write(row, 1, name.get_text())
    row += 1

workbook.save("text.xls")
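
Since regular expressions were the part I found hardest, here is a small sketch of what the `re.compile(r"normalthread")` filter is doing. As far as I understand it, BeautifulSoup tests each id with the pattern's `search()` method, so an id matches if it contains "normalthread" anywhere. The id strings below are made up for illustration and are not taken from the real page.

```python
import re

# Same pattern that the scraper passes to find_all().
pattern = re.compile(r"normalthread")

# Hypothetical id values, invented for this example only.
sample_ids = [
    "normalthread_1001",
    "stickthread_7",
    "normalthread_1002",
    "separatorline",
]

# search() succeeds when the pattern occurs anywhere in the string,
# so no leading "^" or trailing ".*" is needed here.
matching = [s for s in sample_ids if pattern.search(s)]
print(matching)
```

If the ids had to start with "normalthread", anchoring the pattern as `r"^normalthread"` would be the stricter choice.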

The scraped results look like this:

Onward and upward... 666666666666666666

posted @ 2017-07-29 16:29  雨中枫玲