Getting Started with Python Web Crawlers -- Scraping Wiki Entries

Posted on 2017-12-11 22:11 by sunshine_blog

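import re
from urllib import request

from bs4 import BeautifulSoup

# Download the Wikipedia main page (mobile site) and decode it as UTF-8.
html = request.urlopen("https://en.m.wikipedia.org/wiki/Main_Page").read().decode("utf-8")
soup = BeautifulSoup(html, "html.parser")

# Walk every <a> tag whose href starts with /wiki/, i.e. links to wiki entries.
for tag in soup.find_all("a", href=re.compile(r"^/wiki/")):
    # Skip links that point at JPG image files.
    if not re.search(r"\.(jpg|JPG)$", tag["href"]):
        print(tag.get_text(), "<--->", "https://en.m.wikipedia.org" + tag["href"])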
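
The filtering rule used in the script (keep hrefs that start with /wiki/, skip those ending in .jpg or .JPG) can be tried offline on a small HTML string, without hitting Wikipedia. The following is only a rough sketch: it swaps BeautifulSoup for the standard library's html.parser so nothing extra needs installing, and the names WikiLinkParser, extract_wiki_links, BASE_URL and the sample markup are made up for illustration and are not part of the original post.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://en.m.wikipedia.org"   # assumed base, same host as the script
IMAGE_RE = re.compile(r"\.(jpg|JPG)$")     # same image filter as the script


class WikiLinkParser(HTMLParser):
    """Collect (text, absolute_url) pairs for /wiki/ links that are not JPG files."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the currently open <a>, if it qualifies
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if href.startswith("/wiki/") and not IMAGE_RE.search(href):
            self._href = href
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text), urljoin(BASE_URL, self._href)))
            self._href = None


def extract_wiki_links(html):
    """Return the qualifying links found in an HTML string."""
    parser = WikiLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sample markup: one entry link, one image link, one external link.
sample = (
    '<a href="/wiki/Python_(programming_language)">Python</a>'
    '<a href="/wiki/File:Logo.jpg">logo</a>'
    '<a href="https://example.org/">external</a>'
)
for text, url in extract_wiki_links(sample):
    print(text, "<--->", url)
```

Only the first link in the sample should survive the filter; the image link and the external link are dropped. For real pages BeautifulSoup, as used in the script, is more forgiving of malformed markup.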
