📂Python
🔖Python
2017-04-11 12:49阅读: 571评论: 0推荐: 0

百度搜索引擎取真实地址-python代码

代码

def parseBaidu(keyword, pagenum):
    keywordsBaseURL = 'https://www.baidu.com/s?wd=' + str(quote(keyword)) + '&oq=' + str(quote(keyword)) + '&ie=utf-8' + '&pn='
    pnum = 0
    while pnum <= int(pagenum):
        baseURL = keywordsBaseURL + str(pnum*10)
        try:
            request = requests.get(baseURL, headers=headers)
            soup = BeautifulSoup(request.text, "html.parser")
            for a in soup.select('div.c-container > h3 > a'):
                url = requests.get(a['href'], headers=headers).url
                yield url
        except:
            yield None
        finally:
            pnum += 1

示例用法

import requests
from bs4 import BeautifulSoup

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:53.0) Gecko/20100101 Firefox/53.0"
}

def parseBaidu(keyword, pagenum)

def main():
    for url in parseBaidu("keyword",10):
        if url:
            print(url)
        else:
            continue

本文作者:Akkuman

本文链接:https://www.cnblogs.com/Akkuman/p/6963141.html

版权声明:本作品采用知识共享署名-非商业性使用-禁止演绎 2.5 中国大陆许可协议进行许可。

posted @   Akkuman  阅读(571)  评论(0编辑  收藏  举报
点击右上角即可分享
微信分享提示
💬
评论
📌
收藏
💗
关注
👍
推荐
🚀
回顶
收起
  1. 1 404 not found REOL
404 not found - REOL
00:00 / 00:00
An audio error has occurred.