Scraping Python Job Salaries from 51job with Python

How well do Python jobs pay? Let's collect the data with a Python crawler.

I. Requirements

Salary data for Python positions listed on the 51job recruitment site.

 

II. Tools

1. The urllib library

2. The beautifulsoup library

# Import the libraries
from bs4 import BeautifulSoup
from urllib.request import urlopen

 

III. Analyzing the page source

<div class="el">
        <p class="t1 ">
            <em class="check" name="delivery_em" onclick="checkboxClick(this)">'l</em>
            <input class="checkbox" type="checkbox" name="delivery_jobid" value="118428015" jt="0" style="display:none" />
            <span>
                <a target="_blank" title="Python开发工程师实习生" href="https://jobs.51job.com/hefei-ssq/118428015.html?s=01&t=0"  onmousedown="">
                    Python开发工程师实习生                </a>
            </span>
                                                                    </p>
        <span class="t2"><a target="_blank" title="合肥合和信息科技有限公司" href="https://jobs.51job.com/all/co5410918.html">合肥合和信息科技有限公司</a></span>
        <span class="t3">合肥-蜀山区</span>
        <span class="t4">4.5-6千/月</span>
        <span class="t5">11-10</span>
    </div>

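The source shows that the data we need sits inside the <p> and <span> tags: the job title is the title attribute of the <a> inside <p class="t1 ">, and the salary is the text of <span class="t4">.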

 

soup = BeautifulSoup(html, "html.parser")
# Select elements by tag and class with CSS selectors
titles = soup.select("p[class='t1'] a")
salaries = soup.select("span[class='t4']")
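The selectors can be tried offline before touching the live site. The following is a minimal sketch that runs them against a trimmed copy of the row shown above; the names SAMPLE_ROW, title_links and salary_spans are illustrative, and the shorter class selectors p.t1 and span.t4 are used here as an alternative to the attribute form.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the listing row shown above (checkbox markup removed).
SAMPLE_ROW = """
<div class="el">
    <p class="t1 ">
        <span>
            <a target="_blank" title="Python开发工程师实习生" href="https://jobs.51job.com/hefei-ssq/118428015.html?s=01&t=0">
                Python开发工程师实习生                </a>
        </span>
    </p>
    <span class="t3">合肥-蜀山区</span>
    <span class="t4">4.5-6千/月</span>
</div>
"""

soup = BeautifulSoup(SAMPLE_ROW, "html.parser")

# "p.t1 a": every <a> nested anywhere inside a <p> whose class list contains t1.
title_links = soup.select("p.t1 a")
# "span.t4": every <span> whose class list contains t4.
salary_spans = soup.select("span.t4")

job_title = title_links[0].get("title")    # value of the title attribute
salary_text = salary_spans[0].get_text()   # text inside the tag

print(job_title, salary_text)
```

Reading the title attribute instead of the link text avoids the padding whitespace that surrounds the text inside the <a> tag.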

IV. Printing the results

 

# salaries[0] appears to be the salary column header of the result list, hence the i + 1 offset
for i in range(len(titles)):
    print("{:30}{}".format(titles[i].get('title'), salaries[i + 1].get_text()))
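Pairing two separate lists by index breaks as soon as one row lacks a title or a salary. A possible alternative, sketched below, walks the rows one at a time and reads both fields from the same div.el. The helper name extract_jobs and the two-row SAMPLE markup, including its header row, are assumptions made for illustration and are not copied from the live page.

```python
from bs4 import BeautifulSoup

# Hypothetical listing: an assumed header row followed by one result row.
SAMPLE = """
<div class="el title">
    <span class="t1">职位名</span>
    <span class="t4">薪资</span>
</div>
<div class="el">
    <p class="t1 "><span><a title="Python开发工程师实习生" href="#">Python开发工程师实习生</a></span></p>
    <span class="t4">4.5-6千/月</span>
</div>
"""


def extract_jobs(html):
    """Return a list of (title, salary) tuples, one per result row."""
    soup = BeautifulSoup(html, "html.parser")
    jobs = []
    for row in soup.select("div.el"):
        link = row.select_one("p.t1 a")
        salary = row.select_one("span.t4")
        if link is None or salary is None:
            continue  # skip the header row and any incomplete row
        jobs.append((link.get("title"), salary.get_text(strip=True)))
    return jobs


jobs = extract_jobs(SAMPLE)
for title, salary in jobs:
    print("{:30}{}".format(title, salary))
```

Because each title is read from the same row as its salary, no index offset is needed.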

 

 

V. Simulating request headers

# Browser-like request headers.
# No "Accept-Encoding" header: urlopen() does not decompress gzip/deflate bodies by itself.
header = {
    "Connection": "keep-alive",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
    "Accept-Language": "zh-CN,zh;q=0.8",
}
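A header dictionary only takes effect once it is attached to the request. The sketch below shows one way to do that with the standard library's urllib.request.Request; the shortened URL and the reduced header set are placeholders for illustration, and no request is actually sent.

```python
from urllib.request import Request

# Placeholder URL for illustration; nothing is fetched in this sketch.
url = "https://search.51job.com/"

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36",
    "Accept-Language": "zh-CN,zh;q=0.8",
}

# Attach the headers to the request object.
req = Request(url, headers=headers)

# urllib stores header names capitalized, e.g. "User-agent".
print(req.get_header("User-agent"))
print(req.header_items())

# To fetch the page (network access required), pass the request to urlopen:
# from urllib.request import urlopen
# html = urlopen(req, timeout=10).read().decode("GBK")
```

Passing the Request object, rather than the bare URL string, to urlopen is what makes the server see the custom headers.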

VI. Complete code

from bs4 import BeautifulSoup
from urllib.request import Request, urlopen

# Browser-like request headers.
# No "Accept-Encoding" header: urlopen() does not decompress gzip/deflate bodies by itself.
header = {
    "Connection": "keep-alive",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
    "Accept-Language": "zh-CN,zh;q=0.8",
}

url = "https://search.51job.com/list/000000,000000,0000,00,9,99,Python,2,1.html?lang=c&postchannel=0000&workyear=99&cotype=99&degreefrom=99&jobterm=99&companysize=99&ord_field=0&dibiaoid=0&line=&welfare="

# Send the headers with the request, then decode the GBK-encoded page
html = urlopen(Request(url, headers=header)).read().decode('GBK')
soup = BeautifulSoup(html, "html.parser")
titles = soup.select("p[class='t1'] a")
salaries = soup.select("span[class='t4']")  # CSS selectors

# salaries[0] appears to be the salary column header of the result list, hence the i + 1 offset
for i in range(len(titles)):
    print("{:30}{}".format(titles[i].get('title'), salaries[i + 1].get_text()))

 

Posted 2019-11-10 15:18 by 陈畅