Web Crawling with requests, urllib, urllib2, and BeautifulSoup

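I. Python 3 requests: log in to 51job and download the resume photo

1. Open Chrome, press F12, and log in manually once to capture the login URL, the form data, and the URL of the image to download.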

2. Implementation

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# function: log in to 51job with requests, then download the resume photo
# created by shangshanyang
# date: 2019
import requests
from requests.packages.urllib3.exceptions import InsecureRequestWarning

# The login request below uses verify=False, so silence the certificate warnings
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)
# from bs4 import BeautifulSoup
 
LOGIN_URL = 'https://login.51job.com'  # login URL that the form is posted to
DATA = {"lang": "c",
        "action": "save",
        "from_domain": "i",
        "loginname": "shangshanyang",
        "password": "123456",
        "verifycode": ""}  # Form Data: account name, password and other login fields
 
HEADER = {  # "Host": "login.51job.com",
    # "Referer": "https://login.51job.com/login.php?lang=c",
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36",
}
def Get_Session(URL, DATA, HEADERS):
    '''Log in and return the session that keeps the login cookies'''
    ROOM_SESSION = requests.session()
    ROOM_SESSION.post(URL, data=DATA, headers=HEADERS, verify=False)
    return ROOM_SESSION

SESSION = Get_Session(LOGIN_URL, DATA, HEADER)
urlimage = "http://i.51job.com/resume/ajax/image.php?type=avatar&userid=306511370"  # image URL

RES2 = SESSION.get(urlimage)
print(RES2.status_code)
if RES2.status_code == 200:
    if RES2.content:  # the body is binary, so check the raw bytes rather than .text
        with open('image2.jpg', 'wb') as f:  # save the image
            f.write(RES2.content)
    else:
        print('Image does not exist')
else:
    print('Wrong URL')
SESSION.close()
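
A failed login can still come back as HTTP 200 with an HTML page instead of the photo, so the status code alone may not be enough before writing image2.jpg. The sketch below is an optional extra check and not part of the original script: it compares the first bytes of the body with the JPEG, PNG, and GIF file signatures. `looks_like_image` and `IMAGE_SIGNATURES` are hypothetical names chosen for illustration.

```python
# Hypothetical helper, not part of the original script.
IMAGE_SIGNATURES = (
    b"\xff\xd8\xff",        # JPEG
    b"\x89PNG\r\n\x1a\n",   # PNG
    b"GIF87a",              # GIF (87a)
    b"GIF89a",              # GIF (89a)
)


def looks_like_image(data: bytes) -> bool:
    """Return True if data starts with a known image file signature."""
    return data.startswith(IMAGE_SIGNATURES)
```

In the script above it could guard the save step, for example `if looks_like_image(RES2.content):` before opening image2.jpg.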

II. urllib: download a web page and show download progress
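
A minimal sketch of this technique, assuming Python 3, where `urlretrieve` lives in `urllib.request`. `urlretrieve` accepts a `reporthook` callback that receives the block count so far, the block size, and the total size (-1 when the server does not report one). The function names `percent_done`, `show_progress`, and `download` are illustrative, not taken from the original post.

```python
import sys
from urllib.request import urlretrieve


def percent_done(block_num, block_size, total_size):
    """Return download progress in percent, or None if the size is unknown."""
    if total_size <= 0:
        return None
    # The last block is usually partial, so cap the value at 100.
    return min(100.0, block_num * block_size * 100.0 / total_size)


def show_progress(block_num, block_size, total_size):
    """reporthook for urlretrieve: called once at the start and after each block."""
    percent = percent_done(block_num, block_size, total_size)
    if percent is None:
        sys.stdout.write("\rdownloaded about %d bytes" % (block_num * block_size))
    else:
        sys.stdout.write("\r%.1f%%" % percent)
    sys.stdout.flush()


def download(url, filename):
    """Download url to filename, printing progress on a single line."""
    path, _headers = urlretrieve(url, filename, reporthook=show_progress)
    sys.stdout.write("\n")
    return path
```

A call such as `download("http://www.example.com/", "page.html")` would save the page and update the percentage in place; the URL and file name here are placeholders.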

III. urllib2: mimic a real browser to download pages that block crawlers
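
urllib2 is a Python 2 module; in Python 3 its functionality lives in `urllib.request` and `urllib.error`. By default the library identifies itself with a `Python-urllib/x.y` User-Agent, which some sites reject, so the usual approach is to send browser-like headers. The following is a minimal Python 3 sketch under that assumption; `build_request`, `fetch`, and `BROWSER_HEADERS` are illustrative names, and the User-Agent string is simply the one used in section I.

```python
from urllib.request import Request, urlopen

# Assumed browser identity; any current browser User-Agent string would do.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36",
}


def build_request(url):
    """Build a request that carries browser-like headers."""
    return Request(url, headers=BROWSER_HEADERS)


def fetch(url, timeout=10):
    """Return the raw bytes of the page at url."""
    with urlopen(build_request(url), timeout=timeout) as response:
        return response.read()
```

Sites that also check `Referer` or cookies would need those headers added to the dictionary as well.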

IV. urllib: scrape photos from Baidu Tieba
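
One common way to do this with only the standard library is to fetch the thread page, pull the image URLs out with a regular expression, and save each one with `urlretrieve`. The sketch below assumes Python 3, assumes the photos appear as `<img>` tags whose `src` is an absolute URL ending in `.jpg`, and uses illustrative names (`IMG_SRC`, `extract_image_urls`, `save_images`); the real Tieba markup may differ and should be checked in the browser's developer tools.

```python
import os
import re
from urllib.request import urlopen, urlretrieve

# Assumed pattern: <img> tags whose src ends in .jpg; real page markup may differ.
IMG_SRC = re.compile(r'<img[^>]+src="([^"]+\.jpg)"', re.IGNORECASE)


def extract_image_urls(html):
    """Return the .jpg URLs found in <img src="..."> attributes, in page order."""
    return IMG_SRC.findall(html)


def save_images(page_url, out_dir="."):
    """Download every matched image on page_url as 0.jpg, 1.jpg, ..."""
    with urlopen(page_url) as response:
        html = response.read().decode("utf-8", errors="replace")
    for index, img_url in enumerate(extract_image_urls(html)):
        urlretrieve(img_url, os.path.join(out_dir, "%d.jpg" % index))
```

Regular expressions are brittle against markup changes, which is one reason to prefer an HTML parser for anything beyond a quick script.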

V. BeautifulSoup: scrape photos from Baidu Tieba

https://www.crummy.com/software/BeautifulSoup/bs4/doc.zh/
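
With BeautifulSoup the image URLs can be read from the parsed document instead of matched with a regular expression. A minimal sketch, assuming the `beautifulsoup4` package is installed and using Python's built-in `html.parser`; `extract_image_urls` is an illustrative name, not code from the original post.

```python
from bs4 import BeautifulSoup


def extract_image_urls(html):
    """Return the src of every <img> tag that has one, in page order."""
    soup = BeautifulSoup(html, "html.parser")
    return [img["src"] for img in soup.find_all("img") if img.get("src")]
```

If the photos on the target page share a CSS class, `find_all("img", class_=...)` can narrow the result to them; the class name has to be read from the page itself.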

posted @ shangshanyang