
Environment: Python 3, Windows



pip3 install BeautifulSoup4
pip3 install requests



import requests
from bs4 import BeautifulSoup

# Find every news item on the page
# and extract its title, summary, URL, and image
response = requests.get('http://www.autohome.com.cn/news/')
response.encoding = 'gbk'

soup = BeautifulSoup(response.text,'html.parser')
li_list = soup.find(id='auto-channel-lazyload-article').find_all(name='li')

for li in li_list:
    title = li.find('h3')
    if not title:
        # Skip <li> entries that have no headline
        continue
    summary = li.find('p').text
    # url = li.find('a').attrs['href']
    url = li.find('a').get('href')

    img_url = li.find('img').get('src')
    img = requests.get(img_url)

    file_name = title.text
    with open(file_name+'.jpg','wb') as f:
        # Write the raw image bytes to disk
        f.write(img.content)
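The loop above uses the headline text as a file name and passes the `src` attribute straight to `requests.get`. Two things can go wrong with that on Windows: a headline may contain characters that are not allowed in file names, and a `src` value may be relative or protocol-relative. The following is a minimal sketch of two helpers for those cases, using only the standard library. The helper names `absolute_url` and `safe_file_name` and the host `img.example.com` are hypothetical and not part of the original code.

```python
import re
from urllib.parse import urljoin

# Page the <img> tags were scraped from (taken from the example above)
PAGE_URL = 'http://www.autohome.com.cn/news/'


def absolute_url(src, base=PAGE_URL):
    """Resolve a relative or protocol-relative src against the page URL."""
    return urljoin(base, src)


def safe_file_name(title, ext='.jpg'):
    """Replace characters that Windows forbids in file names with '_'."""
    cleaned = re.sub(r'[\\/:*?"<>|]', '_', title).strip()
    # Fall back to a fixed name if nothing usable is left
    return (cleaned or 'untitled') + ext


# Hypothetical inputs, only for illustration
print(absolute_url('//img.example.com/pic.jpg'))
print(safe_file_name('A/B?'))
```

Inside the loop, these could be used as `requests.get(absolute_url(img_url))` and `open(safe_file_name(title.text), 'wb')`.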




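First, open the GitHub login page, enter a wrong username and password, and watch the resulting HTTP request in the Network tab of the browser's developer tools (F12).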



Next, look for the data the server expects. The Form Data section is at the very bottom of the request details.
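The Form Data fields shown in the browser usually correspond to the `<input>` elements of the login form, including hidden ones such as a CSRF token. As a rough sketch, the fields can be collected from the page source with BeautifulSoup. The HTML below is a hypothetical stand-in for a real login page, and `form_fields` is an illustrative helper name, not something from the original code.

```python
from bs4 import BeautifulSoup

# Hypothetical login form, standing in for the real page source
SAMPLE_HTML = '''
<form action="/session" method="post">
  <input type="hidden" name="utf8" value="✓">
  <input type="hidden" name="authenticity_token" value="abc123">
  <input type="text" name="login">
  <input type="password" name="password">
  <input type="submit" name="commit" value="Sign in">
</form>
'''


def form_fields(html):
    """Return {name: value} for every named <input> in the first <form>."""
    form = BeautifulSoup(html, 'html.parser').find('form')
    fields = {}
    for tag in form.find_all('input'):
        name = tag.get('name')
        if name:
            # Inputs without a value attribute (login, password) start empty
            fields[name] = tag.get('value', '')
    return fields


fields = form_fields(SAMPLE_HTML)
print(sorted(fields))
```

The resulting dict has the same keys as the Form Data panel, so filling in `login` and `password` gives a payload that can be passed as `data=` to `requests.post`.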



import requests
from bs4 import BeautifulSoup

r1 = requests.get('https://github.com/login')
s1 = BeautifulSoup(r1.text,'html.parser')
# Likewise, search the page source (F12) for "token" to find the <input> tag
# that carries the CSRF token, then parse out its value
token = s1.find(name='input',attrs={'name':'authenticity_token'}).get('value')
# Some sites send a set of cookies on the first GET request and only accept
# the login if the client sends them back, so grab the pre-login cookies first
r1_cookie_dict = r1.cookies.get_dict()
# Send the username, password, and token to the server
r2 = requests.post('https://github.com/session',
    data={
        'utf8':'✓',
        'authenticity_token':token,
        'login':'Mitsui1993',
        'password':'<your password>',
        'commit':'Sign in'
    },
    cookies = r1_cookie_dict
)
# Collect the cookies returned after login and merge both sets into one dict
r2_cookie_dict = r2.cookies.get_dict()
cookie_dict = {}
cookie_dict.update(r1_cookie_dict)
cookie_dict.update(r2_cookie_dict)
# Verify the login by requesting a page that is only visible when logged in
r3 = requests.get(
    url='https://github.com/settings/emails',
    cookies=cookie_dict
)
# The response text contains my username, which confirms the login succeeded.
print(r3.text)
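Merging cookie dicts by hand works, but `requests.Session` can do the same bookkeeping automatically: cookies set by one response are stored on the session and sent with later requests. The sketch below shows this without touching the network. The cookie name `demo_cookie` is a made-up stand-in for whatever a server would set on the first GET.

```python
import requests

session = requests.Session()

# Stand-in for a cookie that a server would set on the first GET
# (hypothetical name and value, for illustration only)
session.cookies.set('demo_cookie', '1')

# Build, but do not send, a follow-up request through the same session
request = requests.Request('GET', 'https://github.com/settings/emails')
prepared = session.prepare_request(request)

# The session attached its stored cookie to the outgoing request
print(prepared.headers['Cookie'])
```

In a real login flow this would mean calling `session.get(...)` and `session.post(...)` in place of `requests.get(...)` and `requests.post(...)`, and dropping the explicit `cookies=` arguments.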



import requests

r1 = requests.get('http://dig.chouti.com')
r1_cookies = r1.cookies.get_dict()

r2 = requests.post('http://dig.chouti.com/login',
                   cookies = r1_cookies)

r2_cookies = r2.cookies.get_dict()

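# Merge the cookies from both responses before voting
r_cookies = {}
r_cookies.update(r1_cookies)
r_cookies.update(r2_cookies)

# r_cookies = {'gpsd':r1_cookies['gpsd']}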

r3 = requests.post('http://dig.chouti.com/link/vote?linksId=13921736',
              cookies = r_cookies)



3. Other methods of the requests module:

def request(method, url, **kwargs):
    """Constructs and sends a :class:`Request <Request>`.

    :param method: method for the new :class:`Request` object.
    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary or bytes to be sent in the query string for the :class:`Request`.
    :param data: (optional) Dictionary, bytes, or file-like object to send in the body of the :class:`Request`.
    :param json: (optional) json data to send in the body of the :class:`Request`.
    :param headers: (optional) Dictionary of HTTP Headers to send with the :class:`Request`.
    :param cookies: (optional) Dict or CookieJar object to send with the :class:`Request`.
    :param files: (optional) Dictionary of ``'name': file-like-objects`` (or ``{'name': file-tuple}``) for multipart encoding upload.
        ``file-tuple`` can be a 2-tuple ``('filename', fileobj)``, 3-tuple ``('filename', fileobj, 'content_type')``
        or a 4-tuple ``('filename', fileobj, 'content_type', custom_headers)``, where ``'content-type'`` is a string
        defining the content type of the given file and ``custom_headers`` a dict-like object containing additional headers
        to add for the file.
    :param auth: (optional) Auth tuple to enable Basic/Digest/Custom HTTP Auth.
    :param timeout: (optional) How long to wait for the server to send data
        before giving up, as a float, or a :ref:`(connect timeout, read
        timeout) <timeouts>` tuple.
    :type timeout: float or tuple
    :param allow_redirects: (optional) Boolean. Set to True if POST/PUT/DELETE redirect following is allowed.
    :type allow_redirects: bool
    :param proxies: (optional) Dictionary mapping protocol to the URL of the proxy.
    :param verify: (optional) whether the SSL cert will be verified. A CA_BUNDLE path can also be provided. Defaults to ``True``.
    :param stream: (optional) if ``False``, the response content will be immediately downloaded.
    :param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, ('cert', 'key') pair.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response

    Usage::

      >>> import requests
      >>> req = requests.request('GET', 'http://httpbin.org/get')
      <Response [200]>
    """
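To make the `params`, `data`, and `json` parameters in the docstring more concrete, here is a small sketch that prepares requests without sending them and inspects what would go over the wire. It relies on `requests.Request(...).prepare()`. The payload values (`k1`, `user`, `alice`) are arbitrary examples.

```python
import json
import requests

# params is encoded into the query string of the URL
query = requests.Request('GET', 'http://httpbin.org/get',
                         params={'k1': 'v1'}).prepare()
print(query.url)

# A dict passed as data becomes a form-encoded request body
form = requests.Request('POST', 'http://httpbin.org/post',
                        data={'user': 'alice'}).prepare()
print(form.headers['Content-Type'], form.body)

# A dict passed as json is serialized to a JSON request body
js = requests.Request('POST', 'http://httpbin.org/post',
                      json={'user': 'alice'}).prepare()
print(js.headers['Content-Type'], json.loads(js.body))
```

The same keyword arguments can be passed to `requests.request`, `requests.get`, or `requests.post`. Preparing the request first is only a convenient way to look at the result without any network traffic.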


