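The code below shows how to log in to GitHub from a Python script. To log in to a different site, use Chrome's Inspect feature to find the corresponding form elements on that site and substitute them in the code.
The script first logs in to GitHub, then opens the discover page within the logged-in session, and finally counts how many projects that page loaded.
Without further ado, here is the code.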
from requests import session
from bs4 import BeautifulSoup as bs

USER = 'username@yourmail.local'
PASSWORD = 'InputYourPassword(^_^)'
URL1 = 'https://github.com/session'
URL2 = 'https://github.com/discover'

with session() as s:
    # Fetch the login form and read the hidden values it expects back.
    req = s.get(URL1).text
    html = bs(req, "lxml")
    token = html.find("input", {"name": "authenticity_token"}).attrs['value']
    com_val = html.find("input", {"name": "commit"}).attrs['value']

    # Submit the credentials together with the token from the form.
    login_data = {'login': USER, 'password': PASSWORD,
                  'commit': com_val, 'authenticity_token': token}
    r1 = s.post(URL1, data=login_data)

    # Open the discover page inside the same (logged-in) session.
    r2 = s.get(URL2)
    page_soup = bs(r2.content, "html.parser")

    # The script counts every div with class "mb-1" as one project.
    containers = page_soup.find_all("div", {"class": "mb-1"})
    print("How many projects are listed on this page?\n")
    print(len(containers))
The code above was tested and ran successfully on Python 3.6.5.
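When adapting the script to another site, the field names found with Chrome's Inspect will differ from GitHub's. One possible approach is to collect every named `<input>` of the login form automatically instead of hard-coding each one. The sketch below is only an illustration: it uses the standard library's `html.parser` rather than BeautifulSoup so that it runs without extra packages, and `SAMPLE_HTML`, `FormFieldCollector`, and `collect_form_fields` are hypothetical names invented for this example, not part of the original script or of any real site.

```python
from html.parser import HTMLParser

# Hypothetical login form, standing in for the HTML of a real login page.
SAMPLE_HTML = """
<form action="/session" method="post">
  <input type="hidden" name="authenticity_token" value="abc123">
  <input type="text" name="login">
  <input type="password" name="password">
  <input type="submit" name="commit" value="Sign in">
</form>
"""


class FormFieldCollector(HTMLParser):
    """Collect the name/value pair of every <input> that has a name."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        name = attr.get("name")
        if name:
            # Inputs without a value attribute default to an empty string.
            self.fields[name] = attr.get("value") or ""


def collect_form_fields(html):
    """Return a dict of input names to their default values."""
    parser = FormFieldCollector()
    parser.feed(html)
    return parser.fields


# Start from the form's own defaults, then fill in the credentials.
login_data = collect_form_fields(SAMPLE_HTML)
login_data.update({"login": "username@yourmail.local",
                   "password": "InputYourPassword(^_^)"})
print(login_data)
```

The resulting dictionary could then be passed as `data=` to `s.post(...)` in the same way as `login_data` in the script above, assuming the target site accepts a plain form post.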
References
================
Intro to Web Scraping with Python and Beautiful Soup
https://www.youtube.com/watch?v=XQgXKtPSzUI&t=1507s