
Scraping CVPR 2021 paper titles and abstracts with a Python crawler

吕洞玄 · 2022-05-27 09:43 · 156 views


You can copy and paste the script below as-is; just change the database name and password to match your own setup. I use MySQL.

# -*- coding: utf-8 -*-
import sys

import pymysql
import requests
from bs4 import BeautifulSoup

# Connect to MySQL; change the database name and password here.
try:
    db = pymysql.connect(host="localhost", user="root", password="your_password",
                         database="jdbc1", charset="utf8")
    print("Database connection succeeded")
except pymysql.Error as e:
    print("Database connection failed: " + str(e))
    sys.exit(1)  # nothing below can run without a connection

cursor = db.cursor()

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.41 Safari/537.36 Edg/101.0.1210.32'
}

# Index page that lists every CVPR 2021 paper.
url = "https://openaccess.thecvf.com/CVPR2021?day=all"
html = requests.get(url, headers=headers)
soup = BeautifulSoup(html.content, 'html.parser')

# Each paper entry has an <a> element whose text is "pdf".
pdfs = soup.find_all(name="a", string="pdf")
lis = []

for i, pdf in enumerate(pdfs):
    # The title is taken from the PDF file name, not from the page text.
    pdf_name = pdf["href"].split('/')[-1]
    print(pdf_name)
    name = pdf_name.split('.')[0].replace("_CVPR_2021_paper", "")
    link = "https://openaccess.thecvf.com/content/CVPR_2021/html/" + name + "_CVPR_2021_paper.html"

    # Fetch the paper's own page and read the abstract from <div id="abstract">.
    html1 = requests.get(link, headers=headers)
    soup1 = BeautifulSoup(html1.content, 'html.parser')
    weizhi = soup1.find('div', attrs={'id': 'abstract'})
    # Reset per paper so a missing abstract does not reuse the previous one.
    jianjie = weizhi.get_text().strip() if weizhi else ""
    print("This is record " + str(i))

    # Keywords are simply the underscore-separated parts of the file name.
    keywords = ','.join(name.split('_'))

    info = {'title': name, 'link': link, 'abstract': jianjie, 'keywords': keywords}
    lis.append(info)

    # Build a parameterized INSERT from the dict keys.
    cols = ",".join('`{}`'.format(k) for k in info.keys())
    print(cols)  # `title`,`link`,`abstract`,`keywords`
    val_cols = ','.join('%({})s'.format(k) for k in info.keys())
    print(val_cols)  # %(title)s,%(link)s,%(abstract)s,%(keywords)s
    sql = "insert into lunwen(%s) values(%s)"
    res_sql = sql % (cols, val_cols)
    print(res_sql)
    cursor.execute(res_sql, info)  # pass the dict as the query parameters
    db.commit()

db.close()
print("Done")
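The two pieces of the script that are easy to get wrong are the file-name parsing and the INSERT statement built from the dict keys. Here is a minimal sketch that pulls both into small functions so they can be tried without a network or a database. The function names `parse_pdf_href` and `build_insert` are my own, and the example href is made up for illustration; it is not a real paper.

```python
def parse_pdf_href(href):
    """Derive title, link and keywords from a PDF href on the index page."""
    pdf_name = href.split('/')[-1]
    name = pdf_name.split('.')[0].replace("_CVPR_2021_paper", "")
    link = ("https://openaccess.thecvf.com/content/CVPR_2021/html/"
            + name + "_CVPR_2021_paper.html")
    return {
        'title': name,
        'link': link,
        'keywords': ','.join(name.split('_')),
    }


def build_insert(table, record):
    """Build an INSERT with %(key)s placeholders, one per dict key."""
    cols = ",".join('`{}`'.format(k) for k in record)
    val_cols = ",".join('%({})s'.format(k) for k in record)
    return "insert into {}({}) values({})".format(table, cols, val_cols)


# Hypothetical href, invented for this example only.
example = parse_pdf_href(
    "/content/CVPR2021/papers/Doe_Some_Paper_Title_CVPR_2021_paper.pdf")
example['abstract'] = ""  # would come from the paper's own page

print(example['title'])
print(example['keywords'])
print(build_insert("lunwen", example))
```

The string returned by `build_insert` is what gets passed to `cursor.execute` together with the dict, so the values themselves are never formatted into the SQL text. Only the column names are, which is fine here because they come from fixed keys in the script. The `lunwen` table has to exist already, with `title`, `link`, `abstract` and `keywords` columns.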