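1030 - Organizing entity relationships and supplementing some data
Organizing entity relationships
Entity relationships
Open issues
Supplementing some data
Poet avatar links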
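The later visualization will be more vivid if each poet has an avatar. The avatars were not scraped when the poet data was first collected, so the script below fetches them now.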
import requests
from bs4 import BeautifulSoup
import xlwt

# Request headers that mimic a desktop browser
headers = {'user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36'}
origin_url = 'https://www.xungushici.com'
# Placeholder avatar for poets whose card has no image
default_src = "http://www.huihua8.com/uploads/allimg/20190802kkk01/1531722472-EPucovIBNQ.jpg"
pom_list = []
k = 1
for i in range(1, 2010):
    url = origin_url + '/authors/p-' + str(i)
    r = requests.get(url, headers=headers)
    html = r.content.decode('utf-8')
    soup = BeautifulSoup(html, 'html.parser')
    hed = soup.find('div', class_='col col-sm-12 col-lg-9')
    # Each author on the page sits in its own card
    # (renamed from `list` to avoid shadowing the builtin)
    cards = hed.find_all('div', class_="card mt-3")
    for it in cards:
        print("Entry " + str(k))
        content = {}
        # 1.1 Get the poet's name from the card title
        title = it.find('h4', class_='card-title')
        poemauthor = title.find_all('a')[1].text
        # 1.2 Use the card's avatar if it has one, otherwise the placeholder
        avatar = it.find('a', class_='ml-2 d-none d-md-block')
        if avatar is not None:
            src = avatar.img['src']
        else:
            src = default_src
        print(src + poemauthor)
        content['author'] = poemauthor
        content['src'] = src
        pom_list.append(content)
        k = k + 1

# Write the results to a spreadsheet
xl = xlwt.Workbook()
# Call the workbook's add_sheet method
sheet1 = xl.add_sheet('sheet1', cell_overwrite_ok=True)
sheet1.write(0, 0, "author")
sheet1.write(0, 1, 'src')
for i in range(0, len(pom_list)):
    sheet1.write(i + 1, 0, pom_list[i]['author'])
    sheet1.write(i + 1, 1, pom_list[i]['src'])
# xlwt writes the legacy .xls format, so the file is saved with that extension
xl.save("src.xls")
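Generated data
For poets without an avatar, I substituted a placeholder image found online. Looking at the output, the results turn out to be fairly regular.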
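To see how many poets ended up with the placeholder, a quick tally over the scraped records can help. The sketch below assumes records shaped like the entries in pom_list (dicts with author and src keys). The helper name summarize_avatars and the sample records are made up for illustration and are not part of the original script.

```python
from collections import Counter

# Placeholder URL the scraper uses for poets without an avatar
DEFAULT_SRC = "http://www.huihua8.com/uploads/allimg/20190802kkk01/1531722472-EPucovIBNQ.jpg"


def summarize_avatars(records, default_src=DEFAULT_SRC):
    """Count how many records use a real avatar versus the placeholder."""
    counts = Counter(
        "placeholder" if rec["src"] == default_src else "avatar"
        for rec in records
    )
    return {"avatar": counts["avatar"], "placeholder": counts["placeholder"]}


# Hypothetical sample records, shaped like the entries in pom_list
sample = [
    {"author": "李白", "src": "https://example.com/img/libai.jpg"},
    {"author": "杜甫", "src": DEFAULT_SRC},
    {"author": "王维", "src": DEFAULT_SRC},
]
summary = summarize_avatars(sample)
print(summary)
```

Running the same function on the real pom_list would give the share of poets that still need a proper avatar.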
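Tasks for tomorrow
1. Finish scraping each poet's friends
2. Derive the places along each poet's life trajectory from their biography
3. Update the graph database with the attributes above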
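For the third task, one possible shape of the update is a single parameterized Cypher statement fed with the scraped rows. This is only a sketch under assumptions: it supposes the graph database speaks Cypher (for example Neo4j), that poets are stored as nodes with a Poet label and a name property, and that the avatar link goes into a src property. None of these names come from the notes above. The helper build_avatar_update is hypothetical and only prepares the query text and parameters, leaving execution to whatever driver is in use.

```python
# Assumed schema: (:Poet {name}) nodes; the label and property names are guesses
AVATAR_UPDATE_QUERY = (
    "UNWIND $rows AS row "
    "MATCH (p:Poet {name: row.author}) "
    "SET p.src = row.src"
)


def build_avatar_update(records):
    """Return the query and its parameters, keeping the first src per author."""
    seen = {}
    for rec in records:
        # setdefault keeps the first avatar seen for a repeated author name
        seen.setdefault(rec["author"], rec["src"])
    rows = [{"author": a, "src": s} for a, s in seen.items()]
    return AVATAR_UPDATE_QUERY, {"rows": rows}


# Hypothetical records, shaped like the entries in pom_list
records = [
    {"author": "李白", "src": "https://example.com/img/libai.jpg"},
    {"author": "李白", "src": "https://example.com/img/other.jpg"},
    {"author": "杜甫", "src": "https://example.com/img/dufu.jpg"},
]
query, params = build_avatar_update(records)
print(query)
print(params["rows"])
```

Passing the rows as a parameter rather than formatting them into the query string avoids quoting problems with author names and keeps the update to one round trip.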