Distributed Crawler (5): Scraping Weibo Data

I. Scraping data with Selenium + PhantomJS

1. Login: the most important step is to set the User-Agent; otherwise the page will not follow the redirect link.

from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

# User-Agent string the headless browser will send (abbreviated in the original notes)
user_agent = (
    "Mozilla/5.0()"
)
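The snippet above only defines the string. A minimal sketch of how it might be handed to PhantomJS follows, assuming Selenium 3.x (where `webdriver.PhantomJS` and `DesiredCapabilities.PHANTOMJS` still exist; both were dropped in Selenium 4) and a `phantomjs` binary on the PATH. The helper names `build_phantomjs_capabilities` and `make_driver` are illustrative, not from the original notes.

```python
def build_phantomjs_capabilities(user_agent, base=None):
    """Return a capabilities dict that carries a custom User-Agent.

    Hypothetical helper. `base` defaults to a minimal PhantomJS profile so
    the function can be used without Selenium installed.
    """
    caps = dict(base or {"browserName": "phantomjs"})  # copy, never mutate the input
    # GhostDriver reads page settings from keys with this prefix
    caps["phantomjs.page.settings.userAgent"] = user_agent
    return caps


def make_driver(user_agent):
    """Start PhantomJS with the custom User-Agent (assumes Selenium 3.x)."""
    from selenium import webdriver
    from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

    caps = build_phantomjs_capabilities(user_agent, DesiredCapabilities.PHANTOMJS)
    return webdriver.PhantomJS(desired_capabilities=caps)
```

Building the dict in a separate function keeps the shared `DesiredCapabilities.PHANTOMJS` template unmodified, which matters if several drivers are created in one process.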

2. Enter the username and password:

<input id="loginname"
type="text"
class="W input" maxlength="128"
autocomplete="off"
action-data="text=........"
name="username"
node-type="username" 
tabindex="1">

(1) Interacting with the Weibo page requires JavaScript.

The relevant JavaScript code:

document.getElementById('loginname').value='abc'

document.getElementsByName('password')[0].value='abc'

The values can also be passed with the send_keys method that Selenium provides:

driver.find_element_by_id('loginname').send_keys(username)

driver.find_element_by_name('password').send_keys(password)
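The two approaches above can be wrapped in one function. This is a minimal sketch, assuming a Selenium 3.x driver (the `find_element_by_*` methods used in these notes were removed in Selenium 4) and the `loginname` / `password` locators shown above. The names `build_fill_script` and `fill_login_form` are illustrative, not part of Selenium.

```python
import json


def build_fill_script(username, password):
    """Return JavaScript that fills the two login fields (hypothetical helper)."""
    # json.dumps produces a safely quoted string literal for each value
    return (
        "document.getElementById('loginname').value = %s;\n"
        "document.getElementsByName('password')[0].value = %s;"
        % (json.dumps(username), json.dumps(password))
    )


def fill_login_form(driver, username, password, use_javascript=False):
    """Fill the login form via injected JavaScript or via send_keys."""
    if use_javascript:
        # Run the script inside the page
        driver.execute_script(build_fill_script(username, password))
    else:
        # Simulate typing into each input element
        driver.find_element_by_id('loginname').send_keys(username)
        driver.find_element_by_name('password').send_keys(password)
```

`send_keys` simulates real keystrokes, so any input handlers on the page fire as they would for a user; setting `.value` through JavaScript skips them, which may or may not matter for a given login form.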

II. Weibo API analysis

III. Calling the Weibo API directly to fetch data

IV. Forms and login

posted @ 2018-10-27 15:51 stone1234567890
