form submit

RHTMLForms is not available for recent versions of R. The workaround is to install it from GitHub:

devtools::install_github("omegahat/RHTMLForms")  # requires the devtools package

Submitting the form. The http:// prefix of the URL must not be omitted.

library(RHTMLForms)
u = "http://www.bing.com"
form = getHTMLFormDescription(u)[[1]]  # first form on the page
form

This gives:

HTML Form: http://cn.bing.com/search 
q: 
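To make clearer what such a form description contains, here is a minimal sketch that pulls the same two pieces of information (the form's target and the names of its input fields) out of a page with plain XPath from the XML package. The HTML string is a made-up, simplified stand-in for a search page, not Bing's real markup, and this is not how RHTMLForms works internally.

```r
library(XML)

# Hypothetical, simplified stand-in for a search page (not Bing's real markup).
html <- '<html><body>
  <form action="/search" method="get">
    <input type="text" name="q" value="">
    <input type="submit" name="go" value="Search">
  </form>
</body></html>'

doc <- htmlParse(html, asText = TRUE)

# The form's target and the names of its input fields.
form_action <- xpathSApply(doc, "//form", xmlGetAttr, "action")
input_names <- xpathSApply(doc, "//form//input", xmlGetAttr, "name")

print(form_action)
print(input_names)
```

The q field found here corresponds to the "q:" entry in the form description printed above.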

Create a function that submits the form:

bing_search = createFunction(form)

Now bing_search() can submit any search keyword. Finally, extract the links from the result page with:

library(XML)  # provides getHTMLLinks()
getHTMLLinks(bing_search("rstudio"))

This returns (excerpt):

[36] "http://www.liangchan.net/liangchan/1123.html"
[37] "https://rstudio.org/"
[38] "http://www.microsofttranslator.com/bv.aspx?ref=SERP&br=ro&mkt=zh-CN&dl=zh&lp=EN_ZH-CHS&a=https%3a%2f%2frstudio.org%2f"


Entries [13] through [81], in the middle of the result, are the useful links.
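Rather than picking the useful entries by index, one option is to keep only absolute http/https URLs. The following is a minimal sketch in base R; the links vector is a hypothetical sample of the kind of output getHTMLLinks() returns, not the real result.

```r
# Hypothetical sample of the kind of vector getHTMLLinks() returns.
links <- c(
  "javascript:void(0);",
  "/search?q=rstudio&FORM=TIPEN1",
  "http://www.liangchan.net/liangchan/1123.html",
  "https://rstudio.org/",
  "#"
)

# Keep only absolute http/https URLs; drop relative links and script stubs.
is_absolute <- grepl("^https?://", links)
valid_links <- links[is_absolute]

print(valid_links)
```

This filter is deliberately crude: it also keeps absolute links to the search engine's own pages, so a further filter on the host name may be needed.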

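What if we only want to extract the links we need? Use XPath. The result is more precise, but quite a lot of information is lost as well (how should this be handled?).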

txt = bing_search("rstudio")  # result page returned by the form function
xpq = "//a/@href[starts-with(.,'/search?q=rstudio')]"
getHTMLLinks(txt, xpQuery = xpq)

 

[1] "/search?q=rstudio&qs=ds&intlF=1&FORM=TIPEN1"
[2] "/search?q=rstudio&qs=ds&intlF=&upl=zh-chs&FORM=TIPCN1"
[3] "/search?q=rstudio+%e4%b8%ad%e6%96%87%e4%b9%b1%e7%a0%81&FORM=QSRE1"
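One possible answer to the lost-information question is to loosen the XPath predicate with "or", so that both the related-search links and the absolute result links are kept. This is a minimal sketch on a made-up HTML fragment (not real Bing output); the name xpq_both is introduced here only for illustration.

```r
library(XML)

# Hypothetical fragment standing in for a search result page.
html <- '<html><body>
  <a href="/search?q=rstudio&amp;FORM=QSRE1">related</a>
  <a href="https://rstudio.org/">result</a>
  <a href="/images?q=rstudio">images</a>
  <a href="#">top</a>
</body></html>'
doc <- htmlParse(html, asText = TRUE)

# Keep hrefs that are related searches OR absolute URLs.
xpq_both <- paste0(
  "//a/@href[starts-with(., '/search?q=rstudio') or ",
  "starts-with(., 'http')]"
)
kept <- as.character(getHTMLLinks(doc, xpQuery = xpq_both))

print(kept)
```

The same xpQuery string could be passed to getHTMLLinks() together with the real result page in place of doc.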

posted @ 2015-03-03 20:49  Dearc