VBA学习资料分享-6

从网上抓取数据到EXCEL中是VBA的一个常用之处,今天分享下VBA网抓的一些套路,主要有以下几种:

第一种:msxml2.xmlhttp/Microsoft.XMLHTTP/WinHttp.WinHttpRequest.5.1

With CreateObject("msxml2.xmlhttp")
'With CreateObject("Microsoft.XMLHTTP")
'With CreateObject("WinHttp.WinHttpRequest.5.1")
.Open "GET", "http://xinpi.cs.com.cn/new/", False
.Send
Do Until .ReadyState = 4
DoEvents
Loop
s = .ResponseText
End With

 

第二种:internetexplorer.application

With CreateObject("internetexplorer.application")
.Visible = False
.Navigate "http://xinpi.cs.com.cn/new/"
Do Until .ReadyState = 4
DoEvents
Loop
s = CStr(.Document.All.tags("html")(0).outerHTML)
End With

 

第三种:webbrowser控件

Sub d1()
    Dim dmt As HTMLDocument
    Dim htMent As HTMLCommentElement    'As IHTMLElementCollection
    Dim i As Integer
    Set dmt = UserForm1.WebBrowser1.Document
    Application.ScreenUpdating = False
    Dim Quote As IHTMLElement
    On Error Resume Next
    For i = 0 To dmt.all.Length - 1
        Set htMent = dmt.all(i)
        Vcels1 htMent, i
    Next i
End Sub
Sub Vcels1(htMent As HTMLCommentElement, i)
    On Error Resume Next
    With Sheets(1)
        .Cells(i + 2, "A") = htMent.tagName
        .Cells(i + 2, "B") = TypeName(htMent)
        .Cells(i + 2, "C") = htMent.ID
        .Cells(i + 2, "D") = htMent.Name
        .Cells(i + 2, "E") = htMent.Value
        .Cells(i + 2, "F") = htMent.Text
        .Cells(i + 2, "G") = htMent.innerText
        .Cells(i + 2, "H") = htMent.outerText
        .Cells(i + 2, "I") = htMent.nameProp
        .Cells(i + 2, "J") = htMent.tagUrn
        .Cells(i + 2, "K") = htMent.sourceIndex
        .Cells(i + 2, "L") = htMent.href
        .Columns("A:L").WARPTEXT = False
    End With
End Sub

要先在窗体中插入webbrowser控件,还要引用Microsoft HTML Object Library,窗体部分的代码如下:

Private Sub CommandButton1_Click()
WebBrowser1.Navigate TextBox1.Value
End Sub

Private Sub CommandButton2_Click()
Sheets(1).[a2:zz1000000].ClearContents
Call d1
Sheet1.Columns("E:H").WrapText = False  
End Sub

Private Sub UserForm_Initialize()
TextBox1.Value = "www.baidu.com"
Sheets(1).[a2:zz1000000].ClearContents
End Sub

 

第四种:QueryTable

With ActiveSheet.QueryTables.Add("url;http://data.bank.hexun.com/lccp/jrxp.aspx", Range("a1"))
.WebFormatting = xlWebFormattingNone '不包含格式'
.WebSelectionType = xlSpecifiedTables '指定table模式'
.WebTables = "2" '第2张table'
.Refresh False
End With

 

拿到了网页源代码后,可以利用VBA自带的Split和Replace函数或者正则表达式将需要的部分截取出来,写进数组,或者通过javascript(CreateObject("scriptcontrol"))进行处理,这就根据各种情况进行自定义开发了。这里提供一个用ie对象网抓的稍微复杂的例子:AFP

 

posted @ 2019-05-20 14:17  微时空  阅读(542)  评论(0编辑  收藏  举报