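Getting the Content of the Current Page from Internet Explorer
There are several ways to get the content of the page currently shown in IE; this post describes one of them. The basic idea: when the user clicks on an IE page, read the mouse coordinates, use them to find the handle of the window under the cursor, and then use Win32 calls with that handle to obtain the page content. The code: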
    // Runs on every timer tick. Until a left click has been flagged, the small
    // pointer form follows the cursor; after a click, the IE document under the
    // cursor is read.
    private void timer1_Tick(object sender, EventArgs e)
    {
        lock (currentLock)
        {
            System.Drawing.Point MousePoint = System.Windows.Forms.Form.MousePosition;
            if (_leftClick)
            {
                timer1.Stop();
                _leftClick = false;

                // Screen point -> window handle -> HTMLDocument (null if not an IE page).
                _lastDocument = GetHTMLDocumentFormHwnd(GetPointControl(MousePoint, false));
                if (_lastDocument != null)
                {
                    // _getDocument is maintained by code that is not shown in this post.
                    if (_getDocument)
                    {
                        _getDocument = true;
                        try
                        {
                            string url = _lastDocument.url;
                            string html = _lastDocument.documentElement.outerHTML;
                            string cookie = _lastDocument.cookie;
                            string domain = _lastDocument.domain;

                            var resolveParams = new ResolveParam
                            {
                                Url = new Uri(url),
                                Html = html,
                                PageCookie = cookie,
                                Domain = domain
                            };

                            RequetResove(resolveParams);
                        }
                        catch (Exception ex)
                        {
                            System.Windows.MessageBox.Show(ex.Message);
                            Console.WriteLine(ex.Message);
                            Console.WriteLine(ex.StackTrace);
                        }
                    }
                }
                else
                {
                    new MessageTip().Show("xx", "The current page is not an IE page, or it is shown in a browser that does not use the IE engine, such as Firefox or Sogou. Please open the page in IE.");
                }

                _getDocument = false;
            }
            else
            {
                // No click yet: keep the pointer form just below and to the right of the cursor.
                _pointFrm.Left = MousePoint.X + 10;
                _pointFrm.Top = MousePoint.Y + 10;
            }
        }
    }

The key call in this handler is GetHTMLDocumentFormHwnd(GetPointControl(MousePoint, false)). Taking it apart, the first step gets the handle of the page from the mouse coordinates:
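    // Returns the handle of the child window under screen point p,
    // or IntPtr.Zero if no window is found.
    public static IntPtr GetPointControl(System.Drawing.Point p, bool allControl)
    {
        // Window that contains the screen point.
        IntPtr handle = Win32APIsFull.WindowFromPoint(p);
        if (handle != IntPtr.Zero)
        {
            System.Drawing.Rectangle rect = default(System.Drawing.Rectangle);
            if (Win32APIsFull.GetWindowRect(handle, out rect))
            {
                // Convert the screen point to a point relative to the window's top-left
                // corner. This equals client coordinates only when the window has no
                // border or caption, which is the usual case for child windows.
                return Win32APIsFull.ChildWindowFromPointEx(
                    handle,
                    new System.Drawing.Point(p.X - rect.X, p.Y - rect.Y),
                    allControl ? Win32APIsFull.CWP.ALL : Win32APIsFull.CWP.SKIPINVISIBLE);
            }
        }
        return IntPtr.Zero;
    }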
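The handle found this way is only useful if it belongs to IE's rendering window, whose window class is "Internet Explorer_Server". The original post does not show such a check; the sketch below is one possible way to add it. The names IeWindowCheck, IsIeServerClass and IsIeServerWindow are hypothetical and not part of the original code; GetClassName is the standard user32 function.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

// Hypothetical helper (not from the original post): tells whether a window
// handle belongs to IE's rendering window.
public static class IeWindowCheck
{
    // Window class name of the IE (Trident) document window.
    public const string IeServerClassName = "Internet Explorer_Server";

    // CharSet.Unicode makes the runtime bind to GetClassNameW.
    [DllImport("user32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern int GetClassName(IntPtr hWnd, StringBuilder lpClassName, int nMaxCount);

    // Pure string comparison, kept separate so it can be checked without a real window.
    public static bool IsIeServerClass(string className)
    {
        return string.Equals(className, IeServerClassName, StringComparison.Ordinal);
    }

    // Returns true only for a non-null handle whose class is "Internet Explorer_Server".
    public static bool IsIeServerWindow(IntPtr hwnd)
    {
        if (hwnd == IntPtr.Zero)
        {
            return false;
        }

        // Window class names are at most 256 characters long.
        var buffer = new StringBuilder(256);
        int length = GetClassName(hwnd, buffer, buffer.Capacity);
        return length > 0 && IsIeServerClass(buffer.ToString());
    }
}
```

A caller could test the result of GetPointControl with IeWindowCheck.IsIeServerWindow before sending any message to it, and show the "not an IE page" tip straight away when the check fails.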
Next, get the page content from the handle:
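    // Asks the window for its HTML document object.
    // Returns null if the window is not an IE document window.
    public static HTMLDocument GetHTMLDocumentFormHwnd(IntPtr hwnd)
    {
        // lpdwResult receives a pointer-sized value, so allocate IntPtr.Size bytes
        // (8 in a 64-bit process) rather than a fixed 4.
        IntPtr result = Marshal.AllocHGlobal(IntPtr.Size);
        Object obj = null;
        try
        {
            // AllocHGlobal does not clear memory; zero it so that a failed
            // send is not mistaken for a valid result.
            Marshal.WriteIntPtr(result, IntPtr.Zero);

            // fuFlags = 2 is SMTO_ABORTIFHUNG; wait at most 1000 ms for the reply.
            Console.WriteLine(Win32APIsFull.SendMessageTimeoutA(hwnd, HTML_GETOBJECT_mid, 0, 0, 2, 1000, result));

            // Only the low 32 bits are read here, which matches the int-typed
            // declarations used in this post (a 32-bit process).
            if (Marshal.ReadInt32(result) != 0)
            {
                Console.WriteLine(Win32APIsFull.ObjectFromLresult(Marshal.ReadInt32(result), ref IID_IHTMLDocument, 0, out obj));
            }
        }
        finally
        {
            // Release the buffer even if one of the calls throws.
            Marshal.FreeHGlobal(result);
        }

        return obj as HTMLDocument;
    }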
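How it works, roughly:
A message is sent to the IE window. The reply is a value that refers to an object living in the (unmanaged) IE browser, and from that value the HTMLDocument object is obtained.
This method relies on two Win32 functions: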
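    // Sends a message to a window and waits up to uTimeout milliseconds for the reply.
    // In Win32, wParam, lParam and the result are pointer-sized; the 32-bit types
    // used here match a 32-bit (x86) process.
    [System.Runtime.InteropServices.DllImportAttribute("user32.dll", EntryPoint = "SendMessageTimeoutA")]
    public static extern int SendMessageTimeoutA(
        [InAttribute()] System.IntPtr hWnd,
        uint Msg,
        uint wParam,
        int lParam,
        uint fuFlags,
        uint uTimeout,
        System.IntPtr lpdwResult);

    // Converts the value returned by the message into an interface on the object.
    // lResult and wParam are likewise pointer-sized in Win32.
    [System.Runtime.InteropServices.DllImportAttribute("oleacc.dll", EntryPoint = "ObjectFromLresult")]
    public static extern int ObjectFromLresult(
        int lResult,
        ref Guid riid,
        int wParam,
        [MarshalAs(UnmanagedType.IDispatch), Out] out Object pObject);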
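The listings above use HTML_GETOBJECT_mid and IID_IHTMLDocument without showing where they come from. A common way to define them, sketched here as an assumption since the original definitions are not shown, is to register the "WM_HTML_GETOBJECT" window message and to use the interface ID of IHTMLDocument. The class name HtmlObjectConstants is hypothetical.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical container (not from the original post) for the two values
// that GetHTMLDocumentFormHwnd depends on.
public static class HtmlObjectConstants
{
    // CharSet.Unicode makes the runtime bind to RegisterWindowMessageW.
    [DllImport("user32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern uint RegisterWindowMessage(string lpString);

    // Interface ID of IHTMLDocument. Deliberately a non-readonly field so that
    // it can be passed as "ref" to ObjectFromLresult.
    public static Guid IID_IHTMLDocument = new Guid("626FC520-A41E-11CF-A731-00A0C908260B");

    // The message ID is assigned by the system at run time, so it is looked up
    // lazily on first use instead of being hard-coded.
    private static readonly Lazy<uint> HtmlGetObject =
        new Lazy<uint>(() => RegisterWindowMessage("WM_HTML_GETOBJECT"));

    // Message that asks an IE document window for its document object.
    public static uint HTML_GETOBJECT_mid
    {
        get { return HtmlGetObject.Value; }
    }
}
```

With these in place, HtmlObjectConstants.HTML_GETOBJECT_mid can be passed as the Msg argument of SendMessageTimeoutA, and HtmlObjectConstants.IID_IHTMLDocument as the riid argument of ObjectFromLresult. RegisterWindowMessage returns the same ID for the same string within a session, so IE and the calling program agree on the message number.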