Cross-Platform HTML Parsing Code
Posted on August 27, 2013 by admin
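A while ago I needed to parse HTML and went looking online for a Delphi HTML parser, but found nothing that worked well. They all failed on complex HTML. The most common failure was an HTML string embedded in JavaScript, which broke the parse.
As for the commercial ones, I haven't tried them, so I can't say how good they are.
So I wrote my own, and so far I have not come across any HTML that it fails to parse. I'm now releasing it for everyone to use. Too bad for the few foreign vendors selling theirs.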
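It is interface-based, so object lifetimes are managed automatically and you don't have to worry about freeing anything. I recently added lookup using CSS selector syntax, so you can select nodes the same way a CSS selector does.
It only uses the SysUtils unit, which avoids Classes, a major contributor to executable size in newer Delphi versions. This also makes the code reasonably portable across platforms.
It supports compilers from Delphi 7 up to Delphi XE4.
Because it uses interfaces, in theory it can also be used from C++ and VB if compiled into a DLL.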
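The interface-based lifetime management and CSS selector lookup mentioned above might be used roughly as follows. This is only a sketch under assumptions: the unit name `HtmlParser` is a guess at where `ParserHTML` lives, the selector `'a'` assumes that plain tag selectors are supported, and list indexes are assumed to start at 0. The sample HTML is made up for illustration.

```pascal
program CssSelectorDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils,
  HtmlParser; // assumption: the unit that declares ParserHTML and the interfaces

procedure ListLinks(const Source: WideString);
var
  Root: IHtmlElement;
  Links: IHtmlElementList;
  I: Integer;
begin
  Root := ParserHTML(Source);
  if not Assigned(Root) then
    Exit;

  // Assumption: a bare tag name is a valid selector for SimpleCSSSelector.
  Links := Root.SimpleCSSSelector('a');

  // Assumption: Items is indexed from 0 to Count - 1.
  for I := 0 to Links.Count - 1 do
    Writeln(Links[I].Attributes['href'], ' -> ', Links[I].InnerText);

  // No Free calls: Root and Links are interface references, so they are
  // released automatically when they go out of scope here.
end;

begin
  ListLinks(
    '<html><body>' +
    '<a href="http://example.com/one">One</a>' +
    '<a href="http://example.com/two">Two</a>' +
    '</body></html>');
end.
```

Keeping the interface variables local to a procedure makes the release point easy to see: the references are dropped when the procedure returns.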
The interfaces are declared as follows:
  IHtmlElement = interface
    ['{8C75239C-8CFA-499F-B115-7CEBEDFB421B}']
    function GetOwner: IHtmlElement; stdcall;
    function GetTagName: WideString; safecall;
    function GetContent: WideString; safecall;
    function GetOrignal: WideString; safecall;
    function GetChildrenCount: Integer; stdcall;
    function GetChildren(Index: Integer): IHtmlElement; stdcall;
    function GetCloseTag: IHtmlElement; stdcall;
    function GetInnerHtml(): WideString; safecall;
    function GetOuterHtml(): WideString; safecall;
    function GetInnerText(): WideString; safecall;
    function GetAttributes(Key: WideString): WideString; safecall;
    function GetSourceLineNum(): Integer; stdcall;
    function GetSourceColNum(): Integer; stdcall;
    // Whether the attribute exists
    function HasAttribute(AttributeName: WideString): Boolean; stdcall;
    // Find nodes
    {
      FindElements('Link','type="application/rss+xml"')
      FindElements('','type="application/rss+xml"')
    }
    function FindElements(ATagName: WideString; AAttributes: WideString;
      AOnlyInTopElement: Boolean): IHtmlElementList; stdcall;
    // Find elements using CSS selector syntax
    function SimpleCSSSelector(const selector: WideString): IHtmlElementList; stdcall;
    // Enumerate attributes
    procedure EnumAttributeNames(AParam: Pointer;
      ACallBack: TEnumAttributeNameCallBack); stdcall;

    property TagName: WideString read GetTagName;
    property ChildrenCount: Integer read GetChildrenCount;
    property Children[index: Integer]: IHtmlElement read GetChildren; default;
    property CloseTag: IHtmlElement read GetCloseTag;
    property Content: WideString read GetContent;
    property Orignal: WideString read GetOrignal;
    property Owner: IHtmlElement read GetOwner;
    // Get the element's position in the source code
    property SourceLineNum: Integer read GetSourceLineNum;
    property SourceColNum: Integer read GetSourceColNum;
    //
    property InnerHtml: WideString read GetInnerHtml;
    property OuterHtml: WideString read GetOuterHtml;
    property InnerText: WideString read GetInnerText;
    property Attributes[Key: WideString]: WideString read GetAttributes;
  end;

  IHtmlElementList = interface
    ['{8E1380C6-4263-4BF6-8D10-091A86D8E7D9}']
    function GetCount: Integer; stdcall;
    function GetItems(Index: Integer): IHtmlElement; stdcall;
    property Count: Integer read GetCount;
    property Items[Index: Integer]: IHtmlElement read GetItems; default;
  end;

  function ParserHTML(const Source: WideString): IHtmlElement; stdcall;
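To make the declarations above more concrete, here is a small sketch that walks the parsed tree through `ChildrenCount` and `Children`, then calls `FindElements` with the arguments shown in the declaration's own comment. Several things here are assumptions rather than documented behavior: the unit name `HtmlParser`, 0-based child indexes, and passing `False` for `AOnlyInTopElement` to search the whole subtree. The sample HTML is invented.

```pascal
program TreeWalkDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils,
  HtmlParser; // assumption: the unit that declares ParserHTML and the interfaces

// Print each element's tag name and source position, indented by depth.
procedure DumpTree(const Node: IHtmlElement; Depth: Integer);
var
  I: Integer;
begin
  Writeln(StringOfChar(' ', Depth * 2), Node.TagName,
    ' (line ', Node.SourceLineNum, ', col ', Node.SourceColNum, ')');
  // Assumption: children are indexed from 0 to ChildrenCount - 1.
  for I := 0 to Node.ChildrenCount - 1 do
    DumpTree(Node.Children[I], Depth + 1);
end;

procedure Run;
var
  Root: IHtmlElement;
  Feeds: IHtmlElementList;
  I: Integer;
begin
  Root := ParserHTML(
    '<html><head>' +
    '<link type="application/rss+xml" href="/feed.xml">' +
    '</head><body><p>Hello</p></body></html>');
  if not Assigned(Root) then
    Exit;

  DumpTree(Root, 0);

  // Tag name and attribute filter taken from the comment in the declaration.
  // Assumption: False means "also search nested elements".
  Feeds := Root.FindElements('Link', 'type="application/rss+xml"', False);
  for I := 0 to Feeds.Count - 1 do
    if Feeds[I].HasAttribute('href') then
      Writeln('Feed: ', Feeds[I].Attributes['href']);
end;

begin
  Run;
end.
```

Whether tag-name matching is case-insensitive, and what `TagName` returns for text nodes, is not stated in the post, so the output of this sketch may differ from what you expect.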
Google Code SVN source code:
http://code.google.com/p/delphi-html-parser/