RTF Dom Parser
Download source: Source/Files/xdesigner/RtfDomParser_1.0_source.zip
Introduction
RTF DOM Parser (RDP for short) is an open-source C# library that parses RTF documents and builds an RTF DOM tree. With the DOM tree, .NET programmers can read RTF documents very easily. It is released under the GNU license, version 2.
Background
RTF is a widely used document format. Many applications support it for exchanging text data, and C# programmers sometimes need to read and write RTF documents.
Although RTF is not a very complex format, it is not easy to parse. So I created RDP, which parses an RTF document and generates a DOM tree. The DOM tree is easy to use, so .NET application developers do not need much knowledge of RTF syntax.
RTF format
RTF is not a complex format. An RTF file's content is plain text, and that text contains keywords a parser can recognize. For example, open Windows WordPad, type the characters "abcdefg" without specifying any format, and save the document as an RTF file. If you open that file with Windows Notepad, you will see content like the following:
{\rtf1\ansi\ansicpg936\deff0\deflang1033\deflangfe2052{\fonttbl{\f0\fmodern\fprq6\fcharset134 \'cb\'ce\'cc\'e5;}}{\*\generator Msftedit 5.41.15.1515;}\viewkind4\uc1\pard\lang2052\f0\fs20 abcdefg\par}
This is very simple RTF content. To make it easier to analyze, I indented it as follows:
{\rtf1 \ansi \ansicpg936 \deff0 \deflang1033 \deflangfe2052
    {\fonttbl
        {\f0\fmodern\fprq6\fcharset134 \'cb\'ce\'cc\'e5;}
    }
    {\*\generator Msftedit 5.41.15.1515;}
    \viewkind4\uc1\pard\lang2052\f0\fs20 abcdefg\par
}
RTF content is made of groups, which can be nested. A group starts with "{" and ends with "}". An RTF keyword starts with "\", followed by the keyword name; some keywords also take an integer parameter.
For example, "\ansicpg936" is an RTF keyword: it starts with "\", its name is "ansicpg", and its parameter value is "936". The keyword "\ansi" has the name "ansi" and no parameter.
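As a minimal illustration of the keyword syntax just described, the C# sketch below extracts keyword names and optional integer parameters from a fragment of RTF with a regular expression. `RtfKeywordScanner` and `Scan` are hypothetical names, not part of RDP. The pattern is a simplification: it allows a leading minus sign (the RTF specification permits negative parameters, as in `\li-720`), treats one following space as the keyword's delimiter, and does not account for escaped backslashes.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

public static class RtfKeywordScanner
{
    // "\" + ASCII letters (the name) + optional signed integer (the parameter)
    // + an optional single space, which only delimits the keyword.
    // Simplification: an escaped backslash ("\\") followed by letters would be misread as a keyword.
    private static readonly Regex Keyword = new Regex(@"\\([a-zA-Z]+)(-?[0-9]+)? ?");

    public static List<(string Name, int? Parameter)> Scan(string rtf)
    {
        var keywords = new List<(string Name, int? Parameter)>();
        foreach (Match m in Keyword.Matches(rtf))
        {
            int? parameter = m.Groups[2].Success
                ? int.Parse(m.Groups[2].Value, CultureInfo.InvariantCulture)
                : (int?)null;
            keywords.Add((m.Groups[1].Value, parameter));
        }
        return keywords;
    }
}
```

For the start of the WordPad sample, `Scan(@"{\rtf1\ansi\ansicpg936\deff0")` yields (rtf, 1), (ansi, no parameter), (ansicpg, 936) and (deff, 0). Note that `\rtf1` is itself a keyword named "rtf" with parameter 1.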
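Notice that white space, such as spaces and tabs, can affect how an RTF document renders (for example, a space that does not delimit a keyword is part of the text), so the indented listing above is only for reading; do not indent real RTF code.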
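The hand indentation shown earlier can be produced mechanically. The sketch below is a hypothetical helper, not taken from RDP: it starts each group on a new line indented by its nesting depth, and copies every backslash together with the character after it so that escaped braces such as `\{` are not mistaken for groups. Its output is meant for reading only, and it does not recognize binary data introduced by `\bin`.

```csharp
using System;
using System.Text;

public static class RtfIndenter
{
    // Starts every group ("{") on a new line, indented by its nesting depth.
    // The output is for reading only: the inserted whitespace can change how the document renders.
    // Simplification: binary data (\bin) is not recognized.
    public static string Indent(string rtf, int spacesPerLevel = 4)
    {
        var sb = new StringBuilder();
        int depth = 0;
        for (int i = 0; i < rtf.Length; i++)
        {
            char c = rtf[i];
            if (c == '\\' && i + 1 < rtf.Length)
            {
                // Copy "\" and the next character together, so \{ and \} are not treated as groups.
                sb.Append(c).Append(rtf[++i]);
            }
            else if (c == '{')
            {
                if (sb.Length > 0)
                    sb.Append('\n').Append(' ', depth * spacesPerLevel);
                sb.Append(c);
                depth++;
            }
            else
            {
                if (c == '}' && depth > 0) depth--;
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

For example, `Indent(@"{\rtf1{\fonttbl{\f0 x;}}abc}")` puts `{\fonttbl` on its own line indented by four spaces and `{\f0 x;}}abc}` on the next line indented by eight.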
RDP
Some .NET programmers have to parse or write RTF documents, so I provide RDP. Following the RTF specification version 1.7, it parses an RTF document and generates a DOM tree, which .NET programmers can use to read the document's content. RTF code is not easy to read, but a DOM tree is very easy to use; I hope RDP can save .NET programmers time.
RDP has three parts: RTF DOM, RTF Reader and RTF Writer. RTF DOM is the main part. The following figure describes RDP's structure.
In RTF DOM, RTFDomElement is the base element type; the other document element types, such as bookmark, document and image, derive from it.
RTFDomDocument is the root element for accessing the RTF DOM. It derives from RTFDomElement and represents the whole RTF document.
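To make the element hierarchy concrete, here is a deliberately simplified C# model of such a DOM: an abstract base element with a child list and a parent link, a document type at the root, and two derived element types. The class names (`DomElement`, `DomDocument`, `DomParagraph`, `DomText`) and their members are assumptions for illustration only; they are not RDP's `RTFDomElement`/`RTFDomDocument` API.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical, simplified model; these are not RDP's classes.
public abstract class DomElement
{
    public DomElement Parent { get; private set; }
    public List<DomElement> Elements { get; } = new List<DomElement>();

    // Adds a child and records this element as its parent.
    public T Append<T>(T child) where T : DomElement
    {
        DomElement node = child;
        node.Parent = this;
        Elements.Add(child);
        return child;
    }

    // Depth-first search for all descendants of type T.
    public IEnumerable<T> Descendants<T>() where T : DomElement
    {
        foreach (DomElement child in Elements)
        {
            if (child is T match) yield return match;
            foreach (T nested in child.Descendants<T>()) yield return nested;
        }
    }

    // Plain text of this element: by default, the concatenated text of its children.
    public virtual string InnerText
    {
        get
        {
            var sb = new StringBuilder();
            foreach (DomElement child in Elements) sb.Append(child.InnerText);
            return sb.ToString();
        }
    }
}

// Root of the tree, playing the role RTFDomDocument plays in RDP.
public sealed class DomDocument : DomElement { }

public sealed class DomParagraph : DomElement
{
    public override string InnerText => base.InnerText + "\n";
}

public sealed class DomText : DomElement
{
    public DomText(string text) { Text = text; }
    public string Text { get; }
    public override string InnerText => Text;
}
```

Building a document whose single paragraph holds the texts "abc" and "defg" gives the document an `InnerText` of "abcdefg\n", and `Descendants<DomText>()` finds both text elements. Keeping a common base type means generic code (search, text extraction) works on any element, while derived types add what is specific to them.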
The RTF standard has no table or table-column type, only table rows and cells, and a table row is itself a special kind of paragraph. In RTF DOM, however, I defined table, table row, table column and table cell types to describe the complete table structure, so programmers can read table information easily. Recovering the table structure from RTF code is hard, and implementing this function took a lot of C# code.
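The article does not show how RDP rebuilds tables, so the following is only one possible approach, sketched under stated assumptions. It assumes each paragraph has already been classified as either a plain paragraph or a table row carrying its `\cellxN` values (the right edge of each cell, in twips). Consecutive rows are grouped into one table, and the sorted union of all cell edges in that table becomes its column boundaries. `Block`, `Table` and `TableBuilder` are hypothetical names, and nested tables and merged cells are ignored.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical input: a plain paragraph (CellRights == null) or a table row whose
// CellRights hold its \cellxN values, i.e. each cell's right edge in twips.
public sealed class Block
{
    private Block(int[] cellRights) { CellRights = cellRights; }
    public int[] CellRights { get; }
    public bool IsRow => CellRights != null;
    public static Block Paragraph() => new Block(null);
    public static Block Row(params int[] cellRights) => new Block(cellRights);
}

public sealed class Table
{
    public List<Block> Rows { get; } = new List<Block>();
    public List<int> ColumnRights { get; } = new List<int>(); // sorted, distinct right edges
}

public static class TableBuilder
{
    public static List<Table> Build(IEnumerable<Block> blocks)
    {
        var tables = new List<Table>();
        Table current = null;
        foreach (Block block in blocks)
        {
            if (!block.IsRow) { current = null; continue; }   // a plain paragraph ends the table
            if (current == null) { current = new Table(); tables.Add(current); }
            current.Rows.Add(block);
        }
        foreach (Table table in tables)
        {
            // Every distinct cell edge found in any row becomes a column boundary.
            var edges = new SortedSet<int>();
            foreach (Block row in table.Rows) edges.UnionWith(row.CellRights);
            table.ColumnRights.AddRange(edges);
        }
        return tables;
    }
}
```

With rows whose edges are {1000, 2000} and {1000, 1500, 2000}, the table gets three columns ending at 1000, 1500 and 2000, so the second cell of the first row (from 1000 to 2000) spans two columns.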
The RTF Reader part is a base module that reads native RTF source code. It works in read-only mode: it reads RTF code and generates RTF nodes, which reduces the RTF DOM's workload. Because RTFReader uses a read-only stream mode, RDP can quickly parse large RTF files of more than 100 MB.
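The read-only stream idea can be sketched as a lazy, forward-only tokenizer over a `TextReader`: each step of the enumerator reads just enough characters to produce the next group, keyword or text token, so a large file can be processed without loading it into memory. This is an illustrative sketch, not RDP's `RTFReader`. The token types are hypothetical, `\'hh` escapes are reported as a keyword named `'` whose parameter is the byte value, and the reader is assumed to support `Peek` (as `StreamReader` and `StringReader` do).

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Text;

public enum RtfTokenKind { GroupStart, GroupEnd, Keyword, Text }

// Hypothetical token type for this sketch; not RDP's node class.
public sealed class RtfToken
{
    public RtfToken(RtfTokenKind kind, string value = null, int? parameter = null)
    {
        Kind = kind; Value = value; Parameter = parameter;
    }
    public RtfTokenKind Kind { get; }
    public string Value { get; }      // keyword name or text
    public int? Parameter { get; }    // keyword parameter, if any
}

public static class RtfTokenStream
{
    // Forward-only and lazy: only the current token is buffered, so memory use
    // does not grow with the size of the input. Malformed \'hh escapes throw.
    public static IEnumerable<RtfToken> Read(TextReader reader)
    {
        var text = new StringBuilder();
        int c;
        while ((c = reader.Read()) != -1)
        {
            char ch = (char)c;
            if (ch == '\r' || ch == '\n') continue;               // bare line breaks are ignored in RTF
            if (ch == '{' || ch == '}')
            {
                if (text.Length > 0) yield return Flush(text);
                yield return new RtfToken(ch == '{' ? RtfTokenKind.GroupStart : RtfTokenKind.GroupEnd);
                continue;
            }
            if (ch != '\\') { text.Append(ch); continue; }

            int n = reader.Read();
            if (n == -1) break;
            char next = (char)n;
            if (next == '\\' || next == '{' || next == '}') { text.Append(next); continue; } // escaped literal
            if (text.Length > 0) yield return Flush(text);
            if (next == '\r' || next == '\n') { yield return new RtfToken(RtfTokenKind.Keyword, "par"); continue; }

            if (!IsLetter(next))
            {
                // Control symbol such as \* or \'cb (the latter carries one byte in hex).
                if (next == '\'')
                {
                    string hex = new string(new[] { (char)reader.Read(), (char)reader.Read() });
                    yield return new RtfToken(RtfTokenKind.Keyword, "'", Convert.ToInt32(hex, 16));
                }
                else yield return new RtfToken(RtfTokenKind.Keyword, next.ToString());
                continue;
            }

            // Control word: letters, an optional signed integer, then an optional space delimiter.
            var name = new StringBuilder().Append(next);
            while (IsLetter(reader.Peek())) name.Append((char)reader.Read());
            var digits = new StringBuilder();
            if (reader.Peek() == '-') digits.Append((char)reader.Read());
            while (reader.Peek() >= '0' && reader.Peek() <= '9') digits.Append((char)reader.Read());

            int? parameter = null;
            bool loneHyphen = digits.Length == 1 && digits[0] == '-';
            if (loneHyphen) text.Append('-');                     // '-' without digits is ordinary text
            else if (digits.Length > 0) parameter = int.Parse(digits.ToString(), CultureInfo.InvariantCulture);
            if (!loneHyphen && reader.Peek() == ' ') reader.Read(); // the delimiter is not part of the text
            yield return new RtfToken(RtfTokenKind.Keyword, name.ToString(), parameter);
        }
        if (text.Length > 0) yield return Flush(text);
    }

    private static bool IsLetter(int c) => (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');

    private static RtfToken Flush(StringBuilder text)
    {
        var token = new RtfToken(RtfTokenKind.Text, text.ToString());
        text.Clear();
        return token;
    }
}
```

For example, `{\rtf1\ansi abc\par}` produces six tokens: group start, `rtf` with parameter 1, `ansi`, the text "abc", `par`, and group end. In a real program the reader would be a `StreamReader` over the file, and the caller would consume tokens in a `foreach` loop.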
RTF Writer is used to generate RTF source code. The current version of RDP is 1.0, the first version, so RTF Writer is not yet powerful. This part includes only the RTFWriter type; with it, .NET programmers can generate RTF source without obvious syntax errors.
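Two rules go a long way toward output without syntax errors: keep every group balanced, and escape the characters RTF treats specially. The hypothetical `MiniRtfWriter` below (not RDP's `RTFWriter`) is a sketch that enforces both. It refuses to close a group that is not open or to return unbalanced output, escapes `\`, `{` and `}` in text, and writes non-ASCII characters as `\uN` followed by a `?` fallback character, where N is the character code as a signed 16-bit decimal number (one fallback character matches the default `\uc1`).

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical minimal writer, not RDP's RTFWriter: it keeps groups balanced and escapes text.
public sealed class MiniRtfWriter
{
    private readonly StringBuilder _sb = new StringBuilder();
    private int _depth;

    public MiniRtfWriter StartGroup() { _sb.Append('{'); _depth++; return this; }

    public MiniRtfWriter EndGroup()
    {
        if (_depth == 0) throw new InvalidOperationException("No open group to close.");
        _sb.Append('}');
        _depth--;
        return this;
    }

    // Writes "\name" or "\nameN" plus a space, which delimits the keyword from what follows.
    public MiniRtfWriter Keyword(string name, int? parameter = null)
    {
        _sb.Append('\\').Append(name);
        if (parameter.HasValue) _sb.Append(parameter.Value.ToString(CultureInfo.InvariantCulture));
        _sb.Append(' ');
        return this;
    }

    // Escapes \ { } and writes non-ASCII characters as \uN? (N is the signed 16-bit code).
    public MiniRtfWriter Text(string text)
    {
        foreach (char c in text)
        {
            if (c == '\\' || c == '{' || c == '}') _sb.Append('\\').Append(c);
            else if (c > 127) _sb.Append("\\u").Append(((short)c).ToString(CultureInfo.InvariantCulture)).Append('?');
            else _sb.Append(c);
        }
        return this;
    }

    // Returns the RTF text, refusing to hand out a document with unclosed groups.
    public string GetRtf()
    {
        if (_depth != 0) throw new InvalidOperationException("Unclosed group.");
        return _sb.ToString();
    }
}
```

Chaining `StartGroup().Keyword("rtf", 1).Keyword("ansi").Text("a{b}").Keyword("par").EndGroup().GetRtf()` yields `{\rtf1 \ansi a\{b\}\par }`; RTF readers consume the space after each keyword as its delimiter, so it does not appear in the document text.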
RDP is easy to use; the following C# code parses an RTF file:
// rtfFileName holds the path of the RTF file to parse
XDesigner.RTF.RTFDomDocument doc = new XDesigner.RTF.RTFDomDocument();
doc.Load(rtfFileName);
After these two lines of C# code, you can read data from the RTF document through the doc variable.
For example, you can use "doc.Info.Title" to get the document's title, and enumerate "doc.Elements" to get the parts you need.
Posted on 2010-10-13 20:04 by 袁永福 (Yuan Yongfu) in Electronic Medical Records, Healthcare IT