iTextSharp, with notes on parsing PDF files

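1. PdfObject: a PDF object. There are nine kinds when classified by what the object contains (this count treats integers and reals separately; the PDF specification groups them as numbers and lists eight basic types). Classified by how they are used, objects are either indirect or direct. Indirect objects are the most common kind in a PDF: every object in the file's object collection is an indirect object, it is referenced from elsewhere with the R keyword, and the entries in the cross-reference table all point to indirect objects. Direct objects are simpler: any of the nine kinds written inline, on its own, is a direct object.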

PdfObject pdfObject = this.reader.GetPdfObject(index);

Public methods and properties of PdfObject:

        public PRIndirectReference IndRef { get; set; }
        public int Length { get; }
        public int Type { get; }

        public bool CanBeInObjStm();
        public int CompareTo(PdfObject obj);
        public virtual byte[] GetBytes();
        public override int GetHashCode();
        public bool IsArray();
        public bool IsBoolean();
        public bool IsDictionary();
        public bool IsIndirect();
        public bool IsName();
        public bool IsNull();
        public bool IsNumber();
        public bool IsStream();
        public bool IsString();
        public virtual void ToPdf(PdfWriter writer, Stream os);
        public override string ToString();
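The type tests listed above can be combined with `reader.GetPdfObject(index)` to walk every entry of the cross-reference table and report what kind of object each one holds. The following is a minimal sketch, assuming iTextSharp 5.x is referenced and that the path to a PDF is passed as the first command-line argument; the class name `ObjectWalker` and the helper `Describe` are hypothetical names used only for illustration.

```csharp
using System;
using iTextSharp.text.pdf;

class ObjectWalker
{
    // Hypothetical helper: maps a PdfObject to a short label
    // using only the Is*() tests from the member list above.
    static string Describe(PdfObject obj)
    {
        if (obj.IsStream()) return "stream";
        if (obj.IsDictionary()) return "dictionary";
        if (obj.IsArray()) return "array";
        if (obj.IsName()) return "name";
        if (obj.IsString()) return "string";
        if (obj.IsNumber()) return "number";
        if (obj.IsBoolean()) return "boolean";
        if (obj.IsNull()) return "null";
        return "other";
    }

    static void Main(string[] args)
    {
        // args[0]: path to an existing PDF, supplied by the caller (assumption).
        PdfReader reader = new PdfReader(args[0]);
        try
        {
            // XrefSize is the number of entries in the cross-reference table.
            for (int index = 0; index < reader.XrefSize; index++)
            {
                PdfObject obj = reader.GetPdfObject(index);
                if (obj == null) continue; // free or unused entry
                Console.WriteLine("{0}: {1}", index, Describe(obj));
            }
        }
        finally
        {
            reader.Close();
        }
    }
}
```

The stream test comes first because a stream is a dictionary followed by data, so checking it early keeps the labels unambiguous regardless of how the library classifies it.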

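2. PdfName: an object that can be used as a name in a PDF file. The class also predefines instances for the commonly used (standard) names. A PdfName is a PdfObject created from a name string, usually just called a Name object. PdfName inherits from PdfObject.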

Predefined PdfName instances (excerpt from the class):

        /** A name */
        public static readonly PdfName IDENTITY = new PdfName("Identity");
        /** A name */
        public static readonly PdfName IF = new PdfName("IF");
        /** A name */
        public static readonly PdfName IMAGE = new PdfName("Image");
        /** A name */
        public static readonly PdfName IMAGEB = new PdfName("ImageB");
        /** A name */
        public static readonly PdfName IMAGEC = new PdfName("ImageC");
        /** A name */
        public static readonly PdfName IMAGEI = new PdfName("ImageI");
        /** A name */
        public static readonly PdfName IMAGEMASK = new PdfName("ImageMask");
        /** A name */
        public static readonly PdfName INCLUDE = new PdfName("Include");
        public static readonly PdfName IND = new PdfName("Ind");
        /** A name */
        public static readonly PdfName INDEX = new PdfName("Index");
        /** A name */
        public static readonly PdfName INDEXED = new PdfName("Indexed");
        /** A name */
        public static readonly PdfName INFO = new PdfName("Info");
        /** A name */
        public static readonly PdfName INK = new PdfName("Ink");
        /** A name */
        public static readonly PdfName INKLIST = new PdfName("InkList");

Because PdfName inherits from PdfObject, the public methods and properties of PdfObject can also be called on a PdfName.
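As a small illustration of the inheritance described above, a PdfName created by hand can be compared with one of the predefined constants and queried through members that come from PdfObject. This is a sketch under the assumption that iTextSharp 5.x is referenced; `NameDemo` is a hypothetical class name.

```csharp
using System;
using iTextSharp.text.pdf;

class NameDemo
{
    static void Main()
    {
        // A name built from a string; the library adds the leading slash itself.
        PdfName custom = new PdfName("Info");

        // Comparison is by name content, so this is expected to match
        // the predefined constant rather than compare references.
        Console.WriteLine(custom.Equals(PdfName.INFO));

        // Members inherited from PdfObject are available on a PdfName.
        Console.WriteLine(PdfName.INFO.IsName());
        Console.WriteLine(PdfName.INFO.ToString());
    }
}
```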

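3. PdfNameTree: the helper for PDF name trees, that is, the tree structures in a document that map string names to objects (named destinations, for example).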
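One place a name tree typically shows up is the `/Dests` entry under the catalog's `/Names` dictionary, which maps destination names to targets. The sketch below reads that tree with `PdfNameTree.ReadTree`. It assumes iTextSharp 5.x, where `ReadTree` returns a `Dictionary<string, PdfObject>` (older releases may return a different collection type), and a PDF path passed as the first argument; `NameTreeDemo` is a hypothetical class name. Many files have no such tree, which the code handles.

```csharp
using System;
using System.Collections.Generic;
using iTextSharp.text.pdf;

class NameTreeDemo
{
    static void Main(string[] args)
    {
        // args[0]: path to an existing PDF, supplied by the caller (assumption).
        PdfReader reader = new PdfReader(args[0]);
        try
        {
            // /Names in the catalog holds the document's name trees, if any.
            PdfDictionary names = reader.Catalog.GetAsDict(PdfName.NAMES);
            PdfDictionary dests = names == null ? null : names.GetAsDict(PdfName.DESTS);
            if (dests == null)
            {
                Console.WriteLine("No /Dests name tree in this file.");
                return;
            }

            // ReadTree flattens the tree into name/object pairs.
            foreach (KeyValuePair<string, PdfObject> entry in PdfNameTree.ReadTree(dests))
            {
                Console.WriteLine("{0} -> {1}", entry.Key, entry.Value);
            }
        }
        finally
        {
            reader.Close();
        }
    }
}
```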

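4. Dictionary
A sequence of entries enclosed in "<<" and ">>". Each entry consists of a key and a value. The key must be a name object, and keys are unique within one dictionary. The value can be any valid PDF object, including another dictionary.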
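The rules above (name keys, unique keys, arbitrary values) map directly onto iTextSharp's `PdfDictionary`. The following is a minimal sketch, assuming iTextSharp 5.x; `DictionaryDemo` is a hypothetical class name and the chosen entries are arbitrary examples, not taken from a real file.

```csharp
using System;
using iTextSharp.text.pdf;

class DictionaryDemo
{
    static void Main()
    {
        PdfDictionary dict = new PdfDictionary();

        // Keys are always PdfName objects; values can be any PdfObject.
        dict.Put(PdfName.TYPE, PdfName.PAGE);
        dict.Put(PdfName.ROTATE, new PdfNumber(90));

        // Putting the same key again replaces the earlier value,
        // which is how key uniqueness is preserved.
        dict.Put(PdfName.ROTATE, new PdfNumber(180));

        // A value may itself be a dictionary.
        dict.Put(PdfName.RESOURCES, new PdfDictionary());

        foreach (PdfName key in dict.Keys)
        {
            Console.WriteLine("{0} = {1}", key, dict.Get(key));
        }

        // Three distinct keys were stored, so the size should be three.
        Console.WriteLine(dict.Size);
    }
}
```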

5. Resources: a Dictionary object

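(Required) Records all the resources used by the current page. If the page uses no resources, this is an empty dictionary. Omitting the entry entirely means the resources are inherited from an ancestor node in the page tree.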
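To see what a Resources dictionary holds in practice, the page dictionary can be fetched with `GetPageN` and its `/Resources` entry inspected. The sketch below lists the font resource names for each page. It assumes iTextSharp 5.x and a PDF path passed as the first argument; `ResourcesDemo` is a hypothetical class name. Listing only the `/Font` sub-dictionary is a choice made for brevity; other resource categories can be read the same way.

```csharp
using System;
using iTextSharp.text.pdf;

class ResourcesDemo
{
    static void Main(string[] args)
    {
        // args[0]: path to an existing PDF, supplied by the caller (assumption).
        PdfReader reader = new PdfReader(args[0]);
        try
        {
            // Page numbers are 1-based.
            for (int pageNum = 1; pageNum <= reader.NumberOfPages; pageNum++)
            {
                PdfDictionary page = reader.GetPageN(pageNum);
                PdfDictionary resources = page.GetAsDict(PdfName.RESOURCES);
                if (resources == null)
                {
                    Console.WriteLine("Page {0}: no /Resources entry", pageNum);
                    continue;
                }

                // /Font maps resource names (such as /F1) to font objects.
                PdfDictionary fonts = resources.GetAsDict(PdfName.FONT);
                if (fonts == null)
                {
                    Console.WriteLine("Page {0}: no fonts listed", pageNum);
                    continue;
                }

                foreach (PdfName fontName in fonts.Keys)
                {
                    Console.WriteLine("Page {0}: font resource {1}", pageNum, fontName);
                }
            }
        }
        finally
        {
            reader.Close();
        }
    }
}
```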


Posted on 2013-12-16 14:45 by NLazyo