PdfiumViewer component extension (Pdfium.Net.Free): deleting or editing PDF content

Project links:

Pdfium.Net: https://github.com/1000374/Pdfium.Net.Free

PdfiumViewer: https://github.com/1000374/PdfiumViewer

Pdfium.Net.Free supports:

  • .NET Framework 4.0

  • .NET Framework 4.5

  • .NET Standard 2.0

  • .NET 8.0

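It can be used together with PdfiumViewer.Free to preview and edit PDFs, or you can reference Pdfium.Net.Free directly to work with PDFs. Pdfium.Net.Free wraps the existing Pdfium functions and implements some of the PDF operations; the remaining features will follow later.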

To delete or edit content in a PDF, first get the object inside the PDF that you want to modify or delete. No edit to a page takes effect until you call GenerateContent.

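To get all objects in a PDF:

The returned information includes each object's index, its text and font information (if the object is text), and its position.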

var pathPdf = "./Pdfium.NetTests/resources/fontText.pdf";
using (var doc = PdfDocument.Load(new MemoryStream(File.ReadAllBytes(pathPdf))))
{
    var page0 = doc.Pages[0];
    var infos = page0.GetCharacterInformation();
}

 

If the above does not meet your needs, use the following example to get the objects:

//How to know "GetObject" index?
var pathPdf = "./Pdfium.NetTests/resources/fontText.pdf";
using (var doc = PdfDocument.Load(new MemoryStream(File.ReadAllBytes(pathPdf))))
{
    var arr = "武则天".ToCharArray();
    var page0 = doc.Pages[0];
    var count = page0.GetObjectsCount();
    for (int i = 0; i < count; i++)
    {
        var obj = page0.GetObject(i); // use the loop index; GetObject(0) would read the first object every time
        if (!obj.IsNull)
        {
            var objType = obj.PageObjGetObjType();
            switch (objType)
            {
                case FpdfPageObj.FPDF_PAGEOBJ_UNKNOWN:
                    {
 
                    }
                    break;
                case FpdfPageObj.FPDF_PAGEOBJ_TEXT:
                    {
                        var txt = obj.TextObjGetText(page0.PageText);
                    }
                    break;
                case FpdfPageObj.FPDF_PAGEOBJ_PATH:
                    {
                        var res = obj.PathMoveTo(10, 20);
                    }
                    break;
                case FpdfPageObj.FPDF_PAGEOBJ_IMAGE:
                    {
                        /* Matrix: | a, c, e| ==> | width, 0,      offsetX|
                         *         | b, d, f|     | 0,     height, offsetY|*/
                        var bitmap = obj.ImageObjGetBitmap();
                        if (!bitmap.IsNull)
                        {
                            // Read the bitmap size only after the null check.
                            var res = obj.ImageObjSetMatrix(bitmap.Width, 0.1, 0, bitmap.Height, 100, 100);
                            //There is a feature request discussing this: https://crbug.com/pdfium/1930 (disclaimer: I'm the reporter)
                            //TLDR The functions you mention do provide the main data stream, but for some filters complementary data would be needed to actually re-construct the image, which pdfium does not provide.

                            //For CCITTDecode, as the TIFF format can use, pdfium's public API does not tell the CCITT group, but this would be needed to re-construct the TIFF header, which the PDF format strips. And I think BlackIs1 info would also be needed; possibly more.
                            //JBIG2Decode may optionally use a separate JBIG2Globals stream, which again pdfium does not provide. I had filed a separate bug about this: https://crbug.com/pdfium/1927. However, I guess the raw JBIG2 data might not be very useful except for re-insertion into a PDF. IIRC the way pikepdf handles JBIG2 extraction to files is to just decode the data and re-encode to some other format. From a programmatic POV that's not ideal, but I guess the context is that standalone JBIG2 isn't really supported by end-user apps.
                            //Concerning FPDFImageObj_GetImageDataDecoded(), note that it does not fully decode images; it only applies "simple" filters (see https://crbug.com/pdfium/1203#c7), so the function name is a bit misleading.

                            //For the plain pixel data, you can use FPDFImageObj_GetBitmap(), FPDFBitmap_GetBuffer() & co, but note that FPDF_BITMAP is limited in supported pixel formats and bit depth (e.g. no CMYK, B/W, >8bpc RGB(A), ...).
                        }
                    }
                    break;
                case FpdfPageObj.FPDF_PAGEOBJ_SHADING:
                    {
 
                    }
                    break;
                case FpdfPageObj.FPDF_PAGEOBJ_FORM:
                    {
 
                    }
                    break;
            }
        }
    }
    page0.GenerateContent();
}
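
The "Matrix" comment in the image branch above is easier to follow with numbers. The sketch below is plain C# and does not call Pdfium.Net.Free at all; it only applies the standard PDF affine transform x' = a*x + c*y + e, y' = b*x + d*y + f to the corners of the unit square that an image occupies before its matrix is applied. The class name, the method name and the 200x100 size are made up for illustration.

```csharp
using System;

public static class PdfMatrixDemo
{
    // Applies the PDF affine matrix (a, b, c, d, e, f) to the point (x, y):
    //   x' = a*x + c*y + e
    //   y' = b*x + d*y + f
    public static double[] Apply(double a, double b, double c, double d,
                                 double e, double f, double x, double y)
    {
        return new[] { a * x + c * y + e, b * x + d * y + f };
    }

    public static void Main()
    {
        // Hypothetical values: a 200x100 image placed at offset (100, 100).
        double width = 200, height = 100, offsetX = 100, offsetY = 100;

        // Corners of the unit square before the transform.
        double[][] corners =
        {
            new[] { 0.0, 0.0 }, new[] { 1.0, 0.0 },
            new[] { 0.0, 1.0 }, new[] { 1.0, 1.0 }
        };

        foreach (var p in corners)
        {
            // a = width, b = 0, c = 0, d = height, e = offsetX, f = offsetY
            var q = Apply(width, 0, 0, height, offsetX, offsetY, p[0], p[1]);
            Console.WriteLine("({0}, {1}) -> ({2}, {3})", p[0], p[1], q[0], q[1]);
        }
    }
}
```

With these values the corner (0, 0) lands on (100, 100) and the corner (1, 1) lands on (300, 200), so a and d act as the drawn width and height while e and f move the lower-left corner. A non-zero b or c, such as the 0.1 passed in the example above, adds a shear or rotation component.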

  

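Code example for editing an object (text). In testing, replacing Chinese text sometimes produces garbled characters, and I have not yet found a way to set the font. As a workaround, you can delete the text first and then add it again.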

var pathPdf = "./Pdfium.NetTests/resources/hello_world.pdf";
using (var doc = PdfDocument.Load(new MemoryStream(File.ReadAllBytes(pathPdf))))
{
    //var fontPath = @"c:\Windows\fonts\simhei.ttf";
    //doc.LoadFont(fontPath);
    var page0 = doc.Pages[0];
    var obj = page0.GetObject(0);
    if (!obj.IsNull)
    {
        var objType = obj.PageObjGetObjType();
        switch (objType)
        {
            case FpdfPageObj.FPDF_PAGEOBJ_TEXT:
                {
                    var txt = obj.TextObjGetText(page0.PageText);
                    var res = obj.TextSetText("Changed for SetText test");
                }
                break;
        }
        // The edit only takes effect after GenerateContent is called.
        page0.GenerateContent();
        doc.Save("./Pdfium.NetTests/TextObjFontChange.pdf");
    }
}

 

Deleting an object:

var pathPdf = "./Pdfium.NetTests/resources/fontText.pdf";
using (var doc = PdfDocument.Load(new MemoryStream(File.ReadAllBytes(pathPdf))))
{
    var page0 = doc.Pages[0];
    var obj = page0.GetObject(0);
    if (!obj.IsNull)
    {
        var res = page0.RemoveObject(obj);
        page0.GenerateContent();
        doc.Save("./Pdfium.NetTests/TextObjFont.pdf");
    }
}
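
The example above removes whatever object sits at index 0. A rough sketch of removing text objects by their content instead is given below, using only the calls that already appear in this post. Several things here are assumptions rather than confirmed behavior: that TextObjGetText returns a string, that object indexes shift down after RemoveObject (which is why the loop runs backwards), the search text "武则天", and the output file name. The using directive for Pdfium.Net.Free is left out because the post does not show the namespace.

```csharp
using System;
using System.IO;
// Also add the using directive for Pdfium.Net.Free; its namespace is not shown in this post.

var pathPdf = "./Pdfium.NetTests/resources/fontText.pdf";
var needle = "武则天"; // hypothetical search text
using (var doc = PdfDocument.Load(new MemoryStream(File.ReadAllBytes(pathPdf))))
{
    var page0 = doc.Pages[0];
    var removed = 0;

    // Walk backwards so that removing an object never shifts an index
    // that has not been visited yet (assumed behavior).
    for (int i = page0.GetObjectsCount() - 1; i >= 0; i--)
    {
        var obj = page0.GetObject(i);
        if (obj.IsNull || obj.PageObjGetObjType() != FpdfPageObj.FPDF_PAGEOBJ_TEXT)
            continue;

        // Assumption: TextObjGetText returns the object's text as a string.
        var txt = obj.TextObjGetText(page0.PageText);
        if (txt != null && txt.Contains(needle))
        {
            Console.WriteLine("Removing text object at index " + i);
            page0.RemoveObject(obj);
            removed++;
        }
    }

    if (removed > 0)
    {
        // Removal only takes effect after GenerateContent is called.
        page0.GenerateContent();
        doc.Save("./Pdfium.NetTests/TextObjRemoved.pdf"); // hypothetical output path
    }
}
```

Printing the index before removal is also one way to find out which GetObject index holds a given piece of text.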

  

 

 

 

Posted by 小树禾小央