Searching document content (pdf, doc, txt) with Lucene.Net

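        // Requires: using System; using System.IO; using Lucene.Net.Documents; using Lucene.Net.Index;
        public static void AddTxtDocument(string path, IndexWriter writer)
        {
            // Get the plain text first: PDFs go through PDFBox, everything else is read as a text file.
            // Note: binary formats such as .doc are not plain text and need their own extractor.
            string content;
            if (string.Equals(Path.GetExtension(path), ".pdf", StringComparison.OrdinalIgnoreCase))
            {
                content = pdfToTxt(path);
            }
            else
            {
                // Encoding.Default is the system ANSI code page; change it if the files use another encoding.
                using (StreamReader sr = new StreamReader(path, System.Text.Encoding.Default))
                {
                    content = sr.ReadToEnd();
                }
            }

            Document doc = new Document();
            doc.Add(new Field(CONTENT_KEY_NAME, content, Field.Store.NO, Field.Index.ANALYZED)); // content: searchable, not stored
            doc.Add(new Field(TITLE, Path.GetFileNameWithoutExtension(path), Field.Store.YES, Field.Index.ANALYZED)); // title
            doc.Add(new Field(FILE_KEY_NAME, path, Field.Store.YES, Field.Index.NO)); // file path: stored only
            doc.Add(new Field(CREATEDATE, new FileInfo(path).LastWriteTime.ToString(), Field.Store.YES, Field.Index.NO)); // last write time of the file
            writer.AddDocument(doc);
        }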

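        // Extracts the text of a PDF with PDFBox.
        private static string pdfToTxt(string pdffile)
        {
            PDDocument doc = PDDocument.load(pdffile);
            try
            {
                PDFTextStripper pdfStripper = new PDFTextStripper();
                return pdfStripper.getText(doc);
            }
            finally
            {
                // Close the document even if text extraction fails.
                doc.close();
            }
        }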

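Before Lucene.Net can search the content of a PDF file, the text has to be read out of the PDF. That means a conversion step, and this is where PDFBox comes in. There are other ways to do it as well (for example, running an existing exe), and plenty of approaches can be found online. As long as the PDF content can be turned into a string, Lucene.Net can index and search it.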

To use PDFBox you need to:

1. Download the PDFBox DLLs

2. Then import the following two namespaces:

using org.pdfbox.pdmodel;
using org.pdfbox.util;
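
As for the other route mentioned above, running an existing exe, here is a minimal sketch of how that might look. It assumes the `pdftotext` command-line tool (from Xpdf/Poppler) is installed. The original post does not name a tool, so the tool choice, the `PdfTextExtractor` class and the `exePath` parameter are all assumptions made for illustration.

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class PdfTextExtractor
{
    // Builds the pdftotext command line: UTF-8 output, text written to stdout ("-").
    public static string BuildArguments(string pdfPath)
    {
        return "-enc UTF-8 \"" + pdfPath + "\" -";
    }

    // Runs the external tool and returns everything it wrote to stdout.
    // exePath is an assumption: point it at wherever pdftotext is installed.
    public static string ExtractText(string exePath, string pdfPath)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = exePath,
            Arguments = BuildArguments(pdfPath),
            UseShellExecute = false,          // required for stream redirection
            RedirectStandardOutput = true,
            StandardOutputEncoding = Encoding.UTF8,
            CreateNoWindow = true
        };

        using (Process process = Process.Start(startInfo))
        {
            // Read stdout before waiting, so a full pipe cannot block the child process.
            string text = process.StandardOutput.ReadToEnd();
            process.WaitForExit();
            if (process.ExitCode != 0)
            {
                throw new InvalidOperationException("pdftotext exited with code " + process.ExitCode);
            }
            return text;
        }
    }
}
```

With something like this in place, the `.pdf` branch of `AddTxtDocument` could call `PdfTextExtractor.ExtractText(...)` instead of `pdfToTxt(path)`. The trade-off is one extra process per file in exchange for not referencing the PDFBox DLLs.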

posted on 2013-06-04 15:26 by NLazyo
