C# OCR with Tesseract: recognize Chinese, English, Japanese, Korean and any other language that has a Tesseract language pack

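TesseractEngine ocr;
ocr = new TesseractEngine("./tessdata", "chi_sim");//set the language: Simplified Chinese
//ocr = new TesseractEngine("./tessdata", "eng", EngineMode.TesseractAndCube);//set the language: English (EngineMode.TesseractAndCube exists only in Tesseract 3.x)
//ocr = new TesseractEngine("./tessdata", "jpn");//set the language: Japanese
//ocr = new TesseractEngine("./tessdata", "chi_sim+eng");//several languages at once, joined with "+"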

Source code download: https://download.csdn.net/download/horseroll/10739546 (the download bundles some language packs, so the file is fairly large)

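I tested recognition of Chinese, English and Japanese text. Other languages work too, as long as you download the corresponding language pack; the setup and usage are explained below.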

1. First, search for Tesseract in NuGet and install it into your project.

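2. Download the language packs you need and put them in the Debug/tessdata folder. The relative path "./tessdata" in the code is resolved against the working directory, which is the Debug output folder when you run the project from Visual Studio. The language-pack version should match the Tesseract version wrapped by your NuGet package: the link below points to the 3.02 data, while the 4.x packages expect 4.x traineddata files. Tesseract language pack downloads: https://github.com/tesseract-ocr/tesseract/wiki/Data-Files#data-files-for-version-302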
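
If a language pack is missing or sits in the wrong folder, creating the TesseractEngine typically fails with a generic initialization exception that does not name the missing file. A pre-flight check can report exactly which file is missing. The sketch below is an assumption, not part of the Tesseract NuGet package: TessdataCheck and MissingLanguagePacks are hypothetical names, and the code relies only on two Tesseract conventions, namely that language code X is loaded from X.traineddata and that several languages are combined with "+" (e.g. "chi_sim+eng").

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper (not part of the Tesseract package): lists the language
// codes whose "<code>.traineddata" file is missing from the tessdata folder.
public static class TessdataCheck
{
    // languages uses Tesseract's "+" syntax, e.g. "chi_sim" or "chi_sim+eng+jpn".
    public static List<string> MissingLanguagePacks(string tessdataDir, string languages)
    {
        return languages
            .Split(new[] { '+' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(code => code.Trim())
            .Where(code => !File.Exists(Path.Combine(tessdataDir, code + ".traineddata")))
            .ToList();
    }
}
```

For example, calling TessdataCheck.MissingLanguagePacks("./tessdata", "chi_sim+eng") before new TesseractEngine("./tessdata", "chi_sim+eng") returns an empty list only when both chi_sim.traineddata and eng.traineddata are present.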

3. Code

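First, initialize the engine and set the language, as in the TesseractEngine snippet at the top of this post: the first argument is the tessdata folder and the second is the language code, which must match a .traineddata file name in that folder (e.g. chi_sim loads chi_sim.traineddata).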

Load an image and run recognition:

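//Requires: using System.Drawing; using Tesseract;
Bitmap bit;
using (Image img = Image.FromFile(filename.FileName))//filename: the file dialog used to pick the image (assumed to be an OpenFileDialog)
    bit = new Bitmap(img);//copy the pixels so the image file is not left locked
//bit = PreprocesImage(bit);//image preprocessing; try it if the recognition rate is low
string str;
using (Page page = ocr.Process(bit))//a Page must be disposed before the engine can process the next image
    str = page.GetText();//the recognized text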

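Image preprocessing: if the recognition rate is low, especially for digits, try the method below. It enlarges the image 5x and then converts it to pure black and white.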

// Requires: using System.Drawing; using System.Drawing.Drawing2D; using System.Drawing.Imaging;

/// <summary>
/// Enlarges the image 5x, then keeps near-black pixels as black and turns everything else white
/// </summary>
/// <param name="image">Source image</param>
/// <returns>A new black-and-white bitmap</returns>
private Bitmap PreprocesImage(Bitmap image)
{
    Color actualColor;
    //enlarge 5x in each direction; larger glyphs are often easier for Tesseract to recognize
    image = ResizeImage(image, image.Width * 5, image.Height * 5);

    //empty bitmap the same size as the enlarged image
    Bitmap newBitmap = new Bitmap(image.Width, image.Height);
    for (int i = 0; i < image.Width; i++)
    {
        for (int j = 0; j < image.Height; j++)
        {
            //read the pixel from the enlarged image
            actualColor = image.GetPixel(i, j);
            //set the RGB threshold here: a pixel stays black only if R, G and B are all <= 23
            if (actualColor.R > 23 || actualColor.G > 23 || actualColor.B > 23)
                newBitmap.SetPixel(i, j, Color.White);
            else
                newBitmap.SetPixel(i, j, Color.Black);
        }
    }
    image.Dispose();//release the enlarged intermediate bitmap (the caller's original is untouched)
    return newBitmap;
}
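
The test in PreprocesImage paints a pixel black only when R, G and B are all at most 23, so only near-black text survives; text drawn in dark gray, e.g. RGB (60, 60, 60), turns white. As a sketch under stated assumptions, the code below restates that rule on a plain packed-RGB byte array (3 bytes per pixel, so it can be checked without System.Drawing) and adds Otsu's method as an alternative way to choose the threshold automatically. Otsu's method is a swapped-in technique, not what the original code does: it works on a grayscale value computed with the BT.601 luma weights rather than testing each channel separately. Binarizer and its methods are hypothetical names.

```csharp
using System;

// Hypothetical helpers operating on packed 24-bit RGB data (R, G, B per pixel).
public static class Binarizer
{
    // Same rule as PreprocesImage: white (255) if any channel exceeds the threshold, else black (0).
    public static byte[] BinarizeAnyChannel(byte[] rgb, int threshold = 23)
    {
        var result = new byte[rgb.Length / 3];
        for (int p = 0; p < result.Length; p++)
        {
            byte r = rgb[3 * p], g = rgb[3 * p + 1], b = rgb[3 * p + 2];
            result[p] = (r > threshold || g > threshold || b > threshold) ? (byte)255 : (byte)0;
        }
        return result;
    }

    // Converts RGB to 8-bit gray using the BT.601 luma weights.
    public static byte[] ToGray(byte[] rgb)
    {
        var gray = new byte[rgb.Length / 3];
        for (int p = 0; p < gray.Length; p++)
            gray[p] = (byte)Math.Round(0.299 * rgb[3 * p] + 0.587 * rgb[3 * p + 1] + 0.114 * rgb[3 * p + 2]);
        return gray;
    }

    // Otsu's method: returns the threshold t that maximizes the between-class
    // variance of the classes {gray <= t} and {gray > t}.
    public static int OtsuThreshold(byte[] gray)
    {
        var hist = new long[256];
        foreach (byte v in gray) hist[v]++;

        long total = gray.Length;
        double sumAll = 0;
        for (int i = 0; i < 256; i++) sumAll += i * (double)hist[i];

        double sumBack = 0, bestVariance = -1;
        long weightBack = 0;
        int best = 0;
        for (int t = 0; t < 256; t++)
        {
            weightBack += hist[t];
            if (weightBack == 0) continue;          // no pixels <= t yet
            long weightFore = total - weightBack;
            if (weightFore == 0) break;             // no pixels > t remain

            sumBack += t * (double)hist[t];
            double meanBack = sumBack / weightBack;
            double meanFore = (sumAll - sumBack) / weightFore;
            double variance = (double)weightBack * weightFore * (meanBack - meanFore) * (meanBack - meanFore);
            if (variance > bestVariance) { bestVariance = variance; best = t; }
        }
        return best;
    }
}
```

To use Otsu's threshold in the same spirit as PreprocesImage, a pixel would become black when its gray value is at most OtsuThreshold(gray) and white otherwise. The fixed value 23 is cheaper and predictable for pure black text; an automatically chosen threshold is one option to try when the text color or background varies between images.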

/// <summary>
/// Resizes the image to the given size using high-quality bicubic interpolation
/// </summary>
/// <param name="image">Source image</param>
/// <param name="width">Target width in pixels</param>
/// <param name="height">Target height in pixels</param>
/// <returns>The resized bitmap</returns>
private Bitmap ResizeImage(Image image, int width, int height)
{
    var destRect = new Rectangle(0, 0, width, height);
    var destImage = new Bitmap(width, height);

    //copy the DPI metadata with the vertical DPI doubled (the multiplier 2 can also be set to 3); this does not resample any pixels
    destImage.SetResolution(image.HorizontalResolution, image.VerticalResolution * 2);

    using (var graphics = Graphics.FromImage(destImage))
    {
        graphics.CompositingMode = CompositingMode.SourceOver;
        graphics.CompositingQuality = CompositingQuality.HighQuality;
        graphics.InterpolationMode = InterpolationMode.HighQualityBicubic;
        graphics.SmoothingMode = SmoothingMode.HighQuality;
        graphics.PixelOffsetMode = PixelOffsetMode.HighQuality;

        using (var wrapMode = new ImageAttributes())
        {
            wrapMode.SetWrapMode(WrapMode.Clamp);
            graphics.DrawImage(image, destRect, 0, 0, image.Width, image.Height, GraphicsUnit.Pixel, wrapMode);
        }
    }

    return destImage;
}
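
Enlarging by 5 in both directions multiplies the pixel count by 25, and new Bitmap(width, height) uses a 32-bit (4 bytes per pixel) format by default. For an 800 × 600 screenshot that means 4000 × 3000 = 12,000,000 pixels, or 48,000,000 bytes (about 46 MiB) for the enlarged bitmap, and PreprocesImage allocates a second bitmap of the same size; it also calls GetPixel and SetPixel 12,000,000 times each, which is slow. The hypothetical helper below only encodes that arithmetic so a scale factor can be checked before trying it; it counts raw pixel data and ignores any GDI+ overhead.

```csharp
// Hypothetical helper: raw pixel-data size of a bitmap enlarged by `scale`.
public static class BitmapMath
{
    // 4 bytes per pixel matches Format32bppArgb, the default for new Bitmap(w, h);
    // at 4 bytes per pixel every row is already 4-byte aligned, so no padding is added.
    public static long ScaledBitmapBytes(int width, int height, int scale, int bytesPerPixel = 4)
    {
        long scaledWidth = (long)width * scale;
        long scaledHeight = (long)height * scale;
        return scaledWidth * scaledHeight * bytesPerPixel;
    }
}
```

For instance, BitmapMath.ScaledBitmapBytes(800, 600, 5) gives the 48,000,000 bytes computed above, while a scale of 3 would need 17,280,000 bytes.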



Reposted from: https://blog.csdn.net/HorseRoll/article/details/83310677?utm_medium=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-1.channel_param&depth_1-utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-1.channel_param

Tesseract 4 configuration and examples:

https://blog.csdn.net/jumencibaliang92/article/details/82150883

posted @ 2020-10-05 23:50  冲向蓝天