
CAPTCHA Recognition (First Try with Tess4J)

I ran into this question on a computer-based coding test.


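I was completely stumped at first. After digging through a lot of material, I got a rough idea of the basic steps: 1. preprocessing 2. grayscale conversion 3. binarization 4. denoising 5. segmentation 6. recognition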
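Steps 2 and 3 from that list (grayscale conversion and binarization) can be sketched with the JDK alone. This is only an illustration of the idea, not what Tesseract does internally: the class name `CaptchaPreprocess`, the fixed threshold of 128, and the luminance weights are all assumptions chosen for the example.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper, not part of Tess4J.
final class CaptchaPreprocess {

    /** Converts a packed RGB pixel to a gray level in 0..255 (weighted average of R, G, B). */
    static int gray(int rgb) {
        int r = (rgb >> 16) & 0xFF;
        int g = (rgb >> 8) & 0xFF;
        int b = rgb & 0xFF;
        return (r * 299 + g * 587 + b * 114) / 1000;
    }

    /** Returns a copy in which pixels darker than the threshold are black and all others white. */
    static BufferedImage binarize(BufferedImage src, int threshold) {
        BufferedImage out = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                boolean dark = gray(src.getRGB(x, y)) < threshold;
                out.setRGB(x, y, dark ? 0x000000 : 0xFFFFFF);
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: CaptchaPreprocess <input image> <output png>");
            return;
        }
        BufferedImage src = ImageIO.read(new File(args[0]));
        if (src == null) {
            System.err.println("Unsupported or unreadable image: " + args[0]);
            return;
        }
        // 128 is an arbitrary midpoint; real CAPTCHAs may need a tuned threshold.
        ImageIO.write(binarize(src, 128), "png", new File(args[1]));
    }
}
```

A fixed threshold is the simplest choice; noisy or low-contrast images would probably need an adaptive one, plus the denoising and segmentation steps that are not shown here.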

Luckily the requirements were not strict, and open-source software was allowed. I actually managed to find one: Tesseract.

Now for the main part:

Tess4J's official description: A Java JNA wrapper for Tesseract OCR API.

2. Unzip the downloaded file and copy the following folders (the ones selected in the screenshot) into a newly created project.


3. Add the jars under lib to the build path. Note: lib contains more than just jars.

4. Try it out in the new project, following the sample on the official site:

The following code example shows common usage of the library. Make sure tessdata folder are in the search path, and the .jar files are in the classpath.

Note: before step 4, make sure the tessdata folder is in the project and the jars are on the classpath. Steps 2 and 3 above already took care of this.

package net.sourceforge.tess4j.example;

import java.io.File;
import net.sourceforge.tess4j.*;

public class TesseractExample {

    public static void main(String[] args) {
        File imageFile = new File("eurotext.tif");
        ITesseract instance = new Tesseract();  // JNA Interface Mapping
        // ITesseract instance = new Tesseract1(); // JNA Direct Mapping

        try {
            String result = instance.doOCR(imageFile);
            System.out.println(result);
        } catch (TesseractException e) {
            System.err.println(e.getMessage());
        }
    }
}

I modified it slightly so that it recognizes every CAPTCHA image in a given folder:
package blog.csdn.net.dr_guo;

import java.io.File;

import net.sourceforge.tess4j.ITesseract;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;
/**
 * CAPTCHA recognition (each image's file name is the CAPTCHA's digits)
 * @author drguo
 *
 */
public class VCR {
	public static void main(String[] args) {
		File root = new File(System.getProperty("user.dir") + "/imgs");
		ITesseract instance = new Tesseract();

		// listFiles() returns null if imgs is missing or is not a directory
		File[] files = root.listFiles();
		if (files == null) {
			System.err.println("Cannot list images in " + root);
			return;
		}
		try {
			for (File file : files) {
				String result = instance.doOCR(file);
				// getName() returns the last path component on any OS
				String fileName = file.getName();
				System.out.println("Image name: " + fileName + " Recognition result: " + result);
			}
		} catch (TesseractException e) {
			System.err.println(e.getMessage());
		}
	}
}
Just run it. At this point, though, you may get an error:
Exception in thread "main" java.lang.UnsatisfiedLinkError: Unable to load library 'libtesseract304': Native library (win32-x86-64/libtesseract304.dll) not found in resource path ([file:/G:/Eclipse/Demo/bin/, file:/G:/Eclipse/Demo/lib/commons-beanutils-1.9.2.jar, file:/G:/Eclipse/Demo/lib/commons-io-2.4.jar, file:/G:/Eclipse/Demo/lib/commons-logging-1.2.jar, file:/G:/Eclipse/Demo/lib/ghost4j-1.0.1.jar, file:/G:/Eclipse/Demo/lib/hamcrest-core-1.3.jar, file:/G:/Eclipse/Demo/lib/itext-2.1.7.jar, file:/G:/Eclipse/Demo/lib/jai-imageio-core-1.3.1.jar, file:/G:/Eclipse/Demo/lib/jna-4.2.2.jar, file:/G:/Eclipse/Demo/lib/jul-to-slf4j-1.7.19.jar, file:/G:/Eclipse/Demo/lib/junit-4.12.jar, file:/G:/Eclipse/Demo/lib/lept4j-1.1.2.jar, file:/G:/Eclipse/Demo/lib/log4j-1.2.17.jar, file:/G:/Eclipse/Demo/lib/logback-classic-1.1.6.jar, file:/G:/Eclipse/Demo/lib/logback-core-1.1.6.jar, file:/G:/Eclipse/Demo/lib/rococoa-core-0.5.jar, file:/G:/Eclipse/Demo/lib/slf4j-api-1.7.19.jar, file:/G:/Eclipse/Demo/lib/xmlgraphics-commons-1.5.jar])
Look at the start of the error message: copying the win32-x86-64 folder from lib into the project's bin directory fixes it.
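To confirm that the copy worked, a small diagnostic like this one can report whether the DLL named in the error message is visible as a classpath resource. It is a sketch under assumptions: the class name `NativeLibCheck` is made up, and it assumes (based on the "resource path" wording of the error) that the native library is looked up on the classpath, which in Eclipse includes the bin directory.

```java
// Hypothetical diagnostic, not part of Tess4J or JNA.
final class NativeLibCheck {

    /** Returns true if the named resource can be found by the system class loader. */
    static boolean onClasspath(String resourceName) {
        return ClassLoader.getSystemResource(resourceName) != null;
    }

    public static void main(String[] args) {
        // Resource name copied from the UnsatisfiedLinkError message.
        String dll = "win32-x86-64/libtesseract304.dll";
        System.out.println(dll + " on classpath: " + onClasspath(dll));
    }
}
```

If this prints `false` after the copy, the folder probably landed in the wrong place or the project needs a refresh and rebuild.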


The accuracy is fairly high.
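Since each image's file name is its CAPTCHA, the accuracy can be measured instead of eyeballed. The sketch below compares the file name (minus extension) with the OCR result after removing whitespace; `OcrAccuracy` and its methods are hypothetical names, and stripping whitespace is an assumption about what counts as a match, since OCR output may carry trailing newlines or spaces.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Tess4J.
final class OcrAccuracy {

    /** Drops the extension from a file name: "1234.png" becomes "1234". */
    static String expectedCode(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot > 0 ? fileName.substring(0, dot) : fileName;
    }

    /** Removes all whitespace from the OCR output before comparing. */
    static String normalize(String ocrResult) {
        return ocrResult.replaceAll("\\s+", "");
    }

    /** True if the OCR result equals the code embedded in the file name. */
    static boolean matches(String fileName, String ocrResult) {
        return expectedCode(fileName).equals(normalize(ocrResult));
    }

    /** Fraction of entries (file name mapped to OCR result) that match; 0.0 for an empty map. */
    static double accuracy(Map<String, String> resultsByFileName) {
        if (resultsByFileName.isEmpty()) {
            return 0.0;
        }
        int correct = 0;
        for (Map.Entry<String, String> e : resultsByFileName.entrySet()) {
            if (matches(e.getKey(), e.getValue())) {
                correct++;
            }
        }
        return (double) correct / resultsByFileName.size();
    }

    public static void main(String[] args) {
        // Made-up sample data; in practice the values would come from doOCR.
        Map<String, String> results = new LinkedHashMap<>();
        results.put("1234.png", "1234\n\n");
        results.put("5678.png", "5673\n");
        System.out.println("accuracy: " + accuracy(results));
    }
}
```

In the loop of the VCR class, the pair `file.getName()` and `result` could be collected into such a map and passed to `accuracy` once all images are processed.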

Note that my JDK version is jdk1.8.0_74; if yours is older than that, you may run into errors.