A Python OCR Recommendation for MacBook Users

In trying to read students' medical visit certificates, I think I've tried just about every OCR engine out there.

I tried Tesseract, EasyOCR, PaddleOCR and others, but none of them performed satisfactorily on Korean text.

Recently, though, I came across a Python library that wraps the Live Text feature built into the MacBook.

It is built on the Vision Framework API that Apple provides.

1. Vision Framework

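Vision is a machine-vision framework that Apple opened to developers starting with macOS High Sierra (10.13). The text-recognition request used below, VNRecognizeTextRequest, was added later, in macOS Catalina (10.15).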

It provides a range of APIs, including image classification, image alignment, text recognition, and face detection.

https://developer.apple.com/kr/machine-learning/api/


Because it runs on-device, no internet connection is needed.

In pure Swift, it can be used like this:

import Foundation
import Vision
import AppKit

func loadNSImage(_ path: String) -> NSImage? {
    return NSImage(contentsOfFile: path)
}

func cgImage(from nsImage: NSImage) -> CGImage? {
    var rect = CGRect(origin: .zero, size: nsImage.size)
    return nsImage.cgImage(forProposedRect: &rect, context: nil, hints: nil)
}

let args = CommandLine.arguments
guard args.count >= 2 else {
    fputs("Usage: ocr <image_path> [lang1,lang2,...] [roi]\n", stderr)
    exit(1)
}
let imagePath = args[1]
let langs = args.count >= 3 ? args[2].split(separator: ",").map { String($0) } : ["ko-KR","en-US"]

// ROI (optional): "x,y,w,h", normalized to 0~1 with the origin at the image's bottom-left corner
var roi: CGRect? = nil
if args.count >= 4 {
    let comps = args[3].split(separator: ",").compactMap { Double($0) }
    if comps.count == 4 {
        roi = CGRect(x: comps[0], y: comps[1], width: comps[2], height: comps[3])
    }
}

guard let nsImage = loadNSImage(imagePath), let cg = cgImage(from: nsImage) else {
    fputs("Failed to load image\n", stderr)
    exit(1)
}

let request = VNRecognizeTextRequest { request, error in
    if let error = error {
        fputs("Error: \(error.localizedDescription)\n", stderr)
        exit(1)
    }
    let observations = request.results as? [VNRecognizedTextObservation] ?? []
    for obs in observations {
        if let top = obs.topCandidates(1).first {
            print(top.string)
        }
    }
}

// Key options
request.recognitionLevel = .accurate
request.usesLanguageCorrection = true
request.recognitionLanguages = langs
request.customWords = [] // add domain-specific words if needed (ignored when usesLanguageCorrection is false)

if let r = roi {
    request.regionOfInterest = r
}

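// By default the request uses the latest revision the SDK supports; pin one explicitly only if needed
// request.revision = VNRecognizeTextRequestRevision3 // adjust for your OS (ko-KR requires revision 3, macOS 13+)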

let handler = VNImageRequestHandler(cgImage: cg, options: [:])
do {
    try handler.perform([request])
} catch {
    fputs("Perform error: \(error.localizedDescription)\n", stderr)
    exit(1)
}

Then compile and run it from the terminal like this:

xcrun swiftc ocr.swift -o ocr
./ocr sample.png "ko-KR,en-US"
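The compiled binary can also be driven from Python with a thin `subprocess` wrapper. This is a sketch under assumptions: the helper names `pixel_rect_to_roi`, `build_ocr_command` and `run_ocr` are hypothetical (not part of any library), and the binary is assumed to sit at `./ocr` and take the same `<image_path> [lang1,lang2,...] [x,y,w,h]` arguments as the Swift code above. Because Vision's `regionOfInterest` is normalized to 0~1 with its origin at the bottom-left, a pixel rectangle measured from the top-left (as in most image tools) needs its y axis flipped before being passed in.

```python
import subprocess
from typing import List, Optional, Sequence, Tuple


def pixel_rect_to_roi(x: float, y: float, w: float, h: float,
                      img_w: int, img_h: int) -> Tuple[float, float, float, float]:
    """Convert a top-left-origin pixel rectangle into Vision's normalized,
    bottom-left-origin (x, y, w, h) region of interest."""
    nx = x / img_w
    ny = 1.0 - (y + h) / img_h  # flip the y axis: Vision measures from the bottom edge
    return (nx, ny, w / img_w, h / img_h)


def build_ocr_command(image_path: str,
                      langs: Sequence[str] = ("ko-KR", "en-US"),
                      roi: Optional[Tuple[float, float, float, float]] = None,
                      binary: str = "./ocr") -> List[str]:
    """Build argv for the Swift CLI: <binary> <image> <langs> [x,y,w,h]."""
    cmd = [binary, image_path, ",".join(langs)]
    if roi is not None:
        cmd.append(",".join(f"{v:.6f}" for v in roi))
    return cmd


def run_ocr(image_path: str, **kwargs) -> List[str]:
    """Run the CLI and return the non-empty lines it prints (one per text line)."""
    cmd = build_ocr_command(image_path, **kwargs)
    proc = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return [line for line in proc.stdout.splitlines() if line.strip()]
```

For example, to read only the top quarter of a 2000×1000 image, `run_ocr("sample.png", roi=pixel_rect_to_roi(0, 0, 2000, 250, 2000, 1000))` passes `0.000000,0.750000,1.000000,0.250000` as the ROI argument.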

Both the recognition accuracy and the speed are remarkable.

With EasyOCR, reading a single image took more than 3 seconds; with the Vision framework, it processes about 2 images per second.
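To compare engines on your own documents rather than by feel, a small timing harness is enough. A minimal sketch: `measure_throughput` is a hypothetical helper, and `ocr_fn` is any callable that takes an image path, for example `easyocr.Reader(["ko", "en"]).readtext` or a lambda wrapping `ocrmac.OCR(...).recognize()`.

```python
import time
from typing import Callable, Dict, Iterable


def measure_throughput(ocr_fn: Callable[[str], object],
                       image_paths: Iterable[str]) -> Dict[str, float]:
    """Run ocr_fn over every image and report seconds/image and images/second."""
    paths = list(image_paths)
    if not paths:
        raise ValueError("need at least one image")
    start = time.perf_counter()
    for path in paths:
        ocr_fn(path)  # the OCR result is discarded; only the call is timed
    elapsed = time.perf_counter() - start
    return {
        "images": float(len(paths)),
        "seconds_total": elapsed,
        "seconds_per_image": elapsed / len(paths),
        "images_per_second": len(paths) / elapsed if elapsed > 0 else float("inf"),
    }
```

At the figures above, more than 3 s per image is under 0.33 images/s, versus about 2 images/s for Vision, a difference of roughly 6×.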

Searching around, I found that someone had already built a Python library for this.

2. ocrmac

GitHub - straussmaximilian/ocrmac: A python wrapper to extract text from images on a mac system. Uses the vision framework from Apple.


Sure enough, someone had kindly wrapped the API in Python.

All we need to do is install it with pip.

pip install ocrmac
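Since everything runs through Apple's Vision framework, ocrmac only works on macOS, and to my knowledge Korean (`ko-KR`) was only added to Vision's text recognizer in revision 3, which ships with macOS 13 Ventura. A quick version check before relying on Korean can save some confusion. This is a sketch: the threshold of 13 is my assumption based on that revision, and `supports_korean_ocr` is a hypothetical helper name.

```python
import platform
from typing import Optional


def macos_major_version(release: Optional[str] = None) -> Optional[int]:
    """Return the macOS major version (e.g. 14), or None when not on macOS."""
    if release is None:
        release = platform.mac_ver()[0]  # '' on Linux/Windows
    if not release:
        return None
    return int(release.split(".")[0])


def supports_korean_ocr(release: Optional[str] = None) -> bool:
    # Assumption: ko-KR needs VNRecognizeTextRequest revision 3, i.e. macOS 13+.
    major = macos_major_version(release)
    return major is not None and major >= 13
```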

Let's try it on a sample image.

At first the text wasn't recognized well, but specifying the target languages improved the accuracy considerably.

from ocrmac import ocrmac

img_path = './pdf2png/scan0.png'
# Specifying the target languages noticeably improves accuracy on Korean documents
result = ocrmac.OCR(img_path, language_preference=["ko-KR", "en-US"]).recognize()
for line in result:
    print(line)  # (text, confidence, bounding box)

Each result is a tuple: the first value is the text, the second is the confidence, and the third is the bounding box.
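Because each result is a plain tuple, post-processing is straightforward. The sketch below drops low-confidence lines and sorts the rest into reading order. It assumes the bounding box is Vision's normalized `[x, y, w, h]` with the origin at the bottom-left (so a larger `y` is higher on the page); the function name `to_reading_order` and the default thresholds are hypothetical choices, not part of ocrmac.

```python
from typing import List, Sequence, Tuple

# (text, confidence, [x, y, w, h]) as printed by ocrmac's recognize()
OCRResult = Tuple[str, float, Sequence[float]]


def to_reading_order(results: Sequence[OCRResult], min_conf: float = 0.3,
                     line_tol: float = 0.01) -> List[str]:
    """Keep confident lines and sort them top-to-bottom, then left-to-right.

    Assumes bbox = [x, y, w, h] normalized to 0..1 with a bottom-left origin,
    so a box's top edge is y + h and larger values are higher on the page.
    """
    kept = [r for r in results if r[1] >= min_conf]

    def key(r: OCRResult):
        x, y, _, h = r[2]
        top = y + h
        # Round the top edge into bands of width line_tol so boxes on the same
        # visual line usually tie, then break ties by x (left to right).
        return (-round(top / line_tol), x)

    return [text for text, _, _ in sorted(kept, key=key)]
```

For instance, with `min_conf=0.3` a stray mark recognized at confidence 0.1 is dropped, and two boxes sharing a top edge come out left to right.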

You would usually draw the bounding boxes yourself with cv2, but this library can draw them for you.

Just call annotate_PIL as shown below.

from ocrmac import ocrmac
import matplotlib.pyplot as plt

img_path = './pdf2png/scan0.png'
ocr = ocrmac.OCR(img_path, language_preference=["ko-KR", "en-US"])
img = ocr.annotate_PIL()  # PIL image with the recognized text boxes drawn on it
plt.figure(figsize=(12, 12))
plt.imshow(img)
plt.axis("off")
plt.show()
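If you would rather draw the boxes yourself (with cv2 or PIL, for example), the normalized boxes first need converting into pixel corners. A minimal sketch, again assuming `[x, y, w, h]` normalized with a bottom-left origin; `bbox_to_pixels` is a hypothetical helper, and it returns `(x1, y1, x2, y2)` in the top-left-origin pixel coordinates that cv2 and PIL use.

```python
from typing import Sequence, Tuple


def bbox_to_pixels(bbox: Sequence[float], img_w: int,
                   img_h: int) -> Tuple[int, int, int, int]:
    """Convert a normalized, bottom-left-origin [x, y, w, h] box into
    top-left-origin pixel corners (x1, y1, x2, y2)."""
    x, y, w, h = bbox
    x1 = x * img_w
    x2 = (x + w) * img_w
    y1 = (1.0 - (y + h)) * img_h  # top edge: flip y so it is measured from the top
    y2 = (1.0 - y) * img_h        # bottom edge
    return (round(x1), round(y1), round(x2), round(y2))
```

Each converted box can then be drawn with, e.g., `cv2.rectangle(img, (x1, y1), (x2, y2), (0, 0, 255), 2)`.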

You can also switch the framework to livetext or change the recognition level.

For more detailed examples, see the notebook below.

ocrmac/ExampleNotebook.ipynb at main · straussmaximilian/ocrmac


3. Afterthoughts

Apple's Vision framework solved my OCR woes in one stroke.

For 100 images at 200 dpi, EasyOCR takes about 10 minutes, while ocrmac finishes in under a minute: roughly 6 seconds versus under 0.6 seconds per image.

It might be worth building a web service on top of this.

If someone had told me about this earlier, I would have loved my MacBook even more...

I only wish I had found out sooner.
