Extract text from an image using the tesseract package.

image_ocr(image, language = "eng", HOCR = FALSE, ...)

image_ocr_data(image, language = "eng", ...)

Arguments

image

magick image object returned by image_read() or image_graph()

language

passed to tesseract. To install additional languages see instructions in tesseract_download().

HOCR

if TRUE, return results as hOCR XML instead of plain text

...

additional parameters passed to tesseract

Details

To use this function you need to install tesseract first:

  install.packages("tesseract")

Best results are obtained if you set the correct language in tesseract. To install additional languages see instructions in tesseract_download().
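
As a rough sketch of the language workflow described above, the helper below checks whether the training data for a language is already installed and downloads it only when it is missing. The name ocr_with_language is hypothetical and not part of magick or tesseract; the sketch assumes tesseract_info()$available lists the installed language codes.

```r
# Hypothetical helper (not part of magick or tesseract): run OCR in a given
# language, downloading its training data first when it is not yet installed.
ocr_with_language <- function(image, language = "eng") {
  # Assumption: tesseract_info()$available holds the installed language codes
  installed <- tesseract::tesseract_info()$available
  if (!language %in% installed) {
    tesseract::tesseract_download(language)
  }
  magick::image_ocr(image, language = language)
}
```

For example, ocr_with_language(img, "fra") would fetch the French training data on first use and then call image_ocr() with language = "fra".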

Examples

if (require("tesseract")) {
  img <- image_read("http://jeroen.github.io/images/testocr.png")
  image_ocr(img)
  image_ocr_data(img)
}
#> Loading required package: tesseract
#> # A tibble: 60 x 3
#>    word  confidence bbox
#>    <chr>      <dbl> <chr>
#>  1 This        96.6 36,92,96,116
#>  2 is          96.9 109,92,129,116
#>  3 a           96.3 141,98,156,116
#>  4 lot         96.3 169,92,201,116
#>  5 of          96.5 212,92,240,116
#>  6 12          96.5 251,92,282,116
#>  7 point       96.5 296,92,364,122
#>  8 text        96.5 374,93,427,116
#>  9 to          96.9 437,93,463,116
#> 10 test        97.0 474,93,526,116
#> # … with 50 more rows
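
The bbox column comes back as a comma-separated string, which is awkward to filter or plot. A minimal sketch of splitting it into numeric columns follows. It assumes the four values are x1,y1,x2,y2 pixel coordinates, which this page does not state, and it uses a small hand-built data frame (first three rows of the output above) so that it runs without tesseract installed. The column names x1, y1, x2, y2, width and height are illustrative.

```r
# Stand-in for image_ocr_data(img): the first three rows of the output above
words <- data.frame(
  word = c("This", "is", "a"),
  confidence = c(96.6, 96.9, 96.3),
  bbox = c("36,92,96,116", "109,92,129,116", "141,98,156,116"),
  stringsAsFactors = FALSE
)

# Split each bbox string on commas and convert the pieces to integers
parts <- strsplit(words$bbox, ",", fixed = TRUE)
coords <- do.call(rbind, lapply(parts, as.integer))

# Assumption: the four values are x1, y1, x2, y2 in pixels
colnames(coords) <- c("x1", "y1", "x2", "y2")

# Combine the word columns with the numeric coordinates
boxes <- cbind(words[c("word", "confidence")], as.data.frame(coords))
boxes$width <- boxes$x2 - boxes$x1
boxes$height <- boxes$y2 - boxes$y1
```

With real results, replace the hand-built words data frame with the value returned by image_ocr_data(img).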