Extract text from an image using the tesseract package.
Usage
image_ocr(image, language = "eng", HOCR = FALSE, ...)
image_ocr_data(image, language = "eng", ...)
Arguments
- image
magick image object returned by image_read() or image_graph()
- language
passed to tesseract. To install additional languages see instructions in tesseract_download().
- HOCR
if TRUE, return results as HOCR XML instead of plain text
- ...
additional parameters passed to tesseract
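As an illustration of the HOCR argument, the following sketch requests HOCR output for the test image used in the Examples section. It assumes that magick and tesseract are installed and that the URL is reachable. Only the start of the result is printed because the markup is long.

```r
library(magick)

if (requireNamespace("tesseract", quietly = TRUE)) {
  img <- image_read("http://jeroen.github.io/images/testocr.png")

  # Ask for HOCR markup instead of plain text; the result is a character vector
  hocr <- image_ocr(img, HOCR = TRUE)

  # Print only the beginning of the markup
  cat(substr(hocr[1], 1, 300), "\n")
}
```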
Details
To use this function you need to install the tesseract package first, for example with install.packages("tesseract").
Best results are obtained if you set the correct language in tesseract. To install additional languages see instructions in tesseract_download().
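The language setup described above could be wrapped roughly as follows. This is a minimal sketch, not part of magick: the helper name ocr_in_language and the example path are hypothetical, and "fra" is only an illustrative language code. It relies on tesseract_info() to list the installed training data and on tesseract_download() to fetch what is missing.

```r
# Hypothetical helper (not part of magick): OCR an image file in a given
# language, downloading the tesseract training data first if it is missing.
ocr_in_language <- function(path, lang = "eng") {
  if (!requireNamespace("tesseract", quietly = TRUE)) {
    stop("Install tesseract first: install.packages(\"tesseract\")")
  }
  # tesseract_info()$available lists the languages installed locally
  if (!lang %in% tesseract::tesseract_info()$available) {
    tesseract::tesseract_download(lang)
  }
  img <- magick::image_read(path)
  magick::image_ocr(img, language = lang)
}

# Usage, assuming a French scan at a path of your own:
# text <- ocr_in_language("path/to/scan.png", lang = "fra")
# cat(text)
```

Checking the installed languages before downloading avoids fetching the training data again on every call.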
Examples
# \donttest{
if(require("tesseract")){
img <- image_read("http://jeroen.github.io/images/testocr.png")
image_ocr(img)
image_ocr_data(img)
}
#> Loading required package: tesseract
#> # A tibble: 60 × 3
#> word confidence bbox
#> <chr> <dbl> <chr>
#> 1 This 96.8 36,92,96,116
#> 2 is 96.9 109,92,129,116
#> 3 a 95.0 141,98,156,116
#> 4 lot 95.0 169,92,201,116
#> 5 of 96.4 212,92,240,116
#> 6 12 96.4 251,92,282,116
#> 7 point 96.3 296,92,364,122
#> 8 text 96.2 374,93,427,116
#> 9 to 97.0 437,93,463,116
#> 10 test 97.0 474,93,526,116
#> # ℹ 50 more rows
# }