Create an OCR engine for a given language and set of control parameters. The engine can then be passed to the ocr() and ocr_data() functions to recognize text.
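A minimal sketch of the usual workflow, assuming the tesseract package and English training data are installed: create the engine once and pass it to ocr() and ocr_data() through their engine argument. The file name scan.png is a hypothetical placeholder, and the guard simply skips the OCR calls when the package or the file is missing.

```r
# Hypothetical input image; replace with a real file path.
img <- "scan.png"

result <- NULL
if (requireNamespace("tesseract", quietly = TRUE) && file.exists(img)) {
  # Create the engine once and reuse it for several calls.
  eng <- tesseract::tesseract(language = "eng")

  # Recognized text for the whole image, as a single string.
  result <- tesseract::ocr(img, engine = eng)

  # Word-level results from the same engine.
  words <- tesseract::ocr_data(img, engine = eng)
}
```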

tesseract(
  language = NULL,
  datapath = NULL,
  configs = NULL,
  options = NULL,
  cache = TRUE
)

tesseract_params(filter = "")

tesseract_info()

Arguments

language

string with the language of the training data. Usually defaults to eng (English).

datapath

path to the training data for this language. The default uses the system library.

configs

character vector of config files, each containing one or more parameter values. These files can be in the current directory or be one of the standard tesseract config files in the tessdata directory. See Details.

options

a named list with tesseract parameters. See Details.

cache

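speed things up by caching engines: when TRUE, a repeated call with the same arguments returns the previously created engine instead of initializing a new one.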

filter

only list parameters whose name or description contains a particular string

Details

Tesseract control parameters can be set either via a named list in the options parameter, or in a config text file that contains the parameter name followed by a space and then the value, one per line. Use tesseract_params() to list or find parameters. Note that some parameters are only supported in certain versions of libtesseract, and that invalid parameters can sometimes cause libtesseract to crash.
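The two routes described above can be sketched as follows. The whitelist parameter is only an illustrative choice (and, per the note above, may not be honored by every libtesseract version). The block writes it to a temporary config file in the "name value" format and, if the tesseract package is installed, builds one engine from each source.

```r
# Illustrative parameter: restrict recognition to digits.
params <- list(tessedit_char_whitelist = "0123456789")

# Write the same parameters as a config file: "name value", one per line.
cfg <- tempfile(fileext = ".cfg")
writeLines(paste(names(params), unlist(params)), cfg)

if (requireNamespace("tesseract", quietly = TRUE)) {
  # Engine configured from the named options list.
  eng_opts <- tesseract::tesseract("eng", options = params)
  # Engine configured from the config file written above.
  eng_cfg <- tesseract::tesseract("eng", configs = cfg)
}
```

A named list is convenient inside R code, while a config file is easier to share with the tesseract command-line program, which also accepts config file names.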

References

tesseract wiki: control parameters

See also

Other tesseract: ocr(), tesseract_download()

Examples

tesseract_params('debug')
#> # A tibble: 66 × 3
#>    param                     default    desc
#>  * <chr>                     <chr>      <chr>
#>  1 editor_dbwin_xpos         50         Editor debug window X Pos
#>  2 editor_dbwin_ypos         500        Editor debug window Y Pos
#>  3 editor_dbwin_height       24         Editor debug window height
#>  4 editor_dbwin_width        80         Editor debug window width
#>  5 textord_debug_tabfind     0          Debug tab finding
#>  6 textord_debug_bugs        0          Turn on output related to bugs in tab f…
#>  7 textord_testregion_left   -1         Left edge of debug reporting rectangle
#>  8 textord_testregion_top    -1         Top edge of debug reporting rectangle
#>  9 textord_testregion_right  2147483647 Right edge of debug rectangle
#> 10 textord_testregion_bottom 2147483647 Bottom edge of debug rectangle
#> # … with 56 more rows
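A further sketch combining the two helper functions, assuming the tesseract package is installed and that the list returned by tesseract_info() contains an available element naming the installed languages (as in recent package versions). The filter string "whitelist" is an arbitrary example.

```r
langs <- NULL
if (requireNamespace("tesseract", quietly = TRUE)) {
  # Languages with training data in the default datapath (assumed field name).
  langs <- tesseract::tesseract_info()$available
  print(langs)

  # Look up candidate parameters before setting them via options or configs.
  print(tesseract::tesseract_params("whitelist"))
}
```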