Perform OCR text extraction by rendering PDF pages to images and running them through tesseract. This requires that the tesseract package is installed.
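tesseract is an optional dependency, so code that calls these functions may want to check for it first. The helper below is a minimal sketch. `ocr_ready()` is a hypothetical name, not part of pdftools. It assumes that `tesseract::tesseract_info()$available` lists the installed trained-data languages, which is how the tesseract package reports them.

```r
# Hypothetical helper (not part of pdftools): TRUE when the tesseract package
# is installed and has trained data for `language`, FALSE otherwise.
ocr_ready <- function(language = "eng") {
  if (!requireNamespace("tesseract", quietly = TRUE)) {
    return(FALSE)
  }
  # Assumes tesseract_info()$available is a character vector of language codes.
  language %in% tesseract::tesseract_info()$available
}

ocr_ready("eng")
```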

pdf_ocr_text(
  pdf,
  pages = NULL,
  opw = "",
  upw = "",
  language = "eng",
  dpi = 600
)

pdf_ocr_data(
  pdf,
  pages = NULL,
  opw = "",
  upw = "",
  language = "eng",
  dpi = 600
)
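A minimal usage sketch of both functions. To stay self-contained, it builds a one-page test PDF with base R's `pdf()` graphics device instead of reading an existing file. The return shapes noted in the comments are assumptions: one character string per page from `pdf_ocr_text()`, and a list with one data frame of words per page from `pdf_ocr_data()`. A lower `dpi` than the default 600 is used only to keep the example quick.

```r
library(pdftools)

# Build a one-page test PDF so the example does not depend on an external file.
pdf_file <- tempfile(fileext = ".pdf")
grDevices::pdf(pdf_file, width = 6, height = 2)
graphics::plot.new()
graphics::text(0.5, 0.5, "Hello OCR", cex = 4)
invisible(grDevices::dev.off())

# Plain text (assumed: one character string per rendered page).
txt <- pdf_ocr_text(pdf_file, dpi = 300)
cat(txt)

# Word-level results (assumed: a list with one data frame per page).
words <- pdf_ocr_data(pdf_file, dpi = 300)
head(words[[1]])
```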

Arguments

pdf

file path or raw vector with pdf data

pages

which pages of the pdf file to extract; the default NULL extracts all pages

opw

string with owner password to open pdf

upw

string with user password to open pdf

language

passed to tesseract to specify the language of the engine.

dpi

resolution (in dots per inch) at which each page is rendered to the image that is passed to tesseract::ocr.
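A sketch of the `pages`, `language`, and `dpi` arguments used together, again on a generated test PDF. Each `plot.new()` call starts a new page on the `pdf()` device. Only page 2 is processed, and it is rendered at 200 dpi instead of the default 600. A lower resolution generally makes OCR faster and lighter on memory, at some possible cost in accuracy on small print. The single-string result per page is an assumption.

```r
library(pdftools)

# Two-page test PDF: each plot.new() call starts a new page.
pdf_file <- tempfile(fileext = ".pdf")
grDevices::pdf(pdf_file, width = 6, height = 2)
for (label in c("First page", "Second page")) {
  graphics::plot.new()
  graphics::text(0.5, 0.5, label, cex = 3)
}
invisible(grDevices::dev.off())

# OCR only page 2 with the English engine at a reduced resolution.
page2 <- pdf_ocr_text(pdf_file, pages = 2, language = "eng", dpi = 200)
cat(page2)
```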

See also

Other pdftools: pdftools, qpdf, rendering