Perform OCR text extraction. This requires that the tesseract package is installed.

Usage

pdf_ocr_text(
  pdf,
  pages = NULL,
  opw = "",
  upw = "",
  dpi = 600,
  language = "eng",
  options = NULL
)

pdf_ocr_data(
  pdf,
  pages = NULL,
  opw = "",
  upw = "",
  dpi = 600,
  language = "eng",
  options = NULL
)
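As a minimal sketch of a typical call, the snippet below runs OCR on the first page of a scanned document. The file name `scanned-report.pdf` and the helper `ocr_first_page()` are hypothetical and used only for illustration; the lower `dpi` of 300 is an assumption chosen to trade some accuracy for speed, not a recommendation from the package.

```r
# Hypothetical path; replace with a real scanned PDF.
scan_path <- "scanned-report.pdf"

# Hypothetical helper: OCR only the first page of a PDF.
ocr_first_page <- function(path, dpi = 300) {
  # pages = 1 limits rendering and OCR to the first page.
  pdftools::pdf_ocr_text(path, pages = 1, dpi = dpi, language = "eng")
}

# Only run when both packages and the file are available.
can_run <- requireNamespace("pdftools", quietly = TRUE) &&
  requireNamespace("tesseract", quietly = TRUE) &&
  file.exists(scan_path)

if (can_run) {
  page_text <- ocr_first_page(scan_path)
  cat(page_text, sep = "\n")
}
```

The guard around the call makes the sketch safe to source on a machine where tesseract or the example file is missing.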

Arguments

pdf

file path or raw vector with pdf data

pages

which pages of the pdf file to extract

opw

string with owner password to open pdf

upw

string with user password to open pdf

dpi

resolution, in dots per inch, at which each page is rendered to an image; passed to pdf_convert.

language

passed to tesseract to specify the language of the engine.

options

named list of OCR parameters, passed to tesseract
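To illustrate how `language` and `options` might be combined, the sketch below restricts recognition to digits and punctuation using the Tesseract parameter `tessedit_char_whitelist`. The file name `invoice-scan.pdf` and the helper `ocr_digits()` are hypothetical, and it is an assumption that the installed Tesseract engine honors the whitelist (some engine versions ignore it).

```r
# Hypothetical path; replace with a real scanned PDF.
scan_path <- "invoice-scan.pdf"

# Assumed OCR parameters: a named list, here limiting output to digits.
digit_options <- list(tessedit_char_whitelist = "0123456789.,")

# Hypothetical helper: word-level OCR results with the options above.
ocr_digits <- function(path, pages = NULL) {
  pdftools::pdf_ocr_data(
    path,
    pages = pages,
    dpi = 300,
    language = "eng",
    options = digit_options
  )
}

# Only run when both packages and the file are available.
can_run <- requireNamespace("pdftools", quietly = TRUE) &&
  requireNamespace("tesseract", quietly = TRUE) &&
  file.exists(scan_path)

if (can_run) {
  page_data <- ocr_digits(scan_path, pages = 1)
  # Inspect the structure of the result for the first requested page.
  str(page_data[[1]])
}
```

Using `str()` here rather than indexing specific columns avoids assuming a particular layout for the returned data.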

See also

Other pdftools: pdftools, qpdf, rendering