Perform OCR text extraction. This requires that the tesseract
package is installed.
Usage
pdf_ocr_text(
pdf,
pages = NULL,
opw = "",
upw = "",
dpi = 600,
language = "eng",
options = NULL
)
pdf_ocr_data(
pdf,
pages = NULL,
opw = "",
upw = "",
dpi = 600,
language = "eng",
options = NULL
)
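A minimal sketch of calling pdf_ocr_text() on a small PDF. It assumes that pdftools, tesseract and the English ("eng") trained data are installed. The sample PDF is generated on the fly with base R graphics purely for illustration, and the exact recognised text may vary between tesseract versions.

```r
library(pdftools)

# Write a hypothetical one-page PDF containing large text so there is something to OCR
tmp <- tempfile(fileext = ".pdf")
grDevices::pdf(tmp, width = 6, height = 2)
graphics::plot.new()
graphics::text(0.5, 0.5, "Hello OCR", cex = 4)
invisible(grDevices::dev.off())

# Render every page at 300 dpi and OCR it; the result is a character
# vector with one element per page
txt <- pdf_ocr_text(tmp, dpi = 300)
cat(txt)

unlink(tmp)
```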
Arguments
- pdf
file path or raw vector with pdf data
- pages
which pages of the pdf file to extract (NULL, the default, means all pages)
- opw
string with owner password to open pdf
- upw
string with user password to open pdf
- dpi
resolution (dots per inch) at which each page is rendered by pdf_convert before the image is passed to tesseract.
- language
passed to tesseract to specify the language of the engine.
- options
passed to tesseract to specify OCR parameters
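The sketch below exercises several of these arguments with pdf_ocr_data(). It assumes pdftools, tesseract and the "eng" trained data are installed. The two-page PDF is hypothetical and built with base R graphics, and the tessedit_char_whitelist setting is one example of a tesseract engine parameter passed through `options`; whether it is honoured depends on the installed tesseract version.

```r
library(pdftools)

# Build a hypothetical two-page PDF to demonstrate the `pages` argument
tmp <- tempfile(fileext = ".pdf")
grDevices::pdf(tmp, width = 6, height = 2)
for (label in c("PAGE 1", "PAGE 2")) {
  graphics::plot.new()
  graphics::text(0.5, 0.5, label, cex = 4)
}
invisible(grDevices::dev.off())

# OCR only the second page at 300 dpi with the English engine, and restrict
# recognition to uppercase letters, digits and space via a tesseract parameter
words <- pdf_ocr_data(
  tmp,
  pages = 2,
  dpi = 300,
  language = "eng",
  options = list(tessedit_char_whitelist = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 ")
)

# One data frame per requested page, with word, confidence and bbox columns
str(words[[1]])

unlink(tmp)
```

Lowering `dpi` from the default of 600 generally makes rendering and OCR faster, at the cost of accuracy on small print.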