
This downloads and installs the Tika App jar (~60 MB) into a user directory and verifies the integrity of the file using a checksum. The default settings should work in most cases.

Usage

install_tika(
  version = "2.7.0",
  digest = paste0("7fefbe5570a95900d39193134e8277aec99e5450a8",
    "cecbb5787b3d6651ebf735e460ccccddb49bdc2990",
    "8a9058fc36e4689aed6da6d63a1cf70ca09ccf26bcca"),
  mirrors = c("https://ftp.wayne.edu/apache/tika/",
    "http://mirrors.ocf.berkeley.edu/apache/tika/", "http://apache.cs.utah.edu/tika/",
    "http://mirror.cc.columbia.edu/pub/software/apache/tika/"),
  retries = 2,
  url = character()
)

Arguments

version

The Tika version to download.

digest

The SHA-512 checksum of the jar, as a hexadecimal string. Set to an empty string "" to skip the check.

mirrors

A vector of Apache mirror sites. One is picked randomly.

retries

The number of times to try the download.

url

Optional URL pointing to a particular location of the Tika App jar. Setting this to any character string overrides downloading from random mirrors.
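The default digest is split across three strings only to keep the lines short; paste0() joins them into a single hexadecimal string. A quick sanity check, using the value from the usage above (the variable name default_digest is illustrative only):

```r
# Rebuild the default digest exactly as written in the usage section.
default_digest <- paste0(
  "7fefbe5570a95900d39193134e8277aec99e5450a8",
  "cecbb5787b3d6651ebf735e460ccccddb49bdc2990",
  "8a9058fc36e4689aed6da6d63a1cf70ca09ccf26bcca"
)

# A SHA-512 value is 512 bits, i.e. 128 hexadecimal characters.
nchar(default_digest)
grepl("^[0-9a-f]{128}$", default_digest)
```

If you pass a different version, the digest must change with it, or be set to "" to skip the check.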

Value

A logical value indicating whether the installation was successful.

Details

The default settings of install_tika() should typically be left as they are.

This function will download the version of the Tika jar tested to work with this package, and can verify file integrity using a checksum.
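As a rough illustration of what a checksum comparison involves, here is a minimal sketch. It assumes the digest package is installed; verify_sha512() is a hypothetical helper written for this example and is not necessarily how install_tika() performs its check.

```r
# Hypothetical helper, not part of rtika: compare a file's SHA-512
# checksum to an expected hexadecimal string.
verify_sha512 <- function(path, expected) {
  # Mirror the documented behaviour: an empty string skips the check.
  if (identical(expected, "")) return(TRUE)
  actual <- digest::digest(path, algo = "sha512", file = TRUE)
  identical(tolower(actual), tolower(expected))
}

# Use a small temporary file as a stand-in for the real jar.
tmp <- tempfile(fileext = ".jar")
writeLines("stand-in for the real jar", tmp)
known <- digest::digest(tmp, algo = "sha512", file = TRUE)

verify_sha512(tmp, known)             # matches its own checksum
verify_sha512(tmp, "")                # check skipped
verify_sha512(tmp, strrep("0", 128))  # wrong checksum
```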

It will normally download from a random Apache mirror. If the mirror fails, it tries the archive at http://archive.apache.org/dist/tika/. You can also enter a value for url directly to override this.
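The order in which locations are tried can be sketched as follows. This is only an illustration of the logic described above, not the package's code: candidate_urls() is a hypothetical function, and the "<version>/tika-app-<version>.jar" layout is inferred from the mirror URL shown in the example output and assumed to apply to the archive as well.

```r
# Hypothetical sketch: list the URLs to try, in order.
candidate_urls <- function(version, mirrors, url = character(),
                           archive = "http://archive.apache.org/dist/tika/") {
  # An explicit url overrides the mirrors entirely.
  if (length(url) > 0) return(url)
  jar <- paste0(version, "/tika-app-", version, ".jar")
  c(
    paste0(sample(mirrors, 1), jar),  # one randomly chosen mirror first
    paste0(archive, jar)              # then the Apache archive as a fallback
  )
}

candidate_urls("2.7.0", mirrors = "https://ftp.wayne.edu/apache/tika/")
candidate_urls("2.7.0", mirrors = character(),
               url = "https://example.org/tika-app-2.7.0.jar")
```

The example.org address is a placeholder; in practice url would point to wherever you host or have located the jar.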

It will download into a directory determined by tools::R_user_dir("rtika", which = "data"), specific to the operating system.
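To see where that directory is on your machine and whether anything has been installed there yet, you can inspect it directly. This sketch uses only base R and assumes R 4.0.0 or later, where tools::R_user_dir() is available; the variable name tika_dir is illustrative.

```r
# Resolve the OS-specific data directory used for the Tika App jar.
tika_dir <- tools::R_user_dir("rtika", which = "data")
tika_dir

# Check whether the directory exists and list any files in it.
dir.exists(tika_dir)
list.files(tika_dir)
```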

If tika() stops with an error complaining about the jar, try running install_tika() again.

Uninstalling

If you are uninstalling the entire rtika package and want to remove the Tika App jar also, run:

unlink(tools::R_user_dir("rtika", which = "data"), recursive = TRUE)

Alternatively, navigate to the install folder and delete it manually. It is the file path returned by tools::R_user_dir("rtika", which = "data"). The path is OS-specific.

Distribution

Tika is distributed under the Apache License Version 2.0, which generally permits distribution of the code "Object" without the "Source". The master copy of the Apache Tika source code is held in Git. You can fetch (clone) the large source from GitHub (https://github.com/apache/tika).

Examples

# \donttest{
install_tika()
#> Downloading the Tika App .jar version 2.7.0 into "/github/home/.local/share/R/rtika". The file is approximately 60 MB - this may take a while.
#> Could not download the Tika App .jar from mirror "https://ftp.wayne.edu/apache/tika/2.7.0/tika-app-2.7.0.jar".
#> Trying the Apache archive.
#> The download is successful.
#> The file integrity is good.
#> The installation is successful.
# }