This downloads and installs the Tika App jar
(~60 MB) into a user directory,
and verifies the integrity of the file using a checksum.
The default settings should work fine.
Usage
install_tika(
version = "2.7.0",
digest = paste0("7fefbe5570a95900d39193134e8277aec99e5450a8",
"cecbb5787b3d6651ebf735e460ccccddb49bdc2990",
"8a9058fc36e4689aed6da6d63a1cf70ca09ccf26bcca"),
mirrors = c("https://ftp.wayne.edu/apache/tika/",
"http://mirrors.ocf.berkeley.edu/apache/tika/",
"http://apache.cs.utah.edu/tika/",
"http://mirror.cc.columbia.edu/pub/software/apache/tika/"),
retries = 2,
url = character()
)
Arguments
- version
The declared Tika version
- digest
The sha512 checksum. Set to an empty string
""
to skip the check.
- mirrors
A vector of Apache mirror sites. One is picked randomly.
- retries
The number of times to try the download.
- url
Optional URL of a specific location of the Tika App jar. Setting this to any character string overrides downloading from random mirrors.
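As an illustration of the url argument, the sketch below builds a direct jar URL and passes it to install_tika(). The helper tika_jar_url() is hypothetical and not part of rtika, and the path layout (version folder followed by tika-app-<version>.jar) is an assumption inferred from the mirror URL in the example output on this page.

```r
# Hypothetical helper (not part of rtika): build a direct jar URL.
# Assumption: the jar sits at <base><version>/tika-app-<version>.jar.
tika_jar_url <- function(base, version) {
  paste0(base, version, "/tika-app-", version, ".jar")
}

jar_url <- tika_jar_url("http://archive.apache.org/dist/tika/", "2.7.0")
print(jar_url)

# Only attempt the download interactively, and only if rtika is available.
# The default digest still applies, so the version must match the default.
if (interactive() && requireNamespace("rtika", quietly = TRUE)) {
  rtika::install_tika(url = jar_url)
}
```

If a different version is requested this way, the digest argument would presumably need to match that jar, or be set to "" to skip the check.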
Details
The default settings of install_tika()
should typically be left as they are.
This function will download the version of the Tika jar
tested to work
with this package, and can verify file integrity using a checksum.
It will normally download from a random Apache mirror.
If the mirror fails,
it tries the archive at http://archive.apache.org/dist/tika/.
You can also enter a value for url
directly to override this.
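The mirror-then-archive behaviour described above might be sketched roughly as follows. This is not rtika's actual implementation: pick_source() and its fetch argument are hypothetical names, and applying retries to each source in turn is an assumption. The fetch function is injected so the logic can be exercised without network access; a real one could wrap utils::download.file().

```r
# Sketch only: try one random mirror, then fall back to the archive.
# `fetch` is a hypothetical function(url) returning TRUE on success.
pick_source <- function(mirrors, archive, path, retries = 2, fetch) {
  candidates <- c(sample(mirrors, 1), archive)
  for (base in candidates) {
    url <- paste0(base, path)
    for (attempt in seq_len(retries)) {
      if (isTRUE(fetch(url))) {
        return(url)
      }
    }
  }
  NA_character_  # every source failed
}

# Simulate a mirror that is down and an archive that responds.
fake_fetch <- function(url) grepl("^http://archive", url)

chosen <- pick_source(
  mirrors = "https://ftp.wayne.edu/apache/tika/",
  archive = "http://archive.apache.org/dist/tika/",
  path = "2.7.0/tika-app-2.7.0.jar",
  fetch = fake_fetch
)
print(chosen)
```

With the simulated failure, the mirror is tried first and the archive URL is the one returned, which mirrors the sequence of messages shown in the Examples output.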
It will download into a directory determined
by tools::R_user_dir("rtika", which = "data"),
specific to the operating system.
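To see where the jar would be placed on a given machine, the directory can be inspected directly. A minimal sketch, assuming R 4.0.0 or later (where tools::R_user_dir() is available) and that the installed file has a .jar extension:

```r
# Resolve the OS-specific data directory used for the rtika install.
install_dir <- tools::R_user_dir("rtika", which = "data")
print(install_dir)

# List any jar files already there; an empty result means none were found.
jars <- list.files(install_dir, pattern = "\\.jar$", full.names = TRUE)
print(jars)
```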
If tika()
stops with an error complaining about the jar,
try running install_tika()
again.
Uninstalling
If you are uninstalling the entire rtika
package
and want to remove the Tika App jar
also,
run:
unlink(tools::R_user_dir("rtika", which = "data"), recursive = TRUE)
Alternatively, navigate to the install folder and delete it manually.
It is the file path returned by
tools::R_user_dir("rtika", which = "data").
The path is OS specific.
Distribution
Tika is distributed under the Apache License Version 2.0, which generally permits distribution of the code "Object" without the "Source". The master copy of the Apache Tika source code is held in Git. You can fetch (clone) the large source from GitHub (https://github.com/apache/tika).
Examples
# \donttest{
install_tika()
#> Downloading the Tika App .jar version 2.7.0 into "/github/home/.local/share/R/rtika". The file is approximately 60 MB - this may take a while.
#> Could not download the Tika App .jar from mirror "https://ftp.wayne.edu/apache/tika/2.7.0/tika-app-2.7.0.jar".
#> Trying the Apache archive.
#> The download is successful.
#> The file integrity is good.
#> The installation is successful.
# }