rtika 2.4.1 (2021-08-05)
CRAN release: 2022-09-26
* Updated Tika to 2.4.1. Details are found at https://tika.apache.org/2.4.1/index.html.
* Use tools::R_user_dir() instead of rappdirs, thanks to Maëlle Salmon (rOpenSci).
rtika 2.0.0 (2021-08-05)
CRAN release: 2021-08-06
* Updated Tika to 2.0.0. Details are found at https://tika.apache.org/2.0.0/index.html.
rtika 1.23 (2020-04-24)
CRAN release: 2019-12-12
rtika 1.23 (2019-12-12)
CRAN release: 2019-12-12
rtika 1.20 (2019-02-26)
CRAN release: 2019-02-27
* Updated Tika to 1.20.
* Includes two config files to turn OCR either on or off. This is only relevant on Linux variants that have the Tesseract OCR engine installed.
rtika 1.19.1 (2018-07-08)
CRAN release: 2018-11-15
rtika 1.1.19 (2018-07-08)
CRAN release: 2018-10-05
- The new java() function is used to get the command that invokes Java for all tika() functions, and allows its value to be changed across sessions. If you want to use a particular installation of Java, set the JAVA_HOME variable using Sys.setenv(JAVA_HOME = "my path"). The java() function will check for this variable and, if found, return it instead of the default ‘java’ invocation.
- Updated to Tika version 1.19.
rtika 0.1.8 (2018-04-25)
CRAN release: 2018-05-02
rtika 0.1.7 (2018-03-08)
- The new install_tika() function allows this package to be distributed on CRAN. The Tika App jar was too large to go on CRAN directly. The .jar is installed in the directory determined by the rappdirs::user_data_dir() function.
- The .onLoad() function now gives various installation advice when starting up.
rtika 0.1.6 (2018-03-01)
- tika(), tika_xml(), tika_json(), tika_text(), and tika_html() have a new downloader, which preserves the server’s content-type encoding as a file extension when possible. This should help Tika identify and parse downloaded files more reliably. It depends on the ‘curl’ package.
- Added tika_fetch(), a standalone function that downloads files and appends a file extension matching the content type declared by the server. Additional features of this function include specifying the number of download retries. The output of tika_fetch() can be piped directly into other tika functions.
- New introductory vignette covers how to use the functions and surveys several applications.
- tika(), tika_xml(), tika_json(), tika_text(), and tika_html() can now be called with return=FALSE, in which case they do not return an R character vector but invisibly return NULL. This is most useful in massive file conversion jobs with hundreds of thousands of files.
- Used pkgdown to create a website for GitHub Pages.
- New tika_json_text() function gets metadata in .json with plain text content.
rtika 0.1.5 (2018-02-15)
- Added a dependency on the ‘sys’ package because the ‘system2’ function was causing intermittent errors by terminating Tika mid-process.
- Added a startup check of the Java version, using an .onLoad() call to ‘java -version’.
- Removed redundant conversion to UTF-8, because the Tika batch routine is already outputting UTF-8.
- Increased the speed of building packages (fewer downloads needed for testing, and the examples do not run).
- Added Code of Conduct to CONDUCT.md file
- Set default ‘cleanup’ attribute to TRUE.
rtika 0.1.4 (2018-02-15)
- Because it is too big for CRAN, removed the Tika .jar file.
- Added the Tika .jar to a new tikajar package on GitHub.
- Put the rOpenSci review badge on the tikajar package also, since it's an essential component of this package.
- Updated DESCRIPTION, documentation and .travis.yml to reflect the new installation routine.
rtika 0.1.3 (2018-02-04)
- added convenience functions that advertise output format: tika_xml(), tika_json(), tika_text(), tika_html().
rtika 0.1.2 (2018-01-30)
- for Windows users, the curl package is recommended to prevent base R download.file from corrupting files.
rtika 0.1.1 (2018-01-23)
- allows the user to input the URLs and file paths of documents. URLs will be downloaded first to a temporary directory. The previous interface has been changed.
rtika 0.1.0 (2018-01-19)
- Initial release.
- R interface to Apache Tika batch processing CLI, found to be the most efficient CLI option.
- tika function returns processing results as a character vector.
- includes the Tika App .jar. Tika source is available at: https://github.com/apache/tika