tif 0.3.0
- Further discussion has led us to simplify the corpus and tokens data frame formats. The doc_id, text, and token columns can now be in any position within the data frame.
tif 0.2.0
- After a round of feedback on the initial version of the specification, we decided to allow two formats each for corpus and tokens objects. In addition to the original data frame variants, there is now a character vector corpus object and a list-based tokens object. Converters between the various types are included in the package.
New Functions
- tif_is_corpus_character returns TRUE or FALSE for whether the input is a valid character vector corpus object.
- tif_is_tokens_list returns TRUE or FALSE for whether the input is a valid list-based tokens object.
- tif_as_corpus_character takes a valid tif corpus object and returns a character vector corpus object.
- tif_as_corpus_df takes a valid tif corpus object and returns a data frame corpus object.
- tif_as_tokens_character takes a valid tif tokens object and returns a list-based tokens object.
- tif_as_tokens_df takes a valid tif tokens object and returns a data frame tokens object.
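As a rough illustration of what converting between the two corpus formats and the two tokens formats involves, here is a minimal base R sketch. It does not call the tif package and is not its implementation. It assumes that a character vector corpus stores document ids as the vector's names and that a list-based tokens object is a named list of character vectors; the object names (corpus_df, tokens_list, and so on) are hypothetical.

```r
# Base R sketch only; these are stand-ins, not the tif package's own code.

# Data frame corpus: one row per document.
corpus_df <- data.frame(
  doc_id = c("doc1", "doc2"),
  text = c("A first document.", "A second one."),
  stringsAsFactors = FALSE
)

# Data frame corpus -> character vector corpus (ids assumed to live in names).
corpus_chr <- setNames(corpus_df$text, corpus_df$doc_id)

# Character vector corpus -> data frame corpus.
corpus_back <- data.frame(
  doc_id = names(corpus_chr),
  text = unname(corpus_chr),
  stringsAsFactors = FALSE
)

# Data frame tokens: one row per token.
tokens_df <- data.frame(
  doc_id = c("doc1", "doc1", "doc2"),
  token = c("A", "first", "A"),
  stringsAsFactors = FALSE
)

# Data frame tokens -> list-based tokens (one character vector per document).
# Note: split() orders the list by sorted doc_id, not by first appearance.
tokens_list <- split(tokens_df$token, tokens_df$doc_id)

# List-based tokens -> data frame tokens.
tokens_back <- data.frame(
  doc_id = rep(names(tokens_list), lengths(tokens_list)),
  token = unlist(tokens_list, use.names = FALSE),
  stringsAsFactors = FALSE
)
```

In this sketch the round trips recover the original data frames because the document ids are already in sorted order; a real converter would need to preserve document order explicitly.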
tif 0.1.0
- This is the initial implementation of the ideas discussed at the rOpenSci Text Workshop from 21-22 April 2017.
