Changelog
Source: NEWS.md
tif 0.3.0
- Further discussion has led us to simplify the corpus and token data frame formats. The doc_id, text, and token columns can now be in any position within the data frame.
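As a rough illustration of what position-independent columns imply, the base R sketch below looks columns up by name rather than by index. The helper `looks_like_corpus_df` is hypothetical and not part of tif, and the requirement that both columns be character vectors is an assumption; the package's own validators remain the authority on what counts as valid.

```r
# Hypothetical helper, NOT part of tif: finds columns by name, not position.
# Assumes a corpus data frame needs character columns doc_id and text.
looks_like_corpus_df <- function(x) {
  is.data.frame(x) &&
    all(c("doc_id", "text") %in% names(x)) &&
    is.character(x[["doc_id"]]) &&
    is.character(x[["text"]])
}

# doc_id and text are deliberately not the first two columns.
corpus <- data.frame(
  year = c(2016L, 2017L),
  text = c("A first document.", "A second document."),
  doc_id = c("doc1", "doc2"),
  stringsAsFactors = FALSE
)

result <- looks_like_corpus_df(corpus)
```

Because the check is name-based, extra metadata columns such as `year` can sit anywhere without affecting the result.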
tif 0.2.0
- After a round of input on the initial version of the specification, we decided to allow two formats for corpus and tokens objects. In addition to the original data frame variants, there is now a character vector corpus object and a list-based tokens object. Converters between the various types are now included in the package.
New Functions
- tif_is_corpus_character returns TRUE or FALSE for whether the input is a valid character vector corpus object.
- tif_is_tokens_list returns TRUE or FALSE for whether the input is a valid list-based tokens object.
- tif_as_corpus_character takes a valid tif corpus object and returns a character vector corpus object.
- tif_as_corpus_df takes a valid tif corpus object and returns a data frame corpus object.
- tif_as_tokens_character takes a valid tif tokens object and returns a list-based tokens object.
- tif_as_tokens_df takes a valid tif tokens object and returns a data frame tokens object.
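To make the four object types named above more concrete, here is a minimal base R sketch of how they might relate to one another. It deliberately avoids calling tif itself. The layouts shown (document ids stored as names on the character vector and on the list) are assumptions based on the descriptions in this changelog, and splitting on spaces is only a stand-in for a real tokenizer.

```r
# Data frame corpus: one row per document.
corpus_df <- data.frame(
  doc_id = c("doc1", "doc2"),
  text = c("hello world", "text interchange formats"),
  stringsAsFactors = FALSE
)

# Character vector corpus: texts as values, document ids as names
# (assumed layout).
corpus_chr <- setNames(corpus_df$text, corpus_df$doc_id)

# List-based tokens: one character vector of tokens per document.
# strsplit() keeps the names of its input, so ids carry over.
tokens_list <- strsplit(corpus_chr, " ", fixed = TRUE)

# Data frame tokens: one row per token, with the owning document id.
tokens_df <- data.frame(
  doc_id = rep(names(tokens_list), lengths(tokens_list)),
  token = unlist(tokens_list, use.names = FALSE),
  stringsAsFactors = FALSE
)
```

In practice the package's converters would replace these manual steps; the sketch only shows that each pair of formats carries the same information in a different shape.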
tif 0.1.0
- This is the initial implementation of the ideas discussed at the rOpenSci Text Workshop from 21-22 April 2017.