Package index

tokenize_characters()
tokenize_words()
tokenize_sentences()
tokenize_lines()
tokenize_paragraphs()
tokenize_regex()
- Basic tokenizers

chunk_text()
- Chunk text into smaller segments

mobydick
- The text of Moby Dick

tokenize_ngrams()
tokenize_skip_ngrams()
- N-gram tokenizers

tokenize_ptb()
- Penn Treebank Tokenizer

tokenize_character_shingles()
- Character shingle tokenizers

tokenize_word_stems()
- Word stem tokenizer

tokenizers-package
tokenizers
- Tokenizers

count_words()
count_characters()
count_sentences()
- Count words, sentences, characters