Count the number of words in a string.
Details
This function estimates the number of words in strings. Words are first separated using break_pattern. The elements of the resulting character vector are then counted, including only those matched by word_pattern.
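The two steps above (split, then keep only pieces that match) can be sketched in a few lines. This is a minimal Python sketch under stated assumptions, not epubr's R implementation. The function name count_words mirrors the documented one, but the default regular expressions r" |\n" (split on a space or a newline) and r"[A-Za-z0-9&]" (an ASCII letter, digit, or ampersand) are assumptions chosen to match the prose.

```python
import re
from typing import List

# Assumed defaults, chosen to match the prose: split on a single space or a
# newline, and treat a piece as a word if it contains an ASCII letter, digit, or "&".
DEFAULT_BREAK_PATTERN = r" |\n"
DEFAULT_WORD_PATTERN = r"[A-Za-z0-9&]"


def count_words(strings: List[str],
                word_pattern: str = DEFAULT_WORD_PATTERN,
                break_pattern: str = DEFAULT_BREAK_PATTERN) -> List[int]:
    """Return one estimated word count per input string."""
    counts = []
    for s in strings:
        # Step 1: split the string into candidate pieces on break_pattern.
        pieces = re.split(break_pattern, s)
        # Step 2: count only the pieces that contain a match for word_pattern.
        # Empty pieces (from consecutive breaks) never match, so they are skipped.
        counts.append(sum(1 for p in pieces if re.search(word_pattern, p)))
    return counts


print(count_words(["Hello world", "one  two\nthree", "  -- ! "]))  # [2, 3, 0]
```

The double space in the second string produces an empty piece, and the third string yields only punctuation and empty pieces, so neither adds to the count.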
The approach taken is meant to be simple and flexible.
epub uses this function internally to estimate the number of words in each e-book section, alongside nchar for counting individual characters. It can also be used directly on character strings, which makes it convenient for trying out different regular expression pattern arguments as needed.
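To illustrate how changing the pattern arguments alters the estimate, here is a small Python sketch. It reuses the same assumed defaults as a split-then-filter function; the section texts are made up, and the alternative patterns r"\s+" and r"[A-Za-z]" are hypothetical choices for illustration, not documented options. Python's len stands in for R's nchar as a per-section character count.

```python
import re
from typing import List


def count_words(strings: List[str],
                word_pattern: str = r"[A-Za-z0-9&]",
                break_pattern: str = r" |\n") -> List[int]:
    # Assumed defaults. Split each string on break_pattern, then count the
    # pieces that contain a match for word_pattern.
    return [sum(1 for p in re.split(break_pattern, s) if re.search(word_pattern, p))
            for s in strings]


# Hypothetical e-book sections.
sections = ["Chapter 1\nIt was a dark night.", "Tab\tseparated\twords"]

# Assumed defaults: tabs are not break characters, so the second section is one piece.
print(count_words(sections))                             # [7, 1]
# Hypothetical alternative: break on any run of whitespace, tabs included.
print(count_words(sections, break_pattern=r"\s+"))       # [7, 3]
# Hypothetical alternative: require a letter, so the piece "1" no longer counts.
print(count_words(sections, word_pattern=r"[A-Za-z]"))   # [6, 1]
# Character counts per section, analogous to R's nchar().
print([len(s) for s in sections])                        # [30, 19]
```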
The two pattern arguments, word_pattern and break_pattern, are provided for control, but the defaults are likely good enough for most text. By default, strings are split only on spaces and newline characters. An element of the resulting vector is counted as a word if it contains at least one alphanumeric character or an ampersand. This means, for example, that hyphenated words, acronyms, and numbers written with digits are all counted as words. The presence of other characters, such as punctuation, does not prevent an element from being counted as a word.
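The default counting rules can be checked against a few edge cases. This Python sketch assumes the defaults are equivalent to the regular expressions r" |\n" and r"[A-Za-z0-9&]"; both are assumptions that restate the prose above, not values copied from the package.

```python
import re
from typing import List


def count_words(strings: List[str],
                word_pattern: str = r"[A-Za-z0-9&]",
                break_pattern: str = r" |\n") -> List[int]:
    # Assumed defaults: split on a space or newline; a piece counts as a word
    # if it contains at least one ASCII letter, digit, or ampersand.
    return [sum(1 for p in re.split(break_pattern, s) if re.search(word_pattern, p))
            for s in strings]


texts = [
    "well-known",           # hyphenated word: one piece, contains letters -> 1
    "NASA and the U.S.A.",  # acronyms count, periods do not negate them -> 4
    "in 1999 & 2000",       # numbers in digits and a lone "&" both count -> 4
    "Wait -- what?!",       # "--" has no letter, digit, or "&", so it is skipped -> 2
]
print(count_words(texts))  # [1, 4, 4, 2]
```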