Skip to contents

High-Performance Stemmer, Tokenizer, and Spell Checker for R

Project Status: Active – The project has reached a stable, usable state and is being actively developed. CRAN_Status_Badge CRAN RStudio mirror downloads

Low level spell checker and morphological analyzer based on the famous hunspell library https://hunspell.github.io. The package can analyze or check individual words as well as tokenize text, latex, html or xml documents. For a more user-friendly interface use the ‘spelling’ package which builds on this package with utilities to automate checking of files, documentation and vignettes in all common formats.

Installation

This package includes a bundled version of libhunspell and no longer depends on external system libraries:

install.packages("hunspell")

Documentation

About the R package: - Blog post: Hunspell: Spell Checker and Text Parser for R - Blog post: Stemming and Spell Checking in R

Hello World

# Check individual words
words <- c("beer", "wiskey", "wine")
correct <- hunspell_check(words)
print(correct)

# Find suggestions for incorrect words
hunspell_suggest(words[!correct])

# Extract incorrect from a piece of text
bad <- hunspell("spell checkers are not neccessairy for langauge ninja's")
print(bad[[1]])
hunspell_suggest(bad[[1]])

# Stemming
words <- c("love", "loving", "lovingly", "loved", "lover", "lovely", "love")
hunspell_stem(words)
hunspell_analyze(words)

The spelling package uses this package to spell R package documentation:

# Spell check a package
library(spelling)
spell_check_package("~/mypackage")