The function detect_language() is vectorised. It guesses the language of each string in text and returns NA for any string whose language could not be reliably determined. The function detect_language_mixed() is not vectorised. It analyses the entire character vector as a single document. Its output includes the top 3 detected languages with their relative proportions, and the total number of text bytes that were reliably classified.
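The following is a minimal sketch contrasting the two functions on the same input. It assumes the cld2 package is installed from CRAN and uses only the arguments shown in the usage lines. The variable names are illustrative. The structure that detect_language_mixed() returns is printed with str() rather than being spelled out, so no particular field names are assumed.

```r
# Assumes cld2 is installed, e.g. via install.packages("cld2")
library(cld2)

text <- c("To be or not to be?",
          "Ce n'est pas grave.",
          "Nou breekt mijn klomp!")

# Vectorised: one result per element, NA where detection is unreliable
per_string <- detect_language(text)

# Same input, asking for language names instead of language codes
per_string_names <- detect_language(text, lang_code = FALSE)

# Not vectorised: the whole vector is classified as one document,
# reporting the top languages, their proportions and classified bytes
whole <- detect_language_mixed(text)
str(whole)
```

Because detect_language() returns one value per element, its result can be bound directly as a column alongside text. detect_language_mixed() summarises the whole vector and cannot be used that way.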

detect_language(text, plain_text = TRUE, lang_code = TRUE)

detect_language_mixed(text, plain_text = TRUE)
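
text: a string or character vector with text to classify, or a connection to read from.

plain_text: if FALSE, HTML tags are skipped and HTML entities are expanded before classification. Use this when the input is HTML rather than plain text.

lang_code: if TRUE, return a language code (such as "en") instead of the language name.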

# Vectorized function
text <- c("To be or not to be?", "Ce n'est pas grave.", "Nou breekt mijn klomp!")
detect_language(text)
#> [1] "en" "fr" "nl"
if (FALSE) {
  # Read HTML from connection
  detect_language(url(''), plain_text = FALSE)

  # More detailed classification output
  detect_language_mixed(
    url(''), plain_text = FALSE)

  detect_language_mixed(
    url(''), plain_text = FALSE)
}