Google Cloud Translation API
Mark Edmondson
2024-11-14
Source:vignettes/translation.Rmd
The Google Cloud Translation API provides a simple programmatic interface for translating an arbitrary string into any supported language. Translation API is highly responsive, so websites and applications can integrate with Translation API for fast, dynamic translation of source text from the source language to a target language (e.g. French to English).
Read more on the Google Cloud Translation Website
You can detect the language via gl_translate_detect, or translate and detect the language in one step via gl_translate.
Language Translation
Translate text via gl_translate. Note this is a lot more refined than the free version on Google’s translation website.
library(googleLanguageR)
text <- "to administer medicine to animals is frequently a very difficult matter, and yet sometimes it's necessary to do so"
## translate British English into Danish
gl_translate(text, target = "da")$translatedText
You can choose the target language via the argument target. If you do not supply the argument source, the function detects the source language automatically. Because this detection costs the same as a separate call to gl_translate_detect, it is usually cheaper to detect and translate in one step.
You can pass a vector of text. The function first attempts to translate it in one API call; if that fails because the request exceeds the API limits, it tries again, splitting the vector across multiple API calls. This results in more calls and is slower, but costs the same, as you are charged per character translated, not per API call.
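To picture why splitting a request does not change the cost, here is a minimal sketch in base R. It is not the package's internal code: split_by_chars and max_chars are hypothetical names, and the limit of 8 characters is chosen only to keep the example small.

```r
# Hypothetical helper: group a character vector into batches whose
# total character count stays at or below `max_chars`.
# A single string longer than `max_chars` ends up in a batch of its own.
split_by_chars <- function(texts, max_chars) {
  stopifnot(is.character(texts), max_chars > 0)
  batch_id <- integer(length(texts))
  current <- 1L
  used <- 0L
  for (i in seq_along(texts)) {
    n <- nchar(texts[i])
    # Start a new batch when adding this string would exceed the limit
    if (used > 0L && used + n > max_chars) {
      current <- current + 1L
      used <- 0L
    }
    batch_id[i] <- current
    used <- used + n
  }
  unname(split(texts, batch_id))
}

texts <- c("one", "two", "three", "four")
batches <- split_by_chars(texts, max_chars = 8)

# The number of characters sent is the same however the strings are batched
total_chars <- sum(nchar(unlist(batches)))
```

Each element of batches could then be sent as its own request. The number of requests grows, but total_chars, which is what you are billed on, stays the same.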
HTML support
You can also supply HTML and set format='html', which handles HTML tags to give you a cleaner translation.
Consider first removing anything that does not need to be translated, such as JavaScript and CSS, using the tools of rvest. An example is shown below:
# translate webpages
library(rvest)
library(googleLanguageR)
my_url <- "http://www.dr.dk/nyheder/indland/greenpeace-facebook-og-google-boer-foelge-apples-groenne-planer"
## in this case the content to translate is in css select .wcms-article-content
read_html(my_url) %>% # read html
  html_node(css = ".wcms-article-content") %>% # select article content
  html_text() %>% # extract text
  gl_translate(format = "html") %>% # translate with html flag
  dplyr::select(translatedText) # show translatedText column of output tibble
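If the page content you select still contains scripts or styles, one way to drop them before translating is sketched below. The inline HTML is a made-up stand-in for a real page, and the approach (removing the nodes with xml2::xml_remove, which is installed alongside rvest) is one option rather than a required step.

```r
library(rvest)

# Small inline page standing in for a real article (illustrative only)
html <- "<html><body><style>p {color: red;}</style><p>Hej verden</p><script>var x = 1;</script></body></html>"

page <- read_html(html)

# Remove <script> and <style> nodes in place so their contents
# are not sent (and billed) for translation
xml2::xml_remove(html_nodes(page, "script, style"))

# Text that remains after cleaning
clean_text <- html_text(html_node(page, "body"))
```

clean_text can then be passed to gl_translate in the same way as the article text above.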
Language Detection
This function only detects the language:
## which language is this?
gl_translate_detect("katten sidder på måtten")
The more text it has, the better the detection. And it helps if it's not Danish…
It may be better to use cld2 to detect the language offline first, to avoid charges if the translation is unnecessary (e.g. the text is already in English). You could then verify online for more uncertain cases.
cld2::detect_language("katten sidder på måtten")
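The offline-first idea can be sketched as a small filter. needs_translation is a hypothetical helper, not part of googleLanguageR. It assumes the detector returns an ISO code such as "en" per string and NA when it cannot decide, which is how cld2::detect_language is understood to behave by default. A stub detector is used here so the sketch runs without cld2 installed.

```r
# Hypothetical helper: flag which strings still need translating.
# `detect` is any function returning one language code per string.
needs_translation <- function(texts, target = "en",
                              detect = cld2::detect_language) {
  detected <- detect(texts)
  # Undetected (NA) text is treated as needing the online API
  is.na(detected) | detected != target
}

# Stub detector for illustration only: "en" if the text contains "the"
stub_detect <- function(x) ifelse(grepl("the", x), "en", NA_character_)

texts <- c("the cat sits on the mat", "katten sidder på måtten")

# Only strings not confidently detected as English are kept for the API
to_send <- texts[needs_translation(texts, detect = stub_detect)]
```

With the default detect argument, the same call would use cld2, and only to_send would go on to gl_translate or gl_translate_detect.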
Translation API limits
The API limits usage in three ways: characters per day, characters per 100 seconds, and API requests per 100 seconds. All can be set in the API manager in the Google Cloud console:
https://console.developers.google.com/apis/api/translate.googleapis.com/quotas
The library limits its API calls to stay within the character and API request quotas per 100 seconds. It automatically retries if you are making requests too quickly, and also pauses to make sure you only send 100000 characters per 100 seconds.
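As a rough guide to how long a large job will take under this limit, the arithmetic can be sketched as follows. min_send_seconds is a hypothetical helper, and it assumes a simplified model of fixed 100-second windows; the library's actual pacing may differ.

```r
# Hypothetical helper: minimum seconds of waiting needed to send `n_chars`
# under a quota of `quota` characters per `window` seconds.
min_send_seconds <- function(n_chars, quota = 100000, window = 100) {
  # Full windows that must elapse before the final batch can be sent
  windows_to_wait <- max(ceiling(n_chars / quota) - 1, 0)
  windows_to_wait * window
}

small_job <- min_send_seconds(50000)   # fits within the first window
large_job <- min_send_seconds(250000)  # must wait for two further windows
```

Under these assumptions a 250000 character job cannot finish in under 200 seconds, however the requests are arranged.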