Introduction to opencage
Forward and Reverse Geocoding
Daniel Possenriede, Jesse Sadler, Maëlle Salmon
2022-09-05
Source: vignettes/opencage.Rmd
Geocoding is the process of converting place names or addresses into geographic coordinates – like latitude and longitude – or vice versa. With {opencage} you can geocode using the OpenCage API, either from place name to longitude and latitude (forward geocoding) or from longitude and latitude to the name and address of a location (reverse geocoding).
This vignette covers the setup process for working with {opencage} and basic workflows for both forward and reverse geocoding. Make sure to also check out “Customise your query” if you want a deeper dive into customizing the various parameters available through the OpenCage API. The “Output options” vignette shows additional workflows by modifying the form in which the geocoding results are returned.
Setup
Before you can use the {opencage} package and query the OpenCage API, you first need to register with OpenCage. Additionally, you may want to set a rate limit (if you have a paid OpenCage plan), and you might want to prevent OpenCage from storing the content of your queries. In other words, you need to set up {opencage}, so let’s go through the process.
Authentication
To use the package and authenticate yourself with the OpenCage API, you will need to register at opencagedata.com/users/sign_up to get an API key. The “Free Trial” plan provides up to 2,500 API requests a day. There are paid plans available, if you need to run more API requests. After you have registered, you can generate an API key with the OpenCage dashboard.
Now we need to ensure that the functions in {opencage} can access your API key. {opencage} will conveniently retrieve your API key if it is saved in the environment variable "OPENCAGE_KEY". If it is not, oc_config() will help to set that environment variable.
Do not pass the key directly as a parameter to the function. Doing so risks exposing your API key via your script or your history. There are three safer ways to set your API key instead:
1. Save your API key as an environment variable in .Renviron as described in What They Forgot to Teach You About R or Efficient R Programming. From there it will be fetched by all functions that call the OpenCage API. You do not even have to call oc_config() to set your key; you can start geocoding right away. If you have the {usethis} package installed, you can edit your .Renviron with usethis::edit_r_environ(). We strongly recommend storing your API key in the user-level .Renviron, as opposed to the project-level one. This makes it less likely you will share sensitive information by mistake.
2. If you use a package like {keyring} to store your credentials, you can safely pass your key in a script with a function call like oc_config(key = keyring::key_get("opencage")).
3. If you call oc_config() in an interactive() session and the OPENCAGE_KEY environment variable is not set, it will prompt you to enter the key in the console.
Whatever method you choose, keep your API key secret. OpenCage also features best practices for keeping your API key safe.
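As a quick sanity check before geocoding, a minimal base R sketch can confirm that the environment variable is visible to the current session. The helper name has_opencage_key() is hypothetical and not part of {opencage}; it only reports whether a value is set and never prints the key itself.

```r
# Hypothetical helper (not part of {opencage}): reports whether the
# OPENCAGE_KEY environment variable is set, without revealing its value.
has_opencage_key <- function() {
  nzchar(Sys.getenv("OPENCAGE_KEY"))
}

if (has_opencage_key()) {
  message("OPENCAGE_KEY is set; {opencage} should be able to retrieve it.")
} else {
  message("OPENCAGE_KEY is not set; consider adding it to your user-level .Renviron.")
}
```

Remember that changes to .Renviron only take effect after R has been restarted.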
Rate limit
A rate limit is used to control the rate of requests sent, so legitimate requests do not lead to an unintended Denial of Service attack. The rate limit allowed by the API depends on the OpenCage plan you have and ranges from 1 request/sec for the “Free Trial” plan to 40 requests/sec for the “Large” plan. See opencagedata.com/pricing for details and up-to-date information.
If you have a “Free Trial” account with OpenCage, you can skip to the next section, because the rate limit is already set correctly for you at 1 request/sec.
If you have a paid account, you can set the rate limit for the active R session with oc_config(rate_sec = n) where n is the appropriate rate limit. You can set the rate limit persistently across sessions by setting an oc_rate_sec option in your .Rprofile. If you have the {usethis} package installed, you can edit your .Rprofile with usethis::edit_r_profile().
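As a sketch, the persistent setting could look like the following line in your .Rprofile. The value 10 is only a placeholder and assumes a paid plan; use the rate your plan actually allows.

```r
# In .Rprofile: set the rate limit option that {opencage} reads.
# The value 10 is a placeholder; adjust it to your OpenCage plan.
options(oc_rate_sec = 10)

# Inspect the value that is active in the current session.
getOption("oc_rate_sec")
```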
Privacy
By default, OpenCage will store your queries in its server logs and will cache forward geocoding requests on its side. This is done to speed up response times, to be able to debug errors, and to improve the service. Logs are automatically deleted after six months according to OpenCage’s page on data protection and GDPR.
If you have concerns about privacy and want OpenCage to have no record of your query, i.e. the place name or latitude and longitude coordinates you want to geocode, you can set the no_record parameter to TRUE, which tells the API to neither log nor cache the queries. OpenCage still records that you made a request, but not the specific queries you made.
oc_config(no_record = TRUE) sets an oc_no_record option for the active R session, so it will be used for all subsequent OpenCage queries. You can set the oc_no_record option persistently across sessions in your .Rprofile.
For more information on OpenCage’s policies on privacy and data protection see the Legal section in their FAQs, their GDPR page, and, for the no_record parameter specifically, see the relevant blog post.
For increased privacy, {opencage} sets no_record to TRUE by default. Please note, however, that {opencage} always caches the data it receives from the OpenCage API locally, but only for as long as your R session is alive (see below).
(Don’t) show API key
oc_config() has another argument, show_key. This is only used for debugging and we will explain it in more detail in vignette("output_options"). For now suffice it to say that your OpenCage API key will not be shown in any {opencage} output, unless you change this setting.
Altogether now
In sum, if you want to set your API key with {keyring}, set the rate limit to 10 (only do this if you have a paid account, please!), and do not want OpenCage to have records of your queries, you would configure {opencage} for the active session like this:
library("opencage")
oc_config(
  key = keyring::key_get("opencage"),
  rate_sec = 10,
  no_record = TRUE
)
Forward geocoding
Now you can start to geocode. Forward geocoding is from location name(s) to latitude and longitude tuple(s).
oc_forward_df(placename = "Sarzeau")
#> # A tibble: 1 × 4
#> placename oc_lat oc_lng oc_formatted
#> <chr> <dbl> <dbl> <chr>
#> 1 Sarzeau 47.5 -2.76 56370 Sarzeau, France
All geocoding functions are vectorised, i.e. you can geocode multiple locations with one function call. Note that behind the scenes the requests are still sent to the API one-by-one.
opera <- c("Palacio de Bellas Artes", "Scala", "Sydney Opera House")
oc_forward_df(placename = opera)
#> # A tibble: 3 × 4
#> placename oc_lat oc_lng oc_formatted
#> <chr> <dbl> <dbl> <chr>
#> 1 Palacio de Bellas Artes 19.4 -99.1 Palacio de Bellas Artes, Avenida Juárez, Centro Urbano, 06050, CMX, Mexico
#> 2 Scala 40.7 14.6 Scala, Salerno, Italy
#> 3 Sydney Opera House -33.9 151. Sydney Opera House, 2 Macquarie Street, Sydney NSW 2000, Australia
By default, oc_forward_df() only returns three results columns: oc_lat (for latitude), oc_lng (for longitude), and oc_formatted (the formatted address). As you can see, the results columns are all prefixed with oc_.
If you specify oc_forward_df(output = "all"), you will receive all result columns, which are often quite extensive. Which columns you receive exactly depends on the information OpenCage returns for each specific request.
oc_forward_df(placename = opera, output = "all")
#> # A tibble: 3 × 31
#> placename oc_lat oc_lng oc_co…¹ oc_fo…² oc_no…³ oc_no…⁴ oc_so…⁵ oc_so…⁶ oc_is…⁷ oc_is…⁸ oc_is…⁹ oc_ca…˟ oc_type oc_co…˟
#> <chr> <dbl> <dbl> <int> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <list> <chr> <chr> <chr>
#> 1 Palacio de B… 19.4 -99.1 9 Palaci… 19.4 -99.1 19.4 -99.1 MX MEX <chr> outdoo… museum North …
#> 2 Scala 40.7 14.6 7 Scala,… 40.7 14.6 40.6 14.6 IT ITA <chr> place city Europe
#> 3 Sydney Opera… -33.9 151. 9 Sydney… -33.9 151. -33.9 151. AU AUS <chr> outdoo… arts_c… Oceania
#> # … with 16 more variables: oc_country <chr>, oc_country_code <chr>, oc_museum <chr>, oc_neighbourhood <chr>,
#> # oc_postcode <chr>, oc_road <chr>, oc_state <chr>, oc_state_code <chr>, oc_city <chr>, oc_county <chr>,
#> # oc_county_code <chr>, oc_political_union <chr>, oc_arts_centre <chr>, oc_house_number <chr>, oc_municipality <chr>,
#> # oc_suburb <chr>, and abbreviated variable names ¹oc_confidence, ²oc_formatted, ³oc_northeast_lat, ⁴oc_northeast_lng,
#> # ⁵oc_southwest_lat, ⁶oc_southwest_lng, ⁷oc_iso_3166_1_alpha_2, ⁸oc_iso_3166_1_alpha_3, ⁹oc_iso_3166_2, ˟oc_category,
#> # ˟oc_continent
You can also pass a data frame to oc_forward_df(). By default the results columns are added to the input data frame, which is useful for keeping information associated with the place names that are in separate columns. If you want a data frame with only the geocoding results, set bind_cols = FALSE.
concert_df <-
  data.frame(location = c("Elbphilharmonie", "Concertgebouw", "Suntory Hall"))
oc_forward_df(data = concert_df, placename = location)
#> # A tibble: 3 × 4
#> location oc_lat oc_lng oc_formatted
#> <chr> <dbl> <dbl> <chr>
#> 1 Elbphilharmonie 53.5 9.98 Elbe Philharmonic Hall, Platz der Deutschen Einheit 1, 20457 Hamburg, Germany
#> 2 Concertgebouw 52.4 4.88 Concertgebouw, Concertgebouwplein 2, 1071 LN Amsterdam, Netherlands
#> 3 Suntory Hall 35.7 140. Suntory Hall, Karayan Plaza, Azabu, Minato, 107-6090, Japan
You can use it in a piped workflow as well.
library(dplyr, warn.conflicts = FALSE)
concert_df %>% oc_forward_df(location)
#> # A tibble: 3 × 4
#> location oc_lat oc_lng oc_formatted
#> <chr> <dbl> <dbl> <chr>
#> 1 Elbphilharmonie 53.5 9.98 Elbe Philharmonic Hall, Platz der Deutschen Einheit 1, 20457 Hamburg, Germany
#> 2 Concertgebouw 52.4 4.88 Concertgebouw, Concertgebouwplein 2, 1071 LN Amsterdam, Netherlands
#> 3 Suntory Hall 35.7 140. Suntory Hall, Karayan Plaza, Azabu, Minato, 107-6090, Japan
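The bind_cols = FALSE option mentioned above can be sketched with the same data frame. The guard around the call is an addition for illustration: it skips the request when {opencage} is not installed or no API key is configured, so the snippet can be run safely in any session. No output is shown here because the result depends on what the API returns.

```r
concert_df <-
  data.frame(location = c("Elbphilharmonie", "Concertgebouw", "Suntory Hall"))

# Only query the API when {opencage} is installed and an API key is set.
if (requireNamespace("opencage", quietly = TRUE) &&
    nzchar(Sys.getenv("OPENCAGE_KEY"))) {
  # bind_cols = FALSE: return only the geocoding results instead of
  # appending the result columns to concert_df.
  results_only <- opencage::oc_forward_df(
    data = concert_df,
    placename = location,
    bind_cols = FALSE
  )
  print(results_only)
}
```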
Reverse geocoding
Reverse geocoding works in the opposite direction of forward geocoding: from a pair of coordinates to the name and address most appropriate for the coordinates.
oc_reverse_df(latitude = 51.5034070, longitude = -0.1275920)
#> # A tibble: 1 × 3
#> latitude longitude oc_formatted
#> <dbl> <dbl> <chr>
#> 1 51.5 -0.128 10 Downing Street, London, SW1A 2AA, United Kingdom
Note that all coordinates sent to the OpenCage API must adhere to the WGS 84 (also known as EPSG:4326) coordinate reference system in decimal format. This is the coordinate reference system used by the Global Positioning System. There is usually no reason to send more than six or seven digits past the decimal point. Seven decimal digits already correspond to a precision of roughly a centimeter.
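The arithmetic behind this rule of thumb can be sketched in base R. The figure of roughly 111 km per degree of latitude is an approximation, and the helper prepare_coords() is hypothetical, not part of {opencage}; it merely checks that values fall within the valid decimal-degree ranges and rounds them.

```r
# Approximate ground distance represented by one unit of the n-th decimal
# place, assuming roughly 111 km per degree of latitude (an approximation).
metres_per_degree <- 111000
decimals <- 5:7
data.frame(
  decimals = decimals,
  approx_metres = metres_per_degree * 10^(-decimals)
)

# Hypothetical helper: validate decimal-degree ranges, then round.
prepare_coords <- function(latitude, longitude, digits = 6) {
  stopifnot(
    is.numeric(latitude), is.numeric(longitude),
    all(abs(latitude) <= 90), all(abs(longitude) <= 180)
  )
  data.frame(
    latitude = round(latitude, digits),
    longitude = round(longitude, digits)
  )
}

prepare_coords(51.50340701234, -0.12759204321)
```

Rounding before sending a request also means that coordinates differing only in insignificant digits are treated as the same input.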
Like oc_forward_df(), oc_reverse_df() is vectorised, can work with numeric vectors and data frames, supports the output = "all" argument and can be used with the {magrittr} pipe.
OpenCage returns at most one result per reverse geocoding request.
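By analogy with the forward geocoding examples, reverse geocoding a data frame might look like the sketch below. The data frame points_df and its column names are hypothetical, and the call is guarded so that nothing is sent unless {opencage} is installed and an API key is configured. No output is shown because it depends on the API response.

```r
# Hypothetical input: coordinates stored in two numeric columns.
points_df <- data.frame(
  id = c("a", "b"),
  lat = c(51.5034070, 10),
  lng = c(-0.1275920, 10)
)

# Only query the API when {opencage} is installed and an API key is set.
if (requireNamespace("opencage", quietly = TRUE) &&
    nzchar(Sys.getenv("OPENCAGE_KEY"))) {
  # The column names are passed unquoted, as in the oc_forward_df() examples.
  reversed <- opencage::oc_reverse_df(
    data = points_df,
    latitude = lat,
    longitude = lng
  )
  print(reversed)
}
```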
Caching
OpenCage allows and supports caching. To minimize the number of requests sent to the API, {opencage} uses {memoise} to cache results inside the active R session.
system.time(oc_reverse(latitude = 10, longitude = 10))
#> user system elapsed
#> 0.00 0.00 0.96
system.time(oc_reverse(latitude = 10, longitude = 10))
#> user system elapsed
#> 0.01 0.00 0.02
To clear the cache of all results either start a new R session or call oc_clear_cache().
oc_clear_cache()
#> [1] TRUE
system.time(oc_reverse(latitude = 10, longitude = 10))
#> user system elapsed
#> 0.01 0.00 0.91
As you probably know, cache invalidation is one of the harder things to do in computer science. Therefore {opencage} only supports invalidating the whole cache and not individual records at the moment.
The underlying data at OpenCage is updated daily.
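To illustrate the idea of session-level caching with whole-cache invalidation, here is a small base R sketch. It is an illustrative stand-in only, not how {opencage} or {memoise} are implemented; make_cached() and fake_lookup() are hypothetical names, and no API request is made.

```r
# Illustrative stand-in for session-level caching; NOT the implementation
# used by {opencage} or {memoise}.
make_cached <- function(f) {
  cache <- new.env(parent = emptyenv())
  cached <- function(latitude, longitude) {
    key <- paste(latitude, longitude, sep = ",")
    if (!exists(key, envir = cache, inherits = FALSE)) {
      assign(key, f(latitude, longitude), envir = cache)
    }
    get(key, envir = cache, inherits = FALSE)
  }
  # Invalidate the whole cache at once; individual records cannot be dropped.
  clear <- function() {
    rm(list = ls(cache, all.names = TRUE), envir = cache)
    invisible(TRUE)
  }
  list(call = cached, clear = clear)
}

# A fake "API call" that counts how often it is actually executed.
n_requests <- 0
fake_lookup <- function(latitude, longitude) {
  n_requests <<- n_requests + 1
  paste("result for", latitude, longitude)
}

lookup <- make_cached(fake_lookup)
lookup$call(10, 10)  # first call: executes fake_lookup()
lookup$call(10, 10)  # second call: served from the cache
lookup$clear()       # clear all cached results
lookup$call(10, 10)  # executes fake_lookup() again
n_requests           # 2: only two "requests" were actually made
```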
Further information
OpenCage supports a lot of parameters to either target your search area more specifically or to specify what additional information you need. See the “Customise your query” vignette for details.
Besides oc_forward_df() and oc_reverse_df(), which always return a single tibble, {opencage} has two sibling functions, oc_forward() and oc_reverse(), which can be used to return other types of output. Depending on what you specify as the return parameter, oc_forward() and oc_reverse() will return either a list of tibbles (df_list, the default), JSON lists (json_list), GeoJSON lists (geojson_list), or the URL with which the API would be called (url_only). Learn more in the “Output options” vignette.
Please report any issues or bugs on our GitHub repository and post questions on discuss.ropensci.org.