Customise your query
Get more and better results from OpenCage
Daniel Possenriede, Jesse Sadler, Maëlle Salmon
2023-01-10
Source:vignettes/customise_query.Rmd
Geocoding is surprisingly hard. Address formats and spellings differ in and between countries; administrative areas on different levels intersect; names, numbers, and boundaries change over time — you name it. The OpenCage API, therefore, supports about a dozen parameters to customise queries. This vignette explains how to use the query parameters with {opencage} to get better geocoding results.
Multiple results
Forward geocoding typically returns multiple results because many places have the same or similar names.
By default, oc_forward_df() only returns one result: the one defined as the best result by the OpenCage API. To receive more results, modify the limit argument, which specifies the maximum number of results that should be returned. Integer values between 1 and 100 are allowed.
oc_forward_df("Berlin")
#> # A tibble: 1 × 4
#> placename oc_lat oc_lng oc_formatted
#> <chr> <dbl> <dbl> <chr>
#> 1 Berlin 52.5 13.4 Berlin, Germany
oc_forward_df("Berlin", limit = 5)
#> # A tibble: 5 × 4
#> placename oc_lat oc_lng oc_formatted
#> <chr> <dbl> <dbl> <chr>
#> 1 Berlin 52.5 13.4 Berlin, Germany
#> 2 Berlin 44.5 -71.2 Berlin, NH 03570, United States of America
#> 3 Berlin 52.5 13.4 Berlin Ostbahnhof, Mitteltunnel, 10243 Berlin, Germany
#> 4 Berlin 39.8 -89.9 Berlin, Sangamon County, Illinois, United States of America
#> 5 Berlin 41.6 -72.7 Berlin, Connecticut, United States of America
Reverse geocoding returns at most one result. Therefore, oc_reverse_df() does not support the limit argument.
OpenCage may sometimes have more than one record of one place. Duplicated records are not returned by default. If you set the no_dedupe argument to TRUE, you will receive duplicated results when available.
oc_forward_df("Berlin", limit = 5, no_dedupe = TRUE)
#> # A tibble: 5 × 4
#> placename oc_lat oc_lng oc_formatted
#> <chr> <dbl> <dbl> <chr>
#> 1 Berlin 52.5 13.4 Berlin, Germany
#> 2 Berlin 44.5 -71.2 Berlin, NH 03570, United States of America
#> 3 Berlin 52.5 13.4 Berlin Ostbahnhof, Mitteltunnel, 10243 Berlin, Germany
#> 4 Berlin 39.8 -89.9 Berlin, Sangamon County, Illinois, United States of America
#> 5 Berlin 41.6 -72.7 Berlin, Connecticut, United States of America
Better targeted results
As you can see, place names are often ambiguous. Happily, the OpenCage API has tools to deal with this problem. The countrycode, bounds, and proximity arguments can make the query more precise. min_confidence lets you limit the results to those with at least a specified confidence score (which is not necessarily the “best” or most “relevant” result, though). These parameters are only relevant and available for forward geocoding.
countrycode
The countrycode parameter restricts the results to the given country. The country code is a two-letter code as defined by the ISO 3166-1 Alpha 2 standard, e.g. “AR” for Argentina, “FR” for France, and “NZ” for New Zealand.
oc_forward_df(placename = "Paris", countrycode = "US", limit = 5)
#> # A tibble: 5 × 4
#> placename oc_lat oc_lng oc_formatted
#> <chr> <dbl> <dbl> <chr>
#> 1 Paris 33.7 -95.6 Paris, Texas, United States of America
#> 2 Paris 38.2 -84.3 Paris, KY 40361, United States of America
#> 3 Paris 36.3 -88.3 Paris, Tennessee, United States of America
#> 4 Paris 39.6 -87.7 Paris, IL 61944, United States of America
#> 5 Paris 44.3 -70.5 Paris, 04281, United States of America
Multiple countrycodes per placename must be wrapped in a list. Here is an example with places called “Paris” in Italy and Portugal.
oc_forward_df(placename = "Paris", countrycode = list(c("IT", "PT")), limit = 5)
#> # A tibble: 5 × 4
#> placename oc_lat oc_lng oc_formatted
#> <chr> <dbl> <dbl> <chr>
#> 1 Paris 44.6 7.28 Brossasco, Cuneo, Italy
#> 2 Paris 46.5 10.4 23030 Valfurva SO, Italy
#> 3 Paris 37.4 -8.79 8670-320 São Teotónio, Portugal
#> 4 Paris 43.5 12.1 Paris, 52035 Monterchi AR, Italy
#> 5 Paris 43.8 11.3 Paris, Via dei Banchi, 50123 Florence FI, Italy
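The list requirement matters most once several place names are queried at once. The following is a minimal base R sketch of the structure we would expect to pass in that case, assuming that countrycode follows the same vectorisation rules as the other arguments (one list element per placename). The variable names are purely illustrative, and the actual API call is left as a comment because it needs an OpenCage API key.

```r
# Two place names to geocode (illustrative values).
placenames <- c("Paris", "Berlin")

# One list element per placename; each element may hold several codes.
# Here: "Paris" restricted to Italy or Portugal, "Berlin" restricted to the US.
countrycodes <- list(c("IT", "PT"), "US")

# The list should be as long as the vector of place names.
stopifnot(length(countrycodes) == length(placenames))

# Number of country codes attached to each placename.
lengths(countrycodes)
#> [1] 2 1

# With an API key configured, the query could then look like this:
# opencage::oc_forward_df(placenames, countrycode = countrycodes, limit = 5)
```

A plain character vector such as c("IT", "PT") would instead read as one code per placename, which is why the list wrapper is needed for the single-placename example above.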
Despite the name, country codes also exist for territories that are not independent states, e.g. Gibraltar (“GI”), Greenland (“GL”), Guadeloupe (“GP”), or Guam (“GU”). You can look up specific country codes with the {ISOcodes} or {countrycode} packages or on the ISO or Wikipedia webpages. In fact, you can also look up country codes via OpenCage. If you were interested in the country code of Curaçao, for example, you could run:
oc_forward_df("Curaçao", no_annotations = FALSE)["oc_iso_3166_1_alpha_2"]
#> # A tibble: 1 × 1
#> oc_iso_3166_1_alpha_2
#> <chr>
#> 1 CW
bounds
The bounds parameter restricts the possible results to a defined bounding box. A bounding box is a named numeric vector with four coordinates specifying its south-west and north-east corners: (xmin, ymin, xmax, ymax). The bounds parameter can most easily be specified with the oc_bbox() helper, for example bounds = oc_bbox(-0.56, 51.28, 0.27, 51.68). OpenCage provides a ‘bounds-finder’ to interactively determine bounds values.
Below is an example of the use of bounds where the bounding box covers the South American continent.
oc_forward_df(placename = "Paris", bounds = oc_bbox(-97, -56, -32, 12), limit = 5)
#> # A tibble: 5 × 4
#> placename oc_lat oc_lng oc_formatted
#> <chr> <dbl> <dbl> <chr>
#> 1 Paris 8.05 -80.6 Paris, Distrito Parita, Panama
#> 2 Paris -6.71 -69.9 Eirunepé, Região Geográfica Intermediária de Tefé, Brazil
#> 3 Paris -3.99 -79.2 110105, Loja, Ecuador
#> 4 Paris -13.5 -62.5 Canton Motegua, Municipio Baures, Provincia de Iténez, Bolivia
#> 5 Paris -23.5 -47.5 Paris, Jardim Santa Fé, Sorocaba - SP, Brazil
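To make the (xmin, ymin, xmax, ymax) convention concrete, here is a small base R sketch that checks whether coordinates fall inside a bounding box. The in_bbox() helper is hypothetical and not part of {opencage}; the box is written by hand as a named numeric vector with the same values as in the South America example, and the sketch does not handle boxes that cross the antimeridian.

```r
# Bounding box as a named numeric vector: x is longitude, y is latitude.
sa_bbox <- c(xmin = -97, ymin = -56, xmax = -32, ymax = 12)

# Hypothetical helper: TRUE where a point lies inside (or on the edge of)
# the box. Vectorised over lat and lng.
in_bbox <- function(lat, lng, bbox) {
  lng >= bbox[["xmin"]] & lng <= bbox[["xmax"]] &
    lat >= bbox[["ymin"]] & lat <= bbox[["ymax"]]
}

# Paris in Panama (first result above) versus Paris in France.
in_bbox(lat = c(8.05, 48.9), lng = c(-80.6, 2.32), bbox = sa_bbox)
#> [1]  TRUE FALSE
```

Note the argument order: the box lists longitude before latitude, whereas the result columns come as oc_lat followed by oc_lng.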
Again, you can also use {opencage} to determine a bounding box for subsequent queries. If you wanted to see how many Plaça d’Espanya there are on the Balearic Islands, for example, you could find the appropriate bounding box and then search for the squares:
hi <- oc_forward_df(placename = "Balearic Islands", no_annotations = FALSE)
hi_bbox <-
  oc_bbox(
    hi$oc_southwest_lng,
    hi$oc_southwest_lat,
    hi$oc_northeast_lng,
    hi$oc_northeast_lat
  )
oc_forward_df(placename = "Plaça d'Espanya", bounds = hi_bbox, limit = 20)
#> # A tibble: 16 × 4
#> placename oc_lat oc_lng oc_formatted
#> <chr> <dbl> <dbl> <chr>
#> 1 Plaça d'Espanya 39.6 2.65 Plaça d'Espanya, Carrer d'Eusebi Estada, 07005 Palma, Spain
#> 2 Plaça d'Espanya 39.6 2.65 Plaça d'Espanya, Canavall, Palma, Balearic Islands, Spain
#> 3 Plaça d'Espanya 39.0 1.53 Plaça d'Espanya, Santa Eulària des Riu, Balearic Islands, Spain
#> 4 Plaça d'Espanya 39.0 1.30 Plaça d'Espanya, 07820 Sant Antoni de Portmany, Spain
#> 5 Plaça d'Espanya 39.9 4.27 Plaça d'Espanya, Maó, Spain
#> 6 Plaça d'Espanya 39.5 3.15 Plaça d'Espanya, 07200 Felanich, Spain
#> 7 Plaça d'Espanya 39.0 1.53 Plaça d'Espanya, 07840 Santa Eulària des Riu, Spain
#> 8 Plaça d'Espanya 39.6 2.65 Plaça d'Espanya, 07002 Palma, Spain
#> 9 Plaça d'Espanya 39.9 4.27 Plaça d'Espanya, 07701 Maó, Spain
#> 10 Plaça d'Espanya 38.9 1.44 Plaça d'Espanya, Ibiza, Spain
#> 11 Plaça d'Espanya 39.8 2.72 Plaça d'Espanya, Sóller, Spain
#> 12 Plaça d'Espanya 39.7 2.91 Plaça d'Espanya, Inca, Spain
#> 13 Plaça d'Espanya 39.6 2.42 plaça d'Espanya, Andratx, Spain
#> 14 Plaça d'Espanya 39.6 2.75 Plaça d'Espanya, Marratxí, Spain
#> 15 Plaça d'Espanya 39.6 2.90 Plaça d'Espanya, 07140 Sencelles, Spain
#> 16 Plaça d'Espanya 39.8 2.74 Plaça d'Espanya, Fornalutx, Spain
Note that OpenCage does not support point-of-interest or feature search, like “show me all bus stops in this area”. If you are more interested in these kinds of features, you might want to take a look at the {osmdata} package.
proximity
The proximity parameter provides OpenCage with a hint to bias results in favour of those closer to the specified location. It is just one of many factors used for ranking results, however, and (some) results may be far away from the location or point passed to the proximity parameter. A point is a named numeric vector of a latitude and longitude coordinate pair in decimal format. The proximity parameter can most easily be specified with the oc_points() helper, for example proximity = oc_points(38.0, -84.5), if you happen to already know the coordinates. If not, you can also look them up with {opencage}, of course:
lx <- oc_forward_df("Lexington, Kentucky")
lx_point <- oc_points(lx$oc_lat, lx$oc_lng)
oc_forward_df(placename = "Paris", proximity = lx_point, limit = 5)
#> # A tibble: 5 × 4
#> placename oc_lat oc_lng oc_formatted
#> <chr> <dbl> <dbl> <chr>
#> 1 Paris 38.2 -84.3 Paris, KY 40361, United States of America
#> 2 Paris 48.9 2.32 Paris, Ile-de-France, France
#> 3 Paris 39.6 -87.7 Paris, IL 61944, United States of America
#> 4 Paris 38.8 -85.6 Paris, Jennings County, IN 47230, United States of America
#> 5 Paris 33.7 -95.6 Paris, Texas, United States of America
Note that the French capital is listed before other places in the US, which are closer to the point provided. This illustrates how proximity is only one of many factors influencing the ranking of results.
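If you do want the results ordered strictly by distance, you can re-rank them yourself after the query. The sketch below uses base R only and a hypothetical haversine_km() helper (great-circle distance on a sphere with an assumed mean Earth radius of 6371 km). It takes the rounded coordinates printed in the output above and the example point (38.0, -84.5), so the distances are approximate.

```r
# Hypothetical helper: great-circle distance in kilometres between points
# given in decimal degrees. Vectorised over the second point.
haversine_km <- function(lat1, lng1, lat2, lng2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlng <- (lng2 - lng1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlng / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Rounded result coordinates, in the order returned above:
# Kentucky, France, Illinois, Indiana, Texas.
res_lat <- c(38.2, 48.9, 39.6, 38.8, 33.7)
res_lng <- c(-84.3, 2.32, -87.7, -85.6, -95.6)

# Distance of every result from the proximity point.
dist_km <- haversine_km(38.0, -84.5, res_lat, res_lng)

# Row order when sorted from nearest to farthest.
order(dist_km)
#> [1] 1 4 3 5 2
```

With a real result tibble, the same idea would apply to its oc_lat and oc_lng columns, and the ordering could be used to sort the rows.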
Confidence
min_confidence, an integer value between 0 and 10, indicates the precision of the returned result as defined by its geographical extent, i.e. by the extent of the result’s bounding box. When you specify min_confidence, only results with at least the requested confidence will be returned. Thus, in the following example, the French capital is too large to be returned.
oc_forward_df(placename = "Paris", min_confidence = 7, limit = 5)
#> # A tibble: 5 × 4
#> placename oc_lat oc_lng oc_formatted
#> <chr> <dbl> <dbl> <chr>
#> 1 Paris 38.2 -84.3 Paris, KY 40361, United States of America
#> 2 Paris 36.3 -88.3 Paris, Tennessee, United States of America
#> 3 Paris 39.6 -87.7 Paris, IL 61944, United States of America
#> 4 Paris 35.3 -93.7 Paris, Logan County, AR 72855, United States of America
#> 5 Paris 43.0 -75.3 Paris, Oneida County, New York, United States of America
Note that confidence is not used for the ranking of results. It does not tell you which result is more “correct” or “relevant”, nor what type of thing the result is, but rather how small a result is, geographically speaking. See the API documentation for details.
Retrieve more information from the API
Besides parameters to target your search better, OpenCage offers parameters to receive more or specific types of information from the API.
language
If you would like to get your results in a specific language, you can pass an IETF BCP 47 language tag, such as “tr” for Turkish or “pt-BR” for Brazilian Portuguese, to the language parameter. OpenCage will attempt to return results in that language.
oc_forward_df(placename = "Munich", language = "tr")
#> # A tibble: 1 × 4
#> placename oc_lat oc_lng oc_formatted
#> <chr> <dbl> <dbl> <chr>
#> 1 Munich 48.1 11.6 Münih, Bavyera, Almanya
Alternatively, you can specify the “native” tag, in which case OpenCage will attempt to return the response in the “official” language(s) of the location. Keep in mind, however, that some countries have more than one official language or that the official language may not be the one actually used day-to-day.
oc_forward_df(placename = "Munich", language = "native")
#> # A tibble: 1 × 4
#> placename oc_lat oc_lng oc_formatted
#> <chr> <dbl> <dbl> <chr>
#> 1 Munich 48.1 11.6 München, Bayern, Deutschland
If the language parameter is set to NULL (which is the default), the tag is not recognised, or OpenCage does not have a record in that language, the results will be returned in English.
oc_forward_df(placename = "München")
#> # A tibble: 1 × 4
#> placename oc_lat oc_lng oc_formatted
#> <chr> <dbl> <dbl> <chr>
#> 1 München 48.1 11.6 Munich, Bavaria, Germany
To find the correct language tag for your desired language, you can search for the language on the BCP47 language subtag lookup, for example. Note, however, that there are some language tags in use on OpenStreetMap, one of OpenCage’s main sources, that do not conform with the IETF BCP 47 standard. For example, OSM uses zh_pinyin instead of zh-Latn-pinyin for Hanyu Pinyin. It might, therefore, be helpful to consult the details page of the target country on openstreetmap.org to see which language tags are actually used. In any case, neither the OpenCage API nor the functions in this package will validate the language tags you provide.
For further details, see OpenCage’s API documentation.
Annotations
OpenCage supplies additional information about the result location in what it calls annotations. Annotations include, among a variety of other types of information, country information, time of sunset and sunrise, UN M49 codes or the location in different geocoding formats, like Maidenhead, Mercator projection (EPSG:3857), geohash or what3words. Some annotations, like the Irish Transverse Mercator (ITM, EPSG:2157) or the Federal Information Processing Standards (FIPS) code will only be shown when appropriate.
Whether the annotations are shown is controlled by the no_annotations argument. It is TRUE by default, which means that the output will not contain annotations. (Yes, inverted argument names are confusing, but we just follow OpenCage’s lead here.) When you set no_annotations to FALSE, all columns are returned (i.e. output is implicitly set to "all"). This leads to a result with a lot of columns.
oc_forward_df("Dublin", no_annotations = FALSE)
#> # A tibble: 1 × 70
#> placen…¹ oc_lat oc_lng oc_co…² oc_fo…³ oc_mgrs oc_ma…⁴ oc_ca…⁵ oc_flag oc_ge…⁶ oc_qi…⁷ oc_wi…⁸ oc_dm…⁹ oc_dm…˟ oc_it…˟ oc_it…˟
#> <chr> <dbl> <dbl> <int> <chr> <chr> <chr> <int> <chr> <chr> <dbl> <chr> <chr> <chr> <chr> <chr>
#> 1 Dublin 53.3 -6.26 5 Dublin… 29UPV8… IO63ui… 353 "\U000… gc7x98… 114. Q1761 53° 20… 6° 15'… 715826… 734697…
#> # … with 54 more variables: oc_mercator_x <dbl>, oc_mercator_y <dbl>, oc_osm_edit_url <chr>, oc_osm_note_url <chr>,
#> # oc_osm_url <chr>, oc_un_m49_statistical_groupings <list>, oc_un_m49_regions_europe <chr>, oc_un_m49_regions_ie <chr>,
#> # oc_un_m49_regions_northern_europe <chr>, oc_un_m49_regions_world <chr>, oc_currency_alternate_symbols <list>,
#> # oc_currency_decimal_mark <chr>, oc_currency_html_entity <chr>, oc_currency_iso_code <chr>, oc_currency_iso_numeric <chr>,
#> # oc_currency_name <chr>, oc_currency_smallest_denomination <int>, oc_currency_subunit <chr>,
#> # oc_currency_subunit_to_unit <int>, oc_currency_symbol <chr>, oc_currency_symbol_first <int>,
#> # oc_currency_thousands_separator <chr>, oc_roadinfo_drive_on <chr>, oc_roadinfo_speed_in <chr>, …
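Because the annotation columns share prefixes such as oc_currency_ or oc_roadinfo_, one way to tame a wide result is to select columns by prefix. The following base R sketch only works on a hand-written vector of column names copied from the output above; with a real result you would presumably use names() on the returned tibble instead.

```r
# A few column names copied from the output above (not the full set).
col_names <- c(
  "placename", "oc_lat", "oc_lng", "oc_formatted", "oc_mercator_x",
  "oc_currency_iso_code", "oc_currency_name", "oc_currency_symbol",
  "oc_roadinfo_drive_on"
)

# Keep the identifying columns plus everything in the currency group.
keep <- col_names %in% c("placename", "oc_formatted") |
  startsWith(col_names, "oc_currency_")

# Names of the columns that would be retained.
col_names[keep]

# With a real result `res` (requires an API key), the same logic would be:
# res[, names(res) %in% c("placename", "oc_formatted") |
#       startsWith(names(res), "oc_currency_")]
```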
roadinfo
roadinfo indicates whether the geocoder should attempt to match the nearest road (rather than an address) and provide additional road and driving information. It is FALSE by default, which means OpenCage will not attempt to match the nearest road. Some road and driving information is nevertheless provided as part of the annotations (see above), even when roadinfo is set to FALSE.
oc_forward_df(placename = c("Europa Advance Rd", "Bovoni Rd"), roadinfo = TRUE)
#> # A tibble: 2 × 30
#> placen…¹ oc_lat oc_lng oc_co…² oc_fo…³ oc_ro…⁴ oc_ro…⁵ oc_ro…⁶ oc_ro…⁷ oc_ro…⁸ oc_ro…⁹ oc_no…˟ oc_no…˟ oc_so…˟ oc_so…˟ oc_is…˟
#> <chr> <dbl> <dbl> <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 Europa … 36.1 -5.34 9 Europa… right yes Europa… second… km/h asphalt 36.1 -5.34 36.1 -5.35 GI
#> 2 Bovoni … 18.3 -64.9 8 Bovoni… left <NA> Bovoni… primary mph <NA> 18.3 -64.9 18.3 -64.9 VI
#> # … with 14 more variables: oc_iso_3166_1_alpha_3 <chr>, oc_category <chr>, oc_type <chr>, oc_city <chr>, oc_continent <chr>,
#> # oc_country <chr>, oc_country_code <chr>, oc_postcode <chr>, oc_road <chr>, oc_road_type <chr>, oc_iso_3166_2 <list>,
#> # oc_county <chr>, oc_state <chr>, oc_state_code <chr>, and abbreviated variable names ¹placename, ²oc_confidence,
#> # ³oc_formatted, ⁴oc_roadinfo_drive_on, ⁵oc_roadinfo_oneway, ⁶oc_roadinfo_road, ⁷oc_roadinfo_road_type,
#> # ⁸oc_roadinfo_speed_in, ⁹oc_roadinfo_surface, ˟oc_northeast_lat, ˟oc_northeast_lng, ˟oc_southwest_lat, ˟oc_southwest_lng,
#> # ˟oc_iso_3166_1_alpha_2
A blog post provides more details.
Abbreviated addresses
The geocoding functions also have an abbrv parameter, which is FALSE by default. When it is TRUE, the addresses in the formatted field of the results are abbreviated (e.g. “Main St.” instead of “Main Street”).
oc_forward_df("Wall Street")
#> # A tibble: 1 × 4
#> placename oc_lat oc_lng oc_formatted
#> <chr> <dbl> <dbl> <chr>
#> 1 Wall Street 40.7 -74.0 Wall Street, New York, NY 10005, United States of America
oc_forward_df("Wall Street", abbrv = TRUE)
#> # A tibble: 1 × 4
#> placename oc_lat oc_lng oc_formatted
#> <chr> <dbl> <dbl> <chr>
#> 1 Wall Street 40.7 -74.0 Wall St, New York, NY 10005, USA
See this blog post for more information.
address_only
When address_only is set to TRUE (the default is FALSE), OpenCage will attempt to exclude names of points of interest from the formatted field of the results. In the following example, the POI “Hôtel de ville de Nantes” (town hall of Nantes) is removed from the oc_formatted column with address_only = TRUE.
oc_reverse_df(47.21864, -1.55413)
#> # A tibble: 1 × 3
#> latitude longitude oc_formatted
#> <dbl> <dbl> <chr>
#> 1 47.2 -1.55 Hôtel de ville de Nantes, Place de l'Hôtel de Ville, 44000 Nantes, France
oc_reverse_df(47.21864, -1.55413, address_only = TRUE)
#> # A tibble: 1 × 3
#> latitude longitude oc_formatted
#> <dbl> <dbl> <chr>
#> 1 47.2 -1.55 Place de l'Hôtel de Ville, 44000 Nantes, France
Vectorised arguments
All of the function arguments mentioned above are vectorised, so you can send queries like this:
oc_forward_df(
  placename = c("New York", "Rio", "Tokyo"),
  language = c("es", "de", "fr")
)
#> # A tibble: 3 × 4
#> placename oc_lat oc_lng oc_formatted
#> <chr> <dbl> <dbl> <chr>
#> 1 New York 40.7 -74.0 Nueva York, Estados Unidos de América
#> 2 Rio -22.9 -43.2 Rio de Janeiro, Região Metropolitana do Rio de Janeiro, Brasilien
#> 3 Tokyo 35.7 140. Tokyo, Japon
Or geocode place names with country codes in a data frame:
for_df <-
  data.frame(
    location = c("Golden Gate Bridge", "Buckingham Palace", "Eiffel Tower"),
    ccode = c("at", "cg", "be")
  )
oc_forward_df(for_df, placename = location, countrycode = ccode)
#> # A tibble: 3 × 5
#> location ccode oc_lat oc_lng oc_formatted
#> <chr> <chr> <dbl> <dbl> <chr>
#> 1 Golden Gate Bridge at 47.6 15.8 Wiesenbauer, Martin's Golden Gate Bridge, 8684 Gemeinde Spital am Semmering, Austria
#> 2 Buckingham Palace cg -4.80 11.8 Buckingham Palace, Boulevard du Général Charles de Gaulle, Pointe-Noire, Congo-Brazzav…
#> 3 Eiffel Tower be 50.9 4.34 Eiffel Tower, Avenue de Bouchout - Boechoutlaan, 1020 Brussels, Belgium
This also works with oc_reverse_df(), of course.
rev_df <-
  data.frame(
    lat = c(51.952659, 41.401372),
    lon = c(7.632473, 2.128685)
  )
oc_reverse_df(rev_df, lat, lon, language = "native")
#> # A tibble: 2 × 3
#> lat lon oc_formatted
#> <dbl> <dbl> <chr>
#> 1 52.0 7.63 Friedrich-Ebert-Straße 7, 48153 Münster, Deutschland
#> 2 41.4 2.13 Carrer de Calatrava, 68, 08017 Barcelona, España
Further information
For further information about the output and query parameters, see the OpenCage API docs and the OpenCage FAQ. When building queries, OpenCage’s best practices can be very useful, as well as their guide to geocoding accuracy.