
3. Translation of OSM to Simple Features
Mark Padgham
2023-03-27
Source:vignettes/osm-sf-translation.Rmd
1. OpenStreetMap Data Structure
OpenStreetMap (OSM) data has a unique structure that is not directly reconcilable with other modes of representing spatial data, notably including the widely-adopted Simple Features (SF) scheme of the Open Geospatial Consortium (OGC). The three primary spatial objects of OSM are:
- nodes, which are directly translatable to spatial points;
- ways, which may be closed, in which case they form polygons, or unclosed, in which case they are (non-polygonal) lines;
- relations, which are higher-level objects used to specify relationships between collections of ways and nodes. While there are several recognised categories of relations, in spatial terms these may be reduced to a binary distinction between:
  - multipolygon relations, which specify relationships between an exterior polygon (through designating role='outer') and possible inner polygons (role='inner'). These may or may not be designated with type=multipolygon. Political boundaries, for example, often have type=boundary rather than explicit type=multipolygon. osmdata identifies multipolygons as those relation objects having at least one member with role=outer or role=inner.
  - In the absence of inner and outer roles, an OSM relation is assumed to be non-polygonal, and to instead form a collection of non-enclosing lines.
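The rule described above for distinguishing the two kinds of relation can be sketched in a few lines of base R. This is only an illustration of the stated rule, not the code used inside osmdata; the function name classify_relation and the example role vectors are hypothetical.

```r
# Hypothetical helper: classify an OSM relation from the roles of its members.
# A relation with at least one 'outer' or 'inner' member is treated as a
# multipolygon; anything else is treated as a collection of lines.
classify_relation <- function (roles) {
    if (any (roles %in% c ("outer", "inner"))) {
        "multipolygon"
    } else {
        "multilinestring"
    }
}

# Illustrative role vectors (not taken from real OSM data)
boundary_roles <- c ("outer", "outer", "inner", "admin_centre")
route_roles <- c ("", "forward", "alternative")

classify_relation (boundary_roles)
classify_relation (route_roles)
```

Note that the type tag of the relation plays no part in this sketch, in line with the description above that only member roles are inspected.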
2. Simple Features Data Structure
The representation of spatial objects as Simple Features is described at length by the OGC, with this document merely reviewing relevant aspects. The SF system assumes that spatial features can be represented in one of seven distinct primary classes, which by convention are referred to in all capital letters. Relevant classes for OSM data are:
- POINT
- MULTIPOINT
- LINESTRING
- MULTILINESTRING
- POLYGON
- MULTIPOLYGON
(The seventh primary class is GEOMETRYCOLLECTION, which contains several objects with different geometries.) An SF (where that acronym may connote both singular and plural) consists of a sequence of spatial coordinates, which for OSM data are only ever XY coordinates represented as strings enclosed within brackets. In addition to coordinate data and associated coordinate reference systems, an SF may include any number of additional data which quantify or qualify the feature of interest. In the sf extension to R, for example, a single SF is represented by one row of a data.frame, with the geometry stored in a single column, and any number of other columns containing these additional data.
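As a minimal sketch of the one-row-per-feature layout just described, the following builds a single POINT feature by hand. It assumes the sf package is installed; the coordinates, osm_id, and name values are invented for illustration.

```r
library (sf)

# Illustrative values only: one POINT feature with two attribute columns.
pt <- st_point (c (144.32, -37.39)) # XY coordinates (longitude, latitude)
geom <- st_sfc (pt, crs = 4326) # geometry column with a coordinate reference system
feature <- st_sf (osm_id = "1", name = "example node", geometry = geom)

nrow (feature) # one row per feature
names (feature) # attribute columns plus the geometry column
st_as_text (st_geometry (feature)) # coordinates as a bracketed text string
```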
Simple Feature geometries are referred to in this vignette using all capital letters (such as POLYGON), while OSM geometries use lower case (such as polygon). Similarly, the Simple Features standard of the OGC is referred to as SF, while the R package of the same name is referred to as R::sf, that is, upper case R followed by lower case sf. Much functionality of R::sf is determined by the underlying Geospatial Data Abstraction Library (GDAL; described below). Representations of data are often discussed here with reference to GDAL/sf, in which case it may always be assumed that the translation and representation of data are determined by GDAL and not directly by the creators of R::sf.
3. How osmdata translates OSM into Simple Features
3.1. OSM Nodes
OSM nodes translate directly into SF::POINT objects, with all OSM key-value pairs stored in additional data.frame columns.
3.2. OSM Ways
OSM ways may be either polygons or (non-polygonal) lines. osmdata translates these into SF::POLYGON and SF::LINESTRING objects, respectively. Although polygonal and non-polygonal ways may have systematically different key fields, they are conflated here to the single set of key values common to all way objects regardless of shape. This enables direct comparison and uniform operation on both SF::LINESTRING and SF::POLYGON objects.
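The distinction between closed and unclosed ways can be sketched as a test on the sequence of node identifiers that make up a way. The helper name way_to_sf_class, the node IDs, and the minimum of four nodes for a closed ring are assumptions made for this illustration; osmdata itself may apply further checks.

```r
# Hypothetical helper: a way is treated as closed when its first and last
# node IDs are identical and it has enough nodes to enclose an area.
way_to_sf_class <- function (node_ids) {
    n <- length (node_ids)
    closed <- n >= 4 && node_ids [1] == node_ids [n]
    if (closed) "POLYGON" else "LINESTRING"
}

way_to_sf_class (c ("a", "b", "c", "a")) # closed: returns to its first node
way_to_sf_class (c ("a", "b", "c")) # unclosed
```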
3.3. OSM Relations
OSM relations comprising members with role=outer or role=inner are translated into SF::MULTIPOLYGON objects; otherwise they form SF::MULTILINESTRING objects. As in the preceding case of OSM ways, potentially systematic differences between OSM key fields for multipolygon and other relation objects are ignored in favour of returning identical key fields in both cases, whether or not value fields for those keys exist.
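Returning identical key fields for two kinds of object amounts to giving both tables the union of their columns and filling the gaps with missing values. The following base R sketch shows one way this could be done; the helper align_keys and the two small tables are hypothetical and are not part of osmdata.

```r
# Hypothetical helper: give two tables the same key columns, filling gaps with NA.
align_keys <- function (a, b) {
    all_keys <- union (names (a), names (b))
    fill <- function (x) {
        for (k in setdiff (all_keys, names (x))) x [[k]] <- NA_character_
        x [, all_keys, drop = FALSE]
    }
    list (a = fill (a), b = fill (b))
}

# Illustrative attribute tables with partly different keys
multipolygons <- data.frame (osm_id = "1", name = "lake", natural = "water",
                             stringsAsFactors = FALSE)
multilines <- data.frame (osm_id = "2", name = "route", route = "bicycle",
                          stringsAsFactors = FALSE)

aligned <- align_keys (multipolygons, multilines)
combined <- rbind (aligned$a, aligned$b) # possible because columns now match
names (combined)
```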
3.3(a) Multipolygon Relations
An OSM multipolygon is translated by osmdata into a single SF::MULTIPOLYGON object which has an additional column specifying num_members. The SF geometry thus consists of a list (an R::List object) of this number of polygons, the first of which is the outer polygon, with all subsequent members forming closed inner rings (either individually or in combination).
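The nesting of outer and inner rings can be seen by constructing a small MULTIPOLYGON by hand. This sketch assumes the sf package is installed and uses invented coordinates; it shows the general list structure used by sf rather than the exact object returned by osmdata.

```r
library (sf)

# Illustrative coordinates: a square outer ring with one square hole.
outer <- rbind (c (0, 0), c (4, 0), c (4, 4), c (0, 4), c (0, 0))
inner <- rbind (c (1, 1), c (1, 2), c (2, 2), c (2, 1), c (1, 1))

# One polygon is a list of rings (outer first); a multipolygon is a list of polygons.
mp <- st_multipolygon (list (list (outer, inner)))

class (mp)
length (mp [[1]]) # number of rings in the first polygon
```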
Each of these inner polygons is also represented as one or more OSM objects, which will generally include detailed data on the individual components not able to be represented in the single multipolygon representation. Each inner polygon is therefore additionally stored in the sf::MULTIPOLYGON data.frame along with all associated data. Thus the row containing a multipolygon of num_polygon polygons is followed by num_polygon - 1 rows containing the data for each inner polygon.
Note that OSM relation objects generally have fewer (or different) key-value pairs than do OSM way objects. In the OSM system, data describing the detailed properties of the constituent ways of a given OSM relation are stored with those ways rather than with the relation. osmdata follows this general principle, and stores the geometry of all ways of a relation with the relation itself (that is, as part of the MULTIPOLYGON or MULTILINESTRING object), while those ways are also stored themselves as LINESTRING (or potentially POLYGON) objects, from where their additional key-value data may be accessed.
3.3(b) Multilinestring Relations
OSM relations that are not multipolygons are translated into SF::MULTILINESTRING objects. Each member of any OSM relation is attributed a role, which may be empty. osmdata collates all ways within a relation according to their role attributes. Thus, unlike multipolygon relations which are always translated into a single sf::MULTIPOLYGON object, multilinestring relations are translated by osmdata into potentially several sf::MULTILINESTRING objects, one for each unique role.
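Collating the ways of a relation by role is essentially a grouping operation. The base R sketch below groups invented way IDs by role; the table layout and the "(no role)" label for empty roles are assumptions for this illustration only.

```r
# Hypothetical members of one route relation: way IDs and their roles.
members <- data.frame (
    way_id = c ("w1", "w2", "w3", "w4"),
    role = c ("", "", "alternative", "alternative"),
    stringsAsFactors = FALSE
)

# Group way IDs by role; each group would become one MULTILINESTRING.
role_label <- ifelse (members$role == "", "(no role)", members$role)
ways_by_role <- split (members$way_id, role_label)

names (ways_by_role)
ways_by_role [["alternative"]]
```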
This is particularly useful because relations are often used to designate extended highways (for example, designated bicycle routes or motorways), yet these often exist in primary and alternative forms, with these categories specified in roles. Separating these roles enables ready access to any desired role.
These multilinestring objects also have a column specifying num_members, as for multipolygons, with the primary member followed by num_members rows, one for each member of the multilinestring.
4. GDAL Translation of OSM into Simple Features
The R package sf provides an R implementation of Simple Features, and provides a wrapper around GDAL for reading geospatial data. GDAL provides a ‘driver’ to read OSM data, and thus sf can also be used to read OSM data in R, as detailed in the main osmdata vignette. However, the GDAL translation of OSM data differs in several important ways from the osmdata translation.
The primary difference is that GDAL only returns unique objects of each spatial (SF) type. Thus sf::POINT objects consist of only those points that are not otherwise members of some ‘higher’ object (line, polygon, or relation objects). Although a given set of OSM data may actually contain a great many points, attempting to load these with
sf::st_read (file, layer = 'points')
will generally return surprisingly few points.
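The reason so few points are returned can be illustrated with a set difference on node identifiers. The IDs below are invented, and the sketch only mimics the filtering described above rather than calling GDAL.

```r
# Hypothetical IDs: five nodes, and one way that uses three of them.
node_ids <- c ("n1", "n2", "n3", "n4", "n5")
way_nodes <- list (w1 = c ("n1", "n2", "n3"))

# Nodes left over once members of 'higher' objects are removed.
standalone <- setdiff (node_ids, unlist (way_nodes))

length (node_ids) # all nodes, as in the osmdata translation
standalone # only the unreferenced nodes, as in the GDAL/sf points layer
```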
4.1. OSM Nodes
Apart from the numerical difference arising through osmdata returning an sf::POINT structure containing all nodes within a given set of OSM data, while sf::st_read (file, layer='points') returns only those points not represented in other structures, the representation of points remains otherwise broadly similar. The only other major difference is that osmdata retains all key-value pairs present in a given set of OSM data, whereas GDAL/sf only retains a select few of these. Moreover, the keys returned by GDAL/sf are pre-defined and invariant, meaning that data returned from sf::st_read (...) may often contain key columns in the resultant data.frame which contain no (non-NA) data. This difference is illustrated in an example repeated here from the main osmdata vignette, with the same principles applying to all of the following classes of OSM data.
The following three lines define a query and download the resultant data to an XML file.
q <- opq (bbox = 'Trentham, Australia')
q <- add_osm_feature (q, key = 'name') # any named objects
osmdata_xml (q, 'trentham.osm')
These data may then be converted into SF representations using either R::sf or osmdata, with OSM keys being the column names of the resultant data.frame objects.
names (sf::st_read ('trentham.osm', layer = 'points', quiet = TRUE))
## [1] "osm_id" "name" "barrier" "highway" "ref"
## [6] "address" "is_in" "place" "man_made" "other_tags"
## [11] "geometry"
names (osmdata_sf (q, 'trentham.osm')$osm_points)
## [1] "osm_id" "name" "X_description_" "X_waypoint_"
## [5] "addr.city" "addr.housenumber" "addr.postcode" "addr.street"
## [9] "amenity" "barrier" "denomination" "foot"
## [13] "ford" "highway" "leisure" "note_1"
## [17] "phone" "place" "railway" "railway.historic"
## [21] "ref" "religion" "shop" "source"
## [25] "tourism" "waterway" "geometry"
osmdata returns far more key fields than does GDAL/sf. More importantly, however, GDAL/sf returns pre-defined key fields regardless of whether they contain any data:
addr <- sf::st_read ('trentham.osm', layer = 'points', quiet = TRUE)$address
all (is.na (addr))
## [1] TRUE
In contrast, osmdata returns only those key fields which contain data (and so excludes address in the above example).
4.2. OSM Ways
As for points, GDAL/sf only returns those ways that are not represented or contained in ‘higher’ objects (OSM relations interpreted as SF::MULTIPOLYGON or SF::MULTILINESTRING objects). osmdata returns all ways, and thus enables, for example, examination of the full attributes of any member of a multigeometry object. This is not possible with the GDAL/sf translation. As for points, the only additional difference between osmdata and GDAL/sf is that osmdata retains all key-value pairs, whereas GDAL retains only a select few.
4.3 OSM Relations
Translation of OSM relations into Simple Features differs more significantly between osmdata and GDAL/sf.
4.3(a) Multipolygon Relations
As indicated above, multipolygon relations are translated in broadly comparable ways by both osmdata and sf/GDAL. Note, however, that the way members of an OSM relation may be specified in arbitrary order, and the multipolygonal way may not necessarily be traced through simply following the segments in the order returned by sf/GDAL.
4.3(b) Multilinestring Relations
Linestring relations are simply read by GDAL directly in terms of their constituent ways, resulting in a single SF::MULTILINESTRING object that contains exactly the same number of lines as the ways in the OSM relation, regardless of their role attributes. Note that roles are frequently used to specify alternative multi-way routes through a single OSM relation. Such distinctions between primary and alternative are erased with GDAL/sf reading.
5 Examples
5.1 Routing
Navigable paths, routes, and ways are all tagged within OSM as highway, readily enabling an overpass query to return only ways that can be used for routing purposes. Routes are nevertheless commonly assembled within OSM relations, particularly where they form major, designated transport ways such as long-distance foot or bicycle paths or major motorways.
5.1(a) Routing with sf/GDAL
A query for key=highway translated through GDAL/sf will return those ways not part of any ‘higher’ structure as SF::LINESTRING objects, but components of an entire transport network might also be returned as:
- SF::MULTIPOLYGON objects, holding all single ways which form simple polygons (that is, in which start and end points are the same);
- SF::MULTIPOLYGON objects holding all single (non-polygonal) ways which combine to form an OSM multipolygon relation (that is, in which the collection of ways ultimately forms a closed role=outer polygon);
- SF::MULTILINESTRING objects holding all single (non-polygonal) ways which combine to form an OSM relation that is not a multipolygon.
Translating these data into a single form usable for routing purposes is not simple. A particular problem that is extremely difficult to resolve is reconciling the SF::MULTIPOLYGON objects with the geometry of the SF::LINESTRING objects. Highway components contained in SF::MULTIPOLYGON objects need to be re-connected with the network represented by the SF::LINESTRING objects, yet the OSM identifiers of the MULTIPOLYGON components are removed by sf/GDAL, preventing these components from being directly re-connected. The only way to ensure connection would be to re-connect those geographic points sharing identical coordinates. This would require code too long and complicated to be worthwhile demonstrating here.
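Only the first step of such a reconciliation, finding vertices with identical coordinates in two layers, is short enough to sketch. The base R fragment below uses invented coordinates and a hypothetical helper xy_key; it does not attempt to rebuild a network, and exact matching of this kind would only work where coordinates are stored identically in both layers.

```r
# Hypothetical coordinates (lon, lat) for vertices taken from two separate layers.
line_xy <- rbind (c (144.30, -37.40), c (144.31, -37.39), c (144.32, -37.38))
poly_xy <- rbind (c (144.32, -37.38), c (144.33, -37.38), c (144.33, -37.37))

# Build a text key per vertex so that identical coordinates compare equal.
xy_key <- function (xy) paste (xy [, 1], xy [, 2])

# Row indices in poly_xy matching each row of line_xy (NA where no match).
shared <- match (xy_key (line_xy), xy_key (poly_xy))
shared
```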
5.1(b) Routing with osmdata
osmdata retains all of the underlying ways of ‘higher’ structures (SF::MULTIPOLYGON or SF::MULTILINESTRING objects) as SF::LINESTRING or SF::POLYGON objects. The geometries of the latter objects duplicate those of the ‘higher’ relations, yet contain additional key-value pairs corresponding to each way. Most importantly, the OSM ID values for all members of a relation are stored within that relation, readily enabling the individual ways (LINESTRING or POLYGON objects) to be identified from the relation (MULTIPOLYGON or MULTILINESTRING object).
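Because member IDs are kept with the relation, finding the attributes of each member way reduces to a lookup by ID. The sketch below uses plain base R structures with invented IDs and tags; how osmdata actually stores the member IDs within its sf objects is not shown here, so the relation list and ways table are assumptions.

```r
# Hypothetical tables: one relation listing its member way IDs, and the ways.
relation <- list (osm_id = "r1", member_ids = c ("w2", "w3"))
ways <- data.frame (
    osm_id = c ("w1", "w2", "w3"),
    highway = c ("residential", "cycleway", "cycleway"),
    surface = c ("asphalt", "gravel", NA),
    stringsAsFactors = FALSE
)

# Select the rows for the relation's members, in member order.
member_ways <- ways [match (relation$member_ids, ways$osm_id), ]
member_ways
```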
The osmdata translation thus readily enables a singularly complete network to be reconstructed by simply combining the SF::LINESTRING layer with the SF::POLYGON layer. These layers will always contain entirely independent members, and so will always be able to be directly combined without duplicating any objects.