Scope

This vignette provides a guided walk-through of the “getting data out” functions of the RESTful API endpoints which list and view details.

ruODK users would mix and match parts of the demonstrated workflows to build their own data pipelines, e.g.:

  • to build a quick analysis from all data, freshly downloaded from a smaller project, or
  • to build an interactive ETL pipeline to selectively download only new submissions for further processing and upload into downstream data warehouses.

A typical and more stream-lined workflow is provided in the RMarkdown template “ODK Central via OData” which is supplied by ruODK.

Three ways to happiness

ODK Central offers no less than three different ways to access data:

  • viewing ODK Central data in MS PowerBI, MS Excel, Tableau, or ruODK through the OData service endpoints, or
  • downloading all submissions including attachments as one (possibly gigantic) zip archive either through the “Export Submissions” button in the ODK Central form submissions page or through ruODK, or
  • viewing ODK Central data through ruODK’s RESTful API functions.

While the vignette("odata", package="ruODK") (online here) illustrates the first option, this vignette demonstrates the remaining two.

Not implemented (yet) are the “managing ODK Central” functions which create, update, and delete projects, forms, users, roles, and permissions. We haven’t yet found a strong use case to automate those functions - ODK Central (driven by humans) does those jobs beautifully on an expected scale.

Setup ruODK

See vignette("Setup", package = "ruODK") for detailed options to configure ruODK.

Here, we’ll grab the OData service URL from the form used in this vignette, plus username and password of a web user of ODK Central with access to that form.

ruODK::ru_setup() will populate the default url, project ID, and form ID which are used by ruODK’s other functions (unless specified otherwise).

library(ruODK)
ruODK::ru_setup(
  svc = "https://sandbox.central.getodk.org/v1/projects/14/forms/build_Flora-Quadrat-0-4_1564384341.svc",
  un = Sys.getenv("ODKC_TEST_UN"),
  pw = Sys.getenv("ODKC_TEST_PW"),
  tz = "Australia/Perth",
  verbose = TRUE
)
#> <ruODK settings>
#>   Default ODK Central Project ID: 14 
#>   Default ODK Central Form ID: build_Flora-Quadrat-0-4_1564384341 
#>   Default ODK Central URL: https://sandbox.central.getodk.org 
#>   Default ODK Central Username: [email protected] 
#>   Default ODK Central Password: run ruODK::get_default_pw() to show 
#>   Default Time Zone: Australia/Perth 
#>   Default ODK Central Version: 0.8 
#>   Default HTTP GET retries: 3 
#>   Verbose messages: TRUE 
#>   Test ODK Central Project ID: 14 
#>   Test ODK Central Form ID: build_Flora-Quadrat-0-2_1558575936 
#>   Test ODK Central Form ID (ZIP tests): build_Spotlighting-0-6_1558333698 
#>   Test ODK Central Form ID (Attachment tests): build_Flora-Quadrat-0-1_1558330379 
#>   Test ODK Central Form ID (Parsing tests): build_Turtle-Track-or-Nest-1-0_1569907666 
#>   Test ODK Central Form ID (WKT tests): build_Locations_1589344221 
#>   Test ODK Central URL: https://sandbox.central.getodk.org 
#>   Test ODK Central Username: [email protected] 
#>   Test ODK Central Password: run ruODK::get_test_pw() to show 
#>   Test ODK Central Version: 0.8
t <- fs::dir_create("media")
#> Warning in data("fq_form_schema_raw"): data set 'fq_form_schema_raw' not found

Projects

List projects. We see the project ID, a name, the number of forms and app users, dates of last form submissions plus project management timestamps (created, updated).

The important bit here is the project ID.

fq_project_list <- ruODK::project_list()
fq_project_list %>% knitr::kable(.)
id name forms app_users created_at updated_at last_submission archived
1 DBCA 10 1 2019-06-05 17:12:44 NA 2020-08-14 16:26:53 FALSE
26 EMB Turtle Monitoring 5 1 2019-09-27 18:41:45 NA 2020-07-27 14:32:44 FALSE
27 Fire Management and Plant Health 8 1 2020-01-08 14:30:27 NA 2020-08-14 15:27:23 FALSE
3 Flora 2 1 2019-06-06 11:24:31 NA 2020-06-29 11:47:22 FALSE
28 Kingston Spotlighting 3 2 2020-03-19 16:46:10 NA 2020-04-09 11:35:35 FALSE
2 Sandbox 12 5 2019-06-06 11:24:15 NA 2019-10-25 12:02:42 FALSE
4 DBCA 0 0 2019-06-27 10:54:30 NA NA TRUE

Inspect a project using its ID. We receive a tibble with exactly one row, all the columns of ruODK::project_list plus a column verbs, which contains all available API actions for a project.

fq_project_detail <- ruODK::project_detail()
# Project details (without verbs)
fq_project_detail %>% dplyr::select(-"verbs") %>%  knitr::kable(.)
id name forms app_users last_submission created_at updated_at archived
14 ruODK package test forms 11 3 2020-08-16T07:28:39.190Z 2019-05-17T03:10:51.069Z 2020-05-13T05:11:22.932Z FALSE
# Available verbs
fq_project_detail$verbs[[1]] %>% unlist(.)
#>  [1] "backup.create"            "backup.terminate"        
#>  [3] "config.read"              "field_key.create"        
#>  [5] "field_key.delete"         "field_key.list"          
#>  [7] "form.create"              "form.delete"             
#>  [9] "form.list"                "form.read"               
#> [11] "form.update"              "project.create"          
#> [13] "project.delete"           "project.list"            
#> [15] "project.read"             "project.update"          
#> [17] "session.end"              "submission.create"       
#> [19] "submission.read"          "submission.list"         
#> [21] "user.create"              "user.list"               
#> [23] "user.password.invalidate" "user.read"               
#> [25] "user.update"              "submission.update"       
#> [27] "role.create"              "role.update"             
#> [29] "role.delete"              "assignment.list"         
#> [31] "assignment.create"        "assignment.delete"       
#> [33] "user.delete"              "audit.read"              
#> [35] "public_link.create"       "public_link.list"        
#> [37] "public_link.read"         "public_link.update"      
#> [39] "public_link.delete"

Nothing apart from the verbs is new compared to the data returned by ruODK::project_list.

To learn more about the functionality behind the verbs, refer to the interactive ODK Central API documentation.

To retrieve data from ODK Central, the functions shown below will suffice.

Forms

List forms for a project

To download form submissions, we need to know project ID and form ID.

There are several ways of retrieving the form ID:

  • Browsing forms in the ODK Central’s project overviews,
  • Stealing the form ID from the OData service endpoint URL as shown on ODK Central’s form submission page,
  • Listing form metadata for a given project ID with ruODK::form_list().
fq_form_list <- ruODK::form_list()
fq_form_list %>% knitr::kable(.)
name fid version state submissions created_at created_by_id created_by updated_at last_submission hash
Flora Quadrat 0.1 build_Flora-Quadrat-0-1_1558330379 closing 1 2019-05-20 13:33:15 57 2020-08-13 10:47:40 2019-05-20 13:44:20 4f0036619468ef05b572631b04b94f06
Flora Quadrat 0.2 build_Flora-Quadrat-0-2_1558575936 open 2 2019-05-23 09:46:08 57 2020-08-13 10:47:41 2019-05-23 11:12:16 14e269a2374132392c275117efbe67b6
Flora Quadrat 0.3 build_Flora-Quadrat-0-3_1559119570 open 1 2019-05-29 16:48:15 57 2020-08-13 10:47:41 2019-05-29 16:55:59 d5a80cefb1895eefcd0cb86a12d8acb4
Flora Quadrat 0.4 build_Flora-Quadrat-0-4_1564384341 open 2 2019-08-19 15:58:28 57 2020-08-13 10:47:38 2019-09-18 16:51:07 1bb959d541ac6990e3f74893e38c855b
Locations build_Locations_1589344221 open 2 2020-05-13 12:32:47 57 2020-08-13 10:47:37 2020-08-16 15:28:39 24c50c683a07885b8ea3655adb31650c
Locations build_Locations_no_submissions NA open 0 2020-08-16 15:00:55 57 NA NA NA
Spotlighting 0.5 build_Spotlighting-0-5_1558320001 closing 1 2019-05-20 10:44:47 57 2020-08-13 10:47:40 2019-05-20 10:58:09 3775dcdface98ba3a426739c494123f6
Spotlighting 0.6 build_Spotlighting-0-6_1558333698 open 18 2019-05-20 14:30:21 57 2020-08-13 10:47:40 2019-10-09 09:15:23 456daaa9a4f96670e6eef3cf4a7dd0db
Spotlighting Survey End 0.3 build_Spotlighting-Survey-End-0-3_1558320208 open 3 2019-05-20 10:44:38 57 2020-08-13 10:47:40 2019-10-09 10:43:40 5fdfac8e773834b1267f7ca7e1c9a428
Spotlighting Survey Start 0.3 build_Spotlighting-Survey-Start-0-3_1558320795 open 9 2019-05-20 10:53:50 57 2020-08-13 10:47:41 2019-10-09 09:15:25 f548a064cca13bca746f3c0b1a8b5a32
Turtle Track or Nest 1.0 build_Turtle-Track-or-Nest-1-0_1569907666 open 2 2019-12-03 15:16:51 57 2020-08-13 10:47:41 2019-12-03 15:26:14 0b3b5e07e3f24f3a36afc92296693ec4

Further to the metadata shown here, a column xml contains the entire XForms definition (originally XML) as nested list.

If the original XML is needed rather than the R equivalent (nested list), we can use ruODK::form_xml with parameter parse=FALSE:

fq_form_xml <- ruODK::form_xml(parse=FALSE)
fq_form_xml
#> $node
#> <pointer: (nil)>
#> 
#> $doc
#> <pointer: (nil)>
#> 
#> attr(,"class")
#> [1] "xml_document" "xml_node"

Inspect form schema

The form_schema represents all form fields of the XForms definition.

See the ODK Central API docs and the examples of ??ruODK::form_schema() for more detail.

fq_form_schema <- ruODK::form_schema()
fq_form_schema %>% knitr::kable(.)
path name type binary ruodk_name
/meta meta structure NA meta
/meta/instanceID instanceID string NA meta_instanceID
/encounter_start_datetime encounter_start_datetime dateTime NA encounter_start_datetime
/reporter reporter string NA reporter
/device_id device_id string NA device_id
/location location structure NA location
/location/area_name area_name string NA location_area_name
/location/quadrat_photo quadrat_photo binary TRUE location_quadrat_photo
/location/corner1 corner1 geopoint NA location_corner1
/habitat habitat structure NA habitat
/habitat/morphological_type morphological_type select1 NA habitat_morphological_type
/habitat/morphological_type_photo morphological_type_photo binary TRUE habitat_morphological_type_photo
/vegetation_stratum vegetation_stratum repeat NA vegetation_stratum
/vegetation_stratum/nvis_level3_broad_floristic_group nvis_level3_broad_floristic_group select1 NA vegetation_stratum_nvis_level3_broad_floristic_group
/vegetation_stratum/max_height_m max_height_m decimal NA vegetation_stratum_max_height_m
/vegetation_stratum/foliage_cover foliage_cover select1 NA vegetation_stratum_foliage_cover
/vegetation_stratum/dominant_species_1 dominant_species_1 string NA vegetation_stratum_dominant_species_1
/vegetation_stratum/dominant_species_2 dominant_species_2 string NA vegetation_stratum_dominant_species_2
/vegetation_stratum/dominant_species_3 dominant_species_3 string NA vegetation_stratum_dominant_species_3
/vegetation_stratum/dominant_species_4 dominant_species_4 string NA vegetation_stratum_dominant_species_4
/perimeter perimeter structure NA perimeter
/perimeter/corner2 corner2 geopoint NA perimeter_corner2
/perimeter/corner3 corner3 geopoint NA perimeter_corner3
/perimeter/corner4 corner4 geopoint NA perimeter_corner4
/perimeter/mudmap_photo mudmap_photo binary TRUE perimeter_mudmap_photo
/taxon_encounter taxon_encounter repeat NA taxon_encounter
/taxon_encounter/field_name field_name string NA taxon_encounter_field_name
/taxon_encounter/photo_in_situ photo_in_situ binary TRUE taxon_encounter_photo_in_situ
/taxon_encounter/taxon_encounter_location taxon_encounter_location geopoint NA taxon_encounter_taxon_encounter_location
/taxon_encounter/life_form life_form select1 NA taxon_encounter_life_form
/taxon_encounter/voucher_specimen_barcode voucher_specimen_barcode barcode NA taxon_encounter_voucher_specimen_barcode
/taxon_encounter/voucher_specimen_label voucher_specimen_label string NA taxon_encounter_voucher_specimen_label
/encounter_end_datetime encounter_end_datetime dateTime NA encounter_end_datetime

Show details of one form

The details of a form are exactly the same as the output of ruODK::form_list().

fq_form_detail <- ruODK::form_detail()
fq_form_detail %>% knitr::kable(.)
name fid version state submissions created_at created_by_id created_by updated_at last_submission hash
Flora Quadrat 0.4 build_Flora-Quadrat-0-4_1564384341 open 2 2019-08-19T07:58:28.212Z 57 2020-08-13T02:47:38.451Z 2019-09-18T08:51:07.482Z 1bb959d541ac6990e3f74893e38c855b

Submissions

We are getting closer to the actual data! This section shows two of the options for data access: dump all submissions, or extract a subset.

Get all submissions for one form

Smaller datasets lend themselves to be exported in one go. ODK Central offers one giant zip file containing all submissions, any repeating groups, and any attachments both on the form submission page, and as API endpoint which is provided as ruODK::submission_export().

The default behaviour of ruODK::submission_export() is to write the zip file to the project root (here::here()), and to overwrite existing previous downloads. See ?ruODK::submission_export() for alternative download and retention options.

In the following chuck, we illustrate common tasks:

  • Download the zip file.
  • Unpack the zip file.
  • Join repeating form group data data_taxon to main data data_quadrat to annotate data_taxon with data from data_quadrat.
  • Sanitise the column names.
  • Prepend all attachment filenames (e.g. data_quadrat$location_quadrat_photo, data_taxon$photo_in_situ) with media/.
# Predict filenames (with knowledge of form)
fid <- ruODK::get_test_fid()
fid_csv <- fs::path(t, glue::glue("{fid}.csv"))
fid_csv_veg <- fs::path(t, glue::glue("{fid}-vegetation_stratum.csv"))
fid_csv_tae <- fs::path(t, glue::glue("{fid}-taxon_encounter.csv"))

# Download the zip file
se <- ruODK::submission_export(local_dir = t, overwrite = FALSE, verbose = TRUE)

# Unpack the zip file
f <- unzip(se, exdir = t)
fs::dir_ls(t)

# Prepend attachments with media/ to turn into relative file paths
fq_zip_data <- fid_csv %>% 
  readr::read_csv(na = c("", "NA", "na")) %>% # form uses "na" for NA
  janitor::clean_names(.) %>% 
  dplyr::mutate(id = meta_instance_id) %>% 
  ruODK::handle_ru_datetimes(fq_form_schema) %>% 
  ruODK::handle_ru_geopoints(fq_form_schema) %>% 
  ruODK::attachment_link(fq_form_schema)

fq_zip_strata <- fid_csv_veg %>% 
  readr::read_csv(na = c("", "NA", "na")) %>%
  janitor::clean_names(.) %>% 
  dplyr::mutate(id = parent_key) %>% 
  # ruODK::handle_ru_datetimes(fq_form_schema) parent_key%>% # no dates
  # ruODK::handle_ru_geopoints(fq_form_schema) %>%  # no geopoints
  # ruODK::ruODK::attachment_link(fq_form_schema) %>% # no att.
  dplyr::left_join(fq_zip_data, by = c("parent_key" = "meta_instance_id"))

fq_zip_taxa <- fid_csv_tae %>%
  readr::read_csv(na = c("", "NA", "na")) %>%
  janitor::clean_names(.) %>% 
  dplyr::mutate(id = parent_key) %>% 
  # ruODK::handle_ru_datetimes(fq_form_schema) %>% 
  # ruODK::handle_ru_geopoints(fq_form_schema) %>% 
  # ruODK::ruODK::attachment_link(fq_form_schema) %>%
  dplyr::left_join(fq_zip_data, by = c("parent_key" = "meta_instance_id"))
head(fq_zip_data)
#> # A tibble: 2 x 34
#>   submission_date     meta_instance_id encounter_start_da… reporter device_id
#>   <dttm>              <chr>            <dttm>              <chr>    <chr>    
#> 1 2019-09-18 08:51:07 uuid:d5e78a78-3… 2019-09-18 16:12:21 Florian… f73d2e12…
#> 2 2019-09-18 06:20:25 uuid:529cb189-8… 2019-09-18 14:08:43 Florian… f73d2e12…
#> # … with 29 more variables: location_area_name <chr>,
#> #   location_quadrat_photo <chr>, location_corner1_latitude <dbl>,
#> #   location_corner1_longitude <dbl>, location_corner1_altitude <dbl>,
#> #   location_corner1_accuracy <dbl>, habitat_morphological_type <chr>,
#> #   habitat_morphological_type_photo <chr>, perimeter_corner2_latitude <dbl>,
#> #   perimeter_corner2_longitude <dbl>, perimeter_corner2_altitude <dbl>,
#> #   perimeter_corner2_accuracy <dbl>, perimeter_corner3_latitude <dbl>,
#> #   perimeter_corner3_longitude <dbl>, perimeter_corner3_altitude <dbl>,
#> #   perimeter_corner3_accuracy <dbl>, perimeter_corner4_latitude <dbl>,
#> #   perimeter_corner4_longitude <dbl>, perimeter_corner4_altitude <dbl>,
#> #   perimeter_corner4_accuracy <dbl>, perimeter_mudmap_photo <chr>,
#> #   encounter_end_datetime <dttm>, key <chr>, submitter_id <dbl>,
#> #   submitter_name <chr>, attachments_present <dbl>,
#> #   attachments_expected <dbl>, status <chr>, id <chr>
head(fq_zip_strata)
#> # A tibble: 6 x 43
#>   nvis_level3_bro… max_height_m foliage_cover dominant_specie… dominant_specie…
#>   <chr>                   <dbl> <chr>         <chr>            <chr>           
#> 1 w1.0_trees_with…         3    10-5          Euc big fruit    <NA>            
#> 2 w3.0_shrub               2    30-10         Blue bush        Xmas shrub      
#> 3 g4.0_sedge              NA    5-0           Tall spiky       <NA>            
#> 4 w3.0_shrub               0.5  30-10         Erem green frost Banks is candles
#> 5 m1.0_bryophyte           0.05 5-0           Moss town        <NA>            
#> 6 w3.0_shrub               3    5-0           Plant name 1     Plant name 2    
#> # … with 38 more variables: dominant_species_3 <chr>, dominant_species_4 <chr>,
#> #   parent_key <chr>, key.x <chr>, id.x <chr>, submission_date <dttm>,
#> #   encounter_start_datetime <dttm>, reporter <chr>, device_id <chr>,
#> #   location_area_name <chr>, location_quadrat_photo <chr>,
#> #   location_corner1_latitude <dbl>, location_corner1_longitude <dbl>,
#> #   location_corner1_altitude <dbl>, location_corner1_accuracy <dbl>,
#> #   habitat_morphological_type <chr>, habitat_morphological_type_photo <chr>,
#> #   perimeter_corner2_latitude <dbl>, perimeter_corner2_longitude <dbl>,
#> #   perimeter_corner2_altitude <dbl>, perimeter_corner2_accuracy <dbl>,
#> #   perimeter_corner3_latitude <dbl>, perimeter_corner3_longitude <dbl>,
#> #   perimeter_corner3_altitude <dbl>, perimeter_corner3_accuracy <dbl>,
#> #   perimeter_corner4_latitude <dbl>, perimeter_corner4_longitude <dbl>,
#> #   perimeter_corner4_altitude <dbl>, perimeter_corner4_accuracy <dbl>,
#> #   perimeter_mudmap_photo <chr>, encounter_end_datetime <dttm>, key.y <chr>,
#> #   submitter_id <dbl>, submitter_name <chr>, attachments_present <dbl>,
#> #   attachments_expected <dbl>, status <chr>, id.y <chr>
head(fq_zip_taxa)
#> # A tibble: 6 x 45
#>   field_name photo_in_situ taxon_encounter… taxon_encounter… taxon_encounter…
#>   <chr>      <chr>                    <dbl>            <dbl>            <dbl>
#> 1 Conostyli… 156879538263…            -32.0             116.          -25.6  
#> 2 Blue bush  156879557989…            -32.0             116.          -14.5  
#> 3 Pileanthu… 156878722091…            -32.0             116.          -64.9  
#> 4 Small pin… 156878733368…            -32.0             116.          -44.9  
#> 5 Small red… 156878738083…            -32.0             116.           -0.859
#> 6 Funky gra… 156878745666…            -32.0             116.           -9.49 
#> # … with 40 more variables: taxon_encounter_location_accuracy <dbl>,
#> #   life_form <chr>, voucher_specimen_barcode <chr>,
#> #   voucher_specimen_label <chr>, parent_key <chr>, key.x <chr>, id.x <chr>,
#> #   submission_date <dttm>, encounter_start_datetime <dttm>, reporter <chr>,
#> #   device_id <chr>, location_area_name <chr>, location_quadrat_photo <chr>,
#> #   location_corner1_latitude <dbl>, location_corner1_longitude <dbl>,
#> #   location_corner1_altitude <dbl>, location_corner1_accuracy <dbl>,
#> #   habitat_morphological_type <chr>, habitat_morphological_type_photo <chr>,
#> #   perimeter_corner2_latitude <dbl>, perimeter_corner2_longitude <dbl>,
#> #   perimeter_corner2_altitude <dbl>, perimeter_corner2_accuracy <dbl>,
#> #   perimeter_corner3_latitude <dbl>, perimeter_corner3_longitude <dbl>,
#> #   perimeter_corner3_altitude <dbl>, perimeter_corner3_accuracy <dbl>,
#> #   perimeter_corner4_latitude <dbl>, perimeter_corner4_longitude <dbl>,
#> #   perimeter_corner4_altitude <dbl>, perimeter_corner4_accuracy <dbl>,
#> #   perimeter_mudmap_photo <chr>, encounter_end_datetime <dttm>, key.y <chr>,
#> #   submitter_id <dbl>, submitter_name <chr>, attachments_present <dbl>,
#> #   attachments_expected <dbl>, status <chr>, id.y <chr>
# Further: create map with popups, see vignette "odata"

List submissions for one form

Not always is it appropriate to download all submissions and all attachments at once.

If forms feed into downstream data warehouses, the typical ETL workflow is to

  • List all submissions from ODK Central
  • Select the subset of new submissions to download, e.g.
    • Submissions younger than the oldest submission date in the data warehouse.
    • Submissions whose instance_id is not already present in the data warehouse.
  • Download only the selected submissions.
  • Download attachments of only the selected submissions.
fq_submission_list <- ruODK::submission_list()
fq_submission_list %>% knitr::kable(.)
instance_id submitter_id device_id created_at updated_at
uuid:d5e78a78-34db-483d-978f-d9c9a3bc7b69 241 android_id:f73d2e1221ceaa06 2019-09-18 16:51:07 NA
uuid:529cb189-8bb2-4cf1-9041-dcde716efb4f 241 android_id:f73d2e1221ceaa06 2019-09-18 14:20:25 NA

The list of submissions critically contains each submission’s unique ID in instance_id. If the submissions shall be downloaded and uploaded into another data warehouse, the instance_id can be used to determine whether a record already exists in the downstream warehouse or not. This workflow is preferable where the majority of submissions is already imported into another downstream data warehouse, and we only want to add new submissions, as in submissions which are not already imported into the data warehouse.

Furthermore, the instance_ids can now be used to retrieve the actual submissions.

Get submission data

In order to import each submission, we need to retrieve the data by instance_id.

# One submission
fq_one_submission <- ruODK::get_one_submission(fq_submission_list$instance_id[[1]])

# Multiple submissions
fq_submissions <- ruODK::submission_get(fq_submission_list$instance_id)

Parse submissions

The data in sub is one row of the bulk downloaded submissions in data_quadrat. The data in submissions represents all (or let’s pretend, the selected) submissions in data_quadrat. The field xml contains the actual submission data including repeating groups.

The structure is different to the output of ruODK::odata_submission_get, therefore ruODK::odata_submission_rectangle does not work for those, as here we might have repeating groups included in a submission.

This structure could be used for upload into data warehouses accepting nested data as e.g. JSON.

listviewer::jsonedit(fq_submissions, mode = "code")

Outlook

The approach shown here yields nested and stand-alone records, which is useful if the subsequent use requires records in nested JSON or XML format. Complex forms with repeating sub-groups will result in highly nested lists, whose structure heavily depends on the completeness of the submissions.

The other approach shown in vignette("odata-api", package="ruODK") yields rectangled data in several normalised tables, which is useful for analysis and visualisation.