Scope
This vignette provides a guided walk-through of the “getting data out” functions of the RESTful API endpoints which list and view details.
ruODK
users would mix and match parts of the
demonstrated workflows to build their own data pipelines, e.g.:
- to build a quick analysis from all data, freshly downloaded from a smaller project, or
- to build an interactive ETL pipeline to selectively download only new submissions for further processing and upload into downstream data warehouses.
A typical and more stream-lined workflow is provided in the RMarkdown
template “ODK Central via OData” which is supplied by
ruODK
.
Three ways to happiness
ODK Central offers no less than three different ways to access data:
- viewing ODK Central data in MS PowerBI, MS Excel, Tableau, or
ruODK
through the OData service endpoints, or - downloading all submissions including attachments as one (possibly
gigantic) zip archive either through the “Export Submissions” button in
the ODK Central form submissions page or through
ruODK
, or - viewing ODK Central data through
ruODK
’s RESTful API functions.
While the vignette("odata", package="ruODK")
(online here)
illustrates the first option, this vignette demonstrates the remaining
two.
Not implemented (yet) are the “managing ODK Central” functions which create, update, and delete projects, forms, users, roles, and permissions. We haven’t yet found a strong use case to automate those functions - ODK Central (driven by humans) does those jobs beautifully on an expected scale.
Setup ruODK
See vignette("Setup", package = "ruODK")
for detailed options to configure ruODK
.
Here, we’ll grab the OData service URL from the form used in this vignette, plus username and password of a web user of ODK Central with access to that form.
ruODK::ru_setup()
will populate the default url, project
ID, and form ID which are used by ruODK
’s other functions
(unless specified otherwise).
library(ruODK)
# ruODK::ru_setup(
# svc = "Form OData Service URL",
# un = Sys.getenv("ODKC_TEST_UN"),
# pw = Sys.getenv("ODKC_TEST_PW"),
# tz = "Australia/Perth",
# verbose = TRUE
# )
t <- fs::dir_create("media")
Projects
List projects. We see the project ID, a name, the number of forms and app users, dates of last form submissions plus project management timestamps (created, updated).
The important bit here is the project ID.
fq_project_list <- ruODK::project_list()
id | name | description | archived | key_id | created_at | updated_at | deleted_at | verbs | forms | app_users | datasets | last_submission | last_entity |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | ruODK package tests | Forms and submissions used for ruODK package tests. | FALSE | NA | 2023-08-04T01:47:13.622Z | 2024-03-09T05:08:58.696Z | NA | analytics.read , assignment.create , assignment.delete , assignment.list , audit.read , backup.run , config.read , config.set , dataset.create , dataset.list , dataset.read , dataset.update , entity.create , entity.delete , entity.list , entity.read , entity.update , field_key.create , field_key.delete , field_key.list , form.create , form.delete , form.list , form.read , form.restore , form.update , project.create , project.delete , project.read , project.update , public_link.create , public_link.delete , public_link.list , public_link.read , public_link.update , role.create , role.delete , role.update , session.end , submission.create , submission.delete , submission.list , submission.read , submission.restore , submission.update , user.create , user.delete , user.list , user.password.invalidate, user.read , user.update | 15 | 2 | 1 | 2024-03-09T05:11:55.753Z | 2024-10-11T06:01:11.280Z |
2 | ruODK package tests encrypted | NA | FALSE | 1 | 2023-08-04T01:47:41.258Z | 2024-03-04T07:04:18.715Z | NA | analytics.read , assignment.create , assignment.delete , assignment.list , audit.read , backup.run , config.read , config.set , dataset.create , dataset.list , dataset.read , dataset.update , entity.create , entity.delete , entity.list , entity.read , entity.update , field_key.create , field_key.delete , field_key.list , form.create , form.delete , form.list , form.read , form.restore , form.update , project.create , project.delete , project.read , project.update , public_link.create , public_link.delete , public_link.list , public_link.read , public_link.update , role.create , role.delete , role.update , session.end , submission.create , submission.delete , submission.list , submission.read , submission.restore , submission.update , user.create , user.delete , user.list , user.password.invalidate, user.read , user.update | 1 | 1 | 0 | 2024-03-04T07:06:16.788Z | NA |
3 | Sandbox | NA | FALSE | NA | 2023-08-04T01:47:50.572Z | NA | NA | analytics.read , assignment.create , assignment.delete , assignment.list , audit.read , backup.run , config.read , config.set , dataset.create , dataset.list , dataset.read , dataset.update , entity.create , entity.delete , entity.list , entity.read , entity.update , field_key.create , field_key.delete , field_key.list , form.create , form.delete , form.list , form.read , form.restore , form.update , project.create , project.delete , project.read , project.update , public_link.create , public_link.delete , public_link.list , public_link.read , public_link.update , role.create , role.delete , role.update , session.end , submission.create , submission.delete , submission.list , submission.read , submission.restore , submission.update , user.create , user.delete , user.list , user.password.invalidate, user.read , user.update | 1 | 0 | 0 | NA | NA |
Inspect a project using its ID. We receive a tibble with exactly one
row, all the columns of ruODK::project_list
plus a column
verbs
, which contains all available API actions for a
project.
fq_project_detail <- ruODK::project_detail()
id | name | forms | app_users | last_submission | created_at | updated_at | archived |
---|---|---|---|---|---|---|---|
1 | ruODK package tests | 15 | 2 | 2024-03-09T05:11:55.753Z | 2023-08-04T01:47:13.622Z | 2024-03-09T05:08:58.696Z | FALSE |
# Available verbs
fq_project_detail$verbs[[1]] %>% unlist(.)
#> [1] "config.read" "field_key.create"
#> [3] "field_key.delete" "field_key.list"
#> [5] "form.create" "form.delete"
#> [7] "form.list" "form.read"
#> [9] "form.update" "project.create"
#> [11] "project.delete" "project.read"
#> [13] "project.update" "session.end"
#> [15] "submission.create" "submission.read"
#> [17] "submission.list" "user.create"
#> [19] "user.list" "user.password.invalidate"
#> [21] "user.read" "user.update"
#> [23] "role.create" "role.update"
#> [25] "role.delete" "assignment.list"
#> [27] "assignment.create" "assignment.delete"
#> [29] "user.delete" "audit.read"
#> [31] "public_link.create" "public_link.list"
#> [33] "public_link.read" "public_link.update"
#> [35] "public_link.delete" "backup.run"
#> [37] "config.set" "analytics.read"
#> [39] "form.restore" "dataset.list"
#> [41] "entity.list" "dataset.read"
#> [43] "entity.read" "entity.create"
#> [45] "entity.update" "dataset.update"
#> [47] "entity.delete" "submission.update"
#> [49] "dataset.create" "submission.delete"
#> [51] "submission.restore"
Nothing apart from the verbs is new compared to the data returned by
ruODK::project_list
.
To learn more about the functionality behind the verbs, refer to the interactive ODK Central API documentation.
To retrieve data from ODK Central, the functions shown below will suffice.
Forms
List forms for a project
To download form submissions, we need to know project ID and form ID.
There are several ways of retrieving the form ID:
- Browsing forms in the ODK Central’s project overviews,
- Stealing the form ID from the OData service endpoint URL as shown on ODK Central’s form submission page,
- Listing form metadata for a given project ID with
ruODK::form_list()
.
fq_form_list <- ruODK::form_list()
project_id | xml_form_id | state | enketo_id | enketo_once_id | created_at | updated_at | key_id | version | hash | sha | sha256 | draft_token | published_at | name | submissions | entity_related | review_states_received | review_states_has_issues | review_states_edited | last_submission | excel_content_type | public_links | created_by_id | created_by_type | created_by_display_name | created_by_created_at | created_by_updated_at | created_by_deleted_at | fid |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | address_problem | open | q4bIaXNFWPGBfmhVVmx3rmYm9eHYpIH | 1d4c5916d82c5ee3376b7d2c0ce84ec73f967b18b2b7142e76ebe327eec23e1c | 2024-03-09 13:08:37 | 2024-03-09 13:08:58 | NA | 20240308210648 | 14b7f91529c67481d25bdde48f4cb073 | 2ec127bc81186b8693c7ef8c9ff92a3a6084195d | 1e4beccebcfa37410d6086568402a95a176afa12ccbccfadeadf3f3ef0484edc | NA | 2024-03-09 13:08:41 | Address a problem | 1 | TRUE | 1 | 0 | 0 | 2024-03-09T05:11:55.753Z | application/vnd.openxmlformats-officedocument.spreadsheetml.sheet | 0 | 6 | user | Florian Mayer | 2023-06-30 04:23:51 | 2023-08-10 09:41:29 | NA | address_problem |
1 | burn_grading_forest | open | WKA7p8XRttHcS89tVdwuIFHNlAWJHfS | 04f0c916ad25ac1aeaa33316520f7aae2099bbdbd862bd77b1845b1f807e2a26 | 2023-08-09 11:14:17 | 2024-03-09 13:08:58 | NA | 2023.1.1 | 5be80a0e66f1238ec7ac9a21237a8867 | c063c0f32f92f4020fcebee694cdde8baf91dfc6 | 1e6ed61493b88328a7458d720d87688ccd6034e7601986fe3ccdb1c06ec72e50 | NA | 2023-08-09 11:26:31 | Burn Grading Forest | 1 | FALSE | 1 | 0 | 0 | 2023-08-09T05:09:55.750Z | application/vnd.openxmlformats-officedocument.spreadsheetml.sheet | 0 | 6 | user | Florian Mayer | 2023-06-30 04:23:51 | 2023-08-10 09:41:29 | NA | burn_grading_forest |
1 | burn_grading_heath | open | 7WX4lTXqmeet7CHz8jjwLOc0EEmISkl | 2123c75dd4250401fcfee5abb7ddd24ba92cb5c00d84bcda1304a3454df0ba74 | 2023-08-09 11:17:02 | 2024-03-09 13:08:58 | NA | 2023.1.0 | 4f9a8b2d9b3eb1243f158d92567df956 | ee591e640b97b2114a83ae20d90c40f3982f3091 | 8939811d509f09989be67d6b9e2988cc80ffff44b31727e269cce05d71699b23 | NA | 2023-08-09 11:23:56 | Burn Grading Heath | 1 | FALSE | 1 | 0 | 0 | 2023-08-09T05:13:06.566Z | application/vnd.openxmlformats-officedocument.spreadsheetml.sheet | 0 | 6 | user | Florian Mayer | 2023-06-30 04:23:51 | 2023-08-10 09:41:29 | NA | burn_grading_heath |
1 | burn_grading_verify | open | iQyzGfu0OspiuOMTlzM4HCbqkLWQRx8 | 20acaacf05e7275ef74ba9ef52d14743c6754c03bc090a2cbf5aace293987a55 | 2023-08-09 11:50:37 | 2024-03-09 13:08:58 | NA | 2023.1.3 | f3879cdd1087ea3d950dabb3bd91d96f | b6c0d1ba1043de79fac80d92f563c403e94adffc | 18d3e9210590979a7bfbe34f528410432683cec1b59618fcd69c8f0b50575410 | NA | 2023-08-09 15:34:50 | Burn Grading Verify | 1 | FALSE | 0 | 0 | 0 | 2023-08-09T05:14:09.891Z | application/vnd.openxmlformats-officedocument.spreadsheetml.sheet | 0 | 6 | user | Florian Mayer | 2023-06-30 04:23:51 | 2023-08-10 09:41:29 | NA | burn_grading_verify |
1 | Flora-Quadrat-04 | open | t2fdSUXiyPboismgJValMrie7hal5YF | 2609a277d7f47e53c69424514de96ca298d25586e49c990b299cbc189478e7ec | 2023-12-03 15:02:20 | 2024-03-09 13:08:58 | NA | 434ff9e1e33fc8bb35148c0cc6979708 | 206cf55b28adcd82db91bc5c11012524654c5bd1 | bb3b86a93ab3f6f5a16a6daa693c19a0c04f3deb68e50cf1bacd7c37efd9024a | NA | 2023-12-03 15:02:27 | Flora Quadrat 0.4 | 1 | FALSE | 0 | 0 | 0 | 2023-12-04T08:30:26.154Z | NA | 0 | 6 | user | Florian Mayer | 2023-06-30 04:23:51 | 2023-08-10 09:41:29 | NA | Flora-Quadrat-04 | |
1 | Flora-Quadrat-04-gap | open | 7UzBNe95jXhdfqqG2JTmEG7mhQHoKQ1 | 040a5a743b5c1f8e78d4021df766b7228a5f481fbf7a959944d52d36299fde2b | 2023-12-03 15:09:55 | 2024-03-09 13:08:58 | NA | 241c4759564ea039b4404b6892025500 | 61f5d1694e8866f6468ba6a2ff1c286226a66898 | 9504ab3daa2bf99270a7d4046c636a15c1972cdd93df388e936ad78411d619d9 | NA | 2023-12-03 15:10:05 | Flora Quadrat 0.4 (gap) | 1 | FALSE | 1 | 0 | 0 | 2024-03-04T07:08:52.813Z | NA | 0 | 6 | user | Florian Mayer | 2023-06-30 04:23:51 | 2023-08-10 09:41:29 | NA | Flora-Quadrat-04-gap | |
1 | Flora-Quadrat-04-att | open | 0cuZcMAa1kemse3CFaws0ePH7hR0UPj | abf4f5edfb84342553f98177367167669c77b0155dd951ee4ddcb62e35cb917d | 2023-12-03 15:09:43 | 2024-03-09 13:08:58 | NA | 2cb6a4b3d7f05ab055f3da89d0958b14 | c130067ce8f7940acf807e74a433fa2ec3545995 | 7b251e39369d48401ff4205ea0faf6b7bd1e04e5b8acb9f7b638b51459a22b94 | NA | 2023-12-03 15:09:46 | Flora Quadrat 0.4 (one att) | 1 | FALSE | 1 | 0 | 0 | 2023-12-03T07:45:07.801Z | NA | 0 | 6 | user | Florian Mayer | 2023-06-30 04:23:51 | 2023-08-10 09:41:29 | NA | Flora-Quadrat-04-att | |
1 | I8n_label_choices | open | bRT4lNCag0T0ASSJKJ8pHjNTCTFTO4s | dbc21554352efe46948676b230b7c1abf8c5b1fcb25106109ae9dd6288b82fb5 | 2023-12-03 15:03:14 | 2024-03-09 13:08:58 | NA | 14e3449067ec33efd82067a546c6554d | e190b678274da24e1ceb66f1de6d842550c29a5f | 4a7185f3ef411eb8749dbd9b540b811fffdec0572639425f0f7920ac9e9d09d9 | NA | 2023-12-03 15:03:18 | I8n label and choices | 1 | FALSE | 1 | 0 | 0 | 2023-12-04T08:30:56.757Z | NA | 0 | 6 | user | Florian Mayer | 2023-06-30 04:23:51 | 2023-08-10 09:41:29 | NA | I8n_label_choices | |
1 | I8n_label_lng | open | 9sPuYNXpeZMnU7heWvoowMztz5Y93gH | 721e3c286e1b279a8885e3a443eb41335fdf310eed95b69a3424167e0c6ab8a5 | 2023-12-03 15:03:31 | 2024-03-09 13:08:58 | NA | 6932ca6e75078cae86b9eaec119abad0 | 60e1ef208de91be00138f2c911da09dabb34e366 | c57f1a1d5f5f44fc119ceaa1c4db684f8159b3d9783a540e40dce3654299da19 | NA | 2023-12-03 15:03:51 | I8n label lang | 1 | FALSE | 1 | 0 | 0 | 2023-12-04T08:31:29.169Z | NA | 0 | 6 | user | Florian Mayer | 2023-06-30 04:23:51 | 2023-08-10 09:41:29 | NA | I8n_label_lng | |
1 | I8n_lang_choicefilter | open | orpw5UXdq6Jym3aQg7IPwFnw4oRWYNL | 3072a8269c47b18ab49baa79bff29976e7700c0603daf5c2d99590dda62d63af | 2023-12-03 15:06:32 | 2024-03-09 13:08:58 | NA | 02cc779cbc7ff0a136de77b7f7b8135d | fee1eb0b77aa5442885e231d4126f32aeb202040 | d842d8e9deb83bbf3475ef5e157286ca311f6cdf2cafebee04638d860489e39d | NA | 2023-12-03 15:06:36 | I8n label lang with choice filter | 1 | FALSE | 1 | 0 | 0 | 2023-12-04T08:32:30.641Z | application/vnd.openxmlformats-officedocument.spreadsheetml.sheet | 0 | 6 | user | Florian Mayer | 2023-06-30 04:23:51 | 2023-08-10 09:41:29 | NA | I8n_lang_choicefilter | |
1 | I8n_no_lang_choicefilter | open | mByFnZbERATLDRmmksDYBqD1Rqz8Eab | 1f73b6265b220ed01499c97b0e21eed07ed30bffa7a206b54700887aeacb669f | 2023-12-03 15:09:17 | 2024-03-09 13:08:58 | NA | fc5d8a6c19662445b890a1165bde6810 | a58e58f13df3a56dfbc538b5210fa3f3cfe39bbe | 0e647b8099d3bb84aa548b24a23b2b7f0383ff6a350bdeebf7adbeb8de61bfe7 | NA | 2023-12-03 15:19:38 | I8n label no lang with choice filter | 1 | FALSE | 1 | 0 | 0 | 2023-12-04T08:35:23.748Z | application/vnd.openxmlformats-officedocument.spreadsheetml.sheet | 0 | 6 | user | Florian Mayer | 2023-06-30 04:23:51 | 2023-08-10 09:41:29 | NA | I8n_no_lang_choicefilter | |
1 | Locations | open | NT1E6p5HlknfdSW2VttaKgjb8DctRqy | bdf9c1c0dc5f1a607ae7d10ac4a4182c996030a2ac286b45c317cad6b23d8acd | 2023-12-03 15:02:45 | 2024-03-09 13:08:58 | NA | 7365cf1fdad8d6c0ae7351bc05ecc2f2 | 784e30dc8daeb89b2b5680723cf520314843cba4 | d368f86c1e33b5682e10d7b9d7753e7edf6a35086bf6688945a281996cd30094 | NA | 2023-12-03 15:02:48 | Locations | 1 | FALSE | 0 | 0 | 0 | 2023-12-04T08:38:52.712Z | NA | 0 | 6 | user | Florian Mayer | 2023-06-30 04:23:51 | 2023-08-10 09:41:29 | NA | Locations | |
1 | Locations_draft | open | itxB8ymLuT8YNq7Dx6tO5GUUpNCwq4H | NA | 2023-12-03 15:10:14 | 2024-03-09 13:08:58 | NA | d1701aad460c192c2fc97eb10241e350 | d7726b5ecf5311e65730e0aa51afa61b93284ce8 | 3c26873d0450f32f5dac86f6a52f5248a2eb97800e78d98adb4c7a2f2d8334bc | itexle5sIMlJkDlfzz3yW8pw2gJ4dwdWbT8F8utpK$g5hA7klBPyiuuoxw4zddqN | NA | Locations (draft) | 0 | FALSE | 0 | 0 | 0 | NA | NA | 0 | 6 | user | Florian Mayer | 2023-06-30 04:23:51 | 2023-08-10 09:41:29 | NA | Locations_draft | |
1 | Locations_no_submissions | open | LWKvSaSk7Qa2FHFmNVf4AAuxqSH9zss | 74e225cac7e93d106d30c319d7ed72c0187651526c7756c54baf5ff7376ca116 | 2023-12-03 15:10:32 | 2024-03-09 13:08:58 | NA | eb7caf4e3994b953999165ddfdb6eb3b | ae23fca247232b723073e5d842944172c77bd110 | 21e7b36f8fca61aae069e4ce7483549a97b915453a37f1436cd32882d23eeb3b | NA | 2023-12-03 15:10:35 | Locations (no submissions) | 0 | FALSE | 0 | 0 | 0 | NA | NA | 0 | 6 | user | Florian Mayer | 2023-06-30 04:23:51 | 2023-08-10 09:41:29 | NA | Locations_no_submissions | |
1 | report_problem | open | vibZw22eNVaGk7dfjqaNXCctVU1bHWi | b06f417d61c823479e16c244ea7a482df577cb385477bfa99a3ca75548619071 | 2024-03-09 13:08:06 | 2024-03-09 13:08:58 | NA | 20240308210713 | e14b4de0dd09c02093b419eeb1a79f99 | 2e8c7029416854188b24d41cd3ac0db9bb8a2a0e | 1acd951ea772ca40d3f5f4d1d45e56050955412472a935c08bd85c25ee3ae125 | NA | 2024-03-09 13:08:17 | Report a problem | 1 | TRUE | 1 | 0 | 0 | 2024-03-09T05:10:43.764Z | application/vnd.openxmlformats-officedocument.spreadsheetml.sheet | 0 | 6 | user | Florian Mayer | 2023-06-30 04:23:51 | 2023-08-10 09:41:29 | NA | report_problem |
Further to the metadata shown here, a column xml
contains the entire XForms definition (originally XML) as nested
list.
If the original XML is needed rather than the R equivalent (nested
list), we can use ruODK::form_xml
with parameter
parse=FALSE
:
fq_form_xml <- ruODK::form_xml(parse = FALSE)
if (require(listviewer)) {
listviewer::jsonedit(fq_form_xml)
} else {
ru_msg_warn("Install package listviewer to browse the form XML.")
}
#> Loading required package: listviewer
Inspect form schema
The form_schema
represents all form fields of the XForms
definition.
See the ODK
Central API docs and the examples of
??ruODK::form_schema()
for more detail.
fq_form_schema <- ruODK::form_schema()
path | name | type | binary | selectMultiple | ruodk_name |
---|---|---|---|---|---|
/meta | meta | structure | NA | NA | meta |
/meta/instanceID | instanceID | string | NA | NA | meta_instance_id |
/encounter_start_datetime | encounter_start_datetime | dateTime | NA | NA | encounter_start_datetime |
/reporter | reporter | string | NA | NA | reporter |
/device_id | device_id | string | NA | NA | device_id |
/location | location | structure | NA | NA | location |
/location/area_name | area_name | string | NA | NA | location_area_name |
/location/quadrat_photo | quadrat_photo | binary | TRUE | NA | location_quadrat_photo |
/location/corner1 | corner1 | geopoint | NA | NA | location_corner1 |
/habitat | habitat | structure | NA | NA | habitat |
/habitat/morphological_type | morphological_type | select1 | NA | NA | habitat_morphological_type |
/habitat/morphological_type_photo | morphological_type_photo | binary | TRUE | NA | habitat_morphological_type_photo |
/vegetation_stratum | vegetation_stratum | repeat | NA | NA | vegetation_stratum |
/vegetation_stratum/nvis_level3_broad_floristic_group | nvis_level3_broad_floristic_group | select1 | NA | NA | vegetation_stratum_nvis_level3_broad_floristic_group |
/vegetation_stratum/max_height_m | max_height_m | decimal | NA | NA | vegetation_stratum_max_height_m |
/vegetation_stratum/foliage_cover | foliage_cover | select1 | NA | NA | vegetation_stratum_foliage_cover |
/vegetation_stratum/dominant_species_1 | dominant_species_1 | string | NA | NA | vegetation_stratum_dominant_species_1 |
/vegetation_stratum/dominant_species_2 | dominant_species_2 | string | NA | NA | vegetation_stratum_dominant_species_2 |
/vegetation_stratum/dominant_species_3 | dominant_species_3 | string | NA | NA | vegetation_stratum_dominant_species_3 |
/vegetation_stratum/dominant_species_4 | dominant_species_4 | string | NA | NA | vegetation_stratum_dominant_species_4 |
/perimeter | perimeter | structure | NA | NA | perimeter |
/perimeter/corner2 | corner2 | geopoint | NA | NA | perimeter_corner2 |
/perimeter/corner3 | corner3 | geopoint | NA | NA | perimeter_corner3 |
/perimeter/corner4 | corner4 | geopoint | NA | NA | perimeter_corner4 |
/perimeter/mudmap_photo | mudmap_photo | binary | TRUE | NA | perimeter_mudmap_photo |
/taxon_encounter | taxon_encounter | repeat | NA | NA | taxon_encounter |
/taxon_encounter/field_name | field_name | string | NA | NA | taxon_encounter_field_name |
/taxon_encounter/photo_in_situ | photo_in_situ | binary | TRUE | NA | taxon_encounter_photo_in_situ |
/taxon_encounter/taxon_encounter_location | taxon_encounter_location | geopoint | NA | NA | taxon_encounter_taxon_encounter_location |
/taxon_encounter/life_form | life_form | select1 | NA | NA | taxon_encounter_life_form |
/taxon_encounter/voucher_specimen_barcode | voucher_specimen_barcode | barcode | NA | NA | taxon_encounter_voucher_specimen_barcode |
/taxon_encounter/voucher_specimen_label | voucher_specimen_label | string | NA | NA | taxon_encounter_voucher_specimen_label |
/encounter_end_datetime | encounter_end_datetime | dateTime | NA | NA | encounter_end_datetime |
Show details of one form
The details of a form are exactly the same as the output of
ruODK::form_list()
.
fq_form_detail <- ruODK::form_detail()
name | fid | version | state | submissions | created_at | created_by_id | created_by | updated_at | published_at | last_submission | hash |
---|---|---|---|---|---|---|---|---|---|---|---|
Flora Quadrat 0.4 | Flora-Quadrat-04 | open | 1 | 2023-12-03T07:02:20.258Z | 6 | Florian Mayer | 2024-03-09T05:08:58.714Z | 2023-12-03T07:02:27.957Z | 2023-12-04T08:30:26.154Z | 434ff9e1e33fc8bb35148c0cc6979708 |
Submissions
We are getting closer to the actual data! This section shows two of the options for data access: dump all submissions, or extract a subset.
Get all submissions for one form
Smaller datasets lend themselves to be exported in one go. ODK
Central offers one giant zip file containing all submissions, any
repeating groups, and any attachments both on the form submission page,
and as API endpoint which is provided as
ruODK::submission_export()
.
The default behaviour of ruODK::submission_export()
is
to write the zip file to the project root (here::here()
),
and to overwrite existing previous downloads. See
?ruODK::submission_export()
for alternative download and
retention options.
In the following chuck, we illustrate common tasks:
- Download the zip file.
- Unpack the zip file.
- Join repeating form group data
data_taxon
to main datadata_quadrat
to annotatedata_taxon
with data fromdata_quadrat
. - Sanitise the column names.
- Prepend all attachment filenames
(e.g.
data_quadrat$location_quadrat_photo
,data_taxon$photo_in_situ
) withmedia/
.
# Predict filenames (with knowledge of form)
fid <- ruODK::get_test_fid()
fid_csv <- fs::path(t, glue::glue("{fid}.csv"))
fid_csv_veg <- fs::path(t, glue::glue("{fid}-vegetation_stratum.csv"))
fid_csv_tae <- fs::path(t, glue::glue("{fid}-taxon_encounter.csv"))
# Download the zip file
se <- ruODK::submission_export(local_dir = t, overwrite = FALSE, verbose = TRUE)
# Unpack the zip file
f <- unzip(se, exdir = t)
fs::dir_ls(t)
# Prepend attachments with media/ to turn into relative file paths
fq_zip_data <- fid_csv %>%
readr::read_csv(na = c("", "NA", "na")) %>% # form uses "na" for NA
janitor::clean_names(.) %>%
dplyr::mutate(id = meta_instance_id) %>%
ruODK::handle_ru_datetimes(fq_form_schema) %>%
ruODK::handle_ru_geopoints(fq_form_schema) %>%
ruODK::attachment_link(fq_form_schema)
fq_zip_strata <- fid_csv_veg %>%
readr::read_csv(na = c("", "NA", "na")) %>%
janitor::clean_names(.) %>%
dplyr::mutate(id = parent_key) %>%
# ruODK::handle_ru_datetimes(fq_form_schema) parent_key%>% # no dates
# ruODK::handle_ru_geopoints(fq_form_schema) %>% # no geopoints
# ruODK::ruODK::attachment_link(fq_form_schema) %>% # no att.
dplyr::left_join(fq_zip_data, by = c("parent_key" = "meta_instance_id"))
fq_zip_taxa <- fid_csv_tae %>%
readr::read_csv(na = c("", "NA", "na")) %>%
janitor::clean_names(.) %>%
dplyr::mutate(id = parent_key) %>%
# ruODK::handle_ru_datetimes(fq_form_schema) %>%
# ruODK::handle_ru_geopoints(fq_form_schema) %>%
# ruODK::ruODK::attachment_link(fq_form_schema) %>%
dplyr::left_join(fq_zip_data, by = c("parent_key" = "meta_instance_id"))
head(fq_zip_data)
#> # A tibble: 1 × 38
#> submission_date meta_instance_id encounter_start_date…¹ reporter device_id
#> <dttm> <chr> <dttm> <lgl> <chr>
#> 1 2023-12-04 08:30:26 uuid:46d3939a-8… 2023-12-04 16:20:19 NA collect:…
#> # ℹ abbreviated name: ¹encounter_start_datetime
#> # ℹ 33 more variables: location_area_name <chr>, location_quadrat_photo <chr>,
#> # location_corner1_latitude <dbl>, location_corner1_longitude <dbl>,
#> # location_corner1_altitude <dbl>, location_corner1_accuracy <dbl>,
#> # habitat_morphological_type <chr>, habitat_morphological_type_photo <chr>,
#> # perimeter_corner2_latitude <dbl>, perimeter_corner2_longitude <dbl>,
#> # perimeter_corner2_altitude <dbl>, perimeter_corner2_accuracy <dbl>, …
head(fq_zip_strata)
#> # A tibble: 2 × 47
#> nvis_level3_broad_floristic_gr…¹ max_height_m foliage_cover dominant_species_1
#> <chr> <dbl> <chr> <chr>
#> 1 w3.0_shrub 1 70-30 Test species 1
#> 2 w1.1_trees_rainforest 5 10-5 Test species 5
#> # ℹ abbreviated name: ¹nvis_level3_broad_floristic_group
#> # ℹ 43 more variables: dominant_species_2 <chr>, dominant_species_3 <chr>,
#> # dominant_species_4 <chr>, parent_key <chr>, key.x <chr>, id.x <chr>,
#> # submission_date <dttm>, encounter_start_datetime <dttm>, reporter <lgl>,
#> # device_id <chr>, location_area_name <chr>, location_quadrat_photo <chr>,
#> # location_corner1_latitude <dbl>, location_corner1_longitude <dbl>,
#> # location_corner1_altitude <dbl>, location_corner1_accuracy <dbl>, …
head(fq_zip_taxa)
#> # A tibble: 2 × 49
#> field_name photo_in_situ taxon_encounter_loca…¹ taxon_encounter_loca…²
#> <chr> <chr> <dbl> <dbl>
#> 1 Test species 1 1701678535420.jpg -31.9 116.
#> 2 Test species 6 1701678592627.jpg -31.9 116.
#> # ℹ abbreviated names: ¹taxon_encounter_location_latitude,
#> # ²taxon_encounter_location_longitude
#> # ℹ 45 more variables: taxon_encounter_location_altitude <dbl>,
#> # taxon_encounter_location_accuracy <dbl>, life_form <chr>,
#> # voucher_specimen_barcode <lgl>, voucher_specimen_label <dbl>,
#> # parent_key <chr>, key.x <chr>, id.x <chr>, submission_date <dttm>,
#> # encounter_start_datetime <dttm>, reporter <lgl>, device_id <chr>, …
# Further: create map with popups, see vignette "odata"
List submissions for one form
Not always is it appropriate to download all submissions and all attachments at once.
If forms feed into downstream data warehouses, the typical ETL workflow is to
- List all submissions from ODK Central
- Select the subset of new submissions to download, e.g.
- Submissions younger than the oldest submission date in the data warehouse.
- Submissions whose
instance_id
is not already present in the data warehouse.
- Download only the selected submissions.
- Download attachments of only the selected submissions.
fq_submission_list <- ruODK::submission_list()
instance_id | submitter_id | device_id | created_at | updated_at | review_state | user_agent | submitter_id_2 | submitter_type | submitter_display_name | submitter_created_at | submitter_updated_at | submitter_deleted_at | current_version |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
uuid:46d3939a-8bc5-4084-9154-d043bc4d3239 | 12 | collect:OHnydk6INUIfDHNl | 2023-12-04 16:30:26 | 2024-03-05 13:29:54 | approved | org.odk.collect.android/v2023.3.1 Dalvik/2.1.0 (Linux; U; Android 10; vivo 1937 Build/QP1A.190711.020) | 12 | field_key | ruodk | 2023-08-09 11:51:12 | NA | NA | 12 , 2023-12-04T08:30:26.154Z , uuid:46d3939a-8bc5-4084-9154-d043bc4d3239 , TRUE , collect:OHnydk6INUIfDHNl , org.odk.collect.android/v2023.3.1 Dalvik/2.1.0 (Linux; U; Android 10; vivo 1937 Build/QP1A.190711.020), 12 , field_key , ruodk , 2023-08-09T03:51:12.864Z |
The list of submissions critically contains each submission’s unique
ID in instance_id
. If the submissions shall be downloaded
and uploaded into another data warehouse, the instance_id
can be used to determine whether a record already exists in the
downstream warehouse or not. This workflow is preferable where the
majority of submissions is already imported into another downstream data
warehouse, and we only want to add new submissions, as in submissions
which are not already imported into the data warehouse.
Furthermore, the instance_id
s can now be used to
retrieve the actual submissions.
Get submission data
In order to import each submission, we need to retrieve the data by
instance_id
.
# One submission
fq_one_submission <- ruODK::get_one_submission(
fq_submission_list$instance_id[[1]]
)
# Multiple submissions
fq_submissions <- ruODK::submission_get(fq_submission_list$instance_id)
Parse submissions
The data in sub
is one row of the bulk downloaded
submissions in data_quadrat
. The data in
submissions
represents all (or let’s pretend, the selected)
submissions in data_quadrat
. The field xml
contains the actual submission data including repeating groups.
The structure is different to the output of
ruODK::odata_submission_get
, therefore
ruODK::odata_submission_rectangle
does not work for those,
as here we might have repeating groups included in a submission.
This structure could be used for upload into data warehouses accepting nested data as e.g. JSON.
if (requireNamespace("listviewer")) {
listviewer::jsonedit(fq_submissions, mode = "code")
} else {
ru_msg_info("Please install package listviewer!")
}
Outlook
The approach shown here yields nested and stand-alone records, which is useful if the subsequent use requires records in nested JSON or XML format. Complex forms with repeating sub-groups will result in highly nested lists, whose structure heavily depends on the completeness of the submissions.
The other approach shown in vignette("odata-api", package="ruODK")
yields rectangled data in several normalised tables, which is useful for
analysis and visualisation.