The rppo package contains just two functions. One to query terms from
the Plant Phenology Ontology (PPO) and another to query the data global
plant phenology data portal (PPO data portal). Following are three
examples which illustrate use of these functions: the first two sections
illustrate the ppo_data
and ppo_terms
functions and the third section illustrates how to use the functions
together.
ppo_terms function
It is frequently useful to look through the list of present and
absent terms contained in the PPO. The ppo_terms
function
returns present terms, absent terms, or both, with columns containing a
termID, label, definition and full URI for each term. Use the termIDs
returned from this function to query terms in the ppo_data
function. The following example returns the present terms into a
“present_terms” data frame and a sample slice from the dataframe.
present_terms <- ppo_terms(present = TRUE)
# print the first five rows, with just the termIDs and labels
print(present_terms[1:5,c("termID","label")])
#> termID label
#> 1 obo:PPO_0002359 abscised cones or seeds present
#> 2 obo:PPO_0002358 abscised fruits or seeds present
#> 3 obo:PPO_0002357 abscised leaves present
#> 4 obo:PPO_0002311 breaking leaf buds present
#> 5 obo:PPO_0002346 cones present
ppo_data function
The ppo_data
function queries the PPO Data Portal,
passing values to the database and extracting matching results. The
results of the ppo_data
function are returned as a list
with five elements: 1) a data frame containing data, 2) a readme string
containing usage information and some statistics about the query itself,
3) a citation string containing information about proper citation, 4) a
number_possible integer indicating the total number of results if a
limit has been specified, and 5) a status code returned from the
service.
The “df” variable below is populated with results from the data element in the results list, with an example slice of data showing the first record.
results <- ppo_data(genus = "Quercus", fromYear = 2013, toYear = 2013, fromDay = 100, toDay = 110, termID = 'obo:PPO_0002313', limit = 10, timeLimit = 2)
df <- results$data
print(df[1:1,])
#> dayOfYear year genus specificEpithet eventRemarks latitude longitude
#> 1 101 2013 Quercus douglasii Leaves 37.98884 -122.1296
#> termID
#> 1 obo:PPO_0002313,obo:PPO_0002312,obo:PPO_0002315,obo:PPO_0002000,obo:PPO_0002014,obo:PPO_0002017,obo:PPO_0002015
#> source eventId
#> 1 USA-NPN urn:phenologicalObservingProcess/4578221
The readme and citation files returned by the list of results can be
accessed by calling the readme and citation elements. Note that the the
file “citation_and_data_use_policies.txt” that is referred to in the
readme file can be accessed using cat(results$citation)
cat(results$readme)
#> The following contains information about your download from the Global Plant
#> Phenology Database. Please refer to the citation_and_data_use_policies.txt
#> file for important information about data usage policies, licensing, and
#> citation protocols for each dataset. This file contains summary information
#> about the query that was run.
#>
#> data file = data.csv
#> date query ran = Mon Jan 31 2022 15:08:45 GMT+0000 (Coordinated Universal Time)
#> query = +genus:Quercus AND +plantStructurePresenceTypes:"http://purl.obolibrary.org/obo/PPO_0002313" AND +year:>=2013 AND +year:<=2013 AND +dayOfYear:>=100 AND +dayOfYear:<=110 AND source:USA-NPN
#> fields returned = dayOfYear,year,genus,specificEpithet,eventRemarks,latitude,longitude,source,eventId
#> user specified limit = 10
#> total results possible = 518
#> total results returned = 10
The results lists also shows the number of possible results in the results set, which is useful if the submitted query had a limit. For example, in the query above, the limit is set to 10 but we want to know how many records were possible if the limit was not set.
cat(results$number_possible)
#> 518
working with terms and data together
Here we will generate a data frame showing the frequency of “present”
and “absent” terms for a particular query. The query is for genus =
“Quercus” and latitude > 47. For each row in the returned data frame
ppo_data
will typically return multiple terms in the termID
field, corresponding to phenological stages as defined by the PPO. For
our example, we will generate a frequency table of the number of times
“present” or “absent” term occur in the entire returned dataset. Note
that the termID field returned by ppo_data
will return
“presence” terms in addition to “present” and “absent” terms, while the
ppo_terms
function only returns “present” and “absent”
terms. Thus, our frequency distribution only counts the number of
“present” and “absent” terms [For an in-depth discussion of the
difference between “presence”, “present”, and “absent”, see https://www.frontiersin.org/articles/10.3389/fpls.2018.00517/full].
Finally, since termIDs are returned as URI identifiers and not easily
readable text, this example maps termIDs to labels. The resulting data
frame shows two columns: 1) a column of term labels, and 2) a frequency
of the number of times this label appeared in the result set.
###############################################################################
# Generate a frequency data frame showing the number of times each termID
# is populated for genus equals "Quercus" above latitude of 47
# Note that all latitude/longitude queries need to be in the format of a
# bounding box
###############################################################################
df <- ppo_data(
genus = "Quercus",
limit="10", timeLimit = 4)
#> sending request for data ...
#> https://www.plantphenology.org/api/v3/download/_search?q=%2Bgenus:Quercus+AND+source:USA-NPN&source=latitude,longitude,year,dayOfYear,termID&limit=10
# return just the termID column
t1 <- df$data[,c('termID')]
# paste each cell into one string
t2<-paste(t1, collapse = ",")
# split strings at ,
t3<-strsplit(t2, ",")
# create a frequency table as a data frame
freqFrame <- as.data.frame(table(t3))
# create a new data frame that we want to populate
resultFrame <- data.frame(
label = character(),
frequency = integer(),
stringsAsFactors = FALSE)
###############################################################################
# Replace termIDs with labels in frequency frame
###############################################################################
# fetch "present" and "absent" terms using `ppo_terms`
termList <- ppo_terms(absent = TRUE, present = TRUE, timeLimit = 2);
#> sending request for terms ...
#> No encoding supplied: defaulting to UTF-8.
# loop all "present"" and "absent" terms
if (!is.null(termList)) {
for (term in 1:nrow(termList)) {
termListTermID<-termList[term,'termID'];
termListLabel<-termList[term,'label'];
# loop all rows that have a frequency generated
for (row in 1:nrow(freqFrame)) {
freqFrameTermID = freqFrame[row,'t3']
freqFrameFrequency = freqFrame[row,'Freq']
# Populate resultFrame with matching "present" or "absent" labels.
# In this step, we will ignore "presence" terms
# found in the frequency frame since the ppo_terms only returns
# "present" and "absent" terms.
if (freqFrameTermID == termListTermID) {
resultFrame[nrow(resultFrame)+1,] <- c(termListLabel,freqFrameFrequency)
}
}
}
} else {
message("termList is null, likely due to a server response issue. Try increasing the timeLimit or try again later. If the problem persists email the authors.")
}
# print results, showing term labels and a frequency count
print(resultFrame)
#> label frequency
#> 1 breaking leaf buds absent 2
#> 2 breaking leaf buds present 8
#> 3 expanding true leaves present 8
#> 4 leaf buds present 8
#> 5 new above-ground shoot-borne shoot systems present 8
#> 6 new shoot system present 8
#> 7 non-dormant leaf buds present 8
#> 8 true leaves present 8
#> 9 unfolding true leaves present 8
#> 10 vascular leaves present 8