Popler Database
Andrew Bibian
3/16/2019
Source:vignettes/popler-database-structure.Rmd
Popler Overview
Popler is a database of population and individual-level data gathered across the Long Term Ecological Research (LTER) stations funded by the U.S. National Science Foundation.
We define population data as time series of the size or density of a population of a taxonomic unit. The size of a population can be quantified as a count, biomass, or cover class. These measures are always repeated temporally, and they are often repeated spatially as well.
We define individual-level data as information on the attributes of the individuals, or a subset thereof, that make up a population. For example, common attributes of individuals are size, age, and sex.
Temporal replication
All temporal information within a dataset is stored as sampling dates at the finest resolution available, down to the day. Since not all datasets have the same temporal resolution, popler stores date information in three separate columns: day, month, and year. In any of these date columns, NULL or -99999 values indicate that the information was not available in the raw data.
Note that we never perform any temporal aggregation of data before storing them in popler.
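Because the day, month, and year columns can each hold a missing-value marker, code that reads them has to decide how much of the date is usable. The following is a minimal sketch of that check, not part of popler itself: the function names `is_missing`, `finest_resolution`, and `to_date` are hypothetical, and it assumes that a database NULL arrives in Python as `None`.

```python
from datetime import date
from typing import Optional

# Sentinel described in the text; NULL is assumed to be read as None.
MISSING = -99999


def is_missing(value: Optional[float]) -> bool:
    """Return True when a date component was not available in the raw data."""
    return value is None or value == MISSING


def finest_resolution(year, month, day) -> str:
    """Report the finest temporal resolution a record supports."""
    if is_missing(year):
        return "none"
    if is_missing(month):
        return "year"
    if is_missing(day):
        return "month"
    return "day"


def to_date(year, month, day) -> Optional[date]:
    """Build a calendar date only when all three components are present."""
    if finest_resolution(year, month, day) != "day":
        return None
    return date(int(year), int(month), int(day))
```

A record with a known year and month but a day of -99999 is reported at "month" resolution, and `to_date` returns `None` for it rather than guessing a day.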
Spatial replication
popler subdivides spatial replicates based on their spatial nestedness, because most sampling designs are spatially nested. For example, a study performed at 5 sites, with 3 transects at each site, and 4 quadrats within each transect, contains 3 levels of nested spatial replication:
- Level 1: the site level,
- Level 2: the transect level,
- Level 3: the quadrat level.
This referencing scheme allows us to standardize and align datasets collected from a variety of different sampling designs and across different data types.
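The nested design in the example above (5 sites, 3 transects per site, 4 quadrats per transect) can be sketched as rows keyed by the three replication levels. This is an illustration only: the labels such as `site_1` are hypothetical, and `count_units` is a helper invented here to count distinct nested units down to a given depth.

```python
from itertools import product

# Hypothetical labels for the design described in the text.
sites = [f"site_{i}" for i in range(1, 6)]          # level 1
transects = [f"transect_{i}" for i in range(1, 4)]  # level 2
quadrats = [f"quadrat_{i}" for i in range(1, 5)]    # level 3

LEVELS = (
    "spatial_replication_level_1",
    "spatial_replication_level_2",
    "spatial_replication_level_3",
)

# One row per finest-level sampling unit, labelled at every level above it.
replicates = [dict(zip(LEVELS, combo)) for combo in product(sites, transects, quadrats)]


def count_units(rows, depth):
    """Count distinct nested units using the first `depth` levels."""
    return len({tuple(row[level] for level in LEVELS[:depth]) for row in rows})
```

Counting to depth 1, 2, and 3 gives 5 sites, 15 transects, and 60 quadrats, which shows why a transect label is only unique in combination with its site label.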
Metadata on the extent of each spatial sampling unit is recorded when available, together with its units (e.g. km², m, cm³). Note that higher-numbered levels of spatial replication correspond to smaller sampling extents.
As with temporal replication, we never aggregate data across levels of spatial replication before storing them in popler.
Spatial Replication Level 1
The first level of spatial replication is also the coarsest level of nested spatial replication. We call this level 1 ‘site’. LTER stations contain permanent ‘sites’ which are reused across studies. Therefore, ‘site’ labels allow querying data collected at a particular site regardless of the original study a dataset was derived from.
Whenever available in the source metadata, the latitude and longitude coordinates of these ‘sites’ are also recorded; -99999 values appear in these fields for studies that do not report a geographic location for their ‘site’ label.
The Popler Database Schema
The following image depicts the relationship structure among the 14 tables of popler. Below we will discuss this schema, table contents, and give definitions. Three tables will not be discussed, because two are related to climate and one is a table for database migration information.
1. LTER Table
Data on the LTER research station.
Column.Name | Column.Type | Definition |
---|---|---|
lterid (primary key) | varying character (10) | three letter abbreviation for lter |
lter_name | text | full name for lter |
lat_lter | numeric | latitude of lter location (source https://lternet.edu/sites/coordinates) |
lng_lter | numeric | longitude of lter location (source https://lternet.edu/sites/coordinates) |
currently_funded | varying character (50) | current funding status |
current_principle_investigator | varying character (200) | current principal investigator (if identified on homepage) |
current_contact_email | varying character (200) | current contact email (if identified on homepage) |
alt_contact_email | varying character (200) | alternate contact email (if identified on homepage) |
homepage | varying character (200) | url |
2. Study Site Table
This table contains the labels that identify “sites” (spatial replication level 1) used across research projects. Different datasets can use the same “site” label if data were collected at the same sampling location.
Column.Name | Column.Type | Definition |
---|---|---|
proj_metadata_key (primary key) | integer | integer key to identify research projects that collect population data |
lter_project_fkey (foreign key) | varying character (10) | three letter abbreviation for lter |
title | text | title of research project (identified from metadata link) |
samplingunits | varying character (50) | sampling unit of the observed datatype in the research project |
datatype | varying character (50) | datatype label (count, biomass, density, cover, and individual) |
structured_type_1 | varying character (50) | population structure variables gathered during the research project |
structured_type_1_units | varying character (50) | population structure units |
structured_type_2 | varying character (50) | population structure variables gathered during the research project |
structured_type_2_units | varying character (50) | population structure units |
structured_type_3 | varying character (50) | population structure variables gathered during the research project |
structured_type_3_units | varying character (50) | population structure units |
structured_type_4 | varying character (50) | population structure variables gathered during the research project |
structured_type_4_units | varying character (50) | population structure units |
studystartyr | numeric | start year of the research project (generated from the raw data) |
studyendyr | numeric | end year of the research project (generated from the raw data) |
duration_years | integer | duration of research project (different from start and end years) |
samplefreq | text | qualitative identifier for the sampling frequency used in a research project |
studytype | varying character (50) | indicator of observed or experimental data |
community | varying character (50) | indicator of community datasets |
spatial_replication_level_1_extent | numeric | spatial extent of a sampling area |
spatial_replication_level_1_extent_units | varying character (200) | units of the spatial extent of the sampling area |
spatial_replication_level_1_label | varying character (200) | raw data label for the corresponding level of spatial replication |
spatial_replication_level_1_number_of_unique_reps | integer | number of unique levels associated with the corresponding level of spatial replication |
spatial_replication_level_2_extent | numeric | spatial extent of a sampling area |
spatial_replication_level_2_extent_units | varying character (200) | units of the spatial extent of the sampling area |
spatial_replication_level_2_label | varying character (200) | raw data label for the corresponding level of spatial replication |
spatial_replication_level_2_number_of_unique_reps | integer | number of unique levels associated with the corresponding level of spatial replication |
spatial_replication_level_3_extent | numeric | spatial extent of a sampling area |
spatial_replication_level_3_extent_units | varying character (200) | units of the spatial extent of the sampling area |
spatial_replication_level_3_label | varying character (200) | raw data label for the corresponding level of spatial replication |
spatial_replication_level_3_number_of_unique_reps | integer | number of unique levels associated with the corresponding level of spatial replication |
spatial_replication_level_4_extent | numeric | spatial extent of a sampling area |
spatial_replication_level_4_extent_units | varying character (200) | units of the spatial extent of the sampling area |
spatial_replication_level_4_label | varying character (200) | raw data label for the corresponding level of spatial replication |
spatial_replication_level_4_number_of_unique_reps | integer | number of unique levels associated with the corresponding level of spatial replication |
treatment_type_1 | varying character (200) | general descriptor of a given treatment within the research project |
treatment_type_2 | varying character (200) | general descriptor of a given treatment within the research project |
treatment_type_3 | varying character (200) | general descriptor of a given treatment within the research project |
control_group | varying character (200) | data label for controls when projects perform experimental manipulations |
derived | varying character (200) | indicates whether the raw data available was derived from averages or data manipulations |
authors | text | authors associated with a research project (from metadata link) |
authors_contact | varying character (200) | authors contact (or a contact) associated with a research project (from metadata link) |
metalink | text | metadata link from individual lter data catalogues |
knbid | varying character (200) | KNB identifier for LTER data portal (if provided) |
3. Project Table
Metadata related to each separate dataset.
Column.Name | Column.Type | Definition |
---|---|---|
proj_metadata_key (primary key) | integer | integer key to identify research projects that collect population data |
lter_project_fkey (foreign key) | varying character (10) | three letter abbreviation for lter |
title | text | title of research project (identified from metadata link) |
samplingunits | varying character (50) | sampling unit of the observed datatype in the research project |
datatype | varying character (50) | datatype label (count, biomass, density, cover, and individual) |
structured_type_1 | varying character (50) | population structure variables gathered during the research project |
structured_type_1_units | varying character (50) | population structure units |
structured_type_2 | varying character (50) | population structure variables gathered during the research project |
structured_type_2_units | varying character (50) | population structure units |
structured_type_3 | varying character (50) | population structure variables gathered during the research project |
structured_type_3_units | varying character (50) | population structure units |
structured_type_4 | varying character (50) | population structure variables gathered during the research project |
structured_type_4_units | varying character (50) | population structure units |
studystartyr | numeric | start year of the research project (generated from the raw data) |
studyendyr | numeric | end year of the research project (generated from the raw data) |
duration_years | integer | duration of research project (different from start and end years) |
samplefreq | text | qualitative identifier for the sampling frequency used in a research project |
studytype | varying character (50) | indicator of observed or experimental data |
community | varying character (50) | indicator of community datasets |
spatial_replication_level_1_extent | numeric | spatial extent of a sampling area |
spatial_replication_level_1_extent_units | varying character (200) | units of the spatial extent of the sampling area |
spatial_replication_level_1_label | varying character (200) | raw data label for the corresponding level of spatial replication |
spatial_replication_level_1_number_of_unique_reps | integer | number of unique levels associated with the corresponding level of spatial replication |
spatial_replication_level_2_extent | numeric | spatial extent of a sampling area |
spatial_replication_level_2_extent_units | varying character (200) | units of the spatial extent of the sampling area |
spatial_replication_level_2_label | varying character (200) | raw data label for the corresponding level of spatial replication |
spatial_replication_level_2_number_of_unique_reps | integer | number of unique levels associated with the corresponding level of spatial replication |
spatial_replication_level_3_extent | numeric | spatial extent of a sampling area |
spatial_replication_level_3_extent_units | varying character (200) | units of the spatial extent of the sampling area |
spatial_replication_level_3_label | varying character (200) | raw data label for the corresponding level of spatial replication |
spatial_replication_level_3_number_of_unique_reps | integer | number of unique levels associated with the corresponding level of spatial replication |
spatial_replication_level_4_extent | numeric | spatial extent of a sampling area |
spatial_replication_level_4_extent_units | varying character (200) | units of the spatial extent of the sampling area |
spatial_replication_level_4_label | varying character (200) | raw data label for the corresponding level of spatial replication |
spatial_replication_level_4_number_of_unique_reps | integer | number of unique levels associated with the corresponding level of spatial replication |
treatment_type_1 | varying character (200) | general descriptor of a given treatment within the research project |
treatment_type_2 | varying character (200) | general descriptor of a given treatment within the research project |
treatment_type_3 | varying character (200) | general descriptor of a given treatment within the research project |
control_group | varying character (200) | data label for controls when projects perform experimental manipulations |
derived | varying character (200) | indicates whether the raw data available was derived from averages or data manipulations |
authors | text | authors associated with a research project (from metadata link) |
authors_contact | varying character (200) | authors contact (or a contact) associated with a research project (from metadata link) |
metalink | text | metadata link from individual lter data catalogues |
knbid | varying character (200) | KNB identifier for LTER data portal (if provided) |
4. Sites Within Project Table
Site-level information regarding starting and ending year of sampling, number of observations, and number of taxonomic units. Note that here, uniquetaxaunits refers to the taxonomic units observed within each site, not within each dataset.
Column.Name | Column.Type | Definition |
---|---|---|
site_in_project_key (primary key) | integer | Autoincremented integer key to identify study site labels within a project |
study_site_table_fkey (foreign key) | varying character (200) | site level label used in a research project |
project_table_fkey (foreign key) | integer | integer key to identify research projects that collect population data |
sitestartyr | numeric | start year of data generation at a site level (generated from the raw data) |
siteendyr | numeric | end year of data generation at a site level (generated from the raw data) |
totalobs | numeric | count of data records at a site level (generated from the raw data) |
uniquetaxaunits | numeric | count of unique taxonomic units at a site level (generated from raw data) |
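
The four summary columns above are described as generated from the raw data. A minimal sketch of how such per-site summaries could be derived is shown below; it is not popler's actual processing code. The record layout (dictionaries with `site`, `year`, and `taxon` keys), the sample values, and the function name `summarise_sites` are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical raw records: one dictionary per observation.
records = [
    {"site": "A", "year": 2001, "taxon": "sp1"},
    {"site": "A", "year": 2003, "taxon": "sp2"},
    {"site": "A", "year": 2002, "taxon": "sp1"},
    {"site": "B", "year": 2002, "taxon": "sp1"},
]


def summarise_sites(rows):
    """Derive per-site start year, end year, record count, and taxa count."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["site"]].append(row)

    summary = {}
    for site, site_rows in grouped.items():
        years = [row["year"] for row in site_rows]
        summary[site] = {
            "sitestartyr": min(years),
            "siteendyr": max(years),
            "totalobs": len(site_rows),
            # Taxa are counted within the site, not across the whole dataset.
            "uniquetaxaunits": len({row["taxon"] for row in site_rows}),
        }
    return summary


site_summary = summarise_sites(records)
```

In this toy input, site A spans 2001 to 2003 with three records and two taxa, while site B has a single record, so the same taxon can be counted once for each site in which it occurs.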
5. Taxonomic Table
Taxonomic information recorded within a project. In this table, taxonomic information refers to each individual site (spatial replication level 1).
Column.Name | Column.Type | Definition |
---|---|---|
taxa_table_key (primary key) | integer | Autoincremented integer |
site_in_project_taxa_key (foreign key) | integer | Autoincremented integer key to identify study site labels within a project |
sppcode | varying character (100) | species code (generated from processed raw data) |
kingdom | varying character (100) | kingdom (generated from processed raw data) |
subkingdom | varying character (100) | subkingdom (generated from processed raw data) |
infrakingdom | varying character (100) | infrakingdom (generated from processed raw data) |
superdivision | varying character (100) | superdivision (generated from processed raw data) |
division | varying character (100) | division (generated from processed raw data) |
subdivision | varying character (100) | subdivision (generated from processed raw data) |
superphylum | varying character (100) | superphylum (generated from processed raw data) |
phylum | varying character (100) | phylum (generated from processed raw data) |
subphylum | varying character (100) | subphylum (generated from processed raw data) |
clss | varying character (100) | class (generated from processed raw data) |
subclass | varying character (100) | subclass (generated from processed raw data) |
ordr | varying character (100) | order (generated from processed raw data) |
family | varying character (100) | family (generated from processed raw data) |
genus | varying character (100) | genus (generated from processed raw data) |
species | varying character (100) | species (generated from processed raw data) |
common_name | varying character (100) | common name (generated from processed raw data) |
authority | varying character (100) | authority to verify taxonomic classification |
metadata_taxa_key | integer | integer key to identify the research projects associated with the raw data |
6. Accepted Taxonomic Table
Table containing “accepted” taxonomic information: this is an attempt to associate taxonomic units in the raw data of popler to taxonomic units accepted in the literature. This taxonomic information also refers to individual sites (spatial replication level 1).
Column.Name | Column.Type | Definition |
---|---|---|
taxa_table_key (primary key) | integer | Autoincremented integer |
site_in_project_taxa_key (foreign key) | integer | Autoincremented integer key to identify study site labels within a project |
sppcode | varying character (100) | species code (generated from processed raw data) |
kingdom | varying character (100) | kingdom (generated from processed raw data) |
subkingdom | varying character (100) | subkingdom (generated from processed raw data) |
infrakingdom | varying character (100) | infrakingdom (generated from processed raw data) |
superdivision | varying character (100) | superdivision (generated from processed raw data) |
division | varying character (100) | division (generated from processed raw data) |
subdivision | varying character (100) | subdivision (generated from processed raw data) |
superphylum | varying character (100) | superphylum (generated from processed raw data) |
phylum | varying character (100) | phylum (generated from processed raw data) |
subphylum | varying character (100) | subphylum (generated from processed raw data) |
clss | varying character (100) | class (generated from processed raw data) |
subclass | varying character (100) | subclass (generated from processed raw data) |
ordr | varying character (100) | order (generated from processed raw data) |
family | varying character (100) | family (generated from processed raw data) |
genus | varying character (100) | genus (generated from processed raw data) |
species | varying character (100) | species (generated from processed raw data) |
common_name | varying character (100) | common name (generated from processed raw data) |
authority | varying character (100) | authority to verify taxonomic classification |
metadata_taxa_key | integer | integer key to identify the research projects associated with the raw data |
7. Count Table
Population data where abundance is quantified as number of individuals. Null values filled in.
Column.Name | Column.Type | Definition |
---|---|---|
count_table_key (primary key) | integer | Autoincremented integer key |
taxa_count_fkey (foreign key) | integer | integer primary key from taxa_table |
site_in_project_count_fkey (foreign key) | integer | integer primary key from site_in_project_table |
year | numeric | year information from processed raw data |
month | numeric | month information from processed raw data |
day | numeric | day information from processed raw data |
spatial_replication_level_1 | varying character (50) | spatial replication level 1 from processed raw data |
spatial_replication_level_2 | varying character (50) | spatial replication level 2 from processed raw data |
spatial_replication_level_3 | varying character (50) | spatial replication level 3 from processed raw data |
spatial_replication_level_4 | varying character (50) | spatial replication level 4 from processed raw data |
spatial_replication_level_5 | varying character (50) | spatial replication level 5 from processed raw data |
treatment_type_1 | varying character (200) | treatment label from processed raw data |
treatment_type_2 | varying character (200) | treatment label from processed raw data |
treatment_type_3 | varying character (200) | treatment label from processed raw data |
structure_type_1 | varying character (200) | organismal structure from processed raw data |
structure_type_2 | varying character (200) | organismal structure from processed raw data |
structure_type_3 | varying character (200) | organismal structure from processed raw data |
structure_type_4 | varying character (50) | organismal structure from processed raw data |
count_observation | numeric | observations from processed raw data |
covariates | text | covariates from processed raw data |
metadata_count_key | integer | integer key to identify the research projects associated with the raw data |
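
The two foreign keys in this table are what link each count record to its taxon and its site. The sketch below illustrates that join on a tiny in-memory database. It makes several assumptions: SQLite stands in for the real database engine, the table name `count_table` is inferred from the key name rather than stated in the text, only a few of the columns are created, and all inserted values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Reduced, hypothetical versions of three tables; only a few columns are kept.
conn.executescript("""
CREATE TABLE taxa_table (
    taxa_table_key INTEGER PRIMARY KEY,
    genus TEXT,
    species TEXT
);
CREATE TABLE site_in_project_table (
    site_in_project_key INTEGER PRIMARY KEY,
    study_site_table_fkey TEXT
);
CREATE TABLE count_table (
    count_table_key INTEGER PRIMARY KEY,
    taxa_count_fkey INTEGER REFERENCES taxa_table (taxa_table_key),
    site_in_project_count_fkey INTEGER
        REFERENCES site_in_project_table (site_in_project_key),
    year NUMERIC,
    count_observation NUMERIC
);
INSERT INTO taxa_table VALUES (1, 'Genus_a', 'species_a');
INSERT INTO site_in_project_table VALUES (1, 'site_1');
INSERT INTO count_table VALUES (1, 1, 1, 2001, 12), (2, 1, 1, 2002, 7);
""")

# Follow both foreign keys to attach taxon and site labels to each count.
rows = conn.execute("""
SELECT s.study_site_table_fkey, t.genus, t.species, c.year, c.count_observation
FROM count_table AS c
JOIN taxa_table AS t ON t.taxa_table_key = c.taxa_count_fkey
JOIN site_in_project_table AS s
    ON s.site_in_project_key = c.site_in_project_count_fkey
ORDER BY c.year
""").fetchall()
```

The biomass, density, percent cover, and individual tables carry the same pair of foreign keys under their own names, so the same join pattern should apply to them with the key names swapped.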
8. Biomass Table
Population data where abundance is quantified in terms of biomass.
Column.Name | Column.Type | Definition |
---|---|---|
biomass_table_key (primary key) | integer | Autoincremented integer key |
taxa_biomass_fkey (foreign key) | integer | integer primary key from taxa_table |
site_in_project_biomass_fkey (foreign key) | integer | integer primary key from site_in_project_table |
year | numeric | year information from processed raw data |
month | numeric | month information from processed raw data |
day | numeric | day information from processed raw data |
spatial_replication_level_1 | varying character (50) | spatial replication level 1 from processed raw data |
spatial_replication_level_2 | varying character (50) | spatial replication level 2 from processed raw data |
spatial_replication_level_3 | varying character (50) | spatial replication level 3 from processed raw data |
spatial_replication_level_4 | varying character (50) | spatial replication level 4 from processed raw data |
spatial_replication_level_5 | varying character (50) | spatial replication level 5 from processed raw data |
treatment_type_1 | varying character (200) | treatment label from processed raw data |
treatment_type_2 | varying character (200) | treatment label from processed raw data |
treatment_type_3 | varying character (200) | treatment label from processed raw data |
structure_type_1 | varying character (200) | organismal structure from processed raw data |
structure_type_2 | varying character (200) | organismal structure from processed raw data |
structure_type_3 | varying character (200) | organismal structure from processed raw data |
structure_type_4 | varying character (50) | organismal structure from processed raw data |
biomass_observation | numeric | observations from processed raw data |
covariates | text | covariates from processed raw data |
metadata_biomass_key | integer | integer key to identify the research projects associated with the raw data |
9. Density Table
Population data where abundance is quantified in terms of density.
Column.Name | Column.Type | Definition |
---|---|---|
density_table_key (primary key) | integer | Autoincremented integer key |
taxa_density_fkey (foreign key) | integer | integer primary key from taxa_table |
site_in_project_density_fkey (foreign key) | integer | integer primary key from site_in_project_table |
year | numeric | year information from processed raw data |
month | numeric | month information from processed raw data |
day | numeric | day information from processed raw data |
spatial_replication_level_1 | varying character (50) | spatial replication level 1 from processed raw data |
spatial_replication_level_2 | varying character (50) | spatial replication level 2 from processed raw data |
spatial_replication_level_3 | varying character (50) | spatial replication level 3 from processed raw data |
spatial_replication_level_4 | varying character (50) | spatial replication level 4 from processed raw data |
spatial_replication_level_5 | varying character (50) | spatial replication level 5 from processed raw data |
treatment_type_1 | varying character (200) | treatment label from processed raw data |
treatment_type_2 | varying character (200) | treatment label from processed raw data |
treatment_type_3 | varying character (200) | treatment label from processed raw data |
structure_type_1 | varying character (200) | organismal structure from processed raw data |
structure_type_2 | varying character (200) | organismal structure from processed raw data |
structure_type_3 | varying character (200) | organismal structure from processed raw data |
structure_type_4 | varying character (50) | organismal structure from processed raw data |
density_observation | numeric | observations from processed raw data |
covariates | text | covariates from processed raw data |
metadata_density_key | integer | integer key to identify the research projects associated with the raw data |
10. Percent Cover Table
Population data where abundance is quantified in terms of cover.
Column.Name | Column.Type | Definition |
---|---|---|
percent_cover_table_key (primary key) | integer | Autoincremented integer key |
taxa_percent_cover_fkey (foreign key) | integer | integer primary key from taxa_table |
site_in_project_percent_cover_fkey (foreign key) | integer | integer primary key from site_in_project_table |
year | numeric | year information from processed raw data |
month | numeric | month information from processed raw data |
day | numeric | day information from processed raw data |
spatial_replication_level_1 | varying character (50) | spatial replication level 1 from processed raw data |
spatial_replication_level_2 | varying character (50) | spatial replication level 2 from processed raw data |
spatial_replication_level_3 | varying character (50) | spatial replication level 3 from processed raw data |
spatial_replication_level_4 | varying character (50) | spatial replication level 4 from processed raw data |
spatial_replication_level_5 | varying character (50) | spatial replication level 5 from processed raw data |
treatment_type_1 | varying character (200) | treatment label from processed raw data |
treatment_type_2 | varying character (200) | treatment label from processed raw data |
treatment_type_3 | varying character (200) | treatment label from processed raw data |
structure_type_1 | varying character (200) | organismal structure from processed raw data |
structure_type_2 | varying character (200) | organismal structure from processed raw data |
structure_type_3 | varying character (200) | organismal structure from processed raw data |
structure_type_4 | varying character (50) | organismal structure from processed raw data |
percent_cover_observation | numeric | observations from processed raw data |
covariates | text | covariates from processed raw data |
metadata_percent_cover_key | integer | integer key to identify the research projects associated with the raw data |
11. Individual Table
Individual-level data. The structure_type columns refer to the attributes of individuals (e.g. size, age, sex).
Column.Name | Column.Type | Definition |
---|---|---|
individual_table_key (primary key) | integer | Autoincremented integer key |
taxa_individual_fkey (foreign key) | integer | integer primary key from taxa_table |
site_in_project_individual_fkey (foreign key) | integer | integer primary key from site_in_project_table |
year | numeric | year information from processed raw data |
month | numeric | month information from processed raw data |
day | numeric | day information from processed raw data |
spatial_replication_level_1 | varying character (50) | spatial replication level 1 from processed raw data |
spatial_replication_level_2 | varying character (50) | spatial replication level 2 from processed raw data |
spatial_replication_level_3 | varying character (50) | spatial replication level 3 from processed raw data |
spatial_replication_level_4 | varying character (50) | spatial replication level 4 from processed raw data |
spatial_replication_level_5 | varying character (50) | spatial replication level 5 from processed raw data |
treatment_type_1 | varying character (200) | treatment label from processed raw data |
treatment_type_2 | varying character (200) | treatment label from processed raw data |
treatment_type_3 | varying character (200) | treatment label from processed raw data |
structure_type_1 | varying character (200) | organismal structure from processed raw data |
structure_type_2 | varying character (200) | organismal structure from processed raw data |
structure_type_3 | varying character (200) | organismal structure from processed raw data |
structure_type_4 | varying character (50) | organismal structure from processed raw data |
individual_observation | numeric | observations from processed raw data |
covariates | text | covariates from processed raw data |
metadata_individual_key | integer | integer key to identify the research projects associated with the raw data |