
A dataset recording the growth of orange trees, replicated from the classic datasets::Orange dataset and stored as an object of the dataset_df S3 class with enhanced semantic metadata.

Usage

orange_df

Format

A data frame with 35 rows and 4 variables:

  • rowid: A unique identifier for each row (character)

  • tree: Tree identifier (ordered factor)

  • age: Age of the tree in days (numeric)

  • circumference: Trunk circumference in mm (numeric)
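
The column types listed above can be cross-checked against the source data, datasets::Orange, using base R alone. The following is a minimal sketch; the variable names src and rowid are illustrative, and the row identifier is built the same way as in the constructor example on this page.

```r
# Base R only: inspect the source data that orange_df replicates.
src <- datasets::Orange

dim(src)                       # 35 rows, 3 columns (orange_df adds rowid as a 4th)
is.ordered(src$Tree)           # TRUE: Tree is an ordered factor
levels(src$Tree)               # "3" "1" "5" "2" "4"
is.numeric(src$age)            # TRUE
is.numeric(src$circumference)  # TRUE

# The first record belongs to tree "1", which is the 2nd level of the factor.
as.character(src$Tree[1])      # "1"
as.integer(src$Tree[1])        # 2

# Row identifiers built as in the constructor example.
rowid <- paste0("orange:", row.names(src))
head(rowid, 3)                 # "orange:1" "orange:2" "orange:3"
```

Because Tree is an ordered factor whose levels are not in numeric order, the integer code of a tree generally differs from its label.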

Details

This is a semantically enriched version of the classic Orange dataset, built with the dataset_df(), defined(), and dublincore() constructors. Each column carries semantic metadata such as a label, a unit, a concept URI, or a namespace. The dataset also embeds a machine-readable citation for reproducibility and provenance tracking.
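
The column-level metadata described here can be read back from individual columns. The sketch below assumes that the dataset package is installed and that its var_label() and var_unit() accessors are available in the installed version; it is guarded so that it does nothing when the package is missing.

```r
# Sketch only: assumes the 'dataset' package is installed and that
# var_label() and var_unit() are available in the installed version.
if (requireNamespace("dataset", quietly = TRUE)) {
  data("orange_df", package = "dataset")

  # Human-readable label and measurement unit stored on single columns
  age_label <- dataset::var_label(orange_df$age)
  circ_unit <- dataset::var_unit(orange_df$circumference)

  print(age_label)
  print(circ_unit)
}
```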

Constructor Example

orange_bibentry <- dublincore(
  title = "Growth of Orange Trees",
  creator = c(
    person(
      given = "N.R.",
      family = "Draper",
      role = "cre",
      comment = c(VIAF = "http://viaf.org/viaf/84585260")
    ),
    person(
      given = "H",
      family = "Smith",
      role = "cre"
    )
  ),
  contributor = person(
    given = "Antal",
    family = "Daniel",
    role = "dtm"
  ),
  publisher = "Wiley",
  datasource = "https://isbnsearch.org/isbn/9780471170822",
  dataset_date = 1998,
  identifier = "https://doi.org/10.5281/zenodo.14917851",
  language = "en",
  description = "The Orange data frame has 35 rows and 3 columns of records of the growth of orange trees."
)

orange_df <- dataset_df(
  rowid = defined(paste0("orange:", row.names(Orange)),
    label = "ID in the Orange dataset",
    namespace = c("orange" = "datasets::Orange")
  ),
  tree = defined(Orange$Tree,
    label = "The number of the tree"
  ),
  age = defined(Orange$age,
    label = "The age of the tree",
    unit = "days since 1968/12/31"
  ),
  circumference = defined(Orange$circumference,
    label = "circumference at breast height",
    unit = "millimeter",
    concept = "https://www.wikidata.org/wiki/Property:P2043"
  ),
  dataset_bibentry = orange_bibentry
)

orange_df$rowid <- defined(orange_df$rowid,
  namespace = "https://doi.org/10.5281/zenodo.14917851"
)

References

  • Draper, N. R. & Smith, H. (1998). Applied Regression Analysis (3rd ed.). Wiley.

  • Pinheiro, J. C. & Bates, D. M. (2000). Mixed-effects Models in S and S-PLUS. Springer.

  • Becker, R. A., Chambers, J. M. & Wilks, A. R. (1988). The New S Language. Wadsworth & Brooks/Cole.

Examples

# Print with semantic citation and data preview
print(orange_df)
#> Draper-Smith (1998): Growth of Orange Trees [dataset], https://doi.org/10.5281/zenodo.14917851
#>    rowid     tree      age       circumference 
#>    <defined> <defined> <defined> <defined>    
#>  1 orange:1  2 [1]      118       30          
#>  2 orange:2  2 [1]      484       58          
#>  3 orange:3  2 [1]      664       87          
#>  4 orange:4  2 [1]     1004      115          
#>  5 orange:5  2 [1]     1231      120          
#>  6 orange:6  2 [1]     1372      142          
#>  7 orange:7  2 [1]     1582      145          
#>  8 orange:8  4 [2]      118       33          
#>  9 orange:9  4 [2]      484       69          
#> 10 orange:10 4 [2]      664      111          
#> # ℹ 25 more rows 

# Access semantic metadata associated with variables
print(orange_df$age)
#> orange_df$age: The age of the tree
#> Measured in days since 1968/12/31 
#>  [1]  118  484  664 1004 1231 1372 1582  118  484  664 1004 1231 1372 1582  118
#> [16]  484  664 1004 1231 1372 1582  118  484  664 1004 1231 1372 1582  118  484
#> [31]  664 1004 1231 1372 1582

# Retrieve the embedded bibliographic record
as_dublincore(orange_df)
#> Dublin Core Metadata Record
#> --------------------------
#> Title:       Growth of Orange Trees
#> Creator(s):  N.R. Draper [cre] (VIAF: http://viaf.org/viaf/84585260); H Smith [cre]
#> Contributor(s): :unas
#> Publisher:   Wiley
#> Year:        1998
#> Language:    en
#> Description: The Orange data frame has 35 rows and 3 columns of records of the growth of orange trees.
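
The unit of the age column, days since 1968/12/31, means that each age value can be turned into a calendar date by adding it to that origin. A minimal base-R sketch, using the source data datasets::Orange so that it runs without the package; the names origin, ages, and measurement_dates are illustrative.

```r
# Base R only; the origin date is read off the unit string
# "days since 1968/12/31" attached to the age column.
origin <- as.Date("1968-12-31")

# The distinct ages at which the trees were measured
ages <- unique(datasets::Orange$age)

# Adding a number of days to a Date yields a Date
measurement_dates <- origin + ages

data.frame(age_days = ages, date = measurement_dates)
```

The first measurement, at age 118, corresponds to 1969-04-28.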