Get or set the primary language of a semantically rich dataset object. The language can be assigned with an ISO 639 language code or a full language name, and it is stored in the language attribute of the dataset's metadata.

Usage

language(x)

language(x, iso_639_code = "639-3") <- value

Arguments

x

A dataset object created by dataset_df() or as_dataset_df().

iso_639_code

A character string indicating the format of the stored code: either "639-3" (default; 3-letter terminologic code) or "639-1" (2-letter code).

value

A 2-letter or 3-letter language code (ISO 639-1 or ISO 639-2), or a full language name (case-insensitive). Use NULL to mark the language as ":unas".

Value

The dataset with an updated language attribute, typically an ISO 639-2/T code (Alpha_3_T) such as "fra", "eng", "spa", etc.

Details

This function supports recognition of:

  • 2-letter codes (ISO 639-1, e.g., "en", "fr")

  • 3-letter codes from both:

    • Alpha_3_B (bibliographic, e.g., "fre")

    • Alpha_3_T (terminologic, e.g., "fra")

  • Full language names (e.g., "English", "French")

For compatibility with open science repositories and modern metadata standards, this function returns the terminologic code (Alpha_3_T) when available. If Alpha_3_T is missing for a language, the legacy bibliographic code (Alpha_3_B) is used as a fallback.

Full language names (e.g., "English", "Spanish") are matched case-insensitively against the ISO 639-2 Name field. Exact matches are attempted first; if none are found, a prefix match is used. For example:

  • "English" returns "eng"

  • "English, Old" returns "ang"
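The exact-then-prefix lookup described above can be sketched in base R. This is only an illustration under stated assumptions: the table iso_mini is a hand-made miniature rather than the real ISO 639-2 data, and match_language_name() is a hypothetical helper, not a function exported by the package.

```r
# Hypothetical miniature of an ISO 639-2 style table (not the real data).
iso_mini <- data.frame(
  Alpha_3_T = c("eng", "ang", "enm", "fra"),
  Name = c(
    "English",
    "English, Old (ca.450-1100)",
    "English, Middle (1100-1500)",
    "French"
  ),
  stringsAsFactors = FALSE
)

# Illustrative helper (an assumption, not the package's implementation):
# try a case-insensitive exact match first, then fall back to a prefix match.
match_language_name <- function(name, table = iso_mini) {
  key <- tolower(name)
  names_lc <- tolower(table$Name)

  hit <- which(names_lc == key)              # step 1: exact match
  if (length(hit) == 0) {
    hit <- which(startsWith(names_lc, key))  # step 2: prefix match
  }
  if (length(hit) == 0) {
    return(NA_character_)                    # nothing matched
  }
  table$Alpha_3_T[hit[1]]                    # first match wins
}

match_language_name("English")       # "eng" (exact match)
match_language_name("english, old")  # "ang" (prefix match)
match_language_name("Klingon")       # NA (not in the miniature table)
```

Trying the exact match first matters here: with a prefix match alone, "English" would also match the Old and Middle English rows, and the result would depend on row order.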

Because terminologic codes are preferred, this means that:

  • Both "fra" (terminologic) and "fre" (bibliographic) will be accepted as valid input for French

  • The resulting value stored and returned will be "fra"
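The preference for terminologic codes, with the bibliographic code as a fallback, can be sketched as follows. The table iso_codes and the helper normalize_code() are hypothetical and exist only for this illustration. The Alpha_2 column name is an assumption, and the "zzb" row is fictional, added solely to exercise the fallback branch.

```r
# Hypothetical miniature code table. The last row is fictional and has no
# terminologic code, so that the fallback branch can be demonstrated.
iso_codes <- data.frame(
  Alpha_3_B = c("fre", "eng", "zzb"),
  Alpha_3_T = c("fra", "eng", NA),
  Alpha_2   = c("fr", "en", NA),
  stringsAsFactors = FALSE
)

# Illustrative helper (an assumption, not the package's implementation):
# accept a 2-letter, bibliographic, or terminologic code and return the
# terminologic code when present, otherwise the bibliographic one.
normalize_code <- function(code, table = iso_codes) {
  key <- tolower(code)
  # which() drops NA comparison results, so missing codes never match.
  hit <- which(
    table$Alpha_3_T == key | table$Alpha_3_B == key | table$Alpha_2 == key
  )
  if (length(hit) == 0) {
    return(NA_character_)
  }
  row <- table[hit[1], ]
  if (!is.na(row$Alpha_3_T)) row$Alpha_3_T else row$Alpha_3_B
}

normalize_code("fre")  # "fra" (bibliographic input, terminologic output)
normalize_code("fra")  # "fra"
normalize_code("fr")   # "fra"
normalize_code("zzb")  # "zzb" (fallback to the bibliographic code)
```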

This behaviour aligns with the conventions of the open science repositories and metadata standards mentioned above.

If value is NULL, the language is marked as ":unas" (unspecified).

In some cases, especially for historical or moribund languages, multiple similar names may exist. In such cases, it is safer to use a specific language code (e.g., "ang" instead of "English, Old" and "enm" for "English, Middle (1100-1500)"). You can also refer directly to the definitions in ISOcodes::ISO_639_2 for clarity.
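One way to check for ambiguous names before assigning a language is to list the candidate rows in ISOcodes::ISO_639_2. This is a minimal sketch that assumes the ISOcodes package is installed and that the table has a Name column, as the name matching described above suggests. The code is skipped when the package is unavailable.

```r
# List every ISO 639-2 entry whose name starts with "English", so that the
# intended row (and its code) can be picked by hand.
if (requireNamespace("ISOcodes", quietly = TRUE)) {
  iso <- ISOcodes::ISO_639_2
  candidates <- iso[grepl("^English", iso$Name, ignore.case = TRUE), ]
  print(candidates)
}
```

If more than one row is listed, passing the code from the intended row to language() avoids relying on the prefix match.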

See also

Other bibliographic helper functions: contributor(), creator(), dataset_format(), dataset_title(), description(), geolocation(), get_bibentry(), publication_year(), publisher(), relation(), rights(), subject()

Examples

df <- dataset_df(data.frame(x = 1:3))

language(df) <- "English" # Stored as "eng"
language(df) <- "fre" # Legacy bibliographic code; stored as "fra"
language(df) <- "fra" # Stored as "fra"
language(df, iso_639_code = "639-1") <- "fra" # Stored as "fr"

language(df) <- NULL # Sets ":unas"