Gutenberg metadata about the subject of each work, particularly Library of Congress Classifications (lcc) and Library of Congress Subject Headings (lcsh).
Format
A tbl_df (see tibble or dplyr) with one row for each pairing of work and subject, with columns:
- gutenberg_id
Integer ID identifying the work; use it to join with gutenberg_metadata
- subject_type
Either "lcc" (Library of Congress Classification) or "lcsh" (Library of Congress Subject Headings)
- subject
The subject itself: a classification code for "lcc" rows, or a subject heading for "lcsh" rows
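As a minimal sketch of the join that the gutenberg_id column is meant for, the snippet below attaches titles and authors to each work/subject pairing. It assumes the gutenbergr and dplyr packages are installed; the name lcc_with_titles is purely illustrative and not part of the package.

```r
library(dplyr)
library(gutenbergr)

# Keep only the Library of Congress Classification rows, then look up
# each work's title and author in gutenberg_metadata via gutenberg_id.
lcc_with_titles <- gutenberg_subjects %>%
  filter(subject_type == "lcc") %>%
  inner_join(gutenberg_metadata, by = "gutenberg_id") %>%
  select(gutenberg_id, title, author, subject)

lcc_with_titles
```

Because a work can carry several subjects, the result may contain more than one row per gutenberg_id.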
Details
More information about the Library of Congress Classification is available at https://www.loc.gov/catdir/cpso/lcco/, and about Library of Congress Subject Headings at https://id.loc.gov/authorities/subjects.html.
To find the date on which this metadata was last updated, run attr(gutenberg_subjects, "date_updated").
Examples
library(dplyr)
library(stringr)
library(gutenbergr)
gutenberg_subjects %>%
  filter(subject_type == "lcsh") %>%
  count(subject, sort = TRUE)
#> # A tibble: 37,868 × 2
#> subject n
#> <chr> <int>
#> 1 Science fiction 2859
#> 2 Short stories 2696
#> 3 Fiction 1978
#> 4 Adventure stories 1453
#> 5 Historical fiction 932
#> 6 Conduct of life -- Juvenile fiction 874
#> 7 Love stories 849
#> 8 Detective and mystery stories 810
#> 9 Man-woman relationships -- Fiction 776
#> 10 Poetry 681
#> # … with 37,858 more rows
sherlock_holmes_subjects <- gutenberg_subjects %>%
  filter(str_detect(subject, "Holmes, Sherlock"))
sherlock_holmes_subjects
#> # A tibble: 54 × 3
#> gutenberg_id subject_type subject
#> <int> <chr> <chr>
#> 1 108 lcsh Holmes, Sherlock (Fictitious character) -- Fiction
#> 2 221 lcsh Holmes, Sherlock (Fictitious character) -- Fiction
#> 3 244 lcsh Holmes, Sherlock (Fictitious character) -- Fiction
#> 4 834 lcsh Holmes, Sherlock (Fictitious character) -- Fiction
#> 5 1661 lcsh Holmes, Sherlock (Fictitious character) -- Fiction
#> 6 2097 lcsh Holmes, Sherlock (Fictitious character) -- Fiction
#> 7 2343 lcsh Holmes, Sherlock (Fictitious character) -- Fiction
#> 8 2344 lcsh Holmes, Sherlock (Fictitious character) -- Fiction
#> 9 2345 lcsh Holmes, Sherlock (Fictitious character) -- Fiction
#> 10 2346 lcsh Holmes, Sherlock (Fictitious character) -- Fiction
#> # … with 44 more rows
sherlock_holmes_metadata <- gutenberg_works() %>%
  filter(author == "Doyle, Arthur Conan") %>%
  semi_join(sherlock_holmes_subjects, by = "gutenberg_id")
sherlock_holmes_metadata
#> # A tibble: 16 × 8
#> gutenberg_id title author guten…¹ langu…² guten…³ rights has_t…⁴
#> <int> <chr> <chr> <int> <chr> <chr> <chr> <lgl>
#> 1 108 "The Return of Sh… Doyle… 69 en Detect… Publi… TRUE
#> 2 244 "A Study in Scarl… Doyle… 69 en Detect… Publi… TRUE
#> 3 834 "The Memoirs of S… Doyle… 69 en Detect… Publi… TRUE
#> 4 1661 "The Adventures o… Doyle… 69 en Detect… Publi… TRUE
#> 5 2097 "The Sign of the … Doyle… 69 en Detect… Publi… TRUE
#> 6 2343 "The Adventure of… Doyle… 69 en Detect… Publi… TRUE
#> 7 2344 "The Adventure of… Doyle… 69 en Detect… Publi… TRUE
#> 8 2345 "The Adventure of… Doyle… 69 en Detect… Publi… TRUE
#> 9 2346 "The Adventure of… Doyle… 69 en Detect… Publi… TRUE
#> 10 2347 "The Adventure of… Doyle… 69 en Detect… Publi… TRUE
#> 11 2348 "The Disappearanc… Doyle… 69 en Detect… Publi… TRUE
#> 12 2349 "The Adventure of… Doyle… 69 en Detect… Publi… TRUE
#> 13 2350 "His Last Bow: An… Doyle… 69 en Detect… Publi… TRUE
#> 14 2852 "The Hound of the… Doyle… 69 en Detect… Publi… TRUE
#> 15 3289 "The Valley of Fe… Doyle… 69 en Detect… Publi… TRUE
#> 16 48320 "Adventures of Sh… Doyle… 69 en NA Publi… TRUE
#> # … with abbreviated variable names ¹gutenberg_author_id, ²language,
#> # ³gutenberg_bookshelf, ⁴has_text
if (FALSE) {
  # not run: downloads the text of every matching work
  holmes_books <- gutenberg_download(sherlock_holmes_metadata$gutenberg_id)
  holmes_books
}
# date last updated
attr(gutenberg_subjects, "date_updated")
#> [1] "2022-11-04"