Gutenberg metadata about the subject of each work, particularly Library of Congress Classifications (lcc) and Library of Congress Subject Headings (lcsh).

Usage

gutenberg_subjects

Format

A tbl_df (see tibble or dplyr) with one row for each pairing of work and subject, with columns:

gutenberg_id

ID of a work, which can be used to join with gutenberg_metadata

subject_type

Either "lcc" (Library of Congress Classification) or "lcsh" (Library of Congress Subject Headings)

subject

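The subject itself: a classification code when subject_type is "lcc", or a subject heading when it is "lcsh"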
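
As a minimal sketch of the join that gutenberg_id allows, assuming the gutenbergr and dplyr packages are installed, the subject rows can be combined with gutenberg_metadata to attach a title and author to each subject. The name subjects_with_titles is only illustrative.

```r
library(dplyr)
library(gutenbergr)

# Keep only Library of Congress Subject Headings, then attach the
# title and author from gutenberg_metadata via the shared gutenberg_id.
subjects_with_titles <- gutenberg_subjects %>%
  filter(subject_type == "lcsh") %>%
  inner_join(
    gutenberg_metadata %>% select(gutenberg_id, title, author),
    by = "gutenberg_id"
  )

subjects_with_titles
```

Because a work can carry several subjects, each work may appear on more than one row of the result.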

Details

Find more information about the Library of Congress Classification at https://www.loc.gov/catdir/cpso/lcco/ and about Library of Congress Subject Headings at https://id.loc.gov/authorities/subjects.html.
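
The classification rows can be explored in the same way as the subject headings. The following is a rough sketch, assuming gutenbergr and dplyr are installed, that tallies the most frequent values among the "lcc" rows; lcc_counts is an illustrative name.

```r
library(dplyr)
library(gutenbergr)

# Count how many works carry each Library of Congress Classification
# value, most frequent first.
lcc_counts <- gutenberg_subjects %>%
  filter(subject_type == "lcc") %>%
  count(subject, sort = TRUE)

lcc_counts
```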

To find the date on which this metadata was last updated, run attr(gutenberg_subjects, "date_updated").

Examples


library(dplyr)
library(stringr)
library(gutenbergr)

gutenberg_subjects %>%
  filter(subject_type == "lcsh") %>%
  count(subject, sort = TRUE)
#> # A tibble: 37,868 × 2
#>    subject                                 n
#>    <chr>                               <int>
#>  1 Science fiction                      2859
#>  2 Short stories                        2696
#>  3 Fiction                              1978
#>  4 Adventure stories                    1453
#>  5 Historical fiction                    932
#>  6 Conduct of life -- Juvenile fiction   874
#>  7 Love stories                          849
#>  8 Detective and mystery stories         810
#>  9 Man-woman relationships -- Fiction    776
#> 10 Poetry                                681
#> # … with 37,858 more rows

sherlock_holmes_subjects <- gutenberg_subjects %>%
  filter(str_detect(subject, "Holmes, Sherlock"))

sherlock_holmes_subjects
#> # A tibble: 54 × 3
#>    gutenberg_id subject_type subject                                           
#>           <int> <chr>        <chr>                                             
#>  1          108 lcsh         Holmes, Sherlock (Fictitious character) -- Fiction
#>  2          221 lcsh         Holmes, Sherlock (Fictitious character) -- Fiction
#>  3          244 lcsh         Holmes, Sherlock (Fictitious character) -- Fiction
#>  4          834 lcsh         Holmes, Sherlock (Fictitious character) -- Fiction
#>  5         1661 lcsh         Holmes, Sherlock (Fictitious character) -- Fiction
#>  6         2097 lcsh         Holmes, Sherlock (Fictitious character) -- Fiction
#>  7         2343 lcsh         Holmes, Sherlock (Fictitious character) -- Fiction
#>  8         2344 lcsh         Holmes, Sherlock (Fictitious character) -- Fiction
#>  9         2345 lcsh         Holmes, Sherlock (Fictitious character) -- Fiction
#> 10         2346 lcsh         Holmes, Sherlock (Fictitious character) -- Fiction
#> # … with 44 more rows

sherlock_holmes_metadata <- gutenberg_works() %>%
  filter(author == "Doyle, Arthur Conan") %>%
  semi_join(sherlock_holmes_subjects, by = "gutenberg_id")

sherlock_holmes_metadata
#> # A tibble: 16 × 8
#>    gutenberg_id title              author guten…¹ langu…² guten…³ rights has_t…⁴
#>           <int> <chr>              <chr>    <int> <chr>   <chr>   <chr>  <lgl>  
#>  1          108 "The Return of Sh… Doyle…      69 en      Detect… Publi… TRUE   
#>  2          244 "A Study in Scarl… Doyle…      69 en      Detect… Publi… TRUE   
#>  3          834 "The Memoirs of S… Doyle…      69 en      Detect… Publi… TRUE   
#>  4         1661 "The Adventures o… Doyle…      69 en      Detect… Publi… TRUE   
#>  5         2097 "The Sign of the … Doyle…      69 en      Detect… Publi… TRUE   
#>  6         2343 "The Adventure of… Doyle…      69 en      Detect… Publi… TRUE   
#>  7         2344 "The Adventure of… Doyle…      69 en      Detect… Publi… TRUE   
#>  8         2345 "The Adventure of… Doyle…      69 en      Detect… Publi… TRUE   
#>  9         2346 "The Adventure of… Doyle…      69 en      Detect… Publi… TRUE   
#> 10         2347 "The Adventure of… Doyle…      69 en      Detect… Publi… TRUE   
#> 11         2348 "The Disappearanc… Doyle…      69 en      Detect… Publi… TRUE   
#> 12         2349 "The Adventure of… Doyle…      69 en      Detect… Publi… TRUE   
#> 13         2350 "His Last Bow: An… Doyle…      69 en      Detect… Publi… TRUE   
#> 14         2852 "The Hound of the… Doyle…      69 en      Detect… Publi… TRUE   
#> 15         3289 "The Valley of Fe… Doyle…      69 en      Detect… Publi… TRUE   
#> 16        48320 "Adventures of Sh… Doyle…      69 en      NA      Publi… TRUE   
#> # … with abbreviated variable names ¹​gutenberg_author_id, ²​language,
#> #   ³​gutenberg_bookshelf, ⁴​has_text

# Not run: this downloads the full text of every matching work
if (FALSE) {
  holmes_books <- gutenberg_download(sherlock_holmes_metadata$gutenberg_id)

  holmes_books
}

# date last updated
attr(gutenberg_subjects, "date_updated")
#> [1] "2022-11-04"