
Get a table of Project Gutenberg work metadata, filtered by a set of common defaults that can be changed, with the option to add further filters. This function is a convenience for the conditions most often applied when pulling a set of books to analyze. For more detailed filtering of the entire Project Gutenberg metadata, use the gutenberg_metadata and related datasets.

Usage

gutenberg_works(
  ...,
  languages = "en",
  only_text = TRUE,
  rights = c("Public domain in the USA.", "None"),
  distinct = TRUE,
  all_languages = FALSE,
  only_languages = TRUE
)

Arguments

...

Additional filters, given as expressions using the variables in the gutenberg_metadata dataset (e.g. author == "Austen, Jane")

languages

Vector of languages to include

only_text

Whether the works must have Gutenberg text attached. Works without text (e.g. audiobooks) cannot be downloaded with gutenberg_download

rights

Values to allow in the rights field. By default allows public domain in the US or "None", while excluding works under copyright. NULL allows any value of rights

distinct

Whether to return only one distinct combination of each title and gutenberg_author_id. If multiple occur (that fulfill the other conditions), it uses the one with the lowest ID

all_languages

Whether, if multiple languages are given, all of them must be present in a work. For example, if c("en", "fr") is given, whether to return only works in both languages (en/fr) rather than any work in English or French

only_languages

Whether to exclude works that have other languages besides the ones provided. For example, when only "en" is requested, TRUE drops en/fr works and FALSE keeps them
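
The interaction between languages, all_languages and only_languages can be easier to see on a handful of language codes. The following is a minimal base R sketch of the behavior described above, not the package's own implementation; match_languages() is a hypothetical helper and the codes vector is made up for illustration.

```r
# Hypothetical helper (not part of gutenbergr): mimics the documented meaning
# of `all_languages` and `only_languages` on language codes such as "en/fr".
match_languages <- function(language, languages,
                            all_languages = FALSE,
                            only_languages = TRUE) {
  # Split combined codes such as "en/fr" into their component languages.
  parts <- strsplit(language, "/", fixed = TRUE)
  vapply(parts, function(p) {
    requested <- languages %in% p
    # all_languages = TRUE: every requested language must be present;
    # otherwise a single match is enough.
    present <- if (all_languages) all(requested) else any(requested)
    # only_languages = TRUE: no language outside the requested set is allowed.
    extras_ok <- !only_languages || all(p %in% languages)
    present && extras_ok
  }, logical(1))
}

# Invented language codes for illustration.
codes <- c("en", "fr", "en/fr", "en/la", "de")

either  <- codes[match_languages(codes, c("en", "fr"))]
both    <- codes[match_languages(codes, c("en", "fr"), all_languages = TRUE)]
with_en <- codes[match_languages(codes, "en", only_languages = FALSE)]

either   # "en" "fr" "en/fr"
both     # "en/fr"
with_en  # "en" "en/fr" "en/la"
```

With the defaults, a work matches if it contains at least one requested language and nothing else, which is why en/la is dropped in the first call but kept in the last.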

Value

A tbl_df (see the tibble or dplyr packages) with one row for each work, in the same format as gutenberg_metadata.

Details

By default, returns

  • English-language works

  • That are in text format in Gutenberg (as opposed to audio)

  • Whose text is not under copyright

  • At most one work for each title/author pair
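
The default conditions listed above can be sketched as an explicit filter. This is an illustration in base R on a toy data frame, assuming columns named as in gutenberg_metadata; the rows, including the "Copyrighted" rights value, are invented, and the code is not the function's actual implementation.

```r
# Toy stand-in for gutenberg_metadata; every row here is invented.
meta <- data.frame(
  gutenberg_id = c(10L, 11L, 12L, 13L, 14L),
  title = c("Emma", "Emma", "Candide", "Emma", "Persuasion"),
  gutenberg_author_id = c(1L, 1L, 2L, 1L, 1L),
  language = c("en", "en", "fr", "en", "en"),
  rights = c("Public domain in the USA.", "Public domain in the USA.",
             "Public domain in the USA.", "Copyrighted", "None"),
  has_text = c(TRUE, TRUE, TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# English-language works, with text attached, whose rights value is allowed.
keep <- meta$language == "en" &
  meta$has_text &
  meta$rights %in% c("Public domain in the USA.", "None")
works <- meta[keep, ]

# At most one work per title/author pair, keeping the lowest gutenberg_id.
works <- works[order(works$gutenberg_id), ]
works <- works[!duplicated(works[c("title", "gutenberg_author_id")]), ]

kept_ids <- works$gutenberg_id
kept_ids  # 10
```

Of the five toy rows, 12 is dropped for its language, 13 for its rights value, 14 for lacking text, and 11 as a duplicate of 10.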

Examples

# \donttest{
library(dplyr)

gutenberg_works()
#> # A tibble: 53,840 × 8
#>    gutenberg_id title    author gutenberg_author_id language gutenberg_bookshelf
#>           <int> <chr>    <chr>                <int> <chr>    <chr>              
#>  1            1 "The De… Jeffe…                1638 en       Politics/American …
#>  2            2 "The Un… Unite…                   1 en       Politics/American …
#>  3            3 "John F… Kenne…                1666 en       NA                 
#>  4            4 "Lincol… Linco…                   3 en       US Civil War       
#>  5            5 "The Un… Unite…                   1 en       United States/Poli…
#>  6            6 "Give M… Henry…                   4 en       American Revolutio…
#>  7            7 "The Ma… NA                      NA en       NA                 
#>  8            8 "Abraha… Linco…                   3 en       US Civil War       
#>  9            9 "Abraha… Linco…                   3 en       US Civil War       
#> 10           10 "The Ki… NA                      NA en       Banned Books List …
#> # ℹ 53,830 more rows
#> # ℹ 2 more variables: rights <chr>, has_text <lgl>

# filter conditions
gutenberg_works(author == "Shakespeare, William")
#> # A tibble: 83 × 8
#>    gutenberg_id title    author gutenberg_author_id language gutenberg_bookshelf
#>           <int> <chr>    <chr>                <int> <chr>    <chr>              
#>  1          100 The Com… Shake…                  65 en       Plays              
#>  2         1041 Shakesp… Shake…                  65 en       NA                 
#>  3         1045 Venus a… Shake…                  65 en       NA                 
#>  4         1500 King He… Shake…                  65 en       NA                 
#>  5         1501 History… Shake…                  65 en       NA                 
#>  6         1502 The His… Shake…                  65 en       NA                 
#>  7         1503 The Tra… Shake…                  65 en       NA                 
#>  8         1504 The Com… Shake…                  65 en       NA                 
#>  9         1505 The Rap… Shake…                  65 en       NA                 
#> 10         1507 The Tra… Shake…                  65 en       NA                 
#> # ℹ 73 more rows
#> # ℹ 2 more variables: rights <chr>, has_text <lgl>

# language specifications

gutenberg_works(languages = "es") %>%
  count(language, sort = TRUE)
#> # A tibble: 1 × 2
#>   language     n
#>   <chr>    <int>
#> 1 es         763

gutenberg_works(languages = c("en", "es")) %>%
  count(language, sort = TRUE)
#> # A tibble: 3 × 2
#>   language     n
#>   <chr>    <int>
#> 1 en       53839
#> 2 es         761
#> 3 en/es       15

gutenberg_works(languages = c("en", "es"), all_languages = TRUE) %>%
  count(language, sort = TRUE)
#> # A tibble: 1 × 2
#>   language     n
#>   <chr>    <int>
#> 1 en/es       16

gutenberg_works(languages = c("en", "es"), only_languages = FALSE) %>%
  count(language, sort = TRUE)
#> # A tibble: 35 × 2
#>    language     n
#>    <chr>    <int>
#>  1 en       53839
#>  2 es         761
#>  3 en/la       27
#>  4 en/fr       21
#>  5 en/eo       19
#>  6 en/es       15
#>  7 de/en       11
#>  8 en/zh        5
#>  9 en/it        4
#> 10 ang/en       3
#> # ℹ 25 more rows
# }