Get a table of Project Gutenberg work metadata filtered by some common (settable) defaults, with the option to add further filters. This convenience function covers the usual conditions for pulling a set of books to analyze. For more detailed filtering of the entire Project Gutenberg metadata, use the gutenberg_metadata and related datasets.
Usage
gutenberg_works(
...,
languages = "en",
only_text = TRUE,
rights = c("Public domain in the USA.", "None"),
distinct = TRUE,
all_languages = FALSE,
only_languages = TRUE
)
Arguments
- ...
Additional filters, given as expressions using the variables in the gutenberg_metadata dataset (e.g. author == "Austen, Jane")
- languages
Vector of languages to include
- only_text
Whether the works must have Gutenberg text attached. Works without text (e.g. audiobooks) cannot be downloaded with gutenberg_download
- rights
Values to allow in the rights field. By default, allows public domain in the US or "None", while excluding works under copyright. NULL allows any value of rights
- distinct
Whether to return only one distinct combination of each title and gutenberg_author_id. If several works share a combination (and fulfill the other conditions), the one with the lowest ID is used
- all_languages
Whether, if multiple languages are given, all of them need to be present in a work. For example, if c("en", "fr") are given, whether only en/fr works, as opposed to English or French works, should be returned
- only_languages
Whether to exclude works that have other languages besides the ones provided. For example, whether to include en/fr when English works are requested
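The interaction between languages, all_languages and only_languages can be easier to follow on a toy example. The sketch below is not gutenbergr's implementation: matches_languages is a hypothetical helper written only for illustration, and it assumes the language field joins multiple codes with "/" as in the en/fr example above.

```r
# Hypothetical helper, not part of gutenbergr: decides whether each value of
# a language field (e.g. "en/fr") passes the three language arguments.
matches_languages <- function(language, languages,
                              all_languages = FALSE,
                              only_languages = TRUE) {
  vapply(strsplit(language, "/", fixed = TRUE), function(work_langs) {
    n_requested_present <- sum(languages %in% work_langs)
    # At least one requested language must appear in the work
    keep <- n_requested_present > 0
    # all_languages: every requested language must appear
    if (all_languages) {
      keep <- keep && n_requested_present == length(languages)
    }
    # only_languages: the work may not carry any other language
    if (only_languages) {
      keep <- keep && all(work_langs %in% languages)
    }
    keep
  }, logical(1))
}

field <- c("en", "es", "en/es", "en/fr", "fr")  # invented sample values
requested <- c("en", "es")

res_default <- field[matches_languages(field, requested)]
res_default
# "en" "es" "en/es"

res_all <- field[matches_languages(field, requested, all_languages = TRUE)]
res_all
# "en/es"

res_any <- field[matches_languages(field, requested, only_languages = FALSE)]
res_any
# "en" "es" "en/es" "en/fr"
```

Under these assumptions, only_languages = FALSE is what lets mixed works such as en/fr through, while all_languages = TRUE narrows the result to works carrying every requested language.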
Value
A tbl_df (see the tibble or dplyr packages) with one row for each work, in the same format as gutenberg_metadata.
Details
By default, returns:
- English-language works
- That are in text format in Gutenberg (as opposed to audio)
- Whose text is not under copyright
- At most one distinct work for each title/author pair
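The default conditions above can be mimicked in base R on a small invented table. This is only a sketch under stated assumptions: the toy data frame and its values are made up, the column names follow gutenberg_metadata, and the filtering is an approximation of the described behaviour rather than the package's own code.

```r
# Toy metadata; every row and value here is invented for illustration.
toy <- data.frame(
  gutenberg_id = c(11L, 12L, 13L, 14L, 15L),
  title = c("A Tale", "A Tale", "Other Book", "Audio Book", "Modern Book"),
  gutenberg_author_id = c(1L, 1L, 2L, 3L, 4L),
  language = c("en", "en", "fr", "en", "en"),
  rights = c(rep("Public domain in the USA.", 4), "Copyrighted."),
  has_text = c(TRUE, TRUE, TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

allowed_rights <- c("Public domain in the USA.", "None")

# English-language, text attached, rights in the allowed set
kept <- toy[toy$language == "en" &
              toy$has_text &
              toy$rights %in% allowed_rights, ]

# At most one work per title/author pair: keep the lowest gutenberg_id
kept <- kept[order(kept$gutenberg_id), ]
kept <- kept[!duplicated(kept[c("title", "gutenberg_author_id")]), ]

kept$gutenberg_id
# 11
```

Here work 13 is dropped for its language, 14 for lacking text, 15 for its rights value, and 12 as a duplicate of the title/author pair already represented by 11.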
Examples
# \donttest{
library(dplyr)
gutenberg_works()
#> # A tibble: 53,840 × 8
#> gutenberg_id title author guten…¹ langu…² guten…³ rights has_t…⁴
#> <int> <chr> <chr> <int> <chr> <chr> <chr> <lgl>
#> 1 1 "The Declaration … Jeffe… 1638 en Politi… Publi… TRUE
#> 2 2 "The United State… Unite… 1 en Politi… Publi… TRUE
#> 3 3 "John F. Kennedy'… Kenne… 1666 en NA Publi… TRUE
#> 4 4 "Lincoln's Gettys… Linco… 3 en US Civ… Publi… TRUE
#> 5 5 "The United State… Unite… 1 en United… Publi… TRUE
#> 6 6 "Give Me Liberty … Henry… 4 en Americ… Publi… TRUE
#> 7 7 "The Mayflower Co… NA NA en NA Publi… TRUE
#> 8 8 "Abraham Lincoln'… Linco… 3 en US Civ… Publi… TRUE
#> 9 9 "Abraham Lincoln'… Linco… 3 en US Civ… Publi… TRUE
#> 10 10 "The King James V… NA NA en Banned… Publi… TRUE
#> # … with 53,830 more rows, and abbreviated variable names ¹gutenberg_author_id,
#> # ²language, ³gutenberg_bookshelf, ⁴has_text
# filter conditions
gutenberg_works(author == "Shakespeare, William")
#> # A tibble: 83 × 8
#> gutenberg_id title author guten…¹ langu…² guten…³ rights has_t…⁴
#> <int> <chr> <chr> <int> <chr> <chr> <chr> <lgl>
#> 1 100 The Complete Work… Shake… 65 en Plays Publi… TRUE
#> 2 1041 Shakespeare's Son… Shake… 65 en NA Publi… TRUE
#> 3 1045 Venus and Adonis Shake… 65 en NA Publi… TRUE
#> 4 1500 King Henry VI, Fi… Shake… 65 en NA Publi… TRUE
#> 5 1501 History of King H… Shake… 65 en NA Publi… TRUE
#> 6 1502 The History of Ki… Shake… 65 en NA Publi… TRUE
#> 7 1503 The Tragedy of Ki… Shake… 65 en NA Publi… TRUE
#> 8 1504 The Comedy of Err… Shake… 65 en NA Publi… TRUE
#> 9 1505 The Rape of Lucre… Shake… 65 en NA Publi… TRUE
#> 10 1507 The Tragedy of Ti… Shake… 65 en NA Publi… TRUE
#> # … with 73 more rows, and abbreviated variable names ¹gutenberg_author_id,
#> # ²language, ³gutenberg_bookshelf, ⁴has_text
# language specifications
gutenberg_works(languages = "es") %>%
  count(language, sort = TRUE)
#> # A tibble: 1 × 2
#> language n
#> <chr> <int>
#> 1 es 763
gutenberg_works(languages = c("en", "es")) %>%
  count(language, sort = TRUE)
#> # A tibble: 3 × 2
#> language n
#> <chr> <int>
#> 1 en 53839
#> 2 es 761
#> 3 en/es 15
gutenberg_works(languages = c("en", "es"), all_languages = TRUE) %>%
  count(language, sort = TRUE)
#> # A tibble: 1 × 2
#> language n
#> <chr> <int>
#> 1 en/es 16
gutenberg_works(languages = c("en", "es"), only_languages = FALSE) %>%
  count(language, sort = TRUE)
#> # A tibble: 35 × 2
#> language n
#> <chr> <int>
#> 1 en 53839
#> 2 es 761
#> 3 en/la 27
#> 4 en/fr 21
#> 5 en/eo 19
#> 6 en/es 15
#> 7 de/en 11
#> 8 en/zh 5
#> 9 en/it 4
#> 10 ang/en 3
#> # … with 25 more rows
# }