Search, download, and process public domain texts from the Project Gutenberg collection.
Installation
Install the released version from CRAN:
install.packages("gutenbergr")
Quick Start
Load the package:
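The loading snippet appears to be missing here. A minimal sketch follows; attaching dplyr is an assumption, made because the later example calls count().

```r
# Attach gutenbergr for the search and download functions
library(gutenbergr)

# Assumption: dplyr is attached as well, since count() is used further down
library(dplyr)
```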
We’ll get and set our Project Gutenberg mirror:
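The mirror snippet also appears to be missing. The sketch below is one plausible way to do this, not necessarily the original code: it reads the mirror with gutenberg_get_mirror() and pins one through the gutenberg_mirror option. The mirror URL is an example value chosen for illustration.

```r
library(gutenbergr)

# Look up the mirror the package will download from
mirror <- gutenberg_get_mirror()
mirror

# Assumption: pinning a mirror through the `gutenberg_mirror` option;
# the URL below is an illustrative value, substitute a mirror near you
options(gutenberg_mirror = "https://aleph.pglaf.org")
```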
Search through the metadata to find a book:
gutenberg_works(title == "Persuasion")
#> # A tibble: 1 × 8
#> gutenberg_id title author gutenberg_author_id language
#> <int> <chr> <chr> <int> <fct>
#> 1 105 Persuasion Austen, Jane 68 en
#> gutenberg_bookshelf rights has_text
#> <chr> <fct> <lgl>
#> 1 Category: Novels/Category: British Literature Public domain in the USA. TRUE
Persuasion’s gutenberg_id is 105. We’ll use that ID to download the book. We’ll also set the cache option to "persistent" so that we don’t have to re-download it later.
options(gutenbergr_cache_type = "persistent")
persuasion <- gutenberg_download(105)
persuasion
#> # A tibble: 8,357 × 2
#> gutenberg_id text
#> <int> <chr>
#> 1 105 "Persuasion"
#> 2 105 ""
#> 3 105 ""
#> 4 105 "by Jane Austen"
#> 5 105 ""
#> 6 105 "(1818)"
#> 7 105 ""
#> 8 105 ""
#> 9 105 ""
#> 10 105 ""
#> # ℹ 8,347 more rows
Multiple works can be downloaded at once. We’ll add title data from the metadata.
books <- gutenberg_download(c(105, 161), meta_fields = "title")
books |> count(title)
Vignettes
See the following vignettes for more advanced usage of gutenbergr.
- Getting Started with gutenbergr - explore metadata and download books
- Text Mining with gutenbergr and tidytext - complete analysis workflow with tidytext
FAQ
How were the metadata files generated?
See the data-raw directory for scripts. Metadata was generated from the Project Gutenberg catalog on 11 January 2026.
Do you respect robot access rules?
Yes! The package follows Project Gutenberg’s rules:
- Retrieves books directly from mirrors using the authorized link format
- Prioritizes .zip files to minimize bandwidth
- Supports session and persistent caching
- This package is designed for downloading individual works or small collections, not the entire corpus. For bulk downloads, set up a mirror.
See their Terms of Use for details.
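As a rough illustration of the small-collection usage described above, the sketch below selects a few works from the bundled metadata and downloads only those, with persistent caching so repeated runs do not hit the mirror again. The author filter and the limit of three works are arbitrary choices for the example.

```r
library(gutenbergr)

# Keep downloaded files across sessions to avoid repeat requests
options(gutenbergr_cache_type = "persistent")

# Select candidate works from the metadata shipped with the package
austen <- gutenberg_works(author == "Austen, Jane")

# Limit the request to a handful of IDs (three is an arbitrary example limit)
ids <- head(austen$gutenberg_id, 3)

# Download just those works, attaching the title from the metadata
books <- gutenberg_download(ids, meta_fields = "title")
```

Anything much larger than a handful of works is better served by a local mirror, as noted in the list above.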
Contributing
See CONTRIBUTING.md.
Note that this package is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.
