jst_get_chapters() extracts meta-data from JSTOR-XML files for book chapters.
Value
A tibble containing the extracted meta-data with the following columns:
book_id (chr): The book id of type "jstor", which is not a registered DOI.
file_name (chr): The filename of the original .xml-file. Can be used for joining with other data for the same file.
part_id (chr): The id of the part.
part_label (chr): A label for the part, if specified.
part_title (chr): The title of the part.
part_subtitle (chr): The subtitle of the part, if specified.
authors (list): A list-column with information on the authors. Can be unnested with tidyr::unnest(). See the examples and jst_get_authors().
abstract (chr): The abstract to the part.
part_first_page (chr): The page where the part begins.
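The file_name column is described above as a join key. A minimal sketch of such a join follows, assuming dplyr is installed and that the tibble returned by jst_get_book() has a file_name and a book_title column (the latter is an assumption about that function's output, not something stated on this page):

```r
library(jstor)
library(dplyr)

book_file <- jst_example("book.xml")

# Book-level and chapter-level meta-data for the same file
book <- jst_get_book(book_file)
chapters <- jst_get_chapters(book_file)

# Keep only the join key and one book-level column (book_title is assumed
# to exist) so the result does not duplicate columns such as book_id
book_info <- select(book, file_name, book_title)

# Every chapter row receives the title of the book it belongs to
joined <- left_join(chapters, book_info, by = "file_name")
joined
```

Because the book-level tibble has one row per file, a left join keeps one row per chapter.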
Details
Currently, jst_get_chapters() is considerably slower than most of the other functions. It is roughly 10 times slower than jst_get_book(), depending on the number of chapters to extract.
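One rough way to check the difference on your own files is to time both calls with base R's system.time(). This is only a sketch: timings on a single small example file are noisy, and the ratio depends on the number of chapters.

```r
library(jstor)

book_file <- jst_example("book.xml")

# Elapsed time for book-level extraction
t_book <- system.time(jst_get_book(book_file))

# Elapsed time for chapter-level extraction of the same file
t_chapters <- system.time(jst_get_chapters(book_file))

t_book["elapsed"]
t_chapters["elapsed"]
```

For a more reliable comparison, repeat each call several times or run it over a larger batch of files.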
Examples
# extract parts without authors
jst_get_chapters(jst_example("book.xml"))
#> # A tibble: 36 × 9
#> book_id file_name part_id part_label part_title part_subtitle authors
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 j.ctt24hdz7 book j.ctt24hdz… NA Front Mat… NA NA
#> 2 j.ctt24hdz7 book j.ctt24hdz… NA Table of … NA NA
#> 3 j.ctt24hdz7 book j.ctt24hdz… NA Acronyms … NA NA
#> 4 j.ctt24hdz7 book j.ctt24hdz… NA Authors’ … NA NA
#> 5 j.ctt24hdz7 book j.ctt24hdz… 1. The enigm… NA NA
#> 6 j.ctt24hdz7 book j.ctt24hdz… 2. ‘Anxiety,… Fiji’s road … NA
#> 7 j.ctt24hdz7 book j.ctt24hdz… 3. Fiji’s De… Who, what, w… NA
#> 8 j.ctt24hdz7 book j.ctt24hdz… 4. ‘This pro… The aftermat… NA
#> 9 j.ctt24hdz7 book j.ctt24hdz… 5. The chang… NA NA
#> 10 j.ctt24hdz7 book j.ctt24hdz… 6. The Fiji … Analyzing th… NA
#> # ℹ 26 more rows
#> # ℹ 2 more variables: abstract <chr>, part_first_page <chr>
# import authors too
parts <- jst_get_chapters(jst_example("book.xml"), authors = TRUE)
parts
#> # A tibble: 36 × 9
#> book_id file_name part_id part_label part_title part_subtitle authors
#> <chr> <chr> <chr> <chr> <chr> <chr> <list>
#> 1 j.ctt24hdz7 book j.ctt24hd… NA Front Mat… NA <tibble>
#> 2 j.ctt24hdz7 book j.ctt24hd… NA Table of … NA <tibble>
#> 3 j.ctt24hdz7 book j.ctt24hd… NA Acronyms … NA <tibble>
#> 4 j.ctt24hdz7 book j.ctt24hd… NA Authors’ … NA <tibble>
#> 5 j.ctt24hdz7 book j.ctt24hd… 1. The enigm… NA <tibble>
#> 6 j.ctt24hdz7 book j.ctt24hd… 2. ‘Anxiety,… Fiji’s road … <tibble>
#> 7 j.ctt24hdz7 book j.ctt24hd… 3. Fiji’s De… Who, what, w… <tibble>
#> 8 j.ctt24hdz7 book j.ctt24hd… 4. ‘This pro… The aftermat… <tibble>
#> 9 j.ctt24hdz7 book j.ctt24hd… 5. The chang… NA <tibble>
#> 10 j.ctt24hdz7 book j.ctt24hd… 6. The Fiji … Analyzing th… <tibble>
#> # ℹ 26 more rows
#> # ℹ 2 more variables: abstract <chr>, part_first_page <chr>
tidyr::unnest(parts, cols = c(authors))
#> # A tibble: 39 × 14
#> book_id file_name part_id part_label part_title part_subtitle prefix
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 j.ctt24hdz7 book j.ctt24hdz7… NA Front Mat… NA NA
#> 2 j.ctt24hdz7 book j.ctt24hdz7… NA Table of … NA NA
#> 3 j.ctt24hdz7 book j.ctt24hdz7… NA Acronyms … NA NA
#> 4 j.ctt24hdz7 book j.ctt24hdz7… NA Authors’ … NA NA
#> 5 j.ctt24hdz7 book j.ctt24hdz7… 1. The enigm… NA NA
#> 6 j.ctt24hdz7 book j.ctt24hdz7… 1. The enigm… NA NA
#> 7 j.ctt24hdz7 book j.ctt24hdz7… 2. ‘Anxiety,… Fiji’s road … NA
#> 8 j.ctt24hdz7 book j.ctt24hdz7… 3. Fiji’s De… Who, what, w… NA
#> 9 j.ctt24hdz7 book j.ctt24hdz7… 4. ‘This pro… The aftermat… NA
#> 10 j.ctt24hdz7 book j.ctt24hdz7… 5. The chang… NA NA
#> # ℹ 29 more rows
#> # ℹ 7 more variables: given_name <chr>, surname <chr>, string_name <chr>,
#> # suffix <chr>, author_number <dbl>, abstract <chr>, part_first_page <chr>