jst_get_article() extracts meta-data from JSTOR-XML files for journal articles.





A .xml-file for a journal-article.


A tibble containing the extracted meta-data with the following columns:

  • file_name (chr): The file_name of the original .xml-file. Can be used for joining with other parts (authors, references, footnotes, full-texts).

  • journal_doi (chr): A registered identifier for the journal.

  • journal_jcode (chr): An identifier for the journal, such as "amerjsoci" for the "American Journal of Sociology".

  • journal_pub_id (chr): Similar to journal_jcode. In most files only one of the two is present.

  • journal_title (chr): The title of the journal.

  • article_doi (chr): A registered unique identifier for the article.

  • article_jcode (chr): A unique identifier for the article (not a DOI).

  • article_pub_id (chr): Rarely present. When it is, it contains either part of the DOI or the article_jcode.

  • article_type (chr): The type of article (research-article, book-review, etc.).

  • article_title (chr): The title of the article.

  • volume (chr): The volume the article was published in.

  • issue (chr): The issue the article was published in.

  • language (chr): The language of the article.

  • pub_day (chr): Publication day, if specified.

  • pub_month (chr): Publication month, if specified.

  • pub_year (int): Year of publication.

  • first_page (chr): Page number for the first page of the article.

  • last_page (chr): Page number for the last page of the article.

  • page_range (chr): The range of pages for the article.

A note about publication dates: if a file contains more than one date, only the first entry is extracted, which should correspond to the oldest date.


#> # A tibble: 1 × 19
#>   file_name   journal_doi journal_jcode journal_pub_id journal_title article_doi
#>   <chr>       <chr>       <chr>         <chr>          <chr>         <chr>      
#> 1 article_wi… NA          tranamermicr… NA             Transactions… 10.2307/32…
#> # ℹ 13 more variables: article_pub_id <chr>, article_jcode <chr>,
#> #   article_type <chr>, article_title <chr>, volume <chr>, issue <chr>,
#> #   language <chr>, pub_day <chr>, pub_month <chr>, pub_year <int>,
#> #   first_page <chr>, last_page <chr>, page_range <chr>
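
Because file_name is meant for joining with the other extracted parts, a join might look like the following sketch. It assumes that "article_with_references.xml" is one of the sample files shipped with the package and reachable through jst_example(), and it uses dplyr for the join; neither detail is stated on this page, so adjust the path and the join to your own data.

```r
library(jstor)
library(dplyr)

# Assumption: this sample file ships with jstor; replace it with the path
# to your own JSTOR .xml file if it does not.
path <- jst_example("article_with_references.xml")

# One row of article-level meta-data
article <- jst_get_article(path)

# One row per author of the same article
authors <- jst_get_authors(path)

# file_name is present in both tibbles, so it can serve as the join key
article_authors <- left_join(authors, article, by = "file_name")
```

The result keeps one row per author and repeats the article-level columns on each of those rows. The same pattern should apply to references, footnotes and full texts, since each of those parts carries file_name as well.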