Author: Thomas Klebel
License: GPL v3.0
The tool Data for Research (DfR) by JSTOR is a valuable source for citation analysis and text mining. jstor provides functions and suggests workflows for importing datasets from DfR. It was developed to deal with very large datasets which require an agreement, but can be used with smaller ones as well.
The most important set of functions is a group of jst_get_* functions:
- jst_get_article
- jst_get_authors
- jst_get_references
- jst_get_footnotes
- jst_get_book
- jst_get_chapters
- jst_get_full_text
- jst_get_ngram

All functions which are concerned with metadata (therefore excluding jst_get_full_text and jst_get_ngram) operate along the same lines: the file is read with xml2::read_xml(), the relevant content is extracted, and the result is returned in a tibble.

To install the package use:
install.packages("jstor")
You can install the development version from GitHub with:
# install.packages("remotes")
remotes::install_github("ropensci/jstor")
In order to use jstor, you first need to load it:
library(jstor)
library(magrittr) # provides the %>% pipe used in the examples below
The basic usage is simple: supply one of the jst_get_*-functions with a path and it will return a tibble with the extracted information.
jst_get_article(jst_example("article_with_references.xml")) %>% knitr::kable()
| file_name | journal_doi | journal_jcode | journal_pub_id | journal_title | article_doi | article_pub_id | article_jcode | article_type | article_title | volume | issue | language | pub_day | pub_month | pub_year | first_page | last_page | page_range |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| article_with_references | NA | tranamermicrsoci | NA | Transactions of the American Microscopical Society | 10.2307/3221896 | NA | NA | research-article | On the Protozoa Parasitic in Frogs | 41 | 2 | eng | 1 | 4 | 1922 | 59 | 76 | 59-76 |
jst_get_authors(jst_example("article_with_references.xml")) %>% knitr::kable()
| file_name | prefix | given_name | surname | string_name | suffix | author_number |
|---|---|---|---|---|---|---|
| article_with_references | NA | R. | Kudo | NA | NA | 1 |
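Because both results share the file_name column, they can be combined into one table. The following is a minimal sketch, assuming the dplyr package is installed; the object names are chosen for illustration only.

```r
library(jstor)
library(dplyr)

# Bundled example file that ships with jstor
path <- jst_example("article_with_references.xml")

article <- jst_get_article(path)
authors <- jst_get_authors(path)

# One row per author, with the article metadata repeated alongside.
article_authors <- left_join(authors, article, by = "file_name")

# Keep a few columns for a compact overview
article_authors %>%
  select(file_name, given_name, surname, article_title, pub_year)
```

Joining on file_name keeps one row per author, which is usually the convenient shape for articles with several authors.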
Further explanations, especially on how to use jstor’s functions for importing many files, can be found in the vignettes.
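For a handful of files, a plain loop over the paths is often enough. The sketch below is only an illustration under that assumption, not the workflow recommended for very large datasets; it reuses the bundled example file twice as a stand-in for real data and assumes dplyr is installed.

```r
library(jstor)

# Stand-in for a real set of files. With your own data this might be
# list.files("path/to/metadata", pattern = "\\.xml$", full.names = TRUE),
# where the directory name is hypothetical.
files <- rep(jst_example("article_with_references.xml"), 2)

# Parse each file and stack the one-row results into a single tibble.
articles <- dplyr::bind_rows(lapply(files, jst_get_article))

nrow(articles)
```

This approach reads files one after another and stops at the first file that fails to parse, so it is best suited to small, clean samples.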
In order to use jstor, you need some data from DfR. From the main page you can create a dataset by searching for terms and restricting the search by time, subject and content type. After you have created an account, you can download your selection. Alternatively, you can download sample datasets with documents from before 1923 for the US, and before 1870 for all other countries.
In their technical specifications, DfR lists fields which should be reliably present in all articles and books.
The following table gives an overview of which elements are supported by jstor.
| xml-field | reliably present | supported in jstor |
|---|---|---|
| journal-id (type=“jstor”) | x | x |
| journal-id (type=“publisher-id”) | x | x |
| journal-id (type=“doi”) | x | |
| issn | x | |
| journal-title | x | x |
| publisher-name | x | |
| article-id (type=“doi”) | x | x |
| article-id (type=“jstor”) | x | x |
| article-id (type=“publisher-id”) | x | |
| article-type | x | |
| volume | x | |
| issue | x | |
| article-categories | x | |
| article-title | x | x |
| contrib-group | x | x |
| pub-date | x | x |
| fpage | x | x |
| lpage | x | |
| page-range | x | |
| product | x | |
| self-uri | x | |
| kwd-group | x | |
| custom-meta-group | x | x |
| fn-group (footnotes) | x | |
| ref-list (references) | x |
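To make the field names in the table more concrete, the following sketch pulls a few of them out of a small XML fragment with xml2 and XPath and returns them as a tibble. It is not jstor's actual implementation. The fragment is hand-written and heavily simplified, the attribute names (journal-id-type, pub-id-type) are assumptions modeled on the JATS tag set, and extract_first() is a hypothetical helper that is not part of jstor.

```r
library(xml2)
library(tibble)

# Hand-written, simplified fragment; real DfR files contain many more elements.
xml_string <- '<article article-type="research-article">
  <front>
    <journal-meta>
      <journal-id journal-id-type="jstor">tranamermicrsoci</journal-id>
      <journal-title-group>
        <journal-title>Transactions of the American Microscopical Society</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.2307/3221896</article-id>
      <title-group>
        <article-title>On the Protozoa Parasitic in Frogs</article-title>
      </title-group>
      <fpage>59</fpage>
    </article-meta>
  </front>
</article>'

# Step 1: parse the XML (a literal string here, a file path in practice)
doc <- read_xml(xml_string)

# Hypothetical helper: text of the first matching node, NA if there is none
extract_first <- function(doc, xpath) {
  xml_text(xml_find_first(doc, xpath))
}

# Steps 2 and 3: extract fields via XPath and collect them in a tibble
result <- tibble(
  journal_jcode = extract_first(doc, ".//journal-id[@journal-id-type='jstor']"),
  journal_title = extract_first(doc, ".//journal-title"),
  article_doi   = extract_first(doc, ".//article-id[@pub-id-type='doi']"),
  article_title = extract_first(doc, ".//article-title"),
  first_page    = extract_first(doc, ".//fpage"),
  last_page     = extract_first(doc, ".//lpage") # absent in the fragment, so NA
)

result
```

Returning NA for a missing element, rather than failing, matters for fields that are not reliably present in every file.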
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
To cite jstor, please refer to the output of citation(package = "jstor").
Work on jstor benefited from financial support for the project “Academic Super-Elites in Sociology and Economics” by the Austrian Science Fund (FWF), project number “P 29211 Einzelprojekte”.
Some internal functions regarding file paths and example files were adapted from the package readr.