
pangoling: Access to Large Language Model Predictions
Source: R/pangoling-package.R
Provides access to word predictability estimates using large language models (LLMs) based on 'transformer' architectures via integration with the 'Hugging Face' ecosystem. The package interfaces with pre-trained neural networks and supports both causal/auto-regressive LLMs (e.g., 'GPT-2'; Radford et al., 2019) and masked/bidirectional LLMs (e.g., 'BERT'; Devlin et al., 2019, doi:10.48550/arXiv.1810.04805) to compute the probability of words, phrases, or tokens given their linguistic context. By enabling straightforward estimation of word predictability, the package facilitates research in psycholinguistics, computational linguistics, and natural language processing (NLP).
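A quick sketch of typical usage follows; it assumes the package's causal_words_pred() and masked_targets_pred() functions and their argument names, which may differ across package versions, and note that model weights are downloaded from 'Hugging Face' on first use.

library(pangoling)

# Log-probability of each word given the preceding words,
# using the default causal model ("gpt2"); `x` and `by` are
# the assumed argument names for the words and their grouping:
causal_words_pred(
  x  = c("The", "apple", "doesn't", "fall", "far", "from", "the", "tree."),
  by = rep(1, 8)  # all words belong to the same sentence
)

# Log-probability of a target word given its bidirectional context,
# using the default masked model ("bert-base-uncased"):
masked_targets_pred(
  prev_contexts  = "The apple doesn't fall far from the",
  targets        = "tree",
  after_contexts = "."
)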
Details
These options control various aspects of the pangoling package. Users can customize them via the options() function by specifying pangoling.<option> names.
pangoling.debug: Logical; when TRUE, enables debugging mode. Default is FALSE.

pangoling.verbose: Integer; controls the verbosity level (e.g., 0 = silent, 1 = minimal, 2 = detailed). Default is 2.

pangoling.log.p: Logical; if TRUE (the default), pangoling outputs log-transformed probabilities with base e; if FALSE, the output is raw probabilities. Alternatively, log.p can be the base of another logarithmic transformation (e.g., base 1/2, to get surprisal values in bits rather than predictability).

pangoling.cache: A cache object created with cachem::cache_mem(), allowing you to specify the maximum size (in bytes) for cached objects. Default is 1024 * 1024^2 bytes (1 GB).

pangoling.causal.default: Character string; specifies the default model for causal language processing. Default is "gpt2".

pangoling.masked.default: Character string; specifies the default model for masked language processing. Default is "bert-base-uncased".
Use options(pangoling.<option> = <value>) to set these options in your session.
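For instance, to silence messages and report surprisal in bits instead of base-e log-probabilities (log base 1/2 of p equals -log2(p)):

options(pangoling.verbose = 0)  # 0 = silent
options(pangoling.log.p = 1/2)  # surprisal in bits: -log2(p)
getOption("pangoling.log.p")    # inspect the current value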
Author
Maintainer: Bruno Nicenboim b.nicenboim@tilburguniversity.edu (ORCID)
Other contributors:
Chris Emmerly [contributor]
Giovanni Cassani [contributor]
Lisa Levinson [reviewer]
Utku Turk [reviewer]