pangoling 1.0.3
CRAN release: 2025-04-07
- Internal changes to comply with CRAN requirements.
- HF_HOME is now used to store the models instead of TRANSFORMERS_CACHE.
pangoling 1.0.1
New Features
- Added installed_py_pangoling() to check whether the required Python dependencies (transformers and torch) are installed.
pangoling 1.0.0
- Changed the ownership of the repository to rOpenSci.
- Deprecated functions are now defunct and have been replaced by their respective alternative functions.
pangoling 0.0.0.9011
- Added a word_n argument to causal_words_pred() to indicate the word order of the texts.
- Added support for models whose vocabulary is larger than the tokenizer's.
pangoling 0.0.0.9010
New Features:
- Added a checkpoint parameter to causal_preload() and masked_preload() to allow loading models from checkpoints.
- Introduced causal_next_tokens_pred_tbl(), which replaces causal_next_tokens_tbl() and provides improved predictability calculations.
- Added causal_words_pred(), causal_targets_pred(), and causal_tokens_pred_lst() to compute predictability for words, phrases, or tokens, replacing causal_lp() and causal_tokens_lp_tbl().
- Introduced masked_tokens_pred_tbl(), replacing masked_tokens_tbl(), for retrieving possible tokens and their log probabilities.
- Introduced masked_targets_pred(), replacing masked_lp(), for calculating predictability based on left and right context.
- Introduced transformer_vocab() with an optional decode parameter to return decoded tokenized words.
- New dataset df_jaeger14: self-paced reading data on Chinese relative clauses.
- New dataset df_sent: example dataset with two word-by-word sentences.
- New vignette: a worked-out example of a causal model.
Enhancements:
- Added a sep argument to causal_words_pred() to support languages without spaces between words (e.g., Chinese).
- New log.p argument across multiple functions to specify how predictability is calculated (e.g., log base e, log base 2 for bits, or raw probabilities).
- Improved tokenization utilities: tokenize_lst() now supports decoded outputs via the decode parameter.
- Updated install_py_pangoling() to improve handling of the Python environment.
- Added perplexity_calc() for computing perplexity from probabilities.
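The log.p and perplexity_calc() entries above rest on standard conversions between probability scales. The sketch below illustrates them in base R under stated assumptions: it does not call pangoling, the helper perplexity_from_logp() is hypothetical, and the per-token log probabilities are toy values chosen for easy arithmetic. pangoling's own perplexity_calc() may differ in its arguments and defaults.

```r
# Base R only; does NOT call pangoling. Toy input: per-token
# natural-log probabilities (log base e), as with log.p on base e.
lp_e <- log(c(0.5, 0.25, 0.125))

# Convert natural-log values to base-2 logs (bits) and to raw probabilities.
lp_2 <- lp_e / log(2)   # about -1, -2, -3
prob <- exp(lp_e)       # about 0.5, 0.25, 0.125

# Hypothetical helper: perplexity is the base raised to the mean
# negative log probability, so it is the same whichever base is used,
# provided `base` matches the base of the logs in `lp`.
perplexity_from_logp <- function(lp, base = exp(1)) {
  base^(-mean(lp))
}

perplexity_from_logp(lp_e)            # 4 (up to floating-point rounding)
perplexity_from_logp(lp_2, base = 2)  # same value computed from bits
```

Both calls agree because changing the base of the logarithm only rescales the log probabilities, and raising the matching base undoes that rescaling.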
Deprecations:
- Deprecated causal_next_tokens_tbl(), causal_lp(), causal_tokens_lp_tbl(), and causal_lp_mats(). Use causal_next_tokens_pred_tbl(), causal_targets_pred(), causal_words_pred(), and causal_pred_mats() instead.
- Deprecated masked_tokens_tbl() and masked_lp(). Use masked_tokens_pred_tbl() and masked_targets_pred() instead.
pangoling 0.0.0.9007
- Added the set_cache_folder() function.
- A message is now displayed when the package loads.
- New troubleshooting vignette.