Changelog

pangoling 1.0.3

CRAN release: 2025-04-07

Internal changes to comply with CRAN requirements.
Models are now stored in the location given by HF_HOME rather than TRANSFORMERS_CACHE.
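
As a rough illustration of the HF_HOME change, the sketch below points the variable at a directory from within an R session, using base R only. The directory name is hypothetical, and it is an assumption that the variable has to be set before the Python backend is first initialized; set_cache_folder(), listed under 0.0.0.9007, is the package's own route.

```r
# Hypothetical cache location; any writable directory would do.
cache_dir <- file.path(tempdir(), "hf-cache")
dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)

# Set the variable for the current R session and its child processes.
# Assumption: this happens before the Python backend is initialized.
Sys.setenv(HF_HOME = cache_dir)

# Check what the session now reports.
hf_home <- Sys.getenv("HF_HOME")
```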

pangoling 1.0.2

Internal changes: OMP_THREAD_LIMIT was set to 1.

pangoling 1.0.1

New Features

Added installed_py_pangoling() to check if required Python dependencies (transformers and torch) are installed.

Other changes

An informative startup message is shown if the Python dependencies are not installed.
Documentation examples won’t run if the Python dependencies are not installed.
Articles are now pre-computed vignettes. See

pangoling 1.0.0

Changed the ownership of the repo to ropensci.
Deprecated functions are now defunct and have been replaced by their respective alternatives.

pangoling 0.0.0.9011

Added word_n argument in causal_words_pred() to indicate word order of the texts.
Models whose vocabulary is larger than the tokenizer's are now allowed.

pangoling 0.0.0.9010

New Features:

Added checkpoint parameter to causal_preload() and masked_preload() to allow loading models from checkpoints.
Introduced causal_next_tokens_pred_tbl(), which replaces causal_next_tokens_tbl() and provides improved predictability calculations.
Added causal_words_pred(), causal_targets_pred(), and causal_tokens_pred_lst() to compute predictability for words, phrases, or tokens, replacing causal_lp() and causal_tokens_lp_tbl().
Introduced masked_tokens_pred_tbl(), replacing masked_tokens_tbl(), for retrieving possible tokens and their log probabilities.
Introduced masked_targets_pred(), replacing masked_lp(), for calculating predictability based on left and right context.
Introduced transformer_vocab() with an optional decode parameter to return decoded tokenized words.
New dataset df_jaeger14: Self-paced reading data on Chinese relative clauses.
New dataset df_sent: Example dataset with two word-by-word sentences.
New vignette: Added a worked-out example of a causal model.

Enhancements:

Added sep argument in causal_words_pred() to support languages without spaces between words (e.g., Chinese).
New log.p argument across multiple functions to specify how predictability is calculated (e.g., log base e, log base 2 for bits, or raw probabilities).
Improved tokenization utilities: tokenize_lst() now supports decoded outputs via the decode parameter.
Updated install_py_pangoling() to enhance Python environment handling.
Added perplexity_calc() for computing perplexity from probabilities.
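
The arithmetic behind the log.p options and perplexity can be sketched in base R. This is not the package's implementation and calls no pangoling functions: the probabilities are made-up values, and the perplexity formula is the standard definition (exponentiated negative mean log probability), which perplexity_calc() is assumed to follow.

```r
# Made-up token probabilities, for illustration only.
p <- c(0.5, 0.25, 0.125, 0.125)

# Predictability as natural-log probabilities (log base e).
lp_e <- log(p)

# The same values in bits (log base 2): divide by log(2).
lp_bits <- lp_e / log(2)

# Back to raw probabilities.
p_raw <- exp(lp_e)

# Perplexity: exponentiated negative mean of the natural-log probabilities.
ppl <- exp(-mean(lp_e))
```

The result does not depend on the log base as long as the exponentiation uses the same base: 2^(-mean(lp_bits)) gives the same perplexity.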

Deprecations:

Deprecated causal_next_tokens_tbl(), causal_lp(), causal_tokens_lp_tbl(), and causal_lp_mats(). Use causal_next_tokens_pred_tbl(), causal_targets_pred(), causal_words_pred(), and causal_pred_mats() instead.
Deprecated masked_tokens_tbl() and masked_lp(). Use masked_tokens_pred_tbl() and masked_targets_pred() instead.

pangoling 0.0.0.9009

Deprecated .by in favor of by.

pangoling 0.0.0.9008

Fixed a bug that occurred when .by is unordered.

pangoling 0.0.0.9007

set_cache_folder() function added.
A message is now shown when the package loads.
New troubleshooting vignette.

pangoling 0.0.0.9006

causal_lp() gets an l_contexts argument.
Checkpoints work for causal models (not yet for masked models).
Ropensci badge added.

pangoling 0.0.0.9005

Strings with no tokens no longer throw errors.
Requires correct version of R.

pangoling 0.0.0.9004

Causal models accept batches.

pangoling 0.0.0.9003

Fixed a bug in causal_tokens_lp_tbl().

pangoling 0.0.0.9002

Minor changes to function names to avoid conflicts with other packages.

pangoling 0.0.0.9001

Tons of stuff. Fully functional package now.

pangoling 0.0.0.9000

First release!
