pangoling 1.0.3
CRAN release: 2025-04-07
- Internal changes to comply with CRAN requirements.
- HF_HOME is now used instead of TRANSFORMERS_CACHE to set where models are stored.
pangoling 1.0.1
New Features
- Added installed_py_pangoling() to check if required Python dependencies (transformers and torch) are installed.
pangoling 1.0.0
- Changed the ownership of the repo to rOpenSci.
- Deprecated functions are now defunct and have been replaced with their respective alternatives.
pangoling 0.0.0.9011
- Added word_n argument in causal_words_pred() to indicate the word order of the texts.
- Allows for models with a larger vocabulary than their tokenizer.
pangoling 0.0.0.9010
New Features:
- Added checkpoint parameter to causal_preload() and masked_preload() to allow loading models from checkpoints.
- Introduced causal_next_tokens_pred_tbl(), which replaces causal_next_tokens_tbl() and provides improved predictability calculations.
- Added causal_words_pred(), causal_targets_pred(), and causal_tokens_pred_lst() to compute predictability for words, phrases, or tokens, replacing causal_lp() and causal_tokens_lp_tbl().
- Introduced masked_tokens_pred_tbl(), replacing masked_tokens_tbl(), for retrieving possible tokens and their log probabilities.
- Introduced masked_targets_pred(), replacing masked_lp(), for calculating predictability based on left and right context.
- Introduced transformer_vocab() with an optional decode parameter to return decoded tokenized words.
- New dataset df_jaeger14: Self-paced reading data on Chinese relative clauses.
- New dataset df_sent: Example dataset with two word-by-word sentences.
- New vignette: Added a worked-out example of a causal model.
Enhancements:
- Added sep argument in causal_words_pred() to support languages without spaces between words (e.g., Chinese).
- New log.p argument across multiple functions to specify how predictability is calculated (e.g., log base e, log base 2 for bits, or raw probabilities).
- Improved tokenization utilities: tokenize_lst() now supports decoded outputs via the decode parameter.
- Updated install_py_pangoling() to enhance Python environment handling.
- Added perplexity_calc() for computing perplexity from probabilities.
Deprecations:
- Deprecated causal_next_tokens_tbl(), causal_lp(), causal_tokens_lp_tbl(), and causal_lp_mats(). Use causal_next_tokens_pred_tbl(), causal_targets_pred(), causal_words_pred(), and causal_pred_mats() instead.
- Deprecated masked_tokens_tbl() and masked_lp(). Use masked_tokens_pred_tbl() and masked_targets_pred() instead.
pangoling 0.0.0.9007
- set_cache_folder() function added.
- Message when the package loads.
- New troubleshooting vignette.
