
Load pre-computed data for a specified corpus. Data types are:

  • "embeddings" for language model embeddings;

  • "idfs" for Inverse Document Frequency weightings;

  • "functions" for frequency tables for text descriptions of function calls; or

  • "calls" for frequency tables for actual function calls.

This function is called within the main pkgmatch_similar_pkgs and pkgmatch_similar_fns functions to load the data they require, and should not generally need to be called explicitly.

Usage

pkgmatch_load_data(
  what = "embeddings",
  corpus = "ropensci",
  fns = FALSE,
  raw = FALSE
)

Arguments

what

One of the four data types described above: "embeddings", "idfs", "functions", or "calls".

corpus

Must be specified as one of "ropensci" or "cran". If embeddings or idfs parameters are not specified, they will be automatically downloaded for the corpus specified by this parameter. The function will then return the most similar package from the specified corpus. Note that calculations with corpus = "cran" will generally take longer, because the corpus is much larger.

fns

If FALSE (default), load embeddings for all packages; otherwise load the (considerably larger) dataset of embeddings for all individual functions.

raw

Only has effect if what = "calls", in which case the default of FALSE loads a single Inverse Document Frequency table for the entire corpus; otherwise, if TRUE, loads raw function call counts for each package in the corpus.

Value

The loaded data.

Examples

if (FALSE) { # \dontrun{
embeddings <- pkgmatch_load_data ("embeddings")
embeddings_fns <- pkgmatch_load_data ("embeddings", fns = TRUE)
idfs <- pkgmatch_load_data ("idfs")
idfs_fns <- pkgmatch_load_data ("idfs", fns = TRUE)
} # }
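
The examples above load one data type at a time. As a minimal sketch, the four data types could also be collected into a single named list for one corpus. The helper name load_all_types and its loader argument are hypothetical and not part of pkgmatch; loader defaults to pkgmatch_load_data, and is only there so that the helper can be tried with a stub function, without downloading anything.

```r
# Hypothetical helper (not part of pkgmatch): load all four data types for one
# corpus into a named list. `loader` is assumed to accept `what` and `corpus`
# arguments, as pkgmatch_load_data does. The default is evaluated lazily, so
# pkgmatch is only needed when no other loader is supplied.
load_all_types <- function (corpus = "ropensci",
                            loader = pkgmatch::pkgmatch_load_data) {
    # Restrict to the two corpora named in the documentation
    corpus <- match.arg (corpus, c ("ropensci", "cran"))
    types <- c ("embeddings", "idfs", "functions", "calls")
    # One call per data type, each with the same corpus
    out <- lapply (types, function (w) loader (what = w, corpus = corpus))
    names (out) <- types
    out
}
```

With pkgmatch installed, load_all_types ("ropensci") would trigger the same downloads as four separate calls to pkgmatch_load_data, so the note above about corpus = "cran" taking longer applies here as well.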