
Find R packages matching an input of either text or another package
Source: R/similar-pkgs.R
pkgmatch_similar_pkgs.Rd
This function accepts as input either a text description, the
name of a locally-installed package, or a path to a local directory containing an R package.
It ranks all R packages within the specified corpus in terms of how well they
match that input. The "corpus" argument can specify rOpenSci's package suite, CRAN, or Bioconductor.
Ranks are derived from "Best Match 25" (BM25) scores, which are based on document token frequencies.
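To give a rough sense of how BM25 scoring works, the following is a minimal base-R sketch on a toy corpus. It is not the implementation used by this package: the package names (`pkgA` to `pkgC`), the `tokenize()` and `bm25_scores()` helpers, the parameter values `k1 = 1.2` and `b = 0.75`, and the non-negative IDF variant are all assumptions chosen for illustration.

```r
# Toy corpus: one text "document" per hypothetical package.
docs <- c (
    pkgA = "http client for web requests and http headers",
    pkgB = "parse json data from web apis",
    pkgC = "linear models and statistical tests"
)

# Lower-case the text and split on anything that is not a letter or digit.
tokenize <- function (txt) {
    tokens <- strsplit (tolower (txt), "[^a-z0-9]+") [[1]]
    tokens [nzchar (tokens)]
}

# BM25 score of each document against a query (illustrative only).
bm25_scores <- function (query, docs, k1 = 1.2, b = 0.75) {
    doc_tokens <- lapply (docs, tokenize)
    doc_len <- lengths (doc_tokens)
    avg_len <- mean (doc_len)
    n_docs <- length (docs)
    q_tokens <- unique (tokenize (query))

    scores <- vapply (seq_along (doc_tokens), function (i) {
        total <- 0
        for (term in q_tokens) {
            # Term frequency in this document
            tf <- sum (doc_tokens [[i]] == term)
            # Number of documents containing the term
            n_t <- sum (vapply (doc_tokens, function (d) term %in% d, logical (1)))
            # Inverse document frequency (non-negative variant)
            idf <- log ((n_docs - n_t + 0.5) / (n_t + 0.5) + 1)
            # Length-normalised term contribution
            denom <- tf + k1 * (1 - b + b * doc_len [[i]] / avg_len)
            total <- total + idf * tf * (k1 + 1) / denom
        }
        total
    }, numeric (1))
    names (scores) <- names (docs)
    scores
}

scores <- bm25_scores ("http web client", docs)
# Convert scores to ranks, with 1 being the best match.
ranks <- rank (-scores, ties.method = "min")
```

Documents sharing more of the rarer query tokens score higher, and documents sharing none score zero.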
Ranks are generally obtained in two ways: by matching against the full text of each package in the specified corpus, including all long-form documentation, and by matching against package descriptions only. The function returns a single rank that combines these individual ranks using the Reciprocal Rank Fusion (RRF) algorithm.
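The general idea of Reciprocal Rank Fusion can be sketched in a few lines of base R. This is an illustration only, not this package's code: the `rrf_combine()` helper, the package names, and the rank values are made up, and the constant `k = 60` is a commonly used default that may differ from what the package uses.

```r
# Combine several named rank vectors into one rank via Reciprocal Rank Fusion.
# Each package's fused score is the sum over rankings of 1 / (k + rank).
rrf_combine <- function (rank_list, k = 60) {
    pkgs <- names (rank_list [[1]])
    rrf <- vapply (pkgs, function (p) {
        sum (vapply (rank_list, function (r) 1 / (k + r [[p]]), numeric (1)))
    }, numeric (1))
    # Higher fused score means better match, so rank on the negated score.
    rank (-rrf, ties.method = "min")
}

# Hypothetical ranks from matching full text and descriptions only.
rank_full_text <- c (pkgA = 1, pkgB = 2, pkgC = 3, pkgD = 4)
rank_descr <- c (pkgA = 3, pkgB = 1, pkgC = 2, pkgD = 4)

combined <- rrf_combine (list (rank_full_text, rank_descr))
```

Here `pkgB` ends up first because it ranks highly in both lists, whereas `pkgA` is first in one list but only third in the other.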
Finally, all components of this function are locally cached for each call
(by the memoise package), so additional calls to this function with
the same input and corpus should be much faster than initial calls.
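The effect of this kind of caching can be illustrated with a small base-R sketch. It does not use or reproduce the memoise package itself (which offers `memoise::memoise()` for this purpose); the `simple_memo()` and `slow_fn()` helpers are hypothetical stand-ins, and the sketch only handles a single character-string argument.

```r
# Wrap a function so that results are stored per input string.
simple_memo <- function (f) {
    cache <- new.env (parent = emptyenv ())
    function (key) {
        if (!exists (key, envir = cache, inherits = FALSE)) {
            # First call with this input: compute and store the result.
            assign (key, f (key), envir = cache)
        }
        get (key, envir = cache, inherits = FALSE)
    }
}

# Stand-in for an expensive computation; counts how often it actually runs.
n_calls <- 0
slow_fn <- function (input) {
    n_calls <<- n_calls + 1
    nchar (input)
}

fast_fn <- simple_memo (slow_fn)
first <- fast_fn ("curl") # Computed
second <- fast_fn ("curl") # Returned from the cache
```

After both calls, `n_calls` is still 1, because the second call never reaches `slow_fn()`.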
Arguments
- input
Either a text string, a path to local source code of an R package, or the name of any installed R package.
- corpus
Must be specified as one of "ropensci", "cran", or "bioc" (for Bioconductor). If the
idfs parameter is not specified, data will be automatically downloaded for the corpus specified by this parameter. The function will then return the most similar packages from the specified corpus. Note that calculations with corpus = "cran" will generally take longer, because the corpus is much larger.
- idfs
Inverse Document Frequency tables for a suite of packages, generated from pkgmatch_bm25. If not provided, pre-generated IDF tables will be downloaded and stored in a local cache directory.
- n
When the result of this function is printed to screen, the top n packages will be displayed.
- browse
If TRUE, automatically open webpages of the top n matches in local browser.
Value
A data.frame with a "package" column naming packages, and a
column of package ranks, with 1 being most similar. For the CRAN corpus, a
column of package versions is also included.
The returned object has a default print method which prints the best 5
matches directly to the screen, yet returns information on all packages
within the specified corpus. There is also a head method to print the
first few entries of these full data (default n = 5). To see all data, use
as.data.frame().
Note
The first time this function is run without passing idfs, required
values will be automatically downloaded and stored in a locally persistent
cache directory. Especially for the "cran" corpus, this downloading may take
quite some time.
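The message printed in the example further down this page mentions an environment variable named 'PKGMATCH_CACHE_DIR' for choosing the cache path. A minimal sketch of setting it from within R follows. The directory name is hypothetical, a temporary directory is used here only to keep the sketch side-effect free (a real setup would point to a persistent location), and it is an assumption that the variable should be set before the first call to this function.

```r
# Hypothetical cache location; replace with a persistent path for real use.
cache_dir <- file.path (tempdir (), "pkgmatch_cache")
dir.create (cache_dir, showWarnings = FALSE, recursive = TRUE)

# Set the environment variable for the current R session.
Sys.setenv (PKGMATCH_CACHE_DIR = cache_dir)

# Confirm the value that will be seen by code reading the variable.
current_cache <- Sys.getenv ("PKGMATCH_CACHE_DIR")
```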
See also
Other main:
pkgmatch_similar_fns()
Examples
# The following function simulates remote data in temporary directory, to
# enable package usage without downloading. Do not run for normal usage.
generate_pkgmatch_example_data ()
#> This function resets the cache directory used by 'pkgmatch'
#> to a temporary path. To restore functionality with full data,
#> you'll either need to restart your R session, or set an
#> environment variable named 'PKGMATCH_CACHE_DIR' to the
#> desired path. Default path is /tmp/RtmpUOjjE7/pkgmatch_ex_data
input <- "curl" # Name of a single installed package
p <- pkgmatch_similar_pkgs (input, corpus = "cran")
p # Default print method, lists 5 best matching packages
#> [1] "httr" "RCurl" "pkgcache" "crul" "httr2"
head (p) # Shows first 5 rows of full `data.frame` object
#> package version rank
#> 1 httr 1.4.7 1
#> 2 RCurl 1.98 2
#> 3 pkgcache 2.2.3 3
#> 4 crul 1.5.0 4
#> 5 httr2 1.1.2 5