The “pkgmatch” package is a search and matching engine for R
packages. It finds the R packages that best match an input, which can be either a
text description or a local path to an R package. pkgmatch
was developed to enable rOpenSci to identify packages similar to each
new package submitted for our software peer-review
scheme. Matching packages can be found either in rOpenSci’s own package suite,
or among all packages currently on
CRAN.
What does the package do?
What the package does is best understood by example, starting with loading the package:
library (pkgmatch)
Then match packages to an input string:
input <- "genomics and transcriptomics sequence data"
pkgmatch_similar_pkgs (input, corpus = "ropensci")
#> [1] "onekp"     "biomartr"  "bold"      "restez"    "charlatan"
By default, the top five matching packages are printed to the screen.
The function actually returns information on all packages, and a
head method displays the first few rows:
p <- pkgmatch_similar_pkgs (input, corpus = "ropensci")
head (p)
#>     package rank
#> 1     onekp    1
#> 2  biomartr    2
#> 3      bold    3
#> 4    restez    4
#> 5 charlatan    5
The head method also accepts an n parameter
to control how many rows are displayed. Alternatively, as.data.frame()
can be used to see the entire data.frame of results.
The following lines find equivalent matches against all packages currently on CRAN:
pkgmatch_similar_pkgs (input, corpus = "cran")
#> [1] "gggenomes"          "singleCellHaystack" "NewmanOmics"
#> [4] "biomartr"           "bioseq"
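The results from the two corpora can also be compared by hand. The following base R sketch copies the two character vectors from the output printed above; they are not re-computed here, and live results may differ as the corpora change. The variable names are illustrative only and are not part of pkgmatch.

```r
# Top-five matches copied from the printed output above
ropensci_top <- c ("onekp", "biomartr", "bold", "restez", "charlatan")
cran_top <- c (
    "gggenomes", "singleCellHaystack", "NewmanOmics", "biomartr", "bioseq"
)

# Packages which appear among the top matches in both corpora
in_both <- intersect (ropensci_top, cran_top)
in_both
```

For the lists shown here, the only package common to both is biomartr, which is an rOpenSci package that is also on CRAN.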
Using an R package as input
The package also accepts as input a path to a local R package. The
following code downloads a “tarball” (.tar.gz file) from
CRAN and finds matching packages from that corpus. We of course expect
the best matches against CRAN packages to include that package
itself:
u <- "https://cran.r-project.org/src/contrib/Archive/odbc/odbc_1.5.0.tar.gz"
destfile <- file.path (tempdir (), basename (u))
download.file (u, destfile = destfile, quiet = TRUE)
pkgmatch_similar_pkgs (destfile, corpus = "cran")
#> $text
#> [1] "swagger"     "odbc"        "MM"          "RODBC"       "datrProfile"
#>
#> $code
#> [1] "waterYearType" "odbc"          "paperplanes"   "RODBC"
#> [5] "italy"
which they indeed do: odbc ranks second in both lists. As explained in the documentation, the
pkgmatch_similar_pkgs() function ranks final results from
document token-frequency analyses. Separate rankings are produced for the
text and code components of a package, and these
can be seen, as above, with the head method:
p <- pkgmatch_similar_pkgs (destfile, corpus = "cran")
head (p)
#>       package   version text_rank code_rank
#> 1     swagger 5.17.14.1         1     21085
#> 2        odbc     1.6.1         2         2
#> 3          MM     1.6-8         3     11137
#> 4       RODBC    1.3-26         4         4
#> 5 datrProfile     0.1.0         5         6
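The two rank columns can be inspected together. As a minimal sketch, the following base R code rebuilds the five rows printed above as a plain data.frame and orders them by the mean of text_rank and code_rank. The mean is an assumption made purely for illustration, and is not necessarily how pkgmatch combines its components; the mean_rank column is likewise hypothetical.

```r
# Rows copied from the head (p) output above
res <- data.frame (
    package = c ("swagger", "odbc", "MM", "RODBC", "datrProfile"),
    text_rank = c (1, 2, 3, 4, 5),
    code_rank = c (21085, 2, 11137, 4, 6),
    stringsAsFactors = FALSE
)

# Hypothetical combined score: the simple mean of the two ranks
res$mean_rank <- rowMeans (res [, c ("text_rank", "code_rank")])

# A lower mean rank indicates a better match on both components
res <- res [order (res$mean_rank), ]
res$package
```

With these five rows, odbc and RODBC come first because they rank highly on both text and code, while swagger drops to last because of its very low code rank.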
