The “pkgmatch” package is a search and matching engine for R packages. It finds the R packages that best match an input given as either a text description or a local path to an R package. pkgmatch was developed to enable rOpenSci to identify packages similar to each new package submitted to our software peer-review scheme. Matching packages can be found either in rOpenSci’s own package suite, or among all packages currently on CRAN.

What does the package do?

What the package does is best understood by example, starting with loading the package.

library (pkgmatch)

Then match packages to an input string:

input <- "genomics and transcriptomics sequence data"
pkgmatch_similar_pkgs (input, corpus = "ropensci")
#> [1] "onekp"     "biomartr"  "bold"      "restez"    "charlatan"

By default, only the names of the top five matching packages are printed to the screen. The returned object nonetheless holds information on all packages, and has a head method to display the first few rows:

p <- pkgmatch_similar_pkgs (input, corpus = "ropensci")
head (p)
#>     package rank
#> 1     onekp    1
#> 2  biomartr    2
#> 3      bold    3
#> 4    restez    4
#> 5 charlatan    5

The head method also accepts an n parameter to control how many rows are displayed. Alternatively, as.data.frame can be used to see the entire data.frame of results.
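
The way an object can print only five names yet still carry every row can be pictured with a small base R sketch. The class name demo_matches and the placeholder package names are hypothetical, and this is not a description of how pkgmatch is implemented internally; the sketch only assumes an S3 class layered over a data.frame.

```r
# Hypothetical S3 class wrapping a data.frame of ranked results.
# Package names are placeholders, not real search results.
res <- data.frame (
    package = paste0 ("pkg", 1:8),
    rank = 1:8
)
class (res) <- c ("demo_matches", "data.frame")

# Printing shows only the names of the top five matches.
print.demo_matches <- function (x, ...) {
    print (x$package [seq_len (min (5L, nrow (x)))])
    invisible (x)
}

# head() drops the custom class so rows display as a plain data.frame.
head.demo_matches <- function (x, n = 5L, ...) {
    class (x) <- "data.frame"
    utils::head (x, n = n, ...)
}

res                          # prints names of the top five only
top3 <- head (res, n = 3)    # first three rows as a data.frame
full <- as.data.frame (res)  # all eight rows as a data.frame
```

Under these assumptions, the full results are always available, and the print method merely chooses a compact view of them.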

The following lines find equivalent matches against all packages currently on CRAN:

pkgmatch_similar_pkgs (input, corpus = "cran")
#> [1] "gggenomes"          "singleCellHaystack" "NewmanOmics"       
#> [4] "biomartr"           "bioseq"

Using an R package as input

The package also accepts as input a path to a local R package. The following code downloads a “tarball” (.tar.gz file) from CRAN and finds matching packages from that corpus. We of course expect the best matches against CRAN packages to include that package itself:

u <- "https://cran.r-project.org/src/contrib/Archive/odbc/odbc_1.5.0.tar.gz"
destfile <- file.path (tempdir (), basename (u))
download.file (u, destfile = destfile, quiet = TRUE)
pkgmatch_similar_pkgs (destfile, corpus = "cran")
#> $text
#> [1] "swagger"     "odbc"        "MM"          "RODBC"       "datrProfile"
#> 
#> $code
#> [1] "waterYearType" "odbc"          "paperplanes"   "RODBC"        
#> [5] "italy"

which they indeed do. As explained in the documentation, the pkgmatch_similar_pkgs() function ranks final results from document token-frequency analyses. When the input is a package, matches are ranked separately on text and on code, as in the $text and $code lists above. The rank of each package on both components can be seen with the head method:

p <- pkgmatch_similar_pkgs (destfile, corpus = "cran")
head (p)
#>       package   version text_rank code_rank
#> 1     swagger 5.17.14.1         1     21085
#> 2        odbc     1.6.1         2         2
#> 3          MM     1.6-8         3     11137
#> 4       RODBC    1.3-26         4         4
#> 5 datrProfile     0.1.0         5         6
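
To give a feel for what ranking by token frequencies can look like, here is a minimal base R sketch using plain TF-IDF scoring. TF-IDF is chosen here only for illustration and is an assumption; the text above says no more than "document token-frequency analyses", and the actual scoring used by pkgmatch may well differ. The toy corpus, its package names, and the tokenize helper are all hypothetical.

```r
# Toy corpus: names and descriptions are invented for illustration only.
corpus <- c (
    pkgA = "read and write genomics sequence data",
    pkgB = "connect to databases through odbc drivers",
    pkgC = "download transcriptomics sequence data from public archives"
)
query <- "genomics and transcriptomics sequence data"

# Split text into lower-case word tokens (hypothetical helper).
tokenize <- function (x) {
    tokens <- strsplit (tolower (x), "[^a-z0-9]+") [[1]]
    tokens [nzchar (tokens)]
}

doc_tokens <- lapply (corpus, tokenize)
query_tokens <- unique (tokenize (query))

# Inverse document frequency: tokens found in fewer documents weigh more.
n_docs <- length (doc_tokens)
idf <- vapply (query_tokens, function (tok) {
    n_with <- sum (vapply (doc_tokens, function (d) tok %in% d, logical (1)))
    log ((n_docs + 1) / (n_with + 1)) + 1
}, numeric (1))

# Score each document: sum of (relative term frequency * idf) over query tokens.
scores <- vapply (doc_tokens, function (d) {
    tf <- vapply (query_tokens, function (tok) sum (d == tok), numeric (1))
    sum (tf / length (d) * idf)
}, numeric (1))

# Convert scores to ranks, with rank 1 as the best match.
ranked <- data.frame (
    package = names (scores),
    score = unname (scores),
    rank = rank (-scores, ties.method = "first"),
    row.names = NULL
)
ranked <- ranked [order (ranked$rank), ]
```

The sketch does no stop-word removal, so common words such as "and" contribute to the score; a real implementation would likely treat such tokens more carefully.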