Converts a comparison matrix generated by pairwise_compare into a data frame of candidates for matches.

Usage

pairwise_candidates(m, directional = FALSE)

Arguments

m

A matrix from pairwise_compare.

directional

Whether the comparison was directional. Set this to the same value that was used in the call to pairwise_compare that produced m.

Value

A data frame containing all the non-NA values from m. Columns a and b are the IDs from the original corpus as passed to the comparison function. Column score is the score returned by the comparison function.
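The reshaping described above can be sketched in base R without the package. This is a minimal illustration, not the package's implementation: the matrix, its document IDs (doc_a, doc_b, doc_c), and its scores are made up, and it assumes that pairs which were not compared are stored as NA.

```r
# Hypothetical 3 x 3 comparison matrix; the IDs and scores are invented.
ids <- c("doc_a", "doc_b", "doc_c")
m <- matrix(NA_real_, nrow = 3, ncol = 3, dimnames = list(ids, ids))
m["doc_a", "doc_b"] <- 0.10
m["doc_a", "doc_c"] <- 0.50
m["doc_b", "doc_c"] <- 0.02

# Locate every non-NA cell as a (row, col) index pair.
idx <- which(!is.na(m), arr.ind = TRUE)

# Build the long-format data frame: one row per scored pair.
candidates <- data.frame(
  a = rownames(m)[idx[, "row"]],
  b = colnames(m)[idx[, "col"]],
  score = m[idx],
  stringsAsFactors = FALSE
)

# Sort by a, then b, and reset the row names for readability.
candidates <- candidates[order(candidates$a, candidates$b), ]
rownames(candidates) <- NULL
candidates
```

Each non-NA cell becomes one row, with the row name in column a and the column name in column b. A matrix with scores in both triangles would yield one row per ordered pair, as in the directional example on this page.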

Examples

dir <- system.file("extdata/legal", package = "textreuse")
corpus <- TextReuseCorpus(dir = dir)

m1 <- pairwise_compare(corpus, ratio_of_matches, directional = TRUE)
pairwise_candidates(m1, directional = TRUE)
#> # A tibble: 6 × 3
#>   a              b                score
#> * <chr>          <chr>            <dbl>
#> 1 ca1851-match   ca1851-nomatch 0.0140 
#> 2 ca1851-match   ny1850-match   0.695  
#> 3 ca1851-nomatch ca1851-match   0.00550
#> 4 ca1851-nomatch ny1850-match   0.00508
#> 5 ny1850-match   ca1851-match   0.737  
#> 6 ny1850-match   ca1851-nomatch 0.0140 

m2 <- pairwise_compare(corpus, jaccard_similarity)
pairwise_candidates(m2)
#> # A tibble: 3 × 3
#>   a              b                score
#> * <chr>          <chr>            <dbl>
#> 1 ca1851-match   ca1851-nomatch 0.00353
#> 2 ca1851-match   ny1850-match   0.535  
#> 3 ca1851-nomatch ny1850-match   0.00331
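
One possible next step is to keep only the pairs whose score clears some cutoff. The sketch below rebuilds the non-directional result above by hand as a plain data frame (scores copied from the rounded printed output) so that it runs without the package. The 0.1 threshold is an arbitrary assumption, not a recommended value.

```r
# Scores copied from the rounded output printed above.
candidates <- data.frame(
  a = c("ca1851-match", "ca1851-match", "ca1851-nomatch"),
  b = c("ca1851-nomatch", "ny1850-match", "ny1850-match"),
  score = c(0.00353, 0.535, 0.00331),
  stringsAsFactors = FALSE
)

# Hypothetical cutoff; choose one suited to the corpus and comparison function.
threshold <- 0.1

# Keep pairs scoring above the cutoff, highest score first.
likely <- candidates[candidates$score > threshold, ]
likely <- likely[order(likely$score, decreasing = TRUE), ]
likely
```

With these values, only the ca1851-match and ny1850-match pair remains.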