Converts a comparison matrix generated by pairwise_compare into a data frame of candidates for matches.
Arguments
- m
A matrix from pairwise_compare.
- directional
Should be set to the same value as in pairwise_compare.
Value
A data frame containing all the non-NA values from m. Columns a and b are the IDs from the original corpus as passed to the comparison function. Column score is the score returned by the comparison function.
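To make the shape of the return value concrete, the following is a minimal base R sketch of how the non-NA cells of a comparison matrix map to rows of a, b, and score. It is an illustration only, not the package's implementation: the matrix, the document IDs (doc_a, doc_b, doc_c), and the scores are all made up, and it assumes the matrix carries the document IDs as row and column names.

```r
# Hypothetical 3 x 3 comparison matrix; IDs and scores are invented for
# illustration. Only the upper triangle is filled; every other cell is NA.
ids <- c("doc_a", "doc_b", "doc_c")
m <- matrix(NA_real_, nrow = 3, ncol = 3, dimnames = list(ids, ids))
m["doc_a", "doc_b"] <- 0.10
m["doc_a", "doc_c"] <- 0.50
m["doc_b", "doc_c"] <- 0.25

# Find the (row, column) position of every non-NA cell.
idx <- which(!is.na(m), arr.ind = TRUE)

# Each non-NA cell becomes one row: row ID, column ID, and the score.
candidates <- data.frame(
  a = rownames(m)[idx[, "row"]],
  b = colnames(m)[idx[, "col"]],
  score = m[idx],
  stringsAsFactors = FALSE
)

# Sort by a, then b, and drop the leftover row names.
candidates <- candidates[order(candidates$a, candidates$b), ]
rownames(candidates) <- NULL
candidates
```

In this sketch three cells are filled, so the result has three rows. A matrix with scores in both triangles would yield one row per ordered pair instead, which is consistent with the six-row and three-row results in the examples below.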
Examples
dir <- system.file("extdata/legal", package = "textreuse")
corpus <- TextReuseCorpus(dir = dir)
m1 <- pairwise_compare(corpus, ratio_of_matches, directional = TRUE)
pairwise_candidates(m1, directional = TRUE)
#> # A tibble: 6 × 3
#>   a              b              score
#> * <chr>          <chr>          <dbl>
#> 1 ca1851-match   ca1851-nomatch 0.0140
#> 2 ca1851-match   ny1850-match   0.695
#> 3 ca1851-nomatch ca1851-match   0.00550
#> 4 ca1851-nomatch ny1850-match   0.00508
#> 5 ny1850-match   ca1851-match   0.737
#> 6 ny1850-match   ca1851-nomatch 0.0140
m2 <- pairwise_compare(corpus, jaccard_similarity)
pairwise_candidates(m2)
#> # A tibble: 3 × 3
#>   a              b              score
#> * <chr>          <chr>          <dbl>
#> 1 ca1851-match   ca1851-nomatch 0.00353
#> 2 ca1851-match   ny1850-match   0.535
#> 3 ca1851-nomatch ny1850-match   0.00331