Given a TextReuseCorpus containing documents of class
TextReuseTextDocument, this function applies a comparison
function to every pairing of documents, and returns a matrix with the
comparison scores.
Usage
pairwise_compare(corpus, f, ..., directional = FALSE, progress = interactive())
Arguments
- corpus
A TextReuseCorpus.
- f
The function to apply to each pair of documents, x and y.
- ...
Additional arguments passed to f.
- directional
Some comparison functions are commutative, so that f(a, b) == f(b, a) (e.g., jaccard_similarity). Other functions are directional, so that f(a, b) measures a's borrowing from b, which may not equal f(b, a) (e.g., ratio_of_matches). If directional is FALSE, only the minimum number of comparisons is made, i.e., the upper triangle of the matrix. If directional is TRUE, both directions are measured. In no case are documents compared to themselves, so the diagonal of the matrix is always NA.
- progress
Display a progress bar while comparing documents.
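The difference between the two comparison modes can be sketched in base R. This is not the textreuse implementation: pairwise_sketch(), token_jaccard() and token_containment() are hypothetical helpers, and the "documents" are plain character vectors of tokens rather than TextReuseTextDocument objects. It only illustrates which cells get filled, and how the extra arguments in ... are forwarded to f.

```r
# Hypothetical sketch, not the textreuse implementation.
# Documents are plain character vectors of tokens.

# Commutative: token_jaccard(a, b) == token_jaccard(b, a)
token_jaccard <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

# Directional: share of a's unique tokens that also appear in b
token_containment <- function(a, b) {
  length(intersect(a, b)) / length(unique(a))
}

pairwise_sketch <- function(docs, f, ..., directional = FALSE) {
  n <- length(docs)
  m <- matrix(NA_real_, nrow = n, ncol = n,
              dimnames = list(names(docs), names(docs)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (i == j) next                   # never compare a document to itself
      if (!directional && j < i) next    # non-directional: upper triangle only
      m[i, j] <- f(docs[[i]], docs[[j]], ...)  # cell holds f(row, column)
    }
  }
  m
}

docs <- list(
  a = c("the", "quick", "brown", "fox"),
  b = c("the", "quick", "fox"),
  c = c("lorem", "ipsum")
)

m1 <- pairwise_sketch(docs, token_jaccard)
m2 <- pairwise_sketch(docs, token_containment, directional = TRUE)
m1
m2
```

In m2, cell ["a", "b"] is 0.75 (three of a's four tokens appear in b) while ["b", "a"] is 1, which is why a directional function needs both triangles.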
Value
A square matrix with dimensions equal to the length of the corpus,
and row and column names set by the names of the documents in the corpus. A
value of NA in the matrix indicates that a comparison was not made.
For directional comparisons, each cell reports f(row, column).
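Because unmeasured cells are NA, the matrix can be flattened into a list of compared pairs with base R. The helper comparisons_to_pairs() below is hypothetical, not part of textreuse; the scores are copied from the non-directional example further down so that the sketch runs without the package installed.

```r
# Hypothetical helper, not part of textreuse: turn a score matrix
# (NA = no comparison made) into one row per measured pair.
comparisons_to_pairs <- function(m) {
  idx <- which(!is.na(m), arr.ind = TRUE)  # row/column indices of measured cells
  data.frame(
    a = rownames(m)[idx[, "row"]],
    b = colnames(m)[idx[, "col"]],
    score = m[idx],                        # cell value, i.e. f(a, b)
    stringsAsFactors = FALSE
  )
}

# Upper-triangle scores taken from the jaccard_similarity example
ids <- c("ca1851-match", "ca1851-nomatch", "ny1850-match")
m <- matrix(NA_real_, 3, 3, dimnames = list(ids, ids))
m["ca1851-match", "ca1851-nomatch"] <- 0.003529412
m["ca1851-match", "ny1850-match"]   <- 0.534753363
m["ca1851-nomatch", "ny1850-match"] <- 0.003307607

pairs <- comparisons_to_pairs(m)
pairs[order(-pairs$score), ]
```

For a directional matrix the same helper returns both orderings of each pair, with score equal to f(a, b).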
See also
See these document comparison functions: jaccard_similarity, ratio_of_matches.
Examples
library(textreuse)
dir <- system.file("extdata/legal", package = "textreuse")
corpus <- TextReuseCorpus(dir = dir)
names(corpus) <- filenames(names(corpus))
# A non-directional comparison
pairwise_compare(corpus, jaccard_similarity)
#>                ca1851-match ca1851-nomatch ny1850-match
#> ca1851-match             NA    0.003529412  0.534753363
#> ca1851-nomatch           NA             NA  0.003307607
#> ny1850-match             NA             NA           NA
# A directional comparison
pairwise_compare(corpus, ratio_of_matches, directional = TRUE)
#>                ca1851-match ca1851-nomatch ny1850-match
#> ca1851-match             NA     0.01395349  0.695431472
#> ca1851-nomatch  0.005502063             NA  0.005076142
#> ny1850-match    0.737276479     0.01395349           NA
