Skip to contents

All functions

TextReuseCorpus() is.TextReuseCorpus() skipped()
TextReuseCorpus
tokens() `tokens<-`() hashes() `hashes<-`() minhashes() `minhashes<-`()
Accessors for TextReuse objects
TextReuseTextDocument() is.TextReuseTextDocument() has_content() has_tokens() has_hashes() has_minhashes()
TextReuseTextDocument
align_local()
Local alignment of natural language texts
as.matrix(<textreuse_candidates>)
Convert candidates data frames to other formats
filenames()
Filenames from paths
hash_string()
Hash a string to an integer
lsh()
Locality sensitive hashing for minhash
lsh_candidates()
Candidate pairs from LSH comparisons
lsh_compare()
Compare candidates identified by LSH
lsh_probability() lsh_threshold()
Probability that a candidate pair will be detected with LSH
lsh_query()
Query a LSH cache for matches to a single document
lsh_subset()
List of all candidates in a corpus
minhash_generator()
Generate a minhash function
pairwise_candidates()
Candidate pairs from pairwise comparisons
pairwise_compare()
Pairwise comparisons among documents in a corpus
rehash()
Recompute the hashes for a document or corpus
jaccard_similarity() jaccard_dissimilarity() jaccard_bag_similarity() ratio_of_matches()
Measure similarity/dissimilarity in documents
textreuse textreuse-package
textreuse: Detect Text Reuse and Document Similarity
tokenize()
Recompute the tokens for a document or corpus
tokenize_words() tokenize_sentences() tokenize_ngrams() tokenize_skip_ngrams()
Split texts into tokens
wordcount()
Count words