Package index • textreuse

All functions

TextReuseCorpus() is.TextReuseCorpus() skipped(): TextReuseCorpus

tokens() `tokens<-`() hashes() `hashes<-`() minhashes() `minhashes<-`(): Accessors for TextReuse objects

TextReuseTextDocument() is.TextReuseTextDocument() has_content() has_tokens() has_hashes() has_minhashes(): TextReuseTextDocument

align_local(): Local alignment of natural language texts

as.matrix(<textreuse_candidates>): Convert candidates data frames to other formats

filenames(): Filenames from paths

hash_string(): Hash a string to an integer

lsh(): Locality sensitive hashing for minhash

lsh_candidates(): Candidate pairs from LSH comparisons

lsh_compare(): Compare candidates identified by LSH

lsh_probability() lsh_threshold(): Probability that a candidate pair will be detected with LSH

lsh_query(): Query a LSH cache for matches to a single document

lsh_subset(): List of all candidates in a corpus

minhash_generator(): Generate a minhash function

pairwise_candidates(): Candidate pairs from pairwise comparisons

pairwise_compare(): Pairwise comparisons among documents in a corpus

rehash(): Recompute the hashes for a document or corpus

jaccard_similarity() jaccard_dissimilarity() jaccard_bag_similarity() ratio_of_matches(): Measure similarity/dissimilarity in documents

textreuse textreuse-package: textreuse: Detect Text Reuse and Document Similarity

tokenize(): Recompute the tokens for a document or corpus

tokenize_words() tokenize_sentences() tokenize_ngrams() tokenize_skip_ngrams(): Split texts into tokens

wordcount(): Count words