For all sequences in one or more clusters, calculate the frequency of individual words in either the sequence definitions or the reported feature names.
Usage
calc_wrdfrq(
  phylota,
  cid,
  min_frq = 0.1,
  min_nchar = 1,
  type = c("dfln", "nm"),
  ignr_pttrn = "[^a-z0-9]"
)
Arguments
- phylota
Phylota object
- cid
Cluster ID(s)
- min_frq
Minimum frequency for a word to be reported
- min_nchar
Minimum number of characters for a word
- type
Text to search: sequence definitions ("dfln") or feature names ("nm")
- ignr_pttrn
Regular expression matching the text to ignore
Details
By default, any character that is not alphanumeric is ignored. The type values 'dfln' and 'nm' match the slot names in a SeqRec; see list_seqrec_slots().
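The effect of the ignore pattern can be reproduced on plain strings with base R. The following is only a sketch of the idea, not the package's implementation: the helper `word_freqs()` and the two definition lines are made up for illustration, and it assumes that text is split on whitespace and lower-cased before the pattern is applied, and that words with a frequency of at least `min_frq` are kept.

```r
# Hypothetical helper, not part of phylotaR: word frequencies for a
# character vector of definition lines.
word_freqs <- function(txt, min_frq = 0.1, min_nchar = 1,
                       ignr_pttrn = "[^a-z0-9]") {
  # split on whitespace and lower-case (assumed preprocessing)
  wrds <- tolower(unlist(strsplit(txt, split = "\\s+")))
  # remove every character matched by the ignore pattern
  wrds <- gsub(pattern = ignr_pttrn, replacement = "", x = wrds)
  # drop words that are too short, including those reduced to ""
  wrds <- wrds[nchar(wrds) >= min_nchar]
  # share of all remaining words taken up by each distinct word
  frq <- table(wrds) / length(wrds)
  # keep words at or above the threshold (assumed to be inclusive)
  frq <- frq[frq >= min_frq]
  sort(frq, decreasing = TRUE)
}

# made-up definition lines, for illustration only
dflns <- c("Macromia sp. elongation factor 1-alpha gene, partial cds",
           "Petalura sp. histone H3 gene, partial cds")
word_freqs(dflns, min_frq = 0.1)
```

In this sketch punctuation is stripped ("gene," becomes "gene", "1-alpha" becomes "1alpha"), so the four words shared by both lines (cds, gene, partial, sp) each account for 2 of the 15 words and pass the 0.1 threshold, while words that occur once do not.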
See also
Other tools-public: calc_mad(), drop_by_rank(), drop_clstrs(), drop_sqs(), get_clstr_slot(), get_nsqs(), get_ntaxa(), get_sq_slot(), get_stage_times(), get_tx_slot(), get_txids(), is_txid_in_clstr(), is_txid_in_sq(), list_clstrrec_slots(), list_ncbi_ranks(), list_seqrec_slots(), list_taxrec_slots(), plot_phylota_pa(), plot_phylota_treemap(), read_phylota(), write_sqs()
Examples
data('dragonflies')
# use word frequencies to work out which gene region each cluster likely represents
random_cids <- sample(dragonflies@cids, 10)
# most frequent words in definition line
(calc_wrdfrq(phylota = dragonflies, cid = random_cids, type = 'dfln'))
#> $`129`
#> named numeric(0)
#>
#> $`551`
#> wrds
#> rrna and
#> 0.1578947 0.1052632
#>
#> $`485`
#> named numeric(0)
#>
#> $`148`
#> sequence
#> 0.15
#>
#> $`426`
#> named numeric(0)
#>
#> $`404`
#> wrds
#> cds gene histone isolate partial petalura
#> 0.125 0.125 0.125 0.125 0.125 0.125
#>
#> $`576`
#> wrds
#> rrna and
#> 0.1666667 0.1111111
#>
#> $`615`
#> named numeric(0)
#>
#> $`689`
#> wrds
#> alpha cds elongation factor gene macromia partial
#> 0.1071429 0.1071429 0.1071429 0.1071429 0.1071429 0.1071429 0.1071429
#>
#> $`735`
#> named numeric(0)
#>
# most frequent words in feature name
(calc_wrdfrq(phylota = dragonflies, cid = random_cids, type = 'nm'))
#> $`129`
#> numeric(0)
#>
#> $`551`
#> wrds
#> internal spacer transcribed
#> 0.3333333 0.3333333 0.3333333
#>
#> $`485`
#> numeric(0)
#>
#> $`148`
#> wrds
#> internal spacer transcribed
#> 0.3333333 0.3333333 0.3333333
#>
#> $`426`
#> wrds
#> rhlwc1 rhlwd1 rhlwe1 rhlwf1 rhlwf2 rhlwf3 rhlwf4
#> 0.1428571 0.1428571 0.1428571 0.1428571 0.1428571 0.1428571 0.1428571
#>
#> $`404`
#> numeric(0)
#>
#> $`576`
#> wrds
#> internal spacer transcribed
#> 0.3333333 0.3333333 0.3333333
#>
#> $`615`
#> numeric(0)
#>
#> $`689`
#> numeric(0)
#>
#> $`735`
#> numeric(0)
#>
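As the output above shows, the result is a list with one element per cluster ID, each holding named word frequencies that may be empty when no word passes `min_frq`. A minimal sketch of reducing such a list to the single most frequent word per cluster follows. The list `res` is built by hand from two of the results printed above so that the snippet runs without the package, and `top_word()` is a hypothetical helper, not a phylotaR function.

```r
# hand-built stand-in for the return value, values copied from the output above
res <- list(
  `129` = numeric(0),
  `551` = c(rrna = 0.1578947, and = 0.1052632)
)

# hypothetical helper: most frequent word, or NA if no word was reported
top_word <- function(frqs) {
  if (length(frqs) == 0) {
    return(NA_character_)
  }
  names(frqs)[which.max(frqs)]
}

# one word (or NA) per cluster ID
vapply(res, top_word, character(1))
```

Returning `NA` for empty elements keeps the result the same length as the input, so it can be lined up against the cluster IDs directly.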