This vignette covers the second goal of distionary: to
evaluate any representation or property of a probability distribution,
even when it is not specified in the distribution’s definition.
Distributional Representations
A distributional representation is a mathematical function that completely defines a probability distribution. Unlike a simple property (such as the mean or variance), a representation contains enough information that any other property or representation can be calculated from it.
The key innovation in distionary is that these
representations are interconnected through a network of relationships,
allowing you to specify a distribution using any available
representation and automatically derive others as needed. For example,
if you specify only a CDF, distionary can compute the
quantile function, mean, variance, and other properties.
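To make the idea of deriving one representation from another concrete, here is a minimal base R sketch that recovers quantiles from a CDF by numerical root-finding. It is only an illustration: `quantile_from_cdf()` is a hypothetical helper, not a distionary function, and the package may well use a different algorithm internally.

```r
# Hypothetical helper (not part of distionary): invert a CDF numerically
# by solving cdf(x) - p = 0 for x on the interval [lower, upper].
quantile_from_cdf <- function(cdf, p, lower, upper) {
  vapply(p, function(prob) {
    stats::uniroot(
      function(x) cdf(x) - prob,
      lower = lower, upper = upper, tol = 1e-10
    )$root
  }, numeric(1))
}

# Suppose only the CDF of an Exponential(rate = 2) distribution is known.
cdf_exp <- function(x) stats::pexp(x, rate = 2)

p <- c(0.1, 0.5, 0.9)
q_numeric <- quantile_from_cdf(cdf_exp, p, lower = 0, upper = 50)
q_exact <- stats::qexp(p, rate = 2)

# The two columns should agree up to the root-finding tolerance.
data.frame(p = p, numeric = q_numeric, exact = q_exact)
```

The search interval is an assumption of this sketch: it must bracket every requested quantile, which is easy for a known distribution but needs more care in general.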
Here is a list of representations recognised by
distionary, and the functions for accessing them.
| Representation | distionary Functions |
|---|---|
| Cumulative Distribution Function (CDF) | eval_cdf(), enframe_cdf() |
| Survival Function | eval_survival(), enframe_survival() |
| Quantile Function | eval_quantile(), enframe_quantile() |
| Hazard Function | eval_hazard(), enframe_hazard() |
| Cumulative Hazard Function | eval_chf(), enframe_chf() |
| Probability Density Function | eval_density(), enframe_density() |
| Probability Mass Function (PMF) | eval_pmf(), enframe_pmf() |
| Odds Function | eval_odds(), enframe_odds() |
| Return Level Function | eval_return(), enframe_return() |
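The representations in this table are linked by simple identities. The base R sketch below works through the conventional textbook definitions using the stats package only. Treat the odds and return level formulas as assumptions about how distionary defines them (PMF divided by one minus PMF, and the quantile at 1 - 1/T for return period T) rather than as a statement of the package's implementation.

```r
rate <- 2
x <- c(0.5, 1, 2)

# Start from the CDF and density of an Exponential(rate) distribution.
cdf <- stats::pexp(x, rate)
density <- stats::dexp(x, rate)

# Derived representations, using standard identities.
survival <- 1 - cdf           # S(x) = 1 - F(x)
chf <- -log(survival)         # H(x) = -log S(x)
hazard <- density / survival  # h(x) = f(x) / S(x)

# Return level for return period T (assumed definition: Q(1 - 1/T)).
return_period <- c(10, 100)
return_level <- stats::qexp(1 - 1 / return_period, rate)

# Odds for a discrete distribution (assumed definition: p / (1 - p)).
pmf_geom <- stats::dgeom(0:2, prob = 0.6)
odds_geom <- pmf_geom / (1 - pmf_geom)

data.frame(x = x, survival = survival, chf = chf, hazard = hazard)
```

For the Exponential distribution the hazard is constant and equal to the rate, and the cumulative hazard is `rate * x`, which gives a quick sanity check on the identities.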
Every representation can be accessed with the
eval_*() family of functions, which return a vector of the
evaluated representation.
d1 <- dst_geom(0.6)
eval_pmf(d1, at = 0:5)
#> [1] 0.600000 0.240000 0.096000 0.038400 0.015360 0.006144
Alternatively, the enframe_*() family of functions
provides the results in a tibble or data frame paired with the inputs,
useful in a data wrangling workflow.
enframe_pmf(d1, at = 0:5)
#> # A tibble: 6 × 2
#> .arg pmf
#> <int> <dbl>
#> 1 0 0.6
#> 2 1 0.24
#> 3 2 0.096
#> 4 3 0.0384
#> 5 4 0.0154
#> 6 5 0.00614
The enframe_*() functions accept multiple distributions
and return one column per distribution. The
column names can be changed in three ways:
- The input column .arg can be renamed with the arg_name argument.
- The pmf prefix on the evaluation columns can be changed with the fn_prefix argument.
- The distribution names can be changed by assigning name-value pairs for the input distributions.
Let’s practice this with the addition of a second distribution.
d2 <- dst_geom(0.4)
enframe_pmf(
model1 = d1, model2 = d2, at = 0:5,
arg_name = "num_failures", fn_prefix = "probability"
)
#> # A tibble: 6 × 3
#> num_failures probability_model1 probability_model2
#> <int> <dbl> <dbl>
#> 1 0 0.6 0.4
#> 2 1 0.24 0.24
#> 3 2 0.096 0.144
#> 4 3 0.0384 0.0864
#> 5 4 0.0154 0.0518
#> 6 5 0.00614 0.0311
Drawing a random sample
To draw a random sample from a distribution, use the
realise() or realize() function:
realise(d1, n = 5)
You can read this call as “realise distribution d1 five
times”. By default, n is set to 1, so that realising
converts a distribution to a numeric draw:
realise(d1)
#> [1] 0
Random sampling belongs to the same family as the
p*/d*/q*/r* functions from the stats package
(e.g., rnorm()), but a sample is not a distributional
representation, so realise() has no eval_*() or
enframe_*() counterpart: it is impossible to
perfectly describe a distribution based on a finite sample.
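A quick way to see why a sample cannot stand in for a representation is to compare empirical frequencies with the true PMF. The sketch below uses base R's `rgeom()` and `dgeom()` rather than distionary, and the seed and sample size are arbitrary choices made for illustration.

```r
set.seed(1)  # arbitrary seed, used only for reproducibility

# Draw 50 values from a Geometric(0.6) distribution.
draws <- stats::rgeom(50, prob = 0.6)

# Empirical relative frequencies at 0, 1, ..., 5 versus the true PMF.
empirical_pmf <- vapply(0:5, function(k) mean(draws == k), numeric(1))
true_pmf <- stats::dgeom(0:5, prob = 0.6)

data.frame(x = 0:5, empirical = empirical_pmf, true = true_pmf)
```

With 50 draws every empirical frequency is a multiple of 1/50, so it can never equal a value such as 0.096 exactly. A larger sample narrows the gap but does not remove it.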
Properties of Distributions
distionary distinguishes between distributional
representations (which fully define a distribution) and
distributional properties (which are characteristics that can
be computed from representations).
A distribution property is any measurable characteristic that can be calculated from a distribution’s representation. Unlike representations, properties do not contain enough information to fully reconstruct the distribution. For example, knowing the mean and variance of a distribution doesn’t tell you whether it’s a Normal, Gamma, or some other distribution family. Properties include statistical moments and other summary measures.
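As a rough illustration of both points, the base R sketch below computes the mean and variance of a Geometric(0.6) distribution directly from its PMF, then shows that a Normal distribution with the same mean and variance is still a different distribution. The truncation of the support at 200 is an assumption of the sketch (the omitted tail mass is negligible), and this is not necessarily how distionary performs the computation.

```r
p <- 0.6
support <- 0:200  # truncated support; the tail beyond 200 is negligible

# Properties computed from a representation (the PMF).
pmf <- stats::dgeom(support, prob = p)
mean_from_pmf <- sum(support * pmf)
var_from_pmf <- sum((support - mean_from_pmf)^2 * pmf)

# Closed-form values for the geometric distribution (number of failures).
mean_exact <- (1 - p) / p
var_exact <- (1 - p) / p^2

# Same mean and variance, different distribution: compare P(X <= 0).
prob_geom <- stats::pgeom(0, prob = p)
prob_norm <- stats::pnorm(0, mean = mean_exact, sd = sqrt(var_exact))

c(mean = mean_from_pmf, variance = var_from_pmf)
c(geometric = prob_geom, normal = prob_norm)
```

The two probabilities in the last line differ substantially, which is the sense in which a mean and variance alone do not pin down a distribution.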
Below is a table of the properties incorporated in
distionary, and the corresponding functions for accessing
them.
| Property | distionary Function |
|---|---|
| Mean | mean() |
| Median | median() |
| Variance | variance() |
| Standard Deviation | sd() |
| Skewness | skewness() |
| Excess Kurtosis | kurtosis_exc() |
| Kurtosis | kurtosis() |
| Range | range() |
Here’s the mean and variance of our original distribution.
