This vignette covers the second goal of distionary: to evaluate probability distributions, including properties and representations that are not specified in the distribution’s definition.

Distributional Representations

A distributional representation is a mathematical function that completely defines a probability distribution. Unlike a simple property (such as the mean or variance), a representation contains enough information that any other property or representation can be calculated from it.

The key innovation in distionary is that these representations are interconnected through a network of relationships, allowing you to specify a distribution using any available representation and automatically derive others as needed. For example, if you specify only a CDF, distionary can compute the quantile function, mean, variance, and other properties.
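As a rough illustration of deriving one representation from another, the sketch below recovers a quantile function from a CDF alone. It uses base R's `pgeom()` as a stand-in for a specified CDF, and the helper `quantile_from_cdf()` is a hypothetical name introduced here. It is not distionary's internal algorithm, only a minimal example of the idea.

```r
# Illustration only: base R stand-ins, not distionary internals.
p_success <- 0.6
cdf <- function(x) pgeom(x, prob = p_success)

# Left-inverse of the CDF: the smallest support point x with F(x) >= p.
# `quantile_from_cdf` is a hypothetical helper; the search grid 0:1000 is an
# assumption that comfortably covers the probabilities used below.
quantile_from_cdf <- function(p, support = 0:1000) {
  vapply(
    p,
    function(p_i) as.numeric(support[which(cdf(support) >= p_i)[1]]),
    numeric(1)
  )
}

probs <- c(0.1, 0.5, 0.9, 0.99)
derived <- quantile_from_cdf(probs)
derived
#> [1] 0 0 2 5

# The derived values agree with the built-in quantile function.
all.equal(derived, qgeom(probs, prob = p_success))
```

Searching a grid works here because the support is discrete. A continuous distribution would need a root-finder such as `uniroot()` instead.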

Here is a list of representations recognised by distionary, and the functions for accessing them.

Representation                        distionary Functions
Cumulative Distribution Function      eval_cdf(), enframe_cdf()
Survival Function                     eval_survival(), enframe_survival()
Quantile Function                     eval_quantile(), enframe_quantile()
Hazard Function                       eval_hazard(), enframe_hazard()
Cumulative Hazard Function            eval_chf(), enframe_chf()
Probability Density Function          eval_density(), enframe_density()
Probability Mass Function (PMF)       eval_pmf(), enframe_pmf()
Odds Function                         eval_odds(), enframe_odds()
Return Level Function                 eval_return(), enframe_return()
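Several of these representations are linked by standard textbook identities. The sketch below checks three of them for an Exponential distribution using base R only. The rate and evaluation points are arbitrary choices for illustration. The odds and return level functions are left out, because their exact conventions in distionary are best taken from the package documentation.

```r
# Base R illustration of textbook identities for a continuous distribution.
rate <- 2              # arbitrary rate chosen for this example
x <- c(0.5, 1, 2)      # arbitrary evaluation points

cdf     <- pexp(x, rate = rate)
density <- dexp(x, rate = rate)

survival <- 1 - cdf              # S(x) = 1 - F(x)
hazard   <- density / survival   # h(x) = f(x) / S(x)
chf      <- -log(survival)       # H(x) = -log S(x)

# For an Exponential distribution the hazard is constant at `rate`,
# and the cumulative hazard is rate * x.
all.equal(hazard, rep(rate, length(x)))
all.equal(chf, rate * x)
```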

Each representation can be accessed in one of two ways. The eval_*() family of functions returns a vector of the evaluated representation.

d1 <- dst_geom(0.6)
eval_pmf(d1, at = 0:5)
#> [1] 0.600000 0.240000 0.096000 0.038400 0.015360 0.006144

Alternatively, the enframe_*() family of functions provides the results in a tibble or data frame paired with the inputs, useful in a data wrangling workflow.

enframe_pmf(d1, at = 0:5)
#> # A tibble: 6 × 2
#>    .arg     pmf
#>   <int>   <dbl>
#> 1     0 0.6    
#> 2     1 0.24   
#> 3     2 0.096  
#> 4     3 0.0384 
#> 5     4 0.0154 
#> 6     5 0.00614

The enframe_*() functions accept multiple distributions and return one evaluation column per distribution. The column names can be changed in three ways:

  1. The input column .arg can be renamed with the arg_name argument.
  2. The pmf prefix on the evaluation columns can be changed with the fn_prefix argument.
  3. The distribution names can be changed by assigning name-value pairs for the input distributions.

Let’s practice this with the addition of a second distribution.

d2 <- dst_geom(0.4)
enframe_pmf(
  model1 = d1, model2 = d2, at = 0:5,
  arg_name = "num_failures", fn_prefix = "probability"
)
#> # A tibble: 6 × 3
#>   num_failures probability_model1 probability_model2
#>          <int>              <dbl>              <dbl>
#> 1            0            0.6                 0.4   
#> 2            1            0.24                0.24  
#> 3            2            0.096               0.144 
#> 4            3            0.0384              0.0864
#> 5            4            0.0154              0.0518
#> 6            5            0.00614             0.0311
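The values in both columns can be checked by hand against the geometric PMF, P(X = k) = p(1 - p)^k, where k counts failures before the first success. This sketch uses base R only and assumes that dst_geom() follows the same parameterisation as stats::dgeom(), which is consistent with the output above (for example, P(X = 0) = 0.6 for model1). The helper `pmf_formula()` is a hypothetical name.

```r
# Geometric PMF written out explicitly; `pmf_formula` is a hypothetical helper.
pmf_formula <- function(k, p) p * (1 - p)^k

k <- 0:5
model1 <- pmf_formula(k, p = 0.6)
model2 <- pmf_formula(k, p = 0.4)

data.frame(num_failures = k, model1, model2)

# Both columns agree with base R's geometric PMF.
all.equal(model1, dgeom(k, prob = 0.6))
all.equal(model2, dgeom(k, prob = 0.4))
```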

Drawing a random sample

To draw a random sample from a distribution, use the realise() or realize() function:

set.seed(42)
realise(d1, n = 5)
#> [1] 0 0 0 0 0

You can read this call as “realise distribution d1 five times”. By default, n is set to 1, so that realising converts a distribution to a single numeric draw:

realise(d1)
#> [1] 0

Random sampling falls into the same family as the p*/d*/q*/r* functions from the stats package (e.g., rnorm()), but a sample is not a distributional representation, so realise() does not have an eval_*() or enframe_*() counterpart. This is because a finite sample cannot perfectly describe the distribution it was drawn from.
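To make this concrete, the sketch below compares the empirical CDF of five draws with the true CDF. It uses base R's rgeom() and pgeom() as stand-ins for realise() and eval_cdf(), so the draws need not match those shown earlier. With five draws the empirical CDF can only take values that are multiples of 0.2, so it cannot reproduce a true value such as F(1) = 0.84.

```r
# Base R stand-ins: rgeom() for realise(), pgeom() for the true CDF.
set.seed(42)
draws <- rgeom(5, prob = 0.6)

k <- 0:5
# Empirical CDF: proportion of draws less than or equal to each k.
empirical <- vapply(k, function(v) mean(draws <= v), numeric(1))
truth <- pgeom(k, prob = 0.6)

data.frame(k, empirical, truth)

# The sample-based estimate differs from the true CDF somewhere on the grid.
any(abs(empirical - truth) > 1e-8)
```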

Properties of Distributions

distionary distinguishes between distributional representations (which fully define a distribution) and distributional properties (which are characteristics that can be computed from representations).

A distributional property is any measurable characteristic that can be calculated from a distribution’s representation. Unlike representations, properties do not contain enough information to fully reconstruct the distribution. For example, knowing the mean and variance of a distribution doesn’t tell you whether it belongs to the Normal family, the Gamma family, or some other family. Properties include statistical moments and other summary measures.
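The sketch below illustrates the point with base R: a Normal and a Gamma distribution are constructed to share the same mean and variance, yet their CDFs differ. The target mean and variance of 2 are arbitrary choices for the example.

```r
# Two distributions with identical mean and variance but different CDFs.
mu <- 2
sigma2 <- 2

# A Gamma with shape a and rate b has mean a / b and variance a / b^2,
# so matching (mu, sigma2) gives a = mu^2 / sigma2 and b = mu / sigma2.
shape <- mu^2 / sigma2
rate  <- mu / sigma2

x <- c(0, 1, 2, 4)
data.frame(
  x,
  normal = pnorm(x, mean = mu, sd = sqrt(sigma2)),
  gamma  = pgamma(x, shape = shape, rate = rate)
)

# The Normal puts positive probability below zero; the Gamma puts none there.
pnorm(0, mean = mu, sd = sqrt(sigma2)) > 0
pgamma(0, shape = shape, rate = rate) == 0
```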

Below is a table of the properties incorporated in distionary, and the corresponding functions for accessing them.

Property              distionary Function
Mean                  mean()
Median                median()
Variance              variance()
Standard Deviation    sd()
Skewness              skewness()
Excess Kurtosis       kurtosis_exc()
Kurtosis              kurtosis()
Range                 range()

Here’s the mean and variance of our original distribution.

mean(d1)
#> [1] 0.6666667
variance(d1)
#> [1] 1.111111
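These values match the closed forms for a geometric distribution with success probability p = 0.6: mean (1 - p) / p and variance (1 - p) / p^2. As a minimal sketch of how a property can be computed from a representation, the base R code below sums over the PMF directly. The support is truncated at 200, an assumption that is harmless here because the remaining tail probability is negligible. This is not necessarily how distionary computes these properties.

```r
# Mean and variance computed from the PMF, using base R only.
p <- 0.6
k <- 0:200                      # truncated support; the tail is negligible
pmf <- dgeom(k, prob = p)

mean_from_pmf <- sum(k * pmf)
var_from_pmf  <- sum((k - mean_from_pmf)^2 * pmf)

c(mean = mean_from_pmf, variance = var_from_pmf)

# Closed-form values for comparison.
c(mean = (1 - p) / p, variance = (1 - p) / p^2)
```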