Evaluate a Distribution

library(distionary)

This vignette covers the second goal of distionary: evaluating probability distributions, even when the quantity being evaluated is not specified in the distribution’s definition.

Distributional Representations

A distributional representation is a mathematical function that completely defines a probability distribution. Unlike a simple property (such as the mean or variance), a representation contains enough information that any other property or representation can be calculated from it.

The key innovation in distionary is that these representations are interconnected through a network of relationships, allowing you to specify a distribution using any available representation and automatically derive others as needed. For example, if you specify only a CDF, distionary can compute the quantile function, mean, variance, and other properties.
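
To make the idea of deriving one representation from another concrete, here is a minimal base R sketch that recovers a quantile function from a CDF alone. It is not distionary's internal method. It assumes a Geometric(0.6) distribution counting failures before the first success; the helper name `quantile_from_cdf` and the search grid `0:100` are illustrative choices.

```r
# CDF of a Geometric(0.6) distribution (failures before the first success).
cdf <- function(x) pgeom(x, prob = 0.6)

# Generalised inverse of the CDF: the smallest grid point x with cdf(x) >= p.
# `quantile_from_cdf` and the grid 0:100 are hypothetical, for illustration only.
quantile_from_cdf <- function(p, grid = 0:100) {
  cdf_values <- cdf(grid)
  vapply(p, function(p_i) grid[which(cdf_values >= p_i)[1]], integer(1))
}

probs <- c(0.25, 0.5, 0.9, 0.99)
derived <- quantile_from_cdf(probs)
derived
#> [1] 0 0 2 5
```

The derived values agree with `qgeom(probs, prob = 0.6)`, which shows that the CDF on its own carries enough information to answer quantile queries.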

Here is a list of representations recognised by distionary, and the functions for accessing them.

Representation	`distionary` Functions
Cumulative Distribution Function	`eval_cdf()`, `enframe_cdf()`
Survival Function	`eval_survival()`, `enframe_survival()`
Quantile Function	`eval_quantile()`, `enframe_quantile()`
Hazard Function	`eval_hazard()`, `enframe_hazard()`
Cumulative Hazard Function	`eval_chf()`, `enframe_chf()`
Probability Density Function	`eval_density()`, `enframe_density()`
Probability Mass Function (PMF)	`eval_pmf()`, `enframe_pmf()`
Odds Function	`eval_odds()`, `enframe_odds()`
Return Level Function	`eval_return()`, `enframe_return()`

All representations can be accessed with the eval_*() family of functions, which return a vector of the evaluated representation.

d1 <- dst_geom(0.6)
eval_pmf(d1, at = 0:5)
#> [1] 0.600000 0.240000 0.096000 0.038400 0.015360 0.006144
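
As a cross-check, the same values can be reproduced with base R. This sketch assumes that `dst_geom(0.6)` uses the same parameterisation as `stats::dgeom()`, namely the number of failures before the first success, which is consistent with the output above.

```r
# Base R parameterises the Geometric distribution by the number of failures
# before the first success, with `prob` the success probability.
base_pmf <- dgeom(0:5, prob = 0.6)
base_pmf
#> [1] 0.600000 0.240000 0.096000 0.038400 0.015360 0.006144
```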

Alternatively, the enframe_*() family of functions provides the results in a tibble or data frame paired with the inputs, useful in a data wrangling workflow.

enframe_pmf(d1, at = 0:5)
#> # A tibble: 6 × 2
#>    .arg     pmf
#>   <int>   <dbl>
#> 1     0 0.6    
#> 2     1 0.24   
#> 3     2 0.096  
#> 4     3 0.0384 
#> 5     4 0.0154 
#> 6     5 0.00614

The enframe_*() functions accept multiple distributions and return one evaluation column per distribution. The column names can be changed in three ways:

The input column .arg can be renamed with the arg_name argument.
The pmf prefix on the evaluation columns can be changed with the fn_prefix argument.
The distribution names can be changed by assigning name-value pairs for the input distributions.

Let’s practice this with the addition of a second distribution.

d2 <- dst_geom(0.4)
enframe_pmf(
  model1 = d1, model2 = d2, at = 0:5,
  arg_name = "num_failures", fn_prefix = "probability"
)
#> # A tibble: 6 × 3
#>   num_failures probability_model1 probability_model2
#>          <int>              <dbl>              <dbl>
#> 1            0            0.6                 0.4   
#> 2            1            0.24                0.24  
#> 3            2            0.096               0.144 
#> 4            3            0.0384              0.0864
#> 5            4            0.0154              0.0518
#> 6            5            0.00614             0.0311
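
For readers who want to see the wide layout spelled out, the following base R sketch builds an equivalent table by hand. It is only an illustration of the column-naming pattern, not how `enframe_pmf()` is implemented, and it assumes the two distributions are Geometric with success probabilities 0.6 and 0.4, as above.

```r
at <- 0:5
probs <- c(model1 = 0.6, model2 = 0.4)  # success probabilities of d1 and d2

# One column of PMF values per distribution.
pmf_cols <- lapply(probs, function(p) dgeom(at, prob = p))
# Column names follow the "<fn_prefix>_<distribution name>" pattern seen above.
names(pmf_cols) <- paste0("probability_", names(probs))

# Pair the evaluation columns with the input column.
wide <- cbind(data.frame(num_failures = at), as.data.frame(pmf_cols))
wide
```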

Drawing a random sample

To draw a random sample from a distribution, use the realise() or realize() function:

set.seed(42)
realise(d1, n = 5)
#> [1] 0 0 0 0 0

You can read this call as “realise distribution d1 five times”. By default, n is set to 1, so that realising converts a distribution to a single numeric draw:

realise(d1)
#> [1] 0

While random sampling belongs to the same family as the p*/d*/q*/r* functions from the stats package (e.g., rnorm()), a random sample is not a distributional representation, so realise() has no eval_*() or enframe_*() counterpart. This is because a sample cannot perfectly describe a distribution.
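
The point that a sample cannot pin down a distribution can be illustrated with a small base R sketch. It uses `rgeom()` as a stand-in for drawing from a Geometric(0.6) distribution; the sample size of 1000 and the seed are arbitrary choices.

```r
set.seed(42)
draws <- rgeom(1000, prob = 0.6)  # base R analogue of sampling Geometric(0.6)

# Empirical PMF: relative frequency of each value from 0 to 5.
empirical <- table(factor(draws, levels = 0:5)) / length(draws)
# True PMF on the same support points.
truth <- dgeom(0:5, prob = 0.6)

# The sample frequencies approximate the true PMF but carry sampling error,
# and they say nothing exact about values that were rarely or never observed.
round(cbind(empirical = as.numeric(empirical), truth = truth), 3)
```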

Properties of Distributions

distionary distinguishes between distributional representations (which fully define a distribution) and distributional properties (which are characteristics that can be computed from representations).

A distributional property is any measurable characteristic that can be calculated from a distribution’s representation. Unlike representations, properties do not contain enough information to fully reconstruct the distribution. For example, knowing the mean and variance of a distribution doesn’t tell you whether it’s a Normal, Gamma, or some other distribution family. Properties include statistical moments and other summary measures.
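
The Normal-versus-Gamma example can be checked numerically. The base R sketch below picks, as an illustrative assumption, a Normal and a Gamma distribution that both have mean 2 and variance 2, then compares their CDFs at a few points.

```r
# Normal and Gamma distributions sharing mean 2 and variance 2.
# Gamma(shape = 2, rate = 1) has mean shape/rate = 2 and variance shape/rate^2 = 2.
x <- c(0, 1, 2, 3)
cdf_normal <- pnorm(x, mean = 2, sd = sqrt(2))
cdf_gamma  <- pgamma(x, shape = 2, rate = 1)

# Same mean and variance, yet the CDFs disagree, so these two properties
# cannot identify the distribution.
round(cbind(x, cdf_normal, cdf_gamma), 3)
```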

Below is a table of the properties incorporated in distionary, and the corresponding functions for accessing them.

Property	`distionary` Function
Mean	`mean()`
Median	`median()`
Variance	`variance()`
Standard Deviation	`sd()`
Skewness	`skewness()`
Excess Kurtosis	`kurtosis_exc()`
Kurtosis	`kurtosis()`
Range	`range()`

Here’s the mean and variance of our original distribution.

mean(d1)
#> [1] 0.6666667
variance(d1)
#> [1] 1.111111
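
To see how such properties follow from a representation, here is a hedged base R sketch that computes the mean and variance directly from the PMF by summation. It is not necessarily how distionary computes them; the truncation point 200 is an arbitrary choice beyond which the Geometric(0.6) probability mass is negligible.

```r
# PMF of Geometric(0.6) on a truncated support; the mass beyond 200 is negligible.
x <- 0:200
pmf <- dgeom(x, prob = 0.6)

mean_from_pmf <- sum(x * pmf)                      # E[X]
var_from_pmf  <- sum((x - mean_from_pmf)^2 * pmf)  # E[(X - E[X])^2]

c(mean = mean_from_pmf, variance = var_from_pmf)
```

Both values agree with the closed forms (1 - p) / p = 2/3 and (1 - p) / p^2 = 10/9 for p = 0.6, matching the output of mean(d1) and variance(d1).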
