
Returns a numeric vector with one element per row of the supplied data frame, giving the Mahalanobis distance of each row from the whole data set.

Usage

maha_dist(data, keep.NA = TRUE, robust = FALSE, stringsAsFactors = FALSE)

Arguments

data

A data frame

keep.NA

Ensure that every row with missing data remains NA in the output? TRUE by default.

robust

Attempt to compute the Mahalanobis distance based on a robust covariance matrix? FALSE by default.

stringsAsFactors

Convert non-factor string columns into factors? FALSE by default.

Value

A vector of observation-wise Mahalanobis distances.
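
As a rough illustration of what such a vector looks like, the sketch below computes squared Mahalanobis distances with base R's stats::mahalanobis, using the column means as the center and the sample covariance matrix. This is an assumption about the non-robust case, not a statement of how maha_dist is implemented. It is at least consistent with the mtcars output under Examples, whose values sum to 341 = (32 - 1) x 11, the total expected for squared distances computed this way.

```r
# Base-R sketch only; maha_dist itself may differ (for example in NA handling).
x <- data.matrix(mtcars)                         # numeric matrix, 32 rows x 11 columns
d <- stats::mahalanobis(x,
                        center = colMeans(x),    # centroid of the data
                        cov    = stats::cov(x))  # sample covariance matrix

length(d) == nrow(mtcars)              # one distance per row
head(sort(d, decreasing = TRUE), 3)    # rows farthest from the centroid
```

Larger values mark rows that sit farther from the bulk of the data once correlations between columns are taken into account.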

Details

This is useful for finding anomalous observations, row-wise.

Categorical variables in the data frame are converted to numerics, provided they are factors. A character column therefore takes part in the distance calculation only if it is already a factor or is converted to one by setting the stringsAsFactors parameter to TRUE.
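
The factor-to-numeric step described above can be pictured with base R alone. The sketch below uses a made-up data frame (df, height and group are hypothetical names) and converts character columns to factors by hand before turning everything into a numeric matrix. It illustrates the idea and is not taken from the maha_dist source.

```r
# Hypothetical data: one numeric column and one character column
df <- data.frame(height = c(1.6, 1.8, 1.7, 1.9),
                 group  = c("a", "b", "a", "c"),
                 stringsAsFactors = FALSE)

# Convert character columns to factors, mimicking what
# stringsAsFactors = TRUE is described as doing
is_chr <- vapply(df, is.character, logical(1))
df[is_chr] <- lapply(df[is_chr], factor)

# Factors are represented by their integer level codes
codes <- data.matrix(df)
codes[, "group"]
```

Note that the integer codes follow the order of the factor levels, so for unordered categories the resulting distances depend on an arbitrary coding.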

See also

Examples


library(assertr)             # provides maha_dist, insist_rows and within_n_mads

maha_dist(mtcars)
#>  [1]  8.946673  8.287933  8.937150  6.096726  5.429061  8.877558  9.136276
#>  [8] 10.030345 22.593116 12.393107 11.058878  9.476126  5.594527  6.026462
#> [15] 11.201310  8.672093 12.257618  9.078630 14.954377 10.296463 13.432391
#> [22]  6.227235  5.786691 11.681526  6.718085  3.645789 18.356164 14.000669
#> [29] 21.573003 11.152850 19.192384  9.888781

maha_dist(iris, robust = TRUE)
#>   [1]  7.024666 12.986528  8.614241 13.473668  6.532370  9.070780  9.228481
#>   [8]  9.246884 15.226051 14.215466  7.972865 12.212494 13.463329  9.931996
#>  [15]  9.019549  9.318801  7.849771  7.269313 10.713173  6.811610 13.994130
#>  [22]  7.986416  4.727952 16.538568 22.151556 16.508979 11.439476  8.598318
#>  [29]  8.299954 14.252154 14.896631 12.796522 12.416119  7.686867 12.740661
#>  [36]  8.662214  9.145239  8.806551 12.143585  9.274782  6.597367 31.494573
#>  [43] 10.161955 17.383377 14.332182 13.771049  9.961501 10.456617  7.672733
#>  [50]  8.718612 12.257765  5.719748 15.216794  8.318376 11.608681 11.276622
#>  [57]  9.531537  5.212513 11.543351  7.309844  9.547606  5.066623 14.511378
#>  [64] 11.824827  2.665049  6.648135 10.993617 10.068406 22.335060  4.532418
#>  [71] 20.346004  2.857018 21.814228 18.576954  5.324136  6.751945 17.381442
#>  [78] 19.176903  7.984623  2.991555  4.592470  5.188785  1.879437 25.992011
#>  [85] 14.184508  8.855716 10.200049 16.407261  3.303454  5.370389 13.631452
#>  [92]  8.701535  3.573514  5.396080  5.488236  5.708808  4.010323  4.362553
#>  [99]  8.905585  2.978497 30.438669 31.725313 15.644610 25.087774 17.612736
#> [106] 12.953873 57.307157 24.985352 16.414720 29.096661 41.812863 24.414166
#> [113] 25.324872 34.455141 57.501571 43.750932 27.535602 25.698371  8.777775
#> [120] 35.060155 27.007985 44.451166 17.903984 38.415948 24.776229 28.781558
#> [127] 43.790379 43.856513 18.305008 39.323119 18.159890 35.340913 20.593905
#> [134] 44.216917 50.208142 20.620920 40.802058 30.582036 47.974444 31.465269
#> [141] 34.310699 58.368123 31.725313 20.054017 39.146352 48.345569 33.118020
#> [148] 33.581349 44.673366 38.976777


library(magrittr)            # for piping operator
library(dplyr)               # for "everything()" function

# using every column from mtcars, compute the Mahalanobis distance
# for each observation, and ensure that each distance is within 10
# median absolute deviations of the median
mtcars %>%
  insist_rows(maha_dist, within_n_mads(10), everything())
#>                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#> Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
#> Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
#> Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
#> Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
#> Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
#> Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
#> Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
#> Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
#> Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
#> Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
#> Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
#> Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
#> Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
#> Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
#> Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
#> Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
#> Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
#> Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
#> Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
#> Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
#> Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
#> Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
#> AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
#> Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
#> Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
#> Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
#> Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
#> Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
#> Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
#> Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
#> Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
#> Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
  ## anything here will run
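
The row-wise check in the pipeline above can be approximated without assertr. The sketch below assumes that "within 10 median absolute deviations" means abs(d - median(d)) <= 10 * mad(d), with the default scaling constant of stats::mad. Whether within_n_mads uses the same constant is an assumption here, and the distances are again computed with stats::mahalanobis rather than maha_dist.

```r
# Base-R approximation of the check; not the assertr implementation
x <- data.matrix(mtcars)
d <- stats::mahalanobis(x, colMeans(x), stats::cov(x))

limit <- 10 * stats::mad(d)                  # allowed deviation from the median
ok    <- abs(d - stats::median(d)) <= limit  # TRUE for rows that pass

all(ok)          # TRUE when every row passes the check
names(d)[!ok]    # row names of any offending observations
```

With a bound this loose, only extreme rows would fail. A smaller multiplier makes the check stricter.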