Returns a numeric vector with one element per row of the provided data frame, giving the average Mahalanobis distance of each row from the whole data set.
Arguments
- data
A data frame
- keep.NA
Ensure that every row with missing data remains NA in the output? TRUE by default.
- robust
Attempt to compute the Mahalanobis distance based on a robust covariance matrix? FALSE by default.
- stringsAsFactors
Convert non-factor string columns into factors? FALSE by default.
Details
This is useful for finding anomalous observations, row-wise.
Any categorical variables in the data frame are converted to numerics, as long as they are factors. For example, for a character column to be used as a component in the distance calculations, it must either already be a factor or be converted to one by using the stringsAsFactors parameter.
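The factor-to-numeric step described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the function's actual implementation: the toy data frame `df` and its column names are hypothetical, and `stats::mahalanobis()` (which returns squared distances from a given center) stands in for whatever `maha_dist` does internally.

```r
# Hypothetical toy data: two numeric columns and one character column
df <- data.frame(
  height = c(150, 160, 170, 180, 190, 165),
  weight = c(50, 58, 70, 82, 95, 90),
  group  = c("a", "b", "a", "b", "a", "b"),
  stringsAsFactors = FALSE
)

# A character column has to become a factor before it can be used;
# here the conversion is done by hand rather than via stringsAsFactors
df$group <- factor(df$group)

# data.matrix() replaces each factor with its integer level codes
m <- data.matrix(df)

# Squared Mahalanobis distance of each row from the column means,
# using the sample covariance (assumption: a stand-in for maha_dist)
d <- mahalanobis(m, center = colMeans(m), cov = cov(m))
d
```

Note that the integer codes depend on the order of the factor levels, so a categorical column with more than two unordered levels contributes a somewhat arbitrary numeric scale to the distance.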
Examples
maha_dist(mtcars)
#> [1] 8.946673 8.287933 8.937150 6.096726 5.429061 8.877558 9.136276
#> [8] 10.030345 22.593116 12.393107 11.058878 9.476126 5.594527 6.026462
#> [15] 11.201310 8.672093 12.257618 9.078630 14.954377 10.296463 13.432391
#> [22] 6.227235 5.786691 11.681526 6.718085 3.645789 18.356164 14.000669
#> [29] 21.573003 11.152850 19.192384 9.888781
maha_dist(iris, robust=TRUE)
#> [1] 7.024666 12.986528 8.614241 13.473668 6.532370 9.070780 9.228481
#> [8] 9.246884 15.226051 14.215466 7.972865 12.212494 13.463329 9.931996
#> [15] 9.019549 9.318801 7.849771 7.269313 10.713173 6.811610 13.994130
#> [22] 7.986416 4.727952 16.538568 22.151556 16.508979 11.439476 8.598318
#> [29] 8.299954 14.252154 14.896631 12.796522 12.416119 7.686867 12.740661
#> [36] 8.662214 9.145239 8.806551 12.143585 9.274782 6.597367 31.494573
#> [43] 10.161955 17.383377 14.332182 13.771049 9.961501 10.456617 7.672733
#> [50] 8.718612 12.257765 5.719748 15.216794 8.318376 11.608681 11.276622
#> [57] 9.531537 5.212513 11.543351 7.309844 9.547606 5.066623 14.511378
#> [64] 11.824827 2.665049 6.648135 10.993617 10.068406 22.335060 4.532418
#> [71] 20.346004 2.857018 21.814228 18.576954 5.324136 6.751945 17.381442
#> [78] 19.176903 7.984623 2.991555 4.592470 5.188785 1.879437 25.992011
#> [85] 14.184508 8.855716 10.200049 16.407261 3.303454 5.370389 13.631452
#> [92] 8.701535 3.573514 5.396080 5.488236 5.708808 4.010323 4.362553
#> [99] 8.905585 2.978497 30.438669 31.725313 15.644610 25.087774 17.612736
#> [106] 12.953873 57.307157 24.985352 16.414720 29.096661 41.812863 24.414166
#> [113] 25.324872 34.455141 57.501571 43.750932 27.535602 25.698371 8.777775
#> [120] 35.060155 27.007985 44.451166 17.903984 38.415948 24.776229 28.781558
#> [127] 43.790379 43.856513 18.305008 39.323119 18.159890 35.340913 20.593905
#> [134] 44.216917 50.208142 20.620920 40.802058 30.582036 47.974444 31.465269
#> [141] 34.310699 58.368123 31.725313 20.054017 39.146352 48.345569 33.118020
#> [148] 33.581349 44.673366 38.976777
library(magrittr) # for the pipe operator
library(dplyr) # for the everything() function
# using every column from mtcars, compute the Mahalanobis distance
# for each observation, and ensure that each distance is within 10
# median absolute deviations from the median
mtcars %>%
  insist_rows(maha_dist, within_n_mads(10), everything())
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
#> Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
#> Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
#> Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
#> Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
#> Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
#> Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
#> Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
#> Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
#> Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
#> Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
#> Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
#> Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
#> Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
#> Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
#> Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
#> Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
#> Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
#> Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
#> Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
#> AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
#> Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
#> Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
#> Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
#> Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
#> Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
#> Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
#> Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
#> Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
#> Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
## anything here will run
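What the check in the last example amounts to can be sketched in base R. This is a rough approximation under assumptions, not the package's code: `stats::mahalanobis()` stands in for `maha_dist`, and `stats::mad()` with its default scaling constant is assumed for the median absolute deviation. The names `n_mads`, `lower`, `upper`, and `outliers` are illustrative only. The values printed above for `maha_dist(mtcars)` sum to 341 = (32 - 1) × 11, which is what squared Mahalanobis distances under the sample covariance sum to, so the stand-in appears to compute the same quantity; treat that as an inference rather than a statement about the internals.

```r
# Squared Mahalanobis distance of each mtcars row from the column means
m <- data.matrix(mtcars)
d <- mahalanobis(m, center = colMeans(m), cov = cov(m))

# Bounds: the median plus or minus 10 median absolute deviations
n_mads <- 10
center <- median(d)
spread <- mad(d)  # assumption: default scaling constant of mad()
lower <- center - n_mads * spread
upper <- center + n_mads * spread

# Rows whose distance falls outside the bounds would fail the check
outliers <- which(d < lower | d > upper)
rownames(mtcars)[outliers]
```

A bound based on the median and MAD is less sensitive to the extreme rows themselves than one based on the mean and standard deviation, which is presumably why it suits an outlier check like this one.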