This function returns a vector with one element per row of the provided data frame, giving the average Mahalanobis distance of each row from the whole data set.
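As a rough illustration of the quantity involved, the sketch below uses base R's mahalanobis() from the stats package rather than maha_dist itself. Note that mahalanobis() returns squared distances. Using the column means as the center and the sample covariance as the scale is an assumption made here for illustration, not a statement of how maha_dist is implemented.

```r
# Base R sketch (not assertr's implementation): one squared Mahalanobis
# distance per row, measured from the column means of the data.
x <- data.matrix(mtcars)           # numeric matrix, 32 rows x 11 columns
center <- colMeans(x)              # assumed center: the column means
covariance <- stats::cov(x)        # assumed scale: the sample covariance

d <- stats::mahalanobis(x, center, covariance)

length(d) == nrow(mtcars)          # one value per row
head(sort(d, decreasing = TRUE))   # rows farthest from the center
```

Rows with the largest values are the candidates for anomalous observations.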
Details
This is useful for finding anomalous observations, row-wise.
It converts any categorical variables in the data frame to numeric values, as long as they are factors. For a character column to be used as a component in the distance calculations, it must either already be a factor or be converted to one with the stringsAsFactors parameter.
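The conversion described above can be pictured with base R alone. The sketch below is only an assumption about the kind of conversion involved, not maha_dist's actual code; the data frame df and its columns are hypothetical. It turns a character column into a factor and then into integer codes with data.matrix().

```r
# Hypothetical data frame with one numeric and one character column.
df <- data.frame(
  height = c(150, 160, 170, 180),
  group  = c("a", "b", "a", "c"),
  stringsAsFactors = FALSE
)

# A character column has to become a factor first ...
df$group <- factor(df$group)

# ... after which data.matrix() replaces each level by its integer code,
# so every column can take part in a numeric distance calculation.
m <- data.matrix(df)
m[, "group"]   # codes for levels "a", "b", "c": 1 2 1 3
```

Passing stringsAsFactors=TRUE to maha_dist is the documented way to request the character-to-factor step without doing it by hand.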
Examples
maha_dist(mtcars)
#> [1] 8.946673 8.287933 8.937150 6.096726 5.429061 8.877558 9.136276
#> [8] 10.030345 22.593116 12.393107 11.058878 9.476126 5.594527 6.026462
#> [15] 11.201310 8.672093 12.257618 9.078630 14.954377 10.296463 13.432391
#> [22] 6.227235 5.786691 11.681526 6.718085 3.645789 18.356164 14.000669
#> [29] 21.573003 11.152850 19.192384 9.888781
maha_dist(iris, robust=TRUE)
#> [1] 7.181303 14.209332 9.233412 14.029571 6.547538 9.032628 9.557745
#> [8] 9.434082 16.327924 14.609934 7.944119 12.241370 14.152604 10.634152
#> [15] 8.940285 9.311006 7.996840 7.676170 10.686693 6.772707 14.171183
#> [22] 8.199367 4.819010 17.927827 21.970559 17.509804 12.134829 8.718618
#> [29] 8.664029 14.539774 15.467206 13.918214 12.751421 7.800228 13.513486
#> [36] 9.613279 9.535921 8.718478 13.079168 9.504772 7.041884 36.721567
#> [43] 10.581647 18.598838 14.208248 15.428511 9.871731 10.898606 7.630712
#> [50] 9.183846 12.142542 5.700266 15.153194 9.430086 12.108781 11.172977
#> [57] 9.444704 5.395403 11.484362 7.646899 10.479824 5.254603 14.996254
#> [64] 11.744912 2.849079 6.677997 10.917922 9.978773 24.822657 4.694515
#> [71] 20.355909 3.146007 22.559940 18.395726 5.389387 6.864585 17.525227
#> [78] 19.539023 8.151810 3.098777 4.944060 5.362978 2.057880 26.155842
#> [85] 14.053245 8.768882 10.215374 17.593198 3.270086 5.936323 13.524049
#> [92] 8.625081 3.824912 5.800611 5.591581 5.707430 3.987159 4.389560
#> [99] 9.471902 3.068005 30.970783 32.288310 16.382188 27.303966 18.252965
#> [106] 14.211336 57.834820 27.316610 17.104499 29.976800 42.865459 24.851409
#> [113] 25.740563 34.263970 56.920919 43.839091 29.769508 30.314755 8.766718
#> [120] 35.514678 27.338244 44.696221 18.955613 38.729456 26.599257 32.185485
#> [127] 44.286331 45.304056 18.655092 42.634687 19.136274 40.640072 20.651593
#> [134] 46.870404 53.923075 20.573812 41.472972 33.431351 49.320574 31.907998
#> [141] 34.144669 57.844485 32.288310 20.731382 39.082779 47.916444 32.925457
#> [148] 34.182320 45.599451 41.065734
library(magrittr) # for piping operator
library(dplyr) # for "everything()" function
# using every column from mtcars, compute mahalanobis distance
# for each observation, and ensure that each distance is within 10
# median absolute deviations from the median
mtcars %>%
  insist_rows(maha_dist, within_n_mads(10), everything())
#>                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#> Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
#> Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
#> Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
#> Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
#> Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
#> Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
#> Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
#> Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
#> Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
#> Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
#> Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
#> Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
#> Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
#> Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
#> Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
#> Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
#> Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
#> Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
#> Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
#> Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
#> Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
#> Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
#> AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
#> Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
#> Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
#> Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
#> Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
#> Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
#> Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
#> Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
#> Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
#> Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
## the assertion passed, so anything placed after it in the pipeline will run
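The "within 10 median absolute deviations from the median" check in the example can be approximated by hand in base R. This is a sketch under stated assumptions: it uses stats::mahalanobis() in place of maha_dist and stats::mad() with its default scaling constant, and the names lower, upper, and ok are illustrative. Whether within_n_mads computes its bounds in exactly this way is not confirmed here.

```r
# Squared Mahalanobis distance of each row of mtcars (base R stand-in).
x <- data.matrix(mtcars)
d <- stats::mahalanobis(x, colMeans(x), stats::cov(x))

# Bounds: the median plus or minus 10 median absolute deviations.
n_mads <- 10
lower <- stats::median(d) - n_mads * stats::mad(d)
upper <- stats::median(d) + n_mads * stats::mad(d)

# TRUE for every row whose distance falls inside the bounds.
ok <- d >= lower & d <= upper
all(ok)        # whether every row passes the hand-rolled check
names(d)[!ok]  # rows that would be flagged, if any
```

A narrower bound, such as 3 instead of 10, flags more rows and is a stricter test for outliers.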