Cleans up a dataframe object which has date columns that were entered via a
free-text box (possibly by different users) and are therefore in a
non-standardized format. Supports numerous separators including /, -, or
space. Supports all-numeric, abbreviation, or long-hand month notation. Where
the day of the month has not been supplied, the first day of the month is
imputed. Either DMY or YMD is assumed by default. However, the US system of
MDY is supported via the format argument.
Arguments
- df
A dataframe or tibble object with messy date column(s).
- col.names
Character vector of names of columns of messy date data.
- day.impute
Integer. Day of the month to be imputed if not available. Defaults to 1. If day.impute = NA then NA will be imputed for the date instead and a warning will be raised. If day.impute = NULL then instead of imputing the day of the month, the function will fail.
- month.impute
Integer. Month to be imputed if not available. Defaults to 7 (July). If month.impute = NA then NA will be imputed for the date instead and a warning will be raised. If month.impute = NULL then instead of imputing the month, the function will fail.
- id
Name of column containing row IDs. By default, the first column is assumed.
- format
Character. The format in which a date is most likely to be given. Either "dmy" (default) or "mdy". If the year appears to have been given first, then YMD is assumed for the subject (the format argument is not used for these observations).
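As a brief illustration of the arguments above, the sketch below passes non-default values for id, format, day.impute, and month.impute. It calls fix_date_df(), the replacement named in the deprecation notice under Examples, and assumes that function accepts the same arguments as documented here and that datefixR 1.0.0 or later is installed. The data frame us.dates and its column names are hypothetical.

```r
library(datefixR)

# Hypothetical data: dates typed by US-based users, so month comes first.
us.dates <- data.frame(
  patient = c("a", "b", "c"),
  visit = c("02/05/92", "1996/05/01", "05/1990")
)

# format = "mdy" applies to the first value; the second starts with a year,
# so per the format description it is read as YMD regardless.
# The third value has no day, so day.impute supplies one.
us.fixed <- fix_date_df(
  us.dates,
  col.names = "visit",
  id = "patient",
  format = "mdy",
  day.impute = 15,
  month.impute = 1
)
```

Setting day.impute or month.impute to NULL instead would make the call fail on incomplete dates, which may be preferable when imputed values would be misleading.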
See also
fix_date
Similar to fix_dates() except can only be applied to character objects.
Examples
bad.dates <- data.frame(
  id = seq(5),
  some.dates = c(
    "02/05/92",
    "01-04-2020",
    "1996/05/01",
    "2020-05-01",
    "02-04-96"
  ),
  some.more.dates = c(
    "2015",
    "02/05/00",
    "05/1990",
    "2012-08",
    "jan 2020"
  )
)
fixed.df <- fix_dates(bad.dates, c("some.dates", "some.more.dates"))
#> Warning: `fix_dates()` was deprecated in datefixR 1.0.0.
#> ℹ Please use `fix_date_df()` instead.
# ->
fixed.df <- fix_date_df(bad.dates, c("some.dates", "some.more.dates"))