
The check_duplicates() function subsets rows of data, retaining only rows that share an IP address and/or a latitude and longitude with at least one other row. The function is written to work with data from Qualtrics surveys.

Usage

check_duplicates(
  x,
  id_col = "ResponseId",
  ip_col = "IPAddress",
  location_col = c("LocationLatitude", "LocationLongitude"),
  rename = TRUE,
  dupl_ip = TRUE,
  dupl_location = TRUE,
  include_na = FALSE,
  keep = FALSE,
  quiet = FALSE,
  print = TRUE
)

Arguments

x

Data frame (preferably imported from Qualtrics using {qualtRics}).

id_col

Column name for unique row ID (e.g., participant).

ip_col

Column name for IP addresses.

location_col

Two-element vector specifying the column names for latitude and longitude (in that order).

rename

Logical indicating whether to rename columns (using rename_columns()).

dupl_ip

Logical indicating whether to check IP addresses.

dupl_location

Logical indicating whether to check latitude and longitude.

include_na

Logical indicating whether to include rows with NAs for IP address and location as potentially excluded rows.

keep

Logical indicating whether to keep or remove exclusion column.

quiet

Logical indicating whether to suppress messages printed to the console.

print

Logical indicating whether to print returned tibble to console.
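The effect of include_na can be pictured with a small base R sketch. This is only an illustration of one reading of the argument description above, not the package's implementation; the vector ip and both flag_* names are hypothetical.

```r
# Made-up IP addresses with two missing values.
ip <- c("10.0.0.1", "10.0.0.1", NA, "10.0.0.2", NA)

# Flag every member of a duplicated group (first occurrence included).
dup <- duplicated(ip) | duplicated(ip, fromLast = TRUE)

# Assumed reading of include_na = FALSE: missing values are never flagged,
# even though base R treats two NAs as matching each other.
flag_excluding_na <- dup & !is.na(ip)

# Assumed reading of include_na = TRUE: rows with a missing value are
# flagged alongside the true duplicates.
flag_including_na <- flag_excluding_na | is.na(ip)

which(flag_excluding_na)
which(flag_including_na)
```

Under this reading, turning include_na on can only add rows to the result. It never removes rows that were already flagged as duplicates.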

Value

An object of the same type as x that includes the rows with duplicate IP addresses and/or locations. This includes a column called dupe_count that returns the number of duplicates. For a function that marks these rows, use mark_duplicates(). For a function that excludes these rows, use exclude_duplicates().

Details

To record this information in your Qualtrics survey, you must ensure that Anonymize responses is disabled.

Default column names are set based on output from qualtRics::fetch_survey(). By default, IP address and location are both checked, but they can be checked separately with the dupl_ip and dupl_location arguments.

The function outputs to console separate messages about the number of rows with duplicate IP addresses and rows with duplicate locations. These counts are computed independently, so rows may be counted for both types of duplicates.
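A minimal base R sketch shows why a row can be counted for both types of duplicates. It assumes a toy data frame with the default Qualtrics column names and made-up values. The helper flag_duplicates() is hypothetical and is not part of the package, whose internals may differ.

```r
# Toy data using the default Qualtrics column names (values are made up).
responses <- data.frame(
  ResponseId = c("R_1", "R_2", "R_3", "R_4", "R_5"),
  IPAddress = c("10.0.0.1", "10.0.0.1", "10.0.0.2", NA, "10.0.0.3"),
  LocationLatitude = c(40.1, 40.1, 40.1, 35.5, NA),
  LocationLongitude = c(-75.2, -75.2, -75.2, -120.3, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: flag every member of a duplicated group, not just the
# second and later occurrences. Rows with an NA in any key column are not flagged.
flag_duplicates <- function(keys) {
  keys <- as.data.frame(keys)
  complete <- stats::complete.cases(keys)
  dup <- duplicated(keys) | duplicated(keys, fromLast = TRUE)
  dup & complete
}

# The two checks run independently of each other.
dup_ip <- flag_duplicates(responses["IPAddress"])
dup_location <- flag_duplicates(
  responses[c("LocationLatitude", "LocationLongitude")]
)

n_dup_ip <- sum(dup_ip)             # rows sharing an IP address
n_dup_location <- sum(dup_location) # rows sharing a latitude/longitude pair

# Rows returned when both checks are enabled: flagged by either check.
flagged <- responses[dup_ip | dup_location, ]
```

In this toy data, rows R_1 and R_2 share both an IP address and a location, so they contribute to both counts, while R_3 is flagged by location only. The two counts therefore sum to more than the number of flagged rows.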

See also

Other duplicates functions: exclude_duplicates(), mark_duplicates()

Other check functions: check_duration(), check_ip(), check_location(), check_preview(), check_progress(), check_resolution()

Examples

# Check for duplicate IP addresses and locations
data(qualtrics_text)
check_duplicates(qualtrics_text)
#>  2 NAs were found in IP addresses.
#>  7 out of 7 rows had duplicate IP addresses.
#>  1 NA was found in location.
#>  10 out of 10 rows had duplicate locations.
#>              StartDate             EndDate     Status    IPAddress Progress
#> 1  2020-12-11 12:41:23 2020-12-11 12:44:37 IP Address  24.195.91.0      100
#> 2  2020-12-17 15:40:53 2020-12-17 15:43:25 IP Address   22.51.31.0       99
#> 3  2020-12-17 15:40:52 2020-12-17 15:43:39 IP Address 32.164.134.0      100
#> 4  2020-12-17 15:41:17 2020-12-17 15:45:42 IP Address  24.195.91.0      100
#> 5  2020-12-17 15:42:47 2020-12-17 15:46:26 IP Address  55.73.114.0      100
#> 6  2020-12-17 15:42:18 2020-12-17 15:48:00 IP Address  55.73.114.0      100
#> 7  2020-12-17 15:40:57 2020-12-17 15:48:56 IP Address   6.79.107.0      100
#> 8  2020-12-17 15:46:51 2020-12-17 15:51:38 IP Address   22.51.31.0      100
#> 9  2020-12-17 15:48:53 2020-12-17 15:53:48 IP Address   22.51.31.0      100
#> 10 2020-12-17 15:48:48 2020-12-17 15:54:12 IP Address 54.232.129.0      100
#>    Duration (in seconds) Finished        RecordedDate        ResponseId
#> 1                    177     TRUE 2020-12-11 12:44:37 R_LAt58JGEyKNWZlB
#> 2                    879    FALSE 2020-12-17 15:43:25 R_AkQyJypPyjgribz
#> 3                    375     TRUE 2020-12-17 15:43:39 R_H5MqcQoWznreNBt
#> 4                    521     TRUE 2020-12-17 15:45:42 R_GNVaLC9Sb2ZDzQP
#> 5                    236     TRUE 2020-12-17 15:46:26 R_7UzegytocfkyrWC
#> 6                    526     TRUE 2020-12-17 15:48:00 R_NiK6d3RgjuJh1OI
#> 7                    397     TRUE 2020-12-17 15:48:56 R_8ezIj0X0p2lJuCQ
#> 8                    872     TRUE 2020-12-17 15:51:39 R_Gbz5en48KgnCXT7
#> 9                    246     TRUE 2020-12-17 15:53:48 R_AJfrQqClQNvWIch
#> 10                   149     TRUE 2020-12-17 15:54:12 R_Kc9BGXO793zEqHM
#>    LocationLatitude LocationLongitude UserLanguage Browser       Version
#> 1          40.33554         -75.92698           EN  Chrome 86.0.4240.198
#> 2          37.28265        -120.50248           EN  Chrome  87.0.4280.88
#> 3          45.50412        -122.78665           EN  Chrome  87.0.4280.88
#> 4          40.33554         -75.92698           EN  Chrome  87.0.4280.88
#> 5          28.56411         -81.54902           EN  Chrome  87.0.4280.88
#> 6          28.56411         -81.54902           EN  Chrome  87.0.4280.88
#> 7          45.50412        -122.78665           EN  Chrome  87.0.4280.88
#> 8          37.28265        -120.50248           EN Firefox          83.0
#> 9          37.28265        -120.50248           EN    Edge   84.0.522.52
#> 10         45.50412        -122.78665           EN  Chrome  87.0.4280.88
#>    Operating System Resolution
#> 1         Macintosh   1280x800
#> 2   Windows NT 10.0   1366x768
#> 3   Windows NT 10.0  1920x1080
#> 4    Windows NT 6.1   1366x768
#> 5   Windows NT 10.0  1920x1080
#> 6   Windows NT 10.0   1536x864
#> 7   Windows NT 10.0   1536x864
#> 8   Windows NT 10.0   1440x960
#> 9   Windows NT 10.0  1920x1080
#> 10  Windows NT 10.0  1920x1080

# Check only for duplicate IP addresses
qualtrics_text %>%
  check_duplicates(dupl_location = FALSE)
#>  2 NAs were found in IP addresses.
#>  7 out of 7 rows had duplicate IP addresses.
#>             StartDate             EndDate     Status   IPAddress Progress
#> 1 2020-12-11 12:41:23 2020-12-11 12:44:37 IP Address 24.195.91.0      100
#> 2 2020-12-17 15:40:53 2020-12-17 15:43:25 IP Address  22.51.31.0       99
#> 3 2020-12-17 15:41:17 2020-12-17 15:45:42 IP Address 24.195.91.0      100
#> 4 2020-12-17 15:42:47 2020-12-17 15:46:26 IP Address 55.73.114.0      100
#> 5 2020-12-17 15:42:18 2020-12-17 15:48:00 IP Address 55.73.114.0      100
#> 6 2020-12-17 15:46:51 2020-12-17 15:51:38 IP Address  22.51.31.0      100
#> 7 2020-12-17 15:48:53 2020-12-17 15:53:48 IP Address  22.51.31.0      100
#>   Duration (in seconds) Finished        RecordedDate        ResponseId
#> 1                   177     TRUE 2020-12-11 12:44:37 R_LAt58JGEyKNWZlB
#> 2                   879    FALSE 2020-12-17 15:43:25 R_AkQyJypPyjgribz
#> 3                   521     TRUE 2020-12-17 15:45:42 R_GNVaLC9Sb2ZDzQP
#> 4                   236     TRUE 2020-12-17 15:46:26 R_7UzegytocfkyrWC
#> 5                   526     TRUE 2020-12-17 15:48:00 R_NiK6d3RgjuJh1OI
#> 6                   872     TRUE 2020-12-17 15:51:39 R_Gbz5en48KgnCXT7
#> 7                   246     TRUE 2020-12-17 15:53:48 R_AJfrQqClQNvWIch
#>   LocationLatitude LocationLongitude UserLanguage Browser       Version
#> 1         40.33554         -75.92698           EN  Chrome 86.0.4240.198
#> 2         37.28265        -120.50248           EN  Chrome  87.0.4280.88
#> 3         40.33554         -75.92698           EN  Chrome  87.0.4280.88
#> 4         28.56411         -81.54902           EN  Chrome  87.0.4280.88
#> 5         28.56411         -81.54902           EN  Chrome  87.0.4280.88
#> 6         37.28265        -120.50248           EN Firefox          83.0
#> 7         37.28265        -120.50248           EN    Edge   84.0.522.52
#>   Operating System Resolution
#> 1        Macintosh   1280x800
#> 2  Windows NT 10.0   1366x768
#> 3   Windows NT 6.1   1366x768
#> 4  Windows NT 10.0  1920x1080
#> 5  Windows NT 10.0   1536x864
#> 6  Windows NT 10.0   1440x960
#> 7  Windows NT 10.0  1920x1080

# Do not print rows to console
qualtrics_text %>%
  check_duplicates(print = FALSE)
#>  2 NAs were found in IP addresses.
#>  7 out of 7 rows had duplicate IP addresses.
#>  1 NA was found in location.
#>  10 out of 10 rows had duplicate locations.

# Do not print message to console
qualtrics_text %>%
  check_duplicates(quiet = TRUE)
#>              StartDate             EndDate     Status    IPAddress Progress
#> 1  2020-12-11 12:41:23 2020-12-11 12:44:37 IP Address  24.195.91.0      100
#> 2  2020-12-17 15:40:53 2020-12-17 15:43:25 IP Address   22.51.31.0       99
#> 3  2020-12-17 15:40:52 2020-12-17 15:43:39 IP Address 32.164.134.0      100
#> 4  2020-12-17 15:41:17 2020-12-17 15:45:42 IP Address  24.195.91.0      100
#> 5  2020-12-17 15:42:47 2020-12-17 15:46:26 IP Address  55.73.114.0      100
#> 6  2020-12-17 15:42:18 2020-12-17 15:48:00 IP Address  55.73.114.0      100
#> 7  2020-12-17 15:40:57 2020-12-17 15:48:56 IP Address   6.79.107.0      100
#> 8  2020-12-17 15:46:51 2020-12-17 15:51:38 IP Address   22.51.31.0      100
#> 9  2020-12-17 15:48:53 2020-12-17 15:53:48 IP Address   22.51.31.0      100
#> 10 2020-12-17 15:48:48 2020-12-17 15:54:12 IP Address 54.232.129.0      100
#>    Duration (in seconds) Finished        RecordedDate        ResponseId
#> 1                    177     TRUE 2020-12-11 12:44:37 R_LAt58JGEyKNWZlB
#> 2                    879    FALSE 2020-12-17 15:43:25 R_AkQyJypPyjgribz
#> 3                    375     TRUE 2020-12-17 15:43:39 R_H5MqcQoWznreNBt
#> 4                    521     TRUE 2020-12-17 15:45:42 R_GNVaLC9Sb2ZDzQP
#> 5                    236     TRUE 2020-12-17 15:46:26 R_7UzegytocfkyrWC
#> 6                    526     TRUE 2020-12-17 15:48:00 R_NiK6d3RgjuJh1OI
#> 7                    397     TRUE 2020-12-17 15:48:56 R_8ezIj0X0p2lJuCQ
#> 8                    872     TRUE 2020-12-17 15:51:39 R_Gbz5en48KgnCXT7
#> 9                    246     TRUE 2020-12-17 15:53:48 R_AJfrQqClQNvWIch
#> 10                   149     TRUE 2020-12-17 15:54:12 R_Kc9BGXO793zEqHM
#>    LocationLatitude LocationLongitude UserLanguage Browser       Version
#> 1          40.33554         -75.92698           EN  Chrome 86.0.4240.198
#> 2          37.28265        -120.50248           EN  Chrome  87.0.4280.88
#> 3          45.50412        -122.78665           EN  Chrome  87.0.4280.88
#> 4          40.33554         -75.92698           EN  Chrome  87.0.4280.88
#> 5          28.56411         -81.54902           EN  Chrome  87.0.4280.88
#> 6          28.56411         -81.54902           EN  Chrome  87.0.4280.88
#> 7          45.50412        -122.78665           EN  Chrome  87.0.4280.88
#> 8          37.28265        -120.50248           EN Firefox          83.0
#> 9          37.28265        -120.50248           EN    Edge   84.0.522.52
#> 10         45.50412        -122.78665           EN  Chrome  87.0.4280.88
#>    Operating System Resolution
#> 1         Macintosh   1280x800
#> 2   Windows NT 10.0   1366x768
#> 3   Windows NT 10.0  1920x1080
#> 4    Windows NT 6.1   1366x768
#> 5   Windows NT 10.0  1920x1080
#> 6   Windows NT 10.0   1536x864
#> 7   Windows NT 10.0   1536x864
#> 8   Windows NT 10.0   1440x960
#> 9   Windows NT 10.0  1920x1080
#> 10  Windows NT 10.0  1920x1080