Most APIs require some sort of authentication, which requires a secret: something that you know, and the server knows, but no one else does. Because vcr saves requests and responses to disk, you need to make sure that you don’t accidentally reveal any secrets to the world.
The most common places to worry about secrets are in requests, because you need to tell the server who you are. There are three places that a secret can turn up:
In the request headers. The most common way to authenticate with an API is through the
Authorization
header.In a query parameter.
In the request body.
It’s also possible, but rarer, for you to get a secret back in the response body. This typically occurs if you’re creating some new resource.
Because request headers are the most common place for secrets to crop
up, vcr takes effort to make sure you don’t accidentally expose them.
Firstly, vcr will only record headers to disk if you’ve specifically
requested to use them for matching (with
match_requests_on = "header
“). Secondly, even if you do
match on headers, vcr will automatically redact the
Authorization
header.
If your secrets live somewhere else, read on to ensure that you don’t save them in your cassettes.
Query parameters
Some (poorly designed) APIs include secrets in the URL. (This is bad
practice because URLs are recorded in all sorts of places like logs,
browser history, and bookmarks.) You can use the
filter_query_parameters
config to strip away these secrets.
There are three way to use it:
-
Completely drop a parameter.
vcr_configure(filter_query_parameters = "user")
-
Replace parameter value with a specified value:
vcr_configure(filter_query_parameters = list(api_key = "<api-key>"))
-
Replace the parameter value when writing to disk and restore another value when reading:
vcr_configure( filter_query_parameters = list(api_key = c(Sys.getenv("MY_API_KEY"), "foo")) )
Beware of your match_requests_on
option when using this
filter. If you filter out a query parameter it’s probably a bad idea to
match on query
given that there is no way for vcr to
restore the exact http request from your cassette after one or more
query parameters is removed or changed. One way you could filter a query
parameter and still match on query or at least on the complete uri is to
use replacement behavior (a named list), but instead of
list(a="b")
use two values list(a=c("b","c"))
,
where “c” is the string to be stored in the cassette. You could of
course replace those values with values from environment variables so
that you obscure the real values if your code is public.
Request and response headers
By default, vcr doesn’t save request headers to disk, so this section
only applies if you’re using match_requests_on = "header"
or the secrets are found in the respond headers.
vcr provides two config options that lets you modified request and
response headers: filter_request_headers
and
filter_response_headers
. You can use these options to
either remove a header entirely by supplying a character vector:
vcr_configure(filter_request_headers = "password")
vcr_configure(filter_response_headers = c("secret1", "secret2"))
Or you can use a named list, where the name is the header to modify and the values is the value to replace it with:
vcr_configure(filter_request_headers = list(password = "<password>"))
vcr_configure(
filter_response_headers = list(
secret1 = "<secret1>",
secret2 = "<secret2>"
)
)
If you use httr2, note that vcr will automatically handle redacted
headers created by httr2::req_headers_redacted()
.
URL, headers, and body
If neither of the previous techniques work, there’s the nuclear
option: filter_sensitive_data
. This is applied to every
component of the request and the response that is serialized to
disk.
vcr_configure(
filter_sensitive_data = list("<api_key>" = Sys.getenv('MY_API_KEY'))
)
This is a bidirectional replacement, so that when writing your API
key is replaced with <api_key>
, and when reading
<api_key>
is replaced with your API key.