Package index
-
as.list(<robotstxt_text>)
- Convert robotstxt_text to list
-
fix_url()
- Add the http protocol if missing from a URL
-
get_robotstxt()
- Download a robots.txt file
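A minimal usage sketch for `get_robotstxt()`; the domain below is purely illustrative:

```r
library(robotstxt)

# Download a domain's robots.txt file as plain text
# (the domain used here is an example, not a recommendation)
rt_txt <- get_robotstxt(domain = "wikipedia.org")

# The result is the raw robots.txt content; view it with cat()
cat(rt_txt)
```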
-
rt_last_http
get_robotstxt_http_get()
- Storage for HTTP request response objects
-
get_robotstxts()
- Download multiple robots.txt files
-
guess_domain()
- Guess a domain from a path
-
http_domain_changed()
- Check if HTTP domain changed
-
http_subdomain_changed()
- Check if HTTP subdomain changed
-
http_was_redirected()
- Check if HTTP redirect occurred
-
is_suspect_robotstxt()
- Check if retrieved content looks suspect, i.e. likely not an actual robots.txt file
-
is_valid_robotstxt()
- Check whether a file is a valid / parsable robots.txt file
-
list_merge()
- Merge a number of named lists in sequential order
-
null_to_default()
- Return default value if NULL
-
parse_robotstxt()
- Parse a robots.txt file
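A short sketch of parsing robots.txt text that is already in memory, assuming the parsed result contains a `permissions` table among its components:

```r
library(robotstxt)

# Parse robots.txt text into structured tables
txt <- "User-agent: *\nDisallow: /private/\n"
rt_parsed <- parse_robotstxt(txt)

# Inspect the permissions component of the parsed result
rt_parsed$permissions
```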
-
paths_allowed()
- Check if a bot has permissions to access page(s)
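A hedged sketch of the typical `paths_allowed()` call; the paths and domain are illustrative placeholders:

```r
library(robotstxt)

# Check whether the given paths may be crawled on the given domain;
# returns one logical value per path
paths_allowed(
  paths  = c("/api/", "/images/"),
  domain = "wikipedia.org",
  bot    = "*"
)
```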
-
paths_allowed_worker_spiderbar()
- Worker function checking page access permissions via the spiderbar package
-
%>%
- Re-export of the magrittr pipe operator
-
print(<robotstxt>)
- Print robotstxt
-
print(<robotstxt_text>)
- Print robotstxt's text
-
remove_domain()
- Remove the domain from a path
-
request_handler_handler()
- Helper for applying robotstxt request handlers
-
robotstxt()
- Generate a representation of a robots.txt file
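A sketch of working with a `robotstxt` object rather than individual helper calls; the domain and paths are illustrative:

```r
library(robotstxt)

# Build a robotstxt object for a domain; the file is retrieved
# and parsed when the object is created
rtxt <- robotstxt(domain = "wikipedia.org")

# The object bundles the parsed rules with a check() method
rtxt$check(paths = c("/api/", "/images/"), bot = "*")
```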
-
rt_cache
- Get the robotstxt cache
-
rt_request_handler()
on_server_error_default
on_client_error_default
on_not_found_default
on_redirect_default
on_domain_change_default
on_sub_domain_change_default
on_file_type_mismatch_default
on_suspect_content_default
- Handle robotstxt object retrieved from HTTP request