
All functions

as.list(<robotstxt_text>)
Convert robotstxt_text to list
fix_url()
Add http protocol if missing from URL
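For illustration (the input URLs are made up), a minimal sketch of what fix_url() is meant to do:

    fix_url("www.example.com")
    # should return "http://www.example.com", since the protocol prefix is missing
    fix_url("https://www.example.com")
    # a URL that already carries a protocol should come back unchanged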
get_robotstxt()
Download a robots.txt file
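A basic usage sketch (the domain is illustrative; the call performs a real HTTP request):

    rtxt <- get_robotstxt("wikipedia.org")   # download the robots.txt for a domain
    rtxt                                     # prints the raw robots.txt text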
rt_last_http, get_robotstxt_http_get()
Storage for HTTP request response objects
get_robotstxts()
Download multiple robotstxt files
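A sketch of the vectorized variant, assuming a character vector of domains as the first argument (domains chosen for illustration):

    get_robotstxts(c("wikipedia.org", "r-project.org"))
    # one download per domain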
guess_domain()
Guess a domain from path
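An illustrative call (the input is an assumption, not from the documentation):

    guess_domain("wikipedia.org/wiki/Robots_exclusion_standard")
    # should yield the domain portion, here "wikipedia.org"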
http_domain_changed()
Check if HTTP domain changed
http_subdomain_changed()
Check if HTTP subdomain changed
http_was_redirected()
Check if HTTP redirect occurred
is_suspect_robotstxt()
Check if downloaded content is suspect, i.e. probably not a real robots.txt file
is_valid_robotstxt()
Check if a file is a valid / parsable robots.txt file
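A minimal check on robots.txt content given as text (the example strings are invented):

    is_valid_robotstxt("User-agent: *\nDisallow: /private/")
    # expected TRUE for well-formed content
    is_valid_robotstxt("<html><body>Not found</body></html>")
    # expected FALSE, an HTML error page is not a parsable robots.txt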
list_merge()
Merge a number of named lists in sequential order
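An illustrative call (the input lists are made up); per the description, the named lists are merged in sequential order:

    list_merge(list(a = 1, b = 2), list(b = 3, c = 4))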
null_to_default()
Return default value if NULL
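A sketch assuming the first argument is the value and the second the fallback:

    null_to_default(NULL, "fallback")   # presumably returns "fallback"
    null_to_default(42, "fallback")     # presumably returns 42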
parse_robotstxt()
Parse a robots.txt file
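A sketch parsing robots.txt content supplied as a string (the content is invented for illustration):

    txt <- "User-agent: *\nDisallow: /private/\nSitemap: https://example.com/sitemap.xml"
    parse_robotstxt(txt)
    # returns a list of data frames, e.g. permissions and sitemap entries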
paths_allowed()
Check if a bot has permissions to access page(s)
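A usage sketch (domain and paths are illustrative; the call downloads the robots.txt unless a cached or supplied copy is used):

    paths_allowed(
      paths  = c("/api/", "/wiki/Robots_exclusion_standard"),
      domain = "wikipedia.org",
      bot    = "*"
    )
    # one logical value per path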
paths_allowed_worker_spiderbar()
Check if a bot has permissions to access page(s), using the spiderbar backend
%>%
Re-export of the magrittr pipe operator
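Because the pipe is re-exported, package functions can be chained without attaching magrittr; for example (the domain is illustrative):

    "r-project.org" %>%
      get_robotstxt() %>%
      parse_robotstxt()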
print(<robotstxt>)
Print robotstxt
print(<robotstxt_text>)
Print robotstxt's text
remove_domain()
Remove domain from path
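An illustrative call stripping the domain part and keeping the path (the input URL is an assumption):

    remove_domain("https://wikipedia.org/wiki/Robots_exclusion_standard")
    # should leave only the path, e.g. "/wiki/Robots_exclusion_standard"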
request_handler_handler()
Helper that executes robotstxt request handlers
robotstxt()
Generate a representation of a robots.txt file
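A sketch of the object-oriented interface; the check() element is part of the documented robotstxt object (the domain is illustrative):

    rt <- robotstxt(domain = "wikipedia.org")      # downloads and parses the file
    rt$check(paths = c("/api/", "/wiki/"), bot = "*")
    # should return one TRUE/FALSE per path for the given bot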
rt_cache
Get the robotstxt cache
rt_request_handler(), on_server_error_default, on_client_error_default, on_not_found_default, on_redirect_default, on_domain_change_default, on_sub_domain_change_default, on_file_type_mismatch_default, on_suspect_content_default
Handle robotstxt object retrieved from HTTP request