Installing and loading package
Prior to streaming, make sure to install and load rtweet. This
vignette assumes users have already set up app access tokens (see the
“auth” vignette, vignette("auth", package = "rtweet")).
Overview
rtweet makes it possible to capture live streams of Twitter data.
There are two kinds of stream:
A filtered stream, which collects tweets matching a set of rules, via filtered_stream().
A sample stream, which collects about 1% of all published tweets, via sample_stream().
In either case we need to choose how long the streaming connection should stay open and which file the data should be saved to.
## Stream time in seconds so for one minute set timeout = 60
## For larger chunks of time, I recommend multiplying 60 by the number
## of desired minutes. This method scales up to hours as well
## (x * 60 = x mins, x * 60 * 60 = x hours)
## Stream for 5 seconds
streamtime <- 5
## Filename to save json data (backup)
filename <- "rstats.json"
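The arithmetic in the comments above can be sketched as two small helpers. The names minutes() and hours() are made up for illustration and are not part of rtweet; they only convert a duration to the number of seconds expected by the timeout argument.

```r
## Illustrative helpers (not part of rtweet): convert durations to seconds
minutes <- function(x) x * 60
hours <- function(x) x * 60 * 60

## Values that could be passed as `timeout`
timeout_5_min <- minutes(5)     # 5 * 60
timeout_2_hours <- hours(2)     # 2 * 60 * 60
```

Either value could then replace streamtime in the calls used later in this vignette.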
Filtered stream
The filtered stream collects tweets for all rules that are currently active, not just one rule or query.
Creating rules
Streaming rules in rtweet need a value and a tag. The value is the query to be performed, and the tag is a name that identifies the tweets matching that query. You can use multiple words and hashtags as the value; see the official documentation for details. Multiple rules can match a single tweet.
## Stream rules used to filter tweets
new_rule <- stream_add_rule(list(value = "#rstats", tag = "rstats"))
Listing rules
To check whether any rules are currently active, call stream_add_rule() with NULL:
rules <- stream_add_rule(NULL)
rules
#>   result_count                sent
#> 1            1 2023-03-19 22:04:29
rules(rules)
#>                    id   value    tag
#> 1 1637575790693842952 #rstats rstats
The rules() function returns the id, value and tag of each rule.
Removing rules
To remove rules, use stream_rm_rule():
# Not evaluated now
stream_rm_rule(ids(new_rule))
Note that if rules go unused for some time, Twitter warns you that
they will be removed. Also, because filtered_stream()
collects tweets for all rules, it is advisable to keep the rules list
short and clean.
filtered_stream()
Once these parameters are specified, initiate the stream. Note: barring any disconnection or disruption of the API, streaming will occupy your current instance of R until the specified time has elapsed. It is possible to start a new instance of R (streaming itself usually isn’t very memory intensive), but operations may drag a bit during the parsing process, which takes place immediately after streaming ends.
## Stream #rstats tweets
stream_rstats <- filtered_stream(timeout = streamtime, file = filename, parse = FALSE)
#> Warning: No matching tweets with streaming rules were found in the time provided.
If no tweet matching the rules is detected, a warning is issued.
Parsing larger streams can take quite a bit of time (in addition to time spent streaming) due to a somewhat time-consuming simplifying process used to convert a json file into an R object.
Don’t forget to clean the streaming rules:
stream_rm_rule(ids(new_rule))
#>                  sent deleted not_deleted
#> 1 2023-03-19 22:04:51       1           0
Sample stream
The sample_stream() function doesn’t need any rules. It only needs
the stream duration and the file to save the data to.
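As a rough sketch, a sample stream could be started the same way as the filtered stream. This assumes that sample_stream() accepts the same timeout, file and parse arguments used with filtered_stream() above, and that rtweet is installed with a token configured. The file name sample.json is made up for illustration.

```r
## Sketch only: assumes sample_stream() takes the same arguments
## as the filtered_stream() call shown earlier in this vignette
streamtime <- 5
sample_file <- "sample.json"  # hypothetical file name

## Only attempt the stream in an interactive session with rtweet available
if (interactive() && requireNamespace("rtweet", quietly = TRUE)) {
  stream_sample <- rtweet::sample_stream(
    timeout = streamtime,
    file = sample_file,
    parse = FALSE
  )
}
```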
Saving files
Users may want to stream tweets into json files upfront and parse
those files later on. To do this, simply add parse = FALSE and make
sure you provide a path (file name) to a location you can find later.
You can also use append = TRUE to continue recording a stream into an
already existing file.
Currently, parsing the streaming data file with parse_stream() is not
functional. However, you can read it back in with
jsonlite::stream_in(file(filename)), since stream_in() expects a
connection rather than a file name.
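A minimal sketch of reading a saved file back in with jsonlite follows. It assumes the file holds one JSON object per line, which is the format stream_in() reads. The two records are made-up stand-ins, much simpler than real Twitter output, so the example runs without API access.

```r
library(jsonlite)

## Stand-in for a file written with parse = FALSE:
## one JSON object per line (made-up records, not real Twitter data)
path <- tempfile(fileext = ".json")
writeLines(c(
  '{"id": "1", "text": "first example"}',
  '{"id": "2", "text": "second example"}'
), path)

## stream_in() expects a connection, so wrap the path in file()
tweets <- stream_in(file(path), verbose = FALSE)
head(tweets)
```

With a real stream file, replace path with the file name passed to filtered_stream() or sample_stream(). The resulting columns will be nested and will need further unpacking.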