Introduction to the essurvey package
Using the essurvey package is fairly easy. There are two main families of functions: import_* and show_*. They complement each other and allow the user to almost never have to go to the European Social Survey (ESS) website. The only time you need to visit the ESS website is to register and validate your email. If you haven't registered, create an account at http://www.europeansocialsurvey.org/user/new. For those unfamiliar with the ESS, this vignette uses the term rounds, a synonym of waves, to denote the same survey at different points in time.
Once you register, visit your email account to validate the account, and you're ready to access the data.
Given that some essurvey functions require your email address, this vignette uses a fake email, but everything should work the same way with the email you registered with the ESS.
Note: essurvey versions up to and including 1.0.1 returned the wrong countries. Please install the latest CRAN/GitHub version.
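If you are unsure whether your installed copy is affected, a small check like the one below can flag it. This is a base-R sketch: is_affected_version() is a hypothetical helper, and the 1.0.1 threshold is taken from the note above.

```r
# Hypothetical helper: TRUE for versions up to and including 1.0.1
is_affected_version <- function(version) {
  utils::compareVersion(as.character(version), "1.0.1") <= 0
}

is_affected_version("1.0.1")  # TRUE
is_affected_version("1.0.2")  # FALSE

# Check the installed copy, if there is one
if (requireNamespace("essurvey", quietly = TRUE) &&
    is_affected_version(utils::packageVersion("essurvey"))) {
  message("Installed essurvey is 1.0.1 or older; please update it.")
}
```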
To install and load the development version of the package, use:
# install.packages("devtools")
devtools::install_github("ropensci/essurvey")
library(essurvey)
To install the stable version from CRAN, use:
install.packages("essurvey")
Downloading ESS data requires your registered email address every time you download. To avoid typing it repeatedly, we can set our email as an environment variable with
Once that's executed, you can delete the previous line, and any import_* call will automatically look for the email stored in the environment variable.
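The general pattern behind this is an environment-variable lookup with a fallback. The base-R sketch below illustrates it; the variable name MY_ESS_EMAIL and the helper resolve_email() are hypothetical and not necessarily what essurvey uses internally.

```r
# Store the email once for the current R session (MY_ESS_EMAIL is a hypothetical name)
Sys.setenv(MY_ESS_EMAIL = "your@email.com")

# Hypothetical helper: prefer an explicit email, else fall back to the env var
resolve_email <- function(ess_email = NULL, var = "MY_ESS_EMAIL") {
  if (!is.null(ess_email)) return(ess_email)
  stored <- Sys.getenv(var)  # returns "" when the variable is unset
  if (identical(stored, "")) stop("No email supplied and ", var, " is not set")
  stored
}

resolve_email()                   # "your@email.com"
resolve_email("other@email.com")  # an explicit argument takes precedence
```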
Let's suppose you don't know which countries or rounds are available in the ESS. In that case, the show_* family of functions is your friend.
To find out which countries have participated, you can use
##  "Albania" "Austria" "Belgium"
##  "Bulgaria" "Croatia" "Cyprus"
##  "Czechia" "Denmark" "Estonia"
##  "Finland" "France" "Germany"
##  "Greece" "Hungary" "Iceland"
##  "Ireland" "Israel" "Italy"
##  "Kosovo" "Latvia" "Lithuania"
##  "Luxembourg" "Montenegro" "Netherlands"
##  "Norway" "Poland" "Portugal"
##  "Romania" "Russian Federation" "Serbia"
##  "Slovakia" "Slovenia" "Spain"
##  "Sweden" "Switzerland" "Turkey"
##  "Ukraine" "United Kingdom"
This function looks up the countries on the ESS website, so if new countries join, they will be picked up automatically. Let's check out Turkey. How many rounds has Turkey participated in? We can use
tk_rnds <- show_country_rounds("Turkey")
tk_rnds
##  2 4
Note that country names are case sensitive. Use the exact name as it appears in the country list above.
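Because matching is case sensitive, it can help to normalize user input against the official spellings before passing it to show_country_rounds. Below is a base-R sketch; exact_country() is a hypothetical helper, and the name vector is a hand-copied subset of the country list printed above rather than a live lookup.

```r
# A few names copied from the country list printed above (not fetched live)
ess_countries <- c("Czechia", "Russian Federation", "Turkey", "United Kingdom")

# Hypothetical helper: return the exact ESS spelling for a case-insensitive match
exact_country <- function(x, countries = ess_countries) {
  hit <- countries[tolower(countries) == tolower(trimws(x))]
  if (length(hit) == 0) stop("'", x, "' is not an ESS country name")
  hit
}

exact_country("turkey")            # "Turkey"
exact_country(" united kingdom ")  # "United Kingdom"
```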
Using this information, we can download those specific rounds easily with
As of essurvey 1.0.0, all ess_* functions have been deprecated in favor of the import_* and download_* functions.
turkey will now be a list of length(rounds) containing a data frame for each round. If you only specify one round, all import_* functions return a data frame instead.
import_country is useful when you want to download specific rounds, but not all. To download all rounds for a country automatically, you can use
The import_* family is concerned with downloading the data and thus always returns a list of data frames, unless only one round is specified, in which case it returns a tibble. Conversely, the show_* family grabs information from the ESS website and always returns vectors.
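Since an import_* call returns a list for several rounds but a single tibble for one round, downstream code may want to handle both cases uniformly. A minimal base-R sketch, assuming a hypothetical helper as_round_list() and toy data frames standing in for downloaded rounds (a tibble also inherits from data.frame, so is.data.frame() is TRUE for it):

```r
# Hypothetical helper: always return a list of data frames, one per round
as_round_list <- function(x) {
  if (is.data.frame(x)) list(x) else x
}

# Toy stand-ins for downloaded rounds (not real ESS data)
one_round  <- data.frame(essround = 2, cntry = "TR")
two_rounds <- list(data.frame(essround = 2, cntry = "TR"),
                   data.frame(essround = 4, cntry = "TR"))

length(as_round_list(one_round))   # 1
length(as_round_list(two_rounds))  # 2

# Every element can now be processed the same way, e.g. counting rows per round
vapply(as_round_list(two_rounds), nrow, integer(1))
```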
Similarly, we can use other functions to download rounds. To see which rounds are currently available, use
##  1 2 3 4 5 6 7 8 9
show_rounds looks up the rounds on the ESS website each time it is called, so any future rounds will be included automatically.
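Because the show_* functions return plain vectors, ordinary set operations work on their results. As a sketch, using the values printed above (typed in by hand here; in a live session they would come from show_country_rounds("Turkey") and show_rounds()), the rounds Turkey did not take part in are:

```r
# Values copied from the outputs above rather than fetched from the ESS website
tk_rnds  <- c(2, 4)
all_rnds <- 1:9

missing_rnds <- setdiff(all_rnds, tk_rnds)
missing_rnds  # 1 3 5 6 7 8 9
```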
To download all available rounds, use
all_rounds <- import_all_rounds()
Alternatively, use import_rounds to download selected rounds.
The import_* functions each have an equivalent download_* function that allows the user to save the datasets to a specified folder.
For example, to save round two from Turkey in a folder called ./myfolder, we use:
download_country("Turkey", 2, output_dir = "./myfolder/")
By default, it saves the data as 'stata' files. Alternatively, you can choose another format, such as 'sas':
download_country("Turkey", 2, output_dir = "./myfolder/", format = 'sas')
This will save the data to ./myfolder/ESS_Turkey, and inside that folder there will be an ESS2 folder containing the data.
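If a script needs to find the saved files afterwards, the folder layout described above can be rebuilt with file.path(). This sketch assumes the ESS_<country>/ESS<round> naming from the Turkey example holds for other countries and rounds; expected_ess_dir() is a hypothetical helper, not part of essurvey.

```r
# Hypothetical helper: folder where downloaded files are expected, assuming
# the ESS_<country>/ESS<round> layout from the Turkey example
expected_ess_dir <- function(output_dir, country, round) {
  base <- sub("/+$", "", output_dir)  # drop trailing slashes to avoid "//"
  file.path(base, paste0("ESS_", country), paste0("ESS", round))
}

expected_ess_dir("./myfolder/", "Turkey", 2)  # "./myfolder/ESS_Turkey/ESS2"

# List what was saved (character(0) if the folder does not exist yet)
list.files(expected_ess_dir("./myfolder/", "Turkey", 2))
```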
Whenever you download ESS data, it comes together with a script that recodes the values 6 = ‘Not applicable’, 7 = ‘Refusal’, 8 = ‘Don’t know’, and 9 = ‘No answer’ or ‘Not available’ as missing. However, those codes apply only to variables with a 1-5 scale. For variables with a 1-10 scale, the corresponding missing codes are 66, 77, and so on. New users might not know this at first and start calculating statistics with these variables, such as…
…but that vector contains values such as 77 that shouldn’t be there.
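To see why this matters, here is a toy base-R illustration with made-up values (not ESS data) of how codes like 77 and 88 inflate a mean on a 0-10 scale, and how setting them to NA changes the result. The helper codes_to_na() is hypothetical; on real data the package's recode_missings() is the intended tool.

```r
# Made-up responses on a 0-10 scale with ESS-style two-digit missing codes
x <- c(3, 5, 77, 2, 88, 6)

mean(x)  # about 30.17, dominated by the codes 77 and 88

# Hypothetical helper: turn the given missing codes into NA
codes_to_na <- function(x, codes = c(66, 77, 88, 99)) {
  x[x %in% codes] <- NA
  x
}

clean <- codes_to_na(x)
clean                      # 3 5 NA 2 NA 6
mean(clean, na.rm = TRUE)  # 4
```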
recode_missings() sets the corresponding missing codes to NA for numeric variables as well as for character variables. It accepts the complete tibble and recodes all variables that should be recoded.
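The idea of recoding a whole data frame at once, numeric codes and character labels alike, can be sketched in base R as below. The helper recode_all_missings(), the code and label sets, and the toy data frame are assumptions for illustration, not essurvey's implementation. In particular, the sketch applies the same two-digit codes to every numeric column; a 1-5 item would need 6:9 instead.

```r
# Assumed codes: two-digit numeric codes and the matching character labels
num_codes  <- c(66, 77, 88, 99)
chr_labels <- c("Not applicable", "Refusal", "Don't know", "No answer")

# Hypothetical helper: set missing codes to NA in every numeric and character column
recode_all_missings <- function(df, codes = num_codes, labels = chr_labels) {
  df[] <- lapply(df, function(col) {
    if (is.numeric(col))   col[col %in% codes]  <- NA
    if (is.character(col)) col[col %in% labels] <- NA
    col
  })
  df
}

# Toy data frame with made-up values (not real ESS responses)
toy <- data.frame(
  num_var = c(2, 77, 5),
  chr_var = c("Quite interested", "Refusal", "Don't know"),
  stringsAsFactors = FALSE
)

recode_all_missings(toy)
```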
It also gives you the option of recoding only specific categories. For example…
other_newcoding <- recode_missings(sp, c("Don't know", "Refusal"))
table(other_newcoding$tvpol)
#   0   1   2   3   4   5   6   7  66
# 167 460 610 252  95  36  26  31  45
…still contains a missing-value code (66, ‘Not applicable’) but has recoded the categories that were specified. I strongly suggest not recoding these categories as missing without first looking at the data, as there might be substantial differences between people who did and did not answer the questions. If you decide to do so, use recode_missings to recode everything, or the corresponding recode_*_missings functions to recode numeric and character variables separately. See ?recode_missings for more information.
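Recoding only selected categories, as in the tvpol example above, amounts to mapping the chosen labels to their codes and setting only those to NA. A base-R sketch, assuming a hypothetical label-to-code mapping and helper recode_selected() (not the package's implementation), on made-up 0-7 responses:

```r
# Assumed mapping between ESS two-digit codes and their labels
missing_labels <- c("Not applicable" = 66, "Refusal" = 77,
                    "Don't know" = 88, "No answer" = 99)

# Hypothetical helper: set to NA only the codes whose labels are listed
recode_selected <- function(x, labels, mapping = missing_labels) {
  x[x %in% mapping[labels]] <- NA
  x
}

# Made-up 0-7 responses containing three different missing codes
x <- c(0, 3, 66, 77, 88, 7)
recode_selected(x, c("Don't know", "Refusal"))
# 66 ('Not applicable') is kept; 77 and 88 become NA
```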