Final project tutorial: ANES 2024
Prerequisite: Install required packages
You only need to install these packages once. For your actual final projects, skip this step.
install.packages(c("pak", "labelled"))
pak::pak("tidy-survey-r/srvyrexploR")
install.packages("gtsummary") # optional for Step 10
Step 1: Load required packages
Remember: you need to load the tidyverse package every time you start a new R project. Other applicable packages depend on the project. For your final projects, load the following libraries:
library(tidyverse)
library(srvyrexploR)
library(labelled)
library(gtsummary) # optional for Step 10
Step 2: Import the dataset
Because the data come with the srvyrexploR package, we load them directly into R with data(). Once loaded, the dataset will appear in the Environment tab.
data(anes_2024)
Step 3: Identify your variables
To find variables that you will be using in your final project, you need to know their names or labels. You have several ways to get that information.
Option 1: Search in R using the labelled package. In the following code, replace “Trust” with a keyword that interests you. Check whether each variable has missing or abnormal values. Once you have identified all of your variables, take note of their names (e.g., TrustGovernment).
look_for(anes_2024, "Trust")
 pos variable        label               col_type missing values
 11  TrustGovernment PRE: How often tru~ fct      16      1. Always
                                                          2. Most of the time
                                                          3. About half the ~
                                                          4. Some of the time
                                                          5. Never
 12  TrustPeople     PRE: How often can~ fct      13      1. Always
                                                          2. Most of the time
                                                          3. About half the ~
                                                          4. Some of the time
                                                          5. Never
 44  V241229         PRE: How often tru~ dbl+lbl  0       [-9] -9. Refused
                                                          [-8] -8. Don't know
                                                          [-1] -1. Inapplica~
                                                          [1] 1. Always
                                                          [2] 2. Most of the~
                                                          [3] 3. About half ~
                                                          [4] 4. Some of the~
                                                          [5] 5. Never
 45  V241234         PRE: How often can~ dbl+lbl  0       [-9] -9. Refused
                                                          [-1] -1. Inapplica~
                                                          [1] 1. Always
                                                          [2] 2. Most of the~
                                                          [3] 3. About half ~
                                                          [4] 4. Some of the~
                                                          [5] 5. Never
Option 2: Use the view() function to scroll through the dataset in a new tab. You can use the search bar in the top right corner of that tab to find specific keywords.
view(anes_2024)
For example, I have identified the following three variables for this analysis:
TrustGovernment: How often can you trust the government in Washington to do what is right? (dependent variable, DV)
Education: Respondent’s education level in years (independent variable, IV)
Sex: Respondent’s sex (control variable, CV)
Step 4: Create a smaller, clean dataset
This dataset will contain only your variables of interest from Step 3. You will work with this dataset from now on, not the main complete dataset. Give your new dataset an intuitive, simple name (mine is named test_project; yours will be different).
You will also need to rename your variables to something intuitive. In the example below, the original variable names and the new names are for demonstration only. Yours will be different.
Exception: The line of code weight = Weight must be included exactly as written. It keeps the survey weight variable under the name weight, which the model in Step 8 uses to weight the ANES data.
test_project <- anes_2024 |>
  select(TrustGovernment, Education, Sex, Weight) |>
  rename(
    trust = TrustGovernment,
    education = Education,
    sex = Sex,
    weight = Weight
  )
Step 5: Check for missing and abnormal values
Before you move on to analysis, you need to check for missing and abnormal values. If you forget to exclude missing values or “skip codes” from analysis, your charts and models will be inaccurate.
Replace trust in the code below with the name of your variable. Repeat the same step for all variables one by one. Take note of which abnormal values you see (for example, values like 95 or 99 in a years of education variable).
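As a quick overview before checking variables one by one, the sketch below counts missing values in every column at once and looks up the largest education value. It uses a small made-up data frame (toy is hypothetical, not ANES data); with your own data you would substitute the name of your dataset.

```r
# Made-up data standing in for your dataset (hypothetical values)
toy <- data.frame(
  trust     = factor(c("1. Always", "5. Never", NA, "5. Never")),
  education = c(12, 95, 16, NA),
  weight    = c(1.0, 0.8, 1.2, 1.1)
)

# is.na() marks every missing cell; colSums() adds them up per column
na_counts <- colSums(is.na(toy))
print(na_counts)

# Largest observed value, ignoring NA, to spot codes such as 95
max_education <- max(toy$education, na.rm = TRUE)
print(max_education)
```

This only tells you where to look. The per-variable checks that follow still show which specific values are abnormal.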
For categorical variables:
test_project |>
  count(trust)
# A tibble: 6 × 2
  trust                      n
  <fct>                  <int>
1 1. Always                 46
2 2. Most of the time      669
3 3. About half the time  1314
4 4. Some of the time     2010
5 5. Never                 709
6 <NA>                      16
For continuous variables:
summary(test_project$education)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
   1.00   10.00   12.00   12.51   13.00   95.00      12
hist(test_project$education)
boxplot(test_project$education)
Step 6: Recode the variables if needed
In the ANES, some variables may include numeric codes for non-substantive answers (like “Other” or “Refused”). For example, if your education histogram shows a value of 95, you must filter it out so it does not skew your results.
Note that I assign the recoded dataset to the same name, so the cleaned version silently overwrites the old one. Because filter() keeps only rows where the condition is TRUE, rows with a missing education value are dropped as well.
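To see what this kind of filter does before applying it to your own data, here is a minimal sketch on made-up values (the data frame toy is hypothetical, and 95 plays the role of a non-substantive code):

```r
library(dplyr)

# Made-up education values: three valid, one abnormal code, one missing
toy <- data.frame(education = c(10, 12, 16, 95, NA))

# Assigning the result to the same name overwrites the old data frame
toy <- toy |>
  filter(education <= 25)

# Rows that survive the filter, and the largest remaining value
n_rows <- nrow(toy)
max_education <- max(toy$education)
print(n_rows)
print(max_education)
```

Both the row with 95 and the row with NA are removed, because the comparison NA <= 25 evaluates to NA rather than TRUE.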
test_project <- test_project |>
  filter(education <= 25)
Step 7: Check that recoding was successful
Re-run your histogram or count to ensure the abnormal values are gone.
ggplot(test_project, aes(x = education)) +
  geom_histogram()
Step 8: Create a linear model
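Replace the example variables below with the names of your variables. Note that the DV comes first, on the left side of the ~. Because the dependent variable here is stored as a factor (as many variables in this package are), we wrap it in as.numeric(), which converts the categories to their numeric codes (1 = Always through 5 = Never) so that lm() can use it as an outcome. Higher values therefore mean less trust.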
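To see what as.numeric() does to a factor, here is a minimal sketch using a few made-up responses (the answers object is hypothetical, not ANES data):

```r
# Made-up responses using the same answer categories as the trust variable
answers <- factor(
  c("1. Always", "5. Never", "3. About half the time"),
  levels = c("1. Always", "2. Most of the time", "3. About half the time",
             "4. Some of the time", "5. Never")
)

# as.numeric() returns each value's position among the factor levels
codes <- as.numeric(answers)
print(codes)
```

The codes follow the order of the factor levels, which is why it is worth checking that order with levels() before interpreting the sign of a coefficient.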
Always include the weight variable as is. Give your model an intuitive name (I am using model_test_project).
model_test_project <- lm(as.numeric(trust) ~ education + sex,
                         data = test_project,
                         weights = weight)
Step 9: Examine the results of the linear model
summary(model_test_project)

Call:
lm(formula = as.numeric(trust) ~ education + sex, data = test_project,
    weights = weight)

Weighted Residuals:
    Min      1Q  Median      3Q     Max
-5.9426 -0.5157  0.2411  0.4851  3.2411

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)   3.818722   0.065008  58.742  < 2e-16 ***
education    -0.020141   0.005783  -3.482 0.000501 ***
sex2. Female -0.066780   0.027294  -2.447 0.014453 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.9311 on 4655 degrees of freedom
  (39 observations deleted due to missingness)
Multiple R-squared:  0.004004,  Adjusted R-squared:  0.003576
F-statistic: 9.356 on 2 and 4655 DF, p-value: 8.809e-05
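If you would rather pull the numbers out of the model than read them off the printout, the sketch below shows one way to do it with base R. It uses the built-in mtcars data as a stand-in (an assumption for illustration only, not the ANES model), so the variable names and values differ from yours.

```r
# mtcars ships with R and stands in for the survey data here
fit <- lm(mpg ~ wt + factor(am), data = mtcars)

# Coefficient table as a matrix: Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- coef(summary(fit))
print(round(coef_table, 3))

# 95% confidence intervals for each coefficient
ci <- confint(fit, level = 0.95)
print(round(ci, 3))
```

With your own model you would replace fit with the name of your model object.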
You can stop at this point. If you want to format this regression table in a word processor or Excel, just copy and paste it into the program where you are writing your final project. If you want to create a professional table in R, continue below.
Step 10 (optional): Create a custom table
model_test_project |>
  tbl_regression(
    label = list(
      education ~ "Years of education",
      sex ~ "Sex"
    )
  ) |>
  add_significance_stars(hide_ci = FALSE, hide_p = FALSE) |>
  bold_labels() |>
  italicize_levels() |>
  modify_caption("**Effect of Education and Sex on Trust in Government (ANES 2024)**") |>
  modify_footnote(
    estimate ~ "\\*p<0.05; \\**p<0.01; \\***p<0.001"
  )

| Characteristic | Beta¹ | SE | 95% CI | p-value |
|---|---|---|---|---|
| Years of education | -0.02*** | 0.006 | -0.03, -0.01 | <0.001 |
| Sex | | | | |
| 1. Male | — | — | — | |
| 2. Female | -0.07* | 0.027 | -0.12, -0.01 | 0.014 |

¹ *p<0.05; **p<0.01; ***p<0.001
Abbreviations: CI = Confidence Interval, SE = Standard Error