Final project tutorial: ANES 2024

Prerequisite: Install required packages

You only need to install these packages once per computer. If they are already installed when you start your final project, skip this step.

install.packages(c("pak", "labelled"))
pak::pak("tidy-survey-r/srvyrexploR")
install.packages("gtsummary") # optional for step 10
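
If you are not sure whether a package is already installed, you can check before reinstalling. The following is a minimal sketch in base R; missing_packages() is a hypothetical helper name used only for illustration and is not part of any package.

```r
# Hypothetical helper: returns the packages in `pkgs` that are not installed yet.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("pak", "labelled", "gtsummary"))
to_install  # character(0) means everything is already installed

# Uncomment to install only what is missing:
# if (length(to_install) > 0) install.packages(to_install)
```
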

Step 1: Load required packages

Remember: you need to load the tidyverse package every time you start a new R session. Which other packages you load depends on the project. For your final projects, load the following libraries:

library(tidyverse)
library(srvyrexploR)
library(labelled)
library(gtsummary) # optional for step 10

Step 2: Import the dataset

Because the dataset ships with the srvyrexploR package, you load it directly into R with data() instead of importing a file. Once loaded, it appears in the Environment tab.

data(anes_2024)

Step 3: Identify your variables

To find variables that you will be using in your final project, you need to know their names or labels. You have several ways to get that information.

Option 1: Search in R using the look_for() function from the labelled package. In the following code, replace "Trust" with a keyword of your interest. Check the missing and values columns of the output to see whether the variable has any missing or abnormal values. Once you have identified all of your variables, take note of their names (e.g., TrustGovernment).

look_for(anes_2024, "Trust")

 pos variable        label               col_type missing values             
 11  TrustGovernment PRE: How often tru~ fct      16      1. Always          
                                                          2. Most of the time
                                                          3. About half the ~
                                                          4. Some of the time
                                                          5. Never           
 12  TrustPeople     PRE: How often can~ fct      13      1. Always          
                                                          2. Most of the time
                                                          3. About half the ~
                                                          4. Some of the time
                                                          5. Never           
 44  V241229         PRE: How often tru~ dbl+lbl  0       [-9] -9. Refused   
                                                          [-8] -8. Don't know
                                                          [-1] -1. Inapplica~
                                                          [1] 1. Always      
                                                          [2] 2. Most of the~
                                                          [3] 3. About half ~
                                                          [4] 4. Some of the~
                                                          [5] 5. Never       
 45  V241234         PRE: How often can~ dbl+lbl  0       [-9] -9. Refused   
                                                          [-1] -1. Inapplica~
                                                          [1] 1. Always      
                                                          [2] 2. Most of the~
                                                          [3] 3. About half ~
                                                          [4] 4. Some of the~
                                                          [5] 5. Never
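
The output above shows why the values column matters: the raw V241229 column reports 0 missing values, yet it stores nonresponse as negative codes (-9, -8, -1). As a rough sketch on invented values (not ANES data), counting and blanking out such codes could look like this:

```r
# Toy vector standing in for a raw survey column; the values are invented.
raw_trust <- c(1, 4, -9, 3, -1, 5, 2, -8, 4)

sum(is.na(raw_trust))   # no NA values, so the column looks complete
sum(raw_trust < 0)      # but some answers are negative nonresponse codes

# Treat the negative codes as missing before any analysis.
clean_trust <- ifelse(raw_trust < 0, NA, raw_trust)
table(clean_trust, useNA = "ifany")
```
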

Option 2: Use the view() function to scroll through the dataset in a new tab. You can use the search bar in the top right corner of that tab to find specific keywords.

view(anes_2024)

For example, I have identified the following three variables for this analysis:

TrustGovernment: How often can you trust the government in Washington to do what is right? (DV)
Education: Respondent’s education level in years (IV)
Sex: Respondent’s sex (CV)

Step 4: Create a smaller, clean dataset

This dataset will contain only your variables of interest from Step 3. You will work with this dataset from now on, not the main complete dataset. Give your new dataset an intuitive, simple name (mine is named test_project; yours will be different).

You will also need to rename your variables to something intuitive. In the example below, the original variable names and the new names are for demonstration only. Yours will be different.

Exception: Keep Weight in select() and keep the line weight = Weight exactly as written. This column holds the ANES survey weights, which the model in Step 8 uses through weights = weight.

test_project <- anes_2024 |> 
  select(TrustGovernment, Education, Sex, Weight) |> 
  rename(
    trust = TrustGovernment,
    education = Education,
    sex = Sex,
    weight = Weight
  )
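
After renaming, it can help to confirm that the smaller dataset contains exactly the columns you expect. The sketch below uses a toy data frame (toy_project, with invented values) in place of the real one; in your project you would check names(test_project) instead.

```r
# Toy stand-in for test_project; the values are invented.
toy_project <- data.frame(
  trust = factor(c("1. Always", "5. Never")),
  education = c(12, 16),
  sex = factor(c("1. Male", "2. Female")),
  weight = c(0.8, 1.2)
)

expected <- c("trust", "education", "sex", "weight")
setdiff(expected, names(toy_project))  # character(0) means no column is missing
```
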

Step 5: Check for missing and abnormal values

Before you move on to analysis, you need to check for missing and abnormal values. If you forget to exclude missing values or “skip codes” from analysis, your charts and models will be inaccurate.

Replace trust in the code below with the name of your variable. Repeat the same step for all variables one by one. Take note of which abnormal values you see (for example, values like 95 or 99 in a years of education variable).

For categorical variables:

test_project |> 
  count(trust)

# A tibble: 6 × 2
  trust                      n
  <fct>                  <int>
1 1. Always                 46
2 2. Most of the time      669
3 3. About half the time  1314
4 4. Some of the time     2010
5 5. Never                 709
6 <NA>                      16
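
To judge how much missing data you have, it can be useful to turn the counts into percentages. This is a small sketch that copies the counts from the output above into a named vector; the short names (always, most, and so on) are labels chosen here for illustration.

```r
# Counts copied from the count(trust) output above.
n <- c(always = 46, most = 669, half = 1314, some = 2010, never = 709, missing = 16)

total <- sum(n)            # total number of respondents in the table
round(100 * n / total, 1)  # each category as a percentage of all respondents
```
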

For continuous variables:

summary(test_project$education)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
   1.00   10.00   12.00   12.51   13.00   95.00      12

hist(test_project$education)

boxplot(test_project$education)
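
If the plots are hard to read, you can also list suspicious values directly. The sketch below uses invented values and the same cutoff of 25 years that Step 6 uses; choose a cutoff that makes sense for your own variable.

```r
# Toy years-of-education values; 95 and 99 stand in for special codes.
education <- c(12, 16, 95, 10, NA, 14, 99, 12)

# Keep only non-missing values above a plausible maximum.
suspicious <- education[!is.na(education) & education > 25]
table(suspicious)
```
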

Step 6: Recode the variables if needed

In the ANES, some variables may include numeric codes for non-substantive answers (like “Other” or “Refused”). For example, if your education histogram shows a value of 95, you must filter it out so it does not skew your results.

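Note that I am saving the cleaned dataset under the same name. This overwrites the old dataset with the cleaned version without any warning. Also note that filter() drops rows where education is missing (NA), because the condition cannot be evaluated for them.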

test_project <- test_project |> 
  filter(education <= 25)
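
Filtering removes the whole row, including rows where the value is missing. The sketch below illustrates this on a toy data frame using base R's subset(), which treats missing conditions the same way filter() does. It also shows one alternative, offered here as an option rather than a requirement: keeping every row and marking the special code as NA.

```r
# Toy data: two valid values, one special code, one missing value.
toy <- data.frame(id = 1:4, education = c(12, 95, NA, 16))

# Rows where the condition is FALSE or NA are dropped.
kept <- subset(toy, education <= 25)
nrow(kept)  # the 95 row and the NA row are both gone

# Alternative: keep every row and mark the special code as missing instead.
recoded <- toy
recoded$education[which(recoded$education > 25)] <- NA
nrow(recoded)  # all rows are still there
```
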

Step 7: Check that recoding was successful

Re-run your histogram or count to ensure the abnormal values are gone.

ggplot(test_project, aes(x = education)) + 
  geom_histogram(binwidth = 1) # one bar per year of education
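
A numeric check can complement the plot. This sketch uses a toy vector (education_clean, invented values); in your project you would pass test_project$education instead.

```r
# Toy cleaned values; replace with test_project$education in your project.
education_clean <- c(12, 16, 10, 14, 12)

range(education_clean, na.rm = TRUE)  # smallest and largest remaining value

# Stops with an error if any value above the cutoff survived the cleaning.
stopifnot(max(education_clean, na.rm = TRUE) <= 25)
```
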

Step 8: Create a linear model

Replace the example variables below with the names of your variables. Note that the DV comes first, to the left of the ~. Because trust is stored as a factor, we wrap it in as.numeric(), which converts each answer to its level number (1 = Always through 5 = Never). Higher values therefore mean less trust.

Always include the weight variable as is. Give your model an intuitive name (I am using model_test_project).

model_test_project <- lm(as.numeric(trust) ~ education + sex, 
                         data = test_project, 
                         weights = weight)
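
To see what as.numeric() does to a factor, here is a small sketch on a toy factor that reuses the level labels of the trust variable. The reversed scale at the end is an optional idea, not something the model above does.

```r
# Toy factor with the same level labels as the trust variable.
trust_levels <- c("1. Always", "2. Most of the time", "3. About half the time",
                  "4. Some of the time", "5. Never")
trust <- factor(c("5. Never", "1. Always", "3. About half the time"),
                levels = trust_levels)

as.numeric(trust)      # 5 1 3: the position of each answer among the levels

# Optional: reverse the scale so that higher numbers mean more trust.
6 - as.numeric(trust)  # 1 5 3
```
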

Step 9: Examine the results of the linear model

summary(model_test_project)


Call:
lm(formula = as.numeric(trust) ~ education + sex, data = test_project, 
    weights = weight)

Weighted Residuals:
    Min      1Q  Median      3Q     Max 
-5.9426 -0.5157  0.2411  0.4851  3.2411 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)   3.818722   0.065008  58.742  < 2e-16 ***
education    -0.020141   0.005783  -3.482 0.000501 ***
sex2. Female -0.066780   0.027294  -2.447 0.014453 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.9311 on 4655 degrees of freedom
  (39 observations deleted due to missingness)
Multiple R-squared:  0.004004,  Adjusted R-squared:  0.003576 
F-statistic: 9.356 on 2 and 4655 DF,  p-value: 8.809e-05
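
One way to read the coefficients is to compute predicted trust scores by hand. The sketch below copies the estimates from the output above; the two example respondents (a man with 12 years of education and a woman with 16) are invented for illustration. Remember that the score runs from 1 (Always) to 5 (Never), so lower values mean more trust.

```r
# Coefficients copied from the summary() output above.
intercept   <- 3.818722
b_education <- -0.020141
b_female    <- -0.066780

# Predicted trust score for two example respondents.
male_12   <- intercept + b_education * 12
female_16 <- intercept + b_education * 16 + b_female

round(c(male_12 = male_12, female_16 = female_16), 2)  # 3.58 and 3.43
```
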

You can stop at this point. If you want to format this regression table in a word processor or Excel, just copy and paste it into the program where you are writing your final project. If you want to create a professional table in R, continue below.

Step 10 (optional): Create a custom table

model_test_project |> 
  tbl_regression(
    label = list(
      education ~ "Years of education",
      sex ~ "Sex"
    )
  ) |> 
  add_significance_stars(hide_ci = FALSE, hide_p = FALSE) |> 
  bold_labels() |> 
  italicize_levels() |> 
  modify_caption("**Effect of Education and Sex on Trust in Government (ANES 2024)**") |>
  modify_footnote(
    # Escape every asterisk so Markdown prints it instead of reading it as emphasis
    estimate ~ "\\*p<0.05; \\*\\*p<0.01; \\*\\*\\*p<0.001"
  )

Effect of Education and Sex on Trust in Government (ANES 2024)

Characteristic       Beta¹      SE      95% CI          p-value
Years of education   -0.02***   0.006   -0.03, -0.01    <0.001
Sex
    1. Male          —          —       —
    2. Female        -0.07*     0.027   -0.12, -0.01    0.014

¹ *p<0.05; **p<0.01; ***p<0.001
Abbreviations: CI = Confidence Interval, SE = Standard Error