“I know words. I have the best words”

Sentiment analysis of Donald Trump’s Twitter rhetoric

Donald Trump is often said to use inflammatory rhetoric and to be generally not very nice. In this research, I put that claim to the test. I compare negative and positive rhetoric in D. Trump’s tweets with a simple regex-based sentiment analysis, and I also delve into descriptive statistics of his activity on Twitter.

This research uses a dataset of @realdonaldtrump tweets and related metadata from 2009 to 2020 (up until the moment he was banned from Twitter). He was reinstated on Twitter 22 months later, but that is a story for another time. I focus on tweets prior to the suspension.

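library(tidyverse) # read_csv(), dplyr verbs, stringr, ggplot2
library(lubridate) # year(), wday(), hour(), ymd(), floor_date()

trump <- read_csv("https://raw.githubusercontent.com/valeriia-popova/trump-tweets/main/realdonaldtrump.csv")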
trump

# A tibble: 43,352 × 8
       id link  content date                retweets favorites mentions hashtags
    <dbl> <chr> <chr>   <dttm>                 <dbl>     <dbl> <chr>    <chr>   
 1 1.70e9 http… "Be su… 2009-05-04 13:54:25      510       917 <NA>     <NA>    
 2 1.70e9 http… "Donal… 2009-05-04 20:00:10       34       267 <NA>     <NA>    
 3 1.74e9 http… "Donal… 2009-05-08 08:38:08       13        19 <NA>     <NA>    
 4 1.74e9 http… "New B… 2009-05-08 15:40:15       11        26 <NA>     <NA>    
 5 1.77e9 http… "\"My … 2009-05-12 09:07:28     1375      1945 <NA>     <NA>    
 6 1.78e9 http… "Miss … 2009-05-12 14:21:55       29        28 <NA>     <NA>    
 7 1.79e9 http… "Liste… 2009-05-13 12:38:28       15        16 <NA>     <NA>    
 8 1.80e9 http… "\"Str… 2009-05-14 11:30:40       18        27 <NA>     <NA>    
 9 1.81e9 http… "Enter… 2009-05-15 09:13:13       15         9 <NA>     <NA>    
10 1.82e9 http… "\"Whe… 2009-05-16 17:22:45       19        47 <NA>     <NA>    
# ℹ 43,342 more rows

Negative rhetoric

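To measure how negative Trump’s rhetoric is, I wrote a regular expression that matches five negative words of my choice. I decided to observe this pattern over time - time always tells a story! Below are two alternative code snippets: the first counts matching tweets per timestamp, and the second computes the share of matching tweets per year.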

trump |> 
  filter(str_detect(content, regex("horribl.|terribl.|bad.*|craz.*|radical.*", 
                                   ignore_case = TRUE))) |> 
  count(date, sort = TRUE)

# A tibble: 1,973 × 2
   date                    n
   <dttm>              <int>
 1 2019-07-03 14:31:49     2
 2 2019-07-07 22:17:38     2
 3 2019-07-09 20:30:08     2
 4 2019-07-16 06:20:42     2
 5 2019-07-21 21:16:15     2
 6 2019-08-19 08:19:46     2
 7 2019-08-27 18:36:43     2
 8 2019-08-28 08:57:56     2
 9 2019-08-30 08:55:41     2
10 2019-09-09 08:01:41     2
# ℹ 1,963 more rows

trump_negative <- trump |> 
  mutate(year = year(date)) |> 
  group_by(year) |> 
  summarize(negative = mean(str_detect(content, regex("horribl.|terribl.|bad.*|craz.*|radical.*", 
                                                 ignore_case = TRUE)))) 

ggplot(trump_negative, aes(year, negative)) +
  geom_line()
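
One caveat with the pattern above is that it has no word boundaries, so a stem such as bad can also match inside unrelated words. The sketch below compares the original pattern with a stricter variant on a few toy sentences. The sentences and the `strict` pattern are assumptions made for illustration; they are not part of the analysis above, and switching patterns would change the counts reported here.

```r
library(stringr)

# Toy sentences (invented for illustration, not taken from the dataset)
examples <- c(
  "That was a bad deal",        # genuine hit
  "He wore a badge",            # "bad" only appears inside another word
  "A horrible, terrible idea",  # genuine hit
  "Crazy times",                # genuine hit
  "Nothing to see here"         # no negative word
)

# Pattern as used above: no word boundaries
loose <- regex("horribl.|terribl.|bad.*|craz.*|radical.*", ignore_case = TRUE)

# Hypothetical stricter variant: \\b anchors each stem to the start of a word,
# and "bad" must be a whole word (optionally "badly")
strict <- regex("\\b(horribl\\w*|terribl\\w*|bad(ly)?\\b|craz\\w*|radical\\w*)",
                ignore_case = TRUE)

loose_hits  <- str_detect(examples, loose)
strict_hits <- str_detect(examples, strict)

data.frame(examples, loose_hits, strict_hits)
```

The two patterns disagree only on the "badge" sentence, which the loose pattern flags and the strict one does not.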

Positive rhetoric

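As a scientist (and a person!), I strive to be objective. In that spirit, I measured the opposite: how nice Trump’s rhetoric is. This time, I used a regular expression with five positive words of my choice. Below are two alternative code snippets: the first counts matching tweets per timestamp, and the second computes the share of matching tweets per year.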

trump |> 
  filter(str_detect(content, regex("nice.*|kind.*|generous.*|happ(y|i).*|love.*", 
                                   ignore_case = TRUE))) |> 
  count(date, sort = TRUE)

# A tibble: 2,825 × 2
   date                    n
   <dttm>              <int>
 1 2019-06-03 12:41:37     2
 2 2019-08-21 06:34:52     2
 3 2019-11-01 16:11:29     2
 4 2019-12-10 06:02:28     2
 5 2020-05-30 07:41:04     2
 6 2009-06-21 09:47:41     1
 7 2009-07-04 18:19:56     1
 8 2009-11-26 13:55:38     1
 9 2009-12-23 11:38:18     1
10 2010-03-18 13:17:47     1
# ℹ 2,815 more rows

trump_positive <- trump |> 
  mutate(year = year(date)) |> 
  group_by(year) |> 
  summarize(positive = mean(str_detect(
    content, regex("nice.*|kind.*|generous.*|happ(y|i).*|love.*", ignore_case = TRUE))))

ggplot(trump_positive, aes(year, positive)) +
  geom_line()

Sentiment analysis

I will now compare whether negative or positive rhetoric was more common in D. Trump’s tweets over time.

trump_sentiment <- trump_negative |> 
  left_join(trump_positive, by = "year")

trump_sentiment

# A tibble: 12 × 3
    year negative positive
   <dbl>    <dbl>    <dbl>
 1  2009   0        0.0714
 2  2010   0        0.0559
 3  2011   0.0276   0.0219
 4  2012   0.0346   0.0439
 5  2013   0.0268   0.0717
 6  2014   0.0235   0.0805
 7  2015   0.0295   0.0789
 8  2016   0.0573   0.0464
 9  2017   0.0587   0.0471
10  2018   0.0733   0.0853
11  2019   0.0980   0.0538
12  2020   0.0841   0.0601

ggplot(trump_sentiment) +
  geom_line(aes(x = year, y = negative, color = "Negative")) +
  geom_line(aes(x = year, y = positive, color = "Positive")) +
  labs(title = "Donald Trump's Twitter rhetoric", 
       x = "Year", 
       y = "Proportion of tweets", 
       color = "Sentiment")
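
To read the comparison off the numbers rather than the chart, one option is to compute the gap between the two shares for each year. This is a minimal sketch that re-enters the yearly proportions from the `trump_sentiment` output above; the `net` and `dominant` columns are names I made up for illustration.

```r
library(dplyr)
library(tibble)

# Yearly proportions copied from the trump_sentiment output above
sentiment <- tribble(
  ~year, ~negative, ~positive,
  2009,  0,         0.0714,
  2010,  0,         0.0559,
  2011,  0.0276,    0.0219,
  2012,  0.0346,    0.0439,
  2013,  0.0268,    0.0717,
  2014,  0.0235,    0.0805,
  2015,  0.0295,    0.0789,
  2016,  0.0573,    0.0464,
  2017,  0.0587,    0.0471,
  2018,  0.0733,    0.0853,
  2019,  0.0980,    0.0538,
  2020,  0.0841,    0.0601
)

net_sentiment <- sentiment |>
  mutate(
    net = positive - negative,  # above zero: positive words were more common
    dominant = if_else(net > 0, "Positive", "Negative")
  )

net_sentiment
```

With these ten word stems, the negative share exceeds the positive share in 2011, 2016, 2017, 2019 and 2020.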

In the remainder of this research, I want to see if there are any patterns to Donald Trump’s activity on Twitter.

Which day is D. Trump most active on Twitter?

Apparently, Tuesday, by a tiny margin. He is similarly prolific on Twitter across the workdays, with the number of tweets visibly dropping on the weekend.

trump |> 
  mutate(weekday = wday(date, label = TRUE)) |> 
  ggplot(aes(x = weekday)) +
  geom_bar()

How many times does D. Trump tweet per month?

As the chart shows, Donald Trump became a lot more active ahead of and during his first presidential campaign in 2016. The second, although more modest, spike happened around the time of his second presidential campaign. In contrast, his actual time in office did not result in as many tweets.

This is an interesting pattern that is consistent with the power of “going public.” For D. Trump, Twitter appears to be an advertising platform for gaining support during his electoral campaigns rather than a medium for reaching out to his constituents once he is already in office.

trump |> 
  ggplot(aes(x = date)) +
  geom_freqpoly(binwidth = 2629746) # x is a date-time, so binwidth is in seconds (about one month)
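
The bin width of 2629746 appears to be the length of an average Gregorian month expressed in seconds, which is the unit ggplot2 uses for a date-time axis. The sketch below reproduces that arithmetic and, as an alternative, counts tweets per calendar month with `floor_date()`. The three timestamps are made up for illustration.

```r
library(dplyr)
library(lubridate)
library(tibble)

# Average Gregorian year (365.2425 days) split into 12 months, in seconds
seconds_per_month <- 365.2425 * 24 * 60 * 60 / 12
seconds_per_month

# Made-up timestamps standing in for the `date` column
tweets <- tibble(
  date = ymd_hms(c("2016-01-05 10:00:00",
                   "2016-01-20 12:30:00",
                   "2016-02-01 08:15:00"))
)

# Alternative to fixed-width bins: snap each tweet to the first of its month
monthly <- tweets |>
  count(month = floor_date(date, "month"))

monthly
```

Fixed-width bins drift relative to the calendar because months differ in length, while `floor_date()` keeps each bin aligned with a real month.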

What time of the day is Donald Trump most active on Twitter?

The two alternative code snippets and charts below show that Donald Trump was most active on Twitter in the early afternoon hours - around 2-3pm. Most likely, this is the time slot after lunch that he sets aside for his social media scouting and engagement. Aside from that, his tweeting activity coincides with the media prime time slots: 7-9am and 7-9pm. This suggests that Donald Trump is not a chaotic communicator: having spent years in the public eye with his TV show, he likely chooses strategic time slots for his posts to maximize public attention to his tweets.

trump |> 
  mutate(time = hms::as_hms(date)) |> 
  ggplot(aes(x = time)) +
  geom_freqpoly()

#| title: alt-active
trump |> 
  mutate(hour = hour(date)) |> 
  ggplot(aes(x = hour)) +
  geom_bar()
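
The hourly charts depend on the time zone of the `date` column, which is not stated here. A minimal sketch, assuming the timestamps were parsed as UTC (the default for `read_csv()`) and that US Eastern time is the clock of interest; both are assumptions, not facts about the dataset. The timestamp is one of those shown in the output earlier.

```r
library(lubridate)

# One timestamp from the output above, treated as UTC (assumption)
utc_time <- ymd_hms("2019-07-03 14:31:49", tz = "UTC")
hour(utc_time)

# The same instant on the US East Coast clock (assumed location)
eastern_time <- with_tz(utc_time, "America/New_York")
hour(eastern_time)
```

If the timestamps turn out to be recorded in a different zone, the peaks in the charts above would shift by the corresponding offset.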

Did D. Trump become better at tweeting?

President Trump 1.0 loved using Twitter for his communication needs. But did he become better and more effective at it towards the end of his term in office? For this task, I focused on retweets and favorites to measure his progress (or lack of it). My assumption is that if his social media efforts became more successful at getting public attention, we would see an increase in the number of retweets and favorites (i.e., the currency of attention). I limited my analysis strictly to his time in the Oval Office (starting Jan 20th 2017) to prevent contamination from his earlier tweets, when he was a private citizen.

Below are charts looking at it from multiple angles. I overlaid smoothed trend lines on the absolute counts. All of the charts point to a slight increase in D. Trump’s visibility. However, it is less impressive than I expected.

trump |> 
  filter(date >= as_datetime("2017-01-20")) |> # tweets from Jan 20th 2017 onward
  ggplot(aes(date, retweets)) + # or use favorites
  geom_smooth(se = FALSE)

trump |>
  filter(date >= as_datetime("2017-01-20")) |> # tweets from Jan 20th 2017 onward
  ggplot(aes(date, retweets)) + # or use favorites
  geom_line(alpha = 0.3) +  
  geom_smooth(se = FALSE, color = "red")

As I was observing the charts, I noticed a significant spike in retweets on July 2nd, 2017. This is a curious outlier. What happened that day? Driven by curiosity, I did some further digging into the nature of D. Trump’s best-performing tweet.

It turns out that the tweet in question was a doctored wrestling clip in which D. Trump “punches” a figure with the CNN logo for a face. At the time, this post caused outrage among the liberal parts of the Internet - and joy among its conservative counterparts.

Ironically, this tweet became his “best” tweet in terms of performance metrics. This reminds us, once again, that there is no such thing as bad publicity in the “attention economy.”

outlier <- trump |> 
  arrange(desc(retweets)) |> 
  select(content, date, retweets) |> 
  head(n = 1)
outlier

# A tibble: 1 × 3
  content                                        date                retweets
  <chr>                                          <dttm>                 <dbl>
1 # FraudNewsCNN # FNNpic.twitter.com/WYUnHjjUjg 2017-07-02 08:21:42   302269
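
The same lookup can also be sketched with `slice_max()`, which picks the top row without sorting the whole table first. The toy data below is invented for illustration; only the column names mirror the dataset.

```r
library(dplyr)
library(tibble)

# Invented stand-in for the tweets table
toy <- tibble(
  content  = c("first tweet", "second tweet", "third tweet"),
  retweets = c(10, 300, 25)
)

# Keep the single row with the most retweets; with_ties = FALSE
# guarantees exactly one row even if two tweets share the maximum
top_tweet <- toy |>
  slice_max(retweets, n = 1, with_ties = FALSE)

top_tweet
```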

outlier_retweet <- "# FraudNewsCNN # FNNpic.twitter.com/WYUnHjjUjg \nDoctored video in which Trump punches CNN \nin the face"

trump |>
  filter(date >= as_datetime("2017-01-20")) |> # tweets from Jan 20th 2017 onward
  ggplot(aes(date, retweets)) + 
  geom_point(alpha = 0.3, size = 1) + 
  geom_smooth(se = FALSE, color = "blue") +
  annotate(
    geom = "label", 
    x = ymd_hms("2017-08-05 08:21:42"), 
    y = 249000, 
    label = outlier_retweet, 
    hjust = "left", 
    color = "red"
  ) + 
  annotate(
    geom = "segment", 
    x = ymd_hms("2017-08-05 08:21:42"), 
    y = 270000, 
    xend = ymd_hms("2017-07-06 08:21:42"), 
    yend = 299000, 
    color = "red",
    arrow = arrow(type = "closed", length = unit(0.08, "inches"))
  )

trump |>
  filter(date >= as_datetime("2017-01-20")) |> # tweets from Jan 20th 2017 onward
  mutate(month = floor_date(date, "month")) |>
  group_by(month) |>
  summarize(
    avg_retweets = mean(retweets, na.rm = TRUE),
    avg_favorites = mean(favorites, na.rm = TRUE)
  ) |>
  ggplot(aes(x = month)) + 
  geom_smooth(aes(y = avg_retweets, color = "Retweets"), se = FALSE) +
  geom_smooth(aes(y = avg_favorites, color = "Favorites"), se = FALSE) +
  labs(
    title = "Average Retweets and Favorites per Month",
    x = "Month",
    y = "Average Count",
    color = "Visibility metric"
  )