I’ve heard that sentence once too often. So I got the data from Statistik Austria to see if there’s something to it.
More specifically, I tried to answer the following questions:
I also made a little app in which you can check your own name’s popularity!
* Please note that Statistik Austria uses a binary classification for the names and the associated data, and there is criticism from many fields regarding this approach.
library(tidyverse)
library(knitr)
library(lubridate)
library(readxl)
library(ggrepel)
library(ggpubr)
library(hrbrthemes)
library(shiny)
library(bslib)
my_red = "#DC2F1E"
sonic_blue = "#C3DADC"
Statistik Austria provides the top 60 names per sex for the years 1984 through 2020. I was born in 1986, so the data cover my birth year but not much before that. Note that these are etymological names (as Statistik Austria calls them), meaning that different spellings (Dominic, Dominique, Domenik, etc.) are collapsed into one form.
file = "statistik_der_60_haeufigsten_vornamen_1984-2020_in_oesterreich_-_etymologi.xlsx"
names = read_xlsx(file, skip = 3)
# it's the top 60 names from 1984 to 2020
# Important: Some years have multiple rank 60s.
# plain reassignment: %<>% comes from magrittr, which library(tidyverse) does not attach
names = names %>%
rename(boys_rank = "Rang...1",
boys_name = "Vorname...2" ,
boys_absolute = "Absolut...3" ,
boys_percent = "in %...4",
boys_cumulative = "% kumulativ...5",
girls_rank = "Rang...6",
girls_name = "Vorname...7",
girls_absolute = "Absolut...8",
girls_percent = "in %...9",
girls_cumulative = "% kumulativ...10") %>%
mutate(boys_rank = as.numeric(boys_rank),
year = boys_rank,
year = ifelse(year < 61, NA, year)) %>%
fill(year) %>%
# at the bottom we have summaries for 1984-2010 and 2010-2020
slice_head(n= 2273) %>%
filter(boys_rank < 61)
boys = names %>%
select(11, 1:5) %>%
# rename_with() replaces the deprecated rename_all(funs(...)) idiom
rename_with(
~ stringr::str_replace_all(.x, 'boys_', '')) %>%
mutate(sex = "boys")
girls = names %>%
select(11, 6:10) %>%
# rename_with() replaces the deprecated rename_all(funs(...)) idiom
rename_with(
~ stringr::str_replace_all(.x, 'girls_', '')) %>%
mutate(sex = "girls")
names = rbind(girls, boys) %>%
mutate(percent = round(percent, 2),
cumulative = round(cumulative, 2))
# now we have a tidy df, yay!
head(names) %>% kable()
| year | rank | name | absolute | percent | cumulative | sex |
|---|---|---|---|---|---|---|
| 2020 | 1 | Anna | 1925 | 4.73 | 4.73 | girls |
| 2020 | 2 | Marie | 1304 | 3.21 | 7.94 | girls |
| 2020 | 3 | Sophie | 1287 | 3.17 | 11.11 | girls |
| 2020 | 4 | Emilia | 1105 | 2.72 | 13.82 | girls |
| 2020 | 5 | Elena | 1013 | 2.49 | 16.31 | girls |
| 2020 | 6 | Lena | 777 | 1.91 | 18.23 | girls |
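The cleaning step above recovers the year from header rows that sit in the same column as the ranks. Here is a minimal sketch of that trick on made-up data, assuming (as the code above does) that a year shows up as a value above 60 in the rank column. The `toy` tibble and its placeholder names are hypothetical and not taken from the Statistik Austria file.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data mimicking the sheet layout: a year header row
# (a value above 60 in the rank column) followed by that year's ranks.
toy <- tibble(
  rank = c(2020, 1, 2, 2019, 1, 2),
  name = c(NA, "Name A", "Name B", NA, "Name A", "Name C")
)

tidy_toy <- toy %>%
  mutate(year = ifelse(rank < 61, NA, rank)) %>%  # keep only the header values
  fill(year) %>%                                   # carry each year downwards
  filter(rank < 61)                                # drop the header rows

tidy_toy
```

Each remaining row now carries its own year, which is what makes the later `filter(year == ...)` calls possible.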
Since we only get the 60 most popular names per year and sex, my name has to rank among the top 60 in a given year to appear in this list (which is already quite popular, given that babies in Austria received 1772 different first names in 2020). Let’s see.
names %>%
filter(name == "Dominik") %>%
kable()
| year | rank | name | absolute | percent | cumulative | sex |
|---|---|---|---|---|---|---|
| 2020 | 45 | Dominik | 266 | 0.62 | 54.65 | boys |
| 2019 | 48 | Dominik | 261 | 0.60 | 56.58 | boys |
| 2018 | 39 | Dominik | 307 | 0.70 | 50.53 | boys |
| 2017 | 40 | Dominik | 325 | 0.72 | 52.50 | boys |
| 2016 | 39 | Dominik | 319 | 0.71 | 51.64 | boys |
| 2015 | 35 | Dominik | 358 | 0.82 | 49.96 | boys |
| 2014 | 35 | Dominik | 326 | 0.77 | 50.37 | boys |
| 2013 | 32 | Dominik | 336 | 0.82 | 48.58 | boys |
| 2012 | 32 | Dominik | 362 | 0.89 | 50.02 | boys |
| 2011 | 30 | Dominik | 384 | 0.95 | 48.90 | boys |
| 2010 | 29 | Dominik | 371 | 0.92 | 48.11 | boys |
| 2009 | 27 | Dominik | 412 | 1.23 | 49.94 | boys |
| 2008 | 27 | Dominik | 413 | 1.19 | 50.20 | boys |
| 2007 | 27 | Dominik | 401 | 1.18 | 50.49 | boys |
| 2006 | 28 | Dominik | 417 | 1.19 | 52.01 | boys |
| 2005 | 23 | Dominik | 506 | 1.44 | 46.14 | boys |
| 2004 | 24 | Dominik | 515 | 1.44 | 48.99 | boys |
| 2003 | 19 | Dominik | 581 | 1.67 | 42.57 | boys |
| 2002 | 18 | Dominik | 657 | 1.88 | 42.49 | boys |
| 2001 | 17 | Dominik | 650 | 1.94 | 42.71 | boys |
| 2000 | 10 | Dominik | 767 | 2.22 | 29.23 | boys |
| 1999 | 10 | Dominik | 813 | 2.35 | 32.81 | boys |
| 1998 | 10 | Dominik | 945 | 2.63 | 32.27 | boys |
| 1997 | 9 | Dominik | 1105 | 2.97 | 30.90 | boys |
| 1996 | 5 | Dominik | 1295 | 3.30 | 18.98 | boys |
| 1995 | 6 | Dominik | 1381 | 3.52 | 23.83 | boys |
| 1994 | 4 | Dominik | 1561 | 3.83 | 16.37 | boys |
| 1993 | 8 | Dominik | 1518 | 3.61 | 32.06 | boys |
| 1992 | 8 | Dominik | 1459 | 3.39 | 32.38 | boys |
| 1991 | 8 | Dominik | 1415 | 3.22 | 33.55 | boys |
| 1990 | 10 | Dominik | 1277 | 2.97 | 40.97 | boys |
| 1989 | 12 | Dominik | 1179 | 2.77 | 47.09 | boys |
| 1988 | 16 | Dominik | 927 | 2.17 | 57.41 | boys |
| 1987 | 20 | Dominik | 691 | 1.65 | 65.31 | boys |
| 1986 | 20 | Dominik | 702 | 1.66 | 64.84 | boys |
| 1985 | 22 | Dominik | 598 | 1.40 | 67.00 | boys |
| 1984 | 25 | Dominik | 432 | 1.01 | 70.64 | boys |
We can see that it’s not only in the data but appears in every single year (which already makes it consistently popular). Let’s see how this developed over time.
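Rather than counting table rows by eye, the "every single year" claim can also be checked in code. The sketch below uses a hypothetical helper, `in_every_year()`, and made-up toy data in the same shape as the tidy data frame; with the real data you would pass `names` and `"Dominik"` instead.

```r
library(dplyr)

# Hypothetical helper: TRUE if `nm` has a row for every year present in `df`.
in_every_year <- function(df, nm) {
  all_years <- unique(df$year)
  name_years <- df %>% filter(name == nm) %>% pull(year) %>% unique()
  all(all_years %in% name_years)
}

# Made-up toy data with the same columns as the tidy data frame.
toy <- tibble(
  year = c(2000, 2000, 2001, 2001),
  name = c("A", "B", "A", "C")
)

in_every_year(toy, "A")  # "A" is listed in both years
in_every_year(toy, "B")  # "B" is missing in 2001
```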
Below you see the proportion of boys named Dominik from 1984-2020, with my birth year and the most and least popular years highlighted and labeled.
names %>%
filter(name == "Dominik") %>%
ggplot(aes(x = year, y = percent)) +
geom_line(size = .5, alpha = .5) +
geom_point(shape = 1, size = 4, color = my_red, alpha = 2/3) +
geom_point(data = . %>% filter(percent == max(percent)), color = my_red, size = 5) +
geom_point(data = . %>% filter(percent == min(percent)), color = my_red, size = 5) +
geom_point(data = . %>% filter(year == 1986), color = my_red, size = 5) +
geom_text(data = . %>%
filter(percent == min(percent)),
aes(label = paste0(as.character(year), ":")), hjust = 1.1, vjust = -1.6) +
geom_text(data = . %>%
filter(percent == min(percent)),
aes(label = paste0(as.character(percent), "%")), hjust = 0, vjust = -1.6) +
geom_text(data = . %>%
filter(percent == max(percent)),
aes(label = paste0(as.character(year), ":")), hjust = -0.2) +
geom_text(data = . %>%
filter(percent == max(percent)),
aes(label = paste0(as.character(percent), "%")), hjust = -1.1) +
geom_text(data = . %>%
filter(year == 1986),
aes(label = paste0(as.character(year), ":")), hjust = -0.9) +
geom_text(data = . %>%
filter(year == 1986),
aes(label = paste0(as.character(percent), "%")), hjust = -1.8) +
labs(title = "How frequent is the name Dominik?",
subtitle = "Proportion of boys in Austria named Dominik by year",
y = "Proportion in %",
x = "Year") +
theme_ft_rc()
Wait, what? While I suspected that the name Dominik was more popular than I experienced it in the remote mountain village I grew up in, this pattern is quite a surprise: a steady rise up to 1994 and a steady fall afterwards. I tried to find external events, such as scandals involving Dominiks, that could have triggered the decline after 1994, but I couldn’t find anything involving a somewhat popular Dominik.
Now that we know the percentage of Dominiks each year, let’s see how Dominik ranked over the years.
Here you see the popularity of Dominik based on its rank. I’ll also plot the most and least popular names in the top 60 of my birth year 1986, Stefan and Tobias, to see how they did over time.
names %>%
filter(year == 1986, sex == "boys", rank %in% c(1, 60)) %>%
kable()
| year | rank | name | absolute | percent | cumulative | sex |
|---|---|---|---|---|---|---|
| 1986 | 1 | Stefan | 2297 | 5.44 | 5.44 | boys |
| 1986 | 60 | Tobias | 105 | 0.25 | 89.72 | boys |
names %>%
drop_na() %>%
ggplot(aes(x = year, y = rank)) +
geom_line(data = . %>% filter(name == "Stefan"),size = .5, color = sonic_blue) +
geom_line(data = . %>% filter(name == "Tobias"),size = .5, color = sonic_blue) +
geom_line(data = . %>% filter(name == "Dominik"),size = .5, color = my_red) +
geom_point(data = . %>% filter(name == "Dominik" & year == 1986), color = my_red, size = 5) +
geom_point(data = . %>% filter(name == "Tobias" & year == 1986), size = 5, color = sonic_blue) +
geom_point(data = . %>% filter(name == "Stefan" & year == 1986), size = 5, color = sonic_blue) +
geom_point(data = . %>% filter(year == 1994 & name == "Dominik"), color = my_red, size = 5) +
geom_point(data = . %>% filter(year == 2019 & name == "Dominik"), color = my_red, size = 5) +
geom_text(data = . %>%
filter(year == 2019 & name == "Dominik"),
aes(label = paste0(as.character(year), ": Rank ", as.character(rank))), hjust = 0.9, vjust = 2) +
geom_text(data = . %>%
filter(year == 1986 & name == "Dominik"),
aes(label = paste0(as.character(year), ": Rank ", as.character(rank))), hjust = -0.2) +
geom_text(data = . %>%
filter(year == 1994 & name == "Dominik"),
aes(label = paste0(as.character(year), ": Rank ", as.character(rank))), hjust = -0.1, vjust = -0.5) +
geom_text(data = . %>%
filter(year == 1986 & name == "Stefan"),
aes(label = name), hjust = -0, vjust = 1.8, color = sonic_blue) +
geom_text(data = . %>%
filter(year == 1986 & name == "Tobias"),
aes(label = name), hjust = -0.1, vjust = -0.5, color = sonic_blue) +
labs(title = "How does the name Dominik rank and compare\nover the years?",
subtitle = "Ranks of the names Dominik, Stefan, and Tobias by year",
y = "Rank",
x = "Year") +
scale_y_reverse()+
theme_ft_rc()
Dominik was the 4th most popular name for boys in Austria in 1994. Wow. I certainly did not expect that. With a prevalence of almost 4 percent among boys, this means that roughly every other representative first-grade class in 2000 had a Dominik in the classroom. The most popular name of 1986, Stefan, lost popularity and didn’t even make the top 60 from 2018 on. The 60th most popular name of 1986, Tobias, in contrast, entered the field in 1986, rose steadily in popularity, and is to this day among the most popular names.
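The "every other class" estimate is a back-of-the-envelope calculation, and it depends on class size. A rough sketch, assuming about 12 boys per class (my assumption, not something in the data) and treating each boy's name as an independent draw with the 1994 share of 3.83 percent:

```r
# Assumption (not from the data): about 12 boys per first-grade class.
p_dominik <- 0.0383   # share of boys named Dominik in 1994, from the table above
boys_per_class <- 12  # assumed

# Expected number of Dominiks per class
expected <- p_dominik * boys_per_class

# Probability that a class has at least one Dominik,
# treating the boys' names as independent draws
p_at_least_one <- 1 - (1 - p_dominik)^boys_per_class

round(c(expected = expected, p_at_least_one = p_at_least_one), 2)
```

Under these assumptions the expected count is a bit under half a Dominik per class, and somewhat more than a third of classes would have at least one. Larger classes push both numbers up.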
What if my parents chose my name purely based on its popularity? Maybe they thought “the baby’s name shouldn’t be too popular but also not too unpopular, something like rank 20”. Well, well, here I am, Dominik. If I had been born a girl, what would my name be?
names %>%
filter(year == 1986 & rank == 20 & sex == "girls") %>%
kable()
| year | rank | name | absolute | percent | cumulative | sex |
|---|---|---|---|---|---|---|
| 1986 | 20 | Bettina | 596 | 1.48 | 45.33 | girls |
Bettina! Let’s see how this name did over the years.
names %>%
drop_na() %>%
ggplot(aes(x = year, y = rank)) +
geom_line(data = . %>% filter(name == "Bettina"),size = .5, color = my_red) +
geom_point(data = . %>% filter(name == "Bettina" & year == 1986), color = my_red, size = 5) +
geom_point(data = . %>% filter(year == 1997 & name == "Bettina"), color = my_red, size = 5) +
geom_text(data = . %>%
filter(year == 1997 & name == "Bettina"),
aes(label = paste0(as.character(year), ": Rank ", as.character(rank))), hjust = -0.1, vjust = 0) +
geom_text(data = . %>%
filter(year == 1986 & name == "Bettina"),
aes(label = paste0(as.character(year), ": Rank ", as.character(rank))), hjust = 0, vjust = 2.2) +
labs(title = "How does the female 1986 rank equivalent to Dominik\ndevelop over the years?",
subtitle = "Rank of the name Bettina by year",
y = "Rank",
x = "Year") +
xlim(1984, 2019) +
scale_y_reverse()+
theme_ft_rc()
Oh. A short rise after 1986 and then a steep decline, leaving the top 60 names in the late 90s.
So, we’ve seen some names’ popularity over the years, but a pressing question is of course:
Let’s find the names that have most often been given to newborns in Austria between 1984 and 2020.
names %>%
group_by(name) %>%
summarize(total = sum(absolute)) %>%
# slice_max() sorts in descending order and keeps the 10 largest totals
slice_max(total, n = 10) %>%
kable()
| name | total |
|---|---|
| Anna | 58120 |
| Lukas | 51567 |
| Michael | 42815 |
| Katharina | 41969 |
| Markus | 40411 |
| Julia | 36670 |
| Alexander | 35439 |
| Daniel | 34861 |
| Stefan | 34456 |
| Christoph | 32898 |
Over 58,000 Annas and over 51,000 Lukases. Impressive, but no big surprises here: there are Annas and Lukases everywhere, all the time. Let’s see how they rank over the years.
names %>%
drop_na() %>%
ggplot(aes(x = year, y = rank)) +
geom_line(data = . %>% filter(name == "Anna"),size = .5, color = sonic_blue) +
geom_line(data = . %>% filter(name == "Lukas"),size = .5, color = my_red, linetype = "dashed") +
geom_text(data = . %>%
filter(year == 1988 & name == "Anna"),
aes(label = name), hjust = 1.1, vjust = 0, color = sonic_blue) +
geom_text(data = . %>%
filter(year == 1987 & name == "Lukas"),
aes(label = name), hjust = -0.5, vjust = -0.5, color = my_red) +
labs(title = "How do the two most popular names in Austria\ndevelop over time?",
subtitle = "Ranks of the names Anna and Lukas by year",
y = "Rank",
x = "Year") +
scale_y_reverse()+
theme_ft_rc()
Anna has been leading the field since 1997, Lukas since 1996. To my surprise, neither name was that popular before 1990, ranking only at about 30 in the mid-1980s. But then, from the late 1990s on, they both take the lead in every single year. Not sure what that says about Austrians, but okay.
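The "leading since" years can be read off the plot, but they can also be computed. Below is a sketch with a hypothetical helper, `leading_since()`, that returns the first year of the unbroken run at rank 1 ending in the most recent year. It is demonstrated on made-up toy data; with the real data you would call it on `names` filtered to one sex.

```r
library(dplyr)

# Hypothetical helper: first year of the unbroken rank-1 streak that ends
# in the latest year of `df`; NA if the name is not on top in that year.
leading_since <- function(df, nm) {
  last_year <- max(df$year)
  d <- df %>% filter(name == nm) %>% arrange(desc(year))
  # years we expect to see when counting back from last_year without gaps
  expected_years <- last_year - (seq_len(nrow(d)) - 1)
  on_top <- d$rank == 1 & d$year == expected_years
  streak <- cumprod(on_top) == 1  # TRUE until the streak first breaks
  if (!any(streak)) return(NA_real_)
  min(d$year[streak])
}

# Made-up toy data: "A" is on top in 2001, drops in 2002, then leads from 2003 on.
toy <- tibble(
  year = rep(2000:2004, 2),
  name = rep(c("A", "B"), each = 5),
  rank = c(3, 1, 2, 1, 1,
           1, 2, 1, 2, 2)
)

leading_since(toy, "A")
leading_since(toy, "B")
```

The helper deliberately ignores earlier, interrupted stints at the top (like 2001 for "A" in the toy data) and only reports the current streak.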
We’ve seen quite a few different trends over time: the rise and fall of Dominik, the rise of Tobias and the fall of Stefan, the steep decline of Bettina, and the climb to the top of Anna and Lukas. But back to the initial question of whether Dominik is indeed as rare a name as I’ve been told throughout my childhood: Nes. Yo. While it was somewhat popular-ish when I was born (rank 20), it had a steep rise in popularity, peaking as the 4th most popular name for boys in Austria in 1994. Now, 27 years later, the name is struggling to stay in the top 60 list.
If you wanna look at your own or other names and how they developed between 1984 and 2020 in Austria, have a look at the little app I made.
If you have any questions or feedback, don’t hesitate to contact me.