You have a unique and rare name: How rare and unique is my name really?

by Dominik Freunberger

Dominik, you have a unique and rare name…

I’ve heard that sentence once too often. So I got the data from Statistik Austria to see if there’s something to it.

More specifically, I tried to answer the following questions:

How rare was my name when I was born and how did it change over the years?
How did the most and least popular names of my birth year develop over time?
What would be my female* name solely based on popularity in my birth year?
What are the most popular names over all the years?

I also made a little app in which you can check your own name’s popularity!

* Please note that Statistik Austria uses a binary classification for the names and the associated data, and there is criticism from many fields regarding this approach.

library(tidyverse)
library(knitr)
library(lubridate)
library(readxl)
library(ggrepel)
library(ggpubr)
library(hrbrthemes)
library(shiny)
library(bslib)

my_red = "#DC2F1E"
sonic_blue = "#C3DADC"

Statistik Austria provides the top 60 names per sex for the years 1984 through 2020. I was born in 1986, so the data covers my birth year but not much before that. Note that the names are etymological names (as Statistik Austria calls them), meaning that different spellings (Dominic, Dominique, Domenik, etc.) are collapsed into one form.

file = "statistik_der_60_haeufigsten_vornamen_1984-2020_in_oesterreich_-_etymologi.xlsx"
names = read_xlsx(file, skip = 3)

# it's the top 60 names from 1984 to 2020
# Important: Some years have multiple rank 60s.

# plain reassignment: %<>% would additionally need library(magrittr)
names = names %>%
  rename(boys_rank = "Rang...1",
         boys_name = "Vorname...2" ,
         boys_absolute = "Absolut...3" ,
         boys_percent = "in %...4",
         boys_cumulative = "% kumulativ...5",
         girls_rank = "Rang...6",
         girls_name = "Vorname...7",
         girls_absolute = "Absolut...8",
         girls_percent = "in %...9",
         girls_cumulative = "% kumulativ...10") %>%
  mutate(boys_rank = as.numeric(boys_rank),
         year = boys_rank,
         year = ifelse(year < 61, NA, year)) %>%
  fill(year) %>% 
  # at the bottom we have summaries for 1984-2010 and 2010-2020
  slice_head(n = 2273) %>% 
  filter(boys_rank < 61)

# rename_with() replaces the deprecated rename_all(funs(...)) pattern
boys = names %>% 
  select(11, 1:5) %>% 
  rename_with(~ stringr::str_replace_all(.x, "boys_", "")) %>% 
  mutate(sex = "boys")

girls = names %>% 
  select(11, 6:10) %>% 
  rename_with(~ stringr::str_replace_all(.x, "girls_", "")) %>% 
  mutate(sex = "girls")

names = rbind(girls, boys) %>%
  mutate(percent = round(percent, 2),
         cumulative = round(cumulative, 2))

# now we have a tidy df, yay!

head(names) %>% kable()

year	rank	name	absolute	percent	cumulative	sex
2020	1	Anna	1925	4.73	4.73	girls
2020	2	Marie	1304	3.21	7.94	girls
2020	3	Sophie	1287	3.17	11.11	girls
2020	4	Emilia	1105	2.72	13.82	girls
2020	5	Elena	1013	2.49	16.31	girls
2020	6	Lena	777	1.91	18.23	girls
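
The cleaning step above relies on one trick that is easy to miss: judging from the code, the year sits as a header row inside the rank column, so every value above 60 must be a year and can be carried down to the rows below it. Here is a minimal sketch of that idea on a toy table. The names `raw` and `tidy` and the placeholder values are made up for illustration, only the layout mimics what the code implies about the real file.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the raw sheet (hypothetical values): each year appears
# as a "header" row in the rank column, followed by that year's ranks.
raw <- tibble(
  rank = c("2020", "1", "2", "2019", "1", "2"),
  name = c(NA, "A", "B", NA, "C", "D")
)

tidy <- raw %>%
  mutate(rank = as.numeric(rank),
         # anything above 60 cannot be a rank, so it must be a year
         year = ifelse(rank > 60, rank, NA)) %>%
  fill(year) %>%        # carry each year down to the rows below it
  filter(rank <= 60)    # drop the header rows themselves

tidy
```

The header rows are only removed after `fill()`, otherwise the year information would be gone before it could be copied down.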

Is “Dominik” even part of this list?

Since we only get the 60 most popular names per year and sex, my name has to be at least the 60th most popular in a given year to be in this list (which is already quite popular, given that in 2020 babies in Austria got 1772 different first names). Let’s see.

names %>% 
  filter(name == "Dominik") %>% 
  kable()

year	rank	name	absolute	percent	cumulative	sex
2020	45	Dominik	266	0.62	54.65	boys
2019	48	Dominik	261	0.60	56.58	boys
2018	39	Dominik	307	0.70	50.53	boys
2017	40	Dominik	325	0.72	52.50	boys
2016	39	Dominik	319	0.71	51.64	boys
2015	35	Dominik	358	0.82	49.96	boys
2014	35	Dominik	326	0.77	50.37	boys
2013	32	Dominik	336	0.82	48.58	boys
2012	32	Dominik	362	0.89	50.02	boys
2011	30	Dominik	384	0.95	48.90	boys
2010	29	Dominik	371	0.92	48.11	boys
2009	27	Dominik	412	1.23	49.94	boys
2008	27	Dominik	413	1.19	50.20	boys
2007	27	Dominik	401	1.18	50.49	boys
2006	28	Dominik	417	1.19	52.01	boys
2005	23	Dominik	506	1.44	46.14	boys
2004	24	Dominik	515	1.44	48.99	boys
2003	19	Dominik	581	1.67	42.57	boys
2002	18	Dominik	657	1.88	42.49	boys
2001	17	Dominik	650	1.94	42.71	boys
2000	10	Dominik	767	2.22	29.23	boys
1999	10	Dominik	813	2.35	32.81	boys
1998	10	Dominik	945	2.63	32.27	boys
1997	9	Dominik	1105	2.97	30.90	boys
1996	5	Dominik	1295	3.30	18.98	boys
1995	6	Dominik	1381	3.52	23.83	boys
1994	4	Dominik	1561	3.83	16.37	boys
1993	8	Dominik	1518	3.61	32.06	boys
1992	8	Dominik	1459	3.39	32.38	boys
1991	8	Dominik	1415	3.22	33.55	boys
1990	10	Dominik	1277	2.97	40.97	boys
1989	12	Dominik	1179	2.77	47.09	boys
1988	16	Dominik	927	2.17	57.41	boys
1987	20	Dominik	691	1.65	65.31	boys
1986	20	Dominik	702	1.66	64.84	boys
1985	22	Dominik	598	1.40	67.00	boys
1984	25	Dominik	432	1.01	70.64	boys

We can see that Dominik is not only in the data but made the top 60 in every single year from 1984 to 2020, which already makes it consistently popular. Let’s see how this developed over time.

Popularity of the name Dominik from 1984 to 2020: Rise and fall

Below you see the proportion of boys named Dominik from 1984-2020, with my birth year and the most and least popular years highlighted and labeled.

names %>%
  filter(name == "Dominik") %>% 
  ggplot(aes(x = year, y = percent)) +
  geom_line(size = .5, alpha = .5) +
  geom_point(shape = 1, size = 4, color = my_red, alpha = 2/3) +
  geom_point(data = . %>% filter(percent == max(percent)), color = my_red, size = 5) +
  geom_point(data = . %>% filter(percent == min(percent)), color = my_red, size = 5) +
  geom_point(data = . %>% filter(year == 1986), color = my_red, size = 5) +
  geom_text(data = . %>% 
              filter(percent == min(percent)), 
              aes(label = paste0(as.character(year), ":")), hjust = 1.1, vjust = -1.6) +
  geom_text(data = . %>% 
              filter(percent == min(percent)), 
              aes(label = paste0(as.character(percent), "%")), hjust = 0, vjust = -1.6) +
  geom_text(data = . %>% 
              filter(percent == max(percent)), 
              aes(label = paste0(as.character(year), ":")), hjust = -0.2) +
  geom_text(data = . %>% 
              filter(percent == max(percent)), 
              aes(label = paste0(as.character(percent), "%")), hjust = -1.1) +
  geom_text(data = . %>% 
              filter(year == 1986), 
              aes(label = paste0(as.character(year), ":")), hjust = -0.9) +
  geom_text(data = . %>% 
              filter(year == 1986), 
              aes(label = paste0(as.character(percent), "%")), hjust = -1.8) +
  labs(title = "How frequent is the name Dominik?", 
       subtitle = "Proportion of boys in Austria named Dominik by year",
       y = "Proportion in %", 
       x = "Year") +
  theme_ft_rc()

Wait, what? While I suspected that the name Dominik was more popular than I experienced it in the remote mountain village I grew up in, this pattern is quite a surprise: a steady rise up to 1994 and a steady fall after it. I tried to find external events, such as scandals involving Dominiks, that could have triggered the decline after 1994, but I couldn’t find anything involving a somewhat popular Dominik.
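
To put a number on "steady", here is a small sketch that computes the year-over-year change in the share around the peak. The percentages for 1984 to 2000 are copied from the table further up. The data frame `dominik` and its `change` column are names I'm introducing just for this check.

```r
# Share of boys named Dominik, 1984-2000, copied from the table above
dominik <- data.frame(
  year = 1984:2000,
  percent = c(1.01, 1.40, 1.66, 1.65, 2.17, 2.77, 2.97, 3.22, 3.39,
              3.61, 3.83, 3.52, 3.30, 2.97, 2.63, 2.35, 2.22)
)

# change in percentage points compared with the previous year
dominik$change <- c(NA, diff(dominik$percent))

# year with the highest share
peak_year <- dominik$year[which.max(dominik$percent)]

# years before or at the peak in which the share went down
dips_before_peak <- dominik$year[which(dominik$year <= peak_year &
                                         dominik$change < 0)]

peak_year
dips_before_peak
```

Apart from a tiny dip of 0.01 percentage points in 1987, every year up to the peak is an increase, and every year from 1995 to 2000 is a decrease.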

Now that we know the percentage of Dominiks each year, let’s see how Dominik ranked over the years.

Rank of the name Dominik compared with the most and least popular names of 1986

Here you see the popularity of Dominik based on its rank. I’ll also plot the least and most popular names within the top 60 of my birth year 1986, Tobias (rank 60) and Stefan (rank 1), to see how they did over time.

names %>% 
  filter(year == 1986, sex == "boys", rank %in% c(1, 60)) %>% 
  kable()

year	rank	name	absolute	percent	cumulative	sex
1986	1	Stefan	2297	5.44	5.44	boys
1986	60	Tobias	105	0.25	89.72	boys

names %>%
  drop_na() %>% 
  ggplot(aes(x = year, y = rank)) +
  geom_line(data = . %>% filter(name == "Stefan"),size = .5, color = sonic_blue) + 
  geom_line(data = . %>% filter(name == "Tobias"),size = .5, color = sonic_blue) + 
  geom_line(data = . %>% filter(name == "Dominik"),size = .5, color = my_red) +
  geom_point(data = . %>% filter(name == "Dominik" & year == 1986), color = my_red, size = 5) +
  geom_point(data = . %>% filter(name == "Tobias" & year == 1986), size = 5, color = sonic_blue) +
  geom_point(data = . %>% filter(name == "Stefan" & year == 1986), size = 5, color = sonic_blue) +
  geom_point(data = . %>% filter(year == 1994 & name == "Dominik"), color = my_red, size = 5) +
  geom_point(data = . %>% filter(year == 2019 & name == "Dominik"), color = my_red, size = 5) +
  geom_text(data = . %>% 
              filter(year == 2019 & name == "Dominik"), 
              aes(label = paste0(as.character(year), ": Rank ", as.character(rank))), hjust = 0.9, vjust = 2) +
  geom_text(data = . %>% 
              filter(year == 1986 & name == "Dominik"), 
              aes(label = paste0(as.character(year), ": Rank ", as.character(rank))), hjust = -0.2) +
  geom_text(data = . %>% 
              filter(year == 1994 & name == "Dominik"), 
              aes(label = paste0(as.character(year), ": Rank ", as.character(rank))), hjust = -0.1, vjust = -0.5) +
  geom_text(data = . %>% 
              filter(year == 1986 & name == "Stefan"), 
              aes(label = name), hjust = -0, vjust = 1.8, color = sonic_blue) +
  geom_text(data = . %>% 
              filter(year == 1986 & name == "Tobias"), 
              aes(label = name), hjust = -0.1, vjust = -0.5, color = sonic_blue) +
  labs(title = "How does the name Dominik rank and compare\nover the years?", 
       subtitle = "Ranks of the names Dominik, Stefan, and Tobias by year",
       y = "Rank", 
       x = "Year") +
  scale_y_reverse()+
  theme_ft_rc()

Dominik was the 4th most popular name for boys in Austria in 1994. Wow. I certainly did not expect that. With a share of almost 4 percent of boys, this means that roughly every other first-grade class in 2000 had a Dominik in it. The most popular name of 1986, Stefan, lost its popularity and didn’t even make the top 60 from 2018 on. Tobias, in contrast, entered the top 60 in 1986 at rank 60, rose steadily, and is among the most popular names to this day.
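
The classroom claim is back-of-the-envelope arithmetic, so here is a rough sketch of it. The share of 3.83 % comes from the 1994 row in the table above. The class size is purely my assumption (about 24 children, half of them boys), and so is treating each boy's name as an independent draw.

```r
share_dominik  <- 0.0383  # 3.83 % of boys born in 1994 (from the table above)
boys_per_class <- 12      # assumption: ~24 children per class, half of them boys

# expected number of Dominiks per class
expected <- boys_per_class * share_dominik

# probability that a class has at least one Dominik,
# assuming each boy's name is an independent draw
p_at_least_one <- 1 - (1 - share_dominik)^boys_per_class

expected
p_at_least_one
```

Under these assumptions the expected count is a bit under half a Dominik per class, which is where "every other class" comes from. The chance that a given class has at least one Dominik is somewhat lower, at a bit under 40 %, because some classes would have two.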

What would be my girl name based on popularity?

What if my parents chose my name purely based on its popularity? Maybe they thought “the baby’s name shouldn’t be too popular but also not too unpopular, something like rank 20”. Well, well, here I am, Dominik. If I were born as a girl, what would my name be?

names %>% 
  filter(year == 1986 & rank == 20 & sex == "girls") %>% 
  kable()

year	rank	name	absolute	percent	cumulative	sex
1986	20	Bettina	596	1.48	45.33	girls

Bettina! Let’s see how this name did over the years.

names %>%
  drop_na() %>% 
  ggplot(aes(x = year, y = rank)) +
  geom_line(data = . %>% filter(name == "Bettina"),size = .5, color = my_red) +
  geom_point(data = . %>% filter(name == "Bettina" & year == 1986), color = my_red, size = 5) +
  geom_point(data = . %>% filter(year == 1997 & name == "Bettina"), color = my_red, size = 5) +
  geom_text(data = . %>% 
              filter(year == 1997 & name == "Bettina"), 
              aes(label = paste0(as.character(year), ": Rank ", as.character(rank))), hjust = -0.1, vjust = 0) +
  geom_text(data = . %>% 
              filter(year == 1986 & name == "Bettina"), 
              aes(label = paste0(as.character(year), ": Rank ", as.character(rank))), hjust = 0, vjust = 2.2) +
  labs(title = "How does the female 1986 rank equivalent to Dominik\ndevelop over the years?", 
       subtitle = "Rank of the name Bettina by year",
       y = "Rank", 
       x = "Year") +
  xlim(1984, 2019) +
  scale_y_reverse()+
  theme_ft_rc()

Oh. A short rise after 1986 and then a steep decline, leaving the top 60 names in the late 90s.

So, we’ve seen some names’ popularity over the years, but a pressing question is of course:

What are the most popular names from 1984 to 2020?

Let’s find the names that have most often been given to newborns in Austria between 1984 and 2020.

names %>% 
  group_by(name) %>% 
  summarize(total = sum(absolute)) %>% 
  # slice_max() supersedes top_n() and already sorts in descending order
  slice_max(total, n = 10) %>% 
  kable()

name	total
Anna	58120
Lukas	51567
Michael	42815
Katharina	41969
Markus	40411
Julia	36670
Alexander	35439
Daniel	34861
Stefan	34456
Christoph	32898

Over 58,000 Annas and over 51,000 Lukases. Impressive, but no big surprises here: there are Annas and Lukases everywhere, all the time. Let’s see how they rank over the years.

names %>%
  drop_na() %>% 
  ggplot(aes(x = year, y = rank)) +
  geom_line(data = . %>% filter(name == "Anna"),size = .5, color = sonic_blue) + 
  geom_line(data = . %>% filter(name == "Lukas"),size = .5, color = my_red, linetype = "dashed") +
  geom_text(data = . %>% 
              filter(year == 1988 & name == "Anna"), 
              aes(label = name), hjust = 1.1, vjust = 0, color = sonic_blue) +
  geom_text(data = . %>% 
              filter(year == 1987 & name == "Lukas"), 
              aes(label = name), hjust = -0.5, vjust = -0.5, color = my_red) +
  labs(title = "How do the two most popular names in Austria\ndevelop over time?", 
       subtitle = "Ranks of the names Anna and Lukas by year",
       y = "Rank", 
       x = "Year") +
  scale_y_reverse()+
  theme_ft_rc()

Anna has been leading the field since 1997, Lukas since 1996. To my surprise, neither name was that popular before 1990, ranking only at about 30 in the mid-1980s. But from the late 1990s on, they both take the lead in every single year. Not sure what that says about Austrians, but okay.

Conclusion: Not so rare after all

We’ve seen quite a few different trends over time: the rise and fall of Dominik, the rise of Tobias and the fall of Stefan, the steep decline of Bettina, and the climb to the top of Anna and Lukas. But back to the initial question of whether Dominik is indeed as rare a name as I’ve been told throughout my childhood: Nes. Yo. While it was only somewhat popular when I was born (rank 20), it then rose steeply, peaking as the 4th most popular name for boys in Austria in 1994. Now, 27 years later, the name is struggling to stay in the top 60 list.

Check your own name’s popularity!

If you wanna look at your own or other names and how they developed between 1984 and 2020 in Austria, have a look at the little app I made.
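
If you'd rather build something similar yourself, here is a minimal sketch of how such an app could look with shiny. To be clear, this is not the code of the app linked above, just one possible way to do it. The helper `name_app()` and the `demo` data frame are hypothetical names; the three demo rows are taken from the Dominik table further up. It assumes a data frame with the columns `name`, `year`, and `rank`, like the tidy `names` from this post.

```r
library(shiny)
library(ggplot2)

# Hypothetical helper: builds a small app around any data frame that has
# the columns `name`, `year`, and `rank`.
name_app <- function(df) {
  ui <- fluidPage(
    titlePanel("Name popularity in Austria"),
    selectInput("name", "Name", choices = sort(unique(df$name))),
    plotOutput("rank_plot")
  )

  server <- function(input, output, session) {
    output$rank_plot <- renderPlot({
      ggplot(df[df$name == input$name, ], aes(x = year, y = rank)) +
        geom_line() +
        geom_point() +
        scale_y_reverse() +  # rank 1 at the top, as in the plots above
        labs(x = "Year", y = "Rank")
    })
  }

  shinyApp(ui, server)
}

# A few rows from the Dominik table above, just to have something to plot
demo <- data.frame(
  name = "Dominik",
  year = c(1984, 1994, 2020),
  rank = c(25, 4, 45)
)

app <- name_app(demo)
# runApp(app)  # uncomment to launch the app in an interactive session
```

Passing the full tidy data frame instead of `demo` would give a dropdown with every name that made the top 60 at least once.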

If you have any questions or feedback, don’t hesitate to contact me.