R is for actuaRies

by Dr. Maria Prokofieva

Dr Maria Prokofieva is a member of the R / Business working group which is promoting the use of R in accounting, auditing, and actuarial work. She is also a professor at the Victoria University Business School in Australia and works with CPA Australia.

What is actuarial science

Actuarial data science lies at the intersection of math and business studies, combining statistical knowledge and methods from insurance and finance areas. Compared to data scientists, actuaries focus more on finance and business knowledge, while still collecting and analyzing data.

The profession is in high demand, and according to the Bureau of Labor Statistics (BLS), it is expected that actuary jobs will a enjoy 24% increase from 2020-30. This is much faster than the average for all occupations. Moreover, the median salary for an actuary is estimated to be over $100,000.

The focus of the field is on assessing the likelihood of future events, particularly in business settings (especially finance and insurance) to plan for outcomes and mitigate risks. With this in mind, probability analysis and statistics are applied to very many areas, such as predicting the number of children for a health insurance or the payout of the life insurance policy. Some common tasks for actuaries include calculating premium rates for mortality and morbidity products, assessing likelihood of financial loss or return, business risk consulting, pension and retirement planning and many more. Basically, actuaries perform any tasks that include risk modeling, be that in insurance, financial planning or energy and environment. Later in this post, we will go through some examples of those!

Reviewing particular applications across areas, we can mention:

  • Insurance in such areas as life insurance, credit and mortgage insurance, key person insurance for small business, long term care insurance, health savings accounts. The focus here is on analysis of mortality, production of life tables, calculation of compound interest for life insurance, annuities and endowment contingencies.
# Using imaginator package for individual claim simulation
# extract from the vignette 
set.seed(12345)
tbl_policy <- policies_simulate(2, 2001:2005)
tbl_claim_transaction <- claims_by_wait_time(
  tbl_policy,
  claim_frequency = 2,
  payment_frequency = 3,
  occurrence_wait = 10,
  report_wait = 5,
  pay_wait = 5,
  pay_severity = 50)
kableExtra::kbl(tbl_claim_transaction[1:8,], 
      caption = "Wait-time modelling with policies simulation") %>%
  kableExtra::kable_classic(full_width = F, html_font = "Cambria")
Table 1: Wait-time modelling with policies simulation
policy_effective_date policy_expiration_date exposure policyholder_id claim_id occurrence_date report_date number_of_payments payment_date payment_amount
2001-05-23 2002-05-22 1 1 1 2001-06-02 2001-06-07 3 2001-06-12 50
2001-05-23 2002-05-22 1 1 1 2001-06-02 2001-06-07 3 2001-06-17 50
2001-05-23 2002-05-22 1 1 1 2001-06-02 2001-06-07 3 2001-06-22 50
2001-05-23 2002-05-22 1 1 2 2001-06-02 2001-06-07 3 2001-06-12 50
2001-05-23 2002-05-22 1 1 2 2001-06-02 2001-06-07 3 2001-06-17 50
2001-05-23 2002-05-22 1 1 2 2001-06-02 2001-06-07 3 2001-06-22 50
2001-02-21 2002-02-20 1 2 3 2001-03-03 2001-03-08 3 2001-03-13 50
2001-02-21 2002-02-20 1 2 3 2001-03-03 2001-03-08 3 2001-03-18 50
# Using MortalityTables package - Mortality tables
# extract from the vignette 
mortalityTables.load("Austria_Annuities")
mortalityTables.load("Austria_Census")
plotMortalityTables(
  title="Using dimensional information for mortality tables",
  mort.AT.census[c("m", "w"), c("1951", "1991", "2001", "2011")]) + 
  aes(color = as.factor(year), linetype = sex) + labs(color = "Period", linetype = "Sex")+
  scale_fill_brewer(palette="PiYG")

  • Life insurance and social insurance. Main tasks in this area include analysis of rates of disability, mordibity, mortality, fertility, etc., analysis of factors (such as georgaphy and consumer characteristics) on usage of medical services and procedures.
# Using MortalityTables package
# extract from the vignette 
mortalityTables.load("Austria_Annuities")
# Get the cohort death probabilities for Austrian Annuitants born in 1977:
qx.coh1977 = deathProbabilities(AVOe2005R.male, YOB = 1977)
# Get the period death probabilities for Austrian Annuitants observed in the year 2020:
qx.per2020 = periodDeathProbabilities(AVOe2005R.male, Period = 2020)
# Get the cohort death probabilities for Austrian Annuitants born in 1977 as a mortalityTable.period object:
table.coh1977 = getCohortTable(AVOe2005R.male, YOB = 1977)
# Get the period death probabilities for Austrian Annuitants observed in the year 2020:
table.per2020 = getPeriodTable(AVOe2005R.male, Period = 2020)
# Compare those two in a plot:
plot(table.coh1977, table.per2020, title = "Comparison of cohort 1977 with period 2020", legend.position = c(1,0))+
  scale_fill_brewer(palette="PiYG")

  • Pension: design, funding, accounting, administration, and maintenance or redesign of pension plans with valuation and modelling for various factors that may influence the calculation and payout
# Using lifecontingencies package
# Extract from vignette
# Calculation example for the benefit reserve for a deferred annuity-due on a policyholder aged 25 when the annuity is deferred until age 65.
yearlyRate <- 12000
irate <- 0.02
APV <- yearlyRate*axn(soa08Act, x=25, i=irate,m=65-25,k=12)
levelPremium <- APV/axn(soa08Act, x=25,n=65-25,k=12)
annuityReserve<-function(t) {
  out<-NULL
  if(t<65-25) out <- yearlyRate*axn(soa08Act, x=25+t,
                                    i=irate, m=65-(25+t),k=12)-levelPremium*axn(soa08Act,
                                                                                x=25+t, n=65-(25+t),k=12) else {
                                                                                  out <- yearlyRate*axn(soa08Act, x=25+t, i=irate,k=12)
                                                                                  }
  return(out)
  }
years <- seq(from=0, to=getOmega(soa08Act)-25-1,by=1)
annuityRes <- numeric(length(years))
for(i in years) annuityRes[i+1] <- annuityReserve(i)
dataAnnuityRes <- data.frame(years=years, reserve=annuityRes)
dataAnnuityRes%>%ggplot(aes(years, reserve))+
  geom_line(color="blue")+
  labs(x = "Years", y = "Amount",
       title ="Deferred annuity benefit reserve")

and many more! Just look at those examples!

Historically, actuarial science comes from the mathematical area (no surprise here!) and stems back to 17th century when developments in mathematics in Europe coincided with the growth in demands to have more precise estimation in risks in burial, life insurance and annuities calculations. This was the start of developing techniques for life tables (hail to John Graunt, the father of demography) and discounting values for present value calculation which is one of the keystone concepts in today’s finance and accounting. The field was developing very quickly but the computational side became more and more complex: without computers, this was a tedious task. But…

Why acturial science needs R

Long gone those days when all calculations were done by hand, table and god blessing… What is in use currently in the field?

Apparently… Excel still has its dominance (if not reign?) …

In March 2022 the Casualty Actuarial Society (CAS) released results and analysis from its first survey on technology used by actuaries. With responses from over 1,200 participants, the results suggest that more than 94.3% of respondents use Excel, and at least once a day, but! it is more than just one tool! With this in mind there is a strong demand for new skills- in R (47.2%), Python (39.1%) and SQL (30.8%).

To support this, the recent results (2020) of the “wasted time” survey show a striking outcome that actuaries waste most of their time…

  • waiting for excel to process the results

  • copying data from a model

  • recreating prior model values

  • waiting for vendor support

and quite a few of other tasks that could be easily resolved with open source and replicable modeling approach..

survey<-read_csv("survey_actuaries.csv")
survey%>%pivot_longer(-Tasks, names_to="Time", values_to="Score")%>%
   group_by(Tasks) %>% 
   mutate(sum_response = sum(Score))%>%
  ungroup()%>%
  ggplot(aes(x=fct_reorder(Tasks, sum_response), Score, fill=Time))+
  geom_col(show.legend = TRUE)+
  coord_flip()+
  scale_fill_brewer(palette="PiYG")+
  labs(y = "Responses", x = "Tasks",
title ="Data and model management tasks - wasted time")

What are the barriers? TIME! (80.5% of actuaries indicated so).. but what else is missing?

Learning resources

Learning resources is always a problem to move people to use available tools. Actuarial (data) science is not the exception. Good news is that there are some to start using R for actuarial data as well as there are some materials available to build further resources.

The available learning resources start with Introduction to R. This is a general category that includes the resources to get familiar with R, RStudio and main R packages (e.g. tidyverse).

R for Data Science

RStudio.cloud Primers

A well known list built by the enthusiastic and supportive R community: RStudio Education education.rstudio.com

In the meantime, there is a growing treasure trove of R resources developed by enthusiastic R Actuaries that address specific issues.

The work of the “Data Science” group of the Swiss Association of Actuaries (SAA) is amazing - lots of resources has been prepared, including lectures and hands-on tutorials in R.

Common and specific R packages used

Apart from educational resources, specialized packages are available to the community, including:

Such packages are scattered across CRAN and github as there is not currently a task view for actuarial data science at CRAN (hint!)…

There is a great potential in the actuarial data science and R / Business would love to invite the R passionate community to start a dialogue on promoting and developing R resources in this area!

Share Comments · · · · ·

You may leave a comment below or discuss the post in the forum community.rstudio.com.