Preface

Date: 01 October 2024

About Regression Modeling

Statistical techniques can be used to address new situations, which matters in a rapidly evolving risk management world. Analysts with a strong analytical background understand that a large data set can be a treasure trove of information to mine and can yield a strong competitive advantage. This book and online tutorial provide budding analysts with a foundation in multiple regression. Learners will study these statistical techniques using data on the demand for insurance, healthcare expenditures, and other applications. Although no specific knowledge of actuarial science or risk management is presumed, the approach introduces applications in which statistical techniques can be used to analyze real data of interest.

Resources

This tutorial is based on the book Regression Modeling with Actuarial and Financial Applications.
- For resources associated with the book, please visit the Regression Modeling book web site.
For advanced regression applications in insurance, you may be interested in the series Predictive Modeling Applications in Actuarial Science.
- Sample code and data for the series are available at the series website.
An earlier version of this tutorial, a Short Course constructed for Indonesian actuaries, uses the Datacamp learning platform.

Tutorial Description

This online tutorial is designed to guide you through the foundations of regression with applications in actuarial science.
Anticipated completion time is approximately six hours.
The tutorial assumes that you are familiar with the basics of the statistical software R, at the level of Datacamp’s Introduction to R.

General Layout. The tutorial has five chapters that summarize the foundations of multiple linear regression. Each chapter is subdivided into several sections. Each section begins with a short video, typically 4-8 minutes long, that summarizes the section’s key learning outcomes. Following the video, you can see more details about the underlying R code for the analysis presented in the video.

Role of Exercises. Following each video, there are one or two exercises that let you practice the skills and check that you fully grasp the learning outcomes. The exercises are implemented using an online learning platform provided by Datacamp, so you need not install R. Feedback is programmed into the exercises so that you will learn a lot by making mistakes! You set your own pace, so always feel free to reveal the answers by hitting the Solution tab. Remember, going through quickly is not equivalent to learning deeply. Use this tool to enhance your understanding of regression analysis, one of the foundations of data science.

Welcome to the Tutorial Video

In this video, you learn how to:

- Describe regression in a nutshell
- Explain Galton’s height example as a regression application

Video Overhead

A. Galton’s 1885 Regression Data

Rows give the height of the adult child and columns give the height of the parents, both in inches. Each cell counts the children with that combination of heights.

\[ \small{\begin{array}{l|ccccccccccc|c}
\hline
\text{Height of }& & & & & & & & & & & & \\
\text{adult child }& & & & & & & & & & & & \\
\text{in inches }& <64.0 & 64.5 & 65.5 & 66.5 & 67.5 & 68.5 & 69.5 & 70.5 & 71.5 & 72.5 & >73.0 & \text{Totals} \\
\hline
>73.7 & - & - & - & - & - & - & 5 & 3 & 2 & 4 & - & 14 \\
73.2 & - & - & - & - & - & 3 & 4 & 3 & 2 & 2 & 3 & 17 \\
72.2 & - & - & 1 & - & 4 & 4 & 11 & 4 & 9 & 7 & 1 & 41 \\
71.2 & - & - & 2 & - & 11 & 18 & 20 & 7 & 4 & 2 & - & 64 \\
70.2 & - & - & 5 & 4 & 19 & 21 & 25 & 14 & 10 & 1 & - & 99 \\
69.2 & 1 & 2 & 7 & 13 & 38 & 48 & 33 & 18 & 5 & 2 & - & 167 \\
68.2 & 1 & - & 7 & 14 & 28 & 34 & 20 & 12 & 3 & 1 & - & 120 \\
67.2 & 2 & 5 & 11 & 17 & 38 & 31 & 27 & 3 & 4 & - & - & 138 \\
66.2 & 2 & 5 & 11 & 17 & 36 & 25 & 17 & 1 & 3 & - & - & 117 \\
65.2 & 1 & 1 & 7 & 2 & 15 & 16 & 4 & 1 & 1 & - & - & 48 \\
64.2 & 4 & 4 & 5 & 5 & 14 & 11 & 16 & - & - & - & - & 59 \\
63.2 & 2 & 4 & 9 & 3 & 5 & 7 & 1 & 1 & - & - & - & 32 \\
62.2 & - & 1 & - & 3 & 3 & - & - & - & - & - & - & 7 \\
<61.2 & 1 & 1 & 1 & - & - & 1 & - & 1 & - & - & - & 5 \\
\hline
\text{Totals }& 14 & 23 & 66 & 78 & 211 & 219 & 183 & 68 & 43 & 19 & 4 & 928 \\
\hline
\end{array}} \]
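
As a quick consistency check on the table, the marginal totals can be entered in R and compared with the grand total. This is only a minimal sketch: the vector names `row_totals`, `col_totals`, and `n_children` are illustrative and do not appear in the tutorial's own code.

```r
# Row totals from the table, listed from the top row (>73.7) to the bottom row (<61.2)
row_totals <- c(14, 17, 41, 64, 99, 167, 120, 138, 117, 48, 59, 32, 7, 5)

# Column totals from the table, listed from the first column (<64.0) to the last (>73.0)
col_totals <- c(14, 23, 66, 78, 211, 219, 183, 68, 43, 19, 4)

# Both margins should add up to the same number of children
n_children <- sum(row_totals)
n_children
sum(col_totals) == n_children

# Percentage of children falling in each column of the table
round(100 * col_totals / n_children, 1)
```

Both margins sum to 928. Subtracting the two estimated regression coefficients (intercept and slope) gives 926, which agrees with the residual degrees of freedom reported in the regression output in part B.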

B. Supporting R Code

# Load Galton's height data
heights <- read.csv("https://assets.datacamp.com/production/repositories/2610/datasets/c85ede6c205d22049e766bd08956b225c576255b/galton_height.csv", header = TRUE)
str(heights)
head(heights)

# Give the two height columns descriptive names
heights$child_ht <- heights$CHILDC
heights$parent_ht <- heights$PARENTC

# Scatter plot; jitter() separates points that share the same grouped heights
plot(jitter(heights$parent_ht), jitter(heights$child_ht), ylim = c(60, 80), xlim = c(60, 80),
     ylab = "height of child", xlab = "height of parents")
# Fitted least squares line (solid) and the 45-degree line y = x (red, dashed)
abline(lm(heights$child_ht ~ heights$parent_ht))
abline(0, 1, col = "red", lty = 2)

# Regression of child height on parents' height
summary(lm(heights$child_ht ~ heights$parent_ht))


Call:
lm(formula = heights$child_ht ~ heights$parent_ht)

Residuals:
    Min      1Q  Median      3Q     Max 
-8.2577 -1.4280  0.1323  1.5720  5.7918 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)       25.84856    2.69009   9.609   <2e-16 ***
heights$parent_ht  0.60992    0.03882  15.710   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.26 on 926 degrees of freedom
Multiple R-squared:  0.2104,    Adjusted R-squared:  0.2096 
F-statistic: 246.8 on 1 and 926 DF,  p-value: < 2.2e-16
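
To see what the fitted line implies, the estimated coefficients from the summary output above can be plugged in by hand. The following is a minimal sketch rather than part of the tutorial's code: the helper `predict_child()` and the example parents' heights are hypothetical, and the coefficients are simply copied from the printed output.

```r
# Coefficients copied from the summary output above
b0 <- 25.84856  # intercept
b1 <- 0.60992   # slope on parents' height

# Hypothetical helper: fitted child height for a given parents' height (inches)
predict_child <- function(parent_ht) b0 + b1 * parent_ht

# Illustrative parents' heights: short, middling, and tall
parent_ht <- c(64, 68, 72)
pred <- predict_child(parent_ht)
data.frame(parent_ht, predicted_child_ht = round(pred, 2))

# Height at which the fitted line crosses the 45-degree line y = x,
# found by solving b0 + b1 * x = x
crossing <- b0 / (1 - b1)
round(crossing, 2)
```

Because the slope is less than one, parents who are 72 inches tall are predicted to have children of about 69.8 inches, while parents who are 64 inches tall are predicted to have children of about 64.9 inches. Predicted child heights are pulled toward the middle, which is the pattern the dashed 45-degree line in the plot is meant to highlight.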

Online Tutorial on `Regression Modeling with Actuarial and Financial Applications`
