Chapter 1 Introduction
1.1 Import Data
First, we can import “Divorce.txt” downloaded from website https://instruction.bus.wisc.edu/jfrees/jfreesbooks/Longitudinal%20and%20Panel%20Data/Book/DataFiles.htm
These are data describing the divorce rate in each state. In addition, there is other socioeconomic information about a state that may be related to the divorce rate. In particular, data concerning the number of marriages and births, unemployment and crime rates, and AFDC (Aid to Families with Dependent Children) payments are available. In this file, data are available for the years 1965, 1975, 1985 and 1995. The information provided by this study is potentially useful for governing agencies in budgeting for social needs such as judicial and welfare services that are affected by divorce. The data for the study were collected from various U.S. Statistical Abstracts. Divorce rate is defined as the number of divorces and annulments per thousand population per state. The independent variables include the number of marriages and live births per thousand population, the total unemployment rate as percent of total work force, the average monthly AFDC payments per family, and the total number of criminal offenses known to the police (murder, rape, robbery, aggravated assault, burglary, larceny, and motor vehicle theft). Some of the data points contain missing observations due to unavailability, and Nevada is unusual due to its uniquely high and unrepresentative marriage and divorce rates. Source: U.S. Statistical Abstract, various issues.
Variable | Description |
---|---|
DIVORCE | Number of divorces and annulments per state per one thousand population. |
BIRTH | Number of live births per state per one thousand population. |
MARRIAGE | Number of marriages per state per one thousand population. |
UNEMPLOY | Total unemployment rate as a percentage of the total work force. |
CRIME | Total number of criminal offenses (murder, rape, robbery, aggravated assault, burglary, larceny and motor vehicle theft) known to police per one hundred thousand population. |
AFDC | Average monthly AFDC (Aid to Families with Dependent Children) payments per family. |
STATE | State identifier, 1-51. |
TIME | Time identifier, 1-4. |
# "\t" INDICATES SEPARATED BY TABLES ;
divorce = read.table("TXTData/Divorce.txt", sep ="\t", quote = "",header=TRUE)
# divorce = read.table(choose.files(), sep ="\t", quote = "",header=TRUE)
Let’s have a look at the dataset. The names of variables and the first 8 rows observations.
# PROVIDES THE NAMES IN THE FILE AND LISTS THE FIRST 8 OBSERVATIONS ;
names (divorce)
[1] "DIVORCE" "BIRTH" "MARRIAGE" "UNEMPLOY" "CRIME"
[6] "AFDC" "STATE" "TIME" "STATE.Name" "Region"
divorce[1:8,]
DIVORCE BIRTH MARRIAGE UNEMPLOY CRIME AFDC STATE TIME STATE.Name
1 2.6 19.9 8.8 4.9 6.799 114 1 1 Maine
2 2.3 19.5 13.4 2.8 6.106 188 2 1 New Hampshire
3 1.5 20.5 9.0 4.2 5.793 113 3 1 Vermont
4 1.5 18.8 7.1 4.9 15.072 188 4 1 Massachusetts
5 1.3 19.4 7.1 4.9 14.180 172 5 1 Rhode Island
6 1.3 19.2 7.4 3.9 11.749 197 6 1 Connecticut
7 0.5 18.6 7.4 4.6 22.509 218 7 1 New York
8 0.8 18.5 6.8 5.1 13.966 203 8 1 New Jersey
Region
1 New England
2 New England
3 New England
4 New England
5 New England
6 New England
7 Middle Atlantic
8 Middle Atlantic
We can check some summary statistics. The dimension of divorce
.
# SUMMARY STATISTICS ;
dim(divorce)
[1] 204 10
A summary of variables DIVORCE
and AFDC
.
summary(divorce[, c("DIVORCE", "AFDC")])
DIVORCE AFDC
Min. :0.500 Min. : 33.0
1st Qu.:3.300 1st Qu.:154.0
Median :4.250 Median :224.0
Mean :4.361 Mean :245.9
3rd Qu.:5.300 3rd Qu.:315.0
Max. :9.100 Max. :731.0
NA's :12 NA's :3
sd(divorce[,c("DIVORCE")], na.rm=TRUE) #The standard deviation of DIVORCE.
[1] 1.704068
sd(divorce[,c("AFDC")], na.rm=TRUE) #The standard deviation of AFDC.
[1] 122.2453
cor(divorce$DIVORCE, divorce$AFDC, use="pairwise.complete.obs")# The correlation between DIVORCE and AFDC.
[1] 0.07306962
1.2 Example 1.1: Divorce Rates (page 2)
1.2.1 Figure 1.1: Plot of 1965 divorce rates versus AFDC payments.
Figure 1.1 shows the 1965 divorce rates versus AFDC (Aid to Families with Dependent Children) payments for the fifty states.
# FIGURE 1.1. PLOT 1965 DATA ;
plot(DIVORCE ~ AFDC, subset=TIME %in% c(1),data = divorce, xaxt="n", yaxt="n",ylab="",xlab="")
axis(2, at=seq(0, 6, by=1), las=1, font=10, cex=0.005, tck=0.01)
axis(2, at=seq(0, 6, by=0.1), lab=F, tck=0.005)
axis(1, at=seq(20,220, by=20), font=10, cex=0.005, tck=0.01)
axis(1, at=seq(20,220, by=2), lab=F, tck=0.005)
mtext("DIVORCE", side=2, line=0, at=6, font=12, cex=1, las=1)
mtext("AFDC", side=1, line=3, at=120, font=12, cex=1)
We can also plot 1975 data following the same method.
# PLOT 1975 DATA ;
plot(DIVORCE ~ AFDC, subset=TIME %in% c(2),data = divorce,xaxt="n", yaxt="n",ylab="",xlab="")
axis(2, at=seq(2, 9, by=1), las=1, font=10, cex=0.005, tck=0.01)
axis(2, at=seq(2, 9, by=0.1), lab=F, tck=0.005)
axis(1, at=seq(0,400, by=100), font=10, cex=0.005, tck=0.01)
axis(1, at=seq(0,400, by=10), lab=F, tck=0.005)
mtext("DIVORCE", side=2, line=0, at=8.5, font=12, cex=1, las=1)
mtext("AFDC", side=1, line=3, at=200, font=12, cex=1)
1.2.2 Figure 1.2: Plot of divorce rate versus AFDC payments from 1965 and 1975.
Figure 1.2 shows both the 1965 and 1975 data; a line connects the two observations within each state. These lines represent a change over time (dynamic), not a cross-sectional relationship.
plot(DIVORCE ~ AFDC, data = subset(divorce, TIME %in% c(1, 2)), xaxt="n", yaxt="n",ylab="",xlab="")
for (i in divorce$STATE) {
lines(DIVORCE ~ AFDC, data = subset(divorce, TIME %in% c(1, 2) & STATE == i)) }
axis(2, at=seq(0, 10, by=1), las=1, font=10, cex=0.005, tck=0.01)
axis(2, at=seq(0, 10, by=0.1), lab=F, tck=0.005)
axis(1, at=seq(0,400, by=100), font=10, cex=0.005, tck=0.01)
axis(1, at=seq(0,400, by=10), lab=F, tck=0.005)
mtext("DIVORCE", side=2, line=0, at=8.5, font=12, cex=1, las=1)
mtext("AFDC", side=1, line=3, at=200, font=12, cex=1)
We can plot data for all years and connect the years.
# PLOT ALL DATA, CONNECTING THE YEARS ;
plot(DIVORCE ~ AFDC, data = divorce, xaxt="n", yaxt="n",ylab="",xlab="")
for (i in divorce$STATE) {
lines(DIVORCE ~ AFDC, data = subset(divorce, STATE == i)) }
axis(2, at=seq(0, 10, by=1), las=1, font=10, cex=0.005, tck=0.01)
axis(2, at=seq(0, 10, by=0.1), lab=F, tck=0.005)
axis(1, at=seq(0,800, by=100), font=10, cex=0.005, tck=0.01)
axis(1, at=seq(0,800, by=10), lab=F, tck=0.005)
mtext("DIVORCE", side=2, line=0, at=10, font=12, cex=1, las=1)
mtext("AFDC", side=1, line=3, at=400, font=12, cex=1)
We can also look at the multiple time series plot by the STATE
.
# MULTIPLE TIME SERIES PLOT ;
divorce$YEAR=divorce$TIME*10+1955
plot(DIVORCE ~ YEAR, data = divorce, xaxt="n", yaxt="n",ylab="",xlab="")
for (i in divorce$STATE) {
lines(DIVORCE ~ YEAR, data = subset(divorce, STATE == i)) }
axis(2, at=seq(0, 10, by=1), las=1, font=10, cex=0.005, tck=0.01)
axis(2, at=seq(0, 10, by=0.1), lab=F, tck=0.005)
axis(1, at=seq(1965,1995, by=10), font=10, cex=0.005, tck=0.01)
axis(1, at=seq(1964,2000, by=1), lab=F, tck=0.005)
mtext("DIVORCE", side=2, line=0, at=10, font=12, cex=1, las=1)
mtext("YEAR", side=1, line=3, at=1980, font=12, cex=1)