Chapter 3 Models with Random Effects
3.1 Import Data
# "\t" INDICATES SEPARATED BY TABLES ;
taxprep = read.table("TXTData/TaxPrep.txt", sep ="\t", quote = "",header=TRUE)
# taxprep=read.table(choose.files(), header=TRUE, sep="\t")
Data for this study are from the Statistics of Income (SOI) Panel of Individual Returns, a part of the Ernst and Young/University of Michigan Tax Research Database. The SOI Panel represents a simple random sample of unaudited individual income tax returns filed for tax years 1979-1990. The data are compiled from a stratified probability sample of unaudited individual income tax returns, Forms 1040, 1040A and 1040EZ, filed by U.S. taxpayers. The estimates that are obtained from these data are intended to represent all returns filed for the income tax years under review. All returns processed are subjected to sampling except tentative and amended returns.
Variable | Description |
---|---|
MS | is an indicator variable of the taxpayer’s marital status. It is coded one if the taxpayer is married and zero otherwise. |
HH | is an indicator variable, one if the taxpayer is a head of household and zero otherwise. |
DEPEND | is the number of dependents claimed by the taxpayer. |
AGE | is the presence of an indicator for age 65 or over. |
F1040A | is an indicator variable of the taxpayer’s filing type. It is coded one if the taxpayer uses Form 1040A and zero otherwise. |
F1040EZ | is an indicator variable of the taxpayer’s filing type. It is coded one if the taxpayer uses Form 1040EZ and zero otherwise. |
TPI | is the sum of all positive income line items on the return. is a marginal tax rate. |
TXRT | is a marginal tax rate。 It is computed on TPI less exemptions and the standard deduction. |
MR | is an exogenous marginal tax rate. It is computed on TPI less exemptions and the standard deduction. |
EMP | is an indicator variable, one if Schedule C or F is present and zero otherwise. Self-employed taxpayers have greater need for professional assistance to reduce the reporting risks of doing business. |
PREP | is a variable indicating the presence of a paid preparer. |
TAX | is the tax liability on the return. |
SUBJECT | Subject identifier, 1- 258. |
TIME | Time identifier, 1-5. |
LNTAX | is the natural logarithm of the tax liability on the return. |
LNTPI | is the natural logarithm of the sum of all positive income line items on the return. |
3.2 Example 3.2: Income Tax Payments (Page 81)
In this section, we study the effects that an individual’s economic and demographic characteristics have on the amount of income tax paid. Specifically, the response of interest is LNTAX
, defined as the natural logarithm of the liability on the tax return.
3.2.1 Table 3.2. Averages of binary variables
The binary variables in Table 3.2 indicate that over half the sample is married (MS) and approximately half the sample uses a paid preparer (PREP).
library(nlme)
gsummary(taxprep[, c("MS", "HH", "AGE", "EMP", "PREP")], groups=taxprep$TIME, FUN=mean)
## MS HH AGE EMP PREP
## 1 0.5968992 0.08139535 0.08527132 0.1395349 0.4496124
## 2 0.5968992 0.09302326 0.10465116 0.1589147 0.4418605
## 3 0.6240310 0.08527132 0.11240310 0.1550388 0.4844961
## 4 0.6472868 0.08139535 0.13178295 0.1472868 0.5077519
## 5 0.6472868 0.09302326 0.14728682 0.1472868 0.5155039
3.2.2 TABLE 3.3 - Summary statistics for continuous variables
Tables 3.2 and 3.3 describe the basic taxpayer characteristics used in our analysis. The summary statistics for the other nonbinary variables are in Table 3.3.
summary(taxprep[, c("DEPEND", "LNTPI", "MR", "LNTAX")]) #summary does not provid standard deviation
## DEPEND LNTPI MR LNTAX
## Min. :0.000 Min. :-0.1275 Min. : 0.00 Min. : 0.000
## 1st Qu.:1.000 1st Qu.: 9.4467 1st Qu.:15.00 1st Qu.: 6.645
## Median :2.000 Median :10.0506 Median :22.00 Median : 7.701
## Mean :2.419 Mean : 9.8886 Mean :23.52 Mean : 6.880
## 3rd Qu.:3.000 3rd Qu.:10.5320 3rd Qu.:33.00 3rd Qu.: 8.420
## Max. :6.000 Max. :13.2220 Max. :50.00 Max. :11.860
Standard deviation of some variables.
#Standard Deviation
var<-var(taxprep[, c("DEPEND", "LNTPI", "MR", "LNTAX")])
sqrt(diag(var))
## DEPEND LNTPI MR LNTAX
## 1.337562 1.164625 11.453800 2.694961
3.2.3 TABLE 3.4 - Averages by level of binary explanatory variable
To explore the relationship between each indicator variable and logarithmic tax, Table 3.4 presents the average logarithmic tax liability by level of indicator variable. This table shows that married filers pay greater tax, head-of-household filers pay less tax, taxpayers 65 or over pay less, taxpayers with self-employed income pay less, and taxpayers who use a professional tax preparer pay more.
library(Hmisc)
summarize(taxprep$LNTAX, taxprep$MS, mean)
## taxprep$MS taxprep$LNTAX
## 1 0 5.973412
## 2 1 7.429948
summarize(taxprep$LNTAX, taxprep$HH, mean)
## taxprep$HH taxprep$LNTAX
## 1 0 7.013197
## 2 1 5.479947
summarize(taxprep$LNTAX, taxprep$AGE, mean)
## taxprep$AGE taxprep$LNTAX
## 1 0 6.939184
## 2 1 6.430867
summarize(taxprep$LNTAX, taxprep$EMP, mean)
## taxprep$EMP taxprep$LNTAX
## 1 0 6.982682
## 2 1 6.296879
summarize(taxprep$LNTAX, taxprep$PREP, mean)
## taxprep$PREP taxprep$LNTAX
## 1 0 6.623648
## 2 1 7.158049
# TABLE counts of BINARY EXPLANATORY VARIABLE
# CREATE CATEGORICAL VARIABLE
taxprep$MSF=taxprep$MS
taxprep$HHF=taxprep$HH
taxprep$AGEF=taxprep$AGE
taxprep$EMPF=taxprep$EMP
taxprep$PREPF=taxprep$PREP
table(taxprep$MSF)
##
## 0 1
## 487 803
table(taxprep$HHF)
##
## 0 1
## 1178 112
table(taxprep$AGEF)
##
## 0 1
## 1140 150
table(taxprep$EMPF)
##
## 0 1
## 1097 193
table(taxprep$PREPF)
##
## 0 1
## 671 619
3.2.4 TABLE 3.5 - Correlation for continous variables
Table 3.5 summarizes basic relations among logarithmic tax and the other nonbinary explanatory variables. Both LNTPI
and MR
are strongly correlated with logarithmic tax whereas the relationship between DEPEND
and logarithmic tax is positive, yet weaker. Table 3.5 also shows that LNTPI
and MR
are strongly positively correlated.
cor(taxprep[,c("LNTAX", "DEPEND", "LNTPI", "MR")])
## LNTAX DEPEND LNTPI MR
## LNTAX 1.00000000 0.08519899 0.7176476 0.7466574
## DEPEND 0.08519899 1.00000000 0.2777381 0.1275044
## LNTPI 0.71764760 0.27773808 1.0000000 0.7958007
## MR 0.74665744 0.12750438 0.7958007 1.0000000
3.2.5 FIGURE 3.2: Basic added variable plot (y vs. x)
Moreover, both the mean and median marginal tax rates (MR
) are decreasing, although mean and median tax liabilities (LNTAX
) are stable (see Figure 3.2). These results are consistent with congressional efforts to reduce rates and expand the tax base through broadening the definition of income and eliminating deductions.
#CREATE CATEGORICAL VARIABLE
taxprep$SUBJECT1=factor(taxprep$SUBJECT)
lntax.lm = lm(LNTAX ~ SUBJECT1, data=taxprep)
lntpi.lm = lm(LNTPI ~ SUBJECT1, data=taxprep)
taxprep$Resid1=residuals(lntax.lm)
taxprep$Resid2=residuals(lntpi.lm)
plot(Resid1 ~ Resid2, data=taxprep, xaxt="n", yaxt="n", ylab="", xlab="")
axis(2, at=seq(-8, 7, by=2), las=1, font=10, cex=0.005, tck=0.01)
axis(2, at=seq(-8, 8, by=0.2), lab=F, tck=0.005)
axis(1, at=seq(-8,4, by=2), font=10, cex=0.005, tck=0.01)
axis(1, at=seq(-8, 4, by=0.2), lab=F, tck=0.005)
mtext("Residuals from LNTAX", side=2, line=-7, at=7.5, font=10, cex=1, las=1)
mtext("Residuals from LNTPI", side=1, line=3, at=-2, font=10, cex=1)
3.2.6 Five ways of doing the pooling model
#1
summary(lm(LNTAX~MS+HH+AGE+EMP+PREP+LNTPI+DEPEND+MR, data=taxprep))
##
## Call:
## lm(formula = LNTAX ~ MS + HH + AGE + EMP + PREP + LNTPI + DEPEND +
## MR, data = taxprep)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.6371 -0.4145 0.3875 0.9473 10.2893
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.4303158 0.5647315 -6.074 1.64e-09 ***
## MS 0.0417333 0.1546948 0.270 0.787374
## HH -0.6969164 0.1924733 -3.621 0.000305 ***
## AGE 0.0121770 0.1562682 0.078 0.937901
## EMP -0.6216132 0.1351459 -4.600 4.65e-06 ***
## PREP -0.0004085 0.0976086 -0.004 0.996661
## LNTPI 0.8332396 0.0715145 11.651 < 2e-16 ***
## DEPEND -0.1395622 0.0496814 -2.809 0.005043 **
## MR 0.1077579 0.0069511 15.502 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.676 on 1281 degrees of freedom
## Multiple R-squared: 0.6155, Adjusted R-squared: 0.6131
## F-statistic: 256.3 on 8 and 1281 DF, p-value: < 2.2e-16
#2
summary(glm(LNTAX~MS+HH+AGE+EMP+PREP+LNTPI+DEPEND+MR,data=taxprep,
family=gaussian(link=identity)) )
##
## Call:
## glm(formula = LNTAX ~ MS + HH + AGE + EMP + PREP + LNTPI + DEPEND +
## MR, family = gaussian(link = identity), data = taxprep)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -9.6371 -0.4145 0.3875 0.9473 10.2893
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.4303158 0.5647315 -6.074 1.64e-09 ***
## MS 0.0417333 0.1546948 0.270 0.787374
## HH -0.6969164 0.1924733 -3.621 0.000305 ***
## AGE 0.0121770 0.1562682 0.078 0.937901
## EMP -0.6216132 0.1351459 -4.600 4.65e-06 ***
## PREP -0.0004085 0.0976086 -0.004 0.996661
## LNTPI 0.8332396 0.0715145 11.651 < 2e-16 ***
## DEPEND -0.1395622 0.0496814 -2.809 0.005043 **
## MR 0.1077579 0.0069511 15.502 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 2.810093)
##
## Null deviance: 9361.8 on 1289 degrees of freedom
## Residual deviance: 3599.7 on 1281 degrees of freedom
## AIC: 5004.7
##
## Number of Fisher Scoring iterations: 2
#3
library(plm)
summary(plm(LNTAX~MS+HH+AGE+EMP+PREP+LNTPI+DEPEND+MR, data=taxprep,
index=c("SUBJECT","TIME"),model="pooling"))
## Pooling Model
##
## Call:
## plm(formula = LNTAX ~ MS + HH + AGE + EMP + PREP + LNTPI + DEPEND +
## MR, data = taxprep, model = "pooling", index = c("SUBJECT",
## "TIME"))
##
## Balanced Panel: n = 258, T = 5, N = 1290
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -9.63705 -0.41454 0.38751 0.94730 10.28935
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## (Intercept) -3.43031582 0.56473154 -6.0742 1.640e-09 ***
## MS 0.04173327 0.15469480 0.2698 0.7873744
## HH -0.69691637 0.19247327 -3.6208 0.0003051 ***
## AGE 0.01217701 0.15626823 0.0779 0.9379009
## EMP -0.62161319 0.13514593 -4.5996 4.653e-06 ***
## PREP -0.00040854 0.09760862 -0.0042 0.9966612
## LNTPI 0.83323956 0.07151445 11.6513 < 2.2e-16 ***
## DEPEND -0.13956224 0.04968138 -2.8091 0.0050428 **
## MR 0.10775788 0.00695106 15.5024 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 9361.8
## Residual Sum of Squares: 3599.7
## R-Squared: 0.61549
## Adj. R-Squared: 0.61308
## F-statistic: 256.31 on 8 and 1281 DF, p-value: < 2.22e-16
#4
library(nlme)
summary(gls(LNTAX~MS+HH+AGE+EMP+PREP+LNTPI+DEPEND+MR,data=taxprep))
## Generalized least squares fit by REML
## Model: LNTAX ~ MS + HH + AGE + EMP + PREP + LNTPI + DEPEND + MR
## Data: taxprep
## AIC BIC logLik
## 5037.135 5088.688 -2508.567
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) -3.430316 0.5647315 -6.074242 0.0000
## MS 0.041733 0.1546948 0.269778 0.7874
## HH -0.696916 0.1924733 -3.620848 0.0003
## AGE 0.012177 0.1562682 0.077924 0.9379
## EMP -0.621613 0.1351459 -4.599570 0.0000
## PREP -0.000409 0.0976086 -0.004185 0.9967
## LNTPI 0.833240 0.0715145 11.651345 0.0000
## DEPEND -0.139562 0.0496814 -2.809146 0.0050
## MR 0.107758 0.0069511 15.502370 0.0000
##
## Correlation:
## (Intr) MS HH AGE EMP PREP LNTPI DEPEND
## MS 0.225
## HH 0.061 0.485
## AGE -0.011 -0.218 -0.045
## EMP -0.119 -0.093 0.025 -0.038
## PREP -0.024 -0.024 0.013 -0.158 -0.141
## LNTPI -0.969 -0.208 -0.100 -0.062 0.107 -0.008
## DEPEND -0.015 -0.644 -0.316 0.292 -0.037 -0.064 -0.100
## MR 0.654 -0.002 0.066 0.162 -0.049 -0.076 -0.778 0.151
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -5.7488891 -0.2472916 0.2311642 0.5651018 6.1380086
##
## Residual standard error: 1.676333
## Degrees of freedom: 1290 total; 1281 residual
#install.packages("gee")
#5
library(gee)
summary(gee(LNTAX~MS+HH+AGE+EMP+PREP+LNTPI+DEPEND+MR, id=SUBJECT, data=taxprep,
family=gaussian(link=identity), corstr="independence") )
## Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
## running glm to get initial regression estimate
## (Intercept) MS HH AGE EMP
## -3.4303158248 0.0417332677 -0.6969163659 0.0121770081 -0.6216131914
## PREP LNTPI DEPEND MR
## -0.0004085366 0.8332395616 -0.1395622388 0.1077578820
##
## GEE: GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
## gee S-function, version 4.13 modified 98/01/27 (1998)
##
## Model:
## Link: Identity
## Variance to Mean Relation: Gaussian
## Correlation Structure: Independent
##
## Call:
## gee(formula = LNTAX ~ MS + HH + AGE + EMP + PREP + LNTPI + DEPEND +
## MR, id = SUBJECT, data = taxprep, family = gaussian(link = identity),
## corstr = "independence")
##
## Summary of Residuals:
## Min 1Q Median 3Q Max
## -9.6370540 -0.4145432 0.3875082 0.9472989 10.2893480
##
##
## Coefficients:
## Estimate Naive S.E. Naive z Robust S.E.
## (Intercept) -3.4303158248 0.564731536 -6.074241663 1.54222163
## MS 0.0417332677 0.154694799 0.269778091 0.14854310
## HH -0.6969163659 0.192473270 -3.620847529 0.23465427
## AGE 0.0121770081 0.156268232 0.077923759 0.18216326
## EMP -0.6216131914 0.135145927 -4.599570312 0.17738436
## PREP -0.0004085366 0.097608623 -0.004185456 0.09723974
## LNTPI 0.8332395616 0.071514453 11.651345042 0.19444654
## DEPEND -0.1395622388 0.049681383 -2.809145616 0.05180461
## MR 0.1077578820 0.006951058 15.502370457 0.01417988
## Robust z
## (Intercept) -2.224269044
## MS 0.280950564
## HH -2.969970915
## AGE 0.066846673
## EMP -3.504329265
## PREP -0.004201333
## LNTPI 4.285185792
## DEPEND -2.694012121
## MR 7.599350059
##
## Estimated Scale Parameter: 2.810093
## Number of Iterations: 1
##
## Working Correlation
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1 0 0 0 0
## [2,] 0 0 0 0 0
## [3,] 0 0 0 0 0
## [4,] 0 0 0 0 0
## [5,] 0 0 0 0 0
3.2.7 DISPLAY 3.1 - Error components model
The estimated model appears in Display 3.1, from a fit using the statistical package SAS. Display 3.1 shows that HH, EMP, LNTPI, and MR are statistically significant variables that affect LNTAX. Somewhat surprisingly, the PREP variable was not statistically significant.
random<-lme(LNTAX~MS+HH+AGE+EMP+PREP+LNTPI+DEPEND+MR, data=taxprep, random=~1|SUBJECT, method="ML")
## NOTE* THE DEFAULT METHOD IN lme IS "REML"
summary(random)
## Linear mixed-effects model fit by maximum likelihood
## Data: taxprep
## AIC BIC logLik
## 4813.255 4870.041 -2395.627
##
## Random effects:
## Formula: ~1 | SUBJECT
## (Intercept) Residual
## StdDev: 0.9602161 1.368896
##
## Fixed effects: LNTAX ~ MS + HH + AGE + EMP + PREP + LNTPI + DEPEND + MR
## Value Std.Error DF t-value p-value
## (Intercept) -2.9603371 0.5705536 1024 -5.188534 0.0000
## MS 0.0373000 0.1824839 1024 0.204402 0.8381
## HH -0.6889876 0.2320057 1024 -2.969702 0.0031
## AGE 0.0207431 0.2000035 1024 0.103713 0.9174
## EMP -0.5048035 0.1679848 1024 -3.005054 0.0027
## PREP -0.0217036 0.1175229 1024 -0.184675 0.8535
## LNTPI 0.7604058 0.0699692 1024 10.867728 0.0000
## DEPEND -0.1127475 0.0592818 1024 -1.901891 0.0575
## MR 0.1153752 0.0073142 1024 15.774213 0.0000
## Correlation:
## (Intr) MS HH AGE EMP PREP LNTPI DEPEND
## MS 0.176
## HH 0.030 0.419
## AGE -0.043 -0.167 -0.023
## EMP -0.116 -0.069 0.024 -0.030
## PREP -0.035 -0.045 0.004 -0.115 -0.112
## LNTPI -0.948 -0.180 -0.081 -0.043 0.099 -0.016
## DEPEND -0.074 -0.604 -0.269 0.224 -0.038 -0.039 -0.068
## MR 0.522 -0.020 0.055 0.149 -0.041 -0.051 -0.698 0.102
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -5.83483692 -0.21263981 0.09677632 0.39814646 5.79731648
##
## Number of Observations: 1290
## Number of Groups: 258
3.3 Section 3.3 - Random coefficients model
If we include all random coefficients of eight covariates (MS+HH+AGE+EMP+PREP+LNTPI+DEPEND+MR
), then the number of cofficients exceeds the number of observations and it leads to warnings. If we only include a few of covariates, then lmer()
can be used to estimate the random coefficients model. Here is an example of including (1+MS+HH|SUBJECT
) as the random part.
#randomcoeff<-lme(LNTAX~MS+HH+AGE+EMP+PREP+LNTPI+DEPEND+MR, data=taxprep, random=~1+MS+HH+AGE+EMP+PREP+LNTPI+DEPEND+MR|SUBJECT, method="ML")
# NOTE*:It takes forever to run the estimation, in the end a warning messaged was given.
# No estimation result was produced.
# The reason is due to the fact that in SAS, the method of mivque0 allows estimation for this model, in R this method is not readily available to be coded.
library(lme4)
#randomcoeff<-lmer(LNTAX ~ MS+HH+AGE+EMP+PREP+LNTPI+DEPEND+MR+(1+MS+HH+AGE+EMP+PREP+LNTPI+DEPEND+MR|SUBJECT),data=taxprep)
randomcoeff<-lmer(LNTAX ~ MS+HH+AGE+EMP+PREP+LNTPI+DEPEND+MR+(1+MS+HH|SUBJECT),data=taxprep)
## boundary (singular) fit: see ?isSingular
summary(randomcoeff)
## Linear mixed model fit by REML ['lmerMod']
## Formula: LNTAX ~ MS + HH + AGE + EMP + PREP + LNTPI + DEPEND + MR + (1 +
## MS + HH | SUBJECT)
## Data: taxprep
##
## REML criterion at convergence: 4796
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -5.9447 -0.2058 0.0971 0.3869 5.7321
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## SUBJECT (Intercept) 0.94837 0.9738
## MS 0.01595 0.1263 -1.00
## HH 1.89191 1.3755 0.13 -0.13
## Residual 1.84028 1.3566
## Number of obs: 1290, groups: SUBJECT, 258
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) -2.751324 0.564033 -4.878
## MS 0.092114 0.178701 0.515
## HH -0.615376 0.340976 -1.805
## AGE -0.008254 0.192330 -0.043
## EMP -0.545085 0.161306 -3.379
## PREP -0.056847 0.114553 -0.496
## LNTPI 0.747633 0.069049 10.828
## DEPEND -0.122449 0.056746 -2.158
## MR 0.112551 0.007152 15.737
##
## Correlation of Fixed Effects:
## (Intr) MS HH AGE EMP PREP LNTPI DEPEND
## MS 0.169
## HH 0.023 0.278
## AGE -0.043 -0.170 -0.016
## EMP -0.119 -0.073 0.009 -0.032
## PREP -0.037 -0.047 0.002 -0.111 -0.111
## LNTPI -0.947 -0.185 -0.057 -0.044 0.101 -0.015
## DEPEND -0.072 -0.595 -0.168 0.235 -0.035 -0.036 -0.067
## MR 0.525 -0.018 0.038 0.154 -0.039 -0.051 -0.700 0.107
## convergence code: 0
## boundary (singular) fit: see ?isSingular