vignettes/tests/test-linreg-estimation-linreg.Rmd
test-linreg-estimation-linreg.Rmd
# The Linear Regression Model {#linreg-estimation-linreg-example}
In this example, we are interested in predictors of wages. The regressor variables are gender, race, union membership, education, and work experience. The regressand variable is hourly wage in US dollars.
See jeksterslabRdatarepo::wages.matrix()
for the data set used in this example.
X <- jeksterslabRdatarepo::wages.matrix[["X"]]
# age is removed
X <- X[, -ncol(X)]
y <- jeksterslabRdatarepo::wages.matrix[["y"]]
head(X)
#> constant gender race union education experience
#> [1,] 1 1 0 0 12 20
#> [2,] 1 0 0 0 9 9
#> [3,] 1 0 0 0 16 15
#> [4,] 1 0 1 1 14 38
#> [5,] 1 1 1 0 16 19
#> [6,] 1 1 0 0 12 4
head(y)
#> wages
#> [1,] 11.55
#> [2,] 5.00
#> [3,] 12.00
#> [4,] 7.00
#> [5,] 21.15
#> [6,] 6.92
jeksterslabRlinreg::linreg()
The jeksterslabRlinreg::linreg()
function fits a linear regression model using X
and y
. In this example, X
consists of a column of constants, gender
, race
, union
membership, education
, and work experience
. and y
consists of hourly wages
in US dollars.
The output includes the following:
linreg(
X = X,
y = y
)
#>
#> Model Assessment:
#> Value
#> RSS 54342.54
#> MSE 42.16
#> RMSE 6.49
#> R-squared 0.32
#> Adj. R-squared 0.32
#>
#> ANOVA Table:
#> df SS MS F p
#> Model 5 25967.28 5193.45611 122.6149 3.453144e-106
#> Error 1283 54342.54 42.35584 NA NA
#> Total 1288 80309.82 NA NA NA
#>
#> Coefficients:
#> coef se t p
#> Intercept -7.1833382 1.01578786 -7.071691 2.508276e-12
#> gender -3.0748755 0.36461621 -8.433184 8.939416e-17
#> race -1.5653133 0.50918754 -3.074139 2.155664e-03
#> union 1.0959758 0.50607809 2.165626 3.052356e-02
#> education 1.3703010 0.06590421 20.792312 5.507605e-83
#> experience 0.1666065 0.01604756 10.382050 2.659960e-24
#>
#> Standardized Coefficients:
#> Yuan and Chan 2011 standard errors are used.
#> coef se t p
#> gender -0.19477502 0.02282716 -8.532598 3.979462e-17
#> race -0.07135673 0.02317122 -3.079541 2.117236e-03
#> union 0.05077872 0.02342286 2.167913 3.034867e-02
#> education 0.48829962 0.02113537 23.103429 5.007598e-99
#> experience 0.24607631 0.02330714 10.557981 4.800438e-25
#>
#> Confidence Intervals - Regression Coefficients:
#> ci_0.05 ci_0.5 ci_2.5 ci_97.5 ci_99.5 ci_99.95
#> Intercept -10.5335348 -9.8037324 -9.1761258 -5.1905507 -4.5629441 -3.8331417
#> gender -4.2774257 -4.0154638 -3.7901849 -2.3595660 -2.1342872 -1.8723252
#> race -3.2446781 -2.8788475 -2.5642449 -0.5663817 -0.2517792 0.1140514
#> union -0.5731336 -0.2095371 0.1031443 2.0888072 2.4014886 2.7650852
#> education 1.1529406 1.2002901 1.2410091 1.4995928 1.5403119 1.5876614
#> experience 0.1136797 0.1252092 0.1351242 0.1980889 0.2080039 0.2195334
#>
#> Confidence Intervals - Standardized Slopes:
#> ci_0.05 ci_0.5 ci_2.5 ci_97.5 ci_99.5
#> gender -0.27006189 -0.253661495 -0.239557685 -0.14999235 -0.1358885
#> race -0.14777833 -0.131130752 -0.116814367 -0.02589909 -0.0115827
#> union -0.02647282 -0.009644448 0.004827412 0.09673002 0.1112019
#> education 0.41859249 0.433777402 0.446835936 0.52976331 0.5428218
#> experience 0.16920643 0.185951662 0.200352024 0.29180059 0.3062010
#> ci_99.95
#> gender -0.11948815
#> race 0.00506488
#> union 0.12803026
#> education 0.55800676
#> experience 0.32294619
#>
#> Means and Standard Deviations:
#> Mean SD
#> wages 12.3658495 7.8963503
#> gender 0.4972847 0.5001867
#> race 0.1528317 0.3599648
#> union 0.1590380 0.3658535
#> education 13.1450737 2.8138234
#> experience 18.7897595 11.6628366
linreg(
X = X,
y = y,
sehatbetahattype = "biased"
)
#>
#> Model Assessment:
#> Value
#> RSS 54342.54
#> MSE 42.16
#> RMSE 6.49
#> R-squared 0.32
#> Adj. R-squared 0.32
#>
#> ANOVA Table:
#> df SS MS F p
#> Model 5 25967.28 5193.45611 122.6149 3.453144e-106
#> Error 1283 54342.54 42.35584 NA NA
#> Total 1288 80309.82 NA NA NA
#>
#> Coefficients:
#> Biased standard errors are used.
#> coef se t p
#> Intercept -7.1833382 1.01342097 -7.088208 2.236468e-12
#> gender -3.0748755 0.36376661 -8.452880 7.620029e-17
#> race -1.5653133 0.50800108 -3.081319 2.104728e-03
#> union 1.0959758 0.50489888 2.170684 3.013798e-02
#> education 1.3703010 0.06575065 20.840873 2.582448e-83
#> experience 0.1666065 0.01601016 10.406297 2.103790e-24
#>
#> Standardized Coefficients:
#> Yuan and Chan 2011 standard errors are used.
#> coef se t p
#> gender -0.19477502 0.02282716 -8.532598 3.979462e-17
#> race -0.07135673 0.02317122 -3.079541 2.117236e-03
#> union 0.05077872 0.02342286 2.167913 3.034867e-02
#> education 0.48829962 0.02113537 23.103429 5.007598e-99
#> experience 0.24607631 0.02330714 10.557981 4.800438e-25
#>
#> Confidence Intervals - Regression Coefficients:
#> ci_0.05 ci_0.5 ci_2.5 ci_97.5 ci_99.5 ci_99.95
#> Intercept -10.5257285 -9.7976266 -9.1714824 -5.1951941 -4.5690499 -3.8409480
#> gender -4.2746237 -4.0132721 -3.7885182 -2.3612328 -2.1364788 -1.8751273
#> race -3.2407650 -2.8757868 -2.5619173 -0.5687093 -0.2548398 0.1101384
#> union -0.5692445 -0.2064951 0.1054577 2.0864938 2.3984466 2.7611960
#> education 1.1534470 1.2006862 1.2413104 1.4992916 1.5399157 1.5871549
#> experience 0.1138030 0.1253056 0.1351976 0.1980155 0.2079074 0.2194101
#>
#> Confidence Intervals - Standardized Slopes:
#> ci_0.05 ci_0.5 ci_2.5 ci_97.5 ci_99.5
#> gender -0.27006189 -0.253661495 -0.239557685 -0.14999235 -0.1358885
#> race -0.14777833 -0.131130752 -0.116814367 -0.02589909 -0.0115827
#> union -0.02647282 -0.009644448 0.004827412 0.09673002 0.1112019
#> education 0.41859249 0.433777402 0.446835936 0.52976331 0.5428218
#> experience 0.16920643 0.185951662 0.200352024 0.29180059 0.3062010
#> ci_99.95
#> gender -0.11948815
#> race 0.00506488
#> union 0.12803026
#> education 0.55800676
#> experience 0.32294619
#>
#> Means and Standard Deviations:
#> Mean SD
#> wages 12.3658495 7.8963503
#> gender 0.4972847 0.5001867
#> race 0.1528317 0.3599648
#> union 0.1590380 0.3658535
#> education 13.1450737 2.8138234
#> experience 18.7897595 11.6628366
lm()
functionThe lm()
function is the default option for fitting a linear model in R
.
lmobj <- lm(
wages ~ gender + race + union + education + experience,
data = jeksterslabRdatarepo::wages
)
summary(lmobj)
#>
#> Call:
#> lm(formula = wages ~ gender + race + union + education + experience,
#> data = jeksterslabRdatarepo::wages)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -20.781 -3.760 -1.044 2.418 50.414
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -7.18334 1.01579 -7.072 2.51e-12 ***
#> gender -3.07488 0.36462 -8.433 < 2e-16 ***
#> race -1.56531 0.50919 -3.074 0.00216 **
#> union 1.09598 0.50608 2.166 0.03052 *
#> education 1.37030 0.06590 20.792 < 2e-16 ***
#> experience 0.16661 0.01605 10.382 < 2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 6.508 on 1283 degrees of freedom
#> Multiple R-squared: 0.3233, Adjusted R-squared: 0.3207
#> F-statistic: 122.6 on 5 and 1283 DF, p-value: < 2.2e-16
#
## `lavaan::sem()` function
#
# Linear regression in SEM
#
#
# model <- c(
# wages ~ gender + race + union + education + experience
# )
#
# Build errors with lavaan dependency
#
### Wishart Likelihood (Unbiased)
#
#
# lavobj <- lavaan::sem(
# model = model,
# data = jeksterslabRdatarepo::wages,
# likelihood = "wishart"
# )
# summary(lavobj)
#
### Normal Likelihood (Biased)
#
#
# lavobj <- lavaan::sem(
# model = model,
# data = jeksterslabRdatarepo::wages,
# likelihood = "normal"
# )
# summary(lavobj)