'data.frame':   17384 obs. of  21 variables:
 $ id           : num  7.13e+09 6.41e+09 5.63e+09 2.49e+09 1.95e+09 ...
 $ date         : Factor w/ 368 levels "20140502T000000",..: 164 219 287 219 280 11 57 250 336 302 ...
 $ price        : num  221900 538000 180000 604000 510000 ...
 $ bedrooms     : int  3 3 2 4 3 4 3 3 3 3 ...
 $ bathrooms    : num  1 2.25 1 3 2 4.5 2.25 1.5 1 2.5 ...
 $ sqft_living  : int  1180 2570 770 1960 1680 5420 1715 1060 1780 1890 ...
 $ sqft_lot     : int  5650 7242 10000 5000 8080 101930 6819 9711 7470 6560 ...
 $ floors       : num  1 2 1 1 1 1 2 1 1 2 ...
 $ waterfront   : int  0 0 0 0 0 0 0 0 0 0 ...
 $ view         : int  0 0 0 0 0 0 0 0 0 0 ...
 $ condition    : int  3 3 3 5 3 3 3 3 3 3 ...
 $ grade        : int  7 7 6 7 8 11 7 7 7 7 ...
 $ sqft_above   : int  1180 2170 770 1050 1680 3890 1715 1060 1050 1890 ...
 $ sqft_basement: int  0 400 0 910 0 1530 0 0 730 0 ...
 $ yr_built     : int  1955 1951 1933 1965 1987 2001 1995 1963 1960 2003 ...
 $ yr_renovated : int  0 1991 0 0 0 0 0 0 0 0 ...
 $ zipcode      : Factor w/ 70 levels "98001","98002",..: 67 56 17 59 38 30 3 69 61 24 ...
 $ lat          : num  47.5 47.7 47.7 47.5 47.6 ...
 $ long         : num  -122 -122 -122 -122 -122 ...
 $ sqft_living15: int  1340 1690 2720 1360 1800 4760 2238 1650 1780 2390 ...
 $ sqft_lot15   : int  5650 7639 8062 5000 7503 101930 6819 9711 8113 7570 ...

Variable descriptions were obtained from the King County Department of Assessments.

Exploring and Wrangling

Pardoe's article, Modeling Home Prices Using Realtor Data, offers a thoughtful approach to predicting home prices from relevant characteristics of the home. Since this project shares that objective, I follow Pardoe's steps closely in many places.

Variables to explore:

  • sqft_living, sqft_above & sqft_basement
  • sqft_lot & sqft_lot15
  • yr_built

To assess whether a variable transformation is worthwhile, I'll refer to the correlation coefficient r, which reports the strength of the linear relationship between two variables. For each transformation, I'll compare the r value between price and the original variable with the r value between price and the transformed variable.
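The before-and-after comparison can be sketched as below. The data frame `toy`, the helper `r_with_price()`, and the log transformation are all illustrative assumptions and are not taken from the report; the same helper would work for any transformed column.

```r
# Toy data standing in for housedata (hypothetical values, not from the report)
toy <- data.frame(
  price    = c(200, 250, 310, 400, 520, 610),
  sqft_lot = c(2500, 4800, 6900, 9500, 14000, 50000)
)

# Hypothetical helper: Pearson r between price and one named column
r_with_price <- function(df, var) cor(df$price, df[[var]])

# Illustrative transformation (log), chosen only to have something to compare
toy$log_lot <- log(toy$sqft_lot)

r_before <- r_with_price(toy, "sqft_lot")
r_after  <- r_with_price(toy, "log_lot")

# A transformation is kept only if |r| increases
c(before = r_before, after = r_after)
```
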

sqft_living, sqft_above, sqft_basement

There seems to be a lot of redundant information stored between the sqft_living, sqft_above and sqft_basement columns. If possible, I’d like to consolidate these variables and eliminate any superfluous information.
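One way to make the redundancy concrete is to check whether sqft_living is simply the sum of the other two columns. The sketch below uses only the first five rows shown in the str() output above; whether the identity holds for every row is an assumption that would need to be checked on the full housedata.

```r
# First five rows of the three columns, copied from the str() output
first_rows <- data.frame(
  sqft_living   = c(1180, 2570, 770, 1960, 1680),
  sqft_above    = c(1180, 2170, 770, 1050, 1680),
  sqft_basement = c(0, 400, 0, 910, 0)
)

# TRUE if living area equals above-ground area plus basement area in every row
is_sum <- all(first_rows$sqft_living ==
                first_rows$sqft_above + first_rows$sqft_basement)
is_sum
```

If the identity holds throughout, a linear model that includes all three columns cannot estimate a separate coefficient for each of them.
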

  • Create a simple basement column that returns a 1 if sqft_basement > 0 and compare to sqft_basement
housedata$sqft_basement <- as.numeric(housedata$sqft_basement)
housedata$basement <- ifelse(housedata$sqft_basement > 0, 1, 0)

Replacing sqft_basement with the binary basement variable lowered the r value with price from 0.3312296 to 0.1832654. The transformation was therefore ineffective, and I remove the basement column.

sqft_lot & sqft_lot15

In the Modeling Homes article, Pardoe explains the common practice of realtors using “lot size ‘categories’” when pricing homes instead of the raw ft\(^2\) value. If this is truly common practice, then recoding the data from raw ft\(^2\) to lot size categories should improve the relationship between price and land size.

  • Create lotsize and lotsize15 from sqft_lot and sqft_lot15
library(dplyr) # provides %>% and mutate()

# Category boundaries in square feet, shared by both variables
lot_breaks <- c(-Inf, 3000, 5000, 7000, 10000, 15000, 20000,
                43560, 130680, 217800, 435600, Inf)

# Lotsize: numeric category codes 1-11
housedata <- housedata %>% 
  mutate(lotsize = as.numeric(cut(sqft_lot, breaks = lot_breaks, labels = 1:11)))

# Lotsize15
housedata <- housedata %>% 
  mutate(lotsize15 = as.numeric(cut(sqft_lot15, breaks = lot_breaks, labels = 1:11)))

Recoding to categories increased the \(r\) value with price from 0.0882381 (sqft_lot) to 0.1662823 (lotsize) and from 0.0808064 (sqft_lot15) to 0.1527959 (lotsize15). I therefore remove sqft_lot and sqft_lot15 to reduce redundancy.

yr_built & age

mean(housedata$yr_built)
[1] 1971.153
housedata <- housedata %>% 
  mutate(age = (1971 - as.numeric(yr_built))/10) # Decades since 1971 (mean yr_built)

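Here, I created an age variable that measures, in decades, how long before 1971 (the mean yr_built) the home was built; homes built after 1971 get negative values. Because age is a linear rescaling of yr_built, the r value only changes sign, from 0.0525221 to -0.0525221, and its magnitude is unchanged. The transformation therefore does not strengthen the relationship with price, but it centers the variable and expresses it in decades, which makes its regression coefficient easier to interpret. Remove yr_built.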
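The sign flip in r follows from age being a decreasing linear function of yr_built. A minimal check, using only the first five price and yr_built values from the str() output above (the vector names are illustrative):

```r
# First five rows from the str() output
price    <- c(221900, 538000, 180000, 604000, 510000)
yr_built <- c(1955, 1951, 1933, 1965, 1987)

# Same transformation as in the report
age <- (1971 - yr_built) / 10

r_year <- cor(price, yr_built)
r_age  <- cor(price, age)

# Equal magnitude, opposite sign
all.equal(r_age, -r_year)
```
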

Final dataframe

Verify updated dataframe

Analysis of Variance Table

Model 1: price ~ bedrooms + bathrooms + sqft_living + sqft_lot + floors + 
    waterfront + view + condition + grade + sqft_above + sqft_basement + 
    yr_built + yr_renovated + zipcode + lat + long + sqft_living15 + 
    sqft_lot15
Model 2: price ~ bedrooms + bathrooms + sqft_living + floors + waterfront + 
    view + condition + grade + sqft_above + sqft_basement + yr_renovated + 
    zipcode + lat + long + sqft_living15 + lotsize + lotsize15 + 
    age
  Res.Df        RSS Df  Sum of Sq      F    Pr(>F)    
1  17298 4.5804e+14                                   
2  17292 4.5376e+14  6 4.2815e+12 27.193 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

After running anova() on full models fit to the previous and the updated dataframes, the updated version has the lower \(RSS\) (4.5376e+14 versus 4.5804e+14), which indicates a better in-sample fit. Because the two models are not nested, the F-test is best read as a rough guide, and prediction accuracy still has to be confirmed on the test data. Overall, the transformations above give the modeling steps that follow a somewhat stronger starting point.
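The two models compared above use different versions of the lot-size and age variables, so neither is a special case of the other. A hedged alternative for such comparisons is to look at RSS alongside an information criterion such as AIC, which does not require nesting. The sketch below uses simulated data; every variable and coefficient in it is an assumption made for illustration, not a value from housedata.

```r
set.seed(1)
n <- 200

# Simulated lot sizes, spread roughly evenly on the log scale
sqft_lot <- round(exp(runif(n, log(2000), log(100000))))

# Simplified lot-size categories (fewer breaks than in the report)
lotsize <- cut(sqft_lot,
               breaks = c(-Inf, 5000, 10000, 20000, 43560, Inf),
               labels = FALSE)

# Simulated price that depends on the category, not on raw square feet
price <- 100000 + 40000 * lotsize + rnorm(n, sd = 30000)

m_raw <- lm(price ~ sqft_lot)  # raw square footage
m_cat <- lm(price ~ lotsize)   # category code

# Residual sums of squares and AIC for the two non-nested models
rss <- c(raw = deviance(m_raw), cat = deviance(m_cat))
rss
AIC(m_raw, m_cat)
```
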

Repeat all steps to form test dataframe.
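To keep the training and test dataframes consistent, the steps above could be wrapped in a single function and applied to both. This is only a sketch: `engineer_features()` and its `ref_year` argument are hypothetical names, and it assumes the test set should reuse the training reference year (1971) rather than recompute its own mean.

```r
# Hypothetical helper bundling the transformations kept above
engineer_features <- function(df, ref_year = 1971) {
  lot_breaks <- c(-Inf, 3000, 5000, 7000, 10000, 15000, 20000,
                  43560, 130680, 217800, 435600, Inf)

  # Lot-size category codes 1-11 (labels = FALSE returns integer codes)
  df$lotsize   <- cut(df$sqft_lot,   breaks = lot_breaks, labels = FALSE)
  df$lotsize15 <- cut(df$sqft_lot15, breaks = lot_breaks, labels = FALSE)

  # Decades built before the reference year
  df$age <- (ref_year - df$yr_built) / 10

  # Drop the columns that the new variables replace
  df[, setdiff(names(df), c("sqft_lot", "sqft_lot15", "yr_built")), drop = FALSE]
}

# Two rows taken from the str() output, used as a stand-in test set
toy_test <- data.frame(
  sqft_lot   = c(5650, 101930),
  sqft_lot15 = c(5650, 101930),
  yr_built   = c(1955, 2001)
)
out <- engineer_features(toy_test)
out
```
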

Subset Selection

Here, I will use a method known as subset selection to generate a model to predict price from other variables, and then improve upon it by performing nonlinear transformations and introducing interaction terms.

Backwards Elimination

Forward Selection

The plots for both methods of subset selection point to the same model: it attains the highest \(R^2\) value and the lowest \(C_p\) and BIC values.

The model selected by both forward selection and backward elimination contains 20 predictors and has an \(R^2_{adj}\) value of 0.6873832. This means that roughly 68.7% of the variability in price can be explained by the model.
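The selection code is not shown in this report. As a stand-in, the sketch below runs forward selection and backward elimination with base R's step() and a BIC penalty (k = log(n)) on simulated data; this is a swapped-in technique for illustration, and the data, variable names, and coefficients are all assumptions.

```r
set.seed(42)
n <- 300

# Simulated predictors: only x1 and x2 actually drive y
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
d$y <- 3 * d$x1 - 2 * d$x2 + rnorm(n)

null_fit <- lm(y ~ 1, data = d)
full_fit <- lm(y ~ x1 + x2 + x3 + x4, data = d)
scope    <- list(lower = ~ 1, upper = ~ x1 + x2 + x3 + x4)

# Backward elimination starts from the full model; forward selection from the null
bwd <- step(full_fit, scope = scope, direction = "backward", k = log(n), trace = 0)
fwd <- step(null_fit, scope = scope, direction = "forward",  k = log(n), trace = 0)

# Compare the terms each method kept and the adjusted R-squared
sort(names(coef(bwd)))
sort(names(coef(fwd)))
summary(bwd)$adj.r.squared
```

When both directions return the same set of terms, as the report describes for the housing data, that agreement is some reassurance that the choice does not depend on the search path.
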

Strengthening the model above

Perform nonlinear transformations

I will begin by referring to the residualPlots() output to determine which predictors would benefit from a nonlinear transformation.

The residualPlots() output indicates that sqft_living, grade, and sqft_basement each have a nonlinear relationship with price. I will account for this by adding a squared term for each of them to the model.
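The effect of adding a squared term can be sketched on simulated data. Everything below is an illustrative assumption (the variable `sqft`, the coefficients, and the noise level); it only shows the mechanics of I(x^2) in a formula and how to compare the fits.

```r
set.seed(7)
n <- 200

# Simulated predictor and a price with genuine curvature in it
sqft  <- runif(n, 500, 5000)
price <- 50000 + 20 * sqft + 0.03 * sqft^2 + rnorm(n, sd = 40000)

fit_lin  <- lm(price ~ sqft)
fit_quad <- lm(price ~ sqft + I(sqft^2))  # I() protects ^ inside a formula

# The quadratic model nests the linear one, so an F-test applies
anova(fit_lin, fit_quad)

adj_r2 <- c(linear    = summary(fit_lin)$adj.r.squared,
            quadratic = summary(fit_quad)$adj.r.squared)
adj_r2
```
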


Call:
lm(formula = price ~ bedrooms + sqft_living + I(waterfront == 
    "Yes") + I(view == "good") + I(view == "very good") + grade + 
    I(zipcode == 98005) + I(zipcode == 98007) + I(zipcode == 
    98034) + I(zipcode == 98040) + I(zipcode == 98042) + I(zipcode == 
    98103) + I(zipcode == 98106) + I(zipcode == 98112) + I(zipcode == 
    98115) + I(zipcode == 98116) + I(zipcode == 98122) + lat + 
    long + sqft_basement + I(sqft_living^2) + I(grade^2) + I(sqft_basement^2), 
    data = housedata)

Residuals:
     Min       1Q   Median       3Q      Max 
-3790778   -89063   -10418    64356  2829843 

Coefficients:
                             Estimate Std. Error t value Pr(>|t|)    
(Intercept)                -4.566e+07  1.500e+06 -30.429  < 2e-16 ***
bedrooms                   -1.442e+03  2.116e+03  -0.682  0.49546    
sqft_living                 9.209e+00  6.544e+00   1.407  0.15938    
I(waterfront == "Yes")TRUE  5.583e+05  2.073e+04  26.925  < 2e-16 ***
I(view == "good")TRUE       1.322e+05  1.001e+04  13.202  < 2e-16 ***
I(view == "very good")TRUE  2.619e+05  1.514e+04  17.297  < 2e-16 ***
grade                      -2.544e+05  1.241e+04 -20.503  < 2e-16 ***
I(zipcode == 98005)TRUE     1.011e+05  1.686e+04   5.999 2.02e-09 ***
I(zipcode == 98007)TRUE     5.661e+04  1.782e+04   3.177  0.00149 ** 
I(zipcode == 98034)TRUE    -6.294e+04  9.403e+03  -6.693 2.25e-11 ***
I(zipcode == 98040)TRUE     3.279e+05  1.359e+04  24.127  < 2e-16 ***
I(zipcode == 98042)TRUE    -2.163e+04  9.617e+03  -2.249  0.02451 *  
I(zipcode == 98103)TRUE     6.249e+04  9.242e+03   6.761 1.41e-11 ***
I(zipcode == 98106)TRUE    -7.189e+04  1.208e+04  -5.949 2.76e-09 ***
I(zipcode == 98112)TRUE     3.553e+05  1.326e+04  26.798  < 2e-16 ***
I(zipcode == 98115)TRUE     7.713e+04  9.120e+03   8.458  < 2e-16 ***
I(zipcode == 98116)TRUE     8.262e+04  1.211e+04   6.823 9.18e-12 ***
I(zipcode == 98122)TRUE     1.081e+05  1.292e+04   8.370  < 2e-16 ***
lat                         6.007e+05  1.149e+04  52.280  < 2e-16 ***
long                       -1.480e+05  1.175e+04 -12.595  < 2e-16 ***
sqft_basement               7.149e+01  8.256e+00   8.659  < 2e-16 ***
I(sqft_living^2)            2.794e-02  1.012e-03  27.607  < 2e-16 ***
I(grade^2)                  2.156e+04  7.748e+02  27.833  < 2e-16 ***
I(sqft_basement^2)         -4.746e-02  5.938e-03  -7.994 1.39e-15 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 191300 on 17360 degrees of freedom
Multiple R-squared:  0.7326,    Adjusted R-squared:  0.7323 
F-statistic:  2068 on 23 and 17360 DF,  p-value: < 2.2e-16

Introduce interaction terms

Here, I add interaction terms to the model to improve its fit. It is reasonable to expect the effect of bedrooms on price to depend on sqft_living, and likewise the effect of sqft_basement to depend on sqft_living.
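A minimal sketch of what an interaction term does, again on simulated data. The variable names mirror the report, but the values and coefficients are assumptions chosen so that the interaction is real.

```r
set.seed(3)
n <- 300

# Simulated data in which the value of a bedroom grows with living area
bedrooms    <- sample(1:6, n, replace = TRUE)
sqft_living <- runif(n, 600, 4000)
price <- 1e5 + 5000 * bedrooms + 100 * sqft_living +
  15 * bedrooms * sqft_living + rnorm(n, sd = 50000)

fit_main <- lm(price ~ bedrooms + sqft_living)
fit_int  <- lm(price ~ bedrooms + sqft_living + bedrooms:sqft_living)

# Nested comparison: does the interaction term reduce RSS significantly?
int_test <- anova(fit_main, fit_int)
int_test
coef(fit_int)["bedrooms:sqft_living"]
```
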


Call:
lm(formula = price ~ bedrooms + sqft_living + I(waterfront == 
    "Yes") + I(view == "good") + I(view == "very good") + grade + 
    I(zipcode == 98005) + I(zipcode == 98007) + I(zipcode == 
    98034) + I(zipcode == 98040) + I(zipcode == 98042) + I(zipcode == 
    98103) + I(zipcode == 98106) + I(zipcode == 98112) + I(zipcode == 
    98115) + I(zipcode == 98116) + I(zipcode == 98122) + lat + 
    long + sqft_basement + I(sqft_living^2) + I(grade^2) + I(sqft_basement^2) + 
    bedrooms:sqft_living + sqft_living:sqft_basement + sqft_basement:I(sqft_living^2) + 
    bedrooms:I(sqft_living^2) + I(sqft_living^2):I(sqft_basement^2), 
    data = housedata)

Residuals:
     Min       1Q   Median       3Q      Max 
-2364730   -88079   -10422    63804  2309151 

Coefficients:
                                      Estimate Std. Error t value Pr(>|t|)    
(Intercept)                         -4.637e+07  1.490e+06 -31.122  < 2e-16 ***
bedrooms                            -3.079e+02  7.330e+03  -0.042  0.96649    
sqft_living                         -5.327e+01  1.918e+01  -2.778  0.00547 ** 
I(waterfront == "Yes")TRUE           5.430e+05  2.057e+04  26.396  < 2e-16 ***
I(view == "good")TRUE                1.247e+05  9.932e+03  12.556  < 2e-16 ***
I(view == "very good")TRUE           2.617e+05  1.500e+04  17.442  < 2e-16 ***
grade                               -2.411e+05  1.326e+04 -18.180  < 2e-16 ***
I(zipcode == 98005)TRUE              1.044e+05  1.668e+04   6.260 3.95e-10 ***
I(zipcode == 98007)TRUE              5.441e+04  1.763e+04   3.087  0.00203 ** 
I(zipcode == 98034)TRUE             -6.555e+04  9.314e+03  -7.037 2.04e-12 ***
I(zipcode == 98040)TRUE              3.313e+05  1.347e+04  24.596  < 2e-16 ***
I(zipcode == 98042)TRUE             -2.453e+04  9.516e+03  -2.578  0.00995 ** 
I(zipcode == 98103)TRUE              6.231e+04  9.149e+03   6.811 1.00e-11 ***
I(zipcode == 98106)TRUE             -7.010e+04  1.196e+04  -5.860 4.72e-09 ***
I(zipcode == 98112)TRUE              3.521e+05  1.315e+04  26.775  < 2e-16 ***
I(zipcode == 98115)TRUE              7.874e+04  9.024e+03   8.725  < 2e-16 ***
I(zipcode == 98116)TRUE              8.517e+04  1.198e+04   7.108 1.22e-12 ***
I(zipcode == 98122)TRUE              1.099e+05  1.279e+04   8.593  < 2e-16 ***
lat                                  6.085e+05  1.140e+04  53.353  < 2e-16 ***
long                                -1.506e+05  1.165e+04 -12.926  < 2e-16 ***
sqft_basement                        6.056e+01  1.911e+01   3.168  0.00154 ** 
I(sqft_living^2)                     4.889e-02  3.715e-03  13.158  < 2e-16 ***
I(grade^2)                           2.064e+04  8.271e+02  24.950  < 2e-16 ***
I(sqft_basement^2)                  -4.868e-02  1.242e-02  -3.921 8.87e-05 ***
bedrooms:sqft_living                 1.626e+01  5.186e+00   3.136  0.00171 ** 
sqft_living:sqft_basement           -3.934e-02  1.514e-02  -2.598  0.00938 ** 
sqft_basement:I(sqft_living^2)       1.807e-05  2.334e-06   7.744 1.01e-14 ***
bedrooms:I(sqft_living^2)           -6.082e-03  8.789e-04  -6.920 4.68e-12 ***
I(sqft_living^2):I(sqft_basement^2) -3.063e-09  3.177e-10  -9.641  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 189200 on 17355 degrees of freedom
Multiple R-squared:  0.7384,    Adjusted R-squared:  0.738 
F-statistic:  1750 on 28 and 17355 DF,  p-value: < 2.2e-16

After performing nonlinear transformations and adding interaction terms, the \(R^2_{adj}\) for the model increases from 0.6873832 to 0.738011. This means that the updated model explains roughly 73.8% of the variability in price. I will consider this the final model produced by the subset selection method.
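As a check on the reported value, the adjusted \(R^2\) can be recovered from the summary output above using the standard formula, with \(n = 17384\) observations, \(p = 28\) predictors, and the rounded \(R^2 = 0.7384\). The symbols \(n\) and \(p\) are introduced here for illustration.

```latex
\[
R^2_{adj} = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
          = 1 - (1 - 0.7384)\,\frac{17384 - 1}{17384 - 28 - 1}
          \approx 0.738
\]
```

With more than 17,000 observations, the penalty for 28 predictors is small, which is why \(R^2\) and \(R^2_{adj}\) are nearly identical here.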

Self-Selection

In the following sections, I begin with a full model that regresses price onto every available predictor. I then strengthen the model by again performing nonlinear transformations and introducing interaction terms.

Full model


Call:
lm(formula = price ~ ., data = housedata)

Residuals:
     Min       1Q   Median       3Q      Max 
-1237318   -70270     -251    62652  4353961 

Coefficients: (1 not defined because of singularities)
                     Estimate Std. Error t value Pr(>|t|)    
(Intercept)        -3.198e+07  6.930e+06  -4.615 3.97e-06 ***
bedrooms           -3.083e+04  1.797e+03 -17.158  < 2e-16 ***
bathrooms           2.497e+04  2.967e+03   8.414  < 2e-16 ***
sqft_living         1.312e+02  4.000e+00  32.797  < 2e-16 ***
floors             -4.098e+04  3.646e+03 -11.239  < 2e-16 ***
waterfrontYes       5.836e+05  1.791e+04  32.585  < 2e-16 ***
viewfair            7.118e+04  1.037e+04   6.862 7.03e-12 ***
viewaverage         6.866e+04  6.174e+03  11.121  < 2e-16 ***
viewgood            1.413e+05  8.707e+03  16.226  < 2e-16 ***
viewvery good       3.165e+05  1.303e+04  24.292  < 2e-16 ***
conditionfair       6.766e+04  3.606e+04   1.876 0.060658 .  
conditionaverage    5.029e+04  3.339e+04   1.506 0.132012    
conditiongood       6.723e+04  3.339e+04   2.013 0.044084 *  
conditionvery good  1.110e+05  3.361e+04   3.304 0.000955 ***
grade               5.957e+04  2.060e+03  28.918  < 2e-16 ***
sqft_above          7.802e+01  4.106e+00  19.000  < 2e-16 ***
sqft_basement              NA         NA      NA       NA    
yr_renovated        1.458e+01  3.347e+00   4.357 1.33e-05 ***
zipcode98002        4.485e+04  1.638e+04   2.738 0.006189 ** 
zipcode98003       -1.868e+04  1.447e+04  -1.291 0.196572    
zipcode98004        7.300e+05  2.646e+04  27.586  < 2e-16 ***
zipcode98005        2.357e+05  2.830e+04   8.329  < 2e-16 ***
zipcode98006        2.314e+05  2.305e+04  10.039  < 2e-16 ***
zipcode98007        1.976e+05  2.900e+04   6.813 9.86e-12 ***
zipcode98008        2.170e+05  2.768e+04   7.840 4.78e-15 ***
zipcode98010        9.536e+04  2.456e+04   3.883 0.000104 ***
zipcode98011        3.974e+04  3.618e+04   1.098 0.272024    
zipcode98014        7.708e+04  3.992e+04   1.931 0.053535 .  
zipcode98019        4.913e+04  3.891e+04   1.262 0.206811    
zipcode98022        6.485e+04  2.164e+04   2.997 0.002734 ** 
zipcode98023       -5.325e+04  1.336e+04  -3.985 6.79e-05 ***
zipcode98024        1.648e+05  3.415e+04   4.827 1.40e-06 ***
zipcode98027        1.611e+05  2.375e+04   6.784 1.21e-11 ***
zipcode98028        2.312e+04  3.509e+04   0.659 0.510003    
zipcode98029        2.108e+05  2.713e+04   7.771 8.23e-15 ***
zipcode98030        6.762e+03  1.581e+04   0.428 0.668847    
zipcode98031        1.122e+04  1.658e+04   0.677 0.498691    
zipcode98032       -1.109e+04  1.932e+04  -0.574 0.565817    
zipcode98033        3.007e+05  3.005e+04  10.005  < 2e-16 ***
zipcode98034        1.267e+05  3.221e+04   3.934 8.38e-05 ***
zipcode98038        6.048e+04  1.789e+04   3.381 0.000723 ***
zipcode98039        1.161e+06  3.498e+04  33.201  < 2e-16 ***
zipcode98040        4.718e+05  2.367e+04  19.930  < 2e-16 ***
zipcode98042        2.066e+04  1.530e+04   1.350 0.176878    
zipcode98045        1.443e+05  3.303e+04   4.369 1.26e-05 ***
zipcode98052        1.729e+05  3.062e+04   5.645 1.68e-08 ***
zipcode98053        1.459e+05  3.271e+04   4.462 8.18e-06 ***
zipcode98055        2.665e+04  1.855e+04   1.437 0.150864    
zipcode98056        7.036e+04  2.016e+04   3.491 0.000483 ***
zipcode98058        1.984e+04  1.746e+04   1.136 0.255829    
zipcode98059        6.375e+04  1.980e+04   3.220 0.001283 ** 
zipcode98065        1.054e+05  3.043e+04   3.463 0.000535 ***
zipcode98070       -6.956e+04  2.346e+04  -2.965 0.003032 ** 
zipcode98072        6.751e+04  3.578e+04   1.887 0.059215 .  
zipcode98074        1.341e+05  2.893e+04   4.636 3.57e-06 ***
zipcode98075        1.402e+05  2.773e+04   5.056 4.32e-07 ***
zipcode98077        3.632e+04  3.738e+04   0.972 0.331297    
zipcode98092       -2.142e+04  1.428e+04  -1.500 0.133681    
zipcode98102        4.889e+05  3.168e+04  15.431  < 2e-16 ***
zipcode98103        2.544e+05  2.923e+04   8.705  < 2e-16 ***
zipcode98105        4.048e+05  2.996e+04  13.513  < 2e-16 ***
zipcode98106        9.428e+04  2.161e+04   4.362 1.30e-05 ***
zipcode98107        2.721e+05  2.999e+04   9.073  < 2e-16 ***
zipcode98108        8.092e+04  2.414e+04   3.352 0.000805 ***
zipcode98109        4.238e+05  3.091e+04  13.708  < 2e-16 ***
zipcode98112        5.419e+05  2.749e+04  19.711  < 2e-16 ***
zipcode98115        2.517e+05  2.959e+04   8.505  < 2e-16 ***
zipcode98116        2.206e+05  2.409e+04   9.158  < 2e-16 ***
zipcode98117        2.336e+05  3.000e+04   7.786 7.28e-15 ***
zipcode98118        1.356e+05  2.104e+04   6.445 1.19e-10 ***
zipcode98119        4.118e+05  2.917e+04  14.120  < 2e-16 ***
zipcode98122        2.774e+05  2.619e+04  10.595  < 2e-16 ***
zipcode98125        1.139e+05  3.199e+04   3.559 0.000373 ***
zipcode98126        1.460e+05  2.231e+04   6.547 6.05e-11 ***
zipcode98133        6.706e+04  3.298e+04   2.033 0.042051 *  
zipcode98136        1.882e+05  2.267e+04   8.299  < 2e-16 ***
zipcode98144        2.404e+05  2.427e+04   9.903  < 2e-16 ***
zipcode98146        4.713e+04  2.021e+04   2.332 0.019702 *  
zipcode98148        3.965e+04  2.841e+04   1.395 0.162915    
zipcode98155        4.995e+04  3.439e+04   1.452 0.146441    
zipcode98166        9.326e+03  1.863e+04   0.501 0.616676    
zipcode98168        3.613e+04  1.964e+04   1.839 0.065861 .  
zipcode98177        1.059e+05  3.438e+04   3.081 0.002066 ** 
zipcode98178        1.116e+04  2.007e+04   0.556 0.578218    
zipcode98188        2.108e+03  2.047e+04   0.103 0.917964    
zipcode98198       -2.470e+04  1.563e+04  -1.581 0.113960    
zipcode98199        2.913e+05  2.847e+04  10.232  < 2e-16 ***
lat                 2.176e+05  7.147e+04   3.044 0.002336 ** 
long               -1.731e+05  5.105e+04  -3.390 0.000701 ***
sqft_living15       6.986e+00  3.269e+00   2.137 0.032636 *  
lotsize             9.441e+03  1.656e+03   5.702 1.21e-08 ***
lotsize15          -1.917e+03  1.834e+03  -1.045 0.295975    
age                 7.282e+03  7.492e+02   9.720  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 162000 on 17292 degrees of freedom
Multiple R-squared:  0.809, Adjusted R-squared:  0.808 
F-statistic: 804.9 on 91 and 17292 DF,  p-value: < 2.2e-16

Perform nonlinear transformations

Again, I begin by referring to the residualPlots() output to determine which predictors would benefit from a nonlinear transformation.

The output of residualPlots() reveals that a nonlinear relationship exists between price and bathrooms, sqft_living, grade, sqft_above, sqft_basement, yr_renovated, and sqft_living15.


Call:
lm(formula = price ~ bedrooms + bathrooms + sqft_living + floors + 
    waterfront + view + condition + grade + sqft_above + sqft_basement + 
    yr_renovated + zipcode + lat + long + sqft_living15 + lotsize + 
    lotsize15 + age + I(bathrooms^2) + I(sqft_living^2) + I(grade^2) + 
    I(sqft_above^2) + I(sqft_basement^2) + I(yr_renovated^2), 
    data = housedata)

Residuals:
     Min       1Q   Median       3Q      Max 
-3463192   -57252     2735    54593  2405534 

Coefficients: (1 not defined because of singularities)
                     Estimate Std. Error t value Pr(>|t|)    
(Intercept)        -4.033e+07  6.299e+06  -6.402 1.57e-10 ***
bedrooms           -5.166e+03  1.694e+03  -3.049 0.002300 ** 
bathrooms           1.757e+04  7.731e+03   2.273 0.023042 *  
sqft_living        -1.745e+01  7.519e+00  -2.321 0.020297 *  
floors             -2.908e+04  3.414e+03  -8.516  < 2e-16 ***
waterfrontYes       5.908e+05  1.630e+04  36.248  < 2e-16 ***
viewfair            7.641e+04  9.434e+03   8.099 5.89e-16 ***
viewaverage         6.422e+04  5.617e+03  11.432  < 2e-16 ***
viewgood            1.267e+05  7.933e+03  15.968  < 2e-16 ***
viewvery good       2.668e+05  1.191e+04  22.394  < 2e-16 ***
conditionfair       1.328e+05  3.283e+04   4.044 5.27e-05 ***
conditionaverage    1.520e+05  3.046e+04   4.991 6.08e-07 ***
conditiongood       1.797e+05  3.047e+04   5.896 3.79e-09 ***
conditionvery good  2.320e+05  3.068e+04   7.564 4.12e-14 ***
grade              -2.433e+05  1.045e+04 -23.282  < 2e-16 ***
sqft_above          8.348e+01  8.931e+00   9.348  < 2e-16 ***
sqft_basement              NA         NA      NA       NA    
yr_renovated       -2.654e+03  3.508e+02  -7.566 4.04e-14 ***
zipcode98002        2.410e+04  1.489e+04   1.619 0.105552    
zipcode98003       -1.535e+04  1.315e+04  -1.167 0.243110    
zipcode98004        7.067e+05  2.405e+04  29.385  < 2e-16 ***
zipcode98005        2.562e+05  2.572e+04   9.962  < 2e-16 ***
zipcode98006        2.201e+05  2.095e+04  10.509  < 2e-16 ***
zipcode98007        2.158e+05  2.635e+04   8.189 2.82e-16 ***
zipcode98008        2.381e+05  2.516e+04   9.462  < 2e-16 ***
zipcode98010        1.130e+05  2.232e+04   5.063 4.16e-07 ***
zipcode98011        6.533e+04  3.288e+04   1.987 0.046909 *  
zipcode98014        1.086e+05  3.628e+04   2.993 0.002762 ** 
zipcode98019        8.809e+04  3.537e+04   2.491 0.012753 *  
zipcode98022        7.876e+04  1.967e+04   4.004 6.26e-05 ***
zipcode98023       -5.852e+04  1.215e+04  -4.818 1.47e-06 ***
zipcode98024        1.823e+05  3.103e+04   5.875 4.30e-09 ***
zipcode98027        1.708e+05  2.158e+04   7.913 2.65e-15 ***
zipcode98028        4.529e+04  3.188e+04   1.421 0.155448    
zipcode98029        2.423e+05  2.466e+04   9.824  < 2e-16 ***
zipcode98030        1.805e+04  1.437e+04   1.257 0.208900    
zipcode98031        1.915e+04  1.507e+04   1.271 0.203797    
zipcode98032       -2.018e+04  1.756e+04  -1.149 0.250504    
zipcode98033        3.021e+05  2.731e+04  11.065  < 2e-16 ***
zipcode98034        1.331e+05  2.927e+04   4.546 5.50e-06 ***
zipcode98038        9.082e+04  1.627e+04   5.583 2.40e-08 ***
zipcode98039        1.095e+06  3.181e+04  34.416  < 2e-16 ***
zipcode98040        4.508e+05  2.152e+04  20.949  < 2e-16 ***
zipcode98042        2.966e+04  1.391e+04   2.133 0.032935 *  
zipcode98045        1.826e+05  3.003e+04   6.081 1.22e-09 ***
zipcode98052        2.019e+05  2.783e+04   7.256 4.16e-13 ***
zipcode98053        1.836e+05  2.973e+04   6.173 6.84e-10 ***
zipcode98055        2.776e+04  1.686e+04   1.647 0.099587 .  
zipcode98056        6.707e+04  1.831e+04   3.662 0.000251 ***
zipcode98058        3.351e+04  1.587e+04   2.111 0.034786 *  
zipcode98059        7.674e+04  1.799e+04   4.266 2.00e-05 ***
zipcode98065        1.515e+05  2.767e+04   5.476 4.42e-08 ***
zipcode98070       -7.738e+04  2.132e+04  -3.629 0.000285 ***
zipcode98072        9.208e+04  3.252e+04   2.832 0.004632 ** 
zipcode98074        1.621e+05  2.629e+04   6.166 7.18e-10 ***
zipcode98075        1.684e+05  2.520e+04   6.681 2.44e-11 ***
zipcode98077        5.568e+04  3.397e+04   1.639 0.101172    
zipcode98092        1.860e+02  1.299e+04   0.014 0.988575    
zipcode98102        4.415e+05  2.881e+04  15.322  < 2e-16 ***
zipcode98103        2.514e+05  2.657e+04   9.461  < 2e-16 ***
zipcode98105        4.029e+05  2.723e+04  14.797  < 2e-16 ***
zipcode98106        6.038e+04  1.967e+04   3.070 0.002146 ** 
zipcode98107        2.590e+05  2.728e+04   9.496  < 2e-16 ***
zipcode98108        7.319e+04  2.195e+04   3.334 0.000858 ***
zipcode98109        4.241e+05  2.810e+04  15.092  < 2e-16 ***
zipcode98112        5.314e+05  2.500e+04  21.257  < 2e-16 ***
zipcode98115        2.607e+05  2.690e+04   9.690  < 2e-16 ***
zipcode98116        2.175e+05  2.191e+04   9.926  < 2e-16 ***
zipcode98117        2.275e+05  2.728e+04   8.341  < 2e-16 ***
zipcode98118        1.226e+05  1.913e+04   6.410 1.49e-10 ***
zipcode98119        4.128e+05  2.652e+04  15.564  < 2e-16 ***
zipcode98122        2.809e+05  2.381e+04  11.796  < 2e-16 ***
zipcode98125        1.149e+05  2.907e+04   3.953 7.75e-05 ***
zipcode98126        1.264e+05  2.029e+04   6.230 4.76e-10 ***
zipcode98133        5.941e+04  2.997e+04   1.982 0.047481 *  
zipcode98136        1.818e+05  2.063e+04   8.813  < 2e-16 ***
zipcode98144        2.309e+05  2.207e+04  10.461  < 2e-16 ***
zipcode98146        2.983e+04  1.837e+04   1.623 0.104502    
zipcode98148        3.379e+04  2.582e+04   1.309 0.190655    
zipcode98155        4.519e+04  3.125e+04   1.446 0.148195    
zipcode98166        3.065e+03  1.693e+04   0.181 0.856373    
zipcode98168        1.476e+03  1.787e+04   0.083 0.934142    
zipcode98177        1.101e+05  3.124e+04   3.525 0.000425 ***
zipcode98178        2.667e+03  1.824e+04   0.146 0.883794    
zipcode98188       -2.562e+03  1.860e+04  -0.138 0.890427    
zipcode98198       -3.098e+04  1.420e+04  -2.181 0.029168 *  
zipcode98199        2.832e+05  2.588e+04  10.943  < 2e-16 ***
lat                 2.098e+05  6.493e+04   3.231 0.001236 ** 
long               -2.537e+05  4.642e+04  -5.466 4.66e-08 ***
sqft_living15       2.223e+01  3.012e+00   7.379 1.66e-13 ***
lotsize             1.104e+04  1.505e+03   7.333 2.35e-13 ***
lotsize15          -3.926e+03  1.668e+03  -2.354 0.018569 *  
age                 2.289e+03  6.938e+02   3.300 0.000970 ***
I(bathrooms^2)      1.548e+03  1.524e+03   1.016 0.309825    
I(sqft_living^2)    3.997e-02  1.565e-03  25.541  < 2e-16 ***
I(grade^2)          1.928e+04  6.427e+02  29.998  < 2e-16 ***
I(sqft_above^2)    -2.591e-02  2.027e-03 -12.781  < 2e-16 ***
I(sqft_basement^2) -7.125e-02  5.770e-03 -12.349  < 2e-16 ***
I(yr_renovated^2)   1.343e+00  1.758e-01   7.643 2.23e-14 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 147200 on 17286 degrees of freedom
Multiple R-squared:  0.8424,    Adjusted R-squared:  0.8415 
F-statistic: 952.5 on 97 and 17286 DF,  p-value: < 2.2e-16

Introduce interaction terms

Here, I introduce the interaction terms used before, with the exception of sqft_living:sqft_basement, which does not appear in the formula below.


Call:
lm(formula = price ~ bedrooms + bathrooms + sqft_living + floors + 
    waterfront + view + condition + grade + sqft_above + sqft_basement + 
    yr_renovated + zipcode + lat + long + sqft_living15 + lotsize + 
    lotsize15 + age + I(bathrooms^2) + I(sqft_living^2) + I(grade^2) + 
    I(sqft_above^2) + I(sqft_basement^2) + I(yr_renovated^2) + 
    bedrooms:sqft_living + sqft_basement:I(sqft_living^2) + bedrooms:I(sqft_living^2) + 
    I(sqft_living^2):I(sqft_basement^2), data = housedata)

Residuals:
     Min       1Q   Median       3Q      Max 
-2208286   -57262     1990    53552  1967109 

Coefficients: (1 not defined because of singularities)
                                      Estimate Std. Error t value Pr(>|t|)
(Intercept)                         -3.760e+07  6.212e+06  -6.054 1.44e-09
bedrooms                             1.572e+04  5.771e+03   2.725 0.006444
bathrooms                           -4.866e+03  7.810e+03  -0.623 0.533237
sqft_living                          2.435e+00  1.609e+01   0.151 0.879715
floors                              -2.795e+04  3.370e+03  -8.294  < 2e-16
waterfrontYes                        5.710e+05  1.612e+04  35.420  < 2e-16
viewfair                             7.873e+04  9.307e+03   8.459  < 2e-16
viewaverage                          6.430e+04  5.544e+03  11.598  < 2e-16
viewgood                             1.241e+05  7.837e+03  15.838  < 2e-16
viewvery good                        2.722e+05  1.176e+04  23.147  < 2e-16
conditionfair                        1.199e+05  3.239e+04   3.703 0.000214
conditionaverage                     1.373e+05  3.006e+04   4.568 4.95e-06
conditiongood                        1.650e+05  3.008e+04   5.487 4.15e-08
conditionvery good                   2.169e+05  3.028e+04   7.162 8.26e-13
grade                               -2.140e+05  1.080e+04 -19.814  < 2e-16
sqft_above                           5.937e+00  1.524e+01   0.390 0.696815
sqft_basement                               NA         NA      NA       NA
yr_renovated                        -2.609e+03  3.460e+02  -7.540 4.94e-14
zipcode98002                         2.357e+04  1.468e+04   1.605 0.108509
zipcode98003                        -1.277e+04  1.296e+04  -0.985 0.324797
zipcode98004                         7.062e+05  2.372e+04  29.770  < 2e-16
zipcode98005                         2.618e+05  2.536e+04  10.322  < 2e-16
zipcode98006                         2.195e+05  2.066e+04  10.623  < 2e-16
zipcode98007                         2.138e+05  2.599e+04   8.227  < 2e-16
zipcode98008                         2.323e+05  2.481e+04   9.360  < 2e-16
zipcode98010                         1.117e+05  2.200e+04   5.077 3.88e-07
zipcode98011                         6.932e+04  3.242e+04   2.138 0.032513
zipcode98014                         1.028e+05  3.577e+04   2.874 0.004059
zipcode98019                         8.595e+04  3.487e+04   2.464 0.013730
zipcode98022                         7.531e+04  1.939e+04   3.883 0.000104
zipcode98023                        -5.482e+04  1.198e+04  -4.576 4.76e-06
zipcode98024                         1.781e+05  3.060e+04   5.822 5.90e-09
zipcode98027                         1.709e+05  2.128e+04   8.031 1.03e-15
zipcode98028                         4.857e+04  3.144e+04   1.545 0.122421
zipcode98029                         2.376e+05  2.432e+04   9.772  < 2e-16
zipcode98030                         1.675e+04  1.416e+04   1.183 0.236889
zipcode98031                         1.795e+04  1.485e+04   1.208 0.226991
zipcode98032                        -2.174e+04  1.731e+04  -1.256 0.209251
zipcode98033                         3.028e+05  2.693e+04  11.245  < 2e-16
zipcode98034                         1.342e+05  2.886e+04   4.651 3.33e-06
zipcode98038                         8.685e+04  1.604e+04   5.414 6.23e-08
zipcode98039                         1.078e+06  3.139e+04  34.348  < 2e-16
zipcode98040                         4.617e+05  2.124e+04  21.737  < 2e-16
zipcode98042                         2.614e+04  1.371e+04   1.906 0.056681
zipcode98045                         1.723e+05  2.961e+04   5.819 6.03e-09
zipcode98052                         2.037e+05  2.744e+04   7.422 1.20e-13
zipcode98053                         1.862e+05  2.932e+04   6.349 2.22e-10
zipcode98055                         2.808e+04  1.662e+04   1.689 0.091182
zipcode98056                         6.770e+04  1.806e+04   3.749 0.000178
zipcode98058                         3.188e+04  1.565e+04   2.037 0.041647
zipcode98059                         7.649e+04  1.774e+04   4.312 1.63e-05
zipcode98065                         1.454e+05  2.728e+04   5.330 9.96e-08
zipcode98070                        -6.329e+04  2.104e+04  -3.008 0.002632
zipcode98072                         9.482e+04  3.206e+04   2.957 0.003108
zipcode98074                         1.619e+05  2.592e+04   6.245 4.34e-10
zipcode98075                         1.691e+05  2.485e+04   6.804 1.05e-11
zipcode98077                         5.323e+04  3.349e+04   1.589 0.111997
zipcode98092                        -1.935e+03  1.281e+04  -0.151 0.879911
zipcode98102                         4.509e+05  2.842e+04  15.865  < 2e-16
zipcode98103                         2.572e+05  2.622e+04   9.812  < 2e-16
zipcode98105                         4.118e+05  2.687e+04  15.325  < 2e-16
zipcode98106                         6.418e+04  1.941e+04   3.307 0.000944
zipcode98107                         2.645e+05  2.691e+04   9.828  < 2e-16
zipcode98108                         7.618e+04  2.165e+04   3.519 0.000435
zipcode98109                         4.334e+05  2.772e+04  15.636  < 2e-16
zipcode98112                         5.404e+05  2.466e+04  21.914  < 2e-16
zipcode98115                         2.663e+05  2.654e+04  10.036  < 2e-16
zipcode98116                         2.224e+05  2.161e+04  10.288  < 2e-16
zipcode98117                         2.353e+05  2.691e+04   8.742  < 2e-16
zipcode98118                         1.264e+05  1.887e+04   6.698 2.17e-11
zipcode98119                         4.233e+05  2.616e+04  16.178  < 2e-16
zipcode98122                         2.869e+05  2.350e+04  12.210  < 2e-16
zipcode98125                         1.199e+05  2.867e+04   4.181 2.92e-05
zipcode98126                         1.329e+05  2.003e+04   6.637 3.29e-11
zipcode98133                         6.449e+04  2.956e+04   2.181 0.029164
zipcode98136                         1.898e+05  2.036e+04   9.324  < 2e-16
zipcode98144                         2.343e+05  2.178e+04  10.756  < 2e-16
zipcode98146                         3.451e+04  1.812e+04   1.905 0.056861
zipcode98148                         3.628e+04  2.546e+04   1.425 0.154093
zipcode98155                         4.928e+04  3.082e+04   1.599 0.109905
zipcode98166                         8.622e+03  1.670e+04   0.516 0.605640
zipcode98168                         4.977e+03  1.762e+04   0.282 0.777631
zipcode98177                         1.130e+05  3.081e+04   3.666 0.000247
zipcode98178                         3.796e+03  1.799e+04   0.211 0.832902
zipcode98188                        -1.997e+03  1.834e+04  -0.109 0.913291
zipcode98198                        -2.679e+04  1.401e+04  -1.913 0.055762
zipcode98199                         2.930e+05  2.553e+04  11.476  < 2e-16
lat                                  2.063e+05  6.402e+04   3.222 0.001276
long                                -2.320e+05  4.578e+04  -5.068 4.07e-07
sqft_living15                        2.497e+01  2.980e+00   8.378  < 2e-16
lotsize                              1.097e+04  1.485e+03   7.391 1.53e-13
lotsize15                           -4.430e+03  1.645e+03  -2.693 0.007081
age                                  2.566e+03  6.849e+02   3.746 0.000180
I(bathrooms^2)                       7.340e+03  1.549e+03   4.740 2.15e-06
I(sqft_living^2)                     1.070e-02  5.813e-03   1.841 0.065614
I(grade^2)                           1.737e+04  6.652e+02  26.105  < 2e-16
I(sqft_above^2)                      2.955e-02  5.908e-03   5.001 5.76e-07
I(sqft_basement^2)                  -4.307e-02  6.922e-03  -6.222 5.03e-10
I(yr_renovated^2)                    1.321e+00  1.734e-01   7.617 2.73e-14
bedrooms:sqft_living                 4.798e+00  4.044e+00   1.186 0.235456
sqft_basement:I(sqft_living^2)       2.046e-05  1.809e-06  11.308  < 2e-16
bedrooms:I(sqft_living^2)           -5.184e-03  6.808e-04  -7.614 2.79e-14
I(sqft_living^2):I(sqft_basement^2) -3.336e-09  2.459e-10 -13.568  < 2e-16
                                       
(Intercept)                         ***
bedrooms                            ** 
bathrooms                              
sqft_living                            
floors                              ***
waterfrontYes                       ***
viewfair                            ***
viewaverage                         ***
viewgood                            ***
viewvery good                       ***
conditionfair                       ***
conditionaverage                    ***
conditiongood                       ***
conditionvery good                  ***
grade                               ***
sqft_above                             
sqft_basement                          
yr_renovated                        ***
zipcode98002                           
zipcode98003                           
zipcode98004                        ***
zipcode98005                        ***
zipcode98006                        ***
zipcode98007                        ***
zipcode98008                        ***
zipcode98010                        ***
zipcode98011                        *  
zipcode98014                        ** 
zipcode98019                        *  
zipcode98022                        ***
zipcode98023                        ***
zipcode98024                        ***
zipcode98027                        ***
zipcode98028                           
zipcode98029                        ***
zipcode98030                           
zipcode98031                           
zipcode98032                           
zipcode98033                        ***
zipcode98034                        ***
zipcode98038                        ***
zipcode98039                        ***
zipcode98040                        ***
zipcode98042                        .  
zipcode98045                        ***
zipcode98052                        ***
zipcode98053                        ***
zipcode98055                        .  
zipcode98056                        ***
zipcode98058                        *  
zipcode98059                        ***
zipcode98065                        ***
zipcode98070                        ** 
zipcode98072                        ** 
zipcode98074                        ***
zipcode98075                        ***
zipcode98077                           
zipcode98092                           
zipcode98102                        ***
zipcode98103                        ***
zipcode98105                        ***
zipcode98106                        ***
zipcode98107                        ***
zipcode98108                        ***
zipcode98109                        ***
zipcode98112                        ***
zipcode98115                        ***
zipcode98116                        ***
zipcode98117                        ***
zipcode98118                        ***
zipcode98119                        ***
zipcode98122                        ***
zipcode98125                        ***
zipcode98126                        ***
zipcode98133                        *  
zipcode98136                        ***
zipcode98144                        ***
zipcode98146                        .  
zipcode98148                           
zipcode98155                           
zipcode98166                           
zipcode98168                           
zipcode98177                        ***
zipcode98178                           
zipcode98188                           
zipcode98198                        .  
zipcode98199                        ***
lat                                 ** 
long                                ***
sqft_living15                       ***
lotsize                             ***
lotsize15                           ** 
age                                 ***
I(bathrooms^2)                      ***
I(sqft_living^2)                    .  
I(grade^2)                          ***
I(sqft_above^2)                     ***
I(sqft_basement^2)                  ***
I(yr_renovated^2)                   ***
bedrooms:sqft_living                   
sqft_basement:I(sqft_living^2)      ***
bedrooms:I(sqft_living^2)           ***
I(sqft_living^2):I(sqft_basement^2) ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 145100 on 17282 degrees of freedom
Multiple R-squared:  0.8468,    Adjusted R-squared:  0.8459 
F-statistic:   946 on 101 and 17282 DF,  p-value: < 2.2e-16

After performing nonlinear transformations and adding interaction terms, the \(R^2_{adj}\) for the model increases from 0.8079995 to 0.8459358. This means that the updated model explains roughly 84.6% of the variability in price, after adjusting for the number of predictors. I will consider this the final model created by the self-selection method.
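
As a quick check on the summary output above, the adjusted value can be recomputed from the printed \(R^2\). The sketch below assumes the standard formula \(R^2_{adj} = 1 - (1 - R^2)\frac{n - 1}{n - p - 1}\), with \(n\) taken from the data frame (17384 observations) and \(p\) from the F-statistic line (101 model degrees of freedom). The variable names are illustrative only, and because the input \(R^2\) is the rounded printed value, the result should agree with the reported figure only to about four decimal places.

```r
# Values read from the printed model summary (rounded as shown there).
r2 <- 0.8468   # Multiple R-squared
n  <- 17384    # number of observations in the data frame
p  <- 101      # model degrees of freedom from the F-statistic line

# Sanity check: n - p - 1 should match the residual degrees of freedom.
stopifnot(n - p - 1 == 17282)

# Adjusted R-squared penalises R-squared for the number of predictors.
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(adj_r2)
```

The penalty factor \((n - 1)/(n - p - 1)\) is very close to 1 here because the sample is large relative to the number of terms, which is why the adjusted and unadjusted values differ by less than 0.001.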

Selecting a final model

Analysis of Variance Table

Model 1: price ~ bedrooms + sqft_living + I(waterfront == "Yes") + I(view == 
    "good") + I(view == "very good") + grade + I(zipcode == 98005) + 
    I(zipcode == 98007) + I(zipcode == 98034) + I(zipcode == 
    98040) + I(zipcode == 98042) + I(zipcode == 98103) + I(zipcode == 
    98106) + I(zipcode == 98112) + I(zipcode == 98115) + I(zipcode == 
    98116) + I(zipcode == 98122) + lat + long + sqft_basement + 
    I(sqft_living^2) + I(grade^2) + I(sqft_basement^2) + bedrooms:sqft_living + 
    sqft_living:sqft_basement + sqft_basement:I(sqft_living^2) + 
    bedrooms:I(sqft_living^2) + I(sqft_living^2):I(sqft_basement^2)
Model 2: price ~ bedrooms + bathrooms + sqft_living + floors + waterfront + 
    view + condition + grade + sqft_above + sqft_basement + yr_renovated + 
    zipcode + lat + long + sqft_living15 + lotsize + lotsize15 + 
    age + I(bathrooms^2) + I(sqft_living^2) + I(grade^2) + I(sqft_above^2) + 
    I(sqft_basement^2) + I(yr_renovated^2) + bedrooms:sqft_living + 
    sqft_basement:I(sqft_living^2) + bedrooms:I(sqft_living^2) + 
    I(sqft_living^2):I(sqft_basement^2)
  Res.Df        RSS Df  Sum of Sq      F    Pr(>F)    
1  17355 6.2142e+14                                   
2  17282 3.6389e+14 73 2.5753e+14 167.54 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

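The analysis of variance reports an \(RSS\) of \(6.2142084 \times 10^{14}\) for the final model chosen by subset selection (Model 1) and an \(RSS\) of \(3.6389315 \times 10^{14}\) for the self-selected model (Model 2). The lower \(RSS\) for the self-selected model indicates less overall error on the data used to fit it, and the F-test reports the reduction as significant (\(F = 167.54\), \(p < 2.2 \times 10^{-16}\)), which I take as evidence of higher prediction accuracy. Thus, the final model I select is the model created by self-selection.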
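
The F-statistic in the table can be reproduced from the two \(RSS\) values and their residual degrees of freedom. The sketch below assumes the usual extra-sum-of-squares formula, \(F = \frac{(RSS_1 - RSS_2)/(df_1 - df_2)}{RSS_2/df_2}\), and uses the rounded values printed in the table, so it should match the reported 167.54 only up to rounding. The variable names are illustrative. This F-test is formally meant for nested models, so the calculation is best read as a check of the table's arithmetic.

```r
# Values read from the analysis of variance table (rounded as printed).
rss_1 <- 6.2142e14  # Model 1: chosen by subset selection
rss_2 <- 3.6389e14  # Model 2: self-selected model
df_1  <- 17355      # residual degrees of freedom, Model 1
df_2  <- 17282      # residual degrees of freedom, Model 2

# Number of additional parameters in Model 2 (73 in the table).
extra_df <- df_1 - df_2

# Extra-sum-of-squares F-statistic.
f_stat <- ((rss_1 - rss_2) / extra_df) / (rss_2 / df_2)

# Upper-tail probability under the F distribution.
p_value <- pf(f_stat, extra_df, df_2, lower.tail = FALSE)

print(f_stat)
print(p_value < 2.2e-16)
```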

This document uses tidyverse by Wickham (2017), leaps by Lumley (2017), MASS by Ripley (2018), corrplot by Wei and Simko (2017), DT by Xie (2018b), ISLR by James et al. (2017), knitr by Xie (2018c), and bookdown by Xie (2018a).

References

James, Gareth, Daniela Witten, Trevor Hastie, and Rob Tibshirani. 2017. ISLR: Data for an Introduction to Statistical Learning with Applications in R. https://CRAN.R-project.org/package=ISLR.

Lumley, Thomas. 2017. Leaps: Regression Subset Selection. https://CRAN.R-project.org/package=leaps.

Ripley, Brian. 2018. MASS: Support Functions and Datasets for Venables and Ripley’s MASS. https://CRAN.R-project.org/package=MASS.

Wei, Taiyun, and Viliam Simko. 2017. Corrplot: Visualization of a Correlation Matrix. https://CRAN.R-project.org/package=corrplot.

Wickham, Hadley. 2017. Tidyverse: Easily Install and Load the ‘Tidyverse’. https://CRAN.R-project.org/package=tidyverse.

Xie, Yihui. 2018a. Bookdown: Authoring Books and Technical Documents with R Markdown. https://CRAN.R-project.org/package=bookdown.

———. 2018b. DT: A Wrapper of the JavaScript Library ‘DataTables’. https://CRAN.R-project.org/package=DT.

———. 2018c. Knitr: A General-Purpose Package for Dynamic Report Generation in R. https://CRAN.R-project.org/package=knitr.