Predictive variable selection
Variable selection is one of the most essential steps in developing a robust and accurate predictive model. It is not uncommon to start with a list of several hundred candidate predictors and eventually whittle it down to 10-20. While some sources advocate automated variable selection based on, e.g., the predictors' significance levels, others point out that “...a purely statistical solution is unrealistic. The role of scientific judgment cannot be overlooked.” ; see also . Because a purely manual approach may be difficult to implement when the number of variables is particularly large, an automated process, e.g., backward selection, may be used to augment but not supplant the researcher’s judgment; a standard R package, caret, is widely accepted for this purpose . An algorithm for this process is outlined in Fig 3.7.
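
To make the idea of automation that augments rather than supplants judgment concrete, the following is a minimal Python sketch of greedy backward selection in which the researcher can protect chosen variables from removal. It is not the algorithm of Fig 3.7 and does not use caret. The function names, the `keep` parameter, the variable names, and the toy scoring function are all hypothetical and chosen only for illustration. In practice the score would be something like a cross-validated performance measure of a model refitted on each subset.

```python
from typing import Callable, Iterable, List, Sequence


def backward_select(
    candidates: Sequence[str],
    score: Callable[[List[str]], float],
    target_size: int,
    keep: Iterable[str] = (),
) -> List[str]:
    """Greedy backward elimination (illustrative sketch).

    Starting from all candidates, repeatedly drop the variable whose
    removal leaves the highest-scoring subset, until `target_size`
    variables remain. Variables listed in `keep` are never dropped,
    which is how the researcher's judgment overrides the automation.
    """
    selected = list(candidates)
    protected = set(keep)
    while len(selected) > target_size:
        removable = [v for v in selected if v not in protected]
        if not removable:
            break  # only protected variables are left
        # Higher score is better: pick the variable we can best do without.
        drop = max(
            removable,
            key=lambda v: score([u for u in selected if u != v]),
        )
        selected.remove(drop)
    return selected


# Hypothetical toy data: each variable's assumed contribution to the score.
WEIGHTS = {
    "age": 5.0,
    "income": 3.0,
    "zip_code": 0.2,
    "shoe_size": 0.0,
    "tenure": 2.0,
}


def toy_score(subset: List[str]) -> float:
    """Stand-in for a real model-quality measure (assumption)."""
    return sum(WEIGHTS[v] for v in subset)


# Fully automated selection down to three variables.
automated = backward_select(list(WEIGHTS), toy_score, target_size=3)

# Same procedure, but the researcher insists on retaining zip_code.
guided = backward_select(
    list(WEIGHTS), toy_score, target_size=3, keep={"zip_code"}
)

print(automated)
print(guided)
```

With the toy weights above, the purely automated run discards the two weakest variables, while the guided run retains `zip_code` at the expense of a statistically stronger one. This is the kind of trade-off that scientific judgment may justify and a purely statistical criterion would not.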