In most applications, regression models are merely useful approximations.
Reality is often so
complicated that you cannot know what the true model is. You may have to choose
a model more
on the basis of what variables can be measured and what kinds of models can be
estimated than
on a rigorous theory that explains how the universe really works. However, even
in cases where
theory is lacking, a regression model may be an excellent predictor of the
response if the model is
carefully formulated from a large sample. The interpretation of statistics such
as parameter
estimates may nevertheless be highly problematic.
Statisticians usually use the word "prediction" in a technical sense.
Prediction in this sense does
not refer to "predicting the future" (statisticians call that forecasting) but
rather to guessing the
response from the values of the regressors in an observation taken under the
same circumstances
as the sample from which the regression equation was estimated. If you developed
a regression
model for predicting consumer preferences in 1958, it may not give very good
predictions in 1988
no matter how well it did in 1958. If it is the future you want to predict, your
model must include
whatever relevant factors may change over time. If the process you are studying
does in fact
change over time, you must take observations at several, perhaps many, different
times. Analysis
of such data is the province of SAS/ETS procedures such as AUTOREG and
STATESPACE.
Refer to the SAS/ETS User's Guide for more information on these procedures.
The comments in the rest of this section are directed toward linear
least-squares regression.
Nonlinear regression and non-least-squares regression often introduce further
complications. For
more detailed discussions of the interpretation of regression statistics, see
Darlington (1968),
Mosteller and Tukey (1977), Weisberg (1985), and Younger (1979).
Parameter estimates are easiest to interpret in a controlled experiment in
which the regressors are
manipulated independently of each other. In a well-designed experiment, such as
a randomized
factorial design with replications in each cell, you can use lack-of-fit tests
and estimates of the
standard error of prediction to determine whether the model describes the
experimental process
with adequate precision. If so, a regression coefficient estimates the amount by
which the mean
response changes when the regressor is changed by one unit while all the other
regressors are
unchanged. However, if the model involves interactions or polynomial terms, it
may not be possible
to interpret individual regression coefficients. For example, if the equation
includes both linear and
quadratic terms for a given variable, you cannot physically change the value of
the linear term
without also changing the value of the quadratic term. Sometimes it may be
possible to recode the
regressors, for example by using orthogonal polynomials, to make the
interpretation easier.
If the nonstatistical aspects of the experiment are also treated with
sufficient care (including such
things as use of placebos and double blinds), then you can state conclusions in
causal terms; that
is, this change in a regressor causes that change in the response. Causality can
never be inferred
from statistical results alone or from an observational study.
If the model that you fit is not the true model, then the parameter estimates
may depend strongly
on the particular values of the regressors used in the experiment. For example,
if the response is
actually a quadratic function of a regressor but you fit a linear function, the
estimated slope may be
a large negative value if you use only small values of the regressor, a large
positive value if you
use only large values of the regressor, or near zero if you use both large and
small regressor
values. When you report the results of an experiment, it is important to include
the values of the
regressors. It is also important to avoid extrapolating the regression equation
outside the range of
regressors in the sample.
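
The dependence of the slope on the regressor values can be sketched with a
small calculation. The quadratic true model, the three designs, and the helper
name ols_slope below are assumptions made up for illustration. They are not
taken from any particular experiment or procedure.

```python
def ols_slope(x, y):
    """Least-squares slope for a simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def true_response(x):
    # Hypothetical true model: a quadratic with its minimum at x = 5.
    return (x - 5) ** 2


# Three hypothetical experiments that differ only in the regressor values used.
designs = {
    "small values only": [0, 1, 2, 3],
    "large values only": [7, 8, 9, 10],
    "small and large": [0, 1, 2, 3, 7, 8, 9, 10],
}

slopes = {}
for name, x in designs.items():
    y = [true_response(xi) for xi in x]
    slopes[name] = ols_slope(x, y)  # a straight line is fit to quadratic data
    print(f"{name:18s} slope = {slopes[name]:+.3f}")
```

With these made-up values the fitted slope is -7 for the small values, +7 for
the large values, and 0 when both are used, even though the underlying
response function is the same in all three cases.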
In an observational study, parameter estimates can be interpreted as the
expected difference in
response of two observations that differ by one unit on the regressor in
question and that have the
same values for all other regressors. You cannot make inferences about "changes"
in an
observational study since you have not actually changed anything. It may not be
possible even in
principle to change one regressor independently of all the others. Neither can
you draw
conclusions about causality without experimental manipulation.
If you conduct an observational study and if you do not know the true form of
the model,
interpretation of parameter estimates becomes even more convoluted. A
coefficient must then be
interpreted as an average over the sampled population of expected differences in
response of
observations that differ by one unit on only one regressor. The considerations
discussed earlier for controlled experiments in which the true model is not
known also apply.
Two coefficients in the same model can be directly compared only if the
regressors are measured
in the same units. You can make any coefficient large or small just by changing
the units. If you
convert a regressor from feet to miles, the parameter estimate is multiplied by
5280.
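
The effect of a change of units can be checked directly. The following is a
minimal sketch. The distances, the response values, and the helper name
ols_slope are hypothetical and serve only to illustrate the arithmetic.

```python
def ols_slope(x, y):
    """Least-squares slope for a simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


FEET_PER_MILE = 5280

# Hypothetical data: a distance measured in feet and some response.
distance_feet = [1320, 2640, 5280, 7920, 10560, 15840]
response = [3.1, 4.2, 5.8, 8.1, 9.9, 14.2]

# The same regressor expressed in miles.
distance_miles = [d / FEET_PER_MILE for d in distance_feet]

slope_per_foot = ols_slope(distance_feet, response)
slope_per_mile = ols_slope(distance_miles, response)
ratio = slope_per_mile / slope_per_foot

print(f"slope per foot: {slope_per_foot:.6f}")
print(f"slope per mile: {slope_per_mile:.6f}")
print(f"ratio:          {ratio:.1f}")
```

The fit itself is unchanged by the conversion. Only the scale on which the
coefficient is expressed differs.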
Sometimes standardized regression coefficients are used to compare the
effects of regressors
measured in different units. Standardizing the variables effectively makes the
standard deviation
the unit of measurement. This makes sense only if the standard deviation is a
meaningful quantity,
which usually is the case only if the observations are sampled from a
well-defined population. In a
controlled experiment, the standard deviation of a regressor depends on the
values of the
regressor selected by the experimenter. Thus, you can make a standardized
regression coefficient
large by using a large range of values for the regressor.
In some applications you may be able to compare regression coefficients in
terms of the practical
range of variation of a regressor. Suppose that each independent variable in an
industrial process
can be set to values only within a certain range. You can rescale the variables
so that the smallest
possible value is zero and the largest possible value is one. Then the unit of
measurement for
each regressor is the maximum possible range of the regressor, and the parameter
estimates are
comparable in that sense. Another possibility is to scale the regressors in
terms of the cost of
setting a regressor to a particular value, so comparisons can be made in
monetary terms.
In an experiment, you can often select values for the regressors such that
the regressors are
orthogonal (not correlated with each other). Orthogonal designs have enormous
advantages in
interpretation. With orthogonal regressors, the parameter estimate for a given
regressor does not
depend on which other regressors are included in the model, although other
statistics such as
standard errors and p-values may change.
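
The following sketch contrasts an orthogonal design with a correlated one.
Everything in it is an assumption chosen for illustration: the coded -1/+1
factor levels, the true model, the fixed "noise" values, and the helper names
solve and ols, which implement ordinary least squares through the normal
equations.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(columns, y):
    """Return [intercept, b1, b2, ...] from the normal equations."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


x1 = [-1, -1, 1, 1, -1, -1, 1, 1]
x2_orthogonal = [-1, 1, -1, 1, -1, 1, -1, 1]   # uncorrelated with x1
x2_correlated = [-1, 1, 1, 1, -1, -1, 1, 1]    # correlated with x1
noise = [0.1, 0.0, -0.1, 0.2, -0.1, 0.1, 0.1, -0.2]  # fixed, made-up errors


def simulate(x2):
    # Hypothetical true model: y = 6 + 2*x1 + 1*x2 + noise
    return [6 + 2 * a + b + e for a, b, e in zip(x1, x2, noise)]


results = {}
for label, x2 in [("orthogonal", x2_orthogonal), ("correlated", x2_correlated)]:
    y = simulate(x2)
    b1_alone = ols([x1], y)[1]      # x1 is the only regressor
    b1_both = ols([x1, x2], y)[1]   # x1 and x2 both in the model
    results[label] = (b1_alone, b1_both)
    print(f"{label:10s} x1 alone: {b1_alone:.4f}   x1 with x2: {b1_both:.4f}")
```

In the orthogonal design the estimate for x1 is the same whether or not x2 is
in the model. In the correlated design the estimate for x1 shifts noticeably
when x2 is added.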
If the regressors are correlated, it becomes difficult to disentangle the
effects of one regressor from
another, and the parameter estimates may be highly dependent on which regressors
are used in
the model. Two correlated regressors may be nonsignificant when tested
separately but highly
significant when considered together. If two regressors have a correlation of
1.0, it is impossible to
separate their effects.
It may be possible to recode correlated regressors to make interpretation
easier. For example, if X
and Y are highly correlated, they could be replaced in a linear
regression by X+Y and X-Y without
changing the fit of the model or statistics for other regressors.
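
A minimal sketch of this recoding follows. The data values and the helper
names solve, ols, and sse are hypothetical. The point is only that the error
sum of squares is the same before and after the regressors are replaced by
their sum and difference.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(columns, y):
    """Return [intercept, b1, b2, ...] from the normal equations."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


def sse(columns, y, coef):
    """Error sum of squares for the fitted coefficients."""
    fitted = [coef[0] + sum(c * col[i] for c, col in zip(coef[1:], columns))
              for i in range(len(y))]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


# Hypothetical, highly correlated regressors X and Y, and a response.
x_reg = [1, 2, 3, 4, 5, 6]
y_reg = [1.2, 1.9, 3.3, 3.8, 5.1, 6.2]
response = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1]

total = [a + b for a, b in zip(x_reg, y_reg)]   # X + Y
diff = [a - b for a, b in zip(x_reg, y_reg)]    # X - Y

coef_orig = ols([x_reg, y_reg], response)
coef_recoded = ols([total, diff], response)
sse_orig = sse([x_reg, y_reg], response, coef_orig)
sse_recoded = sse([total, diff], response, coef_recoded)

print(f"SSE with X and Y:     {sse_orig:.6f}")
print(f"SSE with X+Y and X-Y: {sse_recoded:.6f}")
```

The recoded coefficients are simple combinations of the original ones: the
coefficient of X+Y is half the sum of the original two coefficients, and the
coefficient of X-Y is half their difference.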
If there is error in the measurements of the regressors, the parameter
estimates must be
interpreted with respect to the measured values of the regressors, not the true
values. A regressor
may be statistically nonsignificant when measured with error even though it
would have been
highly significant if measured accurately.
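
The following sketch shows how a coefficient for a regressor measured with
error can differ from the coefficient for its true values. The true values,
the response, and the measurement errors are all made up. The errors were
deliberately chosen to sum to zero and to be uncorrelated with the true
values, which is an idealization.

```python
def ols_slope(x, y):
    """Least-squares slope for a simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical true regressor values and a response that is exactly 2 * x.
x_true = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2 * xi for xi in x_true]

# Made-up measurement errors, uncorrelated with x_true by construction.
errors = [2, -2, 2, -2, -2, 2, -2, 2]
x_measured = [xi + ei for xi, ei in zip(x_true, errors)]

slope_true = ols_slope(x_true, y)          # uses the true values
slope_measured = ols_slope(x_measured, y)  # uses the values as recorded

print(f"slope on true values:     {slope_true:.4f}")
print(f"slope on measured values: {slope_measured:.4f}")
```

Here the slope on the measured values is 84/74, roughly 1.14, rather than 2.
It describes how the response varies with the recorded values and should not
be read as the effect of the true quantity.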
Probability values (p-values) do not necessarily measure the importance of a
regressor. An
important regressor can have a large (nonsignificant) p-value if the sample is
small, if the
regressor is measured over a narrow range, if there are large measurement
errors, or if another
closely related regressor is included in the equation. An unimportant regressor
can have a very
small p-value in a large sample. Computing a confidence interval for a parameter
estimate gives
you more useful information than just looking at the p-value, but confidence
intervals do not solve
problems of measurement errors in the regressors or highly correlated regressors.
The p-values are always approximations. The assumptions required to compute
exact p-values
are never satisfied in practice.
R2 is usually defined as the proportion of variance of the response that is
predictable from (that can
be explained by) the regressor variables. It may be easier to interpret
sqrt(1 - R2), which is
approximately the factor by which the standard error of prediction is reduced by
the introduction of
the regressor variables.
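
A small sketch can illustrate the relation between sqrt(1 - R2) and the
standard error of prediction. The data and the helper name fit_summary are
assumptions for illustration. The ratio of the root mean square error to the
standard deviation of the response differs from sqrt(1 - R2) only by a
degrees-of-freedom factor, which is why the relation is approximate.

```python
import math


def fit_summary(x, y):
    """Return (R2, root mean square error, sample std. dev. of y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    r2 = 1 - sse / sst
    rmse = math.sqrt(sse / (n - 2))   # two parameters estimated
    sd_y = math.sqrt(sst / (n - 1))   # prediction error with no regressors
    return r2, rmse, sd_y


# Hypothetical data: a linear trend plus a deterministic wobble.
x = list(range(1, 21))
y = [3 + 0.5 * xi + 2 * math.sin(3 * xi) for xi in x]
n = len(x)

r2, rmse, sd_y = fit_summary(x, y)
ratio = rmse / sd_y
approx = math.sqrt(1 - r2)

print(f"R2             = {r2:.4f}")
print(f"sqrt(1 - R2)   = {approx:.4f}")
print(f"RMSE / sd of y = {ratio:.4f}")
```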
R2 is easiest to interpret when the observations, including the values of
both the regressors and
response, are randomly sampled from a well-defined population. Nonrandom
sampling can greatly
distort R2. For example, excessively large values of R2 can be obtained by
omitting from the
sample observations with regressor values near the mean.
In a controlled experiment, R2 depends on the values chosen for the
regressors. A wide range of
regressor values generally yields a larger R2 than a narrow range. In comparing
the results of two
experiments on the same variables but with different ranges for the regressors,
you should look at
the standard error of prediction (root mean square error) rather than R2.
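
The following sketch compares two hypothetical experiments that share the
same true line and the same errors but use different ranges for the regressor.
The model, the error pattern, and the helper name fit_summary are assumptions
for illustration. The error pattern was chosen to be uncorrelated with any
equally spaced regressor so that both fits have identical residuals.

```python
import math


def fit_summary(x, y):
    """Return (R2, root mean square error) for a simple linear regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return 1 - sse / sst, math.sqrt(sse / (n - 2))


# Made-up errors, identical in both experiments.
errors = [1, -1, -1, 1, 1, -1, -1, 1]

x_narrow = [4.0 + 0.25 * i for i in range(8)]   # range 4.0 to 5.75
x_wide = [1 + 2 * i for i in range(8)]          # range 1 to 15


def response(x):
    # Hypothetical true model: y = 1 + 2*x + error
    return [1 + 2 * xi + ei for xi, ei in zip(x, errors)]


r2_narrow, rmse_narrow = fit_summary(x_narrow, response(x_narrow))
r2_wide, rmse_wide = fit_summary(x_wide, response(x_wide))

print(f"narrow range: R2 = {r2_narrow:.4f}, RMSE = {rmse_narrow:.4f}")
print(f"wide range:   R2 = {r2_wide:.4f}, RMSE = {rmse_wide:.4f}")
```

In this constructed case R2 rises from about 0.57 to about 0.99 when the
range is widened, while the root mean square error is the same in both
experiments.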
Whether a given R2 value is considered to be large or small depends on the
context of the
particular study. A social scientist might consider an R2 of 0.30 to be large,
while a physicist might
consider 0.98 to be small.
You can always get an R2 arbitrarily close to 1.0 by including a large number
of completely
unrelated regressors in the equation. If the number of regressors is close to
the sample size, R2 is biased strongly upward. In such cases, the adjusted R2
and related statistics discussed by
Darlington (1968)
are less misleading.
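
The inflation of R2 by unrelated regressors can be sketched with simulated
data. The sample size, the random seed, and the helper names solve, ols, and
r_squared are assumptions for illustration. The adjusted R2 used here is the
common form 1 - (1 - R2)(n - 1)/(n - p - 1), where p is the number of
regressors excluding the intercept. That form is assumed rather than taken
from the references cited above.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(columns, y):
    """Return [intercept, b1, b2, ...] from the normal equations."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


def r_squared(columns, y):
    coef = ols(columns, y)
    mean_y = sum(y) / len(y)
    fitted = [coef[0] + sum(c * col[i] for c, col in zip(coef[1:], columns))
              for i in range(len(y))]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - sse / sst


random.seed(0)
n = 12
y = [random.gauss(0, 1) for _ in range(n)]
# Ten regressors generated independently of y and of each other.
junk = [[random.gauss(0, 1) for _ in range(n)] for _ in range(n - 2)]

rows = []
for p in range(1, len(junk) + 1):
    r2 = r_squared(junk[:p], y)
    adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    rows.append((p, r2, adj))
    print(f"{p:2d} regressors: R2 = {r2:.3f}, adjusted R2 = {adj:.3f}")
```

R2 can only increase as regressors are added, whatever their relation to the
response, whereas the adjusted R2 carries a penalty for each added term.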
If you fit many different models and choose the model with the largest R2,
all the statistics are
biased and the p-values for the parameter estimates are not valid.
Take care when interpreting R2 for models with no intercept term. As a general
rule, no-intercept models should be fit only when theoretical justification
exists and the data appear to fit a no-intercept framework. The R2 in those
cases measures something different (refer to Kvalseth 1985).
All regression statistics can be seriously distorted by a single incorrect
data value. A decimal point
in the wrong place can completely change the parameter estimates, R2, and other
statistics. It is
important to check your data for outliers and influential observations. The
diagnostics in PROC
REG are particularly useful in this regard.
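
The effect of a single misplaced decimal point can be sketched as follows.
The data and the helper name fit are hypothetical. The clean response lies
exactly on a line, and the corrupted copy differs in one value only.

```python
def fit(x, y):
    """Return (slope, R2) for a simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return slope, 1 - sse / sst


# Hypothetical data that follow y = 2*x + 1 exactly.
x = list(range(1, 11))
y_clean = [2.0 * xi + 1 for xi in x]

# The same data with the last value entered as 210.0 instead of 21.0.
y_bad = list(y_clean)
y_bad[-1] = 210.0

slope_clean, r2_clean = fit(x, y_clean)
slope_bad, r2_bad = fit(x, y_bad)

print(f"clean data:     slope = {slope_clean:.3f}, R2 = {r2_clean:.3f}")
print(f"one bad value:  slope = {slope_bad:.3f}, R2 = {r2_bad:.3f}")
```

One wrong value out of ten moves the slope from 2 to more than 12 and drops R2
from 1 to well below 0.5.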