Multiple Regression
1 Continuous Dependent Variable with normal distribution
Multiple Continuous Independent Variables with normal distribution
y = a * x1 + b * x2 + c * x3 + … + m * xn + k
(a through m are the coefficients of the independent variables; k is the
intercept)
Multiple regression is the instrument of
choice when the researcher believes several independent variables interact to
predict the value of a dependent variable. The test measures the degree to
which each of the independent variables contributes to the prediction.
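As a minimal sketch of how such an equation can be fitted, the following uses
ordinary least squares from NumPy on synthetic data. The data, the variable
names (`coefs`, `intercept`) and the chosen "true" values are assumptions made
for illustration only; they do not come from any study discussed here.

```python
import numpy as np

# Synthetic data: three predictors and a known linear relationship plus noise.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))
true_coefs = np.array([2.0, -1.0, 0.5])
true_intercept = 4.0
y = X @ true_coefs + true_intercept + rng.normal(scale=0.1, size=n)

# Append a column of ones so the last fitted parameter is the intercept.
design = np.column_stack([X, np.ones(n)])
params, *_ = np.linalg.lstsq(design, y, rcond=None)
coefs, intercept = params[:-1], params[-1]

print("coefficients:", coefs)
print("intercept:", intercept)
```

Each fitted coefficient estimates how much the dependent variable changes when
that independent variable increases by one unit and the others are held fixed.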
Multiple regression assumes:
· the independent variables are not highly correlated with each other
· the independent variables predict the dependent variable, but the reverse is
  not true; the dependent variable cannot predict the values of the
  independent variables
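The first assumption can be screened before fitting by inspecting pairwise
correlations between the independent variables. The sketch below is one
possible check, not a prescribed procedure: the helper name
`highly_correlated_pairs`, the 0.8 cutoff, and the synthetic data are all
assumptions chosen for illustration.

```python
import numpy as np

def highly_correlated_pairs(X, names, threshold=0.8):
    """Return (name_i, name_j, r) for predictor pairs with |r| >= threshold."""
    corr = np.corrcoef(X, rowvar=False)  # columns are variables
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) >= threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return pairs

# Synthetic predictors: x2 is nearly a copy of x1, x3 is unrelated.
rng = np.random.default_rng(1)
x1 = rng.normal(size=300)
x2 = x1 + rng.normal(scale=0.1, size=300)
x3 = rng.normal(size=300)
X = np.column_stack([x1, x2, x3])

flagged = highly_correlated_pairs(X, ["x1", "x2", "x3"])
print(flagged)
```

Pairwise correlation only catches two-variable overlap; a predictor that is a
combination of several others would need a different diagnostic.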
Multiple regression is normally implemented using one of two techniques. The
first technique, called forward stepwise regression, starts by measuring the
degree to which one independent variable (usually the one the researcher
believes is the strongest predictor) correlates with the dependent variable.
One by one, additional independent variables are added to the equation, and
the degree (if any) to which each predicts the dependent variable is noted.
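The forward procedure can be sketched as a greedy loop. This version is an
illustration under stated assumptions: instead of starting from the
researcher's preferred predictor, it picks at each step the variable that
raises R-squared the most, and it stops when the gain falls below a
hypothetical `min_gain` of 0.01. Statistical packages typically use
significance tests as the entry criterion instead.

```python
import numpy as np

def r_squared(X, y, cols):
    """R^2 of an OLS fit of y on the chosen columns plus an intercept."""
    design = np.column_stack([X[:, cols], np.ones(len(y))])
    params, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ params
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

def forward_stepwise(X, y, min_gain=0.01):
    """Add predictors one at a time while R^2 improves by at least min_gain."""
    selected, best_r2 = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        gains = {c: r_squared(X, y, selected + [c]) - best_r2 for c in remaining}
        best = max(gains, key=gains.get)
        if gains[best] < min_gain:
            break  # no remaining variable adds enough predictive power
        selected.append(best)
        remaining.remove(best)
        best_r2 += gains[best]
    return selected, best_r2

# Synthetic data: only columns 0 and 2 actually influence y.
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 4))
y = 3.0 * X[:, 0] + 1.0 * X[:, 2] + rng.normal(scale=0.5, size=300)

selected, r2 = forward_stepwise(X, y)
print("selected columns:", selected, "R^2:", round(r2, 3))
```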
Backwards
stepwise regression, a related approach, begins with an examination of the
combined effect of all of the independent variables on the dependent variable.
One by one, independent variables (usually starting with the weakest predictor)
are removed, and a new analysis is performed. The results provide coefficients
for each independent variable, signifying the degree to which each one, when
combined with the others, contributes to predicting the dependent variable.
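A backward pass can be sketched in the same spirit. The assumptions here are
that "weakest predictor" means the variable with the smallest absolute t
statistic, and that removal stops once every remaining variable reaches a
hypothetical cutoff of |t| >= 2.0. The helper names and the synthetic data are
illustrative only.

```python
import numpy as np

def t_statistics(X, y, cols):
    """OLS t statistic for each chosen column (intercept fitted, not returned)."""
    design = np.column_stack([X[:, cols], np.ones(len(y))])
    params, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ params
    dof = len(y) - design.shape[1]
    sigma2 = resid @ resid / dof                      # residual variance
    cov = sigma2 * np.linalg.inv(design.T @ design)   # coefficient covariance
    return params[:-1] / np.sqrt(np.diag(cov)[:-1])

def backward_stepwise(X, y, min_abs_t=2.0):
    """Start with all predictors; repeatedly drop the weakest one."""
    kept = list(range(X.shape[1]))
    while kept:
        t = np.abs(t_statistics(X, y, kept))
        weakest = int(np.argmin(t))
        if t[weakest] >= min_abs_t:
            break  # every remaining predictor passes the cutoff
        kept.pop(weakest)
    return kept

# Synthetic data: only columns 0 and 2 actually influence y.
rng = np.random.default_rng(3)
X = rng.normal(size=(300, 4))
y = 3.0 * X[:, 0] + 1.0 * X[:, 2] + rng.normal(scale=0.5, size=300)

kept = backward_stepwise(X, y)
print("columns kept:", kept)
```

Because the model is refitted after every removal, each surviving coefficient
reflects that variable's contribution in combination with the others that
remain.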
HCI example:
In ‘Empirically Validated Web Page Design Metrics’ [3], Ivory et al. identify
11 web page attributes they believe contribute to a web page’s usability. They
believe the attributes interact with each other in contributing to effective
design.
The web
pages studied were entrants in the first round of the 2000 Webby Awards, a
contest where web page design is rigorously evaluated by a panel of experts. The
research team divided the web pages into two groups, “good” web pages, which
scored in the top 33% of the Webby first round of competition, and “not good”
web pages, which scored in the bottom 67% of the Webby award first round of
competition.
The following is a subset of the design attributes the research team believed
contributed to the effectiveness of a web page:
· word count
· emphasized text percentage (e.g. headings, bold or colored text)
· number of links
· page size
· number of graphics
The research team noticed that a web page’s word count, by itself, was not a
good predictor of the web page’s Webby award score. To illustrate, they found
both “good” and “not good” web pages with high word counts, and both groups
were just as likely to have low word counts. The researchers postulated that
other attributes, when combined with a web page’s word count, determined its
Webby award first round score.
The research
team employed backward stepwise multiple regression to discover that if a “good”
web page had low word count, then the web page also had small page size and
used graphics sparingly. They also found when web pages had high word count,
the number of headers and links on the page predicted its success in the Webby
Award competition. The researchers found several other predictive correlations
based on results of the multiple regression test.
If the team applied multiple regression correctly, there should not have been
strong correlation between the independent variables. In multiple regression,
if two independent variables are highly correlated, the regression will return
high standard errors for their coefficients, alerting the researcher that
there is strong correlation between them. When this happens, the researcher
should retain only one of the correlated variables in the remaining analysis
and in the results; if both are left in, the multiple regression results will
be skewed. The results reported in the Ivory paper are not detailed enough to
determine whether any of the independent variables are strongly correlated
with each other.
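The effect of correlated predictors on the standard error can be seen in a
small simulation. This is a sketch on synthetic data, not a reanalysis of the
Ivory study: it fits the same model twice, once with two independent
predictors and once with a second predictor that is nearly a copy of the
first, and compares the coefficient standard errors. The helper name
`coef_standard_errors` is an assumption for illustration.

```python
import numpy as np

def coef_standard_errors(X, y):
    """Standard errors of the OLS slope coefficients (intercept excluded)."""
    design = np.column_stack([X, np.ones(len(y))])
    params, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ params
    sigma2 = resid @ resid / (len(y) - design.shape[1])
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return np.sqrt(np.diag(cov))[:-1]

rng = np.random.default_rng(4)
n = 300
x1 = rng.normal(size=n)
noise = rng.normal(size=n)
x2_indep = rng.normal(size=n)                 # unrelated to x1
x2_corr = x1 + 0.1 * rng.normal(size=n)       # almost identical to x1

se_indep = coef_standard_errors(np.column_stack([x1, x2_indep]),
                                x1 + x2_indep + noise)
se_corr = coef_standard_errors(np.column_stack([x1, x2_corr]),
                               x1 + x2_corr + noise)

print("standard errors, independent predictors:", se_indep)
print("standard errors, correlated predictors: ", se_corr)
```

With the correlated pair, the model cannot tell which of the two variables
deserves the credit, so the uncertainty on both coefficients grows even though
the overall fit is no worse.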
The research
team’s work exemplifies an interesting application of multiple regression. The
team chose the right statistical test because their experiment involved a
single dependent variable, effectiveness of web page design, which can be
measured using a continuous metric. The experiment included multiple
independent variables, web page design attributes, each of which was also
measurable on a continuous scale. The sample size was large: the team
evaluated 428 web pages, and on that basis assumed the variables would
approximately follow a normal distribution.
Values to report when using multiple regression:
· adjusted R-squared value
· standard error
· F value
· significance
· beta coefficient
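Most of these values can be computed directly from a fitted model. The sketch
below assumes "standard error" means the standard error of the estimate and
"beta coefficient" means the standardized coefficient; both readings, the
function name `regression_report`, and the synthetic data are assumptions. The
significance (p-value) of the F statistic is left out because it needs the F
distribution, for example `scipy.stats.f.sf(f_value, k, n - k - 1)` if SciPy
is available.

```python
import numpy as np

def regression_report(X, y):
    """Summary values for an OLS fit of y on the columns of X plus intercept."""
    n, k = X.shape
    design = np.column_stack([X, np.ones(n)])
    params, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ params
    ss_res = resid @ resid
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    dof = n - k - 1  # residual degrees of freedom
    return {
        "adjusted_r2": 1.0 - (1.0 - r2) * (n - 1) / dof,
        "std_error_of_estimate": np.sqrt(ss_res / dof),
        "f_value": (r2 / k) / ((1.0 - r2) / dof),
        # Standardized (beta) coefficients: slope * sd(x) / sd(y).
        "beta": params[:-1] * X.std(axis=0, ddof=1) / y.std(ddof=1),
    }

# Synthetic data with a known relationship and noise of standard deviation 0.5.
rng = np.random.default_rng(5)
X = rng.normal(size=(250, 2))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=250)

report = regression_report(X, y)
for name, value in report.items():
    print(name, value)
```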