Multiple Regression

 

1 Continuous Dependent Variable with normal distribution

Multiple Continuous Independent Variables with normal distribution

 

 

y = b0 + b1 * x1 + b2 * x2 + b3 * x3 + … + bn * xn

 

Multiple regression is the instrument of choice when the researcher believes several independent variables interact to predict the value of a dependent variable. The test measures the degree to which each of the independent variables contributes to the prediction. 
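
As a rough illustration of what fitting the regression equation involves, the sketch below estimates an intercept and one coefficient per independent variable by ordinary least squares. It uses only the Python standard library so it runs anywhere; the helper names (solve, fit) and the data are hypothetical, and in practice a statistics package would normally do this work.

```python
def solve(a, b):
    """Solve the linear system a * x = b by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit(predictors, outcome):
    """Least-squares fit; returns [intercept, coef_1, ..., coef_n]."""
    rows = [[1.0] + list(p) for p in predictors]  # leading 1.0 carries the intercept
    k = len(rows[0])
    # Normal equations: (X'X) * coefficients = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, outcome)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data generated exactly from y = 2 + 3 * x1 - 1 * x2.
predictors = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
outcome = [3, 7, 7, 11, 12]

intercept, coef_x1, coef_x2 = fit(predictors, outcome)
print(round(intercept, 6), round(coef_x1, 6), round(coef_x2, 6))
```

Because the example data contain no noise, the fitted values should match the generating intercept and coefficients up to floating-point rounding; real data would leave a residual.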

Multiple regression assumes:

·        the independent variables are not highly correlated with each other

·        the independent variables predict the dependent variable, but the reverse is not true; the dependent variable cannot predict the values of the independent variables
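
The first assumption above can be checked before any regression is run. The following sketch computes the Pearson correlation for every pair of independent variables and flags pairs above a cutoff. The attribute values are invented for illustration, and the 0.8 cutoff is an assumption rather than a fixed rule.

```python
from itertools import combinations


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (ss_a * ss_b) ** 0.5


# Hypothetical measurements for five pages; not taken from any study.
attributes = {
    "word_count": [100, 200, 300, 400, 500],
    "page_size": [12, 22, 29, 41, 52],
    "link_count": [9, 3, 7, 2, 6],
}

THRESHOLD = 0.8  # assumed cutoff for "highly correlated"; choose to suit the study

# Collect every pair of independent variables whose correlation exceeds the cutoff.
flagged = [
    (a, b)
    for a, b in combinations(attributes, 2)
    if abs(pearson(attributes[a], attributes[b])) > THRESHOLD
]
print(flagged)
```

Any pair that appears in the flagged list is a candidate for dropping one of its two variables before the regression is fitted.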

 

Multiple regression is normally implemented using one of two techniques. The first, forward stepwise regression, starts by measuring the degree to which one independent variable (usually the one the researcher believes is the strongest predictor) correlates with the dependent variable. Additional independent variables are then added to the equation one by one, and the degree (if any) to which each predicts the dependent variable is noted.

 

Backward stepwise regression, a related approach, begins with an examination of the combined effect of all of the independent variables on the dependent variable. One by one, independent variables (usually starting with the weakest predictor) are removed, and a new analysis is performed. The results provide coefficients for each independent variable, signifying the degree to which each one, when combined with the others, contributes to predicting the dependent variable.
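
The two stepwise procedures can be sketched as follows. This is a simplified illustration, not the procedure any particular statistics package uses: it adds or removes variables according to the change in R squared, with an assumed cutoff of 0.01, whereas packages typically use F tests or significance levels, and a researcher may choose the starting variable by judgment. All function names and data are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a * x = b by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(data, outcome, cols):
    """R squared of a least-squares fit on the chosen columns plus an intercept."""
    rows = [[1.0] + [row[c] for c in cols] for row in data]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, outcome)) for i in range(k)]
    coef = solve(xtx, xty)
    mean_y = sum(outcome) / len(outcome)
    ss_resid = sum((y - sum(c * v for c, v in zip(coef, r))) ** 2
                   for r, y in zip(rows, outcome))
    ss_total = sum((y - mean_y) ** 2 for y in outcome)
    return 1 - ss_resid / ss_total


def forward_stepwise(data, outcome, min_gain=0.01):
    """Add the variable with the largest R squared gain until gains fall below min_gain."""
    chosen, remaining, best = [], list(range(len(data[0]))), 0.0
    while remaining:
        gains = {c: r_squared(data, outcome, chosen + [c]) - best for c in remaining}
        pick = max(gains, key=gains.get)
        if gains[pick] < min_gain:
            break
        chosen.append(pick)
        remaining.remove(pick)
        best += gains[pick]
    return chosen


def backward_stepwise(data, outcome, max_loss=0.01):
    """Start with every variable; drop the weakest while the R squared loss stays small."""
    kept = list(range(len(data[0])))
    while len(kept) > 1:
        full = r_squared(data, outcome, kept)
        losses = {c: full - r_squared(data, outcome, [k for k in kept if k != c])
                  for c in kept}
        drop = min(losses, key=losses.get)
        if losses[drop] > max_loss:
            break
        kept.remove(drop)
    return kept


# Hypothetical data: the outcome depends on columns 0 and 1; column 2 is unrelated.
data = [(1, 2, 5), (2, 1, 3), (3, 4, 6), (4, 3, 2),
        (5, 5, 4), (6, 2, 6), (7, 6, 3), (8, 4, 5)]
outcome = [3.5, 6.5, 7.3, 10.7, 12.2, 17.8, 17.4, 21.6]

print(forward_stepwise(data, outcome), backward_stepwise(data, outcome))
```

On this small data set both procedures settle on the same two columns, but that agreement is not guaranteed in general; forward and backward selection can retain different variables when predictors are correlated.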

 

HCI example:

In ‘Empirically Validated Web Page Design Metrics’ [3], Ivory et al. identify 11 web page attributes they believe contribute to a web page’s usability. They believe the attributes interact with each other in contributing to effective design.

 

The web pages studied were entrants in the first round of the 2000 Webby Awards, a contest where web page design is rigorously evaluated by a panel of experts. The research team divided the web pages into two groups: “good” web pages, which scored in the top 33% of the Webby first round of competition, and “not good” web pages, which scored in the bottom 67% of the same round.

 

The following is a subset of the design attributes the research team believed contributed to the effectiveness of a web page:

·        word count

·        emphasized text percentage (e.g. headings, bold or colored text)

·        number of links

·        page size

·        number of graphics

 

The research team noticed a web page’s word count, by itself, was not a good predictor of the web page’s Webby award score. To illustrate, they found both “good” and “not good” web pages with high word counts. Conversely, both “good” and “not good” web pages were just as likely to have low word counts. The researchers postulated other attributes, when combined with a web page’s word count, determined its Webby award first round score.

 

The research team employed backward stepwise multiple regression to discover that if a “good” web page had low word count, then the web page also had small page size and used graphics sparingly. They also found when web pages had high word count, the number of headers and links on the page predicted its success in the Webby Award competition. The researchers found several other predictive correlations based on results of the multiple regression test.

 

If the team applied multiple regression correctly, there should not have been strong correlation among the independent variables. In multiple regression, if two independent variables are highly correlated, stepwise regression will return a high standard error, alerting the researcher that there is strong correlation between them. When this happens, the researcher should retain only one of the correlated variables in the remaining analysis and in the result set; if both are left in, the multiple regression results will be skewed. The results reported in the Ivory paper are not detailed enough to determine whether any of the independent variables are strongly correlated with each other.
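
To make the link between correlated independent variables and a high standard error concrete, the sketch below uses the variance inflation factor for the two-variable case, VIF = 1 / (1 - r^2), where r is the correlation between the two independent variables. The standard error of each coefficient grows by the square root of that factor. The function names and data are hypothetical.

```python
def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (ss_a * ss_b) ** 0.5


def standard_error_inflation(x1, x2):
    """Factor by which correlation between two predictors inflates each
    coefficient's standard error: sqrt(VIF), with VIF = 1 / (1 - r**2)."""
    r = pearson(x1, x2)
    return (1.0 / (1.0 - r ** 2)) ** 0.5


# Hypothetical attribute values for five pages.
word_count = [100, 200, 300, 400, 500]
page_size = [12, 22, 29, 41, 52]   # rises almost in step with word_count
graphic_count = [2, 1, 3, 1, 2]    # uncorrelated with word_count in this sample

print(standard_error_inflation(word_count, graphic_count))
print(standard_error_inflation(word_count, page_size))
```

A factor near 1 means the pairing costs nothing in precision, while a large factor signals that one of the two variables should probably be dropped.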

 

The research team’s work exemplifies an interesting application of multiple regression. The team chose the right statistical test because their experiment involved a single dependent variable, effectiveness of web page design, which can be measured using a continuous metric. The experiment included multiple independent variables, web page design attributes, each of which was also measurable on a continuous scale. The sample was also large: the team evaluated 428 web pages. A sample of that size does not make the variables themselves normally distributed, but it does make the regression estimates approximately normal, so the normality assumption was reasonable to rely on.

 

Values to report when using multiple regression:

·        adjusted R-squared value

·        standard error

·        F value

·        significance (p-value)

·        beta coefficients
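
As a hedged illustration of where these values come from, the sketch below computes most of them from a least-squares fit using only the Python standard library. Several choices here are assumptions: "standard error" is taken to mean the standard error of the estimate (residual standard error), the beta coefficients are the standardized ones, and the significance value is left out because it requires the F distribution, which the standard library does not provide (a statistics package would supply it). The function names and data are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a * x = b by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def report(predictors, outcome):
    """Summary values for a least-squares fit with an intercept."""
    n, k = len(outcome), len(predictors[0])  # observations, independent variables
    rows = [[1.0] + list(p) for p in predictors]
    size = k + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(size)] for i in range(size)]
    xty = [sum(r[i] * y for r, y in zip(rows, outcome)) for i in range(size)]
    coef = solve(xtx, xty)

    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    mean_y = sum(outcome) / n
    ss_total = sum((y - mean_y) ** 2 for y in outcome)
    ss_resid = sum((y - f) ** 2 for y, f in zip(outcome, fitted))
    r2 = 1 - ss_resid / ss_total
    df_resid = n - k - 1  # residual degrees of freedom

    # Standardized beta: raw coefficient rescaled by sd(x_j) / sd(y).
    sd_y = (ss_total / (n - 1)) ** 0.5
    betas = []
    for j in range(k):
        col = [p[j] for p in predictors]
        mean_x = sum(col) / n
        sd_x = (sum((x - mean_x) ** 2 for x in col) / (n - 1)) ** 0.5
        betas.append(coef[j + 1] * sd_x / sd_y)

    return {
        "adjusted_r_squared": 1 - (1 - r2) * (n - 1) / df_resid,
        "standard_error_of_estimate": (ss_resid / df_resid) ** 0.5,
        "f_value": (r2 / k) / ((1 - r2) / df_resid),
        "beta_coefficients": betas,
    }


# Hypothetical noisy data built around y = 2 + 3 * x1 - 1 * x2.
predictors = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5), (6, 2), (7, 6), (8, 4)]
outcome = [3.5, 6.5, 7.3, 10.7, 12.2, 17.8, 17.4, 21.6]

result = report(predictors, outcome)
for name, value in result.items():
    print(name, value)
```

Standardized betas are useful in a report because they put every independent variable on the same scale, which makes their relative contributions directly comparable.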