The correct statistical test for an experiment largely depends on the nature of the independent and dependent variables analyzed. For the purpose of choosing a statistical test, variables fall into two classes: Categorical and Continuous. Categorical variable values cannot be sequentially ordered or differentiated from each other using a mathematical method.
Examples include:
· gender
· ethnicity
· software user interfaces
Continuous variables are numeric values that can be ordered sequentially, and that do not naturally fall into discrete ranges.
Examples include:
· weight
· number of seconds it takes to perform a task
· number of words on a user interface
These concepts can be combined to make a simple model for choosing the correct statistical testi
|
|
Dependent Variable |
||
|
Categorical |
Continuous |
||
|
Independent Variable |
Categorical |
Chi Square |
t-test, ANOVA |
|
Continuous |
LDA, QDA |
Regression |
|
The model is straightforward,
illustrating how the nature of the independent and dependent variables drive
the choice of a statistical test.
With understanding of the basic
model for choosing a statistical test, we can add relevant details to the
model. First, we need to address two additional types of variables, ordinal and
interval.
First, ordinal variables are similar
to continuous variables; they can be ordered sequentially. They are also
similar to categorical variables because they (perhaps) cannot be
differentiated from each other using a mathematical method. For example,
education level is an ordinal variable. The levels of educational achievement
(high school, some college, undergraduate degree, etc.) can be sequenced in the
order in which they are achieved, and when defined as such, cannot be
differentiated from each other mathematically. So the question is, using the
simple model for choosing a statistical test, is an ordinal variable
Categorical or Continuous? The answer depends on how the researcher defines the
variable. When education levels are defined as high school, some college,
undergraduate degree, etc., the levels are categorical, and the researcher
should choose a test for categorical data. The researcher could, however,
define education level in a slightly different way. If the researcher instead
defined education level as years of full-time education, then the variable
takes on the characteristics of a Continuous variable, and the researcher
should choose a statistical test for a Continuous variable.
Interval variables also exhibit
characteristics of Categorical and Continuous variables. Interval variables
fall into equally spaced ranges. For example, an experimenter collects salary
levels using the following ranges:
·
$10,000 – 20,000
·
$20, 000 – 30,000
·
$30,0000 – 40,000, etc.
The values can be numerically sequenced,
so they are similar to Continuous variables. Because the ranges are equally
spaced, though, an unnatural restriction is placed on the values, and thus they
are similar to Categorical values. When it comes to choosing a statistical
test, there is no hard and fast rule for defining interval data as Categorical
or Continuous, and the researcher should use his/her discretion in making the
choice. Granularity of ranges is a reasonable guide for deciding how to define
the data. For example, when intervals are granular, the researcher may decide
to define the variable as Continuous, and for coarser intervals, Categorical.
Number of variables
The number of independent and
dependent variable in the experiment also affect which statistical test to
choose. For example, linear regression applies when the researcher compares 1
continuous dependent variable and 1 continuous independent variable. Multiple
regression applies when the researcher compares 2 or more continuous
independent variables against 1 continuous dependent variable.
The number of levels of a
categorical variable can also drive which statistical test to use. For example,
a researcher wants to compare whether gender affects the amount of time to
perform a task using a given user interface. Gender serves as a 2 level
categorical independent variable because it has 2 possible values: male and
female. Time to complete the task would serve as the continuous dependent
variable. In this example, a 2-sample t-test would be the correct statistical
test. If the categorical independent variable has more than 2 values, however,
one-way ANOVA should be applied. Throughout this guide, the number of
independent and dependent variables needed to run the statistical test are
included right after the section heading.
Normality
For a given set of independent and
dependent variables, often there are two statistical tests available: one
parametric and one non-parametric. Parametric tests are appropriate when
continuous variables follow a normal distribution, and non-parametric tests are
appropriate when they do not. Throughout this guide, the numeric distribution
requirements are included right after the section heading.