An Overview: Choosing the Correct Statistical Test

 

 

The correct statistical test for an experiment largely depends on the nature of the independent and dependent variables analyzed. For the purpose of choosing a statistical test, variables fall into two classes: Categorical and Continuous. Categorical variable values cannot be sequentially ordered or differentiated from each other using a mathematical method.

Examples include:

        gender

        ethnicity

        software user interfaces

 

Continuous variables are numeric values that can be ordered sequentially, and that do not naturally fall into discrete ranges.

Examples include:

        weight

        number of seconds it takes to perform a task

        number of words on a user interface

 

These concepts can be combined to make a simple model for choosing the correct statistical test:

 

 

                                     Dependent Variable
                                     Categorical    Continuous
Independent Variable   Categorical   Chi Square     t-test, ANOVA
                       Continuous    LDA, QDA       Regression

 

The model is straightforward: the nature of the independent and dependent variables drives the choice of a statistical test.
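As a rough illustration, the model above can be written as a small lookup keyed on the two variable types. This is only a sketch: the names TEST_BY_VARIABLE_TYPE and candidate_tests are hypothetical and are not part of this guide.

```python
# Hypothetical lookup table mirroring the simple model above.
# Keys are (independent variable type, dependent variable type).
TEST_BY_VARIABLE_TYPE = {
    ("categorical", "categorical"): ["Chi Square"],
    ("categorical", "continuous"): ["t-test", "ANOVA"],
    ("continuous", "categorical"): ["LDA", "QDA"],
    ("continuous", "continuous"): ["Regression"],
}


def candidate_tests(independent, dependent):
    """Return the candidate tests for a pair of variable types."""
    key = (independent.strip().lower(), dependent.strip().lower())
    if key not in TEST_BY_VARIABLE_TYPE:
        raise ValueError(f"Unknown variable types: {key}")
    return TEST_BY_VARIABLE_TYPE[key]


# Example: categorical independent variable, continuous dependent variable.
print(candidate_tests("Categorical", "Continuous"))
```

The lookup only narrows the choice to a cell of the model; picking between the tests within a cell depends on the details discussed in the rest of this overview.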

 

With an understanding of the basic model for choosing a statistical test, we can add relevant details to it. First, we need to address two additional types of variables: ordinal and interval.

 

Ordinal variables are similar to continuous variables in that they can be ordered sequentially. They are also similar to categorical variables because their values often cannot be differentiated from each other using a mathematical method. For example, education level is an ordinal variable. The levels of educational achievement (high school, some college, undergraduate degree, etc.) can be sequenced in the order in which they are achieved but, when defined this way, cannot be differentiated from each other mathematically.

So, using the simple model for choosing a statistical test, is an ordinal variable Categorical or Continuous? The answer depends on how the researcher defines the variable. When education levels are defined as high school, some college, undergraduate degree, etc., the levels are categorical, and the researcher should choose a test for categorical data. If the researcher instead defines education level as years of full-time education, the variable takes on the characteristics of a Continuous variable, and the researcher should choose a statistical test for a Continuous variable.
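The effect of the definition can be sketched in code. The helper below is a deliberately crude heuristic, not a rule from this guide: it treats a variable as continuous only when every recorded value is a number. The function name and the year counts are made up for illustration.

```python
from numbers import Real


def variable_class(values):
    """Crude heuristic: all-numeric data is 'continuous', anything else 'categorical'."""
    if not values:
        raise ValueError("values must not be empty")
    # bool is excluded because True/False are technically numbers in Python.
    if all(isinstance(v, Real) and not isinstance(v, bool) for v in values):
        return "continuous"
    return "categorical"


# The same concept (education level) recorded in two different ways.
education_as_levels = ["high school", "some college", "undergraduate degree"]
education_as_years = [12, 14, 16]  # hypothetical years of full-time education

print(variable_class(education_as_levels))
print(variable_class(education_as_years))
```

The first definition leads to a test for categorical data and the second to a test for continuous data, even though both describe the same underlying attribute.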

 

Interval variables also exhibit characteristics of Categorical and Continuous variables. Interval variables fall into equally spaced ranges. For example, an experimenter collects salary levels using the following ranges:

 

        $10,000 to $20,000

        $20,000 to $30,000

        $30,000 to $40,000, etc.

 

The values can be numerically sequenced, so they are similar to Continuous variables. Because the ranges are equally spaced, though, an unnatural restriction is placed on the values, and thus they are similar to Categorical values. When it comes to choosing a statistical test, there is no hard and fast rule for defining interval data as Categorical or Continuous, and the researcher should use discretion in making the choice. The granularity of the ranges is a reasonable guide for deciding how to define the data. For example, when the intervals are narrow (fine-grained), the researcher may decide to define the variable as Continuous, and when they are coarse, as Categorical.

 

Number of variables

The number of independent and dependent variables in the experiment also affects which statistical test to choose. For example, simple linear regression applies when the researcher relates 1 continuous dependent variable to 1 continuous independent variable. Multiple regression applies when the researcher relates 2 or more continuous independent variables to 1 continuous dependent variable.
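For the one-independent-variable case, a minimal sketch of an ordinary least-squares fit is shown below. It assumes the usual model y = intercept + slope * x; the function name and the data are invented for illustration, and multiple regression would normally be handled by a statistics package rather than by hand.

```python
from statistics import mean


def fit_simple_linear_regression(x, y):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar, y_bar = mean(x), mean(y)
    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Made-up data: words on a user interface vs. seconds to complete a task.
words = [10, 20, 30, 40]
seconds = [12.0, 19.0, 31.0, 38.0]
slope, intercept = fit_simple_linear_regression(words, seconds)
print(f"seconds = {intercept:.2f} + {slope:.2f} * words")
```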

 

The number of levels of a categorical variable can also drive which statistical test to use. For example, suppose a researcher wants to test whether gender affects the amount of time needed to perform a task using a given user interface. Gender serves as a 2-level categorical independent variable because, in this example, it has 2 possible values: male and female. Time to complete the task serves as the continuous dependent variable. In this example, a 2-sample t-test would be the correct statistical test. If the categorical independent variable has more than 2 levels, however, one-way ANOVA should be applied. Throughout this guide, the number of independent and dependent variables needed to run each statistical test is listed right after the section heading.
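The rule on levels can be sketched as follows. The helper names and the task times are hypothetical, and the t statistic shown assumes the classic equal-variance (pooled) form of the 2-sample t-test, which is one common variant rather than something this guide specifies.

```python
from math import sqrt
from statistics import mean, variance


def choose_test_by_levels(group_labels):
    """Pick a test from the number of distinct levels of the categorical variable."""
    levels = len(set(group_labels))
    if levels < 2:
        raise ValueError("need at least 2 levels to compare groups")
    return "2-sample t-test" if levels == 2 else "one-way ANOVA"


def pooled_t_statistic(a, b):
    """Equal-variance 2-sample t statistic (assumed variant, see lead-in)."""
    n1, n2 = len(a), len(b)
    # Pooled sample variance, weighted by each group's degrees of freedom.
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / sqrt(sp2 * (1 / n1 + 1 / n2))


# Made-up task times in seconds for two groups.
male_times = [30.0, 32.0, 34.0]
female_times = [28.0, 30.0, 32.0]
labels = ["male"] * 3 + ["female"] * 3

print(choose_test_by_levels(labels))
print(pooled_t_statistic(male_times, female_times))
```

The statistic alone does not decide significance; it would still be compared against a t distribution with n1 + n2 - 2 degrees of freedom.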

 

Normality

For a given set of independent and dependent variables, often there are two statistical tests available: one parametric and one non-parametric. Parametric tests are appropriate when continuous variables follow a normal distribution, and non-parametric tests are appropriate when they do not. Throughout this guide, the numeric distribution requirements are included right after the section heading.
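The pairing idea can be sketched as a lookup. The counterparts listed here (Mann-Whitney U for the 2-sample t-test, Kruskal-Wallis for one-way ANOVA) are commonly cited pairings and are an assumption of this sketch, not a list taken from this guide. How normality is assessed is left to the caller.

```python
# Assumed, commonly cited non-parametric counterparts (not from this guide).
NONPARAMETRIC_COUNTERPART = {
    "2-sample t-test": "Mann-Whitney U test",
    "one-way ANOVA": "Kruskal-Wallis test",
}


def pick_test(parametric_test, data_is_normal):
    """Return the parametric test if the data are normal, else its counterpart."""
    if parametric_test not in NONPARAMETRIC_COUNTERPART:
        raise ValueError(f"No pairing recorded for {parametric_test!r}")
    if data_is_normal:
        return parametric_test
    return NONPARAMETRIC_COUNTERPART[parametric_test]


print(pick_test("2-sample t-test", data_is_normal=True))
print(pick_test("2-sample t-test", data_is_normal=False))
```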

 

 



 

The views and opinions expressed in this page are strictly those of the page author.
The contents of this page have not been reviewed or approved by the University of Minnesota.