What is the term for the degree to which a measure or scale?

Levels of Measurement

In 1946, Harvard University psychologist Stanley Smith Stevens introduced the theory of the four levels of measurement in a Science article entitled "On the Theory of Scales of Measurement." In this famous article, Stevens argued that all measurement is conducted using four measurement levels. The four levels of measurement, in order of complexity, are:

Nominal

Ordinal

Interval

Ratio

Here is a simple trick for remembering the four levels of measurement: Think "NOIR." Noir is the French word for black. "N" is for nominal, "O" is for ordinal, "I" is for interval, and "R" is for ratio.

Categorical and Quantitative Measures:

The nominal and ordinal levels are considered categorical measures while the interval and ratio levels are viewed as quantitative measures.

Knowing the level of measurement of your data is critically important as the techniques used to display, summarize, and analyze the data depend on their level of measurement.

Let us turn to each of the four levels of measurement.

A. The Nominal Level

The nominal level of measurement is the simplest level. "Nominal" means "existing in name only." With the nominal level of measurement all we can do is to name or label things. Even when we use numbers, these numbers are only names. We cannot perform any arithmetic with nominal level data. All we can do is count the frequencies with which the things occur.

With the nominal level of measurement, no meaningful order is implied. This means we can re-order the categories of a variable without affecting how we interpret the data.

Here are some examples of nominal level data:

The number on an athlete's uniform
Your social security number
Your Visa card number
Your political party affiliation
The city where you were born
Your religion
The color of your eyes
The color of your hair
The color of the candies in a bag of M&Ms

With the nominal level of measurement, we are limited in the types of analyses we can perform. We can count the frequencies of items of interest, but sorting the categories tells us nothing, because no order is implied. We can identify the mode, the most frequently occurring value or values. We can also perform a variety of non-parametric hypothesis tests, which make few assumptions about the distribution of the population from which the data are drawn. But we cannot calculate common statistical measures like the mean, median, variance, or standard deviation.
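Here is a minimal Python sketch of the two operations that the nominal level supports, counting frequencies and finding the mode. The candy colors are hypothetical sample data invented for illustration.

```python
from collections import Counter

# Hypothetical sample: colors of candies drawn from a bag (nominal data).
candies = ["red", "blue", "green", "blue", "brown", "red", "blue", "yellow"]

# Counting frequencies is the basic operation available at the nominal level.
frequencies = Counter(candies)

# The mode is every category that shares the highest count.
highest = max(frequencies.values())
modes = sorted(color for color, n in frequencies.items() if n == highest)

print(frequencies["blue"])  # 3
print(modes)                # ['blue']
```

Note that nothing here adds, averages, or ranks the colors. The labels are only names.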

B. The Ordinal Level

The ordinal level of measurement is a more sophisticated scale than the nominal level. This scale enables us to order the items of interest using ordinal numbers. Ordinal numbers denote an item's position or rank in a sequence: First, second, third, and so on. But, we lack a measurement of the distance, or intervals, between ranks. For example, let's say we observed a horse race. The order of finish is Rosebud #1, Sea Biscuit #2, and Kappa Gamma #3. We lack information about the difference in time or distance that separated the horses as they crossed the finish line.

Here are some examples of ordinal level data:

Order of finish in a race or a contest
Letter grades: A, B, C, D, or F
Ranking of chili peppers on a scale of hot, hotter, hottest
A student's year of study in high school or college: Freshman, Sophomore, Junior, and Senior
Stage of cancer: Stage I, II, III, or IV
Level of agreement: Strongly Disagree, Disagree, Neutral, Agree, Strongly Agree

With the ordinal level of measurement, we can count the frequencies of items of interest and sort them in a meaningful rank order. As noted, however, we cannot measure the distance between ranks. In terms of statistical analyses, we can count the frequency of an event and calculate the median, percentiles, deciles, and quartiles. We can also perform a variety of non-parametric hypothesis tests. But we cannot calculate common statistical measures like the mean, variance, or standard deviation, and we cannot perform parametric hypothesis tests using z values, t values, and F values.
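As a rough illustration, the sketch below sorts ordinal responses and finds their median category in Python. The survey responses are hypothetical, and the numeric ranks are only positions in the ordering, not measured distances.

```python
import statistics

# Ordered categories from the agreement scale; the list index is the rank.
LEVELS = ["Strongly Disagree", "Disagree", "Neutral", "Agree", "Strongly Agree"]
rank = {label: i for i, label in enumerate(LEVELS)}

# Hypothetical survey responses (ordinal data).
responses = ["Agree", "Neutral", "Strongly Agree", "Disagree",
             "Agree", "Agree", "Neutral"]

# Sorting is meaningful because the categories have an inherent order.
ordered = sorted(responses, key=lambda label: rank[label])

# median_low always returns an actual data point, so two categories
# are never averaged together.
median_rank = statistics.median_low(rank[label] for label in responses)
median_label = LEVELS[median_rank]

print(ordered[0], "...", ordered[-1])  # Disagree ... Strongly Agree
print(median_label)                    # Agree
```

We deliberately avoid computing a mean of the ranks, since that would treat the gaps between categories as equal.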

C. The Interval Level

With the interval level of measurement we have quantitative data. Like the ordinal level, the interval level has an inherent order. But, unlike the ordinal level, we do have the distance between intervals on the scale. The interval level, however, lacks a real, non-arbitrary zero.

To repeat, here are three characteristics of the interval level:

The values have a meaningful order
The distances between the ranks are measurable
There is no "true" or natural zero

The classic example of the interval scale is temperature measured on the Fahrenheit or Celsius scales. Let's suppose today's high temperature is 60°F and thirty days ago the high temperature was only 30°F. We can say that the difference between the high temperatures on these two days is 30 degrees. But, because our measurement scale lacks a real, non-arbitrary zero, we cannot say the temperature today is twice as warm as the temperature thirty days ago.
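One way to see why the ratio is not meaningful is to convert the same two temperatures to Celsius. The short Python sketch below uses the standard conversion formula; the variable names are our own.

```python
import math

def fahrenheit_to_celsius(temp_f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32) * 5 / 9

today_f, earlier_f = 60.0, 30.0
today_c = fahrenheit_to_celsius(today_f)
earlier_c = fahrenheit_to_celsius(earlier_f)

# Differences survive the change of scale: they differ only by the
# fixed factor 5/9.
diff_f = today_f - earlier_f
diff_c = today_c - earlier_c
print(math.isclose(diff_c, diff_f * 5 / 9))  # True

# Ratios do not survive: "twice as warm" in Fahrenheit becomes a
# negative ratio in Celsius.
ratio_f = today_f / earlier_f
ratio_c = today_c / earlier_c
print(ratio_f, round(ratio_c, 2))  # 2.0 -14.0
```

Because the ratio depends entirely on where each scale places its arbitrary zero, it tells us nothing about the temperatures themselves.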

In addition to temperature on the Fahrenheit or Celsius scales, examples of interval scale measures include:

Scores on the College Board's Scholastic Aptitude Test, which measures a student's scores on reading, writing, and math on a scale of 200 to 800
Intelligence Quotient scores
Dates on a calendar
The heights of waves in the ocean
Longitudes on a globe or map
Shoe size

With the interval level of measurement, we can perform most arithmetic operations. We can calculate common statistical measures like the mean, median, variance, and standard deviation. But, because we lack a non-arbitrary zero, we cannot calculate meaningful ratios between values, such as proportions, percentages, and fractions. We can also perform all manner of hypothesis tests as well as basic correlation and regression analyses.

D. The Ratio Level

The last and most sophisticated level of measurement is the ratio level. As with the ordinal and interval levels, the data have an inherent order. And, like the interval level, we can measure the intervals between the ranks with a measurable scale of values. But, unlike the interval level, we now have a meaningful zero. The addition of a non-arbitrary zero allows us to calculate the numerical relationship between values using ratios: fractions, proportions, and percentages.

An example of the ratio level of measurement is weight. A person who weighs 150 pounds weighs twice as much as a person who weighs only 75 pounds and half as much as a person who weighs 300 pounds. We can calculate ratios like these because the scale for weight in pounds starts at zero pounds.

In addition to weight, examples of ratio scale measures include:

Height
Income
Distance travelled
Time elapsed or time remaining
Money in your bank account, wallet, or pocket

With the ratio level of measurement, we can perform all arithmetic operations including proportions, ratios, percentages, and fractions. In terms of statistical analyses, we can calculate the mean, geometric mean, harmonic mean, median, mode, variance, and standard deviation. We can also perform all manner of hypothesis tests as well as correlation and regression analyses.
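The following Python sketch revisits the weight example to show that ratios are unchanged by a switch of units, and that the additional means become available. It assumes Python 3.8 or later for `statistics.geometric_mean`, and the conversion factor is an approximation.

```python
import statistics

LB_PER_KG = 2.20462  # approximate pounds per kilogram

# The three weights from the example above, in pounds.
weights_lb = [75.0, 150.0, 300.0]
weights_kg = [w / LB_PER_KG for w in weights_lb]

# With a true zero, the ratio is the same in any unit.
ratio_lb = weights_lb[1] / weights_lb[0]
ratio_kg = weights_kg[1] / weights_kg[0]
print(round(ratio_lb, 6), round(ratio_kg, 6))  # 2.0 2.0

# Means that rely on ratios are now meaningful.
arithmetic = statistics.mean(weights_lb)
geometric = statistics.geometric_mean(weights_lb)
harmonic = statistics.harmonic_mean(weights_lb)
print(round(arithmetic, 2), round(geometric, 2), round(harmonic, 2))
# 175.0 150.0 128.57
```

Contrast this with the temperature example, where the same comparison gives a different ratio on each scale.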


G & F Chapter 3

Two major measurement issues to consider when planning a study:

A. No one-to-one relationship between variable and measurement

B. Measurement chosen can influence the measurements and the interpretation of the variables

Types of variables involved in research

–Well-defined, easily observed, and easily measured

Examples: height and weight

–Intangible, abstract attributes

Examples: motivation or self-esteem

Measurement is more complicated

II. Constructs and Operational Definitions

A. Theories and Constructs

Construct—a hypothetical attribute or mechanism that helps explain and predict behavior in a theory

External stimulus factors >>Construct>>Behavior

The construct itself cannot be directly observed or measured

Therefore, observe and measure the external factors and the behaviors that are associated theoretically with the construct.

Operational Definition: specifies a measurement procedure for measuring external, observable behavior. The resulting measurements are used as both a definition and a measurement of the construct.

When doing research, don't re-invent the wheel.

So if we are investigating the effect of watching violent television programs on children’s aggressive behavior:

We need to operationalize “violence” on television.

We need to operationalize “aggressive behavior.”

Another example:
Which of the following might be used as an operational definition of “assertiveness?”

The number of times a person makes requests or states his or her feelings over the course of a one-hour interaction.

An appearance of confidence and ease in social situations.

III. Validity of Measurement

How can we be sure that the measurements obtained from an operational definition actually represent the intangible construct?

Validity: the degree to which the measurement process measures the variable that it claims to measure.

Six commonly used definitions of validity

1. Face Validity: Does the measurement technique look like it measures the variable that it claims to measure?

2. Concurrent Validity: Are the scores from a new measurement technique directly related to the scores obtained from another, better-established procedure for measuring the same variable?

3. Predictive Validity: Do the measurements of a construct accurately predict behavior according to the theory?

4. Construct Validity: Do measurements of a variable behave in exactly the same way as the variable itself?

5. Convergent Validity: Is there a strong relationship between the scores obtained from two different methods of measuring the same construct?

6. Divergent Validity: Do measurements of two distinct constructs produce unrelated scores?
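A rough sketch of how convergent and divergent validity might be checked with a correlation coefficient. All scores below are hypothetical, and the variable names (including the unrelated "vocabulary" construct) are assumptions made for illustration only.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical scores for six participants.
assertiveness_self_report = [12, 15, 9, 20, 17, 11]  # questionnaire
assertiveness_observed = [5, 7, 3, 10, 8, 3]         # behavior count
vocabulary_score = [32, 28, 29, 29, 33, 29]          # distinct construct

# Convergent: two methods, same construct, so expect a strong relationship.
convergent_r = pearson_r(assertiveness_self_report, assertiveness_observed)

# Divergent: distinct constructs, so expect little or no relationship.
divergent_r = pearson_r(assertiveness_self_report, vocabulary_score)

print(round(convergent_r, 2), round(divergent_r, 2))  # 0.98 0.12
```

With real data the sample would need to be far larger than six before either correlation could be trusted.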

IV. Reliability of Measurement

Reliability is the stability or the consistency of measurement

Each individual measurement has an element of error: Measured Score = True Score + Error

The inconsistency in a measurement comes from error.
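A worked example of Measured Score = True Score + Error, sketched in Python. The true score and the error values are hypothetical (a true score is not observable in practice), and the sketch assumes the errors are unrelated to the true score.

```python
# Hypothetical true score; in real research this cannot be observed directly.
true_score = 100

# Hypothetical errors on six repeated measurements.
errors = [-3, 2, 1, -1, 4, -2]

# Each measured score is the true score plus that occasion's error.
measured = [true_score + e for e in errors]
print(measured)  # [97, 102, 101, 99, 104, 98]

# Any single measurement may be off by several points...
worst_single_error = max(abs(m - true_score) for m in measured)

# ...but errors in opposite directions partly cancel in the average.
mean_measured = sum(measured) / len(measured)
error_of_mean = abs(mean_measured - true_score)

print(worst_single_error, round(error_of_mean, 2))  # 4 0.17
```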

A. Common sources of error are:

Observer error

Environmental changes

Participant changes

B. Types and Measures of Reliability

Successive measurements--test-retest reliability

Simultaneous measurements--inter-rater reliability or inter-observer reliability

Internal consistency—split-half reliability
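A minimal sketch of two of these measures: test-retest reliability as a correlation between two administrations, and inter-rater reliability as simple percent agreement. The scores and observer codes are hypothetical. Percent agreement is used here only because it is the simplest option; it does not correct for chance agreement.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

def percent_agreement(codes_a, codes_b):
    """Proportion of observations on which two raters gave the same code."""
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)

# Successive measurements: same six participants tested twice (hypothetical).
first_test = [10, 14, 8, 19, 16, 12]
retest = [11, 13, 9, 18, 17, 12]
test_retest_r = pearson_r(first_test, retest)

# Simultaneous measurements: two observers code the same eight intervals.
rater_a = ["agg", "not", "agg", "agg", "not", "not", "agg", "not"]
rater_b = ["agg", "not", "agg", "not", "not", "not", "agg", "not"]
agreement = percent_agreement(rater_a, rater_b)

print(round(test_retest_r, 2))  # close to 1 means consistent scores
print(agreement)                # 0.875
```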

C. What is the Relationship between Reliability and Validity?

Partially related

Must a test be reliable in order to be valid?

Partially independent

Must a test be valid in order to be reliable?

V. Scales of Measurement

The process of measuring a variable requires a set of categories called a scale of measurement and a process that classifies each individual into one category.

Scale | Characteristics | Examples
Nominal | Label and categorize; no quantitative distinctions | Gender; diagnosis; experimental or control
Ordinal | Categorizes observations; categories organized by size or magnitude | Rank in class; clothing sizes (S, M, L, XL); Olympic medals
Interval | Ordered categories; intervals between categories of equal size; arbitrary or absent zero point | Temperature; IQ; golf scores (above/below par)
Ratio | Ordered categories; equal intervals between categories; absolute zero point | Number of correct answers; time to complete task; gain in height and/or weight since last year
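The four scales build on one another, so the statistics permitted at each level can be sketched as a simple lookup. The names `Scale`, `MINIMUM_SCALE`, and `is_permitted` are hypothetical helpers, and the mapping follows the usual textbook convention rather than a fixed rule.

```python
from enum import IntEnum

class Scale(IntEnum):
    """Scales of measurement, ordered from least to most informative."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4

# Lowest scale at which each statistic is conventionally meaningful.
MINIMUM_SCALE = {
    "frequency count": Scale.NOMINAL,
    "mode": Scale.NOMINAL,
    "median": Scale.ORDINAL,
    "mean": Scale.INTERVAL,
    "standard deviation": Scale.INTERVAL,
    "ratio of two scores": Scale.RATIO,
}

def is_permitted(statistic, scale):
    """Return True if the statistic is meaningful for data on this scale."""
    return scale >= MINIMUM_SCALE[statistic]

print(is_permitted("median", Scale.NOMINAL))                # False
print(is_permitted("mean", Scale.INTERVAL))                 # True
print(is_permitted("ratio of two scores", Scale.INTERVAL))  # False
```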

VI. Modalities of Measurement

A. Self-Report Measures

Ask the participant a series of questions, i.e., administer a survey

Most direct way to assess a construct, but participants may distort responses

B. Physiological Measures

Look at how the underlying construct affects physiology

Objective measure, but equipment may be expensive or setting may be unnatural

C. Behavioral Measures

Observe and measure overt behavior

Wide variety of options: e.g., "mental alertness" could be operationally defined by behaviors such as reaction time, reading comprehension, logical reasoning ability, or ability to focus attention.

Behavioral Observation--prepared set of behavioral categories is crucial

Observation without intervention

Naturalistic observation

Ethology

Observation with intervention

Why intervene?

Participant observation

Structured observation

Field experiments

VII. Other Aspects of Measurement

Whenever possible, use multiple measures of a construct

Desynchrony--lack of agreement between two measures; confuses the interpretation of the results

Sensitivity and Range Effects

The Dependent Variable

Must be sensitive enough to show variations in performance as a result of variations in the IV.

Ceiling effect: Performance high at all levels of the IV

Floor effect: Performance low at all levels of the IV
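A rough sketch of how one might screen for range effects: flag the DV when the mean at every level of the IV sits near the top or bottom of the possible score range. The function name `range_effect`, the 10% margin, and the group means are all assumptions chosen for illustration, not an established criterion.

```python
def range_effect(group_means, min_score, max_score, margin=0.10):
    """Label a possible ceiling or floor effect, or return None.

    margin is the fraction of the score range treated as "near" the
    limit; 0.10 is an arbitrary illustrative choice.
    """
    band = margin * (max_score - min_score)
    if all(mean >= max_score - band for mean in group_means):
        return "ceiling"
    if all(mean <= min_score + band for mean in group_means):
        return "floor"
    return None

# Hypothetical mean scores (0-10 test) at three levels of the IV.
print(range_effect([9.4, 9.6, 9.8], 0, 10))  # ceiling
print(range_effect([0.5, 0.3, 0.8], 0, 10))  # floor
print(range_effect([3.0, 6.0, 9.0], 0, 10))  # None
```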

Participant Reactivity

Demand characteristics

Experimenter Bias

The tendency of an experimenter to unintentionally distort the procedures or results of an experiment based on the expected or desired outcome of the research.

Methods have been devised to help counteract these normal human tendencies that create bias:

Using blind observers who record data without knowing what the researcher is studying

Using a placebo control

Single-blind vs. Double-blind research