(from Weiss, D. J., *Analysis of Variance and Functional
Measurement: A Practical Guide*, published by Oxford University Press, 2006)

*A priori*: Latin
phrase for designated in advance. The usual statistical hypotheses are regarded
as having been specified without reference to the data. Tests are carried out
as though the hypotheses had been proposed in advance even if the researcher
had been lax about specification. The alternative to prior designation is a
post-hoc test, so named because the hypothesis to be evaluated is constructed
as a result of something interesting having emerged in the data.

*Alternative hypothesis*:
In general, the proposition expressing the particular way the null
hypothesis is held to be false.
Sometimes referred to as the motivated hypothesis, it usually reflects a
difference the researcher hopes to demonstrate. In the ANOVA setting, the usual alternative hypothesis is that the
true means of the various groups are unequal.

*Confounded*: The situation in which the effect of a
controlled variable is inextricably mixed with that of another, uncontrolled
variable. For example, if we select children from a schoolyard and form experimental
groups according to their heights, the groups would be very likely to differ in
age as well.

*Critical value*: A
value taken from a statistical table. It serves as the criterion for
determining whether the corresponding data-based statistic is large enough to
be considered evidence against the null hypothesis.
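A critical value can also be approximated without a table. The hypothetical sketch below (not from the text) simulates the null distribution of the F-ratio for a one-way ANOVA; the design, 3 groups of 10 scores each (df = 2 and 27), and the .05 significance level are illustrative assumptions.

```python
import random
import statistics

random.seed(0)

def f_ratio(groups):
    """Between-groups mean square divided by within-groups mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    ms_between = sum(len(g) * (statistics.mean(g) - grand) ** 2
                     for g in groups) / (k - 1)
    ms_within = sum(sum((x - statistics.mean(g)) ** 2 for x in g)
                    for g in groups) / (n - k)
    return ms_between / ms_within

# Generate F-ratios from data for which the null hypothesis is true.
null_fs = sorted(
    f_ratio([[random.gauss(0, 1) for _ in range(10)] for _ in range(3)])
    for _ in range(10000)
)

# The empirical 95th percentile approximates the tabled critical value.
critical = null_fs[int(0.95 * len(null_fs))]
print(round(critical, 2))
```

With this many simulated experiments, the estimate should land near the tabled F(2, 27) critical value of about 3.35.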

*Degrees of freedom* (*df*): A structural aspect of the experimental design, determined
by the number of scores and not their values.
The general rule is that each observation generates one *df*.
The number of elements in each subdivision of the design determines the
allocation of *df*s to structural
units. Some statistical tests, among
them F and t, require *df*
specification to determine critical values.

*Dependent variable*: An index of the behavior of
interest. In univariate analysis (ANOVA), there is one behavior selected by the
researcher to reflect an interesting psychological property. In MANOVA (multivariate analysis of variance),
there is more than one dependent variable.

*Group*: A collection of participants given the
same experimental treatment. The group
is formed within the research project and need not have existed prior to the
experiment. Participants in a group may actually generate their data on an
individual basis and never come into contact with one another. It is their
scores that are grouped for analytic purposes.

*Independent variable*: A treatment under experimental
control. The researcher imposes several
(at least two) different values of the independent variable on participants,
and compares the results to determine whether the variable differentially
affects behavior. An experiment must have at least one independent variable,
and may have several.

*Interval scale*: A
measuring system having the property that intervals between the numbers
assigned accurately reflect intervals between the magnitudes of the objects
being measured. The important implication is that a given difference between
two scores is independent of location along the scale.

*Invalid*: When applied to a measurement scale, a lack
of validity means that the numbers assigned do not reflect the true magnitude
of the property being measured.
(Pronounced with the second syllable accented, to distinguish this
adjective from the homographic noun designating a person in poor health.)

*Nominal scale*: A
measuring system in which the numbers used do not reflect the magnitude of the
objects being measured. The numbers
serve only to label or classify, not to order in any way.

*Null hypothesis*: In
the ANOVA context, a statement that there is no difference among a set of true
means. In other statistical settings,
the statement similarly refers to an absence of interesting differences. Because observed means are estimates of the
true values, statistical machinery is invoked to determine the validity of the
hypothesis. The fallibility of data
implies that some incorrect decisions are inevitable.

*One-tailed test:* A significance test in which a statistic
obtained from the data is compared to the critical value in a designated tail
of the probability distribution. Used
when the alternative hypothesis specifies the sign the test
statistic must achieve. Synonymous with
directional test. Because F-ratios
must be positive, ANOVA employs only one-tailed tests.

*Ordinal scale:* A
measuring system in which the numbers used are monotonically related to the
magnitude of the objects being measured.
Monotonicity means the larger the object, the larger the number.

*Power*: The ability
of a statistical test to detect the falsity of the null hypothesis. Power is expressed as 1 - β,
where β
is the probability of a Type II Error.
In general, the specific value of β is unknown, but it is
affected by many aspects of the experiment as well as by the statistical
procedures chosen by the researcher.
Obviously, having a powerful experiment is desirable.
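Power can be estimated by simulation. The following hypothetical sketch (not from the text) uses a two-group z test with known standard deviation; the group size, true difference, and two-tailed .05 criterion (critical z of 1.96) are all illustrative assumptions.

```python
import math
import random

random.seed(1)

n = 20            # scores per group (assumption)
true_diff = 0.8   # true difference between group means (assumption)
trials = 5000
rejections = 0

for _ in range(trials):
    g1 = [random.gauss(true_diff, 1) for _ in range(n)]
    g2 = [random.gauss(0, 1) for _ in range(n)]
    diff = sum(g1) / n - sum(g2) / n
    z = diff / math.sqrt(2 / n)   # standard error of the difference
    if abs(z) > 1.96:             # two-tailed test at alpha = .05
        rejections += 1

# Power is the proportion of simulated experiments that reject the
# (false) null hypothesis.
power = rejections / trials
print(round(power, 2))
```

Here the null hypothesis is false by construction, so each rejection is a correct decision and each failure to reject is a Type II Error.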

*Probability distribution*: The possible outcomes of
an experiment along with their associated probabilities. Specific probability distributions, such as
the normal, t, and F, have been derived from sets of assumptions about how
scores are generated and the way they are combined. When the assumptions are correct, the probability distribution
may be used to determine the critical value for a significance test. In practice, the distribution is presumed
applicable given the structure of the experiment and the assumptions are not
specifically checked.

*p-value*: The probability, computed under the null hypothesis, of obtaining a value at least as
extreme as the observed statistic, such as an F-ratio. If the *p*-value
is less than the significance level, then the result constitutes evidence
against the null hypothesis.

*Ratio scale*: A measuring system having the property
that ratios of the numbers assigned accurately reflect ratios of the magnitudes
of the objects being measured. This
means that a behavior assigned a number of, say, 10, must have twice the
magnitude of a behavior assigned a number of 5. Effectively, this requires that
there be a true zero, a score that means the behavior is absent. Ratio scales
in psychology are almost always physical scales employed to capture behavior,
such as the use of response latency to measure task difficulty. Typical
“psychological” dimensions, e.g., intelligence or attitude, do not have a
“zero” point.

*Significance level*:
A value selected by the researcher to fix the probability of a Type I
Error. Denoted as α,
this arbitrary choice is always a small probability. The most common choice in behavioral research is .05, while .01
is occasionally used. The significance
level determines the critical value
drawn from the statistical table.


*Significant*: A
difference among observed values, such as group means, deemed large enough to
be considered reliable. Determination
of significance comes from a statistical test, and carries with it the
probabilistic characterization of significance level. As used in the statistics context, the word does not have its
everyday connotation of importance.

*Subscript notation*:

X_{1}, X_{2}, X_{3},..., X_{n}

The subscripts attached to each of the X's in a set offer a
handy way to refer to any of them or to all n of them at once. As an example, suppose we have measured the
weights of five objects: 12 grams, 8 grams,
5 grams, 10 grams, 8 grams. If we refer
to the weights as X's, we would say X_{1} = 12, X_{2} = 8, and so on. n would be five, since there are five
scores. What is the sum of X_{3} + X_{4}? (15 grams).
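Subscript notation maps directly onto a list in a programming language. The sketch below uses the five weights from the example; note that Python lists index from 0, so X[2] plays the role of the text's X_{3}.

```python
# The five weights from the example, stored so that X[i] corresponds
# to X_{i+1} in the text's 1-based subscript notation.
X = [12, 8, 5, 10, 8]

n = len(X)                 # n = 5, since there are five scores
x3_plus_x4 = X[2] + X[3]   # X_{3} + X_{4} in the text's notation
print(n, x3_plus_x4)
```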

*Substantive variable*: An element of theoretical
interest within the experiment. The
researcher plans to determine whether different values of the substantive
variable produce different responses.

*Summation*: The
Greek capital letter sigma, Σ, is used
as a shorthand way of indicating the sum of a set of numbers expressed with *subscript notation*. It sometimes
appears with an index and with limits, e.g.,

Σ_{i=1}^{4} i

This is read as “the sum of the i's, where i goes from 1 to 4”; thus the sum is 10. More frequently, the index appears as a subscript. Using the weights we measured for our discussion of subscript notation,

Σ_{i} X_{i} = 12 + 8 + 5 + 10 + 8 = 43 grams

When no index or limits are furnished, the summation is considered to extend over all of the scores under discussion; so ΣX = 43 grams.
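Sigma notation corresponds directly to Python's built-in sum(), as this short sketch shows using the weights from the subscript-notation example.

```python
# "The sum of the i's, where i goes from 1 to 4" (range's upper
# limit is exclusive, so range(1, 5) yields 1, 2, 3, 4).
limits_sum = sum(range(1, 5))

# Summing over all the scores under discussion, i.e. sigma-X:
X = [12, 8, 5, 10, 8]
total = sum(X)
print(limits_sum, total)
```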

*Two-tailed test: *A
significance test in which two critical regions are defined, one in each tail
of the probability distribution.
Significance is achieved when the observed value of the test statistic
is more extreme than either critical value.
A two-tailed test is used when the sign of the test statistic is not
specified by the alternative hypothesis.
Thus a two-tailed test would be used if the alternative hypothesis were that
two quantities are unequal, without prior specification of which is
larger. Synonymous with nondirectional
test.

*Type I Error*: The researcher's data-based decision
that the null hypothesis is false when it is really true. This incorrect conclusion is not the result
of a mistake in the analysis. By
chance, a “large” dispersion among the means (which should happen with
probability α)
has actually occurred this time.

*Type II Error*: The
researcher's data-based decision that the null hypothesis is true when it is
really false. While this incorrect conclusion
could be the result of bad luck as in the case of a Type I Error, a more
worrisome possibility is that the experiment had low power.

*Variance:* Variance
is a formal measure of the dispersion, or scatter, among the numbers in a set
of data. The variance is a weighted
average of the differences between the numbers, with larger discrepancies
contributing more heavily. Other measures of dispersion are also plausible, but
the variance is the most popular because of its "mathematical
tractability" and its central role in statistical theory. It is convenient to define variance in terms
of the discrepancies from the mean (X̄); basically, this is equivalent to an alternative definition
considering the differences between all pairs of numbers, but is much simpler
computationally. Thus the definition:

Variance = Σ(X - X̄)² / (n - 1)

This definition may be seen to sum the deviations from the mean; squaring each deviation accomplishes the weighting such that large deviations contribute very heavily to the summation. Dividing the sum by n - 1 accomplishes the averaging. Division by n - 1 rather than by n allows the variance in a sample to be an unbiased estimator of the variance in the population (further discussion of estimators is unnecessary for our purposes, but may be found in an intermediate-level text on mathematical statistics). Since data always should be regarded as sample values, this is the appropriate definition in an empirical field. In the world of ANOVA, the term “mean square” is used as a functional equivalent of variance.
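The definition can be carried out step by step, again using the weights from the subscript-notation example; as a check, Python's standard library statistics.variance uses the same n - 1 divisor.

```python
import statistics

# The five weights from the subscript-notation example.
X = [12, 8, 5, 10, 8]
n = len(X)
mean = sum(X) / n   # 8.6 grams

# Sum of squared deviations from the mean, divided by n - 1.
variance = sum((x - mean) ** 2 for x in X) / (n - 1)

# statistics.variance computes the sample variance with the same
# n - 1 divisor, so the two results agree.
print(round(variance, 2), round(statistics.variance(X), 2))
```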