Terms from Introductory Statistics

(from Weiss, D. J., Analysis of Variance and Functional Measurement: A Practical Guide, published by Oxford University Press, 2006)



A priori: Latin phrase for designated in advance. The usual statistical hypotheses are regarded as having been specified without reference to the data. Tests are carried out as though the hypotheses had been proposed in advance even if the researcher had been lax about specification. The alternative to prior designation is a post-hoc test, so named because the hypothesis to be evaluated is constructed as a result of something interesting having emerged in the data.


Alternative hypothesis:  In general, the proposition expressing the particular way the null hypothesis is held to be false.  Sometimes referred to as the motivated hypothesis, it usually reflects a difference the researcher hopes to demonstrate.  In the ANOVA setting, the usual alternative hypothesis is that the true means of the various groups are unequal.


Confounded: The situation in which the effect of a controlled variable is inextricably mixed with that of another, uncontrolled variable. For example, if we select children from a schoolyard and form experimental groups according to their heights, the groups would be very likely to differ in age as well, so any observed difference between the groups could not be attributed to height alone.


Critical value: A value taken from a statistical table. It serves as the criterion for determining whether the corresponding data-based statistic is large enough to be considered evidence against the null hypothesis.


Degrees of freedom (df): A structural aspect of the experimental design, determined by the number of scores and not their values.  The general rule is that each observation generates one df.  The number of elements in each subdivision of the design determines the allocation of dfs to structural units.  Some statistical tests, among them F and t, require df specification to determine critical values.


Dependent variable: An index of the behavior of interest. In univariate analysis (ANOVA), there is one behavior selected by the researcher to reflect an interesting psychological property.  In MANOVA (multivariate analysis of variance), there is more than one dependent variable.


Group: A collection of participants given the same experimental treatment.  The group is formed within the research project and need not have existed prior to the experiment. Participants in a group may actually generate their data on an individual basis and never come into contact with one another. It is their scores that are grouped for analytic purposes.


Independent variable: A treatment under experimental control.  The researcher imposes several (at least two) different values of the independent variable on participants, and compares the results to determine whether the variable differentially affects behavior. An experiment must have at least one independent variable, and may have several.


Interval scale: A measuring system having the property that intervals between the numbers assigned accurately reflect intervals between the magnitudes of the objects being measured. The important implication is that a given difference between two scores is independent of location along the scale.


Invalid:  When applied to a measurement scale, a lack of validity means that the numbers assigned do not reflect the true magnitude of the property being measured.  (Pronounced with the second syllable accented, to distinguish this adjective from the homographic noun designating a person in poor health.)


Nominal scale: A measuring system in which the numbers used do not reflect the magnitude of the objects being measured.  The numbers serve only to label or classify, not to order in any way.


Null hypothesis: In the ANOVA context, a statement that there is no difference among a set of true means.  In other statistical settings, the statement similarly refers to an absence of interesting differences.  Because observed means are estimates of the true values, statistical machinery is invoked to determine the validity of the hypothesis.  The fallibility of data implies that some incorrect decisions are inevitable.


One-tailed test:  A significance test in which a statistic obtained from the data is compared to the critical value in a designated tail of the probability distribution.  Used when the alternative hypothesis specifies the sign the test statistic must achieve.  Synonymous with directional hypothesis.  Because F-ratios must be positive, ANOVA employs only one-tailed tests.


Ordinal scale: A measuring system in which the numbers used are monotonically related to the magnitude of the objects being measured.  Monotonicity means the larger the object, the larger the number.


Power: The ability of a statistical test to verify the falsity of the null hypothesis.  Power is expressed as 1 - β, where β is the probability of a Type II Error.  In general, the specific value of β is unknown, but it is affected by many aspects of the experiment as well as by the statistical procedures chosen by the researcher.  Obviously, having a powerful experiment is desirable.
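
As a rough illustration of how power depends on aspects of the experiment, the sketch below computes 1 - β for a one-tailed z test of a single mean. This is a deliberately simpler test than the F test of ANOVA, chosen because the normal distribution is available in Python's standard library. The function name z_test_power and its parameters (a true effect expressed in standard-deviation units, a known variance of 1) are assumptions made for illustration only.

```python
from statistics import NormalDist

def z_test_power(effect_size, n, alpha=0.05):
    """Power (1 - beta) of a one-tailed z test on a single mean.

    Assumptions (hypothetical, for illustration): scores are normal with
    known standard deviation 1, and effect_size is the true shift of the
    mean away from the null value, in standard-deviation units.
    """
    z = NormalDist()                       # standard normal distribution
    critical = z.inv_cdf(1 - alpha)        # critical value in the upper tail
    # beta: probability the statistic falls below the critical value
    # even though the null hypothesis is false (a Type II Error).
    beta = z.cdf(critical - effect_size * n ** 0.5)
    return 1 - beta

power_small_n = z_test_power(0.5, 10)
power_large_n = z_test_power(0.5, 40)
power_no_effect = z_test_power(0.0, 20)
```

With the effect held fixed, the larger sample yields the higher power. When the true effect is zero, the function returns the significance level itself, because every rejection is then a Type I Error.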


Probability distribution: The possible outcomes of an experiment along with their associated probabilities.  Specific probability distributions, such as the normal, t, and F have been derived from sets of assumptions about how scores are generated and the way they are combined.  When the assumptions are correct, the probability distribution may be used to determine the critical value for a significance test.  In practice, the distribution is presumed applicable given the structure of the experiment and the assumptions are not specifically checked.


p-value: The probability, computed on the assumption that the null hypothesis is true, of obtaining a statistic (such as an F-ratio) at least as extreme as the one observed.  If the p-value is less than the significance level, then the result constitutes evidence against the null hypothesis.


Ratio scale: A measuring system having the property that ratios of the numbers assigned accurately reflect ratios of the magnitudes of the objects being measured.  This means that a behavior assigned a number of, say, 10, must have twice the magnitude of a behavior assigned a number of 5. Effectively, this requires that there be a true zero, a score that means the behavior is absent. Ratio scales in psychology are almost always physical scales employed to capture behavior, such as the use of response latency to measure task difficulty. Typical “psychological” dimensions, e.g., intelligence or attitude, do not have a “zero” point.


Significance level: A value selected by the researcher to fix the probability of a Type I Error.  Denoted as α, this arbitrary choice is always a small probability.  The most common choice in behavioral research is .05, while .01 is occasionally used.  The significance level determines the critical value drawn from the statistical table.


Significant: A difference among observed values, such as group means, deemed large enough to be considered reliable.  Determination of significance comes from a statistical test, and carries with it the probabilistic characterization of significance level.  As used in the statistics context, the word does not have its everyday connotation of importance.


Subscript notation:

                X1, X2, X3,..., Xn

The subscripts attached to each of the X's in a set offer a handy way to refer to any of them or to all n of them at once.  As an example, suppose we have measured the weights of five objects:  12 grams, 8 grams, 5 grams, 10 grams, 8 grams.  If we refer to the weights as X's, we would say X1 = 12, X2 = 8, and so on.  n would be five, since there are five scores.  What is the sum of X3 + X4? (15 grams).


Substantive variable: An element of theoretical interest within the experiment.  The researcher plans to determine whether different values of the substantive variable produce different responses.


Summation: The Greek capital letter sigma, Σ, is used as a shorthand way of indicating the sum of a set of numbers expressed with subscript notation.  It sometimes appears with an index and with limits, e.g.,

                \sum_{i=1}^{4} i

This is read as “the sum of the i's, where i goes from 1 to 4”; thus the sum is 10.  More frequently, the index appears as a subscript.  Using the weights we measured for our discussion of subscript notation,

                \sum_{i=1}^{5} X_i = 12 + 8 + 5 + 10 + 8 = 43

When no index or limits are furnished, the summation is considered to extend over all of the scores under discussion; so ΣX = 43 grams.
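
For readers who think in code, subscript and summation notation map directly onto list indexing and summing. The short Python sketch below uses the five weights from the subscript-notation entry. The helper x is hypothetical; it exists only to translate the text's subscripts, which start at 1, into Python list positions, which start at 0.

```python
# Weights in grams from the subscript-notation example.
weights = [12, 8, 5, 10, 8]

def x(i):
    """Return X_i, using the text's 1-based subscripts (Python lists start at 0)."""
    return weights[i - 1]

# Sum of the index itself, i = 1..4 (range excludes its upper bound).
sum_of_indices = sum(i for i in range(1, 5))

# Sum with explicit limits: X_3 + X_4.
partial_sum = sum(x(i) for i in range(3, 5))

# No index or limits: the sum extends over all of the scores.
total = sum(weights)

print(sum_of_indices, partial_sum, total)  # prints: 10 15 43
```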


Two-tailed test: A significance test in which two critical regions are defined, one in each tail of the probability distribution.  Significance is achieved when the observed value of the test statistic is more extreme than either critical value.  A two-tailed test is used when the sign of the test statistic is not specified by the alternative hypothesis. Thus a two-tailed test would be used if the alternative hypothesis were that two quantities are unequal, without prior specification of which is larger.  Synonymous with nondirectional test.
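
The difference between one-tailed and two-tailed critical values can be seen in a small sketch. It uses the standard normal distribution as a stand-in (an assumption made because it is available in Python's standard library; the same logic applies to t, whereas F uses only the upper tail). The significance level of .05 and the two function names are illustrative choices.

```python
from statistics import NormalDist

alpha = 0.05
z = NormalDist()  # standard normal distribution: mean 0, sd 1

# One-tailed: the whole of alpha sits in the designated (here, upper) tail.
one_tailed_critical = z.inv_cdf(1 - alpha)

# Two-tailed: alpha is split evenly, half in each tail.
two_tailed_critical = z.inv_cdf(1 - alpha / 2)

def is_significant_one_tailed(stat):
    """Significant only if the statistic is extreme in the designated tail."""
    return stat > one_tailed_critical

def is_significant_two_tailed(stat):
    """Significant if the statistic is extreme in either tail."""
    return abs(stat) > two_tailed_critical
```

The critical values come out near 1.645 and 1.960. A statistic of 1.8 is therefore significant under the one-tailed test but not under the two-tailed test, while a statistic of -2.1 is significant only under the two-tailed test.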


Type I Error: The researcher's data-based decision that the null hypothesis is false when it is really true.  This incorrect conclusion is not the result of a mistake in the analysis.  By chance, a “large” dispersion among the means (which should happen with probability α) has actually occurred this time.
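
A small simulation can make the role of chance concrete. The sketch below repeatedly draws samples for which the null hypothesis really is true and counts how often a two-tailed z test rejects it anyway. The z test on a single mean is a simplification of the ANOVA setting, and the seed, the sample size of 25, and the number of trials are arbitrary choices made for illustration.

```python
import random
from statistics import NormalDist

random.seed(2006)   # arbitrary seed, so the run is repeatable
alpha = 0.05
n = 25              # scores per simulated experiment (arbitrary)
trials = 10_000     # number of simulated experiments (arbitrary)

# Two-tailed critical value from the standard normal distribution.
critical = NormalDist().inv_cdf(1 - alpha / 2)

rejections = 0
for _ in range(trials):
    # The null hypothesis is true: every score comes from a
    # normal population with mean 0 and standard deviation 1.
    sample = [random.gauss(0, 1) for _ in range(n)]
    # z statistic for the sample mean when the sd is known to be 1.
    z_stat = sum(sample) / n ** 0.5
    if abs(z_stat) > critical:
        rejections += 1   # a Type I Error

type_i_rate = rejections / trials
print(type_i_rate)
```

The proportion of rejections should land close to the chosen significance level, even though no mistake was made in any individual analysis.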


Type II Error: The researcher's data-based decision that the null hypothesis is true when it is really false.  While this incorrect conclusion could be the result of bad luck as in the case of a Type I Error, a more worrisome possibility is that the experiment had low power.


Variance: Variance is a formal measure of the dispersion, or scatter, among the numbers in a set of data.  The variance is a weighted average of the differences between the numbers, with larger discrepancies contributing more heavily. Other measures of dispersion are also plausible, but the variance is the most popular because of its "mathematical tractability" and its central role in statistical theory.  It is convenient to define variance in terms of the discrepancies from the mean (\bar{X}); basically, this is equivalent to an alternative definition considering the differences between all pairs of numbers, but is much simpler computationally.  Thus the definition:

                        Variance  = \frac{\sum (X_i - \bar{X})^2}{n - 1}

This definition may be seen to sum the deviations from the mean; squaring each deviation accomplishes the weighting such that large deviations contribute very heavily to the summation.  Dividing the sum by n - 1 accomplishes the averaging.  Division by n - 1 rather than by n allows the variance in a sample to be an unbiased estimator of the variance in the population (further discussion of estimators is unnecessary for our purposes, but may be found in an intermediate-level text on mathematical statistics).  Since data always should be regarded as sample values, this is the appropriate definition in an empirical field.  In the world of ANOVA, the term “mean square” is used as a functional equivalent of variance.
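
The computation can be traced step by step in a short Python sketch. It reuses the five weights from the subscript-notation entry, treated here as a sample (an assumption made for illustration), and checks the hand computation against the standard library's statistics.variance, which also divides by n - 1.

```python
import statistics

# Weights in grams from the subscript-notation example.
weights = [12, 8, 5, 10, 8]

n = len(weights)
mean = sum(weights) / n                      # 43 / 5

# Squaring each deviation weights large discrepancies more heavily.
squared_deviations = [(x - mean) ** 2 for x in weights]

# Dividing by n - 1 gives the unbiased sample estimate.
sample_variance = sum(squared_deviations) / (n - 1)

# Dividing by n instead gives a smaller value, shown for comparison.
divided_by_n = sum(squared_deviations) / n

# The standard library function uses the n - 1 definition.
library_variance = statistics.variance(weights)
```

For these weights the sample variance is approximately 6.8, and the hand computation agrees with the library function up to floating-point rounding.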