An Analysis Of Variance Test For Random Attrition

David J. Weiss

California State University, Los Angeles

 

 

 

 

 

 

 

Running head: Random Attrition


Abstract

The proper way to treat missing scores in a fixed effect analysis of variance remains a matter of some controversy.  There is agreement, however, on a key assumption underlying all of the proposed techniques.  Inequality in cell sizes should not be attributable to the treatments.  Missing scores should occur haphazardly.  An objective procedure for evaluating the assumption of random attrition is proposed.  Rather than attempting to determine why scores are absent, the researcher should simply try to assess whether systematic effects of the treatments are associated with the cell sizes.  This can be accomplished with ordinary analysis of variance, using the same factorial structure as for the experimental design.  The scores that are present are assigned a “1”, while those that are absent are assigned a “0”.  Any significant F-ratio militates against the assumption of random attrition.

 

 

Key words: Inequality of cell size; nonorthogonal analysis of variance; random attrition


An Analysis Of Variance Test For Random Attrition

The seemingly trivial disappearance of a few scores can be severely annoying for a researcher executing that most ordinary of experimental plans, a factorial design with independent groups (fixed effect model).  Unequal cell sizes present an analytic challenge.  Expert opinion on the proper procedures for handling these nonorthogonal designs has shifted over the years (Appelbaum & Cramer, 1974; Cramer & Appelbaum, 1980; Herr & Gaebelein, 1978), and there is still a lack of consensus among the analysis of variance textbooks.  For example, Winer, Brown, and Michels (1991, p. 386) recommend the method of unweighted means, while Maxwell and Delaney (1990, p. 290) suggest that this method be avoided, preferring instead a more complex least squares solution.  Keppel (1991, p. 291) also recommends unweighted means despite a “slight bias.”

One aspect of the discussion has remained constant.  The algorithmic advice for analysis of experimental factors presupposes that the reasons for the inequality are unrelated to the treatments.  Underlying all of the proposed techniques is the assumption that, as Maxwell and Delaney phrase the requirement, “... the treatments are not differentially responsible for subjects failing to complete the study” (Maxwell & Delaney, 1990, p. 273).  The idea is that accidental attrition produces subsamples that are effectively random samples of the original groups, and thus one can use the data from those who remain to predict the data of those who disappeared.  Classic examples of acceptable reasons for data loss include scheduling mishaps or equipment failure.  On the other hand, suppose volunteers in an obnoxious experimental condition are less likely to complete their task than those experiencing a benign treatment.  One might worry that participants who do face up to the harshness are different from those who avoid it, in ways that may be important to the experiment.  Such bias obviously threatens the validity of any conclusions based on the extant data.

From the researcher’s perspective, applying this criterion for data acceptability can be difficult.  It is not always apparent when attrition has been caused by treatment.  Since the advent of institutional review boards, blatantly aversive experimental conditions are rare.  When the nature of the treatments is less clear-cut, the researcher must make a judgment about the reasons behind the varying numbers.  The basis for that judgment may be unacceptably subjective.

Those who do know the impact of the treatment, namely the volunteers who either didn’t show up or stopped before the end, are poor informants.  Usually they vanish without a trace.  Even when interrogation is possible, demand characteristics govern the responses.  Without mind-reading skills, it is impossible to evaluate the suddenly recalled dental appointments and ailing relatives.  The genteel participants with whom I am familiar would never offend the researcher by decrying the treatment.  One might explore less direct ways of inducing no-shows and dropouts to reveal their reasons, such as anonymous mailers returned to a third party, but such attempts are problematic.  Blaming the treatment may still be seen as an insult to the researcher.

The decision that missing scores have occurred systematically is an important one that should not be made on the basis of casual inspection.  While the reasons underlying particular missing scores cannot generally be known, systematic variation in cell sizes suggests that biased selection has occurred.  This in itself is a substantive conclusion, one whose implications may justify reporting it as a component of the data analysis.

Consider a weight-loss study in which participants are randomly assigned to combinations of exercise and diet regimens.  Program efficacy may be assessed using the difference between pre- and post-program weight (an arguable procedure; see Weiss, Walker, & Hill, 1988; but bear with the example).  If the numbers of people finishing the program and reporting for the final weigh-in differ across the treatment combinations, that information may be valuable in planning future programs.

Similarly informative are disproportions that arise when classificatory variables such as gender influence completion of the experimental task.  If an equal number of women and men is assigned to each of several treatments, but the men are more likely to produce data, then the researcher may wish to explore possible sexism in the procedure or instruments.

Even when disparities do not have obvious meaning, they threaten the validity of any subsequent analysis.  Without a guiding procedure, it is difficult for the researcher to know when to make the painful decision to reject the data.  Therefore I propose that a statistical analysis of cell size inequality routinely precede nonorthogonal analysis of variance.

The decision about attrition may be made using formal hypothesis-testing machinery, with the emphasis shifted from assessment of reasons to assessment of disparity.  The null hypothesis is that the numbers of lost scores for all design cells are equal.  The results of the test of this hypothesis should be reported along with the primary nonorthogonal analysis.

The proposed test is simply an analysis of variance that considers the presence or absence of a score, rather than its magnitude.  Attending to the factorial structure, the analysis calls for replacing each actual score with a "1" and each planned but missing score with a "0".  This ANOVA will therefore have equal cell n, and the planned cell size will be the number of replicates (scores per cell).  The values "1" and "0" are arbitrary, as any other pair of distinct values would yield the same F-ratio; but one and zero constitute a natural code for "present" and "absent".
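The coding step can be sketched in Python for a two-factor design. This is a minimal illustration, not a published implementation. The function name `attrition_anova` and its arguments are hypothetical, and the sketch assumes a fully crossed design with the same planned number of scores in every cell. It expands each cell into 1s for obtained scores and 0s for missing ones, then computes the usual equal-n sums of squares. It also accepts an optional pair of codes, so the claim that the values are arbitrary can be checked directly.

```python
from itertools import product


def attrition_anova(present, replicates, codes=(1, 0)):
    """Two-way fixed-effect ANOVA on present/absent codes (hypothetical helper).

    present[i][j]: number of scores actually obtained in cell (i, j).
    replicates:    planned number of scores per cell (equal for all cells).
    codes:         values used for (present, absent); any two distinct
                   values give the same F-ratios.
    Returns {source: (F, df_effect, df_error)}.
    """
    hit, miss = codes
    a, b, n = len(present), len(present[0]), replicates
    if any(k > n for row in present for k in row):
        raise ValueError("a cell has more scores than the number of replicates")

    # Replace every obtained score by `hit` and every missing one by `miss`.
    cells = {(i, j): [hit] * present[i][j] + [miss] * (n - present[i][j])
             for i, j in product(range(a), range(b))}

    def mean(xs):
        return sum(xs) / len(xs)

    cell_mean = {c: mean(v) for c, v in cells.items()}
    grand = mean(list(cell_mean.values()))  # equal n, so mean of cell means
    row = [mean([cell_mean[i, j] for j in range(b)]) for i in range(a)]
    col = [mean([cell_mean[i, j] for i in range(a)]) for j in range(b)]

    # Standard equal-n sums of squares for a two-way layout.
    ss_rows = b * n * sum((m - grand) ** 2 for m in row)
    ss_cols = a * n * sum((m - grand) ** 2 for m in col)
    ss_cells = n * sum((m - grand) ** 2 for m in cell_mean.values())
    ss_inter = ss_cells - ss_rows - ss_cols
    ss_within = sum((x - cell_mean[c]) ** 2 for c, v in cells.items() for x in v)

    df_rows, df_cols, df_err = a - 1, b - 1, a * b * (n - 1)
    df_inter = df_rows * df_cols
    ms_err = ss_within / df_err
    return {
        "rows": (ss_rows / df_rows / ms_err, df_rows, df_err),
        "columns": (ss_cols / df_cols / ms_err, df_cols, df_err),
        "interaction": (ss_inter / df_inter / ms_err, df_inter, df_err),
    }
```

As a usage example, take the end-of-study counts in Table 1 (rows Runners and Nonrunners, columns Gifts and Money) with 54 planned scores per cell. `attrition_anova([[30, 44], [24, 22]], 54)` gives F-ratios of 16.08 for rows (Running), 2.95 for columns (Incentive), and 5.25 for the interaction, each on 1 and 212 df. Passing `codes=(5, 2)` leaves all three unchanged. If SciPy is available, `scipy.stats.f.sf(F, df1, df2)` returns the corresponding upper-tail probability.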

A significant F-ratio for any source, either main effect or interaction, indicates that attrition is unequal along that source, that is, concentrated in particular levels or combinations of the factors.  Specific comparisons may also be used when the researcher anticipates that particular treatment combinations may prove troublesome.  The present-absent coding may also be used to test for systematic trend in the pattern of missing scores.
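A trend test on the present-absent coding can be sketched in the same way. The Python function below is hypothetical. It assumes a single factor with ordered, equally spaced levels and equal planned n. It applies one contrast to the cell proportions present and tests it against the within-cell mean square of the 0/1 scores. The counts in the usage note are invented for illustration.

```python
def attrition_trend(present, replicates, coefficients):
    """F test of one contrast on present/absent (1/0) scores, one-way design.

    Hypothetical helper.
    present[j]:   number of scores obtained at ordered level j.
    replicates:   planned number of scores per level (equal for all levels).
    coefficients: contrast weights summing to zero, e.g. (-3, -1, 1, 3) for a
                  linear trend over four equally spaced levels.
    Returns (F, 1, df_error).
    """
    if abs(sum(coefficients)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    n, g = replicates, len(present)
    props = [k / n for k in present]  # cell means of the 1/0 scores
    estimate = sum(c * p for c, p in zip(coefficients, props))
    ss_contrast = n * estimate ** 2 / sum(c ** 2 for c in coefficients)
    # Within-cell SS for k ones and n - k zeros reduces to k * (n - k) / n.
    ss_within = sum(k * (n - k) / n for k in present)
    df_error = g * (n - 1)
    return ss_contrast / (ss_within / df_error), 1, df_error
```

Suppose, hypothetically, that 19, 17, 14, and 12 scores are obtained out of 20 planned at four ordered levels. Then `attrition_trend([19, 17, 14, 12], 20, (-3, -1, 1, 3))` returns an F of about 8.76 on 1 and 76 df, reflecting steadily increasing loss across the levels.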

Researchers also carry out nonorthogonal analyses when data have not been lost in a literal sense.  In some investigations, personal characteristics such as gender, ethnicity, or health status may be among the factors of interest.  Participants are recruited to fill particular design cells.  In such cases, cell size inequalities reflect differences in ease of recruitment, an issue that may have substantive importance.  Here the null hypothesis is that the numbers of scores present for all design cells are equal.  The slight difference in wording leads to a difference in the way the test is carried out.  It seems appropriate to consider as the number of replicates the largest number of participants obtained for any cell.  The adjustment ensures that the researcher will not be “rewarded” for an overly optimistic projection of cell sizes, which might have the consequence that all cells would fall far short of the goal and the analysis would have little power to detect disparities.

Empirical Example

Researchers seldom report attrition data, since dropouts are seen merely as a minor inconvenience to be handled by the computer.  In contrast, Rudy, Estok, Kerr, and Menzel (1994) carried out an investigation in which they explicitly looked at subject retention.  Their report provides numerical values that illustrate the proposed analysis.  Participants were either runners or nonrunners recruited by mail for a longitudinal study of exercise.  The type of incentive provided, money or gifts of equal value, was the manipulated variable.

The numbers of participants who began and completed the study are shown in Table 1.  Initial disparities presumably reflect differential rates of volunteering as well as a difference in the proportion of runners and nonrunners.  The analysis of variance on cell size inequality for the 2x2 design used the end-of-study counts, with 54 (the largest initial cell size) as the number of replicates.  It yielded significant F-ratios for Running, F(1, 212) = 16.08, p < .001, and for Incentive by Running, F(1, 212) = 5.25, p = .02, but not for Incentive, F(1, 212) = 2.95, p > .05.  I would conclude that it was easier to recruit and retain runners, for whom money appeared to be a greater lure than gifts.  Substantive conclusions regarding the exercise outcome measure would therefore have to be tempered by this disparity.  Rudy et al. (1994), who were interested purely in retention and not in recruitment effects, similarly reported an advantage for money over gifts.

------------------------

Insert Table 1 here

------------------------
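For readers who wish to check the arithmetic, the F-ratios above follow from the end-of-study counts in Table 1. Each 0/1 score equals its own square, so every sum of squares reduces to arithmetic on the cell counts k_ij. Here R stands for Running and I for Incentive. With n = 54 planned scores per cell, N = 216 planned scores in all, and T = 120 scores present:

```latex
\begin{aligned}
\frac{T^2}{N} &= \frac{120^2}{216} = 66.6667\\
SS_R &= \frac{74^2 + 46^2}{108} - 66.6667 = 3.6296\\
SS_I &= \frac{54^2 + 66^2}{108} - 66.6667 = 0.6667\\
SS_{R \times I} &= \frac{30^2 + 44^2 + 24^2 + 22^2}{54} - 66.6667 - SS_R - SS_I = 1.1852\\
SS_W &= \sum_{ij} \frac{k_{ij}(n - k_{ij})}{n}
      = \frac{30 \cdot 24 + 44 \cdot 10 + 24 \cdot 30 + 22 \cdot 32}{54} = 47.8519\\
MS_W &= \frac{47.8519}{4(54 - 1)} = 0.22572\\
F_R &= \frac{3.6296}{0.22572} = 16.08, \quad
F_I = \frac{0.6667}{0.22572} = 2.95, \quad
F_{R \times I} = \frac{1.1852}{0.22572} = 5.25
\end{aligned}
```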

Statistical Issues

The use of ones and zeros as scores raises concern that analysis of variance is inappropriate because the standard assumptions are violated.  A dichotomous variable is not normally distributed.  The variance of the scores in a given cell is d(1 - d), where d is that cell's dropout rate, so it grows with the dropout rate over the range below .5; thus if the hypothesis of unequal attrition is true, heterogeneity of variance is assured.  However, Lunney (1970) empirically examined the robustness of analysis of variance using dichotomous scores with fixed effect models and equal cell sizes.  With regard to both Type I error rate and power, his simulation results supported the use of analysis of variance in situations in which either (a) the response proportions are less extreme than 80%-20% and there are at least 20 df for error, or (b) the response proportions are more extreme and there are at least 40 df for error.
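Both points can be checked mechanically before running the test. The Python helpers below are hypothetical sketches. The guideline check is a rough encoding of the summary of Lunney (1970) given above, not of his paper itself. It assumes that the "response proportions" are the per-cell proportions present and that "less extreme than 80%-20%" means strictly between .2 and .8.

```python
def cell_variances(present, replicates):
    """Variance (divisor n) of the 1/0 scores in each cell: p * (1 - p).

    p = present / replicates; equivalently d * (1 - d) for the dropout rate
    d = 1 - p, so cells that lose more scores (up to half) vary more.
    """
    return [(k / replicates) * (1 - k / replicates) for k in present]


def meets_lunney_guideline(present, replicates):
    """Rough check of the robustness guideline as summarized in the text.

    present: per-cell counts of obtained scores, in any order (flattened).
    Assumptions: proportions are per-cell proportions present; error df are
    those of a full factorial with equal planned n per cell.
    """
    df_error = len(present) * (replicates - 1)
    moderate = all(0.2 < k / replicates < 0.8 for k in present)
    return df_error >= (20 if moderate else 40)
```

For the end-of-study counts in Table 1, `cell_variances([30, 44, 24, 22], 54)` gives roughly .247, .151, .247, and .241. One proportion (44/54, about .81) lies beyond .8, but the 212 error df comfortably exceed 40, so `meets_lunney_guideline([30, 44, 24, 22], 54)` returns `True`.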

Since current practice does not employ any test of random attrition, it seems unlikely that researchers will be overly concerned about Type II errors when the proposed test is used.  Type I errors, which inappropriately call into question the validity of the subsequent nonorthogonal analysis, are likely to be a barrier against acceptance of the proposed procedure.  For small total numbers of planned scores, Lunney’s results show that the observed Type I error rate will be lower than the nominal level; the F test is conservative for small n when a dichotomous variable is used.

Chi-square tests are often used with dichotomous responses.  Cochran (1950), arguing without the benefit of simulation, championed use of the F statistic as though the responses were normally distributed.  A major advantage of the proposed technique over chi-square tests is that the analysis of cell sizes has the same factorial structure as the experimental design itself, allowing attrition effects to be localized just as response effects are.
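For a single factor, the two approaches are closely related, as the following sketch shows. Consider g groups, with n_j planned scores in group j, p_j the proportion present in group j, P the overall proportion present, and N the sum of the n_j. The total sum of squares of the 0/1 scores is N P(1 - P). The Pearson chi-square statistic for the g x 2 present/absent table and the ANOVA F-ratio are then both functions of the same ratio SS_B/SS_T:

```latex
\begin{aligned}
SS_B &= \sum_{j=1}^{g} n_j (p_j - P)^2, \qquad SS_T = N P (1 - P)\\
\chi^2 &= \sum_{j=1}^{g} \frac{n_j (p_j - P)^2}{P(1 - P)} = N \, \frac{SS_B}{SS_T}\\
F &= \frac{SS_B / (g - 1)}{(SS_T - SS_B)/(N - g)}
   = \frac{N - g}{g - 1} \cdot \frac{\chi^2 / N}{1 - \chi^2 / N}
\end{aligned}
```

With one factor, therefore, the two statistics order outcomes identically and differ only in their reference distributions. The analysis of variance goes further by partitioning a factorial design into main effects and interactions.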

Conclusion

When unequal cell sizes occur, the alternative to testing for equality of attrition is for the researcher to make the strong behavioral assumption that scores have been lost haphazardly.  The computerized statistics package, inexorably imposing a default option, allows the researcher to avoid thinking about the potential problem (Orme & Reis, 1991).  If an experimental variable is selectively inducing participants to withdraw, conclusions based on those who remain may be inaccurate.  One can, of course, blithely hope that those who remain in each treatment are a random subsample of those assigned, but it seems more likely that as the song says, “only the strong survive”.

For dropout rates in the moderate range likely to be of concern, the test of random attrition identifies patterns effectively.  For small dropout rates (the range below .2), or for extremely large rates (above .8, which would seem not to be a pragmatic concern), the test is relatively weak unless the number of planned scores is large.  The positive way to regard this lack of power is to note that the test will not signal alarm in a small-scale study unless systematic attrition is pronounced.


Table 1

Number of participants (data from Rudy et al., 1994)

                 Beginning of study      End of study
                 Gifts      Money        Gifts      Money
Runners            48         54           30         44
Nonrunners         34         30           24         22
Total              82         84           54         66


References

Appelbaum, M. I., & Cramer, E. M. (1974).  Some problems in the nonorthogonal analysis of variance.  Psychological Bulletin, 81, 335-343.

Cochran, W. G. (1950).  The comparison of percentages in matched samples.  Biometrika, 37, 256-266.

Cramer, E. M., & Appelbaum, M. I. (1980).  Nonorthogonal analysis of variance - Once again.  Psychological Bulletin, 87, 51-57.

Herr, D. G., & Gaebelein, J. (1978).  Nonorthogonal two-way analysis of variance.  Psychological Bulletin, 85, 207-216.

Keppel, G. (1991).  Design and analysis: A researcher’s handbook (3rd ed.).  Upper Saddle River, NJ: Prentice Hall.

Lunney, G. H. (1970).  Using analysis of variance with a dichotomous dependent variable: An empirical study.  Journal of Educational Measurement, 7, 263-269.

Maxwell, S. E., & Delaney, H. D. (1990).  Designing experiments and analyzing data.  Pacific Grove, CA: Brooks/Cole.

Orme, J. G., & Reis, J. (1991).  Multiple regression with missing data.  Journal of Social Service Research, 15, 61-91.

Rudy, E. B., Estok, P. J., Kerr, M. E., & Menzel, L. (1994).  Research incentives: Money versus gifts.  Nursing Research, 43, 253-255.

Weiss, D. J., Walker, D. L., & Hill, D. (1988).  The choice of a measure in a health-promotion study.  Health Education Research: Theory and Practice, 3, 381-386.

Winer, B. J., Brown, D. R., & Michels, K. M. (1991).  Statistical principles in experimental design (3rd ed.).  New York: McGraw-Hill.


Author Note

I thank Stanley P. Azen for a critical review of the manuscript.

Requests for reprints should be sent to David J. Weiss, Department of Psychology, California State University, Los Angeles, 5151 State University Drive, Los Angeles, CA 90032.  email: dweiss@calstatela.edu.