Statistics Exercises

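1. For fans of statistics, nothing beats a good chi-square analysis.  The beauty of chi-square from a programmer's perspective is that its definitional formula can be written as a single equation:

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

where O_ij is the observed frequency and E_ij the expected frequency for the cell in row i and column j.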

The equation conceals the complexity rather nicely.  The raw data for a chi-square test of independence consist of an r (rows) x c (columns) matrix of observed frequencies, such as this 2x3 set of political affiliations I collected from a class of 100 students:

 

            Republican   Independent   Democratic   Total
Females         11            25           35         71
Males           15             7            7         29
Total           26            32           42        100

The first step in the calculations is to compute the marginal totals; I have presented them beside and beneath the raw data.  These are used to compute a matrix of expected frequencies on a cell-by-cell basis.  To find the expected frequency for a cell, multiply the two marginal totals associated with that cell and divide by the grand total (100 in this example).  Then carry out the rest of the calculations as indicated by the formula, again on a cell-by-cell basis.  Finally, sum the values to get chi-square observed, the test statistic to be compared to the critical value from a table.  The statistic has associated with it a parameter referred to as its degrees of freedom, which is given by the product (r - 1) x (c - 1).

Your program should accept as input the number of rows and columns, then the observed frequencies.  The output should give the observed chi-square statistic and its associated degrees of freedom (abbreviated df).  For the data given above, the value of chi-square observed is 14.287, with 2 df.
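To check your own program against the figures quoted above, here is one possible sketch of the cell-by-cell procedure in Python. It is not a model answer: the function name `chi_square` is my own invention, and the table is hard-coded rather than read from the user, so the input step is still left to you.

```python
def chi_square(observed):
    """Return (chi-square statistic, degrees of freedom) for an r x c table.

    `observed` is a list of rows, each a list of observed frequencies.
    """
    rows = len(observed)
    cols = len(observed[0])

    # Marginal totals: one per row, one per column, plus the grand total.
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(rows)) for j in range(cols)]
    grand_total = sum(row_totals)

    chi_sq = 0.0
    for i in range(rows):
        for j in range(cols):
            # Expected frequency: product of the two marginals over the grand total.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi_sq += (observed[i][j] - expected) ** 2 / expected

    df = (rows - 1) * (cols - 1)
    return chi_sq, df


# The 2x3 table of political affiliations (Females, then Males).
observed = [[11, 25, 35], [15, 7, 7]]
statistic, df = chi_square(observed)
print(f"chi-square = {statistic:.3f}, df = {df}")  # chi-square = 14.287, df = 2
```

The sketch assumes every expected frequency is greater than zero; a table with an empty row or column would cause a division by zero, which a finished program should trap.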

2.  One of my favorite mathematical symbols is the factorial, written as k!.  The use of an exclamation point as a symbol brings smiles to students, and they invariably pronounce the expression with a shout, as KAY!!!!  Rather prosaically, though, the factorial refers to the product of the given integer and all of the positive integers below it.  So k! = k (k-1) (k-2) ... 1; for example, 3! = 6 and 6! = 720.  Computing a factorial calls for a product loop, which corresponds to a summing loop except that the numbers are multiplied rather than added.  The only tricky part is that the initialization is slightly different.  Write a program which will produce the combination for two integers; this mathematical function gives the number of ways to take n things k at a time.  The formula for combination is:

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$

 

You need to know that 0! is defined as 1. (Hint: factorials very quickly get to be very large numbers...)
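As a rough illustration of the product loop, here is a minimal Python sketch. The function names `factorial` and `combination` are just illustrative choices, and the range check is my own addition. Python integers grow as large as needed, so the hint about size matters less here than it would in a language with fixed-size numbers.

```python
def factorial(k):
    """Product loop: multiply together the integers 2 through k."""
    product = 1  # initialize to 1, not 0, or every product would be 0
    for i in range(2, k + 1):
        product *= i
    return product  # the loop body never runs for k = 0 or 1, giving 0! = 1! = 1


def combination(n, k):
    """Number of ways to take n things k at a time: n! / (k! (n-k)!)."""
    if k < 0 or k > n:
        raise ValueError("need 0 <= k <= n")
    # Integer division is exact here because the quotient is a whole number.
    return factorial(n) // (factorial(k) * factorial(n - k))


print(factorial(6))       # 720
print(combination(5, 2))  # 10
```

In a language where large values overflow, one option is to cancel the common factors before multiplying rather than computing the three factorials in full.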

3.  Write a program for the analysis of Bernoulli trials. This is the situation in which there are repeated independent trials, each with a constant probability of success on a single trial.  The definition gives the probability of k successes in n trials, where p is the probability of success on a single trial.  Of course, the word success is used arbitrarily to designate one of the two possible outcomes on each trial.  Bernoulli first expressed this formulation in a gambling context, so that a success might be obtaining a head in a coin toss.  Psychologists have used Bernoulli trials as a model for ignorance in a learning setting.  More directly, we might model an unfortunate student who is just guessing on a multiple-choice exam.  The user will input n, k, and p, and the program should print out the desired probability.  Include nifty subprograms as desired.  Here is the formal expression for the solution to Bernoulli's problem, often referred to as the formula for binomial probability (refer back to Exercise 2 for clarification of the ! symbol):

$$P(k) = \frac{n!}{k!\,(n-k)!}\; p^{k} (1-p)^{n-k}$$
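One way the binomial probability, n!/(k!(n-k)!) times p^k times (1-p)^(n-k), might be coded is sketched below in Python. The function name `binomial_probability` is hypothetical, and the sketch leans on the standard library's `math.comb` (Python 3.8 or later) for the combination rather than a hand-written subprogram. Values are hard-coded instead of being read from the user.

```python
from math import comb


def binomial_probability(n, k, p):
    """Probability of exactly k successes in n independent trials,
    each with probability p of success."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# The unfortunate student: 10 four-choice questions, pure guessing (p = 0.25),
# exactly 3 correct.
print(f"{binomial_probability(10, 3, 0.25):.4f}")  # 0.2503
```

A handy sanity check for your own version is that the probabilities for k = 0 through n should add up to 1.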