Thursday, June 17, 2004

Chi-Square Test

The chi-square test measures how well a set of data fits a given distribution. Suppose you have postulated a probability model for some random experiment and now want to determine how good that assumption was. What should you do then? The chi-square test is widely used for exactly this purpose: determining the goodness of fit of a distribution to a set of experimental data.
There are two basic elements in the method. First, a measure of the difference is defined between the experimentally observed values and the values that would be expected if the postulated pmf (probability mass function) or pdf (probability density function) were correct. Second, this measure is compared to a threshold to decide whether the difference between the observed and expected results is too large. The threshold is set by the significance level of the test, which is selected by the investigator.
To apply the chi-square method to a set of random data that represents a random variable X, the first step is to partition the sample space S_X into a union of K disjoint intervals. Under the assumption that X has the postulated cdf (cumulative distribution function), we can compute the probability b_k that an outcome falls in the kth interval. Then m_k = n b_k is the expected number of outcomes that fall in the kth interval if the random experiment is repeated n times. The chi-square statistic is defined as the weighted difference between the observed number of outcomes, N_k, that fall in the kth interval and the expected number m_k:

\[ D^2 = \sum_{k=1}^{K} \frac{(N_k - m_k)^2}{m_k} \]

If the fit is good, then D^2 is small. The hypothesis is therefore rejected if D^2 is too large, that is, if D^2 ≥ t_α, where t_α is a threshold determined by the significance level of the test.
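The statistic and the rejection rule can be sketched in a few lines of Python. This is only an illustration: the function name `chi_square_statistic` and the die-roll counts are made up for the example, and the threshold 11.07 is the 5% value for 5 degrees of freedom (6 intervals).

```python
def chi_square_statistic(observed, probs):
    """Return D^2 = sum over k of (N_k - m_k)^2 / m_k.

    observed: counts N_k, one per interval
    probs:    postulated probabilities b_k, one per interval
    """
    if len(observed) != len(probs):
        raise ValueError("observed and probs must have the same length")
    n = sum(observed)  # total number of repetitions
    d2 = 0.0
    for n_k, b_k in zip(observed, probs):
        m_k = n * b_k  # expected count in the kth interval
        d2 += (n_k - m_k) ** 2 / m_k
    return d2


# Hypothetical data: 60 rolls of a die, tested against a fair-die pmf.
observed = [8, 12, 9, 11, 10, 10]
probs = [1 / 6] * 6

d2 = chi_square_statistic(observed, probs)
threshold = 11.07  # 5% significance level, K - 1 = 5 degrees of freedom
reject = d2 >= threshold

print("D^2 =", d2)
print("reject the fair-die hypothesis:", reject)
```

With these counts every expected value m_k is 10, so the statistic is small and the hypothesis is not rejected.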
The chi-square test is based on the fact that, for a large number of repetitions n of the experiment, the random variable D^2 has a pdf that is approximately a chi-square pdf with K-1 degrees of freedom. Thus the threshold t_α can be computed by finding the point at which

P[X ≥ t_α] = α,

where X is a chi-square random variable with K-1 degrees of freedom.
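If no table or statistics library is at hand, the threshold can be found numerically. The sketch below is one possible approach, not the only one: it evaluates the chi-square cdf through the series for the regularized incomplete gamma function and then bisects on the tail probability. The names `chi2_cdf` and `chi2_threshold` are hypothetical helpers written for this example.

```python
import math


def chi2_cdf(x, dof):
    """P[X <= x] for a chi-square random variable with dof degrees of freedom.

    Uses the power series of the regularized lower incomplete gamma
    function P(a, z) with a = dof / 2 and z = x / 2.
    """
    if x <= 0:
        return 0.0
    a = dof / 2.0
    z = x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))


def chi2_threshold(dof, alpha):
    """Find t such that P[X >= t] = alpha, by bisection."""
    lo, hi = 0.0, 1.0
    # Grow the upper bracket until the tail probability drops below alpha.
    while 1.0 - chi2_cdf(hi, dof) > alpha:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if 1.0 - chi2_cdf(mid, dof) > alpha:
            lo = mid  # tail still too heavy, move right
        else:
            hi = mid
    return (lo + hi) / 2.0


for dof in (1, 2, 5):
    print(dof, round(chi2_threshold(dof, 0.05), 2), round(chi2_threshold(dof, 0.01), 2))
```

The printed values should agree with standard chi-square tables to two decimal places.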

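Degrees of freedom    5%       1%
 1                    3.84     6.63
 2                    5.99     9.21
 3                    7.81    11.35
 4                    9.49    13.28
 5                   11.07    15.09
 6                   12.59    16.81
 7                   14.07    18.48
 8                   15.51    20.09
 9                   16.92    21.67
10                   18.31    23.21
11                   19.68    24.76
12                   21.03    26.22

Threshold values for the chi-square test at the 5% and 1% significance levels. The first column is the number of degrees of freedom, that is, K-1 when the sample space is partitioned into K intervals.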
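One way to convince yourself that these thresholds behave as advertised is a small simulation. The sketch below is an illustration under assumed settings (a fair six-sided die, 600 rolls per experiment, 2000 experiments, and the hypothetical helper name `simulate_rejection_rate`): when the postulated model is actually correct, D^2 should exceed the 5% threshold in roughly 5% of the experiments.

```python
import random


def simulate_rejection_rate(trials, n, threshold, seed=0):
    """Fraction of simulated experiments in which D^2 >= threshold.

    Each experiment rolls a fair six-sided die n times and tests the
    counts against the (correct) uniform pmf.
    """
    rng = random.Random(seed)
    faces = 6
    expected = n / faces  # m_k = n * b_k with b_k = 1/6
    rejections = 0
    for _ in range(trials):
        counts = [0] * faces
        for _ in range(n):
            counts[rng.randrange(faces)] += 1
        d2 = sum((c - expected) ** 2 / expected for c in counts)
        if d2 >= threshold:
            rejections += 1
    return rejections / trials


# 6 intervals -> 5 degrees of freedom -> 5% threshold of 11.07.
rate = simulate_rejection_rate(trials=2000, n=600, threshold=11.07)
print("observed rejection rate:", rate)
```

The rate will not be exactly 0.05, both because of sampling noise and because the chi-square pdf is only an approximation for finite n.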
