# Testing proportions

A question that often arises when comparing the results of medical treatments is, “Are the differences in proportions of outcomes of interest observed in different populations due to chance?” A variation on the same theme is, “Is the difference between the hypothesized and observed proportions of outcomes of interest due to chance?”

In the most general case, sources (e.g., [22]) recommend using the **pooled two-proportion \(z\)-test** to establish whether the difference in the proportions of positive outcomes between two samples can be attributed to chance. It proceeds as follows:

1. formulate the null hypothesis \(H_0 : p_1=p_2\), where \(p_i, i=1,2\) is the \(i\)-th proportion of outcomes of interest;

2. calculate the proportions of outcomes of interest in each sample, \(\hat{p_1}\) and \(\hat{p_2}\);

3. calculate the total combined (pooled) proportion of outcomes of interest \(\hat{p} = \frac{n_{p_1} + n_{p_2}}{n_1+n_2}\), where \(n_{p_i}, i=1,2\) is the number of positive outcomes in the \(i\)-th sample and \(n_i, i=1,2\) is the number of observations in the \(i\)-th sample;

4. calculate the standard error of the estimated difference \(\hat{p_1} - \hat{p_2}\): \(SE = \sqrt{\hat{p}\left( 1-\hat{p} \right) \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}\);

5. calculate the \(z\)-statistic \(z = \frac{\hat{p_1}-\hat{p_2}}{SE}\);

6. calculate the \(p\)-value for the \(z\)-statistic;

7. if the \(p\)-value is lower than the chosen threshold (e.g., \(5\%\)), reject \(H_0\), i.e., assume that the samples come from different distributions; otherwise, do not reject it, i.e., treat the data as consistent with the samples coming from the same distribution.
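The steps above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function name `pooled_two_proportion_z_test`, the `ZTestResult` container, and the argument names are hypothetical, and the \(p\)-value is assumed to be two-sided.

```python
import math
from typing import NamedTuple


class ZTestResult(NamedTuple):
    """Intermediate and final quantities of the pooled test (hypothetical container)."""
    p1_hat: float
    p2_hat: float
    pooled: float
    se: float
    z: float
    p_value: float


def pooled_two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> ZTestResult:
    """Two-sided pooled two-proportion z-test.

    x1, x2 are the numbers of positive outcomes (n_{p_1}, n_{p_2});
    n1, n2 are the sample sizes (n_1, n_2).
    """
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    if pooled in (0.0, 1.0):
        # The standard error would be zero, so the z-statistic is undefined.
        raise ValueError("pooled proportion is 0 or 1; the test is not applicable")
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return ZTestResult(p1_hat, p2_hat, pooled, se, z, p_value)
```

Using `math.erfc` instead of computing \(1 - \Phi(|z|)\) explicitly keeps the tail probability accurate when \(|z|\) is large.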

This test is generally applicable when each sample is no bigger than \(10\%\) of the population it is drawn from and the numbers of successes and failures in each sample exceed 5.
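These applicability conditions can be checked per sample before running the test. The helper below is a hedged sketch: the name `sample_meets_conditions` and its parameters are hypothetical, and the \(10\%\) rule is checked only when a population size is supplied, since it may not be known.

```python
from typing import Optional


def sample_meets_conditions(
    successes: int,
    n: int,
    population_size: Optional[int] = None,
    min_count: int = 5,
) -> bool:
    """Check the rule-of-thumb conditions for one sample."""
    failures = n - successes
    # Both the successes and the failures must exceed the minimum count.
    enough_counts = successes > min_count and failures > min_count
    # The 10% rule can only be verified when the population size is known.
    small_sample = population_size is None or n <= 0.10 * population_size
    return enough_counts and small_sample
```

The check would be applied to each of the two samples separately; the test is considered applicable only if both pass.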

Arguments can be made in favor of using an unpooled statistic, since the variances of the two samples do not have to be the same (see, e.g., [31]). In this case, the null hypothesis is \(H_{0_{unpooled}} : p_1 - p_2 = d_0\), where \(d_0\) is the hypothesized difference between the two proportions, and the standard error estimate is

\(SE_{unpooled} = \sqrt{\frac{\hat{p_1}(1-\hat{p_1})}{n_1} + \frac{\hat{p_2}(1-\hat{p_2})}{n_2}} \; ,\)

and the corresponding \(z\)-statistic is \(\frac{\hat{p_1} - \hat{p_2} - d_0}{SE_{unpooled}}\). While there are finer points in arguing for the use of the unpooled test, for most practical cases the samples are assumed to come from the same distribution and the use of the pooled test is justified.
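For comparison, the unpooled statistic can be sketched in the same way. The function name `unpooled_two_proportion_z_test` and its arguments are hypothetical, and the two-sided \(p\)-value is an assumption, since the text above defines only the standard error and the \(z\)-statistic.

```python
import math
from typing import Tuple


def unpooled_two_proportion_z_test(
    x1: int, n1: int, x2: int, n2: int, d0: float = 0.0
) -> Tuple[float, float, float]:
    """Return (SE_unpooled, z, two-sided p-value) for H0: p1 - p2 = d0."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Each sample contributes its own variance estimate; nothing is pooled.
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    z = (p1_hat - p2_hat - d0) / se
    # Two-sided p-value (assumption): 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p_value
```

With `d0 = 0` the only difference from the pooled test is the standard error, so the two statistics are close whenever the sample proportions are similar.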

Let us consider a worked example based on the prevalence of blood transfusions in two hospitals (pavilions)^{[1]} for the two-year period between May 1, 2012 and May 1, 2014 as presented in Table A.2.

Applying our (pooled) algorithm, we get:
\begin{eqnarray}
n_1 = 5{,}000 \; , \nonumber \\
n_2 = 3{,}000 \; , \nonumber \\
n_{p_1} = 700 \; , \nonumber \\
n_{p_2} = 200 \; , \nonumber \\
\hat{p_1} = 0.14 \; , \nonumber \\
\hat{p_2} = 0.0667 \; , \nonumber \\
\hat{p_1} - \hat{p_2} = 0.0733 \; , \nonumber \\
\hat{p} = \frac{700+200}{5{,}000+3{,}000} = 0.1125 \; , \nonumber \\
SE = \sqrt{0.1125(1-0.1125)\left( \frac{1}{5{,}000} + \frac{1}{3{,}000} \right)} = 0.0073 \; , \nonumber \\
z = \frac{0.0733}{0.0073} = 10.05 \; , \nonumber \\
p\text{-value} = 2 \left( 1 - \Phi(|z|) \right) < 10^{-20} \; , \nonumber
\end{eqnarray}

where \(\Phi(z)\) is the standard normal cumulative distribution function and the coefficient \(2\) comes from the two-tailed test.
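As a cross-check, the quantities in the example can be recomputed from the four counts. The sketch below follows the algorithm's \(SE\) formula, with the sample sizes \(n_1\) and \(n_2\) in the denominators; the variable names are illustrative, and `x1`, `x2` stand for \(n_{p_1}\), \(n_{p_2}\).

```python
import math

# Counts from the worked example.
n1, n2 = 5000, 3000   # observations per pavilion
x1, x2 = 700, 200     # positive outcomes (transfusions) per pavilion

p1_hat = x1 / n1
p2_hat = x2 / n2
p_pooled = (x1 + x2) / (n1 + n2)

# Standard error of the difference under H0, using the sample sizes n1 and n2.
se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
z = (p1_hat - p2_hat) / se

# Two-tailed p-value: 2 * (1 - Phi(|z|)), written with erfc for accuracy in the tail.
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"p1_hat={p1_hat:.4f} p2_hat={p2_hat:.4f} pooled={p_pooled:.4f}")
print(f"SE={se:.4f} z={z:.2f} p-value={p_value:.3g}")
```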

Given the data in the example and the assumptions made in Section [Methodology], the difference in the proportions of patients receiving blood transfusions between pavilions A and B is highly statistically significant (the \(p\)-value is below \(1\%\)). We must therefore reject the null hypothesis \(H_0\) and adopt the alternative \(H_A\), i.e., assume that the samples come from different distributions.

## Notes

- [1] We will be using the terms “pavilion” and “hospital” interchangeably throughout this document.