Related links:
Cluster Randomisation Data Analysis Program Page
Sample Size for Cluster Randomisation Data Analysis Program Page
Introduction
Sample Size
Analysis
Two Means
Two Proportions
Technical Considerations
References
Cluster randomisation experiments are used in situations where individual research subjects cannot easily be randomly allocated
to receive different treatments. In this situation, research subjects are firstly grouped into clusters, and
experimental treatments are randomly allocated to clusters so that all members of a cluster receive the same experimental treatment.
Some of the reasons for using cluster randomisation experiments are:
 When programs of treatment must be applied to a group or a community.
 In agriculture, when treatments, such as adding fertilisers or pest control, must be applied to a block
such as a paddock, a field, or a farm, and cannot be applied to individual plants
 In education, when interventions are usually applied to a class or a school, and not to individual students
 In hospitals, where improvements of care are usually applied to a ward or a hospital, and not to individual patients
 In public health initiatives when an intervention is applied to a community or a region, and not to individual
health care consumers
 When the effect of an intervention may contaminate other treatment groups
 The introduction of new knowledge or techniques to one group, which are likely to be copied and used by those in the other group
 Difficult administrative situations, where it may be unclear which subjects belong to which group, such
as patients in a hospital ward or students in a class.
The main difference between individual randomisation and cluster randomisation is that members of a cluster
may be more similar to each other than to those from different clusters, so the effects of experimental treatment and
cluster membership are confounded. A correction for this possible confounding is therefore needed, and it is provided by
the Intraclass Correlation Coefficient ρ.
ρ, conceptually, is the average of the correlations between all possible pairs within a cluster. If subjects are
randomly allocated to different clusters and the environments of all clusters are identical, then there should be no correlation
between cases in any cluster, and ρ=0. If all members of a cluster produce the same results, then ρ=1.
In both estimating sample size requirement and in the analysis of the data, therefore, the Intraclass Correlation Coefficient
ρ is estimated. This is used to adjust the results of standard statistical procedures that are based on individual
randomisation, so that the final results are appropriate for cluster randomisation.
Cluster randomisation is a large subject, and StatsToDo provides only the two most basic and commonly used
models, the two group comparison of normally distributed measurements and of binomially distributed proportions, as
carried out by the algorithms in the Cluster Randomisation Data Analysis Program Page.
This section supports the programs on the Cluster Randomisation Data Analysis Program Page.
To calculate sample size requirements for the cluster randomisation model, three steps are required.
 Obtain the sample size based on individual randomisation
 Estimate or provide the Intraclass Correlation Coefficient ρ
 Combine the two to obtain either the number of clusters required, or the sample size within each cluster
Sample size based on individual randomisation (ssiz_{individual})
This can be obtained from the commonly available sample size estimation algorithms. StatsToDo provides
some of these programs, and they can be found by searching via the indexes (see links at top of the page).
Sample sizes for 2 sets of normally distributed measurements can be obtained from tables in the
Sample Size for Unpaired Differences Tables Page, and those for 2 sets of binomially distributed proportions from the
Sample size for Two Proportions Explanations and Tables Page.
The Intraclass Correlation Coefficient (ρ)
ρ is used to adjust ssiz_{individual} to obtain the cluster number (k) and sample size
within each cluster (m).
ρ can be obtained by one of 3 methods at the planning stage.
 As ρ is calculated as part of data analysis in a cluster randomisation experiment, the value
can be copied from published reports using similar population and clustering methodologies.
 ρ can be estimated in a pilot study, where a number of clusters are studied before experimentation begins.
Algorithms in the Sample Size for Cluster Randomisation Data Analysis Program Page
can be used to estimate ρ from pilot data.
 ρ can be arbitrarily nominated, based on experience and an understanding of the nature of the clusters
to be studied. For example, in an agricultural experiment where clusters are merely divisions of existing facilities into lots,
and individuals are randomly assigned to the clusters, ρ can be expected to have a very small value, and can be
assigned a value of zero (0). Where clustering is based on natural groupings such as families or schools, individuals within a
cluster are more similar, and ρ can be expected to have higher values.
Any of the 3 methods can be used, and each has some advantages and pitfalls.
 Copying ρ from an existing study provides the most precise estimate, but this may be invalid if the
nature of the population and the manner of clustering are not replicated in the current study.
 The pilot study, particularly if using the same environment, population and method of clustering as that intended for
the final study, provides the most appropriate coefficient. However, this is time and resource intensive, and presumes a
uniform ρ across all clusters, which may not be realistic.
 An arbitrary nomination of ρ is the easiest, but its appropriateness depends on knowledge of and experience with not only
the research model, but also the nature of the population to be studied and the variability of the measurements
to be made.
Adjusted sample size
The following abbreviations are used.
 ssiz_{individual} is the sample size per group required if randomisation is based on individual subjects
 ρ is the Intraclass Correlation Coefficient
 k is the number of clusters per group
 m is the sample size within each cluster
If k is predetermined then m = ssiz_{individual} (1 - ρ) / (k - ρ ssiz_{individual})
If m is predetermined then k = ssiz_{individual} (1 + (m - 1) ρ) / m
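The two formulae above can be sketched in Python as follows. This is a minimal illustration and not the StatsToDo implementation: the function names are hypothetical, and rounding up to the next integer is assumed from the rounding convention described in the Technical Considerations section.

```python
import math

def cluster_size_needed(ssiz_individual, rho, k):
    """Sample size per cluster (m) when the number of clusters per group (k) is fixed."""
    denominator = k - rho * ssiz_individual
    if denominator <= 0:
        # Too few clusters: no finite cluster size gives the required power
        return math.inf
    return math.ceil(ssiz_individual * (1 - rho) / denominator)

def clusters_needed(ssiz_individual, rho, m):
    """Number of clusters per group (k) when the sample size per cluster (m) is fixed."""
    return math.ceil(ssiz_individual * (1 + (m - 1) * rho) / m)

# Values taken from the two worked examples on this page
print(cluster_size_needed(65, 0.0881, 9))    # calves: 9 paddocks per group
print(cluster_size_needed(121, 0.197, 25))   # schools: 25 schools per group
print(clusters_needed(121, 0.197, 50))       # schools: 50 students per school
```

The two functions are rearrangements of the same relationship, k m = ssiz_{individual} (1 + (m - 1) ρ), so either can be used depending on which quantity is fixed by practical constraints.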
The data collected from a cluster randomisation model are usually summarised. For example, when the outcome is a
normally distributed measurement, the sample size, mean, and Standard Deviation from each cluster are used for analysis.
When the outcome is a binomially distributed parameter (no/yes, true/false), the numbers of cases with positive and
negative outcomes from each cluster are usually used for analysis.
The mean value of a cluster, or the proportion of positive responses in a cluster, can be used as a measurement, and
these can be used for statistical analysis using standard statistical algorithms, using the cluster as the basic
sampling unit.
A concern of such an approach is that it assumes all clusters to have the same sample size. This however is usually the
case in cluster randomisation experiments as all clusters should have the same sample size at start, and only data loss
during collection results in minor differences in sample size from different clusters.
Donner's book suggests 3 methods of analysis at the cluster level that can be used.
 The standard two samples t test, using the cluster mean (normally distributed measurement) or proportion positive
(binomially distributed proportions) as the measurement. Although the t test assumes that the measurements are
normally distributed, the statistic is robust, in that the consequences of departures from the normal
distribution are trivial in most cases. The two samples t test is provided in the program pages for analysis.
 The Mann-Whitney U Test is less powerful than the t test but makes no assumptions on the distribution pattern of the values,
requiring only that they can be ordered.
 The Permutation Test is an exact test that makes no distributional assumptions, and is particularly useful when the number of clusters involved is
very small. The only problem is that the usual desktop computers cannot cope with data containing many clusters
(StatsToDo can only handle a maximum of about 25 clusters per group for the Permutation Test).
When the outcome in a cluster randomisation experiment is a normally distributed measurement, the results from the two samples
t test can be modified by the Intraclass Correlation Coefficient ρ, so that the Standard Error of the difference, the
Probability of Type I error α and the 95% confidence interval of the difference are corrected for correlations.
When the outcome in a cluster randomisation experiment is a binomially distributed proportion, three additional statistical
tests can be applied.
 The first is the corrected Pearson's Chi Square Test. The standard Chi Square test is applied, then modified by the
Intraclass Correlation Coefficient ρ.
 The second is the corrected difference in proportions. The difference in proportions between the two groups and its
Standard Error are calculated. The Standard Error is then modified by the Intraclass Correlation Coefficient ρ. From these,
the 95% confidence interval of the difference in proportion between the two groups can be calculated.
 The third is the corrected Odds Ratio between the two groups. The Odds Ratio and its
Standard Error are calculated. The Standard Error is then modified by the Intraclass Correlation Coefficient ρ. From these,
the 95% confidence interval of the Odds Ratio can be calculated.
The tests for differences between two groups, both for normally distributed measurements and for proportions, are provided in
the Cluster Randomisation Data Analysis Program Page.
The conduct of a cluster randomisation exercise is best demonstrated with the following example. Please note that the data used are computer generated to demonstrate the procedures, and are not real observations.
We wish to conduct a controlled trial on the effect of introducing additional fertiliser to the feeding ground of calves on
their weight gain. As we cannot randomise the calves, because fertilisers can
only be applied to paddocks, we will use the cluster randomisation model, using each paddock as the unit of randomisation.
Step 1 : Find ssiz_{individual}
We use the following parameters to determine the sample size based on individual randomisation
 Type I Error α = 0.05
 Power (1 - β) = 0.8
 We expect that the calves would gain some 15 Kg over a 3 month period, with a Standard Deviation of some 6 Kg. Our
research hypothesis is that with additional fertilisers, the calves would gain an extra 3 Kg during that period.
The effect size is therefore half a Standard Deviation (3 / 6 = 0.5)
We looked up the sample size requirement table in the Sample Size for Unpaired Differences Tables Page,
and found that we will
require 65 calves per group if we were to randomise on individual calves. In other words, ssiz_{individual} = 65
Step 2 : Find Intraclass Correlation Coefficient ρ
n  mean  SD 
20  15.8  4.8 
22  16.3  5.3 
25  15.5  6.2 
20  16.4  8.9 
We decided to obtain the likely Intraclass Correlation Coefficient ρ for our experiment from a pilot study,
where we placed some calves into a number of paddocks and measured their growth (weight gain over 3 months).
The results are shown in the table above.
We used the first program in the Sample Size for Cluster Randomisation Data Analysis Program Page
to obtain a workable Intraclass
Correlation Coefficient, which is ρ = 0.0881. This is, as expected, a very low level of correlation, as the calves are
allocated at random.
Step 3 : Sample Size Adjustment
ssiz_{ind}  ρ  k  m  ssiz_{clus} 
65  0.0881  30  3  90 
65  0.0881  28  3  84 
65  0.0881  26  3  78 
65  0.0881  24  4  96 
65  0.0881  22  4  88 
65  0.0881  20  5  100 
65  0.0881  18  5  90 
65  0.0881  16  6  96 
65  0.0881  14  8  112 
65  0.0881  12  10  120 
65  0.0881  10  14  140 
65  0.0881  9  19  171 
65  0.0881  8  27  216 
65  0.0881  7  47  329 
65  0.0881  6  217  1302 
k=number of clusters per group m=sample size per cluster ssiz_{clus}=total sample size per group 
Using the third program in the Sample Size for Cluster Randomisation Data Analysis Program Page,
we can view the sample size required in each
cluster for a range of cluster numbers, as shown in the table above.
It can be seen that, as the number of paddocks (clusters) to be used decreases, the number of calves per paddock
(sample size per cluster) increases steeply. No finite sample size per cluster is sufficient when the number of clusters
is below 6.
Although the sample size per cluster continues to decrease as the number of clusters increases, the changes in
the total group sample size become increasingly minor.
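The steep rise in cluster size follows directly from the formula for m when k is predetermined. As a worked illustration, using the values of this example (ssiz_{individual} = 65, ρ = 0.0881) and the rounding up convention described in the Technical Considerations section:

```latex
m = \frac{ssiz_{individual}\,(1-\rho)}{k - \rho\,ssiz_{individual}}
  = \frac{65 \times 0.9119}{k - 0.0881 \times 65}
  = \frac{59.2735}{k - 5.7265}

k = 7:\quad m = \frac{59.2735}{1.2735} \approx 46.5 \;\rightarrow\; 47
\qquad
k = 6:\quad m = \frac{59.2735}{0.2735} \approx 216.7 \;\rightarrow\; 217
```

The denominator approaches zero as k approaches ρ ssiz_{individual} = 5.7265, and is negative for k = 5 or fewer, which is why no cluster size can provide the required power with fewer than 6 clusters per group.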
From such an analysis, and depending on the costs (financial, time, and effort) of different aspects of the experiment,
the most efficient combination with the same power can be selected. In this example, the best combinations
would seem to range from 9 paddocks with 19 calves per paddock (total 171 calves per group) to 18 paddocks with
5 calves per paddock (total 90 calves per group), depending on whether managing calves or paddocks is more costly.
Step 4 : Analysis of Results
Grp  n  mean(Kg)  SD 
1  20  21.5  5.9 
1  20  18.8  4.7 
1  20  18.6  4.8 
1  20  19.5  5.4 
1  20  23.3  6.1 
1  20  21.0  4.2 
1  20  19.6  6.9 
1  20  22.3  6.4 
1  20  20.1  5.6 
2  20  15.3  6.1 
2  20  15.7  5.4 
2  20  18.8  6.2 
2  20  16.3  7.7 
2  20  17.1  5.3 
2  20  18.6  6.6 
2  20  16.0  5.3 
2  20  16.9  6.5 
2  20  16.8  5.9 
Following the calculations in the previous section, we decided to use 9 paddocks (clusters) per group,
placing 20 calves in each paddock. Those in Grp 2
were controls, and additional fertilisers were added to the Grp 1 paddocks. The calves were weighed
at the beginning of the experiment and 3 months later. The weight gain in Kg was the outcome. The table above
shows the means and SDs of weight gain from each paddock (cluster). The following statistical tests can be, and were,
carried out.
 Test 1 : Mann-Whitney U Test
Using the nonparametric comparison program for data from the Unpaired Difference Programs Page,
the two groups are compared.
The data are in 2 columns, Col 1 the group designation (col 1 of table) and Col 2 the mean value from each cluster (col 3 of table), with one row for each cluster. The results are U = 15.9153 p<0.05.
We can therefore conclude that there is a statistically significant difference in the cluster mean values between the two
groups.
 Test 2 : The Permutation Test
The two groups can also be compared using the Permutation Test algorithm of the Unpaired Difference Programs Page.
The data input is the same as that for the Mann-Whitney U Test. The results are a Type I Error (α) of 0.0032
(1 tail) or 0.0065 (2 tail). We can therefore conclude that there is a statistically significant difference in the cluster
mean values between the two groups.
 Test 3 : The Two Samples t Test
The cluster randomisation for two means program in the Cluster Randomisation Data Analysis Program Page
is used. The data
are in 4 columns as in the table. Col 1 is the group designation, Col 2 the sample size of the cluster, Col 3 the cluster mean value,
and Col 4 the cluster Standard Deviation, with one row for each cluster. The results are as follows.
Grp  Nclusters  Mean  SD 
1  9  20.5222  5.697 
2  9  16.8333  6.1229 
The mean and SD of the two groups are as in the table above. The difference between the two means is 3.7 Kg,
the Standard Error of the difference is 0.62, and the probability of Type I error (α) is p<0.0001. However, these values should
not be used, as they have not been corrected by the Intraclass Correlation.
Intraclass Correlation Coefficient, calculated from the data, is ρ = 0.0084.
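The calculation of ρ from cluster summaries can be sketched as follows. The page does not state the formula used by the program; this sketch assumes the common analysis of variance estimator, pooled within treatment groups, and the function name anova_icc is hypothetical. Applied to the paddock data of this example it gives a value close to the ρ reported above.

```python
def anova_icc(groups):
    """ANOVA estimate of the intraclass correlation from cluster summaries.

    groups: list of treatment groups, each a list of (n, mean, sd) per cluster.
    """
    ss_between = 0.0   # between-cluster sum of squares, within treatment groups
    ss_within = 0.0    # within-cluster sum of squares
    size_term = 0.0    # sum over groups of (sum of n squared) / (group total n)
    total_n = 0
    n_clusters = 0
    for clusters in groups:
        group_n = sum(n for n, _, _ in clusters)
        group_mean = sum(n * mean for n, mean, _ in clusters) / group_n
        ss_between += sum(n * (mean - group_mean) ** 2 for n, mean, _ in clusters)
        ss_within += sum((n - 1) * sd ** 2 for n, _, sd in clusters)
        size_term += sum(n * n for n, _, _ in clusters) / group_n
        total_n += group_n
        n_clusters += len(clusters)
    df_between = n_clusters - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / (total_n - n_clusters)
    n0 = (total_n - size_term) / df_between   # adjusted average cluster size
    return (ms_between - ms_within) / (ms_between + (n0 - 1) * ms_within)

# (n, mean, SD) for each paddock, copied from the Step 4 table
grp1 = [(20, 21.5, 5.9), (20, 18.8, 4.7), (20, 18.6, 4.8), (20, 19.5, 5.4),
        (20, 23.3, 6.1), (20, 21.0, 4.2), (20, 19.6, 6.9), (20, 22.3, 6.4),
        (20, 20.1, 5.6)]
grp2 = [(20, 15.3, 6.1), (20, 15.7, 5.4), (20, 18.8, 6.2), (20, 16.3, 7.7),
        (20, 17.1, 5.3), (20, 18.6, 6.6), (20, 16.0, 5.3), (20, 16.9, 6.5),
        (20, 16.8, 5.9)]
rho = anova_icc([grp1, grp2])
print(round(rho, 4))
```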
The adjusted Standard Error of the difference is now 0.67, and the 95% confidence interval of the difference
is 2.27 to 5.11 Kg.
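The size of this correction can be illustrated with a short sketch. It assumes that the adjustment takes the usual variance inflation form, where the variance is multiplied by 1 + (m - 1) ρ for clusters of equal size m. The page does not state the formula used by the program, and the function names are hypothetical.

```python
import math

def design_effect(m, rho):
    """Variance inflation caused by clustering, for clusters of equal size m."""
    return 1 + (m - 1) * rho

def adjusted_se(se_individual, m, rho):
    """Standard Error corrected for intraclass correlation."""
    return se_individual * math.sqrt(design_effect(m, rho))

# Values from this example: unadjusted SE 0.62, 20 calves per paddock, rho 0.0084
se_adj = adjusted_se(0.62, 20, 0.0084)
print(round(design_effect(20, 0.0084), 4), round(se_adj, 2))
```

With ρ = 0 the design effect is 1 and no correction is made, while larger values of ρ or larger clusters inflate the Standard Error further.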
Please note that the correction produced only trivial differences. This is because ρ has a very low value in this example,
and the amount of correction is related to the size of ρ.
With a small ρ, all statistical tests resulted in similar conclusions, and in this example we can conclude that
adding fertilisers to the feeding paddocks increases the growth of calves.
Please note again that the data is computer generated to demonstrate the procedures, and does not represent any real observations.
We wish to conduct an experiment on the effect of introducing a student encouragement protocol into schools to reduce absenteeism,
defined as having missed at least 1 scheduled class in a term.
Given that such a protocol has to be introduced to a whole school, we decide to use the cluster randomisation model.
Step 1 : Find ssiz_{individual}
We use the following parameters to obtain the required sample size as if randomisation were based on individuals.
 Type I Error α = 0.05
 Power (1 - β) = 0.8
 We estimate that the average absenteeism rate in a school is about 30% (0.3), and expect the encouragement protocol to
reduce this to 15% (0.15)
We looked up the sample size requirement table in the Sample size for Two Proportions Explanations and Tables Page,
and found that we will require
121 students per group. In other words, ssiz_{individual} = 121
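For readers without access to the tables, the individual randomisation sample size can be approximated in code. The sketch below uses a common normal approximation formula for comparing two proportions (two tailed α). Whether the tables page uses exactly this formula is an assumption, and the function name is hypothetical, but with the parameters above it gives the same figure.

```python
import math
from statistics import NormalDist

def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.8):
    """Approximate sample size per group for comparing two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two tailed Type I error
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2                           # pooled proportion under the null
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)

print(sample_size_two_proportions(0.3, 0.15))
```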
Step 2 : Find Intraclass Correlation Coefficient ρ
Pos  Neg 
30  70 
28  74 
32  70 
35  72 
We decided to carry out a pilot study, by examining the absenteeism in a number of schools, and
found the results shown in the table above (Pos = number with absenteeism present, Neg = number with no absenteeism).
Please note : In a real pilot study, many more clusters would be used to obtain a stable and robust ρ.
n  Mean  SD 
100  0.3  0.4606 
102  0.2745  0.4485 
102  0.3137  0.4663 
107  0.3271  0.4714 
We used the second program of the Sample Size for Cluster Randomisation Data Analysis Program Page
to calculate
the Intraclass Correlation Coefficient ρ. The program firstly converts the numbers of positives and negatives into
1s and 0s, then calculates the mean and SD for each cluster, as shown in the table above. From this table,
we estimate the Intraclass Correlation Coefficient ρ to be 0.197
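The conversion of counts into means and SDs can be sketched as follows. This is an illustration of the conversion step only, with a hypothetical function name. It assumes the sample Standard Deviation (n - 1 in the denominator), which reproduces the values in the table.

```python
import math

def binary_cluster_summary(n_pos, n_neg):
    """Treat positives as 1s and negatives as 0s; return (n, mean, sample SD)."""
    n = n_pos + n_neg
    mean = n_pos / n
    # Sum of squared deviations of n_pos ones and n_neg zeros is n * mean * (1 - mean)
    sd = math.sqrt(n * mean * (1 - mean) / (n - 1))
    return n, mean, sd

# (Pos, Neg) counts from the pilot table
pilot = [(30, 70), (28, 74), (32, 70), (35, 72)]
for n_pos, n_neg in pilot:
    n, mean, sd = binary_cluster_summary(n_pos, n_neg)
    print(n, round(mean, 4), round(sd, 4))
```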
Step 3 : Sample Size Adjustment
ssiz_{ind}  ρ  k  m  ssiz_{clus} 
121  0.197  30  16  480 
121  0.197  29  19  551 
121  0.197  28  24  672 
121  0.197  27  31  837 
121  0.197  26  45  1170 
121  0.197  25  84  2100 
121  0.197  24  597  14328 
k=number of clusters per group m=sample size per cluster ssiz_{clus}=total sample size per group 
Although we can use the third or fourth program of the Sample Size for Cluster Randomisation Data Analysis Program Page
to
calculate, one at a time, the required number of individuals within each cluster when the number of clusters in each group
is predetermined, or the number of clusters per group when the number of individuals in each cluster is predetermined,
we can also use either program to test a range of combinations.
Using the third program to determine the number of individuals in each cluster when the number of clusters is predetermined,
we produced the results shown in the table above.
It can be seen that, as the number of schools (clusters) to be included decreases, the number of students per school
(sample size per cluster) increases steeply. No finite sample size per cluster is sufficient when the number of clusters
is below 24.
Although the sample size per cluster continues to decrease as the number of clusters increases, the changes in
the total group sample size become increasingly minor.
As the main costs of such a program are related to selecting and introducing the encouragement protocol into schools,
the decision can be based on finding the minimum number of schools (clusters) that contain a sufficient number of
individuals (students). As most schools have more than 84 students, a reasonable decision is to have
25 schools (clusters) per group.
Step 4 : Analysis of Data
Grp  N_{Pos}  N_{neg}  Proportion 
1  20  116  0.15 
1  9  132  0.06 
1  24  118  0.17 
1  25  107  0.19 
1  30  92  0.25 
1  30  73  0.29 
1  22  121  0.15 
1  28  116  0.19 
1  6  136  0.04 
1  27  120  0.18 
1  13  123  0.10 
1  30  93  0.24 
1  19  99  0.16 
1  12  113  0.10 
1  26  116  0.18 
1  10  109  0.08 
1  35  98  0.26 
1  27  117  0.19 
1  18  92  0.16 
1  8  139  0.05 
1  10  96  0.09 
1  6  139  0.04 
1  7  107  0.06 
1  10  110  0.08 
1  21  111  0.16 
2  29  115  0.20 
2  15  98  0.13 
2  35  89  0.28 
2  37  67  0.36 
2  43  94  0.31 
2  43  80  0.35 
2  33  108  0.23 
2  40  87  0.31 
2  12  103  0.10 
2  39  83  0.32 
2  21  129  0.14 
2  43  75  0.36 
2  29  85  0.25 
2  19  124  0.13 
2  39  70  0.36 
2  17  94  0.15 
2  50  78  0.39 
2  40  93  0.30 
We recruited 50 schools, each with at least 84 students available for the study, and randomly divided
the schools into 2 groups. Those in Grp 2 were controls and those in Grp 1 were introduced to the encouragement program.
The numbers of absentees in one term were collated from each school and are presented in the table above, and the statistical
calculations are as follows.
 Test 1 : Mann-Whitney U Test
Using the nonparametric comparison program for data from the Unpaired Difference Programs Page,
the two groups are
compared. The data are in 2 columns. Col 1 is the group designation (col 1 of table) and Col 2 the proportion with absenteeism in
that school (cluster) (col 4 of table). The results are U = 3.0034 p = 0.0013. We can therefore conclude that there is a
statistically significant difference in the proportion of absenteeism in the two groups.
 Test 2 : The Permutation Test
The two groups can also be compared using the Permutation Test algorithm of the Unpaired Difference Programs Page.
The data input is the same as that for the Mann-Whitney U Test. In this study, however, the test unfortunately failed,
because 25 clusters in each of two groups exceeded the capacity of the program in StatsToDo to permute all possible
combinations. This demonstrates the limitation of the Permutation Test: despite its strengths, it can only handle a
limited sample size.
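The scale of the problem can be seen by counting the allocations that an exhaustive Permutation Test has to examine. The sketch below assumes that the test enumerates every way of dividing 2k clusters into two groups of k. The function name is hypothetical, and math.comb requires Python 3.8 or later.

```python
import math

def allocations(k):
    """Number of distinct ways to split 2k clusters into two groups of k."""
    return math.comb(2 * k, k)

# 9 per group (calves example), an intermediate size, and 25 per group (schools example)
for k in (9, 15, 25):
    print(k, allocations(k))
```

The count grows from tens of thousands for 9 clusters per group to more than 10^14 for 25 clusters per group, so complete enumeration quickly becomes impractical.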
 Test 3 : The Two Samples t Test
Grp  Npos  Nneg  Ntotal  Proportion pos 
1  473  2793  3266  0.1448 
2  716  2407  3123  0.2293 
Total  1189  5200  6389  0.1861 
The summary table is shown above. Difference (Mean Prop1 - Mean Prop2) = -0.086,
SE = 0.0247, df = 48, t = -3.4817, p = 0.0011.
Although this shows a significant difference between the two groups, the results should be considered only as preliminary,
as no allowance is made for the Intraclass Correlation.
 Test 4 : The unadjusted and adjusted Pearson's Chi Square Test
The Intraclass Correlation Coefficient ρ is calculated from the data. ρ = 0.0425
Unadjusted Chi Sq = 49.7836 df = 1 p <0.0001
Adjusted Chi Sq = 7.721 df = 1 p = 0.0055
The adjusted (corrected by ρ) Chi Square should be used, and this indicates that the proportions of
absenteeism in the two groups are significantly different.
 Test 5 : Confidence Interval of the Difference
Difference in proportions (Grp1 - Grp2) = -0.0844, Adjusted SE_{diff} = 0.0246
95% CI = Diff ± 1.96SE = -0.1328 to -0.0361
This shows that the proportion of absenteeism in group 1 is 0.04 (4%) to 0.13 (13%) lower than in group 2.
 Test 6 : Confidence Interval of Odds Ratio Between the Two Groups
Odds Ratio = 0.5693 Log(Odds Ratio) LOR = -0.5633
Adjusted (corrected for ρ) SE_{LOR} = 0.1665
95% CI = Exp(LOR - 1.96SE) to Exp(LOR + 1.96SE) = 0.4108 to 0.789
This is a different expression to Test 5, Odds Ratio instead of difference in proportions. The Odds of
absenteeism in group 1 is 0.41 to 0.79 times that in group 2.
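The two confidence intervals can be reproduced from the adjusted Standard Errors with a few lines of code. This sketch takes the adjusted values quoted above as inputs and does not itself perform the correction for ρ. The function names are hypothetical.

```python
import math

Z_95 = 1.96   # normal deviate for a 95% confidence interval

def ci_difference(diff, se):
    """95% confidence interval of a difference in proportions."""
    return diff - Z_95 * se, diff + Z_95 * se

def ci_odds_ratio(odds_ratio, se_log_or):
    """95% confidence interval of an Odds Ratio, calculated on the log scale."""
    log_or = math.log(odds_ratio)
    return (math.exp(log_or - Z_95 * se_log_or),
            math.exp(log_or + Z_95 * se_log_or))

diff_lo, diff_hi = ci_difference(-0.0844, 0.0246)
or_lo, or_hi = ci_odds_ratio(0.5693, 0.1665)
print(round(diff_lo, 3), round(diff_hi, 3))
print(round(or_lo, 3), round(or_hi, 3))
```

The last decimal places may differ slightly from those quoted above, because the inputs used here are already rounded.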
The adjustment of the sample size for individual randomisation ssiz_{individual} by the Intraclass Correlation Coefficient
ρ uses two formulae, one to estimate the number of clusters per group needed if the sample size per cluster is predetermined, and
the other the sample size in each cluster needed if the number of clusters is predetermined.
The formulae for these calculations are obtained from the text book by Machin, Campbell, Fayers and Pinol (see references), and
the results are checked against the tables provided in that text book. However, the following three points should be noted.
 Minor discrepancies exist between the results calculated by StatsToDo and the statistical tables in
the text book. These are caused by differences in rounding. As sample sizes are approximate estimates in any case, these
discrepancies can be accepted.
 The text book provides no algorithm for the calculation of ρ, and StatsToDo uses the algorithms described in the book by
Donner and Klar (see references) for this purpose.
 In the calculation of the required cluster number or sample size in clusters, the results of calculations are real numbers, but
they are rounded upwards to the next integer. For example, using ssiz_{individual} of 121 and ρ of 0.197,
calculations using intracluster sample sizes of 45, 50, 55 and 60 will result in required cluster numbers of 25.9, 25.8, 25.6, and
25.5, all rounded up to 26 clusters per group. Users should be aware of this rounding effect and not be confused
by the discrepancies in the results.
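The rounding effect described above can be checked with a short sketch. It uses the formula for the number of clusters when the cluster size is predetermined. The function name is hypothetical, and the unrounded values are printed to 3 decimal places, so the last digit may differ slightly from the figures quoted above.

```python
import math

def clusters_needed_exact(ssiz_individual, rho, m):
    """Unrounded number of clusters per group for a cluster size of m."""
    return ssiz_individual * (1 + (m - 1) * rho) / m

# Schools example: ssiz_individual = 121, rho = 0.197
for m in (45, 50, 55, 60):
    exact = clusters_needed_exact(121, 0.197, m)
    print(m, round(exact, 3), math.ceil(exact))
```

All four cluster sizes lead to the same rounded requirement, which shows that increasing the cluster size beyond a certain point saves very few clusters.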
All other calculations are based on the text book by Donner and Klar (see references), and the following points should be noted.
 The calculations for the Intraclass Correlation Coefficient ρ, both for sample size determination and for data analysis,
are checked against the examples provided in the text book. Discrepancies exist between the results produced by
the program in StatsToDo and those in the text book. However, these discrepancies disappear if, during
calculation, all intermediary results are rounded to only 3 decimal places. In other words, the authors of the text book
seem to have calculated by hand to 3 decimal places, while the computer program uses 32 bit number representation to more
than 12 decimal places, and this causes the discrepancies in the results. StatsToDo recommends that results
from both sources be accepted.
 The results of data analysis for binomial data from StatsToDo were matched against the example provided
in Donner's text book, and found to be the same except for minor rounding errors as previously described.
 Donner's text book provides no data example for the analysis of normally distributed measurements, so there is no proof
(by matching against known results) that the computer program from StatsToDo is correct. However, using computer
generated data, the results obtained by the programs are close to those expected, so it is likely that the algorithms
provided by StatsToDo are correct. Users should nevertheless be aware of the absence of proof of correctness and
make their own decisions on whether to rely on the results.
Machin D, Campbell M, Fayers P, Pinol A (1997) Sample Size Tables for
Clinical Studies. Second Ed. Blackwell Science ISBN 0865428700 p. 27-28
Donner A and Klar N (2000) Design and Analysis of Cluster Randomisation Trials
in Health Research. Arnold London ISBN 0 340 69153 0
 Calculation for Intraclass Correlation Coefficient ρ and sample size for normally distributed measurements. p. 52-78
 Calculation for Intraclass Correlation Coefficient ρ and sample size for binomially distributed proportions. p. 62-63
 Analysis of data : cluster level analysis. p. 87-89
 Calculation for Intraclass Correlation Coefficient ρ and data analysis for normally distributed measurements. p. 111-116
 Calculation for Intraclass Correlation Coefficient ρ and data analysis for binomially distributed proportions. p. 85-87
