This page provides a simple utility that combines multiple groups, each summarised by n, mean, and SD, into a single group. Two algorithms are offered; they use different formulae but produce the same results.
Decomposition of means and Standard Deviations:
- For each group:
  - Σx = mean * n
  - Σx² = SD² * (n-1) + (Σx)² / n
- The values are then added together:
  - tn = sum of all n
  - tx = sum of all Σx
  - txx = sum of all Σx²
- The combined values are then calculated as:
  - Combined n = tn
  - Combined mean = tx / tn
  - Combined SD = sqrt((txx - tx² / tn) / (tn-1))
- When calculated, the results from the example data should look like the following:
| | n | mean | SD | Σx | Σx² |
|---|---|---|---|---|---|
| Grp1 | 10 | 11.8 | 2.4 | 118 | 1444.24 |
| Grp2 | 20 | 15.3 | 3.2 | 306 | 4876.36 |
| Grp3 | 15 | 8.4 | 4.1 | 126 | 1293.74 |
| Sum | tn = 45 | | | tx = 550 | txx = 7614.34 |
| Combined | 45 | 12.2222 | 4.5028 | | |
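The per-group formula for Σx² follows from the usual definition of the sample variance. A short derivation is sketched here, with x̄ denoting the group mean (a symbol not used in the original), together with the arithmetic for Grp1:

```latex
\begin{aligned}
SD^2 &= \frac{\sum (x - \bar{x})^2}{n-1} = \frac{\sum x^2 - (\sum x)^2 / n}{n-1} \\
\Rightarrow \quad \sum x^2 &= SD^2\,(n-1) + \frac{(\sum x)^2}{n} \\
\text{Grp1:} \quad \sum x^2 &= 2.4^2 \times 9 + \frac{118^2}{10} = 51.84 + 1392.4 = 1444.24
\end{aligned}
```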
Algorithm described by Cochrane
Cochrane's formula combines two groups of n, mean, and SD (n1, m1, s1 and n2, m2, s2) with the following calculations
- Combined n = n1 + n2
- Combined mean = (n1*m1 + n2*m2) / (n1 + n2)
- Combined Standard Deviation = sqrt(((n1-1)*s1*s1 + (n2-1)*s2*s2 + n1 * n2 / (n1 + n2) * (m1*m1 + m2*m2 - 2 * m1 * m2)) / (n1 + n2 - 1))
- When more than 2 groups are to be combined, the first two groups are combined first, the result is then combined with the third group, and so on sequentially with each subsequent group.
- When calculated, the results from the example data should look like the following:
| | n | Mean | Standard Deviation | n (combined with previous groups) | Mean (combined with previous groups) | Standard Deviation (combined with previous groups) |
|---|---|---|---|---|---|---|
| Row 1 | 10 | 11.8 | 2.4 | 10 | 11.8 | 2.4 |
| Row 2 | 20 | 15.3 | 3.2 | 30 | 14.1333 | 3.3634 |
| Row 3 | 15 | 8.4 | 4.1 | 45 | 12.2222 | 4.5028 |
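As a check on Row 2 of the table, the arithmetic for combining the first two groups can be written out. The cross term m1*m1 + m2*m2 - 2*m1*m2 in the formula equals (m1 - m2)², which is the form used in this sketch:

```latex
\begin{aligned}
n &= 10 + 20 = 30 \\
\text{mean} &= \frac{10 \times 11.8 + 20 \times 15.3}{30} = \frac{424}{30} \approx 14.1333 \\
SD &= \sqrt{\frac{9 \times 2.4^2 + 19 \times 3.2^2 + \frac{10 \times 20}{30}\,(11.8 - 15.3)^2}{29}}
    = \sqrt{\frac{51.84 + 194.56 + 81.6667}{29}} \approx 3.3634
\end{aligned}
```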
Please note:
Combining n, mean, and SD from different groups must be done with care, as the statistical assumption is that all the groups are merely sub-samples of a single group, and combining them merely restores the original single group.
In many cases this assumption is faulty, as the groups may be from different populations and sampled under different environments. It is therefore much safer to combine groups using the meta-analysis algorithm with the Random Effect Model, available in the Meta-analysis for Comparing Two Unpaired Groups Program Page, using the mean and Standard Error of the mean for each group.
For each group, the Standard Error of the mean is calculated as SE = SD / sqrt(n).
After combining the groups using the Random Effect Model, the Standard Deviation can be recalculated as SD = SE * sqrt(tn), where tn is the sum of sample sizes from all the groups. The results should look like the following. In the table, the SE of each group is calculated as SE = SD / sqrt(n) before meta-analysis, and the SD in the meta-analysis rows is calculated as SD = SE * sqrt(tn) after meta-analysis.
| | n | mean | SD | SE |
|---|---|---|---|---|
| Grp1 | 10 | 11.8 | 2.4 | 0.7589 |
| Grp2 | 20 | 15.3 | 3.2 | 0.7155 |
| Grp3 | 15 | 8.4 | 4.1 | 1.0586 |
| MetaAnalysis | | | | |
| Fixed Effect Model | 45 | 12.6299 | 3.1341 | 0.4672 |
| Random Effect Model | 45 | 11.8975 | 12.6684 | 1.8885 |
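The table above can be reproduced, to rounding, with inverse-variance weighting. The sketch below is an illustration under stated assumptions, not the code behind the Program Page: the text does not name the between-group variance estimator, and the DerSimonian-Laird estimator is assumed here because it reproduces the tabulated Random Effect values for this example. All variable names are illustrative.

```r
# Example data from the table above
n <- c(10, 20, 15)
m <- c(11.8, 15.3, 8.4)
s <- c(2.4, 3.2, 4.1)

se <- s / sqrt(n)   # Standard Error of each group mean
tn <- sum(n)        # total sample size

# Fixed Effect Model: weight each mean by the inverse of its variance
w       <- 1 / se^2
fe_mean <- sum(w * m) / sum(w)
fe_se   <- sqrt(1 / sum(w))

# Random Effect Model, ASSUMING the DerSimonian-Laird estimate of tau^2
q       <- sum(w * (m - fe_mean)^2)                 # heterogeneity statistic
tau2    <- max(0, (q - (length(m) - 1)) / (sum(w) - sum(w^2) / sum(w)))
w_re    <- 1 / (se^2 + tau2)                        # weights including tau^2
re_mean <- sum(w_re * m) / sum(w_re)
re_se   <- sqrt(1 / sum(w_re))

# Convert each pooled SE back to an SD, as described in the text
fe_sd <- fe_se * sqrt(tn)
re_sd <- re_se * sqrt(tn)

print(round(c(tn, fe_mean, fe_sd, fe_se), 4))
print(round(c(tn, re_mean, re_sd, re_se), 4))
```

The back-converted SD values may differ from the table in the last decimal place, depending on whether SE is rounded before it is multiplied by sqrt(tn).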
References:
Altman DG, Machin D, Bryant TN and Gardner MJ. (2000) Statistics with Confidence, Second Edition. BMJ Books. ISBN 0 7279 1375 1. p. 28-31
Higgins JPT, Li T, Deeks JJ (editors). Chapter 6: Choosing effect measures and computing estimates of effect. In: Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch VA (editors). Cochrane Handbook for Systematic Reviews of Interventions version 6.0 (updated July 2019). Cochrane, 2019. Available from https://training.cochrane.org/handbook/current/chapter-06#section-6-5-2 (table 6.5.2a)
The two algorithms for combining means and Standard Deviations from 2 or more groups are presented as R code, for those who wish to check the validity of the calculations, incorporate the algorithms into their own applications, or are merely interested.
The R code is designed to run from the source panel of R Studio. Those unfamiliar with R should read the R Explained Page to learn how to set up and run R Studio.
The code is divided into 3 sections. Section 1 contains data entry, section 2 the first option, using the algorithm in StatsToDo, and section 3 the second option, using Cochrane's formulation.
The data: the same data as in the Javascript program is used, and a data frame is created
myDat = ("
n mean sd
10 11.8 2.4
20 15.3 3.2
15 8.4 4.1
")
myDataFrame <- read.table(textConnection(myDat),header=TRUE) # conversion to data frame
#myDataFrame # optional check input
Algorithm 1: Decomposition of mean and SD to ex (Σx) and exx (Σx²)
nr = nrow(myDataFrame) # number of rows
ex <- rep(0,nr) # array to contain Σx
exx <- rep(0,nr) # array to contain Σx²
tn = 0 # total n
tx = 0 # total Σx
txx = 0 # total Σx²
for(i in 1:nr)
{
ex[i] = myDataFrame$n[i] * myDataFrame$mean[i]
exx[i] = myDataFrame$sd[i]^2 * (myDataFrame$n[i]-1) + ex[i]^2 / myDataFrame$n[i]
tn = tn + myDataFrame$n[i]
tx = tx + ex[i]
txx = txx + exx[i]
}
# concatenate Σx and Σx² to data frame
myDataFrame$ex <- ex
myDataFrame$exx <- exx
myDataFrame # show data frame
# Calculate combined values
tMean = tx / tn
tSD = sqrt((txx-tx^2/tn)/(tn-1))
print("Combined n, mean, and SD")
print(c(tn,tMean,tSD))
The results are as follows
> myDataFrame
n mean sd ex exx
1 10 11.8 2.4 118 1444.24
2 20 15.3 3.2 306 4876.36
3 15 8.4 4.1 126 1293.74
>
[1] "Combined n, mean, and SD"
[1] 45.000000 12.222222 4.502822
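The loop in Algorithm 1 can also be expressed without explicit iteration, since R arithmetic is vectorised. The following is a minimal sketch of that idea; `combine_groups` is a hypothetical helper name and is not part of the original program.

```r
# Hypothetical helper: combine any number of groups in one vectorised step.
# n, mean and sd are numeric vectors with one element per group.
combine_groups <- function(n, mean, sd) {
  ex  <- n * mean                     # sum of x for each group
  exx <- sd^2 * (n - 1) + ex^2 / n    # sum of x squared for each group
  tn  <- sum(n)                       # total n
  tx  <- sum(ex)                      # total sum of x
  txx <- sum(exx)                     # total sum of x squared
  c(n = tn, mean = tx / tn, sd = sqrt((txx - tx^2 / tn) / (tn - 1)))
}

res <- combine_groups(n    = c(10, 20, 15),
                      mean = c(11.8, 15.3, 8.4),
                      sd   = c(2.4, 3.2, 4.1))
print(res)
```

Returning a named vector keeps the three combined values together, so they can be read by name, for example `res[["sd"]]`.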
Algorithm 2: Cochrane's formula
Ref: https://handbook-5-1.cochrane.org/chapter_7/table_7_7_a_formulae_for_combining_groups.htm
Using the same data as in the Javascript program and algorithm 1
nr = nrow(myDataFrame) # number of rows
newN <- rep(0,nr) # array for combined n of this and previous group
newMean <- rep(0,nr) # array for combined mean of this and previous group
newSD <- rep(0,nr) # array for combined Standard Deviations of this and previous group
# Prime the first row by copying from data frame
newN[1] = myDataFrame$n[1]
newMean[1] = myDataFrame$mean[1]
newSD[1] = myDataFrame$sd[1]
# designate values of current row as "old"
oldN = newN[1]
oldMean = newMean[1]
oldSD = newSD[1]
# Combining each pair of rows from row 2 onwards
for(i in seq_len(nr)[-1]) # rows 2..nr; no iterations when there is only one row
{
# data from row
n = myDataFrame$n[i]
mean = myDataFrame$mean[i]
sd = myDataFrame$sd[i]
#combining with old values (Cochrane's algorithm)
newN[i] = oldN + n
newMean[i] = (oldN * oldMean + n * mean) / (oldN + n)
newSD[i] = sqrt(((oldN-1)*oldSD^2 + (n-1)*sd^2 + oldN * n / (oldN + n) * (oldMean^2 + mean^2
- 2 * oldMean * mean)) / (oldN + n -1))
# designate values of current row as "old"
oldN = newN[i]
oldMean = newMean[i]
oldSD = newSD[i]
}
# Concatenate columns of combined values to data frame
myDataFrame$newN <- newN
myDataFrame$newMean <- newMean
myDataFrame$newSD <- newSD
myDataFrame # display dataframe containing original and combined results
The results are as follows
> myDataFrame
n mean sd newN newMean newSD
1 10 11.8 2.4 10 11.80000 2.400000
2 20 15.3 3.2 30 14.13333 3.363427
3 15 8.4 4.1 45 12.22222 4.502822
The final results from both algorithms are the same.
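As a compact alternative to the loop in Algorithm 2, the pairwise step can be wrapped in a function and folded over the groups with base R's `Reduce`. This is only a sketch: `combine_pair`, `groups` and `running` are illustrative names, not part of the original program, and the cross term is written as (m1 - m2)^2, which is algebraically equal to m1^2 + m2^2 - 2*m1*m2.

```r
# Hypothetical helper: combine two groups, each given as c(n, mean, sd)
combine_pair <- function(a, b) {
  n1 <- a[1]; m1 <- a[2]; s1 <- a[3]
  n2 <- b[1]; m2 <- b[2]; s2 <- b[3]
  n <- n1 + n2
  m <- (n1 * m1 + n2 * m2) / n
  # Cochrane's formula, with the cross term written as (m1 - m2)^2
  s <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2 +
             n1 * n2 / n * (m1 - m2)^2) / (n - 1))
  c(n, m, s)
}

groups <- list(c(10, 11.8, 2.4), c(20, 15.3, 3.2), c(15, 8.4, 4.1))

# accumulate = TRUE keeps every intermediate combination, one per row
running <- Reduce(combine_pair, groups, accumulate = TRUE)
print(do.call(rbind, running))
```

The last element of `running` holds the combination of all groups, and the earlier elements correspond to the intermediate rows of the results table.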