CombineMeanSD

Copyright @2020.
All Rights Reserved.

StatsToDo: Combining n, mean, and Standard Deviation from Multiple Groups

Explanations
This page is a simple utility to combine multiple groups of n, mean, and SD into a single group. Two algorithms are offered, both producing the same results, but using different formulae.
Decomposition of means and Standard Deviation:

For each group :

Σx = mean * n
Σx² = SD²(n-1)+((Σx)²/n)

The values are then added together

tn = sum of all (n)
tx = sum of all Σx
txx = sum of all Σx²

The combined values are then calculated as

Combined n = tn
Combined mean = tx / tn
Combined SD = sqrt((txx-tx²/tn) / (tn-1))
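
The expression for Σx² used above is the usual sample-variance formula rearranged. A short derivation, written in LaTeX and using only the quantities already defined on this page, may make the step easier to check:

```latex
\begin{align*}
SD^2 &= \frac{\sum x^2 - (\sum x)^2 / n}{n - 1} \\
\Rightarrow\quad \sum x^2 &= SD^2\,(n - 1) + \frac{(\sum x)^2}{n},
\qquad \text{where } \sum x = n \cdot \text{mean}
\end{align*}
```

Because Σx and Σx² are plain sums over the observations, they can be added across groups. The same variance formula applied to the totals tn, tx, and txx then gives the combined SD.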

When calculated the results from the example data should look like the following

            n     mean      SD       Σx      Σx²
Grp1        10    11.8      2.4      118     1444.24
Grp2        20    15.3      3.2      306     4876.36
Grp3        15    8.4       4.1      126     1293.74
            tn                       tx      txx
Sum         45                       550     7614.34
Combined    45    12.2222   4.5028

Algorithm described by Cochrane
Cochrane's formula combines two groups of n, mean, and SD (n1, m1, s1 and n2, m2, s2) with the following calculations

Combined n = n1 + n2
Combined mean = (n1*m1 + n2*m2) / (n1 + n2)
Combined Standard Deviation = sqrt(((n1-1)*s1*s1 + (n2-1)*s2*s2 + n1 * n2 / (n1 + n2) * (m1*m1 + m2*m2 - 2 * m1 * m2)) / (n1 + n2 - 1))
When more than 2 groups are to be combined, the first two groups are combined first, the result is then combined with the third group, and so on sequentially with each subsequent group.
When calculated the results from the example data should look like the following

         Individual Groups            Combined with previous Groups
         n     Mean    SD             n     Mean       SD
Row 1    10    11.8    2.4            10    11.8       2.4
Row 2    20    15.3    3.2            30    14.1333    3.3634
Row 3    15    8.4     4.1            45    12.2222    4.5028
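
The sequential combination shown in the table above can be sketched compactly in R. This is only an illustration: combine_pair is a hypothetical helper name, and the term (m1 - m2)^2 is used in place of the algebraically identical m1*m1 + m2*m2 - 2*m1*m2.

```r
# Combine two groups of (n, mean, sd) using the pairwise formula given above.
# combine_pair is a hypothetical helper name, not part of the StatsToDo code.
combine_pair <- function(a, b) {
  n <- a[["n"]] + b[["n"]]
  m <- (a[["n"]] * a[["mean"]] + b[["n"]] * b[["mean"]]) / n
  # (m1 - m2)^2 equals m1*m1 + m2*m2 - 2*m1*m2
  ss <- (a[["n"]] - 1) * a[["sd"]]^2 +
        (b[["n"]] - 1) * b[["sd"]]^2 +
        a[["n"]] * b[["n"]] / n * (a[["mean"]] - b[["mean"]])^2
  c(n = n, mean = m, sd = sqrt(ss / (n - 1)))
}

# Example data from this page
groups <- list(
  c(n = 10, mean = 11.8, sd = 2.4),
  c(n = 20, mean = 15.3, sd = 3.2),
  c(n = 15, mean = 8.4,  sd = 4.1)
)

# accumulate = TRUE keeps every intermediate result, one per table row
steps <- Reduce(combine_pair, groups, accumulate = TRUE)
print(round(do.call(rbind, steps), 4))
```

Each element of steps corresponds to one row of the "Combined with previous Groups" columns, so the last element holds the overall combined n, mean, and SD.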

Please Note :
Combining n, mean, and SD from different groups must be done with care, as the statistical assumption is that all the groups are merely sub-samples of a single group, and that combining them merely restores the original single group.
In many cases this assumption is faulty, as the groups may come from different populations and may have been sampled under different conditions. It is therefore much safer to combine groups using the meta-analysis algorithm with the Random Effect Model, available in the MetaAnalysis program, using the mean and Standard Error of the mean for each group.
The Standard Error of the mean is calculated as SE = SD / sqrt(n) of each group.
After combining them using the Random Effect Model, the Standard Deviation can be recalculated as SD = SE * sqrt(tn), where tn is the sum of sample sizes from all the groups. The results should look like the following. In the table, the SE of each group is calculated as SE = SD / sqrt(n) before the meta-analysis, and the SD of each meta-analysis row is calculated as SD = SE * sqrt(tn) after the meta-analysis.

                       n     mean       SD         SE
Grp1                   10    11.8       2.4        0.7589
Grp2                   20    15.3       3.2        0.7155
Grp3                   15    8.4        4.1        1.0586
MetaAnalysis
Fixed Effect Model     45    12.6299    3.1341     0.4672
Random Effect Model    45    11.8975    12.6684    1.8885
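
The meta-analysis rows in the table above can be approximated with a few lines of R. This is a sketch under stated assumptions, not the MetaAnalysis program itself: the page does not say which random-effects estimator that program uses, so the DerSimonian-Laird estimate of the between-group variance is assumed here. With that assumption the figures agree with the table to about three decimal places.

```r
# Example data from the table above
n  <- c(10, 20, 15)
m  <- c(11.8, 15.3, 8.4)
s  <- c(2.4, 3.2, 4.1)
se <- s / sqrt(n)            # SE = SD / sqrt(n) for each group
tn <- sum(n)                 # total sample size

# Fixed Effect Model: inverse-variance weights
w        <- 1 / se^2
fix_mean <- sum(w * m) / sum(w)
fix_se   <- sqrt(1 / sum(w))

# Random Effect Model.
# ASSUMPTION: DerSimonian-Laird estimate of the between-group variance tau2;
# the source page does not name the estimator.
q_stat  <- sum(w * (m - fix_mean)^2)
tau2    <- max(0, (q_stat - (length(m) - 1)) / (sum(w) - sum(w^2) / sum(w)))
w_re    <- 1 / (se^2 + tau2)
re_mean <- sum(w_re * m) / sum(w_re)
re_se   <- sqrt(1 / sum(w_re))

result <- data.frame(
  model = c("Fixed", "Random"),
  n     = tn,
  mean  = c(fix_mean, re_mean),
  sd    = c(fix_se, re_se) * sqrt(tn),   # SD = SE * sqrt(tn)
  se    = c(fix_se, re_se)
)
print(result)
```

The large SD under the Random Effect Model reflects the between-group variance being added to each group's own variance before weighting.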

References :
Altman DG, Machin D, Bryant TN and Gardner MJ. (2000) Statistics with Confidence Second Edition. BMJ Books ISBN 0 7279 1375 1. p. 28-31
Higgins JPT, Li T, Deeks JJ (editors). Chapter 6: Choosing effect measures and computing estimates of effect. In: Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch VA (editors). Cochrane Handbook for Systematic Reviews of Interventions version 6.0 (updated July 2019). Cochrane, 2019. Available from https://training.cochrane.org/handbook/current/chapter-06#section-6-5-2 (table 6.5.2a)
Javascript Program

The input data consists of 3 numerical columns
   Column 1 is sample size n
   Column 2 is mean value
   Column 3 is Standard Deviation value

R Codes
The two algorithms for combining means and Standard Deviations from 2 or more groups are presented as R code, for those who wish to check the validity of the calculations, incorporate the algorithms into their own applications, or are merely interested.
The R code is designed to run from the source panel of R Studio.
The code is divided into 3 sections, each followed by its results where applicable. Section 1 contains data entry, section 2 the first option, using the algorithm in StatsToDo, and section 3 the second option, using Cochrane's formulation.
The data. The same data as in the Javascript program is used, and a data frame is created
myDat = ("
n mean sd
10 11.8 2.4
20 15.3 3.2
15 8.4 4.1
")
myDataFrame <- read.table(textConnection(myDat),header=TRUE) # conversion to data frame
#myDataFrame # optional check input
Algorithm 1: Decomposition of mean and SD to ex (Σx) and exx (Σx²)
nr = nrow(myDataFrame)   # number of rows
ex <- rep(0,nr)          # array to contain Σx
exx <- rep(0,nr)         # array to contain Σx²
tn = 0                   # total n
tx = 0                   # total Σx
txx = 0                  # total Σx²
for(i in 1:nr)
{
  ex[i] = myDataFrame$n[i] * myDataFrame$mean[i]
  exx[i] = myDataFrame$sd[i]^2 * (myDataFrame$n[i]-1) + ex[i]^2 / myDataFrame$n[i]
  tn = tn + myDataFrame$n[i]
  tx = tx + ex[i]
  txx = txx + exx[i]
}
# concatenate Σx and Σx² to data frame
myDataFrame$ex <- ex
myDataFrame$exx <- exx
myDataFrame   # show data frame
# Calculate combined values
tMean = tx / tn
tSD = sqrt((txx-tx^2/tn)/(tn-1))
print("Combined n, mean, and SD")
print(c(tn,tMean,tSD))
The results are as follows
> myDataFrame
   n mean  sd  ex     exx
1 10 11.8 2.4 118 1444.24
2 20 15.3 3.2 306 4876.36
3 15  8.4 4.1 126 1293.74
>
[1] "Combined n, mean, and SD"
[1] 45.000000 12.222222  4.502822
Algorithm 2: Cochrane's formula
Ref: https://handbook-5-1.cochrane.org/chapter_7/table_7_7_a_formulae_for_combining_groups.htm
Using the same data as in the Javascript program and algorithm 1
nr = nrow(myDataFrame)   # number of rows
newN <- rep(0,nr)        # array for combined n of this and previous group
newMean <- rep(0,nr)     # array for combined mean of this and previous group
newSD <- rep(0,nr)       # array for combined Standard Deviations of this and previous group
# Prime the first row by copying from data frame
newN[1] = myDataFrame$n[1]
newMean[1] = myDataFrame$mean[1]
newSD[1] = myDataFrame$sd[1]
# designate values of current row as "old"
oldN = newN[1]
oldMean = newMean[1]
oldSD = newSD[1]
# Combining each pair of rows from row 2 onwards
for(i in 2:nr)
{
  # data from row
  n = myDataFrame$n[i]
  mean = myDataFrame$mean[i]
  sd = myDataFrame$sd[i]
  # combining with old values (Cochrane's algorithm)
  newN[i] = oldN + n
  newMean[i] = (oldN * oldMean + n * mean) / (oldN + n)
  newSD[i] = sqrt(((oldN-1)*oldSD^2 + (n-1)*sd^2 + oldN * n / (oldN + n) * (oldMean^2 + mean^2 - 2 * oldMean * mean)) / (oldN + n -1))
  # designate values of current row as "old"
  oldN = newN[i]
  oldMean = newMean[i]
  oldSD = newSD[i]
}
# Concatenate columns of combined values to data frame
myDataFrame$newN <- newN
myDataFrame$newMean <- newMean
myDataFrame$newSD <- newSD
myDataFrame   # display dataframe containing original and combined results
The results are as follows
> myDataFrame
   n mean  sd newN  newMean    newSD
1 10 11.8 2.4   10 11.80000 2.400000
2 20 15.3 3.2   30 14.13333 3.363427
3 15  8.4 4.1   45 12.22222 4.502822

The final results from both algorithms are the same.
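
Both algorithms give the same answer because each reconstructs the mean and SD of the pooled observations. A quick way to see this is to start from raw data, summarise it by group, combine the summaries, and compare with the statistics of the whole sample. The sketch below uses made-up random values (not data from this page) and the decomposition formulae of Algorithm 1.

```r
# Made-up raw observations, split into three groups of unequal size
set.seed(1)
x   <- rnorm(45, mean = 12, sd = 4)
grp <- rep(1:3, times = c(10, 20, 15))

# Per-group summaries: the only information the combining algorithms see
n <- as.numeric(tapply(x, grp, length))
m <- as.numeric(tapply(x, grp, mean))
s <- as.numeric(tapply(x, grp, sd))

# Decomposition into sums, as in Algorithm 1
ex  <- n * m                          # Σx for each group
exx <- s^2 * (n - 1) + ex^2 / n       # Σx² for each group
tn  <- sum(n)
combined_mean <- sum(ex) / tn
combined_sd   <- sqrt((sum(exx) - sum(ex)^2 / tn) / (tn - 1))

# Compare with statistics computed directly from the pooled raw data
print(all.equal(combined_mean, mean(x)))
print(all.equal(combined_sd, sd(x)))
```

Both comparisons should agree up to floating-point rounding. The agreement holds only because the groups here really are sub-samples of one sample, which is the assumption discussed in the note above.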