Explanations
The program on this page performs the Analysis of Covariance for parametric measurements between two groups, with one covariate. It is the same program as that in Compare2Regs.php, but uses summary data instead of the original measurements.
Javascript Program
The data entry for this program consists of, for each of groups 1 and 2: the sample size (n), the means and standard deviations of the independent (x) and dependent (y) variables, and the regression coefficient (b). The program calculates all other necessary parameters from these, as shown in Part 2 below.
Example: We have read a report on the birth weights of boys and girls, which found that they did not differ significantly. We wish to adjust the comparison for gestational age. The reported parameters are those entered in Part 1 of the program below. The initial comparison (grp 1 (girls) - grp 2 (boys)) shows no difference between the two groups.

- The difference between the two regression coefficients (b1 - b2) is -1.6 grams/week, p (α) = 0.94, not statistically significant. This means that the growth rates of boys and girls do not differ significantly.
- The two regression coefficients are therefore combined into a single coefficient for all the data: 186.3 grams per week.
- The two mean birth weights are adjusted using the combined regression coefficient. The difference between the adjusted means (grp 1 - grp 2) is 168 grams, with 95% CI = 84 to 251 grams.

The conclusions that can be drawn are that, although the growth rates of boys and girls are similar, the mean birth weights differ significantly once gestational age is taken into account.
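The adjustment in the last step amounts to removing, from the raw difference in means, the part explained by the difference in mean gestational age. A quick arithmetic sketch in Python, using the rounded figures quoted above (so the result is approximate; variable names are mine):

```python
# Adjusted difference in mean birth weight (grp 1 - grp 2),
# using the rounded figures quoted in the text.
mean_y1, mean_y2 = 3268, 3119   # mean birth weights (grams)
mean_x1, mean_x2 = 38.0, 38.1   # mean gestational ages (weeks)
b_common = 186.3                # combined regression coefficient (grams/week)

# ANCOVA-adjusted difference: subtract the part of the raw difference
# attributable to the difference in mean gestational age.
adj_diff = (mean_y1 - mean_y2) - b_common * (mean_x1 - mean_x2)
print(round(adj_diff))  # 168
```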
References: Armitage P. (1980) Statistical Methods in Medical Research. Blackwell Scientific Publications, Oxford UK. ISBN 0 632 05430 1. p. 279-301.
The following is one continuous program, broken into parts to make the numerous results easier to follow.
Part 1: Program parameters: consists of the values used in the two groups

# data entered as vectors of 2 elements (grp 1 and grp 2)
arN = c(10, 12)          # sample size
arMeanX = c(38.0, 38.1)  # mean x
arSdX = c(1.8, 2.0)      # SD x
arMeanY = c(3268, 3119)  # mean Y
arSdY = c(351.5, 380.3)  # SD y
arB = c(185.3, 186.9)    # slope, regression coefficient b

Part 2: Create all other parameters from the entered data in the two groups
# Calculate the other parameters
arA = c(0, 0) # constant a
arSsx = c(0, 0) # sum of squares x
arSsy = c(0, 0) # sum of squares y
arSxy = c(0, 0) # sum products
arRho = c(0, 0) # correlation coefficient rho
arT = c(0, 0) # student t
arP = c(0, 0) # significance p
for(j in 1:2)
{
arA[j] = arMeanY[j] - arB[j] * arMeanX[j]
arSsx[j] = arSdX[j]^2 * (arN[j] - 1)
arSsy[j] = arSdY[j]^2 * (arN[j] - 1)
arSxy[j] = arB[j] * arSsx[j]
arRho[j] = arSxy[j] / sqrt(arSsx[j] * arSsy[j])
arT[j] = arRho[j] * sqrt((arN[j] - 2) / (1 - arRho[j]^2))
arP[j] = 1 - pt(arT[j], arN[j] - 2) # significance p (1 tail)
}
# Output of all parameter vectors
arN # sample size
arMeanX # mean x
arSdX # SD x
arMeanY # mean Y
arSdY # SD y
# slopes
arB # slope, regression coefficient b
arA # constant a
# correlation coefficients
arRho # correlation coefficient rho
arT # student t
arP # significance p
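The derivations in the loop above can be cross-checked outside R. A minimal Python sketch (variable names are mine, not part of the original program) reproducing the intercepts, correlation coefficients, and t values from the same summary data:

```python
import math

# Cross-check of the derived parameters, mirroring the R loop above.
n      = [10, 12]          # sample sizes
mean_x = [38.0, 38.1]      # mean x (gestational age, weeks)
sd_x   = [1.8, 2.0]        # SD x
mean_y = [3268, 3119]      # mean y (birth weight, grams)
sd_y   = [351.5, 380.3]    # SD y
b      = [185.3, 186.9]    # regression coefficients

a, rho, t = [], [], []
for j in range(2):
    a.append(mean_y[j] - b[j] * mean_x[j])       # intercept a = mean y - b * mean x
    ssx = sd_x[j] ** 2 * (n[j] - 1)              # sum of squares of x
    ssy = sd_y[j] ** 2 * (n[j] - 1)              # sum of squares of y
    sxy = b[j] * ssx                             # sum of products, from b = Sxy / Ssx
    rho.append(sxy / math.sqrt(ssx * ssy))       # correlation coefficient
    t.append(rho[j] * math.sqrt((n[j] - 2) / (1 - rho[j] ** 2)))  # Student's t

print([round(v, 2) for v in a])    # [-3773.4, -4001.89]
print([round(v, 4) for v in rho])  # [0.9489, 0.9829]
```

These match the R output listed below (arA, arRho, arT).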
The parameters (entered and calculated), as printed by the program, are
> arN # sample size
[1] 10 12
> arMeanX # mean x
[1] 38.0 38.1
> arSdX # SD x
[1] 1.8 2.0
> arMeanY # mean Y
[1] 3268 3119
> arSdY # SD y
[1] 351.5 380.3
> # slopes
> arB # slope, regression coefficient b
[1] 185.3 186.9
> arA # constant a
[1] -3773.40 -4001.89
> # correlation coefficients
> arRho # correlation coefficient rho
[1] 0.9489047 0.9829082
> arT # student t for correlation
[1] 8.505146 16.883720
> arP # significance p (1 tail)
[1] 1.401494e-05 5.581409e-09

Part 3: Compare the two xs and ys

# compare x
diffX = arMeanX[1] - arMeanX[2]
dfX = arN[1] + arN[2] - 2
pooledX = sqrt(((arN[1]-1)*arSdX[1]^2 + (arN[2]-1)*arSdX[2]^2) / dfX)
seX = pooledX * sqrt(1/arN[1] + 1/arN[2])
llX = diffX - 1.96 * seX
ulX = diffX + 1.96 * seX
# compare y
diffY = arMeanY[1] - arMeanY[2]
dfY = arN[1] + arN[2] - 2
pooledY = sqrt(((arN[1]-1)*arSdY[1]^2 + (arN[2]-1)*arSdY[2]^2) / dfY)
seY = pooledY * sqrt(1/arN[1] + 1/arN[2])
llY = diffY - 1.96 * seY
ulY = diffY + 1.96 * seY
# output 95% CI difference in x and y
c(llX, ulX) # 95% CI difference in x
c(llY, ulY) # 95% CI difference in y

The results are as follows

> # output 95% CI difference in x and y
> c(llX, ulX) # 95% CI difference in x
[1] -1.705087 1.505087
> c(llY, ulY) # 95% CI difference in y
[1] -159.5142 457.5142

Part 4: Comparing the two regressions
# comparing the two slopes
diffSlope = arB[1] - arB[2] # diff slope
s2 = ((arSsy[1] - arSxy[1] * arSxy[1] / arSsx[1]) +
(arSsy[2] - arSxy[2] * arSxy[2] / arSsx[2])) /
(arN[1] + arN[2] - 4)
seSlope = sqrt(s2 * (1 / arSsx[1] + 1 / arSsx[2])) # SE of difference
tSlope = diffSlope / seSlope # t test
dfSlope = arN[1] + arN[2] - 4; # degrees of freedom
pSlope = (1 - pt(abs(tSlope), dfSlope)) * 2 # Type I error 2 tail
# output comparison 2 slopes
diffSlope # difference between slopes
seSlope # standard error of difference
tSlope # t
dfSlope # deg freedom
pSlope # significance p (2 tail)
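The slope comparison can also be reproduced outside R from the summary statistics alone. A minimal Python sketch (variable names are mine; the sums of squares and products are rebuilt as in Part 2):

```python
import math

# Cross-check of the slope comparison (Part 4) from the summary data.
n1, n2 = 10, 12
ssx1, ssx2 = 1.8**2 * (n1 - 1), 2.0**2 * (n2 - 1)      # sums of squares of x
ssy1, ssy2 = 351.5**2 * (n1 - 1), 380.3**2 * (n2 - 1)  # sums of squares of y
sxy1, sxy2 = 185.3 * ssx1, 186.9 * ssx2                # sums of products (b * Ssx)

diff_slope = 185.3 - 186.9                             # difference between slopes
# Pooled residual variance about the two separate regression lines
s2 = ((ssy1 - sxy1**2 / ssx1) + (ssy2 - sxy2**2 / ssx2)) / (n1 + n2 - 4)
se_slope = math.sqrt(s2 * (1 / ssx1 + 1 / ssx2))       # SE of the difference
t_slope = diff_slope / se_slope                        # t with n1 + n2 - 4 df

print(round(se_slope, 2), round(t_slope, 3))  # 22.84 -0.07
```

This agrees with the R output listed next.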
The results are as follows
> # output comparison 2 slopes
> diffSlope # difference between slopes
[1] -1.6
> seSlope # standard error of difference
[1] 22.83804
> tSlope # t
[1] -0.07005856
> dfSlope # deg freedom
[1] 18
> pSlope # significance p (2 tail)
[1] 0.9449195

Part 5: Combine the two regression lines and compare the adjusted mean y values
# combining the two slopes
commonSlope = (arSxy[1] + arSxy[2]) / (arSsx[1] + arSsx[2]) # common slope
grandMean = (arMeanX[1] * arN[1] + arMeanX[2] * arN[2]) / (arN[1] + arN[2]) # mean of x
adjMean1 = arMeanY[1] + commonSlope * (grandMean - arMeanX[1]) # adjusted mean y grp 1
adjMean2 = arMeanY[2] + commonSlope * (grandMean - arMeanX[2]) # adjusted mean y group 2
diffMean = arMeanY[1] - arMeanY[2] - commonSlope * (arMeanX[1] - arMeanX[2]) # adjusted diff
s2 = (arSsy[1] + arSsy[2] - (arSxy[1] + arSxy[2]) * (arSxy[1] + arSxy[2]) / (arSsx[1] + arSsx[2])) /
(arN[1] + arN[2] - 3);
varMean = s2 * (1.0 / arN[1] + 1.0 / arN[2] + (arMeanX[1] - arMeanX[2]) * (arMeanX[1] - arMeanX[2]) /
(arSsx[1] + arSsx[2])); # variance of difference
seMean = sqrt(varMean) # Standard Error of difference
tMean = diffMean / seMean # t
dfMean = arN[1] + arN[2] - 3; # degrees of freedom
pMean = (1 - pt(abs(tMean), dfMean)) * 2 # p Type I Error (2 tail)
# 95% CI
t = abs(qt(0.025,dfMean)) # t value for p=0.05 2 tail
ll = diffMean - t * seMean # lower limit 95% CI
ul = diffMean + t * seMean # upper limit 95% CI
# output of combined data
commonSlope # common slope
diffMean # difference between adjusted means
seMean # Standard Error of difference
tMean # t
dfMean # df
pMean # significance p 2 tail
c(ll,ul) # 95% confidence interval of adjusted difference in y
The results are as follows
> # output of combined data
> commonSlope # common slope
[1] 186.2623
> diffMean # difference between adjusted means
[1] 167.6262
> seMean # Standard Error of difference
[1] 39.8789
> tMean # t
[1] 4.203381
> dfMean # df
[1] 19
> pMean # significance p 2 tail
[1] 0.0004815876
> c(ll,ul) # 95% confidence interval of adjusted difference in y
[1] 84.15872 251.09373
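Finally, the combined slope and adjusted difference can be cross-checked outside R. A Python sketch from the same summary data (variable names are mine; the critical t of 2.093 for 19 degrees of freedom is taken from tables, since the Python standard library has no t quantile function):

```python
import math

# Cross-check of Part 5: common slope and ANCOVA-adjusted difference.
n1, n2 = 10, 12
mean_x1, mean_x2 = 38.0, 38.1
mean_y1, mean_y2 = 3268, 3119
ssx1, ssx2 = 1.8**2 * (n1 - 1), 2.0**2 * (n2 - 1)      # sums of squares of x
ssy1, ssy2 = 351.5**2 * (n1 - 1), 380.3**2 * (n2 - 1)  # sums of squares of y
sxy1, sxy2 = 185.3 * ssx1, 186.9 * ssx2                # sums of products

common_slope = (sxy1 + sxy2) / (ssx1 + ssx2)           # pooled slope
diff_mean = (mean_y1 - mean_y2) - common_slope * (mean_x1 - mean_x2)

# Residual variance about the single common slope
s2 = (ssy1 + ssy2 - (sxy1 + sxy2)**2 / (ssx1 + ssx2)) / (n1 + n2 - 3)
var_mean = s2 * (1/n1 + 1/n2 + (mean_x1 - mean_x2)**2 / (ssx1 + ssx2))
se_mean = math.sqrt(var_mean)                          # SE of adjusted difference

t_crit = 2.093                                         # qt(0.975, df = 19), from tables
ll, ul = diff_mean - t_crit * se_mean, diff_mean + t_crit * se_mean
print(round(common_slope, 4), round(diff_mean, 1))  # 186.2623 167.6
print(round(ll, 1), round(ul, 1))  # 84.2 251.1
```

The small differences in the last decimals of the confidence limits come from rounding the critical t value.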
