This page provides a number of commonly used programs that compare measurements in 2 or more unpaired groups. The structure of this page is as follows:
Explanations: with the following subpanels
  Parametric comparisons: tests for the equality of variances (the F ratio of two variances, Bartlett's Test, and the Levene Test), the two sample t test, one way analysis of variance, and the post-hoc comparisons that follow it (Tukey's, Bonferroni's (also called Fisher's), and Scheffe's least significant differences, and the unadjusted confidence interval for the difference between means)
  Non-parametric comparisons: the Median Test, the Kruskal-Wallis analysis of variance by rank with post-hoc comparison of mean ranks, the Wilcoxon Mann Whitney Test, and the Robust Signed Rank Test (also known as the Mann-Whitney U Test)
  The Permutation Test for comparing two small groups
Javascript Programs: All the programs listed above are organized in a similar structure in the Javascript sub-panel. These programs are web based, requiring no more than the insertion of the data to be analysed and a click of the button.
R Codes: All the programs listed are also available as R code, and similarly organized. This allows users to copy the code, paste it into the R programming environment, and edit and use it as required.
References: The text books and published references from which the programs were developed.
This subpanel describes parametric statistical programs that compare data in unpaired groups, assuming that the data are continuous measurements with a normal (Gaussian) distribution.
Tests for equality of variance

In most parametric comparisons there is an assumption that the data in the different groups are from the same population, and that they differ only because of the treatment received in each group. In other words, the assumption is that the variances of the groups are equal. It is prudent that this assumption is tested and confirmed before the results of other comparisons are interpreted, and 3 programs are provided on this page for this purpose.

F Ratio of the two variances: This can be carried out prior to the two sample t test. The F ratio is the ratio of the variances in the two groups (the larger / the smaller). The probability for F is the Probability of Type I Error, and can be used to determine whether a significant difference exists. The test can be demonstrated here, using the example provided in the Javascript panel. If groups 1 and 2 in the example data are used for comparison, the Standard Deviation (and therefore the variance) of group 2 is larger than that of group 1. The F ratio test can be manually calculated as follows:

The smaller variance is from group 1, so variance = SD² = 1.04² = 1.08, with df = n - 1 = 10 - 1 = 9
F = 4.34 / 1.08 = 4.19, and the probability of F, with 19 and 9 degrees of freedom, is p (α) = 0.02
The conclusion is that the variances of the two groups are significantly unequal.
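For users working in R, the variance ratio test can also be carried out with the built-in var.test() function, which reports F, its two degrees of freedom, and a two tailed p value (so the order of the two groups does not matter). This is a minimal sketch; the two vectors below are illustrative placeholders, not the example data from the Javascript panel.

# F ratio test of equality of two variances using R's built-in var.test()
# grpA and grpB are illustrative vectors only
grpA <- c(10.2, 11.1, 9.8, 10.5, 11.4, 10.9, 9.9, 10.7, 11.0, 10.4)
grpB <- c(12.3, 8.9, 14.1, 9.5, 13.2, 11.8, 8.7, 12.9, 10.1, 13.6)
var.test(grpA, grpB)   # F = var(grpA) / var(grpB), with df and two tailed p value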
Bartlett's Test and the Levene Test can compare variances across 2 or more groups. Comparing all 3 groups of the example data with Bartlett's Test: chi sq = 4.98, df = 2, p = 0.08; the variances are not significantly unequal.

Choice of program: Please note: the following comments are my opinions, based on comments found on the Internet, and not on any formal study, text book, or peer review. Care is required in accepting these comments, and expert statistical advice should be sought if there is doubt. Both the variance ratio and the Bartlett test are more sensitive than the Levene test, and can be used to test for equality of variance before the two sample t test or one way analysis of variance. The Bartlett test is more flexible, as it can cope with any number of groups, but it requires a program. The F ratio can easily be calculated by hand, but can only be used for two group comparisons. The Levene test is best used when testing the equality of variance is the primary purpose, as it is less likely to lead to a false positive conclusion (that the variances are unequal).

Compare means between groups

The one way analysis of variance (OWAV) is used to compare means from 2 or more groups. The variation (variance, mean sum of squares, msq) in the data is partitioned into that attributable to differences between the groups and that within the groups (residual). The ratio of these 2 variances (F) is then tested for statistical significance.

The two sample t test is a special case of OWAV when there are only 2 groups. This test is important in that much evidence based practice rests on the meta-analysis of the difference between two means (Diff) and its Standard Error (SE). Although programs are provided on this page to calculate Diff and SE from two groups, they can just as easily be estimated from the results of a 2 group analysis of variance, as follows
From the analysis of variance table, F is obtained, and t = sqrt(F). As t = Diff / SE, SE = Diff / t.

Post-hoc comparisons between pairs of means

Following OWAV with more than 2 groups, the analyst may want to compare pairs of groups, to obtain more detail than simply whether a significant difference exists somewhere between the groups. Post-hoc analysis allows this to be done. It is important to emphasize that post-hoc analysis should not be used as the primary method of comparing group means, because with multiple groups there is a random chance that some pairs of groups will appear significantly different. The recommendation is therefore that post-hoc analysis should only be done if a significant difference has already been demonstrated by OWAV.

Tukey's least significant difference: Although this is a very commonly used test, there are variations in terminology and presentation that can be confusing. The following attempts to unravel this complexity. Please note, however, that these explanations are based on my understanding after consulting text books, published papers, and the Internet, and not on formal instruction, so they are not authoritative. Users are urged to study the methods or obtain advice before accepting these explanations.

Tukey initially provided the mathematical model for estimating the confidence interval and statistical significance of the difference between any two group means following analysis of variance. The model estimates the Studentised range q from the data, which allows estimation of the Standard Error of the difference between two means, and hence its confidence interval and statistical significance. The test was named Tukey's Studentized Range Test. The mathematics is unfortunately highly complex, requiring extensive numerical approximations. The proof was generally accepted, but the method was difficult to use in practice, especially at a time when computers were not widely available.
The Tukey-Kramer least significant difference is used in the Javascript program, and the calculation is also available in R code under post-hoc analysis. However, if the R package for analysis of variance is used, the default results are from the Tukey Studentised Range test.

Bonferroni's least significant difference, which is also called Fisher's least significant difference, reverses the mathematics of the t test to define the least significant difference when the residual variance is known and the significant p (α) is defined. Included in the algorithm is the Bonferroni correction, where the p (α) value is divided by the number of comparisons being made. The number of comparisons being made is automatically calculated in the Javascript analysis of variance program. When calculating the least significant difference from summary results of analysis of variance, the relationship between the number of groups (nGrp) and the number of comparisons (nComp) is 1 for 2 groups, 3 for 3 groups, 6 for 4 groups, and so on, with the generalised formula nComp = nGrp * (nGrp - 1) / 2.

Scheffe's least significant difference is conceptually similar to Bonferroni's least significant difference, but the calculation is based on the F distribution rather than the t distribution.

The unadjusted confidence interval for the difference between means calculates the confidence interval between any two means in the manner of the two sample t test, assuming that all groups have the same variance. This is a simplification of Tukey's Studentized range, without consideration of the number of groups or comparisons. In the paper by Bird, it was argued that the Bonferroni correction is not necessary if the comparisons are pre-determined before analysis, and if the variances are indeed equal, the results obtained are close to those from Tukey's, without the need for complex computations.

The choice of post-hoc method: Please note: the following comments are my own conclusions after reading all I can about these methods in text books, published papers, and on line discussions. Users should do their own studies before accepting these comments, and seek expert advice if in doubt. The most comprehensive, robust (least likely to wrongly conclude that a significant difference exists), and widely accepted post-hoc comparison is Tukey's Studentized range. However, this requires the original data and the availability of a software package (e.g. R, SAS, STATA, and so on). If Tukey's Studentized range is not available, the alternatives described above (the Tukey-Kramer least significant difference, Bonferroni's or Scheffe's least significant difference, or the unadjusted confidence interval) can be considered.
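As a quick cross-check of the post-hoc methods described above, R's built-in pairwise.t.test() can compute all pairwise comparisons using a pooled standard deviation, with the p values adjusted by the Bonferroni correction. This is only a sketch using the example data from the R Codes panel; it is not the same calculation as the least significant difference subroutines provided later on this page.

# Pairwise t tests with pooled SD and Bonferroni adjusted p values
val <- c(50,57,70,60,55, 58,65,70,70,72, 70,72,60,77,75)   # example data
grp <- factor(rep(c("g1","g2","g3"), each=5))
pairwise.t.test(val, grp, p.adjust.method="bonferroni", pool.sd=TRUE)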
This sub-panel describes the statistical procedures for comparing non-parametric measurements between groups. These are ordinal values, in that the values are ordered (1<2<3), but the interval between values is undefined. An obvious example is the Likert scale of 1=strongly disagree, 2=disagree, 3=neutral, 4=agree, 5=strongly agree. Here the numerical difference between disagree and strongly disagree (2-1=1) does not represent the same thing as that between strongly agree and agree (5-4=1). Ordinal measurements are usually converted to ranks before they are processed.
The Median Test: This is a simple and easy test for comparing ordinal values in 2 or more groups.
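A minimal sketch of the Median Test in R, using the example data from the R Codes panel: each value is classified as below, or at or above, the grand median, and the resulting group by category table of counts is tested with the built-in chisq.test(). Note that this sketch splits on the median of the raw values, so its handling of ties may differ slightly from the rank based program in the R Codes panel.

# Median Test: counts below vs at/above the grand median, tested with chi square
val <- c(50,57,70,60,55, 58,65,70,70,72, 70,72,60,77,75)   # example data
grp <- rep(c("g1","g2","g3"), each=5)
atOrAbove <- val >= median(val)        # TRUE if at or above the grand median
tab <- table(grp, atOrAbove)           # 3 x 2 table of counts
tab
chisq.test(tab)                        # chi square test of the table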
The Kruskal-Wallis analysis of variance by rank compares the mean ranks of 2 or more groups and produces the statistic H, which is tested against the chi squared distribution. If H is statistically significant, a post-hoc test estimating the least significant difference between group mean ranks can be used to compare individual pairs of groups. According to Siegel (see reference), the Kruskal-Wallis test has a power efficiency of 95.5% of that of the parametric F test. For sample size estimation, therefore, the Type II error (β) should be appropriately reduced. For example, if 100 cases per group would provide the required power for the parametric F test, roughly 100 / 0.955 ≈ 105 cases per group would be needed for the Kruskal-Wallis test.
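For reference, the Kruskal-Wallis test is also available in R as the built-in kruskal.test(), which applies the tie correction and reports H (as the chi-squared statistic), its degrees of freedom, and p. A minimal sketch with the example data:

# Kruskal-Wallis analysis of variance by rank with the built-in kruskal.test()
val <- c(50,57,70,60,55, 58,65,70,70,72, 70,72,60,77,75)   # example data
grp <- factor(rep(c("g1","g2","g3"), each=5))
kruskal.test(val ~ grp)   # reports H (chi-squared), df, and p (α)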
The Wilcoxon Mann Whitney Test has a power efficiency of 95.5% of that of the t test, and can be used to replace the t test when the distribution is uncertain. The Robust Signed Rank test, in addition, tests whether the distributions in the two groups are similar, and its power efficiency approaches that of the t test.
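R's built-in wilcox.test() carries out the Wilcoxon Mann Whitney (rank sum) test directly; a minimal sketch comparing groups g1 and g3 of the example data is shown below. With tied values R falls back to a normal approximation and issues a warning, so the p value may differ slightly from that of the program in the R Codes panel.

# Wilcoxon Mann Whitney (rank sum) test for two unpaired groups
g1 <- c(50,57,70,60,55)   # example data, group g1
g3 <- c(70,72,60,77,75)   # example data, group g3
wilcox.test(g1, g3)       # W statistic and p value (normal approximation when ties are present)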
The Permutation Test is intended to compare two groups when the sample size is very small, and the distribution uncertain.
The effect size of the input data is first computed. This is then compared with the effect sizes of all possible permutations of the data. The test determines where the effect size of the original data sits, in terms of its percentile among all permutations, and this is interpreted as the probability of Type I Error (α). The test has a power efficiency of 100%, as it makes no assumption about distributions. However, it is constrained by the total sample size of the two groups combined.

As the computation requires recursive calculations, the computation time and memory use increase exponentially with sample size. A total sample size exceeding 26 cases ((n1+n2)>26) will take an impracticably long time to compute, and the program may hang or crash if the available memory is exhausted.
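A non-recursive sketch of the same idea is shown below, using the example data from the Permutation Test sub-panel: combn() enumerates every possible assignment of the pooled values to group 1, the effect size is calculated for each permutation, and the observed effect size is located within that distribution. Under these assumptions it should reproduce the counts reported by the recursive program later on this page, but it is an illustration only, not the program used here.

# Exhaustive permutation test for two small unpaired groups (non-recursive sketch)
g1 <- c(57,70,60,55)                        # example data, group 1
g2 <- c(58,65,70,70,72,70,72,60,77,75)      # example data, group 2
pooled <- c(g1, g2)
m <- length(g1)
obs <- sum(g1) * 2 - sum(pooled)            # observed effect size, as defined in the program below
perms <- combn(length(pooled), m)           # all index sets that could form group 1
effects <- apply(perms, 2, function(ix) sum(pooled[ix]) * 2 - sum(pooled))
pLess <- mean(effects < obs - 1e-10)        # proportion with smaller effect size
pSame <- mean(abs(effects - obs) < 1e-10)   # proportion with the same effect size
pMore <- mean(effects > obs + 1e-10)        # proportion with larger effect size
p <- min(pLess, pMore) + pSame              # probability of an effect size this extreme or more (α)
c(ncol(perms), pLess, pSame, pMore, p)      # number of permutations and result proportions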
This panel provides the R code for the main programs associated with comparing means of parametric measurements from different groups.
Section 1: Data input and preparation

# data entry
dat = ("
Grp Val
g1 50
g1 57
g1 70
g1 60
g1 55
g2 58
g2 65
g2 70
g2 70
g2 72
g3 70
g3 72
g3 60
g3 77
g3 75
")
dfDat <- read.table(textConnection(dat),header=TRUE)   # conversion to data frame
# dfDat   # optional: show input data

# Organize and prepare data for analysis
# create a reference vector for group names
ar <- sapply(dfDat[, "Grp"], as.character)   # extract Grp as a character vector
uar <- unique(ar)                            # remove duplicates
# uar   # optional: show vector

# Template for getting vectors of Val selected by Grp
#for(i in 1 : length(uar))
#{
#  s <- subset(dfDat, dfDat$Grp==uar[i])$Val
#  print(s)
#}

# Separate the values (Val) into their groups and calculate the mean and SD of each group
# vectors to store results from each group
Grp <- uar            # name of each group
N <- vector()         # sample size of each group
Mean <- vector()      # mean of each group
SD <- vector()        # SD of each group
# Calculate
for(i in 1 : length(uar))
{
  v <- subset(dfDat, dfDat$Grp==uar[i])$Val   # vector of Val for each group
  N <- append(N, length(v))
  Mean <- append(Mean, mean(v))
  SD <- append(SD, sd(v))
}
dfNMS <- data.frame(Grp, N, Mean, SD)   # combine vectors into a data frame for display and future access
dfNMS   # show n, mean, SD in groups

The results are as follows

> dfNMS # show n, mean, SD in groups
  Grp N Mean       SD
1  g1 5 58.4 7.436397
2  g2 5 67.0 5.656854
3  g3 5 70.8 6.610598

Program 2: Bartlett's Test for equality of variance
This uses the summary data in dfNMS (n, mean, SD of groups)

# Bartlett's Test
k = nrow(dfNMS)   # number of groups
N = 0             # total sample size
for(i in 1 : k) N = N + dfNMS$N[i]   # total sample size (all grps)
Sp2 = 0
t = 0
b = 0
for(i in 1 : k)   # from each group
{
  num = dfNMS$N[i]
  mean = dfNMS$Mean[i]
  sd = dfNMS$SD[i]
  Sp2 = Sp2 + (num - 1) * sd^2 / (N - k)
  t = t + (num - 1) * log(sd^2)
  b = b + 1.0 / num
}
t = (N - k) * log(Sp2) - t
b = 1.0 + (1.0 / (3.0 * (k-1))) * b - 1.0 / (N - k)
chi = t / b
df = k - 1
p = 1 - pchisq(chi, df=df)
c(chi, df, p)   # result of Bartlett's test

The result is the significance of Chi Sq. p<0.05 means that significant differences in variance exist between the groups

> c(chi, df, p) # result of Bartlett's test
[1] 0.2907470 2.0000000 0.8646993

Program 3: Levene Test for equality of variances
This uses the raw data in dfDat (Grp and Val)

# Levene Test
k = nrow(dfNMS)   # number of groups
X = 0
N = 0
mm <- matrix(data=0, nrow=k, ncol=3)
for(i in 1 : k)
{
  vec <- subset(dfDat, dfDat$Grp==uar[i])$Val   # vector of Val for each group
  mean = mean(vec)
  for(j in 1 : length(vec))
  {
    z = abs(vec[j] - mean)    # absolute deviation from the group mean
    mm[i,1] = mm[i,1] + 1
    mm[i,2] = mm[i,2] + z
    mm[i,3] = mm[i,3] + z^2
    N = N + 1
    X = X + z
  }
}
Zu = X / N   # overall mean of the absolute deviations
Zi = 0       # between group sum of squares
Zj = 0       # within group sum of squares
for(i in 1 : k)
{
  zi = mm[i,2] / mm[i,1] - Zu
  Zi = Zi + mm[i,1] * zi^2
  Zj = Zj + mm[i,3] - mm[i,2]^2 / mm[i,1]
}
F = ((N - k) * Zi) / ((k-1) * Zj)
df1 = k-1
df2 = N - k
p = 1 - pf(F, df1, df2)
c(F, df1, df2, p)   # results of Levene Test

The results are as follows. These are F, its two degrees of freedom, and p (α). p<0.05 indicates significant differences in variances between groups

> c(F, df1, df2, p) # results of Levene Test
[1]  0.06891652  2.00000000 12.00000000  0.93377130

Program 4: One Way Analysis of Variance

res <- aov(dfDat$Val ~ dfDat$Grp)
summary(res)    # result of analysis of variance
TukeyHSD(res)   # Tukey's Least Significant Difference with adjusted p

The results of analysis of variance are as follows

> summary(res)
            Df Sum Sq Mean Sq F value Pr(>F)
dfDat$Grp    2  403.6  201.80   4.621 0.0325 *
Residuals   12  524.0   43.67
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The results of Tukey's HSD by R are as follows

> TukeyHSD(res)
  Tukey multiple comparisons of means
    95% family-wise confidence level
Fit: aov(formula = dfDat$Val ~ dfDat$Grp)
$`dfDat$Grp`
      diff      lwr      upr     p adj
g2-g1  8.6 -2.54984 19.74984 0.1408789
g3-g1 12.4  1.25016 23.54984 0.0294074
g3-g2  3.8 -7.34984 14.94984 0.6449915

Program 5: Two samples t test

g1 <- c(50,57,70,60,55)
g2 <- c(70,72,60,77,75)
t.test(g1,g2, var.equal=TRUE)

The results are as follows.

        Two Sample t-test
data:  g1 and g2
t = -2.7867, df = 8, p-value = 0.02368
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -22.661071  -2.138929
sample estimates:
mean of x mean of y
     58.4      70.8

Please note: the vector named g2 in this program contains the values of group g3 from the example data above, so this t test compares groups g1 and g3 (hence mean of y = 70.8).
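As a cross-check of the hand-coded programs above, R also provides built-in equivalents, shown in the sketch below (assuming dfDat from the data entry step in Section 1). Note that bartlett.test() uses the standard correction term based on 1/(n-1) within each group, so its chi-squared value may differ slightly from that of Program 2.

# Built-in cross-checks, assuming dfDat from Section 1 above
bartlett.test(Val ~ Grp, data=dfDat)                  # Bartlett's test for equality of variances
oneway.test(Val ~ Grp, data=dfDat, var.equal=TRUE)    # one way analysis of variance assuming equal variances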
This subpanel presents R code for the 4 post-hoc tests of significance following analysis of variance with parametric data. These are the Tukey-Kramer least significant difference, Bonferroni's (also called Fisher's) least significant difference, Scheffe's least significant difference, and the unadjusted confidence interval for the difference between two means.
Subroutine functionsTukey-Kramer Least Significant Difference# subroutine for Tukey-Kramer Least Significant Difference # rms = residual mean square, rdf=residual degrees of freedom # nGrps = number of groups in aov, # n1, n2 = sample sizes of the 2 groups being compared # x05 and x01 are the tables (for p=0.05 and p=0.01)of studentized range statistics q. These are rearranged into q05 and q01 to suit the matric calls of R. The tables are copied from the text book by Steelr et.al, which quote the original tables by Pearson and Hartley (see references) TukeyLSD <- function(rms, rdf, nGrps, n1, n2) { x05 = c( c(3.635,4.602,5.218,5.673,6.033,6.330,6.582,6.801,6.995,7.167,7.323,7.466,7.596,7.716,7.828,7.932,8.030,8.122,8.208), #rdf=5 c(3.460,4.339,4.896,5.305,5.628,5.895,6.122,6.319,6.493,6.649,6.789,6.917,7.034,7.143,7.244,7.338,7.426,7.508,7.586), c(3.344,4.165,4.681,5.060,5.359,5.606,5.815,5.997,6.158,6.302,6.431,6.550,6.658,6.759,6.852,6.939,7.020,7.097,7.169), c(3.261,4.041,4.529,4.886,5.167,5.399,5.596,5.767,5.918,6.053,6.175,6.287,6.389,6.483,6.571,6.653,6.729,6.801,6.869), c(3.199,3.948,4.415,4.755,5.024,5.244,5.432,5.595,5.738,5.867,5.983,6.089,6.186,6.276,6.359,6.437,6.510,6.579,6.643), c(3.151,3.877,4.327,4.654,4.912,5.124,5.304,5.460,5.598,5.722,5.833,5.935,6.028,6.114,6.194,6.269,6.339,6.405,6.467), #rdf=q0 c(3.113,3.820,4.256,4.574,4.823,5.028,5.202,5.353,5.486,5.605,5.713,5.811,5.901,5.984,6.062,6.134,6.202,6.265,6.325), c(3.081,3.773,4.199,4.508,4.750,4.950,5.119,5.265,5.395,5.510,5.615,5.710,5.797,5.878,5.953,6.023,6.089,6.151,6.209), c(3.055,3.734,4.151,4.453,4.690,4.884,5.049,5.192,5.318,5.431,5.533,5.625,5.711,5.789,5.862,5.931,5.995,6.055,6.112), c(3.033,3.701,4.111,4.407,4.639,4.829,4.990,5.130,5.253,5.364,5.463,5.554,5.637,5.714,5.785,5.852,5.915,5.973,6.029), c(3.014,3.673,4.076,4.367,4.595,4.782,4.940,5.077,5.198,5.306,5.403,5.492,5.574,5.649,5.719,5.785,5.846,5.904,5.958), c(2.998,3.649,4.046,4.333,4.557,4.741,4.896,5.031,5.150,5.256,5.352,5.439,5.519,5.593,5.662,5.726,5.786,5.843,5.896), c(2.984,3.628,4.020,4.303,4.524,4.705,4.858,4.991,5.108,5.212,5.306,5.392,5.471,5.544,5.612,5.675,5.734,5.790,5.842), c(2.971,3.609,3.997,4.276,4.494,4.673,4.824,4.955,5.071,5.173,5.266,5.351,5.429,5.501,5.567,5.629,5.688,5.743,5.794), c(2.960,3.593,3.977,4.253,4.468,4.645,4.794,4.924,5.037,5.139,5.231,5.314,5.391,5.462,5.528,5.589,5.647,5.701,5.752), c(2.950,3.578,3.958,4.232,4.445,4.620,4.768,4.895,5.008,5.108,5.199,5.282,5.357,5.427,5.492,5.553,5.610,5.663,5.714), #rdf=20 c(2.941,3.565,3.942,4.213,4.424,4.597,4.743,4.870,4.981,5.081,5.170,5.252,5.327,5.396,5.460,5.520,5.576,5.629,5.679), c(2.933,3.553,3.927,4.196,4.405,4.577,4.722,4.847,4.957,5.056,5.144,5.225,5.299,5.368,5.431,5.491,5.546,5.599,5.648), c(2.926,3.542,3.914,4.180,4.388,4.558,4.702,4.826,4.935,5.033,5.121,5.201,5.274,5.342,5.405,5.464,5.519,5.571,5.620), c(2.919,3.532,3.901,4.166,4.373,4.541,4.684,4.807,4.915,5.012,5.099,5.179,5.251,5.319,5.381,5.439,5.494,5.545,5.594), c(2.913,3.523,3.890,4.153,4.358,4.526,4.667,4.789,4.897,4.993,5.079,5.158,5.230,5.297,5.359,5.417,5.471,5.522,5.570), c(2.907,3.514,3.880,4.141,4.345,4.511,4.652,4.773,4.880,4.975,5.061,5.139,5.211,5.277,5.339,5.396,5.450,5.500,5.548), c(2.902,3.506,3.870,4.130,4.333,4.498,4.638,4.758,4.864,4.959,5.044,5.122,5.193,5.259,5.320,5.377,5.430,5.480,5.528), c(2.897,3.499,3.861,4.120,4.322,4.486,4.625,4.745,4.850,4.944,5.029,5.106,5.177,5.242,5.302,5.359,5.412,5.462,5.509), 
c(2.892,3.493,3.853,4.111,4.311,4.475,4.613,4.732,4.837,4.930,5.014,5.091,5.161,5.226,5.286,5.342,5.395,5.445,5.491), c(2.888,3.486,3.845,4.102,4.301,4.464,4.601,4.720,4.824,4.917,5.001,5.077,5.147,5.211,5.271,5.327,5.379,5.429,5.475),#rdf+30 c(2.884,3.481,3.838,4.094,4.292,4.454,4.591,4.709,4.812,4.905,4.988,5.064,5.134,5.198,5.257,5.313,5.365,5.414,5.460), c(2.881,3.475,3.832,4.086,4.284,4.445,4.581,4.698,4.802,4.894,4.976,5.052,5.121,5.185,5.244,5.299,5.351,5.400,5.445), c(2.877,3.470,3.825,4.079,4.276,4.436,4.572,4.689,4.791,4.883,4.965,5.040,5.109,5.173,5.232,5.287,5.338,5.386,5.432), c(2.874,3.465,3.820,4.072,4.268,4.428,4.563,4.680,4.782,4.873,4.955,5.030,5.098,5.161,5.220,5.275,5.326,5.374,5.420), c(2.871,3.461,3.814,4.066,4.261,4.421,4.555,4.671,4.773,4.863,4.945,5.020,5.088,5.151,5.209,5.264,5.315,5.362,5.408), c(2.868,3.457,3.809,4.060,4.255,4.414,4.547,4.663,4.764,4.855,4.936,5.010,5.078,5.141,5.199,5.253,5.304,5.352,5.397), c(2.865,3.453,3.804,4.054,4.249,4.407,4.540,4.655,4.756,4.846,4.927,5.001,5.069,5.131,5.189,5.243,5.294,5.341,5.386), c(2.863,3.449,3.799,4.049,4.243,4.400,4.533,4.648,4.749,4.838,4.919,4.993,5.060,5.122,5.180,5.234,5.284,5.331,5.376), c(2.861,3.445,3.795,4.044,4.237,4.394,4.527,4.641,4.741,4.831,4.911,4.985,5.052,5.114,5.171,5.225,5.275,5.322,5.367), c(2.858,3.442,3.791,4.039,4.232,4.388,4.521,4.634,4.735,4.824,4.904,4.977,5.044,5.106,5.163,5.216,5.266,5.313,5.358),#rdf = 40 c(2.843,3.420,3.764,4.008,4.197,4.351,4.481,4.592,4.690,4.777,4.856,4.927,4.993,5.053,5.109,5.161,5.210,5.256,5.299),#rdf = 50 c(2.829,3.399,3.737,3.977,4.163,4.314,4.441,4.550,4.646,4.732,4.808,4.878,4.942,5.001,5.056,5.107,5.154,5.199,5.241),#rdf = 60 c(2.814,3.377,3.711,3.947,4.129,4.277,4.402,4.509,4.603,4.686,4.761,4.829,4.892,4.949,5.003,5.052,5.099,5.142,5.183),#ref = 90 c(2.800,3.356,3.685,3.917,4.096,4.241,4.363,4.468,4.560,4.641,4.714,4.781,4.842,4.898,4.950,4.998,5.043,5.086,5.126),#rdf =120 c(2.786,3.335,3.659,3.887,4.063,4.205,4.324,4.427,4.517,4.596,4.668,4.733,4.792,4.847,4.897,4.944,4.988,5.030,5.069),#rdf = 240 c(2.772,3.314,3.633,3.858,4.030,4.170,4.286,4.387,4.474,4.552,4.622,4.685,4.743,4.796,4.845,4.891,4.934,4.974,5.012))# inf q05 <- matrix(data=x05, nrow=19, ncol=42) x01=c( c(5.702,6.976,7.804,8.421,8.913,9.321,9.669,9.971,10.239,10.479,10.696,10.894,11.076,11.244,11.400,11.545,11.682,11.811,11.932), c(5.243,6.331,7.033,7.556,7.972,8.318,8.612,8.869,9.097,9.300,9.485,9.653,9.808,9.951,10.084,10.208,10.325,10.434,10.538), c(4.949,5.919,6.542,7.005,7.373,7.678,7.939,8.166,8.367,8.548,8.711,8.860,8.997,9.124,9.242,9.353,9.456,9.553,9.645), c(4.745,5.635,6.204,6.625,6.959,7.237,7.474,7.680,7.863,8.027,8.176,8.311,8.436,8.552,8.659,8.760,8.854,8.943,9.027), c(4.596,5.428,5.957,6.347,6.657,6.915,7.134,7.325,7.494,7.646,7.784,7.910,8.025,8.132,8.232,8.325,8.412,8.495,8.573), c(4.482,5.270,5.769,6.136,6.428,6.669,6.875,7.054,7.213,7.356,7.485,7.603,7.712,7.812,7.906,7.993,8.075,8.153,8.226), c(4.392,5.146,5.621,5.970,6.247,6.476,6.671,6.841,6.992,7.127,7.250,7.362,7.464,7.560,7.648,7.731,7.809,7.883,7.952), c(4.320,5.046,5.502,5.836,6.101,6.320,6.507,6.670,6.814,6.943,7.060,7.166,7.265,7.356,7.441,7.520,7.594,7.664,7.730), c(4.260,4.964,5.404,5.726,5.981,6.192,6.372,6.528,6.666,6.791,6.903,7.006,7.100,7.188,7.269,7.345,7.417,7.484,7.548), c(4.210,4.895,5.322,5.634,5.881,6.085,6.258,6.409,6.543,6.663,6.772,6.871,6.962,7.047,7.125,7.199,7.268,7.333,7.394), c(4.167,4.836,5.252,5.556,5.796,5.994,6.162,6.309,6.438,6.555,6.660,6.756,6.845,6.927,7.003,7.074,7.141,7.204,7.264), 
c(4.131,4.786,5.192,5.489,5.722,5.915,6.079,6.222,6.348,6.461,6.564,6.658,6.744,6.823,6.897,6.967,7.032,7.093,7.151), c(4.099,4.742,5.140,5.430,5.659,5.847,6.007,6.147,6.270,6.380,6.480,6.572,6.656,6.733,6.806,6.873,6.937,6.997,7.053), c(4.071,4.703,5.094,5.379,5.603,5.787,5.944,6.081,6.201,6.309,6.407,6.496,6.579,6.655,6.725,6.791,6.854,6.912,6.967), c(4.046,4.669,5.054,5.334,5.553,5.735,5.889,6.022,6.141,6.246,6.342,6.430,6.510,6.585,6.654,6.719,6.780,6.837,6.891), c(4.024,4.639,5.018,5.293,5.510,5.688,5.839,5.970,6.086,6.190,6.285,6.370,6.449,6.523,6.591,6.654,6.714,6.770,6.823), c(4.004,4.612,4.986,5.257,5.470,5.646,5.794,5.924,6.038,6.140,6.233,6.317,6.395,6.467,6.534,6.596,6.655,6.710,6.762), c(3.986,4.588,4.957,5.225,5.435,5.608,5.754,5.882,5.994,6.095,6.186,6.269,6.346,6.417,6.482,6.544,6.602,6.656,6.707), c(3.970,4.566,4.931,5.195,5.403,5.573,5.718,5.844,5.955,6.054,6.144,6.226,6.301,6.371,6.436,6.497,6.553,6.607,6.658), c(3.955,4.546,4.907,5.168,5.373,5.542,5.685,5.809,5.919,6.017,6.105,6.186,6.261,6.330,6.394,6.453,6.510,6.562,6.612), c(3.942,4.527,4.885,5.144,5.347,5.513,5.655,5.778,5.886,5.983,6.070,6.150,6.224,6.292,6.355,6.414,6.469,6.522,6.571), c(3.930,4.510,4.865,5.121,5.322,5.487,5.627,5.749,5.856,5.951,6.038,6.117,6.190,6.257,6.319,6.378,6.432,6.484,6.533), c(3.918,4.495,4.847,5.101,5.300,5.463,5.602,5.722,5.828,5.923,6.008,6.087,6.158,6.225,6.287,6.344,6.399,6.450,6.498), c(3.908,4.481,4.830,5.082,5.279,5.441,5.578,5.697,5.802,5.896,5.981,6.058,6.129,6.195,6.256,6.314,6.367,6.418,6.465), c(3.898,4.467,4.814,5.064,5.260,5.420,5.556,5.674,5.778,5.871,5.955,6.032,6.103,6.168,6.228,6.285,6.338,6.388,6.435), c(3.889,4.455,4.799,5.048,5.242,5.401,5.536,5.653,5.756,5.848,5.932,6.008,6.078,6.142,6.202,6.258,6.311,6.361,6.407), c(3.881,4.443,4.786,5.032,5.225,5.383,5.517,5.633,5.736,5.827,5.910,5.985,6.055,6.119,6.178,6.234,6.286,6.335,6.381), c(3.873,4.433,4.773,5.018,5.210,5.367,5.500,5.615,5.716,5.807,5.889,5.964,6.033,6.096,6.155,6.211,6.262,6.311,6.357), c(3.865,4.423,4.761,5.005,5.195,5.351,5.483,5.598,5.698,5.789,5.870,5.944,6.013,6.076,6.134,6.189,6.240,6.289,6.334), c(3.859,4.413,4.750,4.992,5.181,5.336,5.468,5.581,5.682,5.771,5.852,5.926,5.994,6.056,6.114,6.169,6.220,6.268,6.313), c(3.852,4.404,4.739,4.980,5.169,5.323,5.453,5.566,5.666,5.755,5.835,5.908,5.976,6.038,6.096,6.150,6.200,6.248,6.293), c(3.846,4.396,4.729,4.969,5.156,5.310,5.439,5.552,5.651,5.739,5.819,5.892,5.959,6.021,6.078,6.132,6.182,6.229,6.274), c(3.840,4.388,4.720,4.959,5.145,5.298,5.427,5.538,5.637,5.725,5.804,5.876,5.943,6.004,6.061,6.115,6.165,6.212,6.256), c(3.835,4.381,4.711,4.949,5.134,5.286,5.414,5.526,5.623,5.711,5.790,5.862,5.928,5.989,6.046,6.099,6.148,6.195,6.239), c(3.830,4.374,4.703,4.940,5.124,5.275,5.403,5.513,5.611,5.698,5.776,5.848,5.914,5.974,6.031,6.084,6.133,6.179,6.223), c(3.825,4.367,4.695,4.931,5.114,5.265,5.392,5.502,5.599,5.685,5.764,5.835,5.900,5.961,6.017,6.069,6.118,6.165,6.208), c(3.793,4.324,4.644,4.874,5.052,5.198,5.322,5.428,5.522,5.606,5.681,5.750,5.814,5.872,5.926,5.977,6.024,6.069,6.111), c(3.762,4.282,4.594,4.818,4.991,5.133,5.253,5.356,5.447,5.528,5.601,5.667,5.728,5.784,5.837,5.886,5.931,5.974,6.015), c(3.732,4.241,4.545,4.763,4.931,5.069,5.185,5.284,5.372,5.451,5.521,5.585,5.644,5.698,5.749,5.796,5.840,5.881,5.920), c(3.702,4.200,4.497,4.709,4.872,5.005,5.118,5.214,5.299,5.375,5.443,5.505,5.561,5.614,5.662,5.708,5.750,5.790,5.827), c(3.672,4.160,4.450,4.655,4.814,4.943,5.052,5.145,5.227,5.300,5.366,5.426,5.480,5.530,5.577,5.621,5.661,5.699,5.735), 
c(3.643,4.120,4.403,4.603,4.757,4.882,4.987,5.078,5.157,5.227,5.290,5.348,5.400,5.448,5.493,5.535,5.574,5.611,5.645))
q01 <- matrix(data=x01, nrow=19, ncol=42)
# select the column (number of groups) and row (residual df) of the tables
col = nGrps - 1
if(col<1) col = 1
if(col>19) col = 19
row = rdf - 4
if(rdf>40 & rdf<=50) row = 36 + 1
if(rdf>50 & rdf<=60) row = 36 + 2
if(rdf>60 & rdf<=90) row = 36 + 3
if(rdf>90 & rdf<=120) row = 36 + 4
if(rdf>120 & rdf<=240) row = 36 + 5
if(rdf>240) row = 36 + 6
lsd05 = q05[col,row] * sqrt(rms / 2 * (1 / n1 + 1 / n2))
lsd01 = q01[col,row] * sqrt(rms / 2 * (1 / n1 + 1 / n2))
return (c(lsd05, lsd01))   # LSD for p=0.05 and p=0.01
}

Subroutine function for Bonferroni's (also called Fisher's) least significant difference

# Subroutine for Bonferroni (Fisher) Least Significant Difference
# alpha = probability of Type I error (p, α)
# rms = residual mean square, rdf = residual degrees of freedom
# nComp = number of comparisons being made from the same aov
# n1, n2 = sample sizes of the 2 groups being compared
BonferroniLSD <- function(alpha, rms, rdf, nComp, n1, n2)
{
  return (qt(1 - alpha / nComp, rdf) * sqrt(rms * (1 / n1 + 1 / n2)))
}

Subroutine function for Scheffe's least significant difference

# Subroutine for Scheffe Least Significant Difference
# alpha = probability of Type I error (p, α)
# bdf = between group degrees of freedom = number of groups in analysis of variance - 1
# rms = residual mean square, rdf = residual degrees of freedom
# nComp = number of comparisons being made from the same aov
# n1, n2 = sample sizes of the 2 groups being compared
ScheffeLSD <- function(alpha, bdf, rms, rdf, nComp, n1, n2)
{
  f = qf(1 - alpha, df1=bdf, df2=rdf)   # find the F value for the significance
  return (sqrt(nComp * f) * sqrt(rms * (1.0 / n1 + 1.0 / n2)))
}

Subroutine for the unadjusted confidence interval of the difference between means following analysis of variance. Please note: the parameter for the level of confidence is usually expressed as a percentage, e.g. 95% CI. For this subroutine, the probability of Type I error (alpha) is used. The conversion is
e.g. for 95% confidence interval alpha = 1 - 95 / 100 = 1 - 0.95 = 0.05 # Subroutine for confidence interval # alpha 0.05 = confidence of 95%, 0.01 as 99% confidence # alpha / 2 for 2 tail mpdel ConfidenceInterval <- function(alpha, rms, rdf, n1, n2) { se = sqrt(rms * (1.0 / n1 + 1.0 / n2)) # Standard Error ci1 = se * qt(1 - alpha, df=rdf) # CI 1 tail ci2 = se * qt(1 - alpha / 2, df=rdf) # CI 2 tail return (c(ci1,ci2)) } Main ProgramAll the subroutines are run using a single main program. All the data are not used in every subroutines# data entry # From analysis of variance # BDF = between group degrees of freedom (number of groups -1) # RDF = residual degrees of freedom (total sample size - number of groups) # RMS = residual mean square (residual variance) # For each test # ALPHA = probability of Type I Error (p, α) # NCOMP = number of comparisons being made from the analysis of variance # N1, N2 the sample size of the two groups for calculating difference dat = (" BDF RDF RMS ALPHA NCOMP N1 N2 2 57 6.35 0.05 3 10 20 2 57 6.35 0.01 3 10 20 2 57 6.35 0.05 2 10 20 2 57 6.35 0.01 1 10 20 2 57 6.35 0.05 3 20 30 2 57 6.35 0.01 3 30 30 2 57 6.35 0.05 2 10 30 2 57 6.35 0.01 1 10 30 ") dfPostHoc <- read.table(textConnection(dat),header=TRUE) # conversion to data frame dfPostHoc # optional showing input data # vectors to record results T05 <- vector() # Tukey Kramer LSD for p=0.05 T01 <- vector() # Tukey Kramer LSD for p=0.01 BON <- vector() # Bonerroni LSD SCH <- vector() # Scheffe LSD CI1 <- vector() # confidence interval 1 tail CI2 <- vector() # confidence interval 2 tail # Calculations for(i in 1 : nrow(dfPostHoc)) { bdf = dfPostHoc$BDF[i] # between group degrees of freedom nGrps = bdf + 1 # number of groups in aov rdf = dfPostHoc$RDF[i] # residual degrees of freedom rms = dfPostHoc$RMS[i] # residual nean square alpha = dfPostHoc$ALPHA[i] # Probability of Tyupe I Error nComp = dfPostHoc$NCOMP[i] # number of comparisons between groups n1 = dfPostHoc$N1[i] n2 = dfPostHoc$N2[i] # Tukey-Kramer LSD ar <- TukeyLSD(rms, rdf, nGrps, n1, n2) T05 <- append(T05, sprintf(ar[1], fmt="%#.4f")) # Tukey Kramer LSD p=0.05 T01 <- append(T01, sprintf(ar[2], fmt="%#.4f")) # Tukey Kramer LSD p=0.01 # Bonferroni (Fisher) LSD bonferroniLSD = BonferroniLSD(alpha, rms, rdf, nComp, n1, n2) BON <- append(BON, sprintf(bonferroniLSD, fmt="%#.4f") ) # Bonferroni LSD # Scheffe LSD scheffeLSD = ScheffeLSD(alpha, bdf, rms, rdf, nComp, n1, n2) SCH <- append(SCH, sprintf(scheffeLSD, fmt="%#.4f") ) # Scheffe LSD # Un adjusted confidence interval ar <- ConfidenceInterval(alpha, rms, rdf, n1, n2) CI1 <- append(CI1, sprintf(ar[1], fmt="%#.4f") ) # confidence interval 1 tail CI2 <- append(CI2, sprintf(ar[2], fmt="%#.4f") ) # confidence interval 2 tail } # Add result vectors to data frame for display dfPostHoc$T05 <- T05 dfPostHoc$T01 <- T01 dfPostHoc$BON <- BON dfPostHoc$SCH <- SCH dfPostHoc$CI1 <- CI1 dfPostHoc$CI2 <- CI2 dfPostHoc # display input data and resultsThe results are as follows From input data
> dfPostHoc # display input data and results BDF RDF RMS ALPHA NCOMP N1 N2 T05 T01 BON SCH CI1 CI2 1 2 57 6.35 0.05 3 10 20 2.3457 2.9550 2.1285 3.0044 1.6318 1.9543 2 2 57 6.35 0.01 3 10 20 2.3457 2.9550 2.7483 3.7792 2.3360 2.6008 3 2 57 6.35 0.05 2 10 20 2.3457 2.9550 1.9543 2.4531 1.6318 1.9543 4 2 57 6.35 0.01 1 10 20 2.3457 2.9550 2.3360 2.1819 2.3360 2.6008 5 2 57 6.35 0.05 3 20 30 1.7484 2.2026 1.5865 2.2393 1.2163 1.4567 6 2 57 6.35 0.01 3 30 30 1.5638 1.9700 1.8322 2.5194 1.5574 1.7339 7 2 57 6.35 0.05 2 10 30 2.2115 2.7860 1.8426 2.3128 1.5385 1.8426 8 2 57 6.35 0.01 1 10 30 2.2115 2.7860 2.2024 2.0571 2.2024 2.4521
This sub-panel presents R code for comparing non-parametric measurements from different groups, consisting of the Median Test, the Kruskal-Wallis analysis of variance by rank with post-hoc comparison of mean ranks, the Wilcoxon Mann Whitney Test, and the Robust Signed Rank Test.
Data entry for 3 groups# data entry dat = (" Grp Val g1 50 g1 57 g1 70 g1 60 g1 55 g2 58 g2 65 g2 70 g2 72 g2 70 g3 70 g3 72 g3 60 g3 77 g3 75 ") df <- read.table(textConnection(dat),header=TRUE) # conversion to data frame df$Rank <- rank(df$Val) # values as ranks df # optional display of data mxRanks <- xtabs(~ df$Grp + df$Rank) # matrix of counts by rank mxRanks # optional display of counts of ranksThe matrix of ranks for the 3 groups are displayed as follows > mxRanks # optional display of counts of ranks df$Rank df$Grp 1 2 3 4 5.5 7 9.5 12.5 14 15 g1 1 1 1 0 1 0 1 0 0 0 g2 0 0 0 1 0 1 2 1 0 0 g3 0 0 0 0 1 0 1 1 1 1 Median Testmedian = nrow(df) / 2 # median rank for all data MED <- vector(); # intermediate vector of 1(<) and 2(>=) median # calculate counts of < or >= medians in groups for(i in 1:nrow(df)) { if(df$Rank[i]<median) { MED <- append(MED,"<Median") } else { MED <- append(MED,">=median") } } # convert median values into a matrix of counts mxMed <- xtabs(~ df$Grp + MED) # matrix of counts for < medians and >= medians, display with results # create matrix to calculate chi square mxCount <- matrix(data=0, nrow=nrow(mxMed)+1,ncol=3) # for calculating chi sq # add row and col total for(i in 1:nrow(mxMed)) { for(j in 1 : ncol(mxMed)) { x = mxMed[i,j] mxCount[i,j] = x mxCount[i,3] = mxCount[i,3] + x mxCount[nrow(mxRanks)+1,j] = mxCount[nrow(mxRanks)+1,j] + x mxCount[nrow(mxRanks)+1,3] = mxCount[nrow(mxRanks)+1,3] + x } } #mxCount # optional display, for debug only # calculate chi sq ng = nrow(mxMed) # number of groups chi = 0 for(i in 1:ng) { for(j in 1:2) { o = mxCount[i,j] # observed e = mxCount[i,3] * mxCount[r+1,j] / mxCount[r+1,3] # expected chi = chi + (e-o)^2 / e # chi sq } } degF = r-1 # df p = 1 - pchisq(chi, df=degF) # p alpha # Show results mxMed # table of median counts c(chi, degF, p) # output results of Median Test: chi sq, df, and pThe results are as follows > mxMed # table of median counts MED df$Grp <Median >=median g1 4 1 g2 2 3 g3 1 4 > c(chi, degF, p) # output results of Median Test: chi sq, df, and p [1] 3.750000 2.000000 0.153355 Kruskall Wallis Analysis of Variance by RankFirstly, the subroutine to calculate post hoc comparison of mean ranks between groups, which will be called from the main program# subroutine for Least Significant differnce in mean ranks LSDRank <- function(TotN,NGrps,n1,n2,alpha) # total ssiz, num of grps 2 tail { tt = sqrt((TotN * (TotN + 1) / 12.0) * (1.0 / n1 + 1.0 / n2)) gg = alpha / (NGrps * (NGrps - 1)) # gg is Bonferroni corrected prob 1 tail return(-qnorm(gg) * tt) }The main program. This uses the same example data as in the median test, and results of the following preliminary calculations
# Main Program nn = nrow(df) # total sample size arGrpNames <- rownames(mxRanks) # name of groups ng = length(arGrpNames) # numver of groups # Step 1: calculate mean ranks arMeanRanks <- vector() for(k in 1 : ng) { s <- subset(df$Rank, df$Grp==arGrpNames[k]) arMeanRanks <- append(arMeanRanks, mean(s)) } #arMeanRanks # optional display the mean ranks (during debugging) # Cakcukate Kruskall Wallis nr = ncol(mxRanks) # number of columns (number of rank intervals # Calculations begin ET = 0 # for calculating ties arN <- array(0,ng) # group sample size arI <- array(0,nr) # rank sample size for(i in 1 : nr) { for(j in 1 : ng) { x = mx[j,i] arI[i] = arI[i] + x arN[j] = arN[j] + x } ET = ET + 1.0 * arI[i] * arI[i] * arI[i] - arI[i]; # ties } SumSq = 0 #Sum(Nj*Rj2) for(j in 1 : ng) { SumSq = SumSq + arN[j] * arMeanRanks[j]^2 } H = (12.0 / (1.0 * nn * (nn + 1)) * SumSq - 3.0 * (nn+1)) / (1 - ET / (1.0 * nn * nn * nn - nn)) degF = ng - 1 p = 1 - pchisq(H, df=degF) # Results arGrpNames # name of groups arN # sample size of groups arMeanRanks # mean ranks of groups c(H, degF, p) # Results Kruskall Wallis H, degrees of freedom, p (α)The initial results are displayed as follows > # Results > arGrpNames # name of groups [1] "g1" "g2" "g3" > arN # sample size of groups [1] 5 5 5 > arMeanRanks # mean ranks of groups [1] 4.2 8.5 11.3 > c(H, degF, p) # Results Kruskall Wallis H, degrees of freedom, p (α) [1] 6.53503650 2.00000000 0.03810087 >Post-hoc analysis # vectors to store variables and results # post hoc analysis Grp1 <- vector() Grp2 <- vector() Diff <- vector() LSD_05 <- vector() LSD_01 <- vector() LSD_005 <- vector() for(i in 1 : (ng - 1)) { for(j in (i+1) : ng) { Grp1 <- append(Grp1, arGrpNames[i]) Grp2 <- append(Grp2, arGrpNames[j]) Diff <- append(Diff, arMeanRanks[i] - (arMeanRanks[j])) n1 = arN[i] n2 = arN[j] LSD_05 <- append(LSD_05, LSDRank(nn,ng,n1,n2,0.05)) LSD_01 <- append(LSD_01, LSDRank(nn,ng,n1,n2,0.01)) LSD_005 <- append(LSD_005, LSDRank(nn,ng,n1,n2,0.005)) } } dfRes <- data.frame(Grp1, Grp2, Diff, LSD_05, LSD_01, LSD_005) dfRes # results of post-hoc comparison of mean ranksThe results are as follows > dfRes <- data.frame(Grp1, Grp2, Diff, LSD_05, LSD_01, LSD_005) > dfRes # results of post-hoc comparison of mean ranks Grp1 Grp2 Diff LSD_05 LSD_01 LSD_005 1 g1 g2 -4.3 6.771197 8.301998 8.892519 2 g1 g3 -7.1 6.771197 8.301998 8.892519 3 g2 g3 -2.8 6.771197 8.301998 8.892519 Non-parametric tests for 2 groupsIn addition to the Mrdian test (already listed above), 2 programs for comparing non-parametric measurements in 2 groups are presented here. These are
First, the data to be used. This is the same as that for the Kruskall Wallis test, except that the group g2 is removed, so that the 2 groups g1 and g3 are being compared. # data entry dat = (" Grp Val g1 50 g1 57 g1 70 g1 60 g1 55 g3 70 g3 72 g3 60 g3 77 g3 75 ") df <- read.table(textConnection(dat),header=TRUE) # conversion to data frame df$Rank <- rank(df$Val) # values to rank df # optional display of data mxRanks <- xtabs(~ df$Grp + df$Rank) # matrix of counts by rank mxRanks # optional display of counts of ranks arGrpNames <- rownames(mxRanks) # name of groups Wilcoxon Mann Whitney Testng = nrow(mxRanks) # number of groups nr = ncol(mxRanks) # number of rank divisions c(ng,nr) m = sum(mxRanks[1,]) # sample size grroup 1 n = sum(mxRanks[2,]) # sample size grroup 2 nn = m + n # total sample size c(m,n,nn) # calculate ties t = 0 for(i in 1 : nr) { z = mxRanks[1,i] + mxRanks[1,i] if(z>1) { t = t + (z * z * z - z) / 12 # calculate ties } } # Calculate w = sum of ranks w = c(0,0) # w for the 2 groups for(i in 1 : 2) { s <- subset(df$Rank, df$Grp==arGrpNames[i]) # ranks of each geoup for(j in 1:length(s)) { w[i] = w[i] + s[j] # w for each group } } # Wilcoxon Mann Whitney Test z z = (w[1] + 0.5 - m * (nn + 1) / 2) / sqrt((m * n / (nn * (nn - 1))) * ((nn * nn * nn - nn) / 12 - t)) p = 1 - pnorm(abs(z)) # Results output #n, w, mean rank of each group c(m, w[1], w[1] / m) c(n, w[2], w[2] / n) # Wilcoxon Mann Whitney U z and p c(z, p)The results are as follows > #n, w, mean rank of each group > c(m, w[1], w[1] / m) [1] 5.0 17.0 3.4 > c(n, w[2], w[2] / n) [1] 5.0 38.0 7.6 > # Wilcoxon Mann Whitney z and p > c(z, p) [1] -2.12132034 0.01694743 Robust Signed Rank Test (Mann-Whitney UFirstly the subroutine function to calculate statistical significance (p, α) from U, which will be called from the main program# subroutine significance of U SigOfU <- function(U, m, n) { U = abs(U) if(m>12 | n>12) return (sprintf("p = %.4g",(1 - pnorm(U)))) if(m<3 | n<3) return ("p = n.s.") i = m j = n if(n<m) { i = n j = m } if(i==3) { mx <- c(c(2.347,1.732,1.632,1.897,1.644,1.500,1.575,1.611,1.638,1.616), c(1e10, 2.273,2.324,2.912,2.605,2.777,2.353,2.553,2.369,2.449), c(1e10, 1e10,4.195,5.116,6.037,4.082,3.566,3.651,3.503,3.406), c(1e10, 1e10, 1e10, 1e10, 1e10,6.957,7.876,8.795,5.831,5.000)) mxC <- matrix(data = mx, nrow=10,ncol=4) } else if(i==4) { mx <- c(c(1.586,1.500,1.434,1.428,1.371,1.434,1.466,1.448,1.455), c(2.502,2.160,2.247,2.104,2.162,2.057,2.000,2.067,2.096), c(4.483,3.265,3.021,3.295,2.868,2.683,2.951,2.776,2.847), c(1e10, 1e10,6.899,4.786,4.252,4.423,4.276,4.017,3.904)) mxC <- matrix(data = mx, nrow=9,ncol=4) } else if(i==5) { mx <- c(c(1.447,1.362,1.308,1.378,1.361,1.361,1.340,1.369), c(2.063,1.936,1.954,1.919,1.893,1.900,1.890,1.923), c(2.859,2.622,2.465,2.556,2.536,2.496,2.497,2.497), c(7.187,3.913,4.246,3.730,3.388,3.443,3.435,3.444)) mxC <- matrix(data = mx, nrow=8,ncol=4) } else if(i==6) { mx <- c(c(1.335,1.326,1.327,1.338,1.339,1.320,1.330), c(1.860,1.816,1.796,1.845,1.829,1.833,1.825), c(2.502,2.500,2.443,2.349,2.339,2.337,2.349), c(3.712,3.519,3.230,3.224,3.164,3.161,3.151)) mxC <- matrix(data = mx, nrow=7,ncol=4) } else if(i==7) { mx <- c(c(1.333,1.310,1.320,1.313,1.302,1.318), c(1.804,1.807,1.790,1.776,1.769,1.787), c(2.331,2.263,2.287,2.248,2.240,2.239), c(3.195,3.088,2.967,3.002,2.979,2.929)) mxC <- matrix(data = mx, nrow=6,ncol=4) } else if(i==8) { mx <- c(c(1.295,1.283,1.284,1.290,1.293), c(1.766,1.765,1.756,1.746,1.759), c(2.251,2.236,2.209,2.205,2.198), c(2.954,2.925,2.880,2.856,2.845)) 
mxC <- matrix(data = mx, nrow=5,ncol=4) } else if(i==9) { mx <- c(c(1.294,1.304,1.288,1.299), c(1.744,1.742,1.744,1.737), c(2.206,2.181,2.172,2.172), c(2.857,2.802,2.798,2.770)) mxC <- matrix(data = mx, nrow=4,ncol=4) } else if(i==10) { mx <- c(c(1.295,1.284,1.284), c(1.723,1.726,1.720), c(2.161,2.152,2.144), c(2.770,2.733,2,718)) mxC <- matrix(data = mx, nrow=3,ncol=4) } else if(i==11) { mx <- c(c(1.289,1.290), c(1.716,1.708), c(2.138,2.127), c(2.705,2.683)) mxC <- matrix(data = mx, nrow=2,ncol=4) } else if(i==12) { mx <- matrix(c(1.283), c(1.708), c(2.117), c(2.661)) mxC <- matrix(data = mx, nrow=1,ncol=4) } if(U>=mxC[j-i+1,4]) return("p<0.01") if(U>=mxC[j-i+1,3]) return("p<0.025") if(U>=mxC[j-i+1,2]) return("p<0.05") if(U>=mxC[j-i+1,1]) return("p<0.1") return("p=n.s.") }The main program" This uses the same data entry as the Wilcoxon Mann Whitney Test, listed above. This will not be repeated here. The program starts from a copy ot the procedures to convert measurements to rans # Data entry as in Wilcoson test # Create ranks and tuen ranks into a matrix, copied b\from Wilcoson Mann Whitney Test ng = nrow(mxRanks) # number of groups nr = ncol(mxRanks) # number of rank divisions c(ng,nr) m = sum(mxRanks[1,]) # sample size grroup 1 n = sum(mxRanks[2,]) # sample size grroup 2 nn = m + n # total sample size c(m,n,nn) x1 = mxRanks[2,1] x2 = mxRanks[1,1] uyx = 0 uxy = 0 for(i in 2 : nr) { uyx = uyx + mxRanks[1,i] * x1 uxy = uxy + mxRanks[2,i] * x2 x1 = x1 + mxRanks[2,i] x2 = x2 + mxRanks[1,i] } uyx = uyx / m uxy = uxy / n x1 = 0 x2 = 0 vx = 0 vy = 0 for(i in 1 : nr) { vx = vx + mxRanks[1,i] * (uyx - x1)^2 vy = vy + mxRanks[2,i] * (uxy - x2)^2 x1 = x1 + mxRanks[2,i] x2 = x2 + mxRanks[1,i] } U = (m * uyx - n * uxy) / (2 * sqrt(vx + vy + uyx * uxy)) p = SigOfU(U, m, n) # significance (p, alpha) of U # results output c(U, m, n) # Mann Whitney's U and sample size of the 2 groups p # significance (p, α) of UThe results are as follows > # results output > c(U, m, n) # Mann Whitney's U and sample size of the 2 groups [1] -4.753127 5.000000 5.000000 > p # significance (p, α) of U [1] "p<0.025"
This sub-panel provides R codes for the Permutation Test
Section 1: the global variables which will be set up in the main program bu=ut used in the recursive calculations nSame <- 0 nLess <- 0 nMore <- 0 nv <- 0 sumTot <- 0 effectSize <- 0 refAr <- vector()Section 2: Subroutine function Permutate a recursive function calculating the permutation results Permutate <- function(m, indexAr) { if(indexAr[m]>nv) # last element of index exceeds data size { return(c(nLess, nSame, nMore)) # all permutation done } # Calculate sum = 0; for(i in 1 : m){ sum = sum + refAr[indexAr[i]] } # sum of permutated values diff = sum * 2.0 - sumTot # difference to reference if(abs(diff-effectSize)<1e-10) # same (trivial difference by binary math notation) { assign("nSame", nSame+1, envir = .GlobalEnv) } else { if(diff<effectSize) # less than input effect size { assign("nLess", nLess+1, envir = .GlobalEnv) } else # more than input effect size { assign("nMore", nMore+1, envir = .GlobalEnv) } } # recurse indexAr[m] = indexAr[m] + 1 # increment last cell of index for(i in m:2) # backwards to second cell { if(indexAr[i]>(nv-m+i)) # cell exceeds max for that position { indexAr[i-1] = indexAr[i-1] + 1 # advance prvious cell for(j in i:m) indexAr[j] = indexAr[j-1] + 1 # rest cell } } Permutate(m, indexAr) # recurse }Section 3: the main program # main program # data entry dat = (" Grp Val g1 57 g1 70 g1 60 g1 55 g2 58 g2 65 g2 70 g2 70 g2 72 g2 70 g2 72 g2 60 g2 77 g2 75 ") df <- read.table(textConnection(dat),header=TRUE) # conversion to data frame # set up and initialize globasl variables nv <- nrow(df) # sample size (global) # cross tab mxTmp <- xtabs(~ df$Grp + df$Val) # tmp matrix of counts arName <- rownames(mxTmp) # name of groups refAr <- df$Val # all the values (global) sumTot <- sum(df$Val) # sum(y) for all cases (global) # Effect size of data as entered dfGrp1 <- subset(df, df$Grp==arName[1]) # group 1 data m = nrow(dfGrp1) # sample size grp 1 # effect size of the 2 grps entered, to be used for comparisons recursively sumM = sum(dfGrp1$Val) # sum (values in grp 1) effectSize <- sumM * 2 - sumTot # Effect size of data as entered (global) # results nLess <- 0 # number of cases with smaller effect size (global) nSame <- 0 # number of cases with the same effect size (global) nMore <- 0 # number of cases with larger effect size (global) # set up initial index vector to control recursion indexAr <- vector() # index vector for(col in 1:m) indexAr <- append(indexAr, col) # fill to sample size of grp 1 #Call subroutine to calculate resAr <- Permutate(m, indexAr) # assemble results nTot = nLess + nSame + nMore # Total number of possible permutations pLess = nLess / nTot # proportion with smaller effect size pSame = nSame / nTot # proportion with the same effect size pMore = nMore / nTot # proportion with larger effect size # probability of obtaining reference effect size or more extreme p = min(pLess,pMore) + pSame # significance (probability, α) # result output effectSize # effect size of the 2 groups in input data c(nTot, nLess, nSame, nMore) # result numbers c(pLess, pSame, pMore) # result proportions p # probability of sum(x) or more extreme (p, α)The resuts are as follows > effectSize # effect size of the 2 groups in input data [1] -447 > c(nTot, nLess, nSame, nMore) # result numbers [1] 1001 17 11 973 > c(pLess, pSame, pMore) # result proportions [1] 0.01698302 0.01098901 0.97202797 > p # probability of sum(x) or more extreme (p, α) [1] 0.02797203
Parametric Tests

Comparison of Variances
Bartlett's Test: I have not read the original reference, but it is quoted in the NIST web site from which I obtained the algorithm.
Formulae: I obtained these from the National Institute of Standards and Technology (NIST) resource website.

One Way Analysis of Variance
Algorithm: Armitage P (1971) Statistical Methods in Medical Research. Blackwell Scientific Publications, Oxford. p. 189-207.

Confidence Intervals
Altman DG, Machin D, Bryant TN, Gardner MJ (2000) Statistics with Confidence, Second Edition. BMJ Books. ISBN 0 7279 1375 1. p. 28-31.

Post hoc Analysis
Least significant difference, Tukey: Steel RGD, Torrie JH, Dickey DA (1997) Principles and Procedures of Statistics: A Biomedical Approach, 3rd Ed. ISBN 0-07-061028-2. p. 191-192.
Studentised range tables: Pearson ES, Hartley HO (1966) Biometrika Tables for Statisticians, 3rd Ed., Table 29. See also https://en.wikipedia.org/wiki/Tukey's_range_test from Wikipedia.
Scheffe: Pedhazur EJ (1993) Multiple Regression in Behavioral Research: Explanation and Prediction, 3rd Ed. Harcourt Brace College Publishers, Orlando, Florida. ISBN 0-03-072831-2. p. 369-371.
Portney LG, Watkins MP (2000) Foundations of Clinical Research: Applications to Practice, Second Edition. Prentice Hall Health, New Jersey. ISBN 0-8385-2695-0. p. 460-461.
Steel RGD, Torrie JH, Dickey DA (1997) Principles and Procedures of Statistics: A Biomedical Approach, 3rd Ed. ISBN 0-07-061028-2. p. 189-190.

Nonparametric Tests
Siegel S, Castellan NJ Jr (1988) Nonparametric Statistics for the Behavioral Sciences, 2nd Ed. McGraw-Hill, Boston, Massachusetts. ISBN 0-07-057357-3.
Least significant difference between mean ranks: Siegel S, Castellan NJ Jr (1988) Nonparametric Statistics for the Behavioural Sciences. McGraw-Hill International Editions Statistics Series. ISBN 0-07-057357-3. p. 213-214.
Dunn's Test
Median Test
Permutation Test