Copyright ©2020. All Rights Reserved.


Explanations and References
This page describes the relationship between sample size and error in estimating the mean of a normally distributed measurement from a sample.
Establishing population means is a frequent research activity, particularly in the educational, social, and biomedical fields. Educational departments may wish to know the mathematical abilities of a cohort of school children, obstetricians need to know the normal birth weight, and so on. Three quantities are involved:
- Level of confidence in the results. This is expressed as a percentage, the most commonly used being 95%
- Error, either defined by the analyst as tolerable during planning, or estimated from the results at the end of data collection, is the distance between the mean and the end of the confidence interval, so that the confidence interval is CI = mean ± error. Error is calculated in Standard Deviation units (z), and the error in terms of the measurement itself is z × SD
- Sample size is the number of observations in the data
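The relationship between these three quantities can be written compactly. The form below is a sketch that matches the t-based calculation used in the R programs on this page; the symbols $E$ (error in measurement units), $n$ (sample size), and $\alpha$ (1 minus the confidence level as a proportion) are notation introduced here for illustration only.

$$
E = t_{1-\alpha/2,\;n-1}\,\frac{SD}{\sqrt{n}}, \qquad z = \frac{E}{SD} = \frac{t_{1-\alpha/2,\;n-1}}{\sqrt{n}}, \qquad CI = \text{mean} \pm E
$$

For large samples the t quantile approaches the normal quantile, which is about 1.96 for 95% confidence.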
This panel provides three calculations.
- The first is to **estimate the sample size requirement**, using confidence level, tolerable error, and assumed Standard Deviation. The input data and results are in actual units of measurement. In the sample size table, however, the error/SD ratio is used, so the results are based on SD = 1.
  An example: We wish to establish the mean IQ of first year university students. We expect the Standard Deviation of IQ in the cohort to be 10, and we want the 95% confidence interval of the result to be ±2 IQ points. This is an error/SD ratio of z = 2 / 10 = 0.2. The sample size required is 99 subjects.
- The second is to **estimate error from the data already collected**. This is based on the confidence level desired, and the sample size and Standard Deviation found in the data. The result is the error in actual measurement units, so that the confidence interval is mean ± error.
  An example: We proceeded to measure the IQ of 97 university students, and found the mean and Standard Deviation of IQ in the group to be 110 and 12 respectively. From this we can establish the error for the 95% confidence interval to be ±2.4. The 95% CI is therefore 110 ± 2.4, or 107.6 to 112.4.
- The third is an exploration of the relationship between sample size and error, and is used in **pilot studies** when the mean and Standard Deviation are not known. The program estimates the error with increasing sample size, using the value of 1 for Standard Deviation, so the results are in Standard Deviation units (z). The results are tabulated, allowing the researcher to determine the optimum sample size for a pilot study.
  An example: We would like to know the optimum sample size to be used in a pilot study with 95% confidence. Examining the results using the Javascript program, we can conclude that a sample size of between 15 and 20 would allow us to obtain a 95% confidence interval of mean ± 0.5 SDs, or a sample size of approximately 65 to obtain a 95% confidence interval of mean ± 0.25 SDs. We can also conclude that, after the first 20 cases, each increase of 5 cases reduces error by less than 0.1 SDs. If the cost of data collection is great, we may decide to use 20 cases in the pilot study. However, if greater precision is a priority, we may use greater numbers.

**Please note:** Pilot studies obtain preliminary data that are useful during planning. The sample size is therefore an approximation, determined by a balance between the need for precision and the cost of data collection. The results of pilot studies therefore cannot be used for hypothesis testing or for defining a population parameter. To obtain robust results, the sample size calculation should be used, and the results tested using the error calculation.
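The second and third worked examples can be checked with a few lines of base R. This is a minimal sketch, assuming the error is the t-based half-width of the confidence interval; `ci_error` is a hypothetical helper name introduced here and is not one of the page's own programs.

```r
# Half-width of a two-sided confidence interval for a mean (t-based).
# ci_error is a hypothetical helper, named here for illustration only.
ci_error <- function(conf_pct, n, sd) {
  alpha <- 1 - conf_pct / 100
  qt(1 - alpha / 2, df = n - 1) * sd / sqrt(n)
}

# Second example: 97 students, SD = 12, 95% confidence
err <- ci_error(95, 97, 12)
round(err, 1)                    # 2.4
round(110 + c(-1, 1) * err, 1)   # 107.6 112.4

# Pilot-study exploration: SD = 1, so the errors are in SD units
n <- c(15, 20, 65)
data.frame(n = n, error = round(ci_error(95, n, 1), 2))
```

Setting `sd = 1` in the last call is what puts the pilot results in Standard Deviation units, as described in the third calculation.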
## Reference
Machin D, Campbell M, Fayers P, Pinol A (1997) Sample Size Tables for Clinical Studies. Second Ed. Blackwell Science. ISBN 0-86542-870-0. p. 131-135
This sub-panel provides 2 tables for estimating means: the first for sample size, the second for pilot studies.
## Table 1. Sample Size for Estimating Means
The table consists of 5 columns.
- The first is the tolerable error in Standard Deviation units (z = ER/SD).
- The second to the fifth columns are the sample sizes required for estimating population means and errors at the 80%, 90%, 95%, and 99% levels of confidence
- The cells contain the sample size required
- For example, to be 95% confident that the mean will fall within ± a fifth of the expected Standard Deviation (z = ER/SD = 0.20), the sample size required is 99
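A table with this layout can be sketched in R as follows. This is an illustration under stated assumptions, not the program that produced Table 1: `n_for_error` is a hypothetical helper that uses a plain linear search for the smallest sample size whose t-based error does not exceed z, and the three z values chosen for the rows are arbitrary.

```r
# Smallest n whose t-based error, in SD units, does not exceed z = ER/SD.
# n_for_error is a hypothetical helper; it uses a simple linear search.
n_for_error <- function(conf_pct, z) {
  alpha <- 1 - conf_pct / 100
  n <- 2
  while (qt(1 - alpha / 2, df = n - 1) / sqrt(n) > z) n <- n + 1
  n
}

conf_levels <- c(80, 90, 95, 99)   # columns, as in Table 1
z_values <- c(0.10, 0.20, 0.50)    # rows, chosen here for illustration

# One column per confidence level, one row per z value
tab <- sapply(conf_levels, function(cf) {
  sapply(z_values, function(z) n_for_error(cf, z))
})
dimnames(tab) <- list(z = as.character(z_values),
                      confidence = paste0(conf_levels, "%"))
tab
tab["0.2", "95%"]   # 99
```

Individual cells may differ by one from a table produced with a different search or rounding rule.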
This sub-panel shows R code for programs related to means.
    # probability to z
    PtoZ <- function(p)
    {
      return(-qnorm(p))
    }

    # z to probability
    ZtoP <- function(z)
    {
      return(pnorm(-z))
    }

    # probability to t
    PtoT <- function(p, degFd, tail)
    {
      p <- p / tail
      return(-qt(p, df = degFd))
    }

    # t to probability
    TtoP <- function(t, degFd, tail)
    {
      p <- pt(-t, df = degFd)
      return(p * tail)
    }

Section 2: Sample Size for Mean
    # c = % confidence, er = error, sd = SD
    ssmean <- function(c, er, sd)
    {
      c <- 1.0 - c / 100.0   # 2 tails for t
      ssL <- 1
      ssR <- 1e10
      ss <- 5000
      se <- sd / sqrt(ss)
      t <- PtoT(c, ss - 1, 2)
      e <- t * se
      # Bisection search on sample size. It stops when the bracket [ssL, ssR]
      # is one unit wide and returns the rounded midpoint, so the result can be
      # one below the smallest n that meets the target error.
      while (abs(e - er) > 0.0001 && abs(ssL - ssR) > 1)
      {
        if (e > er) { ssL <- ss } else { ssR <- ss }
        ss <- round((ssL + ssR) / 2)
        se <- sd / sqrt(ss)
        if (ss > 5000) { t <- PtoZ(c / 2) } else { t <- PtoT(c, ss - 1, 2) }
        e <- t * se
      }
      return(ss)
    }

    # Testing
    txt <- ("
    Cf  SD   Err
    90  0.4  0.1
    95  1.0  0.5
    99  2.2  1.3
    ")
    df <- read.table(textConnection(txt), header = TRUE)
    df       # optional display of data frame
    # extract columns as vectors
    arCf <- df$Cf
    arSD <- df$SD
    arErr <- df$Err
    # Create result vector
    arSSiz <- vector()
    for (i in 1:nrow(df))
    {
      cf <- arCf[i]
      sd <- arSD[i]
      er <- arErr[i]
      # arguments in the order ssmean expects: confidence, error, SD
      arSSiz <- append(arSSiz, ssmean(cf, er, sd))   # append sample size to result vector
    }
    # Incorporate results into original data frame
    df$SSiz <- arSSiz
    # show data frame with results
    df

Result: SSiz for Cf% confidence interval of mean±Err

    > df
      Cf  SD Err SSiz
    1 90 0.4 0.1   46
    2 95 1.0 0.5   18
    3 99 2.2 1.3   22

Section 3: Program for Error of Means
    # c = confidence in %, n = sample size, sd = SD
    errmean <- function(c, n, sd)
    {
      c <- 1.0 - c / 100.0   # 2 tails for t
      se <- sd / sqrt(n)
      t <- PtoT(c, n - 1, 2)
      return(t * se)
    }

    # Testing
    txt <- ("
    Cf  SSiz  SD
    90  46    0.4
    95  18    1.0
    99  22    2.2
    ")
    df <- read.table(textConnection(txt), header = TRUE)
    df       # optional display of data frame
    # extract columns as vectors
    arCf <- df$Cf
    arSSiz <- df$SSiz
    arSD <- df$SD
    # Create result vector
    arEr <- vector()
    for (i in 1:nrow(df))
    {
      cf <- arCf[i]
      ssiz <- arSSiz[i]
      sd <- arSD[i]
      arEr <- append(arEr, errmean(cf, ssiz, sd))   # append error to result vector
    }
    # Incorporate results into original data frame, rounded to 1 decimal place for display
    df$Err <- round(arEr, 1)
    # show data frame with results
    df

Result: Cf% confidence interval = mean±Err

      Cf SSiz  SD Err
    1 90   46 0.4 0.1
    2 95   18 1.0 0.5
    3 99   22 2.2 1.3

Section 4: Pilot for Means
    # conf = confidence in %, intv = step in sample size, maxN = largest sample size of interest
    # Uses errmean() from Section 3 with SD = 1, so errors are in SD units
    MeanPilot <- function(conf, intv, maxN)
    {
      ssiz <- vector()
      error <- vector()
      Dec <- vector()     # decrease in error from the previous row
      PcDec <- vector()   # the same decrease as a percentage of the previous error
      n <- intv
      i <- 1
      ssiz <- append(ssiz, n)
      error <- append(error, errmean(conf, n, 1))
      Dec <- append(Dec, 0)
      PcDec <- append(PcDec, 0)
      # runs while n <= maxN, so the final row exceeds maxN
      while (n <= maxN)
      {
        i <- i + 1
        n <- n + intv
        ssiz <- append(ssiz, n)
        error <- append(error, errmean(conf, n, 1))
        Dec <- append(Dec, error[i - 1] - error[i])
        PcDec <- append(PcDec, Dec[i] / error[i - 1] * 100)
      }
      mx <- cbind(ssiz, error, Dec, PcDec)
      df <- as.data.frame(mx)
      return(df)
    }

    # Testing
    confidence <- 95
    interval <- 5
    maxN <- 20
    MeanPilot(confidence, interval, maxN)

Results

    > MeanPilot(confidence, interval, maxN)
      ssiz     error        Dec    PcDec
    1    5 1.2416640 0.00000000  0.00000
    2   10 0.7153569 0.52630709 42.38724
    3   15 0.5537815 0.16157536 22.58668
    4   20 0.4680144 0.08576714 15.48754
    5   25 0.4127797 0.05523469 11.80192