Copyright © 2014.
All Rights Reserved.
StatsToDo : Sample Size for Pearson's Correlation Coefficient Explained and Table

Related link :
Sample Size Introduction and Explanation Page
Pilot Studies Explained Page
Sample Size for Pearson's Correlation Coefficient Program Page
Correlation and Regression Program Page
Correlation and Regression Explained Page

Introduction
This page provides a sample size table for Pearson's Correlation Coefficient rho (ρ), as calculated by the Sample Size for Pearson's Correlation Coefficient Program Page. The table provides sample sizes for:
  • Power (1-β) of 0.8, 0.9, and 0.95
  • Probability of Type I Error (α) of 0.1, 0.05, 0.01, and 0.001
  • One or two tail models
  • Correlation Coefficient (ρ) at 0.02 intervals
Sample size calculation is based on the algorithm described by Machin et al., where the correlation coefficient ρ is first transformed to Fisher's Z, and the estimation is then based on Z, assuming that Z is normally distributed.
    Fisher's Z: Z = log((1+ρ) / (1-ρ)) / 2, where log is the natural logarithm
    Sample size: n = ((zα+zβ) / Z)^2 + 3 for a one tail model
    For a 2 tail model the α value is halved, e.g. for α=0.05, zα for 1 tail=1.64, and for 2 tail=1.96
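The two formulas above can be sketched in Python as follows. This is a minimal illustration rather than the code behind the program page: the function names `fisher_z` and `sample_size` and the `tails` parameter are assumptions made for this example, and rounding the result up to the next whole pair is also an assumption.

```python
import math
from statistics import NormalDist


def fisher_z(rho: float) -> float:
    """Fisher's Z transformation: Z = log((1+rho) / (1-rho)) / 2."""
    return 0.5 * math.log((1 + rho) / (1 - rho))


def sample_size(rho: float, alpha: float = 0.05, power: float = 0.8,
                tails: int = 1) -> int:
    """Pairs of x/y values needed: n = ((z_alpha + z_beta) / Z)^2 + 3."""
    norm = NormalDist()  # standard normal distribution
    # For a 2 tail model alpha is halved before conversion to z_alpha.
    z_alpha = norm.inv_cdf(1 - alpha / tails)
    z_beta = norm.inv_cdf(power)
    n = ((z_alpha + z_beta) / fisher_z(rho)) ** 2 + 3
    return math.ceil(n)  # assumption: round up to a whole pair


print(sample_size(0.6, alpha=0.05, power=0.8, tails=1))  # 16
print(sample_size(0.6, alpha=0.05, power=0.8, tails=2))  # 20
```

The only difference between the one and two tail calls is the value of zα, which is why the two tail model always asks for a larger sample.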
Power is calculated by rearranging the sample size formula, so that zβ = Z(sqrt(n-3))-zα, then converting zβ to the probability β, with power being 1-β. Again, the value of zα depends on α and on whether a 1 or 2 tail model is used.
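As a rough sketch of the power calculation, the rearranged formula can be written in Python as below. The function name `power_for_n` and its parameters are hypothetical, and taking the absolute value of Z (so that negative correlations are treated like positive ones) is an assumption of this example.

```python
import math
from statistics import NormalDist


def power_for_n(rho: float, n: int, alpha: float = 0.05, tails: int = 1) -> float:
    """Power (1 - beta) for a given correlation coefficient and sample size."""
    norm = NormalDist()  # standard normal distribution
    # Fisher's Z; abs() is an assumption so that the sign of rho is ignored.
    z = abs(0.5 * math.log((1 + rho) / (1 - rho)))
    z_alpha = norm.inv_cdf(1 - alpha / tails)  # alpha halved for 2 tails
    z_beta = z * math.sqrt(n - 3) - z_alpha
    return norm.cdf(z_beta)  # area below z_beta is 1 - beta, i.e. the power


print(round(power_for_n(0.6, 16), 3))  # slightly above 0.8
print(round(power_for_n(0.6, 15), 3))  # slightly below 0.8
```

With ρ=0.6 and a one tail α of 0.05, 16 pairs is the smallest sample for which the power reaches 0.8, which agrees with the sample size formula.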

The confidence interval is also calculated by reversing the formula, where

    Z = (zα+zβ) / sqrt(n-3)
    SE = 1 / sqrt(n-3)
    Confidence interval = Z ± zα(SE)
    Z and the confidence intervals are then converted to correlation coefficient where ρ = (exp(2Z)-1) / (exp(2Z)+1)
    The intervals are not symmetrical, being wider towards 0 and narrower towards 1 and -1
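The back-transformation and the resulting asymmetry can be illustrated with a short Python sketch. It assumes the interval is centred on the Fisher's Z of a given correlation coefficient `r` with the standard error 1 / sqrt(n-3); the function name `correlation_ci` and its parameters are hypothetical and are not taken from the program page.

```python
import math
from statistics import NormalDist


def correlation_ci(r: float, n: int, alpha: float = 0.05,
                   tails: int = 2) -> tuple[float, float]:
    """Confidence interval for a correlation coefficient via Fisher's Z."""
    z = 0.5 * math.log((1 + r) / (1 - r))      # Fisher's Z of r
    se = 1 / math.sqrt(n - 3)                  # standard error of Z
    z_alpha = NormalDist().inv_cdf(1 - alpha / tails)

    def to_rho(value: float) -> float:
        # Back-transform: rho = (exp(2Z) - 1) / (exp(2Z) + 1)
        return (math.exp(2 * value) - 1) / (math.exp(2 * value) + 1)

    return to_rho(z - z_alpha * se), to_rho(z + z_alpha * se)


low, high = correlation_ci(0.6, 16)
print(round(low, 2), round(high, 2))
# The lower arm (towards 0) is wider than the upper arm (towards 1).
print(0.6 - low > high - 0.6)  # True
```

The interval is symmetrical on the Z scale; the asymmetry only appears after converting back to ρ, because the transformation compresses values as they approach 1 and -1.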

On whether to use a one or two tail model

  • If the user intends to establish a 95% confidence interval for comparison with other correlation coefficients, or to use the results in a future meta-analysis, then a two tail model is appropriate.
  • More commonly, however, the interest is whether a significant correlation exists in the data observed, that is, whether the tail of the interval nearer to zero overlaps the null value of zero (0). In this case, the one tail model suffices, and requires a smaller sample size.
  • In most publications, therefore, when the one or two tail model is not mentioned, the default is usually the one tail model.
Example
    To detect an expected correlation coefficient (ρ) of 0.6 at a significance level of α=0.05 with a power (1-β) of 0.8 requires 16 pairs of x/y values, taking the default position of a one tail model.
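The arithmetic behind this figure can be traced step by step. The working below is a sketch that uses z values rounded to four decimal places (zα = 1.6449 for a one tail α of 0.05, and zβ = 0.8416 for a power of 0.8), so the intermediate figures are approximate.

```latex
\begin{align*}
Z &= \tfrac{1}{2}\ln\frac{1+0.6}{1-0.6} = \tfrac{1}{2}\ln 4 \approx 0.6931 \\
n &= \left(\frac{z_\alpha + z_\beta}{Z}\right)^{2} + 3
   = \left(\frac{1.6449 + 0.8416}{0.6931}\right)^{2} + 3
   \approx 3.587^{2} + 3 \approx 15.87
\end{align*}
```

Rounding 15.87 up to the next whole number gives the 16 pairs quoted above.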