Cohen Kappa

Copyright @2020.
All Rights Reserved.

StatsToDo: Cohen's Kappa


Explanations and References
Kappa was first described by Cohen in 1960 as a measurement of concordance or agreement between two judges, in the way they classify or categorise subjects into different groups or categories. It became a popular method of measuring concordance for nominal data, and Fleiss later extended it to more than two judges.
In 1968 Cohen introduced a weighted version for use with two raters or measurements, which applies a weight to the difference between each pair of measurements. The penalty for a disagreement increases with the width of the difference, making the algorithm suitable for ordinal scales.
Weighted Cohen's Kappa, as presented on this page, is therefore a measurement of concordance when the data is ordinal.
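The weighting can be written out explicitly. The following is a sketch that mirrors the R function given later on this page; the symbols are chosen here for illustration and are assumptions of this sketch: n_ij is the count in row i and column j, n is the total number of subjects, p_ij is the observed proportion, e_ij is the proportion expected by chance, and w_ij is the disagreement weight. The standard error shown is the one computed by that R function.

```latex
\[
p_{ij} = \frac{n_{ij}}{n}, \qquad
e_{ij} = \frac{n_{i\cdot}\, n_{\cdot j}}{n^{2}}, \qquad
w_{ij} = |i - j|
\]
\[
\kappa_w = 1 - \frac{\sum_{i,j} w_{ij}\, p_{ij}}{\sum_{i,j} w_{ij}\, e_{ij}}
\]
\[
SE(\kappa_w) = \sqrt{\frac{\sum_{i,j} w_{ij}^{2}\, p_{ij} - \left(\sum_{i,j} w_{ij}\, p_{ij}\right)^{2}}
{n \left(\sum_{i,j} w_{ij}\, e_{ij}\right)^{2}}}
\]
```

Because w_ij is 0 on the diagonal and grows with the distance between the two ranks, a disagreement of two categories counts twice as much as a disagreement of one.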
Nomenclature

Ordinal data are data sets where the numbers are in order, but the distances between numbers are unstated. In other words 3 is bigger than 2 and 2 is bigger than 1, but 3-2 is not necessarily the same as 2-1.
A common example of ordinal data is the Likert scale, where 1=strongly disagree, 2=disagree, 3=neutral, 4=agree, and 5=strongly agree. Although these numbers are in order, the difference between strongly agree and agree (5-4) is not necessarily the same as between disagree and strongly disagree (2-1).
In the example on this page, babies are classified as small (1), as expected (2), and large (3). Large (3) is bigger than expected (2), and expected (2) is bigger than small (1). However, the difference between large and expected is not necessarily the same as between expected and small.
Instrument is any method of measurement, for example a ruler, a Likert scale (5 point scale from strongly disagree to strongly agree), or a machine (e.g. ultrasound measurement of bone length). In the example on this page, the instrument is the judgement of the two doctors concerned.
Subjects are the objects of the measurements: the babies in this example.

Example
The example on this page is artificially created to demonstrate the procedure, and does not reflect any real clinical situation. The data purport to be from two doctors evaluating the size of 30 babies in their mothers' abdomens, classifying each as smaller than expected (1), size as expected (2) or larger than expected (3). Cohen's Kappa then evaluates how much the two doctors agreed with each other (their concordance).
The data can be entered in two ways

As a table of 30 rows (cases) and two columns (doctors), each cell containing the evaluation (1, 2, or 3)
As a table of counts, with rows representing doctor 1's evaluation (1, 2, or 3) and column as doctor 2's evaluation (1, 2, or 3). Each cell contains the number of cases so evaluated.
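The two layouts carry the same information. A minimal R sketch, assuming hypothetical vector names `doctor1` and `doctor2` (not part of the page's program) that hold the 30 paired evaluations of this example, shows how the first layout collapses into the second with base R's `table()`:

```r
# First layout: one evaluation per baby from each doctor (the 30 pairs of this example)
doctor1 <- c(rep(1, 5), rep(2, 5), rep(3, 5), rep(1, 5), rep(2, 5), rep(3, 5))
doctor2 <- c(rep(1, 5), rep(2, 5), rep(3, 5),
             2, 3, 3, 2, 2,   # doctor 1 said 1
             1, 3, 3, 1, 1,   # doctor 1 said 2
             1, 2, 1, 2, 2)   # doctor 1 said 3

# Fix the category levels so that an unused category still gets a row and a column
lv <- 1:3
counts <- table(factor(doctor1, levels = lv),
                factor(doctor2, levels = lv),
                dnn = c("doctor1", "doctor2"))

# Second layout: rows are doctor 1's evaluation, columns are doctor 2's
countMx <- matrix(as.vector(counts), nrow = length(lv))
print(countMx)
```

Declaring the levels explicitly keeps the matrix square even when one doctor never uses a category.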

The result consists firstly of the count matrix, then the Kappa, its Standard Error, and its 95% confidence interval. Two common methods of interpretation can be used

If the 95% confidence interval does not traverse the null value (0), it can be concluded that concordance is significantly stronger than random chance. In this example the interval (-0.01 to 0.57) includes 0, so significant concordance cannot be concluded.
A rule of thumb, where a Kappa of 0.2 or less is considered poor agreement, 0.21-0.4 fair, 0.41-0.6 moderate, 0.61-0.8 strong, and more than 0.8 near complete agreement. In our example Kappa is 0.28, so fair concordance can be concluded. The wide confidence interval is not surprising, as the sample size is small.
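Both methods of interpretation can be sketched in a few lines of R, using the Kappa and Standard Error reported for this example. The helper name `interpretKappa` is hypothetical, and treating each upper cut point as inclusive is an assumption of this sketch.

```r
# Values reported for the example on this page
kappa <- 0.278481012658228
se    <- 0.14691180903751

# Method 1: does the 95% confidence interval exclude the null value 0?
ci <- kappa + c(-1, 1) * 1.96 * se
significant <- ci[1] > 0 || ci[2] < 0

# Method 2: rule-of-thumb label (interpretKappa is a hypothetical helper;
# each upper cut point is treated as inclusive, which is an assumption)
interpretKappa <- function(k) {
    as.character(cut(k,
                     breaks = c(-Inf, 0.2, 0.4, 0.6, 0.8, Inf),
                     labels = c("poor", "fair", "moderate", "strong", "near complete")))
}

print(ci)
print(significant)            # FALSE: the interval includes 0
print(interpretKappa(kappa))  # "fair"
```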

References
Cohen J. A coefficient of agreement for nominal scales. Educational and Psychological Measurement. 20:37-46, 1960.
Cohen J. Weighted kappa: nominal scale agreement with provision for scale and disagreement or partial credit. Psychol. Bull. 70:213-20. 1968.
Fleiss JL, Cohen J, Everitt BS. Large sample standard errors of kappa and weighted kappa. Psychol. Bull. 72(5):323-327. 1969.
Fleiss JL. Statistical methods for rates and proportions, second edition. Wiley Series in Probability and Mathematical Statistics. Chapter 13, p. 212-236.
Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977; 33: 159-74.
wikipedia.org/wiki/Cohen's_kappa
University of York, Department of Health Sciences, Measurement in Health and Disease: Cohen's Kappa. A teaching paper with an easy-to-understand, full formulation of Cohen's Kappa, weighted and unweighted, and Standard Error calculations.
Javascript Program

Data Entry using Table of Raw Scores
The data is a matrix of numbers with 2 columns
  - Each row a subject
  - The columns are from 2 ordinal scales
  - Each cell contains the scores (ordinal value)
  - Data are converted to ranks,
    then tabulated into a table of counts by ranks for analysis.
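The conversion to ranks can be sketched in R as follows. The scores below are hypothetical and deliberately have gaps between values; ranking is by the distinct values found across both columns (lowest value = 1), not by the number of cases.

```r
# Hypothetical raw scores: 4 subjects, 2 ordinal scales, with gaps between values
scores <- cbind(a = c(10, 20, 40, 20),
                b = c(20, 20, 10, 40))

# Distinct values across both columns, in ascending order
lv <- sort(unique(as.vector(scores)))     # 10 20 40

# Replace each score by the position of its value in lv (10 -> 1, 20 -> 2, 40 -> 3)
rankMx <- matrix(match(scores, lv), nrow = nrow(scores))
print(rankMx)

# Tabulate the ranked pairs into a square table of counts by ranks
g <- length(lv)
countMx <- table(factor(rankMx[, 1], levels = 1:g),
                 factor(rankMx[, 2], levels = 1:g))
print(countMx)
```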
Data Entry using Table of Counts by Ranks
The data is a square matrix of counts
  - The rows and columns correspond to the ranks of the two ordinal scales
  - The lowest scale value is ranked to 1
  - Each cell contains the count of subjects with that pair of ranks

R Codes
This panel presents the algorithms for Cohen's Kappa for ordinal data
Firstly, the subroutine function that calculates Kappa from a matrix of counts by ranks
# Cohen Kappa for ordinal data
# function for Kappa Algorithm using matrix of counts by ranks
CalCohenKappa <- function(mx)
{
    print("Matrix of count by ranks")
    print(mx)
    g = nrow(mx)    # number of ranks; ranking is by range of values and not by number of cases
    n = 0           # n = total number of paired values
    mxSq <- matrix(data=0, nrow=g+1, ncol=g+1, byrow=TRUE)  # data matrix with row and col totals added
    for(i in 1:g) for(j in 1:g)
    {
        v = mx[i,j]
        n = n + v
        mxSq[i,j] = v
        mxSq[i,g+1] = mxSq[i,g+1] + v        # total of row i (held in the extra column)
        mxSq[g+1,j] = mxSq[g+1,j] + v        # total of column j (held in the extra row)
        mxSq[g+1,g+1] = mxSq[g+1,g+1] + v    # grand total
    }
    # print(mxSq)     # optional print out

    # Calculate Cohen's (weighted) Kappa
    mxp  <- matrix(data=0, nrow=g, ncol=g, byrow=TRUE)   # observed proportions
    mxpe <- matrix(data=0, nrow=g, ncol=g, byrow=TRUE)   # proportions expected by chance
    mxw  <- matrix(data=0, nrow=g, ncol=g, byrow=TRUE)   # disagreement weights
    for(i in 1:g) for(j in 1:g)
    {
        mxp[i,j] = mxSq[i,j] / mxSq[g+1,g+1]
        mxpe[i,j] = mxSq[i,g+1] * mxSq[g+1,j] / mxSq[g+1,g+1] / mxSq[g+1,g+1]
        if(i==j)
        {
            mxw[i,j] = 0
        }
        else
        {
            mxw[i,j] = abs(i-j)
        }
    }
    sumWP = 0
    sumWPe = 0
    sumW2P = 0
    for(i in 1:g) for(j in 1:g)
    {
        sumWP = sumWP + mxw[i,j] * mxp[i,j]
        sumWPe = sumWPe + mxw[i,j] * mxpe[i,j]
        sumW2P = sumW2P + mxw[i,j] * mxw[i,j] * mxp[i,j]
    }
    kappa = 1.0 - sumWP / sumWPe                                   # Cohen Kappa
    se = sqrt((sumW2P - sumWP * sumWP) / (n * sumWPe * sumWPe))    # SE
    print(paste("Cohen's Kappa=", kappa, " SE=", se))
    print(paste0("95% CI = ", (kappa - 1.96 * se), " to ", (kappa + 1.96 * se)))
}
Program 1: data entry is by pairs of values
# Program 1: data entry by 2 columns of paired values
datValues = ("
1 1
1 1
1 1
1 1
1 1
2 2
2 2
2 2
2 2
2 2
3 3
3 3
3 3
3 3
3 3
1 2
1 3
1 3
1 2
1 2
2 1
2 3
2 3
2 1
2 1
3 1
3 2
3 1
3 2
3 2
")
datMx <- read.table(textConnection(datValues), header=FALSE)   # paired values, one subject per row
# datMx     # optional printout for original data
n = nrow(datMx)

# ranking
tmpMx <- datMx    # temporary scratch matrix
rankMx <- matrix(data=0, nrow=n, ncol=2, byrow=TRUE)   # data ranked to range of values
minv = 0
rank = 0
cycle = 0
while(minv<1e10 && cycle<2*n)
{
    rank = rank + 1
    cycle = cycle + 1
    minv = min(tmpMx)     # smallest value not yet ranked (1e10 marks ranked cells)
    if(minv<1e10)
    {
        for(i in 1:n) for(j in 1:2) if(tmpMx[i,j]==minv)
        {
            rankMx[i,j] = rank
            tmpMx[i,j] = 1e10
        }
    }
}
g = rank - 1     # number of ranks (the final pass finds nothing left to rank)
# rankMx     # optional printout of ranks

# Create count matrix
countMx <- matrix(data=0, nrow=g, ncol=g, byrow=TRUE)
for(i in 1:n) countMx[rankMx[i,1],rankMx[i,2]] = countMx[rankMx[i,1],rankMx[i,2]] + 1
# countMx     # optional printout of count matrix
CalCohenKappa(countMx)     # call function to calculate and present results
The results are
[1] "Matrix of count by ranks"
     [,1] [,2] [,3]
[1,]    5    3    2
[2,]    3    5    2
[3,]    2    3    5
[1] "Cohen's Kappa= 0.278481012658228 SE= 0.14691180903751"
[1] "95% CI = -0.00946613305529259 to 0.566428158371748"
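As a check on these figures, the Kappa can be traced by hand from the count matrix. This is a worked sketch with symbols chosen here for illustration: p_ij is the observed proportion in row i and column j, e_ij is the proportion expected by chance, and w_ij = |i - j| is the disagreement weight. The row totals are all 10 and the column totals are 10, 11 and 9, so e_ij equals the total of column j divided by 90.

```latex
\[
\sum_{i,j} w_{ij}\, p_{ij}
  = \frac{3 + 2\cdot 2 + 3 + 2 + 2\cdot 2 + 3}{30} = \frac{19}{30},
\qquad
\sum_{i,j} w_{ij}\, e_{ij}
  = \frac{3\cdot 10 + 2\cdot 11 + 3\cdot 9}{90} = \frac{79}{90}
\]
\[
\kappa_w = 1 - \frac{19/30}{79/90} = 1 - \frac{57}{79} = \frac{22}{79} \approx 0.2785
\]
```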
Program 2 allows data entry using the count matrix by ranks (if this has already been calculated)
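# Program 2: data entry using matrix of counts by ranks
datCount = ("
5 3 2
3 5 2
2 3 5
")
# read the counts, then convert the data frame to a plain matrix so it prints as shown below
mx <- unname(as.matrix(read.table(textConnection(datCount), header=FALSE)))   # matrix of count in ranks
CalCohenKappa(mx)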
The results are
[1] "Matrix of count by ranks"
     [,1] [,2] [,3]
[1,]    5    3    2
[2,]    3    5    2
[3,]    2    3    5
[1] "Cohen's Kappa= 0.278481012658228 SE= 0.14691180903751"
[1] "95% CI = -0.00946613305529259 to 0.566428158371748"
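For readers who prefer vectorised R, the same calculation can be cross-checked without explicit loops. This is a minimal sketch, not the page's program: the function name `weightedKappa` is hypothetical, and it assumes the same linear weights |i - j| and the same Standard Error formula as CalCohenKappa.

```r
# Hypothetical helper (not the page's CalCohenKappa): same formulas, vectorised
weightedKappa <- function(mx) {
    mx <- as.matrix(mx)
    n  <- sum(mx)                                  # total number of paired values
    g  <- nrow(mx)                                 # number of ranks
    p  <- mx / n                                   # observed proportions
    pe <- outer(rowSums(mx), colSums(mx)) / n^2    # proportions expected by chance
    w  <- abs(outer(1:g, 1:g, "-"))                # linear disagreement weights |i - j|
    sumWP  <- sum(w * p)
    sumWPe <- sum(w * pe)
    kappa  <- 1 - sumWP / sumWPe
    se     <- sqrt((sum(w^2 * p) - sumWP^2) / (n * sumWPe^2))
    c(kappa = kappa, se = se,
      lower = kappa - 1.96 * se, upper = kappa + 1.96 * se)
}

# Count matrix of the example on this page
counts <- matrix(c(5, 3, 2,
                   3, 5, 2,
                   2, 3, 5), nrow = 3, byrow = TRUE)
res <- weightedKappa(counts)
print(round(res, 4))
```

Returning a named vector rather than printing makes the result easy to reuse, for example to compare several pairs of raters.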