Copyright @2020.
All Rights Reserved.
StatsToDo : Concordance (Agreements) Explained


Disclaimer

StatsToDo is neither an authoritative reference site nor a teaching site, and the information provided is intended only to help the user negotiate the resources provided. Inexperienced users should seek guidance from professional statisticians.

Introduction

In this discussion, the term subjects refers to individuals or groups of individuals in a research project. Measurements refers to numerical or textual evaluation of events or results.

While many statistical procedures use measurements to evaluate similarities or differences between subjects, the statistics of concordance use subjects to evaluate similarities and differences between different measurements. Concordance procedures are therefore used to evaluate variations in measurements, the reliabilities of one or more measurements, and to compare different measurements of the same things.

StatsToDo provides 7 concordance programs, divided into 3 groups that are commonly used under different circumstances.

The first group is used to evaluate how much different measurements agree with each other, and includes the following:

  • Kappa: Fleiss's Kappa (Fleiss, 1971), which generalizes Cohen's original Kappa (1960) to any number of raters. It evaluates agreement between 2 or more nominal measurements (where the measurements are names and labels such as 1=depression, 2=schizophrenia, 3=normal). Kappa is often used to test how well different clinicians or psychosocial questionnaires agree in their diagnoses or predictions. Further information and calculations for Kappa are available in Kappa.php
  • Cohen's Kappa: the weighted form of Kappa (Cohen, 1968), for use when there are only two (2) measurements. A weight is applied to each disagreement between the pair of measurements, so that the influence of a disagreement increases with the width of the difference. This makes the algorithm suitable for ordinal scales (where the measurements are ordered, but the differences between values are unstated, e.g. a Likert Scale where 1=strongly disagree, 2=disagree, 3=neutral, 4=agree, and 5=strongly agree. The difference between 1 and 2, strongly disagree and disagree, is not necessarily the same as that between 2 and 3, disagree and neutral). Further information and calculations for Cohen's Kappa are available in CohenKappa.php
  • Kendall's coefficient of concordance for ranks (W): calculates agreement between 3 or more rankers according to the rank order each assigns to the individuals being ranked. The idea is that n subjects are ranked (0 to n-1) by each of the rankers, and the statistic evaluates how much the rankers agree with each other. The program from StatsToDo is modified so that it can accept any scale, and the input data are ranked before calculation. This allows the program to be used for a wide range of evaluations and measurements, provided they are at least ordered. Further information and calculations for Kendall's W are available in KendallW.php
  • Intraclass Correlation (ICC): an algorithm to measure agreement (consensus, concordance) between 2 or more measurements that are normally distributed. It has advantages over the correlation coefficient, in that it is adjusted for the effects of the scale of measurements, and it can represent agreement among more than two raters or measuring methods. Further information and calculations for ICC are available in IntraclassCorrelation.php
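
As a rough illustration of two of the procedures in this group, the following Python sketch computes a Kappa for two raters (optionally weighted for ordinal scales) and Kendall's W. It is not the code used by the StatsToDo programs. The function names, the choice of linear weights (weight = distance between categories), the omission of a tie correction in W, and the sample data are all assumptions made for this example.

```python
def weighted_kappa(rater_a, rater_b, categories, weighted=True):
    """Kappa for two raters; linear disagreement weights if weighted=True.

    Assumption: `categories` lists the scale values in their natural order.
    """
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)
    # Observed proportions in the k x k agreement table.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            # Linear weights grow with the distance between categories;
            # the unweighted form counts every disagreement as 1.
            w = abs(i - j) if weighted else (0 if i == j else 1)
            num += w * observed[i][j]       # observed disagreement
            den += w * row[i] * col[j]      # disagreement expected by chance
    return 1.0 - num / den


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2.0 + 1.0
        for t in range(pos, end + 1):
            ranks[order[t]] = avg
        pos = end + 1
    return ranks


def kendall_w(scores_by_rater):
    """Kendall's W for m raters scoring the same n subjects (no tie correction)."""
    m = len(scores_by_rater)
    n = len(scores_by_rater[0])
    ranked = [average_ranks(scores) for scores in scores_by_rater]
    totals = [sum(r[i] for r in ranked) for i in range(n)]
    mean_total = m * (n + 1) / 2.0
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12.0 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical data: two raters using a 5 point Likert scale,
# and three raters scoring four subjects on an arbitrary ordered scale.
likert_a = [1, 2, 3, 4, 5, 3]
likert_b = [1, 2, 4, 4, 5, 2]
kappa = weighted_kappa(likert_a, likert_b, [1, 2, 3, 4, 5])
w = kendall_w([[10, 20, 30, 40], [12, 18, 35, 33], [9, 25, 28, 41]])
```

Because W depends only on deviations of the rank totals from their mean, ranking from 1 to n gives the same result as ranking from 0 to n-1.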
The second group is used to evaluate how much two or more measurements, together, measure the same thing. This is an evaluation of the reliability of multivariate measurements.
  • Kuder-Richardson Coefficient of reliability (K-R 20): first described in 1937, and used to test the reliability of a multivariate measurement that is composed of two or more binary measurements (no/yes, true/false, right/wrong). It is commonly used in the evaluation of a batch of multiple choice questions, to see whether the pattern of correct and incorrect answers forms a general pattern that measures the achievements of the students. Further information and calculations for Kuder-Richardson Coefficient are available in KuderRichardson.php
  • Cronbach's Alpha: In 1941 Hoyt modified the Kuder-Richardson Coefficient, adjusting it for continuity, and named this the Kuder-Richardson-Hoyt Coefficient. Cronbach in 1951 showed that this coefficient can be used generally in all scaled measurements. As he intended this to be a starting point for developing even better indices, he named it Coefficient Alpha. This is now known as Cronbach's Alpha, a widely accepted measurement of the internal consistency (reliability) of a multivariate measurement composed of two or more correlated items. Further information and calculations for Cronbach's Alpha are available in CronbachAlpha.php
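
The relationship between the two coefficients in this group can be sketched in Python as follows. This is an illustration only, not the code used by the StatsToDo programs. The function names and the small set of test answers are hypothetical, and population variances (dividing by n) are assumed throughout so that the variance of a binary item equals p(1-p).

```python
from statistics import pvariance


def cronbach_alpha(rows):
    """Cronbach's Alpha; each row holds one subject's scores on k items."""
    k = len(rows[0])
    item_variances = [pvariance([row[j] for row in rows]) for j in range(k)]
    total_variance = pvariance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def kuder_richardson_20(rows):
    """K-R 20; each row holds one subject's 0/1 answers on k items."""
    k = len(rows[0])
    n = len(rows)
    pq_sum = 0.0
    for j in range(k):
        p = sum(row[j] for row in rows) / n   # proportion answering correctly
        pq_sum += p * (1 - p)                 # variance of a binary item
    total_variance = pvariance([sum(row) for row in rows])
    return k / (k - 1) * (1 - pq_sum / total_variance)


# Hypothetical answers of 5 students to 4 questions (1 = correct, 0 = wrong).
answers = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
kr20 = kuder_richardson_20(answers)
alpha = cronbach_alpha(answers)
```

With binary items the two functions return the same value, which reflects the description of Alpha as a generalization of the Kuder-Richardson Coefficient to scaled measurements.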
The third group is used to evaluate accuracy and precision between paired measurements that are normally distributed. The pair consists of a reference (gold standard) and a new measurement which may be cheaper, less intrusive, or safer. The evaluation is whether the new measurement is good enough to use as a surrogate for the gold standard. An example is to evaluate the accuracy and precision of blood pressure measurements using an external sphygmomanometer, compared with the gold standard of an intra-arterial catheter.
  • Lin's Concordance Correlation Coefficient (CCC, ρc): evaluates the degree to which pairs of observations fall on the 45 degree line (the line of no difference) through the origin of a 2 dimensional plot of the two measurements. Lin (1989) describes the standards by which conclusions are to be drawn, and the coefficient serves as a reference for the precision of the new measurement. Further information and calculations for CCC are available in LinCCC.php
  • Bland and Altman (1986) discussed in detail how agreement between two methods of measuring the same thing can be evaluated. The plot is now commonly used, and is considered by many to be the standard method of evaluating a new measurement against a gold standard. Its particular advantages are the detailed evaluation of many aspects of the relationship: the error, the precision, how they change over the range of measurements, and the confidence intervals that can be used to make decisions. Further information and calculations for Bland and Altman Plot are available in BlandAltmanPlot.php
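
The two procedures in this group can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used by the StatsToDo programs. The function names and the blood pressure values are hypothetical. The CCC is computed with variances divided by n, and the Bland and Altman limits of agreement are taken as the mean difference plus or minus 1.96 standard deviations of the differences, without the confidence intervals around those limits.

```python
from statistics import mean, stdev


def lin_ccc(x, y):
    """Lin's Concordance Correlation Coefficient for paired measurements."""
    n = len(x)
    mx, my = mean(x), mean(y)
    var_x = sum((a - mx) ** 2 for a in x) / n
    var_y = sum((b - my) ** 2 for b in y) / n
    cov_xy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    # The squared difference in means penalizes a systematic shift
    # away from the 45 degree line.
    return 2 * cov_xy / (var_x + var_y + (mx - my) ** 2)


def bland_altman_limits(reference, new):
    """Return (lower limit, bias, upper limit) of agreement."""
    differences = [b - a for a, b in zip(reference, new)]
    bias = mean(differences)
    sd = stdev(differences)
    return bias - 1.96 * sd, bias, bias + 1.96 * sd


# Hypothetical systolic blood pressures (mmHg): intra-arterial catheter
# as the reference, external sphygmomanometer as the new measurement.
catheter = [100, 110, 120, 130]
cuff = [102, 111, 123, 132]
ccc = lin_ccc(catheter, cuff)
lower, bias, upper = bland_altman_limits(catheter, cuff)
```

A constant offset between the two methods leaves an ordinary correlation coefficient unchanged but lowers the CCC, and it appears in the Bland and Altman analysis as a non-zero bias.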

References

The references for each algorithm are on its own page, and are not presented here. However, a good discussion of the general approach can be found in https://www.sciencedirect.com/science/article/pii/S0093691X10000233