Content Disclaimer
Copyright @2020.
All Rights Reserved.

StatsToDo: CUSUM for Proportions with Bernoulli Distribution

Links : Home Index (Subjects) Contact StatsToDo


Explanations R Code
The example is a made up one to demonstrate the numerical process, and the data is generated by the computer. It purports to be from a quality control exercise in an obstetric unit, using Caesarean Section Rate as the quality indicator.
  • From records in the past, we established the benchmark Caesarean Section Rate to be 20% (0.2), and this can be capped if the junior staff and midwives are well trained and closely supervised.
  • With time however, experienced staff leave and replaced by the less experienced and trained. The standard of supervision would gradually deteriorate, resulting in an increase in the Caesarean Section rate.
  • We would like to trigger an alarm and reorganize the working and supervision framework when the Caesarean Section Rate increases to 25% (0.25) or more.
  • As re-organizing working framework is time consuming and disruptive, we would like any false alarm to be no more frequent than once per week, approximately 100 births. The Average Number of Observations (anos) = 100

Planning Parameters

# Section 1: parameters and data
inControlProp = 0.2
outOfControlProp = 0.25
anos = 100 

Section 2: Calculate the Reference Value (gb, γB) abd Decision Interval (h)

Section 2 consists of 2 parts. Section 2a. contains the algorithm described by Reynolds, translated to R, and Section 2b the control statements to produce

Section 2a. Reynold's algorithm in R The algorithms finds γB and h from the input parameters. I have merely translated the formulae from the paper, and am unable to explain why it is done the way it is. Those interested are referred to the original paper which explains in greatr details both the theory and the algorithm.

# Section 2a: Algorithm Translated from Paper by 
# Reynolds Jr. MR and Stoumbos ZG (1999) A CUSUM Chart for Monitoring a Proportion when 
#     Inspecting Continuously.  Journal of Quality Technology vol 31: No. 1. p.87 - 108

# EpsilonZ3 is an intermediary variable necessary for calculating  Epsilon(p).
#            the calculation is the first part of equation C1.
EpsilonZ3 <- function(p)
{
  x = (1 - p) / p
  return (x^0.5 - x^-0.5)
}

#  Epsilon(p) is equation 6, an intermediary coefficient in calculating ANOS
EpsilonP <- function(p)
{
  if(p<0.01)
  {
    return (sqrt((1 - p) / p) - sqrt(p / (1 - p))) / 3
  }
  if(p<=0.5)
  {
    return (0.41 - 0.0842 * log(p) - 0.0391 * log(p)^3
            -0.00376 * log(p)^4 - 0.000008 * log(p)^7)
  }
  return (EpsilonZ3(p)/3 + EpsilonP(1 - p))    # equation C2
}

#  KSi function is the function (C5) for a particular KSi value.
#  This is repeatedly called by the following function as iterations until
#  a KSi that will return zero

KSiFunction <- function(p0, p1, p, KSi)  # equation C5
{  
  t = 1 - ((1 - p1) / (1 - p0))^KSi
  b1 = (p1 / p0)^KSi
  b2 = ((1 - p1) / (1 - p0))^KSi
  return (p - t / (b1 - b2))
}

# Find KSi is an iterative algorithm (equation C5) to find the correct KSi value
# which will return zero (0) for a particular combination of po,p1 and p.
# KSi is necessary to calculate ANOS when p is not p0,p1, or r1/r2

FindKSi<- function (p0, p1, p)
{
  el = -1
  er = 1
  em = 0.5
  ep = KSiFunction(p0, p1, p, em)
  while(abs(ep)>0.000001)
  {
    if(ep>0)
    {
      er = em
    }
    else  
    { 
      el = em
    }
    em = (el + er) / 2
    ep = KSiFunction(p0, p1, p, em)
  }
  em
}


# FindNOS is the key function, finding the number of samples required to signal (equation C3)
# p0 and p1 are the proportions in and out of control.  p=proportion of interest
# r1 and r2 were already calculated from p0 and p1
# h is the decision value

FindNOS <- function(p0, p1, r1, r2, h, p)   # equation C3
{
  eP0 = EpsilonP(p0)
  if(p1>p0)
  {
    hsb = h + eP0 * sqrt(p0 * (1 - p0))   # equaltion 5 for detecting increase
  } 
  else 
  {
    hsb = h - eP0 * sqrt(p0 * (1 - p0))  # equaltion 10 for detecting decrease
  }
  if(p==p0)
  {
    return ((exp(hsb * r2) - hsb * r2 -1) / abs(r2 * p0 - r1))
  }
  if(p==p1)
  {
    return ((exp(hsb * r2) + hsb * r2 - 1) / abs(r2 * p1 - r1))
  }
  if(p==(r1 / r2))
  {
    return (hsb * (hsb + abs(-r1 / r2)) * r2^2 / (r1 * (r2 - r1)))
  }
  KSiP = FindKSi(p0, p1, p)
  return ((exp(KSiP * hsb * r2) -  KSiP * hsb * r2 -1) / abs(KSiP * (r2 * p - r1)))
}

#
# Translating algorithm from paper to usable parameters
#
# function FindH iterates h to FindNOS until the correct h is found for a particular NOS

FindH <- function(p0, p1, r1, r2, p, nos)
{
  hl = 0
  if(p1>p0) 
  {
    hh = 50 
    hm = 25 
  } 
  else  
  {
    hh = 50 
    hm = -25 
  }
  a = FindNOS(p0, p1, r1, r2, hm, p)
  while(abs(a-nos)>0.1)
  {
    if (a>nos) 
    { 
      hh = hm
    }   
    else 
    { 
        hl = hm; 
    }
    hm = (hl + hh) / 2
    a = FindNOS(p0, p1, r1, r2, hm, p)
  }
  hm
}

FindH calculates the decision interval h (hm inside the function) from the in and out of control proportions (p0, p1), the proportion of interest (p), which in planning is the same as p0, the anos, and r1 and r2 calculated in the main programs. It returns the value of the decision interval h

Section 2b. The main controlling commands, interfacing input parameters and Reynold's algorithm

# Calculate gb and h
p0 = inControlProp
p1 = outOfControlProp
r1 = -log((1 - p1) / (1 - p0))
r2 = log((p1 * (1- p0)) / (p0 * (1 - p1)))  # equation 3
gb = r1 / r2                                # equation 4
h = FindH(p0, p1, r1, r2, p0, anos)
cat("Reference Value gammaB gb =",gb,"\tDecision Interval h=", h, "\n")
Step 2a and 2b together calculte the parameters required by the function getH, the reference value γB (gb), calls the GetH function to obtain the decision interval h, and displays the results
Reference Value gammaB gb = 0.2243397 	Decision Interval h= 3.164673 

Step 3: CUSUM Plot

Step 3 is divided into 2 parts. Step 3a calculates the cusum vector, and 3b plots the vector and h in a graph.

# Step 3a: Calculate CUSUM
dat=c(
  0,0,1,0,0,0,0,1,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,
  0,0,0,0,1,0,1,1,1,0,0,0,0,0,0,1,0,0,0,0,1,1,1,1,0,0,0,0)
cusum <- vector()
cusumValue = 0
for(i in 1 : length(dat))
{
  if(dat[i]!=0) 
  { 
    cusumValue = cusumValue + 1 - gb 
  }  
  else 
  { 
    cusumValue = cusumValue - gb
  }
  if(p1>p0) # Up
  {
    if(cusumValue<0)
    {
      cusumValue = 0
    }
  }
  else                               # down
  {
    if(cusumValue>0)
    {
      cusumValue = 0
    }
  }
  cusum[i] = cusumValue
}
cusum
The vector dat contains the values 0= negative (normal delivery), 1=positive (Caesarean Section) The first 2 lines of code in step 2a creates the empty cusum vector. The initial CUSUM value is set to 0. The remaining codes calculates the cusum value for each case, and places it in the cusum vector. The results are as follows
> cusum
 [1] 0.0000000 0.0000000 0.7756603 0.5513205 0.3269808 0.1026410 0.0000000 0.7756603
 [9] 0.5513205 0.3269808 0.1026410 0.0000000 0.0000000 0.7756603 0.5513205 0.3269808
[17] 1.1026410 0.8783013 0.6539616 0.4296218 0.2052821 0.0000000 0.0000000 0.0000000
[25] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.7756603
[33] 0.5513205 0.3269808 0.1026410 0.0000000 0.7756603 0.5513205 1.3269808 2.1026410
[41] 2.8783013 2.6539616 2.4296218 2.2052821 1.9809423 1.7566026 1.5322629 2.3079231
[49] 2.0835834 1.8592436 1.6349039 1.4105642 2.1862244 2.9618847 3.7375450 4.5132052
[57] 4.2888655 4.0645257 3.8401860 3.6158463

Step 3b: Plot CUSUM from Vector

# Step 3b: Plot the cusum vector and h
plot(cusum,type="l")
abline(h=h)
In step 3b, the first line plots the cusum vector, and the second line the decision interval h. The result plot is shown to the right.

Step 4: Optional Export Results

# Step 4: Optional export of results
#myDataFrame <- data.frame(dat,cusum)       #combine dat and cusum to dataframe
#myDataFrame                                #display dataframe
#write.csv(myDataFrame, "CusumBernoulli.csv")   # write dataframe to .csv file
Step 4 is optional, and in fact commented out, and included as a template only. Each line can be activated by removing the #

The first line places the two vectors, dat and cusum together into a dataframe

The second line displays the data, along with row numbers, in the console, which can then be copied and pasted into other applications for further processing

The third line saves the dataframe as a comma delimited .csv file. This is needed if the data is too large to handle by copy and paste from the console.

Javascript Plot