Content Disclaimer
Copyright @2020.
All Rights Reserved.

StatsToDo: CUSUM for Proportions with Negative Binomial Distribution

Links : Home Index (Subjects) Contact StatsToDo

Explanations R Code
The example is a made up one to demonstrate the numerical process, and the data is generated by the computer. It purports to be from a quality control exercise in an obstetric unit, using Caesarean Section Rate as the quality indicator.
  • From records in the past, we established the benchmark Caesarean Section Rate to be 20% (0.2), and this can be capped if the junior staff and midwives are well trained and closely supervised.
  • With time however, experienced staff leave and replaced by the less experienced and trained. The standard of supervision would gradually deteriorate, resulting in an increase in the Caesarean Section rate.
  • We would like to trigger an alarm and reorganize the working and supervision framework when the Caesarean Section Rate increases to 30% (0.3) or more.
  • In Negative Binomial terms this means 1 caesaran Section (CS) to 4 normal delivery (ND), 2 to 8, 3 to 12, and so on. The numbes to use depends on a balance between the spped of data acquisition and stability of the data. In this exercise, we have chosen 3 CS matching 12 ND
  • As re-organizing working framework is time consuming and disruptive, we would like any false alarm to be no more frequent than once every 100 sets of samples, so the average run length ARL = 100

Step 1: Define Parameters

The data is entered in step 1. This is the only part of the program that needs any editing
# Step 1: parameters and data
nPos = 3            # number of positives (CS) as decider (r)
icNeg = 12          # in control number of negatives (ND) between NPos (c0)
oocNeg =7           # out of control number of negatives(ND) between NPos (c1)
arl = 100 
theModel = "F"   #F for FIR,   Z for zero, S for steady state 
Step 1 contains the parameters and the data. This is the part where the user can edit, and change the values to that required in his/her own analysis

The first 4 lines sets the parameters required for the analysis. The logic is as follows

  • The number of positive cases (Caesarean Section) remains the same for in and out of control.
  • When in control, the CS rate is 20%, the ND rate is 80%, so there are 12 negatives to the 3 positives
  • When out of control the CS rate increases to 30%, the ND rate decreases to 70% so the number of ND is 7 to every 3 CSs (7 negatives to 3 positives)

The 4th line, the model has 3 options, which sets the first value of the CUSUM

  • F means Fast Initial Response, where the initial CUSUM value is set at half of the Decision Interval h. The rationale is that, if the situation is in control then CUSUM will gradually drift towards zero, but if the situation is already out of control, an alarm would be triggered early. The down side is that a false alarm is slightly more likely early on in the monitoring. As FIR is recommended by Hawkins, it is set as the default option
  • Z is for zero, and CUSUM starts at the baseline value of 0. This will lower the risk of false alarm in the early stages of monitoring, but will detect the out of control situation slower if it already exists at the begining.
  • S is for steady state, intended for when monitoring is already ongoing, and a new plot is being constructed. The CUSUM starts at the value when the previous chart ends.
  • Each model will make minor changes to the value of the decision interval h. The setting of the initial values is mostly intended to determine how quickly an alarm can be triggered if the out of control situation exists from the beginning.

Step 2: Calculate Reference Value and Decision Interval

# Step 2a: convert counts to odds and variance ref: Hawkins, p.147
icMu = nPos / icNeg                # in control mean = r/c0
icVar = icMu * (1 + 1 / icNeg)     # in control variance = icMu / (1+1/c0))
oocMu = nPos / oocNeg              # out of control mean = r/c1

# Step 2b: Calculate k and h
#install.packages("CUSUMdesign")   # if not already installed
result <- getH(distr=5, ICmean=icMu, ICvar=icVar, OOCmean=oocMu, ARL=arl, type=theModel)
k <- result$ref
h <- result$DI
  h = -h
cat("Reference Value k=",k,"\tDecision Interval h=", h, "\n")
Step 2a converts the counts into means and variance in term of odds

Step 2b performs that statistical calculations using the odds parameters. The package CUSUMdesign needs to be alrady installed, and the library activated each time the program is used.

result is the object that contains the results of the analysis. The result required for this program are the reference value (k) and decision interval h. Please note that h is calculated as a positive value. If the CUSUM is designed to detect a decrease from in control value, then h needs to be changed to a negative value.

The last line displays the results we need

Reference Value k= 0.3305118 	Decision Interval h= 3.666667 
Please note that, although the parameters are entered as counts, k and h are in terms of odds of outcome positive (Caesarean Section). This means that, as the count of negatives increases, odd decreases, and as count of negatives decreases, odd increases. In other words, the CUSUM chart will be the inverse of the negative counts, and reflect the actual change in the odd of outcome positives.

Section 3: CUSUM Plot

Step 3 is divided into 2 parts. Step 3a calculates the cusum vector, and 3b plots the vector and h in a graph.

# Step 3a: Create vector of cusum value
      10,9,6,10,6,9,9,9,9,6,7,6,7,8,7,8,7,9,8,7,5,8,7,10,10)  # number of negative cases between each set of 3 positive cases
cusum <- vector()
cusumValue = 0
  cusumValue = h / 2

for(i in 1 : length(dat))
  mu = nPos / dat[i]
  cusumValue = cusumValue + mu - k
  if(oocMu>icMu) # mu Up count down
      cusumValue = 0
  else          # mu down count up
      cusumValue = 0
  cusum[i] = cusumValue
dat is a vector of negative counts, each count matches the defined number of positives. In this example, it is the number of normal deliveries to each 3 Caesarean Sections.

The next 6 lines of code in step 3a creates the empty cusum vector and sets the initial cusum value. The 10th line converts the negative count into odd of positive/negative, before it is used to calculate CUSUM. The remaining codes calculates the cusum value for each measurement, and places it in the cusum vector.

The resulting CUSUM vector is as follows

> cusum
 [1] 1.752822 1.636595 1.536853 1.420627 1.362842 1.246616 1.166104 1.168926 1.088414
[10] 1.091236 1.094057 1.063546 1.161605 1.131093 1.400582 1.498641 1.501463 1.470951
[19] 1.515439 1.559927 1.657987 1.927475 2.096964 2.195023 2.364511 2.334000 2.336821
[28] 2.506309 2.475798 2.645286 2.648107 2.650929 2.653750 2.656572 2.826060 2.924120
[37] 3.093608 3.191668 3.236156 3.334216 3.378704 3.476763 3.479585 3.524073 3.622133
[46] 3.891621 3.936109 4.034169 4.003657 3.973145

Step 3b: Plotting CUSUM vector

# Step 3b: Plot the cusum vector and h
In step 3b, the first line plots the cusum vector, and the second line the decision interval h. The result plot is shown to the right

Step 4: Optional Export of Results

# Step 4: Optional export of results
#myDataFrame <- data.frame(dat,cusum)       #combine dat and cusum to dataframe
#myDataFrame                                #display dataframe
#write.csv(myDataFrame, "CusumNegBin.csv")   # write dataframe to .csv file
Step 4 is optional, and in fact commented out, and included as a template only. Each line can be activated by removing the #

The first line places the two vectors, dat and cusum together into a dataframe

The second line displays the data, along with row numbers, in the console, which can then be copied and pasted into other applications for further processing

The third line saves the dataframe as a comma delimited .csv file. This is needed if the data is too large to handle by copy and paste from the console.

Parameters are as calculated in R
Starting CUSUM value=0 for model Z
Starting CUSUM value=h/2 for model F

Number of Positives
Reference Value (k)
Decision Interval (h)
Starting CUSUM Value
MacroPlot Code