CUSUM Generally
CUSUM is a set of statistical procedures used in quality control. CUSUM stands for Cumulative Sum of Deviations.
In any ongoing process, be it manufacturing or the delivery of services and products, once the process is established and running, the outcome should be stable and within defined limits near a benchmark. The situation is then said to be In Control.
When things go wrong, the outcomes depart from the defined benchmark. The situation is then said to be Out of Control.
In some cases, things go catastrophically wrong, and the outcomes depart from the benchmark in a dramatic and obvious manner, so that investigation and remedy follow. For example, the gear in an engine may fracture, causing the machine to seize. An example in health care is the employment of an unqualified fraud as a surgeon, followed by a sudden and massive increase in mortality and morbidity.
The detection of catastrophic departure from the benchmark is usually handled by the Shewhart Chart, not covered on this site. Usually, some statistically improbable outcome, such as two consecutive measurements outside 3 Standard Deviations, or 3 consecutive measurements outside 2 Standard Deviations, is used to trigger an alarm that all is not well.
In many instances however, departures from the benchmark are gradual and small in scale, and these are difficult to detect. Examples include changes in the size and shape of products caused by progressive wear of machinery parts, reduced success rates over time as experienced staff are gradually replaced by novices in a work team, and increases in client complaints to a service department following a loss of adequate supervision.
CUSUM is a statistical process of sampling the outcome and summing departures from the benchmark. When the situation is in control, the departures caused by random variations cancel each other numerically. In the out of control situation, departures from the benchmark tend to be unidirectional, so that the sum of departures accumulates until it becomes statistically identifiable.
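As a minimal sketch of this idea in R (the benchmark, allowance, and data are all made up for illustration; the actual programs on this site are described below):

# Minimal sketch of the CUSUM idea (all values made up for illustration)
benchmark <- 10                      # in control benchmark mean
k <- 0.5                             # allowance that absorbs random variation
h <- 4                               # decision interval that triggers the alarm
x <- c(10.1, 9.8, 10.3, 11.9, 11.2, 12.0, 11.4) # monitored measurements, drifting upwards
cusum <- numeric(length(x))
s <- 0
for(i in seq_along(x))
{
  s <- max(0, s + (x[i] - benchmark) - k) # departures accumulate, truncated at 0
  cusum[i] <- s
}
cusum > h                            # TRUE once the accumulated departure crosses h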
Terminology
In control describes the situation when everything is going according to plan, and the measurements being monitored are within the benchmark.
Out of control is the situation CUSUM is designed to detect, when the measurements drift outside of the benchmark.
Average run length (ARL) is the expected number of consecutive observations before a false alarm is triggered. It is the reciprocal of the false positive rate (the Type I Error): a false positive rate of 1% (p=0.01) is the same as ARL=100.
The ARL is usually set as a balance between the need for investigation and intervention when things go wrong and the inconvenience and cost of a false alarm. For example, if the sampling rate is 5 per day, and the requirement is that a false alarm occurs no more frequently than once every 20 days, then the ARL = 5x20 = 100.
The CUSUM is designed as a one tail algorithm, to test for departure from the benchmark upwards or downwards, but not both. If the user wishes to have a two tail test for both at the same time, then he/she needs to use two CUSUMs, one for each tail, and each should be given double the ARL required for the one tail situation (i.e. half the false alarm rate), so that the combined false alarm frequency remains as required.
Data is a vector (array) of values, obtained during monitoring, that are used to calculate the CUSUM.
Terms the user can control, but which are usually left at their defaults
Model sets the initial value of the CUSUM in a run, which determines how rapidly the out of control situation can be detected if it already exists. A more rapid response of course carries a greater risk of a false alarm. The 3 options are
- F for Fast Initial Response (FIR), where the initial CUSUM value is set at half of the Decision Interval (h). This is the default option, as recommended by the textbook of Hawkins and Olwell
- Z for zero (0), where the initial CUSUM value is set to 0. This can be used if the user is certain that the situation is in control initially, and wishes to avoid an early false alarm
- S for steady state, used when the CUSUM is supposed to continue from the end of a previous CUSUM that has just ended, and the initial value can be set by the user. S is usually not offered in StatsToDo, as this requires the user to alter the algorithm to set an initial value
Winsorization is a statistical process whereby unexpected outliers with extreme values are modified before they are used for calculating CUSUM. Winsorization is not provided by StatsToDo, and users will need to manually modify extreme outlier values before analysis if they should choose to do so.
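For users who choose to winsorize manually, a minimal sketch in R (data and limits made up, purely illustrative):

# Manual winsorization sketch: cap extreme values at user-chosen limits
x <- c(12, 14, 13, 45, 11) # made-up counts; 45 is an extreme outlier
lo <- 5                    # lower limit chosen by the user
hi <- 20                   # upper limit chosen by the user
pmin(pmax(x, lo), hi)      # 45 is capped to 20; values below 5 would be raised to 5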
Terms for results produced by the algorithm
Reference Value (k) is used to adjust each increment of the CUSUM, so that random variation does not accumulate. It is used in all subsequent calculations, but need not be attended to by the user.
Decision Interval (h) is the value of the CUSUM which triggers an alarm that the out of control situation has been detected.
CUSUM values are calculated from the data using the reference value (k). They are usually stored in a vector and used for plotting.
CUSUM programs available on StatsToDo
The following programs are available as individual pages on this site
Measurements
- CUSUM for means with Normal distribution
- CUSUM for variances with Normal distribution
- CUSUM for measurements with Inverse Gaussian distribution
Proportions
- CUSUM for proportions with Binomial distribution
- CUSUM for proportions with Negative Binomial distribution
- CUSUM for proportions with Bernoulli distribution
Counts
- CUSUM for counts with Poisson distribution
- CUSUM for counts with Negative Binomial distribution
This page is for CUSUM for proportions or counts with the Negative Binomial distribution.
CUSUM for Negative Binomial Distribution
Proportions
Proportions can be handled under 3 common types of distribution
- The Binomial Distribution, where the measurement is the number of positive cases in a group of set sample size. The advantage of such an approach is that the results tend to be stable, as short term variations are evened out over many cases. The disadvantage is that evaluation can only take place when the planned sample size per group has been reached, so conclusions tend to take a long time.
- The Negative Binomial Distribution, where the measurement is the number of negative cases between a set number of positive cases. Evaluation can take place each time the set number of positive cases is reached, so conclusions can be reached sooner. However, the results tend to be more variable, as they are influenced by short term variations.
- The Bernoulli Distribution, where the measurement is either positive or negative for each case. Evaluation therefore takes place after each observation, so conclusions can be reached very quickly, but the results tend to be more chaotic, as they vary with each observation.
This page describes the Negative Binomial Distribution.
CUSUM for Proportions based on the Negative Binomial Distribution
The Negative Binomial Distribution is based on the number of outcome negative cases (nNeg) between a pre-determined number of positive cases (nPos), and each sample is examined when the defined number of positive cases has been reached. An example is the Caesarean Section rate in many obstetric units, say 20%, which is 4 normal deliveries to each Caesarean Section, 8 to 2 Caesarean Sections, 12 to 3 Caesarean Sections. The number of positive cases (e.g. Caesarean Sections) is nominated and held constant as a parameter, and the number of negative cases (e.g. normal deliveries) is the measurement.
Negative Binomial Distribution is an alternative to the Binomial Distribution for CUSUM of proportions. It is sometimes preferred because each sample is quicker: the data can be obtained as soon as the defined number of positive cases is reached, rather than waiting for results from all the cases in a defined sample size to be completed.
Negative Binomial Distribution can also be an alternative to Poisson distribution for CUSUM on counts, particularly if the assumptions of Poisson (variance=mean) cannot be met.
The parameters required are
- The number of positive cases in each sample. This remains constant throughout a CUSUM project.
- The expected number of negative cases in each sample to match the defined number of positive cases.
- The Average Run Length (ARL). This depends on a balance between the importance of detecting deviation and the cost of disruption in case of a false positive. The ARL here is based on the number of sets (groups) and not on the number of cases. Please note that the algorithm on this page is intended for one tail monitoring, either an increase or a decrease in the value. If the user intends two tail monitoring, to detect either an increase or a decrease, then two CUSUM charts should be created, each with double the ARL of a one tail CUSUM (a sketch follows this list).
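As a sketch of the two tail arrangement, reusing the getH call from the CUSUMdesign package demonstrated later on this page (the in control figures are those of the worked example below; the out of control mean for the downward chart is hypothetical):

# Two one tail charts for two tail monitoring (sketch only)
library(CUSUMdesign)
targetARL <- 100          # required combined (two tail) average run length
arlEach <- 2 * targetARL  # each chart: double the ARL, i.e. half the false alarm rate
up <- getH(distr=5, ICmean=0.25, ICvar=0.2708333, OOCmean=3/7, ARL=arlEach, type="F")  # detect an increase
dn <- getH(distr=5, ICmean=0.25, ICvar=0.2708333, OOCmean=3/20, ARL=arlEach, type="F") # detect a decrease (hypothetical mean)
# for the downward chart, the decision interval would then be negated (see Step 2 below)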
Details of how the analysis is done and the results are described in the R code panel. Conceptually, the algorithm is as follows
- The statistic is based on the odds. If r = number of positive cases, and c = number of negative cases:
mean (mu, μ) = r / c
variance (v) = μ(1 + 1/c)
- The in control mean (μ), the in control variance (v), the out of control mean (μ), and the ARL are used to obtain the reference value (k) and the decision interval (h), both expressed as odds
- The negative outcome count (n) obtained during monitoring is converted into odds, odds = r / n, which is then used to calculate the CUSUM
- The CUSUM chart is therefore one of cumulative changes in the odds of outcome positive. If the negative count increases, the odds decrease; if the count decreases, the odds increase.
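A minimal sketch of this conversion, using the counts from the worked example further down this page:

# Convert counts to odds-based mean and variance (ref: Hawkins, p.147)
r <- 3                   # positive cases (CS) per set
c0 <- 12                 # in control negatives (ND) per set
c1 <- 7                  # out of control negatives (ND) per set
mu0 <- r / c0            # in control mean odds     = 0.25
v0 <- mu0 * (1 + 1 / c0) # in control variance      = 0.2708333
mu1 <- r / c1            # out of control mean odds = 0.4285714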
Plotting CUSUM
Each CUSUM value (CUSUM_n) is the previous CUSUM value (CUSUM_n-1) plus the odds calculated from the current number of negatives in the set, corrected by the Reference Value (k):
CUSUM_n = CUSUM_n-1 + odds - k, where odds = nPos / nNeg
If CUSUM crosses the zero value (0) it is truncated to 0
The CUSUM values are plotted sequentially. An alarm is triggered when CUSUM crosses the Decision Interval (h)
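As a check, the first CUSUM value of the worked example below can be reproduced by hand, taking k = 0.3305118 and h = 3.666667 from Step 2:

# One update step, reproducing the first value of the example CUSUM vector
k <- 0.3305118     # reference value from Step 2
h <- 3.666667      # decision interval from Step 2
cusum0 <- h / 2    # FIR model: start at half the decision interval
odds1 <- 3 / 12    # first observation: 12 negatives to 3 positives
cusum0 + odds1 - k # 1.752822, the first value of the CUSUM vector in Step 3a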
Please note: plotting for CUSUM on this page is provided both using R code and Javascript plotting.
References
Hawkins DM, Olwell DH (1997). Cumulative Sum Charts and Charting for Quality Improvement. Springer-Verlag, New York. ISBN 0-387-98365-1. pp 47-74, 147-148.
Hawkins DM (1992). Evaluation of average run lengths of cumulative sum charts for an arbitrary data distribution. Communications in Statistics - Simulation and Computation 21(4):1001-1020.
https://cran.r-project.org/web/packages/CUSUMdesign/index.html
https://cran.r-project.org/web/packages/CUSUMdesign/CUSUMdesign.pdf
The example is a made-up one to demonstrate the numerical process, and the data are computer generated. It purports to be from a quality control exercise in an obstetric unit, using the Caesarean Section rate as the quality indicator.
- From records in the past, we established the benchmark Caesarean Section Rate to be 20% (0.2), and this can be maintained if the junior staff and midwives are well trained and closely supervised.
- With time however, experienced staff leave and are replaced by those less experienced and trained. The standard of supervision gradually deteriorates, resulting in an increase in the Caesarean Section rate.
- We would like to trigger an alarm and reorganize the working and supervision framework when the Caesarean Section Rate increases to 30% (0.3) or more.
- In Negative Binomial terms this means 1 Caesarean Section (CS) to 4 normal deliveries (ND), 2 to 8, 3 to 12, and so on. The numbers to use depend on a balance between the speed of data acquisition and the stability of the data. In this exercise, we have chosen 3 CS matching 12 ND.
- As re-organizing the working framework is time consuming and disruptive, we would like any false alarm to occur no more frequently than once every 100 sets of samples, so the average run length ARL = 100.
Step 1: Define Parameters
The parameters are entered in step 1, and the monitoring data in step 3a. These are the only parts of the program that need any editing.
# Step 1: parameters and data
nPos = 3       # number of positives (CS) per set, the decider (r)
icNeg = 12     # in control number of negatives (ND) between nPos positives (c0)
oocNeg = 7     # out of control number of negatives (ND) between nPos positives (c1)
arl = 100      # average run length
theModel = "F" # F for FIR, Z for zero, S for steady state
Step 1 contains the parameters. This is the part the user can edit, changing the values to those required for his/her own analysis.
The first 4 lines set the parameters required for the analysis. The logic is as follows
- The number of positive cases (Caesarean Section) remains the same for in and out of control.
- When in control, the CS rate is 20%, the ND rate is 80%, so there are 12 negatives to the 3 positives
- When out of control, the CS rate increases to 30% and the ND rate decreases to 70%, so there are 7 NDs to every 3 CSs (7 negatives to 3 positives); a quick arithmetic check follows below
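A quick arithmetic check of these counts, using the rates of this example:

# Converting benchmark rates into counts per set of 3 positives
nPos <- 3
icRate <- 0.2                  # in control CS rate
oocRate <- 0.3                 # out of control CS rate
nPos * (1 - icRate) / icRate   # 12 negatives per set when in control
nPos * (1 - oocRate) / oocRate # 7 negatives per set when out of control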
The last line, the model, has 3 options, which set the first value of the CUSUM
- F means Fast Initial Response, where the initial CUSUM value is set at half of the Decision Interval h. The rationale is that, if the situation is in control, the CUSUM will gradually drift towards zero, but if the situation is already out of control, an alarm will be triggered early. The downside is that a false alarm is slightly more likely early in the monitoring. As FIR is recommended by Hawkins, it is set as the default option
- Z is for zero, where the CUSUM starts at the baseline value of 0. This lowers the risk of a false alarm in the early stages of monitoring, but detects the out of control situation more slowly if it already exists at the beginning.
- S is for steady state, intended for when monitoring is already ongoing, and a new plot is being constructed. The CUSUM starts at the value when the previous chart ends.
- Each model will make minor changes to the value of the decision interval h. The setting of the initial values is mostly intended to determine how quickly an alarm can be triggered if the out of control situation exists from the beginning.
Step 2: Calculate Reference Value and Decision Interval
# Step 2a: convert counts to odds and variance (ref: Hawkins, p.147)
icMu = nPos / icNeg # in control mean = r/c0
icVar = icMu * (1 + 1 / icNeg) # in control variance = icMu * (1+1/c0)
oocMu = nPos / oocNeg # out of control mean = r/c1
# Step 2b: Calculate k and h
#install.packages("CUSUMdesign") # if not already installed
library(CUSUMdesign)
result <- getH(distr=5, ICmean=icMu, ICvar=icVar, OOCmean=oocMu, ARL=arl, type=theModel)
k <- result$ref
h <- result$DI
if(oocMu<icMu)
{
  h = -h # when detecting a decrease, the decision interval is negative
}
cat("Reference Value k=",k,"\tDecision Interval h=", h, "\n")
Step 2a converts the counts into mean and variance in terms of odds.
Step 2b performs the statistical calculations using the odds parameters. The package CUSUMdesign needs to be already installed, and the library activated each time the program is used.
result is the object that contains the results of the analysis. The results required for this program are the reference value (k) and the decision interval (h). Please note that h is calculated as a positive value. If the CUSUM is designed to detect a decrease from the in control value, then h needs to be changed to a negative value.
The last line displays the results we need
Reference Value k= 0.3305118 Decision Interval h= 3.666667
Please note that, although the parameters are entered as counts, k and h are in terms of the odds of outcome positive (Caesarean Section). This means that, as the count of negatives increases, the odds decrease, and as the count of negatives decreases, the odds increase. In other words, the CUSUM chart is the inverse of the negative counts, and reflects the actual change in the odds of outcome positive.
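The inversion can be seen directly with the example counts:

# Odds of outcome positive rise as the negative count falls
3 / 12 # 12 NDs per 3 CSs (CS rate 20%): odds = 0.25
3 / 7  # 7 NDs per 3 CSs (CS rate 30%):  odds = 0.4285714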
Step 3: CUSUM Plot
Step 3 is divided into 2 parts. Step 3a calculates the cusum vector, and 3b plots the vector and h in a graph.
# Step 3a: Create vector of cusum values
dat = c(12,14,13,14,11,14,12,9,12,9,9,10,7,10,5,7,9,10,8,8,7,5,6,7,6,
        10,9,6,10,6,9,9,9,9,6,7,6,7,8,7,8,7,9,8,7,5,8,7,10,10) # number of negative cases between each set of 3 positive cases
cusum <- vector()
cusumValue = 0
if(theModel=="F")
{
  cusumValue = h / 2        # Fast Initial Response: start at half the decision interval
}
for(i in 1 : length(dat))
{
  mu = nPos / dat[i]        # convert the negative count into odds of positive
  cusumValue = cusumValue + mu - k
  if(oocMu>icMu)            # detecting a rise in odds (fall in negative counts)
  {
    if(cusumValue<0)
    {
      cusumValue = 0        # truncate at zero
    }
  }
  else                      # detecting a fall in odds (rise in negative counts)
  {
    if(cusumValue>0)
    {
      cusumValue = 0        # truncate at zero
    }
  }
  cusum[i] = cusumValue
}
cusum
dat is a vector of negative counts; each count matches the defined number of positives. In this example, it is the number of normal deliveries for each set of 3 Caesarean Sections.
The next few lines of code in step 3a create the empty cusum vector and set the initial cusum value. Inside the loop, each negative count is first converted into the odds of positive to negative (nPos / nNeg), before it is used to calculate the CUSUM. The remaining code calculates the cusum value for each measurement, truncates it at zero, and places it in the cusum vector.
The resulting CUSUM vector is as follows
> cusum
[1] 1.752822 1.636595 1.536853 1.420627 1.362842 1.246616 1.166104 1.168926 1.088414
[10] 1.091236 1.094057 1.063546 1.161605 1.131093 1.400582 1.498641 1.501463 1.470951
[19] 1.515439 1.559927 1.657987 1.927475 2.096964 2.195023 2.364511 2.334000 2.336821
[28] 2.506309 2.475798 2.645286 2.648107 2.650929 2.653750 2.656572 2.826060 2.924120
[37] 3.093608 3.191668 3.236156 3.334216 3.378704 3.476763 3.479585 3.524073 3.622133
[46] 3.891621 3.936109 4.034169 4.003657 3.973145
Step 3b: Plotting CUSUM vector
# Step 3b: Plot the cusum vector and h
plot(cusum,type="l")
abline(h=h)
In step 3b, the first line plots the cusum vector, and the second line adds the decision interval h. In this example, the CUSUM line crosses h (3.67) near the end of the run, triggering the alarm.
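For a more polished chart, labels can be added with standard base R graphics arguments (a cosmetic sketch only; the analysis is unchanged):

# Optional: the same plot with labels and a dashed decision interval
plot(cusum, type="l", xlab="Set number", ylab="CUSUM (odds)", main="CUSUM, Negative Binomial")
abline(h=h, col="red", lty=2) # decision interval h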
Step 4: Optional Export of Results
# Step 4: Optional export of results
#myDataFrame <- data.frame(dat,cusum) #combine dat and cusum to dataframe
#myDataFrame #display dataframe
#write.csv(myDataFrame, "CusumNegBin.csv") # write dataframe to .csv file
Step 4 is optional, and is in fact commented out and included as a template only. Each line can be activated by removing the leading #.
The first line places the two vectors, dat and cusum together into a dataframe
The second line displays the data, along with row numbers, in the console, which can then be copied and pasted into other applications for further processing
The third line saves the dataframe as a comma delimited .csv file. This is needed if the data is too large to handle by copy and paste from the console.
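Should the exported file need to be brought back into R later, standard base R can read it, assuming the file name used above:

# Re-import the exported file (row names were written as the first column)
#myDataFrame <- read.csv("CusumNegBin.csv", row.names=1)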