Related Links:
R Explained Page
CUSUM Charts Explained Page
Introduction
This page provides explanations and example R code for CUSUM quality control charts, for detecting changes in measurements that conform to the Inverse Gaussian Distribution.
CUSUM Generally
CUSUM is a set of statistical procedures used in quality control. CUSUM stands for Cumulative Sum of Deviations.
In any ongoing process, be it the manufacture or delivery of products and services, once the process is established and running, the outcome should be stable and within defined limits near a benchmark. The situation is then said to be In Control.
When things go wrong, the outcomes depart from the defined benchmark. The situation is then said to be Out of Control.
In some cases, things go catastrophically wrong, and the outcomes depart from the benchmark in a dramatic and obvious manner, so that investigation and remedy follow. For example, the gear in an engine may fracture, causing the machine to seize. An example in health care is the employment of an unqualified fraud as a surgeon, followed by a sudden and massive increase in mortality and morbidity.
The detection of catastrophic departures from the benchmark is usually done with the Shewhart Chart, not covered on this site. Usually, some statistically improbable outcome, such as two consecutive measurements outside 3 Standard Deviations, or 3 consecutive measurements outside 2 Standard Deviations, is used to trigger an alarm that all is not well.
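Alarm rules of this kind are easy to express in code. The sketch below is a minimal illustration of the two rules just mentioned, with a hypothetical function name and made-up data; it is not part of the CUSUM procedure described on this page.

```r
# Hypothetical helper: returns TRUE if a Shewhart-style alarm fires, i.e.
# 2 consecutive values beyond 3 SD, or 3 consecutive values beyond 2 SD
shewhartAlarm <- function(x, mu, sigma)
{
  z <- abs((x - mu) / sigma)            # absolute distance in SD units
  n <- length(z)
  rule1 <- any(z[-n] > 3 & z[-1] > 3)   # 2 consecutive beyond 3 SD
  rule2 <- n >= 3 && any(z[1:(n-2)] > 2 & z[2:(n-1)] > 2 & z[3:n] > 2)
  rule1 || rule2
}
shewhartAlarm(c(10, 10.5, 9.5, 10), mu = 10, sigma = 2)  # FALSE: stable process
shewhartAlarm(c(10, 17, 17.5, 10), mu = 10, sigma = 2)   # TRUE: 2 values beyond 3 SD
```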
In many instances, however, the departures from the benchmark are gradual and small in scale, and these are difficult to detect. Examples are changes in the size and shape of products caused by progressive wearing out of machinery parts, reduced success rates over time as experienced staff are gradually replaced by novices in a work team, or increases in client complaints to a service department following the loss of adequate supervision.
CUSUM is a statistical procedure of sampling the outcome and summing departures from the benchmark. When the situation is in control, departures caused by random variation cancel each other numerically. In the out of control situation, departures from the benchmark tend to be unidirectional, so the sum of departures accumulates until it becomes statistically identifiable.
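The accumulation idea can be sketched as a one-sided (upward) CUSUM on a toy data set. The function name and numbers below are made up for illustration; the Inverse Gaussian procedure described later uses the same recursion, with k and h computed from the distribution parameters.

```r
# One-sided upward CUSUM: accumulate departures from the reference value k,
# resetting to 0 whenever the sum goes negative (random variation cancelling)
cusumUp <- function(x, k, start = 0)
{
  out <- numeric(length(x))
  s <- start
  for(i in seq_along(x))
  {
    s <- max(0, s + x[i] - k)
    out[i] <- s
  }
  out
}
cusumUp(c(9, 11, 12, 8, 14), k = 10)  # 0 1 3 1 5: the upward drift accumulates
```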
The mathematical process for CUSUM is in 2 parts. The common part is the summation of departures from the benchmark (CUSUM), and graphically demonstrating it. The unique part is the calculation of the decision interval, abbreviated as DI or h, and the reference value, abbreviated as k, which continuously adjusts the CUSUM and its variance. The two values h and k depend on the following parameters:
 The in control values
 The out of control values
 The Type I Error or false positive rate, expressed as the Average Run Length, abbreviated as ARL: the number of samples expected before a false positive decision when the situation is in control. ARL is the inverse of the false positive rate; a false positive rate of 1% corresponds to ARL = 100.
Data with skewed distribution
All unidirectional measurements are ratios against a known standard, and have positive values. In most cases, the range of measurement is sufficiently far from 0, and the variability sufficiently narrow, that the errors incurred by assuming the normal distribution are trivial, and statistical analysis can be carried out under that assumption. When the values of measurements are close to zero (0), and the variations wide, the assumption of normal distribution becomes invalid, as the probability curve becomes skewed, with the mode near the lower values and a long tail towards the higher values.
An example of this is in time measurements. When we consider the age of pregnant women, the range is between 20 and 40 years, with the mode near the mean. The skew is so minor that in most cases the age of the mother can be treated as normally distributed.
However, if we consider the duration of labour, the range is from a few minutes to over 24 hours, with most women delivering between 4 and 8 hours. The mode is therefore near the lower values, and there is a long tail of higher values. Statistical analysis based on the assumption of normality will likely lead to misinterpretations.
Dealing with skewed data
A set of skewed data can be transformed so that its distribution becomes closer to the normal distribution. The common transformations are the log and square root, as these stretch the intervals amongst the lower values and shrink the intervals amongst the higher values. For more complex skews, the Box Cox transformation can be used. The data are transformed, the analysis assumes normal distribution, and the results are reverse transformed to the original units. Data transformations are discussed in the Numerical Transformation Explained Page,
and programs for them in the Numerical Transformation Program Page.
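As a minimal sketch with made-up numbers, the transform, analyse, reverse-transform cycle looks like this (the Box Cox case needs additional tools, e.g. MASS::boxcox, and is not shown):

```r
skewed <- c(0.5, 1, 1.5, 2, 3, 5, 8, 13, 24)  # made-up right-skewed values

# log transformation stretches low values and shrinks high values
logged <- log(skewed)
meanLog <- mean(logged)  # analysis done on the transformed scale
exp(meanLog)             # reverse transform: the geometric mean, in original units

# square root transformation, a milder correction
rooted <- sqrt(skewed)
mean(rooted)^2           # reverse transform of the mean
```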
CUSUM for the Inverse Gaussian Distribution
When the data are continuous measurements of all positive values, and have a skew, the Inverse Gaussian distribution is the most appropriate model to use for CUSUM, as the model allows a precise prediction of what the data should be.
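To get a feel for the shape of this distribution, one can simulate Inverse Gaussian data. Packages such as statmod provide rinvgauss; the sketch below instead implements the standard Michael, Schucany and Haas sampling method in base R so it is self-contained. The function name is our own.

```r
# Simulate Inverse Gaussian values (Michael, Schucany & Haas method)
rinvgaussBasic <- function(n, mu, lambda)
{
  y <- rnorm(n)^2                                 # chi-squared(1) draws
  x <- mu + mu^2 * y / (2 * lambda) -
       mu / (2 * lambda) * sqrt(4 * mu * lambda * y + mu^2 * y^2)
  ifelse(runif(n) <= mu / (mu + x), x, mu^2 / x)  # choose between the two roots
}
set.seed(1)
sim <- rinvgaussBasic(100000, mu = 10, lambda = 29)
mean(sim)    # close to mu = 10
median(sim)  # below the mean: mode near the lower values, long right tail
```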
Details of how the analysis is done and the results are described in the panel R Code Explained. Conceptually, the algorithm is as follows:
 Before monitoring begins, a set of reference data must be obtained so that the central location (mu μ) and the skew (lambda λ) can be estimated. These become the in control mu and lambda.
 The out of control location (mu) is then nominated.
 With the in and out of control parameters, and the average run length, the reference value (k) and decision interval h can be computed.
 CUSUM can then be calculated and plotted using monitoring data.
References
Hawkins DM, Olwell DH (1997) Cumulative Sum Charts and Charting for Quality Improvement. Springer-Verlag, New York. ISBN 0-387-98365-1. p 47-74, 147-148
Hawkins DM (1992) Evaluation of average run lengths of cumulative sum charts for an arbitrary data distribution. Communications in Statistics - Simulation and Computation, Volume 21, Issue 4, Pages 1001-1020
https://cran.r-project.org/web/packages/CUSUMdesign/index.html
https://cran.r-project.org/web/packages/CUSUMdesign/CUSUMdesign.pdf
This section presents the R code in total, so it can be copied and pasted into RStudio as a template. Detailed explanations of each step are in the next panel.
For those unfamiliar with R, basic information on setting up R and common R procedures can be found in the R Explained Page
# Step 0: before monitoring, establish incontrol mu and lambda
refData <- c(7.5, 4, 5.5, 9, 7, 16, 3.5, 17.5, 11.5, 14.5, 11, 7.5, 12.5, 11, 6.5, 8, 9, 6.5,
20, 12, 13, 10, 11.5, 13, 7.5, 11, 2, 14.5, 17, 10, 20.5, 9, 2.5, 7, 18.5, 6, 3.5,
3, 13, 17, 7, 10.5, 10.5, 6.5, 24.5, 6, 10.5, 5.5, 8.5, 17)
#install.packages("goft") # if not already installed
library(goft)
ig_fit(refData) # fitting an inverse Gaussian distribution to refData
# Step 1 Input parameters and data
inControlMu = 10
inControlLambda = 29
outOfControlMu = 20
arl = 100
theModel = "F" #F for FIR, Z for zero, S for steady state
dat=c(15, 17.5, 8, 16, 7.5, 13, 14.5, 11, 27, 9, 12, 12, 12, 14, 10.5, 16, 17, 18.5, 14, 15.5)
# Step 2: Calculate k and h
#install.packages("CUSUMdesign") # if not already installed
library(CUSUMdesign)
result <- getH(distr=6, ICmean=inControlMu, IClambda=inControlLambda, OOCmean=outOfControlMu,
ARL=arl, type=theModel)
k = result$ref
h = result$DI
if(outOfControlMu < inControlMu)
{
h = -h
}
cat("Reference Value k=",k,"\tDecision Interval h=", h, "\n")
# Step 3: Calculate and plot CUSUM
#Step 3a: Calculate CUSUM
cusum <- vector()
cusumValue = 0
if(theModel=="F")
{
cusumValue = h / 2
}
for(i in 1 : length(dat))
{
cusumValue = cusumValue + dat[i] - k
#print(cusumValue)
if(outOfControlMu>inControlMu) # Up
{
if(cusumValue<0)
{
cusumValue = 0
}
}
else # down
{
if(cusumValue>0)
{
cusumValue = 0
}
}
cusum[i] = cusumValue
}
# Step 3b: Plot CUSUM
plot(cusum,type="l")
abline(h=h)
# Step 4: Optional export of results
#myDataFrame <- data.frame(dat,cusum) #combine dat and cusum to dataframe
#myDataFrame #display dataframe
#write.csv(myDataFrame, "CusumInvGauss.csv") # write dataframe to .csv file
The example is a made-up one to demonstrate the numerical process, and the data were generated by computer. It purports to be from a quality control exercise in an outpatient clinic, using the patient waiting time between arrival and being attended to as the quality indicator.
 After much organization and training, the initial benchmark has been achieved: patients do not have to wait more than 20 minutes between arrival and being attended to. CUSUM is then established to monitor that this standard is maintained.
 With time, experienced and trained staff leave and are replaced by the less experienced and trained. The standard may then deteriorate, resulting in progressively longer waiting times for patients.
 We would like to trigger an alarm to carry out retraining and reorganization when the waiting time doubles that of the benchmark.
 As training and reorganization are disruptive to services, we set the average run length (ARL) to 100. If we sample one waiting time per clinic session, a false alarm should occur no more frequently than once every 100 clinic sessions.
# Step 0: before monitoring, establish incontrol mu and lambda
refData <- c(7.5, 4, 5.5, 9, 7, 16, 3.5, 17.5, 11.5, 14.5, 11, 7.5, 12.5, 11, 6.5, 8, 9, 6.5,
20, 12, 13, 10, 11.5, 13, 7.5, 11, 2, 14.5, 17, 10, 20.5, 9, 2.5, 7, 18.5, 6, 3.5,
3, 13, 17, 7, 10.5, 10.5, 6.5, 24.5, 6, 10.5, 5.5, 8.5, 17)
#install.packages("goft") # if not already installed
library(goft)
ig_fit(refData) # fitting an inverse Gaussian distribution to refData

Step 0 is undertaken during the planning stage. We collect waiting times (in minutes, to the nearest 0.5 minute) from 50 patients. It can be seen from the plot to the left that, when compared with the theoretical normal distribution curve, the data is skewed, with the mode near the lower values and a long right tail, and the 95% confidence interval on the left would include impossible negative values. This is typical of the Inverse Gaussian Distribution.
We calculate the central location (mu) and lambda, which describe the in control characteristics based on the Inverse Gaussian Distribution. The results are:
Inverse Gaussian MLE
mu 10.32000
lambda 28.95172
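The maximum likelihood estimates produced by ig_fit have closed forms, so they can be verified by hand: mu is simply the sample mean, and lambda is n divided by the sum of (1/x - 1/mu). A check using the same reference data:

```r
refData <- c(7.5, 4, 5.5, 9, 7, 16, 3.5, 17.5, 11.5, 14.5, 11, 7.5, 12.5, 11, 6.5, 8, 9, 6.5,
             20, 12, 13, 10, 11.5, 13, 7.5, 11, 2, 14.5, 17, 10, 20.5, 9, 2.5, 7, 18.5, 6, 3.5,
             3, 13, 17, 7, 10.5, 10.5, 6.5, 24.5, 6, 10.5, 5.5, 8.5, 17)
muHat <- mean(refData)                                   # 10.32, as reported by ig_fit
lambdaHat <- length(refData) / sum(1/refData - 1/muHat)  # 28.95172, as reported by ig_fit
```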

# Step 1 Input parameters and data
inControlMu = 10
inControlLambda = 29
outOfControlMu = 20
arl = 100
theModel = "F" #F for FIR, Z for zero, S for steady state
dat=c(15, 17.5, 8, 16, 7.5, 13, 14.5, 11, 27, 9, 12, 12, 12, 14, 10.5, 16, 17, 18.5, 14, 15.5)
Step 1 contains the parameters and the data. This is the part that the user can edit, changing the values to those required for his/her own analysis.
The first 4 lines set the parameters required for the analysis. The logic is as follows:
 The in control location (inControlMu) and lambda (inControlLambda) are the rounded values from step 0. The out of control location (outOfControlMu) is set to be twice that of in control.
The 5th line sets the model, which has 3 options and determines the first value of the CUSUM.
 F means Fast Initial Response (FIR), where the initial CUSUM value is set at half of the Decision Interval h. The rationale is that, if the situation is in control, the CUSUM will gradually drift towards zero, but if the situation is already out of control, an alarm will be triggered early. The downside is that a false alarm is slightly more likely early on in the monitoring. As FIR is recommended by Hawkins, it is set as the default option.
 Z is for zero, and the CUSUM starts at the baseline value of 0. This lowers the risk of a false alarm in the early stages of monitoring, but detects an out of control situation more slowly if it already exists at the beginning.
 S is for steady state, intended for when monitoring is already ongoing, and a new plot is being constructed. The CUSUM starts at the value when the previous chart ends.
 Each model makes minor changes to the value of the decision interval h. The setting of the initial value is mostly intended to determine how quickly an alarm can be triggered if the out of control situation exists from the beginning.
In the last line, c() is a function that creates a vector (array) containing the comma separated values within the brackets. In this case, dat is the name of the vector, which contains the waiting times (in minutes to the nearest 0.5) of 20 patients.
The remainder of the program does not require any editing or change by the user, unless he/she wishes to alter the program for specific purposes.
Step 2 calculates k and h from the input parameters, and this is divided into 2 parts.
# Step 2: Calculate k and h
#install.packages("CUSUMdesign") # if not already installed
library(CUSUMdesign)
result <- getH(distr=6, ICmean=inControlMu, IClambda=inControlLambda, OOCmean=outOfControlMu,
ARL=arl, type=theModel)
k = result$ref
h = result$DI
if(outOfControlMu < inControlMu)
{
h = -h
}
cat("Reference Value k=",k,"\tDecision Interval h=", h, "\n")
result is the object that contains the results of the analysis. The results required by this program are the reference value k and the decision interval h. Please note that h is calculated as a positive value. If the CUSUM is designed to detect a decrease from the in control value, then h needs to be changed to a negative value.
The last line displays the results we need
Reference Value k= 13.33333 Decision Interval h= 22.07419
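As a sanity check (our own derivation, not taken from the package documentation): with lambda held fixed, the Inverse Gaussian log likelihood ratio between the two means is linear in the measurement, and its break-even point works out to 2·mu0·mu1/(mu0 + mu1), which matches the reference value reported by getH.

```r
inControlMu <- 10
outOfControlMu <- 20
# break-even point of the likelihood ratio between the two means
kCheck <- 2 * inControlMu * outOfControlMu / (inControlMu + outOfControlMu)
kCheck  # 13.33333, agreeing with the k reported above
```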
Step 3 is divided into 2 parts. Step 3a calculates the cusum vector, and 3b plots the vector and h in a graph.
#Step 3a: Calculate CUSUM
cusum <- vector()
cusumValue = 0
if(theModel=="F")
{
cusumValue = h / 2
}
for(i in 1 : length(dat))
{
cusumValue = cusumValue + dat[i] - k
#print(cusumValue)
if(outOfControlMu>inControlMu) # Up
{
if(cusumValue<0)
{
cusumValue = 0
}
}
else # down
{
if(cusumValue>0)
{
cusumValue = 0
}
}
cusum[i] = cusumValue
}
The first 6 lines of code in Step 3a create the empty cusum vector and set the initial CUSUM value (h/2 for the FIR model, 0 otherwise). The loop then adds the departure of each measurement from the reference value k to the running CUSUM, resets it to zero whenever it crosses the baseline in the direction opposite to that being monitored, and places each value in the cusum vector.
# Step 3b: Plot the cusum vector and h
plot(cusum,type="l")
abline(h=h)

In step 3b, the first line plots the cusum vector, and the second line draws the decision interval h. The resulting plot is shown to the left.

# Step 4: Optional export of results
#myDataFrame <- data.frame(dat,cusum) #combine dat and cusum to dataframe
#myDataFrame #display dataframe
#write.csv(myDataFrame, "CusumInvGauss.csv") # write dataframe to .csv file
Step 4 is optional, and is in fact commented out and included as a template only. Each line can be activated by removing the leading #.
The first line places the two vectors, dat and cusum together into a dataframe
The second line displays the data, along with row numbers, in the console, which can then be copied and pasted into other applications for further processing
The third line saves the dataframe as a comma delimited .csv file. This is needed if the data is too large to handle by copy and paste from the console.
Please note: RStudio writes files to the User/Documents/ folder by default. The path needs to be reset if the user wishes to save files in a specific folder. This is discussed in the file I/O panel of the R Explained Page.