Content Disclaimer
Copyright @2020.
All Rights Reserved.

StatsToDo: CUSUM for Normally Distributed Mean

Links : Home Index (Subjects) Contact StatsToDo


Explanations R Code
We will use an example to demonstrate how to use R codes for CUSUM

The example is a made up one to demonstrate the numerical process, and the data is generated by the computer. It purports to be from a quality control exercise in the manufacturing of ball bearings.

  • When everything is working normally, we expect our ball bearing to weigh 100g, with a Standard Deviation of 10g.
  • The machine making the ball bearing however wears out, and when this happens, the ball bearings progressively become bigger and hearvier. We would like to trigger an alarm and service the machinery when the ball bearing become 102g or more, an increase of 1/5 of Standard Deviation=10.
  • We sample a ball bearing 5 times a day, and we do not want a false alarm more frequent than every 20 days, so our average run length ARL = 5x20 = 100

Step 1: Defining CUSUM Parameters

The data is entered in step 1.
# Step 1: Define parameters and data
inControlMean = 100
inControlSD = 10
outOfControlMean = 102
arl = 100 
theModel = "F"   #F for FIR,   Z for zero, S for steady state 
The first 4 lines sets the parameters required for the analysis.

The 5th line, the model has 3 options, which sets the first value of the CUSUM

  • F means Fast Initial Response, where the initial CUSUM value is set at half of the Decision Interval h. The rationale is that, if the situation is in control then CUSUM will gradually drift towards zero, but if the situation is already out of control, an alarm would be triggered early. The down side is that a false alarm is slightly more likely early on in the monitoring. As FIR is recommended by Hawkins, it is set as the default option
  • Z is for zero, and CUSUM starts at the baseline value of 0. This will lower the risk of false alarm in the early stages of monitoring, but will detect the out of control situation slower if it already exists at the begining.
  • S is for steady state, intended for when monitoring is already ongoing, and a new plot is being constructed. The CUSUM starts at the value when the previous chart ends.
  • Each model will make minor changes to how quickly a departure can be detected

Step 2: Calculate Reference Value (k) and Decision Interval (h)

# Step 2: Estimating k and h
#install.packages("CUSUMdesign")   # if not already installed
library(CUSUMdesign)
result <- getH(distr=1, ICmean=inControlMean, ICsd=inControlSD, OOCmean=outOfControlMean, ARL=arl, type=theModel)
k = result$ref
h = result$DI
if(outOfControlMean<inControlMean)
{
  h = -h
}
cat("Reference Value k=",k,"\tDecision Interval h=", h, "\n")
Step 2 performs that statistical calculations using the parameters entered. The package CUSUMdesign needs to be alrady installed, and the library activated each time the program is used.

result is the object that contains the results of the analysis. The result required for this program are the Reference Value (k) and Decision Interval (h). Please note that h is calculated as a positive value. If the CUSUM is designed to detect a decrease from in control value, then h needs to be changed to a negative value.

The last line displays the results we need

Reference Value k= 101 	Decision Interval h= 69.34988 

Step 3: CUSUM

Step 3 is deivided into 2 parts. Step 3a calculates the cusum vector, and 3b plots the vector and h in a graph.

Step 3a. CUSUM Vector

# Step 3a: Create the cusum vector
# dat is the vector of measurements
dat=c(
  96.1,99.3,98.8,107.9,102.3,108.1,85.2,100.3,97.2,107.7,107.3,109.5,105.3,110.9,105.9,99.8,
  99.5,91.4,105.4,97.6,94.9,111.2,98.5,81.5,101.1,93.5,82.1,100.4,103.1,131.5,96.9,90.2,
  108.0,108.7,101.4,108.3,103.7,114.7,92.4,96.2,113.0,90.8,100.2,126.0,103.0,120.2,100.0,
  108.5,103.5,118.2,85.7,101.3,91.5,96.6,106.1,101.6,88.7,96.1,96.3,101.6,91.5,104.7,
  107.3,104.9,93.8,93.2,97.0,96.8,108.9,119.5,91.7,104.1,106.7,97.8,100.0,107.5,100.7,
  99.1,101.1,95.9)
cusum <- vector()
cusumValue = 0
if(theModel=="F")
{
  cusumValue = h / 2
}
for(i in 1 : length(dat))
{
  cusumValue = cusumValue + dat[i] - k
  #print(cusumValue)
  if(outOfControlMean>inControlMean) # Up
  {
    if(cusumValue<0)
    {
      cusumValue = 0
    }
  }
  else                               # down
  {
    if(cusumValue>0)
    {
      cusumValue = 0
    }
  }
  cusum[i] = cusumValue
}
The first 6 lines of code in step 3a creates the empty cusum vector and sets the initial cusum value. The remaining codes calculates the cusum value for each measurement, and places it in the cusum vector.

The result CUSUM vector is as follows

> cusum
 [1]  29.774941  28.074941  25.874941  32.774941  34.074941  41.174941  25.374941
 [8]  24.674941  20.874941  27.574941  33.874941  42.374941  46.674941  56.574941
[15]  61.474941  60.274941  58.774941  49.174941  53.574941  50.174941  44.074941
[22]  54.274941  51.774941  32.274941  32.374941  24.874941   5.974941   5.374941
[29]   7.474941  37.974941  33.874941  23.074941  30.074941  37.774941  38.174941
[36]  45.474941  48.174941  61.874941  53.274941  48.474941  60.474941  50.274941
[43]  49.474941  74.474941  76.474941  95.674941  94.674941 102.174941 104.674941
[50] 121.874941 106.574941 106.874941  97.374941  92.974941  98.074941  98.674941
[57]  86.374941  81.474941  76.774941  77.374941  67.874941  71.574941  77.874941
[64]  81.774941  74.574941  66.774941  62.774941  58.574941  66.474941  84.974941
[71]  75.674941  78.774941  84.474941  81.274941  80.274941  86.774941  86.474941
[78]  84.574941  84.674941  79.574941
Step 3b: Plot CUSUM
# Step 3b: Plot the cusum vector and h
plot(cusum,type="l")
abline(h=h)
In step 3b, the first line plots the cusum vector, and the second line the decision interval h. The result plot is shown to the right.

The data was a set of normally distributed random numbers created by the computer, with a Standard Deviation of 10. The first 40 values with a mean of 100, and the last 40 a mean of 102. The CUSUM plot reflects this change.

Step 4: archive data and results

# Step 4: Optional export of results
#myDataFrame <- data.frame(dat,cusum)       #combine dat and cusum to dataframe
#myDataFrame                                #display dataframe
#write.csv(myDataFrame, "CusumMean.csv")    # write dataframe to .csv file
Step 4 is optional, and in fact commented out, and included as a template only. Each line can be activated by removing the #

The first line places the two vectors, dat and cusum together into a dataframe

The second line displays the data, along with row numbers, in the console, which can then be copied and pasted into other applications for further processing

The third line saves the dataframe as a comma delimited .csv file. This is needed if the data is too large to handle by copy and paste from the console.

Please note: R Studio write files to User/Document/ folder by default. The path needs to be reset if the user wishes to save files using a specific folder.

Javascript Plot