Content Disclaimer
Copyright @2020.
All Rights Reserved.

StatsToDo: CUSUM for Normally Distributed Variance

Links : Home Index (Subjects) Contact StatsToDo


Explanations R Code
The example is a made up one to demonstrate the numerical process, and the data is generated by the computer. It purports to be from a quality control exercise in the manufacturing of ball bearings.
  • When everything is working normally, we expect our ball bearing to weigh 100g, with a Standard Deviation of 10g.
  • The machine making the ball bearing however wears out, and when this happens, the variations in the weight of ball bearings progressively widens. We would like to trigger an alarm and service the machinery when the Standard Deviation of the ball bearing become 10.5 or more, an increase of 0.5 from Standard Deviation=10.
  • We estimate the Standard Deviations 5 times a day, using 10 measurements each time to calculate the Standard Deviation. and we do not want a false alarm more frequent than every 20 days, so out average run length ARL = 5x20 = 100

Step 1. Define Planning parameters

# Step 1: parameters
inControlSD = 10
outOfControlSD = 10.5
ssiz = 10
arl = 100 
theModel = "F"   #F for FIR,   Z for zero, S for steady state 
Step 1 contains the parameters. The first 4 lines sets the parameters required for the analysis.

Please note: although the CUSUM is for variance, the parameters entered here are the in control and out of control Standard Deviations.

The 5th line, the model has 3 options, which sets the first value of the CUSUM

  • F means Fast Initial Response, where the initial CUSUM value is set at half of the Decision Interval h. The rationale is that, if the situation is in control then CUSUM will gradually drift towards zero, but if the situation is already out of control, an alarm would be triggered early. The down side is that a false alarm is slightly more likely early on in the monitoring. As FIR is recommended by Hawkins, it is set as the default option
  • Z is for zero, and CUSUM starts at the baseline value of 0. This will lower the risk of false alarm in the early stages of monitoring, but will detect the out of control situation slower if it already exists at the begining.
  • S is for steady state, intended for when monitoring is already ongoing, and a new plot is being constructed. The CUSUM starts at the value when the previous chart ends.
  • Each model will make minor changes to the value of the decision interval h. The setting of the initial values is mostly intended to determine how quickly an alarm can be triggered if the out of control situation exists from the beginning.

Step 2: Calculate Reference Value (k) and Decision Interval (h)

# Step 2: Calculate k and h
#install.packages("CUSUMdesign")   # if not already installed
library(CUSUMdesign)
result <- getH(distr=2, ICsd=inControlSD, OOCsd=outOfControlSD, samp.size=ssiz, ARL=ssiz, type=theModel)
k = result$ref
h = result$DI
if(outOfControlSD<inControlSD)
{
  h = -h
}
cat("Reference Value k=",k,"\tDecision Interval h=", h, "\n")
Step 2 performs that statistical calculations using the parameters entered. The package CUSUMdesign needs to be alrady installed, and the library activated each time the program is used.

result is the object that contains the results of the analysis. The result required for this program are the reference value (k) and decision interval h. Please note that h is calculated as a positive value. If the CUSUM is designed to detect a decrease from in control value, then h needs to be changed to a negative value.

The last line displays the results we need

Reference Value k= 104.9584 	Decision Interval h= 87.97868 
Please Note: that, although the input parameters were in Standard Deviations, k and h are now in variance units, and variance is the square of Standard Deviation. All subsequent output are in terms of variance, and not Standard Deviation.

Step 3: CUSUM Plot

Step 3 is divided into 2 parts. Step 3a calculates the cusum vector, and 3b plots the vector and h in a graph.

Step 3a. CUSUM Vector

# Step 3a: Calculate CUSUM values
dat=c(
  10.3,9.6,8.7,12.4,10.5,11.5,7.2,10.2,7.9,10.3,11.4,10.7,11.1,10.5,8,10.4,9.6,
  9.5,12.8,12,7.2,10.9,9,8.8,9.4,10.5,8,9.7,12.4,9.3,7.2,8.2,8,6.6,13.5,9.2,
  9.8,11.5,9.6,8.6,8.6,9.9,12.6,10.7,10.6,10.5,10.1,10.3,7.8,9.9,10.5,12.7,9.5,
  9.6,10.6,12.7,8.8,8.9,7.9,10.9,10.2,12.3,7.5,13.6,9,9,11.6,13.1,7.6,9.3,10,
  11.2,8.9,8.8,10.2,10.7,12.3,8.7,9.4,9.6,11.3,12.3,10.2,10.9,10.9,10.3,10.6,10.7,
  13.6,9.7,12.3,12.8,9,11.3,9,10.2,11.6,8.2,10.8,9.9
  )
cusum <- vector()
cusumValue = 0
if(theModel=="F")
{
  cusumValue = h / 2
}
for(i in 1 : length(dat))
{
  cusumValue = cusumValue + dat[i]^2 - k   # SD->variance
  if(outOfControlSD>inControlSD) # Up
  {
    if(cusumValue<0)
    {
      cusumValue = 0
    }
  }
  else                               # down
  {
    if(cusumValue>0)
    {
      cusumValue = 0
    }
  }
  cusum[i] = cusumValue
}
cusum

The code in 3a creates the empty cusum vector and sets the initial cusum value, then calculates the cusum value for each measurement, and places it in the cusum vector.

Please Note: The data entered are Standard Deviations, each value is squared to become variance before the CUSUM value is calculated in variance units

The resulting vector is shown as follows

> cusum
  [1]  45.1209877  32.3226345   3.0542813  51.8559282  57.1475750  84.4392218  31.3208687
  [8]  30.4025155   0.0000000   1.1316468  26.1332937  35.6649405  53.9165873  59.2082342
 [15]  18.2498810  21.4515278   8.6531747   0.0000000  58.8816468  97.9232937  44.8049405
 [22]  58.6565873  34.6982342   7.1798810   0.0000000   5.2916468   0.0000000   0.0000000
 [29]  48.8016468  30.3332937   0.0000000   0.0000000   0.0000000   0.0000000  77.2916468
 [36]  56.9732937  48.0549405  75.3465873  62.5482342  31.5498810   0.5515278   0.0000000
 [43]  53.8016468  63.3332937  70.7349405  76.0265873  73.0782342  74.2098810  30.0915278
 [50]  23.1431747  28.4348215  84.7664684  70.0581152  57.2597620  64.6614089 120.9930557
 [57]  93.4747025  67.7263494  25.1779962  39.0296430  38.1112899  84.4429367  35.7345835
 [64] 115.7362304  91.7778772  67.8195241  97.4211709 164.0728177 116.8744646  98.4061114
 [71]  93.4477582 113.9294051  88.1810519  60.6626987  59.7443456  69.2759924 115.6076392
 [78]  86.3392861  69.7409329  56.9425797  79.6742266 126.0058734 125.0875203 138.9391671
 [85] 152.7908139 153.9224608 161.3241076 170.8557544 250.8574013 239.9890481 286.3206949
 [92] 345.2023418 321.2439886 343.9756354 320.0172823 319.0989291 348.7005760 310.9822228
 [99] 322.6638696 315.7155165

Step 3b. CUSUM Plot

# Step 3b: Plot the cusum vector and h
plot(cusum,type="l")
abline(h=h)
In step 3b, the first line plots the cusum vector, and the second line the decision interval h. The result plot is shown to the right.

Step 4. Archive results

# Step 4: Optional export of results
#myDataFrame <- data.frame(dat,cusum)          #combine dat and cusum to dataframe
#myDataFrame                                   #display dataframe
#write.csv(myDataFrame, "CusumVariance.csv")   # write dataframe to .csv file
Step 4 is optional, and in fact commented out, and included as a template only. Each line can be activated by removing the #

The first line places the two vectors, dat and cusum together into a dataframe

The second line displays the data, along with row numbers, in the console, which can then be copied and pasted into other applications for further processing

The third line saves the dataframe as a comma delimited .csv file. This is needed if the data is too large to handle by copy and paste from the console.

Please note: R Studio write files to User/Document/ folder by default. The path needs to be reset if the user wishes to save files using a specific folder.

Javascript Plot