

MacroPlot Resources
Orientation
MacroPlot plotting is controlled by the macros entered in the text area provided.
Macros
Each macro must occupy its own line. If the first character of a line is not A-Z, the line is considered a comment and ignored.

The first macro, which is obligatory, initializes the plot. The macro is **Bitmap Initialize width(in pixels) height(in pixels) red(0-255) green(0-255) blue(0-255) transparency(0-255)**. Example : **Bitmap Initialize 700 500 255 255 255 255**, which provides a landscape area 700 pixels wide and 500 pixels high, with a white background.
The following are the default settings when the bitmap is initialized.
- Lines are black (0 0 0 255) and 3 pixels in width
- Fill color for bars and dots are black (0 0 0 255), and the fill type is set to fill only (1) (see Fill Type)
- Dots (circle and square) are set to 5 pixels radius (diameter = 11 pixels)
- Fonts are set as follows
- Font face is set to sans-serif. Serif, sans-serif, and monospace are available in all browsers, but users can use any font available to their browser
- Font size is set to 16 pixels high
- Font color, both line and fill are set to black (0 0 0 255), and fill type to 1 (fill only) (see Font Type)
Macros that begin with the keyword **Bitmap** draw directly on the bitmap, and the coordinates are x = number of pixels from the left border and y = number of pixels from the top border
A central plotting area is also defined
- By default, at initialization, as 15% from the left and bottom, and 5% from the right and top
- Or defined by the user as follows
**Plot Pixels left top right bottom**, these being numbers of pixels from the left and top borders. e.g. **Plot Pixels 105 25 665 425** would be the same as the default setting for a bitmap 700 pixels wide and 500 pixels high
**Plot Values left top right bottom**, these being the extreme values used in the data. e.g. **Plot Values 0 100 10 50** represents x values of 0 on the left to 10 on the right, and y values of 50 at the bottom to 100 at the top
Macros that begin with the keyword **Plot** draw on the plotting area, and the coordinates are the values in the data
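Putting the two coordinate systems together, a minimal initialization script might look like the following sketch, built from the examples above (the lines not beginning with A-Z illustrate the comment rule):

```
* lines that do not begin with A-Z, like this one, are comments and are ignored
Bitmap Initialize 700 500 255 255 255 255
* the central plotting area, here identical to the default 15%/5% margins
Plot Pixels 105 25 665 425
* data values: x runs from 0 (left) to 10 (right), y from 50 (bottom) to 100 (top)
Plot Values 0 100 10 50
```

After these three macros, **Bitmap** macros address pixels and **Plot** macros address data values.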
This panel lists and describes all macros used in this version of MacroPlot by Javascript. They are divided into the following sub-panels
- Initialization and settings
- Plotting areas, coordinates used, and drawing of the x and y axes
- Drawing lines, bars, dots, text, and other shapes
- Color palettes
This sub-panel lists the macros that initialize the bitmap and set the parameters for drawing
## Initialize Plotting
**Bitmap Initialize w h r g b t** is the first and obligatory macro, which initializes the bitmap
- w and h are width and height of the bitmap in number of pixels. The most common dimensions are
- w=700 and h= 500 for landscape orientation
- w=500 and h=700 for portrait orientation
- Both 500 for square bitmap
- r g b t represents red, green, blue and transparency values for the background, each value is 0 for non-existence to 255 for maximum intensity. The most commonly used background is white (255 255 255 255)
- For most plotting programs in StatsToDo the macro used is
**Bitmap Initialize 700 500 255 255 255 255**, a landscape orientation with white background
## Settings for lines
The settings provide parameters for all subsequent plotting until the parameter is reset
## Settings for fills
When bars, dots, arcs and wedges are plotted, the interiors of these symbols are called fills, and they are set as follows
- t=0: only the outline, defined by the line parameters, are plotted. Fill is ignored
- t=1: only fill is carried out, outline is ignored
- t=2: both outline and fill are plotted
- When the plot is initialized, the default setting for fill type is t=1
## Settings for fonts
These set the font characteristics for text output. Please note: settings for lines and fills for fonts are separate and independent of those for general line and shape plotting
- t=0: only the outline of the font, defined by the thick and LColor parameter is drawn
- t=1: only the fill of the font is drawn
- t=2: both outline and fill are drawn
- When the plot is initialized, the default setting for Font type is t=1
Please Note: When the bitmap is initialized, the default settings, which are suitable for most situations, are set automatically, so users need not worry about these settings unless they have a different preference.
## Axis & Coordinates
This sub-panel presents the macros that define the plotting areas and create the x and y axes for plotting
## Drawing on the bitmap
When plotting on the initialized bitmap
- the horizontal coordinate x is the number of pixels from the left border
- the vertical coordinate y is the number of pixels from the top border
- the macros used begin with the keyword **Bitmap**
## Drawing on the plotting area
In most cases there is a need to draw and label the x and y axes, and the drawing coordinates used are the actual values of the data. The macros used for these all begin with the keyword **Plot**, and their purposes are as follows

**Plot Pixels lp tp rp bp** defines the plotting area within the bitmap
- lp defines the left border of the plotting area, in the number of pixels from the left border of the bitmap. In most cases this is 15% of the bitmap's width
- tp defines the top of the plotting area, in the number of pixels from the top border of the bitmap. In most cases this is 5% of the height
- rp defines the right border of the plotting area, in the number of pixels from the left border of the bitmap. In most cases this is 95% of the width (or 5% from the right border of the bitmap)
- bp defines the bottom border of the plotting area, in the number of pixels from the top border of the bitmap. In most cases this is 85% of the height (or 15% from the bottom)
- For example, in a landscape orientated bitmap 700 pixels wide and 500 pixels high, **Plot Pixels 105 25 665 425** sets the central area for plotting at 15% from the left and bottom, and 5% from the top and right
- This macro is usually not necessary if the 5%/15% setting suits the user, as this is the default setting when the bitmap is initialized
Plot Values lv tv rv bv defines the data values to be used in plotting
- lv is the extreme data value for the horizontal variable x on the left
- tv is the extreme data value for the vertical variable y at the top
- rv is the extreme data value for horizontal variable x on the right
- bv is the extreme data value for the vertical variable y at the bottom
Plot Logx 1 sets the horizontal x axis to the log scale. The normal scale is set on initialization, or reset by Plot Logx 0. Similarly, Plot Logy 1 sets the vertical y axis to the log scale
- label is a single word text string, using the underscore **_** to represent spaces if necessary
- space is the number of pixels between the bottom of the plot area and the label text string
- label is a single word text string, using the underscore **_** to represent spaces if necessary
- space is the number of pixels between the left of the plot area and the label text string
## The quickest and easiest way to draw axes
The following 4 macros are sufficient to draw the x and y axes under most circumstances

Plot XAxis y nsIntv nbIntv len gap line will mark out and numerate the horizontal x axis
- y is the y value on which the x axis lie
- nsIntv is the number of small intervals between the vertical line marks, 10 to 20 are recommended
- nbIntv is the number of big intervals between the numerical scales, 5 to 10 are recommended
- len is the length of the mark in pixels, +ve value downwards and negative value upwards. -10 is recommended
- gap is the number of pixels between the numerical scaling text and the y value of the axis, +ve values for text below axis and negative value for text above axis. 3 is recommended
- line determines whether the axis line is drawn, 0 for no line, 1 for line
Plot YAxis x nsIntv nbIntv len gap line will mark out and numerate the vertical y axis
- x is the x value on which the y axis lie
- nsIntv is the number of small intervals between the horizontal line marks, 10 to 20 are recommended
- nbIntv is the number of big intervals between the numerical scales, 5 to 10 are recommended
- len is the length of the mark in pixels, +ve value to the right and negative value to the left. 10 is recommended
- gap is the number of pixels between the numerical scaling text and the axis, +ve values for text to the right of the axis and -ve values for text to the left. -3 is recommended
- line determines whether the axis line is drawn, 0 for no line, 1 for line
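The quick-axes recipe above can be sketched as a complete script. Parameter values follow the recommendations listed; note that the definition line for the x-axis macro is missing above, so the spelling **Plot XAxis** is assumed here by symmetry with **Plot YAxis**:

```
Bitmap Initialize 700 500 255 255 255 255
Plot Values 0 100 10 50
* x axis along the bottom (y = 50): 20 small and 10 big intervals, marks 10 pixels upwards, text 3 pixels below, with axis line
Plot XAxis 50 20 10 -10 3 1
* y axis along the left (x = 0): marks 10 pixels to the right, text 3 pixels to the left, with axis line
Plot YAxis 0 20 10 10 -3 1
```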
Plot AutoXLogScale y len gap line will mark and numerate the x axis if it is in log scale
- The x axis must be set to the log scale by **Plot Logx 1**. If the axis is not set to log, this macro will abort
- y is the y value on which the x axis lies
- len is the length of the mark in pixels, +ve value downwards and negative value upwards. -10 is recommended
- gap is the number of pixels between the numerical scaling text and the y value of the axis, +ve values for text below axis and negative value for text above axis. 3 is recommended
- line determines whether the axis line is drawn, 0 for no line, 1 for line
Plot AutoYLogScale x len gap line will mark and numerate the y axis if it is in log scale
- The y axis must be set to the log scale by **Plot Logy 1**. If the axis is not set to log, this macro will abort
- x is the x value on which the y axis lies
- len is the length of the mark in pixels, +ve value to the right and -ve value to the left. 10 is recommended
- gap is the number of pixels between the numerical scaling text and the axis, +ve values for text to the right of the axis and -ve values for text to the left. -3 is recommended
- line determines whether the axis line is drawn, 0 for no line, 1 for line
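For a log-scaled x axis, a sketch using the macros above might be (values are illustrative; x values must be positive on a log scale):

```
Bitmap Initialize 700 500 255 255 255 255
* x from 1 (left) to 1000 (right), y from 0 (bottom) to 100 (top)
Plot Values 1 100 1000 0
Plot Logx 1
* mark and numerate the log x axis along the bottom (y = 0)
Plot AutoXLogScale 0 -10 3 1
```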
## Other methods of drawing axes
Users may wish to draw individual parts of the axes, and the following macros can be used
- y is the y value where the axis is to be marked
- begin is the value for the first mark
- interval is the interval between marks
- len is the length of the mark line in pixels, +ve downwards, -ve upwards
- x is the x value where the axis is to be marked
- start is the value for the first mark
- interval is the interval between marks
- len is the length of the mark line in pixels, +ve to the right, -ve to the left
- y is the y value for the axis
- start is the first value to be written
- interval is the interval between numerical scales
- gap is the space in pixels between the scale text and the axis, +ve for text below axis, -ve for text above axis
- The number of decimal points in the scale is the same as that of the interval value
- x is the x value for the axis
- start is the first value to be written
- interval is the interval between numerical scales
- gap is the space in pixels between the scale text and the axis, +ve for text to the right of axis, -ve for text to the left of axis
- The number of decimal points in the scale is the same as that of the interval value
Plot XMarkIntv y interval len marks the horizontal x axis with a series of vertical marks
- y is the y value of the axis
- interval is the interval between the marks, beginning at 0 and while in range
- len is the length of the mark line in pixels, +ve downwards, -ve upwards
Plot YMarkIntv x interval len marks the vertical y axis with a series of horizontal marks
- x is the x value of the axis
- interval is the interval between the marks, beginning at 0 and while in range
- len is the length of the mark line in pixels, +ve to the right, -ve to the left
Plot XScaleIntv y interval gap writes the numerical scales for the horizontal x axis
- y is the y value of the axis
- interval is the interval between the numerical scales, beginning at 0 and while in range
- gap is the space in pixels between the scale text and the axis, +ve for text below axis, -ve for text above axis
- The number of decimal points in the scale is the same as that of the interval value
Plot YScaleIntv x interval gap writes the numerical scales for the vertical y axis
- x is the x value of the axis
- interval is the interval between the numerical scales, beginning at 0 and while in range
- gap is the space in pixels between the scale text and the axis, +ve for text to the right of axis, -ve for text to the left of axis
- The number of decimal points in the scale is the same as that of the interval value
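Using the interval-based macros, an x axis can be assembled piecemeal. A sketch, assuming the value range shown (mark and scale spacings are illustrative):

```
Bitmap Initialize 700 500 255 255 255 255
Plot Values 0 100 10 50
* marks along the x axis at y = 50, every 1 unit, 10 pixels upwards
Plot XMarkIntv 50 1 -10
* numerical scale along the same axis, every 2 units, text 3 pixels below
Plot XScaleIntv 50 2 3
```

Because the interval value 2 has no decimals, the scale text is written as whole numbers.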
## Drawings
This sub-panel describes the macros that draw the plotting objects. Drawing is performed in two environments
- Macros that begin with the keyword **Bitmap** use pixel values as coordinates, where x is the number of pixels from the left border, and y the number of pixels from the top border
- Macros that begin with the keyword **Plot** use actual data values (as defined in the **Plot Values lv tv rv bv** macro) as coordinates
## Drawing lines
The thickness and color of any line drawn are set by the Line macros (see the settings sub-panel). The default setting is a black line 3 pixels in width
- x1 and x2 are number of pixels from the left border
- y1 and y2 are number of pixels from the top border
- x1 and x2 are data values for the horizontal variable x
- y1 and y2 are data values for the vertical variable y
Plot PixLine x y hpix vpix draws a line
- x and y are data values for the horizontal x value and vertical y value. This defines the coordinate of the origin of the line
- hpix is the number of pixels horizontally from the origin, +ve value to the right, -ve value to the left
- vpix is the number of pixels vertically from the origin, +ve value downwards, -ve value upwards
- The line is then drawn between the origin and that defined by hpix and vpix
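As an illustration of Plot PixLine (a sketch; the coordinates and offsets are arbitrary examples):

```
Bitmap Initialize 700 500 255 255 255 255
Plot Values 0 100 10 50
* from the data point (5, 75), draw a line 10 pixels to the right and 5 pixels upwards
Plot PixLine 5 75 10 -5
```

This is useful for short ticks or leader lines anchored to a data point but sized in pixels.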
## Drawing bars
The color and thickness of the outline are defined in the Line macro. The color of the fill is defined in the fill color and Fill Type macros. The default setting is black (0 0 0 255) for both line and fill color, and the Fill Type is set to 1, fill only with no outline. These settings are suitable for most circumstances, but users can change them if so required.
- w is the half width of the bar, so a VBar is 2w+1 pixels in width, and HBar is 2w+1 pixels in height
- The default value for w is 7 pixels (making width/height of 15 pixels), unless the user changes it
- x is the data value for the horizontal x variable. This is the center of the vertical bar
- y1 and y2 are values for the vertical y variable. They define the vertical ends of the bar
- hshift is the number of pixels the whole bar is shifted horizontally, -ve value to the left and +ve value to the right. In most cases this is 0 (no shift). However, if there is more than one bar in the same position, shifting some of them will avoid the bars overlapping and obscuring each other
- The half width of the vertical bar is set by default at 7 (width of bar = 15 pixels)
- x1 and x2 are data values for the horizontal x variable. They define the horizontal ends of the bar
- y is the value for the vertical y variable, and defines the center of the horizontal bar
- vshift is the number of pixels the whole bar is shifted vertically, -ve value upwards and +ve value downwards. In most cases this is 0 (no shift). However, if there is more than one bar in the same position, shifting some of them will avoid the bars overlapping and obscuring each other
- The half height of the horizontal bar is set by default at 7 (height of bar = 15 pixels)
## Drawing dots
There are only 2 dot types, circle and square. If more than 2 types of dots are required, they can be distinguished by the colors of the outline and fill, and by their sizes. Settings for dot parameters are in the settings sub-panel
- x and y are the number of pixels from the left and top border
- Radius is in number of pixels. The diameter of the dot is 2Radius+1 pixels
- x and y are the data values of the horizontal x variable and vertical y variable, as defined by **Plot Values lv tv rv bv**
- Radius is in number of pixels. The diameter of the dot is 2Radius+1 pixels
- hshift is the number of pixels the dot is shifted horizontally, -ve value to the left, +ve value to the right
- vshift is the number of pixels the dot is shifted vertically, -ve value upwards, +ve value downwards
- In most cases there is no shift (0 0), but if there is more than one dot in the same position, shifting avoids the dots superimposing on and obscuring each other
Dot Radius r sets the radius of the dot in pixels. The diameter of the dot is 2radius+1 pixels. The default radius is 5
- x and y are the data values of the horizontal x variable and vertical y variable, as defined by **Plot Values lv tv rv bv**
- hshift is the number of pixels the dot is shifted horizontally, -ve value to the left, +ve value to the right
- vshift is the number of pixels the dot is shifted vertically, -ve value upwards, +ve value downwards
- In most cases there is no shift (0 0), but if there are more than 1 dot in the same position, shifting avoids the dots superimposing over and obscuring each other
## Drawing text
The color, outline, fill, font, and weight of text are preset (see settings). The default settings are sans-serif, black fill only, and 16 pixels high
- x and y are numbers of pixels from the left and top borders, and together form the reference coordinate of the text
- ha is horizontal adjust
- ha=0: the left end of the text is at the x coordinate
- ha=1: the center of the text is at the x coordinate
- ha=2: the right end of the text is at the x coordinate
- va is vertical adjust
- va=0: the top of the text is at the y coordinate
- va=1: the center of the text is at the y coordinate
- va=2: the bottom of the text is at the y coordinate
- txt is the text to be drawn. It must be a single word with no gaps. Spaces can be represented by the underscore _
- x and y are data values as defined by **Plot Values lv tv rv bv**, and together form the reference coordinate of the text
- ha is horizontal adjust
- ha=0: the left end of the text is at the x coordinate
- ha=1: the center of the text is at the x coordinate
- ha=2: the right end of the text is at the x coordinate
- va is vertical adjust
- va=0: the top of the text is at the y coordinate
- va=1: the center of the text is at the y coordinate
- va=2: the bottom of the text is at the y coordinate
- txt is the text to be drawn. It must be a single word with no gaps. Spaces can be represented by the underscore _
- hshift is the number of pixels the text is shifted horizontally, -ve value to the left, +ve value to the right
- vshift is the number of pixels the text is shifted vertically, -ve value upwards, +ve value downwards
- In most cases there is no shift (0 0), but if there are other structures in the same position, shifting avoids the text and structures obscuring each other
- x and y are numbers of pixels from the left and top borders, and together form the reference coordinate of the text
- ha is horizontal adjust
- ha=0: the left end of the text is at the x coordinate
- ha=1: the center of the text is at the x coordinate
- ha=2: the right end of the text is at the x coordinate
- va is vertical adjust
- va=0: the top of the text is at the y coordinate
- va=1: the center of the text is at the y coordinate
- va=2: the bottom of the text is at the y coordinate
- txt is the text to be drawn. It must be a single word with no gaps. Spaces can be represented by the underscore _
- x and y are data values as defined by **Plot Values lv tv rv bv**, and together form the reference coordinate of the text
- ha is horizontal adjust
- ha=0: the left end of the text is at the x coordinate
- ha=1: the center of the text is at the x coordinate
- ha=2: the right end of the text is at the x coordinate
- va is vertical adjust
- va=0: the top of the text is at the y coordinate
- va=1: the center of the text is at the y coordinate
- va=2: the bottom of the text is at the y coordinate
- hshift is the number of pixels the text is shifted horizontally, -ve value to the left, +ve value to the right
- vshift is the number of pixels the text is shifted vertically, -ve value upwards, +ve value downwards
- In most cases there is no shift (0 0), but if there are other structures in the same position, shifting avoids the text and structures obscuring each other
## Other miscellaneous drawings
Bitmap Arc x y radius startDeg endDeg rotate draws an arc.
- x and y are number of pixels from the left and top border, and together form the center of the arc
- radius is the radius of the arc, in number of pixels
- startDeg and endDeg are the degrees (360 degrees in full circle) of the arc
- rotate defines the direction of the arc, 0 for clockwise, 1 for anti-clockwise
Bitmap Wedge x y radius startDeg endDeg shift rotate draws a wedge, essentially an arc with lines to the center
- x and y are number of pixels from the left and top border, and together form the center of the wedge
- radius is the radius of the wedge, in number of pixels
- startDeg and endDeg are the degrees (360 degrees in full circle) of the wedge
- shift is the number of pixels that the wedge is moved centrifugally (away from the center). This is used in pie charts to separate the wedges of the pie
- rotate defines the direction of the wedge, 0 for clockwise, 1 for anti-clockwise
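For example, a simple two-wedge pie might be sketched as follows (the center, radius and degree values are illustrative):

```
Bitmap Initialize 700 500 255 255 255 255
* first wedge: centered at (350, 250), radius 100 pixels, 0 to 120 degrees, pushed 10 pixels out from the center
Bitmap Wedge 350 250 100 0 120 10 0
* second wedge: the remaining 120 to 360 degrees, no shift
Bitmap Wedge 350 250 100 120 360 0 0
```

The shift on the first wedge separates it from the rest of the pie, as described above.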
Plot Curve a b1 b2 b3 b4 b5 x1 x2 draws a polynomial curve
- The curve is y = a + b1x + b2x^2 + b3x^3 + b4x^4 + b5x^5. Where a higher power is not needed, 0 is used for its coefficient b
- The curve is drawn for data values of x from x1 to x2
Plot Normal mean sd height draws a normal distribution curve
- mean and sd (Standard Deviation) are as in the horizontal data variable x
- height is the maximum height (where x=mean) of the curve as in the vertical variable y
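A sketch combining the two curve macros (coefficients, mean, SD and height are illustrative values):

```
Bitmap Initialize 700 500 255 255 255 255
Plot Values 0 100 10 0
* the parabola y = 2 + 0.5x + 0.25x^2, with unused higher powers set to 0, drawn for x from 0 to 10
Plot Curve 2 0.5 0.25 0 0 0 0 10
* a normal curve with mean 5 and SD 1.5, peaking at y = 80 where x = mean
Plot Normal 5 1.5 80
```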
## Color Palettes
- Plain Colors : a table of the colors used on this web site
- Patterns of complementary colors
Explanations
This page provides explanations, clarifications, and support for Linear Discriminant Analysis, as used in the Javascript program and the R code
Javascript Program
Discriminant Analysis is clearly and succinctly described in Wikipedia, so only a brief description is provided here, to help the user understand the data input required and interpret the results.

## Different approaches to calculations
Discriminant Analysis was introduced by Fisher in the 1930s as a variant of the multiple regression model, but with multinomial groups as the dependent variable and normally distributed measurements as independent variables. It enjoys continued usage, but over time many modifications and additions have been made, producing options for users that yield results that are conceptually similar but numerically different. This confusion is further aggravated by some statistical packages keeping to the original algorithm, some including useful additions and modifications, and some providing menus from which users can choose different options. The following comments attempt to clarify some of these issues.
- The original algorithm used the actual measurements of the independent variables and their covariance matrix for Principal Component extraction. The problem is that the results can be distorted if the scales of the different variables differ widely. For example, using height in inches and weight in pounds will produce numerically different results from the same data using cm and kg. Increasingly, therefore, the original measurements are normalized to Standard Deviation units z, where z = (value - mean) / SD, so that all independent variables are converted to have a mean of 0 and a Standard Deviation of 1 before they are used. Statistical packages vary in how this problem is managed. The algorithm described on this page allows actual values in data input, but converts these to z values for calculating the Discriminant functions. Calculating the z values therefore requires the means and Standard Deviations estimated from the modelling data.
- The number of Linear Discriminant functions created by the original algorithm is one less than the number of outcome groups (nf = ng - 1). Traditionally, all functions are used to estimate the probability of a case (row of data) belonging to each group. However, because the functions are essentially transforms of Principal Components, the functions extracted earlier have greater discriminating power than the later ones, which contain mostly statistical noise. Ignoring the more trivial functions therefore creates only minor distortions in the numerical results, and in most cases does not alter how the results are interpreted. The Javascript program on this page follows the common convention: all functions are used in estimating group probability during the validation of the data, and copied to templates for future use. However, the statistical significance of each function is estimated using the chi square test, and functions that are not statistically significant are identified. This allows the user, if he/she so wishes, to use a reduced set of functions (only the statistically significant ones) on future data.
- There are two major reasons for using Discriminant analysis. Firstly, to analyse the structure of the data itself, and interpret the results as scientific realities represented by the data. Secondly, to use the data to create a model, and use that model to interpret future and different sets of similar measurements. Both approaches require the allocation of outcome groups, based on the probabilities calculated from the Discriminant functions. How the probabilities are calculated, however, differs according to the purpose of the analysis. If the purpose is to analyse and interpret the structure of the data, then a priori probabilities are not included, so that the structure of the model is not distorted. This is the method of calculation in program 1, during the validation of the model. If the purpose is to use an already developed model to interpret and classify future and additional data, then the accuracy of prediction is the priority, and the inclusion of a priori probabilities is appropriate, as in program 2 of the Javascript program. To demonstrate this difference, imagine an analysis relating a set of symptoms to discriminate between headache caused by tension and headache caused by a brain tumor. To study the relationship between symptoms and diagnosis, a data set representative of the clinical scenario (say from medical records) is used, which contains similar numbers of cases of tension and of brain tumor. From this data, the relationship between symptoms and eventual diagnosis can be established (Maximum Likelihood). When the model is used clinically to make a diagnosis, the fact that headache caused by tension is many times more common than headache caused by tumours must be taken into consideration, and a more accurate diagnosis requires the inclusion of the a priori probabilities of the two conditions (Bayesian probability). Statistical packages differ in the inclusion of a priori probabilities. Many include a priori probabilities by default, estimated from the sample sizes of the outcome groups in the reference data. Some use Maximum Likelihood as the default. Some require users to insert the a priori probabilities. All these approaches are correct in the right context, leaving the options with the user. The Javascript program allows both approaches. In program 1, when the model is developed, the validation examines the structure of the data, so no a priori probabilities are included.
In program 2, when the developed model is used to describe or predict new and additional data, there are provisions for the user to include a priori probabilities
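The distinction between the two programs can be written compactly. With L(x|g) the likelihood of the measurements x under group g, and π_g the a priori probability of group g (notation introduced here for clarity, not taken from the program's output):

```latex
% Maximum Likelihood (program 1): no a priori weighting
P(g \mid x) = \frac{L(x \mid g)}{\sum_{h} L(x \mid h)}

% Bayesian (program 2): likelihoods weighted by a priori probabilities
P(g \mid x) = \frac{\pi_g \, L(x \mid g)}{\sum_{h} \pi_h \, L(x \mid h)}
```

When all π_g are equal, the two formulas coincide, which is why the choice only matters when the outcome groups have very different prevalences.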
## References
## General references on Discriminant analysis
Wikipedia on Discriminant Analysis.
George D and Mallery P (1999) SPSS for Windows Step by Step. A Simple Guide and Reference. Allyn and Bacon, Sydney. ISBN 0-205-28395-0. Chapter 26: The Discriminant Procedure p.313-328.
## Resources used to develop the web-based Javascript program
These are very old books that still present the actual formulae and algorithms for all steps in the calculations. Most newer references do not provide detailed algorithms, but advise users to access available packages such as SAS, SPSS, R, and Python.
Overall JE and Klett CJ (1972) Applied Multivariate Analysis. McGraw Hill Series in Psychology. McGraw Hill Book Company, New York. Library of Congress No. 73-147164, 07-047935-6
- Chapter 2 p.24-56 : Matrix math, particularly the Square Root method of matrix inversion; also calculations for between and within group Sum of Products and Covariance matrices
- Chapter 10 p.280-306 : Multiple Discriminant Analysis, particularly the algorithm
- Chapter 13 p.345-371 : Normal Probability Density Model for classification
- Chapter 14 p.373-383 : Use of Canonical Correlates for Classification. Chapters 13 and 14 provided the algorithms for calculating the Maximum Likelihood and Bayesian probabilities
Press WH, Flannery BP, Teukolsky SA, Vetterling WT (1989) Numerical Recipes in Pascal. Cambridge University Press. ISBN 0-521-37516-9. p.395-396 and p.402-404 : Jacobi method for finding Eigen values and Eigen vectors.
Norusis MJ (1979) SPSS Statistical Algorithms Release 8. SPSS Inc, Chicago. Chapter 23 : Discriminant p.69-83. Formulae for the algorithm provided by SPSS.
## Useful references and advice on Discriminant analysis using R
https://www.geeksforgeeks.org/linear-discriminant-analysis-in-r-programming/
https://www.statmethods.net/advstats/discriminant.html
Venables WN and Ripley BD (2002) Modern Applied Statistics with S. Fourth edition. Springer.
Help & Hints
R Code
This panel provides support for data entry and interpretation of results from the Javascript program.
Calculations
There are two programs.
- Program 1 is the primary discriminant analysis using a set of modelling data, creating parameters and coefficients that can be used to interpret future new data.
- Program 2 is intended for using coefficients and parameters previously established to analyse new sets of data, producing discriminant function values and classifying cases into groups. It provides data inputs for the parameters and coefficients previously created, and the new data to be analysed.
- On this page, Program 2 is executed automatically as a cascade from Program 1, with all the coefficients and parameters established in Program 1 copied into the appropriate inputs for Program 2.
## Default Example
## Data Entry
The data are entered as a table
- All rows must have the same number of columns, each row represents a wine
- Each column represents a predictor variable, in this example, in order, tannin, color, acidity, and sugar
- The last column on the right contains the group labels, and is treated as text: a single character or word with no gaps. In this example DR for dry red, DW for dry white, SR for sweet red and SW for sweet white.
## Result Output
Groups are organized in alphabetical order, and in this example DR=1, DW=2, SR=3, SW=4. Predictor variables are in the same order as the columns in data entry. Discriminant functions are ordered by magnitude (statistical significance), the most significant being f1, then f2, f3, etc. in decreasing significance. The mark # designates functions that are not statistically significant and can be ignored in subsequent calculations.
## Step 1: Data Definition
The entered data are summarised. In this example, number of cases (rows) = 16, number of variables = 4, and number of groups = 4. Table 1a shows the group names in alphabetical order and the number of cases in each group. Table 1b shows the means and Standard Deviations of the variables, in the same order as the columns.
## Step 2: Create Discriminant functions, coefficients and parameters
The values of the predictor variables are converted to z values, where z = (value - mean) / SD, and used in all subsequent calculations. In other words, all values are transformed to have a mean of 0 and SD of 1. An analysis of variance and covariance is carried out to obtain the within group covariance matrix, which is used to produce the Eigen values and vectors in order of magnitude.
Using the Eigen values and chi square, the statistical significance of the functions, in decreasing order of chi square and significance, is calculated, as shown in the table to the left. Given that there are 4 outcome groups (ng = 4), the maximum number of functions is ng - 1 = 3. However, only the first 2 functions are statistically significant (p<0.05)
The eigenvectors are then used to establish the function coefficients, which are shown in the table to the right. Each function value is the sum, over the variables, of the products of each z value and the corresponding function coefficient. Please note that all function coefficients are presented; the third function is not statistically significant in this example and can be ignored as trivial.
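As a numeric illustration of a function score (the z values and coefficients below are made up for the sketch, not taken from the fitted wine model):

```python
# Illustrative sketch: one function score is sum(z value x coefficient)
# over the predictors. The numbers are hypothetical.
z = [0.82, 0.10, -0.83, -0.38]       # z values of one case's 4 predictors
coef = [1.37, 1.72, -0.67, -1.58]    # coefficients of one discriminant function

score = sum(zi * ci for zi, ci in zip(z, coef))
print(round(score, 4))  # 2.4519
```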
Using the function coefficients and the mean z values of each variable in each group, the centroid value of each function (equivalent to the mean) in each group is estimated, as shown in the table to the left. Again, values for all 3 functions from this example are calculated, but the third one, being not statistically significant, can be ignored as trivial. At this point, the initial analysis is completed. The group names, the mean and Standard Deviation of each predictor variable, the function coefficients, and the centroid values collectively represent the parameters and coefficients of the Discriminant model, and can be used to calculate discriminant functions and allocate any similar data to groups.

## Step 3: Validating Analysis
The parameters and coefficients developed are now used to evaluate the modelling data, and check whether the Discriminant calculation predicts the same group as that designated in the modelling data.
Each row of the modelling data is analysed in turn. The last column is the designated group, and the other columns are the predictor variables. All functions (3 in this example) are used in the calculations, although the third function is non-significant and, if ignored, will only make trivial differences to the results.
- The values of the predictor variables are converted into z values, where z = (value - mean) / SD
- The function scores are then calculated, each being sum(z x function coefficient) for that function
- The distance between the function scores and the centroids of each group is then calculated as d = sum((function score - centroid value)^2)
- As d represents squared Standard Deviations from the centroid, the probability of the distance being 0 from a centroid is calculated as exp(-d / 2). This is the probability of belonging to each group
- The probabilities of the 4 groups in this example are then normalized to a total of 1 (maximum Likelihood Ratio), and the group with the highest probability is assigned as the calculated group.
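The allocation steps above can be sketched as follows (a minimal illustration; the scores and centroids are hypothetical numbers, not the fitted wine model):

```python
import math

# Hypothetical function scores for one case (2 discriminant functions)
scores = [0.5, -1.0]

# Hypothetical centroids of the 4 groups, one pair per function
centroids = {
    "DR": [2.0, 0.0],
    "DW": [0.0, 2.0],
    "SR": [0.4, -1.2],
    "SW": [-2.0, 0.0],
}

# d = sum over functions of (score - centroid)^2, the squared distance
d = {g: sum((s - c) ** 2 for s, c in zip(scores, cen))
     for g, cen in centroids.items()}

# probability of belonging to each group: exp(-d / 2)
p = {g: math.exp(-dist / 2) for g, dist in d.items()}

# normalize so the probabilities total 1, then assign the highest
total = sum(p.values())
p = {g: v / total for g, v in p.items()}
assigned = max(p, key=p.get)
print(assigned)  # SR: the smallest distance, hence the highest probability
```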
The results show that the Discriminant functions correctly assign all the cases. This is not surprising, as the data were artificially created to discriminate the groups well. When real data are used, particularly when large numbers of cases are involved, the random variations involved mean that a proportion of erroneous allocations often occurs.

## Step 4: Transfer Parameters and Coefficients as Input Data for Program 2
The tables of means and Standard Deviations, the function coefficients, and the centroid values represent the Discriminant model developed from the modelling data. These are copied to the appropriate text areas of program 2, so they can be used to analyse additional and new data in the future. Should the user intend to use the coefficients in the future, it would be appropriate to archive these parameters and coefficients, so they can be used in future analyses. Should the user wish to use only the significant functions, the last column of the function coefficients and centroid values can be deleted, resulting in the table as follows. Furthermore, the results can be further altered by Bayesian probability, if the a priori probabilities of the groups are defined. The calculations, using all functions and the same a priori probability for all groups, produce the same results as when the data were modelled.
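The Bayesian adjustment mentioned above amounts to weighting each group's probability by its a priori probability before re-normalizing. A sketch with hypothetical numbers:

```python
# Hypothetical normalized probabilities from the discriminant distances
likelihood = {"DR": 0.5, "DW": 0.3, "SR": 0.15, "SW": 0.05}

# Hypothetical a priori probabilities of the groups (e.g. known prevalence)
prior = {"DR": 0.1, "DW": 0.4, "SR": 0.4, "SW": 0.1}

# Posterior is proportional to likelihood x prior, normalized to total 1
post = {g: likelihood[g] * prior[g] for g in likelihood}
total = sum(post.values())
post = {g: v / total for g, v in post.items()}

print(max(post, key=post.get))  # DW: the prior shifts the assignment
```

Note that with equal priors for all groups the posterior is identical to the normalized likelihood, which is why equal a priori probabilities reproduce the modelling results.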
## Step 5: Analyse the Modelling Data
The predictor variables of the reference data are copied to the data input box of program 2 and analysed. The same table of results as in the validating exercise is produced, as shown in the table to the right. The same input data can be loaded by clicking the Example button. However, this is merely to demonstrate how program 2 can be used; the intention is that users may wish to enter new data here for analysis. To demonstrate that the third, non-significant discriminant function is trivial and unnecessary, the calculations are repeated after the third column is deleted from the function and centroid matrices, and the results are shown in the table to the right. Compared with the validating results, the probability coefficients are very similar, and the group designations are unchanged.
This is essentially a repeat of the validating exercise, and is carried out to demonstrate how new data can be analysed.
## Step 6: Plotting Data
The user can designate any two functions to be plotted in an x/y scatter plot. The program will first mark out in color the areas occupied by each group in the plot, then plot the function scores from the two functions of each case as an x/y scatter plot. The intention is to allow a visual display of the data in any two functions, and how they are related to each group. Ten default background colors are available (assuming that no more than 10 groups will be required in any single Discriminant Analysis). The colors are assigned in alphabetical order of group names. The plot using the example data, with x=function 1 and y=function 2, is shown above and to the left. The 4 cases from each group can be seen to fit clearly within the areas for each group, with DR (group 1) in green, DW (group 2) in red, SR (group 3) in blue, and SW (group 4) in yellow. It can be seen that function 1 can clearly separate DR (grp 1, green), SR (grp 3, blue), and SW (grp 4, yellow). Data points from DW (grp 2, red), however, overlap other groups, and require the additional function 2 to separately identify them.

## Program 1. Produce Discriminant Coefficients from Reference Data
Data Input for Discriminant Analysis Using Reference Data
The data are for a single analysis:
- It is a table of multiple columns
- Each row contains data from a case or a record
- All columns except the last are predictors and must be numerical
- The last column (on the right) is the outcome group name, a single character or text with no gaps
## Program 2. Using Discriminant Coefficients for Classification
The Linear Discriminant analysis using R was carried out to check the accuracy of the Javascript program. Only the minimum amount of coding is used. Users can search R for the numerous versions of calculation and graphic support for Discriminant analysis.
Please note that R code is in maroon, and results are in blue.
```r
myDat = ("
 v1   v2  v3    v4    Grp
 1.2  45  3.16  72.7  SR
 1.3  67  3.38  102.4 SR
 1.1  48  3.61  33.7  SR
 1.6  36  3.51  58.2  SR
 1.5  47  3.20  44.2  DR
 1.5  74  3.21  91.8  DR
 1.7  47  3.39  53.1  DR
 1.6  56  3.36  88.5  DR
 1.1  27  3.30  36.3  SW
 1.0  53  3.55  74.7  SW
 0.9  37  3.23  94.2  SW
 1.2  23  3.07  53.8  SW
 1.4  44  3.34  20.7  DW
 1.3  34  3.24  9.5   DW
 1.1  37  3.24  17.8  DW
 1.4  55  3.35  35.9  DW
")
myDataFrame <- read.table(textConnection(myDat), header=TRUE)
```
Please note that the headers are included, as R requires them to call the algorithm.
```r
myDataFrame$z1 <- (myDataFrame$v1 - mean(myDataFrame$v1)) / sd(myDataFrame$v1)
myDataFrame$z2 <- (myDataFrame$v2 - mean(myDataFrame$v2)) / sd(myDataFrame$v2)
myDataFrame$z3 <- (myDataFrame$v3 - mean(myDataFrame$v3)) / sd(myDataFrame$v3)
myDataFrame$z4 <- (myDataFrame$v4 - mean(myDataFrame$v4)) / sd(myDataFrame$v4)
```
Step 3. Display the data object, including the calculated z values
```r
myDataFrame
```
The results are:
```
    v1 v2   v3    v4 Grp          z1         z2         z3          z4
1  1.2 45 3.16  72.7  SR -0.45185501 -0.0460777 -1.1033570  0.58784387
2  1.3 67 3.38 102.4  SR -0.02657971  1.5758573  0.4019983  1.60105898
3  1.1 48 3.61  33.7  SR -0.87713031  0.1750953  1.9757788 -0.74264062
4  1.6 36 3.51  58.2  SR  1.24924619 -0.7095966  1.2915264  0.09317656
5  1.5 47 3.20  44.2  DR  0.82397089  0.1013709 -0.8296560 -0.38443326
6  1.5 74 3.21  91.8  DR  0.82397089  2.0919275 -0.7612308  1.23944012
7  1.7 47 3.39  53.1  DR  1.67452149  0.1013709  0.4704235 -0.08080988
8  1.6 56 3.36  88.5  DR  1.24924619  0.7648898  0.2651478  1.12686066
9  1.1 27 3.30  36.3  SW -0.87713031 -1.3731154 -0.1454036 -0.65394166
10 1.0 53 3.55  74.7  SW -1.30240561  0.5437168  1.5652273  0.65607384
11 0.9 37 3.23  94.2  SW -1.72768091 -0.6358722 -0.6243803  1.32131609
12 1.2 23 3.07  53.8  SW -0.45185501 -1.6680127 -1.7191841 -0.05692938
13 1.4 44 3.34  20.7  DW  0.39869559 -0.1198020  0.1282973 -1.18613545
14 1.3 34 3.24   9.5  DW -0.02657971 -0.8570452 -0.5559551 -1.56822331
15 1.1 37 3.24  17.8  DW -0.87713031 -0.6358722 -0.5559551 -1.28506892
16 1.4 55 3.35  35.9  DW  0.39869559  0.6911655  0.1967226 -0.66758765
```
Please note: z1, z2, z3, and z4 are the z values for v1, v2, v3, and v4.

Step 4.
Perform Linear Discriminant analysis and display results
```r
# install.packages("MASS")  # if not already installed
library(MASS)
fit <- lda(Grp ~ z1 + z2 + z3 + z4, data=myDataFrame)
fit
```
Please note: the calculations are based on the z values and not the original measurements.
```
Prior probabilities of groups:
  DR   DW   SR   SW
0.25 0.25 0.25 0.25

Group means:
            z1         z2         z3         z4
DR  1.14292737  0.7648898 -0.2138289  0.4752644
DW -0.02657971 -0.2303885 -0.1967226 -1.1767538
SR -0.02657971  0.2488196  0.6414866  0.3848597
SW -1.08976796 -0.7833209 -0.2309352  0.3166297

Coefficients of linear discriminants:
          LD1        LD2        LD3
z1  1.3710438 -0.7462890  0.1364612
z2  1.7208986  0.4156835 -0.2198221
z3 -0.6709095 -0.2961205 -0.8885895
z4 -1.5797134 -1.4082862  0.2128178

Proportion of trace:
   LD1    LD2    LD3
0.7899 0.1899 0.0202
```
The prior probabilities are calculated from the sample sizes of the groups. LD1, LD2, and LD3 are the 3 Linear Discriminant functions. The proportion of trace represents the proportion of discriminating power of each function, and can be used to test for statistical significance.
```r
predict(fit, newdata=myDataFrame, prior=c(1,1,1,1)/4)$x  # calculate function scores
```
Please note that the prior term is actually unnecessary here, as it is not used when function scores are calculated. When the data object is passed as newdata, a separate set of data can be used, provided that the appropriately labelled independent variables are present (in this example z1, z2, z3, and z4).
```
          LD1        LD2          LD3
1  -0.8871802 -0.1830651  1.054003243
2  -0.1234701 -1.6988951 -0.366512945
3  -1.0536723  1.1881588 -2.071887461
4  -0.5220622 -1.7409330 -0.801348480
5   2.4680678  0.2142880  0.745565901
6   3.2824521 -1.2654109  0.592784805
7   2.2823362 -1.2330374 -0.228987474
8   1.0710619 -2.2798046  0.006542448
9  -2.4349834  1.0478052  0.172180524
10 -2.9365081 -0.1894505 -1.548469220
11 -5.1313958 -0.6508717  0.740034619
12 -2.2466446  0.2331076  1.820538675
13  2.1281402  1.2850848 -0.285692745
14  1.3390091  2.0367135  0.345040379
15  0.1061805  2.3646456  0.240614788
16  2.6586689  0.8716648 -0.414407058
```
Step 6. Calculate the posterior (Bayesian) probability of belonging to each group
```r
predict(fit, newdata=myDataFrame, prior=c(1,1,1,1)/4)$posterior  # calculate posterior (Bayesian) probabilities
```
Please note that the prior term for a priori probability is used here. If it is left out, the program assumes the prior probabilities are the same as those in the reference data, depending on the sample sizes of the groups there.
```
             DR           DW           SR           SW
1  1.027719e-02 1.738244e-02 0.8056435151 1.666969e-01
2  7.584478e-02 1.695596e-03 0.9196816594 2.777961e-03
3  2.543569e-04 5.741198e-02 0.8883330540 5.400061e-02
4  1.793459e-02 5.428730e-04 0.9760628920 5.459645e-03
5  6.615467e-01 3.338972e-01 0.0045559379 1.919581e-07
6  9.948814e-01 4.791425e-03 0.0003272035 5.284128e-10
7  9.744939e-01 1.355788e-02 0.0119480765 1.254452e-07
8  8.326310e-01 1.399772e-03 0.1659475333 2.173213e-05
9  2.634318e-06 5.441063e-04 0.0758854570 9.235678e-01
10 7.201969e-07 1.160473e-05 0.1924352689 8.075524e-01
11 9.464596e-12 1.011071e-10 0.0001827173 9.998173e-01
12 2.029302e-05 2.288982e-04 0.0560473803 9.437034e-01
13 5.420201e-02 9.416221e-01 0.0041755258 3.779071e-07
14 4.865341e-03 9.917932e-01 0.0033348804 6.545613e-06
15 7.663001e-04 9.729023e-01 0.0250258317 1.305537e-03
16 2.029808e-01 7.940588e-01 0.0029603250 4.639190e-08
```
When translated to normal numerical format with 2-decimal precision:
```
     DR   DW   SR   SW
1  0.01 0.02 0.81 0.17
2  0.08 0.00 0.92 0.00
3  0.00 0.06 0.89 0.05
4  0.02 0.00 0.98 0.01
5  0.66 0.33 0.00 0.00
6  0.99 0.00 0.00 0.00
7  0.97 0.01 0.01 0.00
8  0.83 0.00 0.17 0.00
9  0.00 0.00 0.08 0.92
10 0.00 0.00 0.19 0.81
11 0.00 0.00 0.00 1.00
12 0.00 0.00 0.06 0.94
13 0.05 0.94 0.00 0.00
14 0.00 0.99 0.00 0.00
15 0.00 0.97 0.03 0.00
16 0.20 0.79 0.00 0.00
```
The function coefficients, scores, and probabilities are the same as those produced by the Javascript program, apart from minor discrepancies caused by different rounding errors.