

MacroPlot Resources
Orientation
MacroPlot plotting is controlled by the macros entered in the text area provided.
Macros
Each macro must occupy its own line. If the first character of a line is not A-Z, the line is considered a comment and ignored.

The first macro, which is obligatory, initializes the plot. The macro is **Bitmap Initialize width(in pixels) height(in pixels) red(0-255) green(0-255) blue(0-255) transparency(0-255)**. Example : **Bitmap Initialize 700 500 255 255 255 255**, which provides a landscape area 700 pixels wide and 500 pixels high, with a white background.
The following are the default settings when the bitmap is initialized.
- Lines are black (0 0 0 255) and 3 pixels in width
- Fill color for bars and dots are black (0 0 0 255), and the fill type is set to fill only (1) (see Fill Type)
- Dots (circle and square) are set to 5 pixels radius (diameter = 11 pixels)
- Fonts are set as follows
- Font face is set to sans-serif. Serif, sans-serif, and monospace are available in all browsers, but users can use any font available to their browser
- Font size is set to 16 pixels high
- Font color, both line and fill are set to black (0 0 0 255), and fill type to 1 (fill only) (see Font Type)
Macros that begin with the keyword **Bitmap** draw directly on the bitmap, and the coordinates are x = number of pixels from the left border and y = number of pixels from the top border
A central plotting area is also defined
- By default, at initialization, as 15% from the left and bottom, and 5% from the right and top
- Or defined by the user as follows
**Plot Pixels left top right bottom**, these being numbers of pixels from the left and top borders. e.g. **Plot Pixels 105 25 665 425** would be the same as the default setting for a bitmap 700 pixels wide and 500 pixels high
**Plot Values left top right bottom**, these being the extreme values used in the data. e.g. **Plot Values 0 100 10 50** represents x values of 0 on the left to 10 on the right, and y values of 50 at the bottom to 100 at the top
Macros that begin with the keyword **Plot** draw on the plotting area, and the coordinates are the values in the data
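Putting the two coordinate systems together, a minimal initialization script might look like the following sketch, built from the examples above (the lines not beginning with A-Z illustrate the comment rule):

```
* lines that do not begin with A-Z, like this one, are comments and are ignored
Bitmap Initialize 700 500 255 255 255 255
* the central plotting area, here identical to the default 15%/5% margins
Plot Pixels 105 25 665 425
* data values: x runs from 0 (left) to 10 (right), y from 50 (bottom) to 100 (top)
Plot Values 0 100 10 50
```

After these three macros, **Bitmap** macros address pixels and **Plot** macros address data values.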
This panel lists and describes all macros used in this version of MacroPlot by Javascript. They are divided into the following sub-panels
- Initialization and settings
- Plotting areas, coordinates used, and drawing of the x and y axes
- Drawing lines, bars, dots, text, and other shapes
- Color palettes
This sub-panel lists the macros that initialize the bitmap and set the parameters for drawing
## Initialize Plotting
**Bitmap Initialize w h r g b t** is the first and obligatory macro, which initializes the bitmap
- w and h are width and height of the bitmap in number of pixels. The most common dimensions are
- w=700 and h= 500 for landscape orientation
- w=500 and h=700 for portrait orientation
- Both 500 for square bitmap
- r g b t represents red, green, blue and transparency values for the background, each value is 0 for non-existence to 255 for maximum intensity. The most commonly used background is white (255 255 255 255)
- For most plotting programs in StatsToDo the macro used is
**Bitmap Initialize 700 500 255 255 255 255**, a landscape orientation with white background
## Settings for lines
The settings provide parameters for all subsequent plotting until the parameter is reset
## Settings for fills
When bars, dots, arcs and wedges are plotted, the interiors of these symbols are called fills, and they are set as follows
- t=0: only the outline, defined by the line parameters, are plotted. Fill is ignored
- t=1: only fill is carried out, outline is ignored
- t=2: both outline and fill are plotted
- When the plot is initialized, the default setting for fill type is t=1
## Settings for fonts
These set the font characteristics for text output. Please note: settings for lines and fills for fonts are separate and independent of those for general line and shape plotting
- t=0: only the outline of the font, defined by the thick and LColor parameter is drawn
- t=1: only the fill of the font is drawn
- t=2: both outline and fill are drawn
- When the plot is initialized, the default setting for Font type is t=1
Please Note: When the bitmap is initialized, the default settings, which are suitable for most situations, are set automatically, so users need not worry about these settings unless they have a different preference.
## Axis & Coordinates
This sub-panel presents the macros that define the plotting areas and create the x and y axes for plotting
## Drawing on the bitmap
When plotting on the initialized bitmap
- the horizontal coordinate x is the number of pixels from the left border
- the vertical coordinate y is the number of pixels from the top border
- the macros used begin with the keyword **Bitmap**
## Drawing on the plotting area
In most cases there is a need to draw and label the x and y axes, and the drawing coordinates used are the actual values of the data. The macros used for these all begin with the keyword **Plot**, and their purposes are as follows

**Plot Pixels lp tp rp bp** defines the plotting area within the bitmap
- lp defines the left border of the plotting area, in the number of pixels from the left border of the bitmap. In most cases this is 15% of the bitmap's width
- tp defines the top of the plotting area, in the number of pixels from the top border of the bitmap. In most cases this is 5% of the height
- rp defines the right border of the plotting area, in the number of pixels from the left border of the bitmap. In most cases this is 95% of the width (or 5% from the right border of the bitmap)
- bp defines the bottom border of the plotting area, in the number of pixels from the top border of the bitmap. In most cases this is 85% of the height (or 15% from the bottom)
- For example, in a landscape orientated bitmap 700 pixels wide and 500 pixels high, **Plot Pixels 105 25 665 425** sets the central area for plotting at 15% from the left and bottom, and 5% from the top and right
- This macro is usually not necessary if the 5%/15% setting suits the user, as this is the default setting when the bitmap is initialized
Plot Values lv tv rv bv defines the data values to be used in plotting
- lv is the extreme data value for the horizontal variable x on the left
- tv is the extreme data value for the vertical variable y at the top
- rv is the extreme data value for horizontal variable x on the right
- bv is the extreme data value for the vertical variable y at the bottom
Plot Logx 1 sets the horizontal x axis to the log scale. The normal scale is set on initialization, or reset by Plot Logx 0. Similarly, Plot Logy 1 sets the vertical y axis to the log scale
- label is a single word text string, using the underscore **_** to represent spaces if necessary
- space is the number of pixels between the bottom of the plot area and the label text string
- label is a single word text string, using the underscore **_** to represent spaces if necessary
- space is the number of pixels between the left of the plot area and the label text string
## The quickest and easiest way to draw axes
The following 4 macros are sufficient to draw the x and y axes under most circumstances

Plot XAxis y nsIntv nbIntv len gap line will mark out and numerate the horizontal x axis
- y is the y value on which the x axis lie
- nsIntv is the number of small intervals between the vertical line marks, 10 to 20 are recommended
- nbIntv is the number of big intervals between the numerical scales, 5 to 10 are recommended
- len is the length of the mark in pixels, +ve value downwards and negative value upwards. -10 is recommended
- gap is the number of pixels between the numerical scaling text and the y value of the axis, +ve values for text below axis and negative value for text above axis. 3 is recommended
- line determines whether the axis line is drawn, 0 for no line, 1 for line
Plot YAxis x nsIntv nbIntv len gap line will mark out and numerate the vertical y axis
- x is the x value on which the y axis lie
- nsIntv is the number of small intervals between the horizontal line marks, 10 to 20 are recommended
- nbIntv is the number of big intervals between the numerical scales, 5 to 10 are recommended
- len is the length of the mark in pixels, +ve value to the right and negative value to the left. 10 is recommended
- gap is the number of pixels between the numerical scaling text and the axis, +ve values for text to the right of the axis and -ve values for text to the left. -3 is recommended
- line determines whether the axis line is drawn, 0 for no line, 1 for line
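The quick-axes recipe above can be sketched as a complete script. Parameter values follow the recommendations listed; note that the definition line for the x-axis macro is missing above, so the spelling **Plot XAxis** is assumed here by symmetry with **Plot YAxis**:

```
Bitmap Initialize 700 500 255 255 255 255
Plot Values 0 100 10 50
* x axis along the bottom (y = 50): 20 small and 10 big intervals, marks 10 pixels upwards, text 3 pixels below, with axis line
Plot XAxis 50 20 10 -10 3 1
* y axis along the left (x = 0): marks 10 pixels to the right, text 3 pixels to the left, with axis line
Plot YAxis 0 20 10 10 -3 1
```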
Plot AutoXLogScale y len gap line will mark and numerate the x axis if it is in log scale
- The x axis must be set to the log scale by **Plot Logx 1**. If the axis is not set to log, this macro will abort
- y is the y value on which the x axis lies
- len is the length of the mark in pixels, +ve value downwards and negative value upwards. -10 is recommended
- gap is the number of pixels between the numerical scaling text and the y value of the axis, +ve values for text below axis and negative value for text above axis. 3 is recommended
- line determines whether the axis line is drawn, 0 for no line, 1 for line
Plot AutoYLogScale x len gap line will mark and numerate the y axis if it is in log scale
- The y axis must be set to the log scale by **Plot Logy 1**. If the axis is not set to log, this macro will abort
- x is the x value on which the y axis lies
- len is the length of the mark in pixels, +ve value to the right and -ve value to the left. 10 is recommended
- gap is the number of pixels between the numerical scaling text and the axis, +ve values for text to the right of the axis and -ve values for text to the left. -3 is recommended
- line determines whether the axis line is drawn, 0 for no line, 1 for line
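For a log-scaled x axis, a sketch using the macros above might be (values are illustrative; x values must be positive on a log scale):

```
Bitmap Initialize 700 500 255 255 255 255
* x from 1 (left) to 1000 (right), y from 0 (bottom) to 100 (top)
Plot Values 1 100 1000 0
Plot Logx 1
* mark and numerate the log x axis along the bottom (y = 0)
Plot AutoXLogScale 0 -10 3 1
```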
## Other methods of drawing axes
Users may wish to draw individual parts of the axes, and the following macros can be used
- y is the y value where the axis is to be marked
- begin is the value for the first mark
- interval is the interval between marks
- len is the length of the mark line in pixels, +ve downwards, -ve upwards
- x is the x value where the axis is to be marked
- start is the value for the first mark
- interval is the interval between marks
- len is the length of the mark line in pixels, +ve to the right, -ve to the left
- y is the y value for the axis
- start is the first value to be written
- interval is the interval between numerical scales
- gap is the space in pixels between the scale text and the axis, +ve for text below axis, -ve for text above axis
- The number of decimal points in the scale is the same as that of the interval value
- x is the x value for the axis
- start is the first value to be written
- interval is the interval between numerical scales
- gap is the space in pixels between the scale text and the axis, +ve for text to the right of axis, -ve for text to the left of axis
- The number of decimal points in the scale is the same as that of the interval value
Plot XMarkIntv y interval len marks the horizontal x axis with a series of vertical marks
- y is the y value of the axis
- interval is the interval between the marks, beginning at 0 and while in range
- len is the length of the mark line in pixels, +ve downwards, -ve upwards
Plot YMarkIntv x interval len marks the vertical y axis with a series of horizontal marks
- x is the x value of the axis
- interval is the interval between the marks, beginning at 0 and while in range
- len is the length of the mark line in pixels, +ve to the right, -ve to the left
Plot XScaleIntv y interval gap writes the numerical scales for the horizontal x axis
- y is the y value of the axis
- interval is the interval between the numerical scales, beginning at 0 and while in range
- gap is the space in pixels between the scale text and the axis, +ve for text below axis, -ve for text above axis
- The number of decimal points in the scale is the same as that of the interval value
Plot YScaleIntv x interval gap writes the numerical scales for the vertical y axis
- x is the x value of the axis
- interval is the interval between the numerical scales, beginning at 0 and while in range
- gap is the space in pixels between the scale text and the axis, +ve for text to the right of axis, -ve for text to the left of axis
- The number of decimal points in the scale is the same as that of the interval value
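Using the interval-based macros, an x axis can be assembled piecemeal. A sketch, assuming the value range shown (mark and scale spacings are illustrative):

```
Bitmap Initialize 700 500 255 255 255 255
Plot Values 0 100 10 50
* marks along the x axis at y = 50, every 1 unit, 10 pixels upwards
Plot XMarkIntv 50 1 -10
* numerical scale along the same axis, every 2 units, text 3 pixels below
Plot XScaleIntv 50 2 3
```

Because the interval value 2 has no decimals, the scale text is written as whole numbers.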
## Drawings
This sub-panel describes the macros that draw the plotting objects. Drawing is performed in two environments
- Macros that begin with the keyword **Bitmap** use pixel values as coordinates, where x is the number of pixels from the left border, and y the number of pixels from the top border
- Macros that begin with the keyword **Plot** use actual data values (as defined in the **Plot Values lv tv rv bv** macro) as coordinates
## Drawing lines
The thickness and color of any line drawn are set by the Line macros (see the settings sub-panel). The default setting is a black line 3 pixels in width
- x1 and x2 are number of pixels from the left border
- y1 and y2 are number of pixels from the top border
- x1 and x2 are data values for the horizontal variable x
- y1 and y2 are data values for the vertical variable y
Plot PixLine x y hpix vpix draws a line
- x and y are data values for the horizontal x value and vertical y value. This defines the coordinate of the origin of the line
- hpix is the number of pixels horizontally from the origin, +ve value to the right, -ve value to the left
- vpix is the number of pixels vertically from the origin, +ve value downwards, -ve value upwards
- The line is then drawn between the origin and that defined by hpix and vpix
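As an illustration of Plot PixLine (a sketch; the coordinates and offsets are arbitrary examples):

```
Bitmap Initialize 700 500 255 255 255 255
Plot Values 0 100 10 50
* from the data point (5, 75), draw a line 10 pixels to the right and 5 pixels upwards
Plot PixLine 5 75 10 -5
```

This is useful for short ticks or leader lines anchored to a data point but sized in pixels.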
## Drawing bars
The color and thickness of the outline are defined in the Line macro. The color of the fill is defined in the fill color and Fill Type macros. The default setting is black (0 0 0 255) for both line and fill color, and the Fill Type is set to 1, fill only with no outline. These settings are suitable for most circumstances, but users can change them if so required.
- w is the half width of the bar, so a VBar is 2w+1 pixels in width, and HBar is 2w+1 pixels in height
- The default value for w is 7 pixels (making width/height of 15 pixels), unless the user changes it
- x is the data value for the horizontal x variable. This is the center of the vertical bar
- y1 and y2 are values for the vertical y variable. They define the vertical ends of the bar
- hshift is the number of pixels the whole bar is shifted horizontally, -ve value to the left and +ve value to the right. In most cases this is 0 (no shift). However, if there is more than one bar in the same position, shifting some of them will avoid the bars overlapping and obscuring each other
- The half width of the vertical bar is set by default at 7 (width of bar = 15 pixels)
- x1 and x2 are data values for the horizontal x variable. They define the horizontal ends of the bar
- y is the value for the vertical y variable, and defines the center of the horizontal bar
- vshift is the number of pixels the whole bar is shifted vertically, -ve value upwards and +ve value downwards. In most cases this is 0 (no shift). However, if there is more than one bar in the same position, shifting some of them will avoid the bars overlapping and obscuring each other
- The half height of the horizontal bar is set by default at 7 (height of bar = 15 pixels)
## Drawing dots
There are only 2 dot types, circle and square. If more than 2 types of dots are required, they can be distinguished by the colors of the outline and fill, and by their sizes. Settings for dot parameters are in the settings sub-panel
- x and y are the number of pixels from the left and top border
- Radius is in number of pixels. The diameter of the dot is 2Radius+1 pixels
- x and y are the data values of the horizontal x variable and vertical y variable, as defined by **Plot Values lv tv rv bv**
- Radius is in number of pixels. The diameter of the dot is 2Radius+1 pixels
- hshift is the number of pixels the dot is shifted horizontally, -ve value to the left, +ve value to the right
- vshift is the number of pixels the dot is shifted vertically, -ve value upwards, +ve value downwards
- In most cases there is no shift (0 0), but if there is more than one dot in the same position, shifting avoids the dots superimposing on and obscuring each other
Dot Radius r sets the radius of the dot in pixels. The diameter of the dot is 2radius+1 pixels. The default radius is 5
- x and y are the data values of the horizontal x variable and vertical y variable, as defined by **Plot Values lv tv rv bv**
- hshift is the number of pixels the dot is shifted horizontally, -ve value to the left, +ve value to the right
- vshift is the number of pixels the dot is shifted vertically, -ve value upwards, +ve value downwards
- In most cases there is no shift (0 0), but if there are more than 1 dot in the same position, shifting avoids the dots superimposing over and obscuring each other
## Drawing text
The color, outline, fill, font, and weight of text are preset (see settings). The default settings are sans-serif, black fill only, and 16 pixels high
- x and y are numbers of pixels from the left and top borders, and together form the reference coordinate of the text
- ha is horizontal adjust
- ha=0: the left end of the text is at the x coordinate
- ha=1: the center of the text is at the x coordinate
- ha=2: the right end of the text is at the x coordinate
- va is vertical adjust
- va=0: the top of the text is at the y coordinate
- va=1: the center of the text is at the y coordinate
- va=2: the bottom of the text is at the y coordinate
- txt is the text to be drawn. It must be a single word with no gaps. Spaces can be represented by the underscore _
- x and y are data values as defined by **Plot Values lv tv rv bv**, and together form the reference coordinate of the text
- ha is horizontal adjust
- ha=0: the left end of the text is at the x coordinate
- ha=1: the center of the text is at the x coordinate
- ha=2: the right end of the text is at the x coordinate
- va is vertical adjust
- va=0: the top of the text is at the y coordinate
- va=1: the center of the text is at the y coordinate
- va=2: the bottom of the text is at the y coordinate
- txt is the text to be drawn. It must be a single word with no gaps. Spaces can be represented by the underscore _
- hshift is the number of pixels the text is shifted horizontally, -ve value to the left, +ve value to the right
- vshift is the number of pixels the text is shifted vertically, -ve value upwards, +ve value downwards
- In most cases there is no shift (0 0), but if there are other structures in the same position, shifting avoids the text and structures obscuring each other
- x and y are numbers of pixels from the left and top borders, and together form the reference coordinate of the text
- ha is horizontal adjust
- ha=0: the left end of the text is at the x coordinate
- ha=1: the center of the text is at the x coordinate
- ha=2: the right end of the text is at the x coordinate
- va is vertical adjust
- va=0: the top of the text is at the y coordinate
- va=1: the center of the text is at the y coordinate
- va=2: the bottom of the text is at the y coordinate
- txt is the text to be drawn. It must be a single word with no gaps. Spaces can be represented by the underscore _
- x and y are data values as defined by **Plot Values lv tv rv bv**, and together form the reference coordinate of the text
- ha is horizontal adjust
- ha=0: the left end of the text is at the x coordinate
- ha=1: the center of the text is at the x coordinate
- ha=2: the right end of the text is at the x coordinate
- va is vertical adjust
- va=0: the top of the text is at the y coordinate
- va=1: the center of the text is at the y coordinate
- va=2: the bottom of the text is at the y coordinate
- hshift is the number of pixels the text is shifted horizontally, -ve value to the left, +ve value to the right
- vshift is the number of pixels the text is shifted vertically, -ve value upwards, +ve value downwards
- In most cases there is no shift (0 0), but if there are other structures in the same position, shifting avoids the text and structures obscuring each other
## Other miscellaneous drawings
Bitmap Arc x y radius startDeg endDeg rotate draws an arc.
- x and y are number of pixels from the left and top border, and together form the center of the arc
- radius is the radius of the arc, in number of pixels
- startDeg and endDeg are the degrees (360 degrees in full circle) of the arc
- rotate defines the direction of the arc, 0 for clockwise, 1 for anti-clockwise
Bitmap Wedge x y radius startDeg endDeg shift rotate draws a wedge, essentially an arc with lines to the center
- x and y are number of pixels from the left and top border, and together form the center of the wedge
- radius is the radius of the wedge, in number of pixels
- startDeg and endDeg are the degrees (360 degrees in full circle) of the wedge
- shift is the number of pixels that the wedge is moved centrifugally (away from the center). This is used in pie charts to separate the wedges of the pie
- rotate defines the direction of the wedge, 0 for clockwise, 1 for anti-clockwise
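For example, a simple two-wedge pie might be sketched as follows (the center, radius and degree values are illustrative):

```
Bitmap Initialize 700 500 255 255 255 255
* first wedge: centered at (350, 250), radius 100 pixels, 0 to 120 degrees, pushed 10 pixels out from the center
Bitmap Wedge 350 250 100 0 120 10 0
* second wedge: the remaining 120 to 360 degrees, no shift
Bitmap Wedge 350 250 100 120 360 0 0
```

The shift on the first wedge separates it from the rest of the pie, as described above.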
Plot Curve a b1 b2 b3 b4 b5 x1 x2 draws a polynomial curve
- The curve is y = a + b1x + b2x^2 + b3x^3 + b4x^4 + b5x^5. Where a higher power is not needed, 0 is used for its coefficient b
- The curve is drawn for data values of x from x1 to x2
Plot Normal mean sd height draws a normal distribution curve
- mean and sd (Standard Deviation) are as in the horizontal data variable x
- height is the maximum height (where x=mean) of the curve as in the vertical variable y
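A sketch combining the two curve macros (coefficients, mean, SD and height are illustrative values):

```
Bitmap Initialize 700 500 255 255 255 255
Plot Values 0 100 10 0
* the parabola y = 2 + 0.5x + 0.25x^2, with unused higher powers set to 0, drawn for x from 0 to 10
Plot Curve 2 0.5 0.25 0 0 0 0 10
* a normal curve with mean 5 and SD 1.5, peaking at y = 80 where x = mean
Plot Normal 5 1.5 80
```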
## Color Palettes
- Plain Colors : a table of the colors used on this web site
- Patterns of complementary colors
Explanations
This page provides explanations, clarifications, and support for Linear Discriminant Analysis, as used in the Javascript program and the R code
Javascript Program
Discriminant Analysis is clearly and succinctly described in Wikipedia, so only a brief description is provided here, to help the user understand the data input required and interpret the results.

## Different approaches to calculations
Discriminant Analysis was introduced by Fisher in the 1930s as a variant of the multiple regression model, but with multinomial groups as the dependent variable and normally distributed measurements as independent variables. It enjoys continued usage, but over time many modifications and additions have been made, producing options for users that yield results that are conceptually similar but numerically different. This confusion is further aggravated by some statistical packages keeping to the original algorithm, some including useful additions and modifications, and some providing menus from which users can choose different options. The following comments attempt to clarify some of these issues.
- The original algorithm used the actual measurements of the independent variables and their covariance matrix for Principal Component extraction. The problem is that the results can be distorted if the scales of the different variables differ widely. For example, using height in inches and weight in pounds will produce numerically different results from the same data using cm and kg. Increasingly, therefore, the original measurements are normalized to Standard Deviation units z, where z = (value - mean) / SD, so that all independent variables are converted to have a mean of 0 and a Standard Deviation of 1 before they are used. Statistical packages vary in how this problem is managed. The algorithm described on this page allows actual values in data input, but converts these to z values for calculating the Discriminant functions. Calculating the z values therefore requires the means and Standard Deviations estimated from the modelling data.
- The number of Linear Discriminant functions created by the original algorithm is one less than the number of outcome groups (nf = ng - 1). Traditionally, all functions are used to estimate the probability of a case (row of data) belonging to each group. However, because the functions are essentially transforms of Principal Components, the functions extracted earlier have greater discriminating power than the later ones, which contain mostly statistical noise. Ignoring the more trivial functions therefore creates only minor distortions in the numerical results, and in most cases does not alter how the results are interpreted. The Javascript program on this page follows the common convention: all functions are used in estimating group probability during the validation of the data, and copied to templates for future use. However, the statistical significance of each function is estimated using the chi square test, and functions that are not statistically significant are identified. This allows the user, if he/she so wishes, to use a reduced set of functions (only the statistically significant ones) on future data.
- There are two major reasons for using Discriminant analysis. Firstly, to analyse the structure of the data itself, and interpret the results as scientific realities represented by the data. Secondly, to use the data to create a model, and use that model to interpret future and different sets of similar measurements. Both approaches require the allocation of outcome groups, based on the probabilities calculated from the Discriminant functions. How the probabilities are calculated, however, differs according to the purpose of the analysis. If the purpose is to analyse and interpret the structure of the data, then a priori probabilities are not included, so that the structure of the model is not distorted. This is the method of calculation in program 1, during the validation of the model. If the purpose is to use an already developed model to interpret and classify future and additional data, then the accuracy of prediction is the priority, and the inclusion of a priori probabilities is appropriate, as in program 2 of the Javascript program. To demonstrate this difference, imagine an analysis relating a set of symptoms to discriminate between headache caused by tension and headache caused by a brain tumor. To study the relationship between symptoms and diagnosis, a data set representative of the clinical scenario (say from medical records) is used, which contains similar numbers of cases of tension and of brain tumor. From this data, the relationship between symptoms and eventual diagnosis can be established (Maximum Likelihood). When the model is used clinically to make a diagnosis, the fact that headache caused by tension is many times more common than headache caused by tumours must be taken into consideration, and a more accurate diagnosis requires the inclusion of the a priori probabilities of the two conditions (Bayesian probability). Statistical packages differ in the inclusion of a priori probabilities. Many include a priori probabilities by default, estimated from the sample sizes of the outcome groups in the reference data. Some use Maximum Likelihood as the default. Some require users to insert the a priori probabilities. All these approaches are correct in the right context, leaving the options with the user. The Javascript program allows both approaches. In program 1, when the model is developed, the validation examines the structure of the data, so no a priori probabilities are included.
In program 2, when the developed model is used to describe or predict new and additional data, there are provisions for the user to include a priori probabilities
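The distinction between the two programs can be written compactly. With L(x|g) the likelihood of the measurements x under group g, and π_g the a priori probability of group g (notation introduced here for clarity, not taken from the program's output):

```latex
% Maximum Likelihood (program 1): no a priori weighting
P(g \mid x) = \frac{L(x \mid g)}{\sum_{h} L(x \mid h)}

% Bayesian (program 2): likelihoods weighted by a priori probabilities
P(g \mid x) = \frac{\pi_g \, L(x \mid g)}{\sum_{h} \pi_h \, L(x \mid h)}
```

When all π_g are equal, the two formulas coincide, which is why the choice only matters when the outcome groups have very different prevalences.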
## References
## General references on Discriminant analysis
Wikipedia on Discriminant Analysis.
George D and Mallery P (1999) SPSS for Windows Step by Step. A Simple Guide and Reference. Allyn and Bacon, Sydney. ISBN 0-205-28395-0. Chapter 26: The Discriminant Procedure p.313-328.
## Resources used to develop the web-based Javascript program
These are very old books that still present the actual formulae and algorithms for all steps in the calculations. Most newer references do not provide detailed algorithms, but advise users to access available packages such as SAS, SPSS, R, and Python.
Overall JE and Klett CJ (1972) Applied Multivariate Analysis. McGraw Hill Series in Psychology. McGraw Hill Book Company, New York. Library of Congress No. 73-147164, 07-047935-6
- Chapter 2 p.24-56 : Matrix math, particularly the Square Root method of matrix inversion; also calculations for between and within group Sum of Products and Covariance matrices
- Chapter 10 p.280-306 : Multiple Discriminant Analysis, particularly the algorithm
- Chapter 13 p.345-371 : Normal Probability Density Model for classification
- Chapter 14 p.373-383 : Use of Canonical Correlates for Classification. Chapters 13 and 14 provided the algorithms for calculating the Maximum Likelihood and Bayesian probabilities
Press WH, Flannery BP, Teukolsky SA, Vetterling WT (1989) Numerical Recipes in Pascal. Cambridge University Press. ISBN 0-521-37516-9. p.395-396 and p.402-404 : Jacobi method for finding Eigen values and Eigen vectors.
Norusis MJ (1979) SPSS Statistical Algorithms Release 8. SPSS Inc, Chicago. Chapter 23 : Discriminant p.69-83. Formulae for the algorithm provided by SPSS.
## Useful references and advice on Discriminant analysis using R
https://www.geeksforgeeks.org/linear-discriminant-analysis-in-r-programming/
https://www.statmethods.net/advstats/discriminant.html
Venables WN and Ripley BD (2002) Modern Applied Statistics with S. Fourth edition. Springer.
Help & Hints
R Code
This panel provides support for data entry and interpretation of results from the Javascript program.
Calculations
There are two programs.
- Program 1 is the primary discriminant analysis using a set of modelling data, creating parameters and coefficients that can be used to interpret future new data.
- Program 2 is intended for using coefficients and parameters previously established to analyse new sets of data, producing discriminant function values and classifying cases into groups. It provides data inputs for the parameters and coefficients previously created, and the new data to be analysed.
- On this page, Program 2 is executed automatically as a cascade from Program 1, with all the coefficients and parameters established in Program 1 copied into the appropriate inputs for Program 2.
## Default Example
## Data Entry
The data are entered as a table
- All rows must have the same number of columns, each row represents a wine
- Each column represents a predictor variable, in this example, in order, tannin, color, acidity, and sugar
- The last column on the right contains the group labels, and is treated as text: a single character or word with no gaps. In this example DR for dry red, DW for dry white, SR for sweet red and SW for sweet white.
## Result Output
Groups are organized in alphabetical order, and in this example DR=1, DW=2, SR=3, SW=4. Predictor variables are in the same order as the columns in data entry. Discriminant functions are ordered by magnitude (statistical significance), the most significant being f1, then f2, f3, etc. in decreasing significance. The mark # designates functions that are not statistically significant and can be ignored in subsequent calculations.
## Step 1: Data Definition
The entered data are summarised. In this example, number of cases (rows) = 16, number of variables = 4, and number of groups = 4. Table 1a shows the group names in alphabetical order and the number of cases in each group. Table 1b shows the means and Standard Deviations of the variables, in the same order as the columns.
## Step 2: Create Discriminant functions, coefficients and parameters
The values of the predictor variables are converted to z values, where z = (value - mean) / SD, and used in all subsequent calculations. In other words, all values are transformed to have a mean of 0 and SD of 1. An analysis of variance and covariance is carried out to obtain the within group covariance matrix, which is used to produce the Eigen values and vectors in order of magnitude.
Using the Eigen values and chi square, the statistical significance of the functions, in decreasing order of chi square and significance, is calculated, as shown in the table to the left. Given that there are 4 outcome groups (ng = 4), the maximum number of functions is ng - 1 = 3. However, only the first 2 functions are statistically significant (p<0.05)
The eigenvectors are then used to establish the function coefficients, which are shown in the table to the right. Each function value is the sum, over the variables, of the products of each z value and the corresponding function coefficient. Please note that all function coefficients are presented; the third function is not statistically significant in this example and can be ignored as trivial.
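As a numeric illustration of a function score (the z values and coefficients below are made up for the sketch, not taken from the fitted wine model):

```python
# Illustrative sketch: one function score is sum(z value x coefficient)
# over the predictors. The numbers are hypothetical.
z = [0.82, 0.10, -0.83, -0.38]       # z values of one case's 4 predictors
coef = [1.37, 1.72, -0.67, -1.58]    # coefficients of one discriminant function

score = sum(zi * ci for zi, ci in zip(z, coef))
print(round(score, 4))  # 2.4519
```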
Using the function coefficients and the mean z values of each variable in each group, the centroid value of each function (equivalent to the mean) in each group is estimated, as shown in the table to the left. Again, values for all 3 functions from this example are calculated, but the third one, being not statistically significant, can be ignored as trivial. At this point, the initial analysis is completed. The group names, the mean and Standard Deviation of each predictor variable, the function coefficients, and the centroid values collectively represent the parameters and coefficients of the Discriminant model, and can be used to calculate discriminant functions and allocate any similar data to groups.

## Step 3: Validating Analysis
The parameters and coefficients developed are now used to evaluate the modelling data, and check whether the Discriminant calculation predicts the same group as that designated in the modelling data.
Each row of the modelling data is analysed in turn. The last column is the designated group, and the other columns are the predictor variables. All functions (3 in this example) are used in the calculations, although the third function is non-significant and, if ignored, will only make trivial differences to the results.
- The values of the predictor variables are converted into z values, where z = (value - mean) / SD
- The function scores are then calculated, each being sum(z x function coefficient) for that function
- The distance between the function scores and the centroids of each group is then calculated as d = sum((function score - centroid value)^2)
- As d represents squared Standard Deviations from the centroid, the probability of the distance being 0 from a centroid is calculated as exp(-d / 2). This is the probability of belonging to each group
- The probabilities of the 4 groups in this example are then normalized to a total of 1 (maximum Likelihood Ratio), and the group with the highest probability is assigned as the calculated group.
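The allocation steps above can be sketched as follows (a minimal illustration; the scores and centroids are hypothetical numbers, not the fitted wine model):

```python
import math

# Hypothetical function scores for one case (2 discriminant functions)
scores = [0.5, -1.0]

# Hypothetical centroids of the 4 groups, one pair per function
centroids = {
    "DR": [2.0, 0.0],
    "DW": [0.0, 2.0],
    "SR": [0.4, -1.2],
    "SW": [-2.0, 0.0],
}

# d = sum over functions of (score - centroid)^2, the squared distance
d = {g: sum((s - c) ** 2 for s, c in zip(scores, cen))
     for g, cen in centroids.items()}

# probability of belonging to each group: exp(-d / 2)
p = {g: math.exp(-dist / 2) for g, dist in d.items()}

# normalize so the probabilities total 1, then assign the highest
total = sum(p.values())
p = {g: v / total for g, v in p.items()}
assigned = max(p, key=p.get)
print(assigned)  # SR: the smallest distance, hence the highest probability
```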
The results show that the Discriminant functions correctly assign all the cases. This is not surprising, as the data were artificially created to discriminate the groups well. When real data are used, particularly when large numbers of cases are involved, the random variations involved mean that a proportion of erroneous allocations often occurs.

## Step 4: Transfer Parameters and Coefficients as Input Data for Program 2
The tables of means and Standard Deviations, the function coefficients, and the centroid values represent the Discriminant model developed from the modelling data. These are copied to the appropriate text areas of program 2, so they can be used to analyse additional and new data in the future. Should the user intend to use the coefficients in the future, it would be appropriate to archive these parameters and coefficients, so they can be used in future analyses. Should the user wish to use only the significant functions, the last column of the function coefficients and centroid values can be deleted, resulting in the table as follows. Furthermore, the results can be further altered by Bayesian probability, if the a priori probabilities of the groups are defined. The calculations, using all functions and the same a priori probability for all groups, produce the same results as when the data were modelled.
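The Bayesian adjustment mentioned above amounts to weighting each group's probability by its a priori probability before re-normalizing. A sketch with hypothetical numbers:

```python
# Hypothetical normalized probabilities from the discriminant distances
likelihood = {"DR": 0.5, "DW": 0.3, "SR": 0.15, "SW": 0.05}

# Hypothetical a priori probabilities of the groups (e.g. known prevalence)
prior = {"DR": 0.1, "DW": 0.4, "SR": 0.4, "SW": 0.1}

# Posterior is proportional to likelihood x prior, normalized to total 1
post = {g: likelihood[g] * prior[g] for g in likelihood}
total = sum(post.values())
post = {g: v / total for g, v in post.items()}

print(max(post, key=post.get))  # DW: the prior shifts the assignment
```

Note that with equal priors for all groups the posterior is identical to the normalized likelihood, which is why equal a priori probabilities reproduce the modelling results.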
## Step 5: Analyse the Modelling Data
The predictor variables of the reference data are copied to the data input box of program 2 and analysed. The same table of results as in the validating exercise is produced, as shown in the table to the right. The same input data can be loaded by clicking the Example button. However, this is merely to demonstrate how program 2 can be used; the intention is that users may wish to enter new data here for analysis. To demonstrate that the third, non-significant discriminant function is trivial and unnecessary, the calculations are repeated after the third column is deleted from the function and centroid matrices, and the results are shown in the table to the right. Compared with the validating results, the probability coefficients are very similar, and the group designations are unchanged.
This is essentially a repeat of the validating exercise, and is carried out to demonstrate how new data can be analysed.
## Step 6: Plotting Data
The user can designate any two functions to be plotted in an x/y scatter plot. The program will first mark out in color the areas occupied by each group in the plot, then plot the function scores from the two functions of each case as an x/y scatter plot. The intention is to allow a visual display of the data in any two functions, and how they are related to each group. Ten default background colors are available (assuming that no more than 10 groups will be required in any single Discriminant Analysis). The colors are assigned in alphabetical order of group names. The plot using the example data, with x=function 1 and y=function 2, is shown above and to the left. The 4 cases from each group can be seen to fit clearly within the areas for each group, with DR (group 1) in green, DW (group 2) in red, SR (group 3) in blue, and SW (group 4) in yellow. It can be seen that function 1 can clearly separate DR (grp 1, green), SR (grp 3, blue), and SW (grp 4, yellow). Data points from DW (grp 2, red), however, overlap other groups, and require the additional function 2 to separately identify them.

## Program 1. Produce Discriminant Coefficients from Reference Data
Data Input for Discriminant Analysis Using Reference Data
The data are for a single analysis:
- It is a table of multiple columns
- Each row contains data from a case or a record
- All columns except the last are predictors and must be numerical
- The last column (on the right) is the outcome group name, a single character or text with no gaps
## Program 2. Using Discriminant Coefficients for Classification
The Linear Discriminant analysis using R was carried out to check the accuracy of the Javascript program. Only the minimum amount of coding is used. Users can search R for the numerous versions of calculation and graphic support for Discriminant analysis.
Please note that R code is in maroon, and results are in blue.
```r
myDat = ("
 v1   v2  v3    v4    Grp
 1.2  45  3.16  72.7  SR
 1.3  67  3.38  102.4 SR
 1.1  48  3.61  33.7  SR
 1.6  36  3.51  58.2  SR
 1.5  47  3.20  44.2  DR
 1.5  74  3.21  91.8  DR
 1.7  47  3.39  53.1  DR
 1.6  56  3.36  88.5  DR
 1.1  27  3.30  36.3  SW
 1.0  53  3.55  74.7  SW
 0.9  37  3.23  94.2  SW
 1.2  23  3.07  53.8  SW
 1.4  44  3.34  20.7  DW
 1.3  34  3.24  9.5   DW
 1.1  37  3.24  17.8  DW
 1.4  55  3.35  35.9  DW
")
myDataFrame <- read.table(textConnection(myDat), header=TRUE)
```
Please note that the headers are included, as R requires them to call the algorithm.
```r
myDataFrame$z1 <- (myDataFrame$v1 - mean(myDataFrame$v1)) / sd(myDataFrame$v1)
myDataFrame$z2 <- (myDataFrame$v2 - mean(myDataFrame$v2)) / sd(myDataFrame$v2)
myDataFrame$z3 <- (myDataFrame$v3 - mean(myDataFrame$v3)) / sd(myDataFrame$v3)
myDataFrame$z4 <- (myDataFrame$v4 - mean(myDataFrame$v4)) / sd(myDataFrame$v4)
```
Step 3. Display the data object, including the calculated z values
```r
myDataFrame
```
The results are:
```
    v1 v2   v3    v4 Grp          z1         z2         z3          z4
1  1.2 45 3.16  72.7  SR -0.45185501 -0.0460777 -1.1033570  0.58784387
2  1.3 67 3.38 102.4  SR -0.02657971  1.5758573  0.4019983  1.60105898
3  1.1 48 3.61  33.7  SR -0.87713031  0.1750953  1.9757788 -0.74264062
4  1.6 36 3.51  58.2  SR  1.24924619 -0.7095966  1.2915264  0.09317656
5  1.5 47 3.20  44.2  DR  0.82397089  0.1013709 -0.8296560 -0.38443326
6  1.5 74 3.21  91.8  DR  0.82397089  2.0919275 -0.7612308  1.23944012
7  1.7 47 3.39  53.1  DR  1.67452149  0.1013709  0.4704235 -0.08080988
8  1.6 56 3.36  88.5  DR  1.24924619  0.7648898  0.2651478  1.12686066
9  1.1 27 3.30  36.3  SW -0.87713031 -1.3731154 -0.1454036 -0.65394166
10 1.0 53 3.55  74.7  SW -1.30240561  0.5437168  1.5652273  0.65607384
11 0.9 37 3.23  94.2  SW -1.72768091 -0.6358722 -0.6243803  1.32131609
12 1.2 23 3.07  53.8  SW -0.45185501 -1.6680127 -1.7191841 -0.05692938
13 1.4 44 3.34  20.7  DW  0.39869559 -0.1198020  0.1282973 -1.18613545
14 1.3 34 3.24   9.5  DW -0.02657971 -0.8570452 -0.5559551 -1.56822331
15 1.1 37 3.24  17.8  DW -0.87713031 -0.6358722 -0.5559551 -1.28506892
16 1.4 55 3.35  35.9  DW  0.39869559  0.6911655  0.1967226 -0.66758765
```
Please note: z1, z2, z3, and z4 are the z values for v1, v2, v3, and v4.

Step 4.
Perform Linear Discriminant analysis and display results
```r
# install.packages("MASS")  # if not already installed
library(MASS)
fit <- lda(Grp ~ z1 + z2 + z3 + z4, data=myDataFrame)
fit
```
Please note: the calculations are based on the z values and not the original measurements.
```
Prior probabilities of groups:
  DR   DW   SR   SW
0.25 0.25 0.25 0.25

Group means:
            z1         z2         z3         z4
DR  1.14292737  0.7648898 -0.2138289  0.4752644
DW -0.02657971 -0.2303885 -0.1967226 -1.1767538
SR -0.02657971  0.2488196  0.6414866  0.3848597
SW -1.08976796 -0.7833209 -0.2309352  0.3166297

Coefficients of linear discriminants:
          LD1        LD2        LD3
z1  1.3710438 -0.7462890  0.1364612
z2  1.7208986  0.4156835 -0.2198221
z3 -0.6709095 -0.2961205 -0.8885895
z4 -1.5797134 -1.4082862  0.2128178

Proportion of trace:
   LD1    LD2    LD3
0.7899 0.1899 0.0202
```
The prior probabilities are calculated from the sample sizes of the groups. LD1, LD2, and LD3 are the 3 Linear Discriminant functions. The proportion of trace represents the proportion of discriminating power of each function, and can be used to test for statistical significance.
```r
predict(fit, newdata=myDataFrame, prior=c(1,1,1,1)/4)$x  # calculate function scores
```
Please note that the prior term is actually unnecessary here, as it is not used when function scores are calculated. When the data object is passed as newdata, a separate set of data can be used, provided that the appropriately labelled independent variables are present (in this example z1, z2, z3, and z4).
```
          LD1        LD2          LD3
1  -0.8871802 -0.1830651  1.054003243
2  -0.1234701 -1.6988951 -0.366512945
3  -1.0536723  1.1881588 -2.071887461
4  -0.5220622 -1.7409330 -0.801348480
5   2.4680678  0.2142880  0.745565901
6   3.2824521 -1.2654109  0.592784805
7   2.2823362 -1.2330374 -0.228987474
8   1.0710619 -2.2798046  0.006542448
9  -2.4349834  1.0478052  0.172180524
10 -2.9365081 -0.1894505 -1.548469220
11 -5.1313958 -0.6508717  0.740034619
12 -2.2466446  0.2331076  1.820538675
13  2.1281402  1.2850848 -0.285692745
14  1.3390091  2.0367135  0.345040379
15  0.1061805  2.3646456  0.240614788
16  2.6586689  0.8716648 -0.414407058
```
Step 6. Calculate the posterior (Bayesian) probability of belonging to each group
```r
predict(fit, newdata=myDataFrame, prior=c(1,1,1,1)/4)$posterior  # calculate posterior (Bayesian) probabilities
```
Please note that the prior term for a priori probability is used here. If it is left out, the program assumes the prior probabilities are the same as those in the reference data, depending on the sample sizes of the groups there.
```
             DR           DW           SR           SW
1  1.027719e-02 1.738244e-02 0.8056435151 1.666969e-01
2  7.584478e-02 1.695596e-03 0.9196816594 2.777961e-03
3  2.543569e-04 5.741198e-02 0.8883330540 5.400061e-02
4  1.793459e-02 5.428730e-04 0.9760628920 5.459645e-03
5  6.615467e-01 3.338972e-01 0.0045559379 1.919581e-07
6  9.948814e-01 4.791425e-03 0.0003272035 5.284128e-10
7  9.744939e-01 1.355788e-02 0.0119480765 1.254452e-07
8  8.326310e-01 1.399772e-03 0.1659475333 2.173213e-05
9  2.634318e-06 5.441063e-04 0.0758854570 9.235678e-01
10 7.201969e-07 1.160473e-05 0.1924352689 8.075524e-01
11 9.464596e-12 1.011071e-10 0.0001827173 9.998173e-01
12 2.029302e-05 2.288982e-04 0.0560473803 9.437034e-01
13 5.420201e-02 9.416221e-01 0.0041755258 3.779071e-07
14 4.865341e-03 9.917932e-01 0.0033348804 6.545613e-06
15 7.663001e-04 9.729023e-01 0.0250258317 1.305537e-03
16 2.029808e-01 7.940588e-01 0.0029603250 4.639190e-08
```
When translated to normal numerical format with 2-decimal precision:
```
     DR   DW   SR   SW
1  0.01 0.02 0.81 0.17
2  0.08 0.00 0.92 0.00
3  0.00 0.06 0.89 0.05
4  0.02 0.00 0.98 0.01
5  0.66 0.33 0.00 0.00
6  0.99 0.00 0.00 0.00
7  0.97 0.01 0.01 0.00
8  0.83 0.00 0.17 0.00
9  0.00 0.00 0.08 0.92
10 0.00 0.00 0.19 0.81
11 0.00 0.00 0.00 1.00
12 0.00 0.00 0.06 0.94
13 0.05 0.94 0.00 0.00
14 0.00 0.99 0.00 0.00
15 0.00 0.97 0.03 0.00
16 0.20 0.79 0.00 0.00
```
The function coefficients, scores, and probabilities are the same as those produced by the Javascript program, apart from minor discrepancies caused by different rounding errors.