Copyright @2020.
All Rights Reserved.

StatsToDo: Generalized Linear Models




Generalized Linear Models (GLM) calculate regression coefficients relating a dependent variable to multiple independent variables. The predicted value of the dependent variable (on the scale of the link function appropriate to its distribution) is a linear combination of the products of each coefficient and the value of its independent variable, so the model has the same appearance as that of multiple regression or analysis of covariance.

A particular advantage of the models is that the algorithms can cope with a combination of factors (group names in text) and values (numerical data), as well as dependent variables from a variety of probability distributions. This allows the models to be highly complex, provided the probability distribution of the data is correctly assumed.
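As a minimal sketch of how a factor and a numerical value can be combined in one model, the following R code fits a Gaussian model with glm() and then checks that each fitted value is the sum of coefficient times value. The data and the names Group, Dose, and Y are invented for illustration and do not come from the panels of this page.

```r
# Hypothetical example data: Group is a factor, Dose is a numerical value
dat <- data.frame(
  Group = factor(c("A", "A", "A", "B", "B", "B", "C", "C", "C")),
  Dose  = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  Y     = c(2.1, 3.9, 6.2, 3.0, 5.1, 6.8, 4.2, 5.9, 8.1)
)

# Gaussian GLM with one factor and one numerical independent variable
fit   <- glm(Y ~ Group + Dose, family = gaussian(), data = dat)
coefs <- coef(fit)   # one coefficient per column of the design matrix

# The factor is expanded into indicator (0/1) columns in the design matrix
X <- model.matrix(fit)

# With the identity link, fitted value = sum of (coefficient x value)
manual <- as.vector(X %*% coefs)
same   <- isTRUE(all.equal(manual, as.vector(fitted(fit))))
print(coefs)
print(same)
```

With R's default treatment contrasts, the factor Group contributes one coefficient for each level other than the first, while Dose contributes a single slope.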

It is beyond the scope of this page to provide a full explanation of GLM, or a web-page-based program for it, as the subject is vast and the calculations are complex.

R code examples are offered on this page, as these algorithms are already fully developed, tested, and accepted. Only brief explanations, to help users negotiate the R code, are offered.

For those new to using R, the page R_Exp provides an introduction and help with installing the R packages.

For a full explanation and exploration of this area of statistics, users are referred to the references as a starting point. Users are also reminded to seek advice and assistance from experienced statisticians if they are new to this area of statistics.

Programs

The panels of this page provide R code for regression with the following distributions for the dependent variable:

  • Gaussian, where the dependent variable is normally distributed. This model is also termed the General Linear Model, not to be confused with Generalized Linear Models, the generic name for all the models presented here
    • When all the independent variables are factors (group names in text), the results are similar to those produced by Analysis of Variance
    • When all the independent variables are values (numerical measurements), the results are similar to those of Multiple Regression
    • When the independent variables are a mixture of factors and values, the results are similar to those of Analysis of Covariance
  • Proportions. Three models of regression for proportions are available
    • Binomial, where the dependent variable is the probability of being in one of two groups (no/yes, false/true)
    • Multinomial, the same as Binomial, except there are more than 2 groups
    • Ordinal, the same as Multinomial, except the groups are ordered
  • Poisson, where the dependent variable is a count of events in a defined environment, and where the variance is assumed to equal the mean count
  • Negative Binomial, where the dependent variable is the number of negative cases per positive case. The Negative Binomial distribution can also be used more generally for count data that fail to conform to the Poisson distribution, typically because the variance is greater than the mean
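
The list above can be sketched as a set of R calls, one per distribution. This is only an illustration under stated assumptions: the data are simulated, the variable names are invented, and the choice of glm() from base R, glm.nb() and polr() from the MASS package, and multinom() from the nnet package is one common option. The panels of this page may use different functions or options.

```r
library(MASS)   # glm.nb() and polr()
library(nnet)   # multinom()

set.seed(1)     # simulated data, purely illustrative
n <- 200
x <- rnorm(n)   # a single numerical independent variable

# One simulated dependent variable per distribution
y_gauss <- 1 + 2 * x + rnorm(n)
y_bin   <- rbinom(n, size = 1, prob = plogis(0.5 * x))
y_count <- rpois(n, lambda = exp(0.3 + 0.5 * x))
y_over  <- rnbinom(n, size = 2, mu = exp(0.3 + 0.5 * x))
y_cat   <- factor(sample(c("a", "b", "c"), n, replace = TRUE))
y_ord   <- cut(x + rnorm(n), breaks = c(-Inf, -0.5, 0.5, Inf),
               labels = c("low", "mid", "high"), ordered_result = TRUE)

fit_gauss <- glm(y_gauss ~ x, family = gaussian())  # Gaussian
fit_bin   <- glm(y_bin ~ x, family = binomial())    # Binomial (logistic)
fit_pois  <- glm(y_count ~ x, family = poisson())   # Poisson
fit_nb    <- glm.nb(y_over ~ x)                     # Negative Binomial
fit_multi <- multinom(y_cat ~ x, trace = FALSE)     # Multinomial
fit_ord   <- polr(y_ord ~ x, Hess = TRUE)           # Ordinal

# Each fitted object can then be inspected, for example with summary(fit_pois)
```

In each case the formula on the right of the tilde is the same; only the function or the family argument changes with the assumed distribution of the dependent variable.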

Format

Each program is described in one of the subsequent panels. Each panel contains an introduction and a program template, each in a sub-panel.

Each program is provided with a set of example data, to demonstrate the procedures. Please note:

  • The research model is deliberately simplistic, so the user is not distracted from the computation.
  • The data are computer generated and do not represent anything real.
  • The sample size is deliberately small, to make visualization easier.
The R code in each program is broken up into its constituent steps, and each step contains:
  • Description of the step (in black)
  • The R code (in Maroon)
  • The results from that step (in Navy blue)
To reconstitute the whole program, the user should:
  1. Copy all the code in Maroon to the source code panel of RStudio, in the same order as in the template. This should include the example data
  2. Test run the program to make sure it works
  3. Change the example data to the user's own data
  4. Repeat cycles of test running and editing the code (adding, deleting, and modifying) until the required results are produced.
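
As a rough sketch of what a reconstituted program might look like once pasted into RStudio, the following R code places the example data first, then the model, then the results. The data, the variable names (Treat, Age, Outcome), and the model are hypothetical and are not taken from any template on this page.

```r
# Step 1: example data (hypothetical); replace this block with your own data
myDat <- read.table(header = TRUE, stringsAsFactors = TRUE, text = "
Treat  Age  Outcome
A      25   12.1
A      31   13.4
A      40   15.2
A      52   17.9
B      27   14.8
B      35   16.1
B      44   18.7
B      50   19.5
")

# Step 2: check that the data were read as intended
str(myDat)   # Treat should be a factor; Age and Outcome should be numeric

# Step 3: fit the model (glm() defaults to the Gaussian family)
fit <- glm(Outcome ~ Treat + Age, data = myDat)

# Step 4: display the basic results
print(summary(fit))
```

Keeping the data block at the top and separate from the model makes it easier to swap in the user's own data without disturbing the rest of the code.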

References

Please note that each R code template contains only the minimum amount of code, and produces only the basic results. Users may want to produce a more complete program, including the many options that are available with each procedure.

References are provided in the Explanation panel of each program for this purpose.