Copyright ©2020. All Rights Reserved.
Explanations and References

Currently, multivariate logistic regressions (binomial, multinomial, or ordinal) are used to establish the regression relationship between one or more independent variables and a probability (proportion, risk) as the dependent variable. These algorithms are flexible and widely accepted, but they require specialized software and an understanding of complex multivariate statistics. StatsToDo presents some code samples in R for those who wish to access these algorithms (see Index Subjects).
This page provides an earlier algorithm to perform simple linear regression between a single ordinal predictor and an outcome that is a proportion. The calculations are based on the Chi Square distribution.
The entry data consist of 3 columns: X (the ordinal predictor, here the year), NPos (the number of positive outcomes), and NNeg (the number of negative outcomes).
The data in the example were artificially created to demonstrate the procedure and are not real. They purport to be from a study of business failures over the years.
The data were compiled, and the probability of failure (proportion, risk, Ppos) calculated. The results are presented in the table to the right. It can be seen that the failure rates were 9.1% for 1990, 13.8% for 1992, and 16.9% for 1995, and that the overall failure rate was 14.9%.
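The overall rate quoted above is simply the pooled count of failures divided by the pooled total. A quick check in R, using the counts from the example data entered in Section 1 below:

```r
# Overall failure rate = total failures / total businesses
# (counts from the example data: 10/110, 8/58, 61/361)
sum(c(10, 8, 61)) / sum(c(110, 58, 361))  # 79 / 529 = 0.1493, i.e. 14.9%
```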
The program now partitions the Chi Square, as shown in the table to the left. The analysis shows that the Chi Square for regression is significant at the p&lt;0.05 level. Once this is partitioned out, the residual Chi Square is not statistically significant. A conclusion can therefore be drawn that, other than an increasing trend, the proportions of business failures were otherwise homogeneous during those years.

Finally, the regression coefficient is calculated. The change in proportion per unit row value = 0.015, which indicates that, between 1990 and 1995, the rate of business failures increased by 1.5% per year.

References

Steel R.G.D., Torrie J.H., Dickey D.A. Principles and Procedures of Statistics: A Biomedical Approach. 3rd Ed. (1997) ISBN 0-07-061028-2 p. 520-521
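Note that the total Chi Square in the partition is simply Pearson's chi-square for the 3x2 table of failures and survivals, so it can be cross-checked with base R's chisq.test. A minimal sketch using the example counts (no continuity correction is applied for tables larger than 2x2):

```r
# Pearson chi-square of the 3x2 table (failures vs survivals by year);
# this reproduces the "Total" row of the partition table
m <- matrix(c(10, 100,
               8,  50,
              61, 300),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("1990", "1992", "1995"), c("NPos", "NNeg")))
chisq.test(m)  # X-squared = 4.1113, df = 2, p-value = 0.128
```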
The R program for regression of proportion is a single continuous program. To make it easier to follow, the listing is divided into 2 sections.
Section 1: Initial data input and matrix of summaries
# Section 1: Preparation
dat = ("
X NPos NNeg
1990 10 100
1992 8 50
1995 61 300
")
df <- read.table(textConnection(dat),header=TRUE) # conversion to data frame
df$RowTot <- df$NPos + df$NNeg # total number each row
df$Prob <- df$NPos / df$RowTot # probability of Pos each row
df # Summary of Input Data
The initial data frame, with all the data necessary for the calculations, is as follows:
> df # Summary of Input Data
X NPos NNeg RowTot Prob
1 1990 10 100 110 0.09090909
2 1992 8 50 58 0.13793103
3 1995 61 300 361 0.16897507
Section 2: The actual calculations
# Preparation for calculation
rows = nrow(df)
posTot = sum(df$NPos)
negTot = sum(df$NNeg)
tot = sum(df$RowTot)
# vectors for results
Source <- vector()
ChiSq <- vector()
DF <- vector()
P<- vector()
# calculate total chi sq
zw = 0
chiTot = 0
dfTot = rows - 1
for(i in 1:rows) # for each row
{
zw = zw + df$X[i] * df$RowTot[i] # row value x row count
e = df$RowTot[i] * posTot / tot; # expected
o = df$NPos[i] # observed number pos
chiTot = chiTot + (o - e)**2 / e # add to Chi Sq
e = df$RowTot[i] * negTot / tot; # expected
o = df$NNeg[i] # observed number neg
chiTot = chiTot + (o - e)**2 / e # add to Chi Sq
}
pTot = 1 - pchisq(chiTot, df=dfTot)
Source <- append(Source, "Total") # add to vectors for eventual display
ChiSq <- append(ChiSq, chiTot)
DF <- append(DF, dfTot)
P<- append(P,pTot)
#c(chiTot,pTot)
# Calculate regression and its chi sq
p2 = posTot / tot; # probability of col 1
top = 0;
bot = 0;
for(i in 1:rows)
{
top = top + df$X[i] * df$NPos[i] # sum row value x col 1
bot = bot + df$X[i]^2 * df$RowTot[i] # row val sq x row count
}
#Calculation of regression coefficient
top = top - posTot * zw / tot
bot = bot - zw^2 / tot
reg = top / bot # regression coefficient
# calculate chi sq regression
chiReg = top^2 / (bot * p2 * (1 - p2)) # chi sq regression
pReg = 1 - pchisq(chiReg, df=1)
Source <- append(Source, "Regression") # add to vectors for eventual display
ChiSq <- append(ChiSq, chiReg)
DF <- append(DF, 1)
P<- append(P,pReg)
# Calculate residual chi sq
chiRes = chiTot - chiReg # chi sq residual
dfRes = dfTot - 1
pRes = 1 - pchisq(chiRes, df=dfRes)
Source <- append(Source, "Residual") # add to vectors for eventual display
ChiSq <- append(ChiSq, chiRes)
DF <- append(DF, dfRes)
P<- append(P,pRes)
# output
dfRes <- data.frame(Source, ChiSq, DF, P) # combine vectors into data frame for display
dfRes # display chi sq, df, and significance in p
# Regression coefficient
reg # regression coefficient: change in probability per unit of X
The results are as follows:
> dfRes
Source ChiSq DF P
1 Total 4.11131566 2 0.12800860
2 Regression 4.01760141 1 0.04502771
3 Residual 0.09371425 1 0.75950732
> reg # regression coefficient: change in probability per unit of X
[1] 0.014958
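The Regression row can also be reproduced with base R's prop.trend.test (the Chi-squared Test for Trend in Proportions), which computes the same trend chi-square when the years are supplied as scores:

```r
# Chi-square test for trend in proportions, using the same example data;
# the statistic matches the "Regression" row of the partition above
prop.trend.test(x = c(10, 8, 61),             # NPos per year
                n = c(110, 58, 361),          # row totals per year
                score = c(1990, 1992, 1995))  # year as the ordinal score
# X-squared = 4.0176, df = 1, p-value = 0.04503
```

Note that score must be given explicitly here: its default is the row index 1, 2, 3, which would not reflect the unequal spacing of the years.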