![]() | Content Disclaimer Copyright @2020. All Rights Reserved. |
Links : Home Index (Subjects) Contact StatsToDo |
Related Links:
This page provides explanations and example R codes for Ordinal Logistic Regression, which is one of the algorithms based on the Generalized Linear Models. It is a variant of the Multinomial Regression model, as explained in the Multinominal Logistic Regression Explained Page
, but the dependent variable is ordinal, in that they are ordered in scale alphabetically, so the grp1<Grp2<Grp3 ...
In R, the independent variables can be measurements or factors
Referenceshttps://en.wikipedia.org/wiki/Ordered_logithttps://stats.idre.ucla.edu/r/dae/ordinal-logistic-regression/
This section presents the R code in total so it can be copied and pasted into RStudio as a template. Detailed explanations for each step are in the next panel
# OrdinalRegression.R # Step 1: Data entry myDat = (" Counsel Educ Age BFeed No Tertiary 27 S3 No Tertiary 29 S1 Yes Secondary 27 S1 No Primary 37 S2 No Tertiary 27 S0 No Primary 37 S3 Yes Secondary 25 S0 Yes Primary 33 S1 No Tertiary 35 S2 Yes Secondary 29 S1 No Primary 25 S0 Yes Tertiary 33 S2 Yes Secondary 27 S3 ") myDataFrame <- read.table(textConnection(myDat),header=TRUE) summary(myDataFrame) # Step 2: Ordinal Regression #install.Package("MASS") # if not already installed library(MASS) ordRegResults <- polr(BFeed ~ Counsel + Educ + Age, data = myDataFrame, Hess=TRUE) summary(ordRegResults) # Step 3: Place fitted values into data frame and display results myFitted<-round(fitted.values(ordRegResults),4) myPredict <- predict(ordRegResults, myDataFrame) myDataFrame <- cbind(myDataFrame, myFitted, myPredict) myDataFrame #optional display of data frame with(myDataFrame, table(BFeed,myPredict)) #check table of prediction cor.test( ~ as.numeric(BFeed) + as.numeric(myPredict), data=myDataFrame, method = "spearman",exact=FALSE) # Step 4: Plot fitted values for the 3 ethnic groups #Step 4a: Make 4 different plots #install.Package("ggplot2") # if not already installed library(ggplot2) plot1<-ggplot(myDataFrame, aes(x = myDataFrame$BFeed, y = myDataFrame$S0)) + coord_cartesian(ylim= c(0, 1)) + geom_boxplot() + geom_dotplot(binaxis="y",binwidth=.03, stackdir="center") plot2<-ggplot(myDataFrame, aes(x = myDataFrame$BFeed, y = myDataFrame$S1, ymin=0.0, ymax=1.0)) + coord_cartesian(ylim= c(0, 1)) + geom_boxplot() + geom_dotplot(binaxis="y",binwidth=.03, stackdir="center") plot3<-ggplot(myDataFrame, aes(x = myDataFrame$BFeed, y = myDataFrame$S2, ymin=0.0, ymax=1.0)) + coord_cartesian(ylim= c(0, 1)) + geom_boxplot() + geom_dotplot(binaxis="y",binwidth=.03, stackdir="center") plot4<-ggplot(myDataFrame, aes(x = myDataFrame$BFeed, y = myDataFrame$S3, ymin=0.0, ymax=1.0)) + coord_cartesian(ylim= c(0, 1)) + geom_boxplot() + geom_dotplot(binaxis="y",binwidth=.03, stackdir="center") #plot1 #plot2 #plot3 #plot4 # Step 4b: Combine 4 plots into 1 #install.packages("Grid") # if not already installed library(grid) # Make a list from the ... arguments and plotlist myPlots <- c(list(plot1,plot2,plot3,plot4)) #change list to your list of ggplots myCols = 2 #change the number of plot columns per row of plots # The rest of the codes need no change myNumPlots = length(myPlots) layout <- matrix(seq(1, myCols * ceiling(myNumPlots/myCols)), ncol = myCols, nrow = ceiling(myNumPlots/myCols)) grid.newpage() pushViewport(viewport(layout = grid.layout(nrow(layout), ncol(layout)))) # Make each plot, in the correct location for (i in 1:myNumPlots) { # Get the i,j matrix positions of the regions that contain this subplot matchidx <- as.data.frame(which(layout == i, arr.ind = TRUE)) print(myPlots[[i]], vp = viewport(layout.pos.row = matchidx$row, layout.pos.col = matchidx$col)) } # Step 5: Calculate on a different set of data newDat = (" Counsel Educ Age No Tertiary 27 No Tertiary 29 Yes Secondary 25 Yes Tertiary 33 ") newDataFrame <- read.table(textConnection(newDat),header=TRUE) summary(newDataFrame) newProbs <- predict(ordRegResults, newdata=newDataFrame, type='probs') newProbs newDiag <- predict(ordRegResults, newdata=newDataFrame, type='class') newDiag newDataFrame <- cbind(newDataFrame, newProbs, newDiag) newDataFrame #Step 6: Optional saving and loading of coefficients #save(logRegResults, file = "LogRegResults.rda") #save results to rda file #load("LogRegResults.rda") #load the rda file # Step 1: Data entry to dataframe myDat = (" # Step 1: Data entry myDat = (" Counsel Educ Age BFeed No Tertiary 27 S3 No Tertiary 29 S1 Yes Secondary 27 S1 No Primary 37 S2 .... ") myDataFrame <- read.table(textConnection(myDat),header=TRUE) summary(myDataFrame)Step 1 creates the dataframe that contains the data as in myDat. The data is computer generated, and purports to be from a study of the success of breast feeding in women who had their first baby. The predictors are whether they received antenatal breast feeding counselling (Counsel = No/Yes), level of education (Educ = Primary/Secondary/Tertiary), and age in completed years (Age). Outcomes are level success in breast feeding with the following levels (BFeed)
# Step 2: Ordinal Regression #install.Package("MASS") # if not already installed library(MASS) ordRegResults <- polr(BFeed ~ Counsel + Educ + Age, data = myDataFrame, Hess=TRUE) summary(ordRegResults)Step 2 performs the ordinal logistic regression, and the results are as follows Call: polr(formula = BFeed ~ Counsel + Educ + Age, data = myDataFrame, Hess = TRUE) Coefficients: Value Std. Error t value CounselYes -0.353 1.340 -0.263 EducSecondary 2.319 2.122 1.093 EducTertiary 1.229 1.371 0.896 Age 0.427 0.206 2.073 Intercepts: Value Std. Error t value S0|S1 12.36 6.75 1.83 S1|S2 14.52 7.19 2.02 S2|S3 15.87 7.35 2.16 Residual Deviance: 29.90 AIC: 43.90The results from Step 2 to focus on are the rows under the heading Value in the two sets of Coefficients and Intercepts. We will use the first row of the data as an example of calculation (Counsel=No, Educ=Tertiary, Age=27) The probabilities of each level are calculated as follows
# Step 3: Place fitted values into data frame and display results myFitted<-round(fitted.values(ordRegResults),4) myPredict <- predict(ordRegResults, myDataFrame) myDataFrame <- cbind(myDataFrame, myFitted, myPredict) myDataFrame #optional display of data frame with(myDataFrame, table(BFeed,myPredict)) #check table of prediction cor.test( ~ as.numeric(BFeed) + as.numeric(myPredict), data=myDataFrame, method = "spearman",exact=FALSE)In step 3, the fitted.values function creates the 4 columns of probabilities for the levels of outcomes, and the predict function make the decision which outcome is the most likely. The cbind function concatenates these results to the dataframe, which is then displayed. The table function creates the 4x4 table of counts between the original Diagnosis and the estimated myPredict, to test the accuracy of the algorithm. The Spearman's Correlation Coefficient estimates the precision of the prediction (0=random, 1=perfect). The results are as follows Counsel Educ Age BFeed S0 S1 S2 S3 myPredict 1 No Tertiary 27 S3 0.405 0.45 0.103 0.0423 S1 2 No Tertiary 29 S1 0.225 0.49 0.191 0.0939 S1 3 Yes Secondary 27 S1 0.245 0.49 0.177 0.0845 S1 4 No Primary 37 S2 0.032 0.19 0.300 0.4792 S3 ..... > with(myDataFrame, table(BFeed,myPredict)) #check table of prediction myPredict BFeed S0 S1 S2 S3 S0 1 2 0 0 S1 0 4 0 0 S2 0 1 0 2 S3 0 2 0 1 > cor.test( ~ as.numeric(BFeed) + as.numeric(myPredict), data=myDataFrame, method = "spearman",exact=FALSE) Spearman's rank correlation rho data: as.numeric(BFeed) and as.numeric(myPredict) S = 200, p-value = 0.05 alternative hypothesis: true rho is not equal to 0 sample estimates: rho 0.55The dataframe now contains the original columns, plus the 4 columns of probability for each level of outcome, as well as the conclusion which level is most likely. The table shows the correct prediction along the diagonal, 6 cases out of 13. The Spearman's correlation coefficient is 0.55 Step 4 is optional, and plots the results to visually demonstrate the results of analysis. It is divided into two parts. Step 4a produces 4 plots, and step 4b combines these 4 plots into a single one for presentation #Step 4a: Make 4 different plots #install.Package("ggplot2") # if not already installed library(ggplot2) plot1<-ggplot(myDataFrame, aes(x = myDataFrame$BFeed, y = myDataFrame$S0)) + coord_cartesian(ylim= c(0, 1)) + geom_boxplot() + geom_dotplot(binaxis="y",binwidth=.03, stackdir="center") plot2<-ggplot(myDataFrame, aes(x = myDataFrame$BFeed, y = myDataFrame$S1, ymin=0.0, ymax=1.0)) + coord_cartesian(ylim= c(0, 1)) + geom_boxplot() + geom_dotplot(binaxis="y",binwidth=.03, stackdir="center") plot3<-ggplot(myDataFrame, aes(x = myDataFrame$BFeed, y = myDataFrame$S2, ymin=0.0, ymax=1.0)) + coord_cartesian(ylim= c(0, 1)) + geom_boxplot() + geom_dotplot(binaxis="y",binwidth=.03, stackdir="center") plot4<-ggplot(myDataFrame, aes(x = myDataFrame$BFeed, y = myDataFrame$S3, ymin=0.0, ymax=1.0)) + coord_cartesian(ylim= c(0, 1)) + geom_boxplot() + geom_dotplot(binaxis="y",binwidth=.03, stackdir="center") #plot1 #plot2 #plot3 #plot4Plot1 plots the estimated probability of S0 for the 4 levels of outcome in the original data, plot2 the probability of S1, plot3 the probability of S2, and plot4 the probability of S3. At this stage, the 4 plots can be separately displayed. The problem is that, in RStudio, each plot over-writes the previous one, so the 4 cannot be visualized at the same time. To do this, we need to do step 4b. # Step 4b: Combine 4 plots into 1 #install.packages("Grid") # if not already installed library(grid) # Make a list from the ... arguments and plotlist myPlots <- c(list(plot1,plot2,plot3,plot4)) #change list to your list of ggplots myCols = 2 #change the number of plot columns per row of plots # The rest of the codes need no change myNumPlots = length(myPlots) layout <- matrix(seq(1, myCols * ceiling(myNumPlots/myCols)), ncol = myCols, nrow = ceiling(myNumPlots/myCols)) grid.newpage() pushViewport(viewport(layout = grid.layout(nrow(layout), ncol(layout)))) # Make each plot, in the correct location for (i in 1:myNumPlots) { # Get the i,j matrix positions of the regions that contain this subplot matchidx <- as.data.frame(which(layout == i, arr.ind = TRUE)) print(myPlots[[i]], vp = viewport(layout.pos.row = matchidx$row, layout.pos.col = matchidx$col)) }
# Step 5: Calculate on a different set of data newDat = (" Counsel Educ Age No Tertiary 27 No Tertiary 29 Yes Secondary 25 Yes Tertiary 33 ") newDataFrame <- read.table(textConnection(newDat),header=TRUE) summary(newDataFrame) newProbs <- predict(ordRegResults, newdata=newDataFrame, type='probs') newProbs newDiag <- predict(ordRegResults, newdata=newDataFrame, type='class') newDiag newDataFrame <- cbind(newDataFrame, newProbs, newDiag) newDataFrameStep 5 is optional and for demonstration only. It shows how the results produced can be used to calculate and estimate the value for each case in a different dataset. Step 5 creates a new dataframe with 4 rows of data. Predict with type='probs' produces the 3 columns of probability, and type='class' the classification. These can then be concatenated to the dataframe, and displayed. Counsel Educ Age S0 S1 S2 S3 newDiag 1 No Tertiary 27 0.40 0.45 0.103 0.042 S1 2 No Tertiary 29 0.22 0.49 0.191 0.094 S1 3 Yes Secondary 25 0.43 0.44 0.093 0.038 S1 4 Yes Tertiary 33 0.07 0.32 0.320 0.286 S1Please note the following
#Step 6: Optional saving and loading of coefficients #save(ordRegResults, file = "OrdRegResults.rda") #save results to rda file #load("OrdRegResults.rda") #load the rda fileStep 6 is optional, and in fact commented out, and included as a template only. It allows the results of analysis to be stored as a .rda file, that can be reloaded on a different occasion and used on a different dataset. Please note the following
<
|