Content Disclaimer. Copyright © 2020. All Rights Reserved.


Explanations
This page provides the program and explanations for the basic Backpropagation Neural Net.
Calculations
The neural net is a vast subject, and one subject to rapid development in the 21st century, as it forms the basis of machine learning and artificial intelligence. Backpropagation is one of the earliest algorithms to be developed, and it forms the basic framework for many later algorithms. Backpropagation began as a simple adaptive learning algorithm, and it is presented in that form in this page, as a Javascript program, so that users can use the program directly or, if required, copy and adapt the algorithm into their own programs. The program is best viewed as a form of non-parametric regression, where the variables are based on Fuzzy Logic, a number between 0 (false) and 1 (true).
An example of this is shown in the plot to the right, which translates the measurement of fetal blood pH into a diagnosis of acidosis. The normally accepted non-acidosis value of 7.35 is first rescaled to -2.9444, whose logistic value is 0.05, and the normally accepted acidosis value of 7.2 to 2.9444, whose logistic value is 0.95. This rescaling changes an otherwise normally distributed measurement into the bimodal one of acidosis and non-acidosis, compressing the values less than 7.2 and more than 7.35, while stretching the values in between.
Each neurone performs two calculations
- The first is to combine the inputs as y = Σw_{i}v_{i} + c, where v are the input values, w the weights given to each input, and c the bias value
- The combined value (y) is then transformed into a Fuzzy Logic value between 0 (false) and 1 (true). This can be binary (>0.5=1, <0.5=0), but most commonly the logistic transform is used.
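As an illustrative sketch (the function name and the example values below are not from the page's program), the two calculations of a single neurone can be written in Javascript:

```javascript
// A single neurone: combine inputs as y = sum(w[i] * v[i]) + c,
// then squash y with the logistic transform to a Fuzzy Logic value.
function neurone(v, w, c) {
  let y = c;
  for (let i = 0; i < v.length; i++) {
    y += w[i] * v[i];            // weighted sum of inputs
  }
  return 1 / (1 + Math.exp(-y)); // logistic transform: between 0 and 1
}

// Two inputs with arbitrary weights and bias
console.log(neurone([1, 0], [2.0, -1.5], -0.5)); // ≈ 0.8176
console.log(neurone([0, 0], [1, 1], 0));         // exactly 0.5
```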
The Backpropagation neuronet is an arrangement of neurones, as shown to the right, and consists of the following - The input layer, which contains as many neurones as there are inputs. In this example, there are 2 input neurones
- One or more middle layers, each containing a number of neurones. In this example there is 1 middle layer containing 3 neurones
- The output layer, which contains as many neurones as there are outputs. In this example, there is 1 output neurone
The coefficients (w and c) in all of the neurones in a backpropagation neuronet are set to random numbers when the neuronet is initially constructed. Training consists of presenting a series of templates (input and output) to the neuronet, which adapts (learns) through the following processes - Forward Propagation
- Each input entered via the input layer is passed to each neurone of the middle layer. Each neurone then processes all of its inputs (dendrites) and produces its output (axon)
- If there is more than 1 middle layer, the outputs from each layer become the inputs of the next layer, until the output layer is reached, the neurones of which produce the final output values.
- Backward Propagation
- The output values are compared with the template output values. The coefficients (w and c) in each neurone are then changed so that the results would be closer to the template output values
- Going backwards through the layers, each preceding layer is similarly altered, so that each neurone would produce an output closer to the required value
- For each template in the training data set, the error produced is estimated by comparing the calculated outputs with the output values in the template
- The maximum error over each iteration of the whole dataset is compared with the acceptable error value. The training is re-iterated until the maximum error for an iteration is less than the acceptable error. At this point the training is completed, and the values of the coefficients represent the "memory" of the training, and can be used to reproduce the template output values from inputs.
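The forward propagation half of this cycle can be sketched in Javascript for the 2-3-1 example network above. The data layout here (each layer an array of neurones, each neurone holding its weights w and bias c) is an assumption for illustration, not the storage format of the program on this page:

```javascript
// Logistic transform shared by all neurones
function logistic(y) { return 1 / (1 + Math.exp(-y)); }

// Forward propagation: the outputs of each layer become the inputs of the next
function forward(inputs, layers) {
  let values = inputs;
  for (const layer of layers) {
    values = layer.map(n =>
      logistic(n.w.reduce((sum, wi, i) => sum + wi * values[i], n.c)));
  }
  return values; // outputs of the final (output) layer
}

// 2 inputs, 1 middle layer of 3 neurones, 1 output neurone;
// random coefficients, as in a newly constructed (untrained) net
const rand = () => Math.random() * 2 - 1;
const net = [
  [1, 2, 3].map(() => ({ w: [rand(), rand()], c: rand() })), // middle layer
  [{ w: [rand(), rand(), rand()], c: rand() }]               // output layer
];
console.log(forward([0, 1], net)); // one Fuzzy Logic value between 0 and 1
```

Training then consists of repeatedly adjusting the w and c values after each forward pass, as described in the backward propagation steps above.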
At the end of training, the set of coefficients represents the "memory" that has been trained, and can be used to produce outputs from sets of inputs. A simple neuronet can be processed manually, but usually the set of coefficients is incorporated into a computer program or hardwired into machinery. From the Javascript program in this page, the trained neural network can be exported as a program (html and Javascript code) that the user can copy to a text editor and save as an html file. The html program can then be used to interpret future data.

## References

Users should be aware that neural networks generally, and backpropagation in particular, have undergone dramatic development in the 21st century, and the current complexity and capability of these algorithms greatly exceed the content of this page. The program on this page is a simple and primitive one, and can probably be used for diagnostic or therapeutic decision making in clearly defined clinical domains, with 5-20 inputs, 10-20 patterns to learn, and a training dataset of no more than a few hundred templates. It is insufficient to process complex patterns that require large datasets, such as predicting share prices, company profitability, or weather forecasts, where ambiguous data, multiple causal inputs and outputs, unknown patterns, and massive training data are involved.

The following are references for beginners. They introduce the concepts and lead to further reading.

Mueller J P and Massaron L (2019) Deep Learning for Dummies. John Wiley and Sons, Inc., New Jersey. ISBN 978-1-119-54303-9. Chapters 7 and 8, p.131-162. A very good introduction to neuronets and Backpropagation.

On Line
- https://en.wikipedia.org/wiki/Backpropagation Wikipedia on Backpropagation
- https://blog.revolutionanalytics.com/2017/07/nnets-from-scratch.html An introduction to the concepts
- https://www.datacamp.com/community/tutorials/neural-network-models-r A tutorial in using one of the R packages
- https://cran.r-project.org/web/packages/neuralnet/neuralnet.pdf The R resource for a really sophisticated Backpropagation package
- https://www.rdocumentation.org/packages/nnet/versions/7.3-16/topics/nnet Documentation for the neural net presented in the R panel of this page
Hints and Suggestions
R Codes
This panel explains how the program in the program panel can be run, and provides some suggestions on how to make the program run efficiently.
Javascript Program
This is a column of numbers representing the number of neurones in each layer. The minimum is 2 rows, input and output. The most common is 3 rows, with a single middle layer. In theory there can be any number of middle layers, and any number of neurones in each layer. The following general approach can be used, although each network is unique, and some trial and error may be necessary
- The larger the number of inputs and outputs, the greater the complexity of patterns in the training data, the more the data values are away from 0 and 1, and the more often similar patterns of input values are related to different outcome values, the larger the number of neurones (in terms of layers or neurones per layer) required.
- Where the model is similar to regression, with linear relationships between inputs and outputs, only 2 layers (input and outputs) are necessary.
- Where a limited number of cause and effect patterns are clearly represented in the training data, one, or at most 2, middle layers should suffice. Where many or unrecognized patterns exist in the training data, such as when training a network to predict share prices, many layers and neurones are required, needing high speed computers with dedicated processing over a long time
- The number of neurones in the middle layers can be determined by trial and error. Although not obligatory, it is useful to have at least the same number of neurones as inputs in the first middle layer, and more neurones than the number of outputs in the last middle layer.
- In our example (a simple XOR simulator plus a switch), there is 1 middle layer which contains 4 neurones, 1 more than the number of inputs and 3 more than the number of outputs. The example net will still train with 2 or 3 neurones in the middle layer, but requires many more iterations to reach the same precision.
The values representing the structure are placed in the **Structure** text area.
The data is a table of numbers representing the template patterns. It can have any number of rows, but the number of columns is the number of inputs (first row in the structure) plus the number of outputs (last row in the structure). For training, the number of columns must conform to inputs + outputs. To use a trained net to interpret a set of data, only the input columns are required. All data used by the backpropagation, be they input parameters or result outputs, represent Fuzzy Logic, and are numerically represented as values between 0 for false and 1 for true. Real data will therefore need to be edited to conform with this. Fuzzy Logic is discussed in the introduction panel, and will not be elaborated here.
- For the simplest binary groups (no/yes, false/true, female/male), 0 for false and 1 for true can be used
- For multiple groups, the easiest is to have an input for each group. For example 1 0 0 0 for group A, 0 1 0 0 for group B, 0 0 1 0 for group C and 0 0 0 1 for group D.
- A more abbreviated set of dummy variables can also be used. For example 0 0 for group A, 0 1 for group B, 1 0 for group C and 1 1 for group D. This makes for a smaller neural net with shorter training runs, but the results are intuitively more difficult to use, as group names will need to be firstly converted into a different format
- Measurements must be transformed to a conceptual value between 0 and 1 for false and true. We will use the height of a person to demonstrate 3 common methods of transformation, using 155cms for short and 170cms for tall. One of the following options can be used
- The use of a cut off to transform into two input values. Those with 155cms or less would be 1 0, 170cms or more 0 1, in between 0 0, and 1 1 does not exist.
- Using a straight line gradient as a single value. 155cms or less = 0, 170cms or more = 1, and the rest (ht-155) / (170-155)
- Conversion to Fuzzy Logic values using the logistic transformation, which clusters values near the extremes and stretches the distances between values in between, producing a bi-modal distribution of probability values between 0 and 1. In our height example, 155cms is given the probability of 0.05 and 170cms 0.95 for the transform. The logic of Fuzzy Logic is discussed in the Introduction panel of this page, and will not be further elaborated here.
The data values are placed in the **Data Matrix** text area.
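The three height transformations described above can be sketched in Javascript. The function names are invented for this illustration, and the logistic version assumes the 0.05/0.95 anchors given above (logit(0.95) = ln(19) ≈ 2.9444):

```javascript
// 1. Cut-off into two input values: [short, tall]
function cutOff(ht) {
  return [ht <= 155 ? 1 : 0, ht >= 170 ? 1 : 0]; // in between gives [0, 0]
}

// 2. Straight-line gradient as a single value
function gradient(ht) {
  if (ht <= 155) return 0;
  if (ht >= 170) return 1;
  return (ht - 155) / (170 - 155);
}

// 3. Logistic transform: 155cms -> 0.05, 170cms -> 0.95
function logisticHt(ht) {
  const mid = (155 + 170) / 2;                         // 162.5
  const slope = Math.log(0.95 / 0.05) / (170 - mid);   // logit(0.95) over half-range
  return 1 / (1 + Math.exp(-slope * (ht - mid)));
}

console.log(cutOff(160));     // [0, 0] — between the two cut-offs
console.log(gradient(160));   // ≈ 0.333
console.log(logisticHt(170)); // ≈ 0.95
```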
**The learning rate** is a value between 0, where no learning occurs, and 1, where the weights in the neurones are fully corrected by the values of the error found. When the training set is simple, the same training rate can be used throughout. When the training data is complex, with values not close to 0 and 1, and where the same inputs are related to different outcomes, there is a need to reduce the learning rate as training progresses, so that the result converges better. In most cases users should adjust these parameters by trial and error. They do not affect the final outcome of training, but govern the speed (number of iterations) required.
- Maximum learning rate is the rate set for the start of the training. The value 1 can be used, but this tends to over-correct and thus prolong training. In most cases it should begin at about 0.8 (the default setting).
- Minimum rate is the smallest training rate used, and should be the same as, or a lower value than, the acceptable error
- Decrement is the amount of decrease (as a proportion of the current training rate) as the training progresses. The value is usually set at 0.5 to 0.1, although smaller values can also be used.
- Number of iterations (of the training data) per decrement is the rate at which the training rate is reduced. Given that most backpropagation training requires 500 to 50,000 cycles, this can be set at about 1/10th of the expected number of cycles required.
- Acceptable Error is the error at which training can end. Neural nets are based on Fuzzy Logic, where false (0) and true (1) are unattainable extremes, so the user has to determine how close to 0 and 1 they would accept. The default is set to 0.05, meaning that outcomes >= 0.95 are acceptable as 1 and <= 0.05 are accepted as 0. In many cases, especially when the training data is complex, this level of precision is unattainable. There is also a need to avoid over-training, as the neural net would then model the trivial variations in the data. For practical reasons, a precision closer than 0.2 is considered workable, and 0.1 precise.
- Maximum iterations. Training will cease when the acceptable error is attained or when the maximum number of iterations is reached. A maximum is required to stop the training if the required precision is not attainable, or if the duration of training would exceed the time allowed by the browser.
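A minimal sketch of this schedule in Javascript, assuming the decrement is applied as a proportional decrease of the current rate once every N iterations (the parameter names are illustrative, not those of the page's program):

```javascript
// Learning rate schedule: start at maxRate, multiply by (1 - decrement)
// every iterationsPerDecrement iterations, never dropping below minRate.
function learningRate(iteration, maxRate, minRate, decrement, iterationsPerDecrement) {
  const steps = Math.floor(iteration / iterationsPerDecrement);
  const rate = maxRate * Math.pow(1 - decrement, steps);
  return Math.max(rate, minRate);
}

// Default-like settings: start at 0.8, decrement 0.1 every 500 iterations, floor 0.05
console.log(learningRate(0,    0.8, 0.05, 0.1, 500)); // 0.8
console.log(learningRate(1000, 0.8, 0.05, 0.1, 500)); // 0.8 * 0.9^2 ≈ 0.648
```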
If the neural net already exists, either because the user has pasted it into the text area, or because some training has already occurred, its coefficients are used, and further modified if more training is performed. The program produces coefficients to 10 decimal places, in excess of precision requirements in most cases, but allowing users to truncate them as preferred. In general, the number of decimal places should be 1 or 2 more than the precision of results required by the user. In our example, truncating the coefficients to 3 decimal places will produce the same results.
The default example is a backpropagation network with 3 layers
- In the **Structure** box, the 3 rows are 3 for 3 inputs, 4 for 4 neurones in a single middle layer, and 1 for a single output
- In the **Data Matrix** text area is the training data, which demonstrates a decision making algorithm based on the XOR pattern, which cannot otherwise be computed numerically.
- There are 4 columns. The first 3 are the inputs, and the last the output
- In the first 4 rows, where the value in the third column is 0, if the values in the first two columns are both 0 or both 1, the network should return the value of 0. If the values of the first two columns are different (0 1 or 1 0), then the return value is 1
- In the last 4 rows, where the value in the third column is 1, the return values are reversed.
- Thus the return values produced depend on 2 patterns,
- Whether A and B are both true or both false (1 1 or 0 0), or opposite to each other (0 1 or 1 0)
- Whether the third input represents true (1) or false (0)
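These two patterns combine into a single rule: the expected output is the XOR of the first two inputs, reversed when the third input is 1, i.e. (A XOR B) XOR C. A small checker (illustrative, not part of the page's program):

```javascript
// Expected output for a training row of the XOR-plus-switch example
function expected(a, b, c) {
  return (a ^ b) ^ c; // XOR of A and B, reversed when the switch C is 1
}

console.log(expected(0, 1, 0)); // 1 — inputs differ, switch off
console.log(expected(0, 1, 1)); // 0 — switch on reverses the result
```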
- Make sure the neural net structure and the training data are compatible: the number of columns in the training data is the number of inputs plus the number of outputs.
- Leave the default settings for training, but initially set a low value (e.g. 1000) for maximum iterations
- Click the **Commence Training** button to train the network
- At the end of the initial training, examine the neural net produced. Adjust the training parameters, and click the **Commence Training** button again. This will use the existing neural net and modify it further.
- Repeat adjusting and re-training until the required solution is obtained, or until no further improvement is possible
**Requirements** The structure of the network must be stated in the **Network Structure** text area, and the trained network in the **Neural Net Matrix** text area. The two sets of numbers must be compatible, in that the numbers of inputs and outputs are the same
- Clicking the **Produce Program** button will produce a Javascript function that will calculate the output from a row of inputs. This program can be copied and pasted into any web page that acts as an interpreter of the neural net, or used as a basis for writing a function in another computer language for another application.
- To **Calculate Results**, a set of data is required in the **Data Matrix** text box. For interpretation, only the input values are required (the rest of the columns will be ignored). Clicking **Calculate Results** will produce a table, each row containing the input values, followed by the result output values.
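As an illustration of the kind of Javascript function produced, the sketch below hard-codes the coefficients of a trained 3-4-1 network. The weights are borrowed from the trained 3-4-1 example shown in the R panel below, rounded to 2 decimal places, so real exported code will contain different (and longer) numbers:

```javascript
// Interpreter for a trained 3-4-1 net: 3 inputs, 4 middle neurones, 1 output
function calcNet(inputs) {
  const logistic = y => 1 / (1 + Math.exp(-y));
  // Middle layer: each neurone combines its bias with the weighted inputs
  const h = [
    logistic( 2.16 - 5.72 * inputs[0] - 0.09 * inputs[1] + 1.12 * inputs[2]),
    logistic( 5.10 - 1.99 * inputs[0] - 4.27 * inputs[1] - 4.27 * inputs[2]),
    logistic( 2.64 - 1.68 * inputs[0] + 1.87 * inputs[1] + 5.01 * inputs[2]),
    logistic(-4.91 - 9.70 * inputs[0] + 10.28 * inputs[1] + 9.01 * inputs[2])
  ];
  // Output layer: a single neurone over the 4 middle-layer outputs
  return [logistic(-3.18 - 6.82 * h[0] + 7.57 * h[1] - 0.95 * h[2] + 7.93 * h[3])];
}

console.log(calcNet([0, 1, 0])); // close to 1, as the template output for 0 1 0 is 1
console.log(calcNet([0, 0, 0])); // close to 0
```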
**Requirements** The structure of the network must be stated in the **Network Structure** text area, and the trained network in the **Neural Net Matrix** text area. The two sets of numbers must be compatible, in that the numbers of inputs and outputs are the same
- Clicking the **Export Trained Neuralnet** button will produce the source code for a complete html web page, including the input/output interface and the Javascript program representing the trained neural network.
- The source code can be copied and pasted into a text editor, and saved as an html file. The html file can then be used via a web browser to interpret future data.
## Using the Trained Neural Net

Using the structure, input data and trained neural net, output is calculated from the information in the Structure, input data, and neural net text areas.
The R code on this panel is based on the nnet library from https://www.rdocumentation.org/packages/nnet/versions/7.3-16/topics/nnet. The algorithm supports backpropagation calculations for any number of inputs and outputs, but only allows 1 middle layer with any number of neurones.
If more than 1 middle layer is required, users can go to https://cran.r-project.org/web/packages/neuralnet/index.html to download the neuralnet package. The following 2 examples calculate a simple backpropagation neural network.
```r
myDat = ("
I1 I2 I3 O1 O2
0 0 0 0 1
0 1 0 1 0
1 0 0 1 0
1 1 0 0 1
0 0 1 1 0
0 1 1 0 1
1 0 1 0 1
1 1 1 1 0
")
myDataFrame <- read.table(textConnection(myDat), header=TRUE)
myDataFrame
library(nnet)
```

## Test 1

Input = I1, I2, I3; 1 middle layer with 4 neurones; output = O1; tolerance = 0.05; maximum iterations = 200.

```r
x <- subset(myDataFrame, select=I1:I3)   # subset x=I1,I2,I3
x
y <- c(myDataFrame$O1)                   # subset y=O1
y
nn <- nnet(x, y, size=4, abstol=0.05, maxit=200)   # backpropagation
summary(nn)
predict(nn)
```

The subset x consists of the 3 inputs I1 to I3, and y the single output O1. The function nnet is called with the following arguments
- x and y are the inputs and outputs
- size=4 means a middle layer with 4 neurones
- abstol=0.05 means processing stops when error is smaller than 0.05
- maxit=200 means processing stops after 200 iterations
The results are

```
> x
  I1 I2 I3
1  0  0  0
2  0  1  0
3  1  0  0
4  1  1  0
5  0  0  1
6  0  1  1
7  1  0  1
8  1  1  1
> y <- c(myDataFrame$O1)   # subset y=O1
> y
[1] 0 1 1 0 1 0 0 1
> nn <- nnet(x, y, size=4, abstol=0.05, maxit=200)
# weights:  21
initial  value 2.050771
iter  10 value 1.999513
iter  20 value 1.913413
iter  30 value 1.597108
iter  40 value 0.216418
final  value 0.033959
converged
> summary(nn)
a 3-4-1 network with 21 weights
options were -
 b->h1 i1->h1 i2->h1 i3->h1
  2.16  -5.72  -0.09   1.12
 b->h2 i1->h2 i2->h2 i3->h2
  5.10  -1.99  -4.27  -4.27
 b->h3 i1->h3 i2->h3 i3->h3
  2.64  -1.68   1.87   5.01
 b->h4 i1->h4 i2->h4 i3->h4
 -4.91  -9.70  10.28   9.01
  b->o  h1->o  h2->o  h3->o  h4->o
 -3.18  -6.82   7.57  -0.95   7.93
> predict(nn)
        [,1]
1 0.06983661
2 0.95256979
3 0.96103373
4 0.08941800
5 0.91531074
6 0.07498991
7 0.05624830
8 0.96314380
```

- x and y are shown.
- Summary presents the values of weights in each neurone
- Predict shows the calculated values of each O1 as calculated by the trained neural net using I1 to I3
```r
newDat = ("
I1 I2 I3
0.1 0.2 0.05
0.3 0.9 0.0
0.8 0.1 0.1
")
newData <- read.table(textConnection(newDat), header=TRUE)
predict(nn, newData)
```

The results are

```
> predict(nn, newData)
          [,1]
[1,] 0.1086763
[2,] 0.9636761
[3,] 0.9212004
```

## Test 2

This is the same as Test 1, except that there are now 2 outputs, O1 and O2. The program is

```r
x <- subset(myDataFrame, select=I1:I3)   # subset x=I1,I2,I3
x
y <- subset(myDataFrame, select=O1:O2)   # subset y=O1,O2
y
nn <- nnet(x, y, size=4, abstol=0.05, maxit=200)
summary(nn)
predict(nn)
```

The results are

```
> x
  I1 I2 I3
1  0  0  0
2  0  1  0
3  1  0  0
4  1  1  0
5  0  0  1
6  0  1  1
7  1  0  1
8  1  1  1
> y <- subset(myDataFrame, select=O1:O2)   # subset y=O1,O2
> y
  O1 O2
1  0  1
2  1  0
3  1  0
4  0  1
5  1  0
6  0  1
7  0  1
8  1  0
> nn <- nnet(x, y, size=4, abstol=0.05, maxit=200)
# weights:  26
initial  value 4.071980
iter  10 value 3.999929
iter  20 value 3.990630
iter  30 value 1.902310
final  value 0.037599
converged
> summary(nn)
a 3-4-2 network with 26 weights
options were -
 b->h1 i1->h1 i2->h1 i3->h1
 -1.96  -2.40   0.04   0.15
 b->h2 i1->h2 i2->h2 i3->h2
 12.68  65.74 -40.83 -65.08
 b->h3 i1->h3 i2->h3 i3->h3
  3.57   3.40  -0.89  -2.10
 b->h4 i1->h4 i2->h4 i3->h4
 -0.58  -1.89   4.56   2.13
 b->o1 h1->o1 h2->o1 h3->o1 h4->o1
 -0.32  -7.82 -16.71  20.34 -15.00
 b->o2 h1->o2 h2->o2 h3->o2 h4->o2
  1.63   7.85  17.30 -22.59  15.04
> predict(nn)
            O1          O2
1 2.689003e-02 0.965504577
2 9.521576e-01 0.023248336
3 8.834675e-01 0.086618520
4 3.778867e-05 0.999949062
5 9.400405e-01 0.038104978
6 3.346196e-02 0.963676744
7 3.884451e-02 0.948116245
8 9.914175e-01 0.003671725
```

There are now two outputs, O1 and O2. As with Test 1, the trained neural net is tested on a new set of data, and produces two outputs

```r
newDat = ("
I1 I2 I3
0.1 0.2 0.05
0.3 0.9 0.0
0.8 0.1 0.1
")
newData <- read.table(textConnection(newDat), header=TRUE)
predict(nn, newData)
```

The results are

```
> predict(nn, newData)
              O1          O2
[1,] 0.001618309 0.997914331
[2,] 0.989863726 0.004413234
[3,] 0.546131182 0.375853575
```