This page provides the program and explanations for a basic Backpropagation Neural Net.
Neural networks are a vast subject, and one under rapid development in the 21st century, as they form the basis of machine learning and artificial intelligence. Backpropagation is one of the earliest algorithms developed, and it forms the basic framework for many others.
Backpropagation began as a simple adaptive learning algorithm, and it is presented on this page in the form of a Javascript program, so that users can run the program directly or, if required, copy and adapt the algorithm into their own programs. The program is best viewed as a form of non-parametric regression, where the variables are based on Fuzzy Logic, a number between 0 (false) and 1 (true).
Fuzzy Logic
The Greek philosopher Aristotle stated that things can be true or not true, but cannot be both. Fuzzy Logic replaces this with the view that true and false are only extremes that seldom exist, while reality is mostly somewhere in between.
Mathematically this is represented as a number (y) between 0 (false) and 1 (true), and its relationship to a linear measurement (x) is represented by the logistic curve y = 1/(1+exp(-x)), as shown in the plot to the left, where a value of 0 is translated to a probability of 0.5, -∞ to 0, and +∞ to 1. If we accept <=0.05 as unlikely to be true and >=0.95 as likely to be true, then we can rescale any measurement to between -2.9444 and +2.9444, which is then logistically transformed to between 0.05 and 0.95. A program for logistic transformation is available on the Numerical Transformation Program Page.
An example of this is shown in the plot to the right, which translates the measurement of fetal blood pH into a diagnosis of acidosis, by first rescaling the normally accepted non-acidosis value of 7.35 to -2.9444 and its logistic value of 0.05, and the normally accepted acidosis value of 7.2 to +2.9444 and its logistic value of 0.95. This rescaling changes an otherwise normally distributed measurement into the bimodal one of acidosis and non-acidosis, compressing the values less than 7.2 and more than 7.35, while stretching the values in between.
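As a sketch, the rescaling and logistic transform described above can be written in Javascript as follows (the function names are ours, for illustration; they are not part of the program on this page):

```javascript
// Logistic (sigmoid) transform: maps any real x to a value between 0 and 1
function logistic(x) {
  return 1 / (1 + Math.exp(-x));
}

// Rescale a measurement so that `lo` maps to -2.9444 (logistic 0.05)
// and `hi` maps to +2.9444 (logistic 0.95), then apply the transform
function toFuzzy(value, lo, hi) {
  var z = -2.9444 + (value - lo) * (2 * 2.9444) / (hi - lo);
  return logistic(z);
}

// Fetal blood pH example: 7.35 = non-acidosis (0.05), 7.2 = acidosis (0.95).
// Note that hi < lo here, so a lower pH gives a higher probability of acidosis.
var pAcidosis = toFuzzy(7.2, 7.35, 7.2);      // close to 0.95
var pNormal = toFuzzy(7.35, 7.35, 7.2);       // close to 0.05
```

Values below 7.2 or above 7.35 are pushed towards 1 and 0 respectively, producing the bimodal distribution described above.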
Neurone
The processing unit in a Backpropagation neuronet is the perceptron, based on the concept of the nerve cell, the neurone. The unit receives one or more inputs (dendrites) and processes them to produce an output (axon). Mathematically, this is divided into two processes.
The first is to combine the inputs as y = Σw_{i}v_{i} + c, where v are the input values, w the weights given to each input, and c the bias value
The combined value (y) is then transformed into a Fuzzy Logic value between 0 (false) and 1 (true). This can be binary (>0.5 = 1, <0.5 = 0), but most commonly the logistic transform is used.
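The two steps above can be sketched in Javascript (the function name is ours, for illustration):

```javascript
// One neurone: weighted sum of inputs plus bias, then logistic transform
function neurone(inputs, weights, bias) {
  var y = bias;
  for (var i = 0; i < inputs.length; i++) {
    y += weights[i] * inputs[i];   // y = sum(w_i * v_i) + c
  }
  return 1 / (1 + Math.exp(-y));   // Fuzzy Logic output between 0 and 1
}
```

With zero weights and zero bias the output is 0.5, the logistic value of 0; large positive sums approach 1, and large negative sums approach 0.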
Neuronet and Backpropagation
The Backpropagation neuronet is an arrangement of neurones, as shown to the right, and consists of the following
The input layer, which contains as many neurones as there are inputs. In this example, there are 2 input neurones
One or more middle layers, each containing a number of neurones. In this example there is 1 middle layer containing 3 neurones
The output layer, which contains as many neurones as there are outputs. In this example, there is 1 output neurone
Training the neuronet
The coefficients (w and c) in all of the neurones in a backpropagation neuronet consist of random numbers when the neuronet is initially constructed. Training consists of presenting a series of templates (input and output) to the neuronet, which adapts (learns) through the following processes
Forward Propagation
Each input entered via the input layer is passed to each neurone of the middle layer. Each neurone then processes all its inputs (dendrites) and produces its output (axon)
If there is more than 1 middle layer, the outputs from each layer become the inputs of the next layer, until the output layer, whose neurones produce the final output values.
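Forward propagation can be sketched as follows (a minimal illustration, assuming the net is stored as an array of layers, each layer an array of neurones with weights `w` and bias `c`; these names are ours, not the program's internal layout):

```javascript
// Forward propagation: feed a row of inputs through every layer of the net.
// `net` is an array of layers; each layer is an array of { w: [...], c: bias }.
function forward(inputs, net) {
  var values = inputs;
  for (var l = 0; l < net.length; l++) {
    var next = [];
    for (var n = 0; n < net[l].length; n++) {
      var y = net[l][n].c;                     // start with the bias
      for (var i = 0; i < values.length; i++) {
        y += net[l][n].w[i] * values[i];       // weighted sum of inputs
      }
      next.push(1 / (1 + Math.exp(-y)));       // logistic transform
    }
    values = next;   // outputs of this layer become inputs of the next
  }
  return values;     // outputs of the output layer
}
```

Each layer's outputs feed the next layer, exactly as described above.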
Backward Propagation
The output values are compared with the template output values. The coefficients in each neurone (w and c) are then changed so that the results would be closer to the template output values
Going backwards through the layers, each preceding layer is similarly altered so that each neurone would produce an output that is closer to the required value
For each template in the training data set, the error is estimated by comparing the output produced with the output values in the template
The maximum error over each iteration of the whole dataset is estimated and compared with the acceptable error value. Training is re-iterated until the maximum error for an iteration is less than the acceptable error. At this point training is complete, and the values of the coefficients represent the "memory" of the training, and can be used to reproduce the template output values from inputs.
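The correction step can be sketched as follows. This is a much-simplified illustration for a single neurone (the delta rule, using the derivative of the logistic transform); the full program propagates the same kind of correction backwards through every layer, and the function name is ours:

```javascript
// Simplified weight update for one neurone, as applied at the output layer.
// The error is scaled by the logistic derivative out*(1-out), and the
// coefficients are nudged towards the template output by the learning rate.
function trainStep(inputs, target, weights, bias, rate) {
  // forward pass
  var y = bias;
  for (var i = 0; i < inputs.length; i++) y += weights[i] * inputs[i];
  var out = 1 / (1 + Math.exp(-y));
  // error times the derivative of the logistic curve
  var delta = (target - out) * out * (1 - out);
  // adjust coefficients (w and c) towards the template output
  for (var j = 0; j < inputs.length; j++) {
    weights[j] += rate * delta * inputs[j];
  }
  bias += rate * delta;
  return { weights: weights, bias: bias, error: Math.abs(target - out) };
}
```

Repeating this step over the training templates makes the error shrink towards the acceptable error value.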
Using the trained neuronet
At the end of training, the set of coefficients represents the "memory" that has been trained, and can be used to produce outputs from sets of inputs. A simple neuronet can be processed manually, but usually the set of coefficients is incorporated into a computer program or hardwired into machinery.
References
Users should be aware that neural network generally, and backpropagation in particular, have undergone dramatic development in the 21st century, and the current complexity and capability of these algorithms greatly exceed the content of this page.
The program on this page is a simple and primitive one. It can probably be used for diagnostic or therapeutic decision making in clearly defined clinical domains, with 5-20 inputs, 10-20 patterns to learn, and a training dataset of no more than a few hundred templates. It is insufficient for complex patterns that require large datasets, such as predicting share prices, company profitability, or weather forecasts, where ambiguous data, multiple causal inputs and outputs, unknown patterns, and massive training data are involved.
The following are references for beginners. They introduce the concepts, and serve as leads to further reading.
Mueller J P and Massaron L (2019) Deep Learning for Dummies. John Wiley and Sons, Inc., New Jersey. ISBN 978-1-119-54303-9. Chapters 7 and 8, p.131-162. A very good introduction to neuronets and Backpropagation
This panel explains how the program in the previous panel can be run, and provides some suggestions on how to make the program run efficiently
The Structure
This is a column of numbers which represents the number of neurones for each layer.
The minimum is 2 rows, input and output. The most common is 3 rows, with a single middle layer. In theory there can be any number of middle layers, and any number of neurones in each layer. The following general approach can be used, although each network is unique, and some trial and error may be necessary
The larger the number of inputs and outputs, the greater the complexity of patterns in the training data, the more the data values lie away from 0 and 1, and the more that similar patterns of input values are related to different outcome values, the larger the number of neurones (in terms of layers or neurones per layer) required.
Where the model is similar to regression, with linear relationships between inputs and outputs, only 2 layers (input and output) are necessary.
Where a limited number of cause and effect patterns are clearly represented in the training data, one, or at most 2, middle layers should suffice. Where many or unrecognized patterns exist in the training data, such as training a network to predict share prices, many layers and neurones are required, demanding high speed computers with dedicated processing over a long time
The number of neurones in the middle layers can be determined by trial and error. Although not obligatory, it is useful to have at least the same number of neurones as inputs in the first middle layer, and more neurones than the number of outputs in the last middle layer.
In our example (a simple XOR simulator plus a switch), there is 1 middle layer which contains 4 neurones, 1 more than the number of inputs and 3 more than the number of outputs. The example net will still train with 2 or 3 neurones in the middle layer, but requires many more iterations to reach the same precision.
The values representing the structure are placed in the Network Structure text area. In our example, there are 3 inputs and one output. The middle layer contains 4 neurones.
Data : Input and Output
All data used by the backpropagation, be they input parameters or result outputs, represent Fuzzy Logic, numerically represented as values between 0 for false and 1 for true. Real data will therefore need to be edited to conform to this. Fuzzy Logic is discussed in the introduction panel, and will not be elaborated here.
The simplest are binary groups (no/yes, false/true, female/male), where 0 for false and 1 for true can be used
For multiple groups, the easiest is to have an input for each group. For example 1 0 0 0 for group A, 0 1 0 0 for group B, 0 0 1 0 for group C and 0 0 0 1 for group D.
A more abbreviated set of dummy variables can also be used. For example 0 0 for group A, 0 1 for group B, 1 0 for group C and 1 1 for group D. This makes for a smaller neural net with shorter training runs, but the results are intuitively more difficult to use, as group names first need to be converted into a different format
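The two encodings above can be sketched as follows (the function names and the fixed group list A..D are illustrative):

```javascript
// One input column per group: A = 1 0 0 0, B = 0 1 0 0, etc.
function oneHot(group) {
  var groups = ["A", "B", "C", "D"];
  return groups.map(function (g) { return g === group ? 1 : 0; });
}

// Abbreviated 2-column dummy coding: A = 0 0, B = 0 1, C = 1 0, D = 1 1
function dummy2(group) {
  var index = ["A", "B", "C", "D"].indexOf(group);
  return [Math.floor(index / 2), index % 2];
}
```

The abbreviated coding halves the number of input columns for 4 groups, at the cost of the extra conversion step noted above.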
Measurements must be transformed to a conceptual value between 0 and 1 for false and true. We will use the height of a person to demonstrate 3 common methods of transformation, using 155cms for short and 170cms for tall
The use of a cut off to transform into two input values. Those with 155cms or less would be 1 0, 170cms or more 0 1, those in between 0 0; the combination 1 1 does not occur.
Using a straight line gradient as a single value. 155cms or less = 0, 170cms or more = 1, and the rest (ht-155) / (170-155)
Conversion to Fuzzy Logic values using the logistic transformation, which clusters values near the extremes and stretches the distances between values in between, producing a bimodal distribution of probability values between 0 and 1. In our height example, 155cms is given the probability of 0.05 and 170cms 0.95 for the transform. The logic of Fuzzy Logic is discussed in the Introduction panel of this page, and will not be further elaborated here.
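The 3 methods above can be sketched as follows, using the height example (function names are ours, for illustration):

```javascript
// Three ways to turn height (cms) into 0..1 inputs, with 155 = short, 170 = tall

// 1. Cut off: two input columns; between the cut offs both are 0
function cutOff(ht) {
  return [ht <= 155 ? 1 : 0, ht >= 170 ? 1 : 0];
}

// 2. Straight line gradient: a single value from 0 to 1
function gradient(ht) {
  if (ht <= 155) return 0;
  if (ht >= 170) return 1;
  return (ht - 155) / (170 - 155);
}

// 3. Logistic transform: 155 -> 0.05, 170 -> 0.95, bimodal in between
function logisticHt(ht) {
  var z = -2.9444 + (ht - 155) * (2 * 2.9444) / (170 - 155);
  return 1 / (1 + Math.exp(-z));
}
```

Method 1 loses information between the cut offs, method 2 is linear throughout, and method 3 compresses the extremes while stretching the middle.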
The data consists of a table, each row a case, and the columns are firstly the input values, then the output values. The columns are separated by spaces or tabs. The number of columns must be compatible with the structure of the net. In our example, there are 3 inputs and one output. This means that the data must have 4 columns for training and 3 to use the trained network. The data is placed in the Data Matrix text area.
The Default Example
The default example in the Data Matrix text area demonstrates a decision making algorithm based on the XOR pattern, which cannot be otherwise computed numerically. The basic decision is whether the first two input values are the same or different, and the third input determines how the values are displayed.
In the first 4 rows, where the value in the third column is 0, if the values in the first two columns are both 0 or both 1, the network should return the value of 0. If the values of the first two columns are different (0 1 or 1 0), then the return value is 1
In the last 4 rows, where the value in the third column is 1, the return values are reversed.
Thus the return value produced depends on 2 patterns
Whether A and B are both true or both false (1 1 or 0 0), or opposite to each other (0 1 or 1 0)
Whether the third input represents true (1) or false (0)
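The rule behind the default training data can be written out directly (this is our restatement of the pattern, not part of the program):

```javascript
// Output is the XOR of the first two inputs,
// reversed when the third (switch) input is 1
function xorSwitch(a, b, c) {
  var xor = (a !== b) ? 1 : 0;   // same -> 0, different -> 1
  return c === 1 ? 1 - xor : xor;
}
// e.g. the 8 template rows follow this rule:
// 0 1 0 -> 1,  1 1 0 -> 0,  0 1 1 -> 0,  1 1 1 -> 1
```

This is the pattern the example net must learn from the 8 template rows.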
Training Schedule
These are parameters that control the speed and precision of training the neural network. They do not matter much when the training data is brief, conceptually clear, and all the values are close to 0 and 1 (as in the example). They become increasingly important when the training data is large, the values heterogeneous, and the same set of inputs is linked to different outcomes
The learning rate is a value between 0, where no learning occurs, and 1, where the weights in the neurones are fully corrected by the values of the error found. When the training set is simple, the same training rate can be used throughout. When the training data is complex, with values not close to 0 and 1, and where the same inputs are related to different outcomes, there is a need to reduce the learning rate as training progresses, so that the result converges better. In most cases users should adjust these parameters by trial and error. They do not affect the final outcome of training, but govern the speed (number of iterations) required.
Maximum learning rate is the rate set for the start of the training. The value 1 can be used, but this tends to over-correct and thus prolong training. In most cases it should begin at about 0.8 (the default setting).
Minimum rate is the smallest training rate used, and should be the same as, or lower than, the acceptable error
Decrement is the proportion of the current training rate by which the rate is decreased as the training progresses. The value is usually set at 0.1 to 0.5, although smaller values can also be used.
Number of iterations (of the training data) per decrement is the rate at which the training rate is reduced. Given that most backpropagation training requires 500 to 50,000 cycles, this can be set at about 1/10th of the expected number of cycles required.
Acceptable Error is the error acceptable to end the training. Neural nets are based on Fuzzy Logic, where false (0) and true (1) are unattainable extremes, so the user has to determine how close to 0 and 1 they would accept. The default is set to 0.05, meaning that outcomes >= 0.95 are accepted as 1 and <= 0.05 are accepted as 0. In many cases, especially when the training data is complex, this level of precision is unattainable. There is also a need to avoid over-training, as the neural net will then model trivial variations in the data. For practical purposes, a precision closer than 0.2 is considered workable and 0.1 precise.
Maximum iterations. Training will cease when the acceptable error is attained or when the maximum number of iterations is reached. A maximum is required to stop training if the required precision is not attainable, or if the duration of training exceeds the time allowed by the browser.
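The schedule described above can be sketched as follows (the function and parameter names are ours; the decrement and per-decrement values in the example call are illustrative, while 0.8 and 0.05 are the page defaults mentioned above):

```javascript
// Learning rate schedule: start at the maximum rate, reduce it by the
// decrement (a proportion of the current rate) every `perDecrement`
// iterations, and never let it fall below the minimum rate.
function learningRate(iteration, maxRate, minRate, decrement, perDecrement) {
  var steps = Math.floor(iteration / perDecrement);
  var rate = maxRate * Math.pow(1 - decrement, steps);
  return Math.max(rate, minRate);
}

// e.g. with max 0.8, min 0.05, decrement 0.1, every 100 iterations:
var early = learningRate(0, 0.8, 0.05, 0.1, 100);      // 0.8
var later = learningRate(100, 0.8, 0.05, 0.1, 100);    // 0.8 * 0.9 = 0.72
```

Large corrections early in training settle the broad pattern; the shrinking rate then lets the net converge on a precise solution.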
The data is a table of numbers representing the desired pattern. It can have any number of rows, but the number of columns is the number of inputs (first row in the structure) plus the number of outputs (last row in the structure). For training, the number of columns must conform to input + output. To use a trained net to interpret a set of data, only the input columns are required.
The neural net text area contains the neural net, a table of coefficients, each row containing the coefficients of a neurone. When the program begins, this area is blank, as no neural network yet exists, and the program creates the neural net using random numbers.
If the neural net already exists, either because the user pasted it into the text area, or because some training has already occurred, the coefficients are used, and further modified if more training is performed
The program produces coefficients to 10 decimal places, in excess of the precision requirements in most cases, but allowing users to truncate them as preferred. In general, the number of decimal places should be 1 or 2 more than the precision of results required by the user. In our example, truncating coefficients to 3 decimal places will produce the same results.
Suggestions for training
The following steps are suggested to help users not familiar with the program
Make sure the neural net structure and the training data are compatible, in that the number of columns in the training data is the number of inputs plus the number of outputs.
Leave the default settings for training, but initially set a low value (e.g. 1000) for maximum iterations
At the end of initial training, examine the neural net produced. Adjust the training parameters, and click the "Commence Training" button again. This will use the existing neural net and further modify it.
Repeat adjusting and re-training until the required solution is obtained, or no further improvement is possible
Using the Trained Neural Net
This page also provides a platform for using the Backpropagation network once it is trained.
Requirements
The structure of the network must be stated in the Network Structure text area, and the trained network in the Neural Net Matrix text area. The two sets of numbers must be compatible, in that the numbers of inputs and outputs are the same
Clicking the Produce Program button will produce a Javascript function that will calculate the output from a row of inputs. This program can be copied and pasted into any web page that acts as an interpreter of the neural net, or used as a basis to write a function in any other computer language in another application.
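The generated function has roughly this shape (this is our illustration of the idea, not the exact code the button produces; the coefficients below are hand-picked to reproduce a 2-input XOR, whereas a real trained net's values, and this page's 3-input example, will differ):

```javascript
// Illustration only: an interpreter with trained coefficients embedded,
// forward-propagating a row of inputs to produce the output.
function interpretXor(inputs) {
  var logistic = function (x) { return 1 / (1 + Math.exp(-x)); };
  // middle layer: 2 neurones, each a weighted sum plus bias, then logistic
  var h1 = logistic(20 * inputs[0] + 20 * inputs[1] - 10);  // fires on OR
  var h2 = logistic(20 * inputs[0] + 20 * inputs[1] - 30);  // fires on AND
  // output layer: 1 neurone combining the middle-layer outputs
  return logistic(20 * h1 - 20 * h2 - 10);  // OR and not AND = XOR
}
```

Because the coefficients are fixed inside the function, such an interpreter needs no training code and can be embedded in any web page.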
To Calculate Results, a set of data is required in the Data Matrix text box. For interpretation, only the input values are required (the rest of the columns will be ignored). Clicking the Calculate Results button will produce a table, each row containing the input values, followed by the result output values.