knowledge, the teacher is able to provide the neural network with a desired response for that training vector. Indeed, the desired response represents the "optimum" action to be performed by the neural network. The network parameters are adjusted under the combined influence of the training vector and the error signal. The error signal is defined as the difference between the desired response and the actual response of the network. This adjustment is carried out iteratively in a step-by-step fashion with the aim of eventually making the neural network emulate the teacher; the emulation is presumed to be optimum in some statistical sense. In this way, knowledge of the environment available to the teacher is transferred to the neural network through training and stored in the form of "fixed" synaptic weights, representing long-term memory. When this condition is reached, we may then dispense with the teacher and let the neural network deal with the environment completely by itself.
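The step-by-step adjustment described above can be illustrated with a minimal sketch. It assumes a single linear neuron and a weight update proportional to the product of the error signal and the input; these choices, the learning rate `eta`, and the hypothetical `teacher` function standing in for the teacher's knowledge are illustrative assumptions, not details fixed by the text.

```python
# Minimal sketch of learning with a teacher (assumed: one linear neuron,
# update proportional to error signal times input, learning rate eta).

def train_error_correction(samples, eta=0.1, epochs=200):
    """samples: list of (x, d) pairs; x is the training vector, d the desired response."""
    w = [0.0] * len(samples[0][0])          # synaptic weights (free parameters)
    for _ in range(epochs):
        for x, d in samples:
            y = sum(wi * xi for wi, xi in zip(w, x))   # actual response of the network
            e = d - y                                  # error signal: desired minus actual
            # adjust weights under the combined influence of the input and the error
            w = [wi + eta * e * xi for wi, xi in zip(w, x)]
    return w


def teacher(x):
    """Hypothetical teacher: supplies the desired response for a training vector."""
    return 2.0 * x[0] - 1.0 * x[1]


inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
samples = [(x, teacher(x)) for x in inputs]
weights = train_error_correction(samples)
print(weights)  # approaches the teacher's coefficients, 2.0 and -1.0
```

Once training has finished, the weights are held fixed and `teacher` is no longer called, which mirrors dispensing with the teacher.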
The form of supervised learning we have just described is the basis of error-correction learning. From Fig. 24, we see that the supervised-learning process constitutes a closed-loop feedback system, but the unknown environment is outside the loop. As a performance measure for the system, we may think in terms of the mean-square error, or the sum of squared errors over the training sample, defined as a function of the free parameters (i.e., synaptic weights) of the system. This function may be visualized as a multidimensional error-performance surface, or simply error surface, with the free parameters as coordinates. The true error surface is averaged over all possible input-output examples. Any given operation of the system under the teacher's supervision is represented as a point on the error surface. For the system to improve performance over time and therefore learn from the teacher, the operating point has to move down successively toward a minimum point of the error surface; the minimum point may be a local minimum or a global minimum. A supervised learning system is able to do this with the useful information it has about the gradient of the error surface corresponding to the current behavior of the system.
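To make the error surface concrete, the following sketch evaluates the sum of squared errors as a function of the weights and moves the operating point downhill along the negative gradient. It assumes a single linear neuron, a factor of 1/2 in the error measure, a fixed step size `eta`, and a small hypothetical training sample; none of these are specified in the text. For a linear neuron the surface is a quadratic bowl with a single minimum, whereas local minima can arise in nonlinear networks.

```python
# Sketch of an error surface and gradient descent on it (assumed: linear neuron,
# E(w) = 1/2 * sum of squared errors, fixed step size eta).

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def sum_squared_error(w, samples):
    """Height of the error surface at the point w."""
    return 0.5 * sum((d - dot(w, x)) ** 2 for x, d in samples)


def gradient(w, samples):
    """Gradient of the error surface with respect to each weight."""
    g = [0.0] * len(w)
    for x, d in samples:
        e = d - dot(w, x)              # error signal for this example
        for i, xi in enumerate(x):
            g[i] -= e * xi             # dE/dw_i = -sum(e * x_i)
    return g


def descend(w, samples, eta=0.1, steps=100):
    """Move the operating point downhill; return final weights and the error history."""
    path = [sum_squared_error(w, samples)]
    for _ in range(steps):
        g = gradient(w, samples)
        w = [wi - eta * gi for wi, gi in zip(w, g)]
        path.append(sum_squared_error(w, samples))
    return w, path


# Hypothetical training sample: (input vector, desired response) pairs.
samples = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0),
           ([1.0, 1.0], 1.0), ([0.5, -0.5], 1.5)]
w_final, path = descend([0.0, 0.0], samples)
print(path[0], path[-1])  # error at the starting point and after descent
```

The list `path` records the height of the operating point after each step, so a non-increasing sequence indicates that the system is moving down the surface.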