Multi-Layer Perceptron

Linearly Inseperable Problems

A Linearly Inseperable Problem You must have realised that there are problems which can't be solved by slicing the input space with a single straight plane, however many dimensions you use. These are linearly inseparable problems. Here's a simple example, with 2-dimensional space (2 inputs). The neural network is supposed to produce a 1 if the combined input is in either of the two regions shown or 0 if it is in middle region.

The problem is that a single layer perceptron (SLP) can't produce such a division of the input space, since it is impossible to draw a straight line through that input space with the red region entirely on one side and the blue region entirely on the other. This problem and countless variations on it are generally called the Exclusive OR problem, referring to the Exclusive OR (XOR) function in digital electronics, which produces an output if either one of its inputs is true but not both.

MLP network for XOR problem The solution to this problem is to have an SLP divide up the input space as best it can and then have the outputs from that SLP interpreted by another SLP, giving a simple 1 or 0 (yes or no) answer. This SLP-piggy-backed-on-an-SLP arrangement is called a Multi-Layer Perceptron (MLP), and is probably the most common configuration for an ANN.

The diagram on the left shows the MLP for the two-dimensional XOR problem shown above - an SLP containing two neurons and feeding into a single neuron that produces the single 1 or 0 output, and the JavaScript code for the network is in this textarea:

X: <input type="text" size="4" name="input1" id="input1" /><br />
Y: <input type="text" size="4" name="input2" id="input2" /><br />
<input type="button" value="Run the network" onclick="run_net();" />

var inp = new Array(); // Assume input values read from text fields

var w = new Array();    // Weights from inputs to IPL neurons
w[1] = new Array(0,1,-1); // Weights from input 1 to IPL neurons 1 and 2
w[2] = new Array(0,1,-1); // Weights from input 2 to IPL neurons 1 and 2

var thresh = new Array(0,1.5,-0.5);  // Thresholds for IPL neurons
var IPL_outs = new Array();          // Outputs from IPL neurons
var outw = new Array(0,0.5,0.5);     // Weights from IPL neurons to OPL neuron
var outthresh = 0.2;                 // Threshold for single OPL neuron

function transfer (x)  // Simple step transfer function
{ if (x > 0)
    return 1;
  else
    return 0;
}

function run_net ()
{ // Alter the following lines depending on where/how you get the input values
  inp[1] = eval(document.getElementById("input1").value);
  inp[2] = eval(document.getElementById("input2").value);

// Run IPL
  for (var j = 1; j <= NUM_INPLAYER; j++)    // Go through all IPL neurons
   { var total = 0;
     for (var i = 1; i <= NUM_INPUTS; i++)
       total += inp[i] * w[i][j];
     IPL_outs[j] = transfer(total - thresh[j]);
   }

// The OPL consists of one neuron, so run that!
  total = 0;
  for (i = 1; i <= NUM_INPLAYER; i++)  // This time "inputs" are IPL outputs.
    total += IPL_outs[i] * outw[i];
  alert(transfer(total - outthresh));  // Simply output the value, 0 or 1.
}
</script>

Want to try it without all that "Copy and Paste" nonsense? Go ahead:
X: Y:

You will notice that some of the weight and threshold values are negative. I did say that they didn't have to be limited to the range 0 to 1. If you want to know how I calculated the values, click here.

The General MLP

Of course, that was a very simple MLP (probably the simplest that you can get). A more general one would have several neurons in both the input layer and the output layer, as shown in this diagram. Such a network would probably be referred to in the literature as a 2-layer network. I say probably, as some researchers refer to the set of inputs themselves as a layer, even though they are just points in the network where the input signal appears rather than neurons themselves, and so they call that network a 3-layer network.

The most general MLP In order to divide the input space in any arbitrary way you want, you need a Multi-Layer Perceptron with three true layers of neurons. It can be shown that the input space (of however many dimensions) can be divided up in as complicated a way as you like if your MLP has three layers - no more are needed (although you can add more layers of neurons if you want) - providing you have enough neurons in each layer, of course (Lippmann, 1987).

The second layer of neurons is called the hidden layer, as neither the inputs of those neurons nor their outputs are connected to the "outside world". The outputs of the input layer neurons feed into the hidden layer, and the outputs of the hidden layer neurons feed into the output layer.

... And now for some code!

From here on, I shall adopt a couple of conventions:

The first index of the neurons in any layer is 0, i.e. assuming that there NUM_INPUTS inputs, that their indices go 0, 1, 2, ... up to NUM_INPUTS - 1
All weights, thresholds and outputs for the input layer are preceded by i_ so
i_w[i][j] is the weight from input i to input layer neuron j.
i_thresh[j] is the threshold of input layer neuron j.
i_out[j] is the output of input layer neuron j (fed into the hidden layer).
All weights, thresholds and outputs for the hidden layer are preceded by h_ so
h_w[i][j] is the weight from the output of input layer neuron i to hidden layer neuron j.
h_thresh[j] is the threshold of hidden layer neuron j.
h_out[j] is the output of hidden layer neuron j (fed into the output layer).
All weights, thresholds and outputs for the output layer are preceded by o_ so
o_w[i][j] is the weight from the output of hidden layer neuron i to output layer neuron j.
o_thresh[j] is the threshold of output layer neuron j.
o_out[j] is the output of out layer neuron j, representing the outputs from the entire network.

function transfer (x)
{ return 1 / (1 + exp(-x));
}

function run_input_layer ()
{ var i,j;
  for (j = 0; j < NUM_INPUTS; j++)     // Go through all input layer neurons
   { var total = 0;
     for (i = 0; i < NUM_INPUTS; i++)  // For each neuron, go through all inputs
       total += i_w[i][j] * inputs[i];
     i_out[j] = transfer(total - i_thresh[j]);
   }
}

function run_hidden_layer ()
{ var i,j;
  for (j = 0; j < NUM_HIDDEN; j++)     // Go through all hidden layer neurons
   { var total = 0;
     for (i = 0; i < NUM_INPUTS; i++)  // For each neuron, go through all IPL neurons
       total += h_w[i][j] * i_out[i];
     h_out[j] = transfer(total - h_thresh[j]);
   }
}

function run_output_layer ()
{ var i,j;
  for (j = 0; j < NUM_OUTPUTS; j++)     // Go through all output layer neurons
   { var total = 0;
     for (i = 0; i < NUM_HIDDEN; i++)   // For each neuron, go through all HL neurons
       total += o_w[i][j] * h_out[i];
     o_out[j] = transfer(total - o_thresh[j]);
   }
}

function run_net ()
{ run_input_layer();
  run_hidden_layer();
  run_output_layer();
  // The final outputs from the network are in o_out[]
}

When you look at the code you will see that it consists of three almost identical functions, one for each layer. The run_input_layer() function takes the input values and multiplies them by the weights on the input layer to produce an output for the inner layer. The run_hidden_layer() function takes the outputs from the input layer and multiplies them by the weights on the hidden layer to produce an output for the hidden layer. The run_output_layer() function takes the output values for the hidden layer and multiplies them by the weights on the output layer to produce a final output for the network itself. To run the network, simply run each of the functions in order.

If you wanted to, you could add extra layers by putting in similar weights, thresholds and output values plus an extra function for that layer, though you could make your network solve more complicated problems simply by adding more hidden layer neurons.

So far we have glossed over a very important aspect of ANNs - How do we set all those weights and thresholds? After all, they form the knowledge in the ANN itself. For a MLP, there is a standard way to set weights and thresholds, the Back Propagation Method, which we shall meet in the next section...