Multi-Threshold Neurones Perceptron

Abstract. Minsky & Papert [1] showed that a two-layer perceptron with a monotonic activation function cannot solve the XOR problem. We present a two-layer perceptron with a non-monotonic activation function which can separate sets of data that are not linearly separable. This kind of perceptron is then applied to the XOR problem, the parity problem and a pattern recognition problem. The learning strategy is detailed and its performance is evaluated.

1. Introduction. As Stork & Allen [2] noticed, there is no hard and fast rule for the number of hidden units required in a three-layer perceptron. In a two-layer perceptron, on the other hand, both layers are directly specified by the problem. It is therefore interesting to use a two-layer perceptron if it is able to separate sets of data that are not linearly separable. This is possible if the activation function is non-monotonic. Periodic activation functions have already been used, by Brady [3] or Gioello & Sorbello [4] for instance, but always in a three-layer perceptron. Consider a two-layer perceptron whose input layer contains two cells and whose output layer contains one cell, and which tries to recognise the XOR Boolean function.

Fig. 1: Two-layer perceptron

Let us assume that the activation threshold is S for each cell. If C1 and C2 are the synaptic weights between the input cells and the output cell, the input (0,0) gives an activity of 0 and therefore the result 0 (which requires S >= 0). If the two inputs (0,1) and (1,0) must give the response 1, then C1 > S and C2 > S. But then the activity of the output cell for the input (1,1) is C1+C2 > 2*S >= S, so the response is 1 and the result is false. If we instead assume that the cells have two thresholds, S1 and S2, with the response 0 if the activity is below S1, 1 between S1 and S2 and 0 above S2, the XOR problem can be solved if S1 < C1 < S2, S1 < C2 < S2 and C1+C2 > S2.

Fig. 2: Two-threshold activation function

It is easy to see that one can also choose C1 and C2 to solve the OR and AND problems. For the AND problem, C1 <= S1, C2 <= S1 and S1 < C1+C2 < S2 give a correct answer: one can choose C1 = C2 = S1/2 + ε, with S1 > 0, ε < S1/2 and S2 > S1 + 2*ε. For the OR problem, S1 < C1 < S2, S1 < C2 < S2 and S1 < C1+C2 < S2 are necessary, so one can choose C1 = C2 = S1 + ε and S2 > 2*(S1 + ε). With suitable values of S1 and S2, this perceptron can therefore solve the three Boolean operations.
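To make this concrete, here is a minimal sketch (ours, not from the paper) of such a two-threshold cell in Python; the values S1 = 1, S2 = 3 and the weights below are illustrative choices satisfying the constraints above.

def two_threshold_cell(inputs, weights, s1, s2):
    """Two-threshold cell: answer 1 when the activity lies strictly between S1 and S2."""
    activity = sum(c * e for c, e in zip(weights, inputs))
    return 1 if s1 < activity < s2 else 0

# Illustrative values (not from the paper) satisfying the constraints above.
S1, S2 = 1.0, 3.0
weight_sets = {
    "XOR": [2.0, 2.0],    # S1 < C1, C2 < S2 and C1 + C2 > S2
    "AND": [1.0, 1.0],    # C1, C2 <= S1 and S1 < C1 + C2 < S2
    "OR":  [1.25, 1.25],  # S1 < C1, C2 < S2 and S1 < C1 + C2 < S2
}
for name, w in weight_sets.items():
    print(name, [two_threshold_cell((a, b), w, S1, S2) for a in (0, 1) for b in (0, 1)])
# XOR -> [0, 1, 1, 0]   AND -> [0, 0, 0, 1]   OR -> [0, 1, 1, 1]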

2. From two-threshold neurones to multi-threshold neurones. It may be difficult to choose good threshold values for a given problem. It might be possible to change the threshold values during learning, but it seems simpler to have an infinity of thresholds, with responses alternating between 0 and 1. Such a neurone can be compared to a fork with an infinity of slots which give the response 1. A new problem arises: what is the optimal width of the slots, and what learning strategy should be used? We study these points with periodic neurones, whose activation function is defined by:

O = odd(truncate(A)), where A = Σi (Ci × Ei), Ci is the weight on the synapse from cell i of the input layer, Ei is the value (0 or 1) of this cell, and odd gives 1 for odd values and 0 otherwise. This gives an infinity of slots of width 1, with a slot of width 1 on each side of zero.
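In Python, this activation function can be sketched as a direct transcription of the definition above (a sketch, ours):

import math

def periodic_output(weights, inputs):
    """Slot neurone: O = odd(truncate(A)) with A = sum_i Ci * Ei."""
    activity = sum(c * e for c, e in zip(weights, inputs))
    return int(math.trunc(activity)) % 2   # 1 inside an "odd" slot, 0 elsewhere

# One input cell held at 1, weight swept across the slots:
print([periodic_output([w], [1]) for w in (-1.5, -0.5, 0.5, 1.5, 2.5, 3.5)])
# -> [1, 0, 0, 1, 0, 1]  (unit-width slots, answering 0 on each side of zero)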

Fig. 3: Multi-threshold activation function

The next problem is the training of a two-layer perceptron in which each cell of the output layer is a periodic neurone. For each cell of the output layer, if the activity is not correct, we modify the synaptic weights of the links leading to this cell from the active cells of the input layer. The weights are modified so as to increase or decrease A until it reaches the nearest correct slot: for example, if A = 1.7 (so O = 1) and we want O = 0, then we have to increase A up to 2 + ε. The correction is spread evenly over all the weights leading to this cell: if there are 3 weights to correct, each will be increased by (2 + ε - 1.7)/3. To avoid oscillations we introduce a random factor in the correction: in our example each weight will be increased by (2 + ε - 1.7 + r)/3, where r is a random value in [0, 1[.
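Below is a sketch of this correction rule together with a training loop for the XOR perceptron of the next section. The margin EPS, the search over nearby slot edges and the sign given to the random term are our own assumptions; the paper only gives the increasing example A = 1.7 → 2 + ε.

import math
import random

EPS = 0.1   # margin used to land just inside the target slot (assumed value)

def activity(weights, inputs):
    return sum(c * e for c, e in zip(weights, inputs))

def output(a):
    return int(math.trunc(a)) % 2          # the periodic activation defined above

def nearest_target(a, desired):
    """Activity value just inside the nearest slot whose answer is `desired`."""
    best = None
    for edge in range(math.floor(a) - 2, math.floor(a) + 3):   # slot edges around A
        for cand in (edge + EPS, edge - EPS):                  # just above / just below an edge
            if output(cand) == desired and (best is None or abs(cand - a) < abs(best - a)):
                best = cand
    return best

def train_step(weights, inputs, desired):
    """Spread the correction (target - A + r) evenly over the weights of active inputs."""
    a = activity(weights, inputs)
    if output(a) == desired:
        return
    active = [i for i, e in enumerate(inputs) if e]            # only links from active cells
    if not active:
        return                                                 # the input (0,0) cannot be corrected
    target = nearest_target(a, desired)
    r = random.random()                                        # random factor in [0, 1[
    delta = (target - a + math.copysign(r, target - a)) / len(active)
    for i in active:
        weights[i] += delta

if __name__ == "__main__":
    random.seed(0)
    patterns = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]   # XOR
    weights = [0.0, 0.0]                    # every synaptic weight starts at 0
    for epoch in range(1, 21):
        errors = 0
        for inputs, desired in patterns:
            if output(activity(weights, inputs)) != desired:
                errors += 1
                train_step(weights, inputs, desired)
        print(f"Epoch {epoch}  {errors} errors  C1={weights[0]:.6f}  C2={weights[1]:.6f}")
        if errors == 0:
            break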

3. Some experiments with a periodic perceptron. We have tested this model on several problems.

· The first one is the XOR problem. The neural network is very simple: two input cells and one output cell:

Fig. 4: XOR periodic perceptron

Here are the results of two training sequences. Convergence is obtained rapidly. The initial value of every synaptic weight is 0.

Test No. 1

Epoch 1   3 errors   C1 = 0.558643   C2 = 0.181427
Epoch 2   2 errors   C1 = 0.654978   C2 = 0.339543
Epoch 3   2 errors   C1 = 1.405688   C2 = 1.277607
Epoch 4   0 errors   C1 = 1.405688   C2 = 1.277607

Test No. 2

Epoch 1   2 errors   C1 = 0.351805   C2 = 0.110382
Epoch 2   3 errors   C1 = 0.508447   C2 = 0.431842
Epoch 3   3 errors   C1 = 0.645629   C2 = 1.066659
Epoch 4   1 error    C1 = 1.198431   C2 = 1.066659
Epoch 5   0 errors   C1 = 1.198431   C2 = 1.066659

· The second is the parity problem. The input layer has 16 binary cells and the output layer a single cell, whose value is 1 if and only if the number of input cells with value 1 is even. The training stage is much faster than with a three-layer perceptron trained by backpropagation. Here are the results of three test sequences, with the number of errors in each epoch and the synaptic weights after training (a sketch of how such an experiment can be set up is given after these results):

Test No. 1

Epoch 1  Errors =42314     Epoch 6  Errors =8         Epoch 10  Errors =2        
Epoch 2  Errors =42266     Epoch 7  Errors =6         Epoch 11  Errors =4        
Epoch 3  Errors =18348     Epoch 8  Errors =6         Epoch 12  Errors =1        
Epoch 4  Errors =47        Epoch 9  Errors =4         Epoch 13  Errors =0        
Epoch 5  Errors =17                                                              

Final values of the synaptic weights:

-1.002 -1.170 -1.052 -1.026 -1.110 -1.066 0.842 0.998

-1.031 0.973 0.938 -1.031 0.964 -1.007 -1.032 0.838

---------------------------------------------------------------------------

Test No. 2

Epoch 1  Errors =42236     Epoch 7   Errors =183      Epoch 13  Errors =8
Epoch 2  Errors =42452     Epoch 8   Errors =24       Epoch 14  Errors =6
Epoch 3  Errors =42179     Epoch 9   Errors =15       Epoch 15  Errors =2
Epoch 4  Errors =42166     Epoch 10  Errors =9        Epoch 16  Errors =0
Epoch 5  Errors =42406     Epoch 11  Errors =3
Epoch 6  Errors =42159     Epoch 12  Errors =7

Final values of the synaptic weights:

-1.179 -1.035 -1.045 -1.153 -1.001 -1.053 -3.021 -1.002

-1.023 -1.005 -3.045 -1.074 0.998 2.962 -1.057 2.770

---------------------------------------------------------------------------

Test No. 3

Epoch 1  Errors =42283     Epoch 4  Errors =19        Epoch 7  Errors =1         
Epoch 2  Errors =42328     Epoch 5  Errors =22        Epoch 8  Errors =0         
Epoch 3  Errors =5529      Epoch 6  Errors =8                                    

Final values of the synaptic weights:

-1.025 -1.022 -1.099 -1.046 -1.005 -1.038 -1.037 -1.015

0.926 -1.006 0.996 0.979 0.995 -3.001 2.786 0.628
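The error counts of the first epochs (around 42 000) suggest that each epoch runs over all 2^16 = 65536 input patterns; this is our assumption, not a statement of the paper. Under that assumption, the training set for this experiment could be generated as follows (a minimal sketch):

from itertools import product

# All 65536 patterns of 16 binary inputs; target 1 when the count of ones is even.
parity_set = [(bits, 1 - sum(bits) % 2) for bits in product((0, 1), repeat=16)]

print(len(parity_set))            # 65536 patterns presented in each epoch
print(parity_set[0])              # all zeros: zero ones, even, answer 1
print(parity_set[1])              # a single one: odd, answer 0

Each pattern can then be presented to a 16-input periodic cell and corrected with the train_step rule sketched in section 2.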

· The third problem is a pattern recognition problem: we want to recognise the ten digits, each of them represented in a 12x12 matrix of dots that looks like a digital calculator display:

Fig. 5: Models of the digits


The input layer then has 64 cells (only 64 of the dots are really used) and the output layer has 10 cells. The learning stage is very quick and convergence is obtained after 6 epochs. We then test the ability of the neural network to retrieve the right answer when the input is disturbed by switching N dots to the opposite value (N varying from 5 to 40). During this test stage, if more than one output cell is active, only the best one is considered: the best cell is the one whose activity A is nearest to the centre of a slot.
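A sketch of this selection rule, under our reading of "nearest to the centre of a slot" as the activity whose fractional part is closest to 0.5:

import math

def best_output_cell(activities):
    """Among the active output cells, pick the one closest to the middle of its slot."""
    active = [(i, a) for i, a in enumerate(activities)
              if int(math.trunc(a)) % 2 == 1]                # cells currently answering 1
    if not active:
        return None
    def off_centre(a):
        return abs(abs(a) % 1.0 - 0.5)                       # distance to the slot centre
    return min(active, key=lambda ia: off_centre(ia[1]))[0]

print(best_output_cell([0.3, 1.48, 2.2, 1.9]))   # -> 1 (activity 1.48, almost mid-slot)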

Fig. 6: Disturbed inputs, N = 22



For each N, this test is repeated 1000 times (100 times for each digit) and the mean percentage of good answers is computed. The same test is performed with a standard two-layer perceptron in order to compare the results. Here are the results of this comparison, giving for each value of N the percentage of good answers:

Fig. 7: Percentage of good answers with noisy inputs
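The noise test itself can be sketched as follows; digit_patterns and classify stand for the stored digit models and the trained network, which are not reproduced here (these names are ours):

import random

def flip_dots(pattern, n):
    """Return a copy of a binary pattern with n randomly chosen dots inverted."""
    noisy = list(pattern)
    for i in random.sample(range(len(noisy)), n):
        noisy[i] = 1 - noisy[i]
    return noisy

def recognition_rate(digit_patterns, classify, n, trials_per_digit=100):
    """Mean percentage of correct answers over noisy copies of the ten digits."""
    correct = 0
    for label, pattern in enumerate(digit_patterns):
        for _ in range(trials_per_digit):
            if classify(flip_dots(pattern, n)) == label:
                correct += 1
    return 100.0 * correct / (len(digit_patterns) * trials_per_digit)

# Example use: recognition_rate(digit_patterns, classify, n) for n in range(5, 41, 5)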



The results are much better with the slot neurone than with the classical perceptron, and the learning is quicker.

4. Conclusions. The two-layer perceptron with periodic neurones, which can solve XOR-type problems, can replace the MLP in many situations. The learning time is short, the generalisation abilities are good and the structure of the network is completely determined by the problem: there is no longer a hidden layer whose size must be chosen empirically. The size of the input layer is determined by the input data and that of the output layer by the results. The only parameter to tune is the random factor of the learning algorithm, which can be done easily with a few attempts. As an example, we used a two-layer periodic perceptron to do speech synthesis of Guadeloupean Creole from texts [5]: with a classical three-layer neural network the training stage on 52 sentences was very long, more than 12 hours being necessary (on a 486 PC). The periodic perceptron performs the learning in only 6 minutes. Moreover, the generalisation ability of the periodic perceptron is better (95%) than that of the classical MLP (90%).




References:

[1] M. Minsky, S. Papert, Perceptrons: An Introduction to Computational Geometry, Cambridge, MA: MIT Press, 1969.

[2] D.G. Stork, J.D. Allen, How to Solve the N-Bit Parity Problem With Two Hidden Units, Neural Networks, vol. 5, no. 6, Nov.-Dec. 1992.

[3] M.J. Brady, Guaranteed learning algorithm for networks with units having periodic threshold output functions, Neural Computation, vol. 2, pp. 405-408, 1990.

[4] M. Gioello, F. Sorbello, A new VLSI neural device with sinusoidal activation function for handwritten classification, Neural Networks & Their Applications, Marseille, 20-22 March 1996.

[5] B. Filliatre, R. Racca, Synthèse vocale du créole par un réseau de neurones (Speech synthesis of Creole with a neural network), 8th International Colloquium on Creole Studies, Pointe-à-Pitre, 6-10 May 1996.