2006 International Joint Conference on Neural Networks Sheraton Vancouver Wall Centre Hotel, Vancouver, BC, Canada July 16-21, 2006
Stability of Equilibrium Points and Storage Capacity of Hopfield Neural Networks with Higher Order Nonlinearity Mohammad Reza Rajati, Mohammad Bagher Menhaj, Member, IEEE
Abstract—In this paper, we consider the storage capacity and stability of so-called Hopfield neural networks with higher order nonlinearity. There are different ways to introduce higher order nonlinearity to the network; we consider one that does not incur a large computational cost. It is shown that this modification of the Hopfield model significantly improves the storage capacity. We also classify the model via a stability measure, and study the effect of training the network with biased patterns on stability.
M. R. Rajati is an alumnus of the Department of Electrical Engineering, Amirkabir University of Technology (Tehran Polytechnic), Tehran, 15914, Iran (phone: +98-918-839-0095; fax: +98-21-640-6009; e-mail: [email protected]).
M. B. Menhaj is with the Department of Electrical Engineering, Amirkabir University of Technology (Tehran Polytechnic), Tehran, 15914, Iran (e-mail: [email protected]).
0-7803-9490-9/06/$20.00 ©2006 IEEE

I. INTRODUCTION

The associative memory application of the Hopfield neural network is an important and distinguished part of the neurocomputing literature. Although the basic Hopfield model with Hebbian learning suffers from many weaknesses and limitations, its dynamic behavior and interesting characteristics have provided a notable system-theoretic basis for thorough investigation. In this article, we consider an architecture used to improve the storage capacity of the model. We also determine the storage capacity and the behavior of the network at high loadings from the viewpoint of stability. Our approach is based on a specific measure of stability which is explained in the paper. Other approaches to the stability of Hopfield neural networks with higher order correlation matrices can be found in the literature [1], [2], but the higher order models, including the model we analyze [3], have not been considered for stability analysis via the stability measure discussed in this article.

In Section II, we define the concept of relative capacity. In Section III, we review the basic Hopfield model and one way of introducing higher order nonlinearity to it. In Section IV, we consider the stability of the equilibrium points of the network. In Section V, we classify the higher order network via the stability measure of Section IV. In the last section, we present some ideas for future work.

II. RELATIVE CAPACITY

The relative capacity of the Hopfield network is a common criterion for measuring the amount of information stored in the network and is defined by

\alpha_c = \frac{L}{N}

in which L is the number of stored patterns and N is the number of neurons. It is frequently reported that the maximum \alpha tolerated by the basic model with Hebbian learning is 0.14 for randomly realized unbiased patterns [4].

III. INTRODUCING HIGHER ORDER NONLINEARITY TO THE MODEL

Introducing higher order nonlinearity to the Hopfield model has been proposed as a way to improve the storage capacity of the network (see [3], [5], [6], [7], and [8] for example). Hopfield employs Hebbian learning to obtain the memory matrix W:

W = \sum_{i=1}^{L} p_i p_i^T

in which p_i is the i-th bipolar pattern to be stored in the network. The network is governed by the following difference equations:

n_i(t+1) = \sum_{j=1}^{N} w_{ij} a_j(t) + b_i

a_i(t+1) = f(n_i(t+1))

where a_i is the output of neuron i, n_i is the synaptic signal, and f(u) = sgn(u), which acts component by component. a_i(0) is set equal to the input pattern to be recognized. One way of introducing higher order nonlinearity to the model without significant computational cost is [3]:

n(t+1) = \sum_{i=1}^{L} p_i \left( p_i^T a(t) \right)^k

a(t+1) = f(n(t+1))

This approach is similar to the matched-filter technique used in the pattern recognition literature [9]. Basically, the inner product of each stored pattern with the input is used as a weight in a linear combination of the stored patterns. Raising each weight to the k-th power further strengthens the similarity of the input to the closest prototype pattern and suppresses all the others. Thus, intuitively, the architecture is expected to reduce the number of spurious patterns.
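The higher order recall rule above can be sketched as follows. This is an illustrative NumPy implementation, not the authors' original simulation code; the pattern sizes, seed, and helper names are our own choices:

```python
import numpy as np

def store_patterns(N, L, rng):
    """Generate L random bipolar (+1/-1) patterns of length N."""
    return rng.choice([-1.0, 1.0], size=(L, N))

def recall(patterns, a0, k=2, max_iter=50):
    """Iterate n(t+1) = sum_i p_i (p_i^T a(t))^k, a(t+1) = sgn(n(t+1))."""
    a = a0.copy()
    for _ in range(max_iter):
        overlaps = patterns @ a            # p_i^T a for every stored pattern
        n = patterns.T @ (overlaps ** k)   # recombine patterns, weights raised to k
        a_new = np.where(n >= 0, 1.0, -1.0)
        if np.array_equal(a_new, a):       # reached a fixed point
            break
        a = a_new
    return a

# Probe with a corrupted copy of a stored pattern
rng = np.random.default_rng(0)
P = store_patterns(100, 5, rng)
probe = P[0].copy()
probe[:10] *= -1                           # Hamming distance 10 from P[0]
print(np.array_equal(recall(P, probe, k=2), P[0]))
```

At this light loading, the k-th-power weighting makes the matching pattern's coefficient dominate the cross-talk terms, so the probe is pulled back to the stored pattern.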
[Figures 1 and 2: normalized overlap versus number of stored patterns (0 to 600)]
Figure 1. Normalized overlap of the quadratic network (k=2). The input pattern has a Hamming distance of 25 from the stored pattern. Each point on the graph is obtained over 20 runs.
Figure 2. Normalized overlap of the cubic network (k=3). The input pattern has a Hamming distance of 25 from the stored pattern. Each point on the graph is obtained over 20 runs.
Similarly, it can be recognized that this fact improves the storage capacity of the network. To obtain the relative capacity of higher order Hopfield models, computer simulations were carried out for 100 neurons. The network is fed a modified version of one of the stored patterns (with a constant Hamming distance of 25 from it), and the normalized overlap of the output with the corresponding pattern is measured by

1 - \frac{H_d}{N}

in which H_d stands for the Hamming distance between the two patterns. The overlap is measured against the number of patterns stored in the network. The network recalls the patterns almost correctly even at very high loadings (we used k=2 and k=3; see Fig. 1 and Fig. 2). Notably, increasing the order of nonlinearity improves the storage capacity significantly: the cubic network (k=3) can recall patterns perfectly even at a loading of \alpha = 600\%, far above the theoretical upper bound on relative loading (200%) given by Gardner [10] for conventional Hopfield neural networks.
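A single trial of this experiment can be sketched as follows. This is an illustrative reconstruction; the loading, seed, and function names are assumptions, not the authors' exact setup:

```python
import numpy as np

def normalized_overlap(x, y):
    """1 - H_d/N, where H_d is the Hamming distance between bipolar x and y."""
    return 1.0 - np.sum(x != y) / x.size

def capacity_trial(N, L, k, dist, rng, iters=20):
    """Store L random patterns, probe with a stored pattern corrupted in
    `dist` randomly chosen positions, and report the recall overlap."""
    P = rng.choice([-1.0, 1.0], size=(L, N))
    probe = P[0].copy()
    probe[rng.choice(N, size=dist, replace=False)] *= -1
    a = probe
    for _ in range(iters):                 # synchronous higher order updates
        a = np.where(P.T @ ((P @ a) ** k) >= 0, 1.0, -1.0)
    return normalized_overlap(a, P[0])

print(capacity_trial(100, 20, 3, 25, np.random.default_rng(1)))
```

Averaging such trials over 20 runs for increasing L reproduces the kind of overlap-versus-loading curves shown in Fig. 1 and Fig. 2.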
IV. STABILITY OF THE STORED PATTERNS

The conventional Hopfield net is proved to minimize the energy function

E(t) = -\frac{1}{2} \sum_i \sum_j w_{ij} a_i(t) a_j(t)

The network evolves in time until it reaches a stable equilibrium point, at which the output of the network no longer changes. (It may reach unstable and periodic states as well.) For a pattern p^l to be stored in the network as a stable state, the following parameter must be non-negative for every i:

\theta_i^l = n_i^l p_i^l
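For the conventional Hebbian network this condition is easy to check numerically; the following is a minimal sketch (network size and seed are arbitrary choices of ours):

```python
import numpy as np

def stability_margins(W, p):
    """theta_i = n_i * p_i with n = W p; the pattern p is stored as a stable
    state only if every theta_i is non-negative."""
    return (W @ p) * p

# Hebbian storage of a few random bipolar patterns
rng = np.random.default_rng(2)
N, L = 100, 5
P = rng.choice([-1.0, 1.0], size=(L, N))
W = P.T @ P                     # W = sum_i p_i p_i^T

theta = stability_margins(W, P[0])
print(np.all(theta >= 0))
```

At this low loading the self-term of the Hebbian matrix dominates the cross-talk, so all margins come out non-negative.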
It is obvious that the weight matrix can be multiplied by any positive parameter without any effect on stability. Therefore, we can define the following parameter as a normalized stability measure:

\gamma_i^l = \frac{n_i^l p_i^l}{\sqrt{\sum_{j=1}^{N} w_{ij}^2}}

The minimum value of the \gamma_i^l is a parameter related to the size of the network's attraction domains [11].

V. ABBOT'S METHOD FOR CLASSIFICATION OF HOPFIELD NETWORKS
Abbot classified Hopfield networks into three groups [12]. Members of each class behave in the same way when the loading on the network is near \alpha_{max}, although they may behave differently when loaded below it. The groups of recurrent associative memories differ in the distribution of their \gamma values when loaded near \alpha_{max}.
The distribution of \gamma values of the first group has a normal shape with a mean of

\frac{1}{\sqrt{\alpha}} \qquad (\alpha < 0.15)
This group is known as the Hopfield group because the Hopfield network with Hebbian learning falls into it. The \gamma distribution of a member may contain negative \gamma values, i.e. such networks can have unstable equilibrium points. The members of the second group have matrices of the pseudo-inverse type; their \gamma values theoretically converge to a single value \gamma_0:

\gamma_0 = \frac{1-\alpha}{\sqrt{\alpha}}
So we expect a notch-shaped distribution of \gamma values in our numerical results (i.e. the width of the distribution must be much less than its minimum value), provided \alpha is near the maximum loading tolerated by the network, \alpha_{max}.
The third group has a clipped normal distribution with positive values.

To determine which group higher order networks belong to, we trained two 1000-neuron higher order networks with 1500 random patterns. This loading is near 150%, which is high for the quadratic network (k=2) and near its saturation point. It may not seem very high for the cubic network (k=3), which can bear a 600% loading, but simulation shows that the results of this paper hold for even higher loadings. We therefore analyze the \gamma distributions of the two networks at the same loading, with both networks trained on the same set of patterns.

The \gamma distributions are depicted in Fig. 3 and Fig. 4. The shapes of the distributions are clearly normal and there are no unstable patterns; thus these networks belong to the pseudo-inverse class of models. The results are similar to those obtained for some high-capacity learning rules [13]. (We carried out the simulation several times and did not encounter unstable states at all.) Note that the \gamma values need additional normalization because of the higher order nature of the network, but for the sake of simplicity we did not normalize them again. If they were normalized, the notch shape of the distributions would be more obvious. With the normalization applied to the conventional Hopfield model, the \gamma values of the networks with higher order nonlinearity are greater, which may be interpreted as a consequence of their better performance and the stable nature of their equilibrium points. The cubic network (k=3) has a sharper \gamma distribution, and its minimum \gamma value is somewhat larger than in the case k=2. These facts lead us to conclude that increasing the order of nonlinearity improves the stability performance.

We also examined the effect of bias in higher order nets.
We trained the same networks with random patterns, but the probability of a 1 appearing in each pattern was increased to 90%. The \gamma distributions (Fig. 5, Fig. 6) clearly show that the network stability and the size of the attraction basins decrease when the network is trained with biased patterns, and some unstable patterns emerge in this case, in contrast to some other networks of the pseudo-inverse group [13].
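As an illustrative sketch of how such a \gamma histogram can be produced, the following computes \gamma values for a conventional first-order Hebbian network trained on biased patterns. The substitution of the first-order network for the paper's higher order ones is ours (their \gamma values need additional normalization), and the sizes and seed are arbitrary:

```python
import numpy as np

def gamma_values(W, P):
    """Normalized stability measure gamma_i^l = n_i^l p_i^l / ||w_i||
    for every neuron i (columns) and stored pattern l (rows)."""
    norms = np.sqrt(np.sum(W ** 2, axis=1))   # Euclidean norm of each weight row
    return (P @ W.T) * P / norms              # n^l = W p^l, broadcast over rows

rng = np.random.default_rng(3)
N, L, p_one = 1000, 150, 0.9                  # biased patterns: P(bit = +1) = 0.9
P = np.where(rng.random((L, N)) < p_one, 1.0, -1.0)
W = P.T @ P                                   # Hebbian storage

g = gamma_values(W, P)
print(g.min() < 0)                            # negative gammas = unstable states
```

With strongly biased patterns the cross-talk terms all push in the same direction, so negative \gamma values (unstable stored states) appear, consistent with the degradation described above.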
VI. CONCLUSION

We tried to classify Hopfield neural networks with higher order nonlinearity according to the stability of their equilibrium points. As we noted, higher order Hopfield networks have significantly better performance and storage capacity than Hopfield networks with Hebbian learning. We also noted that increasing the order of nonlinearity increases the network performance. Some points remain to be determined, including the size and shape of the basins of attraction and a statistical upper bound on the storage capacity, which are currently under investigation by the authors. The results will be theoretically invaluable if derived for the general case, i.e. not specialized to a particular k.

REFERENCES

[1] P. Baldi and S. S. Venkatesh, "Number of stable points for spin glasses and neural networks of higher orders," Phys. Rev. Lett., vol. 58, pp. 913-916, 1987.
[2] D. Burnshtein, "Long term attraction in higher order neural networks," IEEE Trans. Neural Networks, vol. 9, pp. 42-50, Jan. 1998.
[3] H.-M. Tai and T.-L. Jong, "Neural networks with higher order nonlinearity," IEE Electronics Letters, vol. 24, pp. 1225-1226, Sept. 1988.
[4] R. J. McEliece, E. C. Posner, E. R. Rodemich, and S. S. Venkatesh, "The capacity of the Hopfield associative memory," IEEE Trans. Inf. Theory, vol. 33, pp. 461-482, Jul. 1987.
[5] H. H. Chen, Y. C. Lee, G. Z. Sun, H. Y. Lee, T. Maxwell, and C. Lee Giles, "High order correlation model for associative memory," in Proc. AIP Conference on Neural Networks for Computing, Snowbird, 1986, pp. 86-99.
[6] Y. C. Lee, G. Doolen, H. H. Chen, G. Z. Sun, T. Maxwell, H. Y. Lee, and C. L. Giles, "Machine learning using a higher order correlation network," Physica, vol. 22D, pp. 276-306, 1986.
[7] D. Psaltis, C. H. Park, and J. Hong, "Higher-order associative memories and their optical implementations," Neural Networks, vol. 1, no. 2, pp. 149-163, 1988.
[8] T. J. Sejnowski, "Higher-order Boltzmann machines," in Proc. AIP Conference on Neural Networks for Computing, Snowbird, 1986, pp. 398-403.
[9] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis, Wiley, New York, 1973.
[10] E. Gardner, "The phase space of interactions in neural network models," J. Phys. A, vol. 21, pp. 257-270, 1988.
[11] W. Krauth and M. Mezard, "Learning algorithms with optimal stability in neural networks," J. Phys. A: Math. Gen., vol. 20, pp. L745-L752, 1987.
[12] L. F. Abbot and T. B. Kepler, "Universality in the space of interactions for network models," J. Phys. A: Math. Gen., vol. 22, pp. 2031-2038, 1989.
[13] M. R. Rajati and M. B. Menhaj, "On classification of some Hopfield-type learning rules via stability measures," Lecture Notes in Computer Science, to be published.
[Figures 3 to 6: histograms of \gamma values (frequency versus gamma)]
Figure 3. Distribution of \gamma values in the quadratic network (k=2)
Figure 4. Distribution of \gamma values in the cubic network (k=3)
Figure 5. Distribution of \gamma values for the quadratic neural network trained with biased patterns
Figure 6. Distribution of \gamma values for the cubic neural network trained with biased patterns