Modern Hopfield network

Modern Hopfield networks^[1]^[2] (also known as Dense Associative Memories^[3]) are generalizations of the classical Hopfield networks that break the linear scaling relationship between the number of input features and the number of stored memories. This is achieved by introducing stronger non-linearities (either in the energy function or neurons’ activation functions) leading to super-linear^[3] (even an exponential^[4]) memory storage capacity as a function of the number of feature neurons. The network still requires a sufficient number of hidden neurons.^[5]

The key theoretical idea behind the Modern Hopfield networks is to use an energy function and an update rule that are more sharply peaked around the stored memories in the space of neurons' configurations than those of the classical Hopfield network.^[3]

Classical Hopfield networks

Hopfield networks^[6]^[7] are recurrent neural networks with dynamical trajectories converging to fixed point attractor states and described by an energy function. The state of each model neuron $i$ is defined by a time-dependent variable $V_{i}$, which can be chosen to be either discrete or continuous. A complete model describes the mathematics of how the future state of activity of each neuron depends on the known present or previous activity of all the neurons.

In the original Hopfield model of associative memory,^[6] the variables were binary, and the dynamics were described by a one-at-a-time update of the state of the neurons. An energy function quadratic in the $V_{i}$ was defined, and the dynamics consisted of changing the activity of each single neuron $i$ only if doing so would lower the total energy of the system. This same idea was extended to the case of $V_{i}$ being a continuous variable representing the output of neuron $i$, and $V_{i}$ being a monotonic function of an input current. The dynamics became expressed as a set of first-order differential equations for which the "energy" of the system always decreased.^[7] The energy in the continuous case has one term which is quadratic in the $V_{i}$ (as in the binary model), and a second term which depends on the gain function (neuron's activation function). While having many desirable properties of associative memory, both of these classical systems suffer from a small memory storage capacity, which scales linearly with the number of input features.^[6]
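The binary model described above can be illustrated with a minimal sketch. The Hebbian weight rule `T[i][j] = sum_mu xi[mu][i] * xi[mu][j]`, the function names, and the two example patterns are assumptions made for illustration; the text only specifies a quadratic energy and one-at-a-time updates that never raise it.

```python
import random

def hebbian_weights(patterns):
    """Symmetric weights T[i][j] = sum over patterns of xi[i] * xi[j], zero diagonal.

    The Hebbian rule is an assumption of this sketch.
    """
    n = len(patterns[0])
    T = [[0] * n for _ in range(n)]
    for xi in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    T[i][j] += xi[i] * xi[j]
    return T

def energy(T, V):
    """Quadratic energy E = -1/2 * sum_ij T[i][j] * V[i] * V[j]."""
    n = len(V)
    return -0.5 * sum(T[i][j] * V[i] * V[j] for i in range(n) for j in range(n))

def async_sweep(T, V, rng):
    """Visit every neuron once, in random order, updating one at a time."""
    V = list(V)
    order = list(range(len(V)))
    rng.shuffle(order)
    for i in order:
        field = sum(T[i][j] * V[j] for j in range(len(V)))
        # Aligning V[i] with its local field cannot increase the energy.
        V[i] = 1 if field >= 0 else -1
    return V

rng = random.Random(0)
patterns = [[1, -1, 1, -1, 1, -1, 1, -1],
            [1, 1, 1, 1, -1, -1, -1, -1]]
T = hebbian_weights(patterns)
noisy = list(patterns[0])
noisy[0] = -noisy[0]  # corrupt one bit of the first stored pattern
recovered = async_sweep(T, noisy, rng)
print(recovered == patterns[0], energy(T, noisy), energy(T, recovered))
```

With these two orthogonal patterns a single sweep restores the corrupted bit, and the energy of the recovered state is lower than that of the noisy query.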

Discrete variables

A simple example^[3] of the Modern Hopfield network can be written in terms of binary variables $V_{i}$ that represent the active ($V_{i}=+1$) and inactive ($V_{i}=-1$) states of the model neuron $i$, with the energy function

E=-\sum \limits _{\mu =1}^{N_{\text{mem}}}F{\Big (}\sum \limits _{i=1}^{N_{f}}\xi _{\mu i}V_{i}{\Big )}

In this formula the weights $\xi _{\mu i}$ represent the matrix of memory vectors (index $\mu =1,\ldots ,N_{\text{mem}}$ enumerates different memories, and index $i=1,\ldots ,N_{f}$ enumerates the content of each memory corresponding to the $i$-th feature neuron), and the function $F(x)$ is a rapidly growing non-linear function. The update rule for individual neurons (in the asynchronous case) can be written in the following form

V_{i}^{(t+1)}=\operatorname {sign} {\bigg [}\sum \limits _{\mu =1}^{N_{\text{mem}}}{\bigg (}F{\Big (}\xi _{\mu i}+\sum \limits _{j\neq i}\xi _{\mu j}V_{j}^{(t)}{\Big )}-F{\Big (}-\xi _{\mu i}+\sum \limits _{j\neq i}\xi _{\mu j}V_{j}^{(t)}{\Big )}{\bigg )}{\bigg ]}

which states that in order to calculate the updated state of the $i$-th neuron the network compares two energies: the energy of the network with the $i$-th neuron in the ON state and the energy of the network with the $i$-th neuron in the OFF state, given the states of the remaining neurons. The updated state of the $i$-th neuron is the one with the lower of the two energies.^[3]
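The energy and the asynchronous update rule above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the choice $F(x)=x^{3}$, and the three example memories are hypothetical and not taken from the cited work.

```python
def dam_energy(memories, V, F):
    """E = -sum_mu F(sum_i xi[mu][i] * V[i])."""
    return -sum(F(sum(x * v for x, v in zip(xi, V))) for xi in memories)

def dam_update_neuron(memories, V, i, F):
    """New state of neuron i: compare the network with V[i] = +1 against V[i] = -1."""
    diff = 0
    for xi in memories:
        # Overlap of this memory with all neurons except neuron i.
        rest = sum(xi[j] * V[j] for j in range(len(V)) if j != i)
        diff += F(xi[i] + rest) - F(-xi[i] + rest)
    return 1 if diff >= 0 else -1

def dam_sweep(memories, V, F):
    """One asynchronous pass: update neurons 0, 1, ... in turn."""
    V = list(V)
    for i in range(len(V)):
        V[i] = dam_update_neuron(memories, V, i, F)
    return V

def cubic(x):
    """Example non-linearity F(x) = x^3 (an assumption of this sketch)."""
    return x ** 3

memories = [[1, -1, 1, -1, 1, -1, 1, -1],
            [1, 1, 1, 1, -1, -1, -1, -1],
            [1, 1, -1, -1, 1, 1, -1, -1]]
query = list(memories[0])
query[0] = -query[0]  # corrupt one bit of the first memory
retrieved = dam_sweep(memories, query, cubic)
print(retrieved == memories[0], dam_energy(memories, query, cubic),
      dam_energy(memories, retrieved, cubic))
```

Because $F$ grows quickly, the memory with the largest overlap dominates the sum over $\mu$, so the corrupted bit is pulled back toward that memory.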

In the limiting case when the non-linear energy function is quadratic, $F(x)=x^{2}$, these equations reduce to the familiar energy function and the update rule for the classical binary Hopfield network.^[6]
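The reduction can be checked directly by substituting $F(x)=x^{2}$ into the energy and the update rule. The shorthand $T_{ij}$ and $r_{\mu }$ is introduced here only for this derivation. The classical energy is usually written with a factor of 1/2 and without the diagonal terms; since $V_{i}^{2}=1$, those differences amount to a positive overall factor and an additive constant, neither of which changes the minima.

```latex
% Shorthand introduced for this derivation (not in the source):
%   T_{ij} = \sum_{\mu} \xi_{\mu i} \xi_{\mu j},
%   r_{\mu} = \sum_{j \neq i} \xi_{\mu j} V_{j}^{(t)}
\begin{aligned}
E &= -\sum_{\mu=1}^{N_{\text{mem}}} \Big( \sum_{i=1}^{N_f} \xi_{\mu i} V_i \Big)^{2}
   = -\sum_{i,j=1}^{N_f} T_{ij} V_i V_j , \\
F(\xi_{\mu i} + r_\mu) - F(-\xi_{\mu i} + r_\mu)
  &= (r_\mu + \xi_{\mu i})^{2} - (r_\mu - \xi_{\mu i})^{2}
   = 4\, \xi_{\mu i}\, r_\mu , \\
V_i^{(t+1)} &= \operatorname{sign}\Big[ 4 \sum_{\mu=1}^{N_{\text{mem}}} \xi_{\mu i} \sum_{j \neq i} \xi_{\mu j} V_j^{(t)} \Big]
  = \operatorname{sign}\Big[ \sum_{j \neq i} T_{ij} V_j^{(t)} \Big].
\end{aligned}
```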

The memory storage capacity of these networks can be calculated for random binary patterns. For the power energy function $F(x)=x^{n}$ the maximal number of memories that can be stored and retrieved from this network without errors is given by^[3]

N_{\text{mem}}^{\max }\approx {\frac {1}{2(2n-3)!!}}{\frac {N_{f}^{n-1}}{\ln(N_{f})}}

For an exponential energy function $F(x)=e^{x}$ the memory storage capacity is exponential in the number of feature neurons^[4]

N_{\text{mem}}^{\max }\approx 2^{N_{f}/2}
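To get a feel for how these two capacity estimates scale, they can be evaluated numerically. The sketch below simply transcribes the two formulas above; the function names and the choice of $N_{f}=100$ are illustrative assumptions.

```python
import math

def double_factorial(k):
    """k!! for odd k >= -1, using the convention (-1)!! = 1!! = 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result

def power_capacity(n, n_features):
    """Estimate for F(x) = x^n: N_f^(n-1) / (2 * (2n-3)!! * ln N_f), for n >= 2."""
    denominator = 2 * double_factorial(2 * n - 3) * math.log(n_features)
    return n_features ** (n - 1) / denominator

def exponential_capacity(n_features):
    """Estimate for F(x) = e^x: 2^(N_f / 2)."""
    return 2.0 ** (n_features / 2)

n_features = 100  # illustrative value
for n in (2, 3, 4):
    print(f"n={n}: about {power_capacity(n, n_features):.0f} memories")
print(f"exponential: about {exponential_capacity(n_features):.3g} memories")
```

For $n=2$ the power-law estimate reduces to $N_{f}/(2\ln N_{f})$, and each increment of $n$ multiplies the leading power of $N_{f}$ by another factor of $N_{f}$.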

Continuous variables

Modern Hopfield networks or Dense Associative Memories can be best understood in continuous variables and continuous time.^[1]^[5] Consider the network architecture, shown in Fig.1, and the equations for the neurons' state evolution^[5]

{\begin{cases}\tau _{f}{\frac {dx_{i}}{dt}}=\sum \limits _{\mu }\xi _{i\mu }f_{\mu }-x_{i}+I_{i}\\\tau _{h}{\frac {dh_{\mu }}{dt}}=\sum \limits _{i}\xi _{\mu i}g_{i}-h_{\mu }\end{cases}}

where the currents of the feature neurons are denoted by $x_{i}$, and the currents of the memory neurons are denoted by $h_{\mu }$ ($h$ stands for hidden neurons). Here $\tau _{f}$ and $\tau _{h}$ are the time constants of the two groups of neurons, and $I_{i}$ is the input current to feature neuron $i$. There are no synaptic connections among the feature neurons or the memory neurons. A matrix $\xi _{\mu i}$ denotes the strength of synapses from a feature neuron $i$ to the memory neuron $\mu$. The synapses are assumed to be symmetric, so that the same value characterizes a different physical synapse from the memory neuron $\mu$ to the feature neuron $i$. The outputs of the memory neurons and the feature neurons are denoted by $f_{\mu }$ and $g_{i}$, which are non-linear functions of the corresponding currents. In general these outputs can depend on the currents of all the neurons in that layer so that $f_{\mu }=f(\{h_{\mu }\})$ and $g_{i}=g(\{x_{i}\})$. It is convenient to define these activation functions as derivatives of the Lagrangian functions for the two groups of neurons
