Abstract
The modern Hopfield network generalizes the classical Hopfield network by
allowing for sharper interaction functions. This increases the capacity of the
network as an autoassociative memory as nearby learned attractors will not
interfere with one another. However, implementing the network requires raising
the dot product of memory vectors and probe vectors to large powers. If the
dimension of the data is large, the intermediate values can become very large
and cause problems for floating-point arithmetic in a practical
implementation. We describe this problem in detail, modify the original network
description to mitigate the problem, and show that the modification does not alter
the network's dynamics during update or training. We also show that our modification
greatly improves hyperparameter selection for the modern Hopfield network: it
removes the dependence of the hyperparameters on the interaction vertex, so
that the optimal region of hyperparameters no longer changes significantly with
the interaction vertex, as it does in the original network.
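
The floating-point problem described above can be illustrated with a minimal sketch. It assumes bipolar (+1/-1) vectors and a polynomial interaction function that raises the memory-probe dot product to the power n (the interaction vertex). The function names and the values of d and n are hypothetical. Dividing the dot product by the dimension d is shown only as one plausible form of rescaling and is not necessarily the modification proposed in this work.

```python
import math


def polynomial_interaction(memory, probe, n):
    """Raise the memory-probe dot product to the power n (hypothetical helper)."""
    dot = float(sum(m * p for m, p in zip(memory, probe)))
    try:
        return dot ** n
    except OverflowError:
        # Python raises OverflowError when a float power exceeds the
        # double-precision range; report the overflow as infinity.
        return math.inf


def scaled_polynomial_interaction(memory, probe, n):
    """Same interaction, with the dot product divided by the dimension first.

    Assumption for illustration only: for bipolar vectors the scaled dot
    product lies in [-1, 1], so its powers cannot overflow.
    """
    d = len(memory)
    dot = float(sum(m * p for m, p in zip(memory, probe)))
    return (dot / d) ** n


d = 1000  # data dimension (illustrative value)
n = 150   # interaction vertex (illustrative value)

memory = [1] * d
probe = [1] * d                           # probe identical to the memory
noisy_probe = [-1] * 100 + [1] * (d - 100)  # probe with 100 flipped entries

raw = polynomial_interaction(memory, probe, n)
scaled = scaled_polynomial_interaction(memory, probe, n)
scaled_noisy = scaled_polynomial_interaction(memory, noisy_probe, n)

print("raw:", raw)
print("scaled:", scaled)
print("scaled (noisy probe):", scaled_noisy)
```

With these values the unscaled interaction is 1000 raised to the power 150, which exceeds the double-precision range, while the scaled version stays within [0, 1] for both probes.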