[1503.04069] LSTM: A Search Space Odyssey
http://arxiv.org/abs/1503.04069
-
The most commonly used LSTM architecture (vanilla
LSTM) performs reasonably well on various datasets,
and none of the eight investigated modifications
significantly improves its performance.
-
Certain modifications such as coupling the input
and forget gates or removing peephole connections
simplify LSTM without significantly hurting performance.
-
The forget gate and the output activation function are
the critical components of the LSTM block. The forget
gate is crucial for LSTM performance, while the output
activation function is necessary whenever the cell state
is unbounded.
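
The points above can be illustrated with a minimal NumPy sketch of a single
LSTM step. This is not the paper's code: the function and parameter names
(init_params, lstm_step, couple_gates, output_activation) are hypothetical,
peephole connections are simply left out, and the coupled variant assumes
the forget gate is tied to the input gate as f = 1 - i.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(n_in, n_hidden, seed=0):
    """Small random weights; each matrix acts on the concatenation [x, h_prev]."""
    rng = np.random.default_rng(seed)
    shape = (n_hidden, n_in + n_hidden)
    params = {}
    for name in ("z", "i", "f", "o"):  # block input, input/forget/output gates
        params["W_" + name] = rng.normal(0.0, 0.1, size=shape)
        params["b_" + name] = np.zeros(n_hidden)
    return params


def lstm_step(x, h_prev, c_prev, p, couple_gates=False, output_activation=np.tanh):
    """One LSTM step without peepholes (illustrative sketch only)."""
    xh = np.concatenate([x, h_prev])
    z = np.tanh(p["W_z"] @ xh + p["b_z"])   # block input, in (-1, 1)
    i = sigmoid(p["W_i"] @ xh + p["b_i"])   # input gate
    if couple_gates:
        f = 1.0 - i                         # coupled input and forget gate
    else:
        f = sigmoid(p["W_f"] @ xh + p["b_f"])
    o = sigmoid(p["W_o"] @ xh + p["b_o"])   # output gate
    c = f * c_prev + i * z                  # new cell state
    h = o * output_activation(c)            # squashed by the output activation
    return h, c


params = init_params(n_in=3, n_hidden=4)
h, c = np.zeros(4), np.zeros(4)
for x in np.random.default_rng(1).normal(size=(50, 3)):
    h, c = lstm_step(x, h, c, params, couple_gates=True)
```

With coupled gates the new cell state is a convex combination of the old
state and the block input, so a state that starts at zero stays within
[-1, 1]. With independent gates the state can grow without bound, which is
the case where the output activation matters: tanh keeps the block output
bounded, whereas an identity output activation passes the large state
through.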