- The most commonly used LSTM architecture (vanilla LSTM) performs reasonably well on various datasets, and none of the eight modifications studied significantly improves its performance.
- Certain modifications, such as coupling the input and forget gates (CIFG) or removing peephole connections (NP), simplify the LSTM without significantly hurting performance.
- The forget gate and the output activation function are the critical components of the LSTM block. The forget gate is crucial for LSTM performance, and the output activation function is necessary whenever the cell state is unbounded.
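The findings above can be illustrated with a small NumPy sketch of one vanilla LSTM step. It assumes the standard formulation with a tanh block input, sigmoid input/forget/output gates, diagonal peephole weights, and tanh as the output activation. It also includes switches for a few of the variants mentioned above: CIFG, no peepholes, no forget gate and no output activation. The helper names (`init_params`, `lstm_step`), the flag names and the random initialisation scale are hypothetical and are not taken from the paper. The sketch covers only some of the eight variants.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(n_in, n_cell, seed=0):
    """Hypothetical random initialisation (scale 0.1 is an arbitrary choice)."""
    rng = np.random.default_rng(seed)
    params = {}
    for gate in ("z", "i", "f", "o"):  # block input + three gates
        params["W_" + gate] = rng.normal(0.0, 0.1, (n_cell, n_in))    # input weights
        params["R_" + gate] = rng.normal(0.0, 0.1, (n_cell, n_cell))  # recurrent weights
        params["b_" + gate] = np.zeros(n_cell)                         # biases
    for gate in ("i", "f", "o"):
        params["p_" + gate] = rng.normal(0.0, 0.1, n_cell)  # diagonal peephole weights
    return params


def lstm_step(x, y_prev, c_prev, P, peepholes=True, coupled_if=False,
              forget_gate=True, output_activation=True):
    """One time step of an LSTM block; the flags switch on some studied variants."""
    def pre(gate):
        # Shared pre-activation: W x_t + R y_{t-1} + b
        return P["W_" + gate] @ x + P["R_" + gate] @ y_prev + P["b_" + gate]

    peep = 1.0 if peepholes else 0.0                  # NP variant: zero the peepholes
    z = np.tanh(pre("z"))                             # block input
    i = sigmoid(pre("i") + peep * P["p_i"] * c_prev)  # input gate peeks at c_{t-1}
    if coupled_if:
        f = 1.0 - i                                   # CIFG: forget gate tied to input gate
    elif forget_gate:
        f = sigmoid(pre("f") + peep * P["p_f"] * c_prev)
    else:
        f = np.ones_like(c_prev)                      # NFG: the cell never forgets
    c = z * i + c_prev * f                            # cell state update
    o = sigmoid(pre("o") + peep * P["p_o"] * c)       # output gate peeks at the new c_t
    h = np.tanh(c) if output_activation else c        # NOAF: no output squashing
    return h * o, c                                   # (y_t, c_t)
```

In this sketch, take a zero input and zero previous output, a previous cell state of 10, no peepholes, no forget gate and no output activation. The step then returns a cell state of 10 and an output of 5, because the unsquashed cell passes straight through an output gate of 0.5. With `output_activation=True` the output stays below 1 however large the cell grows. This is one way to see why the output activation matters when the cell state is unbounded.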
2015-03-18
[1503.04069] LSTM: A Search Space Odyssey
http://arxiv.org/abs/1503.04069