for artificial intelligence, computer science and linguistics
Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch.
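As a rough illustration of what normalizing over each training mini-batch can look like, the following is a minimal sketch in plain Python rather than a definitive implementation. The function name `batch_norm`, the scale and shift parameters `gamma` and `beta`, and the stabilizing constant `eps` are assumptions introduced here for the example. Each feature is normalized using the mean and variance computed over the current mini-batch only.

```python
import math


def batch_norm(batch, gamma, beta, eps=1e-5):
    """Normalize each feature over one mini-batch, then scale and shift.

    batch: list of examples, each a list of feature values (hypothetical layout).
    gamma, beta: per-feature scale and shift (assumed parameters for this sketch).
    eps: small constant added to the variance for numerical stability (assumed).
    """
    m = len(batch)
    n_features = len(batch[0])

    # Per-feature mean over the mini-batch.
    mean = [sum(x[j] for x in batch) / m for j in range(n_features)]

    # Per-feature biased variance over the mini-batch.
    var = [
        sum((x[j] - mean[j]) ** 2 for x in batch) / m
        for j in range(n_features)
    ]

    # Normalize each value, then apply the scale and shift.
    return [
        [
            gamma[j] * (x[j] - mean[j]) / math.sqrt(var[j] + eps) + beta[j]
            for j in range(n_features)
        ]
        for x in batch
    ]


# A mini-batch of three examples with two features on very different scales.
batch = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
out = batch_norm(batch, gamma=[1.0, 1.0], beta=[0.0, 0.0])
```

With `gamma` set to one and `beta` to zero, both output features end up with approximately zero mean and unit variance over the mini-batch, regardless of their original scale. In a real network the statistics would be recomputed for every mini-batch during training, which is what makes the normalization part of the forward pass rather than a one-off preprocessing step.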