If a model learns better attention, its performance usually improves. So how can we get better attention? One option is to supervise it directly: treat the attention weights as a prediction and train them with a cross-entropy loss against a target attention distribution. The catch is that we must know the correct attention values before training. If the model has multiple attention layers, there may be more than one attention loss term, for example one per layer.
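To make the idea concrete, here is a minimal sketch in plain Python, not a definitive implementation. It assumes the attention weights come from a softmax over raw scores and that the target attention is given as a probability distribution over positions. The function names (`softmax`, `attention_cross_entropy`, `total_attention_loss`), the example scores, and the choice to simply sum the per-layer losses are all assumptions made for illustration.

```python
import math

def softmax(scores):
    """Turn raw attention scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_cross_entropy(scores, target, eps=1e-12):
    """Cross entropy between the target attention and the predicted attention."""
    predicted = softmax(scores)
    # eps guards against log(0) when a predicted weight underflows
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))

def total_attention_loss(layer_scores, layer_targets):
    """Sum one loss term per attention layer (an assumed way to combine them)."""
    return sum(
        attention_cross_entropy(s, t)
        for s, t in zip(layer_scores, layer_targets)
    )

# Hypothetical example: one query attending over three positions.
# The target says position 1 is the correct one to attend to.
target = [0.0, 1.0, 0.0]
good_scores = [0.1, 3.0, 0.2]  # most weight on position 1
bad_scores = [3.0, 0.1, 0.2]   # most weight on position 0

loss_good = attention_cross_entropy(good_scores, target)
loss_bad = attention_cross_entropy(bad_scores, target)
```

In this sketch, `loss_good` comes out smaller than `loss_bad` because the first set of scores puts most of its weight on the target position. In a real model this attention loss would typically be added to the main task loss, often with a weighting factor that you would need to tune.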
Here is the full tutorial!