This is an implementation of the paper "A Structured Self-Attentive Sentence Embedding" (https://arxiv.org/pdf/1703.03130.pdf), published at ICLR 2017. The paper proposes a self-attention mechanism and a new model for extracting an interpretable sentence embedding. Instead of using a single vector, it represents the embedding with a 2-D matrix, each row of which attends to a different part of the sentence.
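Concretely, with H the n x 2u matrix of hidden states from a bidirectional LSTM run over the n tokens, the paper computes an r x n annotation matrix A = softmax(W_s2 tanh(W_s1 H^T)) and takes M = A H as the sentence embedding. A minimal NumPy sketch of that computation (shapes follow the paper's notation):

```python
import numpy as np

def structured_self_attention(H, W_s1, W_s2):
    """Compute A = softmax(W_s2 tanh(W_s1 H^T)) and M = A H.

    H:    (n, 2u)  BiLSTM hidden states, one row per token
    W_s1: (da, 2u) and W_s2: (r, da), both without bias terms
    """
    scores = W_s2 @ np.tanh(W_s1 @ H.T)                     # (r, n), one score row per hop
    e = np.exp(scores - scores.max(axis=1, keepdims=True))  # numerically stable softmax
    A = e / e.sum(axis=1, keepdims=True)                    # each row sums to 1 over the tokens
    return A, A @ H                                         # M = A @ H has shape (r, 2u)
```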
The implementation uses the IMDB dataset with the following parameters:
top_words = 10000
learning_rate = 0.001
max_seq_len = 200
emb_dim = 300
batch_size = 500
u = 64
da = 32
r = 16
top_words: only consider the top 10,000 most common words
u: number of hidden units in each unidirectional LSTM (so the BiLSTM hidden state has dimension 2u)
da: hidden dimension of the attention MLP; a hyperparameter we can set arbitrarily
r: number of different parts to be extracted from the sentence (the number of attention hops)
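Below is a minimal sketch of how these hyperparameters could wire together in Keras. It is not the repo's exact self-attention.py, and the choice of Adam as the optimizer is an assumption:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Hyperparameters listed above
top_words, max_seq_len, emb_dim = 10000, 200, 300
u, da, r = 64, 32, 16

inputs = layers.Input(shape=(max_seq_len,))
x = layers.Embedding(top_words, emb_dim)(inputs)                    # (batch, n, emb_dim)
H = layers.Bidirectional(layers.LSTM(u, return_sequences=True))(x)  # (batch, n, 2u)

# Structured self-attention: A = softmax(W_s2 tanh(W_s1 H^T)), M = A H
scores = layers.Dense(r, use_bias=False)(
    layers.Dense(da, activation="tanh", use_bias=False)(H))         # (batch, n, r)
A = layers.Softmax(axis=1)(scores)   # each of the r hops is a distribution over tokens
M = layers.Dot(axes=(1, 1))([A, H])  # (batch, r, 2u) sentence embedding matrix

outputs = layers.Dense(1, activation="sigmoid")(layers.Flatten()(M))
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss="binary_crossentropy", metrics=["accuracy"])
```

With the data loaded via tf.keras.datasets.imdb.load_data(num_words=top_words) and padded to max_seq_len, training would then be model.fit(x_train, y_train, batch_size=500, epochs=4).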
To train the model, run:

python self-attention.py
Running this for 4 epochs gives a training accuracy of 94% and a test accuracy of 87%.
To do:
- Penalization term (see the sketch below)
- Results on other datasets
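For reference, the penalization term proposed in the paper (not yet implemented here) is P = ||A A^T - I||_F^2, the squared Frobenius norm, added to the loss to discourage the r attention hops from all focusing on the same tokens. A minimal NumPy sketch of that term:

```python
import numpy as np

def attention_penalty(A):
    """Penalization term from the paper: P = ||A A^T - I||_F^2.

    A is the (r, n) annotation matrix. The identity target pushes each
    pair of hops toward low overlap and each individual hop toward a
    focused (roughly one-hot) distribution over the tokens.
    """
    r = A.shape[0]
    return np.square(A @ A.T - np.eye(r)).sum()
```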