The "attention is all you need" paper did not invent attention mechanisms. It showed that existing models that were already using attention could have their non-attention parts removed and still worked. So those other parts were unnecessary and only attention was needed.