Jay Alammar's illustrated overview of BERT-based models surfaces some not-so-obvious conclusions - for example, the best way to get a contextual word embedding isn't simply to take the model's top output layer; the article shows that concatenating the last four hidden layers works better for feature-based tasks like NER.
http://jalammar.github.io/illustrated-bert/
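Below is a minimal sketch (not part of the original post) of how one might extract both kinds of embeddings with the Hugging Face `transformers` library; the model name and example sentence are just placeholders.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Placeholder checkpoint; any BERT-style model with hidden states exposed works.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

inputs = tokenizer("The illustrated BERT post explains embeddings.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# hidden_states is a tuple: the embedding layer plus one tensor per encoder layer,
# each of shape [batch, seq_len, hidden_size].
hidden_states = outputs.hidden_states

top_layer = hidden_states[-1]                      # top output layer only
last_four = torch.cat(hidden_states[-4:], dim=-1)  # concat of last four hidden layers

print(top_layer.shape, last_four.shape)
```

The concatenated version is wider (4x the hidden size per token), so it's typically fed to a lightweight downstream classifier rather than used as a drop-in replacement for single-layer embeddings.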