If you look at the first picture, you may ask yourself: why do we need to treat the projection from h to z separately?
It turns out to be a side effect of the contrastive loss! The z vectors keep only the bare minimum of information needed to recognize similarity, which is why h, not z, is typically the representation you keep for downstream tasks. (4/4)
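
To make the h/z split concrete, here is a minimal pure-Python sketch. It assumes a SimCLR-style setup (an encoder producing h, then a small projection head producing z) and uses an NT-Xent-style loss as one common example of a contrastive loss. The thread does not say which loss or architecture is used, so the names `encode`, `project`, `nt_xent`, the random weights, and all sizes are hypothetical and for illustration only.

```python
import math
import random


def linear(x, w):
    """Multiply weight matrix w (a list of rows) by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def relu(v):
    return [max(0.0, a) for a in v]


def rand_matrix(rng, rows, cols):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)  # small epsilon guards against zero vectors


def nt_xent(z, temperature=0.5):
    """NT-Xent-style loss; z[2k] and z[2k+1] are two views of the same sample."""
    n = len(z)
    total = 0.0
    for i in range(n):
        j = i ^ 1  # index of the positive partner (0<->1, 2<->3, ...)
        pos = cosine(z[i], z[j]) / temperature
        # Denominator runs over every other vector, positive included.
        logits = [cosine(z[i], z[k]) / temperature for k in range(n) if k != i]
        total += math.log(sum(math.exp(l) for l in logits)) - pos
    return total / n


rng = random.Random(0)
W_ENC = rand_matrix(rng, 8, 4)    # hypothetical toy encoder: x (4-d) -> h (8-d)
W_HEAD1 = rand_matrix(rng, 8, 8)  # hypothetical projection head, hidden layer
W_HEAD2 = rand_matrix(rng, 3, 8)  # hypothetical projection head: -> z (3-d)


def encode(x):
    return relu(linear(x, W_ENC))


def project(h):
    return linear(relu(linear(h, W_HEAD1)), W_HEAD2)


# Two noisy "views" per sample stand in for data augmentation.
samples = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(3)]
views = [[xi + rng.gauss(0.0, 0.05) for xi in x] for x in samples for _ in range(2)]

hs = [encode(v) for v in views]   # representation kept for downstream use
zs = [project(h) for h in hs]     # only z ever reaches the loss
loss = nt_xent(zs)
print("contrastive loss on z:", loss)
```

Note that the loss only ever sees z. In a trained model that pressure to discard anything not needed for matching views would land on the projection head, leaving h free to stay richer. Nothing is trained here; the sketch only shows where the loss attaches.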