In SimCLR we show the network two different augmentations of the same sample and train it to recognize that both views depict one and the same thing.
This way we train the network to look past the augmentations and focus on what's important in the image. (3/4)
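That "recognize it's the same thing" objective is what SimCLR's NT-Xent contrastive loss does: pull the two views of a sample together, push everything else in the batch apart. A minimal numpy sketch (function name, shapes, and temperature value are my own choices, not from the thread):

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent loss over projections z1, z2 of two augmented views
    of the same batch (each of shape (N, d)). Toy numpy sketch."""
    z = np.concatenate([z1, z2], axis=0)               # (2N, d) all views
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit vectors -> dot = cosine sim
    sim = z @ z.T / temperature                        # (2N, 2N) pairwise similarities
    np.fill_diagonal(sim, -np.inf)                     # a view is never its own negative
    n = len(z1)
    # positive for row i is the other view of the same sample
    targets = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    return float(np.mean(logsumexp - sim[np.arange(2 * n), targets]))
```

Identical views give a low loss; unrelated vectors give a higher one, which is exactly the gradient signal that teaches the encoder to ignore the augmentations.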
If you look at the first picture, you may ask yourself: why do we need a separate projection from h to z at all?
It turns out it's a side effect of the contrastive loss! The z vectors keep only the bare minimum of information needed to recognize the similarity, so for downstream tasks we use h instead. (4/4)
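The split looks roughly like this in code; a toy sketch where plain linear+ReLU layers stand in for the real ResNet encoder and MLP projection head (all names and sizes are my own assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(x, W):
    # stand-in for the backbone f(.): produces the representation h
    return np.maximum(x @ W, 0.0)

def projection_head(h, W1, W2):
    # stand-in for the small MLP g(.): maps h to z for the loss only
    return np.maximum(h @ W1, 0.0) @ W2

x  = rng.normal(size=(4, 16))    # a batch of 4 (already augmented) inputs
W  = rng.normal(size=(16, 32))
W1 = rng.normal(size=(32, 32))
W2 = rng.normal(size=(32, 8))

h = encoder(x, W)                # h: kept for downstream tasks after pretraining
z = projection_head(h, W1, W2)   # z: fed to the contrastive loss, then discarded
```

After pretraining, g(.) is thrown away and h is what you fine-tune or probe, precisely because z has been squeezed down to "just enough to match the views".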