1/10) Can we find a label-free model selection algorithm in Self-supervised learning (#SSL) for #vision?
Our #NeurIPS2022 paper, α-ReQ, presents just such an #algorithm based on the structure of #representations in the #neocortex!
This is work w/ Kumar Agrawal (#Berkeley), Arnab Kumar Mondal and Arna Ghosh (#McGill/#Mila). 🧵
paper: https://openreview.net/forum?id=ii9X4vtZGTZ
blog: https://people.eecs.berkeley.edu/~krishna/blog/nn/ssl.html
2/10) #SSL models show great promise and can learn #representations from large-scale unlabelled data. But identifying the best model across different #hyperparameter configs means measuring downstream task performance, which needs #labels and adds #compute time + resources. 😕
3/10) Driven by recent 🧠 findings in the #visual #cortex, we propose using the slope of the #eigenspectrum decay of the representation #covariance, termed α, as a measure of representation quality for #SSL model representations.
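A minimal sketch (our phrasing, not necessarily the paper's exact code) of how α can be estimated: take the eigenspectrum of the representation covariance and fit a power law λ_n ∝ n^(−α) with a linear fit in log-log space.

```python
import numpy as np

def estimate_alpha(features: np.ndarray) -> float:
    """Estimate the eigenspectrum decay coefficient α from SSL representations.

    features: (num_samples, dim) array of representations (no labels needed).
    """
    centered = features - features.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (len(features) - 1)   # representation covariance
    eigvals = np.linalg.eigvalsh(cov)[::-1]             # eigenvalues, largest first
    eigvals = eigvals[eigvals > 1e-12]                   # drop numerically-zero modes
    ranks = np.arange(1, len(eigvals) + 1)
    # Power law λ_n ∝ n^(-α)  ⇒  slope of log λ_n vs. log n is -α
    slope, _ = np.polyfit(np.log(ranks), np.log(eigvals), deg=1)
    return -slope
```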
4/10) In past work, @Computingnature and @marius10p et al. recorded responses to natural images in mouse #visual #cortex and found that the variance encoded in the n-th dimension scales roughly as 1/n, i.e., α ~ 1.
5/10) In our paper, we study the #eigenspectrum of representations from #DNNs trained with different loss functions, architectures, and datasets, and assess the corresponding out-of-distribution (#OoD) #generalization performance.
6/10) We find that activations across different layers have an #eigenspectrum that follows a #powerlaw. Furthermore, well-defined intervals exist for the power law decay coefficient, α, where models exhibit excellent #OoD #generalization! 📈🎉🥳
7/10) This finding led to our #proposal: Can we use α for #modelSelection in an #SSL pipeline?
Two key advantages of α:
1. α doesn’t require labels
2. α is quick to #compute (compared to training a readout)
We study hyperparam selection in #BarlowTwins (Zbontar et al.) as a case study!
8/10) We find that α correlates more strongly with downstream task performance than the #BarlowTwins loss itself! 🤯
Based on this result, we propose a model selection #algorithm that reduces the number of #readout evals needed to identify the best #hyperparameters. 🤓
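For intuition, here's a hypothetical snippet (placeholder numbers, not the paper's data) of how one could compare the two label-free signals via Spearman rank correlation with downstream accuracy:

```python
from scipy.stats import spearmanr

# One entry per trained hyperparameter config — illustrative placeholders only.
alphas = [0.9, 1.1, 1.6, 2.4, 3.1]
bt_losses = [410.0, 395.0, 381.0, 372.0, 368.0]
downstream_accs = [0.72, 0.74, 0.68, 0.60, 0.54]

rho_alpha, _ = spearmanr(alphas, downstream_accs)
rho_loss, _ = spearmanr(bt_losses, downstream_accs)
print(f"|rho(alpha, acc)| = {abs(rho_alpha):.2f}  vs  |rho(BT loss, acc)| = {abs(rho_loss):.2f}")
```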
9/10) Our model selection #algorithm is as follows:
Use α to filter out bad models, and run the readout eval on a downstream task only for the “good” ones.
Under a fixed #compute budget, this cuts the #readout evals from #linear to #logarithmic growth in the number of #configs. 🎉🥳
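One way this recipe could look in code — a rough sketch, not the paper's exact algorithm; the α interval below and the `get_features` / `readout_accuracy` helpers are hypothetical stand-ins:

```python
def select_config(configs, get_features, readout_accuracy,
                  alpha_low=0.8, alpha_high=1.2):
    """Shortlist configs by α (label-free), then readout-evaluate only the shortlist."""
    shortlist = []
    for cfg in configs:
        feats = get_features(cfg)               # SSL representations, no labels needed
        alpha = estimate_alpha(feats)           # from the earlier estimate_alpha sketch
        if alpha_low <= alpha <= alpha_high:    # keep only "good" models
            shortlist.append(cfg)
    # Expensive, label-dependent readout eval runs only on the shortlist
    return max(shortlist, key=readout_accuracy, default=None)
```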