1/10) Can we find a label-free model selection algorithm in Self-supervised learning (#SSL) for #vision?
Our #NeurIPS2022 paper, α-ReQ, presents just such an #algorithm based on the structure of #representations in the #neocortex!
This is work w/ Kumar Agrawal (#Berkeley), Arnab Kumar Mondal and Arna Ghosh (#McGill/#Mila). 🧵
paper: https://openreview.net/forum?id=ii9X4vtZGTZ
blog: https://people.eecs.berkeley.edu/~krishna/blog/nn/ssl.html
2/10) #SSL models show great promise and can learn #representations from large-scale unlabelled data. But identifying the best model across different #hyperparameter configs means measuring downstream task performance, which needs #labels and adds #compute time + resources. 😕
3/10) Driven by recent 🧠 findings in the #visual #cortex, we propose using the slope of the #eigenspectrum decay of the representation #covariance, termed α, as a measure of representation quality for #SSL model representations.
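A minimal sketch (our phrasing, not necessarily the paper's exact code) of how α can be estimated: take the eigenspectrum of the representation covariance and fit a power law λ_n ∝ n^(−α) with a linear fit in log-log space.

```python
import numpy as np

def estimate_alpha(features: np.ndarray) -> float:
    """Estimate the eigenspectrum decay coefficient α from SSL representations.

    features: (num_samples, dim) array of representations (no labels needed).
    """
    centered = features - features.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (len(features) - 1)   # representation covariance
    eigvals = np.linalg.eigvalsh(cov)[::-1]             # eigenvalues, largest first
    eigvals = eigvals[eigvals > 1e-12]                   # drop numerically-zero modes
    ranks = np.arange(1, len(eigvals) + 1)
    # Power law λ_n ∝ n^(-α)  ⇒  slope of log λ_n vs. log n is -α
    slope, _ = np.polyfit(np.log(ranks), np.log(eigvals), deg=1)
    return -slope
```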
4/10) In past work, @Computingnature and @marius10p et al. recorded responses to natural images in mouse #visual #cortex and found that the variance encoded in the n-th dimension scales roughly as 1/n, i.e., α ~ 1.
5/10) In our paper, we study the #eigenspectrum of representations from #DNNs trained with different loss functions, architectures, and datasets, and assess the corresponding out-of-distribution (#OoD) #generalization performance.
6/10) We find that activations across different layers have an #eigenspectrum that follows a #powerlaw. Furthermore, well-defined intervals exist for the power law decay coefficient, α, where models exhibit excellent #OoD #generalization! 📈🎉🥳
7/10) This finding led to our #proposal: Can we use α for #modelSelection in an #SSL pipeline?
Two key advantages of α:
1. α doesn’t require labels
2. α is quick to #compute (compared to training a readout)
We study hyperparam selection in #BarlowTwins (Zbontar et al.) as a case study!
8/10) We find that α correlates more strongly with downstream task performance than the #BarlowTwins loss itself! 🤯
Based on this result, we propose a model selection #algorithm that reduces the number of #readout evals needed to identify the best #hyperparameters. 🤓
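For intuition, here's a hypothetical snippet (placeholder numbers, not the paper's data) of how one could compare the two label-free signals via Spearman rank correlation with downstream accuracy:

```python
from scipy.stats import spearmanr

# One entry per trained hyperparameter config — illustrative placeholders only.
alphas = [0.9, 1.1, 1.6, 2.4, 3.1]
bt_losses = [410.0, 395.0, 381.0, 372.0, 368.0]
downstream_accs = [0.72, 0.74, 0.68, 0.60, 0.54]

rho_alpha, _ = spearmanr(alphas, downstream_accs)
rho_loss, _ = spearmanr(bt_losses, downstream_accs)
print(f"|rho(alpha, acc)| = {abs(rho_alpha):.2f}  vs  |rho(BT loss, acc)| = {abs(rho_loss):.2f}")
```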
9/10) Our model selection #algorithm is as follows:
Use α to filter out bad models, and run the readout eval on a downstream task only for the “good” ones.
Under a fixed #compute budget, this cuts the #readout evals from #linear to #logarithmic growth in the number of #configs. 🎉🥳
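One way this recipe could look in code — a rough sketch, not the paper's exact algorithm; the α interval below and the `get_features` / `readout_accuracy` helpers are hypothetical stand-ins:

```python
def select_config(configs, get_features, readout_accuracy,
                  alpha_low=0.8, alpha_high=1.2):
    """Shortlist configs by α (label-free), then readout-evaluate only the shortlist."""
    shortlist = []
    for cfg in configs:
        feats = get_features(cfg)               # SSL representations, no labels needed
        alpha = estimate_alpha(feats)           # from the earlier estimate_alpha sketch
        if alpha_low <= alpha <= alpha_high:    # keep only "good" models
            shortlist.append(cfg)
    # Expensive, label-dependent readout eval runs only on the shortlist
    return max(shortlist, key=readout_accuracy, default=None)
```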