1/10) Can we find a label-free model-selection algorithm for self-supervised learning (#SSL) in #vision?

Our #NeurIPS2022 paper, α-ReQ, presents just such an #algorithm based on the structure of #representations in the #neocortex!

This is work w/ Kumar Agrawal (#Berkeley), Arnab Kumar Mondal and Arna Ghosh (#McGill/#Mila). 🧵

paper: openreview.net/forum?id=ii9X4v
blog: people.eecs.berkeley.edu/~kris

#AI #ML #deeplearning

2/10) #SSL models show great promise and can learn #representations from large-scale unlabelled data. But identifying the best model across different #hyperparameter configs requires measuring downstream-task performance, which needs #labels and adds #compute time and resources. 😕

#AI #ML #deeplearning

3/10) Driven by recent 🧠 findings in the #visual #cortex, we propose the slope of the #eigenspectrum decay of the representation #covariance, termed α, as a label-free measure of #SSL representation quality.
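For intuition: α is the (negated) slope of the line relating log eigenvalue to log rank. A minimal NumPy sketch of one way to estimate it (the function name and fit range are illustrative assumptions, not the paper's exact recipe):

```python
import numpy as np

def estimate_alpha(features, n_min=10, n_max=1000):
    """Estimate the eigenspectrum decay coefficient alpha.

    features: (n_samples, n_dims) array of representations.
    Fits lambda_n ~ n^(-alpha) by linear regression in log-log
    space over eigenvalue ranks [n_min, n_max].
    """
    X = features - features.mean(axis=0, keepdims=True)
    cov = X.T @ X / (X.shape[0] - 1)              # representation covariance
    eigvals = np.linalg.eigvalsh(cov)[::-1]       # eigenvalues, descending
    n_max = min(n_max, int(np.sum(eigvals > 0)))  # keep strictly positive ones
    log_rank = np.log(np.arange(n_min, n_max + 1))
    log_eig = np.log(eigvals[n_min - 1 : n_max])
    slope, _ = np.polyfit(log_rank, log_eig, deg=1)  # slope of log-log fit
    return -slope
```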

#AI #ML #deeplearning

4/10) In past work, @Computingnature and @marius10p et al. recorded responses to natural images in mouse #visual #cortex and found that the variance encoded in the n-th dimension scales roughly as 1/n, i.e., α ~ 1.
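In symbols (my paraphrase of that finding):

```latex
% Variance in the n-th principal dimension decays as a power law:
\lambda_n \propto n^{-\alpha}, \qquad \alpha \approx 1
```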

#AI #ML #deeplearning

5/10) In our paper, we study the #eigenspectrum of representations from #DNNs trained with different loss functions, architectures, and datasets, and assess the corresponding out-of-distribution (#OoD) #generalization performance.

#AI #ML #deeplearning #neuroscience

6/10) We find that activations across different layers have an #eigenspectrum that follows a #powerlaw. Furthermore, there are well-defined intervals of the power-law decay coefficient α where models exhibit excellent #OoD #generalization! 📈🎉🥳
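One hypothetical way to probe this layer by layer, assuming PyTorch/torchvision and reusing estimate_alpha() from above (the model, hooked layer, and batch size are all illustrative):

```python
import torch
import torchvision

model = torchvision.models.resnet18(weights=None).eval()
acts = {}

def make_hook(name):
    def hook(module, inputs, output):
        # flatten per-sample activations to (batch, features)
        acts[name] = output.flatten(start_dim=1).detach()
    return hook

# hook, e.g., the global pooling layer; any layer of interest works
for name, module in model.named_modules():
    if isinstance(module, torch.nn.AdaptiveAvgPool2d):
        module.register_forward_hook(make_hook(name))

with torch.no_grad():
    model(torch.randn(256, 3, 224, 224))  # a batch of (unlabelled) images

alphas = {name: estimate_alpha(a.numpy()) for name, a in acts.items()}
```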

#AI #ML #deeplearning #neuroscience

7/10) This finding led to our #proposal: Can we use α for #modelSelection in an #SSL pipeline?

Two key advantages of α:

1. α doesn’t require labels

2. α is quick to #compute (compared to training a readout)

We use #hyperparameter selection in #BarlowTwins (Zbontar et al.) as a case study!

#AI #ML #deeplearning #neuroscience

8/10) We find that α correlates more strongly with downstream-task performance than the #BarlowTwins loss itself! 🤯
Thus, we propose a model selection #algorithm based on this result to reduce the number of #readout evals required to identify the best #hyperparameters. 🤓
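Checking such a correlation over a sweep is cheap, e.g., with a rank correlation (a sketch assuming SciPy; the numbers below are made up):

```python
from scipy.stats import spearmanr

# one (alpha, readout accuracy) pair per hyperparameter config -- fabricated
alphas = [0.74, 0.93, 1.02, 1.18, 1.57]
accs = [0.55, 0.68, 0.71, 0.66, 0.49]

rho, pval = spearmanr(alphas, accs)
print(f"Spearman rho = {rho:.2f} (p = {pval:.3f})")
```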

#AI #ML #deeplearning #neuroscience

9/10) Our model selection #algorithm is as follows:
Use α to filter out bad models, and run downstream #readout evals only on the “good” ones.
Under a fixed #compute budget, this reduces the number of readout evals from linear to logarithmic growth in the number of #configs. 🎉🥳
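A sketch of what this two-stage procedure could look like (the interval bounds, shortlist size, and helper names are my assumptions; estimate_alpha() is the sketch from earlier in the thread):

```python
import numpy as np

def select_model(configs, features_fn, readout_eval,
                 alpha_lo=0.8, alpha_hi=1.2):
    """Two-stage selection: a label-free alpha filter, then expensive
    labelled readout evals on only ~log(#configs) shortlisted candidates.

    configs:           trained SSL models / hyperparameter configs
    features_fn(cfg):  (n_samples, n_dims) representations, unlabelled data
    readout_eval(cfg): labelled readout accuracy (the costly step)
    [alpha_lo, alpha_hi]: assumed "good" alpha interval (illustrative)
    """
    # Stage 1 (cheap, label-free): rank configs by distance of alpha
    # from the good interval; in-interval configs get penalty 0.
    def penalty(cfg):
        a = estimate_alpha(features_fn(cfg))
        if alpha_lo <= a <= alpha_hi:
            return 0.0
        return min(abs(a - alpha_lo), abs(a - alpha_hi))

    ranked = sorted(configs, key=penalty)

    # Stage 2 (expensive, labelled): readout evals on a log-size shortlist
    k = max(1, int(np.ceil(np.log2(len(configs) + 1))))
    return max(ranked[:k], key=readout_eval)
```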

#AI #ML #deeplearning #neuroscience

10/10) We believe our work is a great example of #neuroscience 🧠 insights driving #AI 🤖 research!

Lots of open questions and interesting #research directions.

🤓💪

Drop by our poster at #NeurIPS2022 on Wed (Nov 30) 14:30-16:00 PST @ Hall J 642 if you are interested! 😀

#AI #ML #deeplearning
