Here's a long-overdue Mastodon post on 🪩disco, an open-source Python toolkit for easily aligning language models to preferences using DIStributional COntrol techniques, which we released this week: https://disco.europe.naverlabs.com/
I hope this toolkit will become your one-stop solution for aligning LMs.
LMs are distributions over token sequences. Aligning them implies generating from a different (related) distribution that incorporates preferences. 🪩disco builds on the fundamental idea that you can decouple the design of the target distribution from how you approximate it. 2/
In 🪩disco, you can define your target distribution by telling the model which features you would like to control for and what the corresponding target _moments_ are. For example, you might want 100% non-toxic expressions or 50% occurrences of any given gender. 3/
As a result, you obtain a representation of the target distribution in the form of an energy-based model (EBM). An EBM can score sequences (assign each one an unnormalized probability value), but we cannot sample from it directly. This is where the second step kicks in: approximating the target distribution. 4/
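To make the first step concrete, here is a minimal sketch on a toy example. It does not use the 🪩disco API: the four-sentence "language model" `base`, the feature `mentions_she` and the helpers `fit_lambda` and `ebm_score` are all made up for illustration. It assumes the target takes the exponential-family form P(x) = a(x)·exp(λ·φ(x)), with a single binary feature and a target moment of 50%.

```python
import math

# Toy stand-in for a language model a(x): a distribution over four sequences.
base = {
    "she is a doctor": 0.1,
    "he is a doctor": 0.4,
    "she is a nurse": 0.3,
    "he is a nurse": 0.2,
}

def mentions_she(x):
    """Binary feature phi(x): 1.0 if the sequence starts with 'she'."""
    return 1.0 if x.startswith("she ") else 0.0

def fit_lambda(base, feature, target_moment):
    """Closed-form lambda for one binary feature, valid for 0 < target_moment < 1."""
    m = sum(p * feature(x) for x, p in base.items())  # moment under the base model
    return math.log(target_moment * (1 - m) / ((1 - target_moment) * m))

def ebm_score(x, base, feature, lam):
    """Unnormalized target score P(x) = a(x) * exp(lam * phi(x))."""
    return base[x] * math.exp(lam * feature(x))

lam = fit_lambda(base, mentions_she, 0.5)
scores = {x: ebm_score(x, base, mentions_she, lam) for x in base}

# Normalizing is only possible here because the toy space has four sequences.
Z = sum(scores.values())
moment = sum(s * mentions_she(x) for x, s in scores.items()) / Z
print(f"lambda={lam:.3f}, E_p[phi]={moment:.3f}")
```

With a real LM the space of sequences is far too large to enumerate, so λ has to be estimated from samples and the normalizer Z stays unknown, which is why the EBM can score but not generate.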
🪩disco incorporates algorithms to fine-tune your autoregressive model to approximate any given target distribution. After training, your autoregressive language model will incorporate your preferences to a very large extent. 5/
To bridge any remaining gap and generate sequences from a distribution arbitrarily close to the target, 🪩disco also ships quasi-rejection sampling (QRS), a Monte-Carlo technique to sample from the target distribution given an approximation of it. 6/
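For intuition, here is a toy sketch of the accept/reject idea behind QRS, again without the 🪩disco API. The dictionaries `target_scores` (unnormalized EBM scores) and `proposal` (standing in for the fine-tuned model), the function name and the value of `beta` are illustrative assumptions. The rule sketched here is: draw x from the proposal q and accept it with probability min(1, P(x) / (β·q(x))).

```python
import random
from collections import Counter

# Unnormalized target scores P(x) and a normalized proposal q(x) (toy values).
target_scores = {"a": 0.15, "b": 0.40, "c": 0.45, "d": 0.20}
proposal = {"a": 0.2, "b": 0.3, "c": 0.3, "d": 0.2}

def quasi_rejection_sample(scores, proposal, beta, n, rng):
    """Draw n accepted samples using the min(1, P/(beta*q)) acceptance rule."""
    xs = list(proposal)
    weights = [proposal[x] for x in xs]
    accepted = []
    while len(accepted) < n:
        x = rng.choices(xs, weights=weights)[0]  # x ~ q
        accept_prob = min(1.0, scores[x] / (beta * proposal[x]))
        if rng.random() < accept_prob:
            accepted.append(x)
    return accepted

rng = random.Random(0)
# Here beta equals max_x P(x)/q(x) = 1.5, so the ratio is never clipped
# and this reduces to exact rejection sampling.
samples = quasi_rejection_sample(target_scores, proposal, beta=1.5, n=20000, rng=rng)
freqs = {x: c / len(samples) for x, c in Counter(samples).items()}
print(freqs)
```

A smaller `beta` accepts more samples but clips the ratio for some sequences, so the output drifts away from the target; a larger `beta` is closer to the target but wastes more proposals. The better the fine-tuned model approximates the target, the cheaper this trade-off becomes.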
Another important feature of 🪩disco is that it allows you to control not only decoder-only models such as GPT, but also seq2seq models such as those used in NMT, summarization, etc. This works pretty much in the same way as what I explained above. 7/
OK, I know what you are thinking. How does all this connect with RLHF, right? Well, there is much more on this coming out soon, but for now let me point you to our NeurIPS'22 paper where we show that RLHF is also essentially doing distribution matching: https://openreview.net/forum?id=XvI6h-s4un 8/
🪩disco is the result of years of work in a direction initiated by Marc Dymetman in collab with our team at Naver Labs Europe: Hady Elsahar (now at Meta), Jos Rozen and myself, plus vital contributions from interns Tetiana Parshakova, Muhammad Khalifa, Tomek Korbak and Bryan Eikema.
Looking forward to seeing what you will be able to build with 🪩disco! To get started, simply “pip install disco-generation” and check out https://disco.europe.naverlabs.com/ or https://github.com/naver/disco for more details.