I think #ChatGPT is starting to make sense to me. I just saw someone ask a question about how to do something on their computer, and it mentioned what tool, what steps, and what options to use. The parser determined the intent, mapped a path from the start state to the end state, and replayed the steps that cause those transformations, in English. But it wasn't just one end state: it found multiple common end states. Its first answer was just the most popular and simplest one. This is the best wayfinding system.
What's more impressive to me is that I can then change the reward function to use language I like. I told it to refer to human annotators as "language sommeliers" and the process of generating good output as "spicy autocomplete", and it happily complied. It's like map navigation where I added extra stops to the route while telling it I like to call freeways "speedy zones".
I have a feeling this will change search engine optimization forever. In the pre-GPT world, #SEO experts manually determined keywords, descriptions, and #microformats. They had to create machine-parseable URLs and taxonomies. In the post-GPT world, as long as your pages are linked together and you have high-quality text (minimal ads, high subject relevance), the parser should be able to categorize your site accurately. Then you show up on the map of rewards.
@amster LLM manipulation means getting your text into the bot's training text. SEO people should be looking closely at how decisions are made about which texts are included and which are ignored. The LLM makes its own relevance "decisions", so fake backlinks won't work. An obvious exploit is to get into Wikipedia, because ChatGPT trusts Wikipedia more than any other source (it gets more weight than any other text).
Not supervised in that sense: the fine-tuning through RLHF reinforces features of a chat-like conversation. You can't do supervised learning at that scale, but you can emphasize specific patterns (or "behaviour") through the few-shot, or even zero-shot, learning abilities of the pre-trained model.
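To illustrate the few-shot idea (a toy sketch only, not a claim about how ChatGPT itself was trained): you put a handful of examples of the pattern you want straight into the prompt, and the pre-trained model tends to continue it, with no weight updates at all. The task and wording below are made up for illustration.

# Toy few-shot prompt: the "behaviour" is supplied entirely in the
# context window; no fine-tuning happens. Task and examples are invented.
few_shot_prompt = """Rewrite each sentence in pirate speak.

Sentence: Hello, how are you today?
Pirate: Ahoy, how be ye this fine day?

Sentence: The report is due on Friday.
Pirate: The report be due come Friday, arr!

Sentence: Please review my pull request.
Pirate:"""

# Handing this string to any capable pre-trained LLM usually gets the
# third "Pirate:" line completed in the same style -- the pattern is
# emphasized purely in-context.
print(few_shot_prompt)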
There is an element like supervised learning in the original training: when the task is to predict the next word, you know what that word is because it's in the training data.
🙂
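To make the "labels come for free" point concrete, here's a toy sketch (nothing to do with OpenAI's actual pipeline) of how next-word training pairs fall straight out of the text itself:

# Toy sketch: next-word prediction is "self-supervised" because the label
# for each training example is simply the next word in the text.
text = "the cat sat on the mat".split()

# Every prefix is an input; the word that follows it is the target.
pairs = [(text[:i], text[i]) for i in range(1, len(text))]

for context, target in pairs:
    print(f"input: {' '.join(context)!r}  ->  label: {target!r}")

# input: 'the'          ->  label: 'cat'
# input: 'the cat'      ->  label: 'sat'
# input: 'the cat sat'  ->  label: 'on'
# ...no human annotation is needed for this pre-training step, unlike
# classic supervised learning.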
@boris_steipe @amster This guy was talking about the strategy of using human-trained models to train models. I've no idea what was done for ChatGPT specifically.
https://www.youtube.com/watch?v=viJt_DXTfwA&t=240s