I think #ChatGPT is starting to make sense to me. I just saw someone ask how to do something on their computer, and it named the tool, the steps, and the options. The parser determined the intent, mapped a path from the start state to the end state, and replayed, in English, the steps that cause the transformations. And it wasn't just one end state: it found multiple common end states. Its first answer was just the most popular and simplest. This is the best wayfinding system.

The end states seem to be determined by what people say they do. (So, part of #ChatGPT is still a good search engine.) I don't know if this is #RLHF, but it seems to follow the pattern of "if the user wants X, here are steps A, B, C. Here are other common variants X', X'', X'''."
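A minimal sketch of that wayfinding pattern (the task graph, step names, and popularity counts are all invented for illustration): enumerate paths from a start state to the known end states, then rank them by how often people report reaching each one.

```python
from collections import deque

# Hypothetical task graph: nodes are system states, edges are steps a user can take.
STEPS = {
    "start": ["open_settings", "open_terminal"],
    "open_settings": ["display_panel"],
    "open_terminal": ["run_xrandr"],
    "display_panel": ["resolution_changed"],
    "run_xrandr": ["resolution_changed", "rotation_changed"],
}

# Made-up counts of how often people say they ended up at each end state.
POPULARITY = {"resolution_changed": 120, "rotation_changed": 15}

def find_paths(start, end_states):
    """Breadth-first search collecting every path that reaches a known end state."""
    paths = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node in end_states:
            paths.append(path)
            continue
        for nxt in STEPS.get(node, []):
            if nxt not in path:  # avoid cycles
                queue.append(path + [nxt])
    # Most popular end state first; simplest (shortest) path breaks ties.
    return sorted(paths, key=lambda p: (-POPULARITY.get(p[-1], 0), len(p)))

for path in find_paths("start", set(POPULARITY)):
    print(" -> ".join(path))
```

The first path printed plays the role of the "most popular and simplest" answer; the remainder are the X', X'', X''' variants.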

What's more impressive to me is that I can then change the reward function to use language I like. I told it to refer to human annotators as "language sommeliers" and to the process of generating good output as "spicy autocomplete", and it happily complied. It's like map navigation where I add extra stops to the route while telling it I like to call freeways "speedy zones".
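Mechanically this is closer to instruction following than to literally changing a reward function, but the effect can be sketched as a preference table applied to the output. The terms are the ones from the post; the function itself is hypothetical.

```python
# The poster's preferred vocabulary, applied as a simple substitution pass.
PREFERRED_TERMS = {
    "human annotators": "language sommeliers",
    "text generation": "spicy autocomplete",
    "freeways": "speedy zones",
}

def apply_preferences(text: str) -> str:
    """Rewrite stock terminology into the user's preferred phrasing."""
    for stock, preferred in PREFERRED_TERMS.items():
        text = text.replace(stock, preferred)
    return text

print(apply_preferences("The human annotators guide text generation."))
# -> "The language sommeliers guide spicy autocomplete."
```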

I'm feeling like this will change search engine optimization forever. In the pre-GPT world, #SEO experts manually determined keywords, descriptions, and #microformats. They had to create machine-parseable URLs and taxonomies. In the post-GPT world, so long as your pages are linked together and you have high-quality text (minimal ads, high subject relevance), the parser should be able to categorize your site accurately. Then you show up on the map of rewards.
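If something like that exists, a crude version might score a page the way this post suggests: reward topical text and internal links, penalize ad clutter. Everything below is a made-up heuristic for illustration, not how any real crawler or model works.

```python
def page_score(text: str, topic_words: set, internal_links: int, ad_blocks: int) -> float:
    """Toy quality score: topical relevance + link structure - ad clutter."""
    words = text.lower().split()
    if not words:
        return 0.0
    relevance = sum(w in topic_words for w in words) / len(words)
    return relevance + 0.1 * internal_links - 0.5 * ad_blocks

page = "How to resize a partition safely using gparted on linux"
print(page_score(page, {"partition", "gparted", "linux", "resize"},
                 internal_links=4, ad_blocks=1))
```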

Does this change #SEO, or create a whole new category: Machine Learning Web Optimization? #MLWO? Does it reward good content, fewer ads, good taxonomy, deep linking? Does it imply that people with a great grasp of language (e.g. teachers, librarians) become more important? #ml

@amster LLM manipulation means getting your text into the bot's training text. SEO people should be looking closely at how decisions are made about which texts are included and which are ignored. The LLM makes its own relevance "decisions", so fake backlinks won't work. An obvious exploit is to get into Wikipedia, because ChatGPT trusts Wikipedia more than any other source (it gets more weight than any other text).
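The "more weight" claim matches how training mixes are usually described: trusted sources get upsampled relative to raw web crawl. A minimal sketch of weighted source sampling; the weights here are invented, not OpenAI's actual numbers (which are not public for ChatGPT), though published models such as GPT-3 did report upsampling Wikipedia.

```python
import random

# Invented per-source sampling weights, for illustration only.
SOURCE_WEIGHTS = {
    "wikipedia": 3.0,
    "books": 1.5,
    "web_crawl": 0.5,
}

def sample_training_source(rng: random.Random) -> str:
    """Pick the source of the next training document, weighted by trust."""
    sources, weights = zip(*SOURCE_WEIGHTS.items())
    return rng.choices(sources, weights=weights, k=1)[0]

rng = random.Random(42)
print([sample_training_source(rng) for _ in range(5)])
```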

@mistersql Even if Wikipedia articles are in the training text, does #RLHF imply that a human annotator has still taken a look? As in supervised learning? #WeekendCuriosity #AI #ML #ChatGPT #LLM


@amster @mistersql

Not supervised in that sense: the fine-tuning through RLHF reinforces features of a chat-like conversation. You can't do supervised learning at that scale, but you can emphasize specific patterns (or "behaviour") through the few-shot or even zero-shot learning abilities of the pre-trained model.
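Few-shot here just means demonstrating the desired pattern in the prompt itself and letting the pre-trained model continue it. A sketch of building such a prompt; the example Q/A pairs are made up.

```python
# Few-shot prompting: show the desired pattern, then ask the real question.
EXAMPLES = [
    ("How do I unzip a file?", "Use `unzip archive.zip` in a terminal."),
    ("How do I check disk space?", "Run `df -h` to see usage per filesystem."),
]

def build_few_shot_prompt(question: str) -> str:
    """Concatenate demonstration pairs, then leave the answer open for the model."""
    shots = "\n".join(f"Q: {q}\nA: {a}" for q, a in EXAMPLES)
    return f"{shots}\nQ: {question}\nA:"

print(build_few_shot_prompt("How do I rename a file?"))
```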

There is an element like supervised learning in the original training (often called self-supervision): when the task is to predict the next word, you know what that word is because it's in the training data.
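Every prefix of the text comes with its own label (the next word), so the labels are free. A minimal sketch of turning raw text into (context, target) training pairs:

```python
def next_word_pairs(text: str):
    """Turn raw text into (context, target) pairs for next-word prediction."""
    words = text.split()
    return [(words[:i], words[i]) for i in range(1, len(words))]

for context, target in next_word_pairs("the cat sat on the mat"):
    print(context, "->", target)
# ['the'] -> cat
# ['the', 'cat'] -> sat
# ... and so on through the sentence.
```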

🙂

@boris_steipe @amster This guy was talking about the strategy of using human-trained models to train other models. I have no idea what was done for ChatGPT specifically.

youtube.com/watch?v=viJt_DXTfw
