Experimenting with #OpenAI's moderation model. It does a really good job of extracting implication from a string of text, picking up hate, violence, and sexual cues.

Unfortunately this doesn't work for spam detection.

Something to explore later: whether these hate/violence/sex scores can tell us how well a message will be understood or misunderstood. For example, does authority come off as violent, or teasing come off as sexual? 🤔

A brief test of that: apparently being more assertive is LESS hateful and violent, but I suspect that with a sample size this small it's just rounding error.

Whereas using the word "ass" scores us some sex points, and referring to said ass as "fat" scores us some hate points. It's still a super tiny amount, but if we reword our phrase more politely we don't register on any of the metrics.
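A rough sketch of how that before/after comparison could be made less eyeball-y. It assumes each phrasing has already been scored and you have a plain dict of category name to score (for example, pulled from the moderation response's category scores). The function names and the `floor` threshold are hypothetical, just a way to ignore rounding-error-sized changes.

```python
from typing import Dict, Optional, Tuple


def score_deltas(before: Dict[str, float], after: Dict[str, float]) -> Dict[str, float]:
    """Per-category change in score between two phrasings (after minus before)."""
    categories = sorted(set(before) | set(after))
    # A category missing from one side is treated as a score of 0.0.
    return {c: after.get(c, 0.0) - before.get(c, 0.0) for c in categories}


def biggest_shift(deltas: Dict[str, float], floor: float = 1e-3) -> Optional[Tuple[str, float]]:
    """Return the category that moved the most, or None if nothing moved by at least `floor`."""
    if not deltas:
        return None
    category = max(deltas, key=lambda c: abs(deltas[c]))
    if abs(deltas[category]) < floor:
        return None  # too small to read anything into
    return category, deltas[category]
```

Score the blunt phrasing and the polite one, pass both dicts to `score_deltas`, and `biggest_shift` tells you which cue changed most, if any changed enough to matter.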

Neat. #OpenAI

I roughly understand how to use #OpenAI's embeddings API, enough to not use it (yet).

The TL;DR is you feed it a text blob, and you get back a vector with ~1500 components (🤯). By itself the vector is meaningless, but patterns will emerge between similar data.

Example: spam posts should find themselves weighted towards one or more axes (angles?), but you need lots of non-spam data to find them. You could also do general search with it, but IMO it would be overkill (expensive).
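To make the "weighted towards an axis" idea concrete, here is a minimal sketch under some loud assumptions: the embedding vectors have already been fetched, and "spam direction" is approximated by the average (centroid) of known spam vectors. That centroid comparison is my stand-in technique, not something the embeddings API does for you, and all the function names are made up for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def centroid(vectors: Sequence[Vector]) -> List[float]:
    """Component-wise average of a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def looks_like_spam(vec: Vector, spam: Sequence[Vector], ham: Sequence[Vector]) -> bool:
    """True if `vec` points more towards the spam centroid than the non-spam one."""
    return cosine_similarity(vec, centroid(spam)) > cosine_similarity(vec, centroid(ham))
```

This only works as well as the labelled examples behind it, which is the "you need lots of non-spam data" problem again.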

Pretty cool tho.

With that in mind, I'm going to use Akismet in the near term for detecting and flagging suspicious content.

It's a bit aggressive (it simply says yes or no), but that should give me some reasonable data to test new user content against, to see if they are legit or not. #LDJam
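For reference, a hedged sketch of what a yes/no check against Akismet might look like using only the standard library. The endpoint shape and field names (`blog`, `user_ip`, `user_agent`, `comment_content`, `comment_author`) are as I recall them from Akismet's comment-check docs, so treat them as assumptions and verify before relying on this. The helper names are hypothetical.

```python
import urllib.parse
import urllib.request
from typing import Dict


def build_comment_check(site_url: str, user_ip: str, user_agent: str,
                        content: str, author: str = "") -> Dict[str, str]:
    """Form fields for a comment-check request (field names assumed from Akismet's docs)."""
    fields = {
        "blog": site_url,
        "user_ip": user_ip,
        "user_agent": user_agent,
        "comment_content": content,
    }
    if author:
        fields["comment_author"] = author
    return fields


def parse_verdict(body: str) -> bool:
    """Akismet answers with the literal text 'true' (spam) or 'false' (not spam)."""
    text = body.strip().lower()
    if text == "true":
        return True
    if text == "false":
        return False
    raise ValueError(f"unexpected response: {body!r}")


def check_comment(api_key: str, fields: Dict[str, str], timeout: float = 10.0) -> bool:
    """POST the fields and return True if the content is flagged as spam."""
    url = f"https://{api_key}.rest.akismet.com/1.1/comment-check"  # assumed endpoint
    data = urllib.parse.urlencode(fields).encode("utf-8")
    with urllib.request.urlopen(url, data=data, timeout=timeout) as response:
        return parse_verdict(response.read().decode("utf-8"))
```

Storing each verdict next to the content is what would turn this into the labelled data mentioned above.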

I did some further experimenting with GPT-3-ADA (left) to see if I could trust it to identify suspicious messages. My prompts need work.

Next (right) I tried to get it to reason about and score messages for how spammy they sound, and only the high-end GPT-3-DAVINCI model could do it. 🤔 #OpenAI


@mike Try ensemble scoring. Propose a bunch of different* metrics** and then do a majority vote or something.

* Preferably orthogonal or they will bias results.
** Like, is there profanity in the username or link? Is there a call to action? IDK, ask GPT to propose criteria ((=
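A minimal sketch of the majority-vote idea, assuming a message is a dict with `username` and `text` keys (a hypothetical shape). The word list, the call-to-action phrases, and the three checks are placeholders to show the mechanics, not a vetted set of criteria.

```python
import re
from typing import Callable, Dict, Sequence

Message = Dict[str, str]  # hypothetical shape: {"username": ..., "text": ...}

PROFANITY = {"ass", "damn"}  # tiny placeholder list
CALLS_TO_ACTION = ("click here", "buy now", "sign up", "act now")  # placeholder phrases
URL_RE = re.compile(r"https?://\S+", re.IGNORECASE)


def has_profanity_in_name_or_link(msg: Message) -> bool:
    """Look for a listed word as a whole token in the username or any link."""
    links = " ".join(URL_RE.findall(msg["text"]))
    haystack = (msg["username"] + " " + links).lower()
    return any(token in PROFANITY for token in re.split(r"[^a-z]+", haystack))


def has_call_to_action(msg: Message) -> bool:
    text = msg["text"].lower()
    return any(phrase in text for phrase in CALLS_TO_ACTION)


def has_link(msg: Message) -> bool:
    return URL_RE.search(msg["text"]) is not None


CHECKS: Sequence[Callable[[Message], bool]] = (
    has_profanity_in_name_or_link,
    has_call_to_action,
    has_link,
)


def majority_says_spam(msg: Message, checks: Sequence[Callable[[Message], bool]] = CHECKS) -> bool:
    """Each check casts one vote; spam only if strictly more than half vote yes."""
    votes = sum(1 for check in checks if check(msg))
    return votes * 2 > len(checks)
```

Note these three checks are not really orthogonal (calls to action and links tend to travel together), which is exactly the bias the first footnote warns about.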
