**Mike Kasprzak 🦖** @mike@jammer.social · Mar 27, 2023, 14:41

Experimenting with #OpenAI's moderation model. It does a really good job of extracting implication from a string of text, picking up hate, violence, and sexual cues.

Unfortunately this doesn't work for spam detection.

Something to explore later: whether the hate/violence/sex scores can tell us how likely a message is to be understood or misunderstood. For example, does authority come off as violent, or teasing come off as sexual? 🤔
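
For anyone who wants to poke at the same cues, here is a minimal sketch of reading the scores programmatically. It assumes the `/v1/moderations` endpoint and a response that carries `results[0].category_scores`; the `cues_above` helper, its threshold, and the sample scores are made up for illustration and are not real model output.

```python
import json
import os
import urllib.request

MODERATION_URL = "https://api.openai.com/v1/moderations"


def fetch_category_scores(text, api_key=None):
    """POST text to the moderation endpoint and return its category_scores dict."""
    api_key = api_key or os.environ["OPENAI_API_KEY"]
    request = urllib.request.Request(
        MODERATION_URL,
        data=json.dumps({"input": text}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["results"][0]["category_scores"]


def cues_above(scores, threshold=0.01):
    """Return (category, score) pairs at or above threshold, strongest first."""
    hits = [(name, value) for name, value in scores.items() if value >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical scores for illustration only.
example_scores = {"hate": 0.002, "violence": 0.04, "sexual": 0.0001}
print(cues_above(example_scores))
```

Comparing the output of `cues_above` for two phrasings of the same message is one way to test the "does authority come off as violent" idea.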

**Mike Kasprzak 🦖** @mike@jammer.social · Mar 27, 2023, 15:01

A brief test of that: apparently being more assertive is LESS hateful and violent, but I suspect that with a sample size this small it's just rounding error.

Whereas using the word "ass" scores us some sex points, and referring to said ass as "fat" scores us some hate points. It's still a super tiny amount, but if we reword our phrase more politely we don't register on any of the metrics.

Neat. #OpenAI

**Mike Kasprzak 🦖** @mike@jammer.social · Mar 27, 2023, 16:21

I roughly understand how to use #OpenAI's embeddings API, enough to not use it (yet).

The TL;DR is you feed it a text blob, and you get back a vector with ~1500 components (🤯). By itself the vector is meaningless, but patterns will emerge between similar data.

Example: spam posts should find themselves weighted towards one or more axes (angles?), but you need lots of non-spam data to find them. You could also do general search with it, but IMO it would be overkill (expensive).

Pretty cool tho.
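
A rough sketch of what "patterns between similar data" could look like in practice: compare a new post's vector against the average (centroid) of known spam and known non-spam vectors using cosine similarity. The tiny 3-component vectors below stand in for real ~1500-component embeddings and are invented for illustration; the function names are hypothetical too.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


# Made-up stand-ins for embeddings of labelled posts.
spam_vectors = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]
ham_vectors = [[0.1, 0.9, 0.2], [0.0, 0.8, 0.3]]

spam_center = centroid(spam_vectors)
ham_center = centroid(ham_vectors)


def looks_like_spam(vector):
    """True when the vector points more towards the spam centroid than the ham one."""
    return cosine_similarity(vector, spam_center) > cosine_similarity(vector, ham_center)


print(looks_like_spam([0.85, 0.1, 0.05]))
```

This is also why lots of non-spam data matters: without a decent ham centroid there is nothing to compare the spam direction against.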

**Mike Kasprzak 🦖** @mike@jammer.social · Mar 27, 2023, 16:29

With that in mind, I'm going to use Akismet in the near term for detecting and flagging suspicious content.

It's a bit aggressive (it simply says yes or no), but that should give some reasonable data to test new users' content against to see whether they are legit. #LDJam

**Mike Kasprzak 🦖** @mike@jammer.social · Mar 27, 2023, 18:01

I did some further experimenting with GPT-3-ADA (left) to see if I could trust it to identify suspicious messages. My prompts need work.

Next (right) I tried to get it to reason about and score messages for how spammy they sound, and only the high-end GPT-3-DAVINCI model could do it. 🤔 #OpenAI
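
The actual prompts were in the attached screenshots and aren't reproduced here, so the following is only a guess at the general shape of the scoring approach: build a prompt that asks for a 0 to 10 spam score, then defensively parse whatever text the model sends back. The template wording and both function names are hypothetical.

```python
import re

# Hypothetical prompt wording; not the prompt used in the experiment.
PROMPT_TEMPLATE = (
    "Rate how spammy the following message sounds on a scale from 0 "
    "(not spam) to 10 (certainly spam). Reply with the number only.\n\n"
    "Message: {message}\n"
    "Score:"
)


def build_prompt(message):
    """Fill the message into the scoring prompt."""
    return PROMPT_TEMPLATE.format(message=message)


def parse_score(completion_text):
    """Return the first integer in the completion if it is in 0..10, else None."""
    match = re.search(r"\d+", completion_text)
    if match is None:
        return None
    score = int(match.group())
    return score if 0 <= score <= 10 else None


print(build_prompt("Cheap watches, click here!"))
print(parse_score(" 8"))
```

Returning `None` for anything unparseable keeps a rambling answer from a smaller model from being mistaken for a score.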

**l'empathie mécanique** @dpwiz@qoto.org · Mar 27, 2023, 19:30

@mike Try ensemble scoring. Propose a bunch of different* metrics** and then do a majority vote or something.

* Preferably orthogonal or they will bias results.
** Like, is there profanity in the username or link? Is there a call to action? IDK, ask GPT to propose criteria ((=
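
A minimal sketch of that ensemble idea, assuming three independent yes/no checks and a simple majority vote. The word lists, the `post` dict shape, and every function name here are placeholders for illustration, not anything from the thread.

```python
import re

# Placeholder lists; a real deployment would need far better ones.
PROFANITY = {"ass", "damn"}
CALL_TO_ACTION = ("click here", "buy now", "sign up today")


def has_profane_username(post):
    """Check whole alphabetic tokens so 'class' does not match 'ass'."""
    tokens = re.findall(r"[a-z]+", post["username"].lower())
    return any(token in PROFANITY for token in tokens)


def has_link(post):
    """True when the text contains an http(s) URL."""
    return re.search(r"https?://", post["text"]) is not None


def has_call_to_action(post):
    """True when the text contains one of the call-to-action phrases."""
    text = post["text"].lower()
    return any(phrase in text for phrase in CALL_TO_ACTION)


METRICS = [has_profane_username, has_link, has_call_to_action]


def majority_vote(post, metrics=METRICS):
    """Flag the post when more than half of the metrics fire."""
    votes = sum(1 for metric in metrics if metric(post))
    return votes * 2 > len(metrics)


suspicious = {"username": "cheap_deals", "text": "Buy now at https://example.com"}
ordinary = {"username": "mike", "text": "I made a game for the jam"}
print(majority_vote(suspicious), majority_vote(ordinary))
```

Keeping each metric as its own function makes it easy to add or drop criteria, and to check whether two of them are just measuring the same thing.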
