**Ben Adida** @ben@adida.net · Feb 01, 2023, 03:13

So OpenAI just released a detector of AI-generated text, I assume because of concerns in education / homework.

https://openai.com/blog/new-ai-classifier-for-indicating-ai-written-text/

Maybe this is good?

No, it's very bad.

They claim a 26% true positive rate and a 9% false positive rate. Assume 10% of submitted homework is ChatGPT-generated, and you get the classic counterintuitive outcome of poor predictive power: if a piece of homework is flagged, the odds are about 3:1 that it's *human*-generated.

This is going to cause a lot of harm. It should be immediately recalled.
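
The 3:1 figure follows from Bayes' rule. As a minimal sketch, assuming the rates quoted in the post (26% true positive, 9% false positive) and the post's assumed 10% share of AI-written homework, the arithmetic can be checked as follows. The function name `flagged_breakdown` is hypothetical and used only for illustration.

```python
def flagged_breakdown(tpr, fpr, prevalence):
    """Return (P(AI | flagged), human:AI odds among flagged submissions)."""
    # Share of all submissions that are AI-written AND flagged.
    flagged_ai = tpr * prevalence
    # Share of all submissions that are human-written AND flagged.
    flagged_human = fpr * (1 - prevalence)
    # Positive predictive value: of the flagged ones, how many are really AI?
    ppv = flagged_ai / (flagged_ai + flagged_human)
    # Odds that a flagged submission is human-written rather than AI-written.
    odds_human = flagged_human / flagged_ai
    return ppv, odds_human


# Rates quoted in the post; the 10% prevalence is the post's assumption.
ppv, odds_human = flagged_breakdown(tpr=0.26, fpr=0.09, prevalence=0.10)
print(f"P(AI-written | flagged) = {ppv:.1%}")
print(f"human : AI odds among flagged = {odds_human:.2f} : 1")
```

Under these assumptions, roughly three out of every four flagged submissions would be human-written. The result is sensitive to the assumed prevalence: a higher share of AI-written homework improves the odds, a lower share makes them worse.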

**Paul Sutton** @zleap@qoto.org · Feb 01, 2023, 08:27

@ben

How does plagiarism detection software work normally?

**JD Long ✅** @Cmastication@mastodon.social · Feb 01, 2023, 11:25

@zleap @ben normally it matches literal strings. e.g. “these two sentences came from this source”

**Shriram Krishnamurthi** @shriramk@mastodon.social · Feb 01, 2023, 12:24

@Cmastication @zleap @ben That's not how MOSS, the most widely-used checker, works.

**JD Long ✅** @Cmastication@mastodon.social · Feb 01, 2023, 12:38

@shriramk @zleap @ben Looks like MOSS uses fingerprinting, which is a computationally efficient way to find whitespace-invariant string matches.

https://yangdanny97.github.io/blog/2019/05/03/MOSS
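
As a rough illustration of the fingerprinting idea, here is a simplified sketch in the style of the winnowing scheme: hash every overlapping k-character substring of the normalized text, then keep the minimum hash from each sliding window as a fingerprint. This is not MOSS's actual code. The values `k=5` and `window=4`, the use of `zlib.crc32` as the hash, the lowercasing step, and all function names are assumptions made for the example.

```python
import zlib


def normalize(text):
    # Remove all whitespace and lowercase, so spacing and case changes are ignored.
    return "".join(text.split()).lower()


def kgram_hashes(text, k):
    # Hash every overlapping substring of length k in the normalized text.
    s = normalize(text)
    return [zlib.crc32(s[i:i + k].encode("utf-8")) for i in range(len(s) - k + 1)]


def fingerprints(text, k=5, window=4):
    # Keep the smallest hash in each sliding window of consecutive k-gram hashes.
    hashes = kgram_hashes(text, k)
    if not hashes:
        return set()
    if len(hashes) < window:
        return {min(hashes)}
    return {min(hashes[i:i + window]) for i in range(len(hashes) - window + 1)}


def similarity(a, b):
    # Jaccard overlap of the two fingerprint sets, between 0.0 and 1.0.
    fa, fb = fingerprints(a), fingerprints(b)
    if not fa or not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)


a = "The cat sat on the mat."
b = "The   cat sat\n on the MAT."            # same text, different spacing and case
c = "The cat sat on the mat, and then it slept."  # shares a long prefix with a

print(similarity(a, b))  # 1.0, because both normalize to the same string
print(similarity(a, c) > 0)  # True, because the shared prefix yields shared fingerprints
```

Because fingerprints come from many small overlapping pieces rather than from the whole document, two texts that share only a passage still share some fingerprints. A real implementation would also record where each fingerprint occurs so the matching passages can be shown.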

**Paul Sutton** @zleap@qoto.org · Feb 01, 2023, 12:44

@Cmastication @shriramk @ben

Sounds interesting. From this I would guess it takes a string such as "The cat sat on the mat." or "The quick brown fox jumps over the lazy dog." and generates a hash from it.

Is this similar to how, say, md5sum works? I could write one of the above in a text file, save it, and generate an md5sum from that, which would be unique. If you went back in and changed a lower-case letter to upper case, or added a comma, the file would change and the md5sum would be different. We could then compare the two checksums to see whether they match.

Or am I completely off track here? I am not remotely an expert in this.
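
The md5sum comparison described above can be sketched in a few lines. This is only an illustration of a whole-text checksum, using Python's standard `hashlib`; the helper name `md5_hex` is hypothetical. It shows why a single checksum can only answer "identical or not": changing one letter produces a different digest, with no indication of how similar the two texts are. Fingerprinting as described in the linked post differs in that it hashes many small overlapping pieces of the text rather than the text as a whole.

```python
import hashlib


def md5_hex(text):
    # MD5 digest of the whole text, as a 32-character hex string.
    return hashlib.md5(text.encode("utf-8")).hexdigest()


original = "The cat sat on the mat."
edited = "The cat sat on the Mat."  # one letter changed from lower to upper case

print(md5_hex(original))
print(md5_hex(edited))
print(md5_hex(original) == md5_hex(edited))  # False: the digests no longer match
```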
