https://airc.nist.gov/docs/NIST.AI.100-4.SyntheticContent.ipd.pdf
A branch of the U.S. Department of Commerce has shown up with hot "AI" takes. Some of them are awful. In fairness, this part of the agency is said to be underfunded and undermanned (they also note at the start, in a little disclaimer, that these aren't necessarily endorsements or recommendations).
"Mitigating the production and dissemination of AI generated child sexual abuse material (CSAM) and non-consensual intimate imagery (NCII) of real individuals."
Well, it's nice to know they have presumably scoped themselves to tackling only actual problems, and not completely imaginary ones as well.
"even when synthetic NCII and/or CSAM does not depict or appear to depict, real individuals" (page 7)
I think that using the term "NCII" this way is very problematic. The NC in NCII refers to the consent of whoever is actually depicted, not to whether an outsourced contractor off in Kenya reckons it might be non-consensual. Isn't it also moving the goalposts? Instead of calling something a false positive, which it is, you change the definition in a slimy way. #FreeSpeech
This framing is also fundamentally flawed, and I think abusable enough that it is worth responding to this consultation (the other one tries too hard to justify censorship and offers up broad, abusable language). This is actually the only time this comes up in the document, even if you could argue that some of its other ideas are blunt instruments (for instance, on a later page, they admit that consent is relevant to NCII).
"Comments on NIST AI 100-4 may be sent electronically to NIST-AI-100-4@nist.gov with “Comment on NIST AI 100-4” in the subject line or via www.regulations.gov (enter NIST-2024-0001 in the search field.) Comments containing information in response to this notice must be received on or before June 2, 2024, at 11:59 PM Eastern Time." (from page 3).
Also, my new porn science post: https://qoto.org/@olives/112362450620045294
Page 9. While I could see someone disclosing that a professionally produced piece of text involved "AI", it would be silly to expect a disclosure on everything written with it, nor is enforcing that technically possible.
Page 12. I'm not sure it is a good idea to add copyright enforcement metadata here.
I think the measures proposed here for "AI" are unlikely to be useful against a sophisticated state actor, especially the most obvious ones.
"people with disabilities and those with limited language skills regularly using generative AI to create content may be discriminated against if the content they publish on platforms is labeled as AI-generated"
Interesting, although the document fails to cover other risks to free expression.
Page 20. What about the context around the "terrorist" and "extremist" content? Also, I think the government should consider whether its ideas chill free expression before proposing them, in line with the values that underlie the #FirstAmendment. *Facebook does something* means nothing when Facebook is one of the platforms most notorious for suppressing vast swathes of legitimate expression.
Page 22. The document points out that some (if not most) metadata schemes have privacy issues.
For "provenance", it might be a better idea to have an optional additional metadata file along with the main file, rather than trying to be "smart" about it (violates the KISS principle). There are strong vibes of over-engineering here.
Page 28. Fake faces are easy to distinguish, apparently.
Page 32. "Is being debated" is an understatement. Detecting AI-generated text is known to be completely unreliable.
Page 35. It's silly to think an algorithm can reliably determine intent.
Page 39. Assuming that humans won't take a dodgy result at face value is really expecting too much from them.
Page 42. "Keywords" have a high false positive rate (there have been many issues in the past, including even PornHub, of all sites, wrongly accusing people of searching for *actual* child porn at very high rates). This can be partially alleviated by training more dedicated models for different categories, but it can still be troublesome. This page also presumes that "sexual content" is harmful, which is not necessarily the case.
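To illustrate the false positive problem (my own toy example, nothing from the NIST document): a bare keyword match has no sense of context, so perfectly innocuous queries get flagged.

```python
# Toy illustration of why bare keyword matching over-flags. The keyword list
# and queries are made up for the example; real filters are longer and messier.
FLAGGED_KEYWORDS = {"breast", "escort"}


def naive_flag(query: str) -> bool:
    # Flags any query containing a listed word, with no sense of context.
    return any(word in FLAGGED_KEYWORDS for word in query.lower().split())


print(naive_flag("breast cancer screening guidelines"))   # True -> false positive
print(naive_flag("airport security escort service hours")) # True -> false positive
```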
Page 46. Likely exaggerations of the harms and influence of potential inputs in data sets.
While you might be able to interpret it more narrowly if you squint at it, I wouldn't count on the people reading that document doing so.