Like many other technologists, I gave my time and expertise for free to #StackOverflow because the content was licensed CC-BY-SA - meaning that it was a public good. It brought me joy to help people figure out why their #ASR code wasn't working, or assist with a #CUDA bug.

Now that a deal has been struck with #OpenAI to scrape all the questions and answers in Stack Overflow, to train #GenerativeAI models, like #LLMs, without attribution to authors (as required under the CC-BY-SA license under which Stack Overflow content is licensed), to be sold back to us (the SA clause requires derivative works to be shared under the same license), I have issued a Data Deletion request to Stack Overflow to disassociate my username from my Stack Overflow username, and am closing my account, just like I did with Reddit, Inc.

policies.stackoverflow.co/data

The data I helped create is going to be bundled in an #LLM and sold back to me.

In a single move, Stack Overflow has alienated its community - which is also its main source of competitive advantage, in exchange for token lucre.

Stack Exchange, Stack Overflow's former instantiation, used to fulfill a psychological contract - help others out when you can, for the expectation that others may in turn assist you in the future. Now it's not an exchange, it's #enshittification.

Programmers now join artists and copywriters, whose works have been snaffled up to create #GenAI solutions.

The silver lining I see is that once OpenAI creates LLMs that generate code - like Microsoft has done with Copilot on GitHub - where will they go to get help with the bugs that the generative AI models introduce, particularly, given the recent GitClear report, of the "downward pressure on code quality" caused by these tools?

While this is just one more example of #enshittification, it's also a salient lesson for #DevRel folks - if your community is your source of advantage, don't upset them.

@KathyReid

Simple question. Would it be allowed under the license terms to scrape the entire Stack Overflow dataset and recreate the entire thing as a fedi app?

Follow

@mastodonmigration @KathyReid to the ignorant that I am, this sounds legally feasible, but economically challenging. That's a lot of content and maintaining it on a live platform is associated with notable costs. Not a reason to stop someone from trying, rather an argument to plan ahead.

Here's a starting point: kaggle.com/datasets/stackoverf

Sign in to participate in the conversation
Qoto Mastodon

QOTO: Question Others to Teach Ourselves
An inclusive, Academic Freedom, instance
All cultures welcome.
Hate speech and harassment strictly forbidden.