AI Loophole #1; Your GitHub README.md

Elias Griffin@lemmy.world · edited 5 months ago

AlexanderESmith@social.alexanderesmith.com · 5 months ago

you got some criticism and now you’re saying everyone else is a bot or has an agenda

Please look up ad hominem, and stop doing it. Yes, their responses are a distraction from the topic at hand, but so were the random posts calling OP paranoid. I’d have been on the defensive too.

[Our company] publish[es] open source work … anyone is free to use it for any purpose, AI training included

Great, I hope this makes the models better. But you made that decision. OP clearly didn’t. In fact, they attempted to use several methods to explicitly block it, and the model trainers did it anyway.

I think that the anti-AI hysteria is stupid virtue signaling for luddites

Many loudly outspoken figures against the use of stolen data for the training of generative models work in the tech industry, myself included (I’ve been in the industry for over two decades). We’re far from Luddites.

LLMs are here

I’ve heard this used as a justification for using them, and reasonable people can discuss the merits of the technology in various contexts. However, this is not a justification for defending the blatant theft of content to train the models.

whether or not they train on your random project isn’t going to affect them in any meaningful way

And yet, they did it while ignoring explicit instructions to the contrary.

there are more than enough fully open source works to train on

I agree, and model trainers should use that content instead of whatever they grab off every site they happen to scrape.

Better to have your work included so that the LLM can recommend it to people or answer questions about it

I agree if you give permission for model trainers to do so. That’s not what happened here.

bamboo@lemm.ee · 5 months ago

Why do you think they need your permission to use information you posted publicly to train their models? Copyright isn’t unlimited, and model training is probably fair use.

AlexanderESmith@social.alexanderesmith.com · 5 months ago

“Your honor, we can use whatever data we want because model training is probably fair use, or whatever”.

I don’t know what’s worse, the fact that you think creators don’t have the right to dictate how their works are used, or that you apparently have no idea what fair use is.

This might help: https://copyright.gov/fair-use/