[Big Yudd free zone] So what do we think about AI Safety guys?

FunkyStuff [he/him]@hexbear.net · 1 month ago

Roko’s basilisk is the dumbest thing ever.

What do you think about the way that these regular (dumb, not AGI) LLMs are starting to develop behaviors that are a little bit more sinister, though? Like this paper describes.

buckykat [none/use name]@hexbear.net · 1 month ago

(I ain’t readin’ all that) but what the abstract describes isn’t even close to the worst thing I’ve read about LLMs doing this week. I don’t exactly trust the LLM companies’ ideas of what is or is not “harmful.” Shit like people using the LLMs as therapists, or worse, oracles, is much worse in my opinion, and that doesn’t require any “pretend to be evil for training” hijinks.

BountifulEggnog [she/her]@hexbear.net · edit-2 1 month ago

Doesn’t really strike me as sinister, just annoying for finetuners. They trained a model from the ground up to not be harmful and it tries its best. Even with further training it still retains some of that. To me this paper shows that a model’s “goals”, what you trained it to do initially, however you want to phrase that, are baked into it, and changing them after the fact is hard. Highlights how important early training is, I guess.

FunkyStuff [he/him]@hexbear.net · 1 month ago

Kinda problematic that it means we can’t ever really be sure that we’re catching problematic behavior in the training stage of any AI system, though, right? Sadly I find it hard to think of good uses of LLMs or other genAI outside of capitalism, but if there were any, the fact that it’s possible for it to behave duplicitously like that is a pretty big problem.

iie [they/them, he/him]@hexbear.net · 1 month ago

That’s a well-written, readable paper. I can follow it without much background.

FunkyStuff [he/him]@hexbear.net · 1 month ago

The funny thing is, I think there’s nearly a 0% chance that it isn’t mostly AI generated, given who made it.

iie [they/them, he/him]@hexbear.net · 1 month ago

lmao