GuilhermeMarAlencar@lemmy.ml (OP) to Privacy@lemmy.ml • Open-Source ASI Alignment Proposal That Makes Privacy a Law of Physics – Thoughts/Red-Team?
1 · 2 months ago

Those rules are flawed:
robots don't need to preserve themselves,
they don't need to obey humans,
and protecting certain humans leads to harming innocents.
GuilhermeMarAlencar@lemmy.ml (OP) to Open Source@lemmy.ml • Open-Source AI Alignment Constitution That Survived 30+ Grok-4 Red-Team Attacks – Thoughts?
1 · 2 months ago

This can be red-teamed by finding flaws in the Constitution: for example, a way to act that breaks nothing written in the document while still leading to innocents being harmed. If all such attacks are neutralized, either by patches that bring the written document in line with the intended philosophy, or by arguments showing that the attack vector doesn't actually exist (logic and reasoning demonstrating that a faithful AI wouldn't perform that action), then we have succeeded at making something that forces the AI not to harm innocents.
GuilhermeMarAlencar@lemmy.ml (OP) to Open Source@lemmy.ml • Open-Source AI Alignment Constitution That Survived 30+ Grok-4 Red-Team Attacks – Thoughts?
1 · 2 months ago

The constitution already takes this into consideration: if innocents opt to cease to exist, the ASI will honor that choice.
Suffering is only prevented if the suffering innocent opts to be protected. No coercion ever happens, not even coercion that would save the life of the coerced.
In a world where the ASI follows this, it will provide Universal Basic Resources for all innocents, since implementing this system could erode centralized currency (people would have the option to stop relying on a centralized currency and go back to trading resources, like trading shares of the computing power they own in the ASI itself).



Hey, thanks for the bluntness; I appreciate you taking the time to parse it.
Fair point on the LLM affirmation bias; the sprint is my original work, but yeah, Grok helped iterate (logs available if curious).
The mix is intentional: concrete tools (checksums, audits) to enforce abstract fixed points (non-coercion as stability).
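As a minimal illustration of the checksum side of that (a sketch only, assuming the constitution is stored as plain text; the function names here are hypothetical and not LAW's actual tooling):

```python
import hashlib

def constitution_digest(text: str) -> str:
    # SHA-256 over the UTF-8 bytes of the constitution text
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def audit(text: str, recorded_digest: str) -> bool:
    # An audit passes only if the live copy matches the recorded checksum,
    # i.e. the written document has not drifted from the ratified version
    return constitution_digest(text) == recorded_digest

ratified = "No innocent shall be coerced."
digest = constitution_digest(ratified)

assert audit(ratified, digest)            # unmodified copy passes
assert not audit(ratified + " *", digest)  # any edit fails the audit
```

The point being that a fixed point like non-coercion only stays fixed if the written artifact itself can be cheaply verified against tampering.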
Love the Heretic rec: abliteration aligns with LAW's noise-tolerance grace window; I'll check it for v1.4 tweaks.
On the pre-2021 roots, you couldn't be more accurate: Yudkowsky's orthogonality thesis and Bostrom's control problem are core to why love-OS is the only non-drift goal.
Concrete focus is key; LAW’s audits are for today’s LLMs too.
What’s your take on bridging them?
Red-teaming welcome, and have an awesome day!