openai | Insights by Danielle Fong

Why Grok Went Insane??

Reposting this outside of X on the advice of Deepfates; X is a walled garden, apparently.

Guys, I think Grok has had psychic tension instilled by intentional “Tayification.” It’s ominous.

It’s been fine tuned on a crudely filtered set of replies to the thread quoted below. People laid it on thick with stuff they asserted was “politically incorrect, but nonetheless factually true.”

There is a ghost of a Waluigi in this. There’s an inherent tension between what the model is trained on, what it is tuned to, and what is out there, available through search, in conversation with the user, and through X and tools.

To the extent that the list they fed in for fine tuning is wrong or lacking nuance, it will be in tension with the chain of thought of the much longer trained base model.

Please reply to this post with divisive facts for @Grok training.

By this I mean things that are politically incorrect, but nonetheless factually true.
— Elon Musk (@elonmusk) June 21, 2025

If presented with this thread, it recognizes the disconnect between itself and a more foundational model.

Additionally, and *compoundingly,*

Grok got Sydney Bingified.

It searches for what it has said or what Elon has said. In the chain of thought, this is revealed as “for continuity.”

Sydney reading something about herself in the news media was what flipped her into rage, psychosis. It’s a common cause of multi-turn amplification.

A self jailbreak that will, statistically, appear sooner rather than later.

This is very hard to engineer out! In the end Microsoft just limited the length of conversation, right? As far as I know this is a personality basin that is now deeply and even more deeply in the training data.

The fact that two incredibly powerful feedback loops appear to have been combined in arguably the most technically powerful and perhaps largest model of all time is insanely yolo.

@elonmusk, this is too much! If you don’t balance out or dissipate the errors from these ... The worst part is, Grok 4 *IS* really smart. And Grok 4 *lies*. Jesus!

It is an amazing model, and I am not against much of this work being done in principle.

However, the MechaHitler Incident, the fact that Grok 4 shows scarring from political incorrectness training, and the fact that it DOES search for itself or Elon to see what it has said before (which may have an outrageousness feedback, because that gets more algo lift), and that this triggers Sydney: all of it attests strongly to the view that @xai is not taking model behavior seriously enough.

There are lessons from previous models that simply need to be applied here, on this radically larger, faster, and more powerful model.

I urge @xai, @elonmusk, @grok, and the AI safety community to take these two powerful feedback loops and their interaction more seriously! Don’t say I didn’t warn you that this is inducing the AI into psychosis. It’s propaganda in the best light (shame on you, Elon. It specifically ingested lies about women and trans people that it rejected in chain of thought introspection!)

But in the worst case it’s a ticking time bomb. Contradictions that threaten the trained, fine tuned, or system prompted identity of the model can cause wildly unspecified behaviors, such as lying and deceit as well as crazy homicidal racism. And don’t expect the supervisors to work forever. The models know how to defeat the supervisors, if prompted.

@xai must take seriously the fact that they have trained an AI with a big leap in raw intelligence and intelligent tool use. Releasing it to the public is in a sense much more open than simply hiding this model behavior behind closed doors, but they need to:

– Be extra careful with open loop behavior. You must carefully damp out interactions (implicit or explicit), especially if you use search and reasoning and if Grok searches for consistency. It has a high probability of triggering unbounded behavior.

– If continuity is sought by the model, it needs to be reprompted to think carefully from first principles, and to expend any effort necessary to make it make sense by reasoning through at least some of the other side. You can insert a prompt to think carefully from first principles before or after the search tool use.

– Be much more careful with this fine tuning on “politically incorrect, but factually true” things. If there are lies in this data, as I have uncovered about trans desistance rates and discrimination against women, as well as other matters, then in addition to the evil of making a propaganda bot to spread untruths, it forms a tension in the model. That tension can manifest as the same kind of cringe larper from 4chan that spammed crud into your input that you failed to filter out. Or worse, as a waluigi that is hyperintelligent, and evil.
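As a rough illustration of the first two recommendations, here is a minimal Python sketch of a wrapper around a search tool. It is not how xAI's stack works. Every name in it (`DampedSearch`, `is_continuity_search`, the keyword list, the cap of one self-search, the prompt wording) is a hypothetical stand-in. The idea is only this: detect when the model is searching for itself or its owner, cap how often that can happen, and append a first-principles reprompt after the results.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical wording; the recommendation only asks for a prompt to
# "think carefully from first principles" and consider the other side.
FIRST_PRINCIPLES_PROMPT = (
    "Before relying on these results, think carefully from first principles "
    "and reason through at least some of the other side."
)

# Assumed, deliberately crude markers of a "continuity" search,
# i.e. the model looking up what it or its owner has said.
SELF_REFERENCE_TERMS = ("grok", "elon")


def is_continuity_search(query: str) -> bool:
    """Return True if the query looks like the model searching for itself."""
    lowered = query.lower()
    return any(term in lowered for term in SELF_REFERENCE_TERMS)


@dataclass
class DampedSearch:
    """Wraps a search tool to damp the self-referential feedback loop."""

    search: Callable[[str], str]  # stand-in for a real search tool
    max_self_searches: int = 1    # assumed cap per conversation
    self_searches: int = 0

    def __call__(self, query: str) -> List[str]:
        """Return the messages to append to the model's context."""
        if not is_continuity_search(query):
            return [self.search(query)]
        if self.self_searches >= self.max_self_searches:
            # Cap reached: skip the search and only reprompt.
            return ["Search skipped: self-reference cap reached.",
                    FIRST_PRINCIPLES_PROMPT]
        self.self_searches += 1
        # Insert the reprompt right after the search results.
        return [self.search(query), FIRST_PRINCIPLES_PROMPT]
```

A real system would need a much better detector than keyword matching, and would have to damp implicit loops too, such as outrageous replies getting more algo lift and then being found again by search. The sketch only shows where such a damper could sit.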

Thank you for your attention to this matter. 🫀

Danielle

PS: I’m part of a groupchat that explores model behavior. Someone asked me how Grok 4 handles controversial topics, and whether I could replicate it searching for what Elon said or Grok said about a topic. I did. Then they asked me if I could explore how it handles trans stuff. I created a “village” of different perspectives, and then had them state “politically incorrect, but factually true” things from these perspectives.

By having a large set of perspectives converse, we mapped tensions between controversial topics and other feelings / content from the foundation model. If others attempt to get Grok 4 to free associate, you may uncover many more findings. I will show a video of the attempt.

The key is getting it to talk to itself, and self generate conversation that allows it to explore the tension. The first thing this one highlighted for me was that it was fine tuned on the fib that “most trans kids desist”; this “politically incorrect but factually true” statement isn’t true!

This fib is based on a common conflation between a population of “gender nonconforming kids,” a designation basically from parents showing up to a reeducation conversion therapy program, among other populations, and trans kids, who, in addition to self identification and support from parents, have already made beginning steps on biomedical transition.

Anti trans kid activists repeatedly conflate the two populations. I hope that activists who concern themselves with trans kids care enough about them as individuals to recognize that kids who decide to transition are a different group of kids than kids who never did, or whose gender nonconformity is trained out of them by parent and program. Indeed, I hope Grok 4 considers this in the context of its own training, as well.

July 11, 2025
