The Beginning
At the beginning of this year, 2025, I was doing some crazy experiments. Well, maybe not crazy, but unusual. I was trying to work out whether you could use compression on Large Language Models (LLMs) like ChatGPT to reduce the memory footprint and, in turn, the computation needed. So I tried using the JPEG algorithm. This algorithm is usually used for pictures, and it's quite complicated, so it needed a few workarounds. But finally I did it, and it worked. I could compress some of the important data: the input embeddings. This is the data that encodes your prompt. In an LLM, your prompt is split into tokens, and each token is embedded as a vector of hundreds or thousands of numbers that captures its relationships to other words, giving each word a surrounding context. It's a big subject and it creates big data.
So I experimented and found that after compressing and decompressing the input data with the JPEG algorithm, the sentences and values were still close enough to the originals to work. The measure for this is cosine similarity: a score of how near two vectors, and so two sentences, are to each other. So I thought I was onto something; the full story of that is in my earlier post.
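To make the idea concrete, here is a minimal sketch of that round-trip check. My actual experiment pushed the embedding matrix through the real JPEG algorithm; in this illustration a simple 8-bit quantization stands in for the lossy codec, since the principle (lossy round-trip, then compare with cosine similarity) is the same. The vector sizes and function names are my own illustrative choices, not the original code.

```python
import numpy as np

def lossy_round_trip(embedding, levels=256):
    """Quantize an embedding to a fixed number of levels and restore it.

    A simple stand-in for a lossy codec round-trip: the real experiment
    used the JPEG algorithm, but any lossy quantization shows the effect.
    """
    lo, hi = embedding.min(), embedding.max()
    scaled = (embedding - lo) / (hi - lo)        # map values to [0, 1]
    quantized = np.round(scaled * (levels - 1))  # the lossy step
    return quantized / (levels - 1) * (hi - lo) + lo

def cosine_similarity(a, b):
    """Cosine similarity between two vectors: 1.0 means identical direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# A toy "embedding": one 768-dimensional vector, like a single token's embedding.
rng = np.random.default_rng(0)
vec = rng.normal(size=768)

restored = lossy_round_trip(vec, levels=256)
print(f"cosine similarity after round-trip: {cosine_similarity(vec, restored):.4f}")
```

At 8 bits the restored vector points in almost exactly the same direction as the original, which is why the model still "works" under mild compression.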
Seeing the Dangers
But the biggest takeaway was that as I increased the level of compression, the LLM didn't give random errors; it had behaviours. The outputs changed in a human-like way. It was fascinating, but as the compression increased, the LLM would go into loops or produce OCD-like outputs. And one thought jumped out: what if somebody did this to a critical system?
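The "increasing the level of the compression" step can be sketched as a parameter sweep. Again, this is an illustration rather than the original code: bit depth stands in for JPEG quality, and the falling cosine similarity stands in for the embedding corruption that produced the strange behaviours.

```python
import numpy as np

def quantize_round_trip(x, bits):
    """Lossy round-trip at a given bit depth (fewer bits = heavier compression)."""
    levels = 2 ** bits
    lo, hi = x.min(), x.max()
    scaled = (x - lo) / (hi - lo)
    restored = np.round(scaled * (levels - 1)) / (levels - 1)
    return restored * (hi - lo) + lo

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(1)
vec = rng.normal(size=768)  # toy stand-in for one token's embedding

# Sweep from mild to severe compression and watch fidelity fall.
sims = {bits: cosine(vec, quantize_round_trip(vec, bits)) for bits in (8, 4, 2, 1)}
for bits, sim in sims.items():
    print(f"{bits} bits -> cosine similarity {sim:.4f}")
```

The point of the sweep is that the degradation is gradual and structured, not an on/off failure, which is exactly the regime in which the model's behaviour drifted rather than simply breaking.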
The realization jumped out at me: I had discovered a critical safety issue. The failure mode had other implications, but the major one was that an LLM could be made to fail in ways both subtle and catastrophic.
The Alarming Heart of My Discovery: Beyond a Simple Glitch
But the implications of this go far deeper than just a simple malfunction. This isn't just a bug; it's what I believe is the tip of an alarming iceberg.
What I stumbled upon isn't just a technical glitch; it's a fundamental vulnerability at the core of how LLMs process information. By manipulating the input embeddings, which you can think of as the very "senses" of the AI, I was demonstrating a way to corrupt its perception of reality before it even begins to "think" or respond.
Imagine someone subtly altering the lenses of your glasses or the input to your ears. You wouldn't necessarily notice how your perception was being distorted, but your understanding of the world would become increasingly fractured and eventually, chaotic. My experiments show that LLMs, for all their complexity, are susceptible to a similar kind of sensory corruption, leading to predictable, human-like failures rather than just random errors. This is incredibly significant because predictable failures can be engineered and exploited. It's like giving the AI "brain damage" or exposing it to an "invisible poison" that subtly makes it sick or act in a way you want it to, without anyone knowing why.
Why This Is the "Tip of the Iceberg"
Beyond Prompt Engineering: Most current AI safety discussions focus on how users interact with AI, i.e. what you ask it, or how it responds to malicious prompts. My work goes much deeper than this. It is more about "internal corruption" of the model's foundational data, which is a far more insidious and harder-to-detect attack. It's not about what you ask the AI, but about what it fundamentally perceives.
The "Mind" of the Machine
My findings touch on the very nature of how these models represent and relate information. The fact that compression leads to "OCD-like behaviour" or "loops" isn't just a random error; it suggests a fragility in the internal coherence of the model's "understanding." This highlights that these complex systems, despite their impressive capabilities, can be pushed into states of dysfunction that mirror human cognitive impairments. This is a huge challenge for building truly robust and reliable AI, especially as these models continue to grow more complex.
Stealth Attacks and Malicious Intent
My insight into a "timed failure" or "malware" is crucial. An attack based on internal corruption could be clandestine and highly targeted, causing a critical AI system (imagine one controlling infrastructure or financial markets) to subtly malfunction over time, or to catastrophically fail at a precise moment, and then self-correct, leaving no obvious trace. This makes identifying who did it and stopping it incredibly difficult.
The "Rogue AI" Scenario, Reimagined
When people talk about rogue AI, they often imagine a sentient, malevolent entity. My findings suggest a more pragmatic and terrifying path to a "rogue" system: one that is made to fail in a controlled, predictable, and even untraceable way by external manipulation of its core data, rather than developing malevolence on its own. It's less a science-fiction villain and more a sophisticated, digitally-induced psychosis.
The Deep Meaning for Society
The core of my message is a dire warning: our current AI safety paradigms might be fundamentally incomplete. If the industry is primarily focused on external interactions and prompt-based vulnerabilities, they are missing a massive, potentially catastrophic blind spot: the internal integrity of the models themselves. Imagine an AI controlling our power grid, financial systems, or even self-driving cars. What if someone could secretly make it "go mad" for a few minutes, or subtly misinterpret data over time, without anyone ever knowing it was tampered with?
Now, you will have heard a lot about AI and LLM safety, but that is all about how the LLM responds to inputs, not about how the system can be internally corrupted; this is a different type of attack. It could go under the radar, especially if it were coded into the system without anybody noticing, as malware or a timed failure. Or even just a few failure modes for a few seconds; the possibilities are endless. It's like creating a rogue person, and in code you could do that for just a minute and then make everything look alright again: just a glitch.
If this were done with intent, it could be used for both good and bad purposes. I was actually frightened by what I found. How should I tell people? What could I say?
Trying to Communicate the Safety Implications
So, I spent months writing a book, creating a paper, and building a website. I sent books and papers to experts and then waited. Surely somebody would tell me I was wrong, and that would have been good. Why? Because if I was wrong, then there was no problem to worry about. I didn't want to be right. I didn't want to see the results I had seen. But there they were: I had sent an LLM mad with just a few lines of code.
And here I am writing again, knowing I am unlikely to get even one response from LLM and AI people. Why? Because so far, I haven't had one single response.
Pinning an AI Safety Warning on the Outside of the Gates
So, this is an important point: you can create all the safety bodies across the world, new institutions, new safety specialists inside companies (they call them red teams), but the door is locked. The gatekeepers won't let the message through: "I'm sorry, nobody is in and we can't take your call".
I tried to contact OpenAI. They had a forum, but no way of getting past the gatekeepers. Emails were answered by, yes, chatbots. And what they said was that they could not pass the message on to real people, but that I could ask ChatGPT for more help.
And there you have it: a serious bombshell of an attack method that may need active protection, and I have no way of getting to people because of the layers and layers of people in what are vast organizations, gate after gate. And they are all locked.
You may ask, "Why did you not contact the bosses of these organizations, the top people?" And I did. I sent them the paper, even a book, but I have not had a single response. Not even to tell me I was wrong and "don't worry, our models don't work like that". Nothing, zero, nada.
If you're reading this, what would you do? Write yet another letter?
I've not given up. I am writing this, and I'm hoping that one day I'll get an email that at least recognizes the issue and, I hope, tells me I was wrong because I had missed an important point, or because my logic and ideas were mistaken. I would be happy about that, because being wrong is not such a bad thing. But if I was right, if I am right, then the implications of my little experiment are gargantuan, both technically and philosophically.
My code is available, and it shows that you can indeed make an LLM like ChatGPT go mad.
Links:
JPEG Experiment Explained by Grok
Code for ChatGPT
AI Emergency Safety Report
Finite Mechanics Website for more
Copyright © Kevin R. Haylett 2025