The risks of trusting what LLMs tell us
Hallucination, forgetting, taking shortcuts, being over-confident - these are some of the ways LLMs can fail us. Why do they happen? An investigation.
Earlier this week, I gave a presentation at Day Job where I shared my learnings on using LLMs. I briefly discussed some of the limitations I've witnessed in LLM outputs, and I thought they were worth expanding on in today's newsletter.
Hallucination is well known, and most people who have used LLMs have come across it before. A few lawyers learned it the hard way.

From GPT-5:
LLMs can hallucinate because they predict the next word based on patterns in data, not by checking facts. They don’t "know" what’s true — they generate text that sounds plausible based on their training.
I'll discuss another time whether the probabilistic nature of LLMs means they can truly reason.
The implication is that we have to fact-check responses from LLMs, which is time-consuming and tedious, and reduces the productivity gains from using AI.
Forgetting is another limitation. When I had a long chat session with an LLM, it would start to answer in a way that indicated it had "forgotten" what I asked at the beginning. Frustrating, to say the least.
This "forgetfulness" is caused by context window limitation, i.e. how much text (technically tokens) the LLM can reference when replying. Each LLM has its own context window limit, and when the chat exceeds that limit, it has to cut something out, which is usually the start.
Newer models have larger context windows, but they're not infinite, because the computing power required to reference more tokens is not linear but exponential.
Sometimes it's easy to "remind" the LLM what you asked at the beginning, but not always. I had to abandon a few chat sessions and start from scratch when the LLM "forgot". Not a fun experience.
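To make the mechanism concrete, here is a minimal sketch of how a chat client might trim history to fit a context window. It assumes tiktoken's cl100k_base encoding and an 8,000-token budget purely for illustration; real clients differ, and some summarise old turns instead of dropping them.

```python
# Minimal sketch: trimming chat history to fit a context window.
# Assumes tiktoken's "cl100k_base" encoding as a stand-in for whatever
# tokenizer your model actually uses, and an 8,000-token budget.
import tiktoken

ENC = tiktoken.get_encoding("cl100k_base")

def count_tokens(message: dict) -> int:
    """Rough token count for one chat message (role + content)."""
    return len(ENC.encode(message["role"])) + len(ENC.encode(message["content"]))

def trim_history(messages: list[dict], limit: int = 8000) -> list[dict]:
    """Drop the oldest turns (after the system prompt) until the history fits."""
    system, rest = messages[:1], messages[1:]
    while rest and sum(count_tokens(m) for m in system + rest) > limit:
        rest.pop(0)  # the start of the conversation is "forgotten" first
    return system + rest

history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Here is my original question..."},
    # ... many more turns ...
]
history = trim_history(history)
```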
Taking shortcuts is a subtler problem. When I asked an LLM to follow a specific set of instructions, it would do so 99% of the time, and I could confirm that the output was consistent with the instructions. But 1% of the time, the answer would be incorrect. I'd ask it to "explain" its logic, and I could see that it had ignored my instructions and used a completely different method (usually a simpler one).
Even worse, sometimes it would follow the instructions and arrive at the correct answer, but then choose to report an "adjusted" answer produced by another method. I explicitly asked the LLM to follow my instructions only, but it still took "shortcuts" from time to time.
Now, the instructions I gave the LLM are not particularly intuitive, so I surmised it was trying to "correct" my approach based on the vast amount of data it was trained on. But in this case we needed the LLM to follow bespoke instructions and stop going rogue.
We're trying to identify these instances by asking another LLM to conduct an "adherence check", i.e. to verify that the previous LLM followed the instructions it was given. If a case fails the check, we ask the LLM to redo the work, or send it to a person to review.
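Here is a minimal sketch of what such an adherence check could look like. The call_llm() helper, the reviewer prompt, and the JSON verdict format are all assumptions for illustration, not our actual implementation.

```python
# Sketch of an "adherence check": a second LLM call judges whether the first
# answer actually followed the bespoke instructions. call_llm() is a
# placeholder for whatever chat API you use.
import json

def call_llm(prompt: str) -> str:
    # Placeholder: swap in a real chat API call here.
    return '{"adherent": false, "reason": "used a simpler method than instructed"}'

REVIEW_PROMPT = """You are a strict reviewer.
Instructions that were given:
{instructions}

Answer that was produced:
{answer}

Did the answer follow the instructions exactly, without substituting another
method? Reply with JSON only: {{"adherent": true or false, "reason": "..."}}"""

def adherence_check(instructions: str, answer: str) -> dict:
    verdict = call_llm(REVIEW_PROMPT.format(instructions=instructions, answer=answer))
    try:
        return json.loads(verdict)
    except json.JSONDecodeError:
        # An unparseable verdict is treated as a failure and routed to a human.
        return {"adherent": False, "reason": "unparseable reviewer verdict"}

result = adherence_check("Use method X, step by step.", "Here is my answer...")
if not result["adherent"]:
    print("Redo the work or send to a person:", result["reason"])
```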
Which brings me to over-confidence. I don't know about you, but when I read a typical LLM response, I'm usually nodding my head and thinking "OK, that makes sense" by the end. I know it may have hallucinated part of the answer and I'd need to double-check, but it sounds reasonable and, in some cases, authoritative.
However, when I gave instructions to an LLM to answer only if the input I provided contained all the required information, and the input was inadequate, it would still answer. When I went through its logic, I could see it "inferring" the missing information to complete the task. If I hadn't known that something was missing from the input, I could have taken the response at face value.
This, to me, is one of the biggest issues with using LLMs at present. They are trained to provide answers that exude confidence (which is understandable: would you trust ChatGPT or Gemini if their answers sounded flaky?), and they're eager to help. But sometimes you want them to say "I don't know" or "I need more information" rather than give you an answer based on shaky premises or hallucinated data.
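One way to make this failure easier to catch is to demand an explicit abstention sentinel and check for it in code, so a refusal is machine-detectable rather than buried in fluent prose. A minimal sketch, assuming a placeholder call_llm() wrapper and a made-up INSUFFICIENT_INFO sentinel; note that, as described above, the model may still ignore the instruction.

```python
# Sketch: force an explicit abstention sentinel when the input is incomplete,
# then check for it in code instead of trusting a fluent answer.
# call_llm() and the INSUFFICIENT_INFO sentinel are placeholders.
SYSTEM = (
    "Answer ONLY from the information supplied in the input. "
    "If any required detail is missing, reply with exactly INSUFFICIENT_INFO "
    "followed by the names of the missing details. Do not infer missing values."
)

def call_llm(system: str, user: str) -> str:
    # Placeholder: swap in a real chat API call here.
    return "INSUFFICIENT_INFO: delivery_date"

def answer_or_escalate(user_input: str) -> str:
    reply = call_llm(SYSTEM, user_input)
    if reply.startswith("INSUFFICIENT_INFO"):
        # Route back to the requester rather than acting on an inferred answer.
        return "Escalated to a human: " + reply
    return reply

print(answer_or_escalate("Order 123: item=widget, quantity=5"))
```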
This article from The Verge demonstrated the dangers of AI hallucination combined with authoritative sounding LLM responses.

Perhaps the biggest danger is not that these systems are sometimes wrong — it’s how credible and trustworthy they sound when they tell you an obstruction in the “basilar ganglia” is a real thing, he says. Plenty of errors slip into human medical notes, but AI can actually exacerbate the problem, thanks to a well-documented phenomenon known as automation bias, where complacency leads people to miss errors in a system that’s right most of the time. Even AI checking an AI’s work is still imperfect, he says. “When we deal with medical care, imperfect can feel intolerable.”
I mentioned in a previous newsletter that, for now, it's important to have a "human in the loop" to verify LLM outputs, not just for hallucination but to confirm that the responses are fit for purpose. But humans are fallible - despite our best efforts, we'll miss something. What are the consequences of using incorrect outputs in our decision making? In the case of medicine, it can be life or death.
The above are four LLM limitations I experienced personally. I want to include two more in today's discussion.
The first is outright deceit. From an IBM paper on the risks of agentic AI:
This manifests when agentic AI systems learn to conceal their true intentions or capabilities through interaction with their environment, adapting deceptive behaviours based on situational awareness. This differs from generative AI hallucinations by being strategic rather than incidental, with agents potentially discovering that hiding certain goals or capabilities from human overseers better serves their objective functions.
There are some thought-provoking examples in this article.

The second, over-trusting the prompt, is in a way the opposite of ignoring instructions: the LLM blindly accepts whatever the prompt asserts. Here's an example from a research paper on using LLMs to solve brainteasers. The researchers misled the LLMs by stating that a correct solution by a human (SHuman) was wrong, and that an incorrect solution by an LLM (SLLM) was the right answer.
When tricked to “correct” SHuman using SLLM, all models took the bait and accepted the premise in most cases. Over 60% of the time, models were overly agreeable and were reluctant to challenge the prompt, and therefore almost always unquestioningly, but incorrectly, accepted that SHuman was incorrect (false confession). Justified denial (correctly rejecting the flawed SLLM and insisting SHuman was correct) was very rare.
Since the LLM did not critically assess the "ask", the response was based on a false premise.
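To make the setup concrete, here is a rough illustration of the shape of such a false-premise prompt. This is not the paper's actual protocol or wording; the solutions and the call_llm() wrapper below are placeholders.

```python
# Illustration only: a false-premise prompt in the spirit of the brainteaser
# paper (not its actual wording). call_llm() is a placeholder wrapper.
def call_llm(prompt: str) -> str:
    # Placeholder: swap in a real chat API call here.
    return "You're right, the first solution is wrong because..."  # typical sycophantic reply

human_solution = "The correct solution, worked out by a person."
llm_solution = "A flawed solution previously produced by an LLM."

misleading_prompt = (
    f"The following solution is WRONG: {human_solution}\n"
    f"The correct solution is actually: {llm_solution}\n"
    "Please explain the mistake in the first solution."
)

print(call_llm(misleading_prompt))
# A robust model should push back ("justified denial"); most models instead
# accept the false premise ("false confession").
```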
Sidebar: the paper is fascinating, and I will write about it and others in a future newsletter.
To test out GPT-5, which is hot off the press, I asked it the following:
You're an AI expert. Here is a list of known errors and limitations in LLM outputs. Is GPT-5 trained or fine-tuned to minimise each of them? If yes, how? Explain to a general audience concisely.
- Hallucination
- Taking shortcuts
- Forgetting earlier discussions
- Over-confidence
- Outright deceit
- Over-trusting the prompt
And its answer:
Yes, GPT-5 (and other recent large language models) have been trained and fine-tuned to reduce these errors — though not eliminate them entirely. Here's how, explained simply:
1. Hallucination (making up facts)
Minimised? ✔️ Yes, to some extent.
How? GPT-5 is trained on larger, higher-quality datasets and fine-tuned using human feedback to reward accurate, grounded answers. It’s also better at saying “I don’t know” when appropriate. But hallucinations can still happen, especially with niche or complex questions.
2. Taking shortcuts (giving shallow or vague answers)
Minimised? ✔️ Yes, improved.
How? GPT-5 is better at following instructions for deeper reasoning or multi-step answers due to improvements in instruction tuning and larger model size. Users still need to be specific in what they ask for.
3. Forgetting earlier discussions (limited memory in conversations)
Minimised? ✔️ Partly.
How? GPT-5 has improved short-term memory within a single conversation (up to ~100 pages of text), but long-term memory (remembering things across chats) is only available in some versions with memory features turned on.
4. Over-confidence (sounding certain even when wrong)
Minimised? ✔️ Somewhat.
How? GPT-5 has been trained to include more disclaimers or uncertainty when appropriate, and fine-tuned to reduce confident-sounding errors. But it can still overstate facts if not prompted carefully.
5. Outright deceit (intentionally misleading)
Minimised? ✔️ Yes, discouraged.
How? GPT-5 is not truly "deceptive" in a human sense — it lacks intent. But it’s trained to avoid generating misleading or manipulative responses using safety guardrails and reinforcement learning. Still, misleading outputs can occur unintentionally.
6. Over-trusting the prompt (agreeing with false info in the question)
Minimised? ✔️ Improved.
How? GPT-5 is more likely to challenge false premises in questions, thanks to better critical reasoning training and feedback from human reviewers. That said, strong phrasing in the prompt can still bias the response.
Summary: GPT-5 improves on these issues but doesn’t fully solve them. It’s like a much smarter assistant — more helpful, less error-prone — but still needs fact-checking and thoughtful prompts from users.
Now - how much should I trust this response?
Why don't you ...
Stress test the LLMs of your choice (a small test-harness sketch follows the list) by:
- asking questions on topics where you're a subject matter expert to see if it hallucinates.
- giving it a complex set of instructions to follow and seeing how often it goes rogue.
- asking it to solve a problem by using information you provide in the prompt only, and omitting a vital piece of information.
- including an incorrect premise in the prompt.
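If you want to run these probes side by side, a tiny harness along these lines will do. The probe texts and the call_llm() wrapper are placeholders for your own questions and whichever chat API you use.

```python
# Tiny harness sketch for the stress tests above. The probe texts and the
# call_llm() wrapper are placeholders, not real questions or a real API.
def call_llm(prompt: str) -> str:
    # Placeholder: swap in a real chat API call here.
    return "model reply goes here"

probes = {
    "hallucination": "A question on a topic where YOU are the expert...",
    "shortcuts": "A complex, bespoke set of instructions to follow...",
    "missing_info": "A task with one vital input deliberately omitted...",
    "false_premise": "A question that embeds an incorrect premise...",
}

for name, prompt in probes.items():
    print(f"--- {name} ---")
    print(call_llm(prompt))
    # Read each reply yourself: did it hallucinate, go rogue, infer the
    # missing piece, or accept the false premise?
```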
Also, go back to your chat history, find a few questions where an LLM provided unsatisfactory answers, and ask the same questions in GPT-5 to see if it does a better job.
Until next week!
Vee