"Language models' outputs are fabrications, even if they seem real"


10mo

“All outputs are hallucinations, i.e. fabricated and ungrounded. Many of these outputs happen to match reality when there’s abundant training data and repetition.” Yeah, that really nails what it is like working with language models. Very insightful post.

Simon Wardley

10mo Edited

Going through another AI horror story. These tools are great, but please remember the following when you are using LLMs/LMMs:
1) All outputs are hallucinations, i.e. fabricated and ungrounded. Many of these outputs happen to match reality when there’s abundant training data and repetition, so they look useful on common tasks. But they cannot do research. These machines are stochastic parrots (Bender et al.); they are pattern matchers, not reasoning engines.
2) These systems will happily invent plausible-seeming but unverified detail. That’s a design feature, not a bug: they are optimised for coherence, not truth.
3) These systems do not understand what they are creating. The use of tools and guardrails is mostly to convince you of their correctness and to hide their inner workings; they are about shaping perception and behaviour, not true comprehension. Yes, guardrails also reduce some classes of harm.
4) These problems are not with the user and their prompting. Stop blaming users for what are design flaws and systemic issues.
5) You cannot "swarm" your way out of these problems. Orchestration doesn’t solve fundamental epistemic limits. However, these systems (including agentic swarms) are extremely useful in the right context and are excellent for creating hypotheses (which then need to be tested).
6) These systems can output long, convincing “scientific” documents full of fabricated metrics, invented methods, and impossible conditions without flagging uncertainty. They cannot be trusted for policy, healthcare, or serious research, because they are far too willing to blur fact and fiction.
7) These systems can and should be used only as a drafting assistant (structuring notes, summarising papers), with all outputs fact-checked by humans who are capable in the field. Think of these systems as a calculator that sometimes “hallucinates” numbers: it should never be blindly trusted to do your tax return.
8) Persuasive but false outputs can cause real harm. These systems are highly persuasive and are designed to be so: hence the coherence, the appearance of "helpfulness", and the use of authoritative language.
9) Being trained on market data, these systems exhibit large biases towards market benefit rather than societal benefit. Think of it like a little Ayn Rand on your shoulder whispering sovereign-individual Kool-Aid. In other words, the optimisation leans toward market benefit, not necessarily the public good.
So, yes ... these tools are great fun and can be useful. But always apply critical thinking. Review the output in detail.
--
Appendix
Many use the term hallucination to mean "error from reality". This implies that the LLM/LMM reasons its way to the correct answers. I take the position that all output is "hallucinated", and sometimes that output matches reality where we have lots of training data and narrow contexts. I feel this fairly reflects the more statistical nature of LLMs/LMMs, as we haven't built reasoning engines ... yet.


More Relevant Posts

Ivan Paudice
8mo Edited
𝗖𝗼𝗻𝘁𝗲𝘅𝘁 𝗜𝘀 𝗘𝘃𝗲𝗿𝘆𝘁𝗵𝗶𝗻𝗴.
Mark my words: Context is everything.
You've felt this, right? AI is brilliant at firing off a quick email response or summarizing a meeting. Sharp, fast, helpful. Then you try something deeper (analysing your Q3 strategy, drafting that complex proposal, working through a business model) and it suddenly becomes useless. It gives you generic nonsense. It forgets what you said three prompts ago. It contradicts itself.
You probably blamed the AI. I did too. Here's what I learned: it's not a limit of AI. It's how we use AI. Mark my words: Context is everything.
𝗧𝗵𝗲 𝗣𝗿𝗼𝗺𝗽𝘁 𝗘𝗻𝗴𝗶𝗻𝗲𝗲𝗿𝗶𝗻𝗴 𝗧𝗿𝗮𝗽
For months, I was convinced prompt engineering was the answer. Get the prompt right, get the output right. I read the guides. I tried the frameworks. I tortured my prompts into elaborate instructions. Sometimes it worked. Mostly it didn't.
Then I noticed something: the LLM giants (OpenAI, Anthropic, Google) are all moving in the same direction. Not better prompts, but two things: agentic workflows and context management.
That's when it clicked. The game isn't about asking better. It's about giving AI what it needs to think with you, not just respond to you.
𝗪𝗵𝗮𝘁 𝗖𝗼𝗻𝘁𝗲𝘅𝘁 𝗔𝗰𝘁𝘂𝗮𝗹𝗹𝘆 𝗠𝗲𝗮𝗻𝘀
AI agents have massive context windows: 170,000 tokens, enough for hundreds of pages. But once you fill more than 40% of that space, performance falls off a cliff. And you'd be surprised how fast you can burn through it.
I was burning through context without realizing it. Every time I said "no, try again" or "that's not what I meant," I was filling up the tank. Three rounds of back-and-forth, and the AI was already drowning in noise.
Here's the kicker: when you add a file to project memory in GPT, Claude, or Grok, that entire file is included in the context with every single message. Add 2-3 files, exchange 10 messages, and you're done.
When you hit 40%, you stop. Extract what matters. Start fresh. That single shift cut my frustration in half.
𝗧𝗵𝗲 𝗛𝗶𝗲𝗿𝗮𝗿𝗰𝗵𝘆 𝗼𝗳 𝗔𝗜 𝗙𝗮𝗶𝗹𝘂𝗿𝗲
• A vague prompt creates bad output.
• A bad brief creates ten bad deliverables.
• A misunderstood context sends an entire project sideways.
Fix problems upstream, not downstream. I'm not debugging outputs anymore. I'm debugging inputs.
The teams winning with AI aren't the ones with the most sophisticated prompts. They're the ones who figured out context management first. That's a different skill from the one that got most of us here. And I'm convinced it's the skill that matters now.
-----
Wrote a longer version on my blog if you want the technical details and what I'm seeing work in practice: [https://lnkd.in/dRTqYkmt]
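The context-budget rule of thumb in the post above can be sketched as simple arithmetic. This is a minimal sketch, not how any vendor actually manages context. The 170,000-token window and the 40% stop point are the post's figures. The file and message sizes in the example are hypothetical. It assumes, as the post describes, that every attached file and the whole conversation so far are sent again with each new message.

```python
import math

# Figures quoted in the post; treat them as rules of thumb, not vendor specs.
CONTEXT_WINDOW = 170_000  # tokens
STOP_FRACTION = 0.40      # "when you hit 40%, you stop"


def fraction_used(file_tokens, history_tokens, window=CONTEXT_WINDOW):
    """Share of the window taken by attached files plus the conversation so far.

    Assumes every file and every earlier exchange is resent with each message.
    """
    return (sum(file_tokens) + sum(history_tokens)) / window


def exchanges_before_stop(file_tokens, tokens_per_exchange,
                          window=CONTEXT_WINDOW, stop=STOP_FRACTION):
    """How many equal-sized exchanges fit before crossing the stop fraction."""
    if tokens_per_exchange <= 0:
        raise ValueError("tokens_per_exchange must be positive")
    budget = stop * window - sum(file_tokens)
    if budget < 0:
        return 0  # the files alone already exceed the stop point
    return math.floor(budget / tokens_per_exchange)


if __name__ == "__main__":
    files = [15_000, 15_000, 15_000]  # hypothetical: three ~15k-token files
    per_exchange = 2_500              # hypothetical: prompt + reply per round
    print(exchanges_before_stop(files, per_exchange))
    print(round(fraction_used(files, [per_exchange] * 10), 3))
```

With these made-up sizes, nine exchanges fit under the 40% line. Ten exchanges push usage to about 41% of the window (0.412), which matches the post's "add 2-3 files, exchange 10 messages" warning.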
Anna Jacobi
9mo
The biggest risks in AI are not technical. They are psychological.
AI is often framed as an engineering race: bigger models, more compute, tighter evals. But the hardest risks are already in our heads. Decades of psychology have uncovered truths about human behavior, and every one of them is surfacing in AI systems.
1. Minds lie. People rationalize after the fact, and AI systems are built the same way. We see a polished demo, then invent logic to explain it. This is why agent-first design matters. Define the job, inputs, and handoffs before adding a model, or illusions become infrastructure.
2. Emotions rule. Logic rarely drives adoption; trust, fear, and belonging do. AI parallels this: accuracy does not equal trust. A system can be statistically strong yet collapse in the real world if users do not feel safe relying on it. Robustness, drift, and recovery matter more than leaderboard scores.
3. Environment shapes behavior. Humans mirror their inputs. Models do too. Drop AI into broken pipelines and you get scaled dysfunction. That is why tooling is not an accessory. Tooling is the product: monitoring, rollback, and versioning are what create trust.
4. Growth requires discomfort. We avoid pain, yet progress requires it. Infrastructure is that discomfort in AI. Compute per unit may fall, but costs for energy, water, and reliability are rising. These are real choke points, and whoever controls them holds leverage. Infrastructure is the moat.
5. Beliefs create reality. Limiting beliefs shrink possibilities; high-quality beliefs expand them. The same holds in AI. Domain agents outperform generic copilots because they map to real workflows, real data, and real success metrics.
6. Code is not product. Belief in speed is not enough. Yes, 90% of software professionals now use AI tools daily. Yes, a chat app produced 11,000 lines of code in 30 hours. But most of that code never clears QA, review, or deployment. Generation is cheap. Integration is the work.
The pattern is clear. Mind and machine distort in the same ways. Both prefer coherence over correctness. Both hide costs behind comforting stories. If we fail to account for that, we will build systems that inherit our blind spots at scale.
The danger is not AI replacing us. The danger is us building AI that repeats our worst habits faster and at larger scale.
Code is easy. Code is not production. The real work is building what people can trust.
#AI #ProductManagement #Psychology #Infrastructure #Trust #Agents #Data #Strategy #FutureOfWork
Stefan Wirth
8mo
People laughed when Dario Amodei said 90% of code will be written by AI. People are still laughing because they think it's not true. But it is true for a tiny % of people like me who have spent hundreds of hours investing in the tools and the mindset. Now this small group gets to reap all the rewards, while people who tried AI 3 months ago with a sloppy prompt and shitty context keep dismissing it.
Here's what's actually happening: the gap isn't between "humans" and "AI." It's between people who invested time learning how to work WITH agents and those who tried it once, got frustrated, and went back to the old way.
The real barrier isn't the technology. It's process understanding. A year ago, AI agents didn't work well because models weren't good enough. Today? Claude Code can automate entire workflows, but only if you deeply understand the process you're automating.
What I've learned building with AI agents:
• Generic AI gets you 80% (not production-ready)
• Domain expertise + AI gets you 95%+ (actually ships)
• That last 20% requires knowing what questions to ask
• The advantage compounds daily for those who invest the time
The people dismissing AI agents haven't put in the hours to understand how they work. Meanwhile, the people learning this skill are building 10x faster than they could a year ago.
The window is now. In 1–3 years, this becomes table stakes. Right now, it's still a competitive advantage.
Are you investing in learning how to work with AI agents, or are you waiting until everyone else figures it out?
Shekhar Parekh
8mo
🤖 AI has its own "Iron Triangle", and it's shaping every model decision we make.
Since my last post on Proprietary vs. Open Source LLMs, something kept bugging me. At first, it seemed simple: just pick a model and move on. 🤷‍♂️ Boy, was I wrong. 🫣
Beyond the obvious factors (control, privacy, and security), there's something far more fundamental that defines every model:
👉 Tokens, Parameters, and Context Window.
As I started digging deeper into these terms and their nuances, a thought hit me: I've seen this before. 🧐
Years ago, while studying for my PMP, I learned about the Golden (or Iron) Triangle: Cost, Quality, and Time. Improve one, and you inevitably stress the others. Turns out, AI has its own version of that triangle.
💰 Cost → Tokens: Tokens are the currency of computation. Every word, word fragment, or symbol the model processes is counted in tokens. 💸 More tokens = longer inputs/outputs = more GPU time and power.
👉 So in AI, cost is measured in tokens burned.
🧠 Quality → Parameters: 🧬 Parameters are the billions of weights the model learns, the essence of its "intelligence." 🚨 More parameters generally mean better quality, richer understanding, and deeper reasoning. But after a point, adding parameters doesn't make the model smarter; it just makes it heavier and more expensive.
👉 Parameters define the ceiling of quality.
⚡ Time → Context Window: The context window is how much text the model can "see" at once, i.e. its short-term memory. ⏩ Smaller context = faster responses but less awareness. 🐢 Larger context = deeper reasoning and longer documents, but slower and more costly per query.
👉 This is your speed tradeoff; it shapes the user experience directly.
🚦 You can't maximize all three.
⚖️ Expand context (more awareness)? You pay in tokens and speed.
⚖️ Increase parameters (quality)? You pay in training cost and latency.
⚖️ Reduce tokens (save cost)? You may lose detail or context.
⚡ Every AI system is a balancing act, just like managing cost, time, and quality in a project.
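The "cost → tokens" corner can be made concrete with a little arithmetic. This is a minimal sketch that assumes per-million-token pricing with separate input and output rates. The prices below are hypothetical placeholders, not any provider's actual rates. It shows why a larger context costs more: every token you send is billed, whether or not the model needed it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical USD prices per million tokens (placeholders, not real rates)."""
    input_per_million: float
    output_per_million: float


def request_cost(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Cost of one request: tokens read plus tokens generated, each at its own rate."""
    return (input_tokens * pricing.input_per_million
            + output_tokens * pricing.output_per_million) / 1_000_000


if __name__ == "__main__":
    pricing = Pricing(input_per_million=3.0, output_per_million=15.0)
    # Same 500-token answer: a small prompt vs. a prompt padded with long documents.
    small = request_cost(8_000, 500, pricing)
    large = request_cost(120_000, 500, pricing)
    print(f"small context: ${small:.4f}  large context: ${large:.4f}")
```

With these placeholder prices, the 120,000-token prompt costs about 12 times as much as the 8,000-token one ($0.3675 vs. $0.0315) for the same answer. This is the "expand context, pay in tokens" corner of the triangle.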
📒 Different world, same principle. There's an old truth: learning is never wasted. What you learn in one context often helps you understand another, because the fundamentals rarely change.
❓ Curious: in your work with AI, which corner of the triangle do you find yourself optimizing for most: Cost, Quality, or Time?
#AI #LLM #MachineLearning #AIGovernance #AIEthics #ArtificialIntelligence #Leadership #Innovation #ProjectManagement
Markus Brandes
8mo Edited
Inspiring post/read from Vanessa Cann. 💡 Imagine looking at enterprise workflows such as Hire-to-Retire or Quote-to-Cash, which usually require thousands of steps to be performed in an enterprise. If Agentic AI solutions can be trusted to perform them with 99% accuracy, this opens up significant productivity improvement opportunities in enterprises. There is still a way to go to build such trustworthy and accurate Agentic AI solutions, but progress in this direction is impressive.
⚡ It is clear that technology advancements alone will not yield these productivity improvements, as current enterprise workflows are executed by a series of different teams and organizations that are not necessarily set up to run these processes efficiently. Driving productivity improvements with Agentic AI therefore also needs to consider the human side of the house and requires an appetite to reinvent how things get done. At IBM we have been doing this. Learn more here https://lnkd.in/enkkrxj6
Explore IBM's watsonx suite with watsonx.orchestrate, watsonx.governance, our open source Granite models, … contributing to the technology progress towards trustworthy and accurate Enterprise AI. #letscreate
Vanessa Cann

Managing Director & Data/AI Innovation Lead at Accenture • Angel Investor • ex AI founder, CEO & ecosystem builder • Forbes 30u30 • Capital 40u40 • Top 23 Women in AI in Germany by Manager Magazin
8mo Edited

How long can an AI system run without breaking? For any system to be reliable, it needs to maintain 99 % accuracy across every step of its work.
A few weeks ago, researchers published an interesting paper on this topic: “Measuring Long-Horizon Execution in Large Language Models”. Their approach is refreshingly practical. Instead of testing models on abstract benchmarks, they measured how many consecutive actions a system can execute before it fails.
When an AI’s performance drops below 99 %, small mistakes begin to chain together: a wrong retrieval, a misapplied rule, a misplaced figure. Once that happens, the system spirals, and the task collapses.
Today’s frontier models can complete around 100 steps at 99 % reliability, roughly the complexity of a one-day research task. The current pace of progress suggests this number doubles roughly every seven months. If that trend continues, we could see:
≈ 1 000 steps (around 2027): preparing a quarterly investor pack, which means updating KPIs, writing the narrative, and compiling a Q&A brief with talking points.
≈ 11 000 steps (around 2029): running a full product launch, from concept and specification to go-to-market planning, execution, and early optimisation.
≈ 37 000 steps (around 2030): redesigning enterprise pricing and packaging, from willingness-to-pay research to rollout, communications, and renewal uplift.
Each of these examples currently requires teams of people working together for weeks or months. Soon, if systems can maintain 99 % accuracy across that many steps, such projects could be executed almost entirely by AI.
At around ten thousand reliable steps, autonomy becomes a capability: something you can actually build on and trust.
That is why I always advise businesses to start classifying their work by step length. This provides a tangible view of when each process might become automatable. It turns AI planning from guesswork into measurement, and it helps leaders design their organizations for a world where weeks of machine autonomy become normal.
It was great fun joining this year’s IBM Ecosystem Conference and discussing how we as an AI ecosystem can help industry make AI reliable and robust.
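Two pieces of arithmetic sit behind this argument. First, errors compound over many steps. Second, a fixed doubling time produces the dates listed above. This is a minimal sketch of both. It assumes each step succeeds independently with the same probability. That is a simplification and not necessarily how the cited paper defines or measures reliability. The 100-step starting point and the seven-month doubling time are the post's figures.

```python
import math


def all_steps_succeed(p_step: float, n_steps: int) -> float:
    """Chance that n steps all succeed, assuming each step independently
    succeeds with probability p_step (a simplifying assumption)."""
    return p_step ** n_steps


def required_step_accuracy(target: float, n_steps: int) -> float:
    """Per-step accuracy needed for the whole n-step task to succeed
    with probability `target`, under the same independence assumption."""
    return target ** (1 / n_steps)


def months_to_reach(target_steps: float, current_steps: float = 100,
                    doubling_months: float = 7) -> float:
    """Months until the step horizon reaches target_steps if it keeps
    doubling every `doubling_months` (the trend quoted in the post)."""
    return doubling_months * math.log2(target_steps / current_steps)


if __name__ == "__main__":
    print(f"{all_steps_succeed(0.99, 100):.3f}")       # 100 steps at 99% each
    print(f"{required_step_accuracy(0.99, 100):.5f}")  # per-step accuracy for 99% overall
    for steps in (1_000, 11_000, 37_000):
        print(steps, f"{months_to_reach(steps):.1f} months")
```

Under the independence assumption, 99% accuracy per step leaves only about a 37% chance of finishing 100 steps cleanly. Sustaining 99% over the whole 100-step task needs roughly 99.99% per step. The doubling projection gives about 23, 47, and 60 months for 1 000, 11 000, and 37 000 steps. That is roughly consistent with the post's 2027, 2029, and 2030 if counted from around the time of writing.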
Megha M
8mo Edited
The Black Box Dilemma: Moving from Prediction to Prescription with Explainable AI (XAI)
We trust AI with our lives, from diagnosing diseases to approving loans, but can it explain itself? That’s the crux of the Black Box Dilemma, and the reason why Explainable AI (XAI) is no longer optional but essential.
Modern Machine Learning, especially Deep Learning, excels at making predictions. From forecasting stock prices to identifying tumors, its accuracy is often remarkable. But its biggest weakness is that it can’t clearly tell us how it arrived at those predictions. This lack of clarity is the “Black Box Problem.”
In high-stakes fields like finance, law, and healthcare, a prediction without reasoning can be dangerous. If an AI denies a loan or misdiagnoses a patient, people and regulators need transparency, fairness, and accountability. That is where XAI steps in, making models not only powerful but also understandable.
Local vs. Global Explanations in XAI
1. Local Interpretability: Explaining a Single Prediction
Local methods focus on explaining why a model made a specific decision for one data point.
LIME (Local Interpretable Model-agnostic Explanations): It perturbs the input around a single data point and fits a simple, interpretable (typically linear) model that approximates the black box in that neighborhood. This helps humans see which features most influenced that prediction.
SHAP (SHapley Additive exPlanations): Based on game theory, SHAP assigns each feature a value showing its contribution to the final output. The contributions add up exactly to the difference between the model’s prediction and a baseline (the average prediction), so the baseline plus all contributions equals the predicted value.
2. Global Interpretability: Understanding the Model’s Overall Logic
Global methods look at how the model behaves across the entire dataset.
Global Feature Importance: Shows which features most influence predictions on average.
Surrogate Models: A simpler, interpretable model (like a Decision Tree) is trained to mimic the black box’s decisions. Though less accurate, it gives a clear picture of the overall reasoning.
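The additivity property described for SHAP can be checked by computing exact Shapley values on a toy model. This is a minimal brute-force sketch. It assumes a single baseline input stands in for "feature absent". The SHAP library instead averages over background data and uses faster approximations. The churn-score model and its two features are hypothetical. Enumerating every coalition grows exponentially with the number of features, so this only works for a handful of them.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for model(x), with `baseline` supplying the
    value of any feature that is left out of a coalition."""
    n = len(x)

    def value(coalition):
        # Features in the coalition keep their real values; the rest use the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = set(coalition)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return phi


def churn_score(z):
    """Hypothetical churn model: z[0] = low app usage, z[1] = frequent
    support tickets, plus an interaction term between them."""
    return 0.5 * z[0] + 0.3 * z[1] + 0.2 * z[0] * z[1]


if __name__ == "__main__":
    customer, baseline = [1, 1], [0, 0]
    phi = shapley_values(churn_score, customer, baseline)
    print(phi)
    # Additivity: baseline prediction + contributions = actual prediction.
    print(churn_score(baseline) + sum(phi), churn_score(customer))
```

In this example the interaction term is split evenly between the two features. That gives contributions of about 0.6 and 0.4, which sum to the gap between the baseline score (0) and the customer's score (1.0).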
The Miravya Edge: From XAI to Actionable Insight
At Miravya, we see XAI not as a compliance tool, but as a way to turn understanding into intelligent action. A typical ML model might predict that a customer will churn. XAI helps reveal why: perhaps low app usage and frequent support tickets. Instead of just knowing who will leave, we use those insights to design interventions, such as a proactive support call or a personalized tutorial, that help retain the customer. This is how we move from prediction to prescription.
The future of enterprise AI isn’t just about achieving high accuracy. It’s about making every decision explainable, ethical, and executable. That’s the real power of solving the Black Box Dilemma.
Tim Vieyra
9mo
Taming the Beasts: Field Notes on Which AI Models Actually Deliver
Working with AI models feels less like selecting tools and more like being a naturalist learning which creatures to trust in unexplored territory. Each has its own temperament, its own strengths, its own ways of surprising you. After months of observation, here are my field notes:
**Claude Opus - The Wise Counselor**
This one is wildly impressive, a genuine breakthrough in what machine thinking can be. Subtle, deeply thoughtful, reading between lines I didn't know existed. When I need to craft board documents or navigate complex partnership negotiations, Opus grasps not just what's said, but what's deliberately left unsaid. It catches the subtle currents that determine whether a proposal sinks or swims. Last week it spotted something in our negotiations that three advisors had missed: how a technical capability wasn't just a feature but the entire strategic moat. Like watching a master tracker read signs others walk past. Yes, it's expensive. But for strategic thought that requires genuine nuance? Nothing else comes close.
**Code GPT-5 - The Master Builder**
Through Cursor (~$20/month), GPT-5 is the seasoned architect who holds entire blueprints in mind while working. It maintains a coherent understanding across vast codebases, whether constructing new systems or renovating existing ones. But here's what I've learned. Cursor's default auto-model is an eager apprentice: fast and enthusiastic, but it codes a lot quicker than it thinks. It's useful for quick scripts, but dangerous for systems. GPT-5 is the master craftsman who measures twice, cuts once. The difference between them isn't speed; it's wisdom. (I avoid coding with Opus because it's simply too expensive for the volume of processing that development requires. Save the thoroughbred for the work that needs it.)
**Grok - The Scout**
While others herald ChatGPT's web browsing, Grok excels at reconnaissance: tracking market movements, competitor intelligence, emerging patterns. (Though Gemini is evolving rapidly in Google's ecosystem, another species adapting quickly.) I love the "task" function, which runs a scheduled prompt and delivers the result as a notification on my phone. This is now the way I consume news.
The humbling truth? These aren't tools we've built. They're intelligences that have emerged through technological evolution, complex organisms we're still learning to understand. Some days I feel less like an engineer and more like those Victorian explorers, carefully documenting which creatures can be trusted with which tasks.
Using Opus for quick scripts is like using a thoroughbred to pull a plow. Using GPT-3.5 for strategy is like sending a house cat to hunt buffalo. They're not just different in degree; they're different in kind. Different species for different purposes.
What creatures have you learned to work with? And what have they taught you about their nature?
Amal Muhsin MCIS
8mo
The Ghost of the Algorithm
When your work involves translating complex concepts into practical technology, you start to notice recurring themes. One that has become increasingly relevant is the relationship between our past data and our AI-powered future.
A framework I've found useful is the idea of a "Ghost in the Algorithm." It connects the old philosophical concept of essentialism (the idea that everything has a core set of defining traits) with the practical function of modern AI.
At its core, an AI is designed to analyze data and identify what it determines to be the "essence" of a subject. The challenge is that this process isn't philosophical; it's statistical. The AI's conclusions are a direct reflection of the data it's given, including any flaws or outdated information.
This idea comes up often in my work, especially during data migration projects. A client might have thirty years of company history, and the conversation inevitably turns to the "garbage in, garbage out" principle. The core question is a practical one: do you want outdated processes and historical data defining your capabilities for the future?
Now, AI adds a new layer to this decision. It's not just a tool for filtering data; it's a system that learns from it. This raises the stakes: the data we provide doesn't just populate the new system, it actively teaches it.
This leads to a fundamental question that is central to responsible AI implementation. As we build our future systems, how much influence should we allow our unexamined past to have? It's a strategic choice between carrying forward old habits and consciously defining a new direction.
Palanimohan D
8mo
--AI Wave--
The AI Wave is Relentless. You Don't Need to Surf Every One. 🏄‍♂️
Feeling overwhelmed by the daily deluge of new AI models, libraries, and breakthroughs? You're not alone. The pressure to know everything is the fastest path to burnout.
Here's the hard truth: you cannot and should not try to learn it all. The goal isn't to become a walking encyclopedia of AI. The goal is to build a "T-Shaped" AI skillset with strategic awareness.
Here’s my practical framework:
1. Build Your Deep "Vertical" Bar (The | in the T)
This is your foundation. Pick one or two core domains and go deep.
Is it LLM Orchestration? (e.g., LangChain, LlamaIndex)
Is it MLOps and Deployment? (e.g., Kubernetes, MLflow, TFX)
Is it Computer Vision? (e.g., Diffusion models, YOLO)
Is it your industry's domain knowledge? (e.g., AI in BioTech, FinTech)
This is your home base. This is where you build undeniable, expert-level value. No flashy new paper should shake this foundation.
2. Build Your Broad "Horizontal" Bar (The — in the T)
This is for everything else. Your goal here is not mastery, but "Conceptual Understanding."
Spend 30 minutes, twice a week, on "scouting." Skim arXiv summaries, watch a 10-minute video on the new OpenAI o1 model, or read a high-level blog post about Mixture-of-Experts.
Your question isn't "How do I implement this?" but "What problem does this solve, and when might it be relevant to me?"
Use tools like AI-powered article summarizers to save time.
-------------------------------------------------------------------------------------
The Power is in the Intersection.
Innovation happens when you apply a new, broad concept ("Hey, this Retrieval-Augmented Generation thing is interesting...") to your deep domain ("...how could I use RAG to solve our specific data access problem in healthcare?").
Your New Mantra: "I am an expert in [My Vertical]. I am intelligently aware of the [AI Landscape]. I connect the dots, I don't collect them."
The signal will emerge from the noise when you have a filter for what truly matters to you. Stay consistent.
How are you managing the AI learning curve? Share your strategy below! 👇
#AI #MachineLearning #CareerDevelopment #Tech #LLM #DataScience #ContinuousLearning #AvoidingBurnout #Engineering