Why Your AI Agent Hallucinates: Engineering Lessons From a Production Workflow Builder in Drupal
Why Your AI Agent Hallucinates: Engineering Lessons From a Production Workflow Builder in Drupal
Shibin Devadas Kakanat (D34dman)
LLMs are now table stakes in Drupal. Modules like AI, ECA, and a growing set of agentic tools make it trivial to wire a model into a site. What's not trivial is making that integration trustworthy enough to put in front of a real editor or site builder.
Over the last several months, building FlowDrop's chat-based workflow editor, we collected a set of failure modes that recur across every LLM-driven Drupal feature we've seen: hallucinated field names, half-applied state changes, prompt-injection through user content, runaway token costs as state grows, and the "what ID did you just create?" round-trip problem.
This session is the post-mortem. It's the engineering log of what we changed, why each change mattered, and how each pattern generalises beyond FlowDrop. Every claim is backed by code/data we'll show on screen.
---
Session is not for: attendees looking for a vendor comparison of LLM providers, or a pure prompt-engineering talk with no code. This is an engineering session about the system around the prompt.
Prerequisite
Attendees should be comfortable with:
- Drupal module development (services, plugins, config entities)
- Basic LLM concepts: system prompts, context windows, tool/function calling. You do not need to have built an agent before.
- PHP 8.2+ syntax (readonly properties, attributes, enums)
No prior FlowDrop knowledge is required. The patterns are framed in general terms; FlowDrop is the case study.
Target Audience
Module developers building AI-powered features (content assistants, agents, workflow tools, chatbots) who have hit the "demo works, production doesn't" wall.
Outline
Most "AI agent" demos look great in a screencast and fall apart the moment a real user touches them. They invent fields, corrupt state, and forget what they did two turns ago. The fix isn't a smarter model -- it's the boring engineering around the model.
This session is a code-level walkthrough of FlowDrop Chat, a production AI agent that builds and modifies visual workflows inside Drupal from natural-language chat. It works reliably not because of the LLM choice, but because of seven specific architectural decisions: a tiny domain-specific language instead of JSON tool calls, aggressive context slimming, predictable IDs the model can plan against, prompt-injection hardening, atomic batched commands with rollback, and a self-hosted prompt pipeline that ships as a workflow rather than as code.
You'll leave with concrete patterns you can apply to any Drupal AI feature -- whether you're building an agent, a content assistant, or just trying to keep an LLM from melting your editorial workflow.
Learning Objectives
By the end of this session, attendees will be able to:
1. Choose the right LLM output format for their use case -- and explain why a small DSL often beats JSON tool-calling for editor-style agents (token economy, atomic batching, deterministic parsing, undoability).
2. Slim context aggressively by separating what the LLM needs to know from what the system needs to store -- including a reusable pattern for reducing Drupal config schemas, entity references, and plugin metadata for prompt injection.
3. Defend against prompt injection in chat history and user-supplied state with concrete, copy-pastable techniques (XML wrapping, JSON validation gates, never echoing raw user input into system instructions).
4. Design predictable identifiers so the LLM can plan multi-step changes in a single response without round-trips; and recognise where this pattern applies outside agents (slugs, machine names, batch operations).
5. Make agent actions atomic and reversible, turning "the model needs to be right" into "the model needs to be right or recoverable" (a dramatically lower bar).
6. Encode prior failures into the system prompt as worked examples and explicit anti-patterns, instead of hoping a bigger model will figure it out.
7. Dogfood the agent through their own platform so improvements ship as configuration, not code deploys.
Experience level
Intermediate