Dear DocTalk,
I’m Cassie.
I wanted to share a peer insight from our team about the Red Flag Language Smart Skill from the Resource Center.
We found a couple of instances of AI hallucination.
When I ran it, most of the feedback was reliable, but it also returned two or three points that weren’t present in the document. I added two lines under Step 2 in the prompt (see my screenshot), and I’m hoping they help reduce the hallucinations.
I’d love to hear if others have had similar issues with Smart Skills and how you’ve adjusted prompts for better responses.
— Cassie
Dear Cassie,
First: yes, you’re doing this exactly right.
Writing into the Community is the perfect place for this—because you’re not just describing the problem, you’re also sharing what you tried and inviting others to learn with you. That’s how Community becomes useful, so thank you!
Now, on the hallucinations: you’re not imagining things, and your team isn’t alone.
AI can be amazingly helpful—and also occasionally overhelpful, especially when the prompt leaves room for interpretation or the AI is working with broader context than you intended.
The good news is that the fix is usually less about “becoming an expert at prompting” in the abstract and more about adding a few practical guardrails, both in your Smart Skill prompts and in the underlying configuration (especially the choice of model and temperature).
Here are a few pointers worth trying if you’re seeing hallucinations:
Choose settings that match the task
For detection and review tasks, you typically want precision, not creativity.
Model choice and temperature can make a noticeable difference in consistency. These are settings you can adjust for each Smart Skill. You can find some guidance on how model and temperature affect results in the Smart Skills Configuration article.
For what you’re trying to achieve, Cassie, GPT-4.1 as the model and a temperature of 0.1 would make sure the AI follows instructions to the letter rather than creatively. You can also try GPT-5, a newer GPT model that might yield even better results.
If you’d like to change the GPT model used for your whole hub, for all QPilot tasks, you can do this in the Hub Management > Settings tab. You can learn how to do this here.
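To see the two knobs side by side: these are UI settings in Smart Skills, but this illustrative sketch (the dictionary shape mirrors common chat-model APIs, not the product’s internals) shows what precision-oriented versus creativity-oriented configurations look like:

```python
# Illustrative only: model and temperature as a pair of settings.
# In Smart Skills these live in the configuration UI, not in code.
review_settings = {
    "model": "gpt-4.1",   # strong instruction-following, good for review tasks
    "temperature": 0.1,   # near-deterministic sampling: precision over creativity
}

drafting_settings = {
    "model": "gpt-4.1",
    "temperature": 0.9,   # looser sampling: more variety, more room to drift
}
```

For detection and review work you want the low-temperature shape; the higher value is the one that invites the “creative” additions you were seeing.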
Keep the Sources focused
Sometimes hallucinations aren’t coming from the prompt at all—they come from the AI pulling from too much context. When a Smart Skill has access to multiple sources, it can unintentionally blend them.
When accuracy matters, the best practice is to keep the skill pointed at the current document (using an @mention) or a single intended source (when configuring the Smart Skill or when prompting in QPilot), rather than “everything available.”
Make the Skill “show its work”
Your instinct to add lines to Step 2 is spot on—because the fastest way to stop hallucinations is to force the output to stay grounded in the text.
If the skill is identifying something (red flags, requirements, risks, action items), a great rule is: Only report an item if you can quote it verbatim from the document.
That way, anything “not present in the document” simply can’t make it into the final output. It also makes review faster, because you can validate findings with a quick scan.
For example, you could include this mini-block:
Evidence rules (must follow):
- Only flag a red flag if you can quote it verbatim from @this document.
- “Red Flag Found” must contain the full sentence/bullet/cell text (not just a single word).
- If you cannot provide an exact quote, do not include the item.
- Recommendations must preserve meaning and must not add new claims.

Define the boundaries—then repeat them
Most hallucinations happen when the AI feels invited to be helpful outside the intended scope. Prompts get much safer when you add language like:
- “Only flag items from this approved list.”
- “Do not add anything outside these categories.”
- “If nothing matches, say ‘No issues found.’”
It feels repetitive, but repeating constraints is exactly what improves consistency.
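The “approved list” idea can also be enforced after the skill runs. Here’s a minimal sketch (assuming the skill’s findings can be exported as labeled items; the category names are made up for illustration) that drops anything outside the approved set and falls back to “No issues found.”:

```python
# Illustrative approved list -- use your team's actual red-flag categories.
APPROVED_CATEGORIES = {"guarantee language", "unverifiable claim", "superlative"}

def filter_findings(findings):
    """Keep only findings whose category is on the approved list."""
    kept = [f for f in findings if f.get("category") in APPROVED_CATEGORIES]
    return kept if kept else [{"category": None, "note": "No issues found."}]

# Example: the second finding's category was invented by the model, so it is dropped.
raw = [
    {"category": "guarantee language", "quote": "guaranteed outcomes"},
    {"category": "tone mismatch", "quote": "we think this works"},  # not approved
]
print(filter_findings(raw))  # only the "guarantee language" finding survives
```

The same constraint stated in the prompt and enforced in a post-filter is exactly the kind of repetition that pays off.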
Watch for “rewrite inflation”
One of the sneakiest hallucination-adjacent issues is when the rewrite sounds polished but subtly adds claims (“measurable results,” “verified by reports,” “guaranteed outcomes”) that weren’t in the original text.
A safe rule of thumb: Rewrites should remove risk without increasing certainty.
If a rewrite adds something you can’t prove, it’s better to keep it simpler—or revise it manually.
Here are some rewrite-inflation guardrails you can drop into the bigger prompt as a mini-block:
Rewrite Inflation Guardrails:
- Rewrite must preserve meaning and must not increase certainty.
- Do not add new claims, metrics, guarantees, proof language (e.g., “verified,” “demonstrated,” “measurable”), or comparisons not present in the quote.
- Do not add bracketed placeholders like [metric] or [source].
- Prefer minimal edits: replace or remove the red-flag term while keeping the rest of the sentence intact.
- If the claim cannot be supported with evidence in the quoted text, soften it (e.g., “designed to,” “intended to,” “aims to”) or remove the claim.

Treat Smart Skill output like a strong first pass—then do a quick human verification
Even with the best prompt, Smart Skills are still an assistant, not a final reviewer. The best workflow is:
- run the Skill
- skim the quotes/flags
- approve or adjust the recommendations you actually want in the document
That way you keep the speed, but you don’t inherit the risk.
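If you want to speed up that skim, the two rules above (verbatim quotes, no added certainty) are easy to spot-check with a few lines of code. A sketch, assuming you can paste the flagged quotes and rewrites out of the skill’s output; the proof-word list is illustrative, not exhaustive:

```python
import re

# Illustrative proof-language list -- extend with your team's own red-flag terms.
PROOF_WORDS = {"verified", "guaranteed", "measurable", "demonstrated", "proven"}

def is_grounded(quote, document):
    """True only if the flagged quote appears verbatim in the document."""
    return quote in document

def _words(text):
    return set(re.findall(r"[a-z]+", text.lower()))

def inflation_terms(original, rewrite):
    """Proof-style words the rewrite introduces that the original never had."""
    return sorted((_words(rewrite) - _words(original)) & PROOF_WORDS)

document = "Our tool aims to reduce review time for busy teams."
assert is_grounded("aims to reduce review time", document)     # verbatim: OK
assert not is_grounded("guarantees faster reviews", document)  # not in the doc: flag it

print(inflation_terms(
    "Our tool aims to reduce review time.",
    "Our tool delivers verified, measurable reductions in review time.",
))  # -> ['measurable', 'verified']
```

A check like this doesn’t replace the human skim, but it surfaces ungrounded quotes and inflated rewrites in seconds so your reviewers can focus on judgment calls.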
Before I sign off, just a quick word about the Resource Center.
While it provides a great baseline for setting up your own Smart Skills and the prompts are highly reusable, they are very much intended to be modified and adapted to the kind of work you’re using them for. They might work great as-is for some folks but need some tweaking for others.
And finally, my last bit of advice: don’t be afraid to experiment and see what works best—and reach out like you just did whenever you need pointers.
Until the next doc dilemma,
DocTalk