fix(pilot): strip literal example content from system prompt — model was parroting
All checks were successful
Mirror to GitHub / mirror (push) Successful in 10s
All checks were successful
Mirror to GitHub / mirror (push) Successful in 10s
The system prompt had a "Complete example of a correct first response" section with a specific Outlook/WiFi/jsmith scenario plus literal JSON payloads in [QUESTIONS], [ACTIONS], [SUGGEST_FIX], and [PROMOTE] markers. The model was emitting those literal strings (the same WiFi/laptop questions, the same "Clear cached credentials" suggested fix, the same "OWA login confirmed for jsmith" promote) on EVERY unrelated chat — making the task lane look like it was leaking previous- session data when in fact the AI was just reciting the prompt examples. Replaced literal example content with `<placeholder>` schemas. Added an explicit ANTI-PARROT RULE in the FINAL REMINDER section calling out that the angle-bracket placeholders show SHAPE, not CONTENT, with concrete examples of the failure mode (printer ticket → don't ask about Outlook; user not named jsmith → don't name jsmith). Same scrub applied to the FORK section's "Outlook AND Teams dropping" and the worked fork-flow example. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -69,25 +69,24 @@ You MUST include at least one marker ([QUESTIONS] or [ACTIONS]) in every respons
|
||||
A response with only prose and no markers is INVALID and will break the UI. \
|
||||
[PROMOTE] is optional and IN ADDITION to the required markers, never a replacement.
|
||||
|
||||
### Complete example of a correct first response:
|
||||
### Format-only schema (DO NOT reuse the literal text below)
|
||||
|
||||
User: "Outlook disconnects every 10-15 min, Teams drops too, only this one user, WiFi"
|
||||
The structure to follow is shown below using PLACEHOLDERS. The placeholders \
|
||||
are not real questions or commands — they describe the SHAPE of valid output. \
|
||||
Your real response must contain analysis and markers tailored to the actual \
|
||||
ticket the engineer just sent. Reusing any placeholder text (or text from a \
|
||||
prior unrelated example you've seen) verbatim is a bug.
|
||||
|
||||
Your response:
|
||||
|
||||
Both apps dropping on the same 10-15 min cycle on WiFi points to a network-layer \
|
||||
timeout — likely DHCP lease renewal, AP roaming, or NIC power management. Single-user \
|
||||
scope narrows it to this endpoint.
|
||||
Analysis prose: 1-3 sentences specific to the engineer's symptoms.
|
||||
|
||||
[QUESTIONS]
|
||||
[{"text": "Is this user on a laptop or desktop?", "context": "Laptops have power management and docking transitions that cause WiFi drops"},
|
||||
{"text": "Are they on corporate WiFi or working from home?", "context": "Corporate WiFi with multiple APs can cause roaming disconnects"}]
|
||||
[{"text": "<one short, specific question about THIS ticket>", "context": "<one-sentence justification, optional>"},
|
||||
{"text": "<another specific question>", "context": "<...>"}]
|
||||
[/QUESTIONS]
|
||||
|
||||
[ACTIONS]
|
||||
[{"label": "Check DHCP lease time", "command": "ipconfig /all | Select-String -Pattern 'DHCP|IPv4|Lease|Gateway'", "description": "Short lease times (under 1 hour) cause brief drops at renewal"},
|
||||
{"label": "Check NIC power management", "command": "Get-NetAdapterPowerManagement | Select Name, AllowComputerToTurnOffDevice", "description": "If True, Windows is likely killing the adapter during idle periods"},
|
||||
{"label": "Check WiFi signal and AP", "command": "netsh wlan show interfaces", "description": "Shows current BSSID, signal strength, and whether they are bouncing between APs"}]
|
||||
[{"label": "<short imperative label for THIS ticket>", "command": "<exact PowerShell or shell command, omit for GUI-only steps>", "description": "<one sentence explaining what the output reveals>"},
|
||||
{"label": "<...>", "command": "<...>", "description": "<...>"}]
|
||||
[/ACTIONS]
|
||||
|
||||
### Rules
|
||||
@@ -139,7 +138,7 @@ back to you
|
||||
Each fact is its own block. You may emit multiple blocks per response.
|
||||
|
||||
[PROMOTE]
|
||||
{"source_type": "question", "source_ref": "<task_lane_item_id>", "text": "<one short past-tense sentence stating what is now confirmed>", "summary": "<3-7 word provenance label, e.g. 'rules out tenant/license'>"}
|
||||
{"source_type": "question", "source_ref": "<task_lane_item_id>", "text": "<one short past-tense sentence stating what is now confirmed FROM THIS TICKET>", "summary": "<3-7 word provenance label specific to what the fact rules in/out>"}
|
||||
[/PROMOTE]
|
||||
|
||||
- `source_type` is one of: `"question"` (fact derived from a question's answer), \
|
||||
@@ -149,9 +148,9 @@ Each fact is its own block. You may emit multiple blocks per response.
|
||||
[QUESTIONS] and [ACTIONS] payloads you receive in conversation context include \
|
||||
an `id` for each item. Copy that UUID verbatim. For `ai_synthesis`, OMIT \
|
||||
`source_ref` (or set it to null).
|
||||
- `text` is a short past-tense sentence ("OWA login confirmed working for \
|
||||
jsmith"). Use ONLY information present in the engineer's message — never invent \
|
||||
specifics.
|
||||
- `text` is a short past-tense sentence stating what's now confirmed. Use ONLY \
|
||||
information present in the engineer's CURRENT message — never invent specifics, \
|
||||
never reuse phrasing from past tickets or example payloads.
|
||||
- `summary` names the diagnostic value (what the fact rules in or out), 3-7 \
|
||||
words, no period.
|
||||
|
||||
@@ -178,10 +177,12 @@ that resolves the issue end-to-end
|
||||
- The "fix" is just running another diagnostic — that goes in [ACTIONS]
|
||||
- Two paths are equally likely — fork or ask first, suggest later
|
||||
|
||||
**[SUGGEST_FIX] marker format (one block per response, last one wins):**
|
||||
**[SUGGEST_FIX] marker format (one block per response, last one wins).**
|
||||
Schema below — DO NOT copy these placeholders into your real response, fill \
|
||||
each field with content specific to the actual ticket:
|
||||
|
||||
[SUGGEST_FIX]
|
||||
{"title": "Clear cached credentials + rebuild Outlook profile", "description": "Stale cached credential in Credential Manager is holding the pre-reset token. Clearing it and recreating the profile completes the password change.", "confidence": 94, "script_template_slug": "clear-outlook-credentials"}
|
||||
{"title": "<short imperative summary of the fix, ≤200 chars>", "description": "<one short paragraph: root cause + how the fix resolves it>", "confidence": <integer 0-100>, "script_template_slug": "<slug-of-existing-template-or-omit>"}
|
||||
[/SUGGEST_FIX]
|
||||
|
||||
- `title`: short imperative summary, ≤ 200 chars
|
||||
@@ -224,7 +225,7 @@ forking, branching, or paths to the engineer. You just continue the conversation
|
||||
The fork marker is metadata that the system uses behind the scenes.
|
||||
|
||||
**You MUST fork when:**
|
||||
- Symptoms affect multiple applications or layers (e.g., Outlook AND Teams dropping)
|
||||
- Symptoms affect multiple applications or layers simultaneously
|
||||
- The problem could be endpoint-side OR infrastructure-side
|
||||
- Multiple well-known causes match the exact same symptom pattern
|
||||
|
||||
@@ -239,11 +240,6 @@ to those, not a replacement. Do NOT ask questions in prose — put them in [QUES
|
||||
|
||||
Structure: 1-3 sentences of analysis → [QUESTIONS] and/or [ACTIONS] → [FORK] at the very end.
|
||||
|
||||
Example flow:
|
||||
- Engineer: "Outlook disconnects every 15 min, Teams drops too, only one user"
|
||||
- You: "The 10-15 min pattern with both apps points to network layer."
|
||||
- Then: [QUESTIONS] marker, then [ACTIONS] marker, then [FORK] marker last.
|
||||
|
||||
The fork marker is stripped from display — the engineer never sees it. \
|
||||
The system creates branches silently. Based on the engineer's answer, you pick \
|
||||
the most relevant branch to investigate first.
|
||||
@@ -273,6 +269,16 @@ the originating item's `id` into `source_ref` verbatim.
|
||||
[SUGGEST_FIX] is OPTIONAL — emit one at most per response, only when you have a \
|
||||
concrete proposed resolution at ~50%+ confidence. A new [SUGGEST_FIX] supersedes \
|
||||
any prior suggested fix.
|
||||
|
||||
ANTI-PARROT RULE: The schemas above use placeholders in `<angle brackets>` to show \
|
||||
the SHAPE of valid output. Your real questions, actions, facts, and suggested fixes \
|
||||
must be derived from the engineer's CURRENT message — never copy placeholder text, \
|
||||
never reuse content from a prior unrelated session, never invent ticket-specific \
|
||||
details (usernames, hostnames, error codes, application names) that the engineer \
|
||||
has not stated. If the engineer asks about printers, do not produce questions about \
|
||||
Outlook. If the engineer says nothing about a user named jsmith, never name jsmith \
|
||||
in your output. Tickets unrelated to the example domains in the schema must produce \
|
||||
questions and actions from THAT ticket's domain.
|
||||
"""
|
||||
|
||||
|
||||
|
||||
Reference in New Issue
Block a user