feat(ai): robust response extraction + structured-output foundation (flag-gated) #188
Summary
Hardens the Anthropic provider and lays the groundwork for schema-constrained JSON in `kb_conversion`. No model changes (still `claude-sonnet-4-6` / `claude-haiku-4-5`). New behavior is gated behind `AI_KB_CONVERT_STRUCTURED_OUTPUT` (default `False`), so prod behavior is unchanged until staging-validated.
Decision recorded in `.ai/DECISIONS.md`: structured outputs scope to the flat-array `generate_json` (`kb_conversion`) only. `ai_fix` and `knowledge_flywheel` emit recursive/nested decision trees that Anthropic's "no recursive schemas" limit excludes, so their fence-strippers stay.
What changes
`backend/app/core/ai_provider.py`
- `_extract_text_from_response` replaces the fragile `response.content[0].text`: it skips non-text leading blocks (e.g. thinking blocks), returns the first text block, logs an `anthropic.stop_reason` warning on `max_tokens`/`refusal` (truncation is now observable), and raises `ValueError` on a no-text response.
- `generate_json` gains an optional `schema` param. Anthropic wires it to `output_config.format` (structured outputs); `schema=None` preserves the exact prior call for every existing caller. Gemini accepts and ignores it.
`backend/app/core/kb_conversion_service.py`
- `TROUBLESHOOTING_SCHEMA` / `PROCEDURAL_SCHEMA` + `_schema_for_target_type()`, modelled as a strict superset of every field the prompts emit.
- `convert_document` passes the schema only when `AI_KB_CONVERT_STRUCTURED_OUTPUT` is `True` (default `False`). The `_try_repair_json` fallback stays as belt-and-suspenders.
Cleanup
`.gitignore`: stop `core.<pid>` crash dumps from showing up as untracked.
Test plan
With `AI_KB_CONVERT_STRUCTURED_OUTPUT=true`, run a live constrained-decoding smoke test and confirm token count + JSON validity.
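Illustrative sketch (not the code in this PR): the extraction behavior described for `_extract_text_from_response` under What changes could look roughly like the following. `Block`, `Response`, the logger name, and the function name `extract_text_from_response` are hypothetical stand-ins so the example runs without the SDK. The only assumption about the real response object is that it exposes `content` blocks with a `type` and a `stop_reason` field.

```python
import logging
from dataclasses import dataclass, field
from typing import Any, List, Optional

logger = logging.getLogger("ai_provider_sketch")  # hypothetical logger name


# Hypothetical stand-ins for the SDK response objects, used only so this
# sketch is self-contained.
@dataclass
class Block:
    type: str
    text: Optional[str] = None


@dataclass
class Response:
    content: List[Block] = field(default_factory=list)
    stop_reason: Optional[str] = None


def extract_text_from_response(response: Any) -> str:
    """Return the first text block instead of assuming content[0] is text."""
    stop_reason = getattr(response, "stop_reason", None)
    if stop_reason in ("max_tokens", "refusal"):
        # Surface truncation/refusal in logs rather than failing silently.
        logger.warning("anthropic.stop_reason=%s", stop_reason)

    for block in getattr(response, "content", None) or []:
        # Skip leading non-text blocks such as thinking blocks.
        if getattr(block, "type", None) == "text" and isinstance(
            getattr(block, "text", None), str
        ):
            return block.text

    raise ValueError("response contained no text block")


if __name__ == "__main__":
    resp = Response(
        content=[Block("thinking"), Block("text", '{"ok": true}')],
        stop_reason="end_turn",
    )
    print(extract_text_from_response(resp))
```

In this sketch a truncated response still returns its partial text; the warning only makes the truncation visible, and any repair or retry is left to the caller.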