Tasks: RAG Retrieval Quality Improvements
Input: Design documents from /specs/004-rag-retrieval-quality/
Prerequisites: plan.md ✅, spec.md ✅, research.md ✅, data-model.md ✅, contracts/ ✅
Organization: Tasks are grouped by user story to enable independent implementation and testing of each story.
Tests: Not requested in spec, so no test tasks were generated.
Format: [ID] [P?] [Story] Description
- [P]: Can run in parallel (different files, no dependencies)
- [Story]: Which user story this task belongs to (US1, US2, US3, US4)
Phase 1: Setup (Shared Infrastructure)
Purpose: No new project structure needed — services are added to the existing retrieval/ package. This phase is a single verification step.
- T001 Verify active branch is 004-rag-retrieval-quality and backend/src/main/java/com/aiteacher/retrieval/ exists
Phase 2: Foundational (Blocking Prerequisites)
Purpose: Two lightweight value objects shared by both user stories.
⚠️ CRITICAL: Both user story phases depend on these records being present.
- T002 Create ExpandedQuery record in backend/src/main/java/com/aiteacher/retrieval/ExpandedQuery.java with fields String original and String rewritten
- T003 [P] Create LabelledContext record in backend/src/main/java/com/aiteacher/retrieval/LabelledContext.java with fields Map<String, SectionEntity> sectionLabels, Map<String, FigureEntity> figureLabels, and String promptText
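As a rough illustration of T002 and T003, the two value objects could be declared as plain Java records. The SectionEntity and FigureEntity types below are hypothetical stand-ins with invented fields; the project's real entities are not described in this document and will differ.

```java
import java.util.Map;

// Hypothetical stand-ins for the project's existing entities.
// The field names here are assumptions made only so the sketch compiles.
record SectionEntity(String title, int page, String fullText) {}
record FigureEntity(String figureNumber, int page, String caption) {}

// T002: the user's original question alongside its clinical rewrite.
record ExpandedQuery(String original, String rewritten) {}

// T003: ref-label -> entity maps plus the fully formatted prompt text.
record LabelledContext(
        Map<String, SectionEntity> sectionLabels,
        Map<String, FigureEntity> figureLabels,
        String promptText) {}
```

Records give both types value semantics and accessor methods such as rewritten() and promptText(), which is the accessor style the later tasks refer to.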
Checkpoint: Foundation ready — US1 and US2 implementation can begin in parallel
Phase 3: User Story 1 — Accurate Retrieval Despite Different Terminology (Priority: P1) 🎯 MVP
Goal: Before each retrieval call, rewrite the user's question into clinical terminology so that vector search finds relevant sections even when the user uses lay language.
Independent Test: Ask "what happens after cutting the skull?" — verify retrieved sections contain content about craniotomy without that word appearing in the query.
Implementation for User Story 1
- T004 [US1] Create QueryExpansionService in backend/src/main/java/com/aiteacher/retrieval/QueryExpansionService.java:
  - Constructor-inject ChatClient
  - Method expand(String query): ExpandedQuery
  - LLM prompt: "Rewrite the following question using precise medical/surgical terminology as it would appear in a neurosurgery textbook index. Output only the rewritten question, nothing else. Question: {query}"
  - Return new ExpandedQuery(query, rewrittenText)
  - Annotate with @Service
- T005 [US1] Review NeurosurgeryRetriever.retrieve() in backend/src/main/java/com/aiteacher/retrieval/NeurosurgeryRetriever.java (no code change expected):
  - The signature retrieve(String query, UUID bookId) stays as-is, and the method keeps using query for vector search
  - Note: expansion is done in ChatService before calling retrieve, so no change to NeurosurgeryRetriever is required
- T006 [US1] Modify ChatService in backend/src/main/java/com/aiteacher/chat/ChatService.java:
  - Constructor-inject QueryExpansionService
  - In sendMessage(), call queryExpansionService.expand(fullQuestion) before the retrieval loop
  - Pass expandedQuery.rewritten() to retriever.retrieve() instead of fullQuestion
  - Keep passing fullQuestion (original) to buildContextPrompt() so the QUESTION block shown to the model reflects what the user actually asked
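The expansion step in T004 can be sketched without Spring as follows. This is a minimal illustration, not the service itself: the injected ChatClient is replaced by a plain Function<String, String> (prompt in, completion out), and the fallback to the original query on a blank response is an added assumption that T004 does not specify. ExpandedQuery is repeated so the block compiles on its own.

```java
import java.util.function.Function;

record ExpandedQuery(String original, String rewritten) {}

final class QueryExpansionSketch {
    // Prompt text taken from T004; {query} is substituted in expand().
    static final String PROMPT_TEMPLATE =
            "Rewrite the following question using precise medical/surgical terminology "
            + "as it would appear in a neurosurgery textbook index. "
            + "Output only the rewritten question, nothing else. Question: {query}";

    // Assumption: stands in for the injected ChatClient.
    private final Function<String, String> llm;

    QueryExpansionSketch(Function<String, String> llm) {
        this.llm = llm;
    }

    ExpandedQuery expand(String query) {
        String rewritten = llm.apply(PROMPT_TEMPLATE.replace("{query}", query));
        // Assumption, not part of T004: keep the original query when the
        // model returns nothing usable, so retrieval still has a query.
        if (rewritten == null || rewritten.isBlank()) {
            return new ExpandedQuery(query, query);
        }
        return new ExpandedQuery(query, rewritten.strip());
    }
}
```

Per T006, the caller would pass rewritten() to retriever.retrieve() and keep original() for buildContextPrompt(), so the model still sees the question as the user phrased it.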
Checkpoint: User Story 1 fully functional — retrieval now uses clinically rewritten queries
Phase 4: User Story 2 — Grounded Citation in Generated Answers (Priority: P1)
Goal: Tag all retrieved sections and figures with short ref-labels ([S1], [F1]…) in the prompt, instruct the model to cite only those labels, then post-process the answer to strip any citation referencing a label that was not provided.
Independent Test: Trigger a question where only sections S1–S3 are retrieved. Verify the generated answer contains no citation outside that set, and the sources list in the response carries refLabel fields.
Implementation for User Story 2
- T007 [US2] Create CitationValidatorService in backend/src/main/java/com/aiteacher/retrieval/CitationValidatorService.java:
  - Annotate with @Service
  - Method validate(String generatedAnswer, Set<String> validLabels): String
  - Scan generatedAnswer for occurrences of [Sn] and [Fn] patterns using a regex like \[(S|F)\d+\]
  - Remove (or replace with empty string) any match whose label is not in validLabels
  - Return the cleaned answer text
- T008 [US2] Modify ChatService.buildContextPrompt() in backend/src/main/java/com/aiteacher/chat/ChatService.java:
  - Change signature to return LabelledContext instead of String
  - Assign sequential labels: sections get S1, S2, …; figures get F1, F2, …
  - Prefix each section block with its label: [S1] Section Title, p.N\n{fullText}\n\n
  - Prefix each figure line with its label: [F1] Fig. X (p.N): caption
  - Populate sectionLabels and figureLabels maps in the returned LabelledContext
  - Store the full formatted prompt in LabelledContext.promptText()
- T009 [US2] Update system prompt constant in backend/src/main/java/com/aiteacher/chat/ChatService.java:
  - Replace the citation rule "Cite sources for each major point (book title and page number from the context)" with: "Cite claims using ONLY the reference labels provided in the context (e.g. [S1], [F2]). Do not invent page numbers, section titles, or labels not present in the CONTEXT block."
- T010 [US2] Wire CitationValidatorService into ChatService.sendMessage() in backend/src/main/java/com/aiteacher/chat/ChatService.java:
  - Constructor-inject CitationValidatorService
  - After the chatClient.prompt()...call().content() call, pass assistantContent and the label set from LabelledContext to citationValidatorService.validate()
  - Use the validated string as assistantContent going forward
- T011 [US2] Modify buildSources() in backend/src/main/java/com/aiteacher/chat/ChatService.java:
  - Accept the LabelledContext (or its two maps) as an additional parameter
  - Add "refLabel" entry to each source map: e.g. source.put("refLabel", "S1") for sections, source.put("refLabel", "F1") for figures
  - Keep all other existing fields unchanged
- T012 [US2] Update sendMessage() call chain in backend/src/main/java/com/aiteacher/chat/ChatService.java to thread LabelledContext through steps T008–T011:
  - LabelledContext ctx = buildContextPrompt(fullQuestion, allSections, allFigures)
  - Pass ctx.promptText() to the LLM call
  - Pass ctx label maps to validate() and buildSources()
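The validation step described in T007 can be sketched as a pure string function. This is a minimal illustration rather than the service itself: it omits the @Service annotation, and it assumes validLabels holds bare labels such as "S1" (matching the refLabel values in T011) rather than bracketed ones.

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CitationValidatorSketch {
    // Regex from T007: matches [S1], [F12], and so on.
    private static final Pattern CITATION = Pattern.compile("\\[(S|F)\\d+\\]");

    static String validate(String generatedAnswer, Set<String> validLabels) {
        Matcher matcher = CITATION.matcher(generatedAnswer);
        StringBuilder cleaned = new StringBuilder();
        while (matcher.find()) {
            String token = matcher.group();                         // e.g. "[S1]"
            String label = token.substring(1, token.length() - 1);  // e.g. "S1"
            // Keep citations that were actually provided; drop all others.
            String replacement = validLabels.contains(label) ? token : "";
            matcher.appendReplacement(cleaned, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(cleaned);
        return cleaned.toString();
    }
}
```

Dropping a label leaves the surrounding whitespace untouched, so an answer may contain a double space where a citation was stripped. Whether to normalise that is left open here.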
Checkpoint: User Stories 1 and 2 both fully functional — queries are expanded, citations are grounded
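The labelling in T008 and the refLabel propagation in T011 might look roughly like the sketch below. Several pieces are assumptions made for illustration: the SectionEntity and FigureEntity stand-ins and their fields, the "CONTEXT:" and "QUESTION:" block headers, and the "title", "caption", and "page" source keys. Only the label formats and the "refLabel" key come from the tasks above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins; the real entities will differ.
record SectionEntity(String title, int page, String fullText) {}
record FigureEntity(String figureNumber, int page, String caption) {}
record LabelledContext(Map<String, SectionEntity> sectionLabels,
                       Map<String, FigureEntity> figureLabels,
                       String promptText) {}

final class LabelledPromptSketch {

    static LabelledContext buildContextPrompt(String question,
                                              List<SectionEntity> sections,
                                              List<FigureEntity> figures) {
        // LinkedHashMap keeps S1, S2, ... in assignment order.
        Map<String, SectionEntity> sectionLabels = new LinkedHashMap<>();
        Map<String, FigureEntity> figureLabels = new LinkedHashMap<>();
        StringBuilder prompt = new StringBuilder("CONTEXT:\n"); // assumed header

        for (int i = 0; i < sections.size(); i++) {
            SectionEntity s = sections.get(i);
            String label = "S" + (i + 1);
            sectionLabels.put(label, s);
            // Format from T008: [S1] Section Title, p.N\n{fullText}\n\n
            prompt.append('[').append(label).append("] ")
                  .append(s.title()).append(", p.").append(s.page()).append('\n')
                  .append(s.fullText()).append("\n\n");
        }
        for (int i = 0; i < figures.size(); i++) {
            FigureEntity f = figures.get(i);
            String label = "F" + (i + 1);
            figureLabels.put(label, f);
            // Format from T008: [F1] Fig. X (p.N): caption
            prompt.append('[').append(label).append("] Fig. ")
                  .append(f.figureNumber()).append(" (p.").append(f.page()).append("): ")
                  .append(f.caption()).append('\n');
        }
        prompt.append("\nQUESTION:\n").append(question); // assumed header
        return new LabelledContext(sectionLabels, figureLabels, prompt.toString());
    }

    static List<Map<String, Object>> buildSources(LabelledContext ctx) {
        List<Map<String, Object>> sources = new ArrayList<>();
        ctx.sectionLabels().forEach((label, s) -> {
            Map<String, Object> source = new LinkedHashMap<>();
            source.put("title", s.title());   // hypothetical existing field
            source.put("page", s.page());     // hypothetical existing field
            source.put("refLabel", label);    // T011
            sources.add(source);
        });
        ctx.figureLabels().forEach((label, f) -> {
            Map<String, Object> source = new LinkedHashMap<>();
            source.put("caption", f.caption()); // hypothetical existing field
            source.put("page", f.page());       // hypothetical existing field
            source.put("refLabel", label);      // T011
            sources.add(source);
        });
        return sources;
    }
}
```

Deriving both the prompt and the sources list from the same LabelledContext keeps the labels the model sees and the labels the API returns in step, which is the point of threading it through in T012.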
Phase 5: User Story 3 — User Visibility into Retrieval Confidence (Priority: P2)
Goal: The answer text contains [S1]-style labels (after US2). This phase exposes them in the frontend so users can see which claim maps to which source card.
Independent Test: Send a question, receive an answer with inline [S1] labels visible in the rendered text, and confirm clicking/hovering the label highlights the corresponding source card.
Note: Per research.md, the backend is already complete after US1+US2. US3 is a frontend-only UX enhancement.
Implementation for User Story 3
- T013 [US3] Modify ChatMessage.vue in frontend/src/components/ChatMessage.vue:
  - Parse the answer text for [Sn] and [Fn] citation labels using a regex
  - Render each label as a styled inline badge (e.g. <span class="citation-badge">[S1]</span>)
  - When a badge is clicked or hovered, highlight the corresponding source card (match by source.refLabel)
- T014 [US3] Update source card rendering in frontend/src/components/ChatMessage.vue:
  - Add a data-ref-label attribute to each source card element so it can be targeted by the citation badge interaction
  - Apply a visual highlight style (CSS class) when the card is active
Checkpoint: All three user stories functional — full end-to-end quality improvements delivered
Phase 6: User Story 4 — Topic Summary Persistence & History (user-requested)
Goal: Every generated topic summary is saved to the database. When a topic is selected the UI shows a numbered history list; the student can view any past summary or generate a new one.
Independent Test: Generate a summary for "Intracranial Aneurysms", reload the page, click the topic — verify "Summary #1" appears. Generate again — verify "Summary #2" appears. Click "Summary #1" — verify the original text loads without regeneration.
- T018 Create Flyway migration backend/src/main/resources/db/migration/V6__topic_summary.sql: table topic_summary with columns id UUID PRIMARY KEY DEFAULT gen_random_uuid(), topic_id VARCHAR(100) NOT NULL, summary_number INT NOT NULL, summary TEXT NOT NULL, sources_json TEXT NOT NULL, generated_at TIMESTAMPTZ NOT NULL
- T019 [P] [US4] Create TopicSummaryEntity.java in backend/src/main/java/com/aiteacher/topic/TopicSummaryEntity.java: JPA @Entity mapped to table topic_summary; fields: @Id UUID id, String topicId, int summaryNumber, String summary, String sourcesJson, Instant generatedAt; no-arg + all-args constructor
- T02X [P] [US4] Create SavedSummaryItem.java record in backend/src/main/java/com/aiteacher/topic/SavedSummaryItem.java: fields UUID id, int summaryNumber, Instant generatedAt (list-view DTO, no full text)
- T02X [US4] Create TopicSummaryRepository.java in backend/src/main/java/com/aiteacher/topic/TopicSummaryRepository.java: extends JpaRepository<TopicSummaryEntity, UUID>; add List<TopicSummaryEntity> findByTopicIdOrderBySummaryNumberAsc(String topicId) and long countByTopicId(String topicId)
- T02X [US4] Modify TopicSummaryResponse.java in backend/src/main/java/com/aiteacher/topic/TopicSummaryResponse.java: add fields UUID id and int summaryNumber to the record components
- T02X [US4] Modify TopicSummaryService.java in backend/src/main/java/com/aiteacher/topic/TopicSummaryService.java: inject TopicSummaryRepository and ObjectMapper; at the end of generateSummary() compute summaryNumber = (int) repository.countByTopicId(topicId) + 1, persist a TopicSummaryEntity (serialise sources list to JSON via objectMapper.writeValueAsString()), and include id + summaryNumber in the returned TopicSummaryResponse; add List<SavedSummaryItem> listSummaries(String topicId) and TopicSummaryResponse getSummary(UUID summaryId) methods
- T02X [US4] Modify TopicController.java in backend/src/main/java/com/aiteacher/topic/TopicController.java: add @GetMapping("/{id}/summaries") returning List<SavedSummaryItem> (delegates to listSummaries); add @GetMapping("/{id}/summaries/{summaryId}") returning TopicSummaryResponse (delegates to getSummary); both return 404 via NoSuchElementException when topic or summary not found
- T02X [US4] Modify topicStore.ts in frontend/src/stores/topicStore.ts: add state summaryList: SavedSummaryItem[]; add fetchSummaries(topicId) action calling GET /api/v1/topics/{topicId}/summaries; add fetchSummaryDetail(topicId, summaryId) action calling GET /api/v1/topics/{topicId}/summaries/{summaryId} and setting activeSummary; clear summaryList when a different topic is selected
- T02X [US4] Modify TopicsView.vue in frontend/src/views/TopicsView.vue: when a topic card is clicked: (1) call topicStore.fetchSummaries(topicId) first; (2) if summaries exist, display a summary history list showing chips "Summary #1 · [date]", "Summary #2 · [date]", … + a "Generate New" button; (3) clicking a chip calls fetchSummaryDetail() and renders the saved summary in the existing panel; (4) clicking "Generate New" calls handleGenerate() then re-calls fetchSummaries() to refresh the list; (5) if no summaries exist, show only the "Generate Summary" button (current behaviour)
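The numbering rule in the TopicSummaryService task (count existing summaries for the topic, then add one) can be illustrated with an in-memory stand-in. Everything here except SavedSummaryItem and the countByTopicId name is hypothetical: StoredSummary and the backing list replace the JPA entity and repository purely so the sketch runs without a database.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.UUID;

// List-view DTO from the SavedSummaryItem task (no full text).
record SavedSummaryItem(UUID id, int summaryNumber, Instant generatedAt) {}

// Hypothetical in-memory stand-in for a topic_summary row.
record StoredSummary(UUID id, String topicId, int summaryNumber,
                     String summary, Instant generatedAt) {}

final class SummaryHistorySketch {
    private final List<StoredSummary> rows = new ArrayList<>();

    long countByTopicId(String topicId) {
        return rows.stream().filter(r -> r.topicId().equals(topicId)).count();
    }

    StoredSummary save(String topicId, String summaryText) {
        // Same rule as the task: next number = existing count + 1.
        int summaryNumber = (int) countByTopicId(topicId) + 1;
        StoredSummary row = new StoredSummary(
                UUID.randomUUID(), topicId, summaryNumber, summaryText, Instant.now());
        rows.add(row);
        return row;
    }

    List<SavedSummaryItem> listSummaries(String topicId) {
        return rows.stream()
                .filter(r -> r.topicId().equals(topicId))
                .sorted(Comparator.comparingInt(StoredSummary::summaryNumber))
                .map(r -> new SavedSummaryItem(r.id(), r.summaryNumber(), r.generatedAt()))
                .toList();
    }
}
```

One design point to keep in mind: count-then-insert can hand out the same number twice if two summaries for one topic are generated concurrently, and the column list in T018 does not include a uniqueness constraint on (topic_id, summary_number). Whether that matters depends on how the feature is used.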
Checkpoint: Summary persistence fully working end-to-end. US4 independently testable.
Phase 7: Polish & Cross-Cutting Concerns
Purpose: Constitution IV compliance and cleanup.
- T027 Update README.md Mermaid architecture diagram to add QueryExpansionService and CitationValidatorService to the chat pipeline flow, and the topic_summary table to the data diagram (required by Constitution Principle IV; must be in the same PR)
- T028 [P] Log the expanded query at DEBUG level in QueryExpansionService (e.g. log.debug("Query expanded: '{}' → '{}'", original, rewritten)) for observability
- T029 [P] Log stripped citation labels at WARN level in CitationValidatorService when any labels are removed (e.g. log.warn("Stripped hallucinated citations: {}", removedLabels))
Dependencies & Execution Order
Phase Dependencies
- Phase 1 (Setup): No dependencies — start immediately
- Phase 2 (Foundational): Depends on Phase 1 — blocks all user story phases
- Phase 3 (US1): Depends on Phase 2 (needs ExpandedQuery)
- Phase 4 (US2): Depends on Phase 2 (needs LabelledContext); can run in parallel with Phase 3
- Phase 5 (US3): Depends on Phase 4 (needs refLabel in sources)
- Phase 6 (US4): No dependency on Phase 2 for the migration (T018); entity/service work (T019+) depends on T018
- Phase 7 (Polish): Depends on all implementation phases complete
User Story Dependencies
- User Story 1 (P1): Depends on Phase 2 only — no dependency on US2 or US3
- User Story 2 (P1): Depends on Phase 2 only — can run in parallel with US1
- User Story 3 (P2): Depends on US2 (needs refLabel in the API response)
- User Story 4: Independent of US1–US3; can start immediately after T018 migration
Within Each User Story
- T004 → T006 (QueryExpansionService must exist before ChatService wiring)
- T007 → T010 → T012 (CitationValidatorService → wire into sendMessage → thread context)
- T008 → T012 (LabelledContext must be built before threading through)
- T013 → T014 (badge rendering before card targeting)
Parallel Opportunities
- T002 and T003 (Phase 2) can run in parallel — different files
- Phase 3 (US1) and Phase 4 (US2) can run in parallel after Phase 2 — all different files
- T027, T028, T029 (Polish) can run in parallel: different files
Parallel Example: US1 + US2
After Phase 2 completes:
Track A (US1):
T004 — Create QueryExpansionService
T005 — (no change to NeurosurgeryRetriever)
T006 — Wire into ChatService
Track B (US2):
T007 — Create CitationValidatorService
T008 — Modify buildContextPrompt() → LabelledContext
T009 — Update system prompt
T010 — Wire CitationValidatorService into sendMessage()
T011 — Add refLabel to buildSources()
T012 — Thread LabelledContext through call chain
Merge point: Both tracks modify ChatService — coordinate T006 and T012
to avoid conflicts (implement T006 first or use feature branches).
Implementation Strategy
MVP First (User Stories 1 + 2 — both P1)
- Complete Phase 1: Setup (T001)
- Complete Phase 2: Foundational (T002, T003)
- Complete Phase 3: US1 — query expansion (T004–T006)
- VALIDATE: Ask a lay-language question; confirm relevant clinical passages are retrieved
- Complete Phase 4: US2 — citation grounding (T007–T012)
- VALIDATE: Confirm no [Sx] label appears in the answer that wasn't in the retrieved set
- STOP and DEMO: Both P1 stories deliver the core reliability improvements
Incremental Delivery
- Phase 1 + 2 → infrastructure ready
- Phase 3 → vocabulary mismatch fixed → demo-able
- Phase 4 → citation hallucination fixed → demo-able
- Phase 5 → citation badges in UI → UX polish
- Phase 6 → topic summary persistence and history → independently testable
- Phase 7 → README + logging → PR-ready
Notes
- ChatService is modified by both US1 (T006) and US2 (T008–T012): coordinate edits or implement sequentially
- buildContextPrompt() changes return type from String to LabelledContext (T008): update all callers in the same task
- The system prompt change (T009) is a one-line string edit inside ChatService; no separate class needed
- CitationValidatorService operates purely on strings: no DB or AI dependency, easy to unit-test manually
- US3 frontend tasks (T013–T014) are entirely in ChatMessage.vue: no backend change