enhance rag retrieval + summary

2026-04-07 22:39:28 +02:00
parent 0cf318f0a7
commit aee6a9dfba
34 changed files with 2306 additions and 279 deletions
@@ -0,0 +1,250 @@
+# Tasks: RAG Retrieval Quality Improvements
+
+**Input**: Design documents from `/specs/004-rag-retrieval-quality/`
+**Prerequisites**: plan.md ✅, spec.md ✅, research.md ✅, data-model.md ✅, contracts/ ✅
+
+**Organization**: Tasks are grouped by user story to enable independent implementation and testing of each story.
+**Tests**: Not requested in spec — no test tasks generated.
+
+## Format: `[ID] [P?] [Story] Description`
+
+- **[P]**: Can run in parallel (different files, no dependencies)
+- **[Story]**: Which user story this task belongs to (US1, US2, US3)
+
+---
+
+## Phase 1: Setup (Shared Infrastructure)
+
+**Purpose**: No new project structure needed — services are added to the existing `retrieval/` package. This phase is a single verification step.
+
+- [x] T001 Verify active branch is `004-rag-retrieval-quality` and `backend/src/main/java/com/aiteacher/retrieval/` exists
+
+---
+
+## Phase 2: Foundational (Blocking Prerequisites)
+
+**Purpose**: Two lightweight value objects, one per P1 user story: `ExpandedQuery` for US1 and `LabelledContext` for US2.
+
+**⚠️ CRITICAL**: The US1 and US2 phases depend on these records being present.
+
+- [x] T002 Create `ExpandedQuery` record in `backend/src/main/java/com/aiteacher/retrieval/ExpandedQuery.java` with fields `String original` and `String rewritten`
+- [x] T003 [P] Create `LabelledContext` record in `backend/src/main/java/com/aiteacher/retrieval/LabelledContext.java` with fields `Map<String, SectionEntity> sectionLabels`, `Map<String, FigureEntity> figureLabels`, and `String promptText`
+
+**Checkpoint**: Foundation ready — US1 and US2 implementation can begin in parallel
+
+---
+
+## Phase 3: User Story 1 — Accurate Retrieval Despite Different Terminology (Priority: P1) 🎯 MVP
+
+**Goal**: Before each retrieval call, rewrite the user's question into clinical terminology so that vector search finds relevant sections even when the user uses lay language.
+
+**Independent Test**: Ask "what happens after cutting the skull?" — verify retrieved sections contain content about craniotomy without that word appearing in the query.
+
+### Implementation for User Story 1
+
+- [x] T004 [US1] Create `QueryExpansionService` in `backend/src/main/java/com/aiteacher/retrieval/QueryExpansionService.java`:
+  - Constructor-inject `ChatClient`
+  - Method `expand(String query): ExpandedQuery`
+  - LLM prompt: *"Rewrite the following question using precise medical/surgical terminology as it would appear in a neurosurgery textbook index. Output only the rewritten question, nothing else. Question: {query}"*
+  - Return `new ExpandedQuery(query, rewrittenText)`
+  - Annotate with `@Service`
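
The T004 steps can be sketched roughly as follows. This is a minimal illustration, not the actual service: the class name `QueryExpansionSketch` is hypothetical, a plain `Function<String, String>` stands in for the injected `ChatClient` so the example runs without Spring, and the fallback to the original query on a blank reply is an assumption the task list does not specify.

```java
import java.util.function.Function;

/** Hypothetical, Spring-free stand-in for QueryExpansionService. */
final class QueryExpansionSketch {

    /** Mirrors the ExpandedQuery record from T002. */
    record ExpandedQuery(String original, String rewritten) {}

    // Prompt wording taken from T004; %s is replaced by the user's question.
    private static final String PROMPT_TEMPLATE =
            "Rewrite the following question using precise medical/surgical terminology "
            + "as it would appear in a neurosurgery textbook index. "
            + "Output only the rewritten question, nothing else. Question: %s";

    // Stand-in for the ChatClient call: receives the full prompt, returns the model's text.
    private final Function<String, String> llm;

    QueryExpansionSketch(Function<String, String> llm) {
        this.llm = llm;
    }

    ExpandedQuery expand(String query) {
        String reply = llm.apply(PROMPT_TEMPLATE.formatted(query));
        // Assumption: if the model returns nothing usable, search with the original query.
        String rewritten = (reply == null || reply.isBlank()) ? query : reply.strip();
        return new ExpandedQuery(query, rewritten);
    }
}
```

Keeping `original` alongside `rewritten` is what lets T006 send the rewritten text to retrieval while still showing the user's own wording in the QUESTION block.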
+
+- [x] T005 [US1] Confirm that `NeurosurgeryRetriever.retrieve()` in `backend/src/main/java/com/aiteacher/retrieval/NeurosurgeryRetriever.java` needs no modification:
+  - The signature stays `retrieve(String query, UUID bookId)`; the method already runs the vector search on whatever `query` it receives
+  - *Note*: expansion happens in `ChatService` before `retrieve()` is called (see T006), so the retriever simply receives the pre-expanded query
+
+- [x] T006 [US1] Modify `ChatService` in `backend/src/main/java/com/aiteacher/chat/ChatService.java`:
+  - Constructor-inject `QueryExpansionService`
+  - In `sendMessage()`, call `queryExpansionService.expand(fullQuestion)` before the retrieval loop
+  - Pass `expandedQuery.rewritten()` to `retriever.retrieve()` instead of `fullQuestion`
+  - Keep passing `fullQuestion` (original) to `buildContextPrompt()` so the QUESTION block shown to the model reflects what the user actually asked
+
+**Checkpoint**: User Story 1 fully functional — retrieval now uses clinically rewritten queries
+
+---
+
+## Phase 4: User Story 2 — Grounded Citation in Generated Answers (Priority: P1)
+
+**Goal**: Tag all retrieved sections and figures with short ref-labels (`[S1]`, `[F1]`…) in the prompt, instruct the model to cite only those labels, then post-process the answer to strip any citation referencing a label that was not provided.
+
+**Independent Test**: Trigger a question where only sections S1–S3 are retrieved. Verify the generated answer contains no citation outside that set, and the `sources` list in the response carries `refLabel` fields.
+
+### Implementation for User Story 2
+
+- [x] T007 [US2] Create `CitationValidatorService` in `backend/src/main/java/com/aiteacher/retrieval/CitationValidatorService.java`:
+  - Annotate with `@Service`
+  - Method `validate(String generatedAnswer, Set<String> validLabels): String`
+  - Scan `generatedAnswer` for occurrences of `[Sn]` and `[Fn]` patterns using a regex like `\[(S|F)\d+\]`
+  - Remove any match whose label is not in `validLabels`
+  - Return the cleaned answer text
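
A minimal sketch of the T007 validation step, under stated assumptions: the class name `CitationValidatorSketch` is hypothetical, Spring annotations and logging are omitted, and `validLabels` is assumed to hold bare labels such as `S1` rather than the bracketed form `[S1]`.

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical, Spring-free stand-in for CitationValidatorService. */
final class CitationValidatorSketch {

    // Matches [S1], [F12], ...; group 1 captures the bare label (S1, F12).
    private static final Pattern CITATION = Pattern.compile("\\[((?:S|F)\\d+)\\]");

    /** Removes every [Sn]/[Fn] citation whose bare label is not in validLabels. */
    static String validate(String generatedAnswer, Set<String> validLabels) {
        return CITATION.matcher(generatedAnswer).replaceAll(match ->
                validLabels.contains(match.group(1))
                        ? Matcher.quoteReplacement(match.group()) // keep known labels as-is
                        : "");                                    // strip unknown labels
    }
}
```

For example, `validate("Closure follows [S4].", Set.of("S1"))` returns `"Closure follows ."`. The sketch does not tidy the whitespace left behind by a stripped label; whether the real service should do so is left open here.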
+
+- [x] T008 [US2] Modify `ChatService.buildContextPrompt()` in `backend/src/main/java/com/aiteacher/chat/ChatService.java`:
+  - Change signature to return `LabelledContext` instead of `String`
+  - Assign sequential labels: sections get `S1`, `S2`, …; figures get `F1`, `F2`, …
+  - Prefix each section block with its label: `[S1] Section Title, p.N\n{fullText}\n\n`
+  - Prefix each figure line with its label: `[F1] Fig. X (p.N): caption`
+  - Populate `sectionLabels` and `figureLabels` maps in the returned `LabelledContext`
+  - Store the full formatted prompt in `LabelledContext.promptText()`
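
The labelling described in T008 could look roughly like this. It is a sketch only: `Section` and `Figure` are hypothetical stand-ins for `SectionEntity` and `FigureEntity` (their real fields are not shown in this document), and the QUESTION block and any other prompt framing are left out.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the label assignment inside buildContextPrompt(). */
final class LabelledContextSketch {

    // Stand-ins for SectionEntity and FigureEntity; field names are assumptions.
    record Section(String title, int page, String fullText) {}
    record Figure(String figureNumber, int page, String caption) {}

    /** Mirrors the LabelledContext record from T003, using the stand-in types. */
    record LabelledContext(Map<String, Section> sectionLabels,
                           Map<String, Figure> figureLabels,
                           String promptText) {}

    static LabelledContext build(List<Section> sections, List<Figure> figures) {
        // LinkedHashMap keeps labels in assignment order (S1, S2, ...).
        Map<String, Section> sectionLabels = new LinkedHashMap<>();
        Map<String, Figure> figureLabels = new LinkedHashMap<>();
        StringBuilder prompt = new StringBuilder();

        for (int i = 0; i < sections.size(); i++) {
            Section s = sections.get(i);
            String label = "S" + (i + 1);
            sectionLabels.put(label, s);
            // Format from T008: [S1] Section Title, p.N\n{fullText}\n\n
            prompt.append('[').append(label).append("] ")
                  .append(s.title()).append(", p.").append(s.page()).append('\n')
                  .append(s.fullText()).append("\n\n");
        }
        for (int i = 0; i < figures.size(); i++) {
            Figure f = figures.get(i);
            String label = "F" + (i + 1);
            figureLabels.put(label, f);
            // Format from T008: [F1] Fig. X (p.N): caption
            prompt.append('[').append(label).append("] Fig. ")
                  .append(f.figureNumber()).append(" (p.").append(f.page()).append("): ")
                  .append(f.caption()).append('\n');
        }
        return new LabelledContext(sectionLabels, figureLabels, prompt.toString());
    }
}
```

The key sets of the two maps together give the label set that T010 passes to `validate()`, and the same maps supply the `refLabel` values for T011.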
+
+- [x] T009 [US2] Update system prompt constant in `backend/src/main/java/com/aiteacher/chat/ChatService.java`:
+  - Replace the citation rule *"Cite sources for each major point (book title and page number from the context)"* with: *"Cite claims using ONLY the reference labels provided in the context (e.g. [S1], [F2]). Do not invent page numbers, section titles, or labels not present in the CONTEXT block."*
+
+- [x] T010 [US2] Wire `CitationValidatorService` into `ChatService.sendMessage()` in `backend/src/main/java/com/aiteacher/chat/ChatService.java`:
+  - Constructor-inject `CitationValidatorService`
+  - After the `chatClient.prompt()...call().content()` call, pass `assistantContent` and the label set from `LabelledContext` to `citationValidatorService.validate()`
+  - Use the validated string as `assistantContent` going forward
+
+- [x] T011 [US2] Modify `buildSources()` in `backend/src/main/java/com/aiteacher/chat/ChatService.java`:
+  - Accept the `LabelledContext` (or its two maps) as an additional parameter
+  - Add `"refLabel"` entry to each source map: e.g. `source.put("refLabel", "S1")` for sections, `source.put("refLabel", "F1")` for figures
+  - Keep all other existing fields unchanged
+
+- [x] T012 [US2] Update `sendMessage()` call chain in `backend/src/main/java/com/aiteacher/chat/ChatService.java` to thread `LabelledContext` through steps T008–T011:
+  - `LabelledContext ctx = buildContextPrompt(fullQuestion, allSections, allFigures)`
+  - Pass `ctx.promptText()` to the LLM call
+  - Pass `ctx` label maps to `validate()` and `buildSources()`
+
+**Checkpoint**: User Stories 1 and 2 both fully functional — queries are expanded, citations are grounded
+
+---
+
+## Phase 5: User Story 3 — User Visibility into Retrieval Confidence (Priority: P2)
+
+**Goal**: The answer text contains `[S1]`-style labels (after US2). This phase exposes them in the frontend so users can see which claim maps to which source card.
+
+**Independent Test**: Send a question, receive an answer with inline `[S1]` labels visible in the rendered text, and confirm clicking/hovering the label highlights the corresponding source card.
+
+**Note**: Per research.md, the backend is already complete after US1+US2. US3 is a frontend-only UX enhancement.
+
+### Implementation for User Story 3
+
+- [x] T013 [US3] Modify `ChatMessage.vue` in `frontend/src/components/ChatMessage.vue`:
+  - Parse the answer text for `[Sn]` and `[Fn]` citation labels using a regex
+  - Render each label as a styled inline badge (e.g. `<span class="citation-badge">[S1]</span>`)
+  - When a badge is clicked or hovered, highlight the corresponding source card (match by `source.refLabel`)
+
+- [x] T014 [US3] Update source card rendering in `frontend/src/components/ChatMessage.vue`:
+  - Add a `data-ref-label` attribute to each source card element so it can be targeted by the citation badge interaction
+  - Apply a visual highlight style (CSS class) when the card is active
+
+**Checkpoint**: All three user stories functional — full end-to-end quality improvements delivered
+
+---
+
+## Phase 6: User Story 4 — Topic Summary Persistence & History (user-requested)
+
+**Goal**: Every generated topic summary is saved to the database. When a topic is selected the UI shows a numbered history list; the student can view any past summary or generate a new one.
+
+**Independent Test**: Generate a summary for "Intracranial Aneurysms", reload the page, click the topic — verify "Summary #1" appears. Generate again — verify "Summary #2" appears. Click "Summary #1" — verify the original text loads without regeneration.
+
+- [x] T018 Create Flyway migration `backend/src/main/resources/db/migration/V6__topic_summary.sql` — table `topic_summary` with columns: `id UUID PRIMARY KEY DEFAULT gen_random_uuid()`, `topic_id VARCHAR(100) NOT NULL`, `summary_number INT NOT NULL`, `summary TEXT NOT NULL`, `sources_json TEXT NOT NULL`, `generated_at TIMESTAMPTZ NOT NULL`
+- [x] T019 [P] [US4] Create `TopicSummaryEntity.java` in `backend/src/main/java/com/aiteacher/topic/TopicSummaryEntity.java` — JPA `@Entity` mapped to table `topic_summary`; fields: `@Id UUID id`, `String topicId`, `int summaryNumber`, `String summary`, `String sourcesJson`, `Instant generatedAt`; no-arg + all-args constructor
+- [x] T020 [P] [US4] Create `SavedSummaryItem.java` record in `backend/src/main/java/com/aiteacher/topic/SavedSummaryItem.java` — fields: `UUID id`, `int summaryNumber`, `Instant generatedAt` (list-view DTO, no full text)
+- [x] T021 [US4] Create `TopicSummaryRepository.java` in `backend/src/main/java/com/aiteacher/topic/TopicSummaryRepository.java` — `extends JpaRepository<TopicSummaryEntity, UUID>`; add `List<TopicSummaryEntity> findByTopicIdOrderBySummaryNumberAsc(String topicId)` and `long countByTopicId(String topicId)`
+- [x] T022 [US4] Modify `TopicSummaryResponse.java` in `backend/src/main/java/com/aiteacher/topic/TopicSummaryResponse.java` — add fields `UUID id` and `int summaryNumber` to the record components
+- [x] T023 [US4] Modify `TopicSummaryService.java` in `backend/src/main/java/com/aiteacher/topic/TopicSummaryService.java` — inject `TopicSummaryRepository` and `ObjectMapper`; at end of `generateSummary()` compute `summaryNumber = (int) repository.countByTopicId(topicId) + 1`, persist a `TopicSummaryEntity` (serialise `sources` list to JSON via `objectMapper.writeValueAsString()`), and include `id` + `summaryNumber` in the returned `TopicSummaryResponse`; add `List<SavedSummaryItem> listSummaries(String topicId)` and `TopicSummaryResponse getSummary(UUID summaryId)` methods
+- [x] T024 [US4] Modify `TopicController.java` in `backend/src/main/java/com/aiteacher/topic/TopicController.java` — add `@GetMapping("/{id}/summaries")` returning `List<SavedSummaryItem>` (delegates to `listSummaries`); add `@GetMapping("/{id}/summaries/{summaryId}")` returning `TopicSummaryResponse` (delegates to `getSummary`); both return 404 via `NoSuchElementException` when topic or summary not found
+- [x] T025 [US4] Modify `topicStore.ts` in `frontend/src/stores/topicStore.ts` — add state `summaryList: SavedSummaryItem[]`; add `fetchSummaries(topicId)` action calling `GET /api/v1/topics/{topicId}/summaries`; add `fetchSummaryDetail(topicId, summaryId)` action calling `GET /api/v1/topics/{topicId}/summaries/{summaryId}` and setting `activeSummary`; clear `summaryList` when a different topic is selected
+- [x] T026 [US4] Modify `TopicsView.vue` in `frontend/src/views/TopicsView.vue` — when a topic card is clicked: (1) call `topicStore.fetchSummaries(topicId)` first; (2) if summaries exist, display a summary history list showing chips "Summary #1 · [date]", "Summary #2 · [date]", … + a "Generate New" button; (3) clicking a chip calls `fetchSummaryDetail()` and renders the saved summary in the existing panel; (4) clicking "Generate New" calls `handleGenerate()` then re-calls `fetchSummaries()` to refresh the list; (5) if no summaries exist, show only the "Generate Summary" button (current behaviour)
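
The numbering rule in the service task above (`countByTopicId(topicId) + 1`) can be illustrated with an in-memory sketch. Everything here is hypothetical scaffolding: `SummaryHistorySketch` replaces the JPA repository with a plain map, and the summary text and `sources_json` are omitted.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

/** Hypothetical in-memory stand-in for the summary persistence flow. */
final class SummaryHistorySketch {

    /** Mirrors the SavedSummaryItem list-view DTO. */
    record SavedSummaryItem(UUID id, int summaryNumber, Instant generatedAt) {}

    // Stand-in for the topic_summary table, keyed by topicId.
    private final Map<String, List<SavedSummaryItem>> byTopic = new HashMap<>();

    SavedSummaryItem save(String topicId, Instant generatedAt) {
        List<SavedSummaryItem> history =
                byTopic.computeIfAbsent(topicId, id -> new ArrayList<>());
        // Mirrors: summaryNumber = (int) repository.countByTopicId(topicId) + 1
        int summaryNumber = history.size() + 1;
        SavedSummaryItem item =
                new SavedSummaryItem(UUID.randomUUID(), summaryNumber, generatedAt);
        history.add(item);
        return item;
    }

    /** Returns the history in ascending summaryNumber order (insertion order here). */
    List<SavedSummaryItem> listSummaries(String topicId) {
        return List.copyOf(byTopic.getOrDefault(topicId, List.of()));
    }
}
```

Numbers are per topic, so the first summary of every topic is "Summary #1". A count-based number stays unique only if rows are never deleted and two summaries for the same topic are not generated concurrently; the task list does not say how the real service handles either case.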
+
+**Checkpoint**: Summary persistence fully working end-to-end. US4 independently testable.
+
+---
+
+## Phase 7: Polish & Cross-Cutting Concerns
+
+**Purpose**: Constitution IV compliance and cleanup.
+
+- [x] T027 Update `README.md` Mermaid architecture diagram to add `QueryExpansionService` and `CitationValidatorService` to the chat pipeline flow, and the `topic_summary` table to the data diagram (required by Constitution Principle IV — must be in the same PR)
+- [x] T028 [P] Log the expanded query at DEBUG level in `QueryExpansionService` (e.g. `log.debug("Query expanded: '{}' → '{}'", original, rewritten)`) for observability
+- [x] T029 [P] Log stripped citation labels at WARN level in `CitationValidatorService` when any labels are removed (e.g. `log.warn("Stripped hallucinated citations: {}", removedLabels)`)
+
+---
+
+## Dependencies & Execution Order
+
+### Phase Dependencies
+
+- **Phase 1 (Setup)**: No dependencies — start immediately
+- **Phase 2 (Foundational)**: Depends on Phase 1; blocks US1 and US2 directly, and US3 through US2
+- **Phase 3 (US1)**: Depends on Phase 2 (needs `ExpandedQuery`)
+- **Phase 4 (US2)**: Depends on Phase 2 (needs `LabelledContext`); can run in parallel with Phase 3
+- **Phase 5 (US3)**: Depends on Phase 4 (needs `refLabel` in sources)
+- **Phase 6 (US4)**: No dependency on Phase 2 for the migration (T018); entity/service work (T019+) depends on T018
+- **Phase 7 (Polish)**: Depends on all implementation phases complete
+
+### User Story Dependencies
+
+- **User Story 1 (P1)**: Depends on Phase 2 only — no dependency on US2 or US3
+- **User Story 2 (P1)**: Depends on Phase 2 only — can run in parallel with US1
+- **User Story 3 (P2)**: Depends on US2 (needs `refLabel` in the API response)
+- **User Story 4**: Independent of US1–US3 — can start immediately after T018 migration
+
+### Within Each User Story
+
+- T004 → T006 (QueryExpansionService must exist before ChatService wiring)
+- T007 → T010 → T012 (CitationValidatorService → wire into sendMessage → thread context)
+- T008 → T012 (LabelledContext must be built before threading through)
+- T013 → T014 (badge rendering before card targeting)
+
+### Parallel Opportunities
+
+- T002 and T003 (Phase 2) can run in parallel — different files
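+- Phase 3 (US1) and Phase 4 (US2) can run in parallel after Phase 2: the new service classes live in different files, but both phases modify `ChatService`, so coordinate T006 with T008–T012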
+- T027, T028, T029 (Polish) can run in parallel: different files
+
+---
+
+## Parallel Example: US1 + US2
+
+```
+After Phase 2 completes:
+
+Track A (US1):
+  T004 — Create QueryExpansionService
+  T005 — (no change to NeurosurgeryRetriever)
+  T006 — Wire into ChatService
+
+Track B (US2):
+  T007 — Create CitationValidatorService
+  T008 — Modify buildContextPrompt() → LabelledContext
+  T009 — Update system prompt
+  T010 — Wire CitationValidatorService into sendMessage()
+  T011 — Add refLabel to buildSources()
+  T012 — Thread LabelledContext through call chain
+
+Merge point: Both tracks modify ChatService — coordinate T006 and T012
+to avoid conflicts (implement T006 first or use feature branches).
+```
+
+---
+
+## Implementation Strategy
+
+### MVP First (User Stories 1 + 2 — both P1)
+
+1. Complete Phase 1: Setup (T001)
+2. Complete Phase 2: Foundational (T002, T003)
+3. Complete Phase 3: US1 — query expansion (T004–T006)
+4. **VALIDATE**: Ask a lay-language question; confirm relevant clinical passages are retrieved
+5. Complete Phase 4: US2 — citation grounding (T007–T012)
+6. **VALIDATE**: Confirm no `[Sx]` label appears in the answer that wasn't in the retrieved set
+7. **STOP and DEMO**: Both P1 stories deliver the core reliability improvements
+
+### Incremental Delivery
+
+1. Phase 1 + 2 → infrastructure ready
+2. Phase 3 → vocabulary mismatch fixed → demo-able
+3. Phase 4 → citation hallucination fixed → demo-able
+4. Phase 5 → citation badges in UI → UX polish
+5. Phase 6 → topic summary persistence and history → demo-able
+6. Phase 7 → README + logging → PR-ready
+
+---
+
+## Notes
+
+- `ChatService` is modified by both US1 (T006) and US2 (T008–T012) — coordinate edits or implement sequentially
+- `buildContextPrompt()` changes return type from `String` to `LabelledContext` (T008) — update all callers in the same task
+- The system prompt change (T009) is a one-line string edit inside `ChatService`; no separate class needed
+- `CitationValidatorService` operates purely on strings, with no DB or AI dependency, so it is easy to verify in isolation
+- US3 frontend tasks (T013–T014) are entirely in `ChatMessage.vue` — no backend change