ai-teacher/specs/004-rag-retrieval-quality/tasks.md
2026-04-07 22:39:28 +02:00

# Tasks: RAG Retrieval Quality Improvements
**Input**: Design documents from `/specs/004-rag-retrieval-quality/`
**Prerequisites**: plan.md ✅, spec.md ✅, research.md ✅, data-model.md ✅, contracts/ ✅
**Organization**: Tasks are grouped by user story to enable independent implementation and testing of each story.
**Tests**: Not requested in spec — no test tasks generated.
## Format: `[ID] [P?] [Story] Description`
- **[P]**: Can run in parallel (different files, no dependencies)
- **[Story]**: Which user story this task belongs to (US1, US2, US3)
---
## Phase 1: Setup (Shared Infrastructure)
**Purpose**: No new project structure needed — services are added to the existing `retrieval/` package. This phase is a single verification step.
- [x] T001 Verify active branch is `004-rag-retrieval-quality` and `backend/src/main/java/com/aiteacher/retrieval/` exists
---
## Phase 2: Foundational (Blocking Prerequisites)
**Purpose**: Two lightweight value objects: `ExpandedQuery` (used by US1) and `LabelledContext` (used by US2).
**⚠️ CRITICAL**: Both user story phases depend on these records being present.
- [x] T002 Create `ExpandedQuery` record in `backend/src/main/java/com/aiteacher/retrieval/ExpandedQuery.java` with fields `String original` and `String rewritten`
- [x] T003 [P] Create `LabelledContext` record in `backend/src/main/java/com/aiteacher/retrieval/LabelledContext.java` with fields `Map<String, SectionEntity> sectionLabels`, `Map<String, FigureEntity> figureLabels`, and `String promptText`
**Checkpoint**: Foundation ready — US1 and US2 implementation can begin in parallel
---
## Phase 3: User Story 1 — Accurate Retrieval Despite Different Terminology (Priority: P1) 🎯 MVP
**Goal**: Before each retrieval call, rewrite the user's question into clinical terminology so that vector search finds relevant sections even when the user uses lay language.
**Independent Test**: Ask "what happens after cutting the skull?" — verify retrieved sections contain content about craniotomy without that word appearing in the query.
### Implementation for User Story 1
- [x] T004 [US1] Create `QueryExpansionService` in `backend/src/main/java/com/aiteacher/retrieval/QueryExpansionService.java`:
- Constructor-inject `ChatClient`
- Method `expand(String query): ExpandedQuery`
- LLM prompt: *"Rewrite the following question using precise medical/surgical terminology as it would appear in a neurosurgery textbook index. Output only the rewritten question, nothing else. Question: {query}"*
- Return `new ExpandedQuery(query, rewrittenText)`
- Annotate with `@Service`
- [x] T005 [US1] Modify `NeurosurgeryRetriever.retrieve()` in `backend/src/main/java/com/aiteacher/retrieval/NeurosurgeryRetriever.java`:
- No change to the `retrieve(String query, UUID bookId)` signature: the method keeps using `query` for vector search as it does today
- *Note*: expansion happens in `ChatService` before `retrieve()` is called, so `NeurosurgeryRetriever` itself needs no modification
- [x] T006 [US1] Modify `ChatService` in `backend/src/main/java/com/aiteacher/chat/ChatService.java`:
- Constructor-inject `QueryExpansionService`
- In `sendMessage()`, call `queryExpansionService.expand(fullQuestion)` before the retrieval loop
- Pass `expandedQuery.rewritten()` to `retriever.retrieve()` instead of `fullQuestion`
- Keep passing `fullQuestion` (original) to `buildContextPrompt()` so the QUESTION block shown to the model reflects what the user actually asked
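A minimal sketch of T004 and T006 together, in plain Java so it runs without Spring. It assumes the model call can be abstracted as a `Function<String, String>` (in the real service this is the injected `ChatClient`), that the retrieval loop iterates over a list of book IDs, and that the prompt uses a simple `CONTEXT:`/`QUESTION:` layout; `Retriever`, `ChatServiceSketch` and `preparePrompt` are hypothetical names. The DEBUG log from T028 is included using the JDK's `System.Logger` as a stand-in for the project's logger.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.function.Function;

record ExpandedQuery(String original, String rewritten) {}

// T004 sketch. The real class is a Spring @Service with an injected ChatClient; the model
// call is abstracted as a Function<String, String> here so the example runs standalone.
class QueryExpansionService {
    private static final System.Logger LOG =
            System.getLogger(QueryExpansionService.class.getName());

    static final String PROMPT_TEMPLATE =
            "Rewrite the following question using precise medical/surgical terminology "
            + "as it would appear in a neurosurgery textbook index. "
            + "Output only the rewritten question, nothing else. Question: %s";

    private final Function<String, String> llm;

    QueryExpansionService(Function<String, String> llm) {
        this.llm = llm;
    }

    ExpandedQuery expand(String query) {
        // With Spring AI this would be chatClient.prompt().user(prompt).call().content().
        String rewritten = llm.apply(PROMPT_TEMPLATE.formatted(query)).trim();
        // T028: DEBUG-level record of the rewrite (System.Logger stands in for the project logger).
        LOG.log(System.Logger.Level.DEBUG, "Query expanded: {0} -> {1}", query, rewritten);
        return new ExpandedQuery(query, rewritten);
    }
}

// Hypothetical stand-in for NeurosurgeryRetriever.retrieve(String, UUID); results simplified to strings.
interface Retriever {
    List<String> retrieve(String query, UUID bookId);
}

// T006 sketch: expand once, search with the rewrite, keep the original for the QUESTION block.
class ChatServiceSketch {
    private final QueryExpansionService queryExpansionService;
    private final Retriever retriever;

    ChatServiceSketch(QueryExpansionService queryExpansionService, Retriever retriever) {
        this.queryExpansionService = queryExpansionService;
        this.retriever = retriever;
    }

    // Returns the prompt text; the CONTEXT/QUESTION layout is an assumption.
    String preparePrompt(String fullQuestion, List<UUID> bookIds) {
        ExpandedQuery expandedQuery = queryExpansionService.expand(fullQuestion);
        List<String> allSections = new ArrayList<>();
        for (UUID bookId : bookIds) {
            // The clinical rewrite drives vector search for every book.
            allSections.addAll(retriever.retrieve(expandedQuery.rewritten(), bookId));
        }
        // The user's own wording stays in the QUESTION block.
        return "CONTEXT:\n" + String.join("\n", allSections) + "\nQUESTION: " + fullQuestion;
    }
}
```

The key point is the split inside `preparePrompt`: `expandedQuery.rewritten()` goes to the retriever, while `fullQuestion` stays in the QUESTION block. The `.trim()` on the model reply is an extra assumption, not something the task specifies.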
**Checkpoint**: User Story 1 fully functional — retrieval now uses clinically rewritten queries
---
## Phase 4: User Story 2 — Grounded Citation in Generated Answers (Priority: P1)
**Goal**: Tag all retrieved sections and figures with short ref-labels (`[S1]`, `[F1]`…) in the prompt, instruct the model to cite only those labels, then post-process the answer to strip any citation referencing a label that was not provided.
**Independent Test**: Trigger a question where only sections S1–S3 are retrieved. Verify the generated answer contains no citation outside that set, and the `sources` list in the response carries `refLabel` fields.
### Implementation for User Story 2
- [x] T007 [US2] Create `CitationValidatorService` in `backend/src/main/java/com/aiteacher/retrieval/CitationValidatorService.java`:
- Annotate with `@Service`
- Method `validate(String generatedAnswer, Set<String> validLabels): String`
- Scan `generatedAnswer` for occurrences of `[Sn]` and `[Fn]` patterns using a regex like `\[(S|F)\d+\]`
- Remove (or replace with empty string) any match whose label is not in `validLabels`
- Return the cleaned answer text
- [x] T008 [US2] Modify `ChatService.buildContextPrompt()` in `backend/src/main/java/com/aiteacher/chat/ChatService.java`:
- Change signature to return `LabelledContext` instead of `String`
- Assign sequential labels: sections get `S1`, `S2`, …; figures get `F1`, `F2`, …
- Prefix each section block with its label: `[S1] Section Title, p.N\n{fullText}\n\n`
- Prefix each figure line with its label: `[F1] Fig. X (p.N): caption`
- Populate `sectionLabels` and `figureLabels` maps in the returned `LabelledContext`
- Store the full formatted prompt in `LabelledContext.promptText()`
- [x] T009 [US2] Update system prompt constant in `backend/src/main/java/com/aiteacher/chat/ChatService.java`:
- Replace the citation rule *"Cite sources for each major point (book title and page number from the context)"* with: *"Cite claims using ONLY the reference labels provided in the context (e.g. [S1], [F2]). Do not invent page numbers, section titles, or labels not present in the CONTEXT block."*
- [x] T010 [US2] Wire `CitationValidatorService` into `ChatService.sendMessage()` in `backend/src/main/java/com/aiteacher/chat/ChatService.java`:
- Constructor-inject `CitationValidatorService`
- After the `chatClient.prompt()...call().content()` call, pass `assistantContent` and the label set from `LabelledContext` to `citationValidatorService.validate()`
- Use the validated string as `assistantContent` going forward
- [x] T011 [US2] Modify `buildSources()` in `backend/src/main/java/com/aiteacher/chat/ChatService.java`:
- Accept the `LabelledContext` (or its two maps) as an additional parameter
- Add `"refLabel"` entry to each source map: e.g. `source.put("refLabel", "S1")` for sections, `source.put("refLabel", "F1")` for figures
- Keep all other existing fields unchanged
- [x] T012 [US2] Update `sendMessage()` call chain in `backend/src/main/java/com/aiteacher/chat/ChatService.java` to thread `LabelledContext` through steps T008–T011:
- `LabelledContext ctx = buildContextPrompt(fullQuestion, allSections, allFigures)`
- Pass `ctx.promptText()` to the LLM call
- Pass the union of `ctx.sectionLabels().keySet()` and `ctx.figureLabels().keySet()` to `validate()`, and `ctx` (or its label maps) to `buildSources()`
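The labelling and threading steps (T008, T011, T012) can be sketched as below. `SectionEntity` and `FigureEntity` are replaced by hypothetical records, the `title`/`page`/`caption` source keys stand in for whatever `buildSources()` already emits, and the `CONTEXT:`/`QUESTION:` layout is an assumption; only the `[S1] Section Title, p.N` and `[F1] Fig. X (p.N): caption` formats come from the tasks. `LabelledContextSketch` and `validLabels` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-ins: the real entity field names may differ.
record SectionEntity(String title, int page, String fullText) {}
record FigureEntity(String figureNumber, int page, String caption) {}
record LabelledContext(Map<String, SectionEntity> sectionLabels,
                       Map<String, FigureEntity> figureLabels,
                       String promptText) {}

class LabelledContextSketch {
    // T008: assign S1.., F1.. in retrieval order and prefix every block with its label.
    static LabelledContext buildContextPrompt(String fullQuestion,
                                              List<SectionEntity> sections,
                                              List<FigureEntity> figures) {
        Map<String, SectionEntity> sectionLabels = new LinkedHashMap<>();
        Map<String, FigureEntity> figureLabels = new LinkedHashMap<>();
        StringBuilder prompt = new StringBuilder("CONTEXT:\n");
        for (SectionEntity s : sections) {
            String label = "S" + (sectionLabels.size() + 1);
            sectionLabels.put(label, s);
            prompt.append("[%s] %s, p.%d\n%s\n\n".formatted(label, s.title(), s.page(), s.fullText()));
        }
        for (FigureEntity f : figures) {
            String label = "F" + (figureLabels.size() + 1);
            figureLabels.put(label, f);
            prompt.append("[%s] Fig. %s (p.%d): %s\n".formatted(label, f.figureNumber(), f.page(), f.caption()));
        }
        // Assumed layout: the QUESTION block keeps the user's original wording (T006).
        prompt.append("\nQUESTION: ").append(fullQuestion);
        return new LabelledContext(sectionLabels, figureLabels, prompt.toString());
    }

    // T012: the set handed to CitationValidatorService.validate().
    static Set<String> validLabels(LabelledContext ctx) {
        Set<String> labels = new HashSet<>(ctx.sectionLabels().keySet());
        labels.addAll(ctx.figureLabels().keySet());
        return labels;
    }

    // T011: placeholder source fields plus the new "refLabel" entry.
    static List<Map<String, Object>> buildSources(LabelledContext ctx) {
        List<Map<String, Object>> sources = new ArrayList<>();
        ctx.sectionLabels().forEach((label, s) -> {
            Map<String, Object> source = new LinkedHashMap<>();
            source.put("title", s.title());
            source.put("page", s.page());
            source.put("refLabel", label);
            sources.add(source);
        });
        ctx.figureLabels().forEach((label, f) -> {
            Map<String, Object> source = new LinkedHashMap<>();
            source.put("caption", f.caption());
            source.put("page", f.page());
            source.put("refLabel", label);
            sources.add(source);
        });
        return sources;
    }
}
```

`LinkedHashMap` keeps labels in retrieval order, so `S1` is always the first section handed to the model and the first entry in `sources`.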
**Checkpoint**: User Stories 1 and 2 both fully functional — queries are expanded, citations are grounded
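A plain-Java sketch of `CitationValidatorService.validate()` (T007), including the WARN log from T029. It assumes `validLabels` holds bare labels such as `S1` (matching the `refLabel` values in T011) and uses the JDK's `System.Logger` in place of the project's logger; the Spring `@Service` annotation is omitted so it compiles on its own.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CitationValidatorService {
    private static final System.Logger LOG =
            System.getLogger(CitationValidatorService.class.getName());

    // Matches [S1], [F12], ...; group 1 captures the bare label ("S1", "F12").
    private static final Pattern CITATION = Pattern.compile("\\[([SF]\\d+)\\]");

    String validate(String generatedAnswer, Set<String> validLabels) {
        Set<String> removed = new LinkedHashSet<>();
        String cleaned = CITATION.matcher(generatedAnswer).replaceAll(match -> {
            String label = match.group(1);
            if (validLabels.contains(label)) {
                // Keep grounded citations verbatim; quoteReplacement guards against '$' and '\'.
                return Matcher.quoteReplacement(match.group());
            }
            removed.add(label);
            return ""; // drop labels that were never provided in the context
        });
        if (!removed.isEmpty()) {
            // T029: surface hallucinated labels at WARN level.
            LOG.log(System.Logger.Level.WARNING, "Stripped hallucinated citations: {0}", removed);
        }
        return cleaned;
    }
}
```

Two behaviours worth knowing: removing a label leaves its surrounding whitespace in place, so `preferred [S4];` becomes `preferred ;`, and grouped citations such as `[S1, S2]` do not match this pattern at all, so they pass through unvalidated.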
---
## Phase 5: User Story 3 — User Visibility into Retrieval Confidence (Priority: P2)
**Goal**: The answer text contains `[S1]`-style labels (after US2). This phase exposes them in the frontend so users can see which claim maps to which source card.
**Independent Test**: Send a question, receive an answer with inline `[S1]` labels visible in the rendered text, and confirm clicking/hovering the label highlights the corresponding source card.
**Note**: Per research.md, the backend is already complete after US1+US2. US3 is a frontend-only UX enhancement.
### Implementation for User Story 3
- [x] T013 [US3] Modify `ChatMessage.vue` in `frontend/src/components/ChatMessage.vue`:
- Parse the answer text for `[Sn]` and `[Fn]` citation labels using a regex
- Render each label as a styled inline badge (e.g. `<span class="citation-badge">[S1]</span>`)
- When a badge is clicked or hovered, highlight the corresponding source card (match by `source.refLabel`)
- [x] T014 [US3] Update source card rendering in `frontend/src/components/ChatMessage.vue`:
- Add a `data-ref-label` attribute to each source card element so it can be targeted by the citation badge interaction
- Apply a visual highlight style (CSS class) when the card is active
**Checkpoint**: All three user stories functional — full end-to-end quality improvements delivered
---
## Phase 6: User Story 4 — Topic Summary Persistence & History (user-requested)
**Goal**: Every generated topic summary is saved to the database. When a topic is selected the UI shows a numbered history list; the student can view any past summary or generate a new one.
**Independent Test**: Generate a summary for "Intracranial Aneurysms", reload the page, click the topic — verify "Summary #1" appears. Generate again — verify "Summary #2" appears. Click "Summary #1" — verify the original text loads without regeneration.
- [x] T018 Create Flyway migration `backend/src/main/resources/db/migration/V6__topic_summary.sql` — table `topic_summary` with columns: `id UUID PRIMARY KEY DEFAULT gen_random_uuid()`, `topic_id VARCHAR(100) NOT NULL`, `summary_number INT NOT NULL`, `summary TEXT NOT NULL`, `sources_json TEXT NOT NULL`, `generated_at TIMESTAMPTZ NOT NULL`
- [x] T019 [P] [US4] Create `TopicSummaryEntity.java` in `backend/src/main/java/com/aiteacher/topic/TopicSummaryEntity.java` — JPA `@Entity` mapped to table `topic_summary`; fields: `@Id UUID id`, `String topicId`, `int summaryNumber`, `String summary`, `String sourcesJson`, `Instant generatedAt`; no-arg + all-args constructor
- [x] T02X [P] [US4] Create `SavedSummaryItem.java` record in `backend/src/main/java/com/aiteacher/topic/SavedSummaryItem.java` — fields: `UUID id`, `int summaryNumber`, `Instant generatedAt` (list-view DTO, no full text)
- [x] T02X [US4] Create `TopicSummaryRepository.java` in `backend/src/main/java/com/aiteacher/topic/TopicSummaryRepository.java`, extending `JpaRepository<TopicSummaryEntity, UUID>`; add `List<TopicSummaryEntity> findByTopicIdOrderBySummaryNumberAsc(String topicId)` and `long countByTopicId(String topicId)`
- [x] T02X [US4] Modify `TopicSummaryResponse.java` in `backend/src/main/java/com/aiteacher/topic/TopicSummaryResponse.java` — add fields `UUID id` and `int summaryNumber` to the record components
- [x] T02X [US4] Modify `TopicSummaryService.java` in `backend/src/main/java/com/aiteacher/topic/TopicSummaryService.java` — inject `TopicSummaryRepository` and `ObjectMapper`; at end of `generateSummary()` compute `summaryNumber = (int) repository.countByTopicId(topicId) + 1`, persist a `TopicSummaryEntity` (serialise `sources` list to JSON via `objectMapper.writeValueAsString()`), and include `id` + `summaryNumber` in the returned `TopicSummaryResponse`; add `List<SavedSummaryItem> listSummaries(String topicId)` and `TopicSummaryResponse getSummary(UUID summaryId)` methods
- [x] T02X [US4] Modify `TopicController.java` in `backend/src/main/java/com/aiteacher/topic/TopicController.java` — add `@GetMapping("/{id}/summaries")` returning `List<SavedSummaryItem>` (delegates to `listSummaries`); add `@GetMapping("/{id}/summaries/{summaryId}")` returning `TopicSummaryResponse` (delegates to `getSummary`); both return 404 via `NoSuchElementException` when topic or summary not found
- [x] T02X [US4] Modify `topicStore.ts` in `frontend/src/stores/topicStore.ts` — add state `summaryList: SavedSummaryItem[]`; add `fetchSummaries(topicId)` action calling `GET /api/v1/topics/{topicId}/summaries`; add `fetchSummaryDetail(topicId, summaryId)` action calling `GET /api/v1/topics/{topicId}/summaries/{summaryId}` and setting `activeSummary`; clear `summaryList` when a different topic is selected
- [x] T02X [US4] Modify `TopicsView.vue` in `frontend/src/views/TopicsView.vue` — when a topic card is clicked: (1) call `topicStore.fetchSummaries(topicId)` first; (2) if summaries exist, display a summary history list showing chips "Summary #1 · [date]", "Summary #2 · [date]", … + a "Generate New" button; (3) clicking a chip calls `fetchSummaryDetail()` and renders the saved summary in the existing panel; (4) clicking "Generate New" calls `handleGenerate()` then re-calls `fetchSummaries()` to refresh the list; (5) if no summaries exist, show only the "Generate Summary" button (current behaviour)
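The numbering and lookup logic from the `TopicSummaryService` task can be sketched in plain Java with an in-memory stand-in for `TopicSummaryRepository` (hypothetical; Spring Data derives the two custom queries from their method names). The entity is simplified to a record, IDs are generated client-side with `UUID.randomUUID()` rather than by the `gen_random_uuid()` column default, JSON serialisation is left to the caller, and `getSummary` returns the entity instead of a `TopicSummaryResponse`. `TopicSummaryPersistenceSketch` and `persist` are hypothetical names.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.UUID;

// Stand-ins: the real TopicSummaryEntity is a JPA @Entity class.
record TopicSummaryEntity(UUID id, String topicId, int summaryNumber, String summary,
                          String sourcesJson, Instant generatedAt) {}
record SavedSummaryItem(UUID id, int summaryNumber, Instant generatedAt) {}

// Hypothetical in-memory stand-in for TopicSummaryRepository.
class InMemoryTopicSummaryRepository {
    private final List<TopicSummaryEntity> rows = new ArrayList<>();

    long countByTopicId(String topicId) {
        return rows.stream().filter(r -> r.topicId().equals(topicId)).count();
    }

    List<TopicSummaryEntity> findByTopicIdOrderBySummaryNumberAsc(String topicId) {
        return rows.stream()
                .filter(r -> r.topicId().equals(topicId))
                .sorted(Comparator.comparingInt(TopicSummaryEntity::summaryNumber))
                .toList();
    }

    Optional<TopicSummaryEntity> findById(UUID id) {
        return rows.stream().filter(r -> r.id().equals(id)).findFirst();
    }

    TopicSummaryEntity save(TopicSummaryEntity entity) {
        rows.add(entity);
        return entity;
    }
}

class TopicSummaryPersistenceSketch {
    private final InMemoryTopicSummaryRepository repository = new InMemoryTopicSummaryRepository();

    // End of generateSummary(): number the new summary per topic and persist it.
    // In the service, sourcesJson would come from objectMapper.writeValueAsString(sources).
    TopicSummaryEntity persist(String topicId, String summary, String sourcesJson) {
        int summaryNumber = (int) repository.countByTopicId(topicId) + 1;
        return repository.save(new TopicSummaryEntity(
                UUID.randomUUID(), topicId, summaryNumber, summary, sourcesJson, Instant.now()));
    }

    // listSummaries(): list-view DTOs only, no full text.
    List<SavedSummaryItem> listSummaries(String topicId) {
        return repository.findByTopicIdOrderBySummaryNumberAsc(topicId).stream()
                .map(e -> new SavedSummaryItem(e.id(), e.summaryNumber(), e.generatedAt()))
                .toList();
    }

    // getSummary(): Optional.orElseThrow() throws NoSuchElementException for unknown IDs,
    // which the controller maps to 404.
    TopicSummaryEntity getSummary(UUID summaryId) {
        return repository.findById(summaryId).orElseThrow();
    }
}
```

Because the number is derived from `countByTopicId(topicId) + 1`, two generations running concurrently for the same topic could be assigned the same `summaryNumber`; the T018 migration as described has no unique constraint on `(topic_id, summary_number)` that would reject the duplicate.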
**Checkpoint**: Summary persistence fully working end-to-end. US4 independently testable.
---
## Phase 7: Polish & Cross-Cutting Concerns
**Purpose**: Constitution IV compliance and cleanup.
- [x] T027 Update `README.md` Mermaid architecture diagram to add `QueryExpansionService` and `CitationValidatorService` to the chat pipeline flow, and the `topic_summary` table to the data diagram (required by Constitution Principle IV — must be in the same PR)
- [x] T028 [P] Log the expanded query at DEBUG level in `QueryExpansionService` (e.g. `log.debug("Query expanded: '{}' → '{}'", original, rewritten)`) for observability
- [x] T029 [P] Log stripped citation labels at WARN level in `CitationValidatorService` when any labels are removed (e.g. `log.warn("Stripped hallucinated citations: {}", removedLabels)`)
---
## Dependencies & Execution Order
### Phase Dependencies
- **Phase 1 (Setup)**: No dependencies — start immediately
- **Phase 2 (Foundational)**: Depends on Phase 1 — blocks all user story phases
- **Phase 3 (US1)**: Depends on Phase 2 (needs `ExpandedQuery`)
- **Phase 4 (US2)**: Depends on Phase 2 (needs `LabelledContext`); can run in parallel with Phase 3
- **Phase 5 (US3)**: Depends on Phase 4 (needs `refLabel` in sources)
- **Phase 6 (US4)**: No dependency on Phase 2 for the migration (T018); entity/service work (T019+) depends on T018
- **Phase 7 (Polish)**: Depends on all implementation phases being complete
### User Story Dependencies
- **User Story 1 (P1)**: Depends on Phase 2 only — no dependency on US2 or US3
- **User Story 2 (P1)**: Depends on Phase 2 only — can run in parallel with US1
- **User Story 3 (P2)**: Depends on US2 (needs `refLabel` in the API response)
- **User Story 4**: Independent of US1–US3 — can start immediately after the T018 migration
### Within Each User Story
- T004 → T006 (QueryExpansionService must exist before ChatService wiring)
- T007 → T010 → T012 (CitationValidatorService → wire into sendMessage → thread context)
- T008 → T012 (LabelledContext must be built before threading through)
- T013 → T014 (badge rendering before card targeting)
### Parallel Opportunities
- T002 and T003 (Phase 2) can run in parallel — different files
- Phase 3 (US1) and Phase 4 (US2) can run in parallel after Phase 2 — all different files
- T027, T028, T029 (Polish) can run in parallel — different files
---
## Parallel Example: US1 + US2
```
After Phase 2 completes:
Track A (US1):
T004 — Create QueryExpansionService
T005 — (no change to NeurosurgeryRetriever)
T006 — Wire into ChatService
Track B (US2):
T007 — Create CitationValidatorService
T008 — Modify buildContextPrompt() → LabelledContext
T009 — Update system prompt
T010 — Wire CitationValidatorService into sendMessage()
T011 — Add refLabel to buildSources()
T012 — Thread LabelledContext through call chain
Merge point: Both tracks modify ChatService — coordinate T006 and T012
to avoid conflicts (implement T006 first or use feature branches).
```
---
## Implementation Strategy
### MVP First (User Stories 1 + 2 — both P1)
1. Complete Phase 1: Setup (T001)
2. Complete Phase 2: Foundational (T002, T003)
3. Complete Phase 3: US1 — query expansion (T004–T006)
4. **VALIDATE**: Ask a lay-language question; confirm relevant clinical passages are retrieved
5. Complete Phase 4: US2 — citation grounding (T007–T012)
6. **VALIDATE**: Confirm no `[Sx]` label appears in the answer that wasn't in the retrieved set
7. **STOP and DEMO**: Both P1 stories deliver the core reliability improvements
### Incremental Delivery
1. Phase 1 + 2 → infrastructure ready
2. Phase 3 → vocabulary mismatch fixed → demo-able
3. Phase 4 → citation hallucination fixed → demo-able
4. Phase 5 → citation badges in UI → UX polish
5. Phase 6 → topic summary persistence and history → independently testable
6. Phase 7 → README + logging → PR-ready
---
## Notes
- `ChatService` is modified by both US1 (T006) and US2 (T008–T012) — coordinate edits or implement sequentially
- `buildContextPrompt()` changes return type from `String` to `LabelledContext` (T008) — update all callers in the same task
- The system prompt change (T009) is a one-line string edit inside `ChatService`; no separate class needed
- `CitationValidatorService` operates purely on strings — no DB or AI dependency, so it is easy to exercise in isolation
- US3 frontend tasks (T013–T014) are entirely in `ChatMessage.vue` — no backend change