ai-teacher/specs/004-rag-retrieval-quality/tasks.md

# Tasks: RAG Retrieval Quality Improvements

**Input**: Design documents from `/specs/004-rag-retrieval-quality/`
**Prerequisites**: plan.md ✅, spec.md ✅, research.md ✅, data-model.md ✅, contracts/ ✅

**Organization**: Tasks are grouped by user story to enable independent implementation and testing of each story.
**Tests**: Not requested in spec — no test tasks generated.

## Format: `[ID] [P?] [Story] Description`

- **[P]**: Can run in parallel (different files, no dependencies)
- **[Story]**: Which user story this task belongs to (US1, US2, US3, US4)

---

## Phase 1: Setup (Shared Infrastructure)

**Purpose**: No new project structure needed — services are added to the existing `retrieval/` package. This phase is a single verification step.

- [x] T001 Verify active branch is `004-rag-retrieval-quality` and `backend/src/main/java/com/aiteacher/retrieval/` exists

---

## Phase 2: Foundational (Blocking Prerequisites)

**Purpose**: Two lightweight value objects: `ExpandedQuery` (used by US1) and `LabelledContext` (used by US2).

**⚠️ CRITICAL**: The US1 and US2 phases depend on these records being present.

- [x] T002 Create `ExpandedQuery` record in `backend/src/main/java/com/aiteacher/retrieval/ExpandedQuery.java` with fields `String original` and `String rewritten`
- [x] T003 [P] Create `LabelledContext` record in `backend/src/main/java/com/aiteacher/retrieval/LabelledContext.java` with fields `Map<String, SectionEntity> sectionLabels`, `Map<String, FigureEntity> figureLabels`, and `String promptText`

**Checkpoint**: Foundation ready — US1 and US2 implementation can begin in parallel
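
As a rough illustration of T002 and T003, the two records could look like the sketch below. `SectionEntity` and `FigureEntity` are shown as simplified, hypothetical stand-ins because their real fields are not listed in this document, and the order-preserving copy in the compact constructor is an assumption rather than a stated requirement.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-ins for the project's entities; the real classes will differ.
record SectionEntity(String title, int page, String fullText) {}
record FigureEntity(String figureNumber, int page, String caption) {}

// T002: the user's original question paired with its clinical rewrite.
record ExpandedQuery(String original, String rewritten) {}

// T003: ref-label lookups plus the fully formatted prompt text.
record LabelledContext(
        Map<String, SectionEntity> sectionLabels,
        Map<String, FigureEntity> figureLabels,
        String promptText) {

    // Assumption: keep read-only copies that preserve insertion order (S1, S2, ...).
    LabelledContext {
        sectionLabels = Collections.unmodifiableMap(new LinkedHashMap<>(sectionLabels));
        figureLabels = Collections.unmodifiableMap(new LinkedHashMap<>(figureLabels));
    }
}
```

Keeping the maps ordered means later steps can iterate labels in the same sequence in which they appear in the prompt.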

---

## Phase 3: User Story 1 — Accurate Retrieval Despite Different Terminology (Priority: P1) 🎯 MVP

**Goal**: Before each retrieval call, rewrite the user's question into clinical terminology so that vector search finds relevant sections even when the user uses lay language.

**Independent Test**: Ask "what happens after cutting the skull?" — verify retrieved sections contain content about craniotomy without that word appearing in the query.

### Implementation for User Story 1

- [x] T004 [US1] Create `QueryExpansionService` in `backend/src/main/java/com/aiteacher/retrieval/QueryExpansionService.java`:
  - Constructor-inject `ChatClient`
  - Method `expand(String query): ExpandedQuery`
  - LLM prompt: *"Rewrite the following question using precise medical/surgical terminology as it would appear in a neurosurgery textbook index. Output only the rewritten question, nothing else. Question: {query}"*
  - Return `new ExpandedQuery(query, rewrittenText)`
  - Annotate with `@Service`

- [x] T005 [US1] Confirm `NeurosurgeryRetriever.retrieve()` in `backend/src/main/java/com/aiteacher/retrieval/NeurosurgeryRetriever.java` needs no change:
  - The signature stays `retrieve(String query, UUID bookId)`, and the method keeps using `query` for the vector search
  - *Note*: expansion is done in `ChatService` before `retrieve()` is called, so the retriever simply receives the already-rewritten query

- [x] T006 [US1] Modify `ChatService` in `backend/src/main/java/com/aiteacher/chat/ChatService.java`:
  - Constructor-inject `QueryExpansionService`
  - In `sendMessage()`, call `queryExpansionService.expand(fullQuestion)` before the retrieval loop
  - Pass `expandedQuery.rewritten()` to `retriever.retrieve()` instead of `fullQuestion`
  - Keep passing `fullQuestion` (original) to `buildContextPrompt()` so the QUESTION block shown to the model reflects what the user actually asked

**Checkpoint**: User Story 1 fully functional — retrieval now uses clinically rewritten queries
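
The expansion step in T004 can be sketched as follows. To keep the example free of Spring dependencies, the LLM call is modelled as a plain `Function<String, String>`; in the real service this would be the injected `ChatClient`. The fallback to the original query on a blank response is an assumption, not something the task list specifies.

```java
import java.util.function.Function;

record ExpandedQuery(String original, String rewritten) {}

// Sketch of T004 without Spring: the real class would be a @Service using ChatClient.
class QueryExpansionService {
    private static final String PROMPT_TEMPLATE =
            "Rewrite the following question using precise medical/surgical terminology "
            + "as it would appear in a neurosurgery textbook index. "
            + "Output only the rewritten question, nothing else. Question: %s";

    // Hypothetical stand-in for the LLM call (prompt text in, response text out).
    private final Function<String, String> llm;

    QueryExpansionService(Function<String, String> llm) {
        this.llm = llm;
    }

    ExpandedQuery expand(String query) {
        String rewritten = llm.apply(PROMPT_TEMPLATE.formatted(query));
        // Assumption: if the model returns nothing usable, search with the original query.
        if (rewritten == null || rewritten.isBlank()) {
            return new ExpandedQuery(query, query);
        }
        return new ExpandedQuery(query, rewritten.strip());
    }
}
```

In `sendMessage()` (T006), `rewritten()` would be passed to `retriever.retrieve()`, while `original()` is what ends up in the QUESTION block of the prompt.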

---

## Phase 4: User Story 2 — Grounded Citation in Generated Answers (Priority: P1)

**Goal**: Tag all retrieved sections and figures with short ref-labels (`[S1]`, `[F1]`…) in the prompt, instruct the model to cite only those labels, then post-process the answer to strip any citation referencing a label that was not provided.

**Independent Test**: Trigger a question where only sections S1–S3 are retrieved. Verify the generated answer contains no citation outside that set, and the `sources` list in the response carries `refLabel` fields.

### Implementation for User Story 2

- [x] T007 [US2] Create `CitationValidatorService` in `backend/src/main/java/com/aiteacher/retrieval/CitationValidatorService.java`:
  - Annotate with `@Service`
  - Method `validate(String generatedAnswer, Set<String> validLabels): String`
  - Scan `generatedAnswer` for occurrences of `[Sn]` and `[Fn]` patterns using a regex like `\[(S|F)\d+\]`
  - Remove (or replace with empty string) any match whose label is not in `validLabels`
  - Return the cleaned answer text

- [x] T008 [US2] Modify `ChatService.buildContextPrompt()` in `backend/src/main/java/com/aiteacher/chat/ChatService.java`:
  - Change signature to return `LabelledContext` instead of `String`
  - Assign sequential labels: sections get `S1`, `S2`, …; figures get `F1`, `F2`, …
  - Prefix each section block with its label: `[S1] Section Title, p.N\n{fullText}\n\n`
  - Prefix each figure line with its label: `[F1] Fig. X (p.N): caption`
  - Populate `sectionLabels` and `figureLabels` maps in the returned `LabelledContext`
  - Store the full formatted prompt in `LabelledContext.promptText()`

- [x] T009 [US2] Update system prompt constant in `backend/src/main/java/com/aiteacher/chat/ChatService.java`:
  - Replace the citation rule *"Cite sources for each major point (book title and page number from the context)"* with: *"Cite claims using ONLY the reference labels provided in the context (e.g. [S1], [F2]). Do not invent page numbers, section titles, or labels not present in the CONTEXT block."*

- [x] T010 [US2] Wire `CitationValidatorService` into `ChatService.sendMessage()` in `backend/src/main/java/com/aiteacher/chat/ChatService.java`:
  - Constructor-inject `CitationValidatorService`
  - After the `chatClient.prompt()...call().content()` call, pass `assistantContent` and the label set from `LabelledContext` to `citationValidatorService.validate()`
  - Use the validated string as `assistantContent` going forward

- [x] T011 [US2] Modify `buildSources()` in `backend/src/main/java/com/aiteacher/chat/ChatService.java`:
  - Accept the `LabelledContext` (or its two maps) as an additional parameter
  - Add `"refLabel"` entry to each source map: e.g. `source.put("refLabel", "S1")` for sections, `source.put("refLabel", "F1")` for figures
  - Keep all other existing fields unchanged

- [x] T012 [US2] Update `sendMessage()` call chain in `backend/src/main/java/com/aiteacher/chat/ChatService.java` to thread `LabelledContext` through steps T008–T011:
  - `LabelledContext ctx = buildContextPrompt(fullQuestion, allSections, allFigures)`
  - Pass `ctx.promptText()` to the LLM call
  - Pass `ctx` label maps to `validate()` and `buildSources()`
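
The label assignment described in T008 might look like the sketch below. The entity shapes and the `CONTEXT:` / `QUESTION:` layout are hypothetical; only the `[S1] Section Title, p.N` and `[F1] Fig. X (p.N): caption` line formats come from the task description.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified entity shapes for illustration only.
record SectionEntity(String title, int page, String fullText) {}
record FigureEntity(String figureNumber, int page, String caption) {}
record LabelledContext(
        Map<String, SectionEntity> sectionLabels,
        Map<String, FigureEntity> figureLabels,
        String promptText) {}

class ContextPromptBuilder {
    static LabelledContext buildContextPrompt(
            String question, List<SectionEntity> sections, List<FigureEntity> figures) {
        Map<String, SectionEntity> sectionLabels = new LinkedHashMap<>();
        Map<String, FigureEntity> figureLabels = new LinkedHashMap<>();
        StringBuilder prompt = new StringBuilder("CONTEXT:\n"); // assumed block header

        // Sections get sequential labels S1, S2, ...
        for (int i = 0; i < sections.size(); i++) {
            SectionEntity s = sections.get(i);
            String label = "S" + (i + 1);
            sectionLabels.put(label, s);
            prompt.append('[').append(label).append("] ")
                  .append(s.title()).append(", p.").append(s.page()).append('\n')
                  .append(s.fullText()).append("\n\n");
        }
        // Figures get sequential labels F1, F2, ...
        for (int i = 0; i < figures.size(); i++) {
            FigureEntity f = figures.get(i);
            String label = "F" + (i + 1);
            figureLabels.put(label, f);
            prompt.append('[').append(label).append("] Fig. ")
                  .append(f.figureNumber()).append(" (p.").append(f.page()).append("): ")
                  .append(f.caption()).append('\n');
        }
        prompt.append("\nQUESTION:\n").append(question); // assumed block header
        return new LabelledContext(sectionLabels, figureLabels, prompt.toString());
    }
}
```

The key sets of the two maps together form the set of valid labels that T010 passes to the citation validator.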

**Checkpoint**: User Stories 1 and 2 both fully functional — queries are expanded, citations are grounded
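
A minimal sketch of the validation step from T007, using only `java.util.regex`. It assumes `validLabels` holds bare labels such as `S1` (matching the `refLabel` values), not the bracketed form; the real service would also carry the `@Service` annotation.

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CitationValidatorService {
    // Matches [S1], [F12], ... and captures the bare label (S1, F12).
    private static final Pattern CITATION = Pattern.compile("\\[((?:S|F)\\d+)\\]");

    String validate(String generatedAnswer, Set<String> validLabels) {
        Matcher matcher = CITATION.matcher(generatedAnswer);
        StringBuilder cleaned = new StringBuilder();
        while (matcher.find()) {
            String label = matcher.group(1);
            if (validLabels.contains(label)) {
                // Known label: keep the citation exactly as written.
                matcher.appendReplacement(cleaned, Matcher.quoteReplacement(matcher.group()));
            } else {
                // Unknown label: drop it (this is where T029 would record the label for logging).
                matcher.appendReplacement(cleaned, "");
            }
        }
        matcher.appendTail(cleaned);
        return cleaned.toString();
    }
}
```

With valid labels `S1` and `F2`, the input `A [S1] and B [S4] and [F2].` comes back as `A [S1] and B  and [F2].`. Note the doubled space where `[S4]` was removed; whether to tidy the surrounding whitespace is left open here.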

---

## Phase 5: User Story 3 — User Visibility into Retrieval Confidence (Priority: P2)

**Goal**: The answer text contains `[S1]`-style labels (after US2). This phase exposes them in the frontend so users can see which claim maps to which source card.

**Independent Test**: Send a question, receive an answer with inline `[S1]` labels visible in the rendered text, and confirm clicking/hovering the label highlights the corresponding source card.

**Note**: Per research.md, the backend is already complete after US1+US2. US3 is a frontend-only UX enhancement.

### Implementation for User Story 3

- [x] T013 [US3] Modify `ChatMessage.vue` in `frontend/src/components/ChatMessage.vue`:
  - Parse the answer text for `[Sn]` and `[Fn]` citation labels using a regex
  - Render each label as a styled inline badge (e.g. `<span class="citation-badge">[S1]</span>`)
  - When a badge is clicked or hovered, highlight the corresponding source card (match by `source.refLabel`)

- [x] T014 [US3] Update source card rendering in `frontend/src/components/ChatMessage.vue`:
  - Add a `data-ref-label` attribute to each source card element so it can be targeted by the citation badge interaction
  - Apply a visual highlight style (CSS class) when the card is active

**Checkpoint**: All three user stories functional — full end-to-end quality improvements delivered

---

## Phase 6: User Story 4 — Topic Summary Persistence & History (user-requested)

**Goal**: Every generated topic summary is saved to the database. When a topic is selected the UI shows a numbered history list; the student can view any past summary or generate a new one.

**Independent Test**: Generate a summary for "Intracranial Aneurysms", reload the page, click the topic — verify "Summary #1" appears. Generate again — verify "Summary #2" appears. Click "Summary #1" — verify the original text loads without regeneration.

- [x] T018 Create Flyway migration `backend/src/main/resources/db/migration/V6__topic_summary.sql` — table `topic_summary` with columns: `id UUID PRIMARY KEY DEFAULT gen_random_uuid()`, `topic_id VARCHAR(100) NOT NULL`, `summary_number INT NOT NULL`, `summary TEXT NOT NULL`, `sources_json TEXT NOT NULL`, `generated_at TIMESTAMPTZ NOT NULL`
- [x] T019 [P] [US4] Create `TopicSummaryEntity.java` in `backend/src/main/java/com/aiteacher/topic/TopicSummaryEntity.java` — JPA `@Entity` mapped to table `topic_summary`; fields: `@Id UUID id`, `String topicId`, `int summaryNumber`, `String summary`, `String sourcesJson`, `Instant generatedAt`; no-arg + all-args constructor
- [x] T020 [P] [US4] Create `SavedSummaryItem.java` record in `backend/src/main/java/com/aiteacher/topic/SavedSummaryItem.java` — fields: `UUID id`, `int summaryNumber`, `Instant generatedAt` (list-view DTO, no full text)
- [x] T021 [US4] Create `TopicSummaryRepository.java` in `backend/src/main/java/com/aiteacher/topic/TopicSummaryRepository.java` — `extends JpaRepository<TopicSummaryEntity, UUID>`; add `List<TopicSummaryEntity> findByTopicIdOrderBySummaryNumberAsc(String topicId)` and `long countByTopicId(String topicId)`
- [x] T022 [US4] Modify `TopicSummaryResponse.java` in `backend/src/main/java/com/aiteacher/topic/TopicSummaryResponse.java` — add fields `UUID id` and `int summaryNumber` to the record components
- [x] T023 [US4] Modify `TopicSummaryService.java` in `backend/src/main/java/com/aiteacher/topic/TopicSummaryService.java` — inject `TopicSummaryRepository` and `ObjectMapper`; at end of `generateSummary()` compute `summaryNumber = (int) repository.countByTopicId(topicId) + 1`, persist a `TopicSummaryEntity` (serialise `sources` list to JSON via `objectMapper.writeValueAsString()`), and include `id` + `summaryNumber` in the returned `TopicSummaryResponse`; add `List<SavedSummaryItem> listSummaries(String topicId)` and `TopicSummaryResponse getSummary(UUID summaryId)` methods
- [x] T024 [US4] Modify `TopicController.java` in `backend/src/main/java/com/aiteacher/topic/TopicController.java` — add `@GetMapping("/{id}/summaries")` returning `List<SavedSummaryItem>` (delegates to `listSummaries`); add `@GetMapping("/{id}/summaries/{summaryId}")` returning `TopicSummaryResponse` (delegates to `getSummary`); both return 404 via `NoSuchElementException` when topic or summary not found
- [x] T025 [US4] Modify `topicStore.ts` in `frontend/src/stores/topicStore.ts` — add state `summaryList: SavedSummaryItem[]`; add `fetchSummaries(topicId)` action calling `GET /api/v1/topics/{topicId}/summaries`; add `fetchSummaryDetail(topicId, summaryId)` action calling `GET /api/v1/topics/{topicId}/summaries/{summaryId}` and setting `activeSummary`; clear `summaryList` when a different topic is selected
- [x] T026 [US4] Modify `TopicsView.vue` in `frontend/src/views/TopicsView.vue` — when a topic card is clicked: (1) call `topicStore.fetchSummaries(topicId)` first; (2) if summaries exist, display a summary history list showing chips "Summary #1 · [date]", "Summary #2 · [date]", … + a "Generate New" button; (3) clicking a chip calls `fetchSummaryDetail()` and renders the saved summary in the existing panel; (4) clicking "Generate New" calls `handleGenerate()` then re-calls `fetchSummaries()` to refresh the list; (5) if no summaries exist, show only the "Generate Summary" button (current behaviour)

**Checkpoint**: Summary persistence fully working end-to-end. US4 independently testable.
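
The numbering and list-view logic for US4 can be sketched without a database, as below. `InMemorySummaryRepository`, `TopicSummaryRow`, and `SummaryHistory` are hypothetical stand-ins for `TopicSummaryRepository`, `TopicSummaryEntity`, and the relevant part of `TopicSummaryService`; only the `countByTopicId(topicId) + 1` rule and the method names come from the tasks above.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.UUID;

// Hypothetical plain-record version of the topic_summary row.
record TopicSummaryRow(UUID id, String topicId, int summaryNumber,
                       String summary, String sourcesJson, Instant generatedAt) {}

// List-view DTO: no full summary text.
record SavedSummaryItem(UUID id, int summaryNumber, Instant generatedAt) {}

// In-memory stand-in for the Spring Data repository.
class InMemorySummaryRepository {
    private final List<TopicSummaryRow> rows = new ArrayList<>();

    TopicSummaryRow save(TopicSummaryRow row) {
        rows.add(row);
        return row;
    }

    long countByTopicId(String topicId) {
        return rows.stream().filter(r -> r.topicId().equals(topicId)).count();
    }

    List<TopicSummaryRow> findByTopicIdOrderBySummaryNumberAsc(String topicId) {
        return rows.stream()
                .filter(r -> r.topicId().equals(topicId))
                .sorted(Comparator.comparingInt(TopicSummaryRow::summaryNumber))
                .toList();
    }
}

class SummaryHistory {
    private final InMemorySummaryRepository repository;

    SummaryHistory(InMemorySummaryRepository repository) {
        this.repository = repository;
    }

    // Mirrors the end of generateSummary(): next number = existing count + 1.
    TopicSummaryRow saveGenerated(String topicId, String summary, String sourcesJson) {
        int summaryNumber = (int) repository.countByTopicId(topicId) + 1;
        return repository.save(new TopicSummaryRow(
                UUID.randomUUID(), topicId, summaryNumber, summary, sourcesJson, Instant.now()));
    }

    List<SavedSummaryItem> listSummaries(String topicId) {
        return repository.findByTopicIdOrderBySummaryNumberAsc(topicId).stream()
                .map(r -> new SavedSummaryItem(r.id(), r.summaryNumber(), r.generatedAt()))
                .toList();
    }
}
```

One design point worth keeping in mind: deriving the number from a count is not safe if two summaries for the same topic are generated concurrently, since both requests could read the same count. Whether that matters depends on how the application is used.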

---

## Phase 7: Polish & Cross-Cutting Concerns

**Purpose**: Constitution IV compliance and cleanup.

- [x] T027 Update `README.md` Mermaid architecture diagram to add `QueryExpansionService` and `CitationValidatorService` to the chat pipeline flow, and the `topic_summary` table to the data diagram (required by Constitution Principle IV — must be in the same PR)
- [x] T028 [P] Log the expanded query at DEBUG level in `QueryExpansionService` (e.g. `log.debug("Query expanded: '{}' → '{}'", original, rewritten)`) for observability
- [x] T029 [P] Log stripped citation labels at WARN level in `CitationValidatorService` when any labels are removed (e.g. `log.warn("Stripped hallucinated citations: {}", removedLabels)`)

---

## Dependencies & Execution Order

### Phase Dependencies

- **Phase 1 (Setup)**: No dependencies — start immediately
- **Phase 2 (Foundational)**: Depends on Phase 1 — blocks all user story phases
- **Phase 3 (US1)**: Depends on Phase 2 (needs `ExpandedQuery`)
- **Phase 4 (US2)**: Depends on Phase 2 (needs `LabelledContext`); can run in parallel with Phase 3
- **Phase 5 (US3)**: Depends on Phase 4 (needs `refLabel` in sources)
- **Phase 6 (US4)**: No dependency on Phase 2 for the migration (T018); entity/service work (T019+) depends on T018
- **Phase 7 (Polish)**: Depends on all implementation phases complete

### User Story Dependencies

- **User Story 1 (P1)**: Depends on Phase 2 only — no dependency on US2 or US3
- **User Story 2 (P1)**: Depends on Phase 2 only — can run in parallel with US1
- **User Story 3 (P2)**: Depends on US2 (needs `refLabel` in the API response)
- **User Story 4**: Independent of US1–US3 — can start immediately after T018 migration

### Within Each User Story

- T004 → T006 (QueryExpansionService must exist before ChatService wiring)
- T007 → T010 → T012 (CitationValidatorService → wire into sendMessage → thread context)
- T008 → T012 (LabelledContext must be built before threading through)
- T013 → T014 (badge rendering before card targeting)

### Parallel Opportunities

- T002 and T003 (Phase 2) can run in parallel — different files
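- Phase 3 (US1) and Phase 4 (US2) can run in parallel after Phase 2 — the new service classes live in different files, but both phases modify `ChatService`, so those edits need coordinating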
- T027, T028, T029 (Polish) can run in parallel — different files

---

## Parallel Example: US1 + US2

```
After Phase 2 completes:

Track A (US1):
  T004 — Create QueryExpansionService
  T005 — (no change to NeurosurgeryRetriever)
  T006 — Wire into ChatService

Track B (US2):
  T007 — Create CitationValidatorService
  T008 — Modify buildContextPrompt() → LabelledContext
  T009 — Update system prompt
  T010 — Wire CitationValidatorService into sendMessage()
  T011 — Add refLabel to buildSources()
  T012 — Thread LabelledContext through call chain

Merge point: Both tracks modify ChatService — coordinate T006 and T012
to avoid conflicts (implement T006 first or use feature branches).
```

---

## Implementation Strategy

### MVP First (User Stories 1 + 2 — both P1)

1. Complete Phase 1: Setup (T001)
2. Complete Phase 2: Foundational (T002, T003)
3. Complete Phase 3: US1 — query expansion (T004–T006)
4. **VALIDATE**: Ask a lay-language question; confirm relevant clinical passages are retrieved
5. Complete Phase 4: US2 — citation grounding (T007–T012)
6. **VALIDATE**: Confirm no `[Sx]` label appears in the answer that wasn't in the retrieved set
7. **STOP and DEMO**: Both P1 stories deliver the core reliability improvements

### Incremental Delivery

1. Phase 1 + 2 → infrastructure ready
2. Phase 3 → vocabulary mismatch fixed → demo-able
3. Phase 4 → citation hallucination fixed → demo-able
4. Phase 5 → citation badges in UI → UX polish
5. Phase 6 → topic summary persistence and history → demo-able
6. Phase 7 → README + logging → PR-ready

---

## Notes

- `ChatService` is modified by both US1 (T006) and US2 (T008–T012) — coordinate edits or implement sequentially
- `buildContextPrompt()` changes return type from `String` to `LabelledContext` (T008) — update all callers in the same task
- The system prompt change (T009) is a one-line string edit inside `ChatService`; no separate class needed
- `CitationValidatorService` operates purely on strings — no DB or AI dependency, easy to unit-test manually
- US3 frontend tasks (T013–T014) are entirely in `ChatMessage.vue` — no backend change