enhance rag retrieval + summary
@@ -0,0 +1,250 @@
# Tasks: RAG Retrieval Quality Improvements

**Input**: Design documents from `/specs/004-rag-retrieval-quality/`
**Prerequisites**: plan.md ✅, spec.md ✅, research.md ✅, data-model.md ✅, contracts/ ✅

**Organization**: Tasks are grouped by user story to enable independent implementation and testing of each story.
**Tests**: Not requested in spec — no test tasks generated.

## Format: `[ID] [P?] [Story] Description`

- **[P]**: Can run in parallel (different files, no dependencies)
- **[Story]**: Which user story this task belongs to (US1, US2, US3, US4)


---

## Phase 1: Setup (Shared Infrastructure)

**Purpose**: No new project structure needed — services are added to the existing `retrieval/` package. This phase is a single verification step.

- [x] T001 Verify active branch is `004-rag-retrieval-quality` and `backend/src/main/java/com/aiteacher/retrieval/` exists


---

## Phase 2: Foundational (Blocking Prerequisites)

**Purpose**: Two lightweight value objects, one needed by each P1 user story.

**⚠️ CRITICAL**: US1 depends on `ExpandedQuery` and US2 depends on `LabelledContext`, so both records must exist before either phase starts.

- [x] T002 Create `ExpandedQuery` record in `backend/src/main/java/com/aiteacher/retrieval/ExpandedQuery.java` with fields `String original` and `String rewritten`
- [x] T003 [P] Create `LabelledContext` record in `backend/src/main/java/com/aiteacher/retrieval/LabelledContext.java` with fields `Map<String, SectionEntity> sectionLabels`, `Map<String, FigureEntity> figureLabels`, and `String promptText`

**Checkpoint**: Foundation ready — US1 and US2 implementation can begin in parallel


---

## Phase 3: User Story 1 — Accurate Retrieval Despite Different Terminology (Priority: P1) 🎯 MVP

**Goal**: Before each retrieval call, rewrite the user's question into clinical terminology so that vector search finds relevant sections even when the user uses lay language.

**Independent Test**: Ask "what happens after cutting the skull?" — verify retrieved sections contain content about craniotomy without that word appearing in the query.

### Implementation for User Story 1

- [x] T004 [US1] Create `QueryExpansionService` in `backend/src/main/java/com/aiteacher/retrieval/QueryExpansionService.java`:
  - Constructor-inject `ChatClient`
  - Method `expand(String query): ExpandedQuery`
  - LLM prompt: *"Rewrite the following question using precise medical/surgical terminology as it would appear in a neurosurgery textbook index. Output only the rewritten question, nothing else. Question: {query}"*
  - Return `new ExpandedQuery(query, rewrittenText)`
  - Annotate with `@Service`

- [x] T005 [US1] Confirm that `NeurosurgeryRetriever.retrieve()` in `backend/src/main/java/com/aiteacher/retrieval/NeurosurgeryRetriever.java` needs no change:
  - The signature stays `retrieve(String query, UUID bookId)`; the method already runs the vector search on whatever `query` it receives
  - *Note*: expansion is done in `ChatService` before calling `retrieve()`, so the caller passes the pre-expanded query and `NeurosurgeryRetriever` is not modified

- [x] T006 [US1] Modify `ChatService` in `backend/src/main/java/com/aiteacher/chat/ChatService.java`:
  - Constructor-inject `QueryExpansionService`
  - In `sendMessage()`, call `queryExpansionService.expand(fullQuestion)` before the retrieval loop
  - Pass `expandedQuery.rewritten()` to `retriever.retrieve()` instead of `fullQuestion`
  - Keep passing `fullQuestion` (original) to `buildContextPrompt()` so the QUESTION block shown to the model reflects what the user actually asked

**Checkpoint**: User Story 1 fully functional — retrieval now uses clinically rewritten queries

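
The expansion step in T004 and T006 could be sketched roughly as follows. This is a minimal illustration, not the project's actual service: the `Function<String, String>` stands in for the `ChatClient` call so the example runs without Spring AI, the class name `QueryExpansionSketch` is hypothetical, and the fallback to the original query on an empty model response is an assumption that the task list does not specify.

```java
import java.util.function.Function;

// Value object from T002: the user's wording plus the clinical rewrite.
record ExpandedQuery(String original, String rewritten) {}

// Sketch of the T004 service. The Function is a stand-in for the LLM call
// (an assumption, so the example needs no Spring AI dependency or model).
class QueryExpansionSketch {
    static final String PROMPT_TEMPLATE =
        "Rewrite the following question using precise medical/surgical terminology "
        + "as it would appear in a neurosurgery textbook index. "
        + "Output only the rewritten question, nothing else. Question: %s";

    private final Function<String, String> llm;

    QueryExpansionSketch(Function<String, String> llm) {
        this.llm = llm;
    }

    ExpandedQuery expand(String query) {
        String rewritten = llm.apply(String.format(PROMPT_TEMPLATE, query));
        // Assumed fallback: keep the original wording if the model returns nothing usable.
        if (rewritten == null || rewritten.isBlank()) {
            return new ExpandedQuery(query, query);
        }
        return new ExpandedQuery(query, rewritten.strip());
    }
}
```

In this sketch the caller would pass `rewritten()` to the retriever and keep `original()` for the QUESTION block, which mirrors the split described in T006.
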

---

## Phase 4: User Story 2 — Grounded Citation in Generated Answers (Priority: P1)

**Goal**: Tag all retrieved sections and figures with short ref-labels (`[S1]`, `[F1]`…) in the prompt, instruct the model to cite only those labels, then post-process the answer to strip any citation referencing a label that was not provided.

**Independent Test**: Trigger a question where only sections S1–S3 are retrieved. Verify the generated answer contains no citation outside that set, and the `sources` list in the response carries `refLabel` fields.

### Implementation for User Story 2

- [x] T007 [US2] Create `CitationValidatorService` in `backend/src/main/java/com/aiteacher/retrieval/CitationValidatorService.java`:
  - Annotate with `@Service`
  - Method `validate(String generatedAnswer, Set<String> validLabels): String`
  - Scan `generatedAnswer` for occurrences of `[Sn]` and `[Fn]` patterns using a regex like `\[(S|F)\d+\]`
  - Replace any match whose label is not in `validLabels` with the empty string
  - Return the cleaned answer text

- [x] T008 [US2] Modify `ChatService.buildContextPrompt()` in `backend/src/main/java/com/aiteacher/chat/ChatService.java`:
  - Change signature to return `LabelledContext` instead of `String`
  - Assign sequential labels: sections get `S1`, `S2`, …; figures get `F1`, `F2`, …
  - Prefix each section block with its label: `[S1] Section Title, p.N\n{fullText}\n\n`
  - Prefix each figure line with its label: `[F1] Fig. X (p.N): caption`
  - Populate `sectionLabels` and `figureLabels` maps in the returned `LabelledContext`
  - Store the full formatted prompt in `LabelledContext.promptText()`

- [x] T009 [US2] Update system prompt constant in `backend/src/main/java/com/aiteacher/chat/ChatService.java`:
  - Replace the citation rule *"Cite sources for each major point (book title and page number from the context)"* with: *"Cite claims using ONLY the reference labels provided in the context (e.g. [S1], [F2]). Do not invent page numbers, section titles, or labels not present in the CONTEXT block."*

- [x] T010 [US2] Wire `CitationValidatorService` into `ChatService.sendMessage()` in `backend/src/main/java/com/aiteacher/chat/ChatService.java`:
  - Constructor-inject `CitationValidatorService`
  - After the `chatClient.prompt()...call().content()` call, pass `assistantContent` and the label set from `LabelledContext` to `citationValidatorService.validate()`
  - Use the validated string as `assistantContent` going forward

- [x] T011 [US2] Modify `buildSources()` in `backend/src/main/java/com/aiteacher/chat/ChatService.java`:
  - Accept the `LabelledContext` (or its two maps) as an additional parameter
  - Add `"refLabel"` entry to each source map: e.g. `source.put("refLabel", "S1")` for sections, `source.put("refLabel", "F1")` for figures
  - Keep all other existing fields unchanged

- [x] T012 [US2] Update `sendMessage()` call chain in `backend/src/main/java/com/aiteacher/chat/ChatService.java` to thread `LabelledContext` through steps T008–T011:
  - `LabelledContext ctx = buildContextPrompt(fullQuestion, allSections, allFigures)`
  - Pass `ctx.promptText()` to the LLM call
  - Pass `ctx` label maps to `validate()` and `buildSources()`

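
The citation check described in T007 could look roughly like the sketch below. It is plain Java with no Spring annotations, the class name `CitationValidatorSketch` is hypothetical, and it assumes `validLabels` holds bare labels such as `S1` (matching the `refLabel` values in T011) rather than bracketed ones.

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CitationValidatorSketch {
    // Matches labels such as [S1] or [F12]; group 1 is the bare label (S1, F12).
    private static final Pattern LABEL = Pattern.compile("\\[([SF]\\d+)\\]");

    String validate(String generatedAnswer, Set<String> validLabels) {
        Matcher m = LABEL.matcher(generatedAnswer);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            if (validLabels.contains(m.group(1))) {
                // Known label: copy the match through unchanged.
                m.appendReplacement(out, Matcher.quoteReplacement(m.group()));
            } else {
                // Unknown label: drop it. A real service could log it here (see T029).
                m.appendReplacement(out, "");
            }
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With valid labels `S1` and `F2`, the input `Clip the aneurysm [S9][F2].` would come back as `Clip the aneurysm [F2].`. Note that this sketch removes only the label itself, so a stripped citation that was preceded by a space leaves that space behind.
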
**Checkpoint**: User Stories 1 and 2 both fully functional — queries are expanded, citations are grounded

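
The label assignment in T008 could be sketched as below. `Section` and `Figure` are simplified stand-ins for `SectionEntity` and `FigureEntity` with assumed field names, `ContextLabeller` is a hypothetical helper, and the QUESTION block that the real `buildContextPrompt()` also emits is left out to keep the example short.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-ins for SectionEntity and FigureEntity (field names are assumptions).
record Section(String title, int page, String fullText) {}
record Figure(String number, int page, String caption) {}

// Shape from T003, using the stand-in types above.
record LabelledContext(Map<String, Section> sectionLabels,
                       Map<String, Figure> figureLabels,
                       String promptText) {}

class ContextLabeller {
    static LabelledContext build(List<Section> sections, List<Figure> figures) {
        // LinkedHashMap keeps labels in assignment order (S1, S2, ...).
        Map<String, Section> sectionLabels = new LinkedHashMap<>();
        Map<String, Figure> figureLabels = new LinkedHashMap<>();
        StringBuilder prompt = new StringBuilder();

        int s = 1;
        for (Section section : sections) {
            String label = "S" + s++;
            sectionLabels.put(label, section);
            prompt.append('[').append(label).append("] ")
                  .append(section.title()).append(", p.").append(section.page()).append('\n')
                  .append(section.fullText()).append("\n\n");
        }
        int f = 1;
        for (Figure figure : figures) {
            String label = "F" + f++;
            figureLabels.put(label, figure);
            prompt.append('[').append(label).append("] Fig. ").append(figure.number())
                  .append(" (p.").append(figure.page()).append("): ")
                  .append(figure.caption()).append('\n');
        }
        return new LabelledContext(sectionLabels, figureLabels, prompt.toString());
    }
}
```

The key sets of the two maps together give the `validLabels` argument for the validator in T010, and the same keys can be written out as `refLabel` in T011.
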

---

## Phase 5: User Story 3 — User Visibility into Retrieval Confidence (Priority: P2)

**Goal**: The answer text contains `[S1]`-style labels (after US2). This phase exposes them in the frontend so users can see which claim maps to which source card.

**Independent Test**: Send a question, receive an answer with inline `[S1]` labels visible in the rendered text, and confirm clicking/hovering the label highlights the corresponding source card.

**Note**: Per research.md, the backend is already complete after US1+US2. US3 is a frontend-only UX enhancement.

### Implementation for User Story 3

- [x] T013 [US3] Modify `ChatMessage.vue` in `frontend/src/components/ChatMessage.vue`:
  - Parse the answer text for `[Sn]` and `[Fn]` citation labels using a regex
  - Render each label as a styled inline badge (e.g. `<span class="citation-badge">[S1]</span>`)
  - When a badge is clicked or hovered, highlight the corresponding source card (match by `source.refLabel`)

- [x] T014 [US3] Update source card rendering in `frontend/src/components/ChatMessage.vue`:
  - Add a `data-ref-label` attribute to each source card element so it can be targeted by the citation badge interaction
  - Apply a visual highlight style (CSS class) when the card is active

**Checkpoint**: All three user stories functional — full end-to-end quality improvements delivered


---

## Phase 6: User Story 4 — Topic Summary Persistence & History (user-requested)

**Goal**: Every generated topic summary is saved to the database. When a topic is selected the UI shows a numbered history list; the student can view any past summary or generate a new one.

**Independent Test**: Generate a summary for "Intracranial Aneurysms", reload the page, click the topic — verify "Summary #1" appears. Generate again — verify "Summary #2" appears. Click "Summary #1" — verify the original text loads without regeneration.

- [x] T018 [US4] Create Flyway migration `backend/src/main/resources/db/migration/V6__topic_summary.sql` — table `topic_summary` with columns: `id UUID PRIMARY KEY DEFAULT gen_random_uuid()`, `topic_id VARCHAR(100) NOT NULL`, `summary_number INT NOT NULL`, `summary TEXT NOT NULL`, `sources_json TEXT NOT NULL`, `generated_at TIMESTAMPTZ NOT NULL`
- [x] T019 [P] [US4] Create `TopicSummaryEntity.java` in `backend/src/main/java/com/aiteacher/topic/TopicSummaryEntity.java` — JPA `@Entity` mapped to table `topic_summary`; fields: `@Id UUID id`, `String topicId`, `int summaryNumber`, `String summary`, `String sourcesJson`, `Instant generatedAt`; no-arg + all-args constructor
- [x] T020 [P] [US4] Create `SavedSummaryItem.java` record in `backend/src/main/java/com/aiteacher/topic/SavedSummaryItem.java` — fields: `UUID id`, `int summaryNumber`, `Instant generatedAt` (list-view DTO, no full text)
- [x] T021 [US4] Create `TopicSummaryRepository.java` in `backend/src/main/java/com/aiteacher/topic/TopicSummaryRepository.java` — `extends JpaRepository<TopicSummaryEntity, UUID>`; add `List<TopicSummaryEntity> findByTopicIdOrderBySummaryNumberAsc(String topicId)` and `long countByTopicId(String topicId)`
- [x] T022 [US4] Modify `TopicSummaryResponse.java` in `backend/src/main/java/com/aiteacher/topic/TopicSummaryResponse.java` — add fields `UUID id` and `int summaryNumber` to the record components

- [x] T023 [US4] Modify `TopicSummaryService.java` in `backend/src/main/java/com/aiteacher/topic/TopicSummaryService.java`:
  - Inject `TopicSummaryRepository` and `ObjectMapper`
  - At the end of `generateSummary()`, compute `summaryNumber = (int) repository.countByTopicId(topicId) + 1`
  - Persist a `TopicSummaryEntity`, serialising the `sources` list to JSON via `objectMapper.writeValueAsString()`
  - Include `id` and `summaryNumber` in the returned `TopicSummaryResponse`
  - Add methods `List<SavedSummaryItem> listSummaries(String topicId)` and `TopicSummaryResponse getSummary(UUID summaryId)`

- [x] T024 [US4] Modify `TopicController.java` in `backend/src/main/java/com/aiteacher/topic/TopicController.java`:
  - Add `@GetMapping("/{id}/summaries")` returning `List<SavedSummaryItem>` (delegates to `listSummaries`)
  - Add `@GetMapping("/{id}/summaries/{summaryId}")` returning `TopicSummaryResponse` (delegates to `getSummary`)
  - Both return 404 via `NoSuchElementException` when the topic or summary is not found

- [x] T025 [US4] Modify `topicStore.ts` in `frontend/src/stores/topicStore.ts`:
  - Add state `summaryList: SavedSummaryItem[]`
  - Add `fetchSummaries(topicId)` action calling `GET /api/v1/topics/{topicId}/summaries`
  - Add `fetchSummaryDetail(topicId, summaryId)` action calling `GET /api/v1/topics/{topicId}/summaries/{summaryId}` and setting `activeSummary`
  - Clear `summaryList` when a different topic is selected

- [x] T026 [US4] Modify `TopicsView.vue` in `frontend/src/views/TopicsView.vue` so that, when a topic card is clicked:
  1. `topicStore.fetchSummaries(topicId)` is called first
  2. If summaries exist, a summary history list is displayed with chips "Summary #1 · [date]", "Summary #2 · [date]", … plus a "Generate New" button
  3. Clicking a chip calls `fetchSummaryDetail()` and renders the saved summary in the existing panel
  4. Clicking "Generate New" calls `handleGenerate()`, then re-calls `fetchSummaries()` to refresh the list
  5. If no summaries exist, only the "Generate Summary" button is shown (current behaviour)

**Checkpoint**: Summary persistence fully working end-to-end. US4 independently testable.

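
The numbering and history behaviour from T023 can be illustrated with an in-memory sketch. Everything here is a stand-in: `SummaryHistorySketch` and `StoredSummary` are hypothetical names, a plain list replaces the JPA repository, and the `sources` JSON column is omitted.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.UUID;

// Stand-in for TopicSummaryEntity without the sources JSON (an assumption for brevity).
record StoredSummary(UUID id, String topicId, int summaryNumber, String summary, Instant generatedAt) {}

// List-view DTO from T020: no full text.
record SavedSummaryItem(UUID id, int summaryNumber, Instant generatedAt) {}

class SummaryHistorySketch {
    private final List<StoredSummary> rows = new ArrayList<>();

    // Mirrors the end of generateSummary(): number = existing count for the topic + 1.
    StoredSummary save(String topicId, String summaryText) {
        long existing = rows.stream().filter(r -> r.topicId().equals(topicId)).count();
        StoredSummary row = new StoredSummary(
            UUID.randomUUID(), topicId, (int) existing + 1, summaryText, Instant.now());
        rows.add(row);
        return row;
    }

    // History list for one topic, ordered by summary number.
    List<SavedSummaryItem> listSummaries(String topicId) {
        return rows.stream()
            .filter(r -> r.topicId().equals(topicId))
            .sorted(Comparator.comparingInt(StoredSummary::summaryNumber))
            .map(r -> new SavedSummaryItem(r.id(), r.summaryNumber(), r.generatedAt()))
            .toList();
    }

    // Loads a saved summary without regenerating it; a missing id maps to 404 in T024.
    StoredSummary getSummary(UUID summaryId) {
        return rows.stream()
            .filter(r -> r.id().equals(summaryId))
            .findFirst()
            .orElseThrow(() -> new NoSuchElementException("No summary " + summaryId));
    }
}
```

One design point worth keeping in mind: count-then-insert numbering can hand out the same number twice if two summaries for one topic are generated concurrently. Whether that matters for this application is not stated in the task list.
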

---

## Phase 7: Polish & Cross-Cutting Concerns

**Purpose**: Constitution IV compliance and cleanup.

- [x] T027 Update `README.md` Mermaid architecture diagram to add `QueryExpansionService` and `CitationValidatorService` to the chat pipeline flow, and the `topic_summary` table to the data diagram (required by Constitution Principle IV — must be in the same PR)
- [x] T028 [P] Log the expanded query at DEBUG level in `QueryExpansionService` (e.g. `log.debug("Query expanded: '{}' → '{}'", original, rewritten)`) for observability
- [x] T029 [P] Log stripped citation labels at WARN level in `CitationValidatorService` when any labels are removed (e.g. `log.warn("Stripped hallucinated citations: {}", removedLabels)`)


---

## Dependencies & Execution Order

### Phase Dependencies

- **Phase 1 (Setup)**: No dependencies — start immediately
- **Phase 2 (Foundational)**: Depends on Phase 1 — blocks the US1–US3 phases
- **Phase 3 (US1)**: Depends on Phase 2 (needs `ExpandedQuery`)
- **Phase 4 (US2)**: Depends on Phase 2 (needs `LabelledContext`); can run in parallel with Phase 3
- **Phase 5 (US3)**: Depends on Phase 4 (needs `refLabel` in sources)
- **Phase 6 (US4)**: No dependency on Phase 2 for the migration (T018); entity/service work (T019+) depends on T018
- **Phase 7 (Polish)**: Depends on all implementation phases complete

### User Story Dependencies

- **User Story 1 (P1)**: Depends on Phase 2 only — no dependency on US2 or US3
- **User Story 2 (P1)**: Depends on Phase 2 only — can run in parallel with US1
- **User Story 3 (P2)**: Depends on US2 (needs `refLabel` in the API response)
- **User Story 4**: Independent of US1–US3 — can start immediately after T018 migration

### Within Each User Story

- T004 → T006 (QueryExpansionService must exist before ChatService wiring)
- T007 → T010 → T012 (CitationValidatorService → wire into sendMessage → thread context)
- T008 → T012 (LabelledContext must be built before threading through)
- T013 → T014 (badge rendering before card targeting)

### Parallel Opportunities

- T002 and T003 (Phase 2) can run in parallel — different files
- Phase 3 (US1) and Phase 4 (US2) can run in parallel after Phase 2; the new service classes live in different files, but both phases modify `ChatService.java`, so coordinate T006 with T008–T012
- T027, T028, T029 (Polish) can run in parallel — different files


---

## Parallel Example: US1 + US2

```
After Phase 2 completes:

Track A (US1):
  T004 — Create QueryExpansionService
  T005 — (no change to NeurosurgeryRetriever)
  T006 — Wire into ChatService

Track B (US2):
  T007 — Create CitationValidatorService
  T008 — Modify buildContextPrompt() → LabelledContext
  T009 — Update system prompt
  T010 — Wire CitationValidatorService into sendMessage()
  T011 — Add refLabel to buildSources()
  T012 — Thread LabelledContext through call chain

Merge point: Both tracks modify ChatService — coordinate T006 and T012
to avoid conflicts (implement T006 first or use feature branches).
```


---

## Implementation Strategy

### MVP First (User Stories 1 + 2 — both P1)

1. Complete Phase 1: Setup (T001)
2. Complete Phase 2: Foundational (T002, T003)
3. Complete Phase 3: US1 — query expansion (T004–T006)
4. **VALIDATE**: Ask a lay-language question; confirm relevant clinical passages are retrieved
5. Complete Phase 4: US2 — citation grounding (T007–T012)
6. **VALIDATE**: Confirm no `[Sx]` label appears in the answer that wasn't in the retrieved set
7. **STOP and DEMO**: Both P1 stories deliver the core reliability improvements

### Incremental Delivery

1. Phase 1 + 2 → infrastructure ready
2. Phase 3 → vocabulary mismatch fixed → demo-able
3. Phase 4 → citation hallucination fixed → demo-able
4. Phase 5 → citation badges in UI → UX polish
5. Phase 6 → topic summary persistence and history
6. Phase 7 → README + logging → PR-ready


---

## Notes

- `ChatService` is modified by both US1 (T006) and US2 (T008–T012) — coordinate edits or implement sequentially
- `buildContextPrompt()` changes return type from `String` to `LabelledContext` (T008) — update all callers in the same task
- The system prompt change (T009) is a one-line string edit inside `ChatService`; no separate class needed
- `CitationValidatorService` operates purely on strings, with no DB or AI dependency, so it is easy to verify manually in isolation
- US3 frontend tasks (T013–T014) are entirely in `ChatMessage.vue` — no backend change