ai-teacher/specs/004-rag-retrieval-quality/tasks.md
2026-04-07 22:39:28 +02:00


Tasks: RAG Retrieval Quality Improvements

Input: Design documents from /specs/004-rag-retrieval-quality/

Prerequisites: plan.md, spec.md, research.md, data-model.md, contracts/

Organization: Tasks are grouped by user story to enable independent implementation and testing of each story.

Tests: Not requested in the spec, so no test tasks are generated.

Format: [ID] [P?] [Story] Description

  • [P]: Can run in parallel (different files, no dependencies)
  • [Story]: Which user story this task belongs to (US1, US2, US3, US4)

Phase 1: Setup (Shared Infrastructure)

Purpose: No new project structure needed — services are added to the existing retrieval/ package. This phase is a single verification step.

  • T001 Verify active branch is 004-rag-retrieval-quality and backend/src/main/java/com/aiteacher/retrieval/ exists

Phase 2: Foundational (Blocking Prerequisites)

Purpose: Two lightweight value objects: ExpandedQuery (used by US1) and LabelledContext (used by US2).

⚠️ CRITICAL: Phase 3 (US1) and Phase 4 (US2) depend on these records being present.

  • T002 [P] Create ExpandedQuery record in backend/src/main/java/com/aiteacher/retrieval/ExpandedQuery.java with fields String original and String rewritten
  • T003 [P] Create LabelledContext record in backend/src/main/java/com/aiteacher/retrieval/LabelledContext.java with fields Map<String, SectionEntity> sectionLabels, Map<String, FigureEntity> figureLabels, and String promptText
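A minimal sketch of the two Phase 2 records, assuming Java 17 records. SectionEntity and FigureEntity are reduced to hypothetical stand-ins carrying only the fields the labelled prompt uses, and validLabels() is a hypothetical convenience method, not something T003 asks for.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-ins for the project's SectionEntity and FigureEntity JPA classes,
// reduced to the fields the labelled prompt needs.
record SectionEntity(String title, int page, String fullText) {}
record FigureEntity(String figureNumber, int page, String caption) {}

// T002: the user's wording plus the clinically rewritten query used for vector search.
record ExpandedQuery(String original, String rewritten) {}

// T003: ref-label maps ("S1" -> section, "F1" -> figure) plus the formatted CONTEXT text.
record LabelledContext(Map<String, SectionEntity> sectionLabels,
                       Map<String, FigureEntity> figureLabels,
                       String promptText) {

    // Hypothetical helper (not in the task list): every label the model may cite,
    // i.e. the Set<String> that T010 passes to CitationValidatorService.validate().
    Set<String> validLabels() {
        Set<String> labels = new HashSet<>(sectionLabels.keySet());
        labels.addAll(figureLabels.keySet());
        return labels;
    }
}
```

Keeping the union in one place means T010 and T012 do not each have to rebuild the label set from the two maps.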

Checkpoint: Foundation ready — US1 and US2 implementation can begin in parallel


Phase 3: User Story 1 — Accurate Retrieval Despite Different Terminology (Priority: P1) 🎯 MVP

Goal: Before each retrieval call, rewrite the user's question into clinical terminology so that vector search finds relevant sections even when the user uses lay language.

Independent Test: Ask "what happens after cutting the skull?" — verify retrieved sections contain content about craniotomy without that word appearing in the query.

Implementation for User Story 1

  • T004 [US1] Create QueryExpansionService in backend/src/main/java/com/aiteacher/retrieval/QueryExpansionService.java:

    • Constructor-inject ChatClient
    • Method expand(String query): ExpandedQuery
    • LLM prompt: "Rewrite the following question using precise medical/surgical terminology as it would appear in a neurosurgery textbook index. Output only the rewritten question, nothing else. Question: {query}"
    • Return new ExpandedQuery(query, rewrittenText)
    • Annotate with @Service
  • T005 [US1] Modify NeurosurgeryRetriever.retrieve() in backend/src/main/java/com/aiteacher/retrieval/NeurosurgeryRetriever.java:

    • No signature change: retrieve(String query, UUID bookId) keeps using query for vector search; the caller passes in the already-expanded text
    • Note: expansion happens in ChatService before retrieve() is called (T006), so no change to NeurosurgeryRetriever is required
  • T006 [US1] Modify ChatService in backend/src/main/java/com/aiteacher/chat/ChatService.java:

    • Constructor-inject QueryExpansionService
    • In sendMessage(), call queryExpansionService.expand(fullQuestion) before the retrieval loop
    • Pass expandedQuery.rewritten() to retriever.retrieve() instead of fullQuestion
    • Keep passing fullQuestion (original) to buildContextPrompt() so the QUESTION block shown to the model reflects what the user actually asked
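The US1 flow (T004 plus T006) can be sketched as follows. To keep it runnable without Spring, ChatClient is replaced by a plain String -> String function and NeurosurgeryRetriever by a hypothetical (query, bookId) function; ChatFlowSketch and the empty-answer fallback are illustrative assumptions, not part of the task list.

```java
import java.util.List;
import java.util.UUID;
import java.util.function.BiFunction;
import java.util.function.UnaryOperator;

// T002 record, repeated so this sketch compiles on its own.
record ExpandedQuery(String original, String rewritten) {}

// T004 sketch. Assumption: the Spring AI ChatClient is abstracted as a String -> String
// function so this runs without Spring; inside the real @Service the call would be
// chatClient.prompt().user(prompt).call().content().
class QueryExpansionService {
    private static final String TEMPLATE =
        "Rewrite the following question using precise medical/surgical terminology "
        + "as it would appear in a neurosurgery textbook index. Output only the "
        + "rewritten question, nothing else. Question: %s";

    private final UnaryOperator<String> llm;

    QueryExpansionService(UnaryOperator<String> llm) {
        this.llm = llm;
    }

    ExpandedQuery expand(String query) {
        String rewritten = llm.apply(TEMPLATE.formatted(query));
        // Assumption, not in the task list: fall back to the original if the model returns nothing.
        if (rewritten == null || rewritten.isBlank()) {
            rewritten = query;
        }
        return new ExpandedQuery(query, rewritten.strip());
    }
}

// T006 sketch: expand once, search with the rewritten text, keep the user's wording in
// the QUESTION block. The retriever is a hypothetical (query, bookId) -> passages function
// standing in for NeurosurgeryRetriever.retrieve().
class ChatFlowSketch {
    static String buildPrompt(String fullQuestion, UUID bookId,
                              QueryExpansionService expander,
                              BiFunction<String, UUID, List<String>> retriever) {
        ExpandedQuery expanded = expander.expand(fullQuestion);
        List<String> passages = retriever.apply(expanded.rewritten(), bookId);
        return "CONTEXT:\n" + String.join("\n", passages) + "\n\nQUESTION: " + fullQuestion;
    }
}
```

Because expansion happens in the caller, the retriever only ever sees the rewritten text, while the model still answers the question in the user's own words.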

Checkpoint: User Story 1 fully functional — retrieval now uses clinically rewritten queries


Phase 4: User Story 2 — Grounded Citation in Generated Answers (Priority: P1)

Goal: Tag all retrieved sections and figures with short ref-labels ([S1], [F1]…) in the prompt, instruct the model to cite only those labels, then post-process the answer to strip any citation referencing a label that was not provided.

Independent Test: Trigger a question where only sections S1–S3 are retrieved. Verify that the generated answer contains no citation outside that set, and that the sources list in the response carries refLabel fields.

Implementation for User Story 2

  • T007 [US2] Create CitationValidatorService in backend/src/main/java/com/aiteacher/retrieval/CitationValidatorService.java:

    • Annotate with @Service
    • Method validate(String generatedAnswer, Set<String> validLabels): String
    • Scan generatedAnswer for occurrences of [Sn] and [Fn] patterns using a regex like \[(S|F)\d+\]
    • Remove (or replace with empty string) any match whose label is not in validLabels
    • Return the cleaned answer text
  • T008 [US2] Modify ChatService.buildContextPrompt() in backend/src/main/java/com/aiteacher/chat/ChatService.java:

    • Change signature to return LabelledContext instead of String
    • Assign sequential labels: sections get S1, S2, …; figures get F1, F2, …
    • Prefix each section block with its label: [S1] Section Title, p.N\n{fullText}\n\n
    • Prefix each figure line with its label: [F1] Fig. X (p.N): caption
    • Populate sectionLabels and figureLabels maps in the returned LabelledContext
    • Store the full formatted prompt in LabelledContext.promptText()
  • T009 [US2] Update system prompt constant in backend/src/main/java/com/aiteacher/chat/ChatService.java:

    • Replace the citation rule "Cite sources for each major point (book title and page number from the context)" with: "Cite claims using ONLY the reference labels provided in the context (e.g. [S1], [F2]). Do not invent page numbers, section titles, or labels not present in the CONTEXT block."
  • T010 [US2] Wire CitationValidatorService into ChatService.sendMessage() in backend/src/main/java/com/aiteacher/chat/ChatService.java:

    • Constructor-inject CitationValidatorService
    • After the chatClient.prompt()...call().content() call, pass assistantContent and the label set from LabelledContext to citationValidatorService.validate()
    • Use the validated string as assistantContent going forward
  • T011 [US2] Modify buildSources() in backend/src/main/java/com/aiteacher/chat/ChatService.java:

    • Accept the LabelledContext (or its two maps) as an additional parameter
    • Add "refLabel" entry to each source map: e.g. source.put("refLabel", "S1") for sections, source.put("refLabel", "F1") for figures
    • Keep all other existing fields unchanged
  • T012 [US2] Update sendMessage() call chain in backend/src/main/java/com/aiteacher/chat/ChatService.java to thread LabelledContext through steps T008–T011:

    • LabelledContext ctx = buildContextPrompt(fullQuestion, allSections, allFigures)
    • Pass ctx.promptText() to the LLM call
    • Pass the combined label set (ctx.sectionLabels().keySet() plus ctx.figureLabels().keySet()) to validate(), and the label maps to buildSources()
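A hedged sketch of the string-handling core of US2: sequential labelling and prompt formatting (T008) and citation stripping (T007). Section and Figure are hypothetical stand-ins for the entities, the generic label() helper is illustrative, and strippedLabels() is a hypothetical helper that would feed the WARN log in T029.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical minimal stand-ins for SectionEntity and FigureEntity.
record Section(String title, int page, String fullText) {}
record Figure(String number, int page, String caption) {}

final class CitationSketch {
    // Same shape as the T007 regex \[(S|F)\d+\]; group 1 captures the bare label ("S1").
    private static final Pattern LABEL = Pattern.compile("\\[([SF]\\d+)\\]");

    // T008: sequential labels in retrieval order; LinkedHashMap keeps S1, S2, ... ordered.
    static <T> Map<String, T> label(String prefix, List<T> items) {
        Map<String, T> labels = new LinkedHashMap<>();
        for (int i = 0; i < items.size(); i++) {
            labels.put(prefix + (i + 1), items.get(i));
        }
        return labels;
    }

    // T008: "[S1] Title, p.N\n{fullText}\n\n" per section, "[F1] Fig. X (p.N): caption" per figure.
    static String promptText(Map<String, Section> sections, Map<String, Figure> figures) {
        StringBuilder sb = new StringBuilder();
        sections.forEach((label, s) -> sb.append('[').append(label).append("] ")
                .append(s.title()).append(", p.").append(s.page()).append('\n')
                .append(s.fullText()).append("\n\n"));
        figures.forEach((label, f) -> sb.append('[').append(label).append("] Fig. ")
                .append(f.number()).append(" (p.").append(f.page()).append("): ")
                .append(f.caption()).append('\n'));
        return sb.toString();
    }

    // T007: keep citations whose label was supplied, replace the rest with "".
    static String validate(String answer, Set<String> validLabels) {
        return LABEL.matcher(answer).replaceAll(m ->
                validLabels.contains(m.group(1)) ? Matcher.quoteReplacement(m.group()) : "");
    }

    // T029 support: the labels validate() would strip, for the WARN log line.
    static Set<String> strippedLabels(String answer, Set<String> validLabels) {
        Set<String> removed = new LinkedHashSet<>();
        Matcher m = LABEL.matcher(answer);
        while (m.find()) {
            if (!validLabels.contains(m.group(1))) {
                removed.add(m.group(1));
            }
        }
        return removed;
    }
}
```

Removing a label leaves the surrounding whitespace untouched, so "close [S9] and" becomes "close  and"; a whitespace clean-up pass could be added if that matters in the rendered answer.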

Checkpoint: User Stories 1 and 2 both fully functional — queries are expanded, citations are grounded


Phase 5: User Story 3 — User Visibility into Retrieval Confidence (Priority: P2)

Goal: The answer text contains [S1]-style labels (after US2). This phase exposes them in the frontend so users can see which claim maps to which source card.

Independent Test: Send a question, receive an answer with inline [S1] labels visible in the rendered text, and confirm clicking/hovering the label highlights the corresponding source card.

Note: Per research.md, the backend is already complete after US1+US2. US3 is a frontend-only UX enhancement.

Implementation for User Story 3

  • T013 [US3] Modify ChatMessage.vue in frontend/src/components/ChatMessage.vue:

    • Parse the answer text for [Sn] and [Fn] citation labels using a regex
    • Render each label as a styled inline badge (e.g. <span class="citation-badge">[S1]</span>)
    • When a badge is clicked or hovered, highlight the corresponding source card (match by source.refLabel)
  • T014 [US3] Update source card rendering in frontend/src/components/ChatMessage.vue:

    • Add a data-ref-label attribute to each source card element so it can be targeted by the citation badge interaction
    • Apply a visual highlight style (CSS class) when the card is active

Checkpoint: All three user stories functional — full end-to-end quality improvements delivered


Phase 6: User Story 4 — Topic Summary Persistence & History (user-requested)

Goal: Every generated topic summary is saved to the database. When a topic is selected the UI shows a numbered history list; the student can view any past summary or generate a new one.

Independent Test: Generate a summary for "Intracranial Aneurysms", reload the page, click the topic — verify "Summary #1" appears. Generate again — verify "Summary #2" appears. Click "Summary #1" — verify the original text loads without regeneration.

  • T018 Create Flyway migration backend/src/main/resources/db/migration/V6__topic_summary.sql — table topic_summary with columns: id UUID PRIMARY KEY DEFAULT gen_random_uuid(), topic_id VARCHAR(100) NOT NULL, summary_number INT NOT NULL, summary TEXT NOT NULL, sources_json TEXT NOT NULL, generated_at TIMESTAMPTZ NOT NULL
  • T019 [P] [US4] Create TopicSummaryEntity.java in backend/src/main/java/com/aiteacher/topic/TopicSummaryEntity.java — JPA @Entity mapped to table topic_summary; fields: @Id UUID id, String topicId, int summaryNumber, String summary, String sourcesJson, Instant generatedAt; no-arg + all-args constructor
  • T02X [P] [US4] Create SavedSummaryItem.java record in backend/src/main/java/com/aiteacher/topic/SavedSummaryItem.java — fields: UUID id, int summaryNumber, Instant generatedAt (list-view DTO, no full text)
  • T02X [US4] Create TopicSummaryRepository.java in backend/src/main/java/com/aiteacher/topic/TopicSummaryRepository.java, extending JpaRepository<TopicSummaryEntity, UUID>; add List<TopicSummaryEntity> findByTopicIdOrderBySummaryNumberAsc(String topicId) and long countByTopicId(String topicId)
  • T02X [US4] Modify TopicSummaryResponse.java in backend/src/main/java/com/aiteacher/topic/TopicSummaryResponse.java — add fields UUID id and int summaryNumber to the record components
  • T02X [US4] Modify TopicSummaryService.java in backend/src/main/java/com/aiteacher/topic/TopicSummaryService.java — inject TopicSummaryRepository and ObjectMapper; at end of generateSummary() compute summaryNumber = (int) repository.countByTopicId(topicId) + 1, persist a TopicSummaryEntity (serialise sources list to JSON via objectMapper.writeValueAsString()), and include id + summaryNumber in the returned TopicSummaryResponse; add List<SavedSummaryItem> listSummaries(String topicId) and TopicSummaryResponse getSummary(UUID summaryId) methods
  • T02X [US4] Modify TopicController.java in backend/src/main/java/com/aiteacher/topic/TopicController.java — add @GetMapping("/{id}/summaries") returning List<SavedSummaryItem> (delegates to listSummaries); add @GetMapping("/{id}/summaries/{summaryId}") returning TopicSummaryResponse (delegates to getSummary); both return 404 via NoSuchElementException when topic or summary not found
  • T02X [US4] Modify topicStore.ts in frontend/src/stores/topicStore.ts — add state summaryList: SavedSummaryItem[]; add fetchSummaries(topicId) action calling GET /api/v1/topics/{topicId}/summaries; add fetchSummaryDetail(topicId, summaryId) action calling GET /api/v1/topics/{topicId}/summaries/{summaryId} and setting activeSummary; clear summaryList when a different topic is selected
  • T02X [US4] Modify TopicsView.vue in frontend/src/views/TopicsView.vue. When a topic card is clicked:

    • Call topicStore.fetchSummaries(topicId) first
    • If summaries exist, display a summary history list showing chips "Summary #1 · [date]", "Summary #2 · [date]", … plus a "Generate New" button
    • Clicking a chip calls fetchSummaryDetail() and renders the saved summary in the existing panel
    • Clicking "Generate New" calls handleGenerate(), then re-calls fetchSummaries() to refresh the list
    • If no summaries exist, show only the "Generate Summary" button (current behaviour)
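The numbering and history logic of the TopicSummaryService task can be sketched without Spring or a database. The in-memory repository is a hypothetical stand-in for the JPA TopicSummaryRepository (its method names mirror the derived queries the task asks for), StoredSummary stands in for TopicSummaryEntity, and sources are passed in already serialised, so the ObjectMapper step is omitted.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Optional;
import java.util.UUID;

// List-view DTO from the task list (no summary text).
record SavedSummaryItem(UUID id, int summaryNumber, Instant generatedAt) {}

// Hypothetical stand-in for TopicSummaryEntity, as an immutable record.
record StoredSummary(UUID id, String topicId, int summaryNumber,
                     String summary, String sourcesJson, Instant generatedAt) {}

// Hypothetical in-memory stand-in for TopicSummaryRepository.
class InMemorySummaryRepository {
    private final List<StoredSummary> rows = new ArrayList<>();

    StoredSummary save(StoredSummary row) {
        rows.add(row);
        return row;
    }

    Optional<StoredSummary> findById(UUID id) {
        return rows.stream().filter(r -> r.id().equals(id)).findFirst();
    }

    long countByTopicId(String topicId) {
        return rows.stream().filter(r -> r.topicId().equals(topicId)).count();
    }

    List<StoredSummary> findByTopicIdOrderBySummaryNumberAsc(String topicId) {
        return rows.stream()
                .filter(r -> r.topicId().equals(topicId))
                .sorted(Comparator.comparingInt(StoredSummary::summaryNumber))
                .toList();
    }
}

// Sketch of the persistence/history part of TopicSummaryService (generation omitted).
class SummaryHistorySketch {
    private final InMemorySummaryRepository repository;

    SummaryHistorySketch(InMemorySummaryRepository repository) {
        this.repository = repository;
    }

    // End of generateSummary(): number = existing count for the topic + 1.
    StoredSummary persist(String topicId, String summary, String sourcesJson) {
        int summaryNumber = (int) repository.countByTopicId(topicId) + 1;
        return repository.save(new StoredSummary(UUID.randomUUID(), topicId,
                summaryNumber, summary, sourcesJson, Instant.now()));
    }

    List<SavedSummaryItem> listSummaries(String topicId) {
        return repository.findByTopicIdOrderBySummaryNumberAsc(topicId).stream()
                .map(r -> new SavedSummaryItem(r.id(), r.summaryNumber(), r.generatedAt()))
                .toList();
    }

    // Missing IDs surface as NoSuchElementException, which the controller task expects to become a 404.
    StoredSummary getSummary(UUID summaryId) {
        return repository.findById(summaryId)
                .orElseThrow(() -> new NoSuchElementException("No summary " + summaryId));
    }
}
```

Counting existing rows and adding one is simple, but two concurrent "Generate New" requests for the same topic could read the same count. If that can happen, a unique constraint on (topic_id, summary_number) would at least turn the collision into an error rather than a duplicate number.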

Checkpoint: Summary persistence fully working end-to-end. US4 independently testable.


Phase 7: Polish & Cross-Cutting Concerns

Purpose: Constitution IV compliance and cleanup.

  • T027 Update README.md Mermaid architecture diagram to add QueryExpansionService and CitationValidatorService to the chat pipeline flow, and the topic_summary table to the data diagram (required by Constitution Principle IV — must be in the same PR)
  • T028 [P] Log the expanded query at DEBUG level in QueryExpansionService (e.g. log.debug("Query expanded: '{}' → '{}'", original, rewritten)) for observability
  • T029 [P] Log stripped citation labels at WARN level in CitationValidatorService when any labels are removed (e.g. log.warn("Stripped hallucinated citations: {}", removedLabels))

Dependencies & Execution Order

Phase Dependencies

  • Phase 1 (Setup): No dependencies — start immediately
  • Phase 2 (Foundational): Depends on Phase 1 — blocks all user story phases
  • Phase 3 (US1): Depends on Phase 2 (needs ExpandedQuery)
  • Phase 4 (US2): Depends on Phase 2 (needs LabelledContext); can run in parallel with Phase 3
  • Phase 5 (US3): Depends on Phase 4 (needs refLabel in sources)
  • Phase 6 (US4): No dependency on Phase 2 for the migration (T018); entity/service work (T019+) depends on T018
  • Phase 7 (Polish): Depends on all implementation phases complete

User Story Dependencies

  • User Story 1 (P1): Depends on Phase 2 only — no dependency on US2 or US3
  • User Story 2 (P1): Depends on Phase 2 only — can run in parallel with US1
  • User Story 3 (P2): Depends on US2 (needs refLabel in the API response)
  • User Story 4: Independent of US1–US3; can start immediately after the T018 migration

Within Each User Story

  • T004 → T006 (QueryExpansionService must exist before ChatService wiring)
  • T007 → T010 → T012 (CitationValidatorService → wire into sendMessage → thread context)
  • T008 → T012 (LabelledContext must be built before threading through)
  • T013 → T014 (badge rendering before card targeting)

Parallel Opportunities

  • T002 and T003 (Phase 2) can run in parallel — different files
  • Phase 3 (US1) and Phase 4 (US2) can run in parallel after Phase 2 — all different files
  • T028 and T029 (Polish, both marked [P]) can run in parallel; they touch different files

Parallel Example: US1 + US2

After Phase 2 completes:

Track A (US1):
  T004 — Create QueryExpansionService
  T005 — (no change to NeurosurgeryRetriever)
  T006 — Wire into ChatService

Track B (US2):
  T007 — Create CitationValidatorService
  T008 — Modify buildContextPrompt() → LabelledContext
  T009 — Update system prompt
  T010 — Wire CitationValidatorService into sendMessage()
  T011 — Add refLabel to buildSources()
  T012 — Thread LabelledContext through call chain

Merge point: Both tracks modify ChatService — coordinate T006 and T012
to avoid conflicts (implement T006 first or use feature branches).

Implementation Strategy

MVP First (User Stories 1 + 2 — both P1)

  1. Complete Phase 1: Setup (T001)
  2. Complete Phase 2: Foundational (T002, T003)
  3. Complete Phase 3: US1 query expansion (T004–T006)
  4. VALIDATE: Ask a lay-language question; confirm relevant clinical passages are retrieved
  5. Complete Phase 4: US2 citation grounding (T007–T012)
  6. VALIDATE: Confirm no [Sn] or [Fn] label appears in the answer that was not in the retrieved set
  7. STOP and DEMO: Both P1 stories deliver the core reliability improvements

Incremental Delivery

  1. Phase 1 + 2 → infrastructure ready
  2. Phase 3 → vocabulary mismatch fixed → demo-able
  3. Phase 4 → citation hallucination fixed → demo-able
  4. Phase 5 → citation badges in UI → UX polish
  5. Phase 6 → topic summary persistence and history → demo-able
  6. Phase 7 → README + logging → PR-ready

Notes

  • ChatService is modified by both US1 (T006) and US2 (T008–T012); coordinate edits or implement sequentially
  • buildContextPrompt() changes return type from String to LabelledContext (T008) — update all callers in the same task
  • The system prompt change (T009) is a one-line string edit inside ChatService; no separate class needed
  • CitationValidatorService operates purely on strings, with no DB or AI dependency, so it is easy to unit-test in isolation
  • US3 frontend tasks (T013–T014) are entirely in ChatMessage.vue; no backend change is needed