# Research: RAG Retrieval Quality Improvements

**Branch**: `004-rag-retrieval-quality` | **Date**: 2026-04-06

## Decision 1: Query Expansion Strategy

**Decision**: Single LLM rewrite — ask the model to restate the user's question using clinical/technical terminology before retrieval. Use the rewritten query for vector search instead of (or in addition to) the original.

**Rationale**: The simplest approach that directly addresses vocabulary mismatch. A single extra LLM call rewrites the query into the language of the documentation (clinical terms), so the embedding similarity search has a much better chance of matching. No new dependencies, no index changes.

**Alternatives considered**:
- *HyDE (Hypothetical Document Embeddings)*: Ask the model to write a hypothetical answer, then embed that. More powerful but adds latency and the answer may itself hallucinate clinical content — rejected for POC.
- *Multi-query retrieval*: Generate 3–5 alternative queries and merge results. Effective but multiplies retrieval calls and deduplication complexity — rejected (KISS).
- *Synonym dictionary*: Pre-built medical thesaurus mapping. No new dependencies but requires maintenance and won't generalise — rejected.
- *Re-ranking*: Run a cross-encoder after retrieval. Addresses ordering, not vocabulary gap — deferred.

**Implementation note**: The rewrite prompt should be a short, focused instruction: "Rewrite the following question using precise medical/surgical terminology as it would appear in a neurosurgery textbook index. Output only the rewritten question." Use the same `ChatClient` bean; no new API client needed.
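
A minimal sketch of the rewrite step, under stated assumptions: the `Function<String, String>` is a hypothetical stand-in for the call made through the `ChatClient` bean, `QueryRewriter` is an illustrative class name, and the fall-back to the original question on a blank or failed rewrite is an assumption rather than part of the decision above.

```java
import java.util.function.Function;

/** Sketch of the single-rewrite query expansion step. */
final class QueryRewriter {

    // Instruction text taken from the implementation note.
    static final String REWRITE_INSTRUCTION =
        "Rewrite the following question using precise medical/surgical terminology "
        + "as it would appear in a neurosurgery textbook index. "
        + "Output only the rewritten question.";

    // Hypothetical stand-in for the ChatClient call: prompt in, completion out.
    private final Function<String, String> llm;

    QueryRewriter(Function<String, String> llm) {
        this.llm = llm;
    }

    /** Builds the full prompt sent to the model. */
    static String buildPrompt(String question) {
        return REWRITE_INSTRUCTION + "\n\nQuestion: " + question.strip();
    }

    /** Returns the rewritten query, or the original if the rewrite is blank or fails. */
    String rewrite(String question) {
        try {
            String rewritten = llm.apply(buildPrompt(question));
            return (rewritten == null || rewritten.isBlank()) ? question : rewritten.strip();
        } catch (RuntimeException e) {
            // Assumption: a failed rewrite should degrade to plain retrieval, not break it.
            return question;
        }
    }
}
```

The returned string is what would be embedded for vector search; whether the original question is also searched is left open, as in the decision above.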

---

## Decision 2: Citation Grounding Strategy

**Decision**: Tag each retrieved context item with a short ref-label in the prompt: `[S1]`, `[S2]`, … for sections and `[F1]`, `[F2]`, … for figures. Instruct the model to cite only using those labels. Post-process the generated answer to detect and strip any citation that does not correspond to a known label.

**Rationale**: The simplest way to make citations verifiable — all valid citation targets are enumerated in the prompt itself. Post-processing is a pure string operation; no second LLM call needed for validation.
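
Enumerating the citation targets in the prompt could look roughly like the sketch below. The `Section`, `Figure`, and `LabelledContext` records and the `ContextLabeller` class are hypothetical shapes chosen for illustration; the real retrieved-chunk types may differ.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Sketch: assigns [S1]..[Sk] and [F1]..[Fj] labels while building the context prompt. */
final class ContextLabeller {

    // Hypothetical shapes for retrieved items.
    record Section(String title, int page, String text) {}
    record Figure(String caption) {}

    // The prompt text plus the set of labels that are valid citation targets.
    record LabelledContext(String prompt, Set<String> labels) {}

    static LabelledContext label(List<Section> sections, List<Figure> figures) {
        StringBuilder prompt = new StringBuilder();
        Set<String> labels = new LinkedHashSet<>();

        // Sections are numbered in retrieval order, starting at 1.
        for (int i = 0; i < sections.size(); i++) {
            Section s = sections.get(i);
            String label = "S" + (i + 1);
            labels.add(label);
            prompt.append('[').append(label).append("] ")
                  .append(s.title()).append(", p.").append(s.page()).append('\n')
                  .append(s.text()).append("\n\n");
        }

        // Figures get their own numbering sequence.
        for (int i = 0; i < figures.size(); i++) {
            String label = "F" + (i + 1);
            labels.add(label);
            prompt.append('[').append(label).append("] ")
                  .append(figures.get(i).caption()).append("\n\n");
        }
        return new LabelledContext(prompt.toString(), labels);
    }
}
```

Returning the label set together with the prompt keeps the later validation step independent of how the prompt text is formatted.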

**Alternatives considered**:
- *Ask the model to self-check citations*: Second LLM call to verify each claim. More accurate but doubles cost and latency — rejected for POC.
- *Structured output (JSON)*: Return answer as JSON with claim/source pairs. Most precise but requires significant prompt engineering and frontend changes — deferred.
- *No citation enforcement*: Let the model cite freely and show retrieved sources separately. Already the current state — rejected because it doesn't solve citation hallucination.

**Implementation note**:
- Context sections labelled `[S1] Section Title, p.N` through `[Sk]` in `buildContextPrompt()`.
- Figures labelled `[F1]`, `[F2]` etc.
- System prompt updated: "Cite claims using ONLY the reference labels [S1]…[Sk] and [F1]…[Fj] provided in the context. Do not invent page numbers or section titles."
- `CitationValidatorService` scans generated text for `[Sx]` / `[Fx]` patterns, checks each against the known label set, and removes unknown ones.
- `Message.sources` is already a `List<Map<String,Object>>`, so no schema change is needed; the ref-label is stored alongside each source (e.g. as an extra map entry) so the frontend can correlate labels with sources.
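
The validation step described in the notes above can be sketched as a single regex pass. This is an illustration, not the actual `CitationValidatorService`: the class and method names are hypothetical, and removing the one space in front of a stripped label is an assumption made to keep the surrounding sentence tidy.

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: strips [Sx]/[Fx] citations that were never offered in the prompt. */
final class CitationValidator {

    // Optional leading space, then a ref-label such as [S1] or [F12]; group 1 is the bare label.
    private static final Pattern LABEL = Pattern.compile(" ?\\[([SF]\\d+)\\]");

    static String stripUnknown(String answer, Set<String> knownLabels) {
        Matcher m = LABEL.matcher(answer);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            // Known labels are kept verbatim; unknown ones are dropped with their leading space.
            String replacement = knownLabels.contains(m.group(1)) ? m.group() : "";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Bracketed text that does not match the label shape (for example a numeric range in square brackets) is left untouched, so only the `[Sx]` / `[Fx]` patterns are ever removed.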

---

## Decision 3: Frontend Source Display

**Decision**: No frontend change needed for MVP. With Decision 2 in place, the backend strips hallucinated citations via `CitationValidatorService` before saving the message. The `sources` list attached to the message is already the retrieved set, so the existing `ChatMessage.vue` source panel continues to show all retrieved sources. That is correct: they were all supplied to the model as context.

**Rationale**: KISS. The critical correctness fix (no hallucinated citations in answer text) is fully backend-side. The sources panel shows context that was genuinely available to the model — that is accurate and useful. Linking specific claims to specific sources (inline highlighting) is a UX enhancement that can be a follow-on feature.

**Alternatives considered**:
- *Inline citation links*: Highlight `[S1]` in answer text and link to the source card. Better UX but requires markdown parsing and component changes — deferred.
- *Show only cited sources*: Filter `sources` to only those actually cited in the answer. Marginally more accurate but the uncited context is still legitimately retrieved — deferred.

---

## Decision 4: Persisting Topic Summaries

**Decision**: Add a new `topic_summary` table that stores each generated summary. Summaries are numbered sequentially per topic (summary #1, #2, …). The POST endpoint continues to generate and now also persists. A new GET endpoint lists saved summaries for a topic.

**Rationale**: The simplest approach: one new table, one new repository, minimal changes to the existing service. No caching layer, no event system. The sequential number is assigned once at insert time (existing summary count for the topic plus one) and stored in a `summary_number` column, so listing summaries needs no `ROW_NUMBER` window query.

**Alternatives considered**:
- *Store in frontend state only (session memory)*: Already the current state; summaries are lost on reload. Rejected because the user explicitly wants persistence.
- *Derive the summary number at query time (`ROW_NUMBER`) instead of persisting it*: Avoids the gaps/duplicates that a persisted column risks on concurrent writes, but adds a count to every read. For a POC with single-user use, a `SELECT COUNT(*) + 1` approach at insert time is sufficient, so the persisted column was kept.
- *Versioning / soft-delete*: Overkill for a POC where the user just wants to re-read old summaries. No delete endpoint needed initially.

**Implementation note**:
- New Flyway migration `V6__topic_summary.sql`.
- `TopicSummaryEntity` (JPA `@Entity` → table `topic_summary`): `id` (UUID), `topicId` (VARCHAR FK), `summaryNumber` (INT), `summary` (TEXT), `sourcesJson` (TEXT — JSON array), `generatedAt` (TIMESTAMP).
- `summaryNumber` is set at insert time to `SELECT COUNT(*) + 1 FROM topic_summary WHERE topic_id = ?`.
- `TopicSummaryRepository extends JpaRepository<TopicSummaryEntity, UUID>` with a `findByTopicIdOrderBySummaryNumberAsc` query.
- `TopicSummaryService.generateSummary()` saves the result before returning it.
- `TopicController` gains `GET /api/v1/topics/{id}/summaries` returning `List<SavedSummaryResponse>` (id, summaryNumber, generatedAt — no full text, for list efficiency) and `GET /api/v1/topics/{id}/summaries/{summaryId}` for the full detail.
- Frontend: when a topic card is clicked, fetch the summary list. Show "Summary #1", "Summary #2" chips + "Generate New" button. Clicking a chip loads the full summary.
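
The numbering and list/detail split described above can be illustrated with an in-memory stand-in. This is a sketch only: `InMemoryTopicSummaryStore` is hypothetical and replaces the JPA repository with a plain list so the logic runs without a database, while the record fields mirror those listed for `TopicSummaryEntity` and `SavedSummaryResponse`.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.UUID;

/** Sketch: COUNT(*) + 1 numbering per topic, with a lightweight list projection. */
final class InMemoryTopicSummaryStore {

    // Mirrors the columns listed for TopicSummaryEntity.
    record TopicSummary(UUID id, String topicId, int summaryNumber,
                        String summary, String sourcesJson, Instant generatedAt) {}

    // List projection: no full text, as in the GET .../summaries endpoint.
    record SavedSummaryResponse(UUID id, int summaryNumber, Instant generatedAt) {}

    private final List<TopicSummary> rows = new ArrayList<>();

    /** Saves a summary, numbering it (existing count for the topic) + 1. */
    synchronized TopicSummary save(String topicId, String summary, String sourcesJson) {
        int next = (int) rows.stream().filter(r -> r.topicId().equals(topicId)).count() + 1;
        TopicSummary row = new TopicSummary(
            UUID.randomUUID(), topicId, next, summary, sourcesJson, Instant.now());
        rows.add(row);
        return row;
    }

    /** Equivalent of findByTopicIdOrderBySummaryNumberAsc, mapped to the list projection. */
    synchronized List<SavedSummaryResponse> list(String topicId) {
        return rows.stream()
            .filter(r -> r.topicId().equals(topicId))
            .sorted(Comparator.comparingInt(TopicSummary::summaryNumber))
            .map(r -> new SavedSummaryResponse(r.id(), r.summaryNumber(), r.generatedAt()))
            .toList();
    }
}
```

The `synchronized` methods sidestep the duplicate-number risk only within a single process; against a real database the count-then-insert sequence is not atomic, which is acceptable for the single-user POC noted above.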