enhance rag retrieval + summary

2026-04-07 22:39:28 +02:00
parent 0cf318f0a7
commit aee6a9dfba
34 changed files with 2306 additions and 279 deletions
@@ -0,0 +1,71 @@
+# Research: RAG Retrieval Quality Improvements
+
+**Branch**: `004-rag-retrieval-quality` | **Date**: 2026-04-06
+
+## Decision 1: Query Expansion Strategy
+
+**Decision**: Single LLM rewrite — ask the model to restate the user's question using clinical/technical terminology before retrieval. Use the rewritten query for vector search instead of (or in addition to) the original.
+
+**Rationale**: The simplest approach that directly addresses vocabulary mismatch. A single extra LLM call rewrites the query into the language of the documentation (clinical terms), so the embedding similarity search has a much better chance of matching. No new dependencies, no index changes.
+
+**Alternatives considered**:
+- *HyDE (Hypothetical Document Embeddings)*: Ask the model to write a hypothetical answer, then embed that. More powerful but adds latency and the answer may itself hallucinate clinical content — rejected for POC.
+- *Multi-query retrieval*: Generate 3–5 alternative queries and merge results. Effective but multiplies retrieval calls and deduplication complexity — rejected (KISS).
+- *Synonym dictionary*: Pre-built medical thesaurus mapping. No new dependencies but requires maintenance and won't generalise — rejected.
+- *Re-ranking*: Run a cross-encoder after retrieval. Addresses ordering, not vocabulary gap — deferred.
+
+**Implementation note**: The rewrite prompt should be a short, focused instruction: "Rewrite the following question using precise medical/surgical terminology as it would appear in a neurosurgery textbook index. Output only the rewritten question." Use the same `ChatClient` bean; no new API client needed.
+
+---
+
+## Decision 2: Citation Grounding Strategy
+
+**Decision**: Tag each retrieved context section with a short ref-label (`[S1]`, `[S2]`, …, `[Fn]` for figures) in the prompt. Instruct the model to cite only using those labels. Post-process the generated answer to detect and strip any citation that does not correspond to a known label.
+
+**Rationale**: The simplest way to make citations verifiable — all valid citation targets are enumerated in the prompt itself. Post-processing is a pure string operation; no second LLM call needed for validation.
+
+**Alternatives considered**:
+- *Ask the model to self-check citations*: Second LLM call to verify each claim. More accurate but doubles cost and latency — rejected for POC.
+- *Structured output (JSON)*: Return answer as JSON with claim/source pairs. Most precise but requires significant prompt engineering and frontend changes — deferred.
+- *No citation enforcement*: Let the model cite freely and show retrieved sources separately. Already the current state — rejected because it doesn't solve citation hallucination.
+
+**Implementation note**:
+- Context sections labelled `[S1] Section Title, p.N` through `[Sk]` in `buildContextPrompt()`.
+- Figures labelled `[F1]`, `[F2]` etc.
+- System prompt updated: "Cite claims using ONLY the reference labels [S1]…[Sk] and [F1]…[Fj] provided in the context. Do not invent page numbers or section titles."
+- `CitationValidatorService` scans generated text for `[Sx]` / `[Fx]` patterns, checks each against the known label set, and removes unknown ones.
+- `Message.sources` is already a `List<Map<String,Object>>`. No schema change needed — but we store the ref-label alongside each source so the frontend can correlate.
+
+---
+
+## Decision 3: Frontend Source Display
+
+**Decision**: No frontend change needed for MVP. The backend already filters out hallucinated citations via `CitationValidatorService` before saving the message. The `sources` list attached to the message is already the retrieved set. The existing `ChatMessage.vue` source panel continues to show all retrieved sources (which is correct — they were all used as context).
+
+**Rationale**: KISS. The critical correctness fix (no hallucinated citations in answer text) is fully backend-side. The sources panel shows context that was genuinely available to the model — that is accurate and useful. Linking specific claims to specific sources (inline highlighting) is a UX enhancement that can be a follow-on feature.
+
+**Alternatives considered**:
+- *Inline citation links*: Highlight `[S1]` in answer text and link to the source card. Better UX but requires markdown parsing and component changes — deferred.
+- *Show only cited sources*: Filter `sources` to only those actually cited in the answer. Marginally more accurate but the uncited context is still legitimately retrieved — deferred.
+
+---
+
+## Decision 4: Persisting Topic Summaries
+
+**Decision**: Add a new `topic_summary` table that stores each generated summary. Summaries are numbered sequentially per topic (summary #1, #2, …). The POST endpoint continues to generate and now also persists. A new GET endpoint lists saved summaries for a topic.
+
+**Rationale**: The simplest approach — one new table, one new repository, minimal changes to the existing service. No caching layer, no event system. The sequential number is derived at query time via `ROW_NUMBER` or counted in the repository, keeping the schema minimal.
+
+**Alternatives considered**:
+- *Store in frontend state only (session memory)*: Already the current state — summaries are lost on reload. Rejected because the user explicitly wants persistence.
+- *Store summary_number as a persisted column*: Avoids a query-time count but risks gaps/duplicates on concurrent writes. For a POC with single-user use, a `SELECT COUNT(*) + 1` approach at insert time is sufficient.
+- *Versioning / soft-delete*: Overkill for a POC where the user just wants to re-read old summaries. No delete endpoint needed initially.
+
+**Implementation note**:
+- New Flyway migration `V6__topic_summary.sql`.
+- `TopicSummaryEntity` (JPA `@Entity` → table `topic_summary`): `id` (UUID), `topicId` (VARCHAR FK), `summaryNumber` (INT), `summary` (TEXT), `sourcesJson` (TEXT — JSON array), `generatedAt` (TIMESTAMP).
+- `summaryNumber` set at insert time: `COUNT(*) WHERE topic_id = ?` + 1.
+- `TopicSummaryRepository extends JpaRepository<TopicSummaryEntity, UUID>` with a `findByTopicIdOrderBySummaryNumberAsc` query.
+- `TopicSummaryService.generateSummary()` saves the result before returning it.
+- `TopicController` gains `GET /api/v1/topics/{id}/summaries` returning `List<SavedSummaryResponse>` (id, summaryNumber, generatedAt — no full text, for list efficiency) and `GET /api/v1/topics/{id}/summaries/{summaryId}` for the full detail.
+- Frontend: when a topic card is clicked, fetch the summary list. Show "Summary #1", "Summary #2" chips + "Generate New" button. Clicking a chip loads the full summary.