# Research: RAG Retrieval Quality Improvements
**Branch**: `004-rag-retrieval-quality` | **Date**: 2026-04-06
## Decision 1: Query Expansion Strategy
**Decision**: Single LLM rewrite — ask the model to restate the user's question using clinical/technical terminology before retrieval. Use the rewritten query for vector search instead of (or in addition to) the original.
**Rationale**: The simplest approach that directly addresses vocabulary mismatch. A single extra LLM call rewrites the query into the language of the documentation (clinical terms), so the embedding similarity search has a much better chance of matching. No new dependencies, no index changes.
**Alternatives considered**:
- *HyDE (Hypothetical Document Embeddings)*: Ask the model to write a hypothetical answer, then embed that. More powerful but adds latency and the answer may itself hallucinate clinical content — rejected for POC.
- *Multi-query retrieval*: Generate 3–5 alternative queries and merge results. Effective but multiplies retrieval calls and adds deduplication complexity; rejected (KISS).
- *Synonym dictionary*: Pre-built medical thesaurus mapping. No new dependencies but requires maintenance and won't generalise — rejected.
- *Re-ranking*: Run a cross-encoder after retrieval. Addresses ordering, not vocabulary gap — deferred.
**Implementation note**: The rewrite prompt should be a short, focused instruction: "Rewrite the following question using precise medical/surgical terminology as it would appear in a neurosurgery textbook index. Output only the rewritten question." Use the same `ChatClient` bean; no new API client needed.
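
A minimal sketch of the rewrite step, assuming the model call can be treated as a plain prompt-in, text-out function. `QueryRewriter`, `buildPrompt`, and the `Function<String, String>` stand-in for the `ChatClient` call are hypothetical names used for illustration, and the fallback to the original question on an empty or failed rewrite is an added assumption, not something decided above.

```java
import java.util.function.Function;

/**
 * Sketch of the single-rewrite query expansion step.
 * The LLM call is abstracted as Function<String, String> (prompt in, completion out)
 * so the example does not depend on any particular chat client API.
 */
final class QueryRewriter {

    // Instruction text taken from the implementation note.
    static final String REWRITE_INSTRUCTION =
        "Rewrite the following question using precise medical/surgical terminology "
        + "as it would appear in a neurosurgery textbook index. "
        + "Output only the rewritten question.";

    // Hypothetical stand-in for the call made through the ChatClient bean.
    private final Function<String, String> llm;

    QueryRewriter(Function<String, String> llm) {
        this.llm = llm;
    }

    /** Builds the prompt sent to the model for one user question. */
    static String buildPrompt(String userQuestion) {
        return REWRITE_INSTRUCTION + "\n\nQuestion: " + userQuestion.strip();
    }

    /**
     * Returns the rewritten query. Assumption: if the model call fails or returns
     * nothing usable, fall back to the original question so retrieval still runs.
     */
    String rewrite(String userQuestion) {
        try {
            String rewritten = llm.apply(buildPrompt(userQuestion));
            if (rewritten == null || rewritten.isBlank()) {
                return userQuestion;
            }
            return rewritten.strip();
        } catch (RuntimeException e) {
            return userQuestion;
        }
    }
}
```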
---
## Decision 2: Citation Grounding Strategy
**Decision**: Tag each retrieved context item with a short ref-label in the prompt: `[S1]`, `[S2]`, … for sections and `[F1]`, `[F2]`, … for figures. Instruct the model to cite using only those labels. Post-process the generated answer to detect and strip any citation that does not correspond to a known label.
**Rationale**: The simplest way to make citations verifiable — all valid citation targets are enumerated in the prompt itself. Post-processing is a pure string operation; no second LLM call needed for validation.
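
The labelling half of this decision could look roughly like the sketch below. It numbers sections and figures in retrieval order, writes each header in the `[S1] Section Title, p.N` form from this decision's implementation note, and returns the set of issued labels so a later validation step knows what is citable. `ContextSection`, `ContextFigure`, and `LabelledContext` are hypothetical types; the real `buildContextPrompt()` may take different inputs.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical minimal views of a retrieved section and a retrieved figure.
record ContextSection(String title, int page, String text) {}
record ContextFigure(String caption) {}

/** Prompt text plus the exact set of labels that were handed to the model. */
record LabelledContext(String prompt, Set<String> labels) {

    static LabelledContext build(List<ContextSection> sections, List<ContextFigure> figures) {
        StringBuilder prompt = new StringBuilder();
        Set<String> labels = new LinkedHashSet<>();

        // Sections become [S1]..[Sk], in the order they were retrieved.
        for (int i = 0; i < sections.size(); i++) {
            ContextSection s = sections.get(i);
            String label = "[S" + (i + 1) + "]";
            labels.add(label);
            prompt.append(label).append(' ').append(s.title())
                  .append(", p.").append(s.page()).append('\n')
                  .append(s.text()).append("\n\n");
        }

        // Figures become [F1]..[Fj], numbered independently of sections.
        for (int i = 0; i < figures.size(); i++) {
            String label = "[F" + (i + 1) + "]";
            labels.add(label);
            prompt.append(label).append(' ')
                  .append(figures.get(i).caption()).append("\n\n");
        }
        return new LabelledContext(prompt.toString(), labels);
    }
}
```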
**Alternatives considered**:
- *Ask the model to self-check citations*: Second LLM call to verify each claim. More accurate but doubles cost and latency — rejected for POC.
- *Structured output (JSON)*: Return answer as JSON with claim/source pairs. Most precise but requires significant prompt engineering and frontend changes — deferred.
- *No citation enforcement*: Let the model cite freely and show retrieved sources separately. Already the current state — rejected because it doesn't solve citation hallucination.
**Implementation note**:
- Context sections labelled `[S1] Section Title, p.N` through `[Sk]` in `buildContextPrompt()`.
- Figures labelled `[F1]`, `[F2]` etc.
- System prompt updated: "Cite claims using ONLY the reference labels [S1]…[Sk] and [F1]…[Fj] provided in the context. Do not invent page numbers or section titles."
- `CitationValidatorService` scans generated text for `[Sx]` / `[Fx]` patterns, checks each against the known label set, and removes unknown ones.
- `Message.sources` is already a `List<Map<String,Object>>`. No schema change needed — but we store the ref-label alongside each source so the frontend can correlate.
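
The validation step above is a pure string operation, which the following sketch illustrates. It assumes the known labels are passed in with their brackets (for example `"[S1]"`); the class name `CitationValidator` and method `stripUnknown` are hypothetical, and the real `CitationValidatorService` may differ. One illustrative choice: a single space directly before a removed label is removed with it, so that `high [S7].` becomes `high.` rather than `high .`.

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: remove [Sx] / [Fx] citations that were never offered to the model. */
final class CitationValidator {

    // An optional single leading space, then a label such as [S1] or [F12].
    // Group 1 captures the label without brackets, e.g. "S1".
    private static final Pattern LABEL = Pattern.compile(" ?\\[([SF]\\d+)\\]");

    /**
     * @param answer      generated answer text
     * @param knownLabels labels present in the prompt, brackets included
     * @return the answer with unknown labels removed and known ones untouched
     */
    static String stripUnknown(String answer, Set<String> knownLabels) {
        Matcher m = LABEL.matcher(answer);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String label = "[" + m.group(1) + "]";
            // Keep the whole match (including its leading space) for known labels,
            // drop it entirely for unknown ones.
            String replacement = knownLabels.contains(label) ? m.group() : "";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```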
---
## Decision 3: Frontend Source Display
**Decision**: No frontend change needed for MVP. With Decision 2 in place, the backend strips hallucinated citations via `CitationValidatorService` before saving the message. The `sources` list attached to the message is already the retrieved set, so the existing `ChatMessage.vue` source panel continues to show all retrieved sources. That is correct, because all of them were supplied to the model as context.
**Rationale**: KISS. The critical correctness fix (no hallucinated citations in answer text) is fully backend-side. The sources panel shows context that was genuinely available to the model — that is accurate and useful. Linking specific claims to specific sources (inline highlighting) is a UX enhancement that can be a follow-on feature.
**Alternatives considered**:
- *Inline citation links*: Highlight `[S1]` in answer text and link to the source card. Better UX but requires markdown parsing and component changes — deferred.
- *Show only cited sources*: Filter `sources` to only those actually cited in the answer. Marginally more accurate but the uncited context is still legitimately retrieved — deferred.
---
## Decision 4: Persisting Topic Summaries
**Decision**: Add a new `topic_summary` table that stores each generated summary. Summaries are numbered sequentially per topic (summary #1, #2, …). The POST endpoint continues to generate and now also persists. A new GET endpoint lists saved summaries for a topic.
**Rationale**: The simplest approach: one new table, one new repository, minimal changes to the existing service. No caching layer, no event system. The sequential number is computed at insert time as the topic's existing summary count plus one and stored in `summaryNumber` (see the implementation note), so reads need no window function such as `ROW_NUMBER`.
**Alternatives considered**:
- *Store in frontend state only (session memory)*: Already the current state — summaries are lost on reload. Rejected because the user explicitly wants persistence.
- *Store summary_number as a persisted column*: Avoids a query-time count but risks gaps or duplicates on concurrent writes. Accepted for the POC: with single-user use, assigning `SELECT COUNT(*) + 1` at insert time is sufficient.
- *Versioning / soft-delete*: Overkill for a POC where the user just wants to re-read old summaries. No delete endpoint needed initially.
**Implementation note**:
- New Flyway migration `V6__topic_summary.sql`.
- `TopicSummaryEntity` (JPA `@Entity` → table `topic_summary`): `id` (UUID), `topicId` (VARCHAR FK), `summaryNumber` (INT), `summary` (TEXT), `sourcesJson` (TEXT — JSON array), `generatedAt` (TIMESTAMP).
- `summaryNumber` is set at insert time to `SELECT COUNT(*) + 1 FROM topic_summary WHERE topic_id = ?`.
- `TopicSummaryRepository extends JpaRepository<TopicSummaryEntity, UUID>` with a `findByTopicIdOrderBySummaryNumberAsc` query.
- `TopicSummaryService.generateSummary()` saves the result before returning it.
- `TopicController` gains `GET /api/v1/topics/{id}/summaries` returning `List<SavedSummaryResponse>` (id, summaryNumber, generatedAt — no full text, for list efficiency) and `GET /api/v1/topics/{id}/summaries/{summaryId}` for the full detail.
- Frontend: when a topic card is clicked, fetch the summary list. Show "Summary #1", "Summary #2" chips + "Generate New" button. Clicking a chip loads the full summary.
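
To make the numbering and list semantics concrete, here is a small in-memory stand-in for the repository. It is only a sketch: `StoredSummary`, `SummaryListItem`, and `InMemorySummaryStore` are hypothetical names, the `sourcesJson` field is omitted for brevity, and the real implementation would go through JPA and the `topic_summary` table rather than a list.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.UUID;

// Hypothetical row shape, mirroring the entity fields listed above (minus sourcesJson).
record StoredSummary(UUID id, String topicId, int summaryNumber, String summary, Instant generatedAt) {}

// List projection without the full text, as described for the GET list endpoint.
record SummaryListItem(UUID id, int summaryNumber, Instant generatedAt) {}

final class InMemorySummaryStore {
    private final List<StoredSummary> rows = new ArrayList<>();

    /** Assigns summaryNumber as (existing rows for the topic) + 1, then stores the row. */
    synchronized StoredSummary save(String topicId, String summaryText) {
        int next = (int) rows.stream()
                .filter(r -> r.topicId().equals(topicId))
                .count() + 1;
        StoredSummary row =
                new StoredSummary(UUID.randomUUID(), topicId, next, summaryText, Instant.now());
        rows.add(row);
        return row;
    }

    /** Equivalent in spirit to findByTopicIdOrderBySummaryNumberAsc, minus the text. */
    synchronized List<SummaryListItem> listForTopic(String topicId) {
        return rows.stream()
                .filter(r -> r.topicId().equals(topicId))
                .sorted(Comparator.comparingInt(StoredSummary::summaryNumber))
                .map(r -> new SummaryListItem(r.id(), r.summaryNumber(), r.generatedAt()))
                .toList();
    }
}
```

Saving two summaries for one topic yields numbers 1 and 2, and a different topic starts again at 1. The count-plus-one rule only stays gap-free and duplicate-free while rows are never deleted and writes do not race, which matches the single-user POC assumption stated above.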