enhance rag retrieval + summary
@@ -0,0 +1,71 @@
# Research: RAG Retrieval Quality Improvements
**Branch**: `004-rag-retrieval-quality` | **Date**: 2026-04-06
## Decision 1: Query Expansion Strategy
**Decision**: Single LLM rewrite — ask the model to restate the user's question using clinical/technical terminology before retrieval. Use the rewritten query for vector search instead of (or in addition to) the original.
**Rationale**: This is the simplest approach that directly addresses vocabulary mismatch. A single extra LLM call rewrites the query into the language of the documentation (clinical terms), so the embedding similarity search is more likely to match the relevant sections. No new dependencies, no index changes.
**Alternatives considered**:
- *HyDE (Hypothetical Document Embeddings)*: Ask the model to write a hypothetical answer, then embed that. More powerful but adds latency and the answer may itself hallucinate clinical content — rejected for POC.
- *Multi-query retrieval*: Generate 3–5 alternative queries and merge results. Effective but multiplies retrieval calls and deduplication complexity — rejected (KISS).
- *Synonym dictionary*: Pre-built medical thesaurus mapping. No new dependencies but requires maintenance and won't generalise — rejected.
- *Re-ranking*: Run a cross-encoder after retrieval. Addresses ordering, not vocabulary gap — deferred.
**Implementation note**: The rewrite prompt should be a short, focused instruction: "Rewrite the following question using precise medical/surgical terminology as it would appear in a neurosurgery textbook index. Output only the rewritten question." Use the same `ChatClient` bean; no new API client needed.
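
A minimal sketch of the rewrite step, under stated assumptions. The model call is abstracted as a `Function<String, String>` (prompt in, completion out) so the example runs without Spring AI on the classpath; in the service this would presumably be the existing `ChatClient` bean. The class name `QueryRewriter`, the `Question:` prefix, and the fall-back-to-original behaviour are illustrative choices, not part of the decision above.

```java
import java.util.function.Function;

/**
 * Sketch of the single-rewrite query expansion step.
 * The LLM is modelled as a plain Function (an assumption for runnability).
 */
final class QueryRewriter {

    // Instruction text taken from the implementation note.
    static final String REWRITE_INSTRUCTION =
            "Rewrite the following question using precise medical/surgical terminology "
          + "as it would appear in a neurosurgery textbook index. "
          + "Output only the rewritten question.";

    private final Function<String, String> llm;

    QueryRewriter(Function<String, String> llm) {
        this.llm = llm;
    }

    /** Returns the rewritten query, or the original if the call fails or returns nothing. */
    String rewrite(String userQuestion) {
        String prompt = REWRITE_INSTRUCTION + "\n\nQuestion: " + userQuestion;
        try {
            String rewritten = llm.apply(prompt);
            if (rewritten == null || rewritten.isBlank()) {
                return userQuestion;
            }
            return rewritten.strip();
        } catch (RuntimeException e) {
            // Assumed policy: retrieval should still work if the rewrite call fails.
            return userQuestion;
        }
    }
}
```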
---
## Decision 2: Citation Grounding Strategy
**Decision**: Tag each retrieved context item with a short ref-label in the prompt: `[S1]`, `[S2]`, … for sections and `[F1]`, `[F2]`, … for figures. Instruct the model to cite only using those labels. Post-process the generated answer to detect and strip any citation that does not correspond to a known label.
**Rationale**: The simplest way to make citations verifiable — all valid citation targets are enumerated in the prompt itself. Post-processing is a pure string operation; no second LLM call needed for validation.
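
The labelling step could look roughly like the sketch below. `RetrievedItem`, `LabelledContext` and `ContextLabeller` are hypothetical names introduced for illustration; the real retrieval type and the signature of `buildContextPrompt()` are not shown in this document. Labels are assigned in retrieval order, and the label-to-item map is returned alongside the prompt text so that the set of valid citation targets is available for validation afterwards.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical shape of a retrieved item (section or figure). */
record RetrievedItem(String title, int page, String text, boolean figure) {}

/** Prompt text plus the label -> item map that enumerates every valid citation target. */
record LabelledContext(String promptText, Map<String, RetrievedItem> byLabel) {}

final class ContextLabeller {

    /** Assigns S1, S2, ... to sections and F1, F2, ... to figures, in retrieval order. */
    static LabelledContext label(List<RetrievedItem> items) {
        Map<String, RetrievedItem> byLabel = new LinkedHashMap<>();
        StringBuilder prompt = new StringBuilder();
        int sections = 0;
        int figures = 0;
        for (RetrievedItem item : items) {
            String label = item.figure() ? "F" + (++figures) : "S" + (++sections);
            byLabel.put(label, item);
            // Header line in the "[S1] Section Title, p.N" style, followed by the item text.
            prompt.append('[').append(label).append("] ")
                  .append(item.title()).append(", p.").append(item.page()).append('\n')
                  .append(item.text()).append("\n\n");
        }
        return new LabelledContext(prompt.toString(), byLabel);
    }
}
```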
**Alternatives considered**:
- *Ask the model to self-check citations*: Second LLM call to verify each claim. More accurate but doubles cost and latency — rejected for POC.
- *Structured output (JSON)*: Return answer as JSON with claim/source pairs. Most precise but requires significant prompt engineering and frontend changes — deferred.
- *No citation enforcement*: Let the model cite freely and show retrieved sources separately. Already the current state — rejected because it doesn't solve citation hallucination.
**Implementation note**:
- Context sections labelled `[S1] Section Title, p.N` through `[Sk]` in `buildContextPrompt()`.
- Figures labelled `[F1]`, `[F2]` etc.
- System prompt updated: "Cite claims using ONLY the reference labels [S1]…[Sk] and [F1]…[Fj] provided in the context. Do not invent page numbers or section titles."
- `CitationValidatorService` scans generated text for `[Sx]` / `[Fx]` patterns, checks each against the known label set, and removes unknown ones.
- `Message.sources` is already a `List<Map<String,Object>>`, so no schema change is needed. The ref-label is stored alongside each source so the frontend can match a label in the answer text to its source.
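
The validation pass described in the list above can be sketched as a single regex scan. This is an illustration rather than the actual `CitationValidatorService`: the class name `CitationValidator`, the method `stripUnknown`, and the whitespace tidy-up after removal are assumptions. Known labels are passed without brackets (`S1`, `F2`).

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CitationValidator {

    // Matches ref-labels such as [S1] or [F12]; group 1 is the label without brackets.
    private static final Pattern REF_LABEL = Pattern.compile("\\[([SF]\\d+)\\]");

    /** Removes every [Sx]/[Fx] citation whose label is not in knownLabels. */
    static String stripUnknown(String answer, Set<String> knownLabels) {
        Matcher m = REF_LABEL.matcher(answer);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            // Keep known labels verbatim, replace unknown ones with nothing.
            String replacement = knownLabels.contains(m.group(1)) ? m.group() : "";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        // Assumed cosmetic step: collapse the gaps left behind by removed labels.
        return out.toString()
                  .replaceAll(" {2,}", " ")
                  .replaceAll(" ([.,;:])", "$1");
    }
}
```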
---
## Decision 3: Frontend Source Display
**Decision**: No frontend change needed for MVP. With Decision 2 in place, the backend strips hallucinated citations via `CitationValidatorService` before saving the message. The `sources` list attached to the message is already the retrieved set, so the existing `ChatMessage.vue` source panel continues to show all retrieved sources. This is correct, since they were all used as context.
**Rationale**: KISS. The critical correctness fix (no hallucinated citations in answer text) is fully backend-side. The sources panel shows context that was genuinely available to the model — that is accurate and useful. Linking specific claims to specific sources (inline highlighting) is a UX enhancement that can be a follow-on feature.
**Alternatives considered**:
- *Inline citation links*: Highlight `[S1]` in answer text and link to the source card. Better UX but requires markdown parsing and component changes — deferred.
- *Show only cited sources*: Filter `sources` to only those actually cited in the answer. Marginally more accurate but the uncited context is still legitimately retrieved — deferred.
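
If the deferred "show only cited sources" option is picked up later, the filter itself would likely be small. The sketch below assumes each entry in `sources` carries its ref-label under a `refLabel` key; that key name and the class `CitedSourceFilter` are hypothetical, since the document only says the label is stored alongside each source.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CitedSourceFilter {

    // Same label shape as the citation scan: [S1], [F2], ...
    private static final Pattern REF_LABEL = Pattern.compile("\\[([SF]\\d+)\\]");

    /** Keeps only the sources whose "refLabel" entry (hypothetical key) is cited in the answer. */
    static List<Map<String, Object>> citedOnly(String answer, List<Map<String, Object>> sources) {
        Set<String> cited = new LinkedHashSet<>();
        Matcher m = REF_LABEL.matcher(answer);
        while (m.find()) {
            cited.add(m.group(1));
        }
        // Sources without a refLabel never match and are dropped.
        return sources.stream()
                .filter(s -> cited.contains(String.valueOf(s.get("refLabel"))))
                .toList();
    }
}
```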
---
## Decision 4: Persisting Topic Summaries
**Decision**: Add a new `topic_summary` table that stores each generated summary. Summaries are numbered sequentially per topic (summary #1, #2, …). The POST endpoint continues to generate and now also persists. A new GET endpoint lists saved summaries for a topic.
**Rationale**: The simplest approach: one new table, one new repository, minimal changes to the existing service. No caching layer, no event system. The sequential number is assigned at insert time by counting the topic's existing summaries in the repository (see the implementation note), so no sequence or counter table is needed.
**Alternatives considered**:
- *Store in frontend state only (session memory)*: Already the current state — summaries are lost on reload. Rejected because the user explicitly wants persistence.
- *Store `summary_number` as a persisted column*: Avoids a query-time count but risks gaps or duplicates on concurrent writes. Accepted for this POC: with single-user use, a `SELECT COUNT(*) + 1` at insert time is sufficient.
- *Versioning / soft-delete*: Overkill for a POC where the user just wants to re-read old summaries. No delete endpoint needed initially.
**Implementation note**:
- New Flyway migration `V6__topic_summary.sql`.
- `TopicSummaryEntity` (JPA `@Entity` → table `topic_summary`): `id` (UUID), `topicId` (VARCHAR FK), `summaryNumber` (INT), `summary` (TEXT), `sourcesJson` (TEXT — JSON array), `generatedAt` (TIMESTAMP).
- `summaryNumber` is set at insert time to `SELECT COUNT(*) FROM topic_summary WHERE topic_id = ?` plus 1.
- `TopicSummaryRepository extends JpaRepository<TopicSummaryEntity, UUID>` with a `findByTopicIdOrderBySummaryNumberAsc` query.
- `TopicSummaryService.generateSummary()` saves the result before returning it.
- `TopicController` gains `GET /api/v1/topics/{id}/summaries` returning `List<SavedSummaryResponse>` (id, summaryNumber, generatedAt — no full text, for list efficiency) and `GET /api/v1/topics/{id}/summaries/{summaryId}` for the full detail.
- Frontend: when a topic card is clicked, fetch the summary list. Show "Summary #1", "Summary #2" chips + "Generate New" button. Clicking a chip loads the full summary.
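
The numbering and list/detail split from the list above can be illustrated without JPA. In this sketch `InMemoryTopicSummaryStore` is a hypothetical stand-in for `TopicSummaryRepository`, and the `TopicSummary` record mirrors the entity fields only for demonstration. `synchronized` serialises writers inside one JVM; against a real database, a unique constraint on (`topic_id`, `summary_number`) would be one way to make a concurrent duplicate fail instead of being stored silently, though the document does not call for it.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.UUID;

/** Mirrors the fields listed for TopicSummaryEntity. */
record TopicSummary(UUID id, String topicId, int summaryNumber,
                    String summary, String sourcesJson, Instant generatedAt) {}

/** List view without the full text, as described for the GET list endpoint. */
record SavedSummaryResponse(UUID id, int summaryNumber, Instant generatedAt) {}

/** In-memory stand-in for the repository (an assumption, so the sketch runs without JPA). */
final class InMemoryTopicSummaryStore {

    private final List<TopicSummary> rows = new ArrayList<>();

    /** Stores a summary, numbering it as (count of existing rows for the topic) + 1. */
    synchronized TopicSummary save(String topicId, String summary, String sourcesJson) {
        long existing = rows.stream().filter(r -> r.topicId().equals(topicId)).count();
        TopicSummary row = new TopicSummary(UUID.randomUUID(), topicId, (int) existing + 1,
                summary, sourcesJson, Instant.now());
        rows.add(row);
        return row;
    }

    /** Counterpart of findByTopicIdOrderBySummaryNumberAsc, projected to the list view. */
    synchronized List<SavedSummaryResponse> list(String topicId) {
        return rows.stream()
                .filter(r -> r.topicId().equals(topicId))
                .sorted(Comparator.comparingInt(TopicSummary::summaryNumber))
                .map(r -> new SavedSummaryResponse(r.id(), r.summaryNumber(), r.generatedAt()))
                .toList();
    }
}
```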