enhance rag retrieval + summary
@@ -0,0 +1,91 @@
# Data Model: RAG Retrieval Quality + Topic Summary Persistence

**Branch**: `004-rag-retrieval-quality` | **Date**: 2026-04-07

---

## New persistent entity: TopicSummaryEntity

**Table**: `topic_summary`

**Migration**: `V6__topic_summary.sql`

| Column | Type | Notes |
|--------|------|-------|
| `id` | `UUID` PK | `gen_random_uuid()` default |
| `topic_id` | `VARCHAR(100)` NOT NULL | Logical reference to `topic.id` (no DB-level FK, see Constraints) |
| `summary_number` | `INT` NOT NULL | Sequential per topic (1, 2, 3, …). Set at insert time as `(SELECT COUNT(*) FROM topic_summary WHERE topic_id = ?) + 1`. |
| `summary` | `TEXT` NOT NULL | Full Markdown summary text |
| `sources_json` | `TEXT` NOT NULL | JSON array of `SourceReference` objects (same structure as `TopicSummaryResponse.sources`) |
| `generated_at` | `TIMESTAMPTZ` NOT NULL | UTC timestamp of generation |

**Constraints**: No unique constraint on `summary_number`; numbers are assigned sequentially at insert time, which is not concurrency-safe (accepted for the POC). No FK constraint is enforced at DB level because topic ids are static seed data.
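
The `summary_number` rule can be sketched in Java against an in-memory list instead of the database. This is a minimal illustration, not the project's persistence code: `TopicSummaryRow`, `InMemoryTopicSummaryStore`, and `countByTopicId` are hypothetical names, and the list stands in for the `topic_summary` table. Because the count and the insert are separate steps, two concurrent saves for the same topic could receive the same number, which is the limitation noted under Constraints.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Hypothetical in-memory stand-in for one row of the topic_summary table.
record TopicSummaryRow(UUID id, String topicId, int summaryNumber,
                       String summary, String sourcesJson, Instant generatedAt) {}

// Hypothetical store; a real implementation would run SQL against topic_summary.
class InMemoryTopicSummaryStore {
    private final List<TopicSummaryRow> rows = new ArrayList<>();

    // Mirrors COUNT(*) ... WHERE topic_id = ?
    long countByTopicId(String topicId) {
        return rows.stream().filter(r -> r.topicId().equals(topicId)).count();
    }

    // Assigns summary_number = count + 1, then appends the row.
    // Not atomic: concurrent callers could read the same count.
    TopicSummaryRow save(String topicId, String summary, String sourcesJson) {
        int next = (int) countByTopicId(topicId) + 1;
        TopicSummaryRow row = new TopicSummaryRow(
                UUID.randomUUID(), topicId, next, summary, sourcesJson, Instant.now());
        rows.add(row);
        return row;
    }
}

class TopicSummaryNumberingDemo {
    public static void main(String[] args) {
        InMemoryTopicSummaryStore store = new InMemoryTopicSummaryStore();
        // Numbers restart at 1 for each topic id.
        System.out.println(store.save("topic-a", "first", "[]").summaryNumber());
        System.out.println(store.save("topic-a", "second", "[]").summaryNumber());
        System.out.println(store.save("topic-b", "other", "[]").summaryNumber());
    }
}
```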

---

## In-memory objects (new, from RAG quality work)

### ExpandedQuery (value object, not persisted)

Produced by `QueryExpansionService` for each user message.

| Field | Type | Description |
|-------|------|-------------|
| `original` | `String` | The user's literal question |
| `rewritten` | `String` | Clinically rewritten version used for vector search |
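
As a sketch, the two fields map naturally onto a Java record. The null checks and the `unchanged` factory are assumptions added for illustration (the document does not say what happens when no rewrite is available), and the sample strings in `main` are made up.

```java
import java.util.Objects;

// Value object with the two documented fields.
record ExpandedQuery(String original, String rewritten) {
    // Assumption: both fields are required, so reject nulls early.
    ExpandedQuery {
        Objects.requireNonNull(original, "original");
        Objects.requireNonNull(rewritten, "rewritten");
    }

    // Hypothetical fallback: reuse the literal question as the search text.
    static ExpandedQuery unchanged(String original) {
        return new ExpandedQuery(original, original);
    }
}

class ExpandedQueryDemo {
    public static void main(String[] args) {
        ExpandedQuery q = new ExpandedQuery(
                "what causes hydrocephalus?", "etiology of hydrocephalus");
        // The rewritten text is what would be sent to vector search.
        System.out.println(q.rewritten());
    }
}
```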

---

### LabelledContext (value object, not persisted)

Produced by `ChatService.buildContextPrompt()` to track the mapping from ref-labels to source entities.

| Field | Type | Description |
|-------|------|-------------|
| `sectionLabels` | `Map<String, SectionEntity>` | e.g. `{"S1" → SectionEntity, "S2" → SectionEntity}` |
| `figureLabels` | `Map<String, FigureEntity>` | e.g. `{"F1" → FigureEntity}` |
| `promptText` | `String` | The fully formatted context prompt including `[S1]`, `[F1]` tags |
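
One way such an object could be assembled is sketched below. It is not the actual `ChatService.buildContextPrompt()` logic: the `SectionEntity` and `FigureEntity` records are single-field stand-ins for the real entities, `LabelledContextSketch.build` is a hypothetical helper, and the one-line-per-source prompt layout is assumed. Labels are numbered from 1 in list order, matching the `S1`, `S2`, `F1` examples in the table.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins; the real entities carry their own fields.
record SectionEntity(String text) {}
record FigureEntity(String caption) {}

record LabelledContext(Map<String, SectionEntity> sectionLabels,
                       Map<String, FigureEntity> figureLabels,
                       String promptText) {}

class LabelledContextSketch {
    // Labels sections S1..Sn and figures F1..Fm in list order,
    // and emits one "[label] text" line per source into the prompt.
    static LabelledContext build(List<SectionEntity> sections, List<FigureEntity> figures) {
        Map<String, SectionEntity> sectionLabels = new LinkedHashMap<>();
        Map<String, FigureEntity> figureLabels = new LinkedHashMap<>();
        StringBuilder prompt = new StringBuilder();
        for (int i = 0; i < sections.size(); i++) {
            String label = "S" + (i + 1);
            sectionLabels.put(label, sections.get(i));
            prompt.append('[').append(label).append("] ")
                  .append(sections.get(i).text()).append('\n');
        }
        for (int i = 0; i < figures.size(); i++) {
            String label = "F" + (i + 1);
            figureLabels.put(label, figures.get(i));
            prompt.append('[').append(label).append("] ")
                  .append(figures.get(i).caption()).append('\n');
        }
        return new LabelledContext(sectionLabels, figureLabels, prompt.toString());
    }

    public static void main(String[] args) {
        LabelledContext ctx = build(
                List.of(new SectionEntity("alpha"), new SectionEntity("beta")),
                List.of(new FigureEntity("gamma")));
        System.out.print(ctx.promptText());
    }
}
```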

---

## New API DTOs

### SavedSummaryItem (list view, no full text)

```java
record SavedSummaryItem(UUID id, int summaryNumber, Instant generatedAt) {}
```

Used in `GET /api/v1/topics/{id}/summaries` to show the summary history list without transmitting full text.
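
The projection from saved rows to list items might look like the sketch below. `SavedSummary` is a hypothetical, trimmed-down view of a saved record, `SummaryListSketch.toItems` is an illustrative helper, and ordering by `summaryNumber` ascending is an assumption; the document does not specify the list order.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.UUID;

// Hypothetical view of a saved topic_summary row (sources omitted for brevity).
record SavedSummary(UUID id, int summaryNumber, String summary, Instant generatedAt) {}

// List-view DTO as defined in this document: no summary text.
record SavedSummaryItem(UUID id, int summaryNumber, Instant generatedAt) {}

class SummaryListSketch {
    // Drops the full text and orders items by summary number (assumed order).
    static List<SavedSummaryItem> toItems(List<SavedSummary> saved) {
        return saved.stream()
                .sorted(Comparator.comparingInt(SavedSummary::summaryNumber))
                .map(s -> new SavedSummaryItem(s.id(), s.summaryNumber(), s.generatedAt()))
                .toList();
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        List<SavedSummaryItem> items = toItems(List.of(
                new SavedSummary(UUID.randomUUID(), 2, "second", now),
                new SavedSummary(UUID.randomUUID(), 1, "first", now)));
        items.forEach(i -> System.out.println(i.summaryNumber()));
    }
}
```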

### TopicSummaryResponse (existing, extended)

Adds `id` (UUID) and `summaryNumber` (int) fields so the frontend knows which saved record was just created.

---

## Existing entities (unchanged)

| Entity | Table | Change |
|--------|-------|--------|
| `SectionEntity` | `section` | None |
| `FigureEntity` | `figure` | None |
| `Message` | `message` | `sources` field gets `refLabel` key added per entry |
| `ChatSession` | `chat_session` | None |
| `Book` | `book` | None |
| `Topic` | `topic` | None |

---

## Message.sources structure (existing, clarified)

After the RAG quality feature each entry includes `refLabel`:

```json
{
  "type": "TEXT",
  "refLabel": "S1",
  "bookTitle": "Youmans & Winn Neurological Surgery",
  "page": 142,
  "chunkText": "..."
}
```
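
A consumer could use `refLabel` to resolve a tag such as `[S1]` back to its source entry. The sketch below assumes the entries have already been parsed into maps; `RefLabelLookupSketch.findByRefLabel` is a hypothetical helper, and the sample entries in `main` (including the `S2` entry and its page) are made-up data.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

class RefLabelLookupSketch {
    // Returns the first entry whose "refLabel" equals the given label, if any.
    static Optional<Map<String, Object>> findByRefLabel(
            List<Map<String, Object>> sources, String refLabel) {
        return sources.stream()
                .filter(entry -> refLabel.equals(entry.get("refLabel")))
                .findFirst();
    }

    public static void main(String[] args) {
        // Sample data only; real entries also carry bookTitle and chunkText.
        List<Map<String, Object>> sources = List.of(
                Map.of("type", "TEXT", "refLabel", "S1", "page", 142),
                Map.of("type", "TEXT", "refLabel", "S2", "page", 200));
        System.out.println(findByRefLabel(sources, "S2").map(e -> e.get("page")));
    }
}
```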