# Data Model: RAG Retrieval Quality + Topic Summary Persistence
**Branch**: `004-rag-retrieval-quality` | **Date**: 2026-04-07

---
## New persistent entity: TopicSummaryEntity
**Table**: `topic_summary`

**Migration**: `V6__topic_summary.sql`

| Column | Type | Notes |
|--------|------|-------|
| `id` | `UUID` PK | `gen_random_uuid()` default |
| `topic_id` | `VARCHAR(100)` NOT NULL | References `topic.id` (not enforced by a DB-level FK; see Constraints) |
| `summary_number` | `INT` NOT NULL | Sequential per topic (1, 2, 3, …). Set at insert time as `SELECT COUNT(*) + 1 FROM topic_summary WHERE topic_id = ?`. |
| `summary` | `TEXT` NOT NULL | Full markdown summary text |
| `sources_json` | `TEXT` NOT NULL | JSON array of `SourceReference` objects (same structure as `TopicSummaryResponse.sources`) |
| `generated_at` | `TIMESTAMPTZ` NOT NULL | UTC timestamp of generation |

**Constraints**: there is no unique constraint on `summary_number`. Numbering is sequential but not concurrency-safe: two inserts for the same topic that run at the same time can read the same count and receive the same number. This is accepted for the POC. No FK constraint is enforced at the DB level because topic ids are static seed data.

---
## In-memory objects (new, from RAG quality work)
### ExpandedQuery (value object, not persisted)

Produced by `QueryExpansionService` for each user message.

| Field | Type | Description |
|-------|------|-------------|
| `original` | `String` | The user's literal question |
| `rewritten` | `String` | Clinically rewritten version used for vector search |
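A minimal sketch of `ExpandedQuery` as a Java record. The `searchText()` helper and its fallback rule are hypothetical, not part of this spec: it assumes that when the rewrite comes back blank, vector search should use the user's literal question instead. Keeping `original` alongside `rewritten` means the literal wording stays available after expansion.

```java
import java.util.Objects;

/** Value object produced per user message; not persisted. */
record ExpandedQuery(String original, String rewritten) {

    ExpandedQuery {
        // The literal question is always required; the rewrite may be missing.
        Objects.requireNonNull(original, "original");
    }

    /**
     * Hypothetical helper: the text to embed for vector search.
     * Uses the clinical rewrite when present, otherwise the literal question.
     */
    String searchText() {
        return (rewritten == null || rewritten.isBlank()) ? original : rewritten;
    }
}
```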
---

### LabelledContext (value object, not persisted)

Produced by `ChatService.buildContextPrompt()` to track the mapping from ref-labels to source entities.

| Field | Type | Description |
|-------|------|-------------|
| `sectionLabels` | `Map<String, SectionEntity>` | e.g. `{"S1" → SectionEntity, "S2" → SectionEntity}` |
| `figureLabels` | `Map<String, FigureEntity>` | e.g. `{"F1" → FigureEntity}` |
| `promptText` | `String` | The fully formatted context prompt including `[S1]`, `[F1]` tags |
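A minimal sketch of how the labels could be assigned, assuming sections are numbered `S1`, `S2` and so on, and figures `F1`, `F2` and so on, in the order they were retrieved. `SectionEntity` and `FigureEntity` are reduced to hypothetical stand-in records (the `text` and `caption` fields are assumptions), and the prompt layout is illustrative only; the real `buildContextPrompt()` may format blocks differently. `LinkedHashMap` keeps the maps in the same order as the tags in `promptText`.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for the real entities, reduced to the fields used here.
record SectionEntity(String bookTitle, int page, String text) {}
record FigureEntity(String bookTitle, int page, String caption) {}

record LabelledContext(Map<String, SectionEntity> sectionLabels,
                       Map<String, FigureEntity> figureLabels,
                       String promptText) {

    /** Assigns S1..Sn to sections and F1..Fm to figures, in the order given. */
    static LabelledContext build(List<SectionEntity> sections, List<FigureEntity> figures) {
        Map<String, SectionEntity> sectionLabels = new LinkedHashMap<>();
        Map<String, FigureEntity> figureLabels = new LinkedHashMap<>();
        StringBuilder prompt = new StringBuilder();

        for (int i = 0; i < sections.size(); i++) {
            String label = "S" + (i + 1);
            SectionEntity s = sections.get(i);
            sectionLabels.put(label, s);
            // Each block starts with its [label] tag so the model can cite it.
            prompt.append('[').append(label).append("] ")
                  .append(s.bookTitle()).append(", p. ").append(s.page()).append('\n')
                  .append(s.text()).append("\n\n");
        }
        for (int i = 0; i < figures.size(); i++) {
            String label = "F" + (i + 1);
            FigureEntity f = figures.get(i);
            figureLabels.put(label, f);
            prompt.append('[').append(label).append("] Figure, ")
                  .append(f.bookTitle()).append(", p. ").append(f.page()).append('\n')
                  .append(f.caption()).append("\n\n");
        }
        // strip() drops the trailing blank line after the last block.
        return new LabelledContext(sectionLabels, figureLabels, prompt.toString().strip());
    }
}
```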
---

## New API DTOs

### SavedSummaryItem (list view, no full text)

```java
import java.time.Instant;
import java.util.UUID;

// One row in the summary history list; omits the summary text and sources.
record SavedSummaryItem(UUID id, int summaryNumber, Instant generatedAt) {}
```

Used in `GET /api/v1/topics/{id}/summaries` to show the summary history list without transmitting full text.
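A minimal in-memory sketch of the `summary_number` rule from the `topic_summary` table together with the list projection. `StoredSummary`, `SummaryHistory` and their method names are hypothetical, and returning the history newest first is an assumption. `save` numbers each summary as the count of existing rows for the topic plus one, mirroring the insert-time query, and like that query it does no locking, so it is not safe for concurrent saves to the same topic. `list` maps rows to `SavedSummaryItem`, dropping the summary text and sources JSON. The DTO record is repeated so the snippet compiles on its own.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.UUID;

record SavedSummaryItem(UUID id, int summaryNumber, Instant generatedAt) {}

// Hypothetical in-memory stand-in for a topic_summary row.
record StoredSummary(UUID id, String topicId, int summaryNumber,
                     String summary, String sourcesJson, Instant generatedAt) {}

class SummaryHistory {
    private final List<StoredSummary> rows = new ArrayList<>();

    /** Mirrors SELECT COUNT(*) + 1 ... WHERE topic_id = ?; no locking, not concurrency-safe. */
    StoredSummary save(String topicId, String summary, String sourcesJson, Instant generatedAt) {
        long existing = rows.stream().filter(r -> r.topicId().equals(topicId)).count();
        StoredSummary row = new StoredSummary(UUID.randomUUID(), topicId, (int) existing + 1,
                summary, sourcesJson, generatedAt);
        rows.add(row);
        return row;
    }

    /** List view: no summary text or sources, highest summaryNumber first (assumed order). */
    List<SavedSummaryItem> list(String topicId) {
        return rows.stream()
                .filter(r -> r.topicId().equals(topicId))
                .sorted(Comparator.comparingInt(StoredSummary::summaryNumber).reversed())
                .map(r -> new SavedSummaryItem(r.id(), r.summaryNumber(), r.generatedAt()))
                .toList();
    }
}
```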
### TopicSummaryResponse (existing, extended)

Adds `id` (UUID) and `summaryNumber` (int) fields so the frontend knows which saved record was just created.

---

## Existing entities (unchanged)

| Entity | Table | Change |
|--------|-------|--------|
| `SectionEntity` | `section` | None |
| `FigureEntity` | `figure` | None |
| `Message` | `message` | Each entry in the `sources` field gains a `refLabel` key |
| `ChatSession` | `chat_session` | None |
| `Book` | `book` | None |
| `Topic` | `topic` | None |
---

## Message.sources structure (existing, clarified)

After the RAG quality feature, each entry in `Message.sources` includes a `refLabel` key, for example:

```json
{
  "type": "TEXT",
  "refLabel": "S1",
  "bookTitle": "Youmans & Winn Neurological Surgery",
  "page": 142,
  "chunkText": "..."
}
```
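A minimal sketch of one way `refLabel` can be used, assuming the model's answer cites sources with the same `[S1]` / `[F1]` tags that appear in the context prompt. `SourceReference` here is a hypothetical Java record mirroring the JSON fields above (with `page` nullable), and `Citations.cited` is a hypothetical helper that keeps only the entries the answer actually cites, in order of first citation, ignoring tags that match no stored entry. It assumes `refLabel` values are unique within one message, since `Collectors.toMap` throws on duplicate keys.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical Java view of one Message.sources entry.
record SourceReference(String type, String refLabel, String bookTitle, Integer page, String chunkText) {}

final class Citations {
    // Matches tags such as [S1], [S12] or [F3] and captures the label inside the brackets.
    private static final Pattern TAG = Pattern.compile("\\[([SF]\\d+)]");

    private Citations() {}

    /** Returns the sources whose refLabel is cited in the answer, in order of first citation. */
    static List<SourceReference> cited(String answer, List<SourceReference> sources) {
        Map<String, SourceReference> byLabel = sources.stream()
                .collect(Collectors.toMap(SourceReference::refLabel, Function.identity()));
        Set<String> labels = new LinkedHashSet<>(); // de-duplicates, keeps first-seen order
        Matcher m = TAG.matcher(answer);
        while (m.find()) {
            labels.add(m.group(1));
        }
        List<SourceReference> result = new ArrayList<>();
        for (String label : labels) {
            SourceReference ref = byLabel.get(label);
            if (ref != null) { // skip tags with no matching source entry
                result.add(ref);
            }
        }
        return result;
    }
}
```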