# Data Model: RAG Retrieval Quality + Topic Summary Persistence

**Branch**: `004-rag-retrieval-quality` | **Date**: 2026-04-07

---

## New persistent entity: TopicSummaryEntity

**Table**: `topic_summary`

**Migration**: `V6__topic_summary.sql`

| Column | Type | Notes |
|--------|------|-------|
| `id` | `UUID` PK | Defaults to `gen_random_uuid()` |
| `topic_id` | `VARCHAR(100)` NOT NULL | References `topic.id` (not enforced as a DB-level FK; see Constraints) |
| `summary_number` | `INT` NOT NULL | Sequential per topic (1, 2, 3, …). Assigned at insert time as `(SELECT COUNT(*) FROM topic_summary WHERE topic_id = ?) + 1`. |
| `summary` | `TEXT` NOT NULL | Full summary text (Markdown) |
| `sources_json` | `TEXT` NOT NULL | JSON array of `SourceReference` objects (same structure as `TopicSummaryResponse.sources`) |
| `generated_at` | `TIMESTAMPTZ` NOT NULL | UTC timestamp of when the summary was generated |

**Constraints**: There is no unique constraint on `summary_number`. Numbering is sequential but not concurrency-safe: two simultaneous inserts for the same topic can read the same count and receive the same number, which is acceptable for the POC. No foreign-key constraint is enforced at the DB level, because topic ids are static seed data.

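
A minimal in-memory sketch of the `summary_number` rule, assuming Java 16+ for records. `InMemoryTopicSummaryStore` and its methods are hypothetical stand-ins for the real repository, which presumably runs the `COUNT(*)` query against the `topic_summary` table. Because counting and inserting are separate steps, the sketch has the same race described under **Constraints**. It also shows that `COUNT(*) + 1` assumes rows are never deleted: after a delete, the next insert can reuse a number that is still in use.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Hypothetical in-memory stand-in for the topic_summary table (illustration only).
class InMemoryTopicSummaryStore {

    // Mirrors the topic_summary columns; sourcesJson is kept as an opaque JSON string.
    record Row(UUID id, String topicId, int summaryNumber,
               String summary, String sourcesJson, Instant generatedAt) {}

    private final List<Row> rows = new ArrayList<>();

    // Equivalent of: SELECT COUNT(*) FROM topic_summary WHERE topic_id = ?
    long countByTopicId(String topicId) {
        return rows.stream().filter(r -> r.topicId().equals(topicId)).count();
    }

    // summary_number = per-topic count + 1. Not atomic: two concurrent callers
    // could read the same count and insert duplicate numbers.
    Row insert(String topicId, String summary, String sourcesJson) {
        int next = (int) countByTopicId(topicId) + 1;
        Row row = new Row(UUID.randomUUID(), topicId, next,
                summary, sourcesJson, Instant.now()); // Instant is UTC-based
        rows.add(row);
        return row;
    }
}
```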
---
## In-memory objects (new, from RAG quality work)

### ExpandedQuery (value object, not persisted)

Produced by `QueryExpansionService` for each user message.

| Field | Type | Description |
|-------|------|-------------|
| `original` | `String` | The user's literal question |
| `rewritten` | `String` | Clinically rewritten version used for vector search |

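
As a Java record, `ExpandedQuery` could look like the sketch below. The `searchText()` helper and its fallback to the original question when the rewrite is blank are assumptions for illustration; the spec only says that `rewritten` is the text used for vector search.

```java
import java.util.Objects;

// Value object produced by QueryExpansionService (not persisted).
record ExpandedQuery(String original, String rewritten) {

    ExpandedQuery {
        Objects.requireNonNull(original, "original");
    }

    // Text handed to the vector search. Hypothetical fallback: use the
    // literal question if the rewrite step returned nothing usable.
    String searchText() {
        return (rewritten == null || rewritten.isBlank()) ? original : rewritten;
    }
}
```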
---
### LabelledContext (value object, not persisted)

Produced by `ChatService.buildContextPrompt()` to record which reference label (such as `S1` or `F1`) points to which source entity.

| Field | Type | Description |
|-------|------|-------------|
| `sectionLabels` | `Map<String, SectionEntity>` | Label-to-section mapping, e.g. `{"S1" → SectionEntity, "S2" → SectionEntity}` |
| `figureLabels` | `Map<String, FigureEntity>` | Label-to-figure mapping, e.g. `{"F1" → FigureEntity}` |
| `promptText` | `String` | The fully formatted context prompt, including the `[S1]`, `[F1]` tags |

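
One way the labelling could work is sketched below. Sections are numbered `S1`, `S2` and so on, and figures `F1`, `F2` and so on, in retrieval order. The `SectionEntity` and `FigureEntity` records are hypothetical minimal stand-ins, and the prompt layout (label, book title, page, text) is an assumption. Only the labelling scheme comes from the table above. `LinkedHashMap` keeps the labels in the same order as they appear in the prompt.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal stand-ins; the real entities carry more fields.
record SectionEntity(String bookTitle, int page, String text) {}
record FigureEntity(String caption, int page) {}

record LabelledContext(Map<String, SectionEntity> sectionLabels,
                       Map<String, FigureEntity> figureLabels,
                       String promptText) {

    // Assigns S1..Sn to sections and F1..Fm to figures, and renders each
    // one as a tagged block of the context prompt.
    static LabelledContext build(List<SectionEntity> sections, List<FigureEntity> figures) {
        Map<String, SectionEntity> sectionLabels = new LinkedHashMap<>();
        Map<String, FigureEntity> figureLabels = new LinkedHashMap<>();
        StringBuilder prompt = new StringBuilder();

        for (int i = 0; i < sections.size(); i++) {
            String label = "S" + (i + 1);
            SectionEntity s = sections.get(i);
            sectionLabels.put(label, s);
            prompt.append('[').append(label).append("] ")
                  .append(s.bookTitle()).append(", p. ").append(s.page()).append('\n')
                  .append(s.text()).append("\n\n");
        }
        for (int i = 0; i < figures.size(); i++) {
            String label = "F" + (i + 1);
            FigureEntity f = figures.get(i);
            figureLabels.put(label, f);
            prompt.append('[').append(label).append("] Figure, p. ")
                  .append(f.page()).append(": ").append(f.caption()).append("\n\n");
        }
        return new LabelledContext(sectionLabels, figureLabels, prompt.toString());
    }
}
```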
---
## New API DTOs

### SavedSummaryItem (list view, no full text)

```java
import java.time.Instant;
import java.util.UUID;
record SavedSummaryItem(UUID id, int summaryNumber, Instant generatedAt) {}
```

Used by `GET /api/v1/topics/{id}/summaries` to list the summary history without transmitting the full summary text.
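
A sketch of how stored rows might be reduced to `SavedSummaryItem` entries for the history endpoint. `StoredTopicSummary` and `SummaryHistory.toListItems` are hypothetical names, and sorting by ascending `summaryNumber` is an assumption, since the spec does not state the list order. Requires Java 16+ (records, `Stream.toList()`).

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.UUID;

record SavedSummaryItem(UUID id, int summaryNumber, Instant generatedAt) {}

// Hypothetical full row as loaded from topic_summary (sources omitted for brevity).
record StoredTopicSummary(UUID id, String topicId, int summaryNumber,
                          String summary, Instant generatedAt) {}

final class SummaryHistory {
    private SummaryHistory() {}

    // Drops the summary text so the list response carries only id, number and timestamp.
    static List<SavedSummaryItem> toListItems(List<StoredTopicSummary> rows) {
        return rows.stream()
                .sorted(Comparator.comparingInt(StoredTopicSummary::summaryNumber))
                .map(r -> new SavedSummaryItem(r.id(), r.summaryNumber(), r.generatedAt()))
                .toList();
    }
}
```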
### TopicSummaryResponse (existing, extended)

Adds `id` (`UUID`) and `summaryNumber` (`int`) fields so the frontend knows which saved record was just created.

---
## Existing entities (unchanged except `Message.sources`)

| Entity | Table | Change |
|--------|-------|--------|
| `SectionEntity` | `section` | None |
| `FigureEntity` | `figure` | None |
| `Message` | `message` | Each entry in `sources` gains a `refLabel` key |
| `ChatSession` | `chat_session` | None |
| `Book` | `book` | None |
| `Topic` | `topic` | None |

---
## Message.sources structure (existing, clarified)

With the RAG quality feature, each entry also includes `refLabel`, the label under which that source appeared in the context prompt:

```json
{
  "type": "TEXT",
  "refLabel": "S1",
  "bookTitle": "Youmans & Winn Neurological Surgery",
  "page": 142,
  "chunkText": "..."
}
```
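
The `refLabel` key presumably lets `[S1]`-style tags in an answer be matched back to their `sources` entries. The sketch below assumes the model reproduces tags exactly as they appear in the prompt. `MessageSourceEntry` and `SourceLookup` are hypothetical names, with fields mirroring the JSON keys above.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical Java view of one Message.sources entry; fields follow the JSON keys.
record MessageSourceEntry(String type, String refLabel, String bookTitle,
                          int page, String chunkText) {}

final class SourceLookup {
    // Matches tags such as [S1] or [F2] and captures the label inside the brackets.
    private static final Pattern REF_TAG = Pattern.compile("\\[([SF]\\d+)\\]");

    private SourceLookup() {}

    // Labels cited in the answer, in first-seen order, without duplicates.
    static Set<String> citedLabels(String answer) {
        Set<String> labels = new LinkedHashSet<>();
        Matcher m = REF_TAG.matcher(answer);
        while (m.find()) {
            labels.add(m.group(1));
        }
        return labels;
    }

    // Finds the sources entry whose refLabel matches the given label, if any.
    static Optional<MessageSourceEntry> byRefLabel(List<MessageSourceEntry> sources, String label) {
        return sources.stream().filter(s -> label.equals(s.refLabel())).findFirst();
    }
}
```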