# Data Model: Neurosurgeon RAG Learning Platform

**Branch**: `001-neuro-rag-learning`
**Date**: 2026-03-31

## Entities

### Book

Represents an uploaded medical textbook.

| Field | Type | Constraints | Notes |
|-------|------|-------------|-------|
| id | UUID | PK, generated | |
| title | VARCHAR(500) | NOT NULL | Extracted from PDF metadata or filename |
| file_name | VARCHAR(500) | NOT NULL | Original upload filename |
| file_size_bytes | BIGINT | NOT NULL | |
| page_count | INT | nullable | Populated after processing |
| status | ENUM | NOT NULL, default `PENDING` | `PENDING`, `PROCESSING`, `READY`, `FAILED`; stored as `VARCHAR(20)` |
| error_message | TEXT | nullable | Populated if status = FAILED |
| uploaded_at | TIMESTAMPTZ | NOT NULL, default now() | |
| processed_at | TIMESTAMPTZ | nullable | Set when embedding completes |

**State machine**:
```
PENDING → PROCESSING → READY
                     ↘ FAILED
```

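The diagram above can be read as a small transition table. The Java sketch below encodes it, assuming that `FAILED` is reached only from `PROCESSING` and that `READY` and `FAILED` are terminal. The `BookStatus` enum and its method names are illustrative and not part of the project.

```java
import java.util.EnumSet;
import java.util.Set;

/** Book lifecycle states with the transitions the diagram appears to allow. */
enum BookStatus {
    PENDING, PROCESSING, READY, FAILED;

    /** Assumption: FAILED is reached only from PROCESSING; READY and FAILED are terminal. */
    Set<BookStatus> allowedNext() {
        switch (this) {
            case PENDING:
                return EnumSet.of(PROCESSING);
            case PROCESSING:
                return EnumSet.of(READY, FAILED);
            default:
                return EnumSet.noneOf(BookStatus.class);
        }
    }

    /** True when moving from this state to {@code next} follows the diagram. */
    boolean canTransitionTo(BookStatus next) {
        return allowedNext().contains(next);
    }
}
```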
**Business rules**:
- Only books in `READY` status are used as RAG sources.
- A `FAILED` book can be deleted and re-uploaded.
- `title` defaults to the filename (without extension) if PDF metadata is absent.

---

### EmbeddingChunk

A semantically coherent segment of a book's content stored as a vector embedding.
Managed by Spring AI's pgvector `VectorStore`; the table is auto-created by Spring AI.

| Field | Type | Notes |
|-------|------|-------|
| id | UUID | PK |
| content | TEXT | Raw text of the chunk (passage or diagram caption + surrounding text) |
| embedding | VECTOR(1536) | pgvector column; dimension matches the embedding model |
| metadata | JSONB | `{ "book_id": "…", "book_title": "…", "page": N, "chunk_type": "text\|diagram" }` |

**Notes**:
- `chunk_type = diagram` means the chunk was derived from a diagram caption and adjacent descriptive text.
- All chunks for a given book are deleted when the book is deleted.
- Spring AI manages this table; direct access is through `VectorStore.similaritySearch(…)`.

---

### Topic

A predefined neurosurgery learning subject. **Not stored in the database** for the POC;
topics are loaded at startup from `backend/src/main/resources/topics.json`.

| Field | Type | Notes |
|-------|------|-------|
| id | String | Slug, e.g., `cerebral-aneurysm` |
| name | String | Display name, e.g., "Cerebral Aneurysm Management" |
| description | String | One-sentence description |
| category | String | Grouping label, e.g., "Vascular", "Oncology" |

**Business rules**:
- Topics are read-only from the application's perspective.
- The project owner edits `topics.json` to add/remove topics; no admin UI is needed.

---

### ChatSession

A conversation thread, optionally associated with a topic.

| Field | Type | Constraints | Notes |
|-------|------|-------------|-------|
| id | UUID | PK, generated | |
| topic_id | VARCHAR(100) | nullable | References a topic slug; null = free-form chat |
| created_at | TIMESTAMPTZ | NOT NULL, default now() | |

---

### Message

A single turn in a chat session.

| Field | Type | Constraints | Notes |
|-------|------|-------------|-------|
| id | UUID | PK, generated | |
| session_id | UUID | NOT NULL, FK → ChatSession.id, ON DELETE CASCADE | |
| role | ENUM | NOT NULL | `USER`, `ASSISTANT`; stored as `VARCHAR(10)` |
| content | TEXT | NOT NULL | |
| sources | JSONB | nullable | Array of `{ "book_title": "…", "page": N }` for ASSISTANT messages |
| created_at | TIMESTAMPTZ | NOT NULL, default now() | |

**Business rules**:
- Messages are ordered by `created_at` ASC within a session.
- `sources` is only populated on `ASSISTANT` messages.
- Deleting a session cascades to delete all its messages.

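The first two Message rules can be expressed directly in application code. The following is a minimal Java sketch under the assumption that messages are held as in-memory records; `MessageRole`, `Source`, `Message`, and `Transcript` are hypothetical names chosen for illustration.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

enum MessageRole { USER, ASSISTANT }

/** One cited passage, mirroring an element of the sources JSONB array. */
record Source(String bookTitle, int page) {}

/** Hypothetical in-memory view of a message row. */
record Message(MessageRole role, String content, List<Source> sources, Instant createdAt) {
    Message {
        // Rule: sources is only populated on ASSISTANT messages.
        if (role == MessageRole.USER && sources != null) {
            throw new IllegalArgumentException("USER messages must not carry sources");
        }
    }
}

final class Transcript {
    private Transcript() {}

    /** Returns a copy of the session's messages ordered by createdAt ascending. */
    static List<Message> ordered(List<Message> messages) {
        List<Message> copy = new ArrayList<>(messages);
        copy.sort(Comparator.comparing(Message::createdAt));
        return copy;
    }
}
```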
---

## Relationships

```
Book (1) ──────────────── (N) EmbeddingChunk
                          (via metadata.book_id)

Topic (config file) ──────── (N) ChatSession [optional association]

ChatSession (1) ──────────── (N) Message
```

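Because the Book to EmbeddingChunk link lives inside the JSONB `metadata` field, it cannot be declared as a foreign key, so removing a book's chunks is presumably an application-level step. The Java sketch below illustrates that link with an in-memory stand-in; `Chunk` and `ChunkLinks` are hypothetical names, and the real rows are managed by Spring AI rather than by code like this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.UUID;

/** Hypothetical in-memory stand-in for a stored chunk. */
record Chunk(UUID id, String content, Map<String, Object> metadata) {}

final class ChunkLinks {
    private ChunkLinks() {}

    /** Builds the metadata object described for EmbeddingChunk. */
    static Map<String, Object> metadataFor(UUID bookId, String bookTitle, int page, String chunkType) {
        return Map.of(
                "book_id", bookId.toString(),
                "book_title", bookTitle,
                "page", page,
                "chunk_type", chunkType);
    }

    /** Returns the chunks that remain after dropping every chunk linked to the given book. */
    static List<Chunk> withoutBook(List<Chunk> chunks, UUID bookId) {
        List<Chunk> kept = new ArrayList<>();
        for (Chunk chunk : chunks) {
            if (!bookId.toString().equals(chunk.metadata().get("book_id"))) {
                kept.add(chunk);
            }
        }
        return kept;
    }
}
```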
---

## Database Schema (DDL summary)

```sql
-- Spring AI creates the vector table automatically.
-- Application-managed tables:
-- gen_random_uuid() is built in from PostgreSQL 13; older versions need the pgcrypto extension.

CREATE TABLE book (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    title           VARCHAR(500) NOT NULL,
    file_name       VARCHAR(500) NOT NULL,
    file_size_bytes BIGINT NOT NULL,
    page_count      INT,
    status          VARCHAR(20) NOT NULL DEFAULT 'PENDING',
    error_message   TEXT,
    uploaded_at     TIMESTAMPTZ NOT NULL DEFAULT now(),
    processed_at    TIMESTAMPTZ
);

CREATE TABLE chat_session (
    id         UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    topic_id   VARCHAR(100),
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE TABLE message (
    id         UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    session_id UUID NOT NULL REFERENCES chat_session(id) ON DELETE CASCADE,
    role       VARCHAR(10) NOT NULL,
    content    TEXT NOT NULL,
    sources    JSONB,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Supports reading a session's messages in created_at order.
CREATE INDEX idx_message_session ON message(session_id, created_at);
```

---

## Validation Rules

| Entity | Field | Rule |
|--------|-------|------|
| Book | status | MUST be one of `PENDING`, `PROCESSING`, `READY`, `FAILED` |
| Book | file_name | MUST end in `.pdf` (case-insensitive) |
| Message | role | MUST be `USER` or `ASSISTANT` |
| EmbeddingChunk | metadata.chunk_type | MUST be `text` or `diagram` |
| ChatSession | topic_id | If non-null, MUST match a known topic slug from `topics.json` |

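Since the DDL stores `status` and `role` as plain `VARCHAR` columns with no check constraints, these rules would most likely be enforced in application code. The Java sketch below covers the three string-based rules; `ModelValidation` and its method names are hypothetical, and the set of known slugs is assumed to come from the topics loaded at startup. The two enumerated fields could be handled by mapping them to Java enums.

```java
import java.util.Locale;
import java.util.Set;

final class ModelValidation {
    private static final Set<String> CHUNK_TYPES = Set.of("text", "diagram");

    private ModelValidation() {}

    /** Book.file_name must end in .pdf, compared case-insensitively. */
    static boolean isPdfFileName(String fileName) {
        return fileName != null && fileName.toLowerCase(Locale.ROOT).endsWith(".pdf");
    }

    /** EmbeddingChunk metadata.chunk_type must be text or diagram. */
    static boolean isChunkType(String chunkType) {
        // Null is checked first because immutable sets reject null lookups.
        return chunkType != null && CHUNK_TYPES.contains(chunkType);
    }

    /** ChatSession.topic_id may be null (free-form chat); otherwise it must be a known slug. */
    static boolean isValidTopicId(String topicId, Set<String> knownSlugs) {
        return topicId == null || knownSlugs.contains(topicId);
    }
}
```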