# Data Model: Neurosurgeon RAG Learning Platform

**Branch**: `001-neuro-rag-learning`

**Date**: 2026-03-31

## Entities

### Book

Represents an uploaded medical textbook.

| Field | Type | Constraints | Notes |
|-------|------|-------------|-------|
| id | UUID | PK, generated | |
| title | VARCHAR(500) | NOT NULL | Extracted from PDF metadata or filename |
| file_name | VARCHAR(500) | NOT NULL | Original upload filename |
| file_size_bytes | BIGINT | NOT NULL | |
| page_count | INT | nullable | Populated after processing |
| status | ENUM | NOT NULL | `PENDING`, `PROCESSING`, `READY`, `FAILED` |
| error_message | TEXT | nullable | Populated if status = FAILED |
| uploaded_at | TIMESTAMPTZ | NOT NULL, default now() | |
| processed_at | TIMESTAMPTZ | nullable | When embedding completed |

**State machine**:

```
PENDING → PROCESSING → READY
                     ↘ FAILED
```

**Business rules**:
- Only books in `READY` status are used as RAG sources.
- A `FAILED` book can be deleted and re-uploaded.
- `title` defaults to the filename (without extension) if PDF metadata is absent.

---

### EmbeddingChunk

A semantically coherent segment of a book's content stored as a vector embedding.
Managed by Spring AI's pgvector `VectorStore`; the table is auto-created by Spring AI.

| Field | Type | Notes |
|-------|------|-------|
| id | UUID | PK |
| content | TEXT | Raw text of the chunk (passage or diagram caption + surrounding text) |
| embedding | VECTOR(1536) | pgvector column; dimension matches the embedding model |
| metadata | JSONB | `{ "book_id": "…", "book_title": "…", "page": N, "chunk_type": "text\|diagram" }` |

**Notes**:
- `chunk_type = diagram` means the chunk was derived from a diagram caption and adjacent descriptive text.
- All chunks for a given book are deleted when the book is deleted.
- Spring AI manages this table; direct access is through `VectorStore.similaritySearch(…)`.

---

### Topic

A predefined neurosurgery learning subject.
**Not stored in the database** for the POC; loaded at startup from `backend/src/main/resources/topics.json`.

| Field | Type | Notes |
|-------|------|-------|
| id | String | Slug, e.g., `cerebral-aneurysm` |
| name | String | Display name, e.g., "Cerebral Aneurysm Management" |
| description | String | One-sentence description |
| category | String | Grouping label, e.g., "Vascular", "Oncology" |

**Business rules**:
- Topics are read-only from the application's perspective.
- The project owner edits `topics.json` to add/remove topics; no admin UI is needed.

---

### ChatSession

A conversation thread, optionally associated with a topic.

| Field | Type | Constraints | Notes |
|-------|------|-------------|-------|
| id | UUID | PK, generated | |
| topic_id | VARCHAR(100) | nullable | References a topic slug; null = free-form chat |
| created_at | TIMESTAMPTZ | NOT NULL, default now() | |

---

### Message

A single turn in a chat session.

| Field | Type | Constraints | Notes |
|-------|------|-------------|-------|
| id | UUID | PK, generated | |
| session_id | UUID | FK → ChatSession.id, ON DELETE CASCADE | |
| role | ENUM | NOT NULL | `USER`, `ASSISTANT` |
| content | TEXT | NOT NULL | |
| sources | JSONB | nullable | Array of `{ "book_title": "…", "page": N }` for ASSISTANT messages |
| created_at | TIMESTAMPTZ | NOT NULL, default now() | |

**Business rules**:
- Messages are ordered by `created_at` ASC within a session.
- `sources` is only populated on `ASSISTANT` messages.
- Deleting a session cascades to delete all its messages.

---

## Relationships

```
Book (1) ──────────────── (N) EmbeddingChunk  (via metadata.book_id)
Topic (config file) ──────── (N) ChatSession  [optional association]
ChatSession (1) ──────────── (N) Message
```

---

## Database Schema (DDL summary)

```sql
-- Spring AI creates the vector table automatically.
-- Application-managed tables:

CREATE TABLE book (
    id               UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    title            VARCHAR(500) NOT NULL,
    file_name        VARCHAR(500) NOT NULL,
    file_size_bytes  BIGINT NOT NULL,
    page_count       INT,
    status           VARCHAR(20) NOT NULL DEFAULT 'PENDING',
    error_message    TEXT,
    uploaded_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
    processed_at     TIMESTAMPTZ
);

CREATE TABLE chat_session (
    id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    topic_id    VARCHAR(100),
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE TABLE message (
    id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    session_id  UUID NOT NULL REFERENCES chat_session(id) ON DELETE CASCADE,
    role        VARCHAR(10) NOT NULL,
    content     TEXT NOT NULL,
    sources     JSONB,
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX idx_message_session ON message(session_id, created_at);
```

---

## Validation Rules

| Entity | Field | Rule |
|--------|-------|------|
| Book | status | MUST be one of `PENDING`, `PROCESSING`, `READY`, `FAILED` |
| Book | file_name | MUST end in `.pdf` (case-insensitive) |
| Message | role | MUST be `USER` or `ASSISTANT` |
| EmbeddingChunk | metadata.chunk_type | MUST be `text` or `diagram` |
| ChatSession | topic_id | If non-null, MUST match a known topic slug from `topics.json` |
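
Because the DDL stores `status` and `role` as plain `VARCHAR` columns, the rules in the table above would most likely be enforced in application code. The following is a minimal sketch of how that might look in plain Java, without any Spring dependency. The class name `DataModelValidation`, its method names, and the idea of passing the known topic slugs in as a `Set<String>` are all hypothetical and not part of the design above. The transition check also assumes that `FAILED` is reachable only from `PROCESSING` and that `READY` and `FAILED` are terminal, which is one reading of the Book state machine diagram.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper: the validation rules above expressed as plain Java checks. */
final class DataModelValidation {

    /** Book lifecycle states, mirroring the values allowed in book.status. */
    enum BookStatus {
        PENDING, PROCESSING, READY, FAILED;

        /** Assumed reading of the state machine: PENDING -> PROCESSING -> READY or FAILED. */
        boolean canTransitionTo(BookStatus next) {
            switch (this) {
                case PENDING:    return next == PROCESSING;
                case PROCESSING: return next == READY || next == FAILED;
                default:         return false; // READY and FAILED treated as terminal
            }
        }
    }

    /** Message roles, mirroring the values allowed in message.role. */
    enum Role { USER, ASSISTANT }

    // Allowed values for metadata.chunk_type on an EmbeddingChunk.
    private static final Set<String> CHUNK_TYPES = Set.of("text", "diagram");

    private DataModelValidation() { }

    /** Book.status MUST be one of the BookStatus constants (exact, case-sensitive match). */
    static boolean isValidBookStatus(String value) {
        return isEnumName(BookStatus.class, value);
    }

    /** Message.role MUST be USER or ASSISTANT. */
    static boolean isValidRole(String value) {
        return isEnumName(Role.class, value);
    }

    /** Book.file_name MUST end in ".pdf", compared case-insensitively. */
    static boolean isValidFileName(String fileName) {
        return fileName != null && fileName.toLowerCase(Locale.ROOT).endsWith(".pdf");
    }

    /** metadata.chunk_type MUST be "text" or "diagram". */
    static boolean isValidChunkType(String value) {
        // Null is checked first because Set.of(...) rejects null lookups.
        return value != null && CHUNK_TYPES.contains(value);
    }

    /** ChatSession.topic_id may be null (free-form chat); otherwise it must be a known slug. */
    static boolean isValidTopicId(String topicId, Set<String> knownSlugs) {
        return topicId == null || knownSlugs.contains(topicId);
    }

    // Returns true when value is exactly the name of a constant of the given enum type.
    private static <E extends Enum<E>> boolean isEnumName(Class<E> type, String value) {
        if (value == null) {
            return false;
        }
        try {
            Enum.valueOf(type, value);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```

A service layer could call, for example, `DataModelValidation.isValidTopicId(request.topicId(), slugsLoadedFromTopicsJson)` before inserting a `chat_session` row, where both arguments are placeholders for whatever the real request and topic-loading code provide. Keeping the checks as static, side-effect-free methods makes them easy to unit test independently of the database.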