Data Model: Neurosurgeon RAG Learning Platform
Branch: 001-neuro-rag-learning
Date: 2026-03-31
Entities
Book
Represents an uploaded medical textbook.
| Field | Type | Constraints | Notes |
|---|---|---|---|
| id | UUID | PK, generated | |
| title | VARCHAR(500) | NOT NULL | Extracted from PDF metadata or filename |
| file_name | VARCHAR(500) | NOT NULL | Original upload filename |
| file_size_bytes | BIGINT | NOT NULL | |
| page_count | INT | nullable | Populated after processing |
| status | ENUM (stored as VARCHAR(20)) | NOT NULL, default PENDING | PENDING, PROCESSING, READY, FAILED |
| error_message | TEXT | nullable | Populated if status = FAILED |
| uploaded_at | TIMESTAMPTZ | NOT NULL, default now() | |
| processed_at | TIMESTAMPTZ | nullable | When embedding completed |
State machine:
```
PENDING → PROCESSING → READY
                   ↘ FAILED
```
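Here is a minimal sketch of how a service layer might enforce this state machine, assuming Java 17 or later. The `BookStatus` enum and its `allowedNext`, `canTransitionTo` and `transitionTo` methods are hypothetical names. The sketch also makes two assumptions. FAILED is reached only from PROCESSING. READY and FAILED are terminal, because a failed book is deleted and re-uploaded rather than retried.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical encoding of the Book status lifecycle:
// PENDING -> PROCESSING -> READY, plus PROCESSING -> FAILED on error.
enum BookStatus {
    PENDING, PROCESSING, READY, FAILED;

    // Allowed next states. READY and FAILED are terminal here
    // (assumption: a FAILED book is deleted and re-uploaded, not retried).
    Set<BookStatus> allowedNext() {
        return switch (this) {
            case PENDING -> EnumSet.of(PROCESSING);
            case PROCESSING -> EnumSet.of(READY, FAILED);
            case READY, FAILED -> EnumSet.noneOf(BookStatus.class);
        };
    }

    boolean canTransitionTo(BookStatus next) {
        return allowedNext().contains(next);
    }

    // Returns the new status, or throws if the move is not in the diagram.
    BookStatus transitionTo(BookStatus next) {
        if (!canTransitionTo(next)) {
            throw new IllegalStateException(this + " -> " + next + " is not allowed");
        }
        return next;
    }
}
```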
Business rules:
- Only books in `READY` status are used as RAG sources.
- A `FAILED` book can be deleted and re-uploaded.
- `title` defaults to the filename (without extension) if PDF metadata is absent.
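The title fallback and the READY-only rule can be written as small pure helpers. This sketch rests on several assumptions:

- `BookRules`, `resolveTitle` and `isRagSource` are hypothetical names.
- The status is passed as its stored string.
- A blank metadata title counts as absent.
- Only the last extension is stripped, so `notes.v2.pdf` becomes `notes.v2`.

```java
// Hypothetical helpers for the Book business rules.
final class BookRules {
    private BookRules() {}

    // Only READY books feed the RAG pipeline.
    static boolean isRagSource(String status) {
        return "READY".equals(status);
    }

    // Prefer the PDF metadata title; a null or blank title is treated as
    // absent (assumption), and the upload filename minus its last
    // extension is used instead.
    static String resolveTitle(String pdfMetadataTitle, String fileName) {
        if (pdfMetadataTitle != null && !pdfMetadataTitle.isBlank()) {
            return pdfMetadataTitle.strip();
        }
        int dot = fileName.lastIndexOf('.');
        // dot > 0 keeps a bare ".pdf" from collapsing to an empty title.
        return dot > 0 ? fileName.substring(0, dot) : fileName;
    }
}
```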
EmbeddingChunk
A semantically coherent segment of a book's content stored as a vector embedding.
Managed by Spring AI's pgvector VectorStore — the table is auto-created by Spring AI.
| Field | Type | Notes |
|---|---|---|
| id | UUID | PK |
| content | TEXT | Raw text of the chunk (passage or diagram caption + surrounding text) |
| embedding | VECTOR(1536) | pgvector column; dimension matches the embedding model |
| metadata | JSONB | { "book_id": "…", "book_title": "…", "page": N, "chunk_type": "text" or "diagram" } |
Notes:
- `chunk_type = diagram` means the chunk was derived from a diagram caption and adjacent descriptive text.
- All chunks for a given book are deleted when the book is deleted.
- Spring AI manages this table. The application reads it through `VectorStore.similaritySearch(…)` rather than querying it directly.
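One way to produce the metadata map for each chunk, matching the JSON shape in the table above. The `ChunkMetadata` class and its `of` and `bookFilter` methods are hypothetical. A map like this could be passed to Spring AI's `Document(String, Map<String, Object>)` constructor. The filter string assumes Spring AI's portable filter-expression syntax. Check that syntax against the Spring AI version in use before relying on it for per-book search or deletion.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Hypothetical builder for the metadata stored alongside each EmbeddingChunk.
final class ChunkMetadata {
    static final Set<String> CHUNK_TYPES = Set.of("text", "diagram");

    private ChunkMetadata() {}

    static Map<String, Object> of(UUID bookId, String bookTitle, int page, String chunkType) {
        // Null check first: Set.of(...).contains(null) throws NullPointerException.
        if (chunkType == null || !CHUNK_TYPES.contains(chunkType)) {
            throw new IllegalArgumentException("chunk_type must be text or diagram: " + chunkType);
        }
        Map<String, Object> m = new LinkedHashMap<>();
        m.put("book_id", bookId.toString()); // string form, so it can be matched in a filter
        m.put("book_title", bookTitle);
        m.put("page", page);
        m.put("chunk_type", chunkType);
        return m;
    }

    // Assumed filter expression selecting one book's chunks via metadata.book_id.
    static String bookFilter(UUID bookId) {
        return "book_id == '" + bookId + "'";
    }
}
```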
Topic
A predefined neurosurgery learning subject. For the POC, topics are not stored in the database. They are loaded at startup from `backend/src/main/resources/topics.json`.
| Field | Type | Notes |
|---|---|---|
| id | String | Slug, e.g., cerebral-aneurysm |
| name | String | Display name, e.g., "Cerebral Aneurysm Management" |
| description | String | One-sentence description |
| category | String | Grouping label, e.g., "Vascular", "Oncology" |
Business rules:
- Topics are read-only from the application's perspective.
- The project owner edits `topics.json` to add or remove topics; no admin UI is needed.
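Topics live in `topics.json` rather than the database, so the application needs an in-memory lookup built once at startup. This sketch assumes the JSON has already been parsed into a list, for example with Jackson (omitted here). `Topic` mirrors the fields above, and `TopicCatalog` is a hypothetical name. Duplicate slugs are rejected at construction, so an editing mistake in `topics.json` fails at startup instead of at lookup time.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

// Mirrors one entry of topics.json.
record Topic(String id, String name, String description, String category) {
    Topic {
        Objects.requireNonNull(id, "id"); // the slug is the lookup key
    }
}

// Hypothetical read-only registry built once from the parsed topics.json.
final class TopicCatalog {
    private final Map<String, Topic> bySlug;

    TopicCatalog(List<Topic> topics) {
        Map<String, Topic> m = new LinkedHashMap<>(); // keeps file order
        for (Topic t : topics) {
            if (m.putIfAbsent(t.id(), t) != null) {
                throw new IllegalArgumentException("Duplicate topic slug: " + t.id());
            }
        }
        this.bySlug = Collections.unmodifiableMap(m);
    }

    Optional<Topic> find(String slug) {
        return Optional.ofNullable(bySlug.get(slug));
    }

    boolean isKnown(String slug) {
        return bySlug.containsKey(slug); // null simply returns false
    }

    // All topics in the order they appear in topics.json.
    List<Topic> all() {
        return List.copyOf(bySlug.values());
    }
}
```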
ChatSession
A conversation thread, optionally associated with a topic.
| Field | Type | Constraints | Notes |
|---|---|---|---|
| id | UUID | PK, generated | |
| topic_id | VARCHAR(100) | nullable | References a topic slug; null = free-form chat |
| created_at | TIMESTAMPTZ | NOT NULL, default now() | |
Message
A single turn in a chat session.
| Field | Type | Constraints | Notes |
|---|---|---|---|
| id | UUID | PK, generated | |
| session_id | UUID | FK → ChatSession.id, ON DELETE CASCADE | |
| role | ENUM (stored as VARCHAR(10)) | NOT NULL | USER, ASSISTANT |
| content | TEXT | NOT NULL | |
| sources | JSONB | nullable | Array of { "book_title": "…", "page": N } for ASSISTANT messages |
| created_at | TIMESTAMPTZ | NOT NULL, default now() | |
Business rules:
- Messages are ordered by `created_at` ascending within a session.
- `sources` is populated only on `ASSISTANT` messages.
- Deleting a session cascades to delete all of its messages.
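The message rules can be enforced when a message is constructed. In this sketch, `Role`, `Source` and `Message` are hypothetical Java types that mirror the table columns:

- A USER message that carries `sources` is rejected.
- An ASSISTANT message may leave `sources` null, because the column is nullable.
- `inSessionOrder` sorts by `createdAt` ascending. Its sort is stable, so messages with equal timestamps keep their input order.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.UUID;

enum Role { USER, ASSISTANT }

// One element of the sources JSONB array: { "book_title": ..., "page": N }.
record Source(String bookTitle, int page) {}

// Hypothetical in-memory view of a message row that enforces the business rules.
record Message(UUID id, UUID sessionId, Role role, String content,
               List<Source> sources, Instant createdAt) {
    Message {
        if (role == Role.USER && sources != null) {
            throw new IllegalArgumentException("sources is only populated on ASSISTANT messages");
        }
        // Defensive copy; null stays null to mirror the nullable column.
        sources = sources == null ? null : List.copyOf(sources);
    }

    // Orders a session's messages by created_at ascending.
    static List<Message> inSessionOrder(List<Message> messages) {
        return messages.stream()
                .sorted(Comparator.comparing(Message::createdAt))
                .toList();
    }
}
```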
Relationships
```
Book (1) ──────────────── (N) EmbeddingChunk
                              (via metadata.book_id)
Topic (config file) ──────── (N) ChatSession   [optional association]
ChatSession (1) ──────────── (N) Message
```
Database Schema (DDL summary)
```sql
-- Spring AI creates the vector table automatically.
-- Application-managed tables:
CREATE TABLE book (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    title           VARCHAR(500) NOT NULL,
    file_name       VARCHAR(500) NOT NULL,
    file_size_bytes BIGINT NOT NULL,
    page_count      INT,
    status          VARCHAR(20) NOT NULL DEFAULT 'PENDING',
    error_message   TEXT,
    uploaded_at     TIMESTAMPTZ NOT NULL DEFAULT now(),
    processed_at    TIMESTAMPTZ
);

CREATE TABLE chat_session (
    id         UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    topic_id   VARCHAR(100),
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE TABLE message (
    id         UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    session_id UUID NOT NULL REFERENCES chat_session(id) ON DELETE CASCADE,
    role       VARCHAR(10) NOT NULL,
    content    TEXT NOT NULL,
    sources    JSONB,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX idx_message_session ON message(session_id, created_at);
```
Validation Rules
| Entity | Field | Rule |
|---|---|---|
| Book | status | MUST be one of PENDING, PROCESSING, READY, FAILED |
| Book | file_name | MUST end in .pdf (case-insensitive) |
| Message | role | MUST be USER or ASSISTANT |
| EmbeddingChunk | metadata.chunk_type | MUST be text or diagram |
| ChatSession | topic_id | If non-null, MUST match a known topic slug from topics.json |
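The validation table maps directly onto a few guard methods. This is only a sketch. The `Validation` class and its method names are hypothetical, and each check throws `IllegalArgumentException` on failure. The set of known topic slugs is assumed to come from the loaded `topics.json`.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical guards for the validation rules table.
final class Validation {
    static final Set<String> BOOK_STATUSES = Set.of("PENDING", "PROCESSING", "READY", "FAILED");
    static final Set<String> ROLES = Set.of("USER", "ASSISTANT");
    static final Set<String> CHUNK_TYPES = Set.of("text", "diagram");

    private Validation() {}

    // Book.status, Message.role, EmbeddingChunk.metadata.chunk_type.
    static void requireOneOf(String field, String value, Set<String> allowed) {
        // Null check first: Set.of(...).contains(null) throws NullPointerException.
        if (value == null || !allowed.contains(value)) {
            throw new IllegalArgumentException(field + " must be one of " + allowed + ", got " + value);
        }
    }

    // Book.file_name: must end in .pdf, compared case-insensitively.
    static void requirePdfFileName(String fileName) {
        if (fileName == null || !fileName.toLowerCase(Locale.ROOT).endsWith(".pdf")) {
            throw new IllegalArgumentException("file_name must end in .pdf: " + fileName);
        }
    }

    // ChatSession.topic_id: null means free-form chat; otherwise it must be a known slug.
    static void requireKnownTopic(String topicId, Set<String> knownSlugs) {
        if (topicId != null && !knownSlugs.contains(topicId)) {
            throw new IllegalArgumentException("Unknown topic slug: " + topicId);
        }
    }
}
```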