plan
specs/001-neuro-rag-learning/data-model.md
# Data Model: Neurosurgeon RAG Learning Platform

**Branch**: `001-neuro-rag-learning`
**Date**: 2026-03-31

## Entities

### Book

Represents an uploaded medical textbook.

| Field | Type | Constraints | Notes |
|-------|------|-------------|-------|
| id | UUID | PK, generated | |
| title | VARCHAR(500) | NOT NULL | Extracted from PDF metadata or filename |
| file_name | VARCHAR(500) | NOT NULL | Original upload filename |
| file_size_bytes | BIGINT | NOT NULL | |
| page_count | INT | nullable | Populated after processing |
| status | ENUM | NOT NULL | `PENDING`, `PROCESSING`, `READY`, `FAILED` |
| error_message | TEXT | nullable | Populated if status = FAILED |
| uploaded_at | TIMESTAMPTZ | NOT NULL, default now() | |
| processed_at | TIMESTAMPTZ | nullable | When embedding completed |

**State machine**:

```
PENDING → PROCESSING → READY
                     ↘ FAILED
```
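
The transitions in the diagram could be enforced in application code along the following lines. This is a minimal sketch, not the project's implementation: the `BookStatus` enum and its method names are hypothetical, and it assumes that `FAILED` is entered only from `PROCESSING` and that `READY` and `FAILED` are terminal.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical enum mirroring the Book status values and the diagram's transitions. */
enum BookStatus {
    PENDING, PROCESSING, READY, FAILED;

    /** States reachable from this one; READY and FAILED are treated as terminal. */
    Set<BookStatus> allowedNext() {
        switch (this) {
            case PENDING:
                return EnumSet.of(PROCESSING);
            case PROCESSING:
                // Assumption: FAILED is reachable only from PROCESSING.
                return EnumSet.of(READY, FAILED);
            default:
                return EnumSet.noneOf(BookStatus.class);
        }
    }

    /** True if moving from this state to {@code next} follows the diagram. */
    boolean canTransitionTo(BookStatus next) {
        return allowedNext().contains(next);
    }
}
```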

**Business rules**:
- Only books in `READY` status are used as RAG sources.
- A `FAILED` book can be deleted and re-uploaded.
- `title` defaults to the filename (without extension) if PDF metadata is absent.
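
The title fallback rule might look like the sketch below. `BookTitles` and `resolveTitle` are hypothetical names, and treating a blank metadata title the same as an absent one is an assumption rather than something the rules above state.

```java
/** Hypothetical helper for the title fallback rule. */
final class BookTitles {
    private BookTitles() {}

    /**
     * Returns the PDF metadata title when present, otherwise the
     * file name without its extension.
     */
    static String resolveTitle(String metadataTitle, String fileName) {
        // Assumption: a blank metadata title counts as absent.
        if (metadataTitle != null && !metadataTitle.trim().isEmpty()) {
            return metadataTitle.trim();
        }
        int dot = fileName.lastIndexOf('.');
        // dot > 0 leaves names without an extension (or starting with a dot) unchanged.
        return dot > 0 ? fileName.substring(0, dot) : fileName;
    }
}
```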

---

### EmbeddingChunk

A semantically coherent segment of a book's content stored as a vector embedding.
Managed by Spring AI's pgvector `VectorStore`; the table is auto-created by Spring AI.

| Field | Type | Notes |
|-------|------|-------|
| id | UUID | PK |
| content | TEXT | Raw text of the chunk (passage or diagram caption + surrounding text) |
| embedding | VECTOR(1536) | pgvector column; dimension matches the embedding model |
| metadata | JSONB | `{ "book_id": "…", "book_title": "…", "page": N, "chunk_type": "text\|diagram" }` |

**Notes**:
- `chunk_type = diagram` means the chunk was derived from a diagram caption and adjacent descriptive text.
- All chunks for a given book are deleted when the book is deleted.
- Spring AI manages this table; direct access is through `VectorStore.similaritySearch(…)`.
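
To make the `metadata` shape concrete, the sketch below builds the four documented keys as a plain `Map` and rejects unknown `chunk_type` values. `ChunkMetadata` is a hypothetical helper using only the JDK; how the map is attached to a Spring AI document is deliberately left out, since that depends on the Spring AI version in use.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

/** Hypothetical builder for the metadata JSON described in the table above. */
final class ChunkMetadata {
    private ChunkMetadata() {}

    static Map<String, Object> of(UUID bookId, String bookTitle, int page, String chunkType) {
        // chunk_type is restricted to the two documented values.
        if (!"text".equals(chunkType) && !"diagram".equals(chunkType)) {
            throw new IllegalArgumentException(
                    "chunk_type must be 'text' or 'diagram': " + chunkType);
        }
        // LinkedHashMap keeps the keys in the order shown in the table.
        Map<String, Object> metadata = new LinkedHashMap<>();
        metadata.put("book_id", bookId.toString());
        metadata.put("book_title", bookTitle);
        metadata.put("page", page);
        metadata.put("chunk_type", chunkType);
        return metadata;
    }
}
```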

---

### Topic

A predefined neurosurgery learning subject. **Not stored in the database** for the POC;
topics are loaded at startup from `backend/src/main/resources/topics.json`.

| Field | Type | Notes |
|-------|------|-------|
| id | String | Slug, e.g., `cerebral-aneurysm` |
| name | String | Display name, e.g., "Cerebral Aneurysm Management" |
| description | String | One-sentence description |
| category | String | Grouping label, e.g., "Vascular", "Oncology" |

**Business rules**:
- Topics are read-only from the application's perspective.
- The project owner edits `topics.json` to add/remove topics; no admin UI is needed.
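
One way to hold the topics in memory once they have been loaded is a read-only catalog keyed by slug. The sketch below is an assumption about structure, not the project's code: `Topic` and `TopicCatalog` are hypothetical classes, JSON parsing is omitted, and `isValidTopicId` reflects the ChatSession convention that a null `topic_id` means free-form chat.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical immutable value object with the four documented fields. */
final class Topic {
    final String id;
    final String name;
    final String description;
    final String category;

    Topic(String id, String name, String description, String category) {
        this.id = id;
        this.name = name;
        this.description = description;
        this.category = category;
    }
}

/** Hypothetical read-only lookup built once at startup. */
final class TopicCatalog {
    private final Map<String, Topic> bySlug;

    TopicCatalog(List<Topic> topics) {
        Map<String, Topic> map = new LinkedHashMap<>();
        for (Topic topic : topics) {
            if (map.put(topic.id, topic) != null) {
                throw new IllegalArgumentException("Duplicate topic slug: " + topic.id);
            }
        }
        // Unmodifiable view: topics are read-only for the application.
        this.bySlug = Collections.unmodifiableMap(map);
    }

    Optional<Topic> find(String slug) {
        return Optional.ofNullable(bySlug.get(slug));
    }

    /** A null topic_id (free-form chat) is valid; otherwise the slug must be known. */
    boolean isValidTopicId(String topicId) {
        return topicId == null || bySlug.containsKey(topicId);
    }
}
```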

---

### ChatSession

A conversation thread, optionally associated with a topic.

| Field | Type | Constraints | Notes |
|-------|------|-------------|-------|
| id | UUID | PK, generated | |
| topic_id | VARCHAR(100) | nullable | References a topic slug; null = free-form chat |
| created_at | TIMESTAMPTZ | NOT NULL, default now() | |

---

### Message

A single turn in a chat session.

| Field | Type | Constraints | Notes |
|-------|------|-------------|-------|
| id | UUID | PK, generated | |
| session_id | UUID | FK → ChatSession.id, ON DELETE CASCADE | |
| role | ENUM | NOT NULL | `USER`, `ASSISTANT` |
| content | TEXT | NOT NULL | |
| sources | JSONB | nullable | Array of `{ "book_title": "…", "page": N }` for ASSISTANT messages |
| created_at | TIMESTAMPTZ | NOT NULL, default now() | |

**Business rules**:
- Messages are ordered by `created_at` ASC within a session.
- `sources` is only populated on `ASSISTANT` messages.
- Deleting a session cascades to delete all its messages.
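
The ordering and `sources` rules can be illustrated in plain Java as below. This is a sketch under assumptions: `Role` and `ChatMessage` are hypothetical types, `sources` is kept as raw JSON text for brevity, and in practice the ordering would more likely come from an `ORDER BY created_at` query than from in-memory sorting.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

enum Role { USER, ASSISTANT }

/** Hypothetical in-memory view of a message row. */
final class ChatMessage {
    final Role role;
    final String content;
    final String sourcesJson; // raw JSON array text, or null
    final Instant createdAt;

    ChatMessage(Role role, String content, String sourcesJson, Instant createdAt) {
        // sources may be set only on ASSISTANT messages.
        if (role == Role.USER && sourcesJson != null) {
            throw new IllegalArgumentException(
                    "sources is only populated on ASSISTANT messages");
        }
        this.role = role;
        this.content = content;
        this.sourcesJson = sourcesJson;
        this.createdAt = createdAt;
    }

    /** Returns a copy of the messages sorted by created_at ascending. */
    static List<ChatMessage> inSessionOrder(List<ChatMessage> messages) {
        List<ChatMessage> sorted = new ArrayList<>(messages);
        sorted.sort(Comparator.comparing((ChatMessage m) -> m.createdAt));
        return sorted;
    }
}
```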

---

## Relationships

```
Book (1) ──────────────── (N) EmbeddingChunk
                               (via metadata.book_id)

Topic (config file) ──────── (N) ChatSession  [optional association]

ChatSession (1) ──────────── (N) Message
```

---

## Database Schema (DDL summary)

```sql
-- Spring AI creates the vector table automatically.
-- Application-managed tables:

CREATE TABLE book (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    title           VARCHAR(500) NOT NULL,
    file_name       VARCHAR(500) NOT NULL,
    file_size_bytes BIGINT NOT NULL,
    page_count      INT,
    status          VARCHAR(20) NOT NULL DEFAULT 'PENDING',
    error_message   TEXT,
    uploaded_at     TIMESTAMPTZ NOT NULL DEFAULT now(),
    processed_at    TIMESTAMPTZ
);

CREATE TABLE chat_session (
    id         UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    topic_id   VARCHAR(100),
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE TABLE message (
    id         UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    session_id UUID NOT NULL REFERENCES chat_session(id) ON DELETE CASCADE,
    role       VARCHAR(10) NOT NULL,
    content    TEXT NOT NULL,
    sources    JSONB,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX idx_message_session ON message(session_id, created_at);
```

---

## Validation Rules

| Entity | Field | Rule |
|--------|-------|------|
| Book | status | MUST be one of `PENDING`, `PROCESSING`, `READY`, `FAILED` |
| Book | file_name | MUST end in `.pdf` (case-insensitive) |
| Message | role | MUST be `USER` or `ASSISTANT` |
| EmbeddingChunk | metadata.chunk_type | MUST be `text` or `diagram` |
| ChatSession | topic_id | If non-null, MUST match a known topic slug from `topics.json` |
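
As an illustration of the `file_name` rule, a case-insensitive extension check could be written as follows. `BookValidation` is a hypothetical helper; rejecting a null file name is an assumption that follows from the column being NOT NULL.

```java
import java.util.Locale;

/** Hypothetical validator for the Book.file_name rule. */
final class BookValidation {
    private BookValidation() {}

    /** True if the name ends in ".pdf", ignoring case. */
    static boolean isPdfFileName(String fileName) {
        // Locale.ROOT avoids locale-specific case mapping surprises.
        return fileName != null
                && fileName.toLowerCase(Locale.ROOT).endsWith(".pdf");
    }
}
```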