first implementation - image/drawing integration

2026-04-04 12:56:56 +02:00
parent fc5b22fba1
commit 5acfdd33c1
42 changed files with 2854 additions and 151 deletions
@@ -0,0 +1,73 @@
+# Embedding & Retrieval Pipeline Checklist: Enhanced Embedding with Image Parsing and Metadata
+
+**Purpose**: Author self-review of embedding pipeline and retrieval requirements quality — validates completeness, clarity, and measurability before implementation tasks are written
+**Created**: 2026-04-03
+**Feature**: [spec.md](../spec.md) | [research.md](../research.md) | [data-model.md](../data-model.md)
+**Focus**: A (Embedding pipeline) + B (Retrieval & ranking) | Depth: Standard | Audience: Author
+
+---
+
+## Requirement Completeness — Embedding Pipeline
+
+- [X] CHK001 - Is the definition of "inspect every page" complete: does the spec cover pages that have no extractable content layer (fully scanned/rasterised pages)? [Completeness, Spec §FR-001, Assumption §6] - Yes
+
+- [X] CHK002 - Does FR-002 define what "independently searchable" means in practice: specifically, is it clear that image chunks must be retrievable without a co-located text chunk? [Clarity, Spec §FR-002] - No image should be retrieved along with linked text.
+
+- [X] CHK003 - Is the minimum acceptable quality of the "descriptive textual representation" (FR-003) specified, e.g., must it include structural relationships, labelled regions, or clinical terms, or is any non-empty description sufficient? [Clarity, Spec §FR-003, Gap] - Any non-empty description is sufficient. The text just below the image should contain the correct clinical term.
+
+- [C] CHK004 - Are the caption-detection rules defined at spec level: specifically, what pattern or signal determines that a piece of text is a caption vs. body text adjacent to an image? [Clarity, Spec §FR-004, Gap] - We assume that text starting with "Fig." followed by a number is the text description of the given image.
+
+- [X] CHK005 - Does FR-004 specify what metadata is stored when a caption is absent — is the caption field omitted, left empty, or populated with a generated substitute? [Completeness, Spec §FR-004] - generated substitute
+
+- [X] CHK006 - Is the "minimum meaningful-content threshold" (FR-007) quantified in the spec, or is it deferred entirely to implementation? The assumption section says "size threshold determined during implementation" — is this intentional and acceptable at the spec level? [Ambiguity, Spec §FR-007, Assumption §3] - Deferred to implementation
+
+- [X] CHK007 - Does FR-008 specify the observable outcome of per-page image failures: specifically, is there a requirement that the book's processing status or error log is accessible to the user or admin after partial failure? [Completeness, Spec §FR-008, Gap] - Online logs
+
+- [X] CHK008 - Is FR-010 ("MUST NOT degrade accuracy or completeness of text-only embedding") measurable: does the spec define a baseline or acceptance criterion against which degradation can be detected? [Measurability, Spec §FR-010, Gap] - Not defined
+
+- [X] CHK009 - Are re-embedding requirements complete — does the spec cover what happens to in-progress queries and cached results while a book is being re-embedded? [Coverage, Assumption §8, Gap] - No need to take that into account.
+
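The CHK004 resolution (a line starting with "Fig." followed by a number is treated as a caption) can be sketched as a small regular-expression helper. This is only an illustration: the class name `CaptionDetector`, the accepted separators (`-` and `.`), and the handling of a trailing `.` or `:` after the label are assumptions, not rules taken from the spec.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: applies the "Fig. <number>" prefix rule from CHK004. */
final class CaptionDetector {

    // Assumption: labels look like "Fig. 3" or "Fig. 12-4"; lettered labels ("Fig. 3A") are not handled.
    private static final Pattern CAPTION_START =
            Pattern.compile("^\\s*(Fig\\.\\s*\\d+(?:[-.]\\d+)*)\\b[.:]?\\s*(.*)$");

    private CaptionDetector() {}

    /** Returns the label (e.g. "Fig. 12-4") if the line starts like a caption. */
    static Optional<String> label(String line) {
        Matcher m = CAPTION_START.matcher(line);
        return m.matches()
                ? Optional.of(m.group(1).replaceAll("\\s+", " "))
                : Optional.empty();
    }

    /** Returns the caption text that follows the label, or an empty string. */
    static String captionText(String line) {
        Matcher m = CAPTION_START.matcher(line);
        return m.matches() ? m.group(2).trim() : "";
    }
}
```

Body text that merely mentions a figure mid-sentence does not match, because the pattern is anchored at the start of the line.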
+---
+
+## Requirement Completeness — Retrieval & Ranking
+
+- [X] CHK010 - Does FR-006 define how image and text chunks are ranked relative to each other — is ranking unified (single score), or are the two modalities ranked independently with separate topK controls? [Clarity, Spec §FR-006, Gap] - independent separated topK
+
+- [X] CHK011 - Is the relevance threshold for figure retrieval specified, i.e., at what similarity score (or other criterion) should a figure be excluded from results? [Clarity, Spec §FR-006, Gap] - Not specified
+
+- [X] CHK012 - Are deduplication rules defined for the case where the same figure appears both in the semantic figure search and the chunk-to-figure reference lookup: which representation wins, or are both included? [Completeness, data-model.md §RetrievalResult, Gap] - Not specified
+
+- [X] CHK013 - Is the requirement for parent section context expansion in the spec — specifically, is there a requirement that the LLM receives the full section text (not just the chunk) when a text chunk is retrieved? [Gap, research.md §Decision 1] - the LLM should receive the full section to have maximum context.
+
+- [X] CHK014 - Does the spec define the required structure of the LLM prompt when both text context and figures are present — or is prompt design left entirely to implementation? [Completeness, Gap] - Left to implementation
+
+- [X] CHK015 - Is SC-002 ("70% recall on image queries") sufficient as a measurability criterion — is the test set composition (10 queries) and evaluation method documented, or does it rely on an undefined manual process? [Measurability, Spec §SC-002] - Manual process.
+
+---
+
+## Scenario Coverage — Edge & Exception Cases
+
+- [X] CHK016 - Does the spec address the scenario where a query is relevant to a book section that has figures but none of those figures rank above the retrieval threshold: is the expected fallback behaviour defined? [Coverage, Edge Case, Gap] - In this case the figure should still be retrieved and shown to the user.
+
+- [X] CHK017 - Is the scenario of a figure retrieved in search results but whose image file is missing from the file store covered: what should the system return to the user in that case? [Coverage, Exception Flow, Gap] - A missing-image error, shown in the frontend as a broken image link.
+
+- [X] CHK018 - Are requirements defined for multi-image pages where images have conflicting captions or share a single composite caption: which image gets the caption, or is it duplicated? [Coverage, Spec §FR-004, Edge Case] - This case does not occur.
+
+---
+
+## Consistency & Alignment
+
+- [X] CHK019 - Are the metadata fields required by FR-004 and FR-005 fully consistent with the metadata schema defined in data-model.md — specifically, do the mandatory fields in the spec match the `type`, `section_id`, and `section_title` fields in the data model? [Consistency, Spec §FR-004, data-model.md §Vector Store Documents] - Left to implementation
+
+- [X] CHK020 - Is SC-003 ("processing time ≤ 3× baseline") consistent with FR-003 — if description generation requires a vision model call per image, is the 3× cap realistic for a 500-page book with dense figures, and is this assumption documented? [Consistency, Spec §SC-003, Assumption §3, Gap] - not documented
+
+- [X] CHK021 - Does the spec's description of citation display (FR-009) align with the `sources` format change documented in contracts/api.md: are the fields the spec says must be "distinct" actually represented distinctly in the API response? [Consistency, Spec §FR-009, contracts/api.md §4] - A section with an image source should be displayed in the frontend. Text sources and image sources are distinct.
+
+---
+
+## Notes
+
+- Items marked `[Gap]` indicate requirements that appear absent or deferred; resolve before generating tasks
+- Items marked `[Ambiguity]` require a clearer definition in the spec before implementation starts
+- Items marked `[Consistency]` should be cross-checked between spec.md, data-model.md, and contracts/api.md
+- Mark items `[x]` when resolved; add inline notes with the resolution for traceability
@@ -0,0 +1,34 @@
+# Specification Quality Checklist: Enhanced Embedding with Image Parsing and Metadata
+
+**Purpose**: Validate specification completeness and quality before proceeding to planning
+**Created**: 2026-04-03
+**Feature**: [spec.md](../spec.md)
+
+## Content Quality
+
+- [x] No implementation details (languages, frameworks, APIs)
+- [x] Focused on user value and business needs
+- [x] Written for non-technical stakeholders
+- [x] All mandatory sections completed
+
+## Requirement Completeness
+
+- [x] No [NEEDS CLARIFICATION] markers remain
+- [x] Requirements are testable and unambiguous
+- [x] Success criteria are measurable
+- [x] Success criteria are technology-agnostic (no implementation details)
+- [x] All acceptance scenarios are defined
+- [x] Edge cases are identified
+- [x] Scope is clearly bounded
+- [x] Dependencies and assumptions identified
+
+## Feature Readiness
+
+- [x] All functional requirements have clear acceptance criteria
+- [x] User scenarios cover primary flows
+- [x] Feature meets measurable outcomes defined in Success Criteria
+- [x] No implementation details leak into specification
+
+## Notes
+
+- All items pass. Spec is ready for `/speckit.clarify` or `/speckit.plan`.
@@ -0,0 +1,172 @@
+# API Contracts: Enhanced Embedding with Image Parsing and Metadata
+
+**Branch**: `002-image-aware-embedding` | **Date**: 2026-04-03  
+**Base path**: `/api/v1`  
+**Auth**: HTTP Basic (existing)
+
+---
+
+## New / Changed Endpoints
+
+### 1. Re-embed a book (new)
+
+Triggers a full re-embedding of an already-processed book, replacing all existing chunks and
+figures with the new image-aware pipeline output. Safe to call on books previously embedded
+by feature 001.
+
+```
+POST /api/v1/books/{id}/reembed
+```
+
+**Path parameters**
+
+| Parameter | Type | Description |
+|-----------|------|-------------|
+| `id` | UUID | Book ID |
+
+**Response** `202 Accepted`
+
+```json
+{ "bookId": "uuid", "status": "PROCESSING" }
+```
+
+**Error responses**
+
+| Status | Condition |
+|--------|-----------|
+| 404 | Book not found |
+| 409 | Book already in PROCESSING state |
+
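A minimal client-side sketch of the re-embed call, using the JDK's `java.net.http` API, might look like the following. The class name `ReembedRequests` and the wording of the status descriptions are hypothetical; only the path, the HTTP Basic auth scheme, and the 202/404/409 codes come from the contract above.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper that builds the re-embed request and interprets its status codes. */
final class ReembedRequests {

    private ReembedRequests() {}

    /** Builds POST {baseUrl}/api/v1/books/{bookId}/reembed with an HTTP Basic header. */
    static HttpRequest reembed(String baseUrl, String bookId, String user, String password) {
        String credentials = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/v1/books/" + bookId + "/reembed"))
                .header("Authorization", "Basic " + credentials)
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    /** Maps the documented status codes to a short description (wording is illustrative). */
    static String describe(int statusCode) {
        switch (statusCode) {
            case 202: return "accepted: book is now PROCESSING";
            case 404: return "book not found";
            case 409: return "book already in PROCESSING state";
            default:  return "unexpected status " + statusCode;
        }
    }
}
```

The request would be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`; a caller would typically treat 409 as "already running" rather than as a hard failure.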
+---
+
+### 2. Get figures for a book (new)
+
+Returns the list of extracted figures for a book, including their type, caption, and image URL.
+Used by the frontend to display a figure gallery or inline figures in chat responses.
+
+```
+GET /api/v1/books/{id}/figures
+```
+
+**Path parameters**
+
+| Parameter | Type | Description |
+|-----------|------|-------------|
+| `id` | UUID | Book ID |
+
+**Response** `200 OK`
+
+```json
+[
+  {
+    "figureId": "youmans-7ed-fig-12-4",
+    "label": "Fig. 12-4",
+    "caption": "Coronal cross-section of the cavernous sinus showing cranial nerve relationships",
+    "figureType": "ANATOMICAL_DIAGRAM",
+    "page": 184,
+    "imageUrl": "/api/v1/figures/550e8400-e29b-41d4-a716-446655440000/youmans-7ed-fig-12-4.png",
+    "sectionId": "youmans-7ed-ch12-s2-3",
+    "sectionTitle": "Cavernous Sinus"
+  }
+]
+```
+
+**Error responses**
+
+| Status | Condition |
+|--------|-----------|
+| 404 | Book not found |
+
+---
+
+### 3. Serve figure image (new)
+
+Serves the extracted figure image file. Mounted as a static resource from the file store.
+
+```
+GET /api/v1/figures/{bookId}/{filename}
+```
+
+**Path parameters**
+
+| Parameter | Type | Description |
+|-----------|------|-------------|
+| `bookId` | UUID | Book ID |
+| `filename` | string | Image filename (e.g. `youmans-7ed-fig-12-4.png`) |
+
+**Response** `200 OK` — binary PNG  
+**Content-Type**: `image/png`
+
+**Error responses**
+
+| Status | Condition |
+|--------|-----------|
+| 404 | Image file not found |
+
+---
+
+### 4. Chat message response — extended source format (changed)
+
+The existing `POST /api/v1/chat/sessions/{id}/messages` endpoint is unchanged in its request
+format. The response `sources` field is extended to include figure references.
+
+**Existing request** (unchanged):
+
+```json
+{ "content": "Describe the anatomy of the cavernous sinus" }
+```
+
+**Response** `200 OK` — extended `sources`:
+
+```json
+{
+  "id": "uuid",
+  "role": "ASSISTANT",
+  "content": "The cavernous sinus is ... [Fig. 12-4, p.184] ...",
+  "sources": [
+    {
+      "type": "TEXT",
+      "bookTitle": "Youmans and Winn Neurological Surgery, 7th Ed.",
+      "page": 184,
+      "chunkText": "The cavernous sinus contains ..."
+    },
+    {
+      "type": "FIGURE",
+      "bookTitle": "Youmans and Winn Neurological Surgery, 7th Ed.",
+      "page": 184,
+      "figureId": "youmans-7ed-fig-12-4",
+      "label": "Fig. 12-4",
+      "caption": "Coronal cross-section of the cavernous sinus ...",
+      "figureType": "ANATOMICAL_DIAGRAM",
+      "imageUrl": "/api/v1/figures/550e8400-e29b-41d4-a716-446655440000/youmans-7ed-fig-12-4.png"
+    }
+  ],
+  "createdAt": "2026-04-03T12:00:00Z"
+}
+```
+
+**Changed fields in `sources` array**:
+
+| Field | Old | New |
+|-------|-----|-----|
+| `type` | absent | `"TEXT"` or `"FIGURE"` |
+| `figureId` | absent | figure ID string (FIGURE type only) |
+| `label` | absent | caption label (FIGURE type only) |
+| `caption` | absent | full caption (FIGURE type only) |
+| `figureType` | absent | enum name (FIGURE type only) |
+| `imageUrl` | absent | image URL (FIGURE type only) |
+
+---
+
+## Unchanged Endpoints
+
+All endpoints from feature 001 remain at their existing paths with no breaking changes:
+
+- `POST /api/v1/books/upload`
+- `GET /api/v1/books`
+- `DELETE /api/v1/books/{id}`
+- `GET /api/v1/topics`
+- `GET /api/v1/topics/{id}/summary`
+- `POST /api/v1/chat/sessions`
+- `GET /api/v1/chat/sessions/{id}/messages`
+- `DELETE /api/v1/chat/sessions/{id}`
@@ -0,0 +1,305 @@
+# Data Model: Enhanced Embedding with Image Parsing and Metadata
+
+**Branch**: `002-image-aware-embedding` | **Date**: 2026-04-03
+
+---
+
+## Overview
+
+Three storage tiers work in concert:
+
+```
+┌──────────────────────────────────────────────────────────────────┐
+│  PDF Upload                                                       │
+│     │                                                             │
+│     ▼                                                             │
+│  Parsing Pipeline                                                 │
+│     │                          │                                  │
+│     ▼                          ▼                                  │
+│  Postgres (source of truth)   pgvector (search index)            │
+│  - book                       - vector_store (text chunks)        │
+│  - chapter                    - vector_store (figure captions)    │
+│  - section (+ fullText)       File Store (images)                 │
+│  - figure (metadata)          - /uploads/figures/{bookId}/*.png  │
+│  - chunk_figure_refs                                              │
+└──────────────────────────────────────────────────────────────────┘
+```
+
+---
+
+## Postgres Schema
+
+### Existing tables (unchanged)
+
+- `book` — status, metadata, page count (V1)
+- `chat_session`, `message` — conversation (V1)
+- `vector_store` — managed by Spring AI pgvector starter (V2)
+- `topic` — predefined topics (V3)
+
+### New tables (Flyway V4)
+
+```sql
+-- V4: Document hierarchy
+
+CREATE TABLE chapter (
+    id           VARCHAR(200) PRIMARY KEY,  -- "{bookId}-ch{N}"
+    book_id      UUID NOT NULL REFERENCES book(id) ON DELETE CASCADE,
+    number       INT NOT NULL,
+    title        VARCHAR(500),
+    page_start   INT,
+    created_at   TIMESTAMPTZ NOT NULL DEFAULT now()
+);
+
+CREATE TABLE section (
+    id           VARCHAR(200) PRIMARY KEY,  -- "{bookId}-ch{N}-s{X}-{Y}"
+    chapter_id   VARCHAR(200) NOT NULL REFERENCES chapter(id) ON DELETE CASCADE,
+    book_id      UUID NOT NULL REFERENCES book(id) ON DELETE CASCADE,
+    number       VARCHAR(50),               -- "2.3" or "12.2.3"
+    title        VARCHAR(500),
+    page_start   INT NOT NULL,
+    page_end     INT NOT NULL,
+    full_text    TEXT NOT NULL,             -- NOT in vector store
+    created_at   TIMESTAMPTZ NOT NULL DEFAULT now()
+);
+
+CREATE INDEX idx_section_book    ON section(book_id);
+CREATE INDEX idx_section_chapter ON section(chapter_id);
+```
+
+### New tables (Flyway V5)
+
+```sql
+-- V5: Figures and chunk→figure links
+
+CREATE TABLE figure (
+    id                    VARCHAR(200) PRIMARY KEY, -- "{bookId}-fig-{label}"
+    book_id               UUID NOT NULL REFERENCES book(id) ON DELETE CASCADE,
+    section_id            VARCHAR(200) REFERENCES section(id) ON DELETE SET NULL,
+    chapter_id            VARCHAR(200) REFERENCES chapter(id) ON DELETE SET NULL,
+    label                 VARCHAR(100),             -- "Fig. 12-4"
+    caption               TEXT,
+    figure_type           VARCHAR(50) NOT NULL,     -- FigureType enum name
+    page                  INT NOT NULL,
+    image_path            VARCHAR(1000) NOT NULL,   -- relative path on disk
+    caption_embedding_id  UUID,                     -- ID in vector_store
+    created_at            TIMESTAMPTZ NOT NULL DEFAULT now()
+);
+
+CREATE TABLE chunk_figure_ref (
+    chunk_id      UUID NOT NULL,         -- vector_store document ID
+    figure_id     VARCHAR(200) NOT NULL REFERENCES figure(id) ON DELETE CASCADE,
+    mention_page  INT,
+    PRIMARY KEY (chunk_id, figure_id)
+);
+
+CREATE INDEX idx_figure_book    ON figure(book_id);
+CREATE INDEX idx_cfr_chunk      ON chunk_figure_ref(chunk_id);
+```
+
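The ID formats noted in the SQL comments (`{bookId}-ch{N}`, `{bookId}-ch{N}-s{X}-{Y}`, `{bookId}-fig-{label}`) could be produced by a helper along these lines. This is a sketch under assumptions: the class name `DocumentIds` is hypothetical, and the first argument is called `bookKey` because the API examples use a readable key such as `youmans-7ed` while `book.id` is a UUID; which of the two is intended is not stated here.

```java
import java.util.Locale;

/** Hypothetical ID builder following the formats in the V4/V5 SQL comments. */
final class DocumentIds {

    private DocumentIds() {}

    /** "{bookKey}-ch{N}" */
    static String chapterId(String bookKey, int chapterNumber) {
        return bookKey + "-ch" + chapterNumber;
    }

    /** "{bookKey}-ch{N}-s{X}-{Y}": dots in the section number become hyphens. */
    static String sectionId(String bookKey, int chapterNumber, String sectionNumber) {
        return chapterId(bookKey, chapterNumber) + "-s" + sectionNumber.replace('.', '-');
    }

    /** "{bookKey}-fig-{label}": the "Fig." prefix is dropped and the rest is slugified. */
    static String figureId(String bookKey, String label) {
        String suffix = label
                .replaceFirst("(?i)^\\s*fig\\.?\\s*", "")   // drop the "Fig." prefix
                .trim()
                .toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9]+", "-")              // any other run becomes one hyphen
                .replaceAll("(^-+|-+$)", "");               // trim stray hyphens
        return bookKey + "-fig-" + suffix;
    }
}
```

With these rules, section "2.3" of chapter 12 and the label "Fig. 12-4" reproduce the `youmans-7ed-ch12-s2-3` and `youmans-7ed-fig-12-4` values used in the API contract examples.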
+---
+
+## Java Domain Records
+
+### Document hierarchy (new package `com.aiteacher.document`)
+
+```java
+import java.util.List;
+import java.util.Map;
+import java.util.UUID;
+
+// Root: in-memory only, not a JPA entity
+public record BookNode(
+    String bookId,
+    String title,
+    String isbn,
+    String edition,
+    List<String> authors,
+    List<ChapterNode> chapters
+) {}
+
+// Chapter — maps to `chapter` table
+public record ChapterNode(
+    String chapterId,
+    String bookId,
+    int number,
+    String title,
+    int pageStart,
+    List<SectionNode> sections
+) {}
+
+// Section — maps to `section` table; fullText stays in Postgres
+public record SectionNode(
+    String sectionId,
+    String chapterId,
+    String bookId,
+    String number,
+    String title,
+    int pageStart,
+    int pageEnd,
+    String fullText,
+    List<TextChunkNode> chunks,
+    List<FigureNode> figures
+) {}
+
+// Text chunk — embedded into vector_store; references its parent section
+public record TextChunkNode(
+    String chunkId,          // UUID → becomes vector_store document ID
+    String sectionId,
+    String chapterId,
+    String bookId,
+    String text,
+    int chunkIndex,
+    int totalChunksInSection,
+    int pageStart,
+    int pageEnd,
+    Map<String, Object> metadata   // flattened for Spring AI filtering
+) {
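+    // sectionTitle is passed in because the record does not hold its parent SectionNode.
+    // Map.of rejects null values, so callers must supply a non-null title.
+    public Map<String, Object> toMetadata(String sectionTitle) {
+        return Map.of(
+            "type",          "TEXT",
+            "book_id",       bookId,
+            "chapter_id",    chapterId,
+            "section_id",    sectionId,
+            "section_title", sectionTitle,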
+            "page_start",    pageStart,
+            "page_end",      pageEnd,
+            "chunk_index",   chunkIndex,
+            "total_chunks",  totalChunksInSection
+        );
+    }
+}
+
+// Figure — maps to `figure` table; caption embedded into vector_store
+public record FigureNode(
+    String figureId,
+    String sectionId,
+    String chapterId,
+    String bookId,
+    String label,            // "Fig. 12-4"
+    String caption,
+    FigureType type,
+    int page,
+    String imagePath,        // relative: "figures/{bookId}/{figureId}.png"
+    UUID captionEmbeddingId  // ID in vector_store
+) {}
+```
+
+### Figure type enum
+
+```java
+public enum FigureType {
+    ANATOMICAL_DIAGRAM,
+    SURGICAL_PHOTOGRAPH,
+    MRI_CT_SCAN,
+    TABLE,
+    CHART,
+    INTRAOPERATIVE_IMAGE
+}
+```
+
+Classification heuristic (applied to caption + surrounding text):
+
+| Keyword(s) | FigureType |
+|-----------|-----------|
+| `MRI`, `CT`, `magnetic`, `resonance`, `tomography` | `MRI_CT_SCAN` |
+| `intraoperative`, `intra-op` | `INTRAOPERATIVE_IMAGE` |
+| `table`, `Table` (at line start) | `TABLE` |
+| `chart`, `graph`, `histogram` | `CHART` |
+| `photograph`, `photo` | `SURGICAL_PHOTOGRAPH` |
+| (default) | `ANATOMICAL_DIAGRAM` |
+
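The keyword table above can be read as an ordered rule list. A minimal sketch follows, assuming (this is not stated in the table) that rules are tried top to bottom with the first match winning, that matching is case-insensitive, and that keywords must match whole words so that `CT` does not fire inside "section". The class name `FigureTypeClassifier` is hypothetical, and the enum is repeated only to keep the example self-contained.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical classifier: first matching keyword rule wins, default is ANATOMICAL_DIAGRAM. */
final class FigureTypeClassifier {

    enum FigureType {
        ANATOMICAL_DIAGRAM, SURGICAL_PHOTOGRAPH, MRI_CT_SCAN, TABLE, CHART, INTRAOPERATIVE_IMAGE
    }

    // LinkedHashMap keeps insertion order, which is the assumed rule priority.
    private static final Map<Pattern, FigureType> RULES = new LinkedHashMap<>();

    static {
        RULES.put(words("mri|ct|magnetic|resonance|tomography"), FigureType.MRI_CT_SCAN);
        RULES.put(words("intraoperative|intra-op"), FigureType.INTRAOPERATIVE_IMAGE);
        RULES.put(words("table"), FigureType.TABLE);
        RULES.put(words("chart|graph|histogram"), FigureType.CHART);
        RULES.put(words("photograph|photo"), FigureType.SURGICAL_PHOTOGRAPH);
    }

    private FigureTypeClassifier() {}

    /** Whole-word, case-insensitive match on any of the alternatives. */
    private static Pattern words(String alternatives) {
        return Pattern.compile("\\b(?:" + alternatives + ")\\b", Pattern.CASE_INSENSITIVE);
    }

    /** Classifies the caption plus surrounding text. */
    static FigureType classify(String text) {
        if (text != null) {
            for (Map.Entry<Pattern, FigureType> rule : RULES.entrySet()) {
                if (rule.getKey().matcher(text).find()) {
                    return rule.getValue();
                }
            }
        }
        return FigureType.ANATOMICAL_DIAGRAM;
    }
}
```

Under the assumed ordering, a caption such as "Intraoperative photograph of the exposure" resolves to `INTRAOPERATIVE_IMAGE`, because that rule precedes the photograph rule.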
+### Chunk–figure join record
+
+```java
+// Maps to `chunk_figure_ref` table
+public record ChunkFigureRef(
+    UUID chunkId,
+    String figureId,
+    Integer mentionPage   // nullable: mention_page has no NOT NULL constraint
+) {}
+```
+
+---
+
+## Vector Store Documents
+
+All documents in `vector_store` carry a `metadata` JSON column with a `type` field for filtering.
+
+### Text chunk document
+
+| Field | Value |
+|-------|-------|
+| `content` | chunk text (400–600 tokens) |
+| `metadata.type` | `"TEXT"` |
+| `metadata.book_id` | book UUID |
+| `metadata.book_title` | book title string |
+| `metadata.chapter_id` | chapter ID string |
+| `metadata.section_id` | section ID string |
+| `metadata.section_title` | section title string |
+| `metadata.page_start` | int |
+| `metadata.page_end` | int |
+| `metadata.chunk_index` | int (0-based) |
+| `metadata.total_chunks` | int |
+
+### Figure caption document
+
+| Field | Value |
+|-------|-------|
+| `content` | vision-generated description + caption text |
+| `metadata.type` | `"FIGURE"` |
+| `metadata.book_id` | book UUID |
+| `metadata.book_title` | book title string |
+| `metadata.chapter_id` | chapter ID string |
+| `metadata.section_id` | section ID string |
+| `metadata.figure_id` | figure ID string |
+| `metadata.figure_type` | enum name string |
+| `metadata.image_path` | relative file path |
+| `metadata.label` | caption label e.g. `"Fig. 12-4"` |
+| `metadata.page` | int |
+
+---
+
+## File Store Layout
+
+```
+uploads/
+└── figures/
+    └── {bookId}/
+        ├── {figureId}.png
+        └── ...
+```
+
+- Base path configurable via `app.figure-storage.base-path` (default: `./uploads`)
+- Files are served via `GET /api/v1/figures/{bookId}/{filename}` (static resource mapping)
+- Gitignored; not version-controlled
+
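Because `bookId` and `filename` arrive as URL path segments, whatever serves these files has to make sure the resolved path stays inside the book's figure directory. The sketch below shows one way to do that with `java.nio.file`; the class name `FigurePaths` and the strict "file must sit directly in `figures/{bookId}/`" rule are assumptions, and the actual static resource mapping may already enforce something equivalent.

```java
import java.nio.file.Path;

/** Hypothetical helper for the file store layout: uploads/figures/{bookId}/{figureId}.png */
final class FigurePaths {

    private FigurePaths() {}

    /** Resolves a figure file under basePath and rejects anything outside figures/{bookId}/. */
    static Path resolve(Path basePath, String bookId, String filename) {
        Path figuresRoot = basePath.resolve("figures").toAbsolutePath().normalize();
        Path bookDir = figuresRoot.resolve(bookId).normalize();
        Path file = bookDir.resolve(filename).normalize();
        // The book directory must be a direct child of figures/, and the file a direct child of it.
        if (!figuresRoot.equals(bookDir.getParent()) || !bookDir.equals(file.getParent())) {
            throw new IllegalArgumentException(
                    "Path escapes the figure store: " + bookId + "/" + filename);
        }
        return file;
    }

    /** Builds the public URL documented in the API contract. */
    static String imageUrl(String bookId, String figureId) {
        return "/api/v1/figures/" + bookId + "/" + figureId + ".png";
    }
}
```

A request for `../../secret.png` normalizes to a location outside the book directory and is rejected before any file access happens.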
+---
+
+## State Transitions
+
+Book processing extends the existing `BookStatus` state machine:
+
+```
+PENDING → PROCESSING → READY
+                    ↘ FAILED
+```
+
+During `PROCESSING`:
+1. Parse PDF structure → extract chapters/sections → persist to Postgres
+2. Split sections into text chunks → embed → write to vector_store
+3. Extract images per page → filter by min size → save PNG → generate vision description → embed caption → write figure to Postgres + vector_store
+4. Write `chunk_figure_ref` rows for all detected figure references in text
+
+Failure at step 3 (individual page) → log + skip that page's images; continue.  
+Failure at any other step → set `BookStatus.FAILED`.
+
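The failure rules above (a failed page in step 3 is skipped, any other failure marks the book `FAILED`) can be sketched as a small control-flow skeleton. Everything here is illustrative: `ProcessingSketch`, `Outcome`, and the functional parameters are hypothetical stand-ins for the real services, and the enum is repeated only so the example compiles on its own.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.IntConsumer;

/** Hypothetical sketch of the PROCESSING steps and their failure handling. */
final class ProcessingSketch {

    enum BookStatus { PENDING, PROCESSING, READY, FAILED }

    /** Final status plus the pages whose images were skipped after a step-3 failure. */
    static final class Outcome {
        final BookStatus status;
        final List<Integer> skippedPages;

        Outcome(BookStatus status, List<Integer> skippedPages) {
            this.status = status;
            this.skippedPages = Collections.unmodifiableList(skippedPages);
        }
    }

    private ProcessingSketch() {}

    static Outcome run(Runnable parseStructure, Runnable embedTextChunks, int pageCount,
                       IntConsumer extractImagesForPage, Runnable writeChunkFigureRefs) {
        List<Integer> skipped = new ArrayList<>();
        try {
            parseStructure.run();                       // step 1
            embedTextChunks.run();                      // step 2
            for (int page = 1; page <= pageCount; page++) {
                try {
                    extractImagesForPage.accept(page);  // step 3, one page at a time
                } catch (RuntimeException e) {
                    skipped.add(page);                  // a real service would log here, then continue
                }
            }
            writeChunkFigureRefs.run();                 // step 4
            return new Outcome(BookStatus.READY, skipped);
        } catch (RuntimeException e) {
            // Any failure outside the per-page image step fails the whole book.
            return new Outcome(BookStatus.FAILED, skipped);
        }
    }
}
```

Keeping the inner `try` scoped to a single page is what lets one corrupt image leave the rest of the book searchable.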
+---
+
+## Retrieval Result Structure
+
+```java
+public record RetrievalResult(
+    List<SectionNode> parentSections,    // expanded full-text context
+    List<Document> figureVectorHits,     // semantic figure matches
+    List<FigureNode> linkedFigures       // figures explicitly referenced in text chunks
+) {}
+```
+
+The `NeurosurgeryRetriever` service deduplicates figures across `figureVectorHits` and
+`linkedFigures` before passing the result to the LLM prompt builder.
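
The deduplication step can be sketched on figure IDs alone. The precedence is an assumption: the checklist records the dedup rule as not specified, so this example simply lets semantic hits come first and drops later repeats. `FigureDedup` and `mergeFigureIds` are hypothetical names, and real code would operate on `Document` and `FigureNode` rather than plain strings.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical dedup helper: merges two figure ID lists, keeping first occurrences in order. */
final class FigureDedup {

    private FigureDedup() {}

    static List<String> mergeFigureIds(List<String> semanticHitIds, List<String> linkedFigureIds) {
        // LinkedHashSet drops duplicates while preserving insertion order.
        Set<String> merged = new LinkedHashSet<>(semanticHitIds);
        merged.addAll(linkedFigureIds);
        return new ArrayList<>(merged);
    }
}
```

A figure found by both the semantic search and the chunk-to-figure lookup therefore appears once, at the position of its semantic hit.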
@@ -0,0 +1,105 @@
+# Implementation Plan: Enhanced Embedding with Image Parsing and Metadata
+
+**Branch**: `002-image-aware-embedding` | **Date**: 2026-04-03 | **Spec**: [spec.md](spec.md)  
+**Input**: Feature specification from `/specs/002-image-aware-embedding/spec.md`
+
+## Summary
+
+Enhance the book embedding pipeline to extract images from every PDF page, generate descriptive
+text for each image, and store all content (text chunks + figure captions) with rich, consistent
+metadata in the vector store. A new document hierarchy (Book → Chapter → Section → TextChunk +
+Figure) is introduced. Postgres holds the full-text sections and figure metadata; the vector
+store holds chunk and figure caption embeddings; the local file store holds extracted image files.
+At query time, both the text-chunk store and figure-caption store are searched in parallel and
+results are merged before being sent to the LLM.
+
+## Technical Context
+
+**Language/Version**: Java 25 (backend), TypeScript / Node 20 (frontend)  
+**Primary Dependencies**: Spring Boot 4.0.5, Spring AI 2.0.0-M4, OpenAI API (embeddings + chat), PDFBox (via Spring AI PDF reader dependency)  
+**Storage**: PostgreSQL (JPA + Flyway), pgvector (Spring AI `VectorStore`), local file system (extracted images — `/uploads/figures/`)  
+**Testing**: Spring Boot Test, JUnit 5, Mockito  
+**Target Platform**: Linux server (Docker Compose)  
+**Project Type**: Web application — backend REST API + Vue 3 frontend  
+**Performance Goals**: Full book (up to 500 pages with images) processed in ≤ 30 minutes; query response unchanged from existing baseline  
+**Constraints**: No new deployable units; all changes within the existing `backend/` module; image storage on local disk (S3 migration is a future concern, behind an interface)  
+**Scale/Scope**: POC — <10 concurrent users; single shared book library
+
+## Constitution Check
+
+*GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.*
+
+| Principle | Status | Notes |
+|-----------|--------|-------|
+| I — KISS | ⚠️ Justified violation — see Complexity Tracking | Hierarchical model + dual search adds complexity; justified by precision requirement |
+| II — Easy to Change | ✅ | Figure storage wrapped behind `FigureStorageService` interface; can swap local disk for S3 |
+| III — Web-First | ✅ | All new capabilities exposed via existing REST API; no new deployable units |
+| IV — Docs as Architecture | ⚠️ Required | README Mermaid diagram MUST be updated in this PR to show new storage tiers |
+
+## Project Structure
+
+### Documentation (this feature)
+
+```text
+specs/002-image-aware-embedding/
+├── plan.md              # This file
+├── research.md          # Phase 0 output
+├── data-model.md        # Phase 1 output
+├── quickstart.md        # Phase 1 output
+├── contracts/           # Phase 1 output
+└── tasks.md             # Phase 2 output (/speckit.tasks)
+```
+
+### Source Code (repository root)
+
+```text
+backend/
+├── src/main/java/com/aiteacher/
+│   ├── book/
+│   │   ├── Book.java                         (existing)
+│   │   ├── BookController.java               (existing)
+│   │   ├── BookService.java                  (existing)
+│   │   ├── BookRepository.java               (existing)
+│   │   ├── BookStatus.java                   (existing)
+│   │   ├── BookEmbeddingService.java         (existing — enhanced)
+│   │   └── NoKnowledgeSourceException.java   (existing)
+│   ├── document/                             (new package)
+│   │   ├── BookNode.java
+│   │   ├── ChapterNode.java
+│   │   ├── SectionNode.java
+│   │   ├── SectionRepository.java
+│   │   ├── TextChunkNode.java
+│   │   ├── FigureNode.java
+│   │   ├── FigureRepository.java
+│   │   ├── FigureType.java
+│   │   ├── ChunkFigureRef.java
+│   │   └── ChunkFigureRefRepository.java
+│   ├── figure/                               (new package)
+│   │   ├── FigureStorageService.java         (interface)
+│   │   └── LocalFigureStorageService.java    (implementation)
+│   ├── retrieval/                            (new package)
+│   │   └── NeurosurgeryRetriever.java
+│   ├── chat/
+│   │   └── ChatService.java                  (updated — uses NeurosurgeryRetriever)
+│   └── config/
+│       └── FigureStorageConfig.java          (new — configures upload dir)
+└── src/main/resources/
+    └── db/migration/
+        ├── V4__document_hierarchy.sql        (new)
+        └── V5__figures_and_refs.sql          (new)
+
+uploads/
+└── figures/                                  (runtime — extracted images; gitignored)
+```
+
+**Structure Decision**: Option 2 (Web Application) confirmed. All backend changes stay within
+`backend/`. Two new packages (`document/`, `retrieval/`) plus one interface package (`figure/`)
+keep concerns separated without adding a deployable unit.
+
+## Complexity Tracking
+
+| Violation | Why Needed | Simpler Alternative Rejected Because |
+|-----------|------------|-------------------------------------|
+| Document hierarchy (BookNode → ChapterNode → SectionNode) | Parent-child retrieval: chunks reference their parent section so the LLM receives full section context, not just the matching fragment. This is the established solution for RAG precision. | Flat page-per-doc model (current) loses inter-sentence context; chunk-only retrieval produces incomplete answers for multi-paragraph clinical questions |
+| Dual vector search (text chunks + figure captions) | Figure captions must be independently searchable — a query about "cavernous sinus anatomy" must surface the diagram even if no text chunk scores highly | Single vector store search would miss figures whose captions don't happen to be the highest-similarity hit; this is the core deliverable of the feature |
+| Third storage tier (local file store for images) | Extracted images cannot live in Postgres (binary blobs degrade query performance) or the vector store (only vectors). A file-per-image approach is standard. | Storing images as base64 in Postgres JSONB would bloat the DB and complicate backup/restore; the `FigureStorageService` interface keeps the implementation swappable |
@@ -0,0 +1,86 @@
+# Quickstart: Enhanced Embedding with Image Parsing and Metadata
+
+**Branch**: `002-image-aware-embedding` | **Date**: 2026-04-03
+
+---
+
+## Prerequisites
+
+- Docker Compose running (PostgreSQL + pgvector)
+- OpenAI API key set in `backend/src/main/resources/application.properties` or as env var `OPENAI_API_KEY`
+- Java 25 + Maven on PATH
+
+---
+
+## New Configuration
+
+Add to `backend/src/main/resources/application.properties`:
+
+```properties
+# Figure storage
+app.figure-storage.base-path=./uploads
+app.figure-storage.min-image-size-px=100
+```
+
+The `uploads/figures/` directory is created automatically on first use. Add it to `.gitignore`.
+
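How `app.figure-storage.min-image-size-px` is applied is left to implementation. One plausible reading, shown purely as an assumption, is that an image is kept only when both its width and its height reach the threshold, which filters out icons, rules, and bullet glyphs. The class and method names are hypothetical.

```java
/** Hypothetical size filter: both dimensions must reach the configured minimum. */
final class ImageSizeFilter {

    private ImageSizeFilter() {}

    static boolean isMeaningful(int widthPx, int heightPx, int minSizePx) {
        return widthPx >= minSizePx && heightPx >= minSizePx;
    }
}
```

An area-based or aspect-ratio-based rule would be an equally valid interpretation of the same property.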
+---
+
+## Database Migration
+
+Two new Flyway migrations run automatically on startup:
+
+- `V4__document_hierarchy.sql` — adds `chapter` and `section` tables
+- `V5__figures_and_refs.sql` — adds `figure` and `chunk_figure_ref` tables
+
+No manual DB setup needed.
+
+---
+
+## Re-embedding Existing Books
+
+Books embedded by feature 001 (text-only) remain functional for text queries. To add image
+support, trigger a re-embed:
+
+```bash
+curl -X POST http://localhost:8080/api/v1/books/{bookId}/reembed \
+  -u admin:password
+```
+
+The book transitions to `PROCESSING`, old chunks and figures are deleted, and the new
+image-aware pipeline runs. Status can be polled via `GET /api/v1/books`.
+
+---
+
+## Verifying Image Extraction
+
+1. Upload a PDF with diagrams: `POST /api/v1/books/upload`
+2. Wait for `status: "READY"` via `GET /api/v1/books`
+3. List figures: `GET /api/v1/books/{id}/figures`; this should return at least one entry for each page that contains an image
+4. Ask a diagram-specific question in chat — response `sources` should include a `type: "FIGURE"` entry
+
+---
+
+## Frontend: Rendering Inline Figures
+
+The assistant message `content` field will contain figure references in the format
+`[Fig. 12-4, p.184]`. The frontend should:
+
+1. Parse `[Fig. X, p.N]` patterns in assistant message text
+2. Look up the matching entry in `sources` where `type === "FIGURE"`
+3. Render the figure inline using the `imageUrl` field
+
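The marker-parsing step is shown here in Java to stay consistent with the backend examples; the frontend would port the same regular expression to TypeScript. It is a sketch only: `FigureCitations` and `Citation` are hypothetical names, and the pattern assumes markers always follow the exact `[Fig. X, p.N]` shape described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for "[Fig. 12-4, p.184]" markers in assistant message content. */
final class FigureCitations {

    /** One parsed marker: the label to match against sources[].label, plus the page. */
    static final class Citation {
        final String label;
        final int page;

        Citation(String label, int page) {
            this.label = label;
            this.page = page;
        }
    }

    // Group 1: label up to the comma; group 2: page number after "p."
    private static final Pattern MARKER =
            Pattern.compile("\\[(Fig\\.\\s*[^,\\]]+?)\\s*,\\s*p\\.\\s*(\\d{1,6})\\]");

    private FigureCitations() {}

    static List<Citation> parse(String content) {
        List<Citation> citations = new ArrayList<>();
        Matcher m = MARKER.matcher(content);
        while (m.find()) {
            citations.add(new Citation(m.group(1), Integer.parseInt(m.group(2))));
        }
        return citations;
    }
}
```

Each parsed label can then be compared with the `label` field of the `sources` entries whose `type` is `"FIGURE"`.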
+---
+
+## Running Tests
+
+```bash
+cd backend
+mvn test
+```
+
+Key new test classes:
+- `FigureExtractionServiceTest` — unit tests for image extraction and classification
+- `NeurosurgeryRetrieverTest` — unit tests for dual-search merge and deduplication
+- `BookEmbeddingServiceIntegrationTest` — integration test: upload PDF with known figures,
+  verify figures appear in `GET /api/v1/books/{id}/figures`
@@ -0,0 +1,188 @@
+# Research: Enhanced Embedding with Image Parsing and Metadata
+
+**Branch**: `002-image-aware-embedding` | **Date**: 2026-04-03
+
+This document resolves all technical unknowns identified during planning. The primary source for
+decisions is the detailed architecture provided directly by the project owner, supplemented by
+Spring AI 2.0.0-M4 API specifics.
+
+---
+
+## Decision 1: Document Hierarchy Model
+
+**Decision**: Adopt a four-level hierarchy — `BookNode` → `ChapterNode` → `SectionNode` →
+`TextChunkNode` + `FigureNode`. The `SectionNode` is the pivotal unit: it holds the full section
+text in Postgres and is used for parent-child context expansion at retrieval time.
+
+**Rationale**: A flat page-per-document model (current implementation) loses structural context.
+When a user asks a multi-faceted clinical question, the LLM needs the surrounding section text,
+not just the matching fragment. Parent-child retrieval — where chunks point to their parent
+section — is the established pattern for RAG precision. The hierarchy also makes figure-to-section
+association explicit and queryable.
+
+**Alternatives considered**:
+- Keep flat page model, add metadata only → rejected: insufficient for precise citation and
+  context expansion
+- Chapter-level retrieval (coarser than section) → rejected: too much irrelevant context sent
+  to LLM; cost and latency increase
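+
+A minimal sketch of the parent-child expansion described above, assuming each retrieved chunk
+carries the id of its parent section. The `SectionText` and `ChunkHit` records and the
+`expandToSections` helper are hypothetical names used for illustration, not the project's types.
+
+```java
+import java.util.ArrayList;
+import java.util.LinkedHashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+// Hypothetical stand-ins for a section row and a vector-search hit.
+record SectionText(String id, String title, String fullText) {}
+record ChunkHit(String chunkId, String sectionId, double score) {}
+
+final class ParentExpansion {
+    // Maps ranked chunk hits to their distinct parent sections, keeping first-hit order.
+    static List<SectionText> expandToSections(List<ChunkHit> hits,
+                                              Map<String, SectionText> sectionsById) {
+        Set<String> sectionIds = new LinkedHashSet<>();
+        for (ChunkHit hit : hits) {
+            sectionIds.add(hit.sectionId());
+        }
+        List<SectionText> sections = new ArrayList<>();
+        for (String id : sectionIds) {
+            SectionText section = sectionsById.get(id);
+            if (section != null) {
+                sections.add(section); // skip ids with no stored section
+            }
+        }
+        return sections;
+    }
+}
+```
+
+Several chunks often share one parent, so collapsing by section id means each section's full text
+is sent to the LLM only once.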
+
+---
+
+## Decision 2: Image Extraction Strategy
+
+**Decision**: Use PDFBox (already on classpath via `spring-ai-pdf-document-reader`) to extract
+images per page. Each image is tagged with `page`, `figure_id` (derived from caption, e.g.
+"Fig. 12-4"), and the parent `sectionId`. Images are saved to local disk under
+`./uploads/figures/{bookId}/`, the default location described in Decision 6.
+
+**Rationale**: PDFBox is already present (Spring AI bundles it). No new dependency needed.
+Per-page extraction ensures every image is captured regardless of PDF structure.
+
+**Alternatives considered**:
+- iText / iText7 → additional commercial dependency; overkill for extraction
+- Screenshot each page as PNG, then OCR → far slower; loses vector quality
+
+---
+
+## Decision 3: Figure Content Representation
+
+**Decision**: Generate a textual description of each extracted image using the OpenAI vision
+model (GPT-4o). This description becomes the `content` field of the figure's vector store
+document. The figure caption (parsed from the surrounding text) is also included to maximise
+retrieval signal.
+
+**Rationale**: Caption-only embedding would miss figures with no caption or with sparse labels.
+Vision-generated descriptions produce richer semantic content (anatomy terms, structural
+relationships) that matches clinical queries. The OpenAI client already in use supports image
+inputs; no additional dependency is required.
+
+**Alternatives considered**:
+- Caption-only embedding → insufficient when captions are absent or terse (common in textbooks)
+- Local vision model (LLaVA) → requires self-hosting; out of scope for POC
+- OCR only → extracts text visible in image but misses non-text visual content (diagrams, MRI)
+
+---
+
+## Decision 4: Dual Vector Search
+
+**Decision**: At query time, run two parallel similarity searches:
+1. Text chunk search (filtered by `type = "TEXT"` and `book_id`)
+2. Figure caption search (filtered by `type = "FIGURE"` and `book_id`)
+
+Results are merged and deduplicated. The LLM prompt receives the expanded parent section text
+plus a structured figure reference list.
+
+**Rationale**: A single search would rank text and figures against each other; figures with
+terse captions would systematically lose to text chunks. Separate searches with independent
+`topK` allow tuning each modality independently.
+
+**Alternatives considered**:
+- Single search, filter by relevance score → figure captions score lower than text; figures
+  are systematically under-retrieved
+- Post-process text results to look up linked figures only → misses figures that are relevant
+  to the query but not explicitly referenced in the retrieved text chunks
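+
+The merge step can be sketched as follows, assuming deduplication is keyed on the figure id and
+that the first occurrence wins. `FigureHit` and `mergeDistinct` are hypothetical names; the two
+input lists stand for the figure search results and any figures gathered by other means.
+
+```java
+import java.util.ArrayList;
+import java.util.LinkedHashMap;
+import java.util.List;
+import java.util.Map;
+
+// Hypothetical minimal view of a figure result.
+record FigureHit(String figureId, String label, int page) {}
+
+final class FigureMerge {
+    // Concatenates both lists and drops later duplicates of the same figureId.
+    static List<FigureHit> mergeDistinct(List<FigureHit> vectorHits, List<FigureHit> linked) {
+        Map<String, FigureHit> byId = new LinkedHashMap<>();
+        for (FigureHit f : vectorHits) {
+            byId.putIfAbsent(f.figureId(), f);
+        }
+        for (FigureHit f : linked) {
+            byId.putIfAbsent(f.figureId(), f);
+        }
+        return new ArrayList<>(byId.values());
+    }
+}
+```
+
+A `LinkedHashMap` keeps insertion order, so figures found by vector search stay ahead of the rest.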
+
+---
+
+## Decision 5: Chunk-to-Figure Linking
+
+**Decision**: During text parsing, whenever a pattern matching `Fig\.\s*\d+[\-\.]\d+` or
+`Figure\s+\d+[\-\.]\d+` is found in a chunk, insert a row into the `chunk_figure_ref` table
+linking `chunkId` → `figureId`. At retrieval time, after text chunks are retrieved, their
+associated figures are fetched from this table and added to the LLM prompt.
+
+**Rationale**: Explicit linking ensures that when a text chunk is retrieved, its referenced
+figures are always surfaced — even if the figure's caption did not score highly in the vector
+search. This is the higher-recall path; dual search (Decision 4) is the higher-precision path.
+
+**Alternatives considered**:
+- Rely entirely on dual vector search → may miss figures referenced in retrieved text but
+  scoring below the topK threshold in the figure search
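+
+A minimal sketch of the reference scan, assuming the two patterns are combined into one
+expression and that whitespace after `Fig.` is optional. `FigureRefScanner` and
+`findFigureNumbers` are hypothetical names; matching the numbers against stored figure labels
+and writing the link rows are left out.
+
+```java
+import java.util.ArrayList;
+import java.util.List;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+final class FigureRefScanner {
+    // Matches "Fig. 12-4", "Fig.12.4" and "Figure 12-4"; group 1 is the figure number.
+    private static final Pattern FIGURE_REF =
+            Pattern.compile("(?:Fig\\.\\s*|Figure\\s+)(\\d+[-.]\\d+)");
+
+    // Returns each distinct figure number mentioned in the chunk, in order of first mention.
+    static List<String> findFigureNumbers(String chunkText) {
+        List<String> numbers = new ArrayList<>();
+        Matcher matcher = FIGURE_REF.matcher(chunkText);
+        while (matcher.find()) {
+            String number = matcher.group(1);
+            if (!numbers.contains(number)) {
+                numbers.add(number);
+            }
+        }
+        return numbers;
+    }
+}
+```
+
+For the text `See Fig. 12-4 and Figure 3.1; Fig. 12-4 again.` this returns `12-4` and `3.1`.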
+
+---
+
+## Decision 6: Image Storage
+
+**Decision**: Extracted images are saved as PNG files to a local directory,
+`${app.figure-storage.base-path}/figures/{bookId}/`, where the base path defaults to `./uploads`.
+The path is stored in `figure.image_path` in Postgres. A `FigureStorageService` interface wraps all disk
+I/O so the implementation can be swapped to S3 or another object store without changing
+callers.
+
+**Rationale**: Local disk is the simplest viable option for a POC with <10 users. The interface
+boundary satisfies Constitution Principle II (Easy to Change).
+
+**Alternatives considered**:
+- S3 from day 1 → operational overhead not justified at POC scale
+- Base64 in Postgres JSONB → bloats DB; complicates backup; query performance degrades
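+
+The interface boundary could look like the sketch below. The three method signatures follow
+task T006; everything else is an assumption: the `SketchLocalFigureStorage` class, the
+`figures/{bookId}/` layout under the base path, and the file-name sanitising rule are
+illustrative only, and path-traversal checks in `resolve` are omitted.
+
+```java
+import java.awt.image.BufferedImage;
+import java.io.IOException;
+import java.io.UncheckedIOException;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.util.Comparator;
+import java.util.UUID;
+import java.util.stream.Stream;
+import javax.imageio.ImageIO;
+
+interface FigureStorageService {
+    Path save(UUID bookId, String figureId, BufferedImage image);
+    Path resolve(UUID bookId, String filename);
+    void delete(UUID bookId);
+}
+
+// Hypothetical local-disk implementation; an S3-backed class could implement the same interface.
+final class SketchLocalFigureStorage implements FigureStorageService {
+    private final Path basePath;
+
+    SketchLocalFigureStorage(Path basePath) {
+        this.basePath = basePath;
+    }
+
+    @Override
+    public Path save(UUID bookId, String figureId, BufferedImage image) {
+        try {
+            Path dir = Files.createDirectories(bookDir(bookId));
+            Path target = dir.resolve(safeName(figureId) + ".png");
+            ImageIO.write(image, "png", target.toFile());
+            return target;
+        } catch (IOException e) {
+            throw new UncheckedIOException(e);
+        }
+    }
+
+    @Override
+    public Path resolve(UUID bookId, String filename) {
+        return bookDir(bookId).resolve(filename);
+    }
+
+    @Override
+    public void delete(UUID bookId) {
+        Path dir = bookDir(bookId);
+        if (!Files.exists(dir)) {
+            return;
+        }
+        // Delete children before parents by walking the tree in reverse order.
+        try (Stream<Path> paths = Files.walk(dir)) {
+            paths.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
+        } catch (IOException e) {
+            throw new UncheckedIOException(e);
+        }
+    }
+
+    private Path bookDir(UUID bookId) {
+        return basePath.resolve("figures").resolve(bookId.toString());
+    }
+
+    // Replaces characters that are unsafe in file names, e.g. "Fig. 12-4" becomes "Fig__12-4".
+    private static String safeName(String figureId) {
+        return figureId.replaceAll("[^A-Za-z0-9_-]", "_");
+    }
+}
+```
+
+Callers depend only on `FigureStorageService`, so swapping the storage backend would not touch
+the extraction or retrieval code.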
+
+---
+
+## Decision 7: Figure Type Classification
+
+**Decision**: Use the enum `FigureType { ANATOMICAL_DIAGRAM, SURGICAL_PHOTOGRAPH, MRI_CT_SCAN,
+TABLE, CHART, INTRAOPERATIVE_IMAGE }`. Classification works as follows:
+1. Match caption keywords ("MRI", "CT", "Fig.", "Table"); this is a heuristic and needs no model
+2. Fall back to `ANATOMICAL_DIAGRAM` if the caption is unclassifiable
+
+**Rationale**: Allows the frontend to render different icon/label per type (e.g., "MRI" badge).
+Heuristic classification avoids a separate model call per image at extraction time.
+
+**Alternatives considered**:
+- Vision model classification → accurate but adds latency and cost per figure; deferrable
+- Single `FIGURE` type → loses citation granularity required by spec FR-004
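+
+A minimal sketch of the keyword heuristic. The enum values come from the decision above; the
+keyword lists for `CHART`, `INTRAOPERATIVE_IMAGE` and `SURGICAL_PHOTOGRAPH`, the check order,
+and the `FigureTypeHeuristic` class are assumptions, since the authoritative keyword table is
+defined elsewhere (data-model.md).
+
+```java
+import java.util.Locale;
+import java.util.regex.Pattern;
+
+enum FigureType {
+    ANATOMICAL_DIAGRAM, SURGICAL_PHOTOGRAPH, MRI_CT_SCAN, TABLE, CHART, INTRAOPERATIVE_IMAGE
+}
+
+final class FigureTypeHeuristic {
+    // Word boundaries stop "ct" from matching inside words such as "direct".
+    private static final Pattern TABLE = Pattern.compile("^\\s*table\\b");
+    private static final Pattern SCAN = Pattern.compile("\\b(mri|ct)\\b");
+    private static final Pattern INTRAOP = Pattern.compile("\\bintraoperative\\b");
+    private static final Pattern PHOTO = Pattern.compile("\\bphotograph\\b");
+    private static final Pattern CHART = Pattern.compile("\\b(chart|graph|plot)\\b");
+
+    // Returns the first matching type, or ANATOMICAL_DIAGRAM when nothing matches.
+    static FigureType classify(String caption) {
+        if (caption == null || caption.isBlank()) {
+            return FigureType.ANATOMICAL_DIAGRAM;
+        }
+        String text = caption.toLowerCase(Locale.ROOT);
+        if (TABLE.matcher(text).find()) return FigureType.TABLE;
+        if (SCAN.matcher(text).find()) return FigureType.MRI_CT_SCAN;
+        if (INTRAOP.matcher(text).find()) return FigureType.INTRAOPERATIVE_IMAGE;
+        if (PHOTO.matcher(text).find()) return FigureType.SURGICAL_PHOTOGRAPH;
+        if (CHART.matcher(text).find()) return FigureType.CHART;
+        return FigureType.ANATOMICAL_DIAGRAM;
+    }
+}
+```
+
+Because the first match wins, a caption such as "Intraoperative CT" is classified as a scan in
+this sketch; a different order would be equally defensible.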
+
+---
+
+## Decision 8: Metadata Schema for Vector Store Documents
+
+**Decision**: All vector store documents carry a flat `Map<String, Object>` of metadata for
+Spring AI filtering. Schema:
+
+| Field | Text Chunk | Figure Chunk |
+|-------|-----------|-------------|
+| `type` | `"TEXT"` | `"FIGURE"` |
+| `book_id` | ✓ | ✓ |
+| `book_title` | ✓ | ✓ |
+| `chapter_id` | ✓ | ✓ |
+| `section_id` | ✓ | ✓ |
+| `section_title` | ✓ | ✓ |
+| `page_start` | ✓ | — |
+| `page_end` | ✓ | — |
+| `chunk_index` | ✓ | — |
+| `total_chunks` | ✓ | — |
+| `figure_id` | — | ✓ |
+| `figure_type` | — | ✓ |
+| `image_path` | — | ✓ |
+| `label` | — | ✓ |
+| `page` | — | ✓ |
+
+**Rationale**: Flat map is required by Spring AI `FilterExpressionBuilder`. Separation by `type`
+allows independent filtering in dual search.
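+
+The schema in the table can be sketched as two map builders. The field names are taken from the
+table; the `SectionRef` record, the `ChunkMetadata` class and the choice of Java types for each
+value are assumptions made for illustration.
+
+```java
+import java.util.LinkedHashMap;
+import java.util.Map;
+
+// Hypothetical holder for the fields shared by text and figure documents.
+record SectionRef(String bookId, String bookTitle, String chapterId,
+                  String sectionId, String sectionTitle) {}
+
+final class ChunkMetadata {
+    private static Map<String, Object> shared(String type, SectionRef ref) {
+        Map<String, Object> meta = new LinkedHashMap<>();
+        meta.put("type", type);
+        meta.put("book_id", ref.bookId());
+        meta.put("book_title", ref.bookTitle());
+        meta.put("chapter_id", ref.chapterId());
+        meta.put("section_id", ref.sectionId());
+        meta.put("section_title", ref.sectionTitle());
+        return meta;
+    }
+
+    // Metadata for a text chunk: shared fields plus page range and chunk position.
+    static Map<String, Object> forTextChunk(SectionRef ref, int pageStart, int pageEnd,
+                                            int chunkIndex, int totalChunks) {
+        Map<String, Object> meta = shared("TEXT", ref);
+        meta.put("page_start", pageStart);
+        meta.put("page_end", pageEnd);
+        meta.put("chunk_index", chunkIndex);
+        meta.put("total_chunks", totalChunks);
+        return meta;
+    }
+
+    // Metadata for a figure document: shared fields plus figure-specific fields.
+    static Map<String, Object> forFigure(SectionRef ref, String figureId, String figureType,
+                                         String imagePath, String label, int page) {
+        Map<String, Object> meta = shared("FIGURE", ref);
+        meta.put("figure_id", figureId);
+        meta.put("figure_type", figureType);
+        meta.put("image_path", imagePath);
+        meta.put("label", label);
+        meta.put("page", page);
+        return meta;
+    }
+}
+```
+
+Keeping every value a string or number, with no nested maps, is what keeps the metadata flat.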
+
+---
+
+## Decision 9: Re-embedding Existing Books
+
+**Decision**: Books already processed under feature 001 (text-only) are NOT automatically
+re-embedded. An explicit re-embed action is exposed via `POST /api/v1/books/{id}/reembed`
+(admin-triggered). The existing chunks remain valid for text queries until re-embedding completes.
+
+**Rationale**: Automatic re-embedding on deploy would block the system and risk data loss if
+the process fails mid-way. An explicit, idempotent trigger is safer and more observable.
+
+---
+
+## Decision 10: Minimum Image Size Threshold
+
+**Decision**: Images whose width or height is below 100 pixels are discarded and no chunk is created. This
+threshold filters out decorative elements (bullets, dividers, publisher logos) without a
+classification model.
+
+**Rationale**: Neurosurgery textbook diagrams and MRI scans are expected to be well above 100×100 px.
+The threshold is configurable via `app.figure-storage.min-image-size-px` in
+`application.properties`.
+
+**Alternatives considered**:
+- No threshold → decorative icons pollute the figure index
+- ML-based classification → accurate but adds model dependency; not needed at POC scale
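+
+A minimal sketch of the size filter, assuming an image is dropped when either dimension falls
+below the threshold (the reading used in task T014). `ImageDims` and `ImageSizeFilter` are
+hypothetical names; in the real pipeline the dimensions would come from the extracted image.
+
+```java
+import java.util.List;
+
+// Hypothetical holder for an extracted image's pixel dimensions.
+record ImageDims(String name, int widthPx, int heightPx) {}
+
+final class ImageSizeFilter {
+    // An image is kept only if both width and height reach the minimum.
+    static boolean isLargeEnough(ImageDims image, int minSizePx) {
+        return image.widthPx() >= minSizePx && image.heightPx() >= minSizePx;
+    }
+
+    static List<ImageDims> keepMeaningful(List<ImageDims> images, int minSizePx) {
+        return images.stream()
+                .filter(image -> isLargeEnough(image, minSizePx))
+                .toList();
+    }
+}
+```
+
+With a threshold of 100, a 16×16 bullet and a 600×4 divider are both dropped, while an 800×600
+diagram is kept.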
@@ -0,0 +1,176 @@
+# Feature Specification: Enhanced Embedding with Image Parsing and Metadata
+
+**Feature Branch**: `002-image-aware-embedding`  
+**Created**: 2026-04-03  
+**Status**: Draft  
+**Input**: User description: "I want to enhance the embedding process. I want also parse image from each pages if any and add proper metadata so that it can match the retrieved chunk/vector that match what user are querying."
+
+## User Scenarios & Testing *(mandatory)*
+
+### User Story 1 - Image Content Surfaced in Query Results (Priority: P1)
+
+A neurosurgeon asks a question in the chat (e.g., "Show me the anatomy of the Circle of Willis")
+that is best answered by a diagram or figure in an uploaded book. The system retrieves the image
+content — its description and surrounding context — and uses it to construct a grounded answer,
+citing the page and book where the image appeared.
+
+**Why this priority**: This is the direct, user-visible payoff of the feature. Without it, the
+enhancement has no observable benefit. All other stories support this outcome.
+
+**Independent Test**: Upload a book containing a labelled anatomical diagram. Ask a query whose
+answer is conveyed by that diagram (not in the surrounding text). Confirm the system returns an
+answer that references the diagram's content and cites the correct book and page.
+
+**Acceptance Scenarios**:
+
+1. **Given** a book with an anatomical diagram on page 42, **When** a user asks a question whose
+   answer is only depicted in that diagram, **Then** the system returns a response that draws on
+   the diagram's content and cites "Page 42, [Book Title]".
+2. **Given** a page with both text and an image, **When** the system retrieves that page's content,
+   **Then** the image-derived content and the surrounding text are each independently retrievable
+   and independently citable.
+3. **Given** a query that has no relevant image in any uploaded book, **When** the system searches,
+   **Then** it does not fabricate image-derived content and falls back to text-only results (or
+   states no relevant content was found).
+
+---
+
+### User Story 2 - All Pages Scanned for Images During Embedding (Priority: P1)
+
+When a book is uploaded and processed, every page is inspected for images. Any image found is
+extracted and represented as a searchable content chunk enriched with metadata (page number,
+book title, position on page, caption if present). Pages without images are processed as
+text-only chunks, unchanged from the existing behaviour.
+
+**Why this priority**: This is the prerequisite for User Story 1. Without systematic per-page
+image detection, image content cannot be retrieved.
+
+**Independent Test**: Upload a book whose pages include a mix of text-only and image-containing
+pages. After processing completes, verify that chunks exist for each image page and that each
+image chunk carries the correct metadata (page number, source book, caption).
+
+**Acceptance Scenarios**:
+
+1. **Given** a book being processed, **When** the embedding pipeline runs, **Then** every page
+   is evaluated for images and each detected image generates at least one content chunk.
+2. **Given** an image with a caption or label, **When** the chunk is created, **Then** the
+   caption or label text is included in the chunk's content and metadata.
+3. **Given** a page with multiple images, **When** processing completes, **Then** each image is
+   represented as a separate chunk with its own metadata, not merged into a single chunk.
+4. **Given** a page with no images, **When** processing completes, **Then** no image chunk is
+   created for that page and text processing is unaffected.
+
+---
+
+### User Story 3 - Rich Metadata Enables Precise Source Attribution (Priority: P2)
+
+When the system returns a result based on image content, the user can see exactly where that
+image appeared: which book, which page, and what type of content (diagram, table, photograph,
+etc.). This gives the user confidence in the source and lets them locate the original image
+in their physical or digital copy of the book.
+
+**Why this priority**: Metadata quality directly impacts user trust. Neurosurgeons require
+traceable, citable evidence. Richer metadata also improves retrieval accuracy by giving the
+search engine more signals to match against a query.
+
+**Independent Test**: Retrieve a result sourced from an image chunk. Inspect the displayed
+citation and verify it includes: book title, page number, content type (e.g., "diagram"),
+and caption (if present in the original).
+
+**Acceptance Scenarios**:
+
+1. **Given** a retrieved image chunk, **When** the system displays the source citation,
+   **Then** the citation includes at minimum: book title, page number, and a content-type
+   label (e.g., diagram, table, figure).
+2. **Given** an image chunk with a detected caption, **When** the citation is displayed,
+   **Then** the caption text is shown alongside the other metadata fields.
+3. **Given** a topic summary that draws on both text and image chunks, **When** the user
+   inspects citations, **Then** image-sourced and text-sourced claims are distinguishable
+   from each other.
+
+---
+
+### Edge Cases
+
+- What happens when an image is too small to contain meaningful content (e.g., a decorative
+  bullet icon or a publisher logo)?
+- How does the system handle a page that is entirely an image (scanned page with no digital text)?
+- What if an image spans multiple pages (e.g., a fold-out diagram)?
+- How does the system behave when an image has no caption and its surrounding text provides
+  no useful context?
+- What happens if image processing fails for a specific page — does it abort the whole book
+  or continue with the remaining pages?
+
+## Requirements *(mandatory)*
+
+### Functional Requirements
+
+- **FR-001**: System MUST inspect every page of an uploaded book for the presence of images
+  during the embedding process.
+- **FR-002**: System MUST extract each detected image and create a dedicated, independently
+  searchable content chunk for it.
+- **FR-003**: System MUST generate a descriptive textual representation of each extracted
+  image so its content is semantically searchable by the retrieval system.
+- **FR-004**: System MUST associate the following metadata with every image chunk: book title,
+  page number, content type (e.g., diagram, table, figure, photograph), and caption text
+  (where present).
+- **FR-005**: System MUST include the same base metadata (book title, page number) on text
+  chunks so that all retrieved content — image or text — carries consistent, comparable
+  source attribution.
+- **FR-006**: System MUST treat image chunks as first-class retrievable units: they must be
+  ranked and returned alongside text chunks when they are relevant to a user query.
+- **FR-007**: System MUST skip images that fall below a minimum meaningful-content threshold
+  (e.g., decorative icons, page separators) and MUST NOT create chunks for them.
+- **FR-008**: If image processing fails for a specific page, the system MUST log the failure,
+  skip that page's image, and continue processing the remaining pages and text content of
+  the book.
+- **FR-009**: System MUST display image-sourced content citations distinctly from text-sourced
+  citations so users can identify when a result originates from a visual element.
+- **FR-010**: Processing a book that contains images MUST NOT degrade the accuracy or
+  completeness of the existing text-only embedding for that book.
+
+### Key Entities
+
+- **Image Chunk**: A searchable content unit derived from a page image. Attributes: generated
+  description, source book title, page number, content type, caption (optional), embedding vector.
+- **Text Chunk**: Existing unit; extended to carry explicit metadata: source book title,
+  page number, section heading (if detectable), content type ("text").
+- **Chunk Metadata**: Structured attributes attached to every chunk regardless of type,
+  enabling consistent filtering and citation. Mandatory fields: book title, page number,
+  content type. Optional fields: caption, section heading.
+
+## Success Criteria *(mandatory)*
+
+### Measurable Outcomes
+
+- **SC-001**: At least 90% of pages containing images in a test book result in a retrievable
+  image chunk after processing completes.
+- **SC-002**: A controlled set of 10 queries whose answers are conveyed by diagrams in an
+  uploaded book returns at least 7 correct image-sourced answers (70% recall on image queries).
+- **SC-003**: Embedding processing time for a book with images increases by no more than 3×
+  compared to processing the same book as text-only, for books up to 500 pages.
+- **SC-004**: Every retrieved result — text or image — includes a citation that identifies
+  at minimum the source book title and page number, with 100% coverage across a test result set.
+- **SC-005**: In a user evaluation with 5 representative queries that previously returned
+  no useful results (because the answer was only in a diagram), at least 4 now return a
+  useful, grounded answer.
+
+## Assumptions
+
+- Books are still uploaded exclusively as PDFs; image parsing applies to PDF pages only.
+- The platform already has a working text-only embedding pipeline (from feature 001); this
+  feature enhances it without replacing or rewriting the text processing logic.
+- Images worth processing are those that occupy a meaningful portion of the page; small
+  decorative or structural images (logos, dividers, icons) are excluded based on a size
+  threshold determined during implementation.
+- The descriptive representation of an image (FR-003) is generated at embedding time, not
+  at query time; query latency is not affected by image interpretation.
+- The shared global book library model from feature 001 is retained; image chunks from a
+  processed book are available to all users immediately upon completion.
+- Scanned pages (fully rasterised pages with no digital text layer) are treated as a single
+  full-page image; the system attempts to extract content from them but does not guarantee
+  the same fidelity as pages with digital text.
+- Per-chunk metadata is stored alongside the vector so it can be used for both retrieval
+  filtering and source citation display without a separate lookup.
+- Books already processed under feature 001 (text-only) are not automatically re-processed;
+  re-embedding must be triggered explicitly by the user or an administrator.
@@ -0,0 +1,168 @@
+# Tasks: Enhanced Embedding with Image Parsing and Metadata
+
+**Input**: Design documents from `/specs/002-image-aware-embedding/`
+**Prerequisites**: plan.md ✓ | spec.md ✓ | research.md ✓ | data-model.md ✓ | contracts/ ✓
+
+**Organization**: Tasks grouped by user story to enable independent implementation and testing.
+
+## Format: `[ID] [P?] [Story] Description`
+
+- **[P]**: Can run in parallel (different files, no shared dependencies)
+- **[US1/US2/US3]**: Which user story this task belongs to
+
+---
+
+## Phase 1: Setup (Shared Infrastructure)
+
+**Purpose**: Database migrations and configuration that establish the foundation for all new code
+
+- [X] T001 Create Flyway migration `V4__document_hierarchy.sql` — add `chapter` and `section` tables per data-model.md §Postgres Schema in `backend/src/main/resources/db/migration/V4__document_hierarchy.sql`
+- [X] T002 Create Flyway migration `V5__figures_and_refs.sql` — add `figure` and `chunk_figure_ref` tables per data-model.md §Postgres Schema in `backend/src/main/resources/db/migration/V5__figures_and_refs.sql`
+- [X] T003 Add figure-storage configuration keys to `backend/src/main/resources/application.properties`: `app.figure-storage.base-path=./uploads` and `app.figure-storage.min-image-size-px=100`
+- [X] T004 Add `uploads/` directory to `.gitignore` at repo root; create `uploads/figures/.gitkeep` to preserve directory structure
+
+---
+
+## Phase 2: Foundational (Blocking Prerequisites)
+
+**Purpose**: Core types and infrastructure that ALL user stories depend on — nothing in Phase 3+ can start until this phase is complete
+
+**⚠️ CRITICAL**: No user story work can begin until this phase is complete
+
+- [X] T005 [P] Create `FigureType` enum in `backend/src/main/java/com/aiteacher/document/FigureType.java` — values: `ANATOMICAL_DIAGRAM`, `SURGICAL_PHOTOGRAPH`, `MRI_CT_SCAN`, `TABLE`, `CHART`, `INTRAOPERATIVE_IMAGE`
+- [X] T006 [P] Create `FigureStorageService` interface in `backend/src/main/java/com/aiteacher/figure/FigureStorageService.java` — declare `Path save(UUID bookId, String figureId, BufferedImage image)`, `Path resolve(UUID bookId, String filename)`, and `void delete(UUID bookId)`
+- [X] T007 Create `LocalFigureStorageService` implementation in `backend/src/main/java/com/aiteacher/figure/LocalFigureStorageService.java` — writes PNG files under `${app.figure-storage.base-path}/figures/{bookId}/`; implements `FigureStorageService`; depends on T006
+- [X] T008 Create `FigureStorageConfig` bean in `backend/src/main/java/com/aiteacher/config/FigureStorageConfig.java` — reads `app.figure-storage.base-path` and `app.figure-storage.min-image-size-px` as `@ConfigurationProperties`; registers `LocalFigureStorageService` as `@Bean`; adds `ResourceHandler` mapping `GET /api/v1/figures/**` to the base-path directory
+- [X] T009 [P] Create `ChapterEntity` JPA entity and `ChapterRepository` in `backend/src/main/java/com/aiteacher/document/` — `@Entity(name="chapter")`, fields: `id` (String PK), `bookId` (UUID FK → book), `number` (int), `title` (String), `pageStart` (int), `createdAt` (Instant); `ChapterRepository extends JpaRepository<ChapterEntity, String>`
+- [X] T010 [P] Create `SectionEntity` JPA entity and `SectionRepository` in `backend/src/main/java/com/aiteacher/document/` — `@Entity(name="section")`, fields: `id` (String PK), `chapterId` (String FK → chapter), `bookId` (UUID FK → book), `number` (String), `title` (String), `pageStart`/`pageEnd` (int), `fullText` (TEXT column), `createdAt` (Instant); `SectionRepository extends JpaRepository<SectionEntity, String>` with `findAllByBookId(UUID)`
+- [X] T011 [P] Create `FigureEntity` JPA entity and `FigureRepository` in `backend/src/main/java/com/aiteacher/document/` — `@Entity(name="figure")`, fields: `id` (String PK), `bookId` (UUID), `sectionId` (String, nullable), `chapterId` (String, nullable), `label` (String), `caption` (TEXT), `figureType` (`@Enumerated` FigureType), `page` (int), `imagePath` (String), `captionEmbeddingId` (UUID, nullable), `createdAt` (Instant); `FigureRepository` with `findAllByBookId(UUID)`, `deleteAllByBookId(UUID)`
+- [X] T012 Create `ChunkFigureRefEntity` JPA entity and `ChunkFigureRefRepository` in `backend/src/main/java/com/aiteacher/document/` — composite PK `(chunkId UUID, figureId String)`, `mentionPage` (int); `ChunkFigureRefRepository` with `findByChunkIdIn(List<UUID>)`, `deleteByFigureIdIn(List<String>)`
+
+**Checkpoint**: Migrations will run on next startup; all JPA entities are wired; figure storage reads config correctly
+
+---
+
+## Phase 3: User Story 2 — All Pages Scanned for Images During Embedding (Priority: P1)
+
+**Goal**: When a book is uploaded, every page is inspected for images; each found image is extracted, persisted, described, and embedded as a searchable chunk alongside its metadata
+
+**Independent Test**: Upload a PDF containing at least one page with a labelled anatomical diagram. After status shows `READY`, call `GET /api/v1/books/{id}/figures` — response must contain at least one entry with `figureType`, `caption`, `page`, and `imageUrl` populated. Verify the PNG file exists at the path in `imagePath`.
+
+- [X] T013 [US2] Create `PdfStructureParser` service in `backend/src/main/java/com/aiteacher/document/PdfStructureParser.java` — uses Spring AI's `PagePdfDocumentReader` to extract per-page text; groups pages into `SectionEntity` records using heading-detection heuristics (lines matching `^\d+(\.\d+)*\s+[A-Z]`); groups sections into `ChapterEntity` records; persists both to Postgres via `ChapterRepository` and `SectionRepository`; returns `List<SectionEntity>` for the book
+- [X] T014 [US2] Create `FigureExtractionService` in `backend/src/main/java/com/aiteacher/document/FigureExtractionService.java` — opens PDF with PDFBox `PDDocument`; iterates pages; extracts `PDImageXObject` instances; skips images whose width or height are below `min-image-size-px`; classifies `FigureType` using the keyword-matching table from data-model.md §FigureType; parses caption from the nearest text line matching `CAPTION_PATTERN`; saves PNG via `FigureStorageService`; persists `FigureEntity` to `FigureRepository`; returns `List<FigureEntity>` per book
+- [X] T015 [US2] Create `VisionDescriptionService` in `backend/src/main/java/com/aiteacher/document/VisionDescriptionService.java` — accepts a `Path` to a PNG and a caption String; calls the OpenAI vision model (via Spring AI `ChatClient` with image media type) to generate a 2–4 sentence clinical description; returns the generated description string; handles API failures by returning the caption as fallback
+- [X] T016 [US2] Create `TextChunkingService` in `backend/src/main/java/com/aiteacher/document/TextChunkingService.java` — accepts a `SectionEntity`; splits `fullText` into overlapping 400–600 token windows (20-token overlap); wraps each window in a Spring AI `Document` with the flat metadata map defined in data-model.md §Text chunk document; returns `List<Document>`
+- [X] T017 [US2] Create `ChunkFigureRefService` in `backend/src/main/java/com/aiteacher/document/ChunkFigureRefService.java` — accepts a Spring AI `Document` (with its `id` as `chunkId`) and a `List<FigureEntity>` for the book; scans chunk text for patterns `Fig\.\s*\d+[\-\.]\d+` and `Figure\s+\d+[\-\.]\d+`; matches against figure labels; persists `ChunkFigureRefEntity` rows via `ChunkFigureRefRepository`
+- [X] T018 [US2] Rewrite `BookEmbeddingService.embedBook()` in `backend/src/main/java/com/aiteacher/book/BookEmbeddingService.java` to orchestrate the full pipeline: (1) `PdfStructureParser` → sections; (2) parallel: `FigureExtractionService` + `TextChunkingService` for each section; (3) `VisionDescriptionService` for each figure; (4) embed figure captions+descriptions as `Document`s (metadata per data-model.md §Figure caption document) into `vectorStore`; (5) embed text chunks into `vectorStore`; (6) `ChunkFigureRefService` for each chunk; update `captionEmbeddingId` on `FigureEntity` after embedding
+- [X] T019 [US2] Extend `BookEmbeddingService.deleteBookChunks()` to also delete: all `ChunkFigureRefEntity` rows (via `deleteByFigureIdIn`), all `FigureEntity` rows (via `deleteAllByBookId`), all figure PNG files (via `FigureStorageService.delete(bookId)`), all `SectionEntity` and `ChapterEntity` rows for the book
+- [X] T020 [US2] Add `POST /api/v1/books/{id}/reembed` endpoint to `BookController` in `backend/src/main/java/com/aiteacher/book/BookController.java` — returns `202` with `{ bookId, status: "PROCESSING" }`; returns `404` if not found; returns `409` if already `PROCESSING`; calls `deleteBookChunks()` then `embedBook()` asynchronously
+
+**Checkpoint**: Upload a PDF with figures → poll `GET /api/v1/books` for `READY` → `GET /api/v1/books/{id}/figures` returns figure list → PNG accessible at `GET /api/v1/figures/{bookId}/{filename}`
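+
+The overlapping-window split in T016 can be sketched as below. This is an illustration under
+stated assumptions: tokens are approximated by whitespace-separated words, the window size is a
+single fixed number rather than a 400 to 600 range, and `WindowChunker` is a hypothetical name.
+
+```java
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+
+final class WindowChunker {
+    // Splits text into windows of windowTokens words; consecutive windows share overlapTokens words.
+    static List<String> chunk(String text, int windowTokens, int overlapTokens) {
+        if (overlapTokens < 0 || overlapTokens >= windowTokens) {
+            throw new IllegalArgumentException("overlap must be in [0, windowTokens)");
+        }
+        String trimmed = text.strip();
+        if (trimmed.isEmpty()) {
+            return List.of();
+        }
+        String[] tokens = trimmed.split("\\s+");
+        int step = windowTokens - overlapTokens;
+        List<String> chunks = new ArrayList<>();
+        for (int start = 0; start < tokens.length; start += step) {
+            int end = Math.min(start + windowTokens, tokens.length);
+            chunks.add(String.join(" ", Arrays.copyOfRange(tokens, start, end)));
+            if (end == tokens.length) {
+                break; // the last window already reaches the end of the text
+            }
+        }
+        return chunks;
+    }
+}
+```
+
+For example, `chunk("a b c d e f g", 4, 1)` yields `a b c d` and `d e f g`; the real service
+would call it with a window of roughly 500 and an overlap of 20.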
+
+---
+
+## Phase 4: User Story 1 — Image Content Surfaced in Query Results (Priority: P1)
+
+**Goal**: User asks a question answered by a diagram — the system retrieves that diagram's content and surfaces it in the chat response with a citation
+
+**Independent Test**: With a book embedded (Phase 3 checkpoint passed), ask a chat question whose answer is depicted only in a diagram. The response `sources` array must contain at least one entry with `type: "FIGURE"` and a non-empty `imageUrl`.
+
+- [X] T021 [US1] Create `NeurosurgeryRetriever` service in `backend/src/main/java/com/aiteacher/retrieval/NeurosurgeryRetriever.java` — (1) text chunk search: `vectorStore.similaritySearch` with filter `type == TEXT AND book_id == bookId`, topK=5; (2) figure search: same store, filter `type == FIGURE AND book_id == bookId`, topK=3; (3) expand text chunk results to parent sections via `SectionRepository.findAllById(sectionIds)`; (4) fetch explicitly linked figures via `ChunkFigureRefRepository.findByChunkIdIn(chunkIds)` + `FigureRepository.findAllById`; (5) deduplicate figures across lists by `figureId`; return `RetrievalResult(parentSections, figureVectorHits, linkedFigures)` — add `RetrievalResult` record in same package
+- [X] T022 [US1] Refactor `ChatService.sendMessage()` in `backend/src/main/java/com/aiteacher/chat/ChatService.java` — replace `QuestionAnswerAdvisor` with a manual call to `NeurosurgeryRetriever`; build the LLM user message from: section full texts as `[Section X.Y — Title, pp.A-B]\n{fullText}` blocks, followed by `AVAILABLE FIGURES FOR THIS SECTION:` list with `- {label} (p.{page}): {caption} [image: {filename}]` lines per figure; append the instruction `When referencing diagrams, cite them as [Fig. X, p.N].`; send via `chatClient.prompt().system(SYSTEM_PROMPT).user(prompt).call()`
+- [X] T023 [US1] Add `GET /api/v1/books/{id}/figures` endpoint to `BookController` — returns `200` with `List<FigureResponse>`; `FigureResponse` is a new record in `backend/src/main/java/com/aiteacher/book/FigureResponse.java` with fields `figureId`, `label`, `caption`, `figureType`, `page`, `imageUrl` (assembled as `/api/v1/figures/{bookId}/{filename}`), `sectionId`, `sectionTitle`; returns `404` if book not found
+- [X] T024 [US1] Update `extractSources()` in `ChatService` to build both TEXT and FIGURE source entries: TEXT entries keep existing fields plus `"type": "TEXT"`; FIGURE entries add `"type": "FIGURE"`, `"figureId"`, `"label"`, `"caption"`, `"figureType"`, `"imageUrl"` — source data comes from `RetrievalResult` (text chunk Documents and merged FigureEntity list)
+
+**Checkpoint**: Chat question answered by a diagram → response body contains `sources[n].type == "FIGURE"` with populated `imageUrl`; image loads from the returned URL
+
+---
+
+## Phase 5: User Story 3 — Rich Metadata Enables Precise Source Attribution (Priority: P2)
+
+**Goal**: Users see distinct, informative citations for text vs. image sources; image sources render inline in the chat UI
+
+**Independent Test**: After triggering a response with figure sources, inspect the chat message in the UI — text sources and figure sources are visually distinguishable; figure sources render the actual image inline using the `imageUrl`
+
+- [X] T025 [P] [US3] Update API response types in `frontend/src/services/api.ts` — extend the `Source` type to include `type: 'TEXT' | 'FIGURE'`, `figureId?: string`, `label?: string`, `caption?: string`, `figureType?: string`, `imageUrl?: string`
+- [X] T026 [P] [US3] Update the chat source/citation display in the frontend (wherever sources are currently rendered, e.g. `frontend/src/components/` or `frontend/src/views/`) — render TEXT sources with a document icon and page number; render FIGURE sources with the image (`<img :src="source.imageUrl">`) below the label and caption text
+- [X] T027 [US3] Add figure-type badge rendering in the frontend figure display: show a label derived from `figureType` (e.g. "MRI / CT", "Anatomical Diagram", "Table") alongside the figure caption so users can identify content type without opening the image
+
+---
+
+## Phase 6: Polish & Cross-Cutting Concerns
+
+- [X] T028 Update `README.md` Mermaid architecture diagram to show three storage tiers: pgvector (semantic search), Postgres (source of truth — sections, figures, refs), and file store (extracted PNGs) — **required by Constitution Principle IV in the same PR as the other changes**
+- [X] T029 [P] Write `FigureExtractionServiceTest` unit test in `backend/src/test/java/com/aiteacher/document/FigureExtractionServiceTest.java` — test: images below min size are skipped; `FigureType` classification matches keyword table in data-model.md; caption parsed from adjacent text line
+- [X] T030 [P] Write `NeurosurgeryRetrieverTest` unit test in `backend/src/test/java/com/aiteacher/retrieval/NeurosurgeryRetrieverTest.java` — test: figure IDs from both vector hits and chunk refs are merged without duplicates; `RetrievalResult` contains the deduplicated set
+- [X] T031 Run quickstart.md validation end-to-end: upload a real PDF with a labelled diagram → wait for `READY` → call `GET /api/v1/books/{id}/figures` → send a chat message about the diagram → verify `sources` contains a `FIGURE` entry → verify `imageUrl` resolves to a PNG
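
The merge behaviour that T030 tests can be illustrated with a short sketch. The backend is Java, so this TypeScript version only shows the logic, not the actual `NeurosurgeryRetriever` code. The function name `mergeFigureIds` and both parameter names are hypothetical, and preserving first-seen order is an assumption.

```typescript
// Merges figure IDs from the two retrieval paths, dropping duplicates.
// First-seen order is kept, so IDs from vector hits precede chunk-ref IDs.
function mergeFigureIds(vectorHitIds: string[], chunkRefIds: string[]): string[] {
  const merged: string[] = [];
  for (const id of vectorHitIds.concat(chunkRefIds)) {
    // indexOf keeps the sketch dependency-free; a Set would scale better.
    if (merged.indexOf(id) === -1) {
      merged.push(id);
    }
  }
  return merged;
}
```

A test in the spirit of T030 would pass overlapping lists to this function and assert that each ID appears exactly once in the result.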
+
+---
+
+## Dependencies & Execution Order
+
+### Phase Dependencies
+
+- **Phase 1 (Setup)**: No dependencies — start immediately
+- **Phase 2 (Foundational)**: Requires Phase 1 complete (migrations must run before JPA entities can be wired)
+- **Phase 3 (US2)**: Requires Phase 2 complete — all JPA entities + FigureStorageService must exist
+- **Phase 4 (US1)**: Requires Phase 3 complete — figures must exist in Postgres + vector store before retrieval can surface them
+- **Phase 5 (US3)**: Requires Phase 4 complete — frontend depends on the extended `sources` format from T024
+- **Phase 6 (Polish)**: Requires all story phases complete
+
+### Within Phase 3 (Embedding Pipeline)
+
+```
+T013 (PdfStructureParser) ──────────────────────────┐
+T014 (FigureExtractionService) ─────────────────────┤
+T015 (VisionDescriptionService) ────────────────────┤─→ T018 (BookEmbeddingService orchestrator)
+T016 (TextChunkingService) ─────────────────────────┤           ├─→ T019 (cleanup)
+T017 (ChunkFigureRefService) ───────────────────────┘           └─→ T020 (reembed endpoint)
+```
+
+T013–T017 can be implemented in parallel (different files, no shared dependencies). T018 depends on all of them.
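
The ordering in the diagram above can be checked mechanically. The sketch below encodes the Phase 3 edges and groups tasks into waves that can run in parallel. It assumes T019 and T020 depend only on T018, as the arrows suggest; `PHASE3_DEPS` and `executionWaves` are illustrative names, not part of the codebase.

```typescript
// Phase 3 dependency edges: task -> prerequisites.
// Assumption: T019 and T020 depend only on T018.
const PHASE3_DEPS: { [task: string]: string[] } = {
  T013: [],
  T014: [],
  T015: [],
  T016: [],
  T017: [],
  T018: ['T013', 'T014', 'T015', 'T016', 'T017'],
  T019: ['T018'],
  T020: ['T018'],
};

// Groups tasks into waves. Every task in a wave has all of its prerequisites
// in earlier waves, so the tasks inside one wave can run in parallel.
function executionWaves(deps: { [task: string]: string[] }): string[][] {
  const done: string[] = [];
  const waves: string[][] = [];
  let remaining = Object.keys(deps);
  while (remaining.length > 0) {
    const ready = remaining.filter(
      (task) => deps[task].every((dep) => done.indexOf(dep) !== -1)
    );
    if (ready.length === 0) {
      throw new Error('dependency cycle or unknown prerequisite');
    }
    waves.push(ready);
    for (const task of ready) {
      done.push(task);
    }
    remaining = remaining.filter((task) => ready.indexOf(task) === -1);
  }
  return waves;
}
```

For the table above this yields three waves: T013 to T017 together, then T018 alone, then T019 and T020.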
+
+### Within Phase 4 (Retrieval)
+
+```
+T021 (NeurosurgeryRetriever) ──────────────────────┐
+                                                   ├─→ T022 (ChatService update)
+                                                   └─→ T024 (extractSources update)
+T023 (figures endpoint) ── independent [P]
+```
+
+### Parallel Opportunities per Phase
+
+**Phase 2**: T005, T006, T009, T010, T011 can all run in parallel. T007 depends on T006. T012 can follow T010/T011.
+
+**Phase 3**: T013, T014, T015, T016, T017 all in parallel. T018 depends on all.
+
+**Phase 5**: T025 and T026 in parallel; T027 can follow T026.
+
+**Phase 6**: T029 and T030 in parallel.
+
+---
+
+## Implementation Strategy
+
+### MVP: User Story 2 Only (Embedding Pipeline)
+
+1. Phase 1 (Setup) → Phase 2 (Foundational) → Phase 3 (US2, T013–T020)
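+2. **Validate**: `GET /api/v1/books/{id}/figures` returns figures for a test book (the figures endpoint is T023, listed under Phase 4 but marked independent, so pull it forward for this check)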
+3. **Stop and demo** — the pipeline produces image chunks without any retrieval changes
+
+### Full Feature Delivery
+
+1. Phase 1 + 2 → Foundation ready
+2. Phase 3 (US2) → Embedding pipeline produces image chunks ← **demo point**
+3. Phase 4 (US1) → Chat surfaces image content in responses ← **core payoff**
+4. Phase 5 (US3) → Frontend renders inline figures with type badges
+5. Phase 6 (Polish) → README, tests, end-to-end validation
+
+---
+
+## Notes
+
+- [P] tasks = parallelizable: different files, no dependencies on each other within the same phase
+- [US1/US2/US3] label maps each task to a user story for traceability
+- Phase 3 (US2) must be fully complete before beginning Phase 4 (US1) — retrieval cannot surface figures that do not yet exist
+- The `uploads/figures/` directory must exist and be writable at runtime; `FigureStorageService` creates subdirectories automatically
+- Re-embedding (T020) deletes all existing chunks and figures for the book before re-running the pipeline, so it is safe to call on books processed by feature 001
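
The delete-then-rebuild behaviour in the last note is what makes re-embedding repeatable. The sketch below uses an in-memory stand-in for the chunk and figure tables to show why. `BookStore`, `reembed`, and the `pipeline` callback are hypothetical names, and the real T020 endpoint works against Postgres and the vector store rather than plain objects.

```typescript
// In-memory stand-in for the chunk and figure tables (hypothetical shape).
type BookStore = {
  chunks: { [bookId: string]: string[] };
  figures: { [bookId: string]: string[] };
};

type PipelineResult = { chunks: string[]; figures: string[] };

// Clears whatever an earlier run left behind, then rebuilds from scratch.
// Deleting a missing key is a no-op, so a book that has chunks but no
// figures (the feature 001 case) is handled without a special branch.
function reembed(
  store: BookStore,
  bookId: string,
  pipeline: (bookId: string) => PipelineResult
): void {
  delete store.chunks[bookId];
  delete store.figures[bookId];
  const result = pipeline(bookId);
  store.chunks[bookId] = result.chunks;
  store.figures[bookId] = result.figures;
}
```

Calling `reembed` twice with the same pipeline leaves the store in the same state as calling it once, which is the property the note relies on.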