first implementation - image/drawing integration
@@ -0,0 +1,73 @@
# Embedding & Retrieval Pipeline Checklist: Enhanced Embedding with Image Parsing and Metadata

**Purpose**: Author self-review of embedding pipeline and retrieval requirements quality; validates completeness, clarity, and measurability before implementation tasks are written
**Created**: 2026-04-03
**Feature**: [spec.md](../spec.md) | [research.md](../research.md) | [data-model.md](../data-model.md)
**Focus**: A (Embedding pipeline) + B (Retrieval & ranking) | Depth: Standard | Audience: Author

---

## Requirement Completeness: Embedding Pipeline

- [X] CHK001 - Is the definition of "inspect every page" complete: does the spec cover pages that have no extractable content layer (fully scanned/rasterised pages)? [Completeness, Spec §FR-001, Assumption §6] - Yes.

- [X] CHK002 - Does FR-002 define what "independently searchable" means in practice; specifically, is it clear that image chunks must be retrievable without a co-located text chunk? [Clarity, Spec §FR-002] - No image should be retrieved along linked text.

- [X] CHK003 - Is the minimum acceptable quality of the "descriptive textual representation" (FR-003) specified (e.g., must it include structural relationships, labelled regions, or clinical terms), or is any non-empty description sufficient? [Clarity, Spec §FR-003, Gap] - Any non-empty description is sufficient. The text just below the image should contain the correct clinical term.

- [C] CHK004 - Are the caption-detection rules defined at spec level; specifically, what pattern or signal determines that a piece of text is a caption vs. body text adjacent to an image? [Clarity, Spec §FR-004, Gap] - We assume that text starting with "Fig." followed by a number is the text description of the given image.

- [X] CHK005 - Does FR-004 specify what metadata is stored when a caption is absent: is the caption field omitted, left empty, or populated with a generated substitute? [Completeness, Spec §FR-004] - Generated substitute.

- [X] CHK006 - Is the "minimum meaningful-content threshold" (FR-007) quantified in the spec, or is it deferred entirely to implementation? The assumption section says "size threshold determined during implementation"; is this intentional and acceptable at the spec level? [Ambiguity, Spec §FR-007, Assumption §3] - Deferred to implementation.

- [X] CHK007 - Does FR-008 specify the observable outcome of per-page image failures; specifically, is there a requirement that the book's processing status or error log is accessible to the user or admin after partial failure? [Completeness, Spec §FR-008, Gap] - Online logs.

- [X] CHK008 - Is FR-010 ("MUST NOT degrade accuracy or completeness of text-only embedding") measurable: does the spec define a baseline or acceptance criterion against which degradation can be detected? [Measurability, Spec §FR-010, Gap] - No definition.

- [X] CHK009 - Are re-embedding requirements complete: does the spec cover what happens to in-progress queries and cached results while a book is being re-embedded? [Coverage, Assumption §8, Gap] - No need to take that into account.

---

## Requirement Completeness: Retrieval & Ranking

- [X] CHK010 - Does FR-006 define how image and text chunks are ranked relative to each other: is ranking unified (single score), or are the two modalities ranked independently with separate topK controls? [Clarity, Spec §FR-006, Gap] - Ranked independently, with separate topK controls.

- [X] CHK011 - Is the relevance threshold for figure retrieval specified, i.e., at what similarity score (or other criterion) should a figure be excluded from results? [Clarity, Spec §FR-006, Gap] - Not specified.

- [X] CHK012 - Are deduplication rules defined for the case where the same figure appears both in the semantic figure search and the chunk-to-figure reference lookup: which representation wins, or are both included? [Completeness, data-model.md §RetrievalResult, Gap] - Not specified.

- [X] CHK013 - Is the requirement for parent section context expansion in the spec; specifically, is there a requirement that the LLM receives the full section text (not just the chunk) when a text chunk is retrieved? [Gap, research.md §Decision 1] - The LLM should receive the full section to have maximum context.

- [X] CHK014 - Does the spec define the required structure of the LLM prompt when both text context and figures are present, or is prompt design left entirely to implementation? [Completeness, Gap] - Left to implementation.

- [X] CHK015 - Is SC-002 ("70% recall on image queries") sufficient as a measurability criterion: is the test set composition (10 queries) and evaluation method documented, or does it rely on an undefined manual process? [Measurability, Spec §SC-002] - Manual process.

---

## Scenario Coverage: Edge & Exception Cases

- [X] CHK016 - Does the spec address the scenario where a query is relevant to a book section that has figures but none of those figures rank above the retrieval threshold: is the expected fallback behaviour defined? [Coverage, Edge Case, Gap] - In this case the figure should still be retrieved and shown to the user.

- [X] CHK017 - Is the scenario of a figure retrieved in search results but whose image file is missing from the file store covered: what should the system return to the user in that case? [Coverage, Exception Flow, Gap] - Missing-image error, shown in the frontend as a broken image link.

- [X] CHK018 - Are requirements defined for multi-image pages where images have conflicting captions or share a single composite caption: which image gets the caption, or is it duplicated? [Coverage, Spec §FR-004, Edge Case] - This case does not exist.

---

## Consistency & Alignment

- [X] CHK019 - Are the metadata fields required by FR-004 and FR-005 fully consistent with the metadata schema defined in data-model.md; specifically, do the mandatory fields in the spec match the `type`, `section_id`, and `section_title` fields in the data model? [Consistency, Spec §FR-004, data-model.md §Vector Store Documents] - Left to implementation.

- [X] CHK020 - Is SC-003 ("processing time ≤ 3× baseline") consistent with FR-003: if description generation requires a vision model call per image, is the 3× cap realistic for a 500-page book with dense figures, and is this assumption documented? [Consistency, Spec §SC-003, Assumption §3, Gap] - Not documented.

- [X] CHK021 - Does the spec's description of citation display (FR-009) align with the `sources` format change documented in contracts/api.md: are the fields the spec says must be "distinct" actually represented distinctly in the API response? [Consistency, Spec §FR-009, contracts/api.md §4] - A section with an image source should be displayed in the frontend. Text sources and image sources are distinct.

---

## Notes

- Items marked `[Gap]` indicate requirements that appear absent or deferred; resolve before generating tasks
- Items marked `[Ambiguity]` require a clearer definition in the spec before implementation starts
- Items marked `[Consistency]` should be cross-checked between spec.md, data-model.md, and contracts/api.md
- Mark items `[x]` when resolved; add inline notes with the resolution for traceability
@@ -0,0 +1,34 @@
# Specification Quality Checklist: Enhanced Embedding with Image Parsing and Metadata

**Purpose**: Validate specification completeness and quality before proceeding to planning
**Created**: 2026-04-03
**Feature**: [spec.md](../spec.md)

## Content Quality

- [x] No implementation details (languages, frameworks, APIs)
- [x] Focused on user value and business needs
- [x] Written for non-technical stakeholders
- [x] All mandatory sections completed

## Requirement Completeness

- [x] No [NEEDS CLARIFICATION] markers remain
- [x] Requirements are testable and unambiguous
- [x] Success criteria are measurable
- [x] Success criteria are technology-agnostic (no implementation details)
- [x] All acceptance scenarios are defined
- [x] Edge cases are identified
- [x] Scope is clearly bounded
- [x] Dependencies and assumptions identified

## Feature Readiness

- [x] All functional requirements have clear acceptance criteria
- [x] User scenarios cover primary flows
- [x] Feature meets measurable outcomes defined in Success Criteria
- [x] No implementation details leak into specification

## Notes

- All items pass. Spec is ready for `/speckit.clarify` or `/speckit.plan`.
@@ -0,0 +1,172 @@
# API Contracts: Enhanced Embedding with Image Parsing and Metadata

**Branch**: `002-image-aware-embedding` | **Date**: 2026-04-03
**Base path**: `/api/v1`
**Auth**: HTTP Basic (existing)

---

## New / Changed Endpoints

### 1. Re-embed a book (new)

Triggers a full re-embedding of an already-processed book, replacing all existing chunks and
figures with the new image-aware pipeline output. Safe to call on books previously embedded
by feature 001.

```
POST /api/v1/books/{id}/reembed
```

**Path parameters**

| Parameter | Type | Description |
|-----------|------|-------------|
| `id` | UUID | Book ID |

**Response** `202 Accepted`

```json
{ "bookId": "uuid", "status": "PROCESSING" }
```

**Error responses**

| Status | Condition |
|--------|-----------|
| 404 | Book not found |
| 409 | Book already in PROCESSING state |
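
A caller of the re-embed endpoint has to tell the three documented outcomes apart (`202`, `404`, `409`). The following is a minimal client-side sketch using the JDK's `java.net.http` request builder. The class name `ReembedClient`, the `Outcome` enum, and the `baseUrl`, `user`, and `password` parameters are illustrative assumptions and not part of the contract; only the path, the HTTP Basic scheme, and the status codes come from the text above.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.UUID;

// Hypothetical helper: builds the re-embed request and maps the documented status codes.
final class ReembedClient {

    enum Outcome { ACCEPTED, BOOK_NOT_FOUND, ALREADY_PROCESSING, UNEXPECTED }

    // Builds POST {baseUrl}/api/v1/books/{id}/reembed with an HTTP Basic Authorization header.
    static HttpRequest buildRequest(String baseUrl, UUID bookId, String user, String password) {
        String credentials = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/v1/books/" + bookId + "/reembed"))
                .header("Authorization", "Basic " + credentials)
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    // Maps the status codes listed in the contract; anything else is treated as unexpected.
    static Outcome interpret(int statusCode) {
        return switch (statusCode) {
            case 202 -> Outcome.ACCEPTED;
            case 404 -> Outcome.BOOK_NOT_FOUND;
            case 409 -> Outcome.ALREADY_PROCESSING;
            default -> Outcome.UNEXPECTED;
        };
    }
}
```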

---

### 2. Get figures for a book (new)

Returns the list of extracted figures for a book, including their type, caption, and image URL.
Used by the frontend to display a figure gallery or inline figures in chat responses.

```
GET /api/v1/books/{id}/figures
```

**Path parameters**

| Parameter | Type | Description |
|-----------|------|-------------|
| `id` | UUID | Book ID |

**Response** `200 OK`

```json
[
  {
    "figureId": "youmans-7ed-fig-12-4",
    "label": "Fig. 12-4",
    "caption": "Coronal cross-section of the cavernous sinus showing cranial nerve relationships",
    "figureType": "ANATOMICAL_DIAGRAM",
    "page": 184,
    "imageUrl": "/api/v1/figures/550e8400-e29b-41d4-a716-446655440000/youmans-7ed-fig-12-4.png",
    "sectionId": "youmans-7ed-ch12-s2-3",
    "sectionTitle": "Cavernous Sinus"
  }
]
```

**Error responses**

| Status | Condition |
|--------|-----------|
| 404 | Book not found |

---

### 3. Serve figure image (new)

Serves the extracted figure image file. Mounted as a static resource from the file store.

```
GET /api/v1/figures/{bookId}/{filename}
```

**Path parameters**

| Parameter | Type | Description |
|-----------|------|-------------|
| `bookId` | UUID | Book ID |
| `filename` | string | Image filename (e.g. `youmans-7ed-fig-12-4.png`) |

**Response** `200 OK`: binary PNG
**Content-Type**: `image/png`

**Error responses**

| Status | Condition |
|--------|-----------|
| 404 | Image file not found |

---

### 4. Chat message response: extended source format (changed)

The existing `POST /api/v1/chat/sessions/{id}/messages` endpoint is unchanged in its request
format. The response `sources` field is extended to include figure references.

**Existing request** (unchanged):

```json
{ "content": "Describe the anatomy of the cavernous sinus" }
```

**Response** `200 OK` with extended `sources`:

```json
{
  "id": "uuid",
  "role": "ASSISTANT",
  "content": "The cavernous sinus is ... [Fig. 12-4, p.184] ...",
  "sources": [
    {
      "type": "TEXT",
      "bookTitle": "Youmans and Winn Neurological Surgery, 7th Ed.",
      "page": 184,
      "chunkText": "The cavernous sinus contains ..."
    },
    {
      "type": "FIGURE",
      "bookTitle": "Youmans and Winn Neurological Surgery, 7th Ed.",
      "page": 184,
      "figureId": "youmans-7ed-fig-12-4",
      "label": "Fig. 12-4",
      "caption": "Coronal cross-section of the cavernous sinus ...",
      "figureType": "ANATOMICAL_DIAGRAM",
      "imageUrl": "/api/v1/figures/550e8400-e29b-41d4-a716-446655440000/youmans-7ed-fig-12-4.png"
    }
  ],
  "createdAt": "2026-04-03T12:00:00Z"
}
```

**Changed fields in `sources` array**:

| Field | Old | New |
|-------|-----|-----|
| `type` | absent | `"TEXT"` or `"FIGURE"` |
| `figureId` | absent | figure ID string (FIGURE type only) |
| `label` | absent | caption label (FIGURE type only) |
| `caption` | absent | full caption (FIGURE type only) |
| `figureType` | absent | enum name (FIGURE type only) |
| `imageUrl` | absent | image URL (FIGURE type only) |

---

## Unchanged Endpoints

All endpoints from feature 001 remain at their existing paths with no breaking changes:

- `POST /api/v1/books/upload`
- `GET /api/v1/books`
- `DELETE /api/v1/books/{id}`
- `GET /api/v1/topics`
- `GET /api/v1/topics/{id}/summary`
- `POST /api/v1/chat/sessions`
- `GET /api/v1/chat/sessions/{id}/messages`
- `DELETE /api/v1/chat/sessions/{id}`
@@ -0,0 +1,305 @@
# Data Model: Enhanced Embedding with Image Parsing and Metadata

**Branch**: `002-image-aware-embedding` | **Date**: 2026-04-03

---

## Overview

Three storage tiers work in concert:

```
┌──────────────────────────────────────────────────────────────────┐
│                            PDF Upload                            │
│                                │                                 │
│                                ▼                                 │
│                         Parsing Pipeline                         │
│              │                             │                     │
│              ▼                             ▼                     │
│  Postgres (source of truth)    pgvector (search index)           │
│  - book                        - vector_store (text chunks)      │
│  - chapter                     - vector_store (figure captions)  │
│  - section (+ fullText)        File Store (images)               │
│  - figure (metadata)           - /uploads/figures/{bookId}/*.png │
│  - chunk_figure_ref                                              │
└──────────────────────────────────────────────────────────────────┘
```

---

## Postgres Schema

### Existing tables (unchanged)

- `book`: status, metadata, page count (V1)
- `chat_session`, `message`: conversation (V1)
- `vector_store`: managed by Spring AI pgvector starter (V2)
- `topic`: predefined topics (V3)

### New tables (Flyway V4)

```sql
-- V4: Document hierarchy

CREATE TABLE chapter (
    id          VARCHAR(200) PRIMARY KEY,  -- "{bookId}-ch{N}"
    book_id     UUID NOT NULL REFERENCES book(id) ON DELETE CASCADE,
    number      INT NOT NULL,
    title       VARCHAR(500),
    page_start  INT,
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE TABLE section (
    id          VARCHAR(200) PRIMARY KEY,  -- "{bookId}-ch{N}-s{X}-{Y}"
    chapter_id  VARCHAR(200) NOT NULL REFERENCES chapter(id) ON DELETE CASCADE,
    book_id     UUID NOT NULL REFERENCES book(id) ON DELETE CASCADE,
    number      VARCHAR(50),               -- "2.3" or "12.2.3"
    title       VARCHAR(500),
    page_start  INT NOT NULL,
    page_end    INT NOT NULL,
    full_text   TEXT NOT NULL,             -- NOT in vector store
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX idx_section_book ON section(book_id);
CREATE INDEX idx_section_chapter ON section(chapter_id);
```

### New tables (Flyway V5)

```sql
-- V5: Figures and chunk→figure links

CREATE TABLE figure (
    id                    VARCHAR(200) PRIMARY KEY,  -- "{bookId}-fig-{label}"
    book_id               UUID NOT NULL REFERENCES book(id) ON DELETE CASCADE,
    section_id            VARCHAR(200) REFERENCES section(id) ON DELETE SET NULL,
    chapter_id            VARCHAR(200) REFERENCES chapter(id) ON DELETE SET NULL,
    label                 VARCHAR(100),              -- "Fig. 12-4"
    caption               TEXT,
    figure_type           VARCHAR(50) NOT NULL,      -- FigureType enum name
    page                  INT NOT NULL,
    image_path            VARCHAR(1000) NOT NULL,    -- relative path on disk
    caption_embedding_id  UUID,                      -- ID in vector_store
    created_at            TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE TABLE chunk_figure_ref (
    chunk_id      UUID NOT NULL,                     -- vector_store document ID
    figure_id     VARCHAR(200) NOT NULL REFERENCES figure(id) ON DELETE CASCADE,
    mention_page  INT,
    PRIMARY KEY (chunk_id, figure_id)
);

CREATE INDEX idx_figure_book ON figure(book_id);
CREATE INDEX idx_cfr_chunk ON chunk_figure_ref(chunk_id);
```

---

## Java Domain Records

### Document hierarchy (new package `com.aiteacher.document`)

```java
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Root: in-memory only, not a JPA entity
public record BookNode(
    String bookId,
    String title,
    String isbn,
    String edition,
    List<String> authors,
    List<ChapterNode> chapters
) {}

// Chapter: maps to `chapter` table
public record ChapterNode(
    String chapterId,
    String bookId,
    int number,
    String title,
    int pageStart,
    List<SectionNode> sections
) {}

// Section: maps to `section` table; fullText stays in Postgres
public record SectionNode(
    String sectionId,
    String chapterId,
    String bookId,
    String number,
    String title,
    int pageStart,
    int pageEnd,
    String fullText,
    List<TextChunkNode> chunks,
    List<FigureNode> figures
) {}

// Text chunk: embedded into vector_store; references its parent section
public record TextChunkNode(
    String chunkId,                 // UUID → becomes vector_store document ID
    String sectionId,
    String chapterId,
    String bookId,
    String text,
    int chunkIndex,
    int totalChunksInSection,
    int pageStart,
    int pageEnd,
    Map<String, Object> metadata    // flattened for Spring AI filtering
) {
    // Assumption: the caller passes the title of the parent SectionNode,
    // because the chunk itself only holds the section ID.
    public Map<String, Object> toMetadata(String sectionTitle) {
        return Map.of(
            "type", "TEXT",
            "book_id", bookId,
            "chapter_id", chapterId,
            "section_id", sectionId,
            "section_title", sectionTitle,
            "page_start", pageStart,
            "page_end", pageEnd,
            "chunk_index", chunkIndex,
            "total_chunks", totalChunksInSection
        );
    }
}

// Figure: maps to `figure` table; caption embedded into vector_store
public record FigureNode(
    String figureId,
    String sectionId,
    String chapterId,
    String bookId,
    String label,                   // "Fig. 12-4"
    String caption,
    FigureType type,
    int page,
    String imagePath,               // relative: "figures/{bookId}/{figureId}.png"
    UUID captionEmbeddingId         // ID in vector_store
) {}
```

### Figure type enum

```java
public enum FigureType {
    ANATOMICAL_DIAGRAM,
    SURGICAL_PHOTOGRAPH,
    MRI_CT_SCAN,
    TABLE,
    CHART,
    INTRAOPERATIVE_IMAGE
}
```

Classification heuristic (applied to caption + surrounding text):

| Keyword(s) | FigureType |
|-----------|-----------|
| `MRI`, `CT`, `magnetic`, `resonance`, `tomography` | `MRI_CT_SCAN` |
| `intraoperative`, `intra-op` | `INTRAOPERATIVE_IMAGE` |
| `table`, `Table` (at line start) | `TABLE` |
| `chart`, `graph`, `histogram` | `CHART` |
| `photograph`, `photo` | `SURGICAL_PHOTOGRAPH` |
| (default) | `ANATOMICAL_DIAGRAM` |
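
The keyword table above could be applied as an ordered list of regular expressions in which the first matching row wins. The sketch below is one possible reading, not the project's implementation: the class name `FigureClassifier` is hypothetical, the enum is repeated only to keep the example self-contained, and three details are assumptions (rows are evaluated top to bottom, `MRI` and `CT` match only as upper-case whole words, and the `table` keywords match only at the start of a line).

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical classifier: applies the keyword table in order, first match wins.
final class FigureClassifier {

    enum FigureType {
        ANATOMICAL_DIAGRAM, SURGICAL_PHOTOGRAPH, MRI_CT_SCAN, TABLE, CHART, INTRAOPERATIVE_IMAGE
    }

    private record Rule(Pattern pattern, FigureType type) {}

    private static final List<Rule> RULES = List.of(
        // Upper-case whole words only, so "section" or "direct" never match "CT".
        new Rule(Pattern.compile("\\b(MRI|CT)\\b|\\b(?i:magnetic|resonance|tomography)\\b"),
                 FigureType.MRI_CT_SCAN),
        new Rule(Pattern.compile("(?i)\\bintra-?op(erative)?\\b"), FigureType.INTRAOPERATIVE_IMAGE),
        // Assumption: "table" counts only at the start of a line (MULTILINE mode).
        new Rule(Pattern.compile("(?im)^\\s*table\\b"), FigureType.TABLE),
        // Word boundaries keep "graph" from matching inside "photograph".
        new Rule(Pattern.compile("(?i)\\b(chart|graph|histogram)\\b"), FigureType.CHART),
        new Rule(Pattern.compile("(?i)\\b(photograph|photo)\\b"), FigureType.SURGICAL_PHOTOGRAPH)
    );

    // Returns the type of the first matching rule, or the default from the last table row.
    static FigureType classify(String captionAndContext) {
        if (captionAndContext != null) {
            for (Rule rule : RULES) {
                if (rule.pattern().matcher(captionAndContext).find()) {
                    return rule.type();
                }
            }
        }
        return FigureType.ANATOMICAL_DIAGRAM;
    }
}
```

Because the rows are checked in order, a caption such as "Intraoperative photograph ..." resolves to `INTRAOPERATIVE_IMAGE` rather than `SURGICAL_PHOTOGRAPH` under this reading.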

### Chunk–figure join record

```java
// Maps to `chunk_figure_ref` table
public record ChunkFigureRef(
    UUID chunkId,
    String figureId,
    int mentionPage
) {}
```

---

## Vector Store Documents

All documents in `vector_store` carry a `metadata` JSON column with a `type` field for filtering.

### Text chunk document

| Field | Value |
|-------|-------|
| `content` | chunk text (400–600 tokens) |
| `metadata.type` | `"TEXT"` |
| `metadata.book_id` | book UUID |
| `metadata.book_title` | book title string |
| `metadata.chapter_id` | chapter ID string |
| `metadata.section_id` | section ID string |
| `metadata.section_title` | section title string |
| `metadata.page_start` | int |
| `metadata.page_end` | int |
| `metadata.chunk_index` | int (0-based) |
| `metadata.total_chunks` | int |

### Figure caption document

| Field | Value |
|-------|-------|
| `content` | vision-generated description + caption text |
| `metadata.type` | `"FIGURE"` |
| `metadata.book_id` | book UUID |
| `metadata.book_title` | book title string |
| `metadata.chapter_id` | chapter ID string |
| `metadata.section_id` | section ID string |
| `metadata.figure_id` | figure ID string |
| `metadata.figure_type` | enum name string |
| `metadata.image_path` | relative file path |
| `metadata.label` | caption label e.g. `"Fig. 12-4"` |
| `metadata.page` | int |

---

## File Store Layout

```
uploads/
└── figures/
    └── {bookId}/
        ├── {figureId}.png
        └── ...
```

- Base path configurable via `app.figure-storage.base-path` (default: `./uploads`)
- Files are served via `GET /api/v1/figures/{bookId}/{filename}` (static resource mapping)
- Gitignored; not version-controlled

---

## State Transitions

Book processing extends the existing `BookStatus` state machine:

```
PENDING → PROCESSING → READY
                     ↘ FAILED
```

During `PROCESSING`:

1. Parse PDF structure → extract chapters/sections → persist to Postgres
2. Split sections into text chunks → embed → write to vector_store
3. Extract images per page → filter by min size → save PNG → generate vision description → embed caption → write figure to Postgres + vector_store
4. Write `chunk_figure_ref` rows for all detected figure references in text

Failure at step 3 (individual page) → log + skip that page's images; continue.
Failure at any other step → set `BookStatus.FAILED`.
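
The per-page failure rule for step 3 can be sketched as a loop that isolates each page in its own `try`/`catch`. This is a minimal illustration under stated assumptions: `PageImageExtractionSketch`, `Outcome`, and the `extractPage` callback are hypothetical names, the callback stands in for the real extract/save/describe/embed chain, and failures are assumed to surface as unchecked exceptions. Failures in other steps are deliberately not caught here, since they are meant to fail the whole book.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

// Hypothetical sketch of step 3: a failing page is logged and skipped, the loop continues.
final class PageImageExtractionSketch {

    record Outcome(List<String> savedFigures, List<Integer> skippedPages) {}

    // Runs the per-page extractor for pages 1..pageCount and collects the results.
    static Outcome extractAll(int pageCount, IntFunction<List<String>> extractPage) {
        List<String> saved = new ArrayList<>();
        List<Integer> skipped = new ArrayList<>();
        for (int page = 1; page <= pageCount; page++) {
            try {
                saved.addAll(extractPage.apply(page));
            } catch (RuntimeException e) {
                // Log + skip this page's images; the book itself is not marked FAILED.
                System.err.println("Image extraction failed on page " + page + ": " + e.getMessage());
                skipped.add(page);
            }
        }
        return new Outcome(saved, skipped);
    }
}
```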

---

## Retrieval Result Structure

```java
public record RetrievalResult(
    List<SectionNode> parentSections,   // expanded full-text context
    List<Document> figureVectorHits,    // semantic figure matches
    List<FigureNode> linkedFigures      // figures explicitly referenced in text chunks
) {}
```

The `NeurosurgeryRetriever` service deduplicates figures across both lists before passing
the result to the LLM prompt builder.
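
One way to picture the deduplication is to reduce both lists to figure IDs and merge them in an insertion-ordered set. The sketch below is an assumption-laden illustration, not the `NeurosurgeryRetriever` code: the class name is hypothetical, the IDs are assumed to come from `metadata.figure_id` for vector hits and from `FigureNode.figureId()` for linked figures, and the precedence rule (semantic hits first) is a choice made here because the documents do not specify one.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: merges figure IDs from both sources, keeping the first occurrence.
final class FigureDedupSketch {

    // Semantic hits come first; linked figures are appended only if not already present.
    static List<String> mergeFigureIds(List<String> vectorHitIds, List<String> linkedIds) {
        Set<String> merged = new LinkedHashSet<>(vectorHitIds);
        merged.addAll(linkedIds);
        return new ArrayList<>(merged);
    }
}
```

For example, merging `["fig-12-4", "fig-12-5"]` with `["fig-12-5", "fig-12-6"]` keeps each figure once and preserves the order in which it was first seen.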
@@ -0,0 +1,105 @@
# Implementation Plan: Enhanced Embedding with Image Parsing and Metadata

**Branch**: `002-image-aware-embedding` | **Date**: 2026-04-03 | **Spec**: [spec.md](spec.md)
**Input**: Feature specification from `/specs/002-image-aware-embedding/spec.md`

## Summary

Enhance the book embedding pipeline to extract images from every PDF page, generate descriptive
text for each image, and store all content (text chunks + figure captions) with rich, consistent
metadata in the vector store. A new document hierarchy (Book → Chapter → Section → TextChunk +
Figure) is introduced. Postgres holds the full-text sections and figure metadata; the vector
store holds chunk and figure caption embeddings; the local file store holds extracted image files.
At query time, both the text-chunk store and figure-caption store are searched in parallel and
results are merged before being sent to the LLM.
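
The parallel query-time search could look roughly like the following, using two `CompletableFuture`s and a separate top-K limit per modality. Everything named here is illustrative: `DualSearchSketch`, `Merged`, and the two `BiFunction` parameters are stand-ins for whatever filtered vector-store searches the backend uses, and hits are plain strings only to keep the example self-contained.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.BiFunction;

// Hypothetical sketch: runs the text search and the figure search concurrently.
final class DualSearchSketch {

    record Merged(List<String> textHits, List<String> figureHits) {}

    static Merged search(String query, int textTopK, int figureTopK,
                         BiFunction<String, Integer, List<String>> textSearch,
                         BiFunction<String, Integer, List<String>> figureSearch) {
        CompletableFuture<List<String>> text =
                CompletableFuture.supplyAsync(() -> textSearch.apply(query, textTopK));
        CompletableFuture<List<String>> figures =
                CompletableFuture.supplyAsync(() -> figureSearch.apply(query, figureTopK));
        // join() waits for both searches; a failure in either surfaces as CompletionException.
        return new Merged(text.join(), figures.join());
    }
}
```

Keeping the two hit lists separate in the merged result, rather than interleaving them by score, mirrors the shape of a result object with distinct text and figure fields; whether scores are comparable across modalities is left open here.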

## Technical Context

**Language/Version**: Java 25 (backend), TypeScript / Node 20 (frontend)
**Primary Dependencies**: Spring Boot 4.0.5, Spring AI 2.0.0-M4, OpenAI API (embeddings + chat), PDFBox (via Spring AI PDF reader dependency)
**Storage**: PostgreSQL (JPA + Flyway), pgvector (Spring AI `VectorStore`), local file system (extracted images in `/uploads/figures/`)
**Testing**: Spring Boot Test, JUnit 5, Mockito
**Target Platform**: Linux server (Docker Compose)
**Project Type**: Web application: backend REST API + Vue 3 frontend
**Performance Goals**: Full book (up to 500 pages with images) processed in ≤ 30 minutes; query response unchanged from existing baseline
**Constraints**: No new deployable units; all changes within the existing `backend/` module; image storage on local disk (S3 migration is a future concern, behind an interface)
**Scale/Scope**: POC: <10 concurrent users; single shared book library

## Constitution Check

*GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.*

| Principle | Status | Notes |
|-----------|--------|-------|
| I: KISS | ⚠️ Justified violation (see Complexity Tracking) | Hierarchical model + dual search adds complexity; justified by precision requirement |
| II: Easy to Change | ✅ | Figure storage wrapped behind `FigureStorageService` interface; can swap local disk for S3 |
| III: Web-First | ✅ | All new capabilities exposed via existing REST API; no new deployable units |
| IV: Docs as Architecture | ⚠️ Required | README Mermaid diagram MUST be updated in this PR to show new storage tiers |

## Project Structure

### Documentation (this feature)

```text
specs/002-image-aware-embedding/
├── plan.md              # This file
├── research.md          # Phase 0 output
├── data-model.md        # Phase 1 output
├── quickstart.md        # Phase 1 output
├── contracts/           # Phase 1 output
└── tasks.md             # Phase 2 output (/speckit.tasks)
```

### Source Code (repository root)

```text
backend/
├── src/main/java/com/aiteacher/
│   ├── book/
│   │   ├── Book.java                        (existing)
│   │   ├── BookController.java              (existing)
│   │   ├── BookService.java                 (existing)
│   │   ├── BookRepository.java              (existing)
│   │   ├── BookStatus.java                  (existing)
│   │   ├── BookEmbeddingService.java        (existing, enhanced)
│   │   └── NoKnowledgeSourceException.java  (existing)
│   ├── document/                            (new package)
│   │   ├── BookNode.java
│   │   ├── ChapterNode.java
│   │   ├── SectionNode.java
│   │   ├── SectionRepository.java
│   │   ├── TextChunkNode.java
│   │   ├── FigureNode.java
│   │   ├── FigureRepository.java
│   │   ├── FigureType.java
│   │   ├── ChunkFigureRef.java
│   │   └── ChunkFigureRefRepository.java
│   ├── figure/                              (new package)
│   │   ├── FigureStorageService.java        (interface)
│   │   └── LocalFigureStorageService.java   (implementation)
│   ├── retrieval/                           (new package)
│   │   └── NeurosurgeryRetriever.java
│   ├── chat/
│   │   └── ChatService.java                 (updated: uses NeurosurgeryRetriever)
│   └── config/
│       └── FigureStorageConfig.java         (new: configures upload dir)
└── src/main/resources/
    └── db/migration/
        ├── V4__document_hierarchy.sql       (new)
        └── V5__figures_and_refs.sql         (new)

uploads/
└── figures/                                 (runtime: extracted images; gitignored)
```

**Structure Decision**: Option 2 (Web Application) confirmed. All backend changes stay within
`backend/`. Two new packages (`document/`, `retrieval/`) plus one interface package (`figure/`)
keep concerns separated without adding a deployable unit.

## Complexity Tracking

| Violation | Why Needed | Simpler Alternative Rejected Because |
|-----------|------------|-------------------------------------|
| Document hierarchy (BookNode → ChapterNode → SectionNode) | Parent-child retrieval: chunks reference their parent section so the LLM receives full section context, not just the matching fragment. This is the established solution for RAG precision. | Flat page-per-doc model (current) loses inter-sentence context; chunk-only retrieval produces incomplete answers for multi-paragraph clinical questions |
| Dual vector search (text chunks + figure captions) | Figure captions must be independently searchable: a query about "cavernous sinus anatomy" must surface the diagram even if no text chunk scores highly | Single vector store search would miss figures whose captions don't happen to be the highest-similarity hit; this is the core deliverable of the feature |
| Third storage tier (local file store for images) | Extracted images cannot live in Postgres (binary blobs degrade query performance) or the vector store (only vectors). A file-per-image approach is standard. | Storing images as base64 in Postgres JSONB would bloat the DB and complicate backup/restore; the `FigureStorageService` interface keeps the implementation swappable |
@@ -0,0 +1,86 @@
# Quickstart: Enhanced Embedding with Image Parsing and Metadata

**Branch**: `002-image-aware-embedding` | **Date**: 2026-04-03

---

## Prerequisites

- Docker Compose running (PostgreSQL + pgvector)
- OpenAI API key set in `backend/src/main/resources/application.properties` or as env var `OPENAI_API_KEY`
- Java 25 + Maven on PATH

---

## New Configuration

Add to `backend/src/main/resources/application.properties`:

```properties
# Figure storage
app.figure-storage.base-path=./uploads
app.figure-storage.min-image-size-px=100
```

The `uploads/figures/` directory is created automatically on first use. Add it to `.gitignore`.

---

## Database Migration

Two new Flyway migrations run automatically on startup:

- `V4__document_hierarchy.sql`: adds `chapter` and `section` tables
- `V5__figures_and_refs.sql`: adds `figure` and `chunk_figure_ref` tables

No manual DB setup needed.

---

## Re-embedding Existing Books

Books embedded by feature 001 (text-only) remain functional for text queries. To add image
support, trigger a re-embed:

```bash
curl -X POST http://localhost:8080/api/v1/books/{bookId}/reembed \
  -u admin:password
```

The book transitions to `PROCESSING`, old chunks and figures are deleted, and the new
image-aware pipeline runs. Status can be polled via `GET /api/v1/books`.

---

## Verifying Image Extraction

1. Upload a PDF with diagrams: `POST /api/v1/books/upload`
2. Wait for `status: "READY"` via `GET /api/v1/books`
3. List figures: `GET /api/v1/books/{id}/figures`; this should return at least one entry per image page
4. Ask a diagram-specific question in chat; the response `sources` should include a `type: "FIGURE"` entry

---
## Frontend: Rendering Inline Figures
|
||||
|
||||
The assistant message `content` field will contain figure references in the format
|
||||
`[Fig. 12-4, p.184]`. The frontend should:
|
||||
|
||||
1. Parse `[Fig. X, p.N]` patterns in assistant message text
|
||||
2. Look up the matching entry in `sources` where `type === "FIGURE"`
|
||||
3. Render the figure inline using the `imageUrl` field
|
||||
|
||||
---
|
||||
|
||||
## Running Tests
|
||||
|
||||
```bash
cd backend
mvn test
```
|
||||
|
||||
Key new test classes:
|
||||
- `FigureExtractionServiceTest` — unit tests for image extraction and classification
|
||||
- `NeurosurgeryRetrieverTest` — unit tests for dual-search merge and deduplication
|
||||
- `BookEmbeddingServiceIntegrationTest` — integration test: upload PDF with known figures,
|
||||
verify figures appear in `GET /api/v1/books/{id}/figures`
|
||||
@@ -0,0 +1,188 @@
|
||||
# Research: Enhanced Embedding with Image Parsing and Metadata
|
||||
|
||||
**Branch**: `002-image-aware-embedding` | **Date**: 2026-04-03
|
||||
|
||||
This document resolves all technical unknowns identified during planning. The primary source for
|
||||
decisions is the detailed architecture provided directly by the project owner, supplemented by
|
||||
Spring AI 2.0.0-M4 API specifics.
|
||||
|
||||
---
|
||||
|
||||
## Decision 1: Document Hierarchy Model
|
||||
|
||||
**Decision**: Adopt a four-level hierarchy — `BookNode` → `ChapterNode` → `SectionNode` →
|
||||
`TextChunkNode` + `FigureNode`. The `SectionNode` is the pivotal unit: it holds the full section
|
||||
text in Postgres and is used for parent-child context expansion at retrieval time.
|
||||
|
||||
**Rationale**: A flat page-per-document model (current implementation) loses structural context.
|
||||
When a user asks a multi-faceted clinical question, the LLM needs the surrounding section text,
|
||||
not just the matching fragment. Parent-child retrieval — where chunks point to their parent
|
||||
section — is the established pattern for RAG precision. The hierarchy also makes figure-to-section
|
||||
association explicit and queryable.
|
||||
|
||||
**Alternatives considered**:
|
||||
- Keep flat page model, add metadata only → rejected: insufficient for precise citation and
|
||||
context expansion
|
||||
- Chapter-level retrieval (coarser than section) → rejected: too much irrelevant context sent
|
||||
to LLM; cost and latency increase
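
The parent-child expansion described in this decision can be sketched in plain Java. This is a minimal illustration under assumptions: the `SectionNode` and `TextChunkNode` records and the `expandToSections` helper are hypothetical stand-ins, not the project's entities, and an in-memory map replaces the Postgres lookup.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-ins for the section and chunk nodes; not the project's entities.
record SectionNode(String id, String title, String fullText) {}
record TextChunkNode(String id, String sectionId, String text) {}

class ParentChildExpansion {
    // Maps retrieved chunks to their distinct parent sections, keeping retrieval order.
    static List<SectionNode> expandToSections(List<TextChunkNode> hits,
                                              Map<String, SectionNode> sectionsById) {
        Set<String> sectionIds = new LinkedHashSet<>();
        for (TextChunkNode hit : hits) {
            sectionIds.add(hit.sectionId());
        }
        List<SectionNode> sections = new ArrayList<>();
        for (String sectionId : sectionIds) {
            SectionNode section = sectionsById.get(sectionId);
            if (section != null) { // skip chunks whose parent section is missing
                sections.add(section);
            }
        }
        return sections;
    }
}
```

Two chunks from the same section yield that section once, so the LLM prompt does not repeat the same section text.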
|
||||
|
||||
---
|
||||
|
||||
## Decision 2: Image Extraction Strategy
|
||||
|
||||
**Decision**: Use PDFBox (already on classpath via `spring-ai-pdf-document-reader`) to extract
|
||||
images per page. Each image is tagged with `page`, `figure_id` (derived from caption, e.g.
|
||||
"Fig. 12-4"), and the parent `sectionId`. Images are saved to local disk under
|
||||
`./uploads/figures/{bookId}/`.
|
||||
|
||||
**Rationale**: PDFBox is already present (Spring AI bundles it). No new dependency needed.
|
||||
Per-page extraction ensures every image is captured regardless of PDF structure.
|
||||
|
||||
**Alternatives considered**:
|
||||
- iText / iText7 → additional commercial dependency; overkill for extraction
|
||||
- Screenshot each page as PNG, then OCR → far slower; loses vector quality
|
||||
|
||||
---
|
||||
|
||||
## Decision 3: Figure Content Representation
|
||||
|
||||
**Decision**: Generate a textual description of each extracted image using the OpenAI vision
|
||||
model (GPT-4o). This description becomes the `content` field of the figure's vector store
|
||||
document. The figure caption (parsed from the surrounding text) is also included to maximise
|
||||
retrieval signal.
|
||||
|
||||
**Rationale**: Caption-only embedding would miss figures with no caption or with sparse labels.
|
||||
Vision-generated descriptions produce richer semantic content (anatomy terms, structural
|
||||
relationships) that matches clinical queries. The OpenAI client already in use supports image
|
||||
inputs; no additional dependency is required.
|
||||
|
||||
**Alternatives considered**:
|
||||
- Caption-only embedding → insufficient when captions are absent or terse (common in textbooks)
|
||||
- Local vision model (LLaVA) → requires self-hosting; out of scope for POC
|
||||
- OCR only → extracts text visible in image but misses non-text visual content (diagrams, MRI)
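
How the caption and the generated description could be combined into the embedded `content` is sketched below. It is an illustration under assumptions: `FigureContentBuilder` and the `describer` hook are hypothetical names, the hook stands in for the vision-model call, and the caption-only fallback follows the failure handling listed for task T015.

```java
import java.nio.file.Path;
import java.util.function.Function;

class FigureContentBuilder {
    // Builds the text to embed for a figure: caption plus a generated description.
    // `describer` is a hypothetical hook standing in for the vision-model call.
    static String buildContent(Path image, String caption, Function<Path, String> describer) {
        String safeCaption = caption == null ? "" : caption.trim();
        String description;
        try {
            description = describer.apply(image);
        } catch (RuntimeException e) {
            // Assumed failure policy: fall back to the caption alone.
            description = null;
        }
        if (description == null || description.isBlank()) {
            return safeCaption;
        }
        String trimmed = description.trim();
        return safeCaption.isEmpty() ? trimmed : safeCaption + "\n" + trimmed;
    }
}
```

Passing the model call in as a function keeps the sketch testable without network access; the real service would wire in its own client.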
|
||||
|
||||
---
|
||||
|
||||
## Decision 4: Dual Vector Search
|
||||
|
||||
**Decision**: At query time, run two parallel similarity searches:
|
||||
1. Text chunk search (filtered by `type = "TEXT"` and `book_id`)
|
||||
2. Figure caption search (filtered by `type = "FIGURE"` and `book_id`)
|
||||
|
||||
Results are merged and deduplicated. The LLM prompt receives the expanded parent section text
|
||||
plus a structured figure reference list.
|
||||
|
||||
**Rationale**: A single search would rank text and figures against each other; figures with
|
||||
terse captions would systematically lose to text chunks. Separate searches with independent
|
||||
`topK` allow tuning each modality independently.
|
||||
|
||||
**Alternatives considered**:
|
||||
- Single search, filter by relevance score → figure captions score lower than text; figures
|
||||
are systematically under-retrieved
|
||||
- Post-process text results to look up linked figures only → misses figures that are relevant
|
||||
to the query but not explicitly referenced in the retrieved text chunks
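
The effect of independent `topK` values can be shown with a small merge sketch. The `Hit` record and `DualSearchMerge` class are hypothetical and operate on already-fetched result lists; the actual vector store calls are left out, and the sketch assumes a higher score means a closer match.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical search hit; `type` mirrors the metadata values "TEXT" and "FIGURE".
record Hit(String id, String type, double score) {}

class DualSearchMerge {
    // Keeps the best `topK` hits of one modality, assuming higher score means more similar.
    static List<Hit> top(List<Hit> hits, int topK) {
        return hits.stream()
                .sorted((a, b) -> Double.compare(b.score(), a.score()))
                .limit(topK)
                .toList();
    }

    // Merges both result lists; a document id that appears twice is kept once.
    static List<Hit> merge(List<Hit> textHits, int textTopK,
                           List<Hit> figureHits, int figureTopK) {
        Map<String, Hit> byId = new LinkedHashMap<>();
        for (Hit hit : top(textHits, textTopK)) {
            byId.putIfAbsent(hit.id(), hit);
        }
        for (Hit hit : top(figureHits, figureTopK)) {
            byId.putIfAbsent(hit.id(), hit);
        }
        return new ArrayList<>(byId.values());
    }
}
```

Because each modality is cut off separately, a figure scoring 0.3 still reaches the prompt even when every text hit scores higher, which a single ranked search would not guarantee.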
|
||||
|
||||
---
|
||||
|
||||
## Decision 5: Chunk-to-Figure Linking
|
||||
|
||||
**Decision**: During text parsing, whenever a pattern matching `Fig\.\s*\d+[\-\.]\d+` or
`Figure\s+\d+[\-\.]\d+` is found in a chunk, insert a row into the `chunk_figure_ref` table
linking `chunkId` → `figureId`. At retrieval time, after text chunks are retrieved, their
associated figures are fetched from this table and added to the LLM prompt.
|
||||
|
||||
**Rationale**: Explicit linking ensures that when a text chunk is retrieved, its referenced
|
||||
figures are always surfaced — even if the figure's caption did not score highly in the vector
|
||||
search. This is the higher-recall path; dual search (Decision 4) is the higher-precision path.
|
||||
|
||||
**Alternatives considered**:
|
||||
- Rely entirely on dual vector search → may miss figures referenced in retrieved text but
|
||||
scoring below the topK threshold in the figure search
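
A minimal sketch of the reference scan, assuming the escaped `Fig\.\s*` form of the pattern given in task T017. `FigureRefExtractor` is a hypothetical helper; matching the extracted numbers against stored figure labels and writing the link rows are not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FigureRefExtractor {
    // Matches "Fig. 12-4", "Fig.12.4" and "Figure 12-4"; group 1 is the figure number.
    private static final Pattern FIGURE_REF =
            Pattern.compile("(?:Fig\\.\\s*|Figure\\s+)(\\d+[\\-.]\\d+)");

    // Returns the distinct figure numbers mentioned in a chunk, in order of first mention.
    static List<String> extractNumbers(String chunkText) {
        List<String> numbers = new ArrayList<>();
        Matcher matcher = FIGURE_REF.matcher(chunkText);
        while (matcher.find()) {
            String number = matcher.group(1);
            if (!numbers.contains(number)) {
                numbers.add(number);
            }
        }
        return numbers;
    }
}
```

Capturing only the number lets `Fig. 12-4` and `Figure 12-4` resolve to the same figure regardless of which spelling the text uses.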
|
||||
|
||||
---
|
||||
|
||||
## Decision 6: Image Storage
|
||||
|
||||
**Decision**: Extracted images are saved as PNG files under
`${app.figure-storage.base-path}/figures/{bookId}/` (with the default base path `./uploads`,
that is `./uploads/figures/{bookId}/`). The path is stored in `figure.image_path` in Postgres.
A `FigureStorageService` interface wraps all disk I/O so the implementation can be swapped to
S3 or another object store without changing callers.
|
||||
|
||||
**Rationale**: Local disk is the simplest viable option for a POC with <10 users. The interface
|
||||
boundary satisfies Constitution Principle II (Easy to Change).
|
||||
|
||||
**Alternatives considered**:
|
||||
- S3 from day 1 → operational overhead not justified at POC scale
|
||||
- Base64 in Postgres JSONB → bloats DB; complicates backup; query performance degrades
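
The interface boundary might look like the sketch below. The three method signatures follow task T006 and the `figures/{bookId}/` layout follows task T007; the class name `LocalFigureStorageSketch`, the file-naming rule, and the error handling are assumptions made for illustration.

```java
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.UUID;
import java.util.stream.Stream;
import javax.imageio.ImageIO;

// Method signatures follow task T006; everything else in this sketch is an assumption.
interface FigureStorageService {
    Path save(UUID bookId, String figureId, BufferedImage image);
    Path resolve(UUID bookId, String filename);
    void delete(UUID bookId);
}

class LocalFigureStorageSketch implements FigureStorageService {
    private final Path basePath;

    LocalFigureStorageSketch(Path basePath) {
        this.basePath = basePath;
    }

    @Override
    public Path save(UUID bookId, String figureId, BufferedImage image) {
        // Assumed naming rule: replace characters that are unsafe in file names.
        String filename = figureId.replaceAll("[^A-Za-z0-9_-]", "_") + ".png";
        Path target = resolve(bookId, filename);
        try {
            Files.createDirectories(target.getParent());
            ImageIO.write(image, "png", target.toFile());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return target;
    }

    @Override
    public Path resolve(UUID bookId, String filename) {
        return basePath.resolve("figures").resolve(bookId.toString()).resolve(filename);
    }

    @Override
    public void delete(UUID bookId) {
        Path dir = basePath.resolve("figures").resolve(bookId.toString());
        if (!Files.exists(dir)) {
            return;
        }
        // Delete children before parents by walking in reverse order.
        try (Stream<Path> paths = Files.walk(dir)) {
            paths.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Callers depend only on `FigureStorageService`, so an object-store implementation could replace the local one without touching the extraction pipeline.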
|
||||
|
||||
---
|
||||
|
||||
## Decision 7: Figure Type Classification
|
||||
|
||||
**Decision**: Use the enum `FigureType { ANATOMICAL_DIAGRAM, SURGICAL_PHOTOGRAPH, MRI_CT_SCAN,
|
||||
TABLE, CHART, INTRAOPERATIVE_IMAGE }`. Classification is derived from:
|
||||
1. Caption keywords ("MRI", "CT", "Fig.", "Table") — heuristic, no model needed
|
||||
2. Fall back to `ANATOMICAL_DIAGRAM` if unclassifiable
|
||||
|
||||
**Rationale**: Allows the frontend to render different icon/label per type (e.g., "MRI" badge).
|
||||
Heuristic classification avoids a separate model call per image at extraction time.
|
||||
|
||||
**Alternatives considered**:
|
||||
- Vision model classification → accurate but adds latency and cost per figure; deferrable
|
||||
- Single `FIGURE` type → loses citation granularity required by spec FR-004
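
A caption-keyword heuristic with the documented fallback could be sketched as follows. The enum values come from this decision; the specific keyword rules in `FigureTypeClassifier` are illustrative assumptions, since the actual keyword table lives in data-model.md and is not reproduced here.

```java
import java.util.Locale;
import java.util.regex.Pattern;

enum FigureType {
    ANATOMICAL_DIAGRAM, SURGICAL_PHOTOGRAPH, MRI_CT_SCAN, TABLE, CHART, INTRAOPERATIVE_IMAGE
}

class FigureTypeClassifier {
    // Whole-word match so that words merely containing "ct" (e.g. "section") do not count as CT.
    private static final Pattern SCAN = Pattern.compile("\\b(mri|ct)\\b");

    // Keyword rules are illustrative assumptions; the real table lives in data-model.md.
    static FigureType classify(String caption) {
        if (caption == null || caption.isBlank()) {
            return FigureType.ANATOMICAL_DIAGRAM; // documented fallback
        }
        String text = caption.toLowerCase(Locale.ROOT);
        if (SCAN.matcher(text).find()) {
            return FigureType.MRI_CT_SCAN;
        }
        if (text.startsWith("table")) {
            return FigureType.TABLE;
        }
        if (text.contains("intraoperative")) {
            return FigureType.INTRAOPERATIVE_IMAGE;
        }
        return FigureType.ANATOMICAL_DIAGRAM;
    }
}
```

The word-boundary check matters for short keywords: a plain substring test for "ct" would misclassify any caption containing "section" or "structure".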
|
||||
|
||||
---
|
||||
|
||||
## Decision 8: Metadata Schema for Vector Store Documents
|
||||
|
||||
**Decision**: All vector store documents carry a flat `Map<String, Object>` metadata for Spring
|
||||
AI filtering. Schema:
|
||||
|
||||
| Field | Text Chunk | Figure Chunk |
|-------|-----------|-------------|
| `type` | `"TEXT"` | `"FIGURE"` |
| `book_id` | ✓ | ✓ |
| `book_title` | ✓ | ✓ |
| `chapter_id` | ✓ | ✓ |
| `section_id` | ✓ | ✓ |
| `section_title` | ✓ | ✓ |
| `page_start` | ✓ | — |
| `page_end` | ✓ | — |
| `chunk_index` | ✓ | — |
| `total_chunks` | ✓ | — |
| `figure_id` | — | ✓ |
| `figure_type` | — | ✓ |
| `image_path` | — | ✓ |
| `label` | — | ✓ |
| `page` | — | ✓ |
|
||||
|
||||
**Rationale**: Flat map is required by Spring AI `FilterExpressionBuilder`. Separation by `type`
|
||||
allows independent filtering in dual search.
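
The two metadata shapes can be built as flat maps, as in the sketch below. The field names are taken from the schema table; `ChunkMetadataSketch`, its method names, and the choice of `String` and `int` value types are assumptions for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ChunkMetadataSketch {
    // Fields shared by text and figure documents, per the schema table.
    private static Map<String, Object> common(String type, String bookId, String bookTitle,
                                              String chapterId, String sectionId,
                                              String sectionTitle) {
        Map<String, Object> meta = new LinkedHashMap<>();
        meta.put("type", type);
        meta.put("book_id", bookId);
        meta.put("book_title", bookTitle);
        meta.put("chapter_id", chapterId);
        meta.put("section_id", sectionId);
        meta.put("section_title", sectionTitle);
        return meta;
    }

    static Map<String, Object> textChunk(String bookId, String bookTitle, String chapterId,
                                         String sectionId, String sectionTitle,
                                         int pageStart, int pageEnd,
                                         int chunkIndex, int totalChunks) {
        Map<String, Object> meta =
                common("TEXT", bookId, bookTitle, chapterId, sectionId, sectionTitle);
        meta.put("page_start", pageStart);
        meta.put("page_end", pageEnd);
        meta.put("chunk_index", chunkIndex);
        meta.put("total_chunks", totalChunks);
        return meta;
    }

    static Map<String, Object> figureChunk(String bookId, String bookTitle, String chapterId,
                                           String sectionId, String sectionTitle,
                                           String figureId, String figureType,
                                           String imagePath, String label, int page) {
        Map<String, Object> meta =
                common("FIGURE", bookId, bookTitle, chapterId, sectionId, sectionTitle);
        meta.put("figure_id", figureId);
        meta.put("figure_type", figureType);
        meta.put("image_path", imagePath);
        meta.put("label", label);
        meta.put("page", page);
        return meta;
    }
}
```

Every value is a plain string or number with no nested objects, which keeps each field usable as a filter key.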
|
||||
|
||||
---
|
||||
|
||||
## Decision 9: Re-embedding Existing Books
|
||||
|
||||
**Decision**: Books already processed under feature 001 (text-only) are NOT automatically
|
||||
re-embedded. An explicit re-embed action is exposed via `POST /api/v1/books/{id}/reembed`
|
||||
(admin-triggered). The existing chunks remain valid for text queries until re-embedding completes.
|
||||
|
||||
**Rationale**: Automatic re-embedding on deploy would block the system and risk data loss if
|
||||
the process fails mid-way. An explicit, idempotent trigger is safer and more observable.
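
The trigger's response rules can be written as a small decision function. The status codes follow the `reembed` endpoint description in task T020; `BookStatus` and `ReembedDecision` are hypothetical names, and only the two states mentioned in this document are modelled.

```java
import java.util.Optional;

// Hypothetical status enum; only the states named in this document are listed.
enum BookStatus { PROCESSING, READY }

class ReembedDecision {
    // Returns the HTTP status for a re-embed request given the book's current state.
    static int responseCode(Optional<BookStatus> currentStatus) {
        if (currentStatus.isEmpty()) {
            return 404; // book not found
        }
        if (currentStatus.get() == BookStatus.PROCESSING) {
            return 409; // a run is already in progress; do not start a second one
        }
        return 202; // accepted; the pipeline runs asynchronously
    }
}
```

Rejecting a request while the book is `PROCESSING` is what keeps a repeated trigger from starting overlapping runs.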
|
||||
|
||||
---
|
||||
|
||||
## Decision 10: Minimum Image Size Threshold
|
||||
|
||||
**Decision**: Images smaller than 100×100 pixels are discarded and no chunk is created. This
|
||||
threshold filters out decorative elements (bullets, dividers, publisher logos) without a
|
||||
classification model.
|
||||
|
||||
**Rationale**: Neurosurgery textbook diagrams and MRI scans are expected to be larger than 100×100 px.
|
||||
The threshold is configurable via `app.figure-storage.min-image-size-px` in
|
||||
`application.properties`.
|
||||
|
||||
**Alternatives considered**:
|
||||
- No threshold → decorative icons pollute the figure index
|
||||
- ML-based classification → accurate but adds model dependency; not needed at POC scale
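
The size check itself is a one-line predicate. The sketch below assumes the reading used in task T014, where an image is skipped if either its width or its height falls below the threshold; `ImageSizeFilter` is a hypothetical name.

```java
class ImageSizeFilter {
    // True when the image should be kept: both sides must reach the minimum.
    static boolean isLargeEnough(int widthPx, int heightPx, int minSizePx) {
        return widthPx >= minSizePx && heightPx >= minSizePx;
    }
}
```

With the default of 100, an image of exactly 100×100 px is kept, while a 99×300 px divider is discarded.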
|
||||
@@ -0,0 +1,176 @@
|
||||
# Feature Specification: Enhanced Embedding with Image Parsing and Metadata
|
||||
|
||||
**Feature Branch**: `002-image-aware-embedding`
|
||||
**Created**: 2026-04-03
|
||||
**Status**: Draft
|
||||
**Input**: User description: "I want to enhance the embedding process. I want also parse image from each pages if any and add proper metadata so that it can match the retrieved chunk/vector that match what user are querying."
|
||||
|
||||
## User Scenarios & Testing *(mandatory)*
|
||||
|
||||
### User Story 1 - Image Content Surfaced in Query Results (Priority: P1)
|
||||
|
||||
A neurosurgeon asks a question in the chat (e.g., "Show me the anatomy of the Circle of Willis")
|
||||
that is best answered by a diagram or figure in an uploaded book. The system retrieves the image
|
||||
content — its description and surrounding context — and uses it to construct a grounded answer,
|
||||
citing the page and book where the image appeared.
|
||||
|
||||
**Why this priority**: This is the direct, user-visible payoff of the feature. Without it, the
|
||||
enhancement has no observable benefit. All other stories support this outcome.
|
||||
|
||||
**Independent Test**: Upload a book containing a labelled anatomical diagram. Ask a query whose
|
||||
answer is conveyed by that diagram (not in the surrounding text). Confirm the system returns an
|
||||
answer that references the diagram's content and cites the correct book and page.
|
||||
|
||||
**Acceptance Scenarios**:
|
||||
|
||||
1. **Given** a book with an anatomical diagram on page 42, **When** a user asks a question whose
|
||||
answer is only depicted in that diagram, **Then** the system returns a response that draws on
|
||||
the diagram's content and cites "Page 42, [Book Title]".
|
||||
2. **Given** a page with both text and an image, **When** the system retrieves that page's content,
|
||||
**Then** the image-derived content and the surrounding text are each independently retrievable
|
||||
and independently citable.
|
||||
3. **Given** a query that has no relevant image in any uploaded book, **When** the system searches,
|
||||
**Then** it does not fabricate image-derived content and falls back to text-only results (or
|
||||
states no relevant content was found).
|
||||
|
||||
---
|
||||
|
||||
### User Story 2 - All Pages Scanned for Images During Embedding (Priority: P1)
|
||||
|
||||
When a book is uploaded and processed, every page is inspected for images. Any image found is
|
||||
extracted and represented as a searchable content chunk enriched with metadata (page number,
|
||||
book title, position on page, caption if present). Pages without images are processed as
|
||||
text-only chunks, unchanged from the existing behaviour.
|
||||
|
||||
**Why this priority**: This is the prerequisite for User Story 1. Without systematic per-page
|
||||
image detection, image content cannot be retrieved.
|
||||
|
||||
**Independent Test**: Upload a book whose pages include a mix of text-only and image-containing
|
||||
pages. After processing completes, verify that chunks exist for each image page and that each
|
||||
image chunk carries the correct metadata (page number, source book, caption).
|
||||
|
||||
**Acceptance Scenarios**:
|
||||
|
||||
1. **Given** a book being processed, **When** the embedding pipeline runs, **Then** every page
|
||||
is evaluated for images and each detected image generates at least one content chunk.
|
||||
2. **Given** an image with a caption or label, **When** the chunk is created, **Then** the
|
||||
caption or label text is included in the chunk's content and metadata.
|
||||
3. **Given** a page with multiple images, **When** processing completes, **Then** each image is
|
||||
represented as a separate chunk with its own metadata, not merged into a single chunk.
|
||||
4. **Given** a page with no images, **When** processing completes, **Then** no image chunk is
|
||||
created for that page and text processing is unaffected.
|
||||
|
||||
---
|
||||
|
||||
### User Story 3 - Rich Metadata Enables Precise Source Attribution (Priority: P2)
|
||||
|
||||
When the system returns a result based on image content, the user can see exactly where that
|
||||
image appeared: which book, which page, and what type of content (diagram, table, photograph,
|
||||
etc.). This gives the user confidence in the source and lets them locate the original image
|
||||
in their physical or digital copy of the book.
|
||||
|
||||
**Why this priority**: Metadata quality directly impacts user trust. Neurosurgeons require
|
||||
traceable, citable evidence. Richer metadata also improves retrieval accuracy by giving the
|
||||
search engine more signals to match against a query.
|
||||
|
||||
**Independent Test**: Retrieve a result sourced from an image chunk. Inspect the displayed
|
||||
citation and verify it includes: book title, page number, content type (e.g., "diagram"),
|
||||
and caption (if present in the original).
|
||||
|
||||
**Acceptance Scenarios**:
|
||||
|
||||
1. **Given** a retrieved image chunk, **When** the system displays the source citation,
|
||||
**Then** the citation includes at minimum: book title, page number, and a content-type
|
||||
label (e.g., diagram, table, figure).
|
||||
2. **Given** an image chunk with a detected caption, **When** the citation is displayed,
|
||||
**Then** the caption text is shown alongside the other metadata fields.
|
||||
3. **Given** a topic summary that draws on both text and image chunks, **When** the user
|
||||
inspects citations, **Then** image-sourced and text-sourced claims are distinguishable
|
||||
from each other.
|
||||
|
||||
---
|
||||
|
||||
### Edge Cases
|
||||
|
||||
- What happens when an image is too small to contain meaningful content (e.g., a decorative
|
||||
bullet icon or a publisher logo)?
|
||||
- How does the system handle a page that is entirely an image (scanned page with no digital text)?
|
||||
- What if an image spans multiple pages (e.g., a fold-out diagram)?
|
||||
- How does the system behave when an image has no caption and its surrounding text provides
|
||||
no useful context?
|
||||
- What happens if image processing fails for a specific page — does it abort the whole book
|
||||
or continue with the remaining pages?
|
||||
|
||||
## Requirements *(mandatory)*
|
||||
|
||||
### Functional Requirements
|
||||
|
||||
- **FR-001**: System MUST inspect every page of an uploaded book for the presence of images
|
||||
during the embedding process.
|
||||
- **FR-002**: System MUST extract each detected image and create a dedicated, independently
|
||||
searchable content chunk for it.
|
||||
- **FR-003**: System MUST generate a descriptive textual representation of each extracted
|
||||
image so its content is semantically searchable by the retrieval system.
|
||||
- **FR-004**: System MUST associate the following metadata with every image chunk: book title,
|
||||
page number, content type (e.g., diagram, table, figure, photograph), and caption text
|
||||
(where present).
|
||||
- **FR-005**: System MUST include the same base metadata (book title, page number) on text
|
||||
chunks so that all retrieved content — image or text — carries consistent, comparable
|
||||
source attribution.
|
||||
- **FR-006**: System MUST treat image chunks as first-class retrievable units: they must be
|
||||
ranked and returned alongside text chunks when they are relevant to a user query.
|
||||
- **FR-007**: System MUST skip images that fall below a minimum meaningful-content threshold
|
||||
(e.g., decorative icons, page separators) and MUST NOT create chunks for them.
|
||||
- **FR-008**: If image processing fails for a specific page, the system MUST log the failure,
|
||||
skip that page's image, and continue processing the remaining pages and text content of
|
||||
the book.
|
||||
- **FR-009**: System MUST display image-sourced content citations distinctly from text-sourced
|
||||
citations so users can identify when a result originates from a visual element.
|
||||
- **FR-010**: Processing a book that contains images MUST NOT degrade the accuracy or
|
||||
completeness of the existing text-only embedding for that book.
|
||||
|
||||
### Key Entities
|
||||
|
||||
- **Image Chunk**: A searchable content unit derived from a page image. Attributes: generated
|
||||
description, source book title, page number, content type, caption (optional), embedding vector.
|
||||
- **Text Chunk**: Existing unit; extended to carry explicit metadata: source book title,
|
||||
page number, section heading (if detectable), content type ("text").
|
||||
- **Chunk Metadata**: Structured attributes attached to every chunk regardless of type,
|
||||
enabling consistent filtering and citation. Mandatory fields: book title, page number,
|
||||
content type. Optional fields: caption, section heading.
|
||||
|
||||
## Success Criteria *(mandatory)*
|
||||
|
||||
### Measurable Outcomes
|
||||
|
||||
- **SC-001**: At least 90% of pages containing images in a test book result in a retrievable
|
||||
image chunk after processing completes.
|
||||
- **SC-002**: A controlled set of 10 queries whose answers are conveyed by diagrams in an
|
||||
uploaded book returns at least 7 correct image-sourced answers (70% recall on image queries).
|
||||
- **SC-003**: Embedding processing time for a book with images increases by no more than 3×
|
||||
compared to processing the same book as text-only, for books up to 500 pages.
|
||||
- **SC-004**: Every retrieved result — text or image — includes a citation that identifies
|
||||
at minimum the source book title and page number, with 100% coverage across a test result set.
|
||||
- **SC-005**: In a user evaluation with 5 representative queries that previously returned
|
||||
no useful results (because the answer was only in a diagram), at least 4 now return a
|
||||
useful, grounded answer.
|
||||
|
||||
## Assumptions
|
||||
|
||||
- Books are still uploaded exclusively as PDFs; image parsing applies to PDF pages only.
|
||||
- The platform already has a working text-only embedding pipeline (from feature 001); this
|
||||
feature enhances it without replacing or rewriting the text processing logic.
|
||||
- Images worth processing are those that occupy a meaningful portion of the page; small
|
||||
decorative or structural images (logos, dividers, icons) are excluded based on a size
|
||||
threshold determined during implementation.
|
||||
- The descriptive representation of an image (FR-003) is generated at embedding time, not
|
||||
at query time; query latency is not affected by image interpretation.
|
||||
- The shared global book library model from feature 001 is retained; image chunks from a
|
||||
processed book are available to all users immediately upon completion.
|
||||
- Scanned pages (fully rasterised pages with no digital text layer) are treated as a single
|
||||
full-page image; the system attempts to extract content from them but does not guarantee
|
||||
the same fidelity as pages with digital text.
|
||||
- Per-chunk metadata is stored alongside the vector so it can be used for both retrieval
|
||||
filtering and source citation display without a separate lookup.
|
||||
- Books already processed under feature 001 (text-only) are not automatically re-processed;
|
||||
re-embedding must be triggered explicitly by the user or an administrator.
|
||||
@@ -0,0 +1,168 @@
|
||||
# Tasks: Enhanced Embedding with Image Parsing and Metadata
|
||||
|
||||
**Input**: Design documents from `/specs/002-image-aware-embedding/`
|
||||
**Prerequisites**: plan.md ✓ | spec.md ✓ | research.md ✓ | data-model.md ✓ | contracts/ ✓
|
||||
|
||||
**Organization**: Tasks grouped by user story to enable independent implementation and testing.
|
||||
|
||||
## Format: `[ID] [P?] [Story] Description`
|
||||
|
||||
- **[P]**: Can run in parallel (different files, no shared dependencies)
|
||||
- **[US1/US2/US3]**: Which user story this task belongs to
|
||||
|
||||
---
|
||||
|
||||
## Phase 1: Setup (Shared Infrastructure)
|
||||
|
||||
**Purpose**: Database migrations and configuration that establish the foundation for all new code
|
||||
|
||||
- [X] T001 Create Flyway migration `V4__document_hierarchy.sql` — add `chapter` and `section` tables per data-model.md §Postgres Schema in `backend/src/main/resources/db/migration/V4__document_hierarchy.sql`
|
||||
- [X] T002 Create Flyway migration `V5__figures_and_refs.sql` — add `figure` and `chunk_figure_ref` tables per data-model.md §Postgres Schema in `backend/src/main/resources/db/migration/V5__figures_and_refs.sql`
|
||||
- [X] T003 Add figure-storage configuration keys to `backend/src/main/resources/application.properties`: `app.figure-storage.base-path=./uploads` and `app.figure-storage.min-image-size-px=100`
|
||||
- [X] T004 Add `uploads/` directory to `.gitignore` at repo root; create `uploads/figures/.gitkeep` to preserve directory structure
|
||||
|
||||
---
|
||||
|
||||
## Phase 2: Foundational (Blocking Prerequisites)
|
||||
|
||||
**Purpose**: Core types and infrastructure that ALL user stories depend on — nothing in Phase 3+ can start until this phase is complete
|
||||
|
||||
**⚠️ CRITICAL**: No user story work can begin until this phase is complete
|
||||
|
||||
- [X] T005 [P] Create `FigureType` enum in `backend/src/main/java/com/aiteacher/document/FigureType.java` — values: `ANATOMICAL_DIAGRAM`, `SURGICAL_PHOTOGRAPH`, `MRI_CT_SCAN`, `TABLE`, `CHART`, `INTRAOPERATIVE_IMAGE`
|
||||
- [X] T006 [P] Create `FigureStorageService` interface in `backend/src/main/java/com/aiteacher/figure/FigureStorageService.java` — declare `Path save(UUID bookId, String figureId, BufferedImage image)`, `Path resolve(UUID bookId, String filename)`, and `void delete(UUID bookId)`
|
||||
- [X] T007 Create `LocalFigureStorageService` implementation in `backend/src/main/java/com/aiteacher/figure/LocalFigureStorageService.java` — writes PNG files under `${app.figure-storage.base-path}/figures/{bookId}/`; implements `FigureStorageService`; depends on T006
|
||||
- [X] T008 Create `FigureStorageConfig` bean in `backend/src/main/java/com/aiteacher/config/FigureStorageConfig.java` — reads `app.figure-storage.base-path` and `app.figure-storage.min-image-size-px` as `@ConfigurationProperties`; registers `LocalFigureStorageService` as `@Bean`; adds `ResourceHandler` mapping `GET /api/v1/figures/**` to the `figures/` directory under the base path, so that `/api/v1/figures/{bookId}/{filename}` resolves to the files written by T007
|
||||
- [X] T009 [P] Create `ChapterEntity` JPA entity and `ChapterRepository` in `backend/src/main/java/com/aiteacher/document/` — `@Entity(name="chapter")`, fields: `id` (String PK), `bookId` (UUID FK → book), `number` (int), `title` (String), `pageStart` (int), `createdAt` (Instant); `ChapterRepository extends JpaRepository<ChapterEntity, String>`
|
||||
- [X] T010 [P] Create `SectionEntity` JPA entity and `SectionRepository` in `backend/src/main/java/com/aiteacher/document/` — `@Entity(name="section")`, fields: `id` (String PK), `chapterId` (String FK → chapter), `bookId` (UUID FK → book), `number` (String), `title` (String), `pageStart`/`pageEnd` (int), `fullText` (TEXT column), `createdAt` (Instant); `SectionRepository extends JpaRepository<SectionEntity, String>` with `findAllByBookId(UUID)`
|
||||
- [X] T011 [P] Create `FigureEntity` JPA entity and `FigureRepository` in `backend/src/main/java/com/aiteacher/document/` — `@Entity(name="figure")`, fields: `id` (String PK), `bookId` (UUID), `sectionId` (String, nullable), `chapterId` (String, nullable), `label` (String), `caption` (TEXT), `figureType` (`@Enumerated` FigureType), `page` (int), `imagePath` (String), `captionEmbeddingId` (UUID, nullable), `createdAt` (Instant); `FigureRepository` with `findAllByBookId(UUID)`, `deleteAllByBookId(UUID)`
|
||||
- [X] T012 Create `ChunkFigureRefEntity` JPA entity and `ChunkFigureRefRepository` in `backend/src/main/java/com/aiteacher/document/` — composite PK `(chunkId UUID, figureId String)`, `mentionPage` (int); `ChunkFigureRefRepository` with `findByChunkIdIn(List<UUID>)`, `deleteByFigureIdIn(List<String>)`
|
||||
|
||||
**Checkpoint**: Migrations will run on next startup; all JPA entities are wired; figure storage reads config correctly
|
||||
|
||||
---
|
||||
|
||||
## Phase 3: User Story 2 — All Pages Scanned for Images During Embedding (Priority: P1)
|
||||
|
||||
**Goal**: When a book is uploaded, every page is inspected for images; each found image is extracted, persisted, described, and embedded as a searchable chunk alongside its metadata
|
||||
|
||||
**Independent Test**: Upload a PDF containing at least one page with a labelled anatomical diagram. After status shows `READY`, call `GET /api/v1/books/{id}/figures` — response must contain at least one entry with `figureType`, `caption`, `page`, and `imageUrl` populated. Verify the PNG file exists at the path in `imagePath`.
|
||||
|
||||
- [X] T013 [US2] Create `PdfStructureParser` service in `backend/src/main/java/com/aiteacher/document/PdfStructureParser.java` — uses Spring AI's `PagePdfDocumentReader` to extract per-page text; groups pages into `SectionEntity` records using heading-detection heuristics (lines matching `^\d+(\.\d+)*\s+[A-Z]`); groups sections into `ChapterEntity` records; persists both to Postgres via `ChapterRepository` and `SectionRepository`; returns `List<SectionEntity>` for the book
|
||||
- [X] T014 [US2] Create `FigureExtractionService` in `backend/src/main/java/com/aiteacher/document/FigureExtractionService.java` — opens PDF with PDFBox `PDDocument`; iterates pages; extracts `PDImageXObject` instances; skips images whose width or height is below `min-image-size-px`; classifies `FigureType` using the keyword-matching table from data-model.md §FigureType; parses caption from the nearest text line matching `CAPTION_PATTERN`; saves PNG via `FigureStorageService`; persists `FigureEntity` to `FigureRepository`; returns `List<FigureEntity>` per book
|
||||
- [X] T015 [US2] Create `VisionDescriptionService` in `backend/src/main/java/com/aiteacher/document/VisionDescriptionService.java` — accepts a `Path` to a PNG and a caption String; calls the OpenAI vision model (via Spring AI `ChatClient` with image media type) to generate a 2–4 sentence clinical description; returns the generated description string; handles API failures by returning the caption as fallback
|
||||
- [X] T016 [US2] Create `TextChunkingService` in `backend/src/main/java/com/aiteacher/document/TextChunkingService.java` — accepts a `SectionEntity`; splits `fullText` into overlapping 400–600 token windows (20-token overlap); wraps each window in a Spring AI `Document` with the flat metadata map defined in data-model.md §Text chunk document; returns `List<Document>`
|
||||
- [X] T017 [US2] Create `ChunkFigureRefService` in `backend/src/main/java/com/aiteacher/document/ChunkFigureRefService.java` — accepts a Spring AI `Document` (with its `id` as `chunkId`) and a `List<FigureEntity>` for the book; scans chunk text for patterns `Fig\.\s*\d+[\-\.]\d+` and `Figure\s+\d+[\-\.]\d+`; matches against figure labels; persists `ChunkFigureRefEntity` rows via `ChunkFigureRefRepository`
|
||||
- [X] T018 [US2] Rewrite `BookEmbeddingService.embedBook()` in `backend/src/main/java/com/aiteacher/book/BookEmbeddingService.java` to orchestrate the full pipeline: (1) `PdfStructureParser` → sections; (2) parallel: `FigureExtractionService` + `TextChunkingService` for each section; (3) `VisionDescriptionService` for each figure; (4) embed figure captions+descriptions as `Document`s (metadata per data-model.md §Figure caption document) into `vectorStore`; (5) embed text chunks into `vectorStore`; (6) `ChunkFigureRefService` for each chunk; update `captionEmbeddingId` on `FigureEntity` after embedding
|
||||
- [X] T019 [US2] Extend `BookEmbeddingService.deleteBookChunks()` to also delete: all `ChunkFigureRefEntity` rows (via `deleteByFigureIdIn`), all `FigureEntity` rows (via `deleteAllByBookId`), all figure PNG files (via `FigureStorageService.delete(bookId)`), all `SectionEntity` and `ChapterEntity` rows for the book
|
||||
- [X] T020 [US2] Add `POST /api/v1/books/{id}/reembed` endpoint to `BookController` in `backend/src/main/java/com/aiteacher/book/BookController.java` — returns `202` with `{ bookId, status: "PROCESSING" }`; returns `404` if not found; returns `409` if already `PROCESSING`; calls `deleteBookChunks()` then `embedBook()` asynchronously
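
The overlapping-window split from T016 can be sketched as below. This is an illustration under assumptions: `SlidingWindowChunker` is a hypothetical name, and the input is a pre-tokenised list, so how tokens are counted (model tokenizer or whitespace split) is left open.

```java
import java.util.ArrayList;
import java.util.List;

class SlidingWindowChunker {
    // Splits tokens into windows of `windowSize`, each sharing `overlap` tokens with the previous one.
    static List<List<String>> windows(List<String> tokens, int windowSize, int overlap) {
        if (windowSize <= 0 || overlap < 0 || overlap >= windowSize) {
            throw new IllegalArgumentException("need 0 <= overlap < windowSize");
        }
        List<List<String>> result = new ArrayList<>();
        int step = windowSize - overlap;
        for (int start = 0; start < tokens.size(); start += step) {
            int end = Math.min(start + windowSize, tokens.size());
            result.add(List.copyOf(tokens.subList(start, end)));
            if (end == tokens.size()) {
                break; // last window reached the end of the section
            }
        }
        return result;
    }
}
```

For the task's numbers, a call such as `windows(tokens, 500, 20)` advances 480 tokens per step, so the last 20 tokens of each window reappear at the start of the next.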
|
||||
|
||||
**Checkpoint**: Upload a PDF with figures → poll `GET /api/v1/books` for `READY` → `GET /api/v1/books/{id}/figures` (endpoint added by T023) returns figure list → PNG accessible at `GET /api/v1/figures/{bookId}/{filename}`

---

## Phase 4: User Story 1 — Image Content Surfaced in Query Results (Priority: P1)

**Goal**: User asks a question answered by a diagram — the system retrieves that diagram's content and surfaces it in the chat response with a citation

**Independent Test**: With a book embedded (Phase 3 checkpoint passed), ask a chat question whose answer is depicted only in a diagram. The response `sources` array must contain at least one entry with `type: "FIGURE"` and a non-empty `imageUrl`.
|
||||
|
||||
- [X] T021 [US1] Create `NeurosurgeryRetriever` service in `backend/src/main/java/com/aiteacher/retrieval/NeurosurgeryRetriever.java` — (1) text chunk search: `vectorStore.similaritySearch` with filter `type == TEXT AND book_id == bookId`, topK=5; (2) figure search: same store, filter `type == FIGURE AND book_id == bookId`, topK=3; (3) expand text chunk results to parent sections via `SectionRepository.findAllById(sectionIds)`; (4) fetch explicitly linked figures via `ChunkFigureRefRepository.findByChunkIdIn(chunkIds)` + `FigureRepository.findAllById`; (5) deduplicate figures across lists by `figureId`; return `RetrievalResult(parentSections, figureVectorHits, linkedFigures)` — add `RetrievalResult` record in same package
- [X] T022 [US1] Refactor `ChatService.sendMessage()` in `backend/src/main/java/com/aiteacher/chat/ChatService.java` — replace `QuestionAnswerAdvisor` with a manual call to `NeurosurgeryRetriever`; build the LLM user message from: section full texts as `[Section X.Y — Title, pp.A-B]\n{fullText}` blocks, followed by `AVAILABLE FIGURES FOR THIS SECTION:` list with `- {label} (p.{page}): {caption} [image: {filename}]` lines per figure; append the instruction `When referencing diagrams, cite them as [Fig. X, p.N].`; send via `chatClient.prompt().system(SYSTEM_PROMPT).user(prompt).call()`
- [X] T023 [P] [US1] Add `GET /api/v1/books/{id}/figures` endpoint to `BookController` — returns `200` with `List<FigureResponse>`; `FigureResponse` is a new record in `backend/src/main/java/com/aiteacher/book/FigureResponse.java` with fields `figureId`, `label`, `caption`, `figureType`, `page`, `imageUrl` (assembled as `/api/v1/figures/{bookId}/{filename}`), `sectionId`, `sectionTitle`; returns `404` if book not found
- [X] T024 [US1] Update `extractSources()` in `ChatService` to build both TEXT and FIGURE source entries: TEXT entries keep existing fields plus `"type": "TEXT"`; FIGURE entries add `"type": "FIGURE"`, `"figureId"`, `"label"`, `"caption"`, `"figureType"`, `"imageUrl"` — source data comes from `RetrievalResult` (text chunk Documents and merged FigureEntity list)
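The user-message layout from T022 can be sketched as a small builder. This is an illustration under stated assumptions, not the actual `ChatService` code: `RetrievalPromptBuilder`, `SectionBlock`, and `FigureLine` are hypothetical names, and Java 17 is assumed for records. The header format, the figure-line format, and the citation instruction are taken from the task text. T022 does not say where the user's question is placed, so the sketch leaves it out.

```java
import java.util.List;

/** Hypothetical helper illustrating the T022 message layout. */
final class RetrievalPromptBuilder {

    // Hypothetical carrier types; the field names are assumptions.
    record SectionBlock(String number, String title, int firstPage, int lastPage, String fullText) {}
    record FigureLine(String label, int page, String caption, String filename) {}

    static final String CITE_INSTRUCTION =
            "When referencing diagrams, cite them as [Fig. X, p.N].";

    static String build(List<SectionBlock> sections, List<FigureLine> figures) {
        StringBuilder sb = new StringBuilder();
        for (SectionBlock s : sections) {
            // The escape below is U+2014, the dash used in the section header format.
            sb.append(String.format("[Section %s \u2014 %s, pp.%d-%d]\n%s\n\n",
                    s.number(), s.title(), s.firstPage(), s.lastPage(), s.fullText()));
        }
        if (!figures.isEmpty()) {
            sb.append("AVAILABLE FIGURES FOR THIS SECTION:\n");
            for (FigureLine f : figures) {
                sb.append(String.format("- %s (p.%d): %s [image: %s]\n",
                        f.label(), f.page(), f.caption(), f.filename()));
            }
            sb.append('\n');
        }
        // The citation instruction always closes the message.
        sb.append(CITE_INSTRUCTION);
        return sb.toString();
    }
}
```

The returned string would then be passed as the `prompt` argument in `chatClient.prompt().system(SYSTEM_PROMPT).user(prompt).call()`.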
|
||||
|
||||
**Checkpoint**: Chat question answered by a diagram → response body contains `sources[n].type == "FIGURE"` with populated `imageUrl`; image loads from the returned URL

---

## Phase 5: User Story 3 — Rich Metadata Enables Precise Source Attribution (Priority: P2)

**Goal**: Users see distinct, informative citations for text vs. image sources; image sources render inline in the chat UI

**Independent Test**: After triggering a response with figure sources, inspect the chat message in the UI — text sources and figure sources are visually distinguishable; figure sources render the actual image inline using the `imageUrl`
|
||||
|
||||
- [X] T025 [P] [US3] Update API response types in `frontend/src/services/api.ts` — extend the `Source` type to include `type: 'TEXT' | 'FIGURE'`, `figureId?: string`, `label?: string`, `caption?: string`, `figureType?: string`, `imageUrl?: string`
- [X] T026 [P] [US3] Update the chat source/citation display in the frontend (wherever sources are currently rendered, e.g. `frontend/src/components/` or `frontend/src/views/`) — render TEXT sources with a document icon and page number; render FIGURE sources with the image (`<img :src="source.imageUrl">`) below the label and caption text
- [X] T027 [US3] Add figure-type badge rendering in the frontend figure display: show a label derived from `figureType` (e.g. "MRI / CT", "Anatomical Diagram", "Table") alongside the figure caption so users can identify content type without opening the image
|
||||
|
||||
---

## Phase 6: Polish & Cross-Cutting Concerns
|
||||
|
||||
- [X] T028 Update `README.md` Mermaid architecture diagram to show three storage tiers: pgvector (semantic search), Postgres (source of truth — sections, figures, refs), and file store (extracted PNGs) — **required by Constitution Principle IV in the same PR as the other changes**
- [X] T029 [P] Write `FigureExtractionServiceTest` unit test in `backend/src/test/java/com/aiteacher/document/FigureExtractionServiceTest.java` — test: images below min size are skipped; `FigureType` classification matches keyword table in data-model.md; caption parsed from adjacent text line
- [X] T030 [P] Write `NeurosurgeryRetrieverTest` unit test in `backend/src/test/java/com/aiteacher/retrieval/NeurosurgeryRetrieverTest.java` — test: figure IDs from both vector hits and chunk refs are merged without duplicates; `RetrievalResult` contains the deduplicated set
- [X] T031 Run quickstart.md validation end-to-end: upload a real PDF with a labelled diagram → wait for `READY` → call `GET /api/v1/books/{id}/figures` → send a chat message about the diagram → verify `sources` contains a `FIGURE` entry → verify `imageUrl` resolves to a PNG
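The deduplication that T030 exercises (figures arriving from both vector hits and chunk references, merged by `figureId`) can be sketched as below. This is a hedged illustration: `FigureMerger` and its `Figure` record are hypothetical stand-ins for the real `FigureEntity`, `figureId` is assumed to be a string, and returning a single merged list is an assumption about how the "deduplicated set" is represented. Java 17 is assumed for records.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper showing a merge of two figure lists without duplicates. */
final class FigureMerger {

    // Stand-in for FigureEntity; only the fields needed for the example.
    record Figure(String figureId, String label) {}

    /**
     * Merges both lists, keeping the first occurrence of each figureId.
     * Vector hits come first, so they win over linked figures with the same ID.
     */
    static List<Figure> mergeById(List<Figure> vectorHits, List<Figure> linkedFigures) {
        Map<String, Figure> byId = new LinkedHashMap<>();
        for (Figure f : vectorHits) {
            byId.putIfAbsent(f.figureId(), f);
        }
        for (Figure f : linkedFigures) {
            byId.putIfAbsent(f.figureId(), f);
        }
        return new ArrayList<>(byId.values());
    }
}
```

A `LinkedHashMap` keeps insertion order, so the merged list stays stable across runs, which makes assertions in a unit test such as `NeurosurgeryRetrieverTest` straightforward.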
|
||||
|
||||
---

## Dependencies & Execution Order

### Phase Dependencies

- **Phase 1 (Setup)**: No dependencies — start immediately
- **Phase 2 (Foundational)**: Requires Phase 1 complete (migrations must run before JPA entities can be wired)
- **Phase 3 (US2)**: Requires Phase 2 complete — all JPA entities + FigureStorageService must exist
- **Phase 4 (US1)**: Requires Phase 3 complete — figures must exist in Postgres + vector store before retrieval can surface them
- **Phase 5 (US3)**: Requires Phase 4 complete — frontend depends on the extended `sources` format from T024
- **Phase 6 (Polish)**: Requires all story phases complete

### Within Phase 3 (Embedding Pipeline)

```
T013 (PdfStructureParser) ──────────────────────────┐
T014 (FigureExtractionService) ─────────────────────┤
T015 (VisionDescriptionService) ────────────────────┤─→ T018 (BookEmbeddingService orchestrator)
T016 (TextChunkingService) ─────────────────────────┤   └─→ T019 (cleanup)
T017 (ChunkFigureRefService) ───────────────────────┘   └─→ T020 (reembed endpoint)
```
T013–T017 can be implemented in parallel (different files, no shared dependencies). T018 depends on all of them.

### Within Phase 4 (Retrieval)

```
T021 (NeurosurgeryRetriever) ──────────────────────┐
                                                   └─→ T022 (ChatService update)
                                                   └─→ T024 (extractSources update)
T023 (figures endpoint) ── independent [P]
```
### Parallel Opportunities per Phase

**Phase 2**: T005, T006, T009, T010, T011 can all run in parallel. T007 depends on T006. T012 can follow T010/T011.

**Phase 3**: T013, T014, T015, T016, T017 all in parallel. T018 depends on all.

**Phase 4**: T023 is independent and can run in parallel with T021. T022 and T024 follow T021.

**Phase 5**: T025 and T026 in parallel; T027 can follow T026.

**Phase 6**: T029 and T030 in parallel.

---
## Implementation Strategy

### MVP: User Story 2 Only (Embedding Pipeline)

1. Phase 1 (Setup) → Phase 2 (Foundational) → Phase 3 (US2, T013–T020), plus T023 from Phase 4 (the figures endpoint that step 2 calls)
2. **Validate**: `GET /api/v1/books/{id}/figures` returns figures for a test book
3. **Stop and demo** — the pipeline produces image chunks without any retrieval changes
### Full Feature Delivery

1. Phase 1 + 2 → Foundation ready
2. Phase 3 (US2) → Embedding pipeline produces image chunks ← **demo point**
3. Phase 4 (US1) → Chat surfaces image content in responses ← **core payoff**
4. Phase 5 (US3) → Frontend renders inline figures with type badges
5. Phase 6 (Polish) → README, tests, end-to-end validation

---
## Notes

- [P] tasks = different files, no dependencies on each other within the same phase
- [US1/US2/US3] label maps each task to a user story for traceability
- Phase 3 (US2) must be fully complete before beginning Phase 4 (US1) — retrieval cannot surface figures that do not yet exist
- The `uploads/figures/` directory must exist and be writable at runtime; `FigureStorageService` creates subdirectories automatically
- Re-embedding (T020) deletes all existing chunks and figures for the book before re-running — safe to call on books processed by feature 001
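
The response rules for the re-embed endpoint (T020: `202` when accepted, `404` when the book is missing, `409` when it is already `PROCESSING`) can be sketched as a pure decision function. This is only an illustration: `ReembedDecision` and the `BookStatus` enum are hypothetical names, and the enum lists only the two statuses mentioned in this task list. The real controller would also trigger `deleteBookChunks()` and `embedBook()` asynchronously, which is omitted here.

```java
import java.util.Optional;

/** Hypothetical helper mapping a book lookup result to the T020 HTTP status code. */
final class ReembedDecision {

    // Only the statuses named in the task list; the real enum may have more values.
    enum BookStatus { PROCESSING, READY }

    /** Returns 404 for a missing book, 409 while it is being processed, otherwise 202. */
    static int statusCode(Optional<BookStatus> book) {
        if (book.isEmpty()) {
            return 404; // book not found
        }
        if (book.get() == BookStatus.PROCESSING) {
            return 409; // a pipeline run is already in progress
        }
        return 202; // accepted; re-embedding continues in the background
    }
}
```

Keeping the decision separate from the controller makes the three outcomes easy to unit-test without starting the web layer.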