ai-teacher/specs/002-image-aware-embedding/plan.md

# Implementation Plan: Enhanced Embedding with Image Parsing and Metadata

**Branch**: `002-image-aware-embedding` | **Date**: 2026-04-04 | **Spec**: [spec.md](spec.md)
**Input**: Feature specification from `/specs/002-image-aware-embedding/spec.md`

## Summary

Enhance the PDF embedding pipeline to extract figures and generate AI descriptions for them,
making image content semantically searchable alongside text. PDF parsing and figure extraction
are delegated to a local **Marker** server (`http://localhost:8000/marker/upload`), which
returns reading-order text and pre-cropped figure images (base64) in a single JSON response,
eliminating the need for PDFBox column heuristics and figure bbox rendering.
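
The single upload call described above can be sketched as follows. This is a minimal illustration, not the planned implementation: it uses the JDK's `java.net.http` types instead of the Spring `RestClient` bean named in this plan so that it compiles without Spring. The multipart field name `file`, the `MarkerUploadRequest` class, and its method names are assumptions to verify against the Marker server in use. The response JSON is deliberately not parsed here, because this plan does not specify its field names.

```java
import java.io.ByteArrayOutputStream;
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

/** Hypothetical helper that builds (but does not send) the Marker upload request. */
final class MarkerUploadRequest {
    private MarkerUploadRequest() {}

    /** Encodes one PDF as a multipart/form-data body with a single part. */
    static byte[] multipartBody(String boundary, String fileName, byte[] pdfBytes) {
        // ASSUMPTION: the server reads the PDF from a form field called "file".
        String head = "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"file\"; filename=\"" + fileName + "\"\r\n"
                + "Content-Type: application/pdf\r\n\r\n";
        String tail = "\r\n--" + boundary + "--\r\n";

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.writeBytes(head.getBytes(StandardCharsets.UTF_8));
        out.writeBytes(pdfBytes);
        out.writeBytes(tail.getBytes(StandardCharsets.UTF_8));
        return out.toByteArray();
    }

    /** Builds a POST to {baseUrl}/marker/upload carrying the PDF. */
    static HttpRequest build(String baseUrl, String fileName, byte[] pdfBytes) {
        String boundary = "marker-" + UUID.randomUUID();
        return HttpRequest.newBuilder(URI.create(baseUrl + "/marker/upload"))
                .header("Content-Type", "multipart/form-data; boundary=" + boundary)
                .POST(HttpRequest.BodyPublishers.ofByteArray(
                        multipartBody(boundary, fileName, pdfBytes)))
                .build();
    }
}
```

Calling `MarkerUploadRequest.build("http://localhost:8000", "book.pdf", bytes)` yields a request that an `HttpClient` could send. Keeping request construction separate from sending makes the encoding testable without a running Marker server.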

## Technical Context

**Language/Version**: Java 25 (backend), TypeScript / Node 20 (frontend)
**Primary Dependencies**: Spring Boot 4.0.5, Spring AI 2.0.0-M4, OpenAI API (embeddings +
GPT-4o vision), PDFBox 3.0.3 (via `spring-ai-pdf-document-reader` — retained transitively,
no longer used directly), Marker local HTTP API (`http://localhost:8000/marker/upload`)
**Storage**: PostgreSQL (JPA + Flyway), pgvector (Spring AI `VectorStore`), S3-compatible
object store (figure images via `FigureStorageService`)
**Testing**: Maven / JUnit 5 (`spring-boot-starter-test`)
**Target Platform**: Linux server
**Project Type**: Web application (backend API + frontend client)
**Performance Goals**: SC-003: processing a book of up to 500 pages takes at most 3× the text-only processing time
**Constraints**: REST API only (Constitution III); Marker server must be running locally;
S3-compatible storage configured via env vars
**Scale/Scope**: POC — handful of books, <10 users
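
The performance goal reduces to a ratio check between two measured durations. A minimal sketch, assuming both runs are timed on the same book; the `ProcessingBudget` class and its members are hypothetical and not part of the planned source tree.

```java
import java.time.Duration;

/** Hypothetical helper for checking SC-003 against measured timings. */
final class ProcessingBudget {
    static final int MAX_PAGES = 500;    // SC-003 covers books of up to 500 pages
    static final long MAX_SLOWDOWN = 3;  // image-aware run may take at most 3x text-only

    private ProcessingBudget() {}

    /** Returns true when imageAware <= 3 x textOnly for a book within the page limit. */
    static boolean meetsSc003(int pageCount, Duration textOnly, Duration imageAware) {
        if (pageCount > MAX_PAGES) {
            throw new IllegalArgumentException(
                    "SC-003 only covers books of up to " + MAX_PAGES + " pages");
        }
        return imageAware.compareTo(textOnly.multipliedBy(MAX_SLOWDOWN)) <= 0;
    }
}
```

For example, a 300-page book that takes 2 minutes text-only passes at 6 minutes and fails at anything longer.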

## Constitution Check

*GATE: Must pass before Phase 0 research. Re-checked after Phase 1 design.*

| Principle | Status | Notes |
|-----------|--------|-------|
| **I. KISS** | ✅ Justified | Marker replaces a bespoke PDFBox column heuristic and the Google Cloud SDK with a single HTTP call, a net reduction in complexity compared with the Document AI approach. |
| **II. Easy to Change** | ✅ | `MarkerPageParser` is the only class that knows about Marker, so replacing Marker with another parser means swapping that one implementation. The `PageResult` DTO remains unchanged. |
| **III. Web-First** | ✅ | Internal pipeline change; no public API contract change. |
| **IV. Documentation** | ✅ | README must be updated to show Marker as a local external service. |
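
One way to picture the "Easy to Change" row is as a narrow seam between the parser and its consumers. The sketch below is an assumption-heavy illustration: this plan does not name a `PageParser` interface, the `PageResult` record here is a simplified stand-in for the real DTO, and `FakePageParser` and `PageTextCollector` are hypothetical.

```java
import java.util.List;

/** Simplified stand-in for the real PageResult DTO (fields are assumptions). */
record PageResult(int pageNumber, String text) {}

/** Hypothetical seam: downstream code depends on this, never on Marker types. */
interface PageParser {
    List<PageResult> parse(byte[] pdfBytes);
}

/** Hypothetical replacement parser; a Marker-backed class would implement the same interface. */
final class FakePageParser implements PageParser {
    @Override
    public List<PageResult> parse(byte[] pdfBytes) {
        return List.of(new PageResult(1, "stub text (" + pdfBytes.length + " bytes in)"));
    }
}

/** Stand-in for a downstream consumer such as BookEmbeddingService. */
final class PageTextCollector {
    private final PageParser parser;

    PageTextCollector(PageParser parser) {
        this.parser = parser;
    }

    /** Concatenates the text of every parsed page, one page per line. */
    String allText(byte[] pdfBytes) {
        StringBuilder sb = new StringBuilder();
        for (PageResult page : parser.parse(pdfBytes)) {
            sb.append(page.text()).append('\n');
        }
        return sb.toString();
    }
}
```

Because `PageTextCollector` sees only `PageParser` and `PageResult`, swapping the parser requires no change to the consumer.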

## Project Structure

### Documentation (this feature)

```text
specs/002-image-aware-embedding/
├── plan.md              # This file
├── research.md          # Phase 0 output
├── data-model.md        # Phase 1 output
├── quickstart.md        # Phase 1 output
├── contracts/
│   ├── api.md           # HTTP API contracts (unchanged from initial plan)
│   └── marker-page-result.md  # Internal DTO contract (MarkerPageParser → downstream)
└── tasks.md             # Phase 2 output (/speckit.tasks — not created here)
```

### Source Code

```text
backend/
├── src/main/java/com/aiteacher/
│   ├── config/
│   │   └── MarkerConfig.java          # NEW: RestClient bean + base-url property
│   ├── document/
│   │   ├── MarkerPageParser.java      # NEW: replaces DocumentAiPageParser + PdfStructureParser
│   │   ├── PageResult.java            # UPDATED: FigureBbox → FigureData (bytes not bbox)
│   │   ├── FigureExtractionService.java  # UPDATED: no PDFBox render; decode bytes directly
│   │   ├── TextChunkingService.java   # UNCHANGED
│   │   ├── VisionDescriptionService.java # UNCHANGED
│   │   └── [removed] DocumentAiPageParser.java
│   ├── book/
│   │   └── BookEmbeddingService.java  # MINOR UPDATE: inject MarkerPageParser, drop DocumentAiPageParser
│   └── [removed] config/DocumentAiConfig.java
├── src/main/resources/
│   └── application.yaml               # UPDATED: remove document-ai.*, add marker.base-url
└── pom.xml                            # UPDATED: remove google-cloud-document-ai
```

**Structure Decision**: Option 2 (backend + frontend) per constitution Technology Constraints.
Frontend changes are display-only (render figure citations inline).
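
The `FigureBbox` to `FigureData` change listed for `PageResult.java` means a figure carries its image bytes rather than page coordinates, so `FigureExtractionService` only has to decode. A minimal sketch, assuming `FigureData` holds a name plus the decoded bytes and that the base64 payload uses the standard alphabet; the field names and the `FigureDecoder` helper are hypothetical.

```java
import java.util.Arrays;
import java.util.Base64;

/** Hypothetical shape of FigureData: image bytes instead of a bounding box. */
record FigureData(String name, byte[] imageBytes) {
    FigureData {
        // Defensive copy so callers cannot mutate the stored image afterwards.
        imageBytes = Arrays.copyOf(imageBytes, imageBytes.length);
    }

    /** Returns a copy, keeping the record effectively immutable. */
    @Override
    public byte[] imageBytes() {
        return Arrays.copyOf(imageBytes, imageBytes.length);
    }
}

/** Hypothetical helper standing in for the decode step; no PDF rendering involved. */
final class FigureDecoder {
    private FigureDecoder() {}

    /** Decodes one base64 figure; throws IllegalArgumentException on malformed input. */
    static FigureData decode(String name, String base64) {
        return new FigureData(name, Base64.getDecoder().decode(base64));
    }
}
```

The decoded bytes can then be handed to `FigureStorageService` and to the vision description step without touching PDFBox.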

## Complexity Tracking

> No constitution violations — Marker reduces complexity compared to the previous
> Google Document AI approach (fewer dependencies, no GCP credentials, no 15-page batching).