# Implementation Plan: Enhanced Embedding with Image Parsing and Metadata

**Branch**: `002-image-aware-embedding` | **Date**: 2026-04-04 | **Spec**: [spec.md](spec.md)  
**Input**: Feature specification from `/specs/002-image-aware-embedding/spec.md`

## Summary

Enhance the PDF embedding pipeline to extract figures and generate AI descriptions for them, making image content semantically searchable alongside text. PDF parsing and figure extraction are delegated to a local **Marker** server (`http://localhost:8000/marker/upload`), which returns reading-order text and pre-cropped figure images (base64) in a single JSON response, eliminating the need for PDFBox column heuristics and figure bbox rendering.

## Technical Context

**Language/Version**: Java 25 (backend), TypeScript / Node 20 (frontend)  
**Primary Dependencies**: Spring Boot 4.0.5, Spring AI 2.0.0-M4, OpenAI API (embeddings + GPT-4o vision), PDFBox 3.0.3 (via `spring-ai-pdf-document-reader`; retained transitively, no longer used directly), Marker local HTTP API (`http://localhost:8000/marker/upload`)  
**Storage**: PostgreSQL (JPA + Flyway), pgvector (Spring AI `VectorStore`), S3-compatible object store (figure images via `FigureStorageService`)  
**Testing**: Maven / JUnit 5 (`spring-boot-starter-test`)  
**Target Platform**: Linux server  
**Project Type**: Web application (backend API + frontend client)  
**Performance Goals**: SC-003: book processing time ≤ 3× text-only for ≤ 500 pages  
**Constraints**: REST API only (Constitution III); Marker server must be running locally; S3-compatible storage configured via env vars  
**Scale/Scope**: POC (handful of books, <10 users)

## Constitution Check

*GATE: Must pass before Phase 0 research. Re-checked after Phase 1 design.*

| Principle | Status | Notes |
|-----------|--------|-------|
| **I. KISS** | ✅ Justified | Marker replaces a bespoke PDFBox column heuristic + Google Cloud SDK with one HTTP call. Net complexity reduction vs. the Document AI approach. |
| **II. Easy to Change** | ✅ | `MarkerPageParser` is the only class that knows about Marker; swap the implementation to replace Marker with any other parser. `PageResult` DTO remains unchanged. |
| **III. Web-First** | ✅ | Internal pipeline change; no public API contract change. |
| **IV. Documentation** | ✅ | README must be updated to show Marker as a local external service. |

## Project Structure

### Documentation (this feature)

```text
specs/002-image-aware-embedding/
├── plan.md                    # This file
├── research.md                # Phase 0 output
├── data-model.md              # Phase 1 output
├── quickstart.md              # Phase 1 output
├── contracts/
│   ├── api.md                 # HTTP API contracts (unchanged from initial plan)
│   └── marker-page-result.md  # Internal DTO contract (MarkerPageParser → downstream)
└── tasks.md                   # Phase 2 output (/speckit.tasks, not created here)
```

### Source Code

```text
backend/
├── src/main/java/com/aiteacher/
│   ├── config/
│   │   └── MarkerConfig.java              # NEW: RestClient bean + base-url property
│   ├── document/
│   │   ├── MarkerPageParser.java          # NEW: replaces DocumentAiPageParser + PdfStructureParser
│   │   ├── PageResult.java                # UPDATED: FigureBbox → FigureData (bytes not bbox)
│   │   ├── FigureExtractionService.java   # UPDATED: no PDFBox render; decode bytes directly
│   │   ├── TextChunkingService.java       # UNCHANGED
│   │   ├── VisionDescriptionService.java  # UNCHANGED
│   │   └── [removed] DocumentAiPageParser.java
│   ├── book/
│   │   └── BookEmbeddingService.java      # MINOR UPDATE: inject MarkerPageParser, drop DocumentAiPageParser
│   └── [removed] config/DocumentAiConfig.java
├── src/main/resources/
│   └── application.yaml                   # UPDATED: remove document-ai.*, add marker.base-url
└── pom.xml                                # UPDATED: remove google-cloud-document-ai
```

**Structure Decision**: Option 2 (backend + frontend) per constitution Technology Constraints. Frontend changes are display-only (render figure citations inline).
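
The move from `FigureBbox` to `FigureData` ("bytes not bbox") and the note that `FigureExtractionService` should "decode bytes directly" can be illustrated with a minimal sketch. Only the names `FigureData` and the base64 encoding come from this plan; the record fields, the `FigureDecoder` helper, the optional `data:` prefix handling, and the skip-on-malformed policy are assumptions for illustration, not the actual Marker response contract or the project's implementation.

```java
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

/** Hypothetical shape of the figure payload: raw image bytes instead of a bounding box. */
record FigureData(int pageNumber, byte[] imageBytes) {}

/** Hypothetical helper; the real FigureExtractionService may be structured differently. */
final class FigureDecoder {

    private FigureDecoder() {}

    /**
     * Decodes one base64 figure string into a FigureData.
     * Assumption: the string may carry a "data:image/...;base64," prefix, which is stripped.
     */
    static FigureData decode(int pageNumber, String base64Image) {
        String payload = base64Image.strip();
        int comma = payload.indexOf(',');
        if (payload.startsWith("data:") && comma >= 0) {
            payload = payload.substring(comma + 1);
        }
        // Base64.Decoder#decode throws IllegalArgumentException on malformed input.
        byte[] bytes = Base64.getDecoder().decode(payload);
        return new FigureData(pageNumber, bytes);
    }

    /** Decodes every figure for a page, skipping entries that are not valid base64. */
    static List<FigureData> decodeAll(int pageNumber, List<String> base64Images) {
        List<FigureData> figures = new ArrayList<>();
        for (String image : base64Images) {
            try {
                figures.add(decode(pageNumber, image));
            } catch (IllegalArgumentException e) {
                // Assumed policy: one bad figure should not fail the whole page.
            }
        }
        return figures;
    }
}
```

Because the bytes arrive pre-cropped, a helper along these lines needs no PDF rendering step; the decoded bytes could be handed straight to `FigureStorageService` and `VisionDescriptionService`.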
## Complexity Tracking

> No constitution violations. Marker reduces complexity compared to the previous
> Google Document AI approach (fewer dependencies, no GCP credentials, no 15-page batching).