# Implementation Plan: Enhanced Embedding with Image Parsing and Metadata

**Branch**: `002-image-aware-embedding` | **Date**: 2026-04-04 | **Spec**: [spec.md](spec.md)
**Input**: Feature specification from `/specs/002-image-aware-embedding/spec.md`

## Summary

Enhance the PDF embedding pipeline to extract figures and generate AI descriptions for them,
making image content semantically searchable alongside text. PDF parsing and figure extraction
are delegated to a local **Marker** server (`http://localhost:8000/marker/upload`), which
returns reading-order text and pre-cropped figure images (base64) in a single JSON response,
eliminating the need for PDFBox column heuristics and figure bbox rendering.
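
As a rough illustration of the summary above, the sketch below turns base64 figure payloads into raw bytes with no PDFBox rendering step. `FigureData` is named in this plan, but its fields, the `FigureDecoder` helper, and the input shape (figure name mapped to a base64 string) are assumptions made for illustration; they are not the actual Marker response schema.

```java
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical shape: the plan names FigureData but does not list its fields.
record FigureData(String name, byte[] bytes) {}

final class FigureDecoder {
    private FigureDecoder() {}

    // Decodes a map of figure name -> base64 payload into FigureData values,
    // preserving the iteration order of the input map.
    static Map<String, FigureData> decodeFigures(Map<String, String> base64ByName) {
        Map<String, FigureData> decoded = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : base64ByName.entrySet()) {
            byte[] bytes = Base64.getDecoder().decode(entry.getValue());
            decoded.put(entry.getKey(), new FigureData(entry.getKey(), bytes));
        }
        return decoded;
    }
}

final class FigureDecodingDemo {
    public static void main(String[] args) {
        // "iVBORw0KGgo=" is the base64 form of the 8-byte PNG file signature.
        Map<String, String> figures = Map.of("page_1_figure_0.png", "iVBORw0KGgo=");
        FigureData figure = FigureDecoder.decodeFigures(figures).get("page_1_figure_0.png");
        System.out.println(figure.name() + ": " + figure.bytes().length + " bytes");
    }
}
```

Because the bytes are already cropped images, a step like this could hand them straight to storage and to the vision description step.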

## Technical Context

**Language/Version**: Java 25 (backend), TypeScript / Node 20 (frontend)
**Primary Dependencies**: Spring Boot 4.0.5, Spring AI 2.0.0-M4, OpenAI API (embeddings +
GPT-4o vision), PDFBox 3.0.3 (via `spring-ai-pdf-document-reader`; retained transitively,
no longer used directly), Marker local HTTP API (`http://localhost:8000/marker/upload`)
**Storage**: PostgreSQL (JPA + Flyway), pgvector (Spring AI `VectorStore`), S3-compatible
object store (figure images via `FigureStorageService`)
**Testing**: Maven / JUnit 5 (`spring-boot-starter-test`)
**Target Platform**: Linux server
**Project Type**: Web application (backend API + frontend client)
**Performance Goals**: SC-003: processing time for a book of ≤ 500 pages is ≤ 3× the text-only processing time
**Constraints**: REST API only (Constitution III); Marker server must be running locally;
S3-compatible storage configured via env vars
**Scale/Scope**: POC (a handful of books, <10 users)
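
To make the SC-003 budget concrete, the sketch below compares a measured image-aware run against three times the text-only baseline. The class and method names are hypothetical, and treating books above 500 pages as out of scope (rather than as failures) is an assumption about how the goal is meant to be read.

```java
import java.time.Duration;

final class ProcessingBudget {
    static final int MAX_PAGES = 500; // SC-003 covers books up to this size
    static final long MAX_RATIO = 3;  // image-aware time may be at most 3x text-only

    private ProcessingBudget() {}

    // Returns true when the image-aware run fits the budget. Books above
    // MAX_PAGES are treated as out of scope and also return true (assumption).
    static boolean meetsSc003(int pages, Duration textOnly, Duration imageAware) {
        if (pages > MAX_PAGES) {
            return true;
        }
        return imageAware.compareTo(textOnly.multipliedBy(MAX_RATIO)) <= 0;
    }
}
```

For example, a 300-page book with a 2-minute text-only run stays within budget at 6 minutes and exceeds it at 7.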

## Constitution Check

*GATE: Must pass before Phase 0 research. Re-checked after Phase 1 design.*

| Principle | Status | Notes |
|-----------|--------|-------|
| **I. KISS** | ✅ Justified | Marker replaces a bespoke PDFBox column heuristic + Google Cloud SDK with one HTTP call. Net complexity reduction vs. the Document AI approach. |
| **II. Easy to Change** | ✅ | `MarkerPageParser` is the only class that knows about Marker; swap the implementation to replace Marker with any other parser. `PageResult` DTO remains unchanged. |
| **III. Web-First** | ✅ | Internal pipeline change; no public API contract change. |
| **IV. Documentation** | ✅ | README must be updated to show Marker as a local external service. |
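
The "Easy to Change" row depends on a single seam between the pipeline and the parser. A minimal sketch of such a seam follows. The `PageParser` interface, the record fields, and the stub implementation are hypothetical: this plan names only `MarkerPageParser`, `PageResult`, and `FigureData`, and does not say whether an interface exists.

```java
import java.util.List;

// Hypothetical field layouts; the real DTOs are defined elsewhere in the spec.
record FigureData(String name, byte[] bytes) {}
record PageResult(int pageNumber, String text, List<FigureData> figures) {}

// The only contract that downstream code depends on.
interface PageParser {
    List<PageResult> parse(byte[] pdfBytes);
}

// Stand-in for MarkerPageParser; a real implementation would call the Marker server.
final class StubPageParser implements PageParser {
    @Override
    public List<PageResult> parse(byte[] pdfBytes) {
        return List.of(new PageResult(1, "stub text", List.of()));
    }
}

// Downstream code sees only PageParser, so replacing the parser needs no change here.
final class EmbeddingPipeline {
    private final PageParser parser;

    EmbeddingPipeline(PageParser parser) {
        this.parser = parser;
    }

    int countPages(byte[] pdfBytes) {
        return parser.parse(pdfBytes).size();
    }
}
```

With this shape, replacing Marker would mean providing another `PageParser` and changing one injection point.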

## Project Structure

### Documentation (this feature)

```text
specs/002-image-aware-embedding/
├── plan.md                    # This file
├── research.md                # Phase 0 output
├── data-model.md              # Phase 1 output
├── quickstart.md              # Phase 1 output
├── contracts/
│   ├── api.md                 # HTTP API contracts (unchanged from initial plan)
│   └── marker-page-result.md  # Internal DTO contract (MarkerPageParser → downstream)
└── tasks.md                   # Phase 2 output (/speckit.tasks; not created here)
```

### Source Code

```text
backend/
├── src/main/java/com/aiteacher/
│   ├── config/
│   │   └── MarkerConfig.java              # NEW: RestClient bean + base-url property
│   ├── document/
│   │   ├── MarkerPageParser.java          # NEW: replaces DocumentAiPageParser + PdfStructureParser
│   │   ├── PageResult.java                # UPDATED: FigureBbox → FigureData (bytes not bbox)
│   │   ├── FigureExtractionService.java   # UPDATED: no PDFBox render; decode bytes directly
│   │   ├── TextChunkingService.java       # UNCHANGED
│   │   ├── VisionDescriptionService.java  # UNCHANGED
│   │   └── [removed] DocumentAiPageParser.java
│   ├── book/
│   │   └── BookEmbeddingService.java      # MINOR UPDATE: inject MarkerPageParser, drop DocumentAiPageParser
│   └── [removed] config/DocumentAiConfig.java
├── src/main/resources/
│   └── application.yaml                   # UPDATED: remove document-ai.*, add marker.base-url
└── pom.xml                                # UPDATED: remove google-cloud-document-ai
```

**Structure Decision**: Option 2 (backend + frontend) per constitution Technology Constraints.
Frontend changes are display-only (render figure citations inline).
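
The `marker.base-url` property in the source tree above could be resolved into the upload endpoint roughly as follows. This is a plain-JDK sketch rather than the Spring `RestClient` bean that `MarkerConfig` is meant to provide. The `MarkerEndpoint` class and the system-property lookup are assumptions; only the default URL and the `/marker/upload` path come from this plan.

```java
import java.net.URI;

final class MarkerEndpoint {
    static final String DEFAULT_BASE_URL = "http://localhost:8000";
    static final String UPLOAD_PATH = "/marker/upload";

    private MarkerEndpoint() {}

    // Builds the upload URI, falling back to the local default when no
    // base URL is configured.
    static URI uploadUri(String baseUrl) {
        String base = (baseUrl == null || baseUrl.isBlank()) ? DEFAULT_BASE_URL : baseUrl.trim();
        // Drop trailing slashes so the joined URI never contains "//marker".
        while (base.endsWith("/")) {
            base = base.substring(0, base.length() - 1);
        }
        return URI.create(base + UPLOAD_PATH);
    }

    // Hypothetical lookup: reads the same key the plan adds to application.yaml.
    static URI uploadUriFromSystemProperty() {
        return uploadUri(System.getProperty("marker.base-url"));
    }
}
```

Keeping the base URL configurable leaves the local-server constraint as a deployment detail rather than a hard-coded value.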

## Complexity Tracking

> No constitution violations: Marker reduces complexity compared to the previous
> Google Document AI approach (fewer dependencies, no GCP credentials, no 15-page batching).