Tasks: Enhanced Embedding with Image Parsing and Metadata
Input: Design documents from /specs/002-image-aware-embedding/
Prerequisites: plan.md ✓ | spec.md ✓ | research.md ✓ | data-model.md ✓ | contracts/ ✓
Organization: Tasks grouped by user story to enable independent implementation and testing.
Format: [ID] [P?] [Story] Description
- [P]: Can run in parallel (different files, no shared dependencies)
- [US1/US2/US3]: Which user story this task belongs to
Phase 1: Setup (Shared Infrastructure)
Purpose: Database migrations and configuration that establish the foundation for all new code
- T001 Create Flyway migration V4__document_hierarchy.sql: add chapter and section tables per data-model.md §Postgres Schema, in backend/src/main/resources/db/migration/V4__document_hierarchy.sql
- T002 Create Flyway migration V5__figures_and_refs.sql: add figure and chunk_figure_ref tables per data-model.md §Postgres Schema, in backend/src/main/resources/db/migration/V5__figures_and_refs.sql
- T003 Add figure-storage configuration keys to backend/src/main/resources/application.properties: app.figure-storage.base-path=./uploads and app.figure-storage.min-image-size-px=100
- T004 Add uploads/ directory to .gitignore at repo root; create uploads/figures/.gitkeep to preserve directory structure
Phase 2: Foundational (Blocking Prerequisites)
Purpose: Core types and infrastructure that ALL user stories depend on — nothing in Phase 3+ can start until this phase is complete
⚠️ CRITICAL: No user story work can begin until this phase is complete
- T005 [P] Create FigureType enum in backend/src/main/java/com/aiteacher/document/FigureType.java with values: ANATOMICAL_DIAGRAM, SURGICAL_PHOTOGRAPH, MRI_CT_SCAN, TABLE, CHART, INTRAOPERATIVE_IMAGE
- T006 [P] Create FigureStorageService interface in backend/src/main/java/com/aiteacher/figure/FigureStorageService.java: declare Path save(UUID bookId, String figureId, BufferedImage image), Path resolve(UUID bookId, String filename), and void delete(UUID bookId)
- T007 Create LocalFigureStorageService implementation in backend/src/main/java/com/aiteacher/figure/LocalFigureStorageService.java: writes PNG files under ${app.figure-storage.base-path}/figures/{bookId}/; implements FigureStorageService; depends on T006
- T008 Create FigureStorageConfig bean in backend/src/main/java/com/aiteacher/config/FigureStorageConfig.java: reads app.figure-storage.base-path and app.figure-storage.min-image-size-px as @ConfigurationProperties; registers LocalFigureStorageService as @Bean; adds ResourceHandler mapping GET /api/v1/figures/** to the base-path directory
- T009 [P] Create ChapterEntity JPA entity and ChapterRepository in backend/src/main/java/com/aiteacher/document/: @Entity(name="chapter"), fields: id (String PK), bookId (UUID FK → book), number (int), title (String), pageStart (int), createdAt (Instant); ChapterRepository extends JpaRepository<ChapterEntity, String>
- T010 [P] Create SectionEntity JPA entity and SectionRepository in backend/src/main/java/com/aiteacher/document/: @Entity(name="section"), fields: id (String PK), chapterId (String FK → chapter), bookId (UUID FK → book), number (String), title (String), pageStart/pageEnd (int), fullText (TEXT column), createdAt (Instant); SectionRepository extends JpaRepository<SectionEntity, String> with findAllByBookId(UUID)
- T011 [P] Create FigureEntity JPA entity and FigureRepository in backend/src/main/java/com/aiteacher/document/: @Entity(name="figure"), fields: id (String PK), bookId (UUID), sectionId (String, nullable), chapterId (String, nullable), label (String), caption (TEXT), figureType (@Enumerated FigureType), page (int), imagePath (String), captionEmbeddingId (UUID, nullable), createdAt (Instant); FigureRepository with findAllByBookId(UUID), deleteAllByBookId(UUID)
- T012 Create ChunkFigureRefEntity JPA entity and ChunkFigureRefRepository in backend/src/main/java/com/aiteacher/document/: composite PK (chunkId UUID, figureId String), mentionPage (int); ChunkFigureRefRepository with findByChunkIdIn(List<UUID>), deleteByFigureIdIn(List<String>)
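As a rough illustration of T006 and T007, the storage contract and a local implementation might look like the sketch below. It is plain Java with no Spring wiring; the "<figureId>.png" file name and the path-traversal check in resolve are assumptions, not requirements stated in the design documents.

```java
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.UUID;
import java.util.stream.Stream;
import javax.imageio.ImageIO;

/** Storage contract with the three methods listed in T006. */
interface FigureStorageService {
    Path save(UUID bookId, String figureId, BufferedImage image);
    Path resolve(UUID bookId, String filename);
    void delete(UUID bookId);
}

/** Plain-Java sketch of T007; the real class would be registered as a bean in T008. */
class LocalFigureStorageService implements FigureStorageService {
    private final Path basePath; // value of app.figure-storage.base-path

    LocalFigureStorageService(Path basePath) {
        this.basePath = basePath;
    }

    private Path bookDir(UUID bookId) {
        return basePath.resolve("figures").resolve(bookId.toString());
    }

    @Override
    public Path save(UUID bookId, String figureId, BufferedImage image) {
        try {
            // Creates figures/{bookId}/ on demand.
            Path dir = Files.createDirectories(bookDir(bookId));
            // Assumption: one PNG per figure, named "<figureId>.png".
            Path target = dir.resolve(figureId + ".png");
            if (!ImageIO.write(image, "png", target.toFile())) {
                throw new IOException("No PNG writer available for this image");
            }
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public Path resolve(UUID bookId, String filename) {
        Path dir = bookDir(bookId).normalize();
        Path resolved = dir.resolve(filename).normalize();
        // Assumption: reject names such as "../x" that escape the book directory.
        if (!resolved.startsWith(dir)) {
            throw new IllegalArgumentException("Invalid filename: " + filename);
        }
        return resolved;
    }

    @Override
    public void delete(UUID bookId) {
        Path dir = bookDir(bookId);
        if (!Files.exists(dir)) {
            return;
        }
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order deletes files before the directory that contains them.
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Keeping the interface free of Spring types leaves room to swap the local implementation for another backend without touching callers.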
Checkpoint: Migrations will run on next startup; all JPA entities are wired; figure storage reads config correctly
Phase 3: User Story 2 — All Pages Scanned for Images During Embedding (Priority: P1)
Goal: When a book is uploaded, every page is inspected for images; each found image is extracted, persisted, described, and embedded as a searchable chunk alongside its metadata
Independent Test: Upload a PDF containing at least one page with a labelled anatomical diagram. After status shows READY, call GET /api/v1/books/{id}/figures — response must contain at least one entry with figureType, caption, page, and imageUrl populated. Verify the PNG file exists at the path in imagePath.
- T013 [US2] Create PdfStructureParser service in backend/src/main/java/com/aiteacher/document/PdfStructureParser.java: uses Spring AI's PagePdfDocumentReader to extract per-page text; groups pages into SectionEntity records using heading-detection heuristics (lines matching ^\d+(\.\d+)*\s+[A-Z]); groups sections into ChapterEntity records; persists both to Postgres via ChapterRepository and SectionRepository; returns List<SectionEntity> for the book
- T014 [US2] Create FigureExtractionService in backend/src/main/java/com/aiteacher/document/FigureExtractionService.java: opens PDF with PDFBox PDDocument; iterates pages; extracts PDImageXObject instances; skips images whose width or height is below min-image-size-px; classifies FigureType using the keyword-matching table from data-model.md §FigureType; parses caption from the nearest text line matching CAPTION_PATTERN; saves PNG via FigureStorageService; persists FigureEntity to FigureRepository; returns List<FigureEntity> per book
- T015 [US2] Create VisionDescriptionService in backend/src/main/java/com/aiteacher/document/VisionDescriptionService.java: accepts a Path to a PNG and a caption String; calls the OpenAI vision model (via Spring AI ChatClient with image media type) to generate a 2–4 sentence clinical description; returns the generated description string; handles API failures by returning the caption as fallback
- T016 [US2] Create TextChunkingService in backend/src/main/java/com/aiteacher/document/TextChunkingService.java: accepts a SectionEntity; splits fullText into overlapping 400–600 token windows (20-token overlap); wraps each window in a Spring AI Document with the flat metadata map defined in data-model.md §Text chunk document; returns List<Document>
- T017 [US2] Create ChunkFigureRefService in backend/src/main/java/com/aiteacher/document/ChunkFigureRefService.java: accepts a Spring AI Document (with its id as chunkId) and a List<FigureEntity> for the book; scans chunk text for patterns Fig\.\s*\d+[\-\.]\d+ and Figure\s+\d+[\-\.]\d+; matches against figure labels; persists ChunkFigureRefEntity rows via ChunkFigureRefRepository
- T018 [US2] Rewrite BookEmbeddingService.embedBook() in backend/src/main/java/com/aiteacher/book/BookEmbeddingService.java to orchestrate the full pipeline: (1) PdfStructureParser → sections; (2) parallel: FigureExtractionService + TextChunkingService for each section; (3) VisionDescriptionService for each figure; (4) embed figure captions+descriptions as Document objects (metadata per data-model.md §Figure caption document) into vectorStore; (5) embed text chunks into vectorStore; (6) ChunkFigureRefService for each chunk; update captionEmbeddingId on FigureEntity after embedding
- T019 [US2] Extend BookEmbeddingService.deleteBookChunks() to also delete: all ChunkFigureRefEntity rows (via deleteByFigureIdIn), all FigureEntity rows (via deleteAllByBookId), all figure PNG files (via FigureStorageService.delete(bookId)), all SectionEntity and ChapterEntity rows for the book
- T020 [US2] Add POST /api/v1/books/{id}/reembed endpoint to BookController in backend/src/main/java/com/aiteacher/book/BookController.java: returns 202 with { bookId, status: "PROCESSING" }; returns 404 if not found; returns 409 if already PROCESSING; calls deleteBookChunks() then embedBook() asynchronously
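The overlapping windows described in T016 can be sketched roughly as follows. This is a minimal illustration, not the planned implementation: it treats whitespace-separated words as tokens (a real service would likely use the embedding model's tokenizer), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper illustrating the overlapping-window split from T016. */
final class OverlappingWindows {
    private OverlappingWindows() {}

    /**
     * Splits text into windows of at most windowSize tokens, where each window
     * repeats the last `overlap` tokens of the previous one.
     * Assumption: a "token" is a whitespace-separated word.
     */
    static List<String> split(String text, int windowSize, int overlap) {
        if (windowSize <= 0 || overlap < 0 || overlap >= windowSize) {
            throw new IllegalArgumentException("require 0 <= overlap < windowSize");
        }
        List<String> windows = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return windows;
        }
        String[] tokens = trimmed.split("\\s+");
        int step = windowSize - overlap; // how far the window start advances
        for (int start = 0; start < tokens.length; start += step) {
            int end = Math.min(start + windowSize, tokens.length);
            windows.add(String.join(" ", Arrays.copyOfRange(tokens, start, end)));
            if (end == tokens.length) {
                break; // the last window already reaches the end of the text
            }
        }
        return windows;
    }
}
```

With the numbers from T016, a call such as OverlappingWindows.split(section.getFullText(), 500, 20) would advance 480 tokens per window; getFullText() is an assumed accessor name on SectionEntity.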
Checkpoint: Upload a PDF with figures → poll GET /api/v1/books for READY → GET /api/v1/books/{id}/figures returns figure list → PNG accessible at GET /api/v1/figures/{bookId}/{filename}
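The figure-mention scan in T017 could look something like the sketch below. It combines the two patterns from T017 into one regular expression and normalizes "3-2" to "3.2" so mentions can be compared with figure labels; that normalization and the FigureMentions name are assumptions, since the label format is defined in data-model.md rather than here.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that finds figure mentions in chunk text (T017). */
final class FigureMentions {
    // Fig\.\s*\d+[\-\.]\d+ and Figure\s+\d+[\-\.]\d+ merged into one pattern,
    // with the numeric part captured in group 1.
    private static final Pattern MENTION =
            Pattern.compile("(?:Fig\\.\\s*|Figure\\s+)(\\d+[\\-.]\\d+)");

    private FigureMentions() {}

    /** Returns the distinct figure numbers mentioned in the text, in order of first appearance. */
    static Set<String> numbersIn(String text) {
        Set<String> found = new LinkedHashSet<>();
        Matcher m = MENTION.matcher(text);
        while (m.find()) {
            // Assumption: "3-2" and "3.2" refer to the same figure.
            found.add(m.group(1).replace('-', '.'));
        }
        return found;
    }
}
```

Each returned number would then be matched against FigureEntity labels for the book, and one ChunkFigureRefEntity row persisted per match.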
Phase 4: User Story 1 — Image Content Surfaced in Query Results (Priority: P1)
Goal: User asks a question answered by a diagram — the system retrieves that diagram's content and surfaces it in the chat response with a citation
Independent Test: With a book embedded (Phase 3 checkpoint passed), ask a chat question whose answer is depicted only in a diagram. The response sources array must contain at least one entry with type: "FIGURE" and a non-empty imageUrl.
- T021 [US1] Create NeurosurgeryRetriever service in backend/src/main/java/com/aiteacher/retrieval/NeurosurgeryRetriever.java: (1) text chunk search: vectorStore.similaritySearch with filter type == TEXT AND book_id == bookId, topK=5; (2) figure search: same store, filter type == FIGURE AND book_id == bookId, topK=3; (3) expand text chunk results to parent sections via SectionRepository.findAllById(sectionIds); (4) fetch explicitly linked figures via ChunkFigureRefRepository.findByChunkIdIn(chunkIds) + FigureRepository.findAllById; (5) deduplicate figures across lists by figureId; return RetrievalResult(parentSections, figureVectorHits, linkedFigures); add a RetrievalResult record in the same package
- T022 [US1] Refactor ChatService.sendMessage() in backend/src/main/java/com/aiteacher/chat/ChatService.java: replace QuestionAnswerAdvisor with a manual call to NeurosurgeryRetriever; build the LLM user message from: section full texts as [Section X.Y — Title, pp.A-B]\n{fullText} blocks, followed by an AVAILABLE FIGURES FOR THIS SECTION: list with - {label} (p.{page}): {caption} [image: {filename}] lines per figure; append the instruction When referencing diagrams, cite them as [Fig. X, p.N].; send via chatClient.prompt().system(SYSTEM_PROMPT).user(prompt).call()
- T023 [US1] Add GET /api/v1/books/{id}/figures endpoint to BookController: returns 200 with List<FigureResponse>; FigureResponse is a new record in backend/src/main/java/com/aiteacher/book/FigureResponse.java with fields figureId, label, caption, figureType, page, imageUrl (assembled as /api/v1/figures/{bookId}/{filename}), sectionId, sectionTitle; returns 404 if book not found
- T024 [US1] Update extractSources() in ChatService to build both TEXT and FIGURE source entries: TEXT entries keep existing fields plus "type": "TEXT"; FIGURE entries add "type": "FIGURE", "figureId", "label", "caption", "figureType", "imageUrl"; source data comes from RetrievalResult (text chunk Documents and merged FigureEntity list)
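Step (5) of T021, which T030 later tests, amounts to merging two figure lists while keeping one entry per figureId. A minimal sketch, assuming a simplified FigureHit record in place of FigureEntity and a hypothetical FigureMerge helper:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified stand-in for FigureEntity; only the fields needed for the example. */
record FigureHit(String figureId, String label) {}

/** Hypothetical helper illustrating the deduplication in T021 step (5). */
final class FigureMerge {
    private FigureMerge() {}

    /**
     * Merges vector-search hits with explicitly linked figures.
     * The first occurrence of each figureId wins, and encounter order is preserved.
     */
    static List<FigureHit> mergeById(List<FigureHit> vectorHits, List<FigureHit> linkedFigures) {
        Map<String, FigureHit> byId = new LinkedHashMap<>();
        for (FigureHit f : vectorHits) {
            byId.putIfAbsent(f.figureId(), f);
        }
        for (FigureHit f : linkedFigures) {
            byId.putIfAbsent(f.figureId(), f);
        }
        return new ArrayList<>(byId.values());
    }
}
```

Listing vector hits first is a design choice for the sketch only; the tasks do not say which list should take precedence when a figure appears in both.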
Checkpoint: Chat question answered by a diagram → response body contains sources[n].type == "FIGURE" with populated imageUrl; image loads from the returned URL
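The user message described in T022 could be assembled along these lines. The record types and the PromptBuilder name are hypothetical, and placing the question after the citation instruction is an assumption, since T022 fixes only the section blocks, the figure list, and the instruction text.

```java
import java.util.List;

/** Hypothetical view of a section; the fields mirror those used in the T022 format. */
record SectionText(String number, String title, int pageStart, int pageEnd, String fullText) {}

/** Hypothetical view of a figure line in the prompt. */
record FigureLine(String label, int page, String caption, String filename) {}

/** Sketch of the prompt assembly from T022. */
final class PromptBuilder {
    private PromptBuilder() {}

    static String build(String question, List<SectionText> sections, List<FigureLine> figures) {
        StringBuilder sb = new StringBuilder();
        for (SectionText s : sections) {
            // \u2014 is the dash used in the "[Section X.Y ... Title, pp.A-B]" header from T022.
            sb.append(String.format("[Section %s \u2014 %s, pp.%d-%d]\n%s\n\n",
                    s.number(), s.title(), s.pageStart(), s.pageEnd(), s.fullText()));
        }
        sb.append("AVAILABLE FIGURES FOR THIS SECTION:\n");
        for (FigureLine f : figures) {
            sb.append(String.format("- %s (p.%d): %s [image: %s]\n",
                    f.label(), f.page(), f.caption(), f.filename()));
        }
        sb.append("\nWhen referencing diagrams, cite them as [Fig. X, p.N].\n");
        // Assumption: the user's question goes last.
        sb.append("\nQuestion: ").append(question);
        return sb.toString();
    }
}
```

The resulting string would be passed as the prompt argument in chatClient.prompt().system(SYSTEM_PROMPT).user(prompt).call().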
Phase 5: User Story 3 — Rich Metadata Enables Precise Source Attribution (Priority: P2)
Goal: Users see distinct, informative citations for text vs. image sources; image sources render inline in the chat UI
Independent Test: After triggering a response with figure sources, inspect the chat message in the UI — text sources and figure sources are visually distinguishable; figure sources render the actual image inline using the imageUrl
- T025 [P] [US3] Update API response types in frontend/src/services/api.ts: extend the Source type to include type: 'TEXT' | 'FIGURE', figureId?: string, label?: string, caption?: string, figureType?: string, imageUrl?: string
- T026 [P] [US3] Update the chat source/citation display in the frontend (wherever sources are currently rendered, e.g. frontend/src/components/ or frontend/src/views/): render TEXT sources with a document icon and page number; render FIGURE sources with the image (<img :src="source.imageUrl">) below the label and caption text
- T027 [US3] Add figure-type badge rendering in the frontend figure display: show a label derived from figureType (e.g. "MRI / CT", "Anatomical Diagram", "Table") alongside the figure caption so users can identify content type without opening the image
Phase 6: Polish & Cross-Cutting Concerns
- T028 Update README.md Mermaid architecture diagram to show three storage tiers: pgvector (semantic search), Postgres (source of truth: sections, figures, refs), and file store (extracted PNGs); required by Constitution Principle IV in the same PR as the other changes
- T029 [P] Write FigureExtractionServiceTest unit test in backend/src/test/java/com/aiteacher/document/FigureExtractionServiceTest.java covering: images below min size are skipped; FigureType classification matches keyword table in data-model.md; caption parsed from adjacent text line
- T030 [P] Write NeurosurgeryRetrieverTest unit test in backend/src/test/java/com/aiteacher/retrieval/NeurosurgeryRetrieverTest.java covering: figure IDs from both vector hits and chunk refs are merged without duplicates; RetrievalResult contains the deduplicated set
- T031 Run quickstart.md validation end-to-end: upload a real PDF with a labelled diagram → wait for READY → call GET /api/v1/books/{id}/figures → send a chat message about the diagram → verify sources contains a FIGURE entry → verify imageUrl resolves to a PNG
Dependencies & Execution Order
Phase Dependencies
- Phase 1 (Setup): No dependencies — start immediately
- Phase 2 (Foundational): Requires Phase 1 complete (migrations must run before JPA entities can be wired)
- Phase 3 (US2): Requires Phase 2 complete — all JPA entities + FigureStorageService must exist
- Phase 4 (US1): Requires Phase 3 complete — figures must exist in Postgres + vector store before retrieval can surface them
- Phase 5 (US3): Requires Phase 4 complete; the frontend depends on the extended sources format from T024
- Phase 6 (Polish): Requires all story phases complete
Within Phase 3 (Embedding Pipeline)
T013 (PdfStructureParser) ──────────────────────────┐
T014 (FigureExtractionService) ─────────────────────┤
T015 (VisionDescriptionService) ────────────────────┤─→ T018 (BookEmbeddingService orchestrator)
T016 (TextChunkingService) ─────────────────────────┤      └─→ T019 (cleanup)
T017 (ChunkFigureRefService) ───────────────────────┘      └─→ T020 (reembed endpoint)
T013–T017 can be implemented in parallel (different files, no shared dependencies). T018 depends on all of them.
Within Phase 4 (Retrieval)
T021 (NeurosurgeryRetriever) ──────────────────────┐
                                                   └─→ T022 (ChatService update)
                                                   └─→ T024 (extractSources update)
T023 (figures endpoint) ── independent [P]
Parallel Opportunities per Phase
Phase 2: T005, T006, T009, T010, T011 can all run in parallel. T007 depends on T006. T012 can follow T010/T011.
Phase 3: T013, T014, T015, T016, T017 all in parallel. T018 depends on all.
Phase 5: T025 and T026 in parallel; T027 can follow T026.
Phase 6: T029 and T030 in parallel.
Implementation Strategy
MVP: User Story 2 Only (Embedding Pipeline)
- Phase 1 (Setup) → Phase 2 (Foundational) → Phase 3 (US2, T013–T020)
- Validate: GET /api/v1/books/{id}/figures returns figures for a test book
- Stop and demo: the pipeline produces image chunks without any retrieval changes
Full Feature Delivery
- Phase 1 + 2 → Foundation ready
- Phase 3 (US2) → Embedding pipeline produces image chunks ← demo point
- Phase 4 (US1) → Chat surfaces image content in responses ← core payoff
- Phase 5 (US3) → Frontend renders inline figures with type badges
- Phase 6 (Polish) → README, tests, end-to-end validation
Notes
- [P] tasks = different files, no dependencies on each other within the same phase
- [US1/US2/US3] label maps each task to a user story for traceability
- Phase 3 (US2) must be fully complete before beginning Phase 4 (US1) — retrieval cannot surface figures that do not yet exist
- The uploads/figures/ directory must exist and be writable at runtime; FigureStorageService creates subdirectories automatically
- Re-embedding (T020) deletes all existing chunks and figures for the book before re-running, so it is safe to call on books processed by feature 001