first implementation - image/drawing integration

Adrien
2026-04-04 12:56:56 +02:00
parent fc5b22fba1
commit 5acfdd33c1
42 changed files with 2854 additions and 151 deletions
+12
@@ -1,3 +1,15 @@
# Runtime uploads (extracted figures)
uploads/
# Java build
target/
*.class
*.jar
# Node
node_modules/
dist/
# OS
.DS_Store
Thumbs.db
+4 -1
@@ -1,8 +1,10 @@
# ai-teacher Development Guidelines # ai-teacher Development Guidelines
Auto-generated from all feature plans. Last updated: 2026-03-31 Auto-generated from all feature plans. Last updated: 2026-04-03
## Active Technologies ## Active Technologies
- Java 25 (backend), TypeScript / Node 20 (frontend) + Spring Boot 4.0.5, Spring AI 2.0.0-M4, OpenAI API (embeddings + chat), PDFBox (via Spring AI PDF reader dependency) (002-image-aware-embedding)
- PostgreSQL (JPA + Flyway), pgvector (Spring AI `VectorStore`), local file system (extracted images — `/uploads/figures/`) (002-image-aware-embedding)
- Java 21 (backend), TypeScript / Node 20 (frontend) (001-neuro-rag-learning) - Java 21 (backend), TypeScript / Node 20 (frontend) (001-neuro-rag-learning)
@@ -22,6 +24,7 @@ npm test && npm run lint
Java 21 (backend), TypeScript / Node 20 (frontend): Follow standard conventions Java 21 (backend), TypeScript / Node 20 (frontend): Follow standard conventions
## Recent Changes ## Recent Changes
- 002-image-aware-embedding: Added Java 25 (backend), TypeScript / Node 20 (frontend) + Spring Boot 4.0.5, Spring AI 2.0.0-M4, OpenAI API (embeddings + chat), PDFBox (via Spring AI PDF reader dependency)
- 001-neuro-rag-learning: Added Java 21 (backend), TypeScript / Node 20 (frontend) - 001-neuro-rag-learning: Added Java 21 (backend), TypeScript / Node 20 (frontend)
+37 -4
@@ -11,13 +11,45 @@ graph TD
User["Neurosurgeon (Browser)"] User["Neurosurgeon (Browser)"]
FE["Frontend\nVue.js 3 / Vite\n:5173"] FE["Frontend\nVue.js 3 / Vite\n:5173"]
BE["Backend\nSpring Boot 4 / Spring AI\n:8080"] BE["Backend\nSpring Boot 4 / Spring AI\n:8080"]
DB["PostgreSQL + pgvector\n(provided)"] DB["PostgreSQL + pgvector\n(source of truth)"]
LLM["LLM Provider\n(OpenAI / configurable)"] FS["File Store\nuploads/ (local disk)\nExtracted figure PNGs"]
LLM["LLM Provider\n(OpenAI)\nEmbeddings + Chat + Vision"]
User -->|HTTP| FE User -->|HTTP| FE
FE -->|REST /api/v1/...| BE FE -->|REST /api/v1/...| BE
BE -->|JDBC / pgvector| DB BE -->|"JDBC — books, chapters,\nsections, figures, refs"| DB
BE -->|Embedding + Chat API| LLM BE -->|"pgvector — text chunks\n+ figure caption vectors"| DB
BE -->|"PNG read/write\n(figure extraction)"| FS
FE -->|"GET /api/v1/figures/**\n(static file serving)"| BE
BE -->|"Embedding + Chat\n+ Vision (image description)"| LLM
subgraph "Embedding Pipeline (per PDF upload)"
EP1["Parse pages → SectionEntity"]
EP2["Extract images → FigureEntity"]
EP3["Vision describe → embed caption"]
EP4["Chunk text → embed chunks"]
EP5["Link chunks ↔ figures"]
EP1 --> EP2
EP1 --> EP4
EP2 --> EP3
EP4 --> EP5
EP3 --> EP5
end
subgraph "Retrieval Pipeline (per chat query)"
RP1["Text chunk search (topK=5)"]
RP2["Figure caption search (topK=3)"]
RP3["Expand chunks → full section text"]
RP4["Fetch linked figures (chunk_figure_ref)"]
RP5["Merge + deduplicate figures"]
RP6["Build LLM prompt + call"]
RP1 --> RP3
RP1 --> RP4
RP2 --> RP5
RP4 --> RP5
RP3 --> RP6
RP5 --> RP6
end
```
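The "Merge + deduplicate figures" step of the retrieval pipeline (RP5) can be sketched as follows. This is a minimal illustration, not the repository's `NeurosurgeryRetriever`: `FigureHit` is a hypothetical stand-in for `FigureEntity`, and the rule "first occurrence wins, caption-search hits first" is an assumption about the intended ordering.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the project's FigureEntity; only the fields needed here.
record FigureHit(String figureId, String label, int page) {}

final class FigureMerger {

    private FigureMerger() {}

    /**
     * Merges figures found by caption search (RP2) with figures linked to the
     * retrieved text chunks (RP4), keeping the first occurrence of each figure id.
     * Insertion order is preserved, so caption-search hits come first.
     */
    static List<FigureHit> mergeAndDeduplicate(List<FigureHit> fromCaptionSearch,
                                               List<FigureHit> fromChunkLinks) {
        Map<String, FigureHit> byId = new LinkedHashMap<>();
        for (FigureHit hit : fromCaptionSearch) {
            byId.putIfAbsent(hit.figureId(), hit);
        }
        for (FigureHit hit : fromChunkLinks) {
            byId.putIfAbsent(hit.figureId(), hit);
        }
        return new ArrayList<>(byId.values());
    }
}
```

Keying on the figure id rather than on the label avoids collapsing two different figures that happen to share a label such as "Fig. 1" in different chapters.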
## Stack ## Stack
@@ -56,3 +88,4 @@ npm run dev
| `DB_URL` | Yes | JDBC URL, e.g. `jdbc:postgresql://localhost:5432/aiteacher` |
| `DB_USERNAME` | Yes | Database username |
| `DB_PASSWORD` | Yes | Database password |
| `FIGURE_STORAGE_PATH` | No | Base path for uploaded PDFs and extracted figures (default: `./uploads`) |
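As a rough illustration of how the optional `FIGURE_STORAGE_PATH` value could be turned into concrete locations, the sketch below mirrors the `Paths.get(basePath).toAbsolutePath().normalize().resolve("books")` call that `BookService` uses in this commit. The `StoragePaths` class and its method names are hypothetical, and the mapping from the environment variable to the `app.figure-storage.base-path` property is assumed rather than shown in this diff.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class StoragePaths {

    private StoragePaths() {}

    /** Falls back to ./uploads when the configured value is null or blank. */
    static Path basePath(String configured) {
        String raw = (configured == null || configured.isBlank()) ? "./uploads" : configured;
        return Paths.get(raw).toAbsolutePath().normalize();
    }

    /** Location of a stored PDF: {base}/books/{bookId}.pdf. */
    static Path bookPdf(Path base, String bookId) {
        return base.resolve("books").resolve(bookId + ".pdf");
    }
}
```

Normalizing once at startup keeps later `resolve` calls free of `..` segments, which makes it easier to check that a derived path still lies under the base directory.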
+8 -1
@@ -95,12 +95,19 @@
<artifactId>spring-ai-advisors-vector-store</artifactId> <artifactId>spring-ai-advisors-vector-store</artifactId>
</dependency> </dependency>
<!-- Spring AI — PDF document reader --> <!-- Spring AI — PDF document reader (includes PDFBox transitively) -->
<dependency> <dependency>
<groupId>org.springframework.ai</groupId> <groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-pdf-document-reader</artifactId> <artifactId>spring-ai-pdf-document-reader</artifactId>
</dependency> </dependency>
<!-- PDFBox — explicit for image extraction per page -->
<dependency>
<groupId>org.apache.pdfbox</groupId>
<artifactId>pdfbox</artifactId>
<version>3.0.3</version>
</dependency>
<!-- Jackson (JSON) --> <!-- Jackson (JSON) -->
<dependency> <dependency>
<groupId>com.fasterxml.jackson.core</groupId> <groupId>com.fasterxml.jackson.core</groupId>
@@ -1,5 +1,7 @@
package com.aiteacher.book; package com.aiteacher.book;
import com.aiteacher.document.FigureEntity;
import com.aiteacher.document.FigureRepository;
import org.springframework.http.HttpStatus; import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity; import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*; import org.springframework.web.bind.annotation.*;
@@ -15,9 +17,11 @@ import java.util.UUID;
public class BookController { public class BookController {
private final BookService bookService; private final BookService bookService;
private final FigureRepository figureRepository;
public BookController(BookService bookService) { public BookController(BookService bookService, FigureRepository figureRepository) {
this.bookService = bookService; this.bookService = bookService;
this.figureRepository = figureRepository;
} }
@PostMapping(consumes = "multipart/form-data") @PostMapping(consumes = "multipart/form-data")
@@ -46,6 +50,36 @@ public class BookController {
return ResponseEntity.noContent().build(); return ResponseEntity.noContent().build();
} }
@PostMapping("/{id}/reembed")
public ResponseEntity<Map<String, Object>> reembed(@PathVariable UUID id) {
Book book = bookService.reembed(id);
return ResponseEntity.accepted().body(Map.of(
"bookId", book.getId(),
"status", BookStatus.PROCESSING.name()
));
}
@GetMapping("/{id}/figures")
public ResponseEntity<List<FigureResponse>> figures(@PathVariable UUID id) {
bookService.getById(id); // 404 if not found
List<FigureResponse> responses = figureRepository.findAllByBookId(id)
.stream()
.map(f -> toFigureResponse(id, f))
.toList();
return ResponseEntity.ok(responses);
}
private FigureResponse toFigureResponse(UUID bookId, FigureEntity f) {
String filename = f.getImagePath().substring(f.getImagePath().lastIndexOf('/') + 1);
String imageUrl = "/api/v1/figures/" + bookId + "/" + filename;
return new FigureResponse(
f.getId(), f.getLabel(), f.getCaption(),
f.getFigureType().name(), f.getPage(), imageUrl,
f.getSectionId(),
null // section title not eagerly loaded here
);
}
private Map<String, Object> toSummaryResponse(Book book) { private Map<String, Object> toSummaryResponse(Book book) {
return Map.of( return Map.of(
"id", book.getId(), "id", book.getId(),
@@ -1,41 +1,75 @@
package com.aiteacher.book; package com.aiteacher.book;
import com.aiteacher.document.*;
import com.aiteacher.figure.FigureStorageService;
import org.slf4j.Logger; import org.slf4j.Logger;
import org.slf4j.LoggerFactory; import org.slf4j.LoggerFactory;
import org.springframework.ai.document.Document; import org.springframework.ai.document.Document;
import org.springframework.ai.reader.pdf.PagePdfDocumentReader;
import org.springframework.ai.reader.pdf.config.PdfDocumentReaderConfig;
import org.springframework.ai.vectorstore.VectorStore; import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.ai.vectorstore.filter.FilterExpressionBuilder; import org.springframework.ai.vectorstore.filter.FilterExpressionBuilder;
import org.springframework.core.io.FileSystemResource; import org.springframework.beans.factory.annotation.Value;
import org.springframework.scheduling.annotation.Async; import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Service; import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;
import java.nio.file.Path; import java.nio.file.Path;
import java.util.List; import java.time.Instant;
import java.util.UUID; import java.util.*;
import java.util.regex.Pattern;
@Service @Service
public class BookEmbeddingService { public class BookEmbeddingService {
private static final Logger log = LoggerFactory.getLogger(BookEmbeddingService.class); private static final Logger log = LoggerFactory.getLogger(BookEmbeddingService.class);
// Pattern to detect diagram/figure captions
private static final Pattern CAPTION_PATTERN =
Pattern.compile("^(Figure|Fig\\.|Table|Diagram)\\s+[\\d.]+", Pattern.CASE_INSENSITIVE);
private final VectorStore vectorStore; private final VectorStore vectorStore;
private final BookRepository bookRepository; private final BookRepository bookRepository;
public BookEmbeddingService(VectorStore vectorStore, BookRepository bookRepository) { @Value("${app.embedding.batch-size:50}")
private int embeddingBatchSize;
@Value("${app.embedding.batch-delay-ms:1000}")
private long embeddingBatchDelayMs;
private final PdfStructureParser pdfStructureParser;
private final FigureExtractionService figureExtractionService;
private final VisionDescriptionService visionDescriptionService;
private final TextChunkingService textChunkingService;
private final ChunkFigureRefService chunkFigureRefService;
private final SectionRepository sectionRepository;
private final ChapterRepository chapterRepository;
private final FigureRepository figureRepository;
private final ChunkFigureRefRepository chunkFigureRefRepository;
private final FigureStorageService figureStorageService;
public BookEmbeddingService(
VectorStore vectorStore,
BookRepository bookRepository,
PdfStructureParser pdfStructureParser,
FigureExtractionService figureExtractionService,
VisionDescriptionService visionDescriptionService,
TextChunkingService textChunkingService,
ChunkFigureRefService chunkFigureRefService,
SectionRepository sectionRepository,
ChapterRepository chapterRepository,
FigureRepository figureRepository,
ChunkFigureRefRepository chunkFigureRefRepository,
FigureStorageService figureStorageService) {
this.vectorStore = vectorStore; this.vectorStore = vectorStore;
this.bookRepository = bookRepository; this.bookRepository = bookRepository;
this.pdfStructureParser = pdfStructureParser;
this.figureExtractionService = figureExtractionService;
this.visionDescriptionService = visionDescriptionService;
this.textChunkingService = textChunkingService;
this.chunkFigureRefService = chunkFigureRefService;
this.sectionRepository = sectionRepository;
this.chapterRepository = chapterRepository;
this.figureRepository = figureRepository;
this.chunkFigureRefRepository = chunkFigureRefRepository;
this.figureStorageService = figureStorageService;
} }
@Async @Async
public void embedBook(UUID bookId, String bookTitle, Path pdfPath) { public void embedBook(UUID bookId, String bookTitle, Path pdfPath) {
log.info("Starting embedding for book {} ({})", bookId, bookTitle); log.info("Starting image-aware embedding for book {} ({})", bookId, bookTitle);
Book book = bookRepository.findById(bookId).orElse(null); Book book = bookRepository.findById(bookId).orElse(null);
if (book == null) { if (book == null) {
@@ -47,29 +81,68 @@ public class BookEmbeddingService {
book.setStatus(BookStatus.PROCESSING); book.setStatus(BookStatus.PROCESSING);
bookRepository.save(book); bookRepository.save(book);
PagePdfDocumentReader reader = new PagePdfDocumentReader( // Step 1: Parse PDF into page-level sections persisted in Postgres
new FileSystemResource(pdfPath.toFile()), List<SectionEntity> sections = pdfStructureParser.parse(bookId, bookTitle, pdfPath);
PdfDocumentReaderConfig.builder() String chapterId = bookId + "-ch1";
.withPagesPerDocument(1)
.build()
);
List<Document> pages = reader.get(); // Step 2: Build and embed text chunks for all sections in batches
int pageCount = pages.size(); List<Document> allChunks = new ArrayList<>();
for (SectionEntity section : sections) {
List<Document> chunks = textChunkingService.chunk(section, bookTitle);
allChunks.addAll(chunks);
}
embedInBatches(allChunks, bookId);
log.info("Embedded {} text chunks for book {}", allChunks.size(), bookId);
// Enrich metadata and tag diagram captions // Step 3: Extract images from the PDF, save to file store, persist FigureEntity
List<Document> enriched = pages.stream() List<FigureEntity> figures = figureExtractionService.extract(
.map(doc -> enrichDocument(doc, bookId.toString(), bookTitle)) bookId, chapterId, sections, pdfPath);
// Step 4: For each figure, generate vision description and embed caption
for (FigureEntity figure : figures) {
Path imagePath = figureStorageService.resolve(figure.getImagePath());
String description = visionDescriptionService.describe(
imagePath, figure.getCaption());
// Remember whether the PDF provided a caption before applying the fallback
boolean hasCaption = figure.getCaption() != null && !figure.getCaption().isBlank();
// Content for embedding = vision description + original caption (if any).
// Without this check the fallback caption would repeat the description twice.
String embeddingContent = hasCaption
? description + "\n" + figure.getCaption()
: description;
// Use description as caption fallback if no caption was detected
if (!hasCaption) {
figure.setCaption(description);
figureRepository.save(figure);
}
String embeddingId = UUID.randomUUID().toString();
Map<String, Object> metadata = buildFigureMetadata(figure, bookTitle, embeddingId);
Document figureDoc = new Document(embeddingId, embeddingContent, metadata);
vectorStore.add(List.of(figureDoc));
figure.setCaptionEmbeddingId(UUID.fromString(embeddingId));
figureRepository.save(figure);
}
log.info("Embedded {} figure captions for book {}", figures.size(), bookId);
// Step 5: Link text chunks to figures via text references
for (SectionEntity section : sections) {
List<Document> sectionChunks = allChunks.stream()
.filter(d -> section.getId().equals(d.getMetadata().get("section_id")))
.toList(); .toList();
List<FigureEntity> sectionFigures = figures.stream()
vectorStore.add(enriched); .filter(f -> section.getId().equals(f.getSectionId()))
.toList();
chunkFigureRefService.linkChunksToFigures(
sectionChunks, sectionFigures, section.getPageStart());
}
book.setStatus(BookStatus.READY); book.setStatus(BookStatus.READY);
book.setPageCount(pageCount); book.setPageCount(sections.size());
book.setProcessedAt(java.time.Instant.now()); book.setProcessedAt(Instant.now());
bookRepository.save(book); bookRepository.save(book);
log.info("Finished embedding book {} — {} pages", bookId, pageCount); log.info("Finished embedding book {} — {} pages, {} figures",
bookId, sections.size(), figures.size());
} catch (Exception ex) { } catch (Exception ex) {
log.error("Failed to embed book {}", bookId, ex); log.error("Failed to embed book {}", bookId, ex);
@@ -79,40 +152,74 @@ public class BookEmbeddingService {
} }
} }
private Document enrichDocument(Document doc, String bookId, String bookTitle) { @Transactional
String content = doc.getText();
String chunkType = detectChunkType(content);
doc.getMetadata().put("book_id", bookId);
doc.getMetadata().put("book_title", bookTitle);
doc.getMetadata().put("chunk_type", chunkType);
return doc;
}
private String detectChunkType(String content) {
if (content != null) {
for (String line : content.split("\\r?\\n")) {
if (CAPTION_PATTERN.matcher(line.trim()).find()) {
return "diagram";
}
}
}
return "text";
}
public void deleteBookChunks(UUID bookId) { public void deleteBookChunks(UUID bookId) {
log.info("Deleting vector chunks for book {}", bookId); log.info("Deleting all data for book {}", bookId);
try { try {
// Delete chunk-figure refs (by figureId for this book)
List<String> figureIds = figureRepository.findAllByBookId(bookId)
.stream().map(FigureEntity::getId).toList();
if (!figureIds.isEmpty()) {
chunkFigureRefRepository.deleteByFigureIdIn(figureIds);
}
// Delete figures from Postgres
figureRepository.deleteAllByBookId(bookId);
// Delete figure files from disk
figureStorageService.deleteAll(bookId);
// Delete sections and chapters from Postgres
sectionRepository.deleteAllByBookId(bookId);
chapterRepository.deleteAllByBookId(bookId);
// Delete vector store entries (text chunks + figure embeddings)
FilterExpressionBuilder b = new FilterExpressionBuilder(); FilterExpressionBuilder b = new FilterExpressionBuilder();
vectorStore.delete(b.eq("book_id", bookId.toString()).build()); vectorStore.delete(b.eq("book_id", bookId.toString()).build());
} catch (Exception ex) { } catch (Exception ex) {
log.warn("Could not delete vector chunks for book {}: {}", bookId, ex.getMessage()); log.warn("Error during cleanup for book {}: {}", bookId, ex.getMessage());
} }
} }
private String truncate(String message, int maxLength) { private void embedInBatches(List<Document> docs, UUID bookId) {
if (message == null) return null; int total = docs.size();
return message.length() <= maxLength ? message : message.substring(0, maxLength); for (int i = 0; i < total; i += embeddingBatchSize) {
List<Document> batch = docs.subList(i, Math.min(i + embeddingBatchSize, total));
vectorStore.add(batch);
int batchNum = i / embeddingBatchSize + 1;
int totalBatches = (total - 1) / embeddingBatchSize + 1;
log.debug("Embedded batch {}/{} for book {}", batchNum, totalBatches, bookId);
if (i + embeddingBatchSize < total) {
try {
Thread.sleep(embeddingBatchDelayMs);
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
log.warn("Embedding batch sleep interrupted for book {}", bookId);
}
}
}
}
private Map<String, Object> buildFigureMetadata(FigureEntity figure, String bookTitle,
String embeddingId) {
Map<String, Object> m = new HashMap<>();
m.put("type", "FIGURE");
m.put("book_id", figure.getBookId().toString());
m.put("book_title", bookTitle);
m.put("chapter_id", figure.getChapterId() != null ? figure.getChapterId() : "");
m.put("section_id", figure.getSectionId() != null ? figure.getSectionId() : "");
m.put("figure_id", figure.getId());
m.put("figure_type", figure.getFigureType().name());
m.put("image_path", figure.getImagePath());
m.put("label", figure.getLabel() != null ? figure.getLabel() : "");
m.put("page", figure.getPage());
m.put("embedding_id", embeddingId);
return m;
}
private String truncate(String msg, int max) {
if (msg == null) return null;
return msg.length() <= max ? msg : msg.substring(0, max);
} }
} }
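The batching arithmetic used by `embedInBatches` can be checked in isolation with a small sketch like the one below. The `Batches` class is hypothetical and not part of this commit. It adds one guard the diff does not have: with a configured batch size of 0, the loop `i += embeddingBatchSize` would never advance, so the sketch rejects non-positive sizes.

```java
import java.util.ArrayList;
import java.util.List;

final class Batches {

    private Batches() {}

    /** Number of batches needed for total items; 0 when there is nothing to embed. */
    static int count(int total, int batchSize) {
        requirePositive(batchSize);
        return total == 0 ? 0 : (total - 1) / batchSize + 1;
    }

    /** Splits items into consecutive sublist views of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        requirePositive(batchSize);
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            batches.add(items.subList(i, Math.min(i + batchSize, items.size())));
        }
        return batches;
    }

    private static void requirePositive(int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
    }
}
```

Because `subList` returns views, the batches share storage with the original list; that is fine for a read-only hand-off to `vectorStore.add(batch)`, provided the source list is not modified while batches are in flight.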
@@ -1,11 +1,13 @@
package com.aiteacher.book; package com.aiteacher.book;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service; import org.springframework.stereotype.Service;
import org.springframework.web.multipart.MultipartFile; import org.springframework.web.multipart.MultipartFile;
import java.io.IOException; import java.io.IOException;
import java.nio.file.Files; import java.nio.file.Files;
import java.nio.file.Path; import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List; import java.util.List;
import java.util.NoSuchElementException; import java.util.NoSuchElementException;
import java.util.UUID; import java.util.UUID;
@@ -15,10 +17,15 @@ public class BookService {
private final BookRepository bookRepository; private final BookRepository bookRepository;
private final BookEmbeddingService bookEmbeddingService; private final BookEmbeddingService bookEmbeddingService;
private final Path bookStoragePath;
public BookService(BookRepository bookRepository, BookEmbeddingService bookEmbeddingService) { public BookService(
BookRepository bookRepository,
BookEmbeddingService bookEmbeddingService,
@Value("${app.figure-storage.base-path:./uploads}") String basePath) {
this.bookRepository = bookRepository; this.bookRepository = bookRepository;
this.bookEmbeddingService = bookEmbeddingService; this.bookEmbeddingService = bookEmbeddingService;
this.bookStoragePath = Paths.get(basePath).toAbsolutePath().normalize().resolve("books");
} }
public Book upload(MultipartFile file) throws IOException { public Book upload(MultipartFile file) throws IOException {
@@ -28,20 +35,35 @@ public class BookService {
} }
String title = deriveTitle(originalFilename); String title = deriveTitle(originalFilename);
Book book = new Book(title, originalFilename, file.getSize()); Book book = new Book(title, originalFilename, file.getSize());
book = bookRepository.save(book); book = bookRepository.save(book);
// Write to a temp file so the async task can read it // Persist PDF in a stable location for potential re-embedding
Path tempFile = Files.createTempFile("aiteacher-", "-" + book.getId() + ".pdf"); Files.createDirectories(bookStoragePath);
file.transferTo(tempFile.toFile()); Path pdfPath = bookStoragePath.resolve(book.getId() + ".pdf");
file.transferTo(pdfPath.toFile());
UUID bookId = book.getId(); UUID bookId = book.getId();
Path pdfPath = tempFile; bookEmbeddingService.embedBook(bookId, title, pdfPath);
String bookTitle = title; return book;
}
bookEmbeddingService.embedBook(bookId, bookTitle, pdfPath); public Book reembed(UUID id) {
Book book = bookRepository.findById(id)
.orElseThrow(() -> new NoSuchElementException("Book not found."));
if (book.getStatus() == BookStatus.PROCESSING) {
throw new IllegalStateException("Book is already being processed.");
}
Path pdfPath = bookStoragePath.resolve(id + ".pdf");
if (!Files.exists(pdfPath)) {
throw new IllegalStateException(
"Original PDF not found. Please re-upload the book before re-embedding.");
}
bookEmbeddingService.deleteBookChunks(id);
bookEmbeddingService.embedBook(id, book.getTitle(), pdfPath);
return book; return book;
} }
@@ -63,14 +85,21 @@ public class BookService {
} }
bookEmbeddingService.deleteBookChunks(id); bookEmbeddingService.deleteBookChunks(id);
// Delete the stored PDF
Path pdfPath = bookStoragePath.resolve(id + ".pdf");
try {
Files.deleteIfExists(pdfPath);
} catch (IOException ex) {
// Non-fatal: the failure is ignored (nothing is logged here) and the book record is still deleted
}
bookRepository.deleteById(id); bookRepository.deleteById(id);
} }
private String deriveTitle(String filename) { private String deriveTitle(String filename) {
// Strip .pdf extension and replace separators with spaces
String name = filename.replaceAll("(?i)\\.pdf$", ""); String name = filename.replaceAll("(?i)\\.pdf$", "");
name = name.replaceAll("[-_]", " "); name = name.replaceAll("[-_]", " ");
// Capitalise first letter
if (!name.isEmpty()) { if (!name.isEmpty()) {
name = Character.toUpperCase(name.charAt(0)) + name.substring(1); name = Character.toUpperCase(name.charAt(0)) + name.substring(1);
} }
@@ -0,0 +1,12 @@
package com.aiteacher.book;
public record FigureResponse(
String figureId,
String label,
String caption,
String figureType,
int page,
String imageUrl,
String sectionId,
String sectionTitle
) {}
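The `imageUrl` field of `FigureResponse` is assembled in two places in this commit (`BookController.toFigureResponse` and `ChatService.buildSources`) by taking the text after the last `/` of the stored image path. A shared helper along the following lines could hold that logic in one place. `FigureUrls` is a hypothetical name, the sample file names are made up, and the backslash handling is an added assumption for paths produced on Windows.

```java
final class FigureUrls {

    private FigureUrls() {}

    /** Returns the last path segment, accepting both '/' and '\' as separators. */
    static String fileName(String imagePath) {
        int cut = Math.max(imagePath.lastIndexOf('/'), imagePath.lastIndexOf('\\'));
        return imagePath.substring(cut + 1);
    }

    /** Builds the public URL served under /api/v1/figures/{bookId}/{filename}. */
    static String imageUrl(String bookId, String imagePath) {
        return "/api/v1/figures/" + bookId + "/" + fileName(imagePath);
    }
}
```

When the path has no separator at all, `lastIndexOf` returns -1 and the whole string is used as the file name, which matches the behavior of the inline expression in the diff.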
@@ -3,22 +3,16 @@ package com.aiteacher.chat;
import com.aiteacher.book.BookRepository; import com.aiteacher.book.BookRepository;
import com.aiteacher.book.BookStatus; import com.aiteacher.book.BookStatus;
import com.aiteacher.book.NoKnowledgeSourceException; import com.aiteacher.book.NoKnowledgeSourceException;
import com.aiteacher.document.FigureEntity;
import com.aiteacher.document.SectionEntity;
import com.aiteacher.retrieval.NeurosurgeryRetriever;
import com.aiteacher.retrieval.RetrievalResult;
import org.slf4j.Logger; import org.slf4j.Logger;
import org.slf4j.LoggerFactory; import org.slf4j.LoggerFactory;
import org.springframework.ai.chat.client.ChatClient; import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.vectorstore.QuestionAnswerAdvisor;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Service; import org.springframework.stereotype.Service;
import java.util.ArrayList; import java.util.*;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.UUID;
@Service @Service
public class ChatService { public class ChatService {
@@ -35,26 +29,28 @@ public class ChatService {
- Build answers from what is present: procedures, conditions, techniques, and descriptions all contribute; combine them into a rich, structured response - Build answers from what is present: procedures, conditions, techniques, and descriptions all contribute; combine them into a rich, structured response
- Use clear structure: headings, bullet points, or numbered steps where appropriate to maximize clarity - Use clear structure: headings, bullet points, or numbered steps where appropriate to maximize clarity
- Only say you cannot answer if the context is entirely unrelated to the question - Only say you cannot answer if the context is entirely unrelated to the question
- Cite sources for each major point (book title and page number from the context metadata) - Cite sources for each major point (book title and page number from the context)
- When referencing diagrams or figures, cite them as [Fig. X, p.N]
- Maintain continuity with the conversation history - Maintain continuity with the conversation history
- Never fabricate clinical information not present in the context - Never fabricate clinical information not present in the context
"""; """;
private final ChatClient chatClient; private final ChatClient chatClient;
private final VectorStore vectorStore;
private final BookRepository bookRepository; private final BookRepository bookRepository;
private final ChatSessionRepository sessionRepository; private final ChatSessionRepository sessionRepository;
private final MessageRepository messageRepository; private final MessageRepository messageRepository;
private final NeurosurgeryRetriever retriever;
public ChatService(ChatClient chatClient, VectorStore vectorStore, public ChatService(ChatClient chatClient,
BookRepository bookRepository, BookRepository bookRepository,
ChatSessionRepository sessionRepository, ChatSessionRepository sessionRepository,
MessageRepository messageRepository) { MessageRepository messageRepository,
NeurosurgeryRetriever retriever) {
this.chatClient = chatClient; this.chatClient = chatClient;
this.vectorStore = vectorStore;
this.bookRepository = bookRepository; this.bookRepository = bookRepository;
this.sessionRepository = sessionRepository; this.sessionRepository = sessionRepository;
this.messageRepository = messageRepository; this.messageRepository = messageRepository;
this.retriever = retriever;
} }
public ChatSession createSession(String topicId) { public ChatSession createSession(String topicId) {
@@ -73,7 +69,11 @@ public class ChatService {
ChatSession session = sessionRepository.findById(sessionId) ChatSession session = sessionRepository.findById(sessionId)
.orElseThrow(() -> new NoSuchElementException("Session not found.")); .orElseThrow(() -> new NoSuchElementException("Session not found."));
if (!bookRepository.existsByStatus(BookStatus.READY)) { List<com.aiteacher.book.Book> readyBooks = bookRepository.findAll().stream()
.filter(b -> b.getStatus() == BookStatus.READY)
.toList();
if (readyBooks.isEmpty()) {
throw new NoKnowledgeSourceException("No books are available as knowledge sources."); throw new NoKnowledgeSourceException("No books are available as knowledge sources.");
} }
@@ -81,27 +81,31 @@ public class ChatService {
Message userMessage = new Message(sessionId, MessageRole.USER, userContent); Message userMessage = new Message(sessionId, MessageRole.USER, userContent);
messageRepository.save(userMessage); messageRepository.save(userMessage);
// Build conversation history for context // Build full question with conversation history
List<Message> history = messageRepository.findBySessionIdOrderByCreatedAtAsc(sessionId); List<Message> history = messageRepository.findBySessionIdOrderByCreatedAtAsc(sessionId);
// Build the prompt with full conversation history as context
String fullQuestion = buildQuestionWithHistory(history, userContent, session.getTopicId()); String fullQuestion = buildQuestionWithHistory(history, userContent, session.getTopicId());
var qaAdvisor = QuestionAnswerAdvisor.builder(vectorStore) // Retrieve context from all ready books (aggregate across books)
.searchRequest(SearchRequest.builder().similarityThreshold(0.5d).topK(6).build()) List<SectionEntity> allSections = new ArrayList<>();
.build(); List<FigureEntity> allFigures = new ArrayList<>();
for (com.aiteacher.book.Book book : readyBooks) {
RetrievalResult result = retriever.retrieve(fullQuestion, book.getId());
allSections.addAll(result.parentSections());
allFigures.addAll(result.figures());
}
ChatResponse response = chatClient.prompt() // Build LLM prompt with section full texts and figure references
.advisors(qaAdvisor) String contextPrompt = buildContextPrompt(fullQuestion, allSections, allFigures);
String assistantContent = chatClient.prompt()
.system(SYSTEM_PROMPT) .system(SYSTEM_PROMPT)
.user(fullQuestion) .user(contextPrompt)
.call() .call()
.chatResponse(); .content();
String assistantContent = response.getResult().getOutput().getText(); // Build sources list with TEXT and FIGURE entries
List<Map<String, Object>> sources = extractSources(response); List<Map<String, Object>> sources = buildSources(allSections, allFigures);
// Persist assistant message
Message assistantMessage = new Message(sessionId, MessageRole.ASSISTANT, assistantContent); Message assistantMessage = new Message(sessionId, MessageRole.ASSISTANT, assistantContent);
assistantMessage.setSources(sources); assistantMessage.setSources(sources);
return messageRepository.save(assistantMessage); return messageRepository.save(assistantMessage);
@@ -118,24 +122,95 @@ public class ChatService {
sessionRepository.deleteById(sessionId); sessionRepository.deleteById(sessionId);
} }
// -------------------------------------------------------------------------
// Private helpers
// -------------------------------------------------------------------------
private String buildContextPrompt(String question,
List<SectionEntity> sections,
List<FigureEntity> figures) {
StringBuilder sb = new StringBuilder();
if (!sections.isEmpty()) {
sb.append("CONTEXT:\n\n");
for (SectionEntity section : sections) {
sb.append("[").append(section.getTitle())
.append(", p.").append(section.getPageStart()).append("]\n");
sb.append(section.getFullText()).append("\n\n");
}
}
if (!figures.isEmpty()) {
sb.append("AVAILABLE FIGURES:\n");
for (FigureEntity figure : figures) {
sb.append("- ").append(figure.getLabel() != null ? figure.getLabel() : "Figure")
.append(" (p.").append(figure.getPage()).append("): ")
.append(figure.getCaption() != null ? figure.getCaption() : "")
.append("\n");
}
sb.append("\nWhen referencing diagrams, cite them as [Fig. X, p.N].\n\n");
}
sb.append("QUESTION:\n").append(question);
return sb.toString();
}
private List<Map<String, Object>> buildSources(List<SectionEntity> sections,
List<FigureEntity> figures) {
List<Map<String, Object>> sources = new ArrayList<>();
for (SectionEntity section : sections) {
Map<String, Object> source = new LinkedHashMap<>();
source.put("type", "TEXT");
source.put("bookTitle", deriveTitleFromSection(section));
source.put("page", section.getPageStart());
source.put("chunkText", truncate(section.getFullText(), 500));
sources.add(source);
}
for (FigureEntity figure : figures) {
Map<String, Object> source = new LinkedHashMap<>();
source.put("type", "FIGURE");
source.put("bookTitle", bookRepository.findById(figure.getBookId())
.map(com.aiteacher.book.Book::getTitle).orElse("Book"));
source.put("page", figure.getPage());
source.put("figureId", figure.getId());
source.put("label", figure.getLabel() != null ? figure.getLabel() : "");
source.put("caption", figure.getCaption() != null ? figure.getCaption() : "");
source.put("figureType", figure.getFigureType().name());
// imageUrl assembled from relative path: figures/{bookId}/{filename}
String filename = figure.getImagePath().substring(
figure.getImagePath().lastIndexOf('/') + 1);
source.put("imageUrl", "/api/v1/figures/" + figure.getBookId() + "/" + filename);
sources.add(source);
}
return sources;
}
private String deriveTitleFromSection(SectionEntity section) {
if (section == null) return "Book";
return bookRepository.findById(section.getBookId())
.map(com.aiteacher.book.Book::getTitle)
.orElse("Book");
}
private String buildQuestionWithHistory(List<Message> history, String currentQuestion,
String topicId) {
boolean hasTopic = topicId != null && !topicId.equals("free-form");
if (history.size() <= 1) {
return hasTopic
- ? String.format("[Context: This is a question about the neurosurgery topic '%s']\n%s",
? String.format("[Context: question about neurosurgery topic '%s']\n%s",
topicId, currentQuestion)
: currentQuestion;
}
StringBuilder sb = new StringBuilder();
if (hasTopic) {
- sb.append(String.format("[Context: This conversation is about the neurosurgery topic '%s']\n\n",
- topicId));
sb.append(String.format("[Context: conversation about '%s']\n\n", topicId));
}
sb.append("Previous conversation:\n");
- // Include all messages except the last (which is the current user message just saved)
for (int i = 0; i < history.size() - 1; i++) {
Message msg = history.get(i);
sb.append(msg.getRole().name()).append(": ").append(msg.getContent()).append("\n");
@@ -144,30 +219,8 @@ public class ChatService {
return sb.toString();
}
- private List<Map<String, Object>> extractSources(ChatResponse response) {
- List<Map<String, Object>> sources = new ArrayList<>();
- if (response.getMetadata() != null) {
- Object retrieved = response.getMetadata().get(QuestionAnswerAdvisor.RETRIEVED_DOCUMENTS);
- if (retrieved instanceof List<?> docs) {
- for (Object docObj : docs) {
- if (docObj instanceof Document doc) {
- Map<String, Object> metadata = doc.getMetadata();
- String bookTitle = (String) metadata.get("book_title");
- Object pageObj = metadata.get("page_number");
- Integer page = pageObj instanceof Number n ? n.intValue() : null;
- if (bookTitle != null) {
- Map<String, Object> source = new HashMap<>();
- source.put("bookTitle", bookTitle);
- source.put("page", page);
- source.put("chunkText", doc.getText());
- sources.add(source);
- }
- }
- }
- }
- }
- return sources;
private String truncate(String text, int maxChars) {
if (text == null) return "";
return text.length() <= maxChars ? text : text.substring(0, maxChars) + "";
}
}
@@ -0,0 +1,25 @@
package com.aiteacher.config;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.servlet.config.annotation.ResourceHandlerRegistry;
import org.springframework.web.servlet.config.annotation.WebMvcConfigurer;
import java.nio.file.Paths;
@Configuration
public class FigureStorageConfig implements WebMvcConfigurer {
private final String basePath;
public FigureStorageConfig(@Value("${app.figure-storage.base-path:./uploads}") String basePath) {
this.basePath = Paths.get(basePath).toAbsolutePath().normalize().toString();
}
@Override
public void addResourceHandlers(ResourceHandlerRegistry registry) {
// Serve GET /api/v1/figures/** from the local file store
registry.addResourceHandler("/api/v1/figures/**")
.addResourceLocations("file:" + basePath + "/figures/");
}
}
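The resource handler above and the `imageUrl` built in `ChatService` rely on the same layout: the database stores `figures/{bookId}/{file}`, the API serves it under `/api/v1/figures/{bookId}/{file}`, and the file lives at `{basePath}/figures/{bookId}/{file}`. The sketch below only illustrates that mapping; `FigurePathSketch`, `publicUrl` and `fileFor` are hypothetical names, and the traversal check is an assumption about what a hand-rolled resolver would want, not a description of what Spring does internally.

```java
import java.nio.file.Path;

public class FigurePathSketch {

    private static final String URL_PREFIX = "/api/v1/figures/";

    /** Public URL for a stored relative path such as figures/{bookId}/{file}.png. */
    static String publicUrl(String relativePath) {
        // The stored path already starts with "figures/", so only the API prefix is added.
        return "/api/v1/" + relativePath;
    }

    /** File that a request path under /api/v1/figures/ would be read from. */
    static Path fileFor(Path basePath, String requestPath) {
        if (!requestPath.startsWith(URL_PREFIX)) {
            throw new IllegalArgumentException("Not a figure URL: " + requestPath);
        }
        Path root = basePath.resolve("figures").normalize();
        Path file = root.resolve(requestPath.substring(URL_PREFIX.length())).normalize();
        // Reject paths that escape the figure directory (for example via "..").
        if (!file.startsWith(root)) {
            throw new IllegalArgumentException("Path escapes figure store: " + requestPath);
        }
        return file;
    }
}
```

Keeping the stored path relative means the base directory can move (via `FIGURE_STORAGE_PATH`) without rewriting database rows.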
@@ -0,0 +1,47 @@
package com.aiteacher.document;
import jakarta.persistence.*;
import java.time.Instant;
import java.util.UUID;
@Entity
@Table(name = "chapter")
public class ChapterEntity {
@Id
@Column(name = "id", length = 200)
private String id;
@Column(name = "book_id", nullable = false)
private UUID bookId;
@Column(name = "number", nullable = false)
private int number;
@Column(name = "title", length = 500)
private String title;
@Column(name = "page_start")
private Integer pageStart;
@Column(name = "created_at", nullable = false)
private Instant createdAt;
public ChapterEntity() {}
public ChapterEntity(String id, UUID bookId, int number, String title, Integer pageStart) {
this.id = id;
this.bookId = bookId;
this.number = number;
this.title = title;
this.pageStart = pageStart;
this.createdAt = Instant.now();
}
public String getId() { return id; }
public UUID getBookId() { return bookId; }
public int getNumber() { return number; }
public String getTitle() { return title; }
public Integer getPageStart() { return pageStart; }
public Instant getCreatedAt() { return createdAt; }
}
@@ -0,0 +1,9 @@
package com.aiteacher.document;
import org.springframework.data.jpa.repository.JpaRepository;
import java.util.UUID;
public interface ChapterRepository extends JpaRepository<ChapterEntity, String> {
void deleteAllByBookId(UUID bookId);
}
@@ -0,0 +1,58 @@
package com.aiteacher.document;
import jakarta.persistence.*;
import java.io.Serializable;
import java.util.Objects;
import java.util.UUID;
@Entity
@Table(name = "chunk_figure_ref")
@IdClass(ChunkFigureRefEntity.PK.class)
public class ChunkFigureRefEntity {
@Id
@Column(name = "chunk_id", nullable = false)
private UUID chunkId;
@Id
@Column(name = "figure_id", nullable = false, length = 200)
private String figureId;
@Column(name = "mention_page")
private Integer mentionPage;
public ChunkFigureRefEntity() {}
public ChunkFigureRefEntity(UUID chunkId, String figureId, Integer mentionPage) {
this.chunkId = chunkId;
this.figureId = figureId;
this.mentionPage = mentionPage;
}
public UUID getChunkId() { return chunkId; }
public String getFigureId() { return figureId; }
public Integer getMentionPage() { return mentionPage; }
public static class PK implements Serializable {
private UUID chunkId;
private String figureId;
public PK() {}
public PK(UUID chunkId, String figureId) {
this.chunkId = chunkId;
this.figureId = figureId;
}
@Override
public boolean equals(Object o) {
if (this == o) return true;
if (!(o instanceof PK pk)) return false;
return Objects.equals(chunkId, pk.chunkId) && Objects.equals(figureId, pk.figureId);
}
@Override
public int hashCode() {
return Objects.hash(chunkId, figureId);
}
}
}
@@ -0,0 +1,18 @@
package com.aiteacher.document;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Modifying;
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.repository.query.Param;
import java.util.List;
import java.util.UUID;
public interface ChunkFigureRefRepository extends JpaRepository<ChunkFigureRefEntity, ChunkFigureRefEntity.PK> {
@Query("SELECT r FROM ChunkFigureRefEntity r WHERE r.chunkId IN :chunkIds")
List<ChunkFigureRefEntity> findByChunkIdIn(@Param("chunkIds") List<UUID> chunkIds);
@Modifying
@Query("DELETE FROM ChunkFigureRefEntity r WHERE r.figureId IN :figureIds")
void deleteByFigureIdIn(@Param("figureIds") List<String> figureIds);
}
@@ -0,0 +1,62 @@
package com.aiteacher.document;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.ai.document.Document;
import org.springframework.stereotype.Service;
import java.util.List;
import java.util.UUID;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
/**
* Scans chunk text for "Fig. X" and "Figure X" references and persists
* ChunkFigureRefEntity rows linking that chunk to its referenced figures.
*/
@Service
public class ChunkFigureRefService {
private static final Logger log = LoggerFactory.getLogger(ChunkFigureRefService.class);
// Matches: "Fig. 12-4", "Fig. 12.4", "Fig 12", "Figure 12-4", etc.
private static final Pattern REF_PATTERN =
Pattern.compile("(?i)\\b(Fig\\.?|Figure)\\s+(\\d+[\\-.\\d]*)");
private final ChunkFigureRefRepository refRepository;
public ChunkFigureRefService(ChunkFigureRefRepository refRepository) {
this.refRepository = refRepository;
}
/**
* For each text chunk, finds figure references and persists ChunkFigureRefEntity rows.
*/
public void linkChunksToFigures(List<Document> chunks, List<FigureEntity> bookFigures,
int pageNum) {
if (bookFigures.isEmpty()) return;
for (Document chunk : chunks) {
String chunkIdStr = chunk.getId();
UUID chunkId;
try {
chunkId = UUID.fromString(chunkIdStr);
} catch (IllegalArgumentException ex) {
log.warn("Chunk has non-UUID id: {}", chunkIdStr);
continue;
}
Matcher m = REF_PATTERN.matcher(chunk.getText());
while (m.find()) {
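// Drop punctuation captured at the end of a sentence ("Fig. 12-4.")
String refNum = m.group(2).replaceAll("[-.]+$", "");
// Compare the whole label so a reference to "2" does not match "Fig. 12"
for (FigureEntity figure : bookFigures) {
String label = figure.getLabel();
if (label != null && label.replaceAll("[-.]+$", "").equals("Fig. " + refNum)) {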
refRepository.save(new ChunkFigureRefEntity(chunkId, figure.getId(), pageNum));
break;
}
}
}
}
}
}
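The reference scan described in the class comment can be tried in isolation. The sketch below is a minimal illustration under two assumptions: labels are stored in the form `Fig. <number>` (as `FigureExtractionService.detectLabel` produces them), and the figure number is digits optionally followed by `-digits` or `.digits` groups. `FigureRefSketch`, `referencedNumbers` and `matches` are hypothetical names, not part of the commit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FigureRefSketch {

    // The number is digits with optional "-digits" / ".digits" groups,
    // so a sentence-ending period is never part of the captured value.
    private static final Pattern REF =
            Pattern.compile("(?i)\\b(?:Fig\\.?|Figure)\\s+(\\d+(?:[-.]\\d+)*)");

    /** Returns the figure numbers mentioned in the text, in order of appearance. */
    static List<String> referencedNumbers(String text) {
        List<String> numbers = new ArrayList<>();
        Matcher m = REF.matcher(text);
        while (m.find()) {
            numbers.add(m.group(1));
        }
        return numbers;
    }

    /** Exact comparison on the label, so a reference to "2" does not hit "Fig. 12". */
    static boolean matches(String label, String refNum) {
        return label != null && label.equals("Fig. " + refNum);
    }
}
```

A suffix comparison such as `endsWith` is shorter, but it treats `Fig. 12` as a hit for a reference to figure 2, which is why the sketch compares the full label.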
@@ -0,0 +1,82 @@
package com.aiteacher.document;
import jakarta.persistence.*;
import java.time.Instant;
import java.util.UUID;
@Entity
@Table(name = "figure")
public class FigureEntity {
@Id
@Column(name = "id", length = 200)
private String id;
@Column(name = "book_id", nullable = false)
private UUID bookId;
@Column(name = "section_id", length = 200)
private String sectionId;
@Column(name = "chapter_id", length = 200)
private String chapterId;
@Column(name = "label", length = 100)
private String label;
@Column(name = "caption", columnDefinition = "TEXT")
private String caption;
@Enumerated(EnumType.STRING)
@Column(name = "figure_type", nullable = false, length = 50)
private FigureType figureType;
@Column(name = "page", nullable = false)
private int page;
@Column(name = "image_path", nullable = false, length = 1000)
private String imagePath;
@Column(name = "caption_embedding_id")
private UUID captionEmbeddingId;
@Column(name = "created_at", nullable = false)
private Instant createdAt;
public FigureEntity() {}
public FigureEntity(String id, UUID bookId, String sectionId, String chapterId,
String label, String caption, FigureType figureType,
int page, String imagePath) {
this.id = id;
this.bookId = bookId;
this.sectionId = sectionId;
this.chapterId = chapterId;
this.label = label;
this.caption = caption;
this.figureType = figureType;
this.page = page;
this.imagePath = imagePath;
this.createdAt = Instant.now();
}
public String getId() { return id; }
public UUID getBookId() { return bookId; }
public String getSectionId() { return sectionId; }
public String getChapterId() { return chapterId; }
public String getLabel() { return label; }
public String getCaption() { return caption; }
public FigureType getFigureType() { return figureType; }
public int getPage() { return page; }
public String getImagePath() { return imagePath; }
public UUID getCaptionEmbeddingId() { return captionEmbeddingId; }
public Instant getCreatedAt() { return createdAt; }
public void setCaptionEmbeddingId(UUID captionEmbeddingId) {
this.captionEmbeddingId = captionEmbeddingId;
}
public void setCaption(String caption) {
this.caption = caption;
}
}
@@ -0,0 +1,135 @@
package com.aiteacher.document;
import com.aiteacher.figure.FigureStorageService;
import org.apache.pdfbox.Loader;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.graphics.PDXObject;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
/**
* Extracts images from each PDF page using PDFBox.
* Images below the configured minimum size are skipped.
* Caption is detected by the "Fig." pattern in page text.
*/
@Service
public class FigureExtractionService {
private static final Logger log = LoggerFactory.getLogger(FigureExtractionService.class);
// Caption: line starting with "Fig", "Fig." or "Figure" followed by a number
private static final Pattern CAPTION_PATTERN =
Pattern.compile("(?m)^((?:Fig\\.?|Figure)\\s*\\d+[\\-.]?\\d*[^\\n]*)", Pattern.CASE_INSENSITIVE);
// Figure label: "Fig. 12-4", "Fig. 12.4" or "Figure 12-4" (trailing punctuation is not captured)
private static final Pattern LABEL_PATTERN =
Pattern.compile("(?i)(?:Fig\\.?|Figure)\\s*(\\d+(?:[-.]\\d+)*)");
private final FigureStorageService storageService;
private final FigureRepository figureRepository;
private final int minImageSizePx;
public FigureExtractionService(
FigureStorageService storageService,
FigureRepository figureRepository,
@Value("${app.figure-storage.min-image-size-px:100}") int minImageSizePx) {
this.storageService = storageService;
this.figureRepository = figureRepository;
this.minImageSizePx = minImageSizePx;
}
/**
* Extracts all qualifying images from the PDF for the given book.
* Returns the persisted FigureEntity list; vision descriptions are set in a later step.
*/
public List<FigureEntity> extract(UUID bookId, String chapterId,
List<SectionEntity> sections, Path pdfPath) {
List<FigureEntity> figures = new ArrayList<>();
int figureCounter = 0;
try (PDDocument doc = Loader.loadPDF(pdfPath.toFile())) {
for (SectionEntity section : sections) {
int pageIndex = section.getPageStart() - 1; // 0-based
if (pageIndex < 0 || pageIndex >= doc.getNumberOfPages()) continue;
PDPage page = doc.getPage(pageIndex);
String pageText = section.getFullText();
try {
for (COSName name : page.getResources().getXObjectNames()) {
PDXObject xObject = page.getResources().getXObject(name);
if (!(xObject instanceof PDImageXObject image)) continue;
BufferedImage bufferedImage = image.getImage();
if (bufferedImage.getWidth() < minImageSizePx
|| bufferedImage.getHeight() < minImageSizePx) {
continue; // skip decorative images
}
figureCounter++;
String figureId = bookId + "-fig-" + pageIndex + "-" + figureCounter;
String caption = detectCaption(pageText);
String label = detectLabel(caption, figureCounter);
FigureType type = classifyType(caption, pageText);
String imagePath = storageService.save(bookId, figureId, bufferedImage);
FigureEntity figure = new FigureEntity(
figureId, bookId, section.getId(), chapterId,
label, caption, type, section.getPageStart(), imagePath
);
figures.add(figureRepository.save(figure));
}
} catch (IOException ex) {
log.warn("Failed to extract images from page {} of book {}: {}",
section.getPageStart(), bookId, ex.getMessage());
}
}
} catch (IOException ex) {
log.error("Could not open PDF for image extraction, book {}", bookId, ex);
}
log.info("Extracted {} figures for book {}", figures.size(), bookId);
return figures;
}
private String detectCaption(String pageText) {
if (pageText == null) return null;
Matcher m = CAPTION_PATTERN.matcher(pageText);
return m.find() ? m.group(1).trim() : null;
}
private String detectLabel(String caption, int counter) {
if (caption != null) {
Matcher m = LABEL_PATTERN.matcher(caption);
if (m.find()) return "Fig. " + m.group(1).trim();
}
return "Fig. " + counter;
}
private FigureType classifyType(String caption, String pageText) {
String combined = ((caption != null ? caption : "") + " " + (pageText != null ? pageText : "")).toLowerCase();
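// Match "ct" as a whole word; a plain "ct " substring also hits "direct ", "aspect ", etc.
if (combined.contains("mri") || combined.matches("(?s).*\\bct\\b.*") || combined.contains("magnetic")
|| combined.contains("tomography")) return FigureType.MRI_CT_SCAN;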
if (combined.contains("intraoperative") || combined.contains("intra-op")) return FigureType.INTRAOPERATIVE_IMAGE;
if (caption != null && caption.toLowerCase().startsWith("table")) return FigureType.TABLE;
if (combined.contains("chart") || combined.contains("histogram") || combined.contains("graph"))
return FigureType.CHART;
if (combined.contains("photograph") || combined.contains("photo")) return FigureType.SURGICAL_PHOTOGRAPH;
return FigureType.ANATOMICAL_DIAGRAM;
}
}
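Caption and label detection are plain regex steps over the page text, so they can be checked without a PDF. The following is a sketch only: `CaptionSketch` and its methods are hypothetical, and the patterns are one possible reading of the intent stated in the comments (a caption is a line starting with "Fig." or "Figure" plus a number; the label is the normalized `Fig. <number>`).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaptionSketch {

    // (?m) lets ^ match at every line start; (?i) makes "fig"/"FIGURE" match too.
    private static final Pattern CAPTION =
            Pattern.compile("(?mi)^((?:Fig\\.?|Figure)\\s*\\d+(?:[-.]\\d+)*[^\\n]*)");

    private static final Pattern LABEL =
            Pattern.compile("(?i)(?:Fig\\.?|Figure)\\s*(\\d+(?:[-.]\\d+)*)");

    /** First caption-looking line on the page, or null when there is none. */
    static String caption(String pageText) {
        if (pageText == null) return null;
        Matcher m = CAPTION.matcher(pageText);
        return m.find() ? m.group(1).trim() : null;
    }

    /** Normalized label taken from the caption, falling back to a running counter. */
    static String label(String caption, int fallbackCounter) {
        if (caption != null) {
            Matcher m = LABEL.matcher(caption);
            if (m.find()) return "Fig. " + m.group(1);
        }
        return "Fig. " + fallbackCounter;
    }
}
```

Note that this approach returns only the first caption on a page, so every image extracted from that page would share it; pairing captions with individual images would need position information that this sketch does not use.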
@@ -0,0 +1,11 @@
package com.aiteacher.document;
import org.springframework.data.jpa.repository.JpaRepository;
import java.util.List;
import java.util.UUID;
public interface FigureRepository extends JpaRepository<FigureEntity, String> {
List<FigureEntity> findAllByBookId(UUID bookId);
void deleteAllByBookId(UUID bookId);
}
@@ -0,0 +1,10 @@
package com.aiteacher.document;
public enum FigureType {
ANATOMICAL_DIAGRAM,
SURGICAL_PHOTOGRAPH,
MRI_CT_SCAN,
TABLE,
CHART,
INTRAOPERATIVE_IMAGE
}
@@ -0,0 +1,71 @@
package com.aiteacher.document;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.ai.reader.pdf.PagePdfDocumentReader;
import org.springframework.ai.reader.pdf.config.PdfDocumentReaderConfig;
import org.springframework.core.io.FileSystemResource;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
/**
* Parses a PDF into page-level SectionEntity records stored in Postgres.
* Each page becomes one section, grouped under a single chapter per book.
*/
@Service
public class PdfStructureParser {
private static final Logger log = LoggerFactory.getLogger(PdfStructureParser.class);
private final ChapterRepository chapterRepository;
private final SectionRepository sectionRepository;
public PdfStructureParser(ChapterRepository chapterRepository,
SectionRepository sectionRepository) {
this.chapterRepository = chapterRepository;
this.sectionRepository = sectionRepository;
}
@Transactional
public List<SectionEntity> parse(UUID bookId, String bookTitle, Path pdfPath) {
log.info("Parsing PDF structure for book {}", bookId);
// One chapter per book
String chapterId = bookId + "-ch1";
ChapterEntity chapter = new ChapterEntity(chapterId, bookId, 1, bookTitle, 1);
chapterRepository.save(chapter);
// One section per page
PagePdfDocumentReader reader = new PagePdfDocumentReader(
new FileSystemResource(pdfPath.toFile()),
PdfDocumentReaderConfig.builder().withPagesPerDocument(1).build()
);
List<org.springframework.ai.document.Document> pages = reader.get();
List<SectionEntity> sections = new ArrayList<>();
for (int i = 0; i < pages.size(); i++) {
int pageNum = i + 1;
String text = pages.get(i).getText();
if (text == null || text.isBlank()) continue;
String sectionId = bookId + "-p" + pageNum;
SectionEntity section = new SectionEntity(
sectionId, chapterId, bookId,
String.valueOf(pageNum),
"Page " + pageNum,
pageNum, pageNum,
text
);
sections.add(sectionRepository.save(section));
}
log.info("Parsed {} sections for book {}", sections.size(), bookId);
return sections;
}
}
@@ -0,0 +1,63 @@
package com.aiteacher.document;
import jakarta.persistence.*;
import java.time.Instant;
import java.util.UUID;
@Entity
@Table(name = "section")
public class SectionEntity {
@Id
@Column(name = "id", length = 200)
private String id;
@Column(name = "chapter_id", nullable = false, length = 200)
private String chapterId;
@Column(name = "book_id", nullable = false)
private UUID bookId;
@Column(name = "number", length = 50)
private String number;
@Column(name = "title", length = 500)
private String title;
@Column(name = "page_start", nullable = false)
private int pageStart;
@Column(name = "page_end", nullable = false)
private int pageEnd;
@Column(name = "full_text", nullable = false, columnDefinition = "TEXT")
private String fullText;
@Column(name = "created_at", nullable = false)
private Instant createdAt;
public SectionEntity() {}
public SectionEntity(String id, String chapterId, UUID bookId, String number,
String title, int pageStart, int pageEnd, String fullText) {
this.id = id;
this.chapterId = chapterId;
this.bookId = bookId;
this.number = number;
this.title = title;
this.pageStart = pageStart;
this.pageEnd = pageEnd;
this.fullText = fullText;
this.createdAt = Instant.now();
}
public String getId() { return id; }
public String getChapterId() { return chapterId; }
public UUID getBookId() { return bookId; }
public String getNumber() { return number; }
public String getTitle() { return title; }
public int getPageStart() { return pageStart; }
public int getPageEnd() { return pageEnd; }
public String getFullText() { return fullText; }
public Instant getCreatedAt() { return createdAt; }
}
@@ -0,0 +1,11 @@
package com.aiteacher.document;
import org.springframework.data.jpa.repository.JpaRepository;
import java.util.List;
import java.util.UUID;
public interface SectionRepository extends JpaRepository<SectionEntity, String> {
List<SectionEntity> findAllByBookId(UUID bookId);
void deleteAllByBookId(UUID bookId);
}
@@ -0,0 +1,65 @@
package com.aiteacher.document;
import org.springframework.ai.document.Document;
import org.springframework.stereotype.Service;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;
/**
* Splits a SectionEntity's full text into overlapping chunks for vector embedding.
* Target size: ~1800 characters (~450 tokens); overlap: 200 characters.
*/
@Service
public class TextChunkingService {
private static final int TARGET_CHARS = 1800;
private static final int OVERLAP_CHARS = 200;
public List<Document> chunk(SectionEntity section, String bookTitle) {
String text = section.getFullText();
if (text == null || text.isBlank()) return List.of();
List<String> windows = split(text);
List<Document> documents = new ArrayList<>();
for (int i = 0; i < windows.size(); i++) {
String chunkId = UUID.randomUUID().toString();
Map<String, Object> metadata = buildMetadata(section, bookTitle, i, windows.size(), chunkId);
documents.add(new Document(chunkId, windows.get(i), metadata));
}
return documents;
}
private List<String> split(String text) {
List<String> windows = new ArrayList<>();
int start = 0;
while (start < text.length()) {
int end = Math.min(start + TARGET_CHARS, text.length());
windows.add(text.substring(start, end));
if (end == text.length()) break;
start = end - OVERLAP_CHARS;
}
return windows;
}
private Map<String, Object> buildMetadata(SectionEntity section, String bookTitle,
int index, int total, String chunkId) {
Map<String, Object> m = new HashMap<>();
m.put("type", "TEXT");
m.put("book_id", section.getBookId().toString());
m.put("book_title", bookTitle);
m.put("chapter_id", section.getChapterId());
m.put("section_id", section.getId());
m.put("section_title", section.getTitle() != null ? section.getTitle() : "");
m.put("page_start", section.getPageStart());
m.put("page_end", section.getPageEnd());
m.put("chunk_index", index);
m.put("total_chunks", total);
m.put("chunk_id", chunkId);
return m;
}
}
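The overlapping-window split is easier to reason about with small numbers. This sketch reproduces the same loop with the target size and overlap as parameters; `ChunkWindowSketch` is a hypothetical name, and the argument check is an addition (the loop would never advance if the overlap were as large as the window).

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkWindowSketch {

    /** Splits text into windows of targetChars that each repeat the last overlapChars of the previous one. */
    static List<String> split(String text, int targetChars, int overlapChars) {
        if (targetChars <= 0 || overlapChars < 0 || overlapChars >= targetChars) {
            throw new IllegalArgumentException("Require 0 <= overlapChars < targetChars");
        }
        List<String> windows = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            int end = Math.min(start + targetChars, text.length());
            windows.add(text.substring(start, end));
            if (end == text.length()) break;   // last window reached
            start = end - overlapChars;        // step back to create the overlap
        }
        return windows;
    }
}
```

With the service's constants (1800 and 200), a 4000-character page produces three windows covering characters 0 to 1800, 1600 to 3400, and 3200 to 4000.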
@@ -0,0 +1,49 @@
package com.aiteacher.document;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.core.io.FileSystemResource;
import org.springframework.stereotype.Service;
import org.springframework.util.MimeTypeUtils;
import java.nio.file.Path;
/**
* Generates a clinical text description for an extracted figure image
* using the OpenAI vision model via Spring AI ChatClient.
*/
@Service
public class VisionDescriptionService {
private static final Logger log = LoggerFactory.getLogger(VisionDescriptionService.class);
private static final String PROMPT =
"You are a neurosurgery educator. Provide a brief 2-3 sentence clinical description of " +
"this image. Focus on anatomical structures, surgical landmarks, labels, and clinical " +
"significance. If text or labels are visible, include them verbatim.";
private final ChatClient chatClient;
public VisionDescriptionService(ChatClient chatClient) {
this.chatClient = chatClient;
}
/**
* Returns a description string. Falls back to the provided caption if vision fails.
*/
public String describe(Path imagePath, String captionFallback) {
try {
return chatClient.prompt()
.user(u -> u
.text(PROMPT)
.media(MimeTypeUtils.IMAGE_PNG, new FileSystemResource(imagePath.toFile())))
.call()
.content();
} catch (Exception ex) {
log.warn("Vision description failed for {}: {} — using caption as fallback",
imagePath.getFileName(), ex.getMessage());
return captionFallback != null ? captionFallback : "Figure";
}
}
}
@@ -0,0 +1,24 @@
package com.aiteacher.figure;
import java.awt.image.BufferedImage;
import java.nio.file.Path;
import java.util.UUID;
public interface FigureStorageService {
/**
* Saves an extracted image to the figure store and returns the relative path
* (relative to the configured base-path) stored in the database.
*/
String save(UUID bookId, String figureId, BufferedImage image);
/**
* Resolves a stored relative path to an absolute filesystem path.
*/
Path resolve(String relativePath);
/**
* Deletes all figure files for the given book.
*/
void deleteAll(UUID bookId);
}
@@ -0,0 +1,59 @@
package com.aiteacher.figure;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.UUID;
@Service
public class LocalFigureStorageService implements FigureStorageService {
private static final Logger log = LoggerFactory.getLogger(LocalFigureStorageService.class);
private final Path basePath;
public LocalFigureStorageService(@Value("${app.figure-storage.base-path:./uploads}") String basePath) {
this.basePath = Paths.get(basePath).toAbsolutePath().normalize();
}
@Override
public String save(UUID bookId, String figureId, BufferedImage image) {
try {
Path dir = basePath.resolve("figures").resolve(bookId.toString());
Files.createDirectories(dir);
String filename = figureId + ".png";
Path file = dir.resolve(filename);
ImageIO.write(image, "PNG", file.toFile());
// Return relative path for storage in DB
return "figures/" + bookId + "/" + filename;
} catch (IOException ex) {
throw new RuntimeException("Failed to save figure " + figureId, ex);
}
}
@Override
public Path resolve(String relativePath) {
return basePath.resolve(relativePath);
}
@Override
public void deleteAll(UUID bookId) {
Path dir = basePath.resolve("figures").resolve(bookId.toString());
if (!Files.exists(dir)) return;
try (var walk = Files.walk(dir)) {
walk.sorted(java.util.Comparator.reverseOrder())
.map(Path::toFile)
.forEach(java.io.File::delete);
} catch (IOException ex) {
log.warn("Could not fully delete figures for book {}: {}", bookId, ex.getMessage());
}
}
}
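`deleteAll` removes a book's directory by walking it and deleting entries in reverse path order. The sketch below shows the same technique against a temporary directory, with one difference that is an assumption of this example rather than the commit's behavior: it uses `Files.deleteIfExists` and counts what was removed instead of ignoring the result of `File.delete`. `RecursiveDeleteSketch` is a hypothetical name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class RecursiveDeleteSketch {

    /** Deletes dir and everything below it; returns the number of entries removed. */
    static int deleteTree(Path dir) {
        if (!Files.exists(dir)) return 0;
        List<Path> entries;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order lists children before their parent directories.
            entries = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        int deleted = 0;
        for (Path entry : entries) {
            try {
                if (Files.deleteIfExists(entry)) deleted++;
            } catch (IOException ex) {
                throw new UncheckedIOException(ex);
            }
        }
        return deleted;
    }

    /** Builds a small temporary tree, deletes it, and reports whether it is gone. */
    static boolean demo() {
        try {
            Path root = Files.createTempDirectory("figures-demo");
            Path bookDir = Files.createDirectories(root.resolve("book-1"));
            Files.writeString(bookDir.resolve("a.png"), "x");
            Files.writeString(bookDir.resolve("b.png"), "y");
            // Four entries: root, book-1, a.png, b.png
            return deleteTree(root) == 4 && !Files.exists(root);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }
}
```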
@@ -0,0 +1,111 @@
package com.aiteacher.retrieval;
import com.aiteacher.document.*;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.ai.vectorstore.filter.FilterExpressionBuilder;
import org.springframework.stereotype.Service;
import java.util.*;
/**
* Dual-modality retriever: searches text chunks and figure captions independently,
* then expands text hits to their parent sections and merges linked figures.
*/
@Service
public class NeurosurgeryRetriever {
private static final Logger log = LoggerFactory.getLogger(NeurosurgeryRetriever.class);
private static final int TEXT_TOP_K = 5;
private static final int FIGURE_TOP_K = 3;
private final VectorStore vectorStore;
private final SectionRepository sectionRepository;
private final FigureRepository figureRepository;
private final ChunkFigureRefRepository chunkFigureRefRepository;
public NeurosurgeryRetriever(VectorStore vectorStore,
SectionRepository sectionRepository,
FigureRepository figureRepository,
ChunkFigureRefRepository chunkFigureRefRepository) {
this.vectorStore = vectorStore;
this.sectionRepository = sectionRepository;
this.figureRepository = figureRepository;
this.chunkFigureRefRepository = chunkFigureRefRepository;
}
public RetrievalResult retrieve(String query, UUID bookId) {
FilterExpressionBuilder b = new FilterExpressionBuilder();
// 1. Text chunk search
List<Document> textHits = vectorStore.similaritySearch(
SearchRequest.builder()
.query(query)
.topK(TEXT_TOP_K)
.filterExpression(b.and(
b.eq("type", "TEXT"),
b.eq("book_id", bookId.toString())
).build())
.build()
);
// 2. Figure caption search (independent topK)
List<Document> figureHits = vectorStore.similaritySearch(
SearchRequest.builder()
.query(query)
.topK(FIGURE_TOP_K)
.filterExpression(b.and(
b.eq("type", "FIGURE"),
b.eq("book_id", bookId.toString())
).build())
.build()
);
// 3. Expand text chunks to parent sections from Postgres
List<String> sectionIds = textHits.stream()
.map(d -> (String) d.getMetadata().get("section_id"))
.filter(Objects::nonNull)
.distinct()
.toList();
List<SectionEntity> sections = sectionIds.isEmpty()
? List.of()
: sectionRepository.findAllById(sectionIds);
// 4. Fetch figures explicitly linked to retrieved chunks
List<UUID> chunkIds = textHits.stream()
.map(d -> {
try { return UUID.fromString(d.getId()); }
catch (Exception e) { return null; }
})
.filter(Objects::nonNull)
.toList();
List<String> linkedFigureIds = chunkIds.isEmpty()
? List.of()
: chunkFigureRefRepository.findByChunkIdIn(chunkIds)
.stream().map(ChunkFigureRefEntity::getFigureId).distinct().toList();
List<FigureEntity> linkedFigures = linkedFigureIds.isEmpty()
? List.of()
: figureRepository.findAllById(linkedFigureIds);
// 5. Collect figures from semantic figure search
List<String> semanticFigureIds = figureHits.stream()
.map(d -> (String) d.getMetadata().get("figure_id"))
.filter(Objects::nonNull)
.toList();
List<FigureEntity> semanticFigures = semanticFigureIds.isEmpty()
? List.of()
: figureRepository.findAllById(semanticFigureIds);
// 6. Merge and deduplicate figures by figureId (linked figures take precedence)
Map<String, FigureEntity> merged = new LinkedHashMap<>();
linkedFigures.forEach(f -> merged.put(f.getId(), f));
semanticFigures.forEach(f -> merged.putIfAbsent(f.getId(), f));
log.debug("Retrieved {} sections, {} figures for query", sections.size(), merged.size());
return new RetrievalResult(sections, new ArrayList<>(merged.values()));
}
}
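Step 6 of `retrieve` merges two figure lists with a precedence rule. A minimal sketch of that rule, using a hypothetical `Fig` record as a stand-in for `FigureEntity` (the `origin` field exists only to make the precedence visible):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FigureMergeSketch {

    /** Hypothetical stand-in for FigureEntity: an id plus where the hit came from. */
    record Fig(String id, String origin) {}

    /** Linked figures come first and win on duplicate ids; insertion order is preserved. */
    static List<Fig> merge(List<Fig> linked, List<Fig> semantic) {
        Map<String, Fig> merged = new LinkedHashMap<>();
        linked.forEach(f -> merged.put(f.id(), f));
        semantic.forEach(f -> merged.putIfAbsent(f.id(), f));
        return new ArrayList<>(merged.values());
    }
}
```

`LinkedHashMap` keeps insertion order, so explicitly referenced figures are listed before figures found only by caption similarity. The order within each group still depends on what the repository returns, since `findAllById` is not guaranteed to follow the order of the ids passed in.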
@@ -0,0 +1,11 @@
package com.aiteacher.retrieval;
import com.aiteacher.document.FigureEntity;
import com.aiteacher.document.SectionEntity;
import java.util.List;
public record RetrievalResult(
List<SectionEntity> parentSections,
List<FigureEntity> figures
) {}
@@ -47,6 +47,16 @@ spring:
max-size: 8
queue-capacity: 50
logging:
level:
"[org.apache.pdfbox]": ERROR
app:
auth:
password: ${APP_PASSWORD:changeme}
figure-storage:
base-path: ${FIGURE_STORAGE_PATH:./uploads}
min-image-size-px: 100
embedding:
batch-size: 20
batch-delay-ms: 2000
@@ -0,0 +1,28 @@
-- ============================================================
-- V4: Document hierarchy — chapter and section tables
-- Supports parent-child retrieval pattern for RAG precision.
-- ============================================================
CREATE TABLE IF NOT EXISTS chapter (
id VARCHAR(200) PRIMARY KEY,
book_id UUID NOT NULL REFERENCES book(id) ON DELETE CASCADE,
number INT NOT NULL DEFAULT 1,
title VARCHAR(500),
page_start INT,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE TABLE IF NOT EXISTS section (
id VARCHAR(200) PRIMARY KEY,
chapter_id VARCHAR(200) NOT NULL REFERENCES chapter(id) ON DELETE CASCADE,
book_id UUID NOT NULL REFERENCES book(id) ON DELETE CASCADE,
number VARCHAR(50),
title VARCHAR(500),
page_start INT NOT NULL,
page_end INT NOT NULL,
full_text TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX IF NOT EXISTS idx_section_book ON section(book_id);
CREATE INDEX IF NOT EXISTS idx_section_chapter ON section(chapter_id);
@@ -0,0 +1,29 @@
-- ============================================================
-- V5: Figures and chunk-to-figure reference table
-- figure: metadata + file path for each extracted image
-- chunk_figure_ref: links vector-store chunks to figures
-- ============================================================
CREATE TABLE IF NOT EXISTS figure (
id VARCHAR(200) PRIMARY KEY,
book_id UUID NOT NULL REFERENCES book(id) ON DELETE CASCADE,
section_id VARCHAR(200) REFERENCES section(id) ON DELETE SET NULL,
chapter_id VARCHAR(200) REFERENCES chapter(id) ON DELETE SET NULL,
label VARCHAR(100),
caption TEXT,
figure_type VARCHAR(50) NOT NULL,
page INT NOT NULL,
image_path VARCHAR(1000) NOT NULL,
caption_embedding_id UUID,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE TABLE IF NOT EXISTS chunk_figure_ref (
chunk_id UUID NOT NULL,
figure_id VARCHAR(200) NOT NULL REFERENCES figure(id) ON DELETE CASCADE,
mention_page INT,
PRIMARY KEY (chunk_id, figure_id)
);
CREATE INDEX IF NOT EXISTS idx_figure_book ON figure(book_id);
CREATE INDEX IF NOT EXISTS idx_cfr_chunk ON chunk_figure_ref(chunk_id);
+125 -21
@@ -5,22 +5,47 @@
<div v-if="isUser" class="message-content">{{ message.content }}</div> <div v-if="isUser" class="message-content">{{ message.content }}</div>
<div v-else class="message-content message-content--markdown" v-html="renderedContent"></div> <div v-else class="message-content message-content--markdown" v-html="renderedContent"></div>
<!-- Source chips for assistant messages --> <!-- Sources for assistant messages -->
<div v-if="!isUser && message.sources && message.sources.length > 0" class="message-sources"> <div v-if="!isUser && message.sources && message.sources.length > 0" class="message-sources">
<div class="sources-label">Sources:</div> <div class="sources-label">Sources:</div>
<div class="source-list"> <div class="source-list">
<!-- TEXT sources -->
<div <div
v-for="(source, idx) in message.sources" v-for="(source, idx) in textSources"
:key="idx" :key="'text-' + idx"
class="source-item" class="source-item"
> >
<div class="source-chip"> <div class="source-chip source-chip--text">
<span class="source-book-icon">📖</span> <span class="source-icon">📖</span>
<span class="source-book-title">{{ source.bookTitle }}</span> <span class="source-book-title">{{ source.bookTitle }}</span>
<span v-if="source.page" class="source-page">p.&nbsp;{{ source.page }}</span> <span v-if="source.page" class="source-page">p.&nbsp;{{ source.page }}</span>
</div> </div>
<div v-if="source.chunkText" class="source-chunk">{{ source.chunkText }}</div> <div v-if="source.chunkText" class="source-chunk">{{ source.chunkText }}</div>
</div> </div>
<!-- FIGURE sources -->
<div
v-for="(source, idx) in figureSources"
:key="'fig-' + idx"
class="source-item source-item--figure"
>
<div class="source-chip source-chip--figure">
<span class="source-icon">🖼</span>
<span class="source-figure-label">{{ source.label || 'Figure' }}</span>
<span v-if="source.page" class="source-page">p.&nbsp;{{ source.page }}</span>
<span v-if="source.figureType" class="source-figure-type">{{ formatFigureType(source.figureType) }}</span>
</div>
<div v-if="source.caption" class="source-caption">{{ source.caption }}</div>
<div class="source-figure-image">
<img
:src="source.imageUrl"
:alt="source.caption || source.label || 'Figure'"
class="figure-img"
loading="lazy"
@error="onImageError"
/>
</div>
</div>
</div> </div>
</div> </div>
@@ -32,7 +57,7 @@
<script setup lang="ts"> <script setup lang="ts">
import { computed } from 'vue' import { computed } from 'vue'
import { marked } from 'marked' import { marked } from 'marked'
import type { ChatMessage } from '@/stores/chatStore' import type { ChatMessage, ChatSource } from '@/stores/chatStore'
const props = defineProps<{ const props = defineProps<{
message: ChatMessage message: ChatMessage
@@ -41,6 +66,36 @@ const props = defineProps<{
const isUser = computed(() => props.message.role === 'USER') const isUser = computed(() => props.message.role === 'USER')
const renderedContent = computed(() => marked.parse(props.message.content) as string) const renderedContent = computed(() => marked.parse(props.message.content) as string)
const textSources = computed(() =>
(props.message.sources ?? []).filter((s: ChatSource) => s.type === 'TEXT' || !s.type)
)
const figureSources = computed(() =>
(props.message.sources ?? []).filter((s: ChatSource) => s.type === 'FIGURE')
)
function formatFigureType(type: string): string {
const labels: Record<string, string> = {
ANATOMICAL_DIAGRAM: 'Anatomical Diagram',
SURGICAL_PHOTOGRAPH: 'Surgical Photo',
MRI_CT_SCAN: 'MRI / CT',
TABLE: 'Table',
CHART: 'Chart',
INTRAOPERATIVE_IMAGE: 'Intraoperative'
}
return labels[type] ?? type
}
function onImageError(e: Event) {
const img = e.target as HTMLImageElement
img.alt = 'Image unavailable'
img.style.display = 'none'
const wrapper = img.parentElement
if (wrapper) {
wrapper.innerHTML = '<span class="figure-missing">Image unavailable</span>'
}
}
function formatTime(iso: string): string { function formatTime(iso: string): string {
return new Date(iso).toLocaleTimeString([], { hour: '2-digit', minute: '2-digit' }) return new Date(iso).toLocaleTimeString([], { hour: '2-digit', minute: '2-digit' })
} }
@@ -182,6 +237,55 @@ function formatTime(iso: string): string {
gap: 0.25rem; gap: 0.25rem;
} }
.source-item--figure {
gap: 0.4rem;
}
.source-chip {
display: inline-flex;
align-items: center;
gap: 0.25rem;
border-radius: 4px;
padding: 0.2rem 0.5rem;
font-size: 0.78rem;
}
.source-chip--text {
background: #ebf8ff;
border: 1px solid #bee3f8;
}
.source-chip--figure {
background: #f0fff4;
border: 1px solid #9ae6b4;
}
.source-icon {
font-size: 0.8rem;
}
.source-book-title {
color: #2b6cb0;
font-weight: 500;
}
.source-figure-label {
color: #276749;
font-weight: 600;
}
.source-figure-type {
color: #718096;
font-size: 0.72rem;
background: #e2e8f0;
border-radius: 3px;
padding: 0 0.3rem;
}
.source-page {
color: #718096;
}
.source-chunk { .source-chunk {
font-size: 0.78rem; font-size: 0.78rem;
color: #4a5568; color: #4a5568;
@@ -194,28 +298,28 @@ function formatTime(iso: string): string {
line-height: 1.5; line-height: 1.5;
} }
.source-chip { .source-caption {
display: inline-flex;
align-items: center;
gap: 0.25rem;
background: #ebf8ff;
border: 1px solid #bee3f8;
border-radius: 4px;
padding: 0.2rem 0.5rem;
font-size: 0.78rem; font-size: 0.78rem;
color: #4a5568;
font-style: italic;
} }
.source-book-icon { .source-figure-image {
font-size: 0.8rem; max-width: 100%;
} }
.source-book-title { .figure-img {
color: #2b6cb0; max-width: 100%;
font-weight: 500; max-height: 300px;
border-radius: 6px;
border: 1px solid #e2e8f0;
object-fit: contain;
} }
.source-page { .figure-missing {
color: #718096; font-size: 0.78rem;
color: #a0aec0;
font-style: italic;
} }
.message-timestamp { .message-timestamp {
+15 -1
@@ -2,11 +2,25 @@ import { defineStore } from 'pinia'
import { ref } from 'vue' import { ref } from 'vue'
import { api } from '@/services/api' import { api } from '@/services/api'
export interface ChatSource {
type: 'TEXT' | 'FIGURE'
bookTitle: string
page: number | null
// TEXT-specific
chunkText?: string
// FIGURE-specific
figureId?: string
label?: string
caption?: string
figureType?: string
imageUrl?: string
}
export interface ChatMessage { export interface ChatMessage {
id: string id: string
role: 'USER' | 'ASSISTANT' role: 'USER' | 'ASSISTANT'
content: string content: string
sources: Array<{ bookTitle: string; page: number | null; chunkText?: string }> sources: ChatSource[]
createdAt: string createdAt: string
} }
@@ -0,0 +1,73 @@
# Embedding & Retrieval Pipeline Checklist: Enhanced Embedding with Image Parsing and Metadata
**Purpose**: Author self-review of embedding pipeline and retrieval requirements quality — validates completeness, clarity, and measurability before implementation tasks are written
**Created**: 2026-04-03
**Feature**: [spec.md](../spec.md) | [research.md](../research.md) | [data-model.md](../data-model.md)
**Focus**: A (Embedding pipeline) + B (Retrieval & ranking) | Depth: Standard | Audience: Author
---
## Requirement Completeness — Embedding Pipeline
- [X] CHK001 - Is the definition of "inspect every page" complete — does the spec cover pages that have no extractable content layer (fully scanned/rasterised pages)? Yes [Completeness, Spec §FR-001, Assumption §6]
- [X] CHK002 - Does FR-002 define what "independently searchable" means in practice — specifically, is it clear that image chunks must be retrievable without a co-located text chunk? [Clarity, Spec §FR-002] - No image should be retrieved along linked text.
- [X] CHK003 - Is the minimum acceptable quality of the "descriptive textual representation" (FR-003) specified — e.g., must it include structural relationships, labelled regions, or clinical terms — or is any non-empty description sufficient? [Clarity, Spec §FR-003, Gap] - any non-empty description sufficient. Text just below the image should have the correct clinical term.
- [C] CHK004 - Are the caption-detection rules defined at spec level: specifically, what pattern or signal determines that a piece of text is a caption vs. body text adjacent to an image? [Clarity, Spec §FR-004, Gap] - We assume that text starting with `Fig.` followed by a number is the text description of a given image.
- [X] CHK005 - Does FR-004 specify what metadata is stored when a caption is absent — is the caption field omitted, left empty, or populated with a generated substitute? [Completeness, Spec §FR-004] - generated substitute
- [X] CHK006 - Is the "minimum meaningful-content threshold" (FR-007) quantified in the spec, or is it deferred entirely to implementation? The assumption section says "size threshold determined during implementation" — is this intentional and acceptable at the spec level? [Ambiguity, Spec §FR-007, Assumption §3] - Deferred to implementation
- [X] CHK007 - Does FR-008 specify the observable outcome of per-page image failures — specifically, is there a requirement that the book's processing status or error log is accessible to the user or admin after partial failure? [Completeness, Spec §FR-008, Gap] online logs
- [X] CHK008 - Is FR-010 ("MUST NOT degrade accuracy or completeness of text-only embedding") measurable — does the spec define a baseline or acceptance criterion against which degradation can be detected? [Measurability, Spec §FR-010, Gap] no definition
- [X] CHK009 - Are re-embedding requirements complete — does the spec cover what happens to in-progress queries and cached results while a book is being re-embedded? [Coverage, Assumption §8, Gap] - No need to take that into account.
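
The caption rule assumed in CHK004 (text starting with `Fig.` followed by a number) could be expressed as a small matcher. This is a minimal sketch, not the project's parser: the class name `CaptionDetector`, the regular expression, and the accepted number formats (`12-4`, `12.4`) are assumptions made for illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: detects caption lines of the form "Fig. 12-4 ...".
final class CaptionDetector {

    // "Fig." at the start of the text, then a number optionally followed
    // by "-N" or ".N" groups (assumed label formats), then the caption body.
    private static final Pattern CAPTION =
            Pattern.compile("^\\s*(Fig\\.\\s*\\d+(?:[-.]\\d+)*)\\s*(.*)$", Pattern.DOTALL);

    // A caption split into its label ("Fig. 12-4") and the remaining text.
    record Caption(String label, String text) {}

    static Optional<Caption> detect(String line) {
        if (line == null) {
            return Optional.empty();
        }
        Matcher m = CAPTION.matcher(line);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new Caption(m.group(1), m.group(2).trim()));
    }
}
```

Body text that merely mentions a figure (for example "as shown in Fig. 12-4") does not match, because the pattern is anchored at the start of the text.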
---
## Requirement Completeness — Retrieval & Ranking
- [X] CHK010 - Does FR-006 define how image and text chunks are ranked relative to each other — is ranking unified (single score), or are the two modalities ranked independently with separate topK controls? [Clarity, Spec §FR-006, Gap] - independent separated topK
- [X] CHK011 - Is the relevance threshold for figure retrieval specified — i.e., at what similarity score (or other criterion) should a figure be excluded from results? [Clarity, Spec §FR-006, Gap] not specified
- [X] CHK012 - Are deduplication rules defined for the case where the same figure appears both in the semantic figure search and the chunk-to-figure reference lookup — which representation wins, or are both included? [Completeness, data-model.md §RetrievalResult, Gap] not specified
- [X] CHK013 - Is the requirement for parent section context expansion in the spec — specifically, is there a requirement that the LLM receives the full section text (not just the chunk) when a text chunk is retrieved? [Gap, research.md §Decision 1] - the LLM should receive the full section to have maximum context.
- [X] CHK014 - Does the spec define the required structure of the LLM prompt when both text context and figures are present — or is prompt design left entirely to implementation? [Completeness, Gap] - Left to implementation
- [X] CHK015 - Is SC-002 ("70% recall on image queries") sufficient as a measurability criterion — is the test set composition (10 queries) and evaluation method documented, or does it rely on an undefined manual process? [Measurability, Spec §SC-002] - Manual process.
---
## Scenario Coverage — Edge & Exception Cases
- [X] CHK016 - Does the spec address the scenario where a query is relevant to a book section that has figures but none of those figures rank above the retrieval threshold, and is the expected fallback behaviour defined? [Coverage, Edge Case, Gap] - In this case the figure should still be retrieved and shown to the user.
- [X] CHK017 - Is the scenario of a figure retrieved in search results but whose image file is missing from the file store covered — what should the system return to the user in that case? [Coverage, Exception Flow, Gap] - missing image error, shown in the front as a broken image link.
- [X] CHK018 - Are requirements defined for multi-image pages where images have conflicting captions or share a single composite caption: which image gets the caption, or is it duplicated? [Coverage, Spec §FR-004, Edge Case] - This case does not exist.
---
## Consistency & Alignment
- [X] CHK019 - Are the metadata fields required by FR-004 and FR-005 fully consistent with the metadata schema defined in data-model.md — specifically, do the mandatory fields in the spec match the `type`, `section_id`, and `section_title` fields in the data model? [Consistency, Spec §FR-004, data-model.md §Vector Store Documents] - Left to implementation
- [X] CHK020 - Is SC-003 ("processing time ≤ 3× baseline") consistent with FR-003 — if description generation requires a vision model call per image, is the 3× cap realistic for a 500-page book with dense figures, and is this assumption documented? [Consistency, Spec §SC-003, Assumption §3, Gap] - not documented
- [X] CHK021 - Does the spec's description of citation display (FR-009) align with the `sources` format change documented in contracts/api.md — are the fields the spec says must be "distinct" actually represented distinctly in the API response? [Consistency, Spec §FR-009, contracts/api.md §4] - A section with image-source should be displayed in the front. Text source and image-source are distinct
---
## Notes
- Items marked `[Gap]` indicate requirements that appear absent or deferred; resolve before generating tasks
- Items marked `[Ambiguity]` require a clearer definition in the spec before implementation starts
- Items marked `[Consistency]` should be cross-checked between spec.md, data-model.md, and contracts/api.md
- Mark items `[x]` when resolved; add inline notes with the resolution for traceability
@@ -0,0 +1,34 @@
# Specification Quality Checklist: Enhanced Embedding with Image Parsing and Metadata
**Purpose**: Validate specification completeness and quality before proceeding to planning
**Created**: 2026-04-03
**Feature**: [spec.md](../spec.md)
## Content Quality
- [x] No implementation details (languages, frameworks, APIs)
- [x] Focused on user value and business needs
- [x] Written for non-technical stakeholders
- [x] All mandatory sections completed
## Requirement Completeness
- [x] No [NEEDS CLARIFICATION] markers remain
- [x] Requirements are testable and unambiguous
- [x] Success criteria are measurable
- [x] Success criteria are technology-agnostic (no implementation details)
- [x] All acceptance scenarios are defined
- [x] Edge cases are identified
- [x] Scope is clearly bounded
- [x] Dependencies and assumptions identified
## Feature Readiness
- [x] All functional requirements have clear acceptance criteria
- [x] User scenarios cover primary flows
- [x] Feature meets measurable outcomes defined in Success Criteria
- [x] No implementation details leak into specification
## Notes
- All items pass. Spec is ready for `/speckit.clarify` or `/speckit.plan`.
@@ -0,0 +1,172 @@
# API Contracts: Enhanced Embedding with Image Parsing and Metadata
**Branch**: `002-image-aware-embedding` | **Date**: 2026-04-03
**Base path**: `/api/v1`
**Auth**: HTTP Basic (existing)
---
## New / Changed Endpoints
### 1. Re-embed a book (new)
Triggers a full re-embedding of an already-processed book, replacing all existing chunks and
figures with the new image-aware pipeline output. Safe to call on books previously embedded
by feature 001.
```
POST /api/v1/books/{id}/reembed
```
**Path parameters**
| Parameter | Type | Description |
|-----------|------|-------------|
| `id` | UUID | Book ID |
**Response** `202 Accepted`
```json
{ "bookId": "uuid", "status": "PROCESSING" }
```
**Error responses**
| Status | Condition |
|--------|-----------|
| 404 | Book not found |
| 409 | Book already in PROCESSING state |
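
The status rules for the re-embed endpoint can be read as a small decision function. The sketch below assumes a local `ReembedBookStatus` enum that mirrors the book states named in the data model; `ReembedDecision` and `httpStatusFor` are hypothetical names, not part of the controller shown in this commit.

```java
import java.util.Optional;

// Local stand-in for the project's BookStatus enum (assumed states).
enum ReembedBookStatus { PENDING, PROCESSING, READY, FAILED }

// Hypothetical helper mapping the current book state to the documented HTTP status.
final class ReembedDecision {

    static int httpStatusFor(Optional<ReembedBookStatus> current) {
        if (current.isEmpty()) {
            return 404; // book not found
        }
        if (current.get() == ReembedBookStatus.PROCESSING) {
            return 409; // a run is already in progress
        }
        return 202; // accepted; the book moves to PROCESSING
    }
}
```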
---
### 2. Get figures for a book (new)
Returns the list of extracted figures for a book, including their type, caption, and image URL.
Used by the frontend to display a figure gallery or inline figures in chat responses.
```
GET /api/v1/books/{id}/figures
```
**Path parameters**
| Parameter | Type | Description |
|-----------|------|-------------|
| `id` | UUID | Book ID |
**Response** `200 OK`
```json
[
  {
    "figureId": "youmans-7ed-fig-12-4",
    "label": "Fig. 12-4",
    "caption": "Coronal cross-section of the cavernous sinus showing cranial nerve relationships",
    "figureType": "ANATOMICAL_DIAGRAM",
    "page": 184,
    "imageUrl": "/api/v1/figures/550e8400-e29b-41d4-a716-446655440000/youmans-7ed-fig-12-4.png",
    "sectionId": "youmans-7ed-ch12-s2-3",
    "sectionTitle": "Cavernous Sinus"
  }
]
```
**Error responses**
| Status | Condition |
|--------|-----------|
| 404 | Book not found |
---
### 3. Serve figure image (new)
Serves the extracted figure image file. Mounted as a static resource from the file store.
```
GET /api/v1/figures/{bookId}/{filename}
```
**Path parameters**
| Parameter | Type | Description |
|-----------|------|-------------|
| `bookId` | UUID | Book ID |
| `filename` | string | Image filename (e.g. `youmans-7ed-fig-12-4.png`) |
**Response** `200 OK` — binary PNG
**Content-Type**: `image/png`
**Error responses**
| Status | Condition |
|--------|-----------|
| 404 | Image file not found |
---
### 4. Chat message response — extended source format (changed)
The existing `POST /api/v1/chat/sessions/{id}/messages` endpoint is unchanged in its request
format. The response `sources` field is extended to include figure references.
**Existing request** (unchanged):
```json
{ "content": "Describe the anatomy of the cavernous sinus" }
```
**Response** `200 OK` — extended `sources`:
```json
{
  "id": "uuid",
  "role": "ASSISTANT",
  "content": "The cavernous sinus is ... [Fig. 12-4, p.184] ...",
  "sources": [
    {
      "type": "TEXT",
      "bookTitle": "Youmans and Winn Neurological Surgery, 7th Ed.",
      "page": 184,
      "chunkText": "The cavernous sinus contains ..."
    },
    {
      "type": "FIGURE",
      "bookTitle": "Youmans and Winn Neurological Surgery, 7th Ed.",
      "page": 184,
      "figureId": "youmans-7ed-fig-12-4",
      "label": "Fig. 12-4",
      "caption": "Coronal cross-section of the cavernous sinus ...",
      "figureType": "ANATOMICAL_DIAGRAM",
      "imageUrl": "/api/v1/figures/550e8400-e29b-41d4-a716-446655440000/youmans-7ed-fig-12-4.png"
    }
  ],
  "createdAt": "2026-04-03T12:00:00Z"
}
```
**Changed fields in `sources` array**:
| Field | Old | New |
|-------|-----|-----|
| `type` | absent | `"TEXT"` or `"FIGURE"` |
| `figureId` | absent | figure ID string (FIGURE type only) |
| `label` | absent | caption label (FIGURE type only) |
| `caption` | absent | full caption (FIGURE type only) |
| `figureType` | absent | enum name (FIGURE type only) |
| `imageUrl` | absent | image URL (FIGURE type only) |
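
One way to keep the two source kinds distinct on the backend is a shared interface with one record per kind. This is a sketch under assumptions: `SourceEntry`, `TextSourceEntry`, `FigureSourceEntry`, and `SourceEntries` are hypothetical names, and how these types are serialized to the JSON above is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical response types mirroring the `sources` entries above.
interface SourceEntry {
    String type();      // "TEXT" or "FIGURE"
    String bookTitle();
    Integer page();     // nullable, as in the frontend type
}

record TextSourceEntry(String bookTitle, Integer page, String chunkText)
        implements SourceEntry {
    public String type() { return "TEXT"; }
}

record FigureSourceEntry(String bookTitle, Integer page, String figureId, String label,
                         String caption, String figureType, String imageUrl)
        implements SourceEntry {
    public String type() { return "FIGURE"; }
}

final class SourceEntries {
    // Returns only the figure entries, preserving their order.
    static List<FigureSourceEntry> figures(List<SourceEntry> sources) {
        List<FigureSourceEntry> result = new ArrayList<>();
        for (SourceEntry s : sources) {
            if (s instanceof FigureSourceEntry f) {
                result.add(f);
            }
        }
        return result;
    }
}
```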
---
## Unchanged Endpoints
All endpoints from feature 001 remain at their existing paths with no breaking changes:
- `POST /api/v1/books/upload`
- `GET /api/v1/books`
- `DELETE /api/v1/books/{id}`
- `GET /api/v1/topics`
- `GET /api/v1/topics/{id}/summary`
- `POST /api/v1/chat/sessions`
- `GET /api/v1/chat/sessions/{id}/messages`
- `DELETE /api/v1/chat/sessions/{id}`
@@ -0,0 +1,305 @@
# Data Model: Enhanced Embedding with Image Parsing and Metadata
**Branch**: `002-image-aware-embedding` | **Date**: 2026-04-03
---
## Overview
Three storage tiers work in concert:
```
┌──────────────────────────────────────────────────────────────────┐
│ PDF Upload │
│ │ │
│ ▼ │
│ Parsing Pipeline │
│ │ │ │
│ ▼ ▼ │
│ Postgres (source of truth) pgvector (search index) │
│ - book - vector_store (text chunks) │
│ - chapter - vector_store (figure captions) │
│ - section (+ fullText) File Store (images) │
│ - figure (metadata) - /uploads/figures/{bookId}/*.png │
│ - chunk_figure_refs │
└──────────────────────────────────────────────────────────────────┘
```
---
## Postgres Schema
### Existing tables (unchanged)
- `book` — status, metadata, page count (V1)
- `chat_session`, `message` — conversation (V1)
- `vector_store` — managed by Spring AI pgvector starter (V2)
- `topic` — predefined topics (V3)
### New tables (Flyway V4)
```sql
-- V4: Document hierarchy
CREATE TABLE chapter (
id VARCHAR(200) PRIMARY KEY, -- "{bookId}-ch{N}"
book_id UUID NOT NULL REFERENCES book(id) ON DELETE CASCADE,
number INT NOT NULL,
title VARCHAR(500),
page_start INT,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE TABLE section (
id VARCHAR(200) PRIMARY KEY, -- "{bookId}-ch{N}-s{X}-{Y}"
chapter_id VARCHAR(200) NOT NULL REFERENCES chapter(id) ON DELETE CASCADE,
book_id UUID NOT NULL REFERENCES book(id) ON DELETE CASCADE,
number VARCHAR(50), -- "2.3" or "12.2.3"
title VARCHAR(500),
page_start INT NOT NULL,
page_end INT NOT NULL,
full_text TEXT NOT NULL, -- NOT in vector store
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX idx_section_book ON section(book_id);
CREATE INDEX idx_section_chapter ON section(chapter_id);
```
### New tables (Flyway V5)
```sql
-- V5: Figures and chunk→figure links
CREATE TABLE figure (
id VARCHAR(200) PRIMARY KEY, -- "{bookId}-fig-{label}"
book_id UUID NOT NULL REFERENCES book(id) ON DELETE CASCADE,
section_id VARCHAR(200) REFERENCES section(id) ON DELETE SET NULL,
chapter_id VARCHAR(200) REFERENCES chapter(id) ON DELETE SET NULL,
label VARCHAR(100), -- "Fig. 12-4"
caption TEXT,
figure_type VARCHAR(50) NOT NULL, -- FigureType enum name
page INT NOT NULL,
image_path VARCHAR(1000) NOT NULL, -- relative path on disk
caption_embedding_id UUID, -- ID in vector_store
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE TABLE chunk_figure_ref (
chunk_id UUID NOT NULL, -- vector_store document ID
figure_id VARCHAR(200) NOT NULL REFERENCES figure(id) ON DELETE CASCADE,
mention_page INT,
PRIMARY KEY (chunk_id, figure_id)
);
CREATE INDEX idx_figure_book ON figure(book_id);
CREATE INDEX idx_cfr_chunk ON chunk_figure_ref(chunk_id);
```
---
## Java Domain Records
### Document hierarchy (new package `com.aiteacher.document`)
```java
// Root — in-memory only, not a JPA entity
public record BookNode(
String bookId,
String title,
String isbn,
String edition,
List<String> authors,
List<ChapterNode> chapters
) {}
// Chapter — maps to `chapter` table
public record ChapterNode(
String chapterId,
String bookId,
int number,
String title,
int pageStart,
List<SectionNode> sections
) {}
// Section — maps to `section` table; fullText stays in Postgres
public record SectionNode(
String sectionId,
String chapterId,
String bookId,
String number,
String title,
int pageStart,
int pageEnd,
String fullText,
List<TextChunkNode> chunks,
List<FigureNode> figures
) {}
// Text chunk — embedded into vector_store; references its parent section
public record TextChunkNode(
String chunkId, // UUID → becomes vector_store document ID
String sectionId,
String chapterId,
String bookId,
String text,
int chunkIndex,
int totalChunksInSection,
int pageStart,
int pageEnd,
Map<String, Object> metadata // flattened for Spring AI filtering
) {
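// The section title lives on the parent SectionNode, so the caller passes it in.
// Map.of rejects null values, so sectionTitle must be non-null.
public Map<String, Object> toMetadata(String sectionTitle) {
return Map.of(
"type", "TEXT",
"book_id", bookId,
"chapter_id", chapterId,
"section_id", sectionId,
"section_title", sectionTitle,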
"page_start", pageStart,
"page_end", pageEnd,
"chunk_index", chunkIndex,
"total_chunks", totalChunksInSection
);
}
}
// Figure — maps to `figure` table; caption embedded into vector_store
public record FigureNode(
String figureId,
String sectionId,
String chapterId,
String bookId,
String label, // "Fig. 12-4"
String caption,
FigureType type,
int page,
String imagePath, // relative: "figures/{bookId}/{figureId}.png"
UUID captionEmbeddingId // ID in vector_store
) {}
```
### Figure type enum
```java
public enum FigureType {
ANATOMICAL_DIAGRAM,
SURGICAL_PHOTOGRAPH,
MRI_CT_SCAN,
TABLE,
CHART,
INTRAOPERATIVE_IMAGE
}
```
Classification heuristic (applied to caption + surrounding text):
| Keyword(s) | FigureType |
|-----------|-----------|
| `MRI`, `CT`, `magnetic`, `resonance`, `tomography` | `MRI_CT_SCAN` |
| `intraoperative`, `intra-op` | `INTRAOPERATIVE_IMAGE` |
| `table`, `Table` (at line start) | `TABLE` |
| `chart`, `graph`, `histogram` | `CHART` |
| `photograph`, `photo` | `SURGICAL_PHOTOGRAPH` |
| (default) | `ANATOMICAL_DIAGRAM` |
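
The keyword table above could be applied as an ordered list of pattern checks. The sketch below is illustrative only: `FigureTypeClassifier` and `SketchFigureType` are hypothetical names, the rows are assumed to be evaluated top to bottom, `MRI` and `CT` are assumed to match only as upper-case whole words (so that "section" does not trigger `MRI_CT_SCAN`), and the "at line start" condition is assumed to apply to both spellings of "table".

```java
import java.util.regex.Pattern;

// Local copy of the enum so the sketch compiles on its own.
enum SketchFigureType {
    ANATOMICAL_DIAGRAM, SURGICAL_PHOTOGRAPH, MRI_CT_SCAN, TABLE, CHART, INTRAOPERATIVE_IMAGE
}

// Hypothetical classifier applying the keyword table in row order.
final class FigureTypeClassifier {

    // Acronyms are case-sensitive whole words; the long forms ignore case.
    private static final Pattern MRI_CT = Pattern.compile(
            "\\b(?:MRI|CT)\\b|(?i:\\b(?:magnetic|resonance|tomography)\\b)");
    private static final Pattern INTRAOP =
            Pattern.compile("(?i)\\bintra-?op(?:erative)?\\b");
    // "table" only counts at the start of a line (assumption).
    private static final Pattern TABLE = Pattern.compile("(?im)^\\s*table\\b");
    private static final Pattern CHART =
            Pattern.compile("(?i)\\b(?:chart|graph|histogram)\\b");
    private static final Pattern PHOTO =
            Pattern.compile("(?i)\\bphoto(?:graph)?s?\\b");

    static SketchFigureType classify(String captionAndContext) {
        String text = captionAndContext == null ? "" : captionAndContext;
        if (MRI_CT.matcher(text).find()) return SketchFigureType.MRI_CT_SCAN;
        if (INTRAOP.matcher(text).find()) return SketchFigureType.INTRAOPERATIVE_IMAGE;
        if (TABLE.matcher(text).find()) return SketchFigureType.TABLE;
        if (CHART.matcher(text).find()) return SketchFigureType.CHART;
        if (PHOTO.matcher(text).find()) return SketchFigureType.SURGICAL_PHOTOGRAPH;
        return SketchFigureType.ANATOMICAL_DIAGRAM; // default row
    }
}
```

Because the first matching row wins, a caption such as "Intraoperative photograph" would be classified as `INTRAOPERATIVE_IMAGE` under this ordering.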
### Chunk-figure join record
```java
// Maps to `chunk_figure_ref` table
public record ChunkFigureRef(
UUID chunkId,
String figureId,
int mentionPage
) {}
```
---
## Vector Store Documents
All documents in `vector_store` carry a `metadata` JSON column with a `type` field for filtering.
### Text chunk document
| Field | Value |
|-------|-------|
| `content` | chunk text (400-600 tokens) |
| `metadata.type` | `"TEXT"` |
| `metadata.book_id` | book UUID |
| `metadata.book_title` | book title string |
| `metadata.chapter_id` | chapter ID string |
| `metadata.section_id` | section ID string |
| `metadata.section_title` | section title string |
| `metadata.page_start` | int |
| `metadata.page_end` | int |
| `metadata.chunk_index` | int (0-based) |
| `metadata.total_chunks` | int |
### Figure caption document
| Field | Value |
|-------|-------|
| `content` | vision-generated description + caption text |
| `metadata.type` | `"FIGURE"` |
| `metadata.book_id` | book UUID |
| `metadata.book_title` | book title string |
| `metadata.chapter_id` | chapter ID string |
| `metadata.section_id` | section ID string |
| `metadata.figure_id` | figure ID string |
| `metadata.figure_type` | enum name string |
| `metadata.image_path` | relative file path |
| `metadata.label` | caption label e.g. `"Fig. 12-4"` |
| `metadata.page` | int |
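
Several figure columns are nullable in the schema (`section_id`, `chapter_id`, `label`), and `Map.of` throws a `NullPointerException` for null values, so the figure metadata may be easier to assemble in a mutable map. The following is a minimal sketch: `FigureMetadataBuilder` is a hypothetical name, and omitting absent fields (rather than storing an empty string) is an assumption, not a documented convention of this project.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical builder for the metadata of a figure caption document.
final class FigureMetadataBuilder {

    static Map<String, Object> build(String bookId, String bookTitle, String chapterId,
                                     String sectionId, String figureId, String figureType,
                                     String imagePath, String label, int page) {
        Map<String, Object> metadata = new LinkedHashMap<>();
        metadata.put("type", "FIGURE");
        putIfPresent(metadata, "book_id", bookId);
        putIfPresent(metadata, "book_title", bookTitle);
        putIfPresent(metadata, "chapter_id", chapterId);
        putIfPresent(metadata, "section_id", sectionId);
        putIfPresent(metadata, "figure_id", figureId);
        putIfPresent(metadata, "figure_type", figureType);
        putIfPresent(metadata, "image_path", imagePath);
        putIfPresent(metadata, "label", label);
        metadata.put("page", page);
        return metadata;
    }

    // Absent values are skipped instead of being stored as null (assumption).
    private static void putIfPresent(Map<String, Object> target, String key, Object value) {
        if (value != null) {
            target.put(key, value);
        }
    }
}
```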
---
## File Store Layout
```
uploads/
└── figures/
    └── {bookId}/
        ├── {figureId}.png
        └── ...
```
- Base path configurable via `app.figure-storage.base-path` (default: `./uploads`)
- Files are served via `GET /api/v1/figures/{bookId}/{filename}` (static resource mapping)
- Gitignored; not version-controlled
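
Because `bookId` and `filename` arrive as URL path segments, resolving them against the layout above benefits from a containment check. The sketch below assumes a hypothetical `FigurePathResolver` class; it is not the project's `LocalFigureStorageService`, only an illustration of resolving `figures/{bookId}/{filename}` under the configured base path and rejecting paths that escape it.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical resolver for the layout {basePath}/figures/{bookId}/{filename}.
final class FigurePathResolver {

    private final Path figuresRoot;

    FigurePathResolver(String basePath) {
        this.figuresRoot = Paths.get(basePath).resolve("figures").toAbsolutePath().normalize();
    }

    Path resolve(String bookId, String filename) {
        // normalize() collapses ".." segments before the containment check.
        Path candidate = figuresRoot.resolve(bookId).resolve(filename).normalize();
        if (!candidate.startsWith(figuresRoot)) {
            throw new IllegalArgumentException("Path escapes the figure store: " + filename);
        }
        return candidate;
    }
}
```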
---
## State Transitions
Book processing extends the existing `BookStatus` state machine:
```
PENDING → PROCESSING → READY
                     ↘ FAILED
```
During `PROCESSING`:
1. Parse PDF structure → extract chapters/sections → persist to Postgres
2. Split sections into text chunks → embed → write to vector_store
3. Extract images per page → filter by min size → save PNG → generate vision description → embed caption → write figure to Postgres + vector_store
4. Write chunk_figure_refs for all detected figure references in text
Failure at step 3 (individual page) → log + skip that page's images; continue.
Failure at any other step → set `BookStatus.FAILED`.
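
The two failure rules can be sketched as nested error handling: a per-page `try` around the image step, and an outer `try` around everything else. The names below (`ImageStepRunner`, `PageImageStep`, `SketchBookStatus`) are hypothetical, the text steps are collapsed into a single `Runnable` for brevity, and logging is reduced to a comment.

```java
import java.util.ArrayList;
import java.util.List;

// Local stand-in for the BookStatus values listed above.
enum SketchBookStatus { PENDING, PROCESSING, READY, FAILED }

// Hypothetical per-page image step; the real extraction API is not shown here.
interface PageImageStep {
    void process(int pageNumber) throws Exception;
}

final class ImageStepRunner {

    // Final status plus the pages whose images were skipped.
    record Outcome(SketchBookStatus status, List<Integer> skippedPages) {}

    static Outcome run(Runnable textSteps, int pageCount, PageImageStep imageStep) {
        List<Integer> skipped = new ArrayList<>();
        try {
            textSteps.run(); // steps 1-2: any failure fails the whole book
            for (int page = 1; page <= pageCount; page++) {
                try {
                    imageStep.process(page);
                } catch (Exception e) {
                    // Step 3 failure: a real implementation would log here, then continue.
                    skipped.add(page);
                }
            }
            return new Outcome(SketchBookStatus.READY, skipped);
        } catch (RuntimeException e) {
            return new Outcome(SketchBookStatus.FAILED, skipped);
        }
    }
}
```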
---
## Retrieval Result Structure
```java
public record RetrievalResult(
List<SectionNode> parentSections, // expanded full-text context
List<Document> figureVectorHits, // semantic figure matches
List<FigureNode> linkedFigures // figures explicitly referenced in text chunks
) {}
```
The `NeurosurgeryRetriever` service deduplicates figures across both lists before passing
the result to the LLM prompt builder.
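
Deduplication across the two figure lists can be done with an insertion-ordered map keyed by figure ID. The sketch below uses a simplified `FigureRef` record in place of the real figure types, and it assumes that the semantic hit wins when the same figure appears in both lists; the checklist notes that this precedence is not specified.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Minimal stand-in for a figure; only the ID matters for deduplication.
record FigureRef(String figureId, String label) {}

final class FigureMerger {

    // Merges both lists, keeping the first occurrence of each figureId in encounter order.
    static List<FigureRef> merge(List<FigureRef> semanticHits, List<FigureRef> linkedFigures) {
        Map<String, FigureRef> merged = new LinkedHashMap<>();
        for (FigureRef f : semanticHits) {
            merged.putIfAbsent(f.figureId(), f);
        }
        for (FigureRef f : linkedFigures) {
            merged.putIfAbsent(f.figureId(), f);
        }
        return new ArrayList<>(merged.values());
    }
}
```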
+105
@@ -0,0 +1,105 @@
# Implementation Plan: Enhanced Embedding with Image Parsing and Metadata
**Branch**: `002-image-aware-embedding` | **Date**: 2026-04-03 | **Spec**: [spec.md](spec.md)
**Input**: Feature specification from `/specs/002-image-aware-embedding/spec.md`
## Summary
Enhance the book embedding pipeline to extract images from every PDF page, generate descriptive
text for each image, and store all content (text chunks + figure captions) with rich, consistent
metadata in the vector store. A new document hierarchy (Book → Chapter → Section → TextChunk +
Figure) is introduced. Postgres holds the full-text sections and figure metadata; the vector
store holds chunk and figure caption embeddings; the local file store holds extracted image files.
At query time, both the text-chunk store and figure-caption store are searched in parallel and
results are merged before being sent to the LLM.
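
The parallel query step could look roughly like the sketch below, which runs one search per document type with its own `topK` and waits for both. `DualSearch` and the `search` function are hypothetical: the function stands in for whatever vector store call the retriever actually makes, and results are represented as plain document IDs.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.BiFunction;

final class DualSearch {

    // Results of the two independent searches, each limited by its own topK.
    record Hits(List<String> textChunkIds, List<String> figureIds) {}

    // `search` is a hypothetical function (type, topK) -> matching document IDs.
    static Hits run(BiFunction<String, Integer, List<String>> search,
                    int textTopK, int figureTopK) {
        CompletableFuture<List<String>> text =
                CompletableFuture.supplyAsync(() -> search.apply("TEXT", textTopK));
        CompletableFuture<List<String>> figures =
                CompletableFuture.supplyAsync(() -> search.apply("FIGURE", figureTopK));
        // join() waits for both searches; a failure in either surfaces
        // to the caller wrapped in a CompletionException.
        return new Hits(text.join(), figures.join());
    }
}
```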
## Technical Context
**Language/Version**: Java 25 (backend), TypeScript / Node 20 (frontend)
**Primary Dependencies**: Spring Boot 4.0.5, Spring AI 2.0.0-M4, OpenAI API (embeddings + chat), PDFBox (via Spring AI PDF reader dependency)
**Storage**: PostgreSQL (JPA + Flyway), pgvector (Spring AI `VectorStore`), local file system (extracted images — `/uploads/figures/`)
**Testing**: Spring Boot Test, JUnit 5, Mockito
**Target Platform**: Linux server (Docker Compose)
**Project Type**: Web application — backend REST API + Vue 3 frontend
**Performance Goals**: Full book (up to 500 pages with images) processed in ≤ 30 minutes; query response unchanged from existing baseline
**Constraints**: No new deployable units; all changes within the existing `backend/` module; image storage on local disk (S3 migration is a future concern, behind an interface)
**Scale/Scope**: POC — <10 concurrent users; single shared book library
## Constitution Check
*GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.*
| Principle | Status | Notes |
|-----------|--------|-------|
| I — KISS | ⚠️ Justified violation — see Complexity Tracking | Hierarchical model + dual search adds complexity; justified by precision requirement |
| II — Easy to Change | ✅ | Figure storage wrapped behind `FigureStorageService` interface; can swap local disk for S3 |
| III — Web-First | ✅ | All new capabilities exposed via existing REST API; no new deployable units |
| IV — Docs as Architecture | ⚠️ Required | README Mermaid diagram MUST be updated in this PR to show new storage tiers |
## Project Structure
### Documentation (this feature)
```text
specs/002-image-aware-embedding/
├── plan.md # This file
├── research.md # Phase 0 output
├── data-model.md # Phase 1 output
├── quickstart.md # Phase 1 output
├── contracts/ # Phase 1 output
└── tasks.md # Phase 2 output (/speckit.tasks)
```
### Source Code (repository root)
```text
backend/
├── src/main/java/com/aiteacher/
│ ├── book/
│ │ ├── Book.java (existing)
│ │ ├── BookController.java (existing)
│ │ ├── BookService.java (existing)
│ │ ├── BookRepository.java (existing)
│ │ ├── BookStatus.java (existing)
│ │ ├── BookEmbeddingService.java (existing — enhanced)
│ │ └── NoKnowledgeSourceException.java (existing)
│ ├── document/ (new package)
│ │ ├── BookNode.java
│ │ ├── ChapterNode.java
│ │ ├── SectionNode.java
│ │ ├── SectionRepository.java
│ │ ├── TextChunkNode.java
│ │ ├── FigureNode.java
│ │ ├── FigureRepository.java
│ │ ├── FigureType.java
│ │ ├── ChunkFigureRef.java
│ │ └── ChunkFigureRefRepository.java
│ ├── figure/ (new package)
│ │ ├── FigureStorageService.java (interface)
│ │ └── LocalFigureStorageService.java (implementation)
│ ├── retrieval/ (new package)
│ │ └── NeurosurgeryRetriever.java
│ ├── chat/
│ │ └── ChatService.java (updated — uses NeurosurgeryRetriever)
│ └── config/
│ └── FigureStorageConfig.java (new — configures upload dir)
└── src/main/resources/
└── db/migration/
├── V4__document_hierarchy.sql (new)
└── V5__figures_and_refs.sql (new)
uploads/
└── figures/ (runtime — extracted images; gitignored)
```
**Structure Decision**: Option 2 (Web Application) confirmed. All backend changes stay within
`backend/`. Two new packages (`document/`, `retrieval/`) plus one interface package (`figure/`)
keep concerns separated without adding a deployable unit.
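
To illustrate why the storage interface keeps the implementation swappable, here is a sketch with an in-memory implementation that could serve as a test double. The interface is named `FigureStore` here on purpose: its methods (`save`, `load`) are assumptions for illustration and are not claimed to match the actual `FigureStorageService` signatures.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical shape of the storage abstraction; real method names may differ.
interface FigureStore {
    String save(String bookId, String figureId, byte[] png); // returns a relative path
    Optional<byte[]> load(String relativePath);
}

// In-memory variant; a disk or S3 variant would implement the same interface.
final class InMemoryFigureStore implements FigureStore {

    private final Map<String, byte[]> files = new ConcurrentHashMap<>();

    @Override
    public String save(String bookId, String figureId, byte[] png) {
        String path = "figures/" + bookId + "/" + figureId + ".png";
        files.put(path, png.clone()); // defensive copy
        return path;
    }

    @Override
    public Optional<byte[]> load(String relativePath) {
        return Optional.ofNullable(files.get(relativePath)).map(b -> b.clone());
    }
}
```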
## Complexity Tracking
| Violation | Why Needed | Simpler Alternative Rejected Because |
|-----------|------------|-------------------------------------|
| Document hierarchy (BookNode → ChapterNode → SectionNode) | Parent-child retrieval: chunks reference their parent section so the LLM receives full section context, not just the matching fragment. This is the established solution for RAG precision. | Flat page-per-doc model (current) loses inter-sentence context; chunk-only retrieval produces incomplete answers for multi-paragraph clinical questions |
| Dual vector search (text chunks + figure captions) | Figure captions must be independently searchable — a query about "cavernous sinus anatomy" must surface the diagram even if no text chunk scores highly | Single vector store search would miss figures whose captions don't happen to be the highest-similarity hit; this is the core deliverable of the feature |
| Third storage tier (local file store for images) | Extracted images cannot live in Postgres (binary blobs degrade query performance) or the vector store (only vectors). A file-per-image approach is standard. | Storing images as base64 in Postgres JSONB would bloat the DB and complicate backup/restore; the `FigureStorageService` interface keeps the implementation swappable |
@@ -0,0 +1,86 @@
# Quickstart: Enhanced Embedding with Image Parsing and Metadata
**Branch**: `002-image-aware-embedding` | **Date**: 2026-04-03
---
## Prerequisites
- Docker Compose running (PostgreSQL + pgvector)
- OpenAI API key set in `backend/src/main/resources/application.properties` or as env var `OPENAI_API_KEY`
- Java 25 + Maven on PATH
---
## New Configuration
Add to `backend/src/main/resources/application.properties`:
```properties
# Figure storage
app.figure-storage.base-path=./uploads
app.figure-storage.min-image-size-px=100
```
The `uploads/figures/` directory is created automatically on first use. Add it to `.gitignore`.
---
## Database Migration
Two new Flyway migrations run automatically on startup:
- `V4__document_hierarchy.sql` — adds `chapter` and `section` tables
- `V5__figures_and_refs.sql` — adds `figure` and `chunk_figure_ref` tables
No manual DB setup needed.
---
## Re-embedding Existing Books
Books embedded by feature 001 (text-only) remain functional for text queries. To add image
support, trigger a re-embed:
```bash
curl -X POST http://localhost:8080/api/v1/books/{bookId}/reembed \
-u admin:password
```
The book transitions to `PROCESSING`, old chunks and figures are deleted, and the new
image-aware pipeline runs. Status can be polled via `GET /api/v1/books`.
---
## Verifying Image Extraction
1. Upload a PDF with diagrams: `POST /api/v1/books/upload`
2. Wait for `status: "READY"` via `GET /api/v1/books`
3. List figures: `GET /api/v1/books/{id}/figures` — should return at least one entry per image page
4. Ask a diagram-specific question in chat — response `sources` should include a `type: "FIGURE"` entry
---
## Frontend: Rendering Inline Figures
The assistant message `content` field will contain figure references in the format
`[Fig. 12-4, p.184]`. The frontend should:
1. Parse `[Fig. X, p.N]` patterns in assistant message text
2. Look up the matching entry in `sources` where `type === "FIGURE"`
3. Render the figure inline using the `imageUrl` field
---
## Running Tests
```bash
cd backend
mvn test
```
Key new test classes:
- `FigureExtractionServiceTest` — unit tests for image extraction and classification
- `NeurosurgeryRetrieverTest` — unit tests for dual-search merge and deduplication
- `BookEmbeddingServiceIntegrationTest` — integration test: upload PDF with known figures,
verify figures appear in `GET /api/v1/books/{id}/figures`
@@ -0,0 +1,188 @@
# Research: Enhanced Embedding with Image Parsing and Metadata
**Branch**: `002-image-aware-embedding` | **Date**: 2026-04-03
This document resolves all technical unknowns identified during planning. The primary source for
decisions is the detailed architecture provided directly by the project owner, supplemented by
Spring AI 2.0.0-M4 API specifics.
---
## Decision 1: Document Hierarchy Model
**Decision**: Adopt a four-level hierarchy: `BookNode` → `ChapterNode` → `SectionNode` →
`TextChunkNode` + `FigureNode`. The `SectionNode` is the pivotal unit: it holds the full section
text in Postgres and is used for parent-child context expansion at retrieval time.
**Rationale**: A flat page-per-document model (current implementation) loses structural context.
When a user asks a multi-faceted clinical question, the LLM needs the surrounding section text,
not just the matching fragment. Parent-child retrieval — where chunks point to their parent
section — is the established pattern for RAG precision. The hierarchy also makes figure-to-section
association explicit and queryable.
**Alternatives considered**:
- Keep flat page model, add metadata only → rejected: insufficient for precise citation and
context expansion
- Chapter-level retrieval (coarser than section) → rejected: too much irrelevant context sent
to LLM; cost and latency increase
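The parent-child expansion described in this decision can be sketched as follows. This is a minimal illustration rather than the project's code: `ChunkHit`, `Section`, and `expandToSections` are hypothetical names, and an in-memory map stands in for `SectionRepository`.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified types used only for this illustration.
record ChunkHit(String chunkId, String sectionId, double score) {}
record Section(String id, String title, String fullText) {}

final class ParentChildExpansion {

    // Maps retrieved chunks to their distinct parent sections, preserving the
    // order in which each section first appears in the ranked chunk list.
    static List<Section> expandToSections(List<ChunkHit> rankedHits,
                                          Map<String, Section> sectionsById) {
        Set<String> sectionIds = new LinkedHashSet<>();
        for (ChunkHit hit : rankedHits) {
            sectionIds.add(hit.sectionId());
        }
        List<Section> sections = new ArrayList<>();
        for (String id : sectionIds) {
            Section section = sectionsById.get(id);
            if (section != null) { // ignore chunks whose parent section is missing
                sections.add(section);
            }
        }
        return sections;
    }
}
```

Two chunks from the same section yield that section once, so the LLM receives each section's full text a single time regardless of how many of its fragments matched.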
---
## Decision 2: Image Extraction Strategy
**Decision**: Use PDFBox (already on classpath via `spring-ai-pdf-document-reader`) to extract
images per page. Each image is tagged with `page`, `figure_id` (derived from caption, e.g.
"Fig. 12-4"), and the parent `sectionId`. Images are saved to local disk under
`./uploads/figures/{bookId}/`.
**Rationale**: PDFBox is already present (Spring AI bundles it). No new dependency needed.
Per-page extraction ensures every image is captured regardless of PDF structure.
**Alternatives considered**:
- iText / iText7 → additional commercial dependency; overkill for extraction
- Screenshot each page as PNG, then OCR → far slower; loses vector quality
---
## Decision 3: Figure Content Representation
**Decision**: Generate a textual description of each extracted image using the OpenAI vision
model (GPT-4o). This description becomes the `content` field of the figure's vector store
document. The figure caption (parsed from the surrounding text) is also included to maximise
retrieval signal.
**Rationale**: Caption-only embedding would miss figures with no caption or with sparse labels.
Vision-generated descriptions produce richer semantic content (anatomy terms, structural
relationships) that matches clinical queries. The OpenAI client already in use supports image
inputs; no additional dependency is required.
**Alternatives considered**:
- Caption-only embedding → insufficient when captions are absent or terse (common in textbooks)
- Local vision model (LLaVA) → requires self-hosting; out of scope for POC
- OCR only → extracts text visible in image but misses non-text visual content (diagrams, MRI)
---
## Decision 4: Dual Vector Search
**Decision**: At query time, run two parallel similarity searches:
1. Text chunk search (filtered by `type = "TEXT"` and `book_id`)
2. Figure caption search (filtered by `type = "FIGURE"` and `book_id`)
Results are merged and deduplicated. The LLM prompt receives the expanded parent section text
plus a structured figure reference list.
**Rationale**: A single search would rank text and figures against each other; figures with
terse captions would systematically lose to text chunks. Separate searches with independent
`topK` allow tuning each modality independently.
**Alternatives considered**:
- Single search, filter by relevance score → figure captions score lower than text; figures
are systematically under-retrieved
- Post-process text results to look up linked figures only → misses figures that are relevant
to the query but not explicitly referenced in the retrieved text chunks
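The merge-and-deduplicate step can be sketched in plain Java. The sketch assumes figures arrive from two sources (the figure vector search and figures linked from retrieved text chunks) and are identified by a `figureId`; `FigureHit` and `mergeDistinct` are hypothetical names, not part of the codebase.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a figure returned by either retrieval path.
record FigureHit(String figureId, String label, int page) {}

final class FigureMerge {

    // Merges both lists and keeps the first occurrence of each figureId,
    // so vector-search hits take precedence over linked figures.
    static List<FigureHit> mergeDistinct(List<FigureHit> vectorHits,
                                         List<FigureHit> linkedFigures) {
        Map<String, FigureHit> byId = new LinkedHashMap<>();
        for (FigureHit figure : vectorHits) {
            byId.putIfAbsent(figure.figureId(), figure);
        }
        for (FigureHit figure : linkedFigures) {
            byId.putIfAbsent(figure.figureId(), figure);
        }
        return new ArrayList<>(byId.values());
    }
}
```

Giving vector hits precedence is a design choice made for this sketch; the opposite order would work equally well as long as each figure appears once in the prompt.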
---
## Decision 5: Chunk-to-Figure Linking
**Decision**: During text parsing, whenever a pattern matching `Fig\.\s+\d+[\-\.]\d+` or
`Figure\s+\d+[\-\.]\d+` is found in a chunk, insert a row into the `chunk_figure_ref` table
linking `chunkId` → `figureId`. At retrieval time, after text chunks are retrieved, their
associated figures are fetched from this table and added to the LLM prompt.
**Rationale**: Explicit linking ensures that when a text chunk is retrieved, its referenced
figures are always surfaced — even if the figure's caption did not score highly in the vector
search. This is the higher-recall path; dual search (Decision 4) is the higher-precision path.
**Alternatives considered**:
- Rely entirely on dual vector search → may miss figures referenced in retrieved text but
scoring below the topK threshold in the figure search
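A sketch of the reference scan, assuming a single combined regular expression adapted from the two patterns in this decision. `FigureRefScanner` is a hypothetical name, and normalising labels to the `Fig. N-M` form is an assumption about how labels are stored.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FigureRefScanner {

    // Matches "Fig. 12-4", "Fig. 12.4" and "Figure 12-4"; group 1 is the number.
    private static final Pattern FIGURE_REF =
            Pattern.compile("(?:Fig\\.|Figure)\\s+(\\d+[\\-.]\\d+)");

    // Returns normalised labels such as "Fig. 12-4" in order of first mention.
    // Duplicate mentions of the same figure collapse to one label.
    static Set<String> findFigureLabels(String chunkText) {
        Set<String> labels = new LinkedHashSet<>();
        Matcher matcher = FIGURE_REF.matcher(chunkText);
        while (matcher.find()) {
            labels.add("Fig. " + matcher.group(1).replace('.', '-'));
        }
        return labels;
    }
}
```

Each returned label would then be matched against the stored figure labels for the book, and one link row written per match.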
---
## Decision 6: Image Storage
**Decision**: Extracted images are saved as PNG files to a local directory,
`${app.figure-storage.base-path}/figures/{bookId}/` (the base path defaults to `./uploads`). The path is
stored in `figure.image_path` in Postgres. A `FigureStorageService` interface wraps all disk
I/O so the implementation can be swapped to S3 or another object store without changing
callers.
**Rationale**: Local disk is the simplest viable option for a POC with <10 users. The interface
boundary satisfies Constitution Principle II (Easy to Change).
**Alternatives considered**:
- S3 from day 1 → operational overhead not justified at POC scale
- Base64 in Postgres JSONB → bloats DB; complicates backup; query performance degrades
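The interface boundary might look like the sketch below. The three method signatures follow task T006 in the task list; the class name `LocalFigureStorageSketch`, the file-name sanitising rule, and the error handling are assumptions made for illustration.

```java
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.UUID;
import java.util.stream.Stream;
import javax.imageio.ImageIO;

interface FigureStorageService {
    Path save(UUID bookId, String figureId, BufferedImage image);
    Path resolve(UUID bookId, String filename);
    void delete(UUID bookId);
}

final class LocalFigureStorageSketch implements FigureStorageService {
    private final Path basePath;

    LocalFigureStorageSketch(Path basePath) {
        this.basePath = basePath;
    }

    @Override
    public Path save(UUID bookId, String figureId, BufferedImage image) {
        try {
            Path dir = Files.createDirectories(bookDir(bookId));
            // Replace characters that are awkward in file names (assumed rule).
            String filename = figureId.replaceAll("[^A-Za-z0-9_-]", "_") + ".png";
            Path target = dir.resolve(filename);
            ImageIO.write(image, "png", target.toFile());
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public Path resolve(UUID bookId, String filename) {
        return bookDir(bookId).resolve(filename);
    }

    @Override
    public void delete(UUID bookId) {
        Path dir = bookDir(bookId);
        if (!Files.exists(dir)) {
            return;
        }
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order deletes files before the directories that hold them.
            walk.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private Path bookDir(UUID bookId) {
        return basePath.resolve("figures").resolve(bookId.toString());
    }
}
```

Callers depend only on `FigureStorageService`, so an object-store implementation could replace the local one without touching the extraction pipeline.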
---
## Decision 7: Figure Type Classification
**Decision**: Use the enum `FigureType { ANATOMICAL_DIAGRAM, SURGICAL_PHOTOGRAPH, MRI_CT_SCAN,
TABLE, CHART, INTRAOPERATIVE_IMAGE }`. Classification is derived from:
1. Caption keywords ("MRI", "CT", "Fig.", "Table") — heuristic, no model needed
2. Fall back to `ANATOMICAL_DIAGRAM` if unclassifiable
**Rationale**: Allows the frontend to render different icon/label per type (e.g., "MRI" badge).
Heuristic classification avoids a separate model call per image at extraction time.
**Alternatives considered**:
- Vision model classification → accurate but adds latency and cost per figure; deferrable
- Single `FIGURE` type → loses citation granularity required by spec FR-004
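A keyword heuristic of this kind could be sketched as below. The enum values come from this decision; the specific keyword rules and their order are illustrative assumptions, since the actual mapping table lives in data-model.md and is not reproduced here.

```java
import java.util.Locale;
import java.util.regex.Pattern;

enum FigureType {
    ANATOMICAL_DIAGRAM, SURGICAL_PHOTOGRAPH, MRI_CT_SCAN,
    TABLE, CHART, INTRAOPERATIVE_IMAGE
}

final class FigureTypeClassifier {

    // Whole-word match so that "structure" does not trigger the "ct" rule.
    private static final Pattern SCAN_WORDS = Pattern.compile("\\b(mri|ct)\\b");

    // Classifies a caption by keywords; falls back to ANATOMICAL_DIAGRAM.
    // The rules below are assumed for illustration, not the project's table.
    static FigureType classify(String caption) {
        if (caption == null || caption.isBlank()) {
            return FigureType.ANATOMICAL_DIAGRAM;
        }
        String text = caption.toLowerCase(Locale.ROOT);
        if (SCAN_WORDS.matcher(text).find()) return FigureType.MRI_CT_SCAN;
        if (text.startsWith("table")) return FigureType.TABLE;
        if (text.contains("intraoperative")) return FigureType.INTRAOPERATIVE_IMAGE;
        if (text.contains("photograph")) return FigureType.SURGICAL_PHOTOGRAPH;
        if (text.contains("chart") || text.contains("graph")) return FigureType.CHART;
        return FigureType.ANATOMICAL_DIAGRAM;
    }
}
```

The "photograph" rule is checked before "graph" because the former contains the latter; rule order matters in any first-match heuristic like this one.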
---
## Decision 8: Metadata Schema for Vector Store Documents
**Decision**: All vector store documents carry a flat `Map<String, Object>` of metadata for Spring
AI filtering. Schema:
| Field | Text Chunk | Figure Chunk |
|-------|-----------|-------------|
| `type` | `"TEXT"` | `"FIGURE"` |
| `book_id` | ✓ | ✓ |
| `book_title` | ✓ | ✓ |
| `chapter_id` | ✓ | ✓ |
| `section_id` | ✓ | ✓ |
| `section_title` | ✓ | ✓ |
| `page_start` | ✓ | — |
| `page_end` | ✓ | — |
| `chunk_index` | ✓ | — |
| `total_chunks` | ✓ | — |
| `figure_id` | — | ✓ |
| `figure_type` | — | ✓ |
| `image_path` | — | ✓ |
| `label` | — | ✓ |
| `page` | — | ✓ |
**Rationale**: Flat map is required by Spring AI `FilterExpressionBuilder`. Separation by `type`
allows independent filtering in dual search.
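The schema can be expressed as two small builder methods. Field names are taken from the table above; the class name `ChunkMetadata`, the parameter lists, and the choice to pass identifiers as strings are assumptions for this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ChunkMetadata {

    // Fields shared by text and figure documents.
    private static Map<String, Object> base(String type, String bookId, String bookTitle,
            String chapterId, String sectionId, String sectionTitle) {
        Map<String, Object> m = new LinkedHashMap<>();
        m.put("type", type);
        m.put("book_id", bookId);
        m.put("book_title", bookTitle);
        m.put("chapter_id", chapterId);
        m.put("section_id", sectionId);
        m.put("section_title", sectionTitle);
        return m;
    }

    static Map<String, Object> forTextChunk(String bookId, String bookTitle, String chapterId,
            String sectionId, String sectionTitle,
            int pageStart, int pageEnd, int chunkIndex, int totalChunks) {
        Map<String, Object> m = base("TEXT", bookId, bookTitle, chapterId, sectionId, sectionTitle);
        m.put("page_start", pageStart);
        m.put("page_end", pageEnd);
        m.put("chunk_index", chunkIndex);
        m.put("total_chunks", totalChunks);
        return m;
    }

    static Map<String, Object> forFigure(String bookId, String bookTitle, String chapterId,
            String sectionId, String sectionTitle,
            String figureId, String figureType, String imagePath, String label, int page) {
        Map<String, Object> m = base("FIGURE", bookId, bookTitle, chapterId, sectionId, sectionTitle);
        m.put("figure_id", figureId);
        m.put("figure_type", figureType);
        m.put("image_path", imagePath);
        m.put("label", label);
        m.put("page", page);
        return m;
    }
}
```

Every value is a string or an integer, which keeps the map flat and usable in metadata filter expressions.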
---
## Decision 9: Re-embedding Existing Books
**Decision**: Books already processed under feature 001 (text-only) are NOT automatically
re-embedded. An explicit re-embed action is exposed via `POST /api/v1/books/{id}/reembed`
(admin-triggered). The existing chunks remain valid for text queries until re-embedding completes.
**Rationale**: Automatic re-embedding on deploy would block the system and risk data loss if
the process fails mid-way. An explicit, idempotent trigger is safer and more observable.
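The guard around the trigger can be sketched as a pure function. The response codes follow task T020 in the task list; the two-value `BookStatus` enum is a deliberately partial, hypothetical version of the real one, and `ReembedPolicy` is an illustrative name.

```java
import java.util.Optional;

// Only the two states named in this document; the real enum may define more.
enum BookStatus { PROCESSING, READY }

final class ReembedPolicy {

    // Returns the HTTP status the re-embed endpoint would answer with.
    static int statusFor(Optional<BookStatus> currentStatus) {
        if (currentStatus.isEmpty()) {
            return 404; // book not found
        }
        if (currentStatus.get() == BookStatus.PROCESSING) {
            return 409; // a run is already in progress
        }
        return 202; // accepted; the pipeline runs asynchronously
    }
}
```

Rejecting a second request while a run is in progress is what keeps repeated triggers from starting overlapping pipelines.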
---
## Decision 10: Minimum Image Size Threshold
**Decision**: Images whose width or height is below 100 pixels are discarded and no chunk is
created for them. This threshold filters out decorative elements (bullets, dividers, publisher
logos) without a classification model.
**Rationale**: Neurosurgery textbook diagrams and MRI scans are never smaller than 100×100 px.
The threshold is configurable via `app.figure-storage.min-image-size-px` in
`application.properties`.
**Alternatives considered**:
- No threshold → decorative icons pollute the figure index
- ML-based classification → accurate but adds model dependency; not needed at POC scale
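The size check reduces to a one-line predicate. The sketch assumes an image is kept only when both dimensions reach the configured minimum; `ImageSizeFilter` is a hypothetical name.

```java
final class ImageSizeFilter {

    // Keeps an image only if neither dimension is below the minimum.
    static boolean isLargeEnough(int widthPx, int heightPx, int minSizePx) {
        return widthPx >= minSizePx && heightPx >= minSizePx;
    }
}
```

With the configured minimum of 100, a 640×48 divider is rejected even though it is wide, because its height falls below the threshold.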
@@ -0,0 +1,176 @@
# Feature Specification: Enhanced Embedding with Image Parsing and Metadata
**Feature Branch**: `002-image-aware-embedding`
**Created**: 2026-04-03
**Status**: Draft
**Input**: User description: "I want to enhance the embedding process. I want also parse image from each pages if any and add proper metadata so that it can match the retrieved chunk/vector that match what user are querying."
## User Scenarios & Testing *(mandatory)*
### User Story 1 - Image Content Surfaced in Query Results (Priority: P1)
A neurosurgeon asks a question in the chat (e.g., "Show me the anatomy of the Circle of Willis")
that is best answered by a diagram or figure in an uploaded book. The system retrieves the image
content — its description and surrounding context — and uses it to construct a grounded answer,
citing the page and book where the image appeared.
**Why this priority**: This is the direct, user-visible payoff of the feature. Without it, the
enhancement has no observable benefit. All other stories support this outcome.
**Independent Test**: Upload a book containing a labelled anatomical diagram. Ask a query whose
answer is conveyed by that diagram (not in the surrounding text). Confirm the system returns an
answer that references the diagram's content and cites the correct book and page.
**Acceptance Scenarios**:
1. **Given** a book with an anatomical diagram on page 42, **When** a user asks a question whose
answer is only depicted in that diagram, **Then** the system returns a response that draws on
the diagram's content and cites "Page 42, [Book Title]".
2. **Given** a page with both text and an image, **When** the system retrieves that page's content,
**Then** the image-derived content and the surrounding text are each independently retrievable
and independently citable.
3. **Given** a query that has no relevant image in any uploaded book, **When** the system searches,
**Then** it does not fabricate image-derived content and falls back to text-only results (or
states no relevant content was found).
---
### User Story 2 - All Pages Scanned for Images During Embedding (Priority: P1)
When a book is uploaded and processed, every page is inspected for images. Any image found is
extracted and represented as a searchable content chunk enriched with metadata (page number,
book title, position on page, caption if present). Pages without images are processed as
text-only chunks, unchanged from the existing behaviour.
**Why this priority**: This is the prerequisite for User Story 1. Without systematic per-page
image detection, image content cannot be retrieved.
**Independent Test**: Upload a book whose pages include a mix of text-only and image-containing
pages. After processing completes, verify that chunks exist for each image page and that each
image chunk carries the correct metadata (page number, source book, caption).
**Acceptance Scenarios**:
1. **Given** a book being processed, **When** the embedding pipeline runs, **Then** every page
is evaluated for images and each detected image generates at least one content chunk.
2. **Given** an image with a caption or label, **When** the chunk is created, **Then** the
caption or label text is included in the chunk's content and metadata.
3. **Given** a page with multiple images, **When** processing completes, **Then** each image is
represented as a separate chunk with its own metadata, not merged into a single chunk.
4. **Given** a page with no images, **When** processing completes, **Then** no image chunk is
created for that page and text processing is unaffected.
---
### User Story 3 - Rich Metadata Enables Precise Source Attribution (Priority: P2)
When the system returns a result based on image content, the user can see exactly where that
image appeared: which book, which page, and what type of content (diagram, table, photograph,
etc.). This gives the user confidence in the source and lets them locate the original image
in their physical or digital copy of the book.
**Why this priority**: Metadata quality directly impacts user trust. Neurosurgeons require
traceable, citable evidence. Richer metadata also improves retrieval accuracy by giving the
search engine more signals to match against a query.
**Independent Test**: Retrieve a result sourced from an image chunk. Inspect the displayed
citation and verify it includes: book title, page number, content type (e.g., "diagram"),
and caption (if present in the original).
**Acceptance Scenarios**:
1. **Given** a retrieved image chunk, **When** the system displays the source citation,
**Then** the citation includes at minimum: book title, page number, and a content-type
label (e.g., diagram, table, figure).
2. **Given** an image chunk with a detected caption, **When** the citation is displayed,
**Then** the caption text is shown alongside the other metadata fields.
3. **Given** a topic summary that draws on both text and image chunks, **When** the user
inspects citations, **Then** image-sourced and text-sourced claims are distinguishable
from each other.
---
### Edge Cases
- What happens when an image is too small to contain meaningful content (e.g., a decorative
bullet icon or a publisher logo)?
- How does the system handle a page that is entirely an image (scanned page with no digital text)?
- What if an image spans multiple pages (e.g., a fold-out diagram)?
- How does the system behave when an image has no caption and its surrounding text provides
no useful context?
- What happens if image processing fails for a specific page — does it abort the whole book
or continue with the remaining pages?
## Requirements *(mandatory)*
### Functional Requirements
- **FR-001**: System MUST inspect every page of an uploaded book for the presence of images
during the embedding process.
- **FR-002**: System MUST extract each detected image and create a dedicated, independently
searchable content chunk for it.
- **FR-003**: System MUST generate a descriptive textual representation of each extracted
image so its content is semantically searchable by the retrieval system.
- **FR-004**: System MUST associate the following metadata with every image chunk: book title,
page number, content type (e.g., diagram, table, figure, photograph), and caption text
(where present).
- **FR-005**: System MUST include the same base metadata (book title, page number) on text
chunks so that all retrieved content — image or text — carries consistent, comparable
source attribution.
- **FR-006**: System MUST treat image chunks as first-class retrievable units: they must be
ranked and returned alongside text chunks when they are relevant to a user query.
- **FR-007**: System MUST skip images that fall below a minimum meaningful-content threshold
(e.g., decorative icons, page separators) and MUST NOT create chunks for them.
- **FR-008**: If image processing fails for a specific page, the system MUST log the failure,
skip that page's image, and continue processing the remaining pages and text content of
the book.
- **FR-009**: System MUST display image-sourced content citations distinctly from text-sourced
citations so users can identify when a result originates from a visual element.
- **FR-010**: Processing a book that contains images MUST NOT degrade the accuracy or
completeness of the existing text-only embedding for that book.
### Key Entities
- **Image Chunk**: A searchable content unit derived from a page image. Attributes: generated
description, source book title, page number, content type, caption (optional), embedding vector.
- **Text Chunk**: Existing unit; extended to carry explicit metadata: source book title,
page number, section heading (if detectable), content type ("text").
- **Chunk Metadata**: Structured attributes attached to every chunk regardless of type,
enabling consistent filtering and citation. Mandatory fields: book title, page number,
content type. Optional fields: caption, section heading.
## Success Criteria *(mandatory)*
### Measurable Outcomes
- **SC-001**: At least 90% of pages containing images in a test book result in a retrievable
image chunk after processing completes.
- **SC-002**: A controlled set of 10 queries whose answers are conveyed by diagrams in an
uploaded book returns at least 7 correct image-sourced answers (70% recall on image queries).
- **SC-003**: Embedding processing time for a book with images increases by no more than 3×
compared to processing the same book as text-only, for books up to 500 pages.
- **SC-004**: Every retrieved result — text or image — includes a citation that identifies
at minimum the source book title and page number, with 100% coverage across a test result set.
- **SC-005**: In a user evaluation with 5 representative queries that previously returned
no useful results (because the answer was only in a diagram), at least 4 now return a
useful, grounded answer.
## Assumptions
- Books are still uploaded exclusively as PDFs; image parsing applies to PDF pages only.
- The platform already has a working text-only embedding pipeline (from feature 001); this
feature enhances it without replacing or rewriting the text processing logic.
- Images worth processing are those that occupy a meaningful portion of the page; small
decorative or structural images (logos, dividers, icons) are excluded based on a size
threshold determined during implementation.
- The descriptive representation of an image (FR-003) is generated at embedding time, not
at query time; query latency is not affected by image interpretation.
- The shared global book library model from feature 001 is retained; image chunks from a
processed book are available to all users immediately upon completion.
- Scanned pages (fully rasterised pages with no digital text layer) are treated as a single
full-page image; the system attempts to extract content from them but does not guarantee
the same fidelity as pages with digital text.
- Per-chunk metadata is stored alongside the vector so it can be used for both retrieval
filtering and source citation display without a separate lookup.
- Books already processed under feature 001 (text-only) are not automatically re-processed;
re-embedding must be triggered explicitly by the user or an administrator.
@@ -0,0 +1,168 @@
# Tasks: Enhanced Embedding with Image Parsing and Metadata
**Input**: Design documents from `/specs/002-image-aware-embedding/`
**Prerequisites**: plan.md ✓ | spec.md ✓ | research.md ✓ | data-model.md ✓ | contracts/ ✓
**Organization**: Tasks grouped by user story to enable independent implementation and testing.
## Format: `[ID] [P?] [Story] Description`
- **[P]**: Can run in parallel (different files, no shared dependencies)
- **[US1/US2/US3]**: Which user story this task belongs to
---
## Phase 1: Setup (Shared Infrastructure)
**Purpose**: Database migrations and configuration that establish the foundation for all new code
- [X] T001 Create Flyway migration `V4__document_hierarchy.sql` — add `chapter` and `section` tables per data-model.md §Postgres Schema in `backend/src/main/resources/db/migration/V4__document_hierarchy.sql`
- [X] T002 Create Flyway migration `V5__figures_and_refs.sql` — add `figure` and `chunk_figure_ref` tables per data-model.md §Postgres Schema in `backend/src/main/resources/db/migration/V5__figures_and_refs.sql`
- [X] T003 Add figure-storage configuration keys to `backend/src/main/resources/application.properties`: `app.figure-storage.base-path=./uploads` and `app.figure-storage.min-image-size-px=100`
- [X] T004 Add `uploads/` directory to `.gitignore` at repo root; create `uploads/figures/.gitkeep` to preserve directory structure
---
## Phase 2: Foundational (Blocking Prerequisites)
**Purpose**: Core types and infrastructure that ALL user stories depend on — nothing in Phase 3+ can start until this phase is complete
**⚠️ CRITICAL**: No user story work can begin until this phase is complete
- [X] T005 [P] Create `FigureType` enum in `backend/src/main/java/com/aiteacher/document/FigureType.java` — values: `ANATOMICAL_DIAGRAM`, `SURGICAL_PHOTOGRAPH`, `MRI_CT_SCAN`, `TABLE`, `CHART`, `INTRAOPERATIVE_IMAGE`
- [X] T006 [P] Create `FigureStorageService` interface in `backend/src/main/java/com/aiteacher/figure/FigureStorageService.java` — declare `Path save(UUID bookId, String figureId, BufferedImage image)`, `Path resolve(UUID bookId, String filename)`, and `void delete(UUID bookId)`
- [X] T007 Create `LocalFigureStorageService` implementation in `backend/src/main/java/com/aiteacher/figure/LocalFigureStorageService.java` — writes PNG files under `${app.figure-storage.base-path}/figures/{bookId}/`; implements `FigureStorageService`; depends on T006
- [X] T008 Create `FigureStorageConfig` bean in `backend/src/main/java/com/aiteacher/config/FigureStorageConfig.java` — reads `app.figure-storage.base-path` and `app.figure-storage.min-image-size-px` as `@ConfigurationProperties`; registers `LocalFigureStorageService` as `@Bean`; adds `ResourceHandler` mapping `GET /api/v1/figures/**` to the base-path directory
- [X] T009 [P] Create `ChapterEntity` JPA entity and `ChapterRepository` in `backend/src/main/java/com/aiteacher/document/`: `@Entity(name="chapter")`, fields: `id` (String PK), `bookId` (UUID FK → book), `number` (int), `title` (String), `pageStart` (int), `createdAt` (Instant); `ChapterRepository extends JpaRepository<ChapterEntity, String>`
- [X] T010 [P] Create `SectionEntity` JPA entity and `SectionRepository` in `backend/src/main/java/com/aiteacher/document/`: `@Entity(name="section")`, fields: `id` (String PK), `chapterId` (String FK → chapter), `bookId` (UUID FK → book), `number` (String), `title` (String), `pageStart`/`pageEnd` (int), `fullText` (TEXT column), `createdAt` (Instant); `SectionRepository extends JpaRepository<SectionEntity, String>` with `findAllByBookId(UUID)`
- [X] T011 [P] Create `FigureEntity` JPA entity and `FigureRepository` in `backend/src/main/java/com/aiteacher/document/`: `@Entity(name="figure")`, fields: `id` (String PK), `bookId` (UUID), `sectionId` (String, nullable), `chapterId` (String, nullable), `label` (String), `caption` (TEXT), `figureType` (`@Enumerated` FigureType), `page` (int), `imagePath` (String), `captionEmbeddingId` (UUID, nullable), `createdAt` (Instant); `FigureRepository` with `findAllByBookId(UUID)`, `deleteAllByBookId(UUID)`
- [X] T012 Create `ChunkFigureRefEntity` JPA entity and `ChunkFigureRefRepository` in `backend/src/main/java/com/aiteacher/document/` — composite PK `(chunkId UUID, figureId String)`, `mentionPage` (int); `ChunkFigureRefRepository` with `findByChunkIdIn(List<UUID>)`, `deleteByFigureIdIn(List<String>)`
**Checkpoint**: Migrations will run on next startup; all JPA entities are wired; figure storage reads config correctly
---
## Phase 3: User Story 2 — All Pages Scanned for Images During Embedding (Priority: P1)
**Goal**: When a book is uploaded, every page is inspected for images; each found image is extracted, persisted, described, and embedded as a searchable chunk alongside its metadata
**Independent Test**: Upload a PDF containing at least one page with a labelled anatomical diagram. After status shows `READY`, call `GET /api/v1/books/{id}/figures` — response must contain at least one entry with `figureType`, `caption`, `page`, and `imageUrl` populated. Verify the PNG file exists at the path in `imagePath`.
- [X] T013 [US2] Create `PdfStructureParser` service in `backend/src/main/java/com/aiteacher/document/PdfStructureParser.java` — uses Spring AI's `PagePdfDocumentReader` to extract per-page text; groups pages into `SectionEntity` records using heading-detection heuristics (lines matching `^\d+(\.\d+)*\s+[A-Z]`); groups sections into `ChapterEntity` records; persists both to Postgres via `ChapterRepository` and `SectionRepository`; returns `List<SectionEntity>` for the book
- [X] T014 [US2] Create `FigureExtractionService` in `backend/src/main/java/com/aiteacher/document/FigureExtractionService.java` — opens PDF with PDFBox `PDDocument`; iterates pages; extracts `PDImageXObject` instances; skips images whose width or height are below `min-image-size-px`; classifies `FigureType` using the keyword-matching table from data-model.md §FigureType; parses caption from the nearest text line matching `CAPTION_PATTERN`; saves PNG via `FigureStorageService`; persists `FigureEntity` to `FigureRepository`; returns `List<FigureEntity>` per book
- [X] T015 [US2] Create `VisionDescriptionService` in `backend/src/main/java/com/aiteacher/document/VisionDescriptionService.java`: accepts a `Path` to a PNG and a caption String; calls the OpenAI vision model (via Spring AI `ChatClient` with image media type) to generate a 2-4 sentence clinical description; returns the generated description string; handles API failures by returning the caption as fallback
- [X] T016 [US2] Create `TextChunkingService` in `backend/src/main/java/com/aiteacher/document/TextChunkingService.java`: accepts a `SectionEntity`; splits `fullText` into overlapping 400-600 token windows (20-token overlap); wraps each window in a Spring AI `Document` with the flat metadata map defined in data-model.md §Text chunk document; returns `List<Document>`
- [X] T017 [US2] Create `ChunkFigureRefService` in `backend/src/main/java/com/aiteacher/document/ChunkFigureRefService.java` — accepts a Spring AI `Document` (with its `id` as `chunkId`) and a `List<FigureEntity>` for the book; scans chunk text for patterns `Fig\.\s*\d+[\-\.]\d+` and `Figure\s+\d+[\-\.]\d+`; matches against figure labels; persists `ChunkFigureRefEntity` rows via `ChunkFigureRefRepository`
- [X] T018 [US2] Rewrite `BookEmbeddingService.embedBook()` in `backend/src/main/java/com/aiteacher/book/BookEmbeddingService.java` to orchestrate the full pipeline: (1) `PdfStructureParser` → sections; (2) parallel: `FigureExtractionService` + `TextChunkingService` for each section; (3) `VisionDescriptionService` for each figure; (4) embed figure captions+descriptions as `Document`s (metadata per data-model.md §Figure caption document) into `vectorStore`; (5) embed text chunks into `vectorStore`; (6) `ChunkFigureRefService` for each chunk; update `captionEmbeddingId` on `FigureEntity` after embedding
- [X] T019 [US2] Extend `BookEmbeddingService.deleteBookChunks()` to also delete: all `ChunkFigureRefEntity` rows (via `deleteByFigureIdIn`), all `FigureEntity` rows (via `deleteAllByBookId`), all figure PNG files (via `FigureStorageService.delete(bookId)`), all `SectionEntity` and `ChapterEntity` rows for the book
- [X] T020 [US2] Add `POST /api/v1/books/{id}/reembed` endpoint to `BookController` in `backend/src/main/java/com/aiteacher/book/BookController.java` — returns `202` with `{ bookId, status: "PROCESSING" }`; returns `404` if not found; returns `409` if already `PROCESSING`; calls `deleteBookChunks()` then `embedBook()` asynchronously
**Checkpoint**: Upload a PDF with figures → poll `GET /api/v1/books` for `READY` → `GET /api/v1/books/{id}/figures` returns figure list → PNG accessible at `GET /api/v1/figures/{bookId}/{filename}`
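For reference, the overlapping-window split in T016 can be sketched as below. Whitespace-separated words stand in for model tokens here, which is only an approximation of real tokenisation; `OverlapChunker` and its parameters are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class OverlapChunker {

    // Splits text into windows of at most windowSize tokens in which each
    // window repeats the last `overlap` tokens of the previous one.
    static List<String> chunk(String text, int windowSize, int overlap) {
        if (windowSize <= 0 || overlap < 0 || overlap >= windowSize) {
            throw new IllegalArgumentException("require 0 <= overlap < windowSize");
        }
        List<String> chunks = new ArrayList<>();
        String trimmed = text.strip();
        if (trimmed.isEmpty()) {
            return chunks;
        }
        String[] tokens = trimmed.split("\\s+");
        int step = windowSize - overlap;
        for (int start = 0; start < tokens.length; start += step) {
            int end = Math.min(start + windowSize, tokens.length);
            chunks.add(String.join(" ", Arrays.copyOfRange(tokens, start, end)));
            if (end == tokens.length) {
                break; // the final window already reaches the end of the text
            }
        }
        return chunks;
    }
}
```

Calling `OverlapChunker.chunk(section, 500, 20)` would approximate the window size and overlap named in the task; a production version would count tokens with the embedding model's tokenizer instead.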
---
## Phase 4: User Story 1 — Image Content Surfaced in Query Results (Priority: P1)
**Goal**: User asks a question answered by a diagram — the system retrieves that diagram's content and surfaces it in the chat response with a citation
**Independent Test**: With a book embedded (Phase 3 checkpoint passed), ask a chat question whose answer is depicted only in a diagram. The response `sources` array must contain at least one entry with `type: "FIGURE"` and a non-empty `imageUrl`.
- [X] T021 [US1] Create `NeurosurgeryRetriever` service in `backend/src/main/java/com/aiteacher/retrieval/NeurosurgeryRetriever.java` — (1) text chunk search: `vectorStore.similaritySearch` with filter `type == TEXT AND book_id == bookId`, topK=5; (2) figure search: same store, filter `type == FIGURE AND book_id == bookId`, topK=3; (3) expand text chunk results to parent sections via `SectionRepository.findAllById(sectionIds)`; (4) fetch explicitly linked figures via `ChunkFigureRefRepository.findByChunkIdIn(chunkIds)` + `FigureRepository.findAllById`; (5) deduplicate figures across lists by `figureId`; return `RetrievalResult(parentSections, figureVectorHits, linkedFigures)` — add `RetrievalResult` record in same package
- [X] T022 [US1] Refactor `ChatService.sendMessage()` in `backend/src/main/java/com/aiteacher/chat/ChatService.java` — replace `QuestionAnswerAdvisor` with a manual call to `NeurosurgeryRetriever`; build the LLM user message from: section full texts as `[Section X.Y — Title, pp.A-B]\n{fullText}` blocks, followed by `AVAILABLE FIGURES FOR THIS SECTION:` list with `- {label} (p.{page}): {caption} [image: {filename}]` lines per figure; append the instruction `When referencing diagrams, cite them as [Fig. X, p.N].`; send via `chatClient.prompt().system(SYSTEM_PROMPT).user(prompt).call()`
- [X] T023 [US1] Add `GET /api/v1/books/{id}/figures` endpoint to `BookController` — returns `200` with `List<FigureResponse>`; `FigureResponse` is a new record in `backend/src/main/java/com/aiteacher/book/FigureResponse.java` with fields `figureId`, `label`, `caption`, `figureType`, `page`, `imageUrl` (assembled as `/api/v1/figures/{bookId}/{filename}`), `sectionId`, `sectionTitle`; returns `404` if book not found
- [X] T024 [US1] Update `extractSources()` in `ChatService` to build both TEXT and FIGURE source entries: TEXT entries keep existing fields plus `"type": "TEXT"`; FIGURE entries add `"type": "FIGURE"`, `"figureId"`, `"label"`, `"caption"`, `"figureType"`, `"imageUrl"` — source data comes from `RetrievalResult` (text chunk Documents and merged FigureEntity list)
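
The prompt layout described in T022 can be sketched as below. This is a minimal illustration, not the project's actual `ChatService` code: the `SectionView` and `FigureView` records and the `PromptSketch` class are hypothetical stand-ins for the real entities, and appending the user's question at the end is an assumption, since the task text only fixes the section blocks, the figure list, and the citation instruction.

```java
import java.util.List;

// Hypothetical read-only views; the real JPA entities may expose different fields.
record SectionView(String number, String title, int startPage, int endPage, String fullText) {}
record FigureView(String label, int page, String caption, String filename) {}

final class PromptSketch {
    private PromptSketch() {}

    // Builds the LLM user message: section blocks, then the figure list,
    // then the citation instruction, then (assumption) the question itself.
    static String buildUserPrompt(String question, List<SectionView> sections, List<FigureView> figures) {
        StringBuilder sb = new StringBuilder();
        for (SectionView s : sections) {
            // \u2014 is the dash used in the section header format from T022.
            sb.append("[Section ").append(s.number()).append(" \u2014 ").append(s.title())
              .append(", pp.").append(s.startPage()).append('-').append(s.endPage()).append("]\n")
              .append(s.fullText()).append("\n\n");
        }
        if (!figures.isEmpty()) {
            sb.append("AVAILABLE FIGURES FOR THIS SECTION:\n");
            for (FigureView f : figures) {
                sb.append("- ").append(f.label()).append(" (p.").append(f.page()).append("): ")
                  .append(f.caption()).append(" [image: ").append(f.filename()).append("]\n");
            }
            sb.append('\n');
        }
        sb.append("When referencing diagrams, cite them as [Fig. X, p.N].\n\n");
        sb.append("Question: ").append(question);
        return sb.toString();
    }
}
```

The returned string would then be passed as the `prompt` argument in `chatClient.prompt().system(SYSTEM_PROMPT).user(prompt).call()`, as T022 specifies.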
**Checkpoint**: Chat question answered by a diagram → response body contains `sources[n].type == "FIGURE"` with populated `imageUrl`; image loads from the returned URL
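
The `sources` shape this checkpoint verifies could be produced along the following lines. This is a hedged sketch only: `TextHit`, `FigureHit`, and `SourcesSketch` are hypothetical names, the `bookTitle` and `page` keys are placeholders for whatever the existing TEXT fields are (this section does not list them), and the `imageUrl` shape is copied from T023.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical inputs; the real code reads text chunk Documents and FigureEntity rows.
record TextHit(String bookTitle, int page) {}
record FigureHit(String figureId, String bookId, String label, String caption,
                 String figureType, String filename) {}

final class SourcesSketch {
    private SourcesSketch() {}

    // Emits TEXT entries first, then FIGURE entries, each tagged with a "type" key.
    static List<Map<String, Object>> buildSources(List<TextHit> textHits, List<FigureHit> figures) {
        List<Map<String, Object>> sources = new ArrayList<>();
        for (TextHit t : textHits) {
            Map<String, Object> entry = new LinkedHashMap<>();
            entry.put("type", "TEXT");
            entry.put("bookTitle", t.bookTitle()); // placeholder for the existing TEXT fields
            entry.put("page", t.page());
            sources.add(entry);
        }
        for (FigureHit f : figures) {
            Map<String, Object> entry = new LinkedHashMap<>();
            entry.put("type", "FIGURE");
            entry.put("figureId", f.figureId());
            entry.put("label", f.label());
            entry.put("caption", f.caption());
            entry.put("figureType", f.figureType());
            // URL shape taken from T023: /api/v1/figures/{bookId}/{filename}
            entry.put("imageUrl", "/api/v1/figures/" + f.bookId() + "/" + f.filename());
            sources.add(entry);
        }
        return sources;
    }
}
```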
---
## Phase 5: User Story 3 — Rich Metadata Enables Precise Source Attribution (Priority: P2)
**Goal**: Users see distinct, informative citations for text vs. image sources; image sources render inline in the chat UI
**Independent Test**: After triggering a response with figure sources, inspect the chat message in the UI — text sources and figure sources are visually distinguishable; figure sources render the actual image inline using the `imageUrl`
- [X] T025 [P] [US3] Update API response types in `frontend/src/services/api.ts` — extend the `Source` type to include `type: 'TEXT' | 'FIGURE'`, `figureId?: string`, `label?: string`, `caption?: string`, `figureType?: string`, `imageUrl?: string`
- [X] T026 [P] [US3] Update the chat source/citation display in the frontend (wherever sources are currently rendered, e.g. `frontend/src/components/` or `frontend/src/views/`) — render TEXT sources with a document icon and page number; render FIGURE sources with the image (`<img :src="source.imageUrl">`) below the label and caption text
- [X] T027 [US3] Add figure-type badge rendering in the frontend figure display: show a label derived from `figureType` (e.g. "MRI / CT", "Anatomical Diagram", "Table") alongside the figure caption so users can identify content type without opening the image
---
## Phase 6: Polish & Cross-Cutting Concerns
- [X] T028 Update `README.md` Mermaid architecture diagram to show three storage tiers: pgvector (semantic search), Postgres (source of truth — sections, figures, refs), and file store (extracted PNGs) — **required by Constitution Principle IV in the same PR as the other changes**
- [X] T029 [P] Write `FigureExtractionServiceTest` unit test in `backend/src/test/java/com/aiteacher/document/FigureExtractionServiceTest.java` — test: images below min size are skipped; `FigureType` classification matches keyword table in data-model.md; caption parsed from adjacent text line
- [X] T030 [P] Write `NeurosurgeryRetrieverTest` unit test in `backend/src/test/java/com/aiteacher/retrieval/NeurosurgeryRetrieverTest.java` — test: figure IDs from both vector hits and chunk refs are merged without duplicates; `RetrievalResult` contains the deduplicated set
- [X] T031 Run quickstart.md validation end-to-end: upload a real PDF with a labelled diagram → wait for `READY` → call `GET /api/v1/books/{id}/figures` → send a chat message about the diagram → verify `sources` contains a `FIGURE` entry → verify `imageUrl` resolves to a PNG
---
## Dependencies & Execution Order
### Phase Dependencies
- **Phase 1 (Setup)**: No dependencies — start immediately
- **Phase 2 (Foundational)**: Requires Phase 1 complete (migrations must run before JPA entities can be wired)
- **Phase 3 (US2)**: Requires Phase 2 complete — all JPA entities + FigureStorageService must exist
- **Phase 4 (US1)**: Requires Phase 3 complete — figures must exist in Postgres + vector store before retrieval can surface them
- **Phase 5 (US3)**: Requires Phase 4 complete — frontend depends on the extended `sources` format from T024
- **Phase 6 (Polish)**: Requires all story phases complete
### Within Phase 3 (Embedding Pipeline)
```
T013 (PdfStructureParser) ──────────────────────────┐
T014 (FigureExtractionService) ─────────────────────┤
T015 (VisionDescriptionService) ────────────────────┤─→ T018 (BookEmbeddingService orchestrator)
T016 (TextChunkingService) ─────────────────────────┤ └─→ T019 (cleanup)
T017 (ChunkFigureRefService) ───────────────────────┘ └─→ T020 (reembed endpoint)
```
T013–T017 can be implemented in parallel (different files, no shared dependencies). T018 depends on all of them.
### Within Phase 4 (Retrieval)
```
T021 (NeurosurgeryRetriever) ──────────────────────┐
                                                   └─→ T022 (ChatService update)
                                                       └─→ T024 (extractSources update)
T023 (figures endpoint) ── independent [P]
```
### Parallel Opportunities per Phase
**Phase 2**: T005, T006, T009, T010, T011 can all run in parallel. T007 depends on T006. T012 can follow T010/T011.
**Phase 3**: T013, T014, T015, T016, T017 all in parallel. T018 depends on all.
**Phase 5**: T025 and T026 in parallel; T027 can follow T026.
**Phase 6**: T029 and T030 in parallel.
---
## Implementation Strategy
### MVP: User Story 2 Only (Embedding Pipeline)
1. Phase 1 (Setup) → Phase 2 (Foundational) → Phase 3 (US2, T013–T020)
2. **Validate**: `GET /api/v1/books/{id}/figures` returns figures for a test book
3. **Stop and demo** — the pipeline produces image chunks without any retrieval changes
### Full Feature Delivery
1. Phase 1 + 2 → Foundation ready
2. Phase 3 (US2) → Embedding pipeline produces image chunks ← **demo point**
3. Phase 4 (US1) → Chat surfaces image content in responses ← **core payoff**
4. Phase 5 (US3) → Frontend renders inline figures with type badges
5. Phase 6 (Polish) → README, tests, end-to-end validation
---
## Notes
- [P] tasks = different files, no dependencies on each other within the same phase
- [US1/US2/US3] label maps each task to a user story for traceability
- Phase 3 (US2) must be fully complete before beginning Phase 4 (US1) — retrieval cannot surface figures that do not yet exist
- The `uploads/figures/` directory must exist and be writable at runtime; `FigureStorageService` creates subdirectories automatically
- Re-embedding (T020) deletes all existing chunks and figures for the book before re-running — safe to call on books processed by feature 001
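
The directory requirement in the notes above could be enforced roughly as follows. This is a sketch under stated assumptions, not the actual `FigureStorageService`: the class and method names are hypothetical, the per-book subdirectory layout is inferred from the `/api/v1/figures/{bookId}/{filename}` URL shape, and the path-traversal guard is an extra precaution not mentioned in the task list.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class FigureDirSketch {
    private FigureDirSketch() {}

    // Returns {root}/{bookId}, creating it if needed.
    // The root itself must already exist and be writable.
    static Path ensureBookDir(Path root, String bookId) throws IOException {
        if (!Files.isDirectory(root) || !Files.isWritable(root)) {
            throw new IOException("Figure root missing or not writable: " + root);
        }
        Path bookDir = root.resolve(bookId).normalize();
        // Guard against a bookId such as "../x" escaping the root.
        if (!bookDir.startsWith(root.normalize())) {
            throw new IOException("Invalid book id: " + bookId);
        }
        // createDirectories does nothing if the directory already exists.
        return Files.createDirectories(bookDir);
    }
}
```

Calling `FigureDirSketch.ensureBookDir(Path.of("uploads/figures"), bookId)` before writing a PNG would fail fast when the root is missing, rather than partway through an embedding run.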