Internal Contract: DocumentAiPageParser → FigureExtractionService

Branch: 002-image-aware-embedding | Date: 2026-04-04
Type: Internal Java DTO (not an HTTP contract)


Purpose

PageResult is the internal data transfer object produced by DocumentAiPageParser for each PDF page. It decouples the Google Document AI SDK types from the rest of the pipeline so that PdfStructureParser can be replaced without cascading changes.


Java Record

package com.aiteacher.document;

import java.util.List;

/**
 * Internal DTO produced by DocumentAiPageParser for one PDF page.
 * Decouples the Document AI SDK types from downstream services.
 */
public record PageResult(
    int pageNumber,           // 1-based, matches Document.Page.getPageNumber()
    String orderedText,       // full page text in correct reading order (blocks joined by \n\n)
    String headingTitle,      // first HEADING block on page, or null
    List<FigureBbox> figures  // detected figure regions (may be empty)
) {

    /**
     * Normalized bounding box for a detected figure region.
     * Coordinates are in the [0.0, 1.0] range relative to page dimensions.
     */
    public record FigureBbox(
        float x,       // left edge (normalized)
        float y,       // top edge (normalized)
        float width,   // width (normalized)
        float height,  // height (normalized)
        String nearestCaption  // text of adjacent paragraph block, or null
    ) {}
}
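
As a rough illustration of how the orderedText and headingTitle components might be filled, the sketch below works on a hypothetical, SDK-free Block record (not a Document AI type). The block-type strings follow the names used in this contract; stripping surrounding whitespace is an assumption, and table rendering is left out.

```java
import java.util.List;
import java.util.stream.Collectors;

final class PageTextSketch {

    // Hypothetical view of one layout block, already in reading order.
    record Block(String blockType, String text) {}

    // True for HEADING_1 through HEADING_6.
    static boolean isHeading(Block b) {
        return b.blockType().matches("HEADING_[1-6]");
    }

    // Only paragraphs and headings contribute to orderedText in this sketch.
    static boolean isTextual(Block b) {
        return "PARAGRAPH".equals(b.blockType()) || isHeading(b);
    }

    // Joins the textual blocks with a blank line between them.
    static String orderedText(List<Block> blocks) {
        return blocks.stream()
                .filter(PageTextSketch::isTextual)
                .map(b -> b.text().strip())   // assumption: trim trailing newlines
                .collect(Collectors.joining("\n\n"));
    }

    // Returns the text of the first heading block, or null if there is none.
    static String headingTitle(List<Block> blocks) {
        return blocks.stream()
                .filter(PageTextSketch::isHeading)
                .map(b -> b.text().strip())
                .findFirst()
                .orElse(null);
    }
}
```

Both helpers take the same block list, so a parser could compute them in one pass per page and hand the results to the PageResult constructor.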

Production Rules

| Field | Rule |
|---|---|
| orderedText | Concatenation of all PARAGRAPH and HEADING_* blocks, joined with \n\n. Tables are represented as tab-separated text. |
| headingTitle | First block whose blockType is HEADING_1 through HEADING_6. null if no heading detected. |
| figures | One entry per VisualElement with type == "figure" and confidence ≥ 0.5. Sorted top-to-bottom by y. |
| nearestCaption | The PARAGRAPH block immediately following the figure bbox (by Y coordinate). May be null if no paragraph follows within 10% of page height. |
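
The figure rules above can be sketched as follows. Candidate and Paragraph are hypothetical stand-ins rather than SDK types, and measuring the 10% limit from the figure's bottom edge to the paragraph's top edge is an assumption about how "within 10% of page height" is meant.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class FigureRulesSketch {

    // Hypothetical stand-in for a detected visual element (normalized coordinates).
    record Candidate(String type, float confidence, float x, float y, float width, float height) {}

    // Hypothetical stand-in for a PARAGRAPH block: normalized top edge and text.
    record Paragraph(float top, String text) {}

    // Mirrors PageResult.FigureBbox from this contract.
    record FigureBbox(float x, float y, float width, float height, String nearestCaption) {}

    static final float MIN_CONFIDENCE = 0.5f;
    static final float MAX_CAPTION_GAP = 0.10f; // 10% of page height (normalized)

    // Keeps confident "figure" elements, attaches captions, sorts top-to-bottom.
    static List<FigureBbox> toFigures(List<Candidate> candidates, List<Paragraph> paragraphs) {
        List<FigureBbox> figures = new ArrayList<>();
        for (Candidate c : candidates) {
            if (!"figure".equals(c.type()) || c.confidence() < MIN_CONFIDENCE) {
                continue;
            }
            figures.add(new FigureBbox(c.x(), c.y(), c.width(), c.height(),
                    captionFor(c, paragraphs)));
        }
        figures.sort(Comparator.comparingDouble(FigureBbox::y));
        return figures;
    }

    // Text of the closest paragraph starting below the figure, or null if none is near enough.
    static String captionFor(Candidate figure, List<Paragraph> paragraphs) {
        float bottom = figure.y() + figure.height();
        Paragraph best = null;
        for (Paragraph p : paragraphs) {
            float gap = p.top() - bottom;
            if (gap < 0f || gap > MAX_CAPTION_GAP) {
                continue; // above the figure's bottom edge, or too far below it
            }
            if (best == null || p.top() < best.top()) {
                best = p;
            }
        }
        return best == null ? null : best.text();
    }
}
```

Keeping the thresholds as named constants makes it easy to tune the confidence cut-off or caption distance without touching the consumers of PageResult.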

Mapping from Document AI Proto

Document.Page.Block         → orderedText (concatenated)
Document.Page.Block (HEADING_*) → headingTitle (first match)
Document.Page.VisualElement → FigureBbox
  └─ layout.bounding_poly.normalized_vertices[0] → (x, y) top-left
  └─ normalized_vertices[2] → (x+w, y+h) bottom-right
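
A minimal sketch of the vertex-to-bbox step in the mapping above, assuming the polygon lists its vertices starting at the top-left corner so that index 0 and index 2 are opposite corners. Vertex is a hypothetical stand-in for the SDK's normalized vertex type, and clamping to [0.0, 1.0] is a defensive assumption rather than part of the contract.

```java
import java.util.List;

final class BboxMappingSketch {

    // Hypothetical stand-in for one normalized vertex of the bounding polygon.
    record Vertex(float x, float y) {}

    // Mirrors PageResult.FigureBbox from this contract.
    record FigureBbox(float x, float y, float width, float height, String nearestCaption) {}

    // Builds a bbox from vertex 0 (top-left) and vertex 2 (bottom-right).
    static FigureBbox fromVertices(List<Vertex> vertices, String nearestCaption) {
        if (vertices.size() < 3) {
            throw new IllegalArgumentException("need at least 3 vertices, got " + vertices.size());
        }
        Vertex topLeft = vertices.get(0);
        Vertex bottomRight = vertices.get(2);

        float x = clamp01(topLeft.x());
        float y = clamp01(topLeft.y());
        // Width and height are never negative, even for a degenerate polygon.
        float width = Math.max(0f, clamp01(bottomRight.x()) - x);
        float height = Math.max(0f, clamp01(bottomRight.y()) - y);
        return new FigureBbox(x, y, width, height, nearestCaption);
    }

    // Restricts a coordinate to the normalized [0.0, 1.0] range.
    private static float clamp01(float v) {
        return Math.max(0f, Math.min(1f, v));
    }
}
```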

Consumers

| Consumer | What It Uses |
|---|---|
| BookEmbeddingService | orderedText → SectionEntity.fullText; headingTitle → SectionEntity.title |
| FigureExtractionService | figures list → renders page via PDFBox, crops each bbox to BufferedImage |
| TextChunkingService | Receives SectionEntity (indirectly uses orderedText); unchanged |
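
For the cropping step used by FigureExtractionService, a minimal sketch is shown below. It assumes the page has already been rendered to a BufferedImage (the PDFBox rendering call is not shown), and the class and method names are hypothetical. Rounding to the nearest pixel and clamping to the image bounds are illustrative choices.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

final class FigureCropSketch {

    // Mirrors PageResult.FigureBbox from this contract.
    record FigureBbox(float x, float y, float width, float height, String nearestCaption) {}

    // Converts a normalized bbox into a pixel rectangle that lies inside the page image.
    static Rectangle toPixels(FigureBbox b, int pageWidthPx, int pageHeightPx) {
        int left = clamp(Math.round(b.x() * pageWidthPx), 0, pageWidthPx - 1);
        int top = clamp(Math.round(b.y() * pageHeightPx), 0, pageHeightPx - 1);
        // The right and bottom edges are kept at least one pixel past left and top.
        int right = clamp(Math.round((b.x() + b.width()) * pageWidthPx), left + 1, pageWidthPx);
        int bottom = clamp(Math.round((b.y() + b.height()) * pageHeightPx), top + 1, pageHeightPx);
        return new Rectangle(left, top, right - left, bottom - top);
    }

    // Crops the figure region; the result shares pixel data with the page image.
    static BufferedImage crop(BufferedImage page, FigureBbox b) {
        Rectangle r = toPixels(b, page.getWidth(), page.getHeight());
        return page.getSubimage(r.x, r.y, r.width, r.height);
    }

    private static int clamp(int v, int lo, int hi) {
        return Math.max(lo, Math.min(hi, v));
    }
}
```

Because getSubimage returns a view over the same raster, a caller that wants to keep the crop after discarding the page image may prefer to copy it into a new BufferedImage first.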