fix deserialization error in native image

Add thai support in summary
add new concept report
2026-04-18 20:46:16 +02:00 · 2026-04-18 19:55:19 +02:00 · 2026-04-18 17:54:54 +02:00 · 2026-04-12 18:56:18 +02:00 · 2026-04-12 18:25:12 +02:00 · 2026-04-12 16:26:25 +02:00
122 changed files with 10249 additions and 760 deletions
@@ -0,0 +1,27 @@
 .git/
 .gitignore
 *.md
 .DS_Store
 Thumbs.db
 # Java build artifacts
 target/
 *.class
 *.jar
 # Node
 node_modules/
 dist/
 *.log
 # Env files (never bake secrets into images)
 .env
 .env.*
 !.env.example
 # Spec / docs
 specs/
 # Editor
 .vscode/
 .idea/
@@ -0,0 +1,13 @@
 # Copy this file to .env and fill in your values before running docker-compose.native.yml
 # .env is gitignored — never commit real credentials
 # OpenAI
 OPENAI_API_KEY=sk-...
 # AWS S3 (figure storage — leave blank if using local filesystem)
 AWS_ACCESS_KEY_ID=
 AWS_SECRET_ACCESS_KEY=
 AWS_REGION=eu-west-1
 # S3 bucket name (if S3 storage enabled)
 APP_STORAGE_S3_BUCKET=ai-teacher-figures
@@ -1,10 +1,23 @@
 # ai-teacher Development Guidelines
-Auto-generated from all feature plans. Last updated: 2026-04-03
+Auto-generated from all feature plans. Last updated: 2026-04-10
 ## Active Technologies
 - Java 25 (backend), TypeScript / Node 20 (frontend) + Spring Boot 4.0.5, Spring AI 2.0.0-M4, OpenAI API (embeddings + chat), PDFBox (via Spring AI PDF reader dependency) (002-image-aware-embedding)
 - PostgreSQL (JPA + Flyway), pgvector (Spring AI `VectorStore`), local file system (extracted images — `/uploads/figures/`) (002-image-aware-embedding)
 - Java 25 (backend), TypeScript / Node 20 (frontend) + Spring Boot 4.0.5, Spring AI 2.0.0-M4, OpenAI API, PDFBox (rendering only), `com.google.cloud:google-cloud-documentai` (~2.40.x) (002-image-aware-embedding)
 - PostgreSQL (JPA + Flyway), pgvector (Spring AI VectorStore), S3 / local filesystem (figure images) (002-image-aware-embedding)
 - PostgreSQL (JPA + Flyway), pgvector (Spring AI `VectorStore`), S3-compatible (002-image-aware-embedding)
 - Java 21 (backend) / TypeScript + Node 20 (frontend) + Spring Boot 4.0.5, Spring Security (already included), Vue 3.4, Vue Router 4.3, Pinia 2.1, Axios 1.7 (003-basic-login)
 - No new storage — credentials held in browser `sessionStorage` (frontend only) (003-basic-login)
 - Java 21 (backend), TypeScript / Node 20 (frontend) + Spring Boot 4.0.5, Spring AI 2.0.0-M4, OpenAI API (chat + embeddings), pgvector, Vue 3.4, Pinia 2.1 (004-rag-retrieval-quality)
 - PostgreSQL (sections, figures, messages — unchanged). No new tables needed. (004-rag-retrieval-quality)
 - Java 21 (backend), TypeScript / Node 20 (frontend) + Spring Boot 4.0.5, Spring AI 2.0.0-M4, OpenAI API (chat + embeddings), Vue 3.4, Pinia 2.1, Axios 1.7 (004-rag-retrieval-quality)
 - PostgreSQL (JPA + Flyway), pgvector (`VectorStore`) (004-rag-retrieval-quality)
 - Java 25 (backend), TypeScript / Node 20 (frontend) + Spring Boot 4.0.5, Spring AI 2.0.0-M4, `native-maven-plugin` 0.10.6, (005-native-image-deployment)
 - PostgreSQL 16 + pgvector (unchanged) (005-native-image-deployment)
 - TypeScript / Node 20 (frontend only) + Vue 3.4, Vue Router 4.3, Pinia 2.1 — no changes (006-mobile-responsive-ui)
 - N/A (frontend-only change) (006-mobile-responsive-ui)
 - Java 21 (backend), TypeScript / Node 20 (frontend) (001-neuro-rag-learning)
@@ -24,9 +37,10 @@ npm test && npm run lint
 Java 21 (backend), TypeScript / Node 20 (frontend): Follow standard conventions
 ## Recent Changes
- 002-image-aware-embedding: Added Java 25 (backend), TypeScript / Node 20 (frontend) + Spring Boot 4.0.5, Spring AI 2.0.0-M4, OpenAI API (embeddings + chat), PDFBox (via Spring AI PDF reader dependency)
+- 006-mobile-responsive-ui: Added TypeScript / Node 20 (frontend only) + Vue 3.4, Vue Router 4.3, Pinia 2.1 — no changes
 - 005-native-image-deployment: Added Java 25 (backend), TypeScript / Node 20 (frontend) + Spring Boot 4.0.5, Spring AI 2.0.0-M4, `native-maven-plugin` 0.10.6,
 - 004-rag-retrieval-quality: Added Java 21 (backend), TypeScript / Node 20 (frontend) + Spring Boot 4.0.5, Spring AI 2.0.0-M4, OpenAI API (chat + embeddings), Vue 3.4, Pinia 2.1, Axios 1.7
 - 001-neuro-rag-learning: Added Java 21 (backend), TypeScript / Node 20 (frontend)
 <!-- MANUAL ADDITIONS START -->
 <!-- MANUAL ADDITIONS END -->
@@ -0,0 +1,566 @@
 # Marker
 Marker converts documents to markdown, JSON, chunks, and HTML quickly and accurately.
 - Converts PDF, image, PPTX, DOCX, XLSX, HTML, EPUB files in all languages
 - Formats tables, forms, equations, inline math, links, references, and code blocks
 - Extracts and saves images
 - Removes headers/footers/other artifacts
 - Extensible with your own formatting and logic
 - Does structured extraction, given a JSON schema (beta)
 - Optionally boost accuracy with LLMs (and your own prompt)
 - Works on GPU, CPU, or MPS
 For our managed API or on-prem document intelligence solution, check out [our platform here](https://datalab.to?utm_source=gh-marker).
 ## Performance
 <img src="data/images/overall.png" width="800px"/>
 Marker benchmarks favorably compared to cloud services like Llamaparse and Mathpix, as well as other open source tools.
 The above results are from running single PDF pages serially.  Marker is significantly faster when running in batch mode, with a projected throughput of 25 pages/second on an H100.
 See [below](#benchmarks) for detailed speed and accuracy benchmarks, and instructions on how to run your own benchmarks.
 ## Hybrid Mode
 For the highest accuracy, pass the `--use_llm` flag to use an LLM alongside marker.  This will do things like merge tables across pages, handle inline math, format tables properly, and extract values from forms.  It can use any gemini or ollama model.  By default, it uses `gemini-2.0-flash`.  See [below](#llm-services) for details.
 Here is a table benchmark comparing marker, gemini flash alone, and marker with use_llm:
 <img src="data/images/table.png" width="400px"/>
 As you can see, the use_llm mode offers higher accuracy than marker or gemini alone.
 ## Examples
 | PDF | File type | Markdown                                                                                                                     | JSON                                                                                                   |
 |-----|-----------|------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------|
 | [Think Python](https://greenteapress.com/thinkpython/thinkpython.pdf) | Textbook | [View](https://github.com/VikParuchuri/marker/blob/master/data/examples/markdown/thinkpython/thinkpython.md)                 | [View](https://github.com/VikParuchuri/marker/blob/master/data/examples/json/thinkpython.json)         |
 | [Switch Transformers](https://arxiv.org/pdf/2101.03961.pdf) | arXiv paper | [View](https://github.com/VikParuchuri/marker/blob/master/data/examples/markdown/switch_transformers/switch_trans.md) | [View](https://github.com/VikParuchuri/marker/blob/master/data/examples/json/switch_trans.json) |
 | [Multi-column CNN](https://arxiv.org/pdf/1804.07821.pdf) | arXiv paper | [View](https://github.com/VikParuchuri/marker/blob/master/data/examples/markdown/multicolcnn/multicolcnn.md)                 | [View](https://github.com/VikParuchuri/marker/blob/master/data/examples/json/multicolcnn.json)         |
 # Commercial usage
 Our model weights use a modified AI Pubs Open Rail-M license (free for research, personal use, and startups under $2M funding/revenue) and our code is GPL. For broader commercial licensing or to remove GPL requirements, visit our pricing page [here](https://www.datalab.to/pricing?utm_source=gh-marker).
 # Hosted API & On-prem
 There's a [hosted API](https://www.datalab.to?utm_source=gh-marker) and [painless on-prem solution](https://www.datalab.to/blog/self-serve-on-prem-licensing) for marker - it's free to sign up, and we'll throw in credits for you to test it out.
 The API:
 - Supports PDF, image, PPT, PPTX, DOC, DOCX, XLS, XLSX, HTML, EPUB files
 - Is 1/4th the price of leading cloud-based competitors
 - Fast - ~15s for a 250 page PDF
 - Supports LLM mode
 - High uptime (99.99%)
 # Community
 [Discord](https://discord.gg//KuZwXNGnfH) is where we discuss future development.
 # Installation
 You'll need python 3.10+ and [PyTorch](https://pytorch.org/get-started/locally/).
 Install with:
 ```shell
 pip install marker-pdf
 ```
 If you want to use marker on documents other than PDFs, you will need to install additional dependencies with:
 ```shell
 pip install marker-pdf[full]
 ```
 # Usage
 First, some configuration:
 - Your torch device will be automatically detected, but you can override this.  For example, `TORCH_DEVICE=cuda`.
 - Some PDFs, even digital ones, have bad text in them.  Set `--force_ocr` to force OCR on all lines, or `--strip_existing_ocr` to keep all digital text and strip out any existing OCR text.
 - If you care about inline math, set `force_ocr` to convert inline math to LaTeX.
 ## Interactive App
 I've included a streamlit app that lets you interactively try marker with some basic options.  Run it with:
 ```shell
 pip install streamlit streamlit-ace
 marker_gui
 ```
 ## Convert a single file
 ```shell
 marker_single /path/to/file.pdf
 ```
 You can pass in PDFs or images.
 Options:
 - `--page_range TEXT`: Specify which pages to process. Accepts comma-separated page numbers and ranges. Example: `--page_range "0,5-10,20"` will process pages 0, 5 through 10, and page 20.
 - `--output_format [markdown|json|html|chunks]`: Specify the format for the output results.
 - `--output_dir PATH`: Directory where output files will be saved. Defaults to the value specified in settings.OUTPUT_DIR.
 - `--paginate_output`: Paginates the output, using `\n\n{PAGE_NUMBER}` followed by `-` * 48, then `\n\n`
 - `--use_llm`: Uses an LLM to improve accuracy.  You will need to configure the LLM backend - see [below](#llm-services).
 - `--force_ocr`: Force OCR processing on the entire document, even for pages that might contain extractable text.  This will also format inline math properly.
 - `--block_correction_prompt`: if LLM mode is active, an optional prompt that will be used to correct the output of marker.  This is useful for custom formatting or logic that you want to apply to the output.
 - `--strip_existing_ocr`: Remove all existing OCR text in the document and re-OCR with surya.
 - `--redo_inline_math`: If you want the absolute highest quality inline math conversion, use this along with `--use_llm`.
 - `--disable_image_extraction`: Don't extract images from the PDF.  If you also specify `--use_llm`, then images will be replaced with a description.
 - `--debug`: Enable debug mode for additional logging and diagnostic information.
 - `--processors TEXT`: Override the default processors by providing their full module paths, separated by commas. Example: `--processors "module1.processor1,module2.processor2"`
 - `--config_json PATH`: Path to a JSON configuration file containing additional settings.
 - `config --help`: List all available builders, processors, and converters, and their associated configuration.  These values can be used to build a JSON configuration file for additional tweaking of marker defaults.
 - `--converter_cls`: One of `marker.converters.pdf.PdfConverter` (default) or `marker.converters.table.TableConverter`.  The `PdfConverter` will convert the whole PDF, the `TableConverter` will only extract and convert tables.
 - `--llm_service`: Which llm service to use if `--use_llm` is passed.  This defaults to `marker.services.gemini.GoogleGeminiService`.
 - `--help`: See all of the flags that can be passed into marker.  (It supports many more options than are listed above.)
 The list of supported languages for surya OCR is [here](https://github.com/VikParuchuri/surya/blob/master/surya/recognition/languages.py).  If you don't need OCR, marker can work with any language.
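The `--page_range` syntax in the options list above packs single pages and ranges into one string. As a rough illustration of how such a string expands into page indices, here is a minimal sketch. `parse_page_range` is a hypothetical helper written for this example, not part of marker, and it assumes ranges are inclusive, as the `5-10` example suggests.

```python
def parse_page_range(spec: str) -> list[int]:
    """Expand a string such as "0,5-10,20" into a sorted list of page indices.

    Hypothetical helper for illustration; marker's own parsing may differ.
    """
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            pages.update(range(start, end + 1))  # inclusive end (assumption)
        else:
            pages.add(int(part))
    return sorted(pages)


print(parse_page_range("0,5-10,20"))  # [0, 5, 6, 7, 8, 9, 10, 20]
```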
 ## Convert multiple files
 ```shell
 marker /path/to/input/folder
 ```
 - `marker` supports all the same options from `marker_single` above.
 - `--workers` is the number of conversion workers to run simultaneously.  This is automatically set by default, but you can increase it to increase throughput, at the cost of more CPU/GPU usage.  Marker will use 5GB of VRAM per worker at the peak, and 3.5GB average.
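The peak VRAM figure above gives a simple upper bound on how many workers fit on one GPU. The sketch below only does that arithmetic; `max_workers_for_vram` and the `reserve_gb` parameter are hypothetical names for this example, and the real limit may be lower depending on your documents and other GPU load.

```python
PEAK_VRAM_PER_WORKER_GB = 5.0  # peak usage per worker quoted above


def max_workers_for_vram(total_vram_gb: float, reserve_gb: float = 0.0) -> int:
    """Estimate how many workers fit in VRAM, budgeting for peak usage.

    reserve_gb is an illustrative safety margin for other processes.
    """
    usable = max(total_vram_gb - reserve_gb, 0.0)
    return int(usable // PEAK_VRAM_PER_WORKER_GB)


print(max_workers_for_vram(24))                # 4
print(max_workers_for_vram(80, reserve_gb=4))  # 15
```

Budgeting on the peak rather than the 3.5GB average is the conservative choice, since an out-of-memory error in one worker can fail the conversion.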
 ## Convert multiple files on multiple GPUs
 ```shell
 NUM_DEVICES=4 NUM_WORKERS=15 marker_chunk_convert ../pdf_in ../md_out
 ```
 - `NUM_DEVICES` is the number of GPUs to use.  Should be `2` or greater.
 - `NUM_WORKERS` is the number of parallel processes to run on each GPU.
 ## Use from python
 See the `PdfConverter` class at `marker/converters/pdf.py` for additional arguments that can be passed.
 ```python
 from marker.converters.pdf import PdfConverter
 from marker.models import create_model_dict
 from marker.output import text_from_rendered
 converter = PdfConverter(
    artifact_dict=create_model_dict(),
 )
 rendered = converter("FILEPATH")
 text, _, images = text_from_rendered(rendered)
 ```
 `rendered` will be a pydantic basemodel with different properties depending on the output type requested.  With markdown output (default), you'll have the properties `markdown`, `metadata`, and `images`.  For json output, you'll have `children`, `block_type`, and `metadata`.
 ### Custom configuration
 You can pass configuration using the `ConfigParser`.  To see all available options, do `marker_single --help`.
 ```python
 from marker.converters.pdf import PdfConverter
 from marker.models import create_model_dict
 from marker.config.parser import ConfigParser
 config = {
    "output_format": "json",
    "ADDITIONAL_KEY": "VALUE"
 }
 config_parser = ConfigParser(config)
 converter = PdfConverter(
    config=config_parser.generate_config_dict(),
    artifact_dict=create_model_dict(),
    processor_list=config_parser.get_processors(),
    renderer=config_parser.get_renderer(),
    llm_service=config_parser.get_llm_service()
 )
 rendered = converter("FILEPATH")
 ```
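The CLI also accepts a `--config_json PATH` option, so a settings dictionary like the one above can be kept in a file. The following is a minimal sketch under the assumption that the JSON file uses the same keys as the config dictionary and that flag-style options are stored as booleans; `write_marker_config` is a hypothetical helper, and the keys shown are options named elsewhere in this README.

```python
import json
import tempfile
from pathlib import Path


def write_marker_config(path: Path, **settings) -> Path:
    """Write settings to a JSON file suitable for passing via --config_json.

    Hypothetical helper; key names are assumed to mirror the CLI options.
    """
    path.write_text(json.dumps(settings, indent=2), encoding="utf-8")
    return path


config_path = write_marker_config(
    Path(tempfile.gettempdir()) / "marker_config.json",
    output_format="json",
    force_ocr=True,
    paginate_output=True,
)
print(config_path.read_text(encoding="utf-8"))
```

The resulting file could then be used as `marker_single /path/to/file.pdf --config_json <path printed above>`.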
 ### Extract blocks
 Each document consists of one or more pages.  Pages contain blocks, which can themselves contain other blocks.  It's possible to programmatically manipulate these blocks.
 Here's an example of extracting all forms from a document:
 ```python
 from marker.converters.pdf import PdfConverter
 from marker.models import create_model_dict
 from marker.schema import BlockTypes
 converter = PdfConverter(
    artifact_dict=create_model_dict(),
 )
 document = converter.build_document("FILEPATH")
 forms = document.contained_blocks((BlockTypes.Form,))
 ```
 Look at the processors for more examples of extracting and manipulating blocks.
 ## Other converters
 You can also use other converters that define different conversion pipelines:
 ### Extract tables
 The `TableConverter` will only convert and extract tables:
 ```python
 from marker.converters.table import TableConverter
 from marker.models import create_model_dict
 from marker.output import text_from_rendered
 converter = TableConverter(
    artifact_dict=create_model_dict(),
 )
 rendered = converter("FILEPATH")
 text, _, images = text_from_rendered(rendered)
 ```
 This takes all the same configuration as the PdfConverter.  You can specify the configuration `force_layout_block=Table` to avoid layout detection and instead assume every page is a table.  Set `output_format=json` to also get cell bounding boxes.
 You can also run this via the CLI with
 ```shell
 marker_single FILENAME --use_llm --force_layout_block Table --converter_cls marker.converters.table.TableConverter --output_format json
 ```
 ### OCR Only
 If you only want to run OCR, you can also do that through the `OCRConverter`.  Set `--keep_chars` to keep individual characters and bounding boxes.
 ```python
 from marker.converters.ocr import OCRConverter
 from marker.models import create_model_dict
 converter = OCRConverter(
    artifact_dict=create_model_dict(),
 )
 rendered = converter("FILEPATH")
 ```
 This takes all the same configuration as the PdfConverter.
 You can also run this via the CLI with
 ```shell
 marker_single FILENAME --converter_cls marker.converters.ocr.OCRConverter
 ```
 ### Structured Extraction (beta)
 You can run structured extraction via the `ExtractionConverter`.  This requires an LLM service to be set up first (see [here](#llm-services) for details).  You'll get a JSON output with the extracted values.
 ```python
 from marker.converters.extraction import ExtractionConverter
 from marker.models import create_model_dict
 from marker.config.parser import ConfigParser
 from pydantic import BaseModel
 class Links(BaseModel):
    links: list[str]
 schema = Links.model_json_schema()
 config_parser = ConfigParser({
    "page_schema": schema
 })
 converter = ExtractionConverter(
    artifact_dict=create_model_dict(),
    config=config_parser.generate_config_dict(),
    llm_service=config_parser.get_llm_service(),
 )
 rendered = converter("FILEPATH")
 ```
 `rendered` will have an `original_markdown` field.  If you pass this back in the next time you run the converter, as the `existing_markdown` config key, you can skip re-parsing the document.
 # Output Formats
 ## Markdown
 Markdown output will include:
 - image links (images will be saved in the same folder)
 - formatted tables
 - embedded LaTeX equations (fenced with `$$`)
 - Code is fenced with triple backticks
 - Superscripts for footnotes
 ## HTML
 HTML output is similar to markdown output:
 - Images are included via `img` tags
 - equations are fenced with `<math>` tags
 - code is in `pre` tags
 ## JSON
 JSON output will be organized in a tree-like structure, with the leaf nodes being blocks.  Examples of leaf nodes are a single list item, a paragraph of text, or an image.
 The output will be a list, with each list item representing a page.  Each page is considered a block in the internal marker schema.  There are different types of blocks to represent different elements.
 Pages have the keys:
 - `id` - unique id for the block.
 - `block_type` - the type of block. The possible block types can be seen in `marker/schema/__init__.py`.  As of this writing, they are ["Line", "Span", "FigureGroup", "TableGroup", "ListGroup", "PictureGroup", "Page", "Caption", "Code", "Figure", "Footnote", "Form", "Equation", "Handwriting", "TextInlineMath", "ListItem", "PageFooter", "PageHeader", "Picture", "SectionHeader", "Table", "Text", "TableOfContents", "Document"]
 - `html` - the HTML for the page.  Note that this will have recursive references to children.  The `content-ref` tags must be replaced with the child content if you want the full html.  You can see an example of this at `marker/output.py:json_to_html`.  That function will take in a single block from the json output, and turn it into HTML.
 - `polygon` - the 4-corner polygon of the page, in (x1,y1), (x2,y2), (x3, y3), (x4, y4) format.  (x1,y1) is the top left, and coordinates go clockwise.
 - `children` - the child blocks.
 The child blocks have two additional keys:
 - `section_hierarchy` - indicates the sections that the block is part of.  `1` indicates an h1 tag, `2` an h2, and so on.
 - `images` - base64 encoded images.  The key will be the block id, and the data will be the encoded image.
 Note that child blocks of pages can have their own children as well (a tree structure).
 ```json
 {
      "id": "/page/10/Page/366",
      "block_type": "Page",
      "html": "<content-ref src='/page/10/SectionHeader/0'></content-ref><content-ref src='/page/10/SectionHeader/1'></content-ref><content-ref src='/page/10/Text/2'></content-ref><content-ref src='/page/10/Text/3'></content-ref><content-ref src='/page/10/Figure/4'></content-ref><content-ref src='/page/10/SectionHeader/5'></content-ref><content-ref src='/page/10/SectionHeader/6'></content-ref><content-ref src='/page/10/TextInlineMath/7'></content-ref><content-ref src='/page/10/TextInlineMath/8'></content-ref><content-ref src='/page/10/Table/9'></content-ref><content-ref src='/page/10/SectionHeader/10'></content-ref><content-ref src='/page/10/Text/11'></content-ref>",
      "polygon": [[0.0, 0.0], [612.0, 0.0], [612.0, 792.0], [0.0, 792.0]],
      "children": [
        {
          "id": "/page/10/SectionHeader/0",
          "block_type": "SectionHeader",
          "html": "<h1>Supplementary Material for <i>Subspace Adversarial Training</i> </h1>",
          "polygon": [
            [217.845703125, 80.630859375], [374.73046875, 80.630859375],
            [374.73046875, 107.0],
            [217.845703125, 107.0]
          ],
          "children": null,
          "section_hierarchy": {
            "1": "/page/10/SectionHeader/1"
          },
          "images": {}
        },
        ...
        ]
    }
 ```
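As noted above, a block's `html` refers to its children through `content-ref` tags that have to be replaced with the child content. The sketch below shows one way to do that recursively with a regular expression. It is an independent illustration, not the library's `json_to_html`, and it assumes the tags are written exactly as in the sample above (single-quoted `src`, empty body). `resolve_html` is a hypothetical name.

```python
import re

# Assumption: refs look exactly like the sample output above.
CONTENT_REF = re.compile(r"<content-ref src='([^']+)'></content-ref>")


def resolve_html(block: dict) -> str:
    """Return the block's HTML with every content-ref replaced by child HTML."""
    children = {child["id"]: child for child in (block.get("children") or [])}

    def replace(match: re.Match) -> str:
        child = children.get(match.group(1))
        # Leave unknown references untouched rather than dropping them.
        return resolve_html(child) if child is not None else match.group(0)

    return CONTENT_REF.sub(replace, block["html"])


page = {
    "id": "/page/10/Page/366",
    "block_type": "Page",
    "html": "<content-ref src='/page/10/SectionHeader/0'></content-ref>",
    "children": [
        {
            "id": "/page/10/SectionHeader/0",
            "block_type": "SectionHeader",
            "html": "<h1>Supplementary Material</h1>",
            "children": None,
        }
    ],
}
print(resolve_html(page))  # <h1>Supplementary Material</h1>
```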
 ## Chunks
 Chunks format is similar to JSON, but flattens everything into a single list instead of a tree.  Only the top level blocks from each page show up. It also has the full HTML of each block inside, so you don't need to crawl the tree to reconstruct it.  This enables flexible and easy chunking for RAG.
 ## Metadata
 All output formats will return a metadata dictionary, with the following fields:
 ```json
 {
    "table_of_contents": [
      {
        "title": "Introduction",
        "heading_level": 1,
        "page_id": 0,
        "polygon": [...]
      }
    ], // computed PDF table of contents
    "page_stats": [
      {
        "page_id":  0,
        "text_extraction_method": "pdftext",
        "block_counts": [("Span", 200), ...]
      },
      ...
    ]
 }
 ```
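The `table_of_contents` entries above carry enough information to print a quick outline of a converted document. Here is a minimal sketch that assumes the metadata is available as a plain dictionary with the fields shown above. `format_outline` is a hypothetical helper, and the second sample entry is invented for the demonstration.

```python
def format_outline(metadata: dict) -> list[str]:
    """Render table_of_contents entries as an indented bullet outline."""
    lines = []
    for entry in metadata.get("table_of_contents", []):
        # Indent two spaces per heading level below the top level.
        indent = "  " * max(entry["heading_level"] - 1, 0)
        lines.append(f"{indent}- {entry['title']} (page {entry['page_id']})")
    return lines


sample_metadata = {
    "table_of_contents": [
        {"title": "Introduction", "heading_level": 1, "page_id": 0, "polygon": []},
        # Invented entry, for illustration only.
        {"title": "Background", "heading_level": 2, "page_id": 1, "polygon": []},
    ]
}
print("\n".join(format_outline(sample_metadata)))
```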
 # LLM Services
 When running with the `--use_llm` flag, you have a choice of services you can use:
 - `Gemini` - this will use the Gemini developer API by default.  You'll need to pass `--gemini_api_key` to configuration.
 - `Google Vertex` - this will use vertex, which can be more reliable.  You'll need to pass `--vertex_project_id`.  To use it, set `--llm_service=marker.services.vertex.GoogleVertexService`.
 - `Ollama` - this will use local models.  You can configure `--ollama_base_url` and `--ollama_model`. To use it, set `--llm_service=marker.services.ollama.OllamaService`.
 - `Claude` - this will use the anthropic API.  You can configure `--claude_api_key`, and `--claude_model_name`.  To use it, set `--llm_service=marker.services.claude.ClaudeService`.
 - `OpenAI` - this supports any openai-like endpoint. You can configure `--openai_api_key`, `--openai_model`, and `--openai_base_url`. To use it, set `--llm_service=marker.services.openai.OpenAIService`.
 - `Azure OpenAI` - this uses the Azure OpenAI service. You can configure `--azure_endpoint`, `--azure_api_key`, and `--deployment_name`. To use it, set `--llm_service=marker.services.azure_openai.AzureOpenAIService`.
 These services may have additional optional configuration as well - you can see it by viewing the classes.
 # Internals
 Marker is easy to extend.  The core units of marker are:
 - `Providers`, at `marker/providers`.  These provide information from a source file, like a PDF.
 - `Builders`, at `marker/builders`.  These generate the initial document blocks and fill in text, using info from the providers.
 - `Processors`, at `marker/processors`.  These process specific blocks, for example the table formatter is a processor.
 - `Renderers`, at `marker/renderers`. These use the blocks to render output.
 - `Schema`, at `marker/schema`.  The classes for all the block types.
 - `Converters`, at `marker/converters`.  They run the whole end to end pipeline.
 To customize processing behavior, override the `processors`.  To add new output formats, write a new `renderer`.  For additional input formats, write a new `provider`.
 Processors and renderers can be directly passed into the base `PdfConverter`, so you can specify your own custom processing easily.
 ## API server
 There is a very simple API server you can run like this:
 ```shell
 pip install -U uvicorn fastapi python-multipart
 marker_server --port 8001
 ```
 This will start a fastapi server that you can access at `localhost:8001`.  You can go to `localhost:8001/docs` to see the endpoint options.
 You can send requests like this:
 ```python
 import requests
 import json
 post_data = {
    'filepath': 'FILEPATH',
    # Add other params here
 }
 requests.post("http://localhost:8001/marker", data=json.dumps(post_data)).json()
 ```
 Note that this is not a very robust API, and is only intended for small-scale use.  If you want to use this server, but want a more robust conversion option, you can use the hosted [Datalab API](https://www.datalab.to/plans).
 # Troubleshooting
 There are some settings that you may find useful if things aren't working the way you expect:
 - If you have issues with accuracy, try setting `--use_llm` to use an LLM to improve quality.  You must set `GOOGLE_API_KEY` to a Gemini API key for this to work.
 - Make sure to set `force_ocr` if you see garbled text - this will re-OCR the document.
 - `TORCH_DEVICE` - set this to force marker to use a given torch device for inference.
 - If you're getting out of memory errors, decrease worker count.  You can also try splitting up long PDFs into multiple files.
 ## Debugging
 Pass the `debug` option to activate debug mode.  This will save images of each page with detected layout and text, as well as output a json file with additional bounding box information.
 # Benchmarks
 ## Overall PDF Conversion
 We created a [benchmark set](https://huggingface.co/datasets/datalab-to/marker_benchmark) by extracting single PDF pages from common crawl.  We scored based on a heuristic that aligns text with ground truth text segments, and an LLM as a judge scoring method.
 | Method     | Avg Time | Heuristic Score | LLM Score |
 |------------|----------|-----------------|-----------|
 | marker     | 2.83837  | 95.6709         | 4.23916   |
 | llamaparse | 23.348   | 84.2442         | 3.97619   |
 | mathpix    | 6.36223  | 86.4281         | 4.15626   |
 | docling    | 3.69949  | 86.7073         | 3.70429   |
 Benchmarks were run on an H100 for marker and docling - llamaparse and mathpix used their cloud services.  We can also look at it by document type:
 <img src="data/images/per_doc.png" width="1000px"/>
 | Document Type        | Marker heuristic | Marker LLM | Llamaparse Heuristic | Llamaparse LLM | Mathpix Heuristic | Mathpix LLM | Docling Heuristic | Docling LLM |
 |----------------------|------------------|------------|----------------------|----------------|-------------------|-------------|-------------------|-------------|
 | Scientific paper     | 96.6737          | 4.34899    | 87.1651              | 3.96421        | 91.2267           | 4.46861     | 92.135            | 3.72422     |
 | Book page            | 97.1846          | 4.16168    | 90.9532              | 4.07186        | 93.8886           | 4.35329     | 90.0556           | 3.64671     |
 | Other                | 95.1632          | 4.25076    | 81.1385              | 4.01835        | 79.6231           | 4.00306     | 83.8223           | 3.76147     |
 | Form                 | 88.0147          | 3.84663    | 66.3081              | 3.68712        | 64.7512           | 3.33129     | 68.3857           | 3.40491     |
 | Presentation         | 95.1562          | 4.13669    | 81.2261              | 4              | 83.6737           | 3.95683     | 84.8405           | 3.86331     |
 | Financial document   | 95.3697          | 4.39106    | 82.5812              | 4.16111        | 81.3115           | 4.05556     | 86.3882           | 3.8         |
 | Letter               | 98.4021          | 4.5        | 93.4477              | 4.28125        | 96.0383           | 4.45312     | 92.0952           | 4.09375     |
 | Engineering document | 93.9244          | 4.04412    | 77.4854              | 3.72059        | 80.3319           | 3.88235     | 79.6807           | 3.42647     |
 | Legal document       | 96.689           | 4.27759    | 86.9769              | 3.87584        | 91.601            | 4.20805     | 87.8383           | 3.65552     |
 | Newspaper page       | 98.8733          | 4.25806    | 84.7492              | 3.90323        | 96.9963           | 4.45161     | 92.6496           | 3.51613     |
 | Magazine page        | 98.2145          | 4.38776    | 87.2902              | 3.97959        | 93.5934           | 4.16327     | 93.0892           | 4.02041     |
 ## Throughput
 We benchmarked throughput using a [single long PDF](https://www.greenteapress.com/thinkpython/thinkpython.pdf).
 | Method  | Time per page | Time per document | VRAM used |
 |---------|---------------|-------------------|---------- |
 | marker  | 0.18          | 43.42             |  3.17GB   |
 The projected throughput is 122 pages per second on an H100 - we can run 22 individual processes given the VRAM used.
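The projection follows directly from the table: one process handles a page in 0.18 seconds, and 22 processes run side by side. The short sketch below reproduces that arithmetic; `projected_throughput` is a hypothetical helper, and it assumes throughput scales linearly with the number of processes.

```python
def projected_throughput(time_per_page_s: float, processes: int) -> float:
    """Pages per second, assuming perfectly linear scaling across processes."""
    return processes / time_per_page_s


per_process = projected_throughput(0.18, 1)
total = projected_throughput(0.18, 22)
print(f"{per_process:.2f} pages/s per process")  # 5.56 pages/s per process
print(f"{total:.0f} pages/s projected")          # 122 pages/s projected
```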
 ## Table Conversion
 Marker can extract tables from PDFs using `marker.converters.table.TableConverter`. The table extraction performance is measured by comparing the extracted HTML representation of tables against the original HTML representations using the test split of [FinTabNet](https://developer.ibm.com/exchanges/data/all/fintabnet/). The HTML representations are compared using a tree edit distance based metric to judge both structure and content. Marker detects and identifies the structure of all tables in a PDF page and achieves these scores:
 | Method           | Avg score | Total tables |
 |------------------|-----------|--------------|
 | marker           | 0.816     | 99           |
 | marker w/use_llm | 0.907     | 99           |
 | gemini           | 0.829     | 99           |
 The `--use_llm` flag can significantly improve table recognition performance, as you can see.
 We filter out tables that we cannot align with the ground truth, since fintabnet and our layout model have slightly different detection methods (this results in some tables being split/merged).
 ## Running your own benchmarks
 You can benchmark the performance of marker on your machine. Install marker manually with:
 ```shell
 git clone https://github.com/VikParuchuri/marker.git
 poetry install
 ```
 ### Overall PDF Conversion
 Download the benchmark data [here](https://drive.google.com/file/d/1ZSeWDo2g1y0BRLT7KnbmytV2bjWARWba/view?usp=sharing) and unzip. Then run the overall benchmark like this:
 ```shell
 python benchmarks/overall.py --methods marker --scores heuristic,llm
 ```
 Options:
 - `--use_llm` use an llm to improve the marker results.
 - `--max_rows` how many rows to process for the benchmark.
 - `--methods` can be `llamaparse`, `mathpix`, `docling`, `marker`.  Comma separated.
 - `--scores` which scoring functions to use, can be `llm`, `heuristic`.  Comma separated.
 ### Table Conversion
 The processed FinTabNet dataset is hosted [here](https://huggingface.co/datasets/datalab-to/fintabnet-test) and is automatically downloaded. Run the benchmark with:
 ```shell
 python benchmarks/table/table.py --max_rows 100
 ```
 Options:
 - `--use_llm` uses an llm with marker to improve accuracy.
 - `--use_gemini` also benchmarks gemini 2.0 flash.
 # How it works
 Marker is a pipeline of deep learning models:
 - Extract text, OCR if necessary (heuristics, [surya](https://github.com/VikParuchuri/surya))
 - Detect page layout and find reading order ([surya](https://github.com/VikParuchuri/surya))
 - Clean and format each block (heuristics, [texify](https://github.com/VikParuchuri/texify), [surya](https://github.com/VikParuchuri/surya))
 - Optionally use an LLM to improve quality
 - Combine blocks and postprocess complete text
 It only uses models where necessary, which improves speed and accuracy.
 # Limitations
 PDF is a tricky format, so marker will not always work perfectly.  Here are some known limitations that are on the roadmap to address:
 - Very complex layouts, with nested tables and forms, may not work
 - Forms may not be rendered well
 Note: Passing the `--use_llm` and `--force_ocr` flags will mostly solve these issues.
 # Usage and Deployment Examples
 You can always run `marker` locally, but if you want to expose it as an API, there are a few options:
 - Our platform API, which is powered by `marker` and `surya` and is easy to test out. It's free to sign up, and we'll include credits; [try it out here](https://datalab.to).
 - Our on-prem solution for commercial use, which gives you privacy guarantees and high-throughput inference optimizations; you can [read about it here](https://www.datalab.to/blog/self-serve-on-prem-licensing).
 - [Deployment example with Modal](./examples/README_MODAL.md) that shows you how to deploy and access `marker` through a web endpoint using [`Modal`](https://modal.com). Modal is an AI compute platform that enables developers to deploy and scale models on GPUs in minutes.
@@ -9,14 +9,20 @@ AI-generated cross-book summaries, and engage in grounded RAG chat.
 ```mermaid
 graph TD
    User["Neurosurgeon (Browser)"]
    Login["Login Page\n(username + password form)"]
    FE["Frontend\nVue.js 3 / Vite\n:5173"]
    BE["Backend\nSpring Boot 4 / Spring AI\n:8080"]
    Auth["Spring Security\nHTTP Basic Auth"]
    DB["PostgreSQL + pgvector\n(source of truth)"]
    FS["File Store\nuploads/ (local disk)\nExtracted figure PNGs"]
    LLM["LLM Provider\n(OpenAI)\nEmbeddings + Chat + Vision"]
-    User -->|HTTP| FE
+    User -->|"First visit / unauthenticated"| Login
-    FE -->|REST /api/v1/...| BE
+    Login -->|"Submit credentials\n(GET /api/v1/auth/check)"| Auth
    Auth -->|"401 → back to login\n200 → app access"| Login
    Login -->|"Authenticated"| FE
    FE -->|"REST /api/v1/...\n(HTTP Basic on every request)"| Auth
    Auth --> BE
    BE -->|"JDBC — books, chapters,\nsections, figures, refs"| DB
    BE -->|"pgvector — text chunks\n+ figure caption vectors"| DB
    BE -->|"PNG read/write\n(figure extraction)"| FS
@@ -29,32 +35,155 @@ graph TD
        EP3["Vision describe → embed caption"]
        EP4["Chunk text → embed chunks"]
        EP5["Link chunks ↔ figures"]
        EP6["LLM enrich chunk\n(entities, facet, summary)\n→ chunk_metadata"]
        EP1 --> EP2
        EP1 --> EP4
        EP2 --> EP3
        EP4 --> EP5
        EP3 --> EP5
        EP4 --> EP6
    end
    subgraph "Retrieval Pipeline (per chat query)"
        RP0["Query expansion\n(QueryExpansionService)\nlay → clinical terms"]
        RP1["Text chunk search (topK=5)"]
        RP2["Figure caption search (topK=3)"]
-        RP3["Expand chunks → full section text"]
+        RP3["Expand chunks → ±1-page section text"]
        RP4["Fetch linked figures (chunk_figure_ref)"]
        RP5["Merge + deduplicate figures"]
-        RP6["Build LLM prompt + call"]
+        RP6["Build labelled prompt\n[S1],[F1]… tags"]
        RP7["LLM chat call"]
        RP8["Citation validation\n(CitationValidatorService)\nstrip hallucinated refs"]
        RP0 --> RP1
        RP0 --> RP2
        RP1 --> RP3
        RP1 --> RP4
        RP2 --> RP5
        RP4 --> RP5
        RP3 --> RP6
        RP5 --> RP6
        RP6 --> RP7
        RP7 --> RP8
    end
 ```
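The final retrieval step removes citation tags that the model invented. Below is a minimal sketch of that idea, assuming the tags look like `[S1]` and `[F2]` as in the diagram labels. The class and method names are hypothetical and are not the actual `CitationValidatorService` API.

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch; not the real CitationValidatorService. */
final class CitationTagFilter {

    // Matches tags such as [S1] or [F12] (format assumed from the diagram labels).
    private static final Pattern TAG = Pattern.compile("\\[([SF]\\d+)\\]");

    private CitationTagFilter() {}

    /** Removes every [S#]/[F#] tag whose label was not offered to the model in the prompt. */
    static String stripUnknownTags(String answer, Set<String> allowedLabels) {
        Matcher matcher = TAG.matcher(answer);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            // Keep the tag verbatim if its label is known, otherwise drop it.
            String replacement = allowedLabels.contains(matcher.group(1)) ? matcher.group() : "";
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String answer = "Clip the aneurysm neck [S1][S9], see [F1].";
        // Only S1 and F1 were part of the prompt, so [S9] is removed.
        System.out.println(stripUnknownTags(answer, Set.of("S1", "F1")));
    }
}
```

A real validator would probably also log or count the stripped tags so that hallucination rates can be monitored; that part is left out here.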
 ### Concept Retrieval Pipeline (per concept report)
 Concept retrieval is an alternative to the semantic-similarity flow above. It uses the
 LLM-tagged `chunk_metadata` rows written at indexing time to exhaustively gather every
 chunk that *concerns* a concept (e.g. "aneurysm"), bucketed by facet. One synthesis call
 per facet yields a structured, multi-section report.
 ```mermaid
 sequenceDiagram
    participant User
    participant FE as Frontend
    participant BE as Backend (ConceptReportService)
    participant Retr as ConceptRetriever
    participant DB as chunk_metadata (GIN)
    participant Vec as vector_store
    participant LLM
    User->>FE: Click "Generate Concept Report" on topic
    FE->>BE: POST /api/v1/topics/{id}/concept-reports
    loop per READY book
        BE->>Retr: retrieveByConcept(topicName, bookId)
        Retr->>DB: WHERE entities @> [canonical]
        alt SQL hits found
            DB-->>Retr: chunks grouped by facet
        else no match (typo / synonym)
            Retr->>Vec: similaritySearch topK=30
            Vec-->>Retr: chunk ids
            Retr->>DB: findByChunkIdIn → group by facet
        end
    end
    BE->>BE: merge facets across books, assign global [S#]/[F#]
    loop per non-empty facet
        BE->>LLM: synthesize facet section (focused prompt)
        LLM-->>BE: facet markdown
    end
    BE->>BE: persist concept_report
    BE-->>FE: { facets[], sources[] }
    FE->>User: render facet-labelled report + inline figures
 ```
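The "merge facets across books, assign global [S#]" step in the diagram can be sketched as follows. This is only an illustration under stated assumptions: the per-book result is modelled as a plain facet-to-chunk-id map, the facet names are made up, and `FacetMerger` and `LabelledChunk` are hypothetical names rather than classes from this repository.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the merge step; not the real ConceptReportService. */
final class FacetMerger {

    /** A chunk id paired with its report-wide source label, e.g. "S3". */
    record LabelledChunk(String label, String chunkId) {}

    private FacetMerger() {}

    /**
     * @param perBook one facet-to-chunk-id map per READY book, in processing order
     * @return facet to labelled chunks; a chunk keeps the same label in every facet
     */
    static Map<String, List<LabelledChunk>> merge(List<Map<String, List<String>>> perBook) {
        Map<String, String> labelByChunk = new LinkedHashMap<>();
        Map<String, List<LabelledChunk>> merged = new LinkedHashMap<>();
        for (Map<String, List<String>> book : perBook) {
            for (Map.Entry<String, List<String>> facet : book.entrySet()) {
                List<LabelledChunk> bucket =
                        merged.computeIfAbsent(facet.getKey(), k -> new ArrayList<>());
                for (String chunkId : facet.getValue()) {
                    // Labels are numbered globally, in first-seen order.
                    String label = labelByChunk.get(chunkId);
                    if (label == null) {
                        label = "S" + (labelByChunk.size() + 1);
                        labelByChunk.put(chunkId, label);
                    }
                    LabelledChunk entry = new LabelledChunk(label, chunkId);
                    if (!bucket.contains(entry)) {
                        bucket.add(entry); // skip duplicates within a facet
                    }
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, List<String>> bookA = Map.of("anatomy", List.of("c1", "c2"));
        Map<String, List<String>> bookB = Map.of("anatomy", List.of("c2", "c3"));
        System.out.println(merge(List.of(bookA, bookB)));
    }
}
```

Numbering labels once, across all facets, means the same chunk is cited with the same `[S#]` wherever it appears, which keeps the `sources[]` list in the response unambiguous.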
 Backfill path for already-embedded books:
 `POST /api/v1/admin/books/{id}/enrich` scans `vector_store` for TEXT chunks missing
 `chunk_metadata` rows and enriches them in place. Idempotent — re-running is a no-op.
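As an illustration, the backfill call could be built with the JDK's `java.net.http` client. This sketch assumes the endpoint takes no request body and is protected by the same HTTP Basic credentials as the rest of the API; the class name, base URL, and credentials are placeholders.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.UUID;

/** Hypothetical helper; builds the request without sending it. */
final class EnrichBackfillRequest {

    private EnrichBackfillRequest() {}

    static HttpRequest build(String baseUrl, UUID bookId, String username, String password) {
        // HTTP Basic: base64("username:password") in the Authorization header.
        String token = Base64.getEncoder()
                .encodeToString((username + ":" + password).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/api/v1/admin/books/" + bookId + "/enrich"))
                .header("Authorization", "Basic " + token)
                .POST(HttpRequest.BodyPublishers.noBody()) // assumption: no request body
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build("http://localhost:8080", UUID.randomUUID(), "user", "pass");
        System.out.println(request.method() + " " + request.uri());
        // To actually send it:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```

Because the endpoint is described as idempotent, a client can safely retry this request after a timeout.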
 ## Marker API Response Structure
 The PDF parsing pipeline calls a local [Marker](https://github.com/VikParuchuri/marker) server (`POST /marker/upload`).
 ### Top-level envelope
 ```json
 {
  "format": "json",
  "output": "<JSON-encoded string>"
 }
 ```
 `output` is a **JSON-encoded string** (not a nested object) and must be parsed a second time to get the document tree.
 ### Parsed `output` shape
 ```
 {
  "children": [ <Page block>, ... ]
 }
 ```
 ### Block types
 Every block shares these fields:
 | Field            | Type              | Notes                                      |
 |------------------|-------------------|--------------------------------------------|
 | `id`             | string            | e.g. `/page/0/Picture/2`                   |
 | `block_type`     | string            | see table below                            |
 | `html`           | string            | rendered HTML; may contain `<content-ref>` |
 | `bbox`           | `[x0,y0,x1,y1]`  | bounding box in page coordinates           |
 | `children`       | array or null     | nested blocks                              |
 | `images`         | object or null    | base64 PNG map (leaf image blocks only)    |
 | `section_hierarchy` | object         | heading ancestry                           |
 #### Known `block_type` values
 | block_type       | Category | Notes                                                 |
 |------------------|----------|-------------------------------------------------------|
 | `Page`           | structure | Top-level; direct children are the page content       |
 | `SectionHeader`  | text      | Section / chapter heading                             |
 | `Text`           | text      |                                                       |
 | `TextInlineMath` | text      |                                                       |
 | `ListItem`       | text      |                                                       |
 | `Table`          | text      |                                                       |
 | `Code`           | text      |                                                       |
 | `Equation`       | text      |                                                       |
 | `Footnote`       | text      |                                                       |
 | `Caption`        | text      | Usually a child of a `*Group` block                   |
 | `PageHeader`     | text      |                                                       |
 | `PageFooter`     | text      |                                                       |
 | `Handwriting`    | text      |                                                       |
 | `Picture`        | image     | Leaf block; `images` map holds base64 PNG keyed by ID |
 | `Figure`         | image     | Leaf block; same as `Picture`                         |
 | `PictureGroup`   | container | Wraps one `Picture` + one `Caption` child             |
 | `FigureGroup`    | container | Wraps one `Figure` + one `Caption` child              |
 ### Image extraction
 Images are only present on **leaf** image blocks (`Picture`, `Figure`).
 Group blocks (`PictureGroup`, `FigureGroup`) have `images: null` — the base64 PNG lives on the child leaf block.
 ```
 PictureGroup
 ├── Picture   ← images: { "/page/0/Picture/2": "<base64 PNG>" }
 └── Caption   ← html: "<p>Figure 1 — ...</p>"
 ```
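A minimal sketch of walking the parsed tree and collecting images from leaf blocks only. It assumes the `output` string has already been parsed into nested `Map`/`List` values (for example by a JSON library); `MarkerImageCollector` is a hypothetical name, not a class from this repository.

```java
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; collects base64 images from Picture/Figure leaf blocks. */
final class MarkerImageCollector {

    private static final Set<String> LEAF_IMAGE_TYPES = Set.of("Picture", "Figure");

    private MarkerImageCollector() {}

    /** Returns block id to decoded image bytes, in document order. */
    static Map<String, byte[]> collect(Map<String, Object> root) {
        Map<String, byte[]> found = new LinkedHashMap<>();
        walk(root, found);
        return found;
    }

    @SuppressWarnings("unchecked")
    private static void walk(Map<String, Object> block, Map<String, byte[]> found) {
        // Group blocks carry images: null, so only leaf image blocks are read.
        if (block.get("block_type") instanceof String type
                && LEAF_IMAGE_TYPES.contains(type)
                && block.get("images") instanceof Map<?, ?> images) {
            images.forEach((id, base64) ->
                    found.put((String) id, Base64.getDecoder().decode((String) base64)));
        }
        // Recurse into children; the field may be absent or null on leaves.
        if (block.get("children") instanceof List<?> children) {
            for (Object child : children) {
                walk((Map<String, Object>) child, found);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> picture = Map.of(
                "block_type", "Picture",
                "images", Map.of("/page/0/Picture/2", "AQID")); // placeholder bytes, not a real PNG
        Map<String, Object> group = Map.of(
                "block_type", "PictureGroup",
                "children", List.of(picture));
        System.out.println(collect(Map.of("children", List.of(group))).keySet());
    }
}
```

Keying the result by block id makes it straightforward to match each image with the `Caption` sibling inside the same group block afterwards.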
 ## Stack
- **Backend**: Spring Boot 4.0.5 + Spring AI 2.0.0-M4, Java 21, Maven
+- **Backend**: Spring Boot 4.0.5 + Spring AI 2.0.0-M4, Java 25, Maven
 - **Frontend**: Vue.js 3 + Vite + TypeScript + Pinia + Axios
 - **Database**: PostgreSQL 16 + pgvector extension
 - **Auth**: HTTP Basic (single shared in-memory user)
@@ -63,7 +192,7 @@ graph TD
 See [specs/001-neuro-rag-learning/quickstart.md](specs/001-neuro-rag-learning/quickstart.md) for full instructions.
-### Local Dev
+### Local Dev (JVM)
 ```bash
 # Start the database
@@ -79,8 +208,99 @@ npm install
 npm run dev
 ```
 ### Native Image Build
 Produces a GraalVM native binary packaged into a minimal Docker image via Jib.
 **Prerequisite**: GraalVM 25 must be installed and set as `JAVA_HOME`.
 ```bash
 # Install GraalVM 25 CE via sdkman (one-time)
 sdk install java 25-graalce
 sdk use java 25-graalce
 # Build native executable + container image (jib:build pushes straight to the registry; no Docker daemon needed)
 cd backend
 mvn -Pnative package jib:build -DskipTests
 mvn -Pnative jib:build -Djib.to.auth.username=admin -Djib.to.auth.password=""
 ```
 ### Backend build (buildah)
 **JVM image** (`Dockerfile` — Eclipse Temurin 21):
 ```bash
 buildah build \
  --platform linux/arm64 \
  --tag zot.immich-ad.ovh/ai-teacher-backend:latest \
  backend/
 buildah login zot.immich-ad.ovh
 buildah push --tls-verify=false zot.immich-ad.ovh/ai-teacher-backend:latest
 ```
 **Native image** (`Dockerfile.native` — GraalVM 25, produces a minimal Debian-slim image):
 ```bash
 buildah build \
  --platform linux/arm64 \
  --file backend/Dockerfile.native \
  --tag zot.immich-ad.ovh/ai-teacher-backend-native:latest \
  backend/
 buildah push --tls-verify=false zot.immich-ad.ovh/ai-teacher-backend-native:latest
 ```
 ### Frontend build
 ```bash
 buildah build \
  --platform linux/arm64 \
  --tag zot.immich-ad.ovh/ai-teacher-frontend:latest \
  frontend/
 buildah login zot.immich-ad.ovh
 ```
 Push to the private repository:
 ```bash
 buildah push --tls-verify=false zot.immich-ad.ovh/ai-teacher-frontend:latest
 ```
 ### Run Native Stack (Docker Compose)
 ```bash
 # Copy and fill in secrets
 cp .env.example .env
 # edit .env — add OPENAI_API_KEY at minimum
 # Start PostgreSQL + native backend
 docker compose -f docker-compose.native.yml up
 ```
 App available at `http://localhost:8080`.
 ### Build Pipeline (Native)
 ```mermaid
 graph LR
    SRC["Source Code\n(Java 25)"]
    AOT["Spring Boot AOT\n(process-aot)"]
    NI["GraalVM native-image\n(native-maven-plugin)"]
    EXE["Native Executable\ntarget/ai-teacher-backend"]
    JIB["Jib\n(jib-native-image-extension)"]
    IMG["Docker Image\nai-teacher-backend:latest\n(distroless base)"]
    SRC --> AOT
    AOT --> NI
    NI --> EXE
    EXE --> JIB
    JIB --> IMG
 ```
 ### Environment Variables
 #### Backend
 | Variable | Required | Description |
 |----------|----------|-------------|
 | `OPENAI_API_KEY` | Yes | OpenAI API key for embeddings and chat |
@@ -89,3 +309,14 @@ npm run dev
 | `DB_USERNAME` | Yes | Database username |
 | `DB_PASSWORD` | Yes | Database password |
 | `FIGURE_STORAGE_PATH` | No | Base path for uploaded PDFs and extracted figures (default: `./uploads`) |
 | `UPLOAD_ENABLED` | No | Set to `false` to disable the book upload endpoint (default: `true`) |
 | `DELETE_ENABLED` | No | Set to `false` to disable the book delete endpoint (default: `true`) |
 #### Frontend
 | Variable | Required | Description |
 |----------|----------|-------------|
 | `VITE_API_URL` | No | Backend API base URL (default: `/api/v1`) |
 | `VITE_APP_PASSWORD` | Yes | Shared password for HTTP Basic auth (must match `APP_PASSWORD`) |
 | `VITE_UPLOAD_ENABLED` | No | Set to `false` to hide the upload UI (default: `true`) |
 | `VITE_DELETE_ENABLED` | No | Set to `false` to hide the delete button (default: `true`) |
@@ -0,0 +1,24 @@
 # Java build artifacts
 target/
 *.class
 *.jar
 # Git
 .git/
 .gitignore
 # Editor
 .vscode/
 .idea/
 *.iml
 # OS
 .DS_Store
 Thumbs.db
 # Logs
 *.log
 # Environment
 .env
 .env.*
@@ -0,0 +1,25 @@
 # ---- Pull Maven from its official image (avoids microdnf under QEMU) ----
 FROM docker.io/library/maven:3.9.9-eclipse-temurin-21 AS maven-dist
 # ---- Build stage: GraalVM 25 + Maven ----
 FROM ghcr.io/graalvm/native-image-community:25 AS build
 # Copy Maven from the official Maven image — no package installation needed
 COPY --from=maven-dist /usr/share/maven /opt/maven
 ENV PATH="/opt/maven/bin:$PATH"
 WORKDIR /app
 # Cache dependency resolution separately from source compilation
 COPY pom.xml .
 RUN mvn -Pnative dependency:resolve dependency:resolve-plugins -q
 # Build native executable
 COPY src ./src
 RUN mvn -Pnative package -DskipTests
 # ---- Runtime stage: slim Debian with glibc + libz (required by GraalVM native binary) ----
 FROM docker.io/library/debian:12-slim
 COPY --from=build /app/target/ai-teacher-backend /app/ai-teacher-backend
 EXPOSE 8080
 ENTRYPOINT ["/app/ai-teacher-backend"]
@@ -32,6 +32,13 @@
        <type>pom</type>
        <scope>import</scope>
      </dependency>
      <dependency>
        <groupId>software.amazon.awssdk</groupId>
        <artifactId>bom</artifactId>
        <version>2.30.14</version>
        <type>pom</type>
        <scope>import</scope>
      </dependency>
    </dependencies>
  </dependencyManagement>
@@ -101,13 +108,19 @@
      <artifactId>spring-ai-pdf-document-reader</artifactId>
    </dependency>
-    <!-- PDFBox — explicit for image extraction per page -->
+    <!-- PDFBox — page rendering and cropping for figure extraction -->
    <dependency>
      <groupId>org.apache.pdfbox</groupId>
      <artifactId>pdfbox</artifactId>
      <version>3.0.3</version>
    </dependency>
    <!-- AWS SDK v2 — S3 figure storage -->
    <dependency>
      <groupId>software.amazon.awssdk</groupId>
      <artifactId>s3</artifactId>
    </dependency>
    <!-- Jackson (JSON) -->
    <dependency>
      <groupId>com.fasterxml.jackson.core</groupId>
@@ -127,15 +140,119 @@
    </dependency>
  </dependencies>
  <build>
    <plugins>
      <plugin>
        <groupId>org.graalvm.buildtools</groupId>
        <artifactId>native-maven-plugin</artifactId>
      </plugin>
      <plugin>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-maven-plugin</artifactId>
      </plugin>
      <!-- Jib — package native executable (or fat-jar) into Docker image -->
      <plugin>
        <groupId>com.google.cloud.tools</groupId>
        <artifactId>jib-maven-plugin</artifactId>
        <version>3.5.1</version>
        <configuration>
          <from>
            <!-- distroless glibc base — includes libz + libssl needed by GraalVM native binary -->
            <image>gcr.io/distroless/base-debian12</image>
          </from>
          <to>
            <image>zot.immich-ad.ovh/ai-teacher-backend</image>
            <tags>
              <tag>latest</tag>
            </tags>
          </to>
          <container>
            <format>OCI</format>
            <ports>
              <port>8080</port>
            </ports>
            <!-- invoke the native binary directly — no JVM -->
            <entrypoint>
              <arg>/app/ai-teacher-backend</arg>
            </entrypoint>
          </container>
          <!-- copy the GraalVM-compiled binary from target/ into /app/ -->
          <extraDirectories>
            <paths>
              <path>
                <from>${project.build.directory}</from>
                <into>/app</into>
                <includes>ai-teacher-backend</includes>
              </path>
            </paths>
            <permissions>
              <permission>
                <file>/app/ai-teacher-backend</file>
                <mode>755</mode>
              </permission>
            </permissions>
          </extraDirectories>
        </configuration>
      </plugin>
    </plugins>
  </build>
  <profiles>
    <profile>
      <id>native</id>
      <build>
        <plugins>
          <!-- skip jib in native builds — use Dockerfile.native + buildah instead -->
          <plugin>
            <groupId>com.google.cloud.tools</groupId>
            <artifactId>jib-maven-plugin</artifactId>
            <configuration>
              <skip>true</skip>
            </configuration>
          </plugin>
          <!-- GraalVM native-image compilation -->
          <plugin>
            <groupId>org.graalvm.buildtools</groupId>
            <artifactId>native-maven-plugin</artifactId>
            <version>1.0.0</version>
            <executions>
              <execution>
                <id>add-reachability-metadata</id>
                <goals>
                  <goal>add-reachability-metadata</goal>
                </goals>
              </execution>
              <execution>
                <id>compile</id>
                <goals>
                  <goal>compile-no-fork</goal>
                </goals>
                <phase>package</phase>
              </execution>
            </executions>
            <configuration>
              <imageName>ai-teacher-backend</imageName>
              <buildArgs>
                <buildArg>--initialize-at-build-time=org.slf4j,ch.qos.logback</buildArg>
                <buildArg>-H:+ReportExceptionStackTraces</buildArg>
                <buildArg>--gc=serial</buildArg>
                <buildArg>-Os</buildArg>
                <buildArg>-H:+RemoveUnusedSymbols</buildArg>
                <buildArg>-H:-EnableLoggingFeature</buildArg>
                <buildArg>-R:MaxHeapSize=128m</buildArg>
                <buildArg>-R:MinHeapSize=32m</buildArg>
                <!-- Limit native-image compiler RAM (build time, not runtime) -->
                <buildArg>-J-Xmx8g</buildArg>
              </buildArgs>
            </configuration>
          </plugin>
        </plugins>
      </build>
    </profile>
  </profiles>
 </project>
@@ -1,11 +1,15 @@
 package com.aiteacher;
 import org.springframework.context.annotation.ImportRuntimeHints;
 import org.springframework.boot.SpringApplication;
 import org.springframework.boot.autoconfigure.SpringBootApplication;
 import org.springframework.scheduling.annotation.EnableAsync;
 import com.aiteacher.config.NativeHintsConfig;
@SpringBootApplication
@EnableAsync
@ImportRuntimeHints(NativeHintsConfig.class)
 public class AiTeacherApplication {
    public static void main(String[] args) {
@@ -0,0 +1,19 @@
 package com.aiteacher.auth;
 import org.springframework.http.ResponseEntity;
 import org.springframework.web.bind.annotation.GetMapping;
 import org.springframework.web.bind.annotation.RequestMapping;
 import org.springframework.web.bind.annotation.RestController;
 import java.security.Principal;
 import java.util.Map;
@RestController
@RequestMapping("/api/v1/auth")
 public class AuthController {
    @GetMapping("/check")
    public ResponseEntity<Map<String, String>> check(Principal principal) {
        return ResponseEntity.ok(Map.of("username", principal.getName()));
    }
 }
@@ -2,7 +2,10 @@ package com.aiteacher.book;
 import com.aiteacher.document.FigureEntity;
 import com.aiteacher.document.FigureRepository;
 import com.aiteacher.document.MarkdownStorageService;
 import org.springframework.beans.factory.annotation.Value;
 import org.springframework.http.HttpStatus;
 import org.springframework.http.MediaType;
 import org.springframework.http.ResponseEntity;
 import org.springframework.web.bind.annotation.*;
 import org.springframework.web.multipart.MultipartFile;
@@ -18,14 +21,24 @@ public class BookController {
    private final BookService bookService;
    private final FigureRepository figureRepository;
    private final MarkdownStorageService markdownStorageService;
-    public BookController(BookService bookService, FigureRepository figureRepository) {
+    @Value("${app.features.upload-enabled:true}")
    private boolean uploadEnabled;
    @Value("${app.features.delete-enabled:true}")
    private boolean deleteEnabled;
    public BookController(BookService bookService, FigureRepository figureRepository,
                          MarkdownStorageService markdownStorageService) {
        this.bookService = bookService;
        this.figureRepository = figureRepository;
        this.markdownStorageService = markdownStorageService;
    }
    @PostMapping(consumes = "multipart/form-data")
    public ResponseEntity<?> upload(@RequestParam("file") MultipartFile file) throws IOException {
        if (!uploadEnabled) return ResponseEntity.status(HttpStatus.METHOD_NOT_ALLOWED).build();
        Book book = bookService.upload(file);
        return ResponseEntity.status(HttpStatus.ACCEPTED).body(toSummaryResponse(book));
    }
@@ -46,6 +59,7 @@ public class BookController {
    @DeleteMapping("/{id}")
    public ResponseEntity<Void> delete(@PathVariable UUID id) {
        if (!deleteEnabled) return ResponseEntity.status(HttpStatus.METHOD_NOT_ALLOWED).build();
        bookService.delete(id);
        return ResponseEntity.noContent().build();
    }
@@ -59,6 +73,17 @@ public class BookController {
        ));
    }
    @GetMapping(value = "/{id}/pages/{pageNumber}/html", produces = MediaType.TEXT_HTML_VALUE)
    public ResponseEntity<String> getPageHtml(@PathVariable UUID id,
                                               @PathVariable int pageNumber) {
        bookService.getById(id); // 404 if not found
        try {
            return ResponseEntity.ok(markdownStorageService.getText(id, pageNumber));
        } catch (Exception e) {
            return ResponseEntity.notFound().build();
        }
    }
    @GetMapping("/{id}/figures")
    public ResponseEntity<List<FigureResponse>> figures(@PathVariable UUID id) {
        bookService.getById(id); // 404 if not found
@@ -1,7 +1,10 @@
 package com.aiteacher.book;
 import com.aiteacher.document.*;
 import com.aiteacher.enrichment.ChunkEnrichmentPipeline;
 import com.aiteacher.enrichment.ChunkMetadataRepository;
 import com.aiteacher.figure.FigureStorageService;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 import org.springframework.ai.document.Document;
@@ -23,13 +26,7 @@ public class BookEmbeddingService {
    private final VectorStore vectorStore;
    private final BookRepository bookRepository;
-
+    private final MarkerPageParser markerPageParser;
    @Value("${app.embedding.batch-size:50}")
    private int embeddingBatchSize;
    @Value("${app.embedding.batch-delay-ms:1000}")
    private long embeddingBatchDelayMs;
    private final PdfStructureParser pdfStructureParser;
    private final FigureExtractionService figureExtractionService;
    private final VisionDescriptionService visionDescriptionService;
    private final TextChunkingService textChunkingService;
@@ -39,11 +36,23 @@ public class BookEmbeddingService {
    private final FigureRepository figureRepository;
    private final ChunkFigureRefRepository chunkFigureRefRepository;
    private final FigureStorageService figureStorageService;
    private final MarkdownStorageService markdownStorageService;
    private final ChunkEnrichmentPipeline chunkEnrichmentPipeline;
    private final ChunkMetadataRepository chunkMetadataRepository;
    @Value("${app.embedding.batch-size:50}")
    private int embeddingBatchSize;
    @Value("${app.embedding.batch-delay-ms:1000}")
    private long embeddingBatchDelayMs;
    @Value("${app.embedding.skip-embedding:false}")
    private boolean skipEmbedding;
    public BookEmbeddingService(
            VectorStore vectorStore,
            BookRepository bookRepository,
-            PdfStructureParser pdfStructureParser,
+            MarkerPageParser markerPageParser,
            FigureExtractionService figureExtractionService,
            VisionDescriptionService visionDescriptionService,
            TextChunkingService textChunkingService,
@@ -52,10 +61,13 @@ public class BookEmbeddingService {
            ChapterRepository chapterRepository,
            FigureRepository figureRepository,
            ChunkFigureRefRepository chunkFigureRefRepository,
-            FigureStorageService figureStorageService) {
+            FigureStorageService figureStorageService,
            MarkdownStorageService markdownStorageService,
            ChunkEnrichmentPipeline chunkEnrichmentPipeline,
            ChunkMetadataRepository chunkMetadataRepository) {
        this.vectorStore = vectorStore;
        this.bookRepository = bookRepository;
-        this.pdfStructureParser = pdfStructureParser;
+        this.markerPageParser = markerPageParser;
        this.figureExtractionService = figureExtractionService;
        this.visionDescriptionService = visionDescriptionService;
        this.textChunkingService = textChunkingService;
@@ -65,11 +77,14 @@ public class BookEmbeddingService {
        this.figureRepository = figureRepository;
        this.chunkFigureRefRepository = chunkFigureRefRepository;
        this.figureStorageService = figureStorageService;
        this.markdownStorageService = markdownStorageService;
        this.chunkEnrichmentPipeline = chunkEnrichmentPipeline;
        this.chunkMetadataRepository = chunkMetadataRepository;
    }
    @Async
    public void embedBook(UUID bookId, String bookTitle, Path pdfPath) {
-        log.info("Starting image-aware embedding for book {} ({})", bookId, bookTitle);
+        log.info("Starting Marker-powered embedding for book {} ({})", bookId, bookTitle);
        Book book = bookRepository.findById(bookId).orElse(null);
        if (book == null) {
@@ -81,68 +96,102 @@ public class BookEmbeddingService {
            book.setStatus(BookStatus.PROCESSING);
            bookRepository.save(book);
            // Step 1: Parse PDF into page-level sections persisted in Postgres
            List<SectionEntity> sections = pdfStructureParser.parse(bookId, bookTitle, pdfPath);
            String chapterId = bookId + "-ch1";
            ChapterEntity chapter = new ChapterEntity(chapterId, bookId, 1, bookTitle, 1);
            chapterRepository.save(chapter);
-            // Step 2: Build and embed text chunks for all sections in batches
+            // Step 1: Parse with Marker — split into 100-page chunks, then merge results
            ParsedBook parsed = markerPageParser.parse(pdfPath);
            List<PageResult> pageResults = parsed.pages();
            // Step 2: Build SectionEntity per page and persist
            List<SectionEntity> sections = buildAndSaveSections(bookId, bookTitle, chapterId, pageResults);
            // Step 3: Chunk and embed text
            List<Document> allChunks = new ArrayList<>();
            for (SectionEntity section : sections) {
-                List<Document> chunks = textChunkingService.chunk(section, bookTitle);
+                allChunks.addAll(textChunkingService.chunk(section, bookTitle));
-                allChunks.addAll(chunks);
+            }
            if (skipEmbedding) {
                log.info("skip-embedding=true — skipping text embedding for book {}", bookId);
            } else {
                embedInBatches(allChunks, bookId);
                log.info("Embedded {} text chunks for book {}", allChunks.size(), bookId);
                Map<String, SectionEntity> sectionsById = new HashMap<>();
                for (SectionEntity s : sections) sectionsById.put(s.getId(), s);
                try {
                    chunkEnrichmentPipeline.enrichAndPersist(allChunks, sectionsById, bookTitle);
                } catch (Exception ex) {
                    log.warn("Chunk enrichment failed for book {} — backfill endpoint can recover: {}",
                        bookId, ex.getMessage());
                }
            }
            embedInBatches(allChunks, bookId);
            log.info("Embedded {} text chunks for book {}", allChunks.size(), bookId);
-            // Step 3: Extract images from the PDF, save to file store, persist FigureEntity
+            // Step 4: Decode pre-cropped figures from Marker output
-            List<FigureEntity> figures = figureExtractionService.extract(
+            FigureExtractionService.ExtractionResult extraction =
-                bookId, chapterId, sections, pdfPath);
+                    figureExtractionService.extract(bookId, chapterId, pageResults);
            List<FigureEntity> figures = extraction.figures();
            // Step 4b: Save per-page HTML to S3, replacing Marker image src with API URLs
            parsed.htmlByPage().forEach((pageNumber, html) -> {
                String resolved = resolveImageSrcs(html, bookId, extraction.blockIdToFigureId());
                markdownStorageService.save(bookId, pageNumber, resolved);
            });
            log.info("Saved {} HTML pages to S3 for book {}", parsed.htmlByPage().size(), bookId);
            // Step 5: Vision analysis (description + visible text) → embed figure chunks
            Map<String, SectionEntity> sectionById = new HashMap<>();
            for (SectionEntity s : sections) sectionById.put(s.getId(), s);
            // Step 4: For each figure, generate vision description and embed caption
            for (FigureEntity figure : figures) {
-                Path imagePath = figureStorageService.resolve(figure.getImagePath());
+                // Prefer caption extracted from the linked section's full text
                String description = visionDescriptionService.describe(
                    imagePath, figure.getCaption());
                // Use description as caption fallback if no caption was detected
                if (figure.getCaption() == null || figure.getCaption().isBlank()) {
-                    figure.setCaption(description);
+                    String sectionCaption = extractCaptionFromSection(sectionById.get(figure.getSectionId()));
-                    figureRepository.save(figure);
+                    if (sectionCaption != null) {
                        figure.setCaption(sectionCaption);
                        figureRepository.save(figure);
                    } else {
                        byte[] imageBytes = figureStorageService.getBytes(figure.getImagePath());
                        VisionDescriptionService.ImageAnalysis analysis =
                                visionDescriptionService.analyze(imageBytes, figure.getCaption());
                        figure.setCaption(analysis.description());
                        figureRepository.save(figure);
                    }
                }
-                // Content for embedding = vision description + caption for maximum signal
+                // Embedding content: description
-                String embeddingContent = description
+                String embeddingContent = (figure.getCaption() != null ? "\n" + figure.getCaption() : "");
                    + (figure.getCaption() != null ? "\n" + figure.getCaption() : "");
                String embeddingId = UUID.randomUUID().toString();
-                Map<String, Object> metadata = buildFigureMetadata(figure, bookTitle, embeddingId);
+                if (!skipEmbedding) {
-                Document figureDoc = new Document(embeddingId, embeddingContent, metadata);
+                    Document figureDoc = new Document(embeddingId, embeddingContent,
-                vectorStore.add(List.of(figureDoc));
+                            buildFigureMetadata(figure, bookTitle, embeddingId, ""));
-
+                    vectorStore.add(List.of(figureDoc));
-                figure.setCaptionEmbeddingId(UUID.fromString(embeddingId));
+                    figure.setCaptionEmbeddingId(UUID.fromString(embeddingId));
                }
                figureRepository.save(figure);
            }
-            log.info("Embedded {} figure captions for book {}", figures.size(), bookId);
+            log.info("Embedded {} figure chunks for book {}", figures.size(), bookId);
-            // Step 5: Link text chunks to figures via text references
+            // Step 6: Link text chunks to figures via in-text references
            for (SectionEntity section : sections) {
                List<Document> sectionChunks = allChunks.stream()
-                    .filter(d -> section.getId().equals(d.getMetadata().get("section_id")))
+                        .filter(d -> section.getId().equals(d.getMetadata().get("section_id")))
-                    .toList();
+                        .toList();
                List<FigureEntity> sectionFigures = figures.stream()
-                    .filter(f -> section.getId().equals(f.getSectionId()))
+                        .filter(f -> section.getId().equals(f.getSectionId()))
-                    .toList();
+                        .toList();
-                chunkFigureRefService.linkChunksToFigures(
-                    sectionChunks, sectionFigures, section.getPageStart());
+                chunkFigureRefService.linkChunksToFigures(sectionChunks, sectionFigures, section.getPageStart());
            }
            book.setStatus(BookStatus.READY);
-            book.setPageCount(sections.size());
+            book.setPageCount(parsed.htmlByPage().size());
            book.setProcessedAt(Instant.now());
            bookRepository.save(book);
            log.info("Finished embedding book {} — {} pages, {} figures",
-                bookId, sections.size(), figures.size());
+                    bookId, sections.size(), figures.size());
        } catch (Exception ex) {
            log.error("Failed to embed book {}", bookId, ex);
@@ -156,53 +205,65 @@ public class BookEmbeddingService {
    public void deleteBookChunks(UUID bookId) {
        log.info("Deleting all data for book {}", bookId);
        try {
            // Delete chunk-figure refs (by figureId for this book)
            List<String> figureIds = figureRepository.findAllByBookId(bookId)
-                .stream().map(FigureEntity::getId).toList();
+                    .stream().map(FigureEntity::getId).toList();
            if (!figureIds.isEmpty()) {
                chunkFigureRefRepository.deleteByFigureIdIn(figureIds);
            }
            // Delete figures from Postgres
            figureRepository.deleteAllByBookId(bookId);
            // Delete figure files from disk
            figureStorageService.deleteAll(bookId);
-
+            markdownStorageService.deleteAll(bookId);
            // Delete sections and chapters from Postgres
            sectionRepository.deleteAllByBookId(bookId);
            chapterRepository.deleteAllByBookId(bookId);
-            // Delete vector store entries (text chunks + figure embeddings)
+            chunkMetadataRepository.deleteByBookId(bookId);
            FilterExpressionBuilder b = new FilterExpressionBuilder();
            vectorStore.delete(b.eq("book_id", bookId.toString()).build());
        } catch (Exception ex) {
            log.warn("Error during cleanup for book {}: {}", bookId, ex.getMessage());
        }
    }
    // --- Private helpers ---
    private List<SectionEntity> buildAndSaveSections(UUID bookId, String bookTitle,
                                                      String chapterId,
                                                      List<PageResult> pageResults) {
        List<SectionEntity> sections = new ArrayList<>();
        for (PageResult page : pageResults) {
            if (page.orderedText().isBlank()) continue;
            String sectionId = bookId + "-p" + page.pageNumber();
            String title = truncate(page.headingTitle() != null ? page.headingTitle() : "Page " + page.pageNumber(), 500);
            SectionEntity section = new SectionEntity(
                    sectionId, chapterId, bookId,
                    String.valueOf(page.pageNumber()),
                    title,
                    page.pageNumber(), page.pageNumber(),
                    page.orderedText());
            sections.add(sectionRepository.save(section));
        }
        return sections;
    }
    private void embedInBatches(List<Document> docs, UUID bookId) {
        int total = docs.size();
        for (int i = 0; i < total; i += embeddingBatchSize) {
            List<Document> batch = docs.subList(i, Math.min(i + embeddingBatchSize, total));
            vectorStore.add(batch);
-            int batchNum = i / embeddingBatchSize + 1;
-            int totalBatches = (total - 1) / embeddingBatchSize + 1;
-            log.debug("Embedded batch {}/{} for book {}", batchNum, totalBatches, bookId);
+            log.debug("Embedded batch {}/{} for book {}",
+                    i / embeddingBatchSize + 1, (total - 1) / embeddingBatchSize + 1, bookId);
            if (i + embeddingBatchSize < total) {
-                try {
-                    Thread.sleep(embeddingBatchDelayMs);
-                } catch (InterruptedException e) {
-                    Thread.currentThread().interrupt();
-                    log.warn("Embedding batch sleep interrupted for book {}", bookId);
-                }
+                try { Thread.sleep(embeddingBatchDelayMs); }
+                catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            }
        }
    }
    private Map<String, Object> buildFigureMetadata(FigureEntity figure, String bookTitle,
-                                                     String embeddingId) {
+                                                     String embeddingId, String imageText) {
        Map<String, Object> m = new HashMap<>();
        m.put("type", "FIGURE");
        m.put("book_id", figure.getBookId().toString());
@@ -215,9 +276,37 @@ public class BookEmbeddingService {
        m.put("label", figure.getLabel() != null ? figure.getLabel() : "");
        m.put("page", figure.getPage());
        m.put("embedding_id", embeddingId);
        m.put("image_text", imageText);  // verbatim text visible inside the image
        return m;
    }
    /**
     * Replaces Marker's {@code src='{blockId}'} image attributes with resolved API URLs.
     * Block IDs look like {@code /page/0/Figure/2}.
     */
    private String resolveImageSrcs(String html, UUID bookId, Map<String, String> blockIdToFigureId) {
        for (Map.Entry<String, String> entry : blockIdToFigureId.entrySet()) {
            String blockId = entry.getKey();
            String figureId = entry.getValue();
            String apiUrl = "/api/v1/figures/" + bookId + "/" + figureId + ".png";
            // Marker emits both single and double-quoted src attributes
            html = html.replace("src='" + blockId + "'", "src='" + apiUrl + "'");
            html = html.replace("src=\"" + blockId + "\"", "src=\"" + apiUrl + "\"");
        }
        return html;
    }
    private String extractCaptionFromSection(SectionEntity section) {
        if (section == null) return null;
        for (String line : section.getFullText().split("\n")) {
            String trimmed = line.strip();
            if (trimmed.startsWith("Fig.") || trimmed.startsWith("Figure") || trimmed.startsWith("Algorithm")) {
                return trimmed;
            }
        }
        return null;
    }
    private String truncate(String msg, int max) {
        if (msg == null) return null;
        return msg.length() <= max ? msg : msg.substring(0, max);
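
The `resolveImageSrcs` helper added in this file matches each block ID together with its closing quote, so a shorter ID such as `/page/0/Figure/2` should not clobber a longer one such as `/page/0/Figure/21`. A minimal standalone sketch of that behaviour follows; the class name `ImageSrcSketch`, the fixed book UUID, and the figure IDs `fig-a` and `fig-b` are assumptions made for illustration and do not come from the commit.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

class ImageSrcSketch {

    // Same replacement logic as resolveImageSrcs in the diff, made static
    // so it can run outside the Spring service (illustrative copy only).
    static String resolveImageSrcs(String html, UUID bookId, Map<String, String> blockIdToFigureId) {
        for (Map.Entry<String, String> entry : blockIdToFigureId.entrySet()) {
            String apiUrl = "/api/v1/figures/" + bookId + "/" + entry.getValue() + ".png";
            // The search string includes both quotes, so only an exact
            // attribute value matches; prefixes of longer IDs are left alone.
            html = html.replace("src='" + entry.getKey() + "'", "src='" + apiUrl + "'");
            html = html.replace("src=\"" + entry.getKey() + "\"", "src=\"" + apiUrl + "\"");
        }
        return html;
    }

    public static void main(String[] args) {
        // Hypothetical inputs: one single-quoted and one double-quoted src.
        UUID bookId = UUID.fromString("00000000-0000-0000-0000-000000000001");
        Map<String, String> ids = new LinkedHashMap<>();
        ids.put("/page/0/Figure/2", "fig-a");
        ids.put("/page/0/Figure/21", "fig-b");
        String html = "<img src='/page/0/Figure/2'><img src=\"/page/0/Figure/21\">";
        System.out.println(resolveImageSrcs(html, bookId, ids));
    }
}
```

Each `src` keeps its original quote style after the rewrite. An unquoted `src` attribute would not be matched by either replacement.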
@@ -5,10 +5,11 @@ import com.aiteacher.book.BookStatus;
 import com.aiteacher.book.NoKnowledgeSourceException;
 import com.aiteacher.document.FigureEntity;
 import com.aiteacher.document.SectionEntity;
 import com.aiteacher.retrieval.CitationValidatorService;
 import com.aiteacher.retrieval.LabelledContext;
 import com.aiteacher.retrieval.NeurosurgeryRetriever;
 import com.aiteacher.retrieval.QueryExpansionService;
 import com.aiteacher.retrieval.RetrievalResult;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 import org.springframework.ai.chat.client.ChatClient;
 import org.springframework.stereotype.Service;
@@ -17,8 +18,6 @@ import java.util.*;
@Service
 public class ChatService {
    private static final Logger log = LoggerFactory.getLogger(ChatService.class);
    private static final String SYSTEM_PROMPT = """
        You are an expert neurosurgery educator assistant. Answer questions using the
        medical textbook content provided to you as context.
@@ -29,8 +28,8 @@ public class ChatService {
        - Build answers from what is present: procedures, conditions, techniques, and descriptions all contribute; combine them into a rich, structured response
        - Use clear structure: headings, bullet points, or numbered steps where appropriate to maximize clarity
        - Only say you cannot answer if the context is entirely unrelated to the question
-        - Cite sources for each major point (book title and page number from the context)
+        - Cite sources for each major claim using the reference labels from the context (e.g. [S1], [F2]). Prefer these labels over inventing page numbers, but you may also describe the source naturally if needed.
-        - When referencing diagrams or figures, cite them as [Fig. X, p.N]
+        - Figures (labeled [F1], [F2], etc.) are actual images and drawings from the textbook — they will be rendered as inline illustrations in your response. Use them actively to support your explanations: reference a figure when it visually demonstrates anatomy, a surgical step, or a clinical concept you are describing.
        - Maintain continuity with the conversation history
        - Never fabricate clinical information not present in the context
        """;
@@ -40,17 +39,23 @@ public class ChatService {
    private final ChatSessionRepository sessionRepository;
    private final MessageRepository messageRepository;
    private final NeurosurgeryRetriever retriever;
+    private final QueryExpansionService queryExpansionService;
+    private final CitationValidatorService citationValidatorService;
    public ChatService(ChatClient chatClient,
                       BookRepository bookRepository,
                       ChatSessionRepository sessionRepository,
                       MessageRepository messageRepository,
-                       NeurosurgeryRetriever retriever) {
+                       NeurosurgeryRetriever retriever,
+                       QueryExpansionService queryExpansionService,
+                       CitationValidatorService citationValidatorService) {
        this.chatClient = chatClient;
        this.bookRepository = bookRepository;
        this.sessionRepository = sessionRepository;
        this.messageRepository = messageRepository;
        this.retriever = retriever;
+        this.queryExpansionService = queryExpansionService;
+        this.citationValidatorService = citationValidatorService;
    }
    public ChatSession createSession(String topicId) {
@@ -85,25 +90,34 @@ public class ChatService {
        List<Message> history = messageRepository.findBySessionIdOrderByCreatedAtAsc(sessionId);
        String fullQuestion = buildQuestionWithHistory(history, userContent, session.getTopicId());
-        // Retrieve context from all ready books (aggregate across books)
+        // Expand only the current user question to clinical terminology for retrieval (US1).
+        // fullQuestion (which includes conversation history) is used for the LLM context prompt,
+        // but retrieval should be driven by a concise clinical rewrite of the actual question.
+        String retrievalQuery = queryExpansionService.expand(userContent).rewritten();
+        // Retrieve context from all ready books using the expanded query
        List<SectionEntity> allSections = new ArrayList<>();
        List<FigureEntity> allFigures = new ArrayList<>();
        for (com.aiteacher.book.Book book : readyBooks) {
-            RetrievalResult result = retriever.retrieve(fullQuestion, book.getId());
+            RetrievalResult result = retriever.retrieve(retrievalQuery, book.getId());
            allSections.addAll(result.parentSections());
            allFigures.addAll(result.figures());
        }
-        // Build LLM prompt with section full texts and figure references
-        String contextPrompt = buildContextPrompt(fullQuestion, allSections, allFigures);
-        String assistantContent = chatClient.prompt()
+        // Build labelled context prompt (US2): assigns [S1]/[F1] labels to each source
+        LabelledContext ctx = buildContextPrompt(fullQuestion, allSections, allFigures);
+        // Generate answer
+        String rawContent = chatClient.prompt()
            .system(SYSTEM_PROMPT)
-            .user(contextPrompt)
+            .user(ctx.promptText())
            .call()
            .content();
-        // Build sources list with TEXT and FIGURE entries
+        // Strip any citation labels not present in the retrieved context (US2)
+        String assistantContent = citationValidatorService.validate(rawContent, ctx.allLabels());
+        // Attach sources with their ref-labels for frontend traceability
        List<Map<String, Object>> sources = buildSources(allSections, allFigures);
        Message assistantMessage = new Message(sessionId, MessageRole.ASSISTANT, assistantContent);
@@ -126,51 +140,71 @@ public class ChatService {
    // Private helpers
    // -------------------------------------------------------------------------
-    private String buildContextPrompt(String question,
-                                      List<SectionEntity> sections,
-                                      List<FigureEntity> figures) {
+    /**
+     * Builds the LLM context prompt, tagging each section as [S1], [S2]… and
+     * each figure as [F1], [F2]… so the model can cite only known sources.
+     */
+    private LabelledContext buildContextPrompt(String question,
+                                               List<SectionEntity> sections,
+                                               List<FigureEntity> figures) {
+        Map<String, SectionEntity> sectionLabels = new LinkedHashMap<>();
+        Map<String, FigureEntity> figureLabels = new LinkedHashMap<>();
        StringBuilder sb = new StringBuilder();
        if (!sections.isEmpty()) {
            sb.append("CONTEXT:\n\n");
-            for (SectionEntity section : sections) {
-                sb.append("[").append(section.getTitle())
-                  .append(", p.").append(section.getPageStart()).append("]\n");
+            for (int i = 0; i < sections.size(); i++) {
+                SectionEntity section = sections.get(i);
+                String label = "S" + (i + 1);
+                sectionLabels.put(label, section);
+                sb.append("[").append(label).append("] ")
+                  .append(section.getTitle())
+                  .append(", p.").append(section.getPageStart()).append("\n");
                sb.append(section.getFullText()).append("\n\n");
            }
        }
        if (!figures.isEmpty()) {
            sb.append("AVAILABLE FIGURES:\n");
-            for (FigureEntity figure : figures) {
-                sb.append("- ").append(figure.getLabel() != null ? figure.getLabel() : "Figure")
+            for (int i = 0; i < figures.size(); i++) {
+                FigureEntity figure = figures.get(i);
+                String label = "F" + (i + 1);
+                figureLabels.put(label, figure);
+                sb.append("[").append(label).append("] ")
+                  .append(figure.getLabel() != null ? figure.getLabel() : "Figure")
                  .append(" (p.").append(figure.getPage()).append("): ")
                  .append(figure.getCaption() != null ? figure.getCaption() : "")
                  .append("\n");
            }
-            sb.append("\nWhen referencing diagrams, cite them as [Fig. X, p.N].\n\n");
+            sb.append("\nWhen referencing diagrams, use their label from the context (e.g. [F1]).\n\n");
        }
        sb.append("QUESTION:\n").append(question);
-        return sb.toString();
+        return new LabelledContext(sectionLabels, figureLabels, sb.toString());
    }
    private List<Map<String, Object>> buildSources(List<SectionEntity> sections,
                                                    List<FigureEntity> figures) {
        List<Map<String, Object>> sources = new ArrayList<>();
-        for (SectionEntity section : sections) {
+        for (int i = 0; i < sections.size(); i++) {
+            SectionEntity section = sections.get(i);
            Map<String, Object> source = new LinkedHashMap<>();
            source.put("type", "TEXT");
+            source.put("refLabel", "S" + (i + 1));
            source.put("bookId", section.getBookId());
            source.put("bookTitle", deriveTitleFromSection(section));
            source.put("page", section.getPageStart());
            source.put("chunkText", truncate(section.getFullText(), 500));
            sources.add(source);
        }
-        for (FigureEntity figure : figures) {
+        for (int i = 0; i < figures.size(); i++) {
+            FigureEntity figure = figures.get(i);
            Map<String, Object> source = new LinkedHashMap<>();
            source.put("type", "FIGURE");
+            source.put("refLabel", "F" + (i + 1));
            source.put("bookId", figure.getBookId());
            source.put("bookTitle", bookRepository.findById(figure.getBookId())
                .map(com.aiteacher.book.Book::getTitle).orElse("Book"));
            source.put("page", figure.getPage());
@@ -178,7 +212,6 @@ public class ChatService {
            source.put("label", figure.getLabel() != null ? figure.getLabel() : "");
            source.put("caption", figure.getCaption() != null ? figure.getCaption() : "");
            source.put("figureType", figure.getFigureType().name());
            // imageUrl assembled from relative path: figures/{bookId}/{filename}
            String filename = figure.getImagePath().substring(
                figure.getImagePath().lastIndexOf('/') + 1);
            source.put("imageUrl", "/api/v1/figures/" + figure.getBookId() + "/" + filename);
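
The `ChatService` changes above pass the raw model output through `citationValidatorService.validate(rawContent, ctx.allLabels())` so that labels not present in the retrieved context are stripped. The validator itself is not part of this excerpt, so the following is only a sketch of one way such a filter could work. The class `CitationStripSketch`, the method `stripUnknownLabels`, and the regular expression are all assumptions, not the commit's implementation.

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CitationStripSketch {

    // Matches single-label citations such as [S1] or [F12].
    private static final Pattern LABEL = Pattern.compile("\\[([SF]\\d+)\\]");

    // Hypothetical stand-in for CitationValidatorService.validate:
    // keeps citations whose label is known and drops all others.
    static String stripUnknownLabels(String answer, Set<String> knownLabels) {
        Matcher m = LABEL.matcher(answer);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String replacement = knownLabels.contains(m.group(1)) ? m.group() : "";
            // quoteReplacement guards against '$' or '\' in the kept text.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // [S9] was never handed to the model as context, so it is removed.
        String raw = "Retract the cerebellum [S1] as shown in [F2] and [S9].";
        System.out.println(stripUnknownLabels(raw, Set.of("S1", "F2")));
    }
}
```

This sketch removes only the bracketed label and does not tidy the surrounding whitespace or punctuation; a production validator may well do more.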
@@ -0,0 +1,52 @@
 package com.aiteacher.concept;
 import com.aiteacher.topic.Topic;
 import com.aiteacher.topic.TopicRepository;
 import org.springframework.http.ResponseEntity;
 import org.springframework.web.bind.annotation.*;
 import java.util.List;
 import java.util.Map;
 import java.util.NoSuchElementException;
 import java.util.UUID;
 import java.util.stream.Collectors;
@RestController
@RequestMapping("/api/v1/topics/{id}/concept-reports")
 public class ConceptReportController {
    private final TopicRepository topicRepository;
    private final ConceptReportService conceptReportService;
    public ConceptReportController(TopicRepository topicRepository,
                                    ConceptReportService conceptReportService) {
        this.topicRepository = topicRepository;
        this.conceptReportService = conceptReportService;
    }
    @PostMapping
    public ResponseEntity<ConceptReportResponse> generate(
            @PathVariable String id,
            @RequestParam(defaultValue = "en") String language) {
        Topic topic = topicRepository.findById(id)
            .orElseThrow(() -> new NoSuchElementException("Topic not found."));
        return ResponseEntity.ok(conceptReportService.generateReport(topic, language));
    }
    @GetMapping
    public ResponseEntity<List<SavedConceptReportItem>> list(@PathVariable String id) {
        topicRepository.findById(id)
            .orElseThrow(() -> new NoSuchElementException("Topic not found."));
        return ResponseEntity.ok(conceptReportService.listReports(id));
    }
    @GetMapping("/{reportId}")
    public ResponseEntity<ConceptReportResponse> get(@PathVariable String id,
                                                      @PathVariable UUID reportId) {
        topicRepository.findById(id)
            .orElseThrow(() -> new NoSuchElementException("Topic not found."));
        Map<String, String> topicNames = topicRepository.findAll().stream()
            .collect(Collectors.toMap(Topic::getId, Topic::getName, (a, b) -> a));
        return ResponseEntity.ok(conceptReportService.getReport(reportId, topicNames));
    }
 }
@@ -0,0 +1,48 @@
 package com.aiteacher.concept;
 import jakarta.persistence.*;
 import java.time.Instant;
 import java.util.UUID;
@Entity
@Table(name = "concept_report")
 public class ConceptReportEntity {
    @Id
    @GeneratedValue(strategy = GenerationType.UUID)
    private UUID id;
    @Column(name = "topic_id", nullable = false, length = 100)
    private String topicId;
    @Column(name = "report_number", nullable = false)
    private int reportNumber;
    @Column(name = "facets_json", nullable = false, columnDefinition = "TEXT")
    private String facetsJson;
    @Column(name = "sources_json", nullable = false, columnDefinition = "TEXT")
    private String sourcesJson;
    @Column(name = "generated_at", nullable = false)
    private Instant generatedAt;
    protected ConceptReportEntity() {}
    public ConceptReportEntity(String topicId, int reportNumber, String facetsJson,
                                String sourcesJson, Instant generatedAt) {
        this.topicId = topicId;
        this.reportNumber = reportNumber;
        this.facetsJson = facetsJson;
        this.sourcesJson = sourcesJson;
        this.generatedAt = generatedAt;
    }
    public UUID getId() { return id; }
    public String getTopicId() { return topicId; }
    public int getReportNumber() { return reportNumber; }
    public String getFacetsJson() { return facetsJson; }
    public String getSourcesJson() { return sourcesJson; }
    public Instant getGeneratedAt() { return generatedAt; }
 }
@@ -0,0 +1,13 @@
 package com.aiteacher.concept;
 import org.springframework.data.jpa.repository.JpaRepository;
 import org.springframework.stereotype.Repository;
 import java.util.List;
 import java.util.UUID;
@Repository
 public interface ConceptReportRepository extends JpaRepository<ConceptReportEntity, UUID> {
    long countByTopicId(String topicId);
    List<ConceptReportEntity> findByTopicIdOrderByReportNumberAsc(String topicId);
 }
@@ -0,0 +1,24 @@
 package com.aiteacher.concept;
 import com.aiteacher.topic.TopicSummaryResponse.SourceReference;
 import java.time.Instant;
 import java.util.List;
 import java.util.UUID;
 public record ConceptReportResponse(
    UUID id,
    int reportNumber,
    String topicId,
    String topicName,
    List<FacetSection> facets,
    List<SourceReference> sources,
    Instant generatedAt
 ) {
    public record FacetSection(
        String facetKey,
        String title,
        String markdown,
        List<String> refLabels
    ) {}
 }
@@ -0,0 +1,299 @@
 package com.aiteacher.concept;
 import com.aiteacher.book.Book;
 import com.aiteacher.book.BookRepository;
 import com.aiteacher.book.BookStatus;
 import com.aiteacher.book.NoKnowledgeSourceException;
 import com.aiteacher.document.FigureEntity;
 import com.aiteacher.document.SectionEntity;
 import com.aiteacher.enrichment.ConceptFacet;
 import com.aiteacher.topic.Topic;
 import com.aiteacher.topic.TopicSummaryResponse.SourceReference;
 import com.fasterxml.jackson.core.JsonProcessingException;
 import com.fasterxml.jackson.databind.ObjectMapper;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 import org.springframework.ai.chat.client.ChatClient;
 import org.springframework.stereotype.Service;
 import java.time.Instant;
 import java.util.*;
@Service
 public class ConceptReportService {
    private static final Logger log = LoggerFactory.getLogger(ConceptReportService.class);
    private static final String SYSTEM_PROMPT = """
        You are an expert neurosurgery educator. You write focused, facet-specific sections of
        a structured concept report for highly experienced neurosurgeons. The audience wants
        concise, clinically relevant teaching.
        When writing a facet section:
        - Stick strictly to the facet you are asked about (e.g. definition, complications).
        - Cite claims using ONLY the reference labels provided in the context.
          Do not invent page numbers, section titles, or labels not present in CONTEXT.
        - Citation format: each citation must be a SINGLE label per bracket — write `[S1], [S2]` or
          `[S3] [F2]`. NEVER combine labels inside one bracket (no `[S1 S2]`, `[S1, S2]`, `[S1 2]`).
        - Figures ([F#]) are actual images that will be rendered inline — reference them when they
          visually support your explanation.
        - If CONTEXT is insufficient for the requested facet, write exactly:
          "The uploaded books do not contain sufficient information on this aspect."
        - Never hallucinate clinical information outside the provided context.
        """;
    private final ChatClient chatClient;
    private final BookRepository bookRepository;
    private final ConceptRetriever conceptRetriever;
    private final ConceptReportRepository reportRepository;
    private final ObjectMapper objectMapper;
    public ConceptReportService(ChatClient chatClient,
                                 BookRepository bookRepository,
                                 ConceptRetriever conceptRetriever,
                                 ConceptReportRepository reportRepository,
                                 ObjectMapper objectMapper) {
        this.chatClient = chatClient;
        this.bookRepository = bookRepository;
        this.conceptRetriever = conceptRetriever;
        this.reportRepository = reportRepository;
        this.objectMapper = objectMapper;
    }
    public ConceptReportResponse generateReport(Topic topic, String language) {
        List<Book> readyBooks = bookRepository.findAll().stream()
            .filter(b -> b.getStatus() == BookStatus.READY)
            .toList();
        if (readyBooks.isEmpty()) {
            throw new NoKnowledgeSourceException(
                "No books are available as knowledge sources. Please upload and process at least one book.");
        }
        Map<ConceptFacet, MergedFacet> merged = new EnumMap<>(ConceptFacet.class);
        for (Book book : readyBooks) {
            ConceptRetrievalResult result = conceptRetriever.retrieveByConcept(topic.getName(), book.getId());
            result.byFacet().forEach((facet, bundle) -> merged
                .computeIfAbsent(facet, k -> new MergedFacet())
                .add(bundle));
        }
        // Global, deduplicated sources across all facets
        List<SectionEntity> globalSections = new ArrayList<>();
        Set<String> seenSections = new LinkedHashSet<>();
        List<FigureEntity> globalFigures = new ArrayList<>();
        Set<String> seenFigures = new LinkedHashSet<>();
        for (MergedFacet mf : merged.values()) {
            for (SectionEntity s : mf.sections) if (seenSections.add(s.getId())) globalSections.add(s);
            for (FigureEntity f : mf.figures) if (seenFigures.add(f.getId())) globalFigures.add(f);
        }
        // Global label maps: section id -> "S#", figure id -> "F#"
        Map<String, String> sectionLabel = new HashMap<>();
        for (int i = 0; i < globalSections.size(); i++) {
            sectionLabel.put(globalSections.get(i).getId(), "S" + (i + 1));
        }
        Map<String, String> figureLabel = new HashMap<>();
        for (int i = 0; i < globalFigures.size(); i++) {
            figureLabel.put(globalFigures.get(i).getId(), "F" + (i + 1));
        }
        List<ConceptReportResponse.FacetSection> facetSections = new ArrayList<>();
        // Preserve enum declaration order for consistent UI rendering
        for (ConceptFacet facet : ConceptFacet.values()) {
            MergedFacet mf = merged.get(facet);
            if (mf == null || mf.isEmpty()) continue;
            if (facet == ConceptFacet.OTHER) continue; // skip OTHER bucket in the rendered report
            String prompt = buildFacetPrompt(topic, facet, mf, sectionLabel, figureLabel, language);
            String markdown = chatClient.prompt()
                .system(SYSTEM_PROMPT)
                .user(prompt)
                .call()
                .content();
            List<String> refs = collectRefs(mf, sectionLabel, figureLabel);
            facetSections.add(new ConceptReportResponse.FacetSection(
                facet.name(), facet.displayTitle(), markdown != null ? markdown : "", refs));
        }
        List<SourceReference> sources = buildSources(globalSections, globalFigures, readyBooks);
        Instant generatedAt = Instant.now();
        int reportNumber = (int) reportRepository.countByTopicId(topic.getId()) + 1;
        ConceptReportEntity entity = new ConceptReportEntity(
            topic.getId(), reportNumber,
            serialize(facetSections), serialize(sources), generatedAt);
        entity = reportRepository.save(entity);
        return new ConceptReportResponse(
            entity.getId(), reportNumber, topic.getId(), topic.getName(),
            facetSections, sources, generatedAt);
    }
    public List<SavedConceptReportItem> listReports(String topicId) {
        return reportRepository.findByTopicIdOrderByReportNumberAsc(topicId).stream()
            .map(e -> new SavedConceptReportItem(e.getId(), e.getReportNumber(), e.getGeneratedAt()))
            .toList();
    }
    public ConceptReportResponse getReport(UUID reportId, Map<String, String> topicNamesById) {
        ConceptReportEntity entity = reportRepository.findById(reportId)
            .orElseThrow(() -> new NoSuchElementException("Concept report not found."));
        List<ConceptReportResponse.FacetSection> facets = deserializeFacets(entity.getFacetsJson());
        List<SourceReference> sources = deserializeSources(entity.getSourcesJson());
        String topicName = topicNamesById.getOrDefault(entity.getTopicId(), entity.getTopicId());
        return new ConceptReportResponse(
            entity.getId(), entity.getReportNumber(), entity.getTopicId(), topicName,
            facets, sources, entity.getGeneratedAt());
    }
    private String buildFacetPrompt(Topic topic, ConceptFacet facet, MergedFacet mf,
                                     Map<String, String> sectionLabel,
                                     Map<String, String> figureLabel,
                                     String language) {
        StringBuilder sb = new StringBuilder();
        sb.append("CONCEPT: ").append(topic.getName()).append("\n");
        sb.append("FACET: ").append(facet.displayTitle()).append("\n\n");
        sb.append("CONTEXT:\n\n");
        for (SectionEntity s : mf.sections) {
            String label = sectionLabel.get(s.getId());
            sb.append("[").append(label).append("] ")
              .append(s.getTitle() != null ? s.getTitle() : "")
              .append(", p.").append(s.getPageStart()).append("\n");
            sb.append(s.getFullText()).append("\n\n");
        }
        if (!mf.figures.isEmpty()) {
            sb.append("AVAILABLE FIGURES:\n");
            for (FigureEntity f : mf.figures) {
                String label = figureLabel.get(f.getId());
                sb.append("[").append(label).append("] ")
                  .append(f.getLabel() != null ? f.getLabel() : "Figure")
                  .append(" (p.").append(f.getPage()).append("): ")
                  .append(f.getCaption() != null ? f.getCaption() : "")
                  .append("\n");
            }
            sb.append("\n");
        }
        sb.append("Write the ").append(facet.displayTitle()).append(" section of a concept report on \"")
          .append(topic.getName())
          .append("\". Stay strictly within this facet. Use the [S#]/[F#] labels above for citations.");
        if ("th".equalsIgnoreCase(language)) {
            sb.append("\n\nIMPORTANT: Write the narrative in Thai. ")
              .append("Keep all medical, anatomical, surgical, pharmacological, and clinical ")
              .append("terminology in English (e.g., cerebellopontine angle, glioblastoma, craniotomy, ")
              .append("dexamethasone). Do NOT translate disease names, anatomical structures, drug names, ")
              .append("procedures, eponyms, or imaging modalities. Translate only connective prose, ")
              .append("explanations, and general descriptions. Citation labels [S#]/[F#] stay unchanged. ")
              .append("The sentinel string for insufficient context must remain exactly: ")
              .append("\"The uploaded books do not contain sufficient information on this aspect.\"");
        }
        return sb.toString();
    }
    private List<String> collectRefs(MergedFacet mf,
                                      Map<String, String> sectionLabel,
                                      Map<String, String> figureLabel) {
        List<String> refs = new ArrayList<>();
        for (SectionEntity s : mf.sections) {
            String l = sectionLabel.get(s.getId());
            if (l != null) refs.add(l);
        }
        for (FigureEntity f : mf.figures) {
            String l = figureLabel.get(f.getId());
            if (l != null) refs.add(l);
        }
        return refs;
    }
    private List<SourceReference> buildSources(List<SectionEntity> sections,
                                                 List<FigureEntity> figures,
                                                 List<Book> readyBooks) {
        List<SourceReference> sources = new ArrayList<>();
        for (int i = 0; i < sections.size(); i++) {
            SectionEntity s = sections.get(i);
            Book book = findBook(readyBooks, s.getBookId());
            String title = book != null ? book.getTitle() : "Book";
            String bookId = book != null ? book.getId().toString() : null;
            sources.add(new SourceReference(
                "TEXT", "S" + (i + 1), bookId, title, s.getPageStart(),
                truncate(s.getFullText(), 500), null, null, null, null, null));
        }
        for (int i = 0; i < figures.size(); i++) {
            FigureEntity f = figures.get(i);
            Book book = findBook(readyBooks, f.getBookId());
            String title = book != null ? book.getTitle() : "Book";
            String bookId = book != null ? book.getId().toString() : null;
            String filename = f.getImagePath().substring(f.getImagePath().lastIndexOf('/') + 1);
            String imageUrl = "/api/v1/figures/" + f.getBookId() + "/" + filename;
            sources.add(new SourceReference(
                "FIGURE", "F" + (i + 1), bookId, title, f.getPage(),
                null, f.getId(), f.getLabel(), f.getCaption(),
                f.getFigureType().name(), imageUrl));
        }
        return sources;
    }
    private Book findBook(List<Book> books, UUID bookId) {
        return books.stream().filter(b -> b.getId().equals(bookId)).findFirst().orElse(null);
    }
    private String serialize(Object value) {
        try {
            return objectMapper.writeValueAsString(value);
        } catch (JsonProcessingException e) {
            log.warn("Failed to serialize concept report field", e);
            return "[]";
        }
    }
    private List<ConceptReportResponse.FacetSection> deserializeFacets(String json) {
        try {
            return objectMapper.readValue(json,
                objectMapper.getTypeFactory().constructCollectionType(
                    List.class, ConceptReportResponse.FacetSection.class));
        } catch (JsonProcessingException e) {
            log.warn("Failed to deserialize facets", e);
            return List.of();
        }
    }
    private List<SourceReference> deserializeSources(String json) {
        try {
            return objectMapper.readValue(json,
                objectMapper.getTypeFactory().constructCollectionType(
                    List.class, SourceReference.class));
        } catch (JsonProcessingException e) {
            log.warn("Failed to deserialize sources", e);
            return List.of();
        }
    }
    private String truncate(String text, int maxChars) {
        if (text == null) return "";
        return text.length() <= maxChars ? text : text.substring(0, maxChars) + "…";
    }
    private static class MergedFacet {
        final List<SectionEntity> sections = new ArrayList<>();
        final List<FigureEntity> figures = new ArrayList<>();
        final Set<String> sectionIds = new HashSet<>();
        final Set<String> figureIds = new HashSet<>();
        void add(FacetBundle bundle) {
            for (SectionEntity s : bundle.sections()) {
                if (sectionIds.add(s.getId())) sections.add(s);
            }
            for (FigureEntity f : bundle.figures()) {
                if (figureIds.add(f.getId())) figures.add(f);
            }
        }
        boolean isEmpty() { return sections.isEmpty() && figures.isEmpty(); }
    }
 }
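Review note: `MergedFacet.add` relies on `Set.add` returning `false` for an id that was already seen, so sections and figures keep their first-seen order across bundles. The sketch below isolates that pattern on plain string ids; the class and method names are hypothetical and not part of this change.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-alone illustration of the MergedFacet dedupe pattern.
final class MergeSketch {

    /** Appends ids from each batch in encounter order, skipping ids already seen. */
    static List<String> mergeDistinct(List<List<String>> batches) {
        List<String> merged = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (List<String> batch : batches) {
            for (String id : batch) {
                // Set.add returns false when the id is already present
                if (seen.add(id)) {
                    merged.add(id);
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // prints [s1, s2, s3]
        System.out.println(mergeDistinct(List.of(
                List.of("s1", "s2"),
                List.of("s2", "s3"))));
    }
}
```

Because labels such as `S1` and `F1` are derived from list position, keeping the first-seen order stable is what keeps citation labels stable between the prompt and the source list.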
@@ -0,0 +1,10 @@
 package com.aiteacher.concept;
 import com.aiteacher.enrichment.ConceptFacet;
 import java.util.Map;
 public record ConceptRetrievalResult(
    Map<ConceptFacet, FacetBundle> byFacet,
    boolean usedFallback
 ) {}
@@ -0,0 +1,163 @@
 package com.aiteacher.concept;
 import com.aiteacher.document.*;
 import com.aiteacher.enrichment.ChunkMetadataEntity;
 import com.aiteacher.enrichment.ChunkMetadataRepository;
 import com.aiteacher.enrichment.ConceptFacet;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 import org.springframework.ai.document.Document;
 import org.springframework.ai.vectorstore.SearchRequest;
 import org.springframework.ai.vectorstore.VectorStore;
 import org.springframework.ai.vectorstore.filter.FilterExpressionBuilder;
 import org.springframework.stereotype.Service;
 import java.util.*;
 import java.util.stream.Collectors;
@Service
 public class ConceptRetriever {
    private static final Logger log = LoggerFactory.getLogger(ConceptRetriever.class);
    private static final int FALLBACK_TOP_K = 30;
    private static final int FIGURE_TOP_K = 6;
    private final ChunkMetadataRepository metadataRepository;
    private final VectorStore vectorStore;
    private final SectionRepository sectionRepository;
    private final FigureRepository figureRepository;
    private final ChunkFigureRefRepository chunkFigureRefRepository;
    public ConceptRetriever(ChunkMetadataRepository metadataRepository,
                             VectorStore vectorStore,
                             SectionRepository sectionRepository,
                             FigureRepository figureRepository,
                             ChunkFigureRefRepository chunkFigureRefRepository) {
        this.metadataRepository = metadataRepository;
        this.vectorStore = vectorStore;
        this.sectionRepository = sectionRepository;
        this.figureRepository = figureRepository;
        this.chunkFigureRefRepository = chunkFigureRefRepository;
    }
    public ConceptRetrievalResult retrieveByConcept(String conceptKeyword, UUID bookId) {
        String canonical = canonicalise(conceptKeyword);
        List<ChunkMetadataEntity> hits = metadataRepository
            .findByBookIdAndEntityContains(bookId, canonical);
        boolean fallback = false;
        if (hits.isEmpty()) {
            log.debug("Entity match miss for '{}' in book {} — falling back to vector search", canonical, bookId);
            fallback = true;
            hits = vectorFallback(conceptKeyword, bookId);
        }
        if (hits.isEmpty()) {
            return new ConceptRetrievalResult(Map.of(), fallback);
        }
        List<FigureEntity> semanticFigures = semanticFigureSearch(conceptKeyword, bookId);
        Map<ConceptFacet, List<ChunkMetadataEntity>> grouped = hits.stream()
            .collect(Collectors.groupingBy(
                ChunkMetadataEntity::getFacet,
                LinkedHashMap::new,
                Collectors.toList()));
        Map<ConceptFacet, FacetBundle> result = new LinkedHashMap<>();
        for (Map.Entry<ConceptFacet, List<ChunkMetadataEntity>> entry : grouped.entrySet()) {
            result.put(entry.getKey(), hydrate(entry.getValue(), semanticFigures));
        }
        return new ConceptRetrievalResult(result, fallback);
    }
    private List<ChunkMetadataEntity> vectorFallback(String query, UUID bookId) {
        FilterExpressionBuilder b = new FilterExpressionBuilder();
        List<Document> textHits = vectorStore.similaritySearch(
            SearchRequest.builder()
                .query(query)
                .topK(FALLBACK_TOP_K)
                .filterExpression(b.and(
                    b.eq("type", "TEXT"),
                    b.eq("book_id", bookId.toString())
                ).build())
                .build()
        );
        List<UUID> chunkIds = textHits.stream()
            .map(d -> {
                try { return UUID.fromString(d.getId()); }
                catch (Exception e) { return null; }
            })
            .filter(Objects::nonNull)
            .toList();
        if (chunkIds.isEmpty()) return List.of();
        return metadataRepository.findByChunkIdIn(chunkIds);
    }
    private FacetBundle hydrate(List<ChunkMetadataEntity> chunks, List<FigureEntity> semanticFigures) {
        List<String> sectionIds = chunks.stream()
            .map(ChunkMetadataEntity::getSectionId)
            .distinct()
            .toList();
        List<SectionEntity> sections = sectionIds.isEmpty()
            ? List.of()
            : sectionRepository.findAllById(sectionIds);
        List<UUID> chunkIds = chunks.stream().map(ChunkMetadataEntity::getChunkId).toList();
        List<String> linkedFigureIds = chunkFigureRefRepository.findByChunkIdIn(chunkIds)
            .stream()
            .map(ChunkFigureRefEntity::getFigureId)
            .distinct()
            .toList();
        List<FigureEntity> linkedFigures = linkedFigureIds.isEmpty()
            ? List.of()
            : figureRepository.findAllById(linkedFigureIds);
        // Merge caption-semantic-search figures with chunk-linked figures (dedupe by id, linked first)
        Map<String, FigureEntity> merged = new LinkedHashMap<>();
        linkedFigures.forEach(f -> merged.put(f.getId(), f));
        semanticFigures.forEach(f -> merged.putIfAbsent(f.getId(), f));
        List<String> summaries = chunks.stream()
            .map(ChunkMetadataEntity::getSummary)
            .filter(s -> s != null && !s.isBlank())
            .distinct()
            .toList();
        return new FacetBundle(sections, new ArrayList<>(merged.values()), summaries);
    }
    private List<FigureEntity> semanticFigureSearch(String query, UUID bookId) {
        FilterExpressionBuilder b = new FilterExpressionBuilder();
        List<Document> figureHits = vectorStore.similaritySearch(
            SearchRequest.builder()
                .query(query)
                .topK(FIGURE_TOP_K)
                .filterExpression(b.and(
                    b.eq("type", "FIGURE"),
                    b.eq("book_id", bookId.toString())
                ).build())
                .build()
        );
        List<String> figureIds = figureHits.stream()
            .map(d -> (String) d.getMetadata().get("figure_id"))
            .filter(Objects::nonNull)
            .toList();
        return figureIds.isEmpty() ? List.of() : figureRepository.findAllById(figureIds);
    }
    static String canonicalise(String raw) {
        if (raw == null) return "";
        String s = raw.trim().toLowerCase(Locale.ROOT);
        if (s.endsWith("ies") && s.length() > 3) {
            s = s.substring(0, s.length() - 3) + "y";
        } else if (s.endsWith("es") && s.length() > 2) {
            s = s.substring(0, s.length() - 2);
        } else if (s.endsWith("s") && s.length() > 1 && !s.endsWith("ss")) {
            s = s.substring(0, s.length() - 1);
        }
        return s;
    }
 }
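Review note: the suffix rules in `canonicalise` are deliberately crude, and it may help to see what they do to typical terms. The sketch below copies the rules verbatim into a hypothetical stand-alone class. Note that the `es` branch also fires on words whose singular ends in `e` ("diseases" becomes "diseas", "meninges" becomes "mening"). Whether that is harmless depends on `findByBookIdAndEntityContains` doing a substring match, which is an assumption based on its name.

```java
import java.util.Locale;

// Hypothetical stand-alone copy of the suffix rules in ConceptRetriever.canonicalise.
final class CanonicaliseSketch {

    static String canonicalise(String raw) {
        if (raw == null) return "";
        String s = raw.trim().toLowerCase(Locale.ROOT);
        if (s.endsWith("ies") && s.length() > 3) {
            s = s.substring(0, s.length() - 3) + "y";   // biopsies -> biopsy
        } else if (s.endsWith("es") && s.length() > 2) {
            s = s.substring(0, s.length() - 2);         // abscesses -> abscess, diseases -> diseas
        } else if (s.endsWith("s") && s.length() > 1 && !s.endsWith("ss")) {
            s = s.substring(0, s.length() - 1);         // gliomas -> glioma
        }
        return s;
    }

    public static void main(String[] args) {
        for (String term : new String[] {"Gliomas", "Biopsies", "abscess", "Diseases", "Meninges"}) {
            System.out.println(term + " -> " + canonicalise(term));
        }
    }
}
```

If exact matching is ever needed, a small exception list or a proper stemmer would be a safer choice than extending these rules.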
@@ -0,0 +1,12 @@
 package com.aiteacher.concept;
 import com.aiteacher.document.FigureEntity;
 import com.aiteacher.document.SectionEntity;
 import java.util.List;
 public record FacetBundle(
    List<SectionEntity> sections,
    List<FigureEntity> figures,
    List<String> chunkSummaries
 ) {}
@@ -0,0 +1,10 @@
 package com.aiteacher.concept;
 import java.time.Instant;
 import java.util.UUID;
 public record SavedConceptReportItem(
    UUID id,
    int reportNumber,
    Instant generatedAt
 ) {}
@@ -1,25 +1,37 @@
 package com.aiteacher.config;
-import org.springframework.beans.factory.annotation.Value;
+import com.aiteacher.figure.FigureStorageService;
-import org.springframework.context.annotation.Configuration;
+import org.springframework.http.HttpStatus;
-import org.springframework.web.servlet.config.annotation.ResourceHandlerRegistry;
+import org.springframework.web.bind.annotation.*;
-import org.springframework.web.servlet.config.annotation.WebMvcConfigurer;
+import org.springframework.web.server.ResponseStatusException;
-import java.nio.file.Paths;
+import jakarta.servlet.http.HttpServletResponse;
 import java.io.IOException;
-@Configuration
+/**
-public class FigureStorageConfig implements WebMvcConfigurer {
+ * Serves figure images by redirecting to a presigned S3 URL.
 * The key stored in DB is the full S3 object key, e.g. "figures/{bookId}/{figureId}.png".
 */
@RestController
@RequestMapping("/api/v1/figures")
 public class FigureStorageConfig {
-    private final String basePath;
+    private final FigureStorageService figureStorageService;
-    public FigureStorageConfig(@Value("${app.figure-storage.base-path:./uploads}") String basePath) {
+    public FigureStorageConfig(FigureStorageService figureStorageService) {
-        this.basePath = Paths.get(basePath).toAbsolutePath().normalize().toString();
+        this.figureStorageService = figureStorageService;
    }
-    @Override
+    @GetMapping("/{bookId}/{filename}")
-    public void addResourceHandlers(ResourceHandlerRegistry registry) {
+    public void serve(@PathVariable String bookId,
-        // Serve GET /api/v1/figures/** from the local file store
+                      @PathVariable String filename,
-        registry.addResourceHandler("/api/v1/figures/**")
+                      HttpServletResponse response) throws IOException {
-                .addResourceLocations("file:" + basePath + "/figures/");
+        String key = "figures/" + bookId + "/" + filename;
        try {
            String url = figureStorageService.presignedUrl(key);
            response.sendRedirect(url);
        } catch (Exception ex) {
            throw new ResponseStatusException(HttpStatus.NOT_FOUND, "Figure not found: " + key, ex);
        }
    }
 }
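Review note: the image URL built in `buildSources` and the storage key rebuilt in `serve` have to agree on the filename, otherwise the redirect points at a missing object. The sketch below shows the round trip with plain strings. The class and method names are hypothetical, and the sample path assumes the `figures/{bookId}/{figureId}.png` layout described in the Javadoc above.

```java
// Hypothetical illustration of how the figure URL and the storage key relate.
final class FigureUrlSketch {

    /** Mirrors buildSources: keep only the last path segment of the stored image path. */
    static String toImageUrl(String bookId, String imagePath) {
        String filename = imagePath.substring(imagePath.lastIndexOf('/') + 1);
        return "/api/v1/figures/" + bookId + "/" + filename;
    }

    /** Mirrors the controller: rebuild the storage key from the two path variables. */
    static String toStorageKey(String bookId, String filename) {
        return "figures/" + bookId + "/" + filename;
    }

    public static void main(String[] args) {
        String stored = "figures/b1/b1-fig-3-1.png";
        System.out.println(toImageUrl("b1", stored));
        System.out.println(toStorageKey("b1", "b1-fig-3-1.png"));
    }
}
```

The round trip only holds while the stored path ends in `{bookId}/{filename}`; a path with extra nesting would lose its intermediate segments in `toImageUrl`.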
@@ -0,0 +1,30 @@
 package com.aiteacher.config;
 import org.springframework.beans.factory.annotation.Value;
 import org.springframework.context.annotation.Bean;
 import org.springframework.context.annotation.Configuration;
 import org.springframework.http.client.JdkClientHttpRequestFactory;
 import org.springframework.web.client.RestClient;
 import java.net.http.HttpClient;
@Configuration
 public class MarkerConfig {
    @Value("${app.marker.base-url:http://localhost:8000}")
    private String markerBaseUrl;
    @Bean
    RestClient markerRestClient() {
        // Use the JDK HTTP client with no timeout — Marker conversions can take several minutes.
        HttpClient httpClient = HttpClient.newBuilder()
                .build();
        JdkClientHttpRequestFactory factory = new JdkClientHttpRequestFactory(httpClient);
        // No read timeout set: JDK HTTP client defaults to no deadline.
        return RestClient.builder()
                .baseUrl(markerBaseUrl)
                .requestFactory(factory)
                .build();
    }
 }
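Review note: leaving the read deadline unset is reasonable for multi-minute conversions, but a connect timeout can still be set so that an unreachable Marker host fails quickly. The sketch below uses only the JDK client to show that the two settings are independent. The 10-second value and the class name are assumptions for illustration, not values from this change.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

// Hypothetical sketch: bounded connect time, unbounded response time.
final class MarkerTimeoutSketch {

    static HttpClient newClient() {
        return HttpClient.newBuilder()
                // Assumed value: fail fast when the server cannot be reached at all
                .connectTimeout(Duration.ofSeconds(10))
                .build();
    }

    static HttpRequest newRequest(String baseUrl) {
        // No .timeout(...) call, so the request carries no response deadline
        return HttpRequest.newBuilder(URI.create(baseUrl)).GET().build();
    }

    public static void main(String[] args) {
        System.out.println("connect timeout: " + newClient().connectTimeout());
        System.out.println("request timeout: " + newRequest("http://localhost:8000").timeout());
    }
}
```

The same `HttpClient` instance could be passed to `JdkClientHttpRequestFactory` exactly as the bean above does.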
@@ -0,0 +1,92 @@
 package com.aiteacher.config;
 import org.springframework.aot.hint.MemberCategory;
 import org.springframework.aot.hint.RuntimeHints;
 import org.springframework.aot.hint.RuntimeHintsRegistrar;
 import org.springframework.aot.hint.TypeReference;
 import java.util.List;
 /**
 * GraalVM native-image runtime hints for third-party libraries that use reflection
 * or classpath resource scanning not covered by Spring Boot's AOT processor.
 *
 * Registered via @ImportRuntimeHints on AiTeacherApplication.
 */
 public class NativeHintsConfig implements RuntimeHintsRegistrar {
    @Override
    public void registerHints(RuntimeHints hints, ClassLoader classLoader) {
        // PDFBox — font and encoding resources loaded via classpath scanning at runtime
        hints.resources().registerPattern("org/apache/pdfbox/resources/*");
        hints.resources().registerPattern("org/apache/pdfbox/resources/afm/*");
        hints.resources().registerPattern("org/apache/pdfbox/resources/cmap/*");
        hints.resources().registerPattern("org/apache/pdfbox/resources/glyphlist/*");
        hints.resources().registerPattern("org/apache/pdfbox/resources/icc/*");
        hints.resources().registerPattern("org/apache/pdfbox/resources/ttf/*");
        hints.resources().registerPattern("org/apache/pdfbox/resources/version.properties");
        // PDFBox — font encoding classes instantiated via reflection
        hints.reflection().registerType(
                org.apache.pdfbox.pdmodel.font.encoding.GlyphList.class,
                MemberCategory.INVOKE_PUBLIC_CONSTRUCTORS,
                MemberCategory.INVOKE_PUBLIC_METHODS
        );
        hints.reflection().registerType(
                org.apache.pdfbox.pdmodel.font.encoding.WinAnsiEncoding.class,
                MemberCategory.INVOKE_PUBLIC_CONSTRUCTORS
        );
        hints.reflection().registerType(
                org.apache.pdfbox.pdmodel.font.encoding.MacRomanEncoding.class,
                MemberCategory.INVOKE_PUBLIC_CONSTRUCTORS
        );
        hints.reflection().registerType(
                org.apache.pdfbox.pdmodel.font.encoding.MacExpertEncoding.class,
                MemberCategory.INVOKE_PUBLIC_CONSTRUCTORS
        );
        hints.reflection().registerType(
                org.apache.pdfbox.pdmodel.font.encoding.StandardEncoding.class,
                MemberCategory.INVOKE_PUBLIC_CONSTRUCTORS
        );
        // JPA / Hibernate — array types used in entity mappings
        hints.reflection().registerType(java.util.UUID[].class, MemberCategory.INVOKE_PUBLIC_CONSTRUCTORS);
        // JBoss Logging — message logger implementations generated by annotation processor.
        // JBoss Logging uses reflection to look up the generated *_$logger class by name.
        registerJBossLogger(hints, "org.hibernate.jpa.internal.JpaLogger_$logger");
        registerJBossLogger(hints, "org.hibernate.internal.CoreMessageLogger_$logger");
        registerJBossLogger(hints, "org.hibernate.internal.EntityManagerMessageLogger_$logger");
        // AWS SDK v2 — HTTP client and SdkPojo serialization
        hints.resources().registerPattern("software/amazon/awssdk/global/handlers/execution.interceptors");
        hints.resources().registerPattern("software/amazon/awssdk/services/s3/execution.interceptors");
        hints.resources().registerPattern("codegen-resources/s3/*");
        hints.reflection().registerType(
                software.amazon.awssdk.services.s3.S3Client.class,
                MemberCategory.INVOKE_PUBLIC_METHODS
        );
        // Jackson deserialization of records persisted as JSON in DB columns.
        // These are reached only via ObjectMapper.readValue in services, so Spring's
        // BindingReflectionHintsRegistrar does not auto-discover all accessors.
        for (Class<?> type : List.of(
                com.aiteacher.topic.TopicSummaryResponse.class,
                com.aiteacher.topic.TopicSummaryResponse.SourceReference.class,
                com.aiteacher.concept.ConceptReportResponse.class,
                com.aiteacher.concept.ConceptReportResponse.FacetSection.class
        )) {
            hints.reflection().registerType(type,
                    MemberCategory.INVOKE_DECLARED_CONSTRUCTORS,
                    MemberCategory.INVOKE_DECLARED_METHODS);
        }
    }
    private void registerJBossLogger(RuntimeHints hints, String className) {
        hints.reflection().registerType(
                TypeReference.of(className),
                MemberCategory.INVOKE_PUBLIC_CONSTRUCTORS,
                MemberCategory.INVOKE_PUBLIC_METHODS
        );
    }
 }
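Review note: the Jackson hints above matter because a record is built through its canonical constructor and read through its accessor methods, both looked up reflectively. In a native image those lookups only succeed for members registered at build time. The sketch below performs the same two lookups with plain JDK reflection; the `Facet` record and the helper names are hypothetical stand-ins, not types from this change.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;
import java.lang.reflect.RecordComponent;
import java.util.Arrays;

// Hypothetical sketch of the reflective access a JSON mapper needs for records.
final class RecordReflectionSketch {

    // Stand-in for a record persisted as JSON
    record Facet(String title, int order) {}

    /** Finds the canonical constructor from the record components and invokes it. */
    static <T extends Record> T instantiate(Class<T> type, Object... args) {
        try {
            Class<?>[] paramTypes = Arrays.stream(type.getRecordComponents())
                    .map(RecordComponent::getType)
                    .toArray(Class<?>[]::new);
            Constructor<T> ctor = type.getDeclaredConstructor(paramTypes);
            ctor.setAccessible(true);
            return ctor.newInstance(args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + type.getName(), e);
        }
    }

    /** Reads one component through its accessor method. */
    static Object read(Record value, String component) {
        try {
            for (RecordComponent rc : value.getClass().getRecordComponents()) {
                if (rc.getName().equals(component)) {
                    Method accessor = rc.getAccessor();
                    accessor.setAccessible(true);
                    return accessor.invoke(value);
                }
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot read " + component, e);
        }
        throw new IllegalArgumentException("No such component: " + component);
    }

    public static void main(String[] args) {
        Facet facet = instantiate(Facet.class, "Anatomy", 1);
        System.out.println(read(facet, "title") + " / " + read(facet, "order"));
    }
}
```

On the JVM this always works; under native image the equivalent calls fail unless the constructor and accessors are covered by hints such as `INVOKE_DECLARED_CONSTRUCTORS` and `INVOKE_DECLARED_METHODS`.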
@@ -20,7 +20,9 @@ public class SecurityConfig {
    @Bean
    public SecurityFilterChain filterChain(HttpSecurity http) throws Exception {
        http
-            .authorizeHttpRequests(auth -> auth.anyRequest().authenticated())
+            .authorizeHttpRequests(auth -> auth
                .requestMatchers("/api/v1/figures/**").permitAll()
                .anyRequest().authenticated())
            .httpBasic(Customizer.withDefaults())
            .csrf(AbstractHttpConfigurer::disable);
        return http.build();
@@ -28,9 +30,10 @@ public class SecurityConfig {
    @Bean
    public UserDetailsService userDetailsService(
            @Value("${app.auth.username}") String username,
            @Value("${app.auth.password}") String password) {
        UserDetails user = User.builder()
-            .username("neurosurgeon")
+            .username(username)
            .password("{noop}" + password)
            .roles("USER")
            .build();
@@ -1,43 +1,43 @@
 package com.aiteacher.document;
 import com.aiteacher.figure.FigureStorageService;
 import org.apache.pdfbox.Loader;
 import org.apache.pdfbox.cos.COSName;
 import org.apache.pdfbox.pdmodel.PDDocument;
 import org.apache.pdfbox.pdmodel.PDPage;
 import org.apache.pdfbox.pdmodel.graphics.PDXObject;
 import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 import org.springframework.beans.factory.annotation.Value;
 import org.springframework.stereotype.Service;
 import javax.imageio.ImageIO;
 import java.awt.image.BufferedImage;
 import java.io.ByteArrayInputStream;
 import java.io.IOException;
 import java.nio.file.Path;
 import java.util.ArrayList;
 import java.util.HashMap;
 import java.util.List;
 import java.util.Map;
 import java.util.UUID;
 import java.util.regex.Matcher;
 import java.util.regex.Pattern;
 /**
- * Extracts images from each PDF page using PDFBox.
+ * Extracts figure images from {@link PageResult.FigureData} entries produced by
- * Images below the configured minimum size are skipped.
+ * {@link MarkerPageParser}.
- * Caption is detected by the "Fig." pattern in page text.
+ *
 * <p>Marker returns pre-cropped PNG bytes for each detected figure, so no PDFBox
 * page rendering or bounding-box cropping is needed. This service:
 * <ol>
 *   <li>Decodes the PNG bytes to check dimensions (skip images below min size)</li>
 *   <li>Classifies the figure type from caption and surrounding text keywords</li>
 *   <li>Persists the image via {@link FigureStorageService}</li>
 *   <li>Persists a {@link FigureEntity} to the database</li>
 * </ol>
 */
@Service
 public class FigureExtractionService {
    private static final Logger log = LoggerFactory.getLogger(FigureExtractionService.class);
    // Caption: line starting with "Fig" or "Fig." followed by a number (the spelled-out "Figure" is not matched)
    private static final Pattern CAPTION_PATTERN =
        Pattern.compile("(?m)^(Fig\\.?\\s*\\d+[\\-.]?\\d*[^\\n]*)", Pattern.CASE_INSENSITIVE);
    // Figure label: "Fig. 12-4" or "Fig. 12.4"
    private static final Pattern LABEL_PATTERN =
-        Pattern.compile("(?i)Fig\\.?\\s*(\\d+[\\-.\\d]*)");
+            Pattern.compile("(?i)Fig\\.?\\s*(\\d+[\\-.\\d]*)");
    private final FigureStorageService storageService;
    private final FigureRepository figureRepository;
@@ -52,65 +52,77 @@ public class FigureExtractionService {
        this.minImageSizePx = minImageSizePx;
    }
    /** Holds the extraction output: persisted figures and a Marker blockId → DB figureId map. */
    public record ExtractionResult(List<FigureEntity> figures, Map<String, String> blockIdToFigureId) {}
    /**
-     * Extracts all qualifying images from the PDF for the given book.
+     * Extracts and persists figures for all pages described by {@code pageResults}.
-     * Returns persisted FigureEntity list (without vision descriptions — set later).
+     *
     * @param bookId      owning book
     * @param chapterId   chapter bucket for these sections
     * @param pageResults Marker parse output — each entry's {@code figures} list
     *                    carries pre-cropped PNG bytes for that page
     * @return {@link ExtractionResult} with persisted figures and blockId→figureId map
     *         (used to resolve markdown image placeholders)
     */
-    public List<FigureEntity> extract(UUID bookId, String chapterId,
+    public ExtractionResult extract(UUID bookId, String chapterId,
-                                      List<SectionEntity> sections, Path pdfPath) {
+                                    List<PageResult> pageResults) {
        List<FigureEntity> figures = new ArrayList<>();
        Map<String, String> blockIdToFigureId = new HashMap<>();
        int figureCounter = 0;
-        try (PDDocument doc = Loader.loadPDF(pdfPath.toFile())) {
+        for (PageResult page : pageResults) {
-            for (SectionEntity section : sections) {
+            if (page.figures().isEmpty()) continue;
-                int pageIndex = section.getPageStart() - 1; // 0-based
-                if (pageIndex < 0 || pageIndex >= doc.getNumberOfPages()) continue;
-                PDPage page = doc.getPage(pageIndex);
-                String pageText = section.getFullText();
            for (PageResult.FigureData figureData : page.figures()) {
                try {
-                    for (COSName name : page.getResources().getXObjectNames()) {
+                    BufferedImage image = decodeImage(figureData.imageBytes());
-                        PDXObject xObject = page.getResources().getXObject(name);
+                    if (image == null) {
-                        if (!(xObject instanceof PDImageXObject image)) continue;
+                        log.debug("Could not decode image on page {} of book {} (block {})",
-
+                                page.pageNumber(), bookId, figureData.blockId());
-                        BufferedImage bufferedImage = image.getImage();
+                        continue;
-                        if (bufferedImage.getWidth() < minImageSizePx
-                                || bufferedImage.getHeight() < minImageSizePx) {
-                            continue; // skip decorative images
-                        }
-                        figureCounter++;
-                        String figureId = bookId + "-fig-" + pageIndex + "-" + figureCounter;
-                        String caption = detectCaption(pageText);
-                        String label = detectLabel(caption, figureCounter);
-                        FigureType type = classifyType(caption, pageText);
-                        String imagePath = storageService.save(bookId, figureId, bufferedImage);
-                        FigureEntity figure = new FigureEntity(
-                            figureId, bookId, section.getId(), chapterId,
-                            label, caption, type, section.getPageStart(), imagePath
-                        );
-                        figures.add(figureRepository.save(figure));
                    }
-                } catch (IOException ex) {
+                    if (image.getWidth() < minImageSizePx || image.getHeight() < minImageSizePx) {
-                    log.warn("Failed to extract images from page {} of book {}: {}",
+                        log.debug("Skipping small figure on page {} ({}×{})",
-                        section.getPageStart(), bookId, ex.getMessage());
+                                page.pageNumber(), image.getWidth(), image.getHeight());
                        continue;
                    }
                    figureCounter++;
                    String figureId = bookId + "-fig-" + page.pageNumber() + "-" + figureCounter;
                    String caption = figureData.nearestCaption();
                    String label = detectLabel(caption, figureCounter);
                    FigureType type = classifyType(caption, page.orderedText());
                    String sectionId = bookId + "-p" + page.pageNumber();
                    String imagePath = storageService.save(bookId, figureId, image);
                    FigureEntity figure = new FigureEntity(
                            figureId, bookId, sectionId, chapterId,
                            label, caption, type, page.pageNumber(), imagePath);
                    figures.add(figureRepository.save(figure));
                    blockIdToFigureId.put(figureData.blockId(), figureId);
                } catch (Exception ex) {
                    log.warn("Failed to extract figure on page {} of book {}: {}",
                            page.pageNumber(), bookId, ex.getMessage());
                }
            }
-        } catch (IOException ex) {
-            log.error("Could not open PDF for image extraction, book {}", bookId, ex);
        }
        log.info("Extracted {} figures for book {}", figures.size(), bookId);
-        return figures;
+        return new ExtractionResult(figures, blockIdToFigureId);
    }
-    private String detectCaption(String pageText) {
+    // --- Private helpers ---
-        if (pageText == null) return null;
+
-        Matcher m = CAPTION_PATTERN.matcher(pageText);
+    private BufferedImage decodeImage(byte[] imageBytes) {
-        return m.find() ? m.group(1).trim() : null;
+        if (imageBytes == null || imageBytes.length == 0) return null;
        try {
            return ImageIO.read(new ByteArrayInputStream(imageBytes));
        } catch (IOException ex) {
            return null;
        }
    }
    private String detectLabel(String caption, int counter) {
@@ -122,14 +134,18 @@ public class FigureExtractionService {
    }
    private FigureType classifyType(String caption, String pageText) {
-        String combined = ((caption != null ? caption : "") + " " + (pageText != null ? pageText : "")).toLowerCase();
+        String combined = ((caption != null ? caption : "") + " " +
                           (pageText != null ? pageText : "")).toLowerCase();
        if (combined.contains("mri") || combined.contains("ct ") || combined.contains("magnetic")
-                || combined.contains("tomography")) return FigureType.MRI_CT_SCAN;
+                || combined.contains("tomography"))    return FigureType.MRI_CT_SCAN;
-        if (combined.contains("intraoperative") || combined.contains("intra-op")) return FigureType.INTRAOPERATIVE_IMAGE;
+        if (combined.contains("intraoperative") || combined.contains("intra-op"))
-        if (caption != null && caption.toLowerCase().startsWith("table")) return FigureType.TABLE;
+                                                       return FigureType.INTRAOPERATIVE_IMAGE;
        if (caption != null && caption.toLowerCase().startsWith("table"))
                                                       return FigureType.TABLE;
        if (combined.contains("chart") || combined.contains("histogram") || combined.contains("graph"))
-            return FigureType.CHART;
+                                                       return FigureType.CHART;
-        if (combined.contains("photograph") || combined.contains("photo")) return FigureType.SURGICAL_PHOTOGRAPH;
+        if (combined.contains("photograph") || combined.contains("photo"))
                                                       return FigureType.SURGICAL_PHOTOGRAPH;
        return FigureType.ANATOMICAL_DIAGRAM;
    }
 }
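Review note: `classifyType` detects CT scans with `combined.contains("ct ")`, which also matches inside ordinary words such as "tract " or "aspect " and misses "CT" followed by punctuation. The sketch below contrasts the substring check with a word-boundary regex. The class name and the regex are suggestions, not code from this change.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical comparison of substring matching and word-boundary matching for "ct".
final class CtKeywordSketch {

    private static final Pattern CT_WORD = Pattern.compile("\\bct\\b");

    /** The check as currently written in classifyType. */
    static boolean substringMatch(String text) {
        return text.toLowerCase(Locale.ROOT).contains("ct ");
    }

    /** Matches "ct" only when it stands alone as a word. */
    static boolean wordMatch(String text) {
        return CT_WORD.matcher(text.toLowerCase(Locale.ROOT)).find();
    }

    public static void main(String[] args) {
        for (String caption : new String[] {
                "Corticospinal tract anatomy",
                "Axial CT of the skull base",
                "Postoperative CT."}) {
            System.out.println(caption + " | substring=" + substringMatch(caption)
                    + " word=" + wordMatch(caption));
        }
    }
}
```

Because the MRI/CT branch is checked first and the page text is included in `combined`, a single "tract " anywhere on the page is enough to classify a figure as `MRI_CT_SCAN` with the current check.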
@@ -0,0 +1,14 @@
 package com.aiteacher.document;
 import java.util.UUID;
 public interface MarkdownStorageService {
    /** Uploads the markdown content and returns the S3 key. */
    String save(UUID bookId, int pageNumber, String markdown);
    /** Downloads and returns the markdown content for the given book and page. */
    String getText(UUID bookId, int pageNumber);
    /** Deletes all markdown files for the given book. */
    void deleteAll(UUID bookId);
 }
@@ -0,0 +1,335 @@
 package com.aiteacher.document;
 import tools.jackson.databind.JsonNode;
 import tools.jackson.databind.ObjectMapper;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 import org.springframework.beans.factory.annotation.Qualifier;
 import org.springframework.core.io.FileSystemResource;
 import org.springframework.http.MediaType;
 import org.springframework.stereotype.Service;
 import org.springframework.util.LinkedMultiValueMap;
 import org.springframework.util.MultiValueMap;
 import org.springframework.web.client.RestClient;
 import java.io.IOException;
 import java.nio.file.Files;
 import java.nio.file.Path;
 import java.util.*;
 /**
 * Parses a PDF via the Marker server using {@code output_format=json}, sending one
 * request per chunk of up to {@value #CHUNK_SIZE} pages.
 *
 * <p>The JSON response contains an {@code output} field that is itself a JSON string with a
 * tree structure: the root has a {@code children} array where each item is a {@code Page} block.
 * Each block carries an {@code html} field with {@code <content-ref src='blockId'>} placeholders
 * that reference its {@code children} by ID.
 *
 * <p>{@link #jsonToHtml} mirrors the Marker Python {@code json_to_html} utility: it walks the
 * tree recursively and resolves every {@code content-ref} with the rendered HTML of the
 * referenced child block.
 *
 * <p>Returns a {@link ParsedBook} with:
 * <ul>
 *   <li>{@code pages} — one {@link PageResult} per non-empty page (drives embeddings)</li>
 *   <li>{@code htmlByPage} — full resolved HTML per page (saved to S3 for the reader)</li>
 * </ul>
 */
@Service
 public class MarkerPageParser {
    private static final Logger log = LoggerFactory.getLogger(MarkerPageParser.class);
    private static final Set<String> TEXT_BLOCK_TYPES = Set.of(
            "Text", "TextInlineMath", "ListItem", "Table", "TableOfContents", "Code", "Equation",
            "Footnote", "Caption", "PageHeader", "PageFooter", "Handwriting"
    );
    private static final Set<String> FIGURE_BLOCK_TYPES = Set.of("Figure", "Picture", "FigureGroup", "PictureGroup");
    private static final int CHUNK_SIZE = 100;
    private static final ObjectMapper MAPPER = new ObjectMapper();
    private final RestClient restClient;
    private final PdfSplitterService pdfSplitterService;
    public MarkerPageParser(@Qualifier("markerRestClient") RestClient restClient,
                            PdfSplitterService pdfSplitterService) {
        this.restClient = restClient;
        this.pdfSplitterService = pdfSplitterService;
    }
    /**
     * Parses a PDF by splitting it into {@value #CHUNK_SIZE}-page chunks, submitting each
     * chunk to Marker individually, and merging the results into a single {@link ParsedBook}.
     * Page numbers in the merged result are absolute (1-based across the whole document).
     */
    public ParsedBook parse(Path pdfPath) throws IOException {
        List<PdfSplitterService.PdfChunk> chunks = pdfSplitterService.split(pdfPath, CHUNK_SIZE);
        log.info("Processing {} chunk(s) for {}", chunks.size(), pdfPath.getFileName());
        List<PageResult> allPages = new ArrayList<>();
        Map<Integer, String> allHtml = new LinkedHashMap<>();
        try {
            for (int c = 0; c < chunks.size(); c++) {
                PdfSplitterService.PdfChunk chunk = chunks.get(c);
                log.info("Submitting chunk {}/{} to Marker (page offset {})", c + 1, chunks.size(), chunk.pageOffset());
                ParsedBook chunkResult = submitChunk(chunk.tempFile());
                // Rebase page numbers from chunk-relative to document-absolute
                for (PageResult page : chunkResult.pages()) {
                    int absolutePage = chunk.pageOffset() + page.pageNumber();
                    allPages.add(new PageResult(absolutePage, page.orderedText(), page.headingTitle(), page.figures()));
                }
                chunkResult.htmlByPage().forEach((chunkPage, html) ->
                        allHtml.put(chunk.pageOffset() + chunkPage, html));
            }
        } finally {
            // Delete temporary chunk files (skip if the chunk is the original PDF)
            for (PdfSplitterService.PdfChunk chunk : chunks) {
                if (!chunk.tempFile().equals(pdfPath)) {
                    try { Files.deleteIfExists(chunk.tempFile()); }
                    catch (IOException e) { log.warn("Could not delete temp chunk {}", chunk.tempFile()); }
                }
            }
        }
        log.info("Marker produced {} non-empty pages from {} chunk(s) of {}",
                allPages.size(), chunks.size(), pdfPath.getFileName());
        return new ParsedBook(allPages, allHtml);
    }
    /** Submits a single PDF file to Marker and returns the parsed result with chunk-relative page numbers. */
    private ParsedBook submitChunk(Path chunkPath) {
        MultiValueMap<String, Object> body = new LinkedMultiValueMap<>();
        body.add("file", new FileSystemResource(chunkPath));
        body.add("output_format", "json");
        JsonNode response = restClient.post()
                .uri("/marker/upload")
                .contentType(MediaType.MULTIPART_FORM_DATA)
                .body(body)
                .retrieve()
                .body(JsonNode.class);
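        // Debug aid: dump the raw Marker response; skipped when the response body is empty
        if (response != null) {
            try {
                Files.writeString(Path.of("/tmp/marker-response-json.json"), response.toPrettyString());
            } catch (IOException e) {
                log.warn("Could not save Marker response to /tmp/marker-response-json.json", e);
            }
        }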
        List<JsonNode> pageNodes = extractPages(response);
        if (pageNodes.isEmpty()) {
            log.warn("Marker returned no pages for chunk {}", chunkPath.getFileName());
            return new ParsedBook(List.of(), Map.of());
        }
        List<PageResult> pages = new ArrayList<>();
        Map<Integer, String> htmlByPage = new LinkedHashMap<>();
        for (int i = 0; i < pageNodes.size(); i++) {
            JsonNode pageNode = pageNodes.get(i);
            int pageNumber = i + 1; // 1-based, chunk-relative
            PageResult result = buildPageResult(pageNode, pageNumber);
            String html = jsonToHtml(pageNode);
            // Always save HTML so the reader can navigate to every page
            htmlByPage.put(pageNumber, html);
            // Only queue for embedding if the page has extractable content
            if (!result.orderedText().isBlank() || !result.figures().isEmpty()) {
                pages.add(result);
            }
        }
        return new ParsedBook(pages, htmlByPage);
    }
    // ── Page extraction ───────────────────────────────────────────────────────
    /**
     * Parses the {@code output} JSON string and returns the list of page nodes
     * (the top-level {@code children} of the document root).
     */
    private List<JsonNode> extractPages(JsonNode response) {
        if (response == null) return List.of();
        JsonNode outputNode = response.path("output");
        if (outputNode.isMissingNode()) {
            log.warn("Marker response has no 'output' field");
            return List.of();
        }
        try {
            JsonNode root = MAPPER.readTree(outputNode.stringValue());
            JsonNode children = root.path("children");
            if (children.isMissingNode() || !children.isArray()) {
                log.warn("Marker output root has no 'children' array");
                return List.of();
            }
            List<JsonNode> result = new ArrayList<>();
            children.forEach(result::add);
            return result;
        } catch (Exception e) {
            log.warn("Could not parse Marker 'output' string as JSON: {}", e.getMessage());
            return List.of();
        }
    }
    // ── HTML rendering ────────────────────────────────────────────────────────
    /**
     * Java equivalent of the Marker Python {@code json_to_html} utility.
     *
     * <p>Algorithm:
     * <ol>
     *   <li>If the block has no children, return its {@code html} as-is (leaf node).</li>
     *   <li>Otherwise recursively render each child, then replace every
     *       {@code <content-ref src='childId'>} placeholder in the block's own {@code html}
     *       with the rendered child HTML.</li>
     * </ol>
     */
    String jsonToHtml(JsonNode block) {
        String html = str(block.path("html"));
        // If the block carries image data, inject <img> data-URI tags.
        // Marker stores base64 image bytes in block.images keyed by block ID.
        // Picture/Figure leaf blocks have empty html, so this is the only way to
        // get the image into the rendered output.
        JsonNode images = block.path("images");
        if (!images.isMissingNode() && !images.isNull() && !images.isEmpty()) {
            StringBuilder imgTags = new StringBuilder();
            images.properties().forEach(entry -> {
                String base64 = str(entry.getValue());
                if (!base64.isEmpty()) {
                    String mime = detectImageMime(base64);
                    imgTags.append("<img src=\"data:").append(mime)
                           .append(";base64,").append(base64).append("\">");
                }
            });
            if (!imgTags.isEmpty()) {
                html = html + imgTags;
            }
        }
        JsonNode children = block.path("children");
        if (children.isMissingNode() || children.isNull() || !children.isArray() || children.isEmpty()) {
            return html; // leaf node
        }
        // Build id → rendered-html map for all direct children
        Map<String, String> childHtml = new LinkedHashMap<>();
        for (JsonNode child : children) {
            String id = str(child.path("id"));
            childHtml.put(id, jsonToHtml(child));
        }
        // Replace every <content-ref src='id'></content-ref> with the child's HTML
        for (Map.Entry<String, String> entry : childHtml.entrySet()) {
            String ref = "<content-ref src='" + entry.getKey() + "'></content-ref>";
            html = html.replace(ref, entry.getValue());
        }
        return html;
    }
    // ── PageResult (text + figures for embeddings) ────────────────────────────
    private PageResult buildPageResult(JsonNode pageBlock, int pageNumber) {
        StringBuilder text = new StringBuilder();
        String[] headingTitle = {null};
        List<PageResult.FigureData> figures = new ArrayList<>();
        walkBlock(pageBlock, text, headingTitle, figures);
        return new PageResult(pageNumber, text.toString().strip(), headingTitle[0], figures);
    }
    /** Recursively walks the block tree, collecting text and figures in reading order. */
    private void walkBlock(JsonNode block, StringBuilder text, String[] headingTitle,
                           List<PageResult.FigureData> figures) {
        String type = str(block.path("block_type"));
        if ("SectionHeader".equals(type)) {
            String heading = stripHtml(str(block.path("html"))).strip();
            if (!heading.isEmpty() && headingTitle[0] == null) headingTitle[0] = heading;
            appendText(text, heading);
        } else if (TEXT_BLOCK_TYPES.contains(type)) {
            appendText(text, stripHtml(str(block.path("html"))));
        } else if (FIGURE_BLOCK_TYPES.contains(type)) {
            String caption = findCaption(block);
            extractFigures(block, caption, figures);
        }
        // Recurse into children (content-ref ordering is implicit via tree order)
        JsonNode children = block.path("children");
        if (!children.isMissingNode() && !children.isNull() && children.isArray()) {
            for (JsonNode child : children) {
                walkBlock(child, text, headingTitle, figures);
            }
        }
    }
    /** Finds the first Caption child inside a figure block, if any. */
    private String findCaption(JsonNode figureBlock) {
        JsonNode children = figureBlock.path("children");
        if (children.isMissingNode() || !children.isArray()) return null;
        for (JsonNode child : children) {
            if ("Caption".equals(str(child.path("block_type")))) {
                String caption = stripHtml(str(child.path("html"))).strip();
                return caption.isEmpty() ? null : caption;
            }
        }
        return null;
    }
    private void extractFigures(JsonNode block, String caption, List<PageResult.FigureData> out) {
        JsonNode images = block.path("images");
        if (images.isMissingNode() || images.isEmpty()) return;
        images.properties().forEach(entry -> {
            String blockId = entry.getKey();
            String base64 = str(entry.getValue());
            if (base64.isEmpty()) return;
            try {
                byte[] bytes = Base64.getDecoder().decode(base64);
                out.add(new PageResult.FigureData(bytes, caption, blockId));
            } catch (IllegalArgumentException ex) {
                log.warn("Could not decode base64 image for block {}: {}", blockId, ex.getMessage());
            }
        });
    }
    // ── Utilities ─────────────────────────────────────────────────────────────
    private void appendText(StringBuilder sb, String text) {
        if (text == null) return;
        String stripped = text.strip();
        if (stripped.isEmpty()) return;
        if (sb.length() > 0) sb.append("\n\n");
        sb.append(stripped);
    }
    private String stripHtml(String html) {
        if (html == null || html.isEmpty()) return "";
        return html.replaceAll("<[^>]*>", "").replaceAll("\\s{2,}", " ").strip();
    }
    /** Detects MIME type from the first characters of a base64-encoded image. */
    private static String detectImageMime(String base64) {
        if (base64.startsWith("/9j/"))   return "image/jpeg";
        if (base64.startsWith("iVBOR"))  return "image/png";
        if (base64.startsWith("R0lGO"))  return "image/gif";
        if (base64.startsWith("UklGR"))  return "image/webp";
        return "image/png"; // safe fallback
    }
    /** Null-safe string extraction from a JsonNode (Jackson 3: stringValue() returns null for non-strings). */
    private static String str(JsonNode node) {
        String v = node.stringValue();
        return v != null ? v : "";
    }
 }
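The content-ref resolution described in the `jsonToHtml` Javadoc can be sketched in isolation as follows. This is a minimal illustration, not the parser itself: the `ContentRefSketch` class and its nested `Block` record are hypothetical stand-ins for the Jackson `JsonNode` tree, and the image data-URI injection is left out.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ContentRefSketch {

    /** Hypothetical stand-in for one Marker block (id, own html, direct children). */
    record Block(String id, String html, List<Block> children) {}

    /** Renders a block by replacing each child's placeholder with the child's rendered HTML. */
    static String render(Block block) {
        String html = block.html();
        if (block.children().isEmpty()) {
            return html; // leaf node: nothing to resolve
        }
        // Render children first, keyed by block id
        Map<String, String> rendered = new LinkedHashMap<>();
        for (Block child : block.children()) {
            rendered.put(child.id(), render(child));
        }
        // Substitute every <content-ref src='id'></content-ref> placeholder
        for (Map.Entry<String, String> e : rendered.entrySet()) {
            String ref = "<content-ref src='" + e.getKey() + "'></content-ref>";
            html = html.replace(ref, e.getValue());
        }
        return html;
    }

    public static void main(String[] args) {
        Block text = new Block("/page/0/Text/1", "<p>Hello</p>", List.of());
        Block page = new Block("/page/0/Page/0",
                "<div><content-ref src='/page/0/Text/1'></content-ref></div>", List.of(text));
        System.out.println(render(page)); // prints <div><p>Hello</p></div>
    }
}
```

As in the parser, a placeholder only matches when it uses single quotes and an explicit closing tag, so a change in Marker's placeholder format would leave references unresolved.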
@@ -0,0 +1,25 @@
 package com.aiteacher.document;
 import java.util.List;
 /**
 * Internal DTO produced by MarkerPageParser for one PDF page.
 * Decouples the Marker HTTP API from downstream services.
 */
 public record PageResult(
        int pageNumber,           // 1-based, derived from Marker page block index
        String orderedText,       // full page text in correct reading order (blocks joined by \n\n)
        String headingTitle,      // first SectionHeader block on page, or null
        List<FigureData> figures  // extracted figure images (may be empty)
 ) {
    /**
     * A figure extracted from the page.
     * Image bytes are base64-decoded from the Marker JSON {@code images} map.
     * The format is not guaranteed to be PNG: the parser also recognises JPEG, GIF and WebP.
     */
    public record FigureData(
            byte[] imageBytes,       // raw image data (base64-decoded from Marker response)
            String nearestCaption,   // text of the adjacent Caption block, or null
            String blockId           // Marker block ID (e.g. "/page/0/Figure/2") for traceability
    ) {}
 }
@@ -0,0 +1,16 @@
 package com.aiteacher.document;
 import java.util.List;
 import java.util.Map;
 /**
 * Result of a full Marker parse: structured page data for embeddings plus
 * the resolved per-page HTML for the reader, both derived from Marker's JSON output.
 *
 * @param pages       one entry per non-empty page
 * @param htmlByPage  fully resolved block HTML keyed by 1-based page number
 */
 public record ParsedBook(
        List<PageResult> pages,
        Map<Integer, String> htmlByPage
 ) {}
@@ -0,0 +1,72 @@
 package com.aiteacher.document;
 import org.apache.pdfbox.io.RandomAccessReadBufferedFile;
 import org.apache.pdfbox.multipdf.Splitter;
 import org.apache.pdfbox.pdfparser.PDFParser;
 import org.apache.pdfbox.pdmodel.PDDocument;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 import org.springframework.stereotype.Service;
 import java.io.IOException;
 import java.nio.file.Files;
 import java.nio.file.Path;
 import java.util.ArrayList;
 import java.util.List;
 /**
 * Splits a PDF file into fixed-size chunks using PDFBox.
 * Each chunk is saved as a temporary file so it can be submitted independently to Marker.
 */
@Service
 public class PdfSplitterService {
    private static final Logger log = LoggerFactory.getLogger(PdfSplitterService.class);
    /**
     * A chunk of a split PDF.
     *
     * @param tempFile   path to the temporary PDF file (caller must delete when done)
     * @param pageOffset 0-based index of the first page in this chunk within the original document
     */
    public record PdfChunk(Path tempFile, int pageOffset) {}
    /**
     * Splits {@code pdfPath} into chunks of at most {@code maxPagesPerChunk} pages.
     * Returns a single-element list when the document fits in one chunk.
     *
     * @param pdfPath          source PDF
     * @param maxPagesPerChunk maximum pages per chunk
     * @return ordered list of chunks; caller is responsible for deleting {@code tempFile}s
     */
    public List<PdfChunk> split(Path pdfPath, int maxPagesPerChunk) throws IOException {
        try (PDDocument doc = new PDFParser(new RandomAccessReadBufferedFile(pdfPath.toFile())).parse()) {
            int totalPages = doc.getNumberOfPages();
            log.info("PDF {} has {} pages, splitting into chunks of {}", pdfPath.getFileName(), totalPages, maxPagesPerChunk);
            if (totalPages <= maxPagesPerChunk) {
                // No split needed — return the original file as a single virtual chunk
                return List.of(new PdfChunk(pdfPath, 0));
            }
            Splitter splitter = new Splitter();
            splitter.setSplitAtPage(maxPagesPerChunk);
            List<PDDocument> parts = splitter.split(doc);
            List<PdfChunk> chunks = new ArrayList<>(parts.size());
            int offset = 0;
            for (PDDocument part : parts) {
                try {
                    Path tmp = Files.createTempFile("marker-chunk-", ".pdf");
                    part.save(tmp.toFile());
                    chunks.add(new PdfChunk(tmp, offset));
                    log.debug("Created chunk at {} (page offset {})", tmp, offset);
                    offset += part.getNumberOfPages();
                } finally {
                    part.close();
                }
            }
            return chunks;
        }
    }
 }
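The relationship between a chunk's 0-based `pageOffset` and the absolute 1-based page numbers that `MarkerPageParser` rebuilds can be checked with a little arithmetic. The sketch below is illustrative only: `ChunkOffsetSketch` and its methods are hypothetical names, and it assumes every chunk except the last holds exactly `maxPagesPerChunk` pages.

```java
import java.util.ArrayList;
import java.util.List;

final class ChunkOffsetSketch {

    /** Returns the 0-based offset of the first page of each chunk. */
    static List<Integer> chunkOffsets(int totalPages, int maxPagesPerChunk) {
        if (maxPagesPerChunk <= 0) {
            throw new IllegalArgumentException("maxPagesPerChunk must be positive");
        }
        List<Integer> offsets = new ArrayList<>();
        for (int offset = 0; offset < totalPages; offset += maxPagesPerChunk) {
            offsets.add(offset);
        }
        return offsets;
    }

    /** Converts a 1-based chunk-relative page number to a 1-based absolute page number. */
    static int absolutePage(int pageOffset, int chunkRelativePage) {
        return pageOffset + chunkRelativePage;
    }

    public static void main(String[] args) {
        System.out.println(chunkOffsets(250, 100)); // prints [0, 100, 200]
        System.out.println(absolutePage(200, 1));   // prints 201
    }
}
```

Keeping the offset 0-based and the page number 1-based means the two can simply be added, which is what the rebasing loop in `parse` relies on.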
@@ -1,13 +1,17 @@
 package com.aiteacher.document;
 import org.apache.pdfbox.Loader;
 import org.apache.pdfbox.pdmodel.PDDocument;
 import org.apache.pdfbox.pdmodel.PDPage;
 import org.apache.pdfbox.pdmodel.common.PDRectangle;
 import org.apache.pdfbox.text.PDFTextStripperByArea;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 import org.springframework.ai.reader.pdf.PagePdfDocumentReader;
 import org.springframework.ai.reader.pdf.config.PdfDocumentReaderConfig;
 import org.springframework.core.io.FileSystemResource;
 import org.springframework.stereotype.Service;
 import org.springframework.transaction.annotation.Transactional;
 import java.awt.Rectangle;
 import java.io.IOException;
 import java.nio.file.Path;
 import java.util.ArrayList;
 import java.util.List;
@@ -15,13 +19,18 @@ import java.util.UUID;
 /**
 * Parses a PDF into page-level SectionEntity records stored in Postgres.
- * Each page becomes one section, grouped under a single chapter per book.
+ * Uses column-aware extraction via PDFTextStripperByArea: for two-column pages,
 * left column is extracted first then right, preserving correct reading order.
 * Text is also normalized (collapsed whitespace) before storage.
 */
@Service
 public class PdfStructureParser {
    private static final Logger log = LoggerFactory.getLogger(PdfStructureParser.class);
    // Right column is considered empty (single-column page) if it has < 20% of left column's content
    private static final double TWO_COLUMN_THRESHOLD = 0.2;
    private final ChapterRepository chapterRepository;
    private final SectionRepository sectionRepository;
@@ -35,37 +44,71 @@ public class PdfStructureParser {
    public List<SectionEntity> parse(UUID bookId, String bookTitle, Path pdfPath) {
        log.info("Parsing PDF structure for book {}", bookId);
        // One chapter per book
        String chapterId = bookId + "-ch1";
        ChapterEntity chapter = new ChapterEntity(chapterId, bookId, 1, bookTitle, 1);
        chapterRepository.save(chapter);
-        // One section per page
-        PagePdfDocumentReader reader = new PagePdfDocumentReader(
-            new FileSystemResource(pdfPath.toFile()),
-            PdfDocumentReaderConfig.builder().withPagesPerDocument(1).build()
-        );
-        List<org.springframework.ai.document.Document> pages = reader.get();
        List<SectionEntity> sections = new ArrayList<>();
-        for (int i = 0; i < pages.size(); i++) {
+        try (PDDocument doc = Loader.loadPDF(pdfPath.toFile())) {
-            int pageNum = i + 1;
+            List<PDPage> pages = new ArrayList<>();
-            String text = pages.get(i).getText();
+            doc.getPages().forEach(pages::add);
-            if (text == null || text.isBlank()) continue;
-            String sectionId = bookId + "-p" + pageNum;
+            for (int i = 0; i < pages.size(); i++) {
-            SectionEntity section = new SectionEntity(
+                int pageNum = i + 1;
-                sectionId, chapterId, bookId,
+                String text = normalizeWhitespace(extractPageText(pages.get(i)));
-                String.valueOf(pageNum),
+                if (text.isBlank()) continue;
-                "Page " + pageNum,
+
-                pageNum, pageNum,
+                String sectionId = bookId + "-p" + pageNum;
-                text
+                SectionEntity section = new SectionEntity(
-            );
+                    sectionId, chapterId, bookId,
-            sections.add(sectionRepository.save(section));
+                    String.valueOf(pageNum),
                    "Page " + pageNum,
                    pageNum, pageNum,
                    text
                );
                sections.add(sectionRepository.save(section));
            }
        } catch (IOException e) {
            throw new RuntimeException("Failed to parse PDF for book " + bookId, e);
        }
        log.info("Parsed {} sections for book {}", sections.size(), bookId);
        return sections;
    }
    /**
     * Extracts text from a single page using column-aware region extraction.
     * Splits the page at the horizontal midpoint. If the right region has fewer
     * than 20% of the characters of the left region, treats the page as single-column.
     */
    private String extractPageText(PDPage page) throws IOException {
        PDRectangle mediaBox = page.getMediaBox();
        int width  = (int) mediaBox.getWidth();
        int height = (int) mediaBox.getHeight();
        int mid    = width / 2;
        PDFTextStripperByArea stripper = new PDFTextStripperByArea();
        stripper.setSortByPosition(true);
        stripper.addRegion("left",  new Rectangle(0,   0, mid,         height));
        stripper.addRegion("right", new Rectangle(mid, 0, width - mid, height));
        stripper.extractRegions(page);
        String left  = stripper.getTextForRegion("left").strip();
        String right = stripper.getTextForRegion("right").strip();
        if (right.length() < left.length() * TWO_COLUMN_THRESHOLD) {
            // Single-column page — left holds all (or nearly all) content
            return left.isEmpty() ? right : left;
        }
        return left + "\n\n" + right;
    }
    /** Collapses multi-space/tab runs and excessive blank lines. */
    private String normalizeWhitespace(String text) {
        return text
            .replaceAll("[ \t]{2,}", " ")
            .replaceAll("\n{3,}", "\n\n")
            .trim();
    }
 }
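The column-merge decision and whitespace normalization in `PdfStructureParser` do not depend on PDFBox once the two region strings are extracted, so they can be exercised on plain strings. The sketch below assumes the same 20% threshold; `ColumnMergeSketch` and `merge` are hypothetical names introduced for illustration.

```java
final class ColumnMergeSketch {

    // Same ratio as TWO_COLUMN_THRESHOLD in the parser
    static final double TWO_COLUMN_THRESHOLD = 0.2;

    /** Joins left and right region text, or keeps one side when the right side is nearly empty. */
    static String merge(String leftRaw, String rightRaw) {
        String left = leftRaw.strip();
        String right = rightRaw.strip();
        if (right.length() < left.length() * TWO_COLUMN_THRESHOLD) {
            // Treated as single-column: the right region adds little or nothing
            return left.isEmpty() ? right : left;
        }
        return left + "\n\n" + right;
    }

    /** Collapses runs of spaces or tabs and limits blank-line runs to one blank line. */
    static String normalizeWhitespace(String text) {
        return text
                .replaceAll("[ \t]{2,}", " ")
                .replaceAll("\n{3,}", "\n\n")
                .trim();
    }

    public static void main(String[] args) {
        System.out.println(merge("left column text", "right column text"));
        System.out.println(normalizeWhitespace("a   b\n\n\n\nc"));
    }
}
```

Note that the heuristic compares character counts only, so a page whose single column spans the full width will still produce two similarly sized regions and be treated as two-column.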
@@ -0,0 +1,97 @@
 package com.aiteacher.document;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 import org.springframework.beans.factory.annotation.Value;
 import org.springframework.stereotype.Service;
 import software.amazon.awssdk.auth.credentials.AwsBasicCredentials;
 import software.amazon.awssdk.auth.credentials.StaticCredentialsProvider;
 import software.amazon.awssdk.core.sync.RequestBody;
 import software.amazon.awssdk.regions.Region;
 import software.amazon.awssdk.services.s3.S3Client;
 import software.amazon.awssdk.services.s3.S3Configuration;
 import software.amazon.awssdk.services.s3.model.*;
 import java.net.URI;
 import java.nio.charset.StandardCharsets;
 import java.util.ArrayList;
 import java.util.List;
 import java.util.UUID;
@Service
 public class S3MarkdownStorageService implements MarkdownStorageService {
    private static final Logger log = LoggerFactory.getLogger(S3MarkdownStorageService.class);
    private final S3Client s3;
    private final String bucket;
    public S3MarkdownStorageService(
            @Value("${app.figure-storage.endpoint}") String endpoint,
            @Value("${app.figure-storage.region}") String region,
            @Value("${app.figure-storage.bucket}") String bucket,
            @Value("${app.figure-storage.access-key-id}") String accessKeyId,
            @Value("${app.figure-storage.secret-access-key}") String secretKey) {
        this.bucket = bucket;
        URI endpointUri = URI.create(endpoint);
        StaticCredentialsProvider credentials = StaticCredentialsProvider.create(
                AwsBasicCredentials.create(accessKeyId, secretKey));
        Region awsRegion = Region.of(region);
        S3Configuration s3Config = S3Configuration.builder().pathStyleAccessEnabled(true).build();
        this.s3 = S3Client.builder()
                .endpointOverride(endpointUri)
                .region(awsRegion)
                .credentialsProvider(credentials)
                .serviceConfiguration(s3Config)
                .build();
    }
    @Override
    public String save(UUID bookId, int pageNumber, String markdown) {
        String key = key(bookId, pageNumber);
        byte[] bytes = markdown.getBytes(StandardCharsets.UTF_8);
        s3.putObject(
                PutObjectRequest.builder().bucket(bucket).key(key)
                        .contentType("text/html; charset=utf-8")
                        .contentLength((long) bytes.length).build(),
                RequestBody.fromBytes(bytes));
        return key;
    }
    @Override
    public String getText(UUID bookId, int pageNumber) {
        byte[] bytes = s3.getObjectAsBytes(
                GetObjectRequest.builder().bucket(bucket).key(key(bookId, pageNumber)).build()
        ).asByteArray();
        return new String(bytes, StandardCharsets.UTF_8);
    }
    @Override
    public void deleteAll(UUID bookId) {
        String prefix = "html/" + bookId + "/";
        try {
            List<ObjectIdentifier> toDelete = new ArrayList<>();
            s3.listObjectsV2Paginator(ListObjectsV2Request.builder()
                    .bucket(bucket).prefix(prefix).build()).stream()
                    .flatMap(page -> page.contents().stream())
                    .map(S3Object::key)
                    .map(k -> ObjectIdentifier.builder().key(k).build())
                    .forEach(toDelete::add);
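            if (toDelete.isEmpty()) return;
            // S3 DeleteObjects accepts at most 1000 keys per request, so delete in batches
            for (int i = 0; i < toDelete.size(); i += 1000) {
                List<ObjectIdentifier> batch = toDelete.subList(i, Math.min(i + 1000, toDelete.size()));
                s3.deleteObjects(DeleteObjectsRequest.builder()
                        .bucket(bucket)
                        .delete(Delete.builder().objects(batch).build())
                        .build());
            }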
            log.info("Deleted {} markdown files from S3 for book {}", toDelete.size(), bookId);
        } catch (S3Exception ex) {
            log.warn("Could not fully delete markdown for book {} from S3: {}", bookId, ex.getMessage());
        }
    }
    private static String key(UUID bookId, int pageNumber) {
        return "html/" + bookId + "/page-" + pageNumber + ".html";
    }
 }
@@ -1,6 +1,8 @@
 package com.aiteacher.document;
 import org.springframework.data.jpa.repository.JpaRepository;
 import org.springframework.data.jpa.repository.Query;
 import org.springframework.data.repository.query.Param;
 import java.util.List;
 import java.util.UUID;
@@ -8,4 +10,10 @@ import java.util.UUID;
 public interface SectionRepository extends JpaRepository<SectionEntity, String> {
    List<SectionEntity> findAllByBookId(UUID bookId);
    void deleteAllByBookId(UUID bookId);
    @Query("SELECT s FROM SectionEntity s WHERE s.bookId = :bookId AND s.pageStart <= :windowEnd AND s.pageEnd >= :windowStart ORDER BY s.pageStart")
    List<SectionEntity> findByBookIdAndPageOverlap(
            @Param("bookId") UUID bookId,
            @Param("windowStart") int windowStart,
            @Param("windowEnd") int windowEnd);
 }
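The `findByBookIdAndPageOverlap` query uses the standard closed-interval overlap test: a section matches when it starts no later than the window ends and ends no earlier than the window starts. A minimal sketch of that predicate in plain Java, with `PageOverlapSketch` as a hypothetical name, may make the boundary cases easier to verify.

```java
final class PageOverlapSketch {

    /** True when [pageStart, pageEnd] and [windowStart, windowEnd] share at least one page. */
    static boolean overlaps(int pageStart, int pageEnd, int windowStart, int windowEnd) {
        return pageStart <= windowEnd && pageEnd >= windowStart;
    }

    public static void main(String[] args) {
        System.out.println(overlaps(5, 7, 7, 9));   // prints true (shares page 7)
        System.out.println(overlaps(5, 6, 7, 9));   // prints false (ends before the window)
        System.out.println(overlaps(10, 12, 7, 9)); // prints false (starts after the window)
    }
}
```

Both bounds are inclusive, so a section that merely touches the window edge is still returned.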
@@ -38,14 +38,52 @@ public class TextChunkingService {
        List<String> windows = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
-            int end = Math.min(start + TARGET_CHARS, text.length());
+            int hardEnd = Math.min(start + TARGET_CHARS, text.length());
-            windows.add(text.substring(start, end));
+            if (hardEnd == text.length()) {
-            if (end == text.length()) break;
+                String last = text.substring(start).strip();
-            start = end - OVERLAP_CHARS;
+                if (!last.isEmpty()) windows.add(last);
                break;
            }
            int splitAt = findSplitPoint(text, start, hardEnd);
            String chunk = text.substring(start, splitAt).strip();
            if (!chunk.isEmpty()) windows.add(chunk);
            // Overlap: back up from split point, align to a word start
            int overlapStart = Math.max(start + 1, splitAt - OVERLAP_CHARS);
            while (overlapStart < splitAt && text.charAt(overlapStart) != ' ') overlapStart++;
            start = overlapStart < splitAt ? overlapStart + 1 : splitAt;
        }
        return windows;
    }
    /**
     * Finds the best split point at or before hardEnd, preferring (in order):
     * paragraph boundary, sentence boundary, word boundary, hard cut.
     */
    private int findSplitPoint(String text, int start, int hardEnd) {
        int lookback = Math.min(400, (hardEnd - start) / 2);
        // 1. Paragraph boundary
        int paraIdx = text.lastIndexOf("\n\n", hardEnd);
        if (paraIdx > hardEnd - lookback && paraIdx > start) return paraIdx + 2;
        // 2. Sentence boundary (. ! ?) followed by space or newline
        for (int i = hardEnd - 1; i > hardEnd - lookback && i > start; i--) {
            char c = text.charAt(i);
            if ((c == '.' || c == '!' || c == '?') && i + 1 < text.length()) {
                char next = text.charAt(i + 1);
                if (next == ' ' || next == '\n') return i + 1;
            }
        }
        // 3. Word boundary
        for (int i = hardEnd - 1; i > hardEnd - 100 && i > start; i--) {
            if (text.charAt(i) == ' ') return i + 1;
        }
        // 4. Hard cut
        return hardEnd;
    }
    private Map<String, Object> buildMetadata(SectionEntity section, String bookTitle,
                                               int index, int total, String chunkId) {
        Map<String, Object> m = new HashMap<>();
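The split-point preference in `findSplitPoint` above (paragraph, then sentence, then word, then hard cut) can be tried out on short strings. The following is a standalone copy of that logic for experimentation only; `SplitPointSketch` is a hypothetical wrapper class, and the lookback constants are taken from the hunk as shown.

```java
final class SplitPointSketch {

    /** Returns the index at which to cut text[start, hardEnd), preferring natural boundaries. */
    static int findSplitPoint(String text, int start, int hardEnd) {
        int lookback = Math.min(400, (hardEnd - start) / 2);

        // 1. Paragraph boundary within the lookback window
        int paraIdx = text.lastIndexOf("\n\n", hardEnd);
        if (paraIdx > hardEnd - lookback && paraIdx > start) return paraIdx + 2;

        // 2. Sentence end (. ! ?) followed by a space or newline
        for (int i = hardEnd - 1; i > hardEnd - lookback && i > start; i--) {
            char c = text.charAt(i);
            if ((c == '.' || c == '!' || c == '?') && i + 1 < text.length()) {
                char next = text.charAt(i + 1);
                if (next == ' ' || next == '\n') return i + 1;
            }
        }

        // 3. Word boundary within the last 100 characters
        for (int i = hardEnd - 1; i > hardEnd - 100 && i > start; i--) {
            if (text.charAt(i) == ' ') return i + 1;
        }

        // 4. Hard cut
        return hardEnd;
    }

    public static void main(String[] args) {
        String text = "First sentence. Second sentence goes on";
        int cut = findSplitPoint(text, 0, 22);
        System.out.println(text.substring(0, cut)); // prints First sentence.
    }
}
```

Because the paragraph and sentence checks only look back at most half the window, a boundary that sits too early in the chunk is ignored in favour of a later word boundary, which keeps chunks close to the target size.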
@@ -3,47 +3,106 @@ package com.aiteacher.document;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 import org.springframework.ai.chat.client.ChatClient;
-import org.springframework.core.io.FileSystemResource;
+import org.springframework.beans.factory.annotation.Value;
 import org.springframework.core.io.ByteArrayResource;
 import org.springframework.stereotype.Service;
 import org.springframework.util.MimeTypeUtils;
 import java.nio.file.Path;
 /**
- * Generates a clinical text description for an extracted figure image
+ * Analyses an extracted figure image using the OpenAI vision model.
- * using the OpenAI vision model via Spring AI ChatClient.
+ *
 * <p>Returns an {@link ImageAnalysis} record containing:
 * <ul>
 *   <li>{@code description} — 2-3 sentence clinical description of the image</li>
 *   <li>{@code imageText} — all visible text, labels, and annotations copied verbatim
 *       from the image (empty string when none present)</li>
 * </ul>
 *
 * <p>Both fields are stored: {@code description} drives the embedding; {@code imageText}
 * is added to chunk metadata so queries can match exact labels (e.g., "Circle of Willis").
 */
@Service
 public class VisionDescriptionService {
    private static final Logger log = LoggerFactory.getLogger(VisionDescriptionService.class);
-    private static final String PROMPT =
+    private static final String PROMPT = """
-        "You are a neurosurgery educator. Provide a brief 2-3 sentence clinical description of " +
+            You are a neurosurgery educator analysing a medical image.
-        "this image. Focus on anatomical structures, surgical landmarks, labels, and clinical " +
+            Respond in EXACTLY this format — no other text, no markdown:
-        "significance. If text or labels are visible, include them verbatim.";
+            DESCRIPTION: <2-3 sentence clinical description focusing on anatomical structures, surgical landmarks, and clinical significance>
            IMAGE_TEXT: <all visible text, labels, measurements, and annotations copied verbatim, comma-separated; write NONE if no text visible>
            """;
    /** Minimum ms between vision API calls. Configurable via app.vision.min-interval-ms. */
    private final long minIntervalMs;
    private final ChatClient chatClient;
    private volatile long lastCallAt = 0;
-    public VisionDescriptionService(ChatClient chatClient) {
+    public VisionDescriptionService(
            ChatClient chatClient,
            @Value("${app.vision.min-interval-ms:2000}") long minIntervalMs) {
        this.chatClient = chatClient;
        this.minIntervalMs = minIntervalMs;
    }
    /**
-     * Returns a description string. Falls back to the provided caption if vision fails.
+     * Holds the structured output of a vision model call on one figure image.
     *
     * @param description clinical description of the image content
     * @param imageText   verbatim text visible inside the image; empty string if none
     */
-    public String describe(Path imagePath, String captionFallback) {
+    public record ImageAnalysis(String description, String imageText) {}
    /**
     * Analyses the image bytes and returns an {@link ImageAnalysis}.
     * Falls back gracefully: if the vision call fails, the caption is used as description
     * and imageText is left empty.
     *
     * @param imageBytes    PNG bytes of the extracted figure
     * @param captionFallback caption detected from surrounding text, may be null
     */
    public ImageAnalysis analyze(byte[] imageBytes, String captionFallback) {
        throttle();
        try {
-            return chatClient.prompt()
+            String raw = chatClient.prompt()
-                .user(u -> u
+                    .user(u -> u
-                    .text(PROMPT)
+                            .text(PROMPT)
-                    .media(MimeTypeUtils.IMAGE_PNG, new FileSystemResource(imagePath.toFile())))
+                            .media(MimeTypeUtils.IMAGE_PNG, new ByteArrayResource(imageBytes)))
-                .call()
+                    .call()
-                .content();
+                    .content();
            return parse(raw, captionFallback);
        } catch (Exception ex) {
-            log.warn("Vision description failed for {}: {} — using caption as fallback",
+            log.warn("Vision analysis failed: {} — using caption as fallback", ex.getMessage());
-                imagePath.getFileName(), ex.getMessage());
+            return new ImageAnalysis(
-            return captionFallback != null ? captionFallback : "Figure";
+                    captionFallback != null ? captionFallback : "Figure",
                    "");
        }
    }
    private synchronized void throttle() {
        long now = System.currentTimeMillis();
        long wait = minIntervalMs - (now - lastCallAt);
        if (wait > 0) {
            try { Thread.sleep(wait); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        lastCallAt = System.currentTimeMillis();
    }
    private ImageAnalysis parse(String raw, String captionFallback) {
        String description = captionFallback != null ? captionFallback : "Figure";
        String imageText = "";
        if (raw != null) {
            for (String line : raw.split("\n")) {
                if (line.startsWith("DESCRIPTION:")) {
                    String val = line.substring("DESCRIPTION:".length()).strip();
                    if (!val.isEmpty()) description = val;
                } else if (line.startsWith("IMAGE_TEXT:")) {
                    String val = line.substring("IMAGE_TEXT:".length()).strip();
                    if (!val.isEmpty() && !"NONE".equalsIgnoreCase(val)) imageText = val;
                }
            }
        }
        return new ImageAnalysis(description, imageText);
    }
 }
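Review note: a minimal, self-contained sketch of how the line-oriented DESCRIPTION / IMAGE_TEXT format above could be exercised outside Spring. The class name `ParseSketch` and the sample response strings are hypothetical and exist only for illustration; the parsing logic mirrors the `parse` method shown in the diff.

```java
public class ParseSketch {

    /** Same shape as the ImageAnalysis record in the diff. */
    record ImageAnalysis(String description, String imageText) {}

    /**
     * Line-based parse: only lines starting with one of the two prefixes are read.
     * The caption (or "Figure") is kept when no usable DESCRIPTION line is found,
     * and NONE is mapped to an empty imageText.
     */
    static ImageAnalysis parse(String raw, String captionFallback) {
        String description = captionFallback != null ? captionFallback : "Figure";
        String imageText = "";
        if (raw != null) {
            for (String line : raw.split("\n")) {
                if (line.startsWith("DESCRIPTION:")) {
                    String val = line.substring("DESCRIPTION:".length()).strip();
                    if (!val.isEmpty()) description = val;
                } else if (line.startsWith("IMAGE_TEXT:")) {
                    String val = line.substring("IMAGE_TEXT:".length()).strip();
                    if (!val.isEmpty() && !"NONE".equalsIgnoreCase(val)) imageText = val;
                }
            }
        }
        return new ImageAnalysis(description, imageText);
    }

    public static void main(String[] args) {
        // Hypothetical well-formed response.
        String sample = "DESCRIPTION: Axial view of the circle of Willis.\nIMAGE_TEXT: ACA, MCA, PCA";
        System.out.println(parse(sample, null));
        // Hypothetical malformed response: the caption is used instead.
        System.out.println(parse("unexpected free text", "Fig. 3.2"));
    }
}
```

Because only prefixed lines are recognised, a description that the model wraps over several lines would be cut to its first line. That holds for this sketch and, as far as the diff shows, for the original method as well.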
@@ -0,0 +1,75 @@
 package com.aiteacher.enrichment;
 import com.aiteacher.document.SectionEntity;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 import org.springframework.ai.document.Document;
 import org.springframework.stereotype.Service;
 import java.time.Instant;
 import java.util.List;
 import java.util.Map;
 import java.util.UUID;
@Service
 public class ChunkEnrichmentPipeline {
    private static final Logger log = LoggerFactory.getLogger(ChunkEnrichmentPipeline.class);
    private final ChunkEnrichmentService enrichmentService;
    private final ChunkMetadataRepository metadataRepository;
    public ChunkEnrichmentPipeline(ChunkEnrichmentService enrichmentService,
                                    ChunkMetadataRepository metadataRepository) {
        this.enrichmentService = enrichmentService;
        this.metadataRepository = metadataRepository;
    }
    public void enrichAndPersist(List<Document> chunks,
                                  Map<String, SectionEntity> sectionsById,
                                  String bookTitle) {
        int total = chunks.size();
        int done = 0;
        for (Document chunk : chunks) {
            String sectionId = (String) chunk.getMetadata().get("section_id");
            SectionEntity section = sectionId != null ? sectionsById.get(sectionId) : null;
            UUID chunkId;
            try {
                chunkId = UUID.fromString(chunk.getId());
            } catch (IllegalArgumentException ex) {
                log.warn("Skipping chunk with non-UUID id '{}'", chunk.getId());
                continue;
            }
            UUID bookId = extractBookId(chunk);
            if (bookId == null || sectionId == null) {
                log.warn("Skipping chunk {} missing book_id or section_id metadata", chunkId);
                continue;
            }
            try {
                ChunkEnrichmentResult result = enrichmentService.enrich(chunk.getText(), section, bookTitle);
                ChunkMetadataEntity entity = new ChunkMetadataEntity(
                    chunkId, bookId, sectionId,
                    result.facet(), result.entities(), result.summary(),
                    ChunkEnrichmentService.MODEL_VERSION, Instant.now());
                metadataRepository.save(entity);
                // Count only chunks that were enriched and saved, so failures
                // do not inflate the "enriched" total logged at the end.
                done++;
                if (done % 25 == 0) {
                    log.info("Enrichment progress: {}/{} chunks", done, total);
                }
            } catch (Exception ex) {
                log.warn("Enrichment failed for chunk {}: {}", chunkId, ex.getMessage());
            }
        }
        log.info("Enrichment complete: {}/{} chunks enriched", done, total);
    }
    private UUID extractBookId(Document chunk) {
        Object raw = chunk.getMetadata().get("book_id");
        if (raw == null) return null;
        try {
            return UUID.fromString(raw.toString());
        } catch (IllegalArgumentException ex) {
            return null;
        }
    }
 }
@@ -0,0 +1,9 @@
 package com.aiteacher.enrichment;
 import java.util.List;
 public record ChunkEnrichmentResult(
    List<String> entities,
    ConceptFacet facet,
    String summary
 ) {}
@@ -0,0 +1,135 @@
 package com.aiteacher.enrichment;
 import com.aiteacher.document.SectionEntity;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 import org.springframework.ai.chat.client.ChatClient;
 import org.springframework.stereotype.Service;
 import java.util.ArrayList;
 import java.util.List;
 import java.util.Locale;
@Service
 public class ChunkEnrichmentService {
    public static final String MODEL_VERSION = "v1";
    private static final int MAX_ENTITIES = 8;
    private static final Logger log = LoggerFactory.getLogger(ChunkEnrichmentService.class);
    private static final String SYSTEM_PROMPT = """
        You are a medical indexing assistant that classifies neurosurgery textbook excerpts.
        For each excerpt you receive, extract three fields:
          - entities: the medical concepts, conditions, procedures, tools, or anatomical
            structures the excerpt is ABOUT. Normalise each to lowercase, singular canonical
            English form. Expand abbreviations (e.g. "SAH" -> "subarachnoid hemorrhage").
            Avoid generic words ("patient", "technique"). Cap at %d entities.
          - facet: exactly one of the following. Pick the SINGLE best fit based on the
            excerpt's PRIMARY teaching purpose. Use OTHER only when nothing else applies.
              DEFINITION              — defines the entity / syndrome / concept ("what is X").
              ANATOMY                 — neuroanatomy, vascular/tract relationships, operative
                                        landmarks, anatomical variants.
              PATHOPHYSIOLOGY         — mechanism of disease, etiology, natural history,
                                        molecular/cellular basis.
              EPIDEMIOLOGY            — incidence, prevalence, demographics, risk factors.
              CLINICAL_PRESENTATION   — symptoms, signs, neurological exam findings, syndromes
                                        as they present in patients.
              IMAGING                 — CT / MRI / angiography / DSA / ultrasound features and
                                        interpretation. If the excerpt describes HOW something
                                        looks on imaging, use IMAGING.
              CLASSIFICATION          — named grading scales, staging systems, subtype
                                        taxonomies (Hunt-Hess, WFNS, Fisher, Spetzler-Martin,
                                        GCS, Karnofsky, mRS, Simpson, etc.). If the excerpt
                                        defines or applies a named scale, use CLASSIFICATION
                                        even if it is grounded in imaging or clinical exam.
              INDICATIONS             — when to operate / treat / observe; patient selection
                                        criteria; contraindications.
              SURGICAL_TECHNIQUE      — operative approach, positioning, steps, landmarks,
                                        instruments, implants, intraoperative monitoring.
              NONSURGICAL_MANAGEMENT  — medical therapy, endovascular treatment, stereotactic
                                        radiosurgery, conservative / observational management.
              COMPLICATIONS           — intra- or postoperative complications, adverse events.
              OUTCOMES_FOLLOWUP       — prognosis, morbidity/mortality rates, recurrence,
                                        surveillance schedules, follow-up care.
              OTHER                   — history, philosophy, ethics, or anything not covered.
            Disambiguation rules:
            * A named grading scale => CLASSIFICATION (even when grounded in imaging/exam).
            * Tools and implants described as part of an operation => SURGICAL_TECHNIQUE,
              not a standalone facet.
            * Illustrative case reports => CLINICAL_PRESENTATION.
            * Imaging findings of complications => COMPLICATIONS, not IMAGING.
          - summary: one or two sentences describing what the excerpt teaches.
        Respond with the structured JSON requested. Do not fabricate content not present in
        the excerpt.
        """.formatted(MAX_ENTITIES);
    private final ChatClient chatClient;
    public ChunkEnrichmentService(ChatClient chatClient) {
        this.chatClient = chatClient;
    }
    public ChunkEnrichmentResult enrich(String chunkText, SectionEntity section, String bookTitle) {
        String userPrompt = buildUserPrompt(chunkText, section, bookTitle);
        LlmOutput raw = chatClient.prompt()
            .system(SYSTEM_PROMPT)
            .user(userPrompt)
            .call()
            .entity(LlmOutput.class);
        if (raw == null) {
            log.warn("LLM returned null enrichment; defaulting to OTHER");
            return new ChunkEnrichmentResult(List.of(), ConceptFacet.OTHER, "");
        }
        List<String> entities = normaliseEntities(raw.entities());
        ConceptFacet facet = parseFacet(raw.facet());
        String summary = raw.summary() != null ? raw.summary().strip() : "";
        return new ChunkEnrichmentResult(entities, facet, summary);
    }
    private String buildUserPrompt(String chunkText, SectionEntity section, String bookTitle) {
        String sectionTitle = section != null && section.getTitle() != null ? section.getTitle() : "";
        return """
            BOOK: %s
            SECTION: %s
            EXCERPT:
            ---
            %s
            ---
            """.formatted(bookTitle, sectionTitle, chunkText);
    }
    private List<String> normaliseEntities(List<String> raw) {
        if (raw == null) return List.of();
        List<String> out = new ArrayList<>();
        for (String e : raw) {
            if (e == null) continue;
            String canonical = e.trim().toLowerCase(Locale.ROOT);
            if (canonical.isEmpty()) continue;
            if (!out.contains(canonical)) out.add(canonical);
            if (out.size() >= MAX_ENTITIES) break;
        }
        return out;
    }
    private ConceptFacet parseFacet(String raw) {
        if (raw == null) return ConceptFacet.OTHER;
        try {
            return ConceptFacet.valueOf(raw.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException ex) {
            log.warn("LLM returned unknown facet '{}', defaulting to OTHER", raw);
            return ConceptFacet.OTHER;
        }
    }
    // DTO for Spring AI structured output; facet is read as String so we can defend against bad values
    public record LlmOutput(List<String> entities, String facet, String summary) {}
 }
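Review note: a small standalone sketch of what the entity normalisation above does to a messy model response. `EntityNormalisationSketch` is a hypothetical class name and the input list is invented; the loop itself follows `normaliseEntities` from the diff (trim, lowercase with `Locale.ROOT`, drop blanks and duplicates, cap at `MAX_ENTITIES`).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class EntityNormalisationSketch {

    static final int MAX_ENTITIES = 8;

    /** Trims, lowercases, de-duplicates (order-preserving) and caps the list. */
    static List<String> normalise(List<String> raw) {
        if (raw == null) return List.of();
        List<String> out = new ArrayList<>();
        for (String e : raw) {
            if (e == null) continue;
            String canonical = e.trim().toLowerCase(Locale.ROOT);
            if (canonical.isEmpty()) continue;
            if (!out.contains(canonical)) out.add(canonical);
            if (out.size() >= MAX_ENTITIES) break;
        }
        return out;
    }

    public static void main(String[] args) {
        // Arrays.asList is used because List.of rejects null elements.
        List<String> raw = Arrays.asList(
            " Subarachnoid Hemorrhage ", "subarachnoid hemorrhage", null, "", "Aneurysm");
        System.out.println(normalise(raw)); // prints [subarachnoid hemorrhage, aneurysm]
    }
}
```

The linear `contains` check is fine at this size, since the list never grows beyond eight entries. Normalisation here is purely textual: abbreviations and plurals are left to the prompt, so "SAH" and "subarachnoid hemorrhage" would still count as two entities if the model returned both.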
@@ -0,0 +1,71 @@
 package com.aiteacher.enrichment;
 import jakarta.persistence.*;
 import org.hibernate.annotations.JdbcTypeCode;
 import org.hibernate.type.SqlTypes;
 import java.time.Instant;
 import java.util.List;
 import java.util.UUID;
@Entity
@Table(name = "chunk_metadata")
@org.hibernate.annotations.Check(
    name = "chunk_metadata_facet_check",
    constraints = "facet IN ('DEFINITION','ANATOMY','PATHOPHYSIOLOGY','EPIDEMIOLOGY'," +
        "'CLINICAL_PRESENTATION','IMAGING','CLASSIFICATION','INDICATIONS'," +
        "'SURGICAL_TECHNIQUE','NONSURGICAL_MANAGEMENT','COMPLICATIONS'," +
        "'OUTCOMES_FOLLOWUP','OTHER')")
 public class ChunkMetadataEntity {
    @Id
    @Column(name = "chunk_id", nullable = false)
    private UUID chunkId;
    @Column(name = "book_id", nullable = false)
    private UUID bookId;
    @Column(name = "section_id", nullable = false, length = 200)
    private String sectionId;
    @Enumerated(EnumType.STRING)
    @Column(name = "facet", nullable = false, length = 32)
    private ConceptFacet facet;
    @JdbcTypeCode(SqlTypes.JSON)
    @Column(name = "entities", nullable = false, columnDefinition = "jsonb")
    private List<String> entities;
    @Column(name = "summary", nullable = false, columnDefinition = "TEXT")
    private String summary;
    @Column(name = "model_version", nullable = false, length = 32)
    private String modelVersion;
    @Column(name = "enriched_at", nullable = false)
    private Instant enrichedAt;
    protected ChunkMetadataEntity() {}
    public ChunkMetadataEntity(UUID chunkId, UUID bookId, String sectionId,
                                ConceptFacet facet, List<String> entities, String summary,
                                String modelVersion, Instant enrichedAt) {
        this.chunkId = chunkId;
        this.bookId = bookId;
        this.sectionId = sectionId;
        this.facet = facet;
        this.entities = entities;
        this.summary = summary;
        this.modelVersion = modelVersion;
        this.enrichedAt = enrichedAt;
    }
    public UUID getChunkId() { return chunkId; }
    public UUID getBookId() { return bookId; }
    public String getSectionId() { return sectionId; }
    public ConceptFacet getFacet() { return facet; }
    public List<String> getEntities() { return entities; }
    public String getSummary() { return summary; }
    public String getModelVersion() { return modelVersion; }
    public Instant getEnrichedAt() { return enrichedAt; }
 }
@@ -0,0 +1,36 @@
 package com.aiteacher.enrichment;
 import org.springframework.data.jpa.repository.JpaRepository;
 import org.springframework.data.jpa.repository.Query;
 import org.springframework.data.repository.query.Param;
 import org.springframework.stereotype.Repository;
 import org.springframework.transaction.annotation.Transactional;
 import java.util.Collection;
 import java.util.List;
 import java.util.UUID;
@Repository
 public interface ChunkMetadataRepository extends JpaRepository<ChunkMetadataEntity, UUID> {
    long countByBookId(UUID bookId);
    @Query(value = """
        SELECT * FROM chunk_metadata
        WHERE book_id = :bookId
          AND entities @> to_jsonb(CAST(:entity AS text))
        """, nativeQuery = true)
    List<ChunkMetadataEntity> findByBookIdAndEntityContains(@Param("bookId") UUID bookId,
                                                             @Param("entity") String entity);
    @Query(value = """
        SELECT * FROM chunk_metadata
        WHERE entities @> to_jsonb(CAST(:entity AS text))
        """, nativeQuery = true)
    List<ChunkMetadataEntity> findByEntityContains(@Param("entity") String entity);
    List<ChunkMetadataEntity> findByChunkIdIn(Collection<UUID> chunkIds);
    @Transactional
    void deleteByBookId(UUID bookId);
 }
@@ -0,0 +1,27 @@
 package com.aiteacher.enrichment;
 public enum ConceptFacet {
    DEFINITION("Definition & Overview"),
    ANATOMY("Anatomy"),
    PATHOPHYSIOLOGY("Pathophysiology"),
    EPIDEMIOLOGY("Epidemiology"),
    CLINICAL_PRESENTATION("Clinical Presentation"),
    IMAGING("Imaging"),
    CLASSIFICATION("Classification & Grading"),
    INDICATIONS("Indications & Patient Selection"),
    SURGICAL_TECHNIQUE("Surgical Technique"),
    NONSURGICAL_MANAGEMENT("Non-surgical Management"),
    COMPLICATIONS("Complications"),
    OUTCOMES_FOLLOWUP("Outcomes & Follow-up"),
    OTHER("Other");
    private final String displayTitle;
    ConceptFacet(String displayTitle) {
        this.displayTitle = displayTitle;
    }
    public String displayTitle() {
        return displayTitle;
    }
 }
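Review note: a hedged sketch of how a free-text facet string from the model could be mapped onto this enum and then onto a display title. `FacetParseSketch` and its nested `Facet` enum are hypothetical; `Facet` is a reduced three-constant copy of `ConceptFacet` so that the example compiles on its own, and `parse` follows the defensive `parseFacet` logic in `ChunkEnrichmentService`.

```java
import java.util.Locale;

public class FacetParseSketch {

    /** Reduced stand-in for ConceptFacet: three constants only, for illustration. */
    enum Facet {
        IMAGING("Imaging"),
        CLASSIFICATION("Classification & Grading"),
        OTHER("Other");

        private final String displayTitle;

        Facet(String displayTitle) {
            this.displayTitle = displayTitle;
        }

        String displayTitle() {
            return displayTitle;
        }
    }

    /** Unknown or missing values fall back to OTHER instead of throwing. */
    static Facet parse(String raw) {
        if (raw == null) return Facet.OTHER;
        try {
            return Facet.valueOf(raw.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException ex) {
            return Facet.OTHER;
        }
    }

    public static void main(String[] args) {
        System.out.println(parse(" classification ").displayTitle());
        System.out.println(parse("Radiology").displayTitle());
    }
}
```

Reading the facet as a `String` and converting it by hand means one bad value downgrades a single chunk to OTHER rather than failing the whole structured-output deserialization.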
@@ -0,0 +1,138 @@
 package com.aiteacher.enrichment;
 import com.aiteacher.document.SectionEntity;
 import com.aiteacher.document.SectionRepository;
 import com.fasterxml.jackson.core.JsonProcessingException;
 import com.fasterxml.jackson.databind.JsonNode;
 import com.fasterxml.jackson.databind.ObjectMapper;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 import org.springframework.ai.document.Document;
 import org.springframework.jdbc.core.JdbcTemplate;
 import org.springframework.scheduling.annotation.Async;
 import org.springframework.stereotype.Service;
 import java.time.Instant;
 import java.util.HashMap;
 import java.util.List;
 import java.util.Map;
 import java.util.Optional;
 import java.util.UUID;
 import java.util.concurrent.ConcurrentHashMap;
@Service
 public class EnrichmentBackfillService {
    private static final Logger log = LoggerFactory.getLogger(EnrichmentBackfillService.class);
    private final JdbcTemplate jdbcTemplate;
    private final ChunkEnrichmentService enrichmentService;
    private final ChunkMetadataRepository metadataRepository;
    private final SectionRepository sectionRepository;
    private final ObjectMapper objectMapper;
    private final Map<UUID, BackfillProgress> progressByBook = new ConcurrentHashMap<>();
    public EnrichmentBackfillService(JdbcTemplate jdbcTemplate,
                                      ChunkEnrichmentService enrichmentService,
                                      ChunkMetadataRepository metadataRepository,
                                      SectionRepository sectionRepository,
                                      ObjectMapper objectMapper) {
        this.jdbcTemplate = jdbcTemplate;
        this.enrichmentService = enrichmentService;
        this.metadataRepository = metadataRepository;
        this.sectionRepository = sectionRepository;
        this.objectMapper = objectMapper;
    }
    public BackfillProgress getProgress(UUID bookId) {
        return progressByBook.getOrDefault(bookId, BackfillProgress.idle());
    }
    @Async
    public void backfillBook(UUID bookId, String bookTitle) {
        List<Document> pending = listUnenrichedChunks(bookId);
        int total = pending.size();
        progressByBook.put(bookId, new BackfillProgress("RUNNING", total, 0, null));
        log.info("Backfill starting for book {} — {} chunks pending", bookId, total);
        int done = 0;
        Map<String, SectionEntity> sectionCache = new HashMap<>();
        for (Document chunk : pending) {
            try {
                String sectionId = (String) chunk.getMetadata().get("section_id");
                SectionEntity section = sectionId != null
                    ? sectionCache.computeIfAbsent(sectionId,
                        id -> sectionRepository.findById(id).orElse(null))
                    : null;
                ChunkEnrichmentResult result = enrichmentService.enrich(chunk.getText(), section, bookTitle);
                UUID chunkId = UUID.fromString(chunk.getId());
                metadataRepository.save(new ChunkMetadataEntity(
                    chunkId, bookId, sectionId != null ? sectionId : "",
                    result.facet(), result.entities(), result.summary(),
                    ChunkEnrichmentService.MODEL_VERSION, Instant.now()));
                // Count only successful saves so chunksEnriched excludes failed chunks.
                done++;
            } catch (Exception ex) {
                log.warn("Backfill failed for chunk {} of book {}: {}", chunk.getId(), bookId, ex.getMessage());
            }
            progressByBook.put(bookId, new BackfillProgress("RUNNING", total, done, null));
        }
        progressByBook.put(bookId, new BackfillProgress("COMPLETED", total, done, null));
        log.info("Backfill finished for book {} — {}/{} enriched", bookId, done, total);
    }
    private List<Document> listUnenrichedChunks(UUID bookId) {
        // Left anti-join against chunk_metadata so re-runs are cheap.
        String sql = """
            SELECT vs.id, vs.content, vs.metadata::text AS metadata_text
            FROM vector_store vs
            LEFT JOIN chunk_metadata cm ON cm.chunk_id = vs.id
            WHERE vs.metadata->>'book_id' = ?
              AND vs.metadata->>'type' = 'TEXT'
              AND cm.chunk_id IS NULL
            """;
        return jdbcTemplate.query(sql, (rs, rowNum) -> {
            String id = rs.getString("id");
            String content = rs.getString("content");
            String metaJson = rs.getString("metadata_text");
            Map<String, Object> meta = parseMetadata(metaJson);
            return new Document(id, content != null ? content : "", meta);
        }, bookId.toString());
    }
    private Map<String, Object> parseMetadata(String json) {
        if (json == null || json.isBlank()) return Map.of();
        try {
            JsonNode node = objectMapper.readTree(json);
            Map<String, Object> out = new HashMap<>();
            node.properties().forEach(e -> {
                JsonNode v = e.getValue();
                if (v.isTextual()) out.put(e.getKey(), v.asText());
                else if (v.isInt()) out.put(e.getKey(), v.asInt());
                else if (v.isLong()) out.put(e.getKey(), v.asLong());
                else if (v.isBoolean()) out.put(e.getKey(), v.asBoolean());
                else out.put(e.getKey(), v.toString());
            });
            return out;
        } catch (JsonProcessingException ex) {
            log.warn("Failed to parse vector_store metadata JSON: {}", ex.getMessage());
            return Map.of();
        }
    }
    public Optional<Integer> countEnrichedChunks(UUID bookId) {
        return Optional.of((int) metadataRepository.countByBookId(bookId));
    }
    public int countTotalTextChunks(UUID bookId) {
        Integer n = jdbcTemplate.queryForObject(
            "SELECT COUNT(*) FROM vector_store WHERE metadata->>'book_id' = ? AND metadata->>'type' = 'TEXT'",
            Integer.class, bookId.toString());
        return n != null ? n : 0;
    }
    public record BackfillProgress(String status, int chunksTotal, int chunksEnriched, String errorMessage) {
        public static BackfillProgress idle() {
            return new BackfillProgress("IDLE", 0, 0, null);
        }
    }
 }
@@ -0,0 +1,50 @@
 package com.aiteacher.enrichment;
 import com.aiteacher.book.Book;
 import com.aiteacher.book.BookRepository;
 import org.springframework.http.HttpStatus;
 import org.springframework.http.ResponseEntity;
 import org.springframework.web.bind.annotation.*;
 import java.util.NoSuchElementException;
 import java.util.UUID;
@RestController
@RequestMapping("/api/v1/admin/books/{id}/enrich")
 public class EnrichmentController {
    private final BookRepository bookRepository;
    private final EnrichmentBackfillService backfillService;
    public EnrichmentController(BookRepository bookRepository,
                                 EnrichmentBackfillService backfillService) {
        this.bookRepository = bookRepository;
        this.backfillService = backfillService;
    }
    @PostMapping
    public ResponseEntity<EnrichmentBackfillService.BackfillProgress> start(@PathVariable UUID id) {
        Book book = bookRepository.findById(id)
            .orElseThrow(() -> new NoSuchElementException("Book not found."));
        backfillService.backfillBook(id, book.getTitle());
        int total = backfillService.countTotalTextChunks(id);
        int enriched = backfillService.countEnrichedChunks(id).orElse(0);
        return ResponseEntity.status(HttpStatus.ACCEPTED)
            .body(new EnrichmentBackfillService.BackfillProgress("RUNNING", total, enriched, null));
    }
    @GetMapping
    public ResponseEntity<EnrichmentBackfillService.BackfillProgress> status(@PathVariable UUID id) {
        bookRepository.findById(id)
            .orElseThrow(() -> new NoSuchElementException("Book not found."));
        EnrichmentBackfillService.BackfillProgress progress = backfillService.getProgress(id);
        if ("IDLE".equals(progress.status())) {
            int total = backfillService.countTotalTextChunks(id);
            int enriched = backfillService.countEnrichedChunks(id).orElse(0);
            progress = new EnrichmentBackfillService.BackfillProgress(
                enriched >= total && total > 0 ? "COMPLETED" : "IDLE",
                total, enriched, null);
        }
        return ResponseEntity.ok(progress);
    }
 }
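Review note: the status endpoint above falls back to persisted counts when no in-memory progress exists, for example after a restart. The sketch below isolates that rule as a pure function. `BackfillStatusSketch` and `fromCounts` are hypothetical names; the record has the same components as `BackfillProgress` in the diff.

```java
public class BackfillStatusSketch {

    /** Same components as EnrichmentBackfillService.BackfillProgress. */
    record BackfillProgress(String status, int chunksTotal, int chunksEnriched, String errorMessage) {}

    /**
     * Derives a status from database counts alone.
     * COMPLETED requires at least one text chunk and all of them enriched;
     * everything else is reported as IDLE.
     */
    static BackfillProgress fromCounts(int total, int enriched) {
        String status = enriched >= total && total > 0 ? "COMPLETED" : "IDLE";
        return new BackfillProgress(status, total, enriched, null);
    }

    public static void main(String[] args) {
        System.out.println(fromCounts(0, 0));    // no text chunks yet
        System.out.println(fromCounts(120, 45)); // partially enriched
        System.out.println(fromCounts(120, 120));
    }
}
```

One consequence of this rule is that a partially enriched book is reported as IDLE rather than as a distinct partial state, so a client has to compare `chunksEnriched` with `chunksTotal` to tell the two apart.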
@@ -1,24 +1,27 @@
 package com.aiteacher.figure;
 import java.awt.image.BufferedImage;
 import java.nio.file.Path;
 import java.util.UUID;
 public interface FigureStorageService {
    /**
-     * Saves an extracted image to the figure store and returns the relative path
+     * Saves an extracted image to S3 and returns the object key, which is stored in the database.
     */
    String save(UUID bookId, String figureId, BufferedImage image);
    /**
-     * Resolves a stored relative path to an absolute filesystem path.
+     * Downloads the image bytes for the given S3 object key.
     */
-    Path resolve(String relativePath);
+    byte[] getBytes(String key);
    /**
-     * Deletes all figure files for the given book.
+     * Returns a presigned GET URL valid for 1 hour for the given S3 object key.
     */
    String presignedUrl(String key);
    /**
     * Deletes all figure objects for the given book.
     */
    void deleteAll(UUID bookId);
 }
@@ -1,59 +0,0 @@
 package com.aiteacher.figure;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 import org.springframework.beans.factory.annotation.Value;
 import org.springframework.stereotype.Service;
 import javax.imageio.ImageIO;
 import java.awt.image.BufferedImage;
 import java.io.IOException;
 import java.nio.file.Files;
 import java.nio.file.Path;
 import java.nio.file.Paths;
 import java.util.UUID;
@Service
 public class LocalFigureStorageService implements FigureStorageService {
    private static final Logger log = LoggerFactory.getLogger(LocalFigureStorageService.class);
    private final Path basePath;
    public LocalFigureStorageService(@Value("${app.figure-storage.base-path:./uploads}") String basePath) {
        this.basePath = Paths.get(basePath).toAbsolutePath().normalize();
    }
    @Override
    public String save(UUID bookId, String figureId, BufferedImage image) {
        try {
            Path dir = basePath.resolve("figures").resolve(bookId.toString());
            Files.createDirectories(dir);
            String filename = figureId + ".png";
            Path file = dir.resolve(filename);
            ImageIO.write(image, "PNG", file.toFile());
            // Return relative path for storage in DB
            return "figures/" + bookId + "/" + filename;
        } catch (IOException ex) {
            throw new RuntimeException("Failed to save figure " + figureId, ex);
        }
    }
    @Override
    public Path resolve(String relativePath) {
        return basePath.resolve(relativePath);
    }
    @Override
    public void deleteAll(UUID bookId) {
        Path dir = basePath.resolve("figures").resolve(bookId.toString());
        if (!Files.exists(dir)) return;
        try (var walk = Files.walk(dir)) {
            walk.sorted(java.util.Comparator.reverseOrder())
                .map(Path::toFile)
                .forEach(java.io.File::delete);
        } catch (IOException ex) {
            log.warn("Could not fully delete figures for book {}: {}", bookId, ex.getMessage());
        }
    }
 }
@@ -0,0 +1,132 @@
 package com.aiteacher.figure;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 import org.springframework.beans.factory.annotation.Value;
 import org.springframework.stereotype.Service;
 import software.amazon.awssdk.auth.credentials.AwsBasicCredentials;
 import software.amazon.awssdk.auth.credentials.StaticCredentialsProvider;
 import software.amazon.awssdk.core.sync.RequestBody;
 import software.amazon.awssdk.regions.Region;
 import software.amazon.awssdk.services.s3.S3Client;
 import software.amazon.awssdk.services.s3.S3Configuration;
 import software.amazon.awssdk.services.s3.model.*;
 import software.amazon.awssdk.services.s3.presigner.S3Presigner;
 import software.amazon.awssdk.services.s3.presigner.model.GetObjectPresignRequest;
 import software.amazon.awssdk.services.s3.model.S3Object;
 import javax.imageio.ImageIO;
 import java.awt.image.BufferedImage;
 import java.io.ByteArrayOutputStream;
 import java.io.IOException;
 import java.net.URI;
 import java.time.Duration;
 import java.util.ArrayList;
 import java.util.List;
 import java.util.UUID;
@Service
 public class S3FigureStorageService implements FigureStorageService {
    private static final Logger log = LoggerFactory.getLogger(S3FigureStorageService.class);
    private final S3Client s3;
    private final S3Presigner presigner;
    private final String bucket;
    public S3FigureStorageService(
            @Value("${app.figure-storage.endpoint}") String endpoint,
            @Value("${app.figure-storage.region}") String region,
            @Value("${app.figure-storage.bucket}") String bucket,
            @Value("${app.figure-storage.access-key-id}") String accessKeyId,
            @Value("${app.figure-storage.secret-access-key}") String secretKey) {
        this.bucket = bucket;
        URI endpointUri = URI.create(endpoint);
        StaticCredentialsProvider credentials = StaticCredentialsProvider.create(
                AwsBasicCredentials.create(accessKeyId, secretKey));
        Region awsRegion = Region.of(region);
        S3Configuration s3Config = S3Configuration.builder()
                .pathStyleAccessEnabled(true)
                .build();
        this.s3 = S3Client.builder()
                .endpointOverride(endpointUri)
                .region(awsRegion)
                .credentialsProvider(credentials)
                .serviceConfiguration(s3Config)
                .build();
        this.presigner = S3Presigner.builder()
                .endpointOverride(endpointUri)
                .region(awsRegion)
                .credentialsProvider(credentials)
                .serviceConfiguration(s3Config)
                .build();
    }
    @Override
    public String save(UUID bookId, String figureId, BufferedImage image) {
        String key = "figures/" + bookId + "/" + figureId + ".png";
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            ImageIO.write(image, "PNG", out);
            byte[] bytes = out.toByteArray();
            s3.putObject(
                    PutObjectRequest.builder().bucket(bucket).key(key)
                            .contentType("image/png").contentLength((long) bytes.length).build(),
                    RequestBody.fromBytes(bytes));
            return key;
        } catch (IOException ex) {
            throw new RuntimeException("Failed to encode figure " + figureId, ex);
        } catch (S3Exception ex) {
            throw new RuntimeException("Failed to upload figure " + figureId + " to S3", ex);
        }
    }
    @Override
    public byte[] getBytes(String key) {
        try {
            return s3.getObjectAsBytes(
                    GetObjectRequest.builder().bucket(bucket).key(key).build()).asByteArray();
        } catch (S3Exception ex) {
            throw new RuntimeException("Failed to download figure from S3: " + key, ex);
        }
    }
    @Override
    public String presignedUrl(String key) {
        GetObjectPresignRequest request = GetObjectPresignRequest.builder()
                .signatureDuration(Duration.ofHours(1))
                .getObjectRequest(r -> r.bucket(bucket).key(key))
                .build();
        return presigner.presignGetObject(request).url().toString();
    }
    @Override
    public void deleteAll(UUID bookId) {
        String prefix = "figures/" + bookId + "/";
        try {
            List<ObjectIdentifier> toDelete = new ArrayList<>();
            ListObjectsV2Request listRequest = ListObjectsV2Request.builder()
                    .bucket(bucket).prefix(prefix).build();
            s3.listObjectsV2Paginator(listRequest).stream()
                    .flatMap(page -> page.contents().stream())
                    .map(S3Object::key)
                    .map(k -> ObjectIdentifier.builder().key(k).build())
                    .forEach(toDelete::add);
            if (toDelete.isEmpty()) return;
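             // S3 DeleteObjects accepts at most 1000 keys per request, so delete in batches.
             for (int i = 0; i < toDelete.size(); i += 1000) {
                 List<ObjectIdentifier> batch = toDelete.subList(i, Math.min(i + 1000, toDelete.size()));
                 s3.deleteObjects(DeleteObjectsRequest.builder()
                         .bucket(bucket)
                         .delete(Delete.builder().objects(batch).build())
                         .build());
             }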
            log.info("Deleted {} figures from S3 for book {}", toDelete.size(), bookId);
        } catch (S3Exception ex) {
            log.warn("Could not fully delete figures for book {} from S3: {}", bookId, ex.getMessage());
        }
    }
 }
@@ -0,0 +1,59 @@
 package com.aiteacher.retrieval;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 import org.springframework.stereotype.Service;
 import java.util.ArrayList;
 import java.util.List;
 import java.util.Set;
 import java.util.regex.Matcher;
 import java.util.regex.Pattern;
 /**
 * Post-processes generated answers to strip citation labels that do not
 * correspond to any passage retrieved for the current query, preventing
 * hallucinated source references from reaching the user.
 */
@Service
 public class CitationValidatorService {
    private static final Logger log = LoggerFactory.getLogger(CitationValidatorService.class);
    /** Matches citation labels of the form [S1], [F2], [S12], etc. */
    private static final Pattern CITATION_PATTERN = Pattern.compile("\\[(S|F)\\d+\\]");
    /**
     * Removes any {@code [Sx]} / {@code [Fx]} citation in {@code generatedAnswer}
     * whose label is not contained in {@code validLabels}.
     *
     * @param generatedAnswer raw model output
     * @param validLabels     set of labels present in the retrieved context
     * @return cleaned answer text with hallucinated citations removed
     */
    public String validate(String generatedAnswer, Set<String> validLabels) {
        if (generatedAnswer == null) return "";
        Matcher matcher = CITATION_PATTERN.matcher(generatedAnswer);
        List<String> removed = new ArrayList<>();
         StringBuilder sb = new StringBuilder();
        while (matcher.find()) {
            String label = matcher.group();
            String inner = label.substring(1, label.length() - 1); // strip [ ]
            if (validLabels.contains(inner)) {
                matcher.appendReplacement(sb, Matcher.quoteReplacement(label));
            } else {
                removed.add(inner);
                matcher.appendReplacement(sb, "");
            }
        }
        matcher.appendTail(sb);
        if (!removed.isEmpty()) {
            log.warn("Stripped hallucinated citations: {}", removed);
        }
        return sb.toString();
    }
 }
@@ -0,0 +1,7 @@
 package com.aiteacher.retrieval;
 /**
 * Value object holding the original user query alongside its clinically
 * rewritten variant used for vector-store retrieval.
 */
 public record ExpandedQuery(String original, String rewritten) {}
@@ -0,0 +1,27 @@
 package com.aiteacher.retrieval;
 import com.aiteacher.document.FigureEntity;
 import com.aiteacher.document.SectionEntity;
 import java.util.HashSet;
 import java.util.Map;
 import java.util.Set;
 /**
 * Value object produced when building the LLM context prompt.
 * Maps short ref-labels (S1, S2… / F1, F2…) to their source entities
 * and carries the fully formatted prompt text.
 */
 public record LabelledContext(
        Map<String, SectionEntity> sectionLabels,
        Map<String, FigureEntity> figureLabels,
        String promptText) {
    /** Returns the union of all valid citation labels for this context. */
    public Set<String> allLabels() {
        Set<String> labels = new HashSet<>();
        labels.addAll(sectionLabels.keySet());
        labels.addAll(figureLabels.keySet());
        return labels;
    }
 }
@@ -0,0 +1,47 @@
 package com.aiteacher.retrieval;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 import org.springframework.ai.chat.client.ChatClient;
 import org.springframework.stereotype.Service;
 /**
 * Rewrites a user query into precise clinical/surgical terminology so that
 * vector-store retrieval can match textbook language even when the user's
 * phrasing differs from the documentation vocabulary.
 */
@Service
 public class QueryExpansionService {
    private static final Logger log = LoggerFactory.getLogger(QueryExpansionService.class);
    private static final String EXPANSION_PROMPT = """
            Rewrite the following question using precise medical and surgical terminology \
            as it would appear in a neurosurgery textbook index. \
            Output only the rewritten question, nothing else.
            Question: %s""";
    private final ChatClient chatClient;
    public QueryExpansionService(ChatClient chatClient) {
        this.chatClient = chatClient;
    }
    /**
     * Returns an {@link ExpandedQuery} whose {@code rewritten} field contains
     * the clinically rephrased version of {@code query}.
     */
    public ExpandedQuery expand(String query) {
        String rewritten = chatClient.prompt()
                .user(EXPANSION_PROMPT.formatted(query))
                .call()
                .content();
        if (rewritten == null || rewritten.isBlank()) {
            rewritten = query;
        }
        log.debug("Query expanded: '{}' → '{}'", query, rewritten);
        return new ExpandedQuery(query, rewritten);
    }
 }
@@ -0,0 +1,7 @@
 package com.aiteacher.topic;
 import java.time.Instant;
 import java.util.UUID;
 public record SavedSummaryItem(UUID id, int summaryNumber, Instant generatedAt) {
 }
@@ -5,6 +5,7 @@ import org.springframework.web.bind.annotation.*;
 import java.util.List;
 import java.util.NoSuchElementException;
 import java.util.UUID;
@RestController
@RequestMapping("/api/v1/topics")
@@ -25,11 +26,30 @@ public class TopicController {
    }
    @PostMapping("/{id}/summary")
-    public ResponseEntity<TopicSummaryResponse> generateSummary(@PathVariable String id) {
+    public ResponseEntity<TopicSummaryResponse> generateSummary(
            @PathVariable String id,
            @RequestParam(defaultValue = "en") String language) {
        Topic topic = topicRepository.findById(id)
            .orElseThrow(() -> new NoSuchElementException("Topic not found."));
-        TopicSummaryResponse response = topicSummaryService.generateSummary(topic);
+        TopicSummaryResponse response = topicSummaryService.generateSummary(topic, language);
        return ResponseEntity.ok(response);
    }
    @GetMapping("/{id}/summaries")
    public ResponseEntity<List<SavedSummaryItem>> listSummaries(@PathVariable String id) {
        topicRepository.findById(id)
            .orElseThrow(() -> new NoSuchElementException("Topic not found."));
        return ResponseEntity.ok(topicSummaryService.listSummaries(id));
    }
    @GetMapping("/{id}/summaries/{summaryId}")
    public ResponseEntity<TopicSummaryResponse> getSummary(@PathVariable String id,
                                                            @PathVariable UUID summaryId) {
        topicRepository.findById(id)
            .orElseThrow(() -> new NoSuchElementException("Topic not found."));
        return ResponseEntity.ok(topicSummaryService.getSummary(summaryId));
    }
 }
@@ -0,0 +1,53 @@
 package com.aiteacher.topic;
 import jakarta.persistence.Column;
 import jakarta.persistence.Entity;
 import jakarta.persistence.GeneratedValue;
 import jakarta.persistence.GenerationType;
 import jakarta.persistence.Id;
 import jakarta.persistence.Table;
 import java.time.Instant;
 import java.util.UUID;
@Entity
@Table(name = "topic_summary")
 public class TopicSummaryEntity {
    @Id
    @GeneratedValue(strategy = GenerationType.UUID)
    private UUID id;
    @Column(name = "topic_id", nullable = false)
    private String topicId;
    @Column(name = "summary_number", nullable = false)
    private int summaryNumber;
    @Column(nullable = false, columnDefinition = "TEXT")
    private String summary;
    @Column(name = "sources_json", nullable = false, columnDefinition = "TEXT")
    private String sourcesJson;
    @Column(name = "generated_at", nullable = false)
    private Instant generatedAt;
    protected TopicSummaryEntity() {}
    public TopicSummaryEntity(String topicId, int summaryNumber, String summary,
                               String sourcesJson, Instant generatedAt) {
        this.topicId = topicId;
        this.summaryNumber = summaryNumber;
        this.summary = summary;
        this.sourcesJson = sourcesJson;
        this.generatedAt = generatedAt;
    }
    public UUID getId() { return id; }
    public String getTopicId() { return topicId; }
    public int getSummaryNumber() { return summaryNumber; }
    public String getSummary() { return summary; }
    public String getSourcesJson() { return sourcesJson; }
    public Instant getGeneratedAt() { return generatedAt; }
 }
@@ -0,0 +1,13 @@
 package com.aiteacher.topic;
 import org.springframework.data.jpa.repository.JpaRepository;
 import java.util.List;
 import java.util.UUID;
 public interface TopicSummaryRepository extends JpaRepository<TopicSummaryEntity, UUID> {
    List<TopicSummaryEntity> findByTopicIdOrderBySummaryNumberAsc(String topicId);
    long countByTopicId(String topicId);
 }
@@ -2,8 +2,11 @@ package com.aiteacher.topic;
 import java.time.Instant;
 import java.util.List;
 import java.util.UUID;
 public record TopicSummaryResponse(
    UUID id,
    int summaryNumber,
    String topicId,
    String topicName,
    String summary,
@@ -11,8 +14,17 @@ public record TopicSummaryResponse(
    Instant generatedAt
 ) {
    public record SourceReference(
        String type,
        String refLabel,
        String bookId,
        String bookTitle,
-        Integer page
+        Integer page,
        String chunkText,
        String figureId,
        String label,
        String caption,
        String figureType,
        String imageUrl
    ) {
    }
 }
@@ -1,21 +1,25 @@
 package com.aiteacher.topic;
 import com.aiteacher.book.Book;
 import com.aiteacher.book.BookRepository;
 import com.aiteacher.book.BookStatus;
 import com.aiteacher.book.NoKnowledgeSourceException;
 import com.aiteacher.document.FigureEntity;
 import com.aiteacher.document.SectionEntity;
 import com.aiteacher.retrieval.NeurosurgeryRetriever;
 import com.aiteacher.retrieval.RetrievalResult;
 import com.fasterxml.jackson.core.JsonProcessingException;
 import com.fasterxml.jackson.databind.ObjectMapper;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 import org.springframework.ai.chat.client.ChatClient;
 import org.springframework.ai.chat.client.advisor.vectorstore.QuestionAnswerAdvisor;
 import org.springframework.ai.chat.model.ChatResponse;
 import org.springframework.ai.document.Document;
 import org.springframework.ai.vectorstore.VectorStore;
 import org.springframework.stereotype.Service;
 import java.time.Instant;
 import java.util.ArrayList;
 import java.util.List;
-import java.util.Map;
+import java.util.NoSuchElementException;
 import java.util.UUID;
@Service
 public class TopicSummaryService {
@@ -23,86 +27,222 @@ public class TopicSummaryService {
    private static final Logger log = LoggerFactory.getLogger(TopicSummaryService.class);
    private static final String SYSTEM_PROMPT = """
-        You are an expert neurosurgery educator. Your role is to provide accurate,
+        You are an expert neurosurgery educator. Your role is to provide accurate, detailed but synthetically concise educational reports on neurosurgery topics, based on the content retrieved from the uploaded medical textbooks. Your audience is highly experienced neurosurgeons, who are looking for a comprehensive yet digestible overview of a specific topic.
-        clinically relevant summaries based ONLY on the content retrieved from the
+        When generating reports, your primary goal is to distill the most important and clinically relevant information about the topic. This includes key concepts, anatomical details, surgical techniques, clinical considerations, and any other information that would be essential for a neurosurgeon to understand the topic thoroughly.
-        uploaded medical textbooks. Do not use any knowledge outside the provided context.
+        Base your reports on uploaded medical textbooks. Do not use any knowledge outside the provided context.
        When answering:
        - Structure your response clearly with key points
-        - If the context mentions specific book titles and page numbers, reference them
+        - Cite claims using ONLY the reference labels provided in the context (e.g. [S1], [F2]).
          Do not invent page numbers, section titles, or labels not present in the CONTEXT block.
        - Figures (labeled [F1], [F2], etc.) are actual images and drawings from the textbook — they will be rendered as inline illustrations in your response. Use them actively to support your explanations: reference a figure when it visually demonstrates anatomy, a surgical step, or a clinical concept you are describing.
        - If the retrieved context does not contain sufficient information on the topic,
          explicitly state: "The uploaded books do not contain sufficient information on this topic."
        - Never hallucinate or fabricate clinical information
        """;
    private final ChatClient chatClient;
    private final VectorStore vectorStore;
    private final BookRepository bookRepository;
    private final NeurosurgeryRetriever retriever;
    private final TopicSummaryRepository summaryRepository;
    private final ObjectMapper objectMapper;
-    public TopicSummaryService(ChatClient chatClient, VectorStore vectorStore,
+    public TopicSummaryService(ChatClient chatClient,
-                                BookRepository bookRepository) {
+                                BookRepository bookRepository,
                                NeurosurgeryRetriever retriever,
                                TopicSummaryRepository summaryRepository,
                                ObjectMapper objectMapper) {
        this.chatClient = chatClient;
        this.vectorStore = vectorStore;
        this.bookRepository = bookRepository;
        this.retriever = retriever;
        this.summaryRepository = summaryRepository;
        this.objectMapper = objectMapper;
    }
-    public TopicSummaryResponse generateSummary(Topic topic) {
+    public TopicSummaryResponse generateSummary(Topic topic, String language) {
-        if (!bookRepository.existsByStatus(BookStatus.READY)) {
+        List<Book> readyBooks = bookRepository.findAll().stream()
            .filter(b -> b.getStatus() == BookStatus.READY)
            .toList();
        if (readyBooks.isEmpty()) {
            throw new NoKnowledgeSourceException(
                "No books are available as knowledge sources. Please upload and process at least one book.");
        }
        String question = buildQuestion(topic);
-        ChatResponse response = chatClient.prompt()
+        List<SectionEntity> allSections = new ArrayList<>();
-            .system(SYSTEM_PROMPT)
+        List<FigureEntity> allFigures = new ArrayList<>();
-            .advisors(QuestionAnswerAdvisor.builder(vectorStore).build())
+        for (Book book : readyBooks) {
-            .user(question)
+            RetrievalResult result = retriever.retrieve(question, book.getId());
-            .call()
+            allSections.addAll(result.parentSections());
-            .chatResponse();
+            allFigures.addAll(result.figures());
        }
-        String summary = response.getResult().getOutput().getText();
+        log.debug("Topic reports for '{}': {} sections, {} figures retrieved",
-        List<TopicSummaryResponse.SourceReference> sources = extractSources(response);
+            topic.getName(), allSections.size(), allFigures.size());
        String contextPrompt = buildContextPrompt(question, allSections, allFigures, language);
        String summary = chatClient.prompt()
            .system(SYSTEM_PROMPT)
            .user(contextPrompt)
            .call()
            .content();
        List<TopicSummaryResponse.SourceReference> sources = buildSources(allSections, allFigures, readyBooks);
        Instant generatedAt = Instant.now();
        int summaryNumber = (int) summaryRepository.countByTopicId(topic.getId()) + 1;
        String sourcesJson = serializeSources(sources);
        TopicSummaryEntity entity = new TopicSummaryEntity(
            topic.getId(), summaryNumber, summary, sourcesJson, generatedAt);
        entity = summaryRepository.save(entity);
        return new TopicSummaryResponse(
            entity.getId(),
            summaryNumber,
            topic.getId(),
            topic.getName(),
            summary,
            sources,
-            Instant.now()
+            generatedAt
        );
    }
    public List<SavedSummaryItem> listSummaries(String topicId) {
        return summaryRepository.findByTopicIdOrderBySummaryNumberAsc(topicId).stream()
            .map(e -> new SavedSummaryItem(e.getId(), e.getSummaryNumber(), e.getGeneratedAt()))
            .toList();
    }
    public TopicSummaryResponse getSummary(UUID summaryId) {
        TopicSummaryEntity entity = summaryRepository.findById(summaryId)
            .orElseThrow(() -> new NoSuchElementException("Summary not found."));
        List<TopicSummaryResponse.SourceReference> sources = deserializeSources(entity.getSourcesJson());
        return new TopicSummaryResponse(
            entity.getId(),
            entity.getSummaryNumber(),
            entity.getTopicId(),
             entity.getTopicId(), // topicName: the name is not persisted with the summary, so the topic id stands in
            entity.getSummary(),
            sources,
            entity.getGeneratedAt()
        );
    }
    private String buildQuestion(Topic topic) {
        return String.format(
-            "Please provide a comprehensive educational summary of the following neurosurgery topic: " +
+            "Provide a comprehensive educational report of the following neurosurgery topic: " +
-            "%s. Topic description: %s. " +
+            "%s. Topic description: %s. " +
            "Include key concepts, clinical considerations, and important details that a neurosurgeon should know.",
            topic.getName(), topic.getDescription()
        );
    }
-    private List<TopicSummaryResponse.SourceReference> extractSources(ChatResponse response) {
+    private String buildContextPrompt(String question,
-        List<TopicSummaryResponse.SourceReference> sources = new ArrayList<>();
+                                      List<SectionEntity> sections,
                                      List<FigureEntity> figures,
                                      String language) {
        StringBuilder sb = new StringBuilder();
-        if (response.getMetadata() != null) {
+        if (!sections.isEmpty()) {
-            Object retrieved = response.getMetadata().get(QuestionAnswerAdvisor.RETRIEVED_DOCUMENTS);
+            sb.append("CONTEXT:\n\n");
-            if (retrieved instanceof List<?> docs) {
+            for (int i = 0; i < sections.size(); i++) {
-                for (Object docObj : docs) {
+                SectionEntity s = sections.get(i);
-                    if (docObj instanceof Document doc) {
+                sb.append("[S").append(i + 1).append("] ")
-                        Map<String, Object> metadata = doc.getMetadata();
+                  .append(s.getTitle()).append(", p.").append(s.getPageStart()).append("\n");
-                        String bookTitle = (String) metadata.get("book_title");
+                sb.append(s.getFullText()).append("\n\n");
                        Object pageObj = metadata.get("page_number");
                        Integer page = pageObj instanceof Number n ? n.intValue() : null;
                        if (bookTitle != null) {
                            sources.add(new TopicSummaryResponse.SourceReference(bookTitle, page));
                        }
                    }
                }
            }
        }
-        // Deduplicate by bookTitle + page
+        if (!figures.isEmpty()) {
-        return sources.stream().distinct().toList();
+            sb.append("AVAILABLE FIGURES:\n");
            for (int i = 0; i < figures.size(); i++) {
                FigureEntity f = figures.get(i);
                sb.append("[F").append(i + 1).append("] ")
                  .append(f.getLabel() != null ? f.getLabel() : "Figure")
                  .append(" (p.").append(f.getPage()).append("): ")
                  .append(f.getCaption() != null ? f.getCaption() : "")
                  .append("\n");
            }
            sb.append("\nWhen referencing diagrams, use their label from the context (e.g. [F1]).\n\n");
        }
        sb.append("QUESTION:\n").append(question);
        if ("th".equalsIgnoreCase(language)) {
            sb.append("\n\nIMPORTANT: Write the narrative in Thai. ")
              .append("Keep all medical, anatomical, surgical, pharmacological, and clinical ")
              .append("terminology in English (e.g., cerebellopontine angle, glioblastoma, craniotomy, ")
              .append("dexamethasone). Do NOT translate disease names, anatomical structures, drug names, ")
              .append("procedures, eponyms, or imaging modalities. Translate only connective prose, ")
              .append("explanations, and general descriptions. Citation labels [S#]/[F#] stay unchanged. ")
              .append("The sentinel string for insufficient context must remain exactly: ")
              .append("\"The uploaded books do not contain sufficient information on this topic.\"");
        }
        return sb.toString();
    }
    private List<TopicSummaryResponse.SourceReference> buildSources(List<SectionEntity> sections,
                                                                      List<FigureEntity> figures,
                                                                      List<Book> readyBooks) {
        List<TopicSummaryResponse.SourceReference> sources = new ArrayList<>();
        for (int i = 0; i < sections.size(); i++) {
            SectionEntity s = sections.get(i);
            Book book = readyBooks.stream()
                .filter(b -> b.getId().equals(s.getBookId()))
                .findFirst()
                .orElse(null);
            String title = book != null ? book.getTitle() : "Book";
            String bookId = book != null ? book.getId().toString() : null;
            sources.add(new TopicSummaryResponse.SourceReference(
                "TEXT", "S" + (i + 1), bookId, title, s.getPageStart(),
                truncate(s.getFullText(), 500), null, null, null, null, null));
        }
        for (int i = 0; i < figures.size(); i++) {
            FigureEntity f = figures.get(i);
            Book book = readyBooks.stream()
                .filter(b -> b.getId().equals(f.getBookId()))
                .findFirst()
                .orElse(null);
            String title = book != null ? book.getTitle() : "Book";
            String bookId = book != null ? book.getId().toString() : null;
            String filename = f.getImagePath().substring(f.getImagePath().lastIndexOf('/') + 1);
            String imageUrl = "/api/v1/figures/" + f.getBookId() + "/" + filename;
            sources.add(new TopicSummaryResponse.SourceReference(
                "FIGURE", "F" + (i + 1), bookId, title, f.getPage(),
                null, f.getId(), f.getLabel(), f.getCaption(),
                f.getFigureType().name(), imageUrl));
        }
        return sources;
    }
    private String serializeSources(List<TopicSummaryResponse.SourceReference> sources) {
        try {
            return objectMapper.writeValueAsString(sources);
        } catch (JsonProcessingException e) {
            log.warn("Failed to serialize sources, storing empty array", e);
            return "[]";
        }
    }
    private String truncate(String text, int maxChars) {
        if (text == null) return "";
        return text.length() <= maxChars ? text : text.substring(0, maxChars) + "…";
    }
    private List<TopicSummaryResponse.SourceReference> deserializeSources(String json) {
        try {
            return objectMapper.readValue(json,
                objectMapper.getTypeFactory().constructCollectionType(
                    List.class, TopicSummaryResponse.SourceReference.class));
        } catch (JsonProcessingException e) {
            log.warn("Failed to deserialize sources from stored JSON", e);
            return List.of();
        }
    }
 }
@@ -7,7 +7,7 @@ spring:
  jpa:
    hibernate:
-      ddl-auto: update
+      ddl-auto: none
    show-sql: false
    properties:
      hibernate:
@@ -27,10 +27,11 @@ spring:
        index-type: HNSW
        initialize-schema: false
    openai:
-      api-key: ${OPENAI_API_KEY}
+      api-key: ${OPENAI_API_KEY:}
      chat:
        options:
-          model: gpt-4o
+          model: o4-mini
           reasoning-effort: high
      embedding:
        options:
          model: "text-embedding-3-small"
@@ -52,11 +53,24 @@ logging:
    "[org.apache.pdfbox]": ERROR
 app:
  features:
    upload-enabled: ${UPLOAD_ENABLED:true}
    delete-enabled: ${DELETE_ENABLED:true}
  auth:
    username: ${APP_AUTH_USERNAME:neurosurgeon}
    password: ${APP_PASSWORD:changeme}
  figure-storage:
-    base-path: ${FIGURE_STORAGE_PATH:./uploads}
+    endpoint: ${S3_ENDPOINT:https://s3.immich-ad.ovh}
    region: ${S3_REGION:garage}
    bucket: ${S3_BUCKET:aiteacher}
    access-key-id: ${S3_ACCESS_KEY_ID:}
    secret-access-key: ${S3_SECRET_ACCESS_KEY:}
    min-image-size-px: 100
  embedding:
    batch-size: 20
    batch-delay-ms: 2000
    skip-embedding: false
  marker:
    base-url: ${MARKER_BASE_URL:http://192.168.1.105:8000}
  vision:
    min-interval-ms: ${VISION_MIN_INTERVAL_MS:2000}
@@ -0,0 +1,10 @@
 CREATE TABLE topic_summary (
    id             UUID        PRIMARY KEY DEFAULT gen_random_uuid(),
    topic_id       VARCHAR(100) NOT NULL,
    summary_number INT          NOT NULL,
    summary        TEXT         NOT NULL,
    sources_json   TEXT         NOT NULL,
    generated_at   TIMESTAMPTZ  NOT NULL
 );
 CREATE INDEX idx_topic_summary_topic_id ON topic_summary(topic_id, summary_number);
@@ -0,0 +1,14 @@
 CREATE TABLE chunk_metadata (
    chunk_id      UUID         PRIMARY KEY,
    book_id       UUID         NOT NULL,
    section_id    VARCHAR(200) NOT NULL,
    facet         VARCHAR(32)  NOT NULL,
    entities      JSONB        NOT NULL,
    summary       TEXT         NOT NULL,
    model_version VARCHAR(32)  NOT NULL,
    enriched_at   TIMESTAMPTZ  NOT NULL
 );
 CREATE INDEX idx_chunk_metadata_book         ON chunk_metadata(book_id);
 CREATE INDEX idx_chunk_metadata_book_facet   ON chunk_metadata(book_id, facet);
 CREATE INDEX idx_chunk_metadata_entities_gin ON chunk_metadata USING GIN (entities jsonb_path_ops);
@@ -0,0 +1,11 @@
 CREATE TABLE concept_report (
    id            UUID         PRIMARY KEY DEFAULT gen_random_uuid(),
    topic_id      VARCHAR(100) NOT NULL,
    report_number INT          NOT NULL,
    facets_json   TEXT         NOT NULL,
    sources_json  TEXT         NOT NULL,
    generated_at  TIMESTAMPTZ  NOT NULL,
    UNIQUE (topic_id, report_number)
 );
 CREATE INDEX idx_concept_report_topic ON concept_report(topic_id, report_number);
@@ -0,0 +1,19 @@
 ALTER TABLE chunk_metadata DROP CONSTRAINT IF EXISTS chunk_metadata_facet_check;
 ALTER TABLE chunk_metadata
    ADD CONSTRAINT chunk_metadata_facet_check
    CHECK (facet IN (
        'DEFINITION',
        'ANATOMY',
        'PATHOPHYSIOLOGY',
        'EPIDEMIOLOGY',
        'CLINICAL_PRESENTATION',
        'IMAGING',
        'CLASSIFICATION',
        'INDICATIONS',
        'SURGICAL_TECHNIQUE',
        'NONSURGICAL_MANAGEMENT',
        'COMPLICATIONS',
        'OUTCOMES_FOLLOWUP',
        'OTHER'
    ));
@@ -0,0 +1,172 @@
 # Concept Retrieval via Indexing-Time Chunk Enrichment
 ## Context
 Vector similarity alone can't answer "tell me everything about aneurysms." It surfaces the chunks most *linguistically* similar to the query, not the set of all chunks that *concern* the concept — and it has no notion of whether each chunk is a definition, a case, a technique, or a complication.
 The unlock is to move intelligence from query time to indexing time: for every text chunk, use an LLM to extract **structured metadata** (entities, facet, summary). At retrieval time, concept lookup becomes an SQL filter (`entities @> '["aneurysm"]'`) bucketed by facet, which makes it deterministic, exhaustive, and organized by default. Vector search remains as a fallback for typos and synonyms, and for ranking within a facet.
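
To make the retrieval-time idea concrete, here is a minimal in-memory sketch of "filter by entity, then bucket by facet". It is only an analogue of the SQL path, not the project's implementation: `ConceptLookupSketch`, `ChunkMeta`, and `byFacet` are hypothetical names, and the `Facet` enum is a reduced stand-in for the full taxonomy.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class ConceptLookupSketch {

    // Hypothetical subset of the facet taxonomy, kept small for illustration.
    enum Facet { DEFINITION, SURGICAL_TECHNIQUE, COMPLICATIONS, OTHER }

    // Hypothetical in-memory stand-in for one chunk_metadata row.
    // Entities are assumed to be stored as canonical lowercase strings.
    record ChunkMeta(String chunkId, Facet facet, Set<String> entities, String summary) {}

    // Keeps every chunk tagged with the concept and groups the matches by facet.
    static Map<Facet, List<ChunkMeta>> byFacet(List<ChunkMeta> rows, String concept) {
        String needle = concept.toLowerCase(Locale.ROOT);
        Map<Facet, List<ChunkMeta>> buckets = new EnumMap<>(Facet.class);
        for (ChunkMeta row : rows) {
            if (row.entities().contains(needle)) {
                buckets.computeIfAbsent(row.facet(), f -> new ArrayList<>()).add(row);
            }
        }
        return buckets;
    }
}
```

Because the match is an exact membership test rather than a similarity score, every tagged chunk is returned and none is ranked out. That is also why the vector-search fallback is still needed for misspelled or synonymous queries.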
 This plan covers: (1) defining the metadata schema, (2) enriching chunks during new book ingestion, (3) back-filling the already-embedded corpus via an admin endpoint, (4) a new concept retrieval path, and (5) a Topics-page UI to surface the result.
 ## Approach
 ### 1. Data model — new `chunk_metadata` table
 Flyway migration `backend/src/main/resources/db/migration/V7__chunk_metadata.sql`:
 ```sql
 CREATE TABLE chunk_metadata (
    chunk_id        VARCHAR(64) PRIMARY KEY,       -- same UUID that TextChunkingService issues and stores in vectorstore
    book_id         UUID NOT NULL,
    section_id      VARCHAR(255) NOT NULL,
    facet           VARCHAR(32) NOT NULL,           -- enum (see ConceptFacet)
    entities        JSONB NOT NULL,                 -- canonical lowercase string[]
    summary         TEXT NOT NULL,
    model_version   VARCHAR(32) NOT NULL,           -- records which LLM/prompt version tagged this chunk
    enriched_at     TIMESTAMPTZ NOT NULL
 );
 CREATE INDEX idx_chunk_metadata_book         ON chunk_metadata(book_id);
 CREATE INDEX idx_chunk_metadata_book_facet   ON chunk_metadata(book_id, facet);
 CREATE INDEX idx_chunk_metadata_entities_gin ON chunk_metadata USING GIN (entities jsonb_path_ops);
 ```
 Why `chunk_id` is the natural key: `TextChunkingService` already generates a UUID per chunk, uses it as the pgvector Document id, stores it in metadata, and it's the key in `ChunkFigureRefEntity` — so the table joins cleanly to everything already in place.
 ### 2. Enrichment service & facet taxonomy
 New package `com.aiteacher.enrichment`:
 - `ConceptFacet` enum — 13 values tailored to neurosurgery textbooks: `DEFINITION, ANATOMY, PATHOPHYSIOLOGY, EPIDEMIOLOGY, CLINICAL_PRESENTATION, IMAGING, CLASSIFICATION, INDICATIONS, SURGICAL_TECHNIQUE, NONSURGICAL_MANAGEMENT, COMPLICATIONS, OUTCOMES_FOLLOWUP, OTHER`. `OTHER` is mandatory so the LLM always has an out (no hallucinated bucketing). The prompt carries explicit disambiguation rules (named grading scales → `CLASSIFICATION`; imaging of a complication → `COMPLICATIONS`; tools inside an operation → `SURGICAL_TECHNIQUE`).
 - `ChunkEnrichmentResult` — record `(List<String> entities, ConceptFacet facet, String summary)`
 - `ChunkEnrichmentService` — single method `enrich(String chunkText, SectionEntity section, String bookTitle) → ChunkEnrichmentResult`. Uses Spring AI `ChatClient.prompt().call().entity(Class)` for structured output. The prompt gives: book title, section title, chunk text, the fixed facet enum list, and instructs the model to return JSON with entities normalised to lowercase singular canonical form (e.g. "aneurysms" → "aneurysm"; "SAH" → "subarachnoid hemorrhage"). Caps entities at ~8 per chunk.
 - `ChunkMetadataEntity` + `ChunkMetadataRepository` — JPA entity/repo mirroring the table.
 Model version string (e.g. `"v1"`) lives on the service and is stamped into each row so a future prompt rev can be rolled out by filtering `model_version <> 'v2'` in the backfill job.
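The types above could be sketched in plain Java as follows. This is a minimal illustration, assuming the service post-processes the model's output defensively before persisting it. `parseOrOther`, `EntityNormaliser`, and the hard cap of 8 are hypothetical names and choices, not part of the plan; singularisation and abbreviation expansion are left to the prompt.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

enum ConceptFacet {
    DEFINITION, ANATOMY, PATHOPHYSIOLOGY, EPIDEMIOLOGY, CLINICAL_PRESENTATION,
    IMAGING, CLASSIFICATION, INDICATIONS, SURGICAL_TECHNIQUE,
    NONSURGICAL_MANAGEMENT, COMPLICATIONS, OUTCOMES_FOLLOWUP, OTHER;

    // Hypothetical helper: any label outside the taxonomy falls back to OTHER.
    static ConceptFacet parseOrOther(String raw) {
        if (raw == null) return OTHER;
        try {
            return valueOf(raw.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            return OTHER;
        }
    }
}

record ChunkEnrichmentResult(List<String> entities, ConceptFacet facet, String summary) {}

final class EntityNormaliser {
    // Assumed hard cap, mirroring the "~8 per chunk" guideline in the prompt.
    static final int MAX_ENTITIES = 8;

    // Lowercases, trims, drops blanks and duplicates (first-seen order), then caps the list.
    static List<String> normalise(List<String> raw) {
        Set<String> seen = new LinkedHashSet<>();
        for (String e : raw) {
            if (e == null) continue;
            String cleaned = e.trim().toLowerCase(Locale.ROOT);
            if (!cleaned.isEmpty()) seen.add(cleaned);
        }
        return seen.stream().limit(MAX_ENTITIES).toList();
    }
}
```

Normalising again on the Java side keeps the `entities` column consistent even when the model ignores part of the prompt, which matters because the primary retrieval path is an exact containment match.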
 ### 3. Hook into new book ingestion
 Modify `BookEmbeddingService.embedBook`:
 ```java
 // Step 3: Chunk and embed text
 List<Document> allChunks = new ArrayList<>();
 for (SectionEntity section : sections) {
    allChunks.addAll(textChunkingService.chunk(section, bookTitle));
 }
 if (skipEmbedding) { ... } else {
    embedInBatches(allChunks, bookId);
    chunkEnrichmentPipeline.enrichAndPersist(allChunks, sectionsById, bookTitle);  // NEW
 }
 ```
 - `ChunkEnrichmentPipeline` — new orchestrator that iterates chunks, calls `ChunkEnrichmentService.enrich(...)` per chunk, saves `ChunkMetadataEntity` rows in batches, with the same throttle pattern as `embedInBatches`.
 - Runs *after* embedding, not in place of it, so a failure in enrichment doesn't corrupt the vector store. On failure, log and continue — the backfill endpoint is the universal recovery path.
 - Extend `deleteBookChunks` to also delete `chunk_metadata` rows so deletion stays consistent.
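The batching and log-and-continue behaviour described above might look like the sketch below. It is written against plain functional interfaces rather than the real services, so `EnrichmentLoop` and its parameters are illustrative assumptions; throttling between batches is omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

final class EnrichmentLoop {
    // Enriches each chunk, flushes results in fixed-size batches, and skips chunks
    // whose enrichment throws. Returns the number of chunks that failed.
    static <C, R> int enrichAndPersist(List<C> chunks,
                                       Function<C, R> enrich,
                                       Consumer<List<R>> saveBatch,
                                       int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        List<R> pending = new ArrayList<>();
        int failed = 0;
        for (C chunk : chunks) {
            try {
                pending.add(enrich.apply(chunk));
            } catch (RuntimeException e) {
                // Log and continue: a later backfill run picks this chunk up.
                failed++;
                System.err.println("Enrichment failed, skipping chunk: " + e.getMessage());
            }
            if (pending.size() >= batchSize) {
                saveBatch.accept(List.copyOf(pending));
                pending.clear();
            }
        }
        if (!pending.isEmpty()) {
            saveBatch.accept(List.copyOf(pending)); // flush the final partial batch
        }
        return failed;
    }
}
```

Because a failed chunk simply has no `chunk_metadata` row, no separate retry queue is needed; the backfill's anti-join finds it on the next run.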
 ### 4. Backfill endpoint for already-embedded books
 New `EnrichmentController` in `com.aiteacher.enrichment`:
 - `POST /api/v1/admin/books/{id}/enrich` → kicks off async backfill, returns 202 with `{status, chunksTotal, chunksEnriched}`
 - `GET  /api/v1/admin/books/{id}/enrich` → returns progress
 Backfill flow (`EnrichmentBackfillService.backfillBook(UUID bookId)`):
 1. Query the pgvector storage table directly via `JdbcTemplate` for all chunks of the book:
   ```sql
   SELECT id, content, metadata
   FROM vector_store
   WHERE metadata->>'book_id' = ? AND metadata->>'type' = 'TEXT'
   ```
 2. Left-anti-join against `chunk_metadata` to skip already-enriched chunks → idempotent, resumable.
 3. For each missing chunk: look up its `SectionEntity` via `section_id` in metadata, call `ChunkEnrichmentService.enrich`, write a `ChunkMetadataEntity` row.
 4. Progress tracked in an in-memory `ConcurrentHashMap<UUID, BackfillProgress>` (POC scope — no cross-restart resumability needed because the left-anti-join makes re-runs free).
 5. `@Async` on the backfill method using the same executor as `embedBook`.
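Steps 2 and 4 can be sketched without any database access. The example below is an in-memory illustration only: `BackfillPlanner` is a hypothetical name, the anti-join is shown as a set difference instead of SQL, and the `RUNNING` / `COMPLETED` status strings are taken from what the frontend polls for.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

record BackfillProgress(String status, int chunksTotal, int chunksEnriched) {}

final class BackfillPlanner {
    private final Map<UUID, BackfillProgress> progress = new ConcurrentHashMap<>();

    // In-memory equivalent of the left-anti-join: chunk ids present in the vector
    // store that have no chunk_metadata row yet, in their original order.
    static List<String> missingChunkIds(List<String> vectorStoreIds, Set<String> enrichedIds) {
        Set<String> missing = new LinkedHashSet<>(vectorStoreIds);
        missing.removeAll(enrichedIds);
        return List.copyOf(missing);
    }

    // A re-run on a fully enriched book is a no-op and reports COMPLETED immediately.
    void start(UUID bookId, int total, int alreadyEnriched) {
        String status = alreadyEnriched >= total ? "COMPLETED" : "RUNNING";
        progress.put(bookId, new BackfillProgress(status, total, alreadyEnriched));
    }

    // Called once per newly written row; flips to COMPLETED when every chunk is covered.
    void recordEnriched(UUID bookId) {
        progress.computeIfPresent(bookId, (id, p) -> {
            int done = p.chunksEnriched() + 1;
            String status = done >= p.chunksTotal() ? "COMPLETED" : "RUNNING";
            return new BackfillProgress(status, p.chunksTotal(), done);
        });
    }

    BackfillProgress status(UUID bookId) {
        return progress.get(bookId);
    }
}
```

`computeIfPresent` updates each entry atomically, so the polling `GET` handler can read progress while the async worker writes it.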
 ### 5. Concept retrieval path
 New `com.aiteacher.concept.ConceptRetriever`:
 ```java
 public ConceptRetrievalResult retrieveByConcept(String conceptKeyword, UUID bookId) {
    String canonical = canonicalise(conceptKeyword);   // lowercase, trim, simple plural strip
    // 5a. Primary: SQL entity match, grouped by facet
    List<ChunkMetadataEntity> hits = chunkMetadataRepository
        .findByBookIdAndEntityContains(bookId, canonical);   // WHERE entities @> to_jsonb(?::text)
    if (hits.isEmpty()) {
        // 5b. Fallback: vector search, then enrich-join + facet-group
        List<Document> vectorHits = vectorStore.similaritySearch(/* TEXT filter, book_id filter, topK=30 */);
        List<String> chunkIds = vectorHits.stream().map(Document::getId).toList();
        hits = chunkMetadataRepository.findByChunkIdIn(chunkIds);
    }
    Map<ConceptFacet, List<ChunkMetadataEntity>> byFacet = hits.stream()
        .collect(groupingBy(ChunkMetadataEntity::getFacet, LinkedHashMap::new, toList()));
    // Hydrate: load SectionEntity for each chunk's section_id; load linked figures
    // via ChunkFigureRefRepository.findByChunkIdIn(chunkIds) — reuses existing linkage.
    return assemble(byFacet, ...);
 }
 ```
 `ConceptRetrievalResult` = `Map<ConceptFacet, FacetBundle>` where each `FacetBundle` holds the parent sections, linked figures, and the per-chunk `summary` strings.
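The plan only describes `canonicalise` as "lowercase, trim, simple plural strip". One naive way to read that is sketched below; the suffix rules are an assumption for illustration, not the project's actual implementation, and they will mis-handle irregular plurals.

```java
import java.util.Locale;

final class ConceptCanonicaliser {
    // Lowercases, trims, collapses inner whitespace, then strips a simple English plural.
    static String canonicalise(String keyword) {
        String s = keyword.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
        if (s.length() > 3 && s.endsWith("ies")) {
            return s.substring(0, s.length() - 3) + "y";   // "pathologies" -> "pathology"
        }
        // Endings that usually belong to singular medical terms (abscess, hydrocephalus, stenosis).
        boolean keepTrailingS = s.endsWith("ss") || s.endsWith("us") || s.endsWith("is");
        if (s.length() > 3 && s.endsWith("s") && !keepTrailingS) {
            return s.substring(0, s.length() - 1);         // "aneurysms" -> "aneurysm"
        }
        return s;
    }
}
```

Whatever rules are chosen, the query-side canonical form has to agree with the form the enrichment prompt produces; otherwise the entity match misses and every lookup silently drops to the vector fallback.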
 Cross-book aggregation: caller loops over READY books and merges bundles by facet.
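A minimal sketch of that merge step, assuming each book's result can be reduced to a facet-keyed map of items. `FacetMerger` and its shortened `Facet` enum are stand-ins for illustration; the real code would use `ConceptFacet` and `FacetBundle`.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

final class FacetMerger {
    // Shortened stand-in for ConceptFacet.
    enum Facet { DEFINITION, IMAGING, COMPLICATIONS, OTHER }

    // Merges per-book results into one map; EnumMap iterates facets in declaration order.
    static <T> Map<Facet, List<T>> merge(List<Map<Facet, List<T>>> perBook) {
        Map<Facet, List<T>> merged = new EnumMap<>(Facet.class);
        for (Map<Facet, List<T>> book : perBook) {
            book.forEach((facet, items) ->
                merged.computeIfAbsent(facet, f -> new ArrayList<>()).addAll(items));
        }
        return merged;
    }
}
```

Using an `EnumMap` means the report's facet order follows the enum declaration regardless of which book contributed a facet first.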
 ### 6. Concept Report service & controller
 New `ConceptReportService` in `com.aiteacher.concept` — mirrors the shape of `TopicSummaryService`, but:
 - Calls `ConceptRetriever.retrieveByConcept(topic.getName(), bookId)` per book.
 - For each facet that has hits, sends **one** LLM synthesis call with the chunks/figures of that facet — producing a structured, facet-labelled report.
 - Persists in a new `concept_report` table:
 ```sql
 CREATE TABLE concept_report (
    id            UUID PRIMARY KEY,
    topic_id      VARCHAR(255) NOT NULL REFERENCES topic(id),
    report_number INT NOT NULL,
    facets_json   JSONB NOT NULL,        -- [{facetKey,title,markdown,refLabels[]}, ...]
    sources_json  JSONB NOT NULL,        -- deduplicated SourceReference[]
    generated_at  TIMESTAMPTZ NOT NULL,
    UNIQUE (topic_id, report_number)
 );
 ```
 Controller `ConceptReportController` exposes three endpoints under `/api/v1/topics/{id}/concept-reports` (POST generate, GET list, GET `/{reportId}`).
 Reuses `TopicSummaryResponse.SourceReference` verbatim.
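The "one synthesis call per facet with hits" rule, together with the `(topic_id, report_number)` uniqueness constraint, can be sketched as below. `ReportAssembler` is hypothetical, its `FacetSection` record is a simplified version of the `facets_json` entries, the LLM is represented by a plain function, and numbering reports as max + 1 is an assumption about how `report_number` is allocated.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

final class ReportAssembler {
    record FacetSection(String facetKey, String markdown) {}

    // One synthesis call per facet that has at least one chunk; empty facets are skipped.
    static List<FacetSection> synthesise(Map<String, List<String>> chunksByFacet,
                                         BiFunction<String, List<String>, String> llm) {
        List<FacetSection> sections = new ArrayList<>();
        chunksByFacet.forEach((facet, chunks) -> {
            if (!chunks.isEmpty()) {
                sections.add(new FacetSection(facet, llm.apply(facet, chunks)));
            }
        });
        return sections;
    }

    // Assumed allocation rule: next number is max(existing) + 1, starting at 1.
    static int nextReportNumber(List<Integer> existing) {
        return existing.stream().mapToInt(Integer::intValue).max().orElse(0) + 1;
    }
}
```

Skipping empty facets bounds the number of LLM calls per report by the number of facets that actually have hits, at most 13 with the current taxonomy.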
 ### 7. Frontend
 - `frontend/src/stores/topicStore.ts`: add parallel state `conceptReportList`, `activeConceptReport`, `conceptReportLoading`, and actions mirroring the existing summary ones.
 - `frontend/src/views/TopicsView.vue`: add a **Summary / Concept Report** tab toggle at the top of the topic panel. Concept Report reuses the history-chips + Generate button UI. Report body renders each `FacetSection` as `<h3>{title}</h3>` + markdown.
 - Loading hint: update the "up to 30 seconds" copy to "up to 60 seconds".
 ### 8. README update
 Add an **Indexing Pipeline** diagram showing: PDF → parse → chunk → embed → **enrich (new)** → chunk_metadata. Plus a **Concept Retrieval** sequence diagram: query → entity-match SQL → facet-grouped bundle → synthesis → report.
 ## Decisions & trade-offs
 - **Storage as separate Postgres table, not vectorstore JSON**: the vectorstore has no metadata-only update API, so a backfill would require delete+reinsert and pay the re-embedding cost again. A dedicated table joins cleanly on `chunk_id` and is GIN-indexed.
 - **Entity-match primary, vector fallback**: deterministic for the main use case, robust against typos/synonyms. Vector search stays the default for normal chat retrieval — this feature is additive.
 - **Enrichment runs *after* embedding, not before**: keeps the two failure modes independent. The backfill endpoint is the universal recovery lever.
 - **Fixed 13-value facet enum** (incl. `OTHER`): constrains LLM outputs; `OTHER` prevents forced mis-bucketing.
 - **Direct `JdbcTemplate` read against `vector_store` for backfill**: Spring AI exposes no listing API. Acceptable for a POC, isolated behind one method.
 - **Synchronous (sequential) LLM calls**: simplest; parallelism is a later optimisation if needed.
 - **`model_version` column**: cheap insurance. If the prompt or facet taxonomy changes, backfill can re-enrich only stale rows.
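As a small illustration of the `model_version` point, a backfill could pick the rows to re-enrich as sketched here. `StaleRowSelector` and `MetadataRow` are hypothetical; in the real service this would more likely be a repository query on `model_version`.

```java
import java.util.List;

final class StaleRowSelector {
    record MetadataRow(String chunkId, String modelVersion) {}

    // Returns the chunk ids whose stored version differs from the current one.
    static List<String> staleChunkIds(List<MetadataRow> rows, String currentVersion) {
        return rows.stream()
                .filter(r -> !currentVersion.equals(r.modelVersion()))
                .map(MetadataRow::chunkId)
                .toList();
    }
}
```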
 ## Verification
 1. Migration applies V7 and V8. Tables and indexes created.
 2. New book ingestion: upload PDF → `chunk_metadata` populated with plausible entities/facets/summaries.
 3. Backfill: POST `/api/v1/admin/books/{id}/enrich` → idempotent, completes, re-run is a no-op.
 4. Concept retrieval primary path: POST `/api/v1/topics/aneurysm/concept-reports` → 200 with facets populated.
 5. Fallback path: misspelled topic still returns results via vector fallback.
 6. Frontend: Concept Report tab renders facet-labelled markdown + sources + inline figures; persists across reloads.
 7. Deletion: removing a book also deletes its `chunk_metadata` rows (via the extended `deleteBookChunks`).
 8. Regression: existing chat and summary flows still work.
 9. Lint & tests pass.
@@ -0,0 +1,37 @@
 version: '3.9'
 services:
  postgres:
    image: pgvector/pgvector:pg16
    container_name: aiteacher-postgres-native
    environment:
      POSTGRES_DB: aiteacher
      POSTGRES_USER: aiteacher
      POSTGRES_PASSWORD: aiteacher
    ports:
      - "5432:5432"
    volumes:
      - pgdata_native:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U aiteacher -d aiteacher"]
      interval: 10s
      timeout: 5s
      retries: 5
  backend:
    image: ai-teacher-backend:latest
    container_name: aiteacher-backend-native
    env_file:
      - .env
    environment:
      SPRING_DATASOURCE_URL: jdbc:postgresql://postgres:5432/aiteacher
      SPRING_DATASOURCE_USERNAME: aiteacher
      SPRING_DATASOURCE_PASSWORD: aiteacher
    ports:
      - "8080:8080"
    depends_on:
      postgres:
        condition: service_healthy
 volumes:
  pgdata_native:
@@ -3,5 +3,12 @@
 # In production point it directly at the backend, e.g. https://api.example.com/api/v1
 VITE_API_URL=/api/v1
-# Shared password for HTTP Basic auth (must match APP_PASSWORD on the backend).
-VITE_APP_PASSWORD=changeme
+# Credentials are no longer configured here. Users enter their username and
+# password via the login form. The backend validates them via HTTP Basic Auth.
 # Configure the backend credentials with APP_AUTH_USERNAME and APP_PASSWORD.
 # Set to 'false' to hide the upload UI (frontend). Also set UPLOAD_ENABLED=false on the backend to block the endpoint.
 VITE_UPLOAD_ENABLED=true
 # Set to 'false' to hide the delete button (frontend). Also set DELETE_ENABLED=false on the backend to block the endpoint.
 VITE_DELETE_ENABLED=true
@@ -1,5 +1,5 @@
 # ---- Build stage ----
-FROM node:20-alpine AS build
+FROM docker.io/library/node:20-alpine AS build
 WORKDIR /app
 COPY package*.json ./
 RUN npm ci
@@ -7,8 +7,10 @@ COPY . .
 RUN npm run build
 # ---- Runtime stage (nginx) ----
-FROM nginx:alpine
+FROM docker.io/library/nginx:alpine
 COPY --from=build /app/dist /usr/share/nginx/html
 COPY nginx.conf /etc/nginx/conf.d/default.conf
 COPY docker-entrypoint.sh /docker-entrypoint.sh
 RUN chmod +x /docker-entrypoint.sh
 EXPOSE 80
-CMD ["nginx", "-g", "daemon off;"]
+ENTRYPOINT ["/docker-entrypoint.sh"]
@@ -0,0 +1,16 @@
 #!/bin/sh
 set -e
 # Write runtime env vars into a JS file loaded before the app bundle.
 # Any VITE_* variable passed via `docker run -e` will be available as
 # window.__env__.VITE_* inside the browser.
 cat > /usr/share/nginx/html/env-config.js <<EOF
 window.__env__ = {
  VITE_API_URL: "${VITE_API_URL:-}",
  VITE_APP_PASSWORD: "${VITE_APP_PASSWORD:-}",
  VITE_UPLOAD_ENABLED: "${VITE_UPLOAD_ENABLED:-}",
  VITE_DELETE_ENABLED: "${VITE_DELETE_ENABLED:-}"
 };
 EOF
 exec nginx -g "daemon off;"
@@ -8,6 +8,7 @@
  </head>
  <body>
    <div id="app"></div>
    <script src="/env-config.js"></script>
    <script type="module" src="/src/main.ts"></script>
  </body>
 </html>
@@ -6,23 +6,31 @@
        <span class="brand-name">AI Teacher</span>
        <span class="brand-subtitle">Neurosurgeon Learning Platform</span>
      </div>
-      <ul class="navbar-links">
-        <li>
-          <RouterLink to="/" :class="{ active: $route.path === '/' }">
-            <span class="nav-icon">📚</span> Library
-          </RouterLink>
-        </li>
-        <li>
-          <RouterLink to="/topics" :class="{ active: $route.path === '/topics' }">
-            <span class="nav-icon">🗂</span> Topics
-          </RouterLink>
-        </li>
-        <li>
-          <RouterLink to="/chat" :class="{ active: $route.path === '/chat' }">
-            <span class="nav-icon">💬</span> Chat
-          </RouterLink>
-        </li>
-      </ul>
+      <template v-if="authStore.isAuthenticated">
+        <button class="burger" :class="{ open: menuOpen }" @click="menuOpen = !menuOpen" aria-label="Menu">
+          <span></span><span></span><span></span>
+        </button>
+        <div class="nav-drawer" :class="{ open: menuOpen }" @click="menuOpen = false">
+          <ul class="navbar-links">
+            <li>
+              <RouterLink to="/" :class="{ active: $route.path === '/' }">
+                <span class="nav-icon">📚</span> Library
+              </RouterLink>
+            </li>
+            <li>
+              <RouterLink to="/topics" :class="{ active: $route.path === '/topics' }">
+                <span class="nav-icon">🗂</span> Topics
+              </RouterLink>
+            </li>
+            <li>
+              <RouterLink to="/chat" :class="{ active: $route.path === '/chat' }">
+                <span class="nav-icon">💬</span> Chat
+              </RouterLink>
+            </li>
+          </ul>
+          <button class="btn btn-logout" @click.stop="logout">Sign out</button>
+        </div>
+      </template>
    </nav>
    <main class="main-content">
@@ -35,12 +43,26 @@
 </template>
 <script setup lang="ts">
-import { ref, provide } from 'vue'
-import { RouterLink, RouterView } from 'vue-router'
+import { ref, provide, watch } from 'vue'
+import { RouterLink, RouterView, useRouter, useRoute } from 'vue-router'
 import { useAuthStore } from '@/stores/authStore'
 const authStore = useAuthStore()
 const router = useRouter()
 const route = useRoute()
 const menuOpen = ref(false)
 const toastMessage = ref('')
 const toastType = ref<'toast-error' | 'toast-success'>('toast-error')
 // Close menu on navigation
 watch(() => route.path, () => { menuOpen.value = false })
 function logout() {
  authStore.clearCredentials()
  router.push({ name: 'login' })
 }
 function showToast(message: string, type: 'error' | 'success' = 'error') {
  toastMessage.value = message
  toastType.value = type === 'error' ? 'toast-error' : 'toast-success'
@@ -64,11 +86,11 @@ body {
    Ubuntu, Cantarell, 'Fira Sans', 'Droid Sans', 'Helvetica Neue', sans-serif;
  background: #f0f4f8;
  color: #2d3748;
-  min-height: 100vh;
+  height: 100vh;
 }
 #app {
-  min-height: 100vh;
+  height: 100vh;
  display: flex;
  flex-direction: column;
 }
@@ -82,6 +104,9 @@ body {
  justify-content: space-between;
  height: 64px;
  box-shadow: 0 2px 8px rgba(0, 0, 0, 0.3);
  position: sticky;
  top: 0;
  z-index: 100;
 }
 .navbar-brand {
@@ -106,6 +131,13 @@ body {
  margin-left: 0.25rem;
 }
 /* Desktop: links inline */
 .nav-drawer {
  display: flex;
  align-items: center;
  gap: 0.5rem;
 }
 .navbar-links {
  list-style: none;
  display: flex;
@@ -131,8 +163,38 @@ body {
  color: white;
 }
 /* Burger button — hidden on desktop */
 .burger {
  display: none;
  flex-direction: column;
  justify-content: center;
  gap: 5px;
  width: 36px;
  height: 36px;
  background: transparent;
  border: none;
  cursor: pointer;
  padding: 4px;
  border-radius: 6px;
 }
 .burger span {
  display: block;
  height: 2px;
  background: #bee3f8;
  border-radius: 2px;
  transition: transform 0.2s, opacity 0.2s;
 }
 .burger.open span:nth-child(1) { transform: translateY(7px) rotate(45deg); }
 .burger.open span:nth-child(2) { opacity: 0; }
 .burger.open span:nth-child(3) { transform: translateY(-7px) rotate(-45deg); }
 .main-content {
  flex: 1;
  min-height: 0;
  display: flex;
  flex-direction: column;
  padding: 2rem;
  max-width: 1200px;
  margin: 0 auto;
@@ -224,6 +286,20 @@ body {
  background: #cbd5e0;
 }
 .btn-logout {
  background: transparent;
  color: #bee3f8;
  border: 1px solid #4a90b8;
  font-size: 0.85rem;
  padding: 0.4rem 0.9rem;
  margin-left: 1rem;
 }
 .btn-logout:hover {
  background: #2b6cb0;
  color: white;
 }
 .spinner {
  display: inline-block;
  width: 20px;
@@ -284,4 +360,63 @@ body {
  font-size: 0.9rem;
  margin-top: 0.5rem;
 }
@media (max-width: 768px) {
  .navbar {
    padding: 0 1rem;
  }
  .brand-subtitle {
    display: none;
  }
  /* Show burger, hide desktop drawer */
  .burger {
    display: flex;
  }
  .nav-drawer {
    display: none;
    position: absolute;
    top: 64px;
    right: 0;
    left: 0;
    background: #1a365d;
    flex-direction: column;
    align-items: stretch;
    padding: 0.5rem 0 1rem;
    box-shadow: 0 4px 12px rgba(0, 0, 0, 0.3);
    z-index: 99;
  }
  .nav-drawer.open {
    display: flex;
  }
  .navbar-links {
    flex-direction: column;
    gap: 0;
  }
  .navbar-links a {
    padding: 0.85rem 1.5rem;
    border-radius: 0;
    font-size: 1rem;
  }
  .navbar-links a:hover,
  .navbar-links a.active {
    background: #2b6cb0;
  }
  .btn-logout {
    margin: 0.5rem 1.5rem 0;
    width: calc(100% - 3rem);
    justify-content: center;
  }
  .main-content {
    padding: 1rem;
  }
 }
 </style>
@@ -32,8 +32,32 @@
      <span>{{ book.status === 'PENDING' ? 'Queued for processing...' : 'Embedding in progress...' }}</span>
    </div>
    <div v-if="enrichProgress && enrichProgress.status === 'RUNNING'" class="processing-indicator">
      <div class="spinner spinner-dark"></div>
      <span>Enriching chunks {{ enrichProgress.chunksEnriched }} / {{ enrichProgress.chunksTotal }}</span>
    </div>
    <div v-if="enrichFeedback" class="enrich-feedback">{{ enrichFeedback }}</div>
    <div class="book-actions">
      <router-link
        v-if="book.status === 'READY'"
        :to="{ name: 'book-reader', params: { id: book.id } }"
        class="btn btn-secondary"
      >
        Read
      </router-link>
      <button
        v-if="book.status === 'READY' && uploadEnabled"
        class="btn btn-secondary"
        :disabled="enrichRunning"
        @click="handleEnrich"
        title="Enrich chunks with concept metadata"
      >
        {{ enrichRunning ? 'Enriching...' : 'Enrich' }}
      </button>
      <button
        v-if="deleteEnabled"
        class="btn btn-danger"
        :disabled="book.status === 'PROCESSING' || deleting"
        @click="$emit('delete', book.id)"
@@ -46,18 +70,62 @@
 </template>
 <script setup lang="ts">
-import { computed } from 'vue'
-import type { Book } from '@/stores/bookStore'
+import { computed, onUnmounted, ref } from 'vue'
+import type { Book, EnrichmentProgress } from '@/stores/bookStore'
 import { useBookStore } from '@/stores/bookStore'
 import { env } from '@/env';
 const props = defineProps<{
  book: Book
  deleting?: boolean
  deleteEnabled?: boolean
 }>()
 defineEmits<{
  (e: 'delete', id: string): void
 }>()
 const bookStore = useBookStore()
 const enrichProgress = ref<EnrichmentProgress | null>(null)
 const enrichFeedback = ref<string | null>(null)
 let pollTimer: ReturnType<typeof setInterval> | null = null
 const enrichRunning = computed(() => enrichProgress.value?.status === 'RUNNING')
 const uploadEnabled = env('VITE_UPLOAD_ENABLED') !== 'false'
 async function handleEnrich() {
  enrichFeedback.value = null
  const started = await bookStore.startEnrichment(props.book.id)
  if (!started) {
    enrichFeedback.value = bookStore.error ?? 'Enrichment failed to start.'
    return
  }
  enrichProgress.value = started
  startPolling()
 }
 function startPolling() {
  stopPolling()
  pollTimer = setInterval(async () => {
    const status = await bookStore.fetchEnrichmentStatus(props.book.id)
    if (!status) return
    enrichProgress.value = status
    if (status.status === 'COMPLETED') {
      stopPolling()
      enrichFeedback.value = `Enriched ${status.chunksEnriched} / ${status.chunksTotal} chunks.`
    }
  }, 2000)
 }
 function stopPolling() {
  if (pollTimer != null) {
    clearInterval(pollTimer)
    pollTimer = null
  }
 }
 onUnmounted(stopPolling)
 const statusClass = computed(() => {
  switch (props.book.status) {
    case 'READY':
@@ -181,6 +249,16 @@ function formatDate(iso: string): string {
 .book-actions {
  display: flex;
  justify-content: flex-end;
  gap: 0.5rem;
  margin-top: 0.25rem;
 }
 .enrich-feedback {
  font-size: 0.8rem;
  color: #22543d;
  background: #f0fff4;
  border: 1px solid #c6f6d5;
  border-radius: 6px;
  padding: 0.4rem 0.6rem;
 }
 </style>
@@ -0,0 +1,239 @@
 <template>
  <div class="book-panel">
    <div class="book-panel-header">
      <span class="book-panel-title">{{ bookTitle || 'Book' }} — p.&nbsp;{{ page }}</span>
      <div class="book-panel-nav">
        <button class="nav-btn" :disabled="page <= 1" @click="emit('navigate', page - 1)">&#8592;</button>
        <button class="nav-btn" @click="emit('navigate', page + 1)">&#8594;</button>
      </div>
      <button class="close-btn" @click="emit('close')" title="Close">&#x2715;</button>
    </div>
    <div class="book-panel-body">
      <div v-if="loading" class="panel-loading">
        <div class="spinner spinner-dark" style="width:24px;height:24px;margin:0 auto 0.5rem;"></div>
        <p>Loading page {{ page }}…</p>
      </div>
      <div v-else-if="error" class="panel-error">{{ error }}</div>
      <div v-else class="markdown-body" v-html="renderedHtml"></div>
    </div>
  </div>
 </template>
 <script setup lang="ts">
 import { ref, watch, onMounted, onUnmounted } from 'vue'
 import { api } from '@/services/api'
 const props = defineProps<{
  bookId: string
  page: number
  bookTitle?: string
 }>()
 const emit = defineEmits<{
  close: []
  navigate: [page: number]
 }>()
 const loading = ref(false)
 const error = ref<string | null>(null)
 const renderedHtml = ref('')
 let activeBlobUrls: string[] = []
 onMounted(() => loadPage(props.page))
 watch(() => [props.bookId, props.page], () => loadPage(props.page))
 onUnmounted(() => {
  activeBlobUrls.forEach(u => URL.revokeObjectURL(u))
 })
 async function loadPage(page: number) {
  loading.value = true
  error.value = null
  renderedHtml.value = ''
  activeBlobUrls.forEach(u => URL.revokeObjectURL(u))
  activeBlobUrls = []
  try {
    const res = await api.get<string>(`/books/${props.bookId}/pages/${page}/html`, {
      headers: { Accept: 'text/html' },
      responseType: 'text'
    })
    renderedHtml.value = await resolveImages(res.data)
  } catch (e: any) {
    error.value = e.message ?? 'Failed to load page.'
  } finally {
    loading.value = false
  }
 }
 async function resolveImages(html: string): Promise<string> {
  const srcPattern = /src="(\/api\/v1\/figures\/[^"]+)"/g
  const matches = [...html.matchAll(srcPattern)]
  if (matches.length === 0) return html
  const unique = [...new Set(matches.map(m => m[1]))]
  const blobMap: Record<string, string> = {}
  await Promise.all(
    unique.map(async (src) => {
      try {
        const res = await api.get(src.replace(/^\/api\/v1/, ''), { responseType: 'blob' })
        const blobUrl = URL.createObjectURL(res.data)
        activeBlobUrls.push(blobUrl)
        blobMap[src] = blobUrl
      } catch {
        // leave original src
      }
    })
  )
  return html.replace(/src="(\/api\/v1\/figures\/[^"]+)"/g, (_, src) =>
    blobMap[src] ? `src="${blobMap[src]}"` : `src="${src}"`
  )
 }
 </script>
 <style scoped>
 .book-panel {
  display: flex;
  flex-direction: column;
  height: 100%;
  background: white;
  border-left: 1px solid #e2e8f0;
  border-radius: 0 10px 10px 0;
  overflow: hidden;
 }
 .book-panel-header {
  display: flex;
  align-items: center;
  gap: 0.5rem;
  padding: 0.6rem 0.75rem;
  background: #f7fafc;
  border-bottom: 1px solid #e2e8f0;
  flex-shrink: 0;
 }
 .book-panel-title {
  flex: 1;
  font-size: 0.8rem;
  font-weight: 600;
  color: #2b6cb0;
  white-space: nowrap;
  overflow: hidden;
  text-overflow: ellipsis;
 }
 .book-panel-nav {
  display: flex;
  gap: 0.25rem;
 }
 .nav-btn {
  width: 1.75rem;
  height: 1.75rem;
  border: 1px solid #cbd5e0;
  border-radius: 5px;
  background: white;
  cursor: pointer;
  font-size: 0.85rem;
  display: flex;
  align-items: center;
  justify-content: center;
  transition: background 0.15s;
 }
 .nav-btn:hover:not(:disabled) { background: #ebf8ff; border-color: #3182ce; }
 .nav-btn:disabled { opacity: 0.4; cursor: not-allowed; }
 .close-btn {
  width: 1.75rem;
  height: 1.75rem;
  border: none;
  border-radius: 5px;
  background: none;
  cursor: pointer;
  font-size: 1rem;
  color: #718096;
  display: flex;
  align-items: center;
  justify-content: center;
  transition: background 0.15s, color 0.15s;
 }
 .close-btn:hover { background: #fed7d7; color: #742a2a; }
 .book-panel-body {
  flex: 1;
  overflow-y: auto;
  padding: 1rem 1.25rem;
 }
 .panel-loading {
  text-align: center;
  padding: 2rem;
  color: #718096;
  font-size: 0.875rem;
 }
 .panel-error {
  padding: 1rem;
  background: #fff5f5;
  border: 1px solid #fed7d7;
  color: #742a2a;
  border-radius: 6px;
  font-size: 0.875rem;
 }
 .markdown-body {
  font-size: 0.9rem;
  line-height: 1.75;
  color: #2d3748;
 }
 .markdown-body :deep(h1),
 .markdown-body :deep(h2),
 .markdown-body :deep(h3) {
  color: #1a365d;
  font-weight: 600;
  margin: 1.25rem 0 0.5rem;
 }
 .markdown-body :deep(h2) { font-size: 1.05rem; border-bottom: 1px solid #e2e8f0; padding-bottom: 0.3rem; }
 .markdown-body :deep(h3) { font-size: 0.95rem; }
 .markdown-body :deep(p) { margin: 0.6rem 0; }
 .markdown-body :deep(img) {
  max-width: 100%;
  border-radius: 6px;
  display: block;
  margin: 0.75rem auto;
  box-shadow: 0 1px 4px rgba(0,0,0,0.12);
 }
 .markdown-body :deep(ul),
 .markdown-body :deep(ol) { padding-left: 1.4rem; margin: 0.5rem 0; }
 .markdown-body :deep(code) {
  background: #f7fafc;
  border: 1px solid #e2e8f0;
  border-radius: 3px;
  padding: 0.1em 0.3em;
  font-size: 0.85em;
 }
 .markdown-body :deep(blockquote) {
  border-left: 3px solid #3182ce;
  padding-left: 0.75rem;
  color: #4a5568;
  margin: 0.5rem 0;
 }
 .markdown-body :deep(table) {
  width: 100%;
  border-collapse: collapse;
  font-size: 0.875em;
  margin: 0.75rem 0;
 }
 .markdown-body :deep(th),
 .markdown-body :deep(td) {
  border: 1px solid #e2e8f0;
  padding: 0.35rem 0.6rem;
  text-align: left;
 }
 .markdown-body :deep(th) { background: #f7fafc; font-weight: 600; }
 </style>
@@ -3,50 +3,17 @@
    <div class="message-bubble" :class="isUser ? 'bubble-user' : 'bubble-assistant'">
      <div class="message-role">{{ isUser ? 'You' : 'AI Teacher' }}</div>
      <div v-if="isUser" class="message-content">{{ message.content }}</div>
-      <div v-else class="message-content message-content--markdown" v-html="renderedContent"></div>
+      <div v-else class="message-content message-content--markdown" v-html="renderedWithBadges" @click="onContentClick"></div>
      <!-- Sources for assistant messages -->
      <div v-if="!isUser && message.sources && message.sources.length > 0" class="message-sources">
        <div class="sources-label">Sources:</div>
-        <div class="source-list">
-          <!-- TEXT sources -->
-          <div
-            v-for="(source, idx) in textSources"
-            :key="'text-' + idx"
-            class="source-item"
-          >
-            <div class="source-chip source-chip--text">
-              <span class="source-icon">📖</span>
-              <span class="source-book-title">{{ source.bookTitle }}</span>
-              <span v-if="source.page" class="source-page">p.&nbsp;{{ source.page }}</span>
-            </div>
-            <div v-if="source.chunkText" class="source-chunk">{{ source.chunkText }}</div>
-          </div>
-          <!-- FIGURE sources -->
-          <div
-            v-for="(source, idx) in figureSources"
-            :key="'fig-' + idx"
-            class="source-item source-item--figure"
-          >
-            <div class="source-chip source-chip--figure">
-              <span class="source-icon">🖼️</span>
-              <span class="source-figure-label">{{ source.label || 'Figure' }}</span>
-              <span v-if="source.page" class="source-page">p.&nbsp;{{ source.page }}</span>
-              <span v-if="source.figureType" class="source-figure-type">{{ formatFigureType(source.figureType) }}</span>
-            </div>
-            <div v-if="source.caption" class="source-caption">{{ source.caption }}</div>
-            <div class="source-figure-image">
-              <img
-                :src="source.imageUrl"
-                :alt="source.caption || source.label || 'Figure'"
-                class="figure-img"
-                loading="lazy"
-                @error="onImageError"
-              />
-            </div>
-          </div>
-        </div>
+        <SourceList
+          ref="sourceListEl"
+          :sources="message.sources"
+          :active-ref="activeRef"
+          @open-source="(bookId: string, page: number) => emit('open-source', bookId, page)"
+        />
      </div>
      <div class="message-timestamp">{{ formatTime(message.createdAt) }}</div>
@@ -55,44 +22,70 @@
 </template>
 <script setup lang="ts">
-import { computed } from 'vue'
+import { computed, ref } from 'vue'
 import { marked } from 'marked'
 import type { ChatMessage, ChatSource } from '@/stores/chatStore'
 import SourceList from '@/components/SourceList.vue'
 const props = defineProps<{
  message: ChatMessage
 }>()
 const emit = defineEmits<{
  'open-source': [bookId: string, page: number]
 }>()
 const isUser = computed(() => props.message.role === 'USER')
-const renderedContent = computed(() => marked.parse(props.message.content) as string)
+const activeRef = ref<string | null>(null)
 const sourceListEl = ref<InstanceType<typeof SourceList> | null>(null)
-const textSources = computed(() =>
+function escapeHtml(s: string): string {
-  (props.message.sources ?? []).filter((s: ChatSource) => s.type === 'TEXT' || !s.type)
+  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;').replace(/"/g, '&quot;')
 )
 const figureSources = computed(() =>
  (props.message.sources ?? []).filter((s: ChatSource) => s.type === 'FIGURE')
 )
 function formatFigureType(type: string): string {
  const labels: Record<string, string> = {
    ANATOMICAL_DIAGRAM: 'Anatomical Diagram',
    SURGICAL_PHOTOGRAPH: 'Surgical Photo',
    MRI_CT_SCAN: 'MRI / CT',
    TABLE: 'Table',
    CHART: 'Chart',
    INTRAOPERATIVE_IMAGE: 'Intraoperative'
  }
  return labels[type] ?? type
 }
-function onImageError(e: Event) {
+const renderedWithBadges = computed(() => {
-  const img = e.target as HTMLImageElement
+  const html = marked.parse(props.message.content) as string
-  img.alt = 'Image unavailable'
+
-  img.style.display = 'none'
+  const figureMap = new Map<string, ChatSource>()
-  const wrapper = img.parentElement
+  for (const src of (props.message.sources ?? [])) {
-  if (wrapper) {
+    if (src.type === 'FIGURE' && src.refLabel) {
-    wrapper.innerHTML = '<span class="figure-missing">Image unavailable</span>'
+      figureMap.set(src.refLabel, src)
    }
  }
  return html.replace(/\[(S|F)\d+\]/g, (match) => {
    const inner = match.slice(1, -1)
    const badge = `<span class="citation-badge" data-ref="${inner}" title="Jump to source ${inner}">${match}</span>`
    const fig = figureMap.get(inner)
    if (fig?.imageUrl) {
      const alt = escapeHtml(fig.caption || fig.label || 'Figure')
      const captionText = [fig.label, fig.caption].filter(Boolean).map(escapeHtml).join(' — ')
      const captionHtml = captionText
        ? `<figcaption class="inline-figure-caption">${captionText}</figcaption>`
        : ''
      return `${badge}<figure class="inline-figure"><img src="${escapeHtml(fig.imageUrl)}" alt="${alt}" class="inline-figure-img" loading="lazy" onerror="this.parentElement.style.display='none'" />${captionHtml}</figure>`
    }
    return badge
  })
 })
 function onContentClick(e: MouseEvent) {
  const target = e.target as HTMLElement
  if (!target.classList.contains('citation-badge')) return
  const label = target.getAttribute('data-ref')
  if (!label) return
  activeRef.value = activeRef.value === label ? null : label
  const sourceEl = sourceListEl.value?.$el?.querySelector(`[data-ref-label="${label}"]`) as HTMLElement | null
  sourceEl?.scrollIntoView({ behavior: 'smooth', block: 'start' })
  const source = (props.message.sources ?? []).find((s: ChatSource) => s.refLabel === label)
  if (source?.bookId && source.page) {
    emit('open-source', source.bookId, source.page)
  }
 }
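
Review note: the badge/figure substitution in `renderedWithBadges` can be exercised outside Vue as a pure function. The sketch below is a simplified stand-in, not the component's code: `FigureRef` and `renderCitations` are hypothetical names, the caption markup is omitted, and escaping the image URL is an assumption about what is desirable here.

```typescript
// Hypothetical, trimmed-down figure source: only the fields this sketch needs.
interface FigureRef {
  refLabel: string
  imageUrl?: string
  label?: string
  caption?: string
}

// Same four replacements as the component's escapeHtml helper.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
}

// Replace [S1] / [F2] markers with badge spans.
// Figure markers whose source carries an image also get an inline <figure>.
function renderCitations(html: string, figures: FigureRef[]): string {
  const byLabel: Record<string, FigureRef> = {}
  for (const f of figures) byLabel[f.refLabel] = f

  return html.replace(/\[(S|F)\d+\]/g, (match: string) => {
    const inner = match.slice(1, -1) // "[F1]" -> "F1"
    const badge = `<span class="citation-badge" data-ref="${inner}">${match}</span>`
    const fig = byLabel[inner]
    if (!fig || !fig.imageUrl) return badge
    const alt = escapeHtml(fig.caption || fig.label || 'Figure')
    const src = escapeHtml(fig.imageUrl) // assumption: URLs are escaped like captions
    return `${badge}<figure class="inline-figure"><img src="${src}" alt="${alt}" /></figure>`
  })
}
```

Because `inner` can only match `S` or `F` followed by digits, it is safe to place in the `data-ref` attribute without escaping; the caption, label and URL come from the backend and are escaped.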
@@ -255,6 +248,22 @@ function formatTime(iso: string): string {
  border: 1px solid #bee3f8;
 }
 .source-chip--clickable {
  cursor: pointer;
  transition: background 0.15s, border-color 0.15s;
 }
 .source-chip--clickable:hover {
  background: #bee3f8;
  border-color: #90cdf4;
 }
 .source-open-hint {
  font-size: 0.75rem;
  color: #3182ce;
  margin-left: 0.1rem;
 }
 .source-chip--figure {
  background: #f0fff4;
  border: 1px solid #9ae6b4;
@@ -322,6 +331,67 @@ function formatTime(iso: string): string {
  font-style: italic;
 }
 .message-content--markdown :deep(.citation-badge) {
  display: inline-block;
  background: #ebf8ff;
  border: 1px solid #90cdf4;
  border-radius: 3px;
  padding: 0 0.3em;
  font-size: 0.78em;
  font-weight: 600;
  color: #2b6cb0;
  cursor: pointer;
  user-select: none;
  transition: background 0.15s;
 }
 .message-content--markdown :deep(.citation-badge:hover) {
  background: #bee3f8;
 }
 .source-item--active {
  outline: 2px solid #4299e1;
  border-radius: 6px;
 }
 .source-ref-label {
  font-size: 0.72rem;
  font-weight: 700;
  background: #bee3f8;
  color: #2b6cb0;
  border-radius: 3px;
  padding: 0 0.3rem;
 }
 .source-ref-label--figure {
  background: #9ae6b4;
  color: #276749;
 }
 .message-content--markdown :deep(.inline-figure) {
  display: block;
  margin: 0.75rem 0;
  text-align: center;
 }
 .message-content--markdown :deep(.inline-figure-img) {
  max-width: 100%;
  max-height: 400px;
  border-radius: 6px;
  border: 1px solid #e2e8f0;
  object-fit: contain;
  display: block;
  margin: 0 auto;
 }
 .message-content--markdown :deep(.inline-figure-caption) {
  font-size: 0.78rem;
  color: #718096;
  font-style: italic;
  margin-top: 0.3rem;
  text-align: center;
 }
 .message-timestamp {
  font-size: 0.7rem;
  opacity: 0.6;
@@ -0,0 +1,298 @@
 <template>
  <div class="source-list">
    <!-- TEXT sources -->
    <div
      v-for="(source, idx) in textSources"
      :key="'text-' + idx"
      class="source-item"
      :class="{ 'source-item--active': activeRef === source.refLabel }"
      :data-ref-label="source.refLabel"
    >
      <div class="source-chip-wrapper">
        <div
          class="source-chip source-chip--text"
          :class="{ 'source-chip--clickable': source.bookId && source.page }"
          @click="source.bookId && source.page ? emit('open-source', source.bookId, source.page) : undefined"
        >
          <span class="source-icon">📖</span>
          <span v-if="source.refLabel" class="source-ref-label">{{ source.refLabel }}</span>
          <span class="source-book-title">{{ source.bookTitle }}</span>
          <span v-if="source.page" class="source-page">p.&nbsp;{{ source.page }}</span>
          <span v-if="source.bookId && source.page" class="source-open-hint">↗</span>
        </div>
        <div v-if="source.chunkText" class="tooltip tooltip--text">
          <p class="tooltip-chunk">{{ source.chunkText }}</p>
        </div>
      </div>
    </div>
    <!-- FIGURE sources -->
    <div
      v-for="(source, idx) in figureSources"
      :key="'fig-' + idx"
      class="source-item source-item--figure"
      :class="{ 'source-item--active': activeRef === source.refLabel }"
      :data-ref-label="source.refLabel"
    >
      <div class="source-chip-wrapper">
        <div
          class="source-chip source-chip--figure"
          :class="{ 'source-chip--clickable': source.bookId && source.page }"
          @click="source.bookId && source.page ? emit('open-source', source.bookId, source.page) : undefined"
        >
          <span class="source-icon">🖼️</span>
          <span v-if="source.refLabel" class="source-ref-label source-ref-label--figure">{{ source.refLabel }}</span>
          <span class="source-figure-label">{{ source.label || 'Figure' }}</span>
          <span v-if="source.page" class="source-page">p.&nbsp;{{ source.page }}</span>
          <span v-if="source.figureType" class="source-figure-type">{{ formatFigureType(source.figureType) }}</span>
          <span v-if="source.bookId && source.page" class="source-open-hint">↗</span>
        </div>
        <div v-if="source.imageUrl || source.caption" class="tooltip tooltip--figure">
          <img
            v-if="source.imageUrl"
            :src="source.imageUrl"
            :alt="source.caption || source.label || 'Figure'"
            class="tooltip-figure-img"
            loading="lazy"
            @error="onImageError"
          />
          <p v-if="source.caption" class="tooltip-caption">{{ source.caption }}</p>
        </div>
      </div>
    </div>
  </div>
 </template>
 <script setup lang="ts">
 import { computed } from 'vue'
 export interface SourceItem {
  type?: 'TEXT' | 'FIGURE'
  refLabel?: string
  bookId?: string | null
  bookTitle: string
  page?: number | null
  chunkText?: string
  figureId?: string
  label?: string
  caption?: string
  figureType?: string
  imageUrl?: string
 }
 const props = defineProps<{
  sources: SourceItem[]
  activeRef?: string | null
 }>()
 const emit = defineEmits<{
  'open-source': [bookId: string, page: number]
 }>()
 const textSources = computed(() =>
  props.sources.filter(s => s.type === 'TEXT' || !s.type)
 )
 const figureSources = computed(() =>
  props.sources.filter(s => s.type === 'FIGURE')
 )
 function formatFigureType(type: string): string {
  const labels: Record<string, string> = {
    ANATOMICAL_DIAGRAM: 'Anatomical Diagram',
    SURGICAL_PHOTOGRAPH: 'Surgical Photo',
    MRI_CT_SCAN: 'MRI / CT',
    TABLE: 'Table',
    CHART: 'Chart',
    INTRAOPERATIVE_IMAGE: 'Intraoperative'
  }
  return labels[type] ?? type
 }
 function onImageError(e: Event) {
  const img = e.target as HTMLImageElement
  img.style.display = 'none'
  const wrapper = img.parentElement
  if (wrapper) {
    const missing = document.createElement('span')
    missing.className = 'figure-missing'
    missing.textContent = 'Image unavailable'
    wrapper.appendChild(missing)
  }
 }
 </script>
 <style scoped>
 .source-list {
  display: flex;
  flex-direction: column;
  gap: 0.5rem;
 }
 .source-item {
  display: flex;
  flex-direction: column;
  gap: 0.25rem;
 }
 .source-item--active {
  outline: 2px solid #4299e1;
  border-radius: 6px;
 }
 /* Wrapper provides the positioning context for the tooltip */
 .source-chip-wrapper {
  position: relative;
  display: inline-block;
 }
 /* ── Chip base ── */
 .source-chip {
  display: inline-flex;
  align-items: center;
  gap: 0.25rem;
  border-radius: 4px;
  padding: 0.2rem 0.5rem;
  font-size: 0.78rem;
 }
 .source-chip--text {
  background: #ebf8ff;
  border: 1px solid #bee3f8;
 }
 .source-chip--figure {
  background: #f0fff4;
  border: 1px solid #9ae6b4;
 }
 .source-chip--clickable {
  cursor: pointer;
  transition: background 0.15s, border-color 0.15s;
 }
 .source-chip--clickable:hover {
  background: #bee3f8;
  border-color: #90cdf4;
 }
 .source-chip--figure.source-chip--clickable:hover {
  background: #c6f6d5;
  border-color: #68d391;
 }
 /* ── Tooltip ── */
 .tooltip {
  display: none;
  position: absolute;
  left: 0;
  top: calc(100% + 6px);
  z-index: 100;
  background: #1a202c;
  border-radius: 6px;
  padding: 0.6rem 0.75rem;
  box-shadow: 0 4px 16px rgba(0, 0, 0, 0.2);
  /* Keep it from overflowing too far */
  max-width: min(340px, 80vw);
  pointer-events: none;
 }
 /* Show on chip hover */
 .source-chip-wrapper:hover .tooltip {
  display: block;
 }
 /* Small arrow pointing up */
 .tooltip::before {
  content: '';
  position: absolute;
  top: -5px;
  left: 14px;
  border-left: 5px solid transparent;
  border-right: 5px solid transparent;
  border-bottom: 5px solid #1a202c;
 }
 .tooltip--text .tooltip-chunk {
  margin: 0;
  font-size: 0.78rem;
  color: #e2e8f0;
  line-height: 1.5;
  white-space: pre-wrap;
  word-break: break-word;
 }
 .tooltip--figure {
  max-width: min(300px, 80vw);
 }
 .tooltip-figure-img {
  display: block;
  max-width: 100%;
  max-height: 220px;
  border-radius: 4px;
  object-fit: contain;
  margin-bottom: 0.4rem;
 }
 .tooltip-caption {
  margin: 0;
  font-size: 0.75rem;
  color: #cbd5e0;
  font-style: italic;
  line-height: 1.4;
 }
 /* ── Chip internals ── */
 .source-icon {
  font-size: 0.8rem;
 }
 .source-ref-label {
  font-size: 0.72rem;
  font-weight: 700;
  background: #bee3f8;
  color: #2b6cb0;
  border-radius: 3px;
  padding: 0 0.3rem;
 }
 .source-ref-label--figure {
  background: #9ae6b4;
  color: #276749;
 }
 .source-book-title {
  color: #2b6cb0;
  font-weight: 500;
 }
 .source-figure-label {
  color: #276749;
  font-weight: 600;
 }
 .source-figure-type {
  color: #718096;
  font-size: 0.72rem;
  background: #e2e8f0;
  border-radius: 3px;
  padding: 0 0.3rem;
 }
 .source-page {
  color: #718096;
 }
 .source-open-hint {
  font-size: 0.75rem;
  color: #3182ce;
  margin-left: 0.1rem;
 }
 .figure-missing {
  font-size: 0.78rem;
  color: #a0aec0;
  font-style: italic;
 }
 </style>
@@ -3,6 +3,8 @@
 interface ImportMetaEnv {
  readonly VITE_API_URL: string
  readonly VITE_APP_PASSWORD: string
  readonly VITE_UPLOAD_ENABLED: string
  readonly VITE_DELETE_ENABLED: string
 }
 interface ImportMeta {
@@ -0,0 +1,10 @@
 /**
 * Read a VITE_ env variable.
 * At runtime in Docker, values come from window.__env__ (injected by docker-entrypoint.sh).
 * At build time (dev / CI), values come from import.meta.env.
 */
 export function env(key: string): string | undefined {
  const runtime = (window as Record<string, any>).__env__?.[key]
  if (runtime) return runtime
  return (import.meta as any).env?.[key]
 }
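
Review note: the lookup order in `env()` (runtime value first, build-time value second) is easier to test when both sources are passed in. A minimal sketch, assuming a hypothetical `resolveEnv` helper that is not part of this change:

```typescript
type EnvSource = Record<string, string | undefined>

// Runtime values win; a missing or empty runtime value falls back to build time.
function resolveEnv(
  key: string,
  runtime: EnvSource | undefined,
  buildTime: EnvSource | undefined
): string | undefined {
  const fromRuntime = runtime ? runtime[key] : undefined
  if (fromRuntime) return fromRuntime
  return buildTime ? buildTime[key] : undefined
}
```

As in `env()`, the truthiness check means an empty string injected at runtime does not override the build-time value.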
@@ -4,6 +4,21 @@ import App from './App.vue'
 import router from './router'
 const app = createApp(App)
-app.use(createPinia())
+const pinia = createPinia()
 app.use(pinia)
 app.use(router)
 // Verify any session restored from sessionStorage is still valid.
 // The check is asynchronous and may finish after the first router guard runs; if the
 // backend rejects the credentials (e.g. password changed), they are cleared here.
 import { useAuthStore } from '@/stores/authStore'
 import { api } from '@/services/api'
 const auth = useAuthStore()
 if (auth.isAuthenticated) {
  api.get('/auth/check').catch(() => {
    auth.clearCredentials()
  })
 }
 app.mount('#app')
@@ -1,11 +1,19 @@
 import { createRouter, createWebHistory } from 'vue-router'
 import { useAuthStore } from '@/stores/authStore'
 import LoginView from '@/views/LoginView.vue'
 import UploadView from '@/views/UploadView.vue'
 import TopicsView from '@/views/TopicsView.vue'
 import ChatView from '@/views/ChatView.vue'
 import BookReaderView from '@/views/BookReaderView.vue'
 const router = createRouter({
  history: createWebHistory(import.meta.env.BASE_URL),
  routes: [
    {
      path: '/login',
      name: 'login',
      component: LoginView
    },
    {
      path: '/',
      name: 'upload',
@@ -20,8 +28,20 @@ const router = createRouter({
      path: '/chat',
      name: 'chat',
      component: ChatView
    },
    {
      path: '/books/:id/read',
      name: 'book-reader',
      component: BookReaderView
    }
  ]
 })
 router.beforeEach((to) => {
  const auth = useAuthStore()
  if (to.name !== 'login' && !auth.isAuthenticated) {
    return { name: 'login' }
  }
 })
 export default router
@@ -1,20 +1,30 @@
 import axios from 'axios'
 import { useAuthStore } from '@/stores/authStore'
 import { env } from '@/env'
 export const api = axios.create({
-  baseURL: import.meta.env.VITE_API_URL ?? '/api/v1',
+  baseURL: env('VITE_API_URL') ?? '/api/v1',
  auth: {
    username: 'neurosurgeon',
    password: import.meta.env.VITE_APP_PASSWORD ?? 'changeme'
  },
  headers: {
    'Content-Type': 'application/json'
  }
 })
-// Response interceptor for error normalisation
+api.interceptors.request.use((config) => {
  const auth = useAuthStore()
  if (auth.username && auth.password) {
    config.auth = { username: auth.username, password: auth.password }
  }
  return config
 })
 api.interceptors.response.use(
  (response) => response,
  (error) => {
    if (error.response?.status === 401) {
      useAuthStore().clearCredentials()
      window.location.href = '/login'
      return Promise.reject(new Error('Session expired. Please sign in again.'))
    }
    const message =
      error.response?.data?.error ??
      error.message ??
@@ -0,0 +1,28 @@
 import { defineStore } from 'pinia'
 import { ref, computed } from 'vue'
 const SESSION_KEY = 'auth'
 export const useAuthStore = defineStore('auth', () => {
  const stored = sessionStorage.getItem(SESSION_KEY)
  const parsed = stored ? (JSON.parse(stored) as { username: string; password: string }) : null
  const username = ref<string | null>(parsed?.username ?? null)
  const password = ref<string | null>(parsed?.password ?? null)
  const isAuthenticated = computed(() => !!username.value && !!password.value)
  function setCredentials(u: string, p: string) {
    username.value = u
    password.value = p
    sessionStorage.setItem(SESSION_KEY, JSON.stringify({ username: u, password: p }))
  }
  function clearCredentials() {
    username.value = null
    password.value = null
    sessionStorage.removeItem(SESSION_KEY)
  }
  return { username, password, isAuthenticated, setCredentials, clearCredentials }
 })
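
Review note: `JSON.parse` throws on a malformed `sessionStorage` value, which would fail store creation. One possible defensive shape is sketched below; `parseStoredCredentials` is a hypothetical helper, not something this diff defines.

```typescript
interface StoredCredentials {
  username: string
  password: string
}

// Returns the stored credentials, or null when the value is absent,
// not valid JSON, or not shaped like { username, password }.
function parseStoredCredentials(raw: string | null): StoredCredentials | null {
  if (!raw) return null
  try {
    const value: unknown = JSON.parse(raw)
    if (typeof value === 'object' && value !== null) {
      const candidate = value as Record<string, unknown>
      if (typeof candidate.username === 'string' && typeof candidate.password === 'string') {
        return { username: candidate.username, password: candidate.password }
      }
    }
  } catch {
    // Malformed JSON: treat it as "no stored session".
  }
  return null
}
```

The store could then initialise `username` and `password` from `parseStoredCredentials(sessionStorage.getItem(SESSION_KEY))`.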
@@ -77,5 +77,42 @@ export const useBookStore = defineStore('books', () => {
    }
  }
-  return { books, loading, uploading, error, fetchBooks, uploadBook, refreshBook, deleteBook }
+  async function startEnrichment(id: string): Promise<EnrichmentProgress | null> {
    try {
      const response = await api.post<EnrichmentProgress>(`/admin/books/${id}/enrich`)
      return response.data
    } catch (err: any) {
      error.value = err.message
      return null
    }
  }
  async function fetchEnrichmentStatus(id: string): Promise<EnrichmentProgress | null> {
    try {
      const response = await api.get<EnrichmentProgress>(`/admin/books/${id}/enrich`)
      return response.data
    } catch {
      return null
    }
  }
  return {
    books,
    loading,
    uploading,
    error,
    fetchBooks,
    uploadBook,
    refreshBook,
    deleteBook,
    startEnrichment,
    fetchEnrichmentStatus
  }
 })
 export interface EnrichmentProgress {
  status: 'IDLE' | 'RUNNING' | 'COMPLETED'
  chunksTotal: number
  chunksEnriched: number
  errorMessage: string | null
 }
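
Review note: a caller that renders `EnrichmentProgress` will probably want a percentage. The sketch below is one way to derive it; `enrichmentPercent` is a hypothetical helper, and treating `COMPLETED` as 100% regardless of the counters is an assumption.

```typescript
interface EnrichmentProgress {
  status: 'IDLE' | 'RUNNING' | 'COMPLETED'
  chunksTotal: number
  chunksEnriched: number
  errorMessage: string | null
}

// Whole-number percentage in [0, 100]; guards against a zero total.
function enrichmentPercent(p: EnrichmentProgress): number {
  if (p.status === 'COMPLETED') return 100
  if (p.chunksTotal <= 0) return 0
  const ratio = p.chunksEnriched / p.chunksTotal
  return Math.round(Math.min(1, Math.max(0, ratio)) * 100)
}
```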
@@ -4,8 +4,10 @@ import { api } from '@/services/api'
 export interface ChatSource {
  type: 'TEXT' | 'FIGURE'
  bookId?: string
  bookTitle: string
  page: number | null
  refLabel?: string
  // TEXT-specific
  chunkText?: string
  // FIGURE-specific
@@ -10,11 +10,22 @@ export interface Topic {
 }
 export interface SourceReference {
  type?: 'TEXT' | 'FIGURE'
  refLabel?: string
  bookId: string | null
  bookTitle: string
  page: number | null
  chunkText?: string
  figureId?: string
  label?: string
  caption?: string
  figureType?: string
  imageUrl?: string
 }
 export interface TopicSummary {
  id: string
  summaryNumber: number
  topicId: string
  topicName: string
  summary: string
@@ -22,14 +33,50 @@ export interface TopicSummary {
  generatedAt: string
 }
 export interface SavedSummaryItem {
  id: string
  summaryNumber: number
  generatedAt: string
 }
 export interface FacetSection {
  facetKey: string
  title: string
  markdown: string
  refLabels: string[]
 }
 export interface ConceptReport {
  id: string
  reportNumber: number
  topicId: string
  topicName: string
  facets: FacetSection[]
  sources: SourceReference[]
  generatedAt: string
 }
 export interface SavedConceptReportItem {
  id: string
  reportNumber: number
  generatedAt: string
 }
 export const useTopicStore = defineStore('topics', () => {
  const topics = ref<Topic[]>([])
  const activeSummary = ref<TopicSummary | null>(null)
  const activeSummaryTopicId = ref<string | null>(null)
  const summaryList = ref<SavedSummaryItem[]>([])
  const loading = ref(false)
  const summaryLoading = ref(false)
  const summaryListLoading = ref(false)
  const error = ref<string | null>(null)
  const activeConceptReport = ref<ConceptReport | null>(null)
  const conceptReportList = ref<SavedConceptReportItem[]>([])
  const conceptReportLoading = ref(false)
  const conceptReportListLoading = ref(false)
  async function fetchTopics() {
    loading.value = true
    error.value = null
@@ -43,13 +90,47 @@ export const useTopicStore = defineStore('topics', () => {
    }
  }
-  async function generateSummary(topicId: string): Promise<TopicSummary | null> {
+  async function fetchSummaries(topicId: string) {
    summaryListLoading.value = true
    summaryList.value = []
    error.value = null
    try {
      const response = await api.get<SavedSummaryItem[]>(`/topics/${topicId}/summaries`)
      summaryList.value = response.data
    } catch (err: any) {
      error.value = err.message
    } finally {
      summaryListLoading.value = false
    }
  }
  async function fetchSummaryDetail(topicId: string, summaryId: string): Promise<TopicSummary | null> {
    summaryLoading.value = true
    activeSummary.value = null
    error.value = null
    try {
      const response = await api.get<TopicSummary>(`/topics/${topicId}/summaries/${summaryId}`)
      activeSummary.value = response.data
      return response.data
    } catch (err: any) {
      error.value = err.message
      return null
    } finally {
      summaryLoading.value = false
    }
  }
  async function generateSummary(topicId: string, language: 'en' | 'th' = 'en'): Promise<TopicSummary | null> {
    summaryLoading.value = true
    activeSummaryTopicId.value = topicId
    activeSummary.value = null
    error.value = null
    try {
-      const response = await api.post<TopicSummary>(`/topics/${topicId}/summary`)
+      const response = await api.post<TopicSummary>(
        `/topics/${topicId}/summary`,
        null,
        { params: { language } }
      )
      activeSummary.value = response.data
      return response.data
    } catch (err: any) {
@@ -61,14 +142,75 @@ export const useTopicStore = defineStore('topics', () => {
    }
  }
  async function fetchConceptReports(topicId: string) {
    conceptReportListLoading.value = true
    conceptReportList.value = []
    error.value = null
    try {
      const response = await api.get<SavedConceptReportItem[]>(`/topics/${topicId}/concept-reports`)
      conceptReportList.value = response.data
    } catch (err: any) {
      error.value = err.message
    } finally {
      conceptReportListLoading.value = false
    }
  }
  async function fetchConceptReportDetail(topicId: string, reportId: string): Promise<ConceptReport | null> {
    conceptReportLoading.value = true
    activeConceptReport.value = null
    error.value = null
    try {
      const response = await api.get<ConceptReport>(`/topics/${topicId}/concept-reports/${reportId}`)
      activeConceptReport.value = response.data
      return response.data
    } catch (err: any) {
      error.value = err.message
      return null
    } finally {
      conceptReportLoading.value = false
    }
  }
  async function generateConceptReport(topicId: string, language: 'en' | 'th' = 'en'): Promise<ConceptReport | null> {
    conceptReportLoading.value = true
    activeConceptReport.value = null
    error.value = null
    try {
      const response = await api.post<ConceptReport>(
        `/topics/${topicId}/concept-reports`,
        null,
        { params: { language } }
      )
      activeConceptReport.value = response.data
      return response.data
    } catch (err: any) {
      error.value = err.message
      return null
    } finally {
      conceptReportLoading.value = false
    }
  }
  return {
    topics,
    activeSummary,
    activeSummaryTopicId,
    summaryList,
    loading,
    summaryLoading,
    summaryListLoading,
    error,
    activeConceptReport,
    conceptReportList,
    conceptReportLoading,
    conceptReportListLoading,
    fetchTopics,
-    generateSummary
+    fetchSummaries,
    fetchSummaryDetail,
    generateSummary,
    fetchConceptReports,
    fetchConceptReportDetail,
    generateConceptReport
  }
 })
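
Review note: `generateSummary` and `generateConceptReport` post a `null` body and pass `language` through axios `params`, so the language travels in the query string rather than in the request body. The sketch below shows the URL that results; `summaryRequestUrl` is a hypothetical helper, and encoding `topicId` is an addition the store code does not make.

```typescript
type SummaryLanguage = 'en' | 'th'

// Builds the effective request URL for the summary endpoint.
function summaryRequestUrl(
  baseUrl: string,
  topicId: string,
  language: SummaryLanguage = 'en'
): string {
  const path = `/topics/${encodeURIComponent(topicId)}/summary`
  return `${baseUrl}${path}?language=${encodeURIComponent(language)}`
}
```

For example, `summaryRequestUrl('/api/v1', 'abc', 'th')` yields `/api/v1/topics/abc/summary?language=th`.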
@@ -0,0 +1,335 @@
 <template>
  <div class="reader-view">
    <!-- Header -->
    <div class="reader-header">
      <router-link to="/" class="back-link">← Library</router-link>
      <div class="reader-title">
        <h1 class="book-title">{{ book?.title ?? 'Loading…' }}</h1>
      </div>
      <div class="page-nav">
        <button class="nav-btn" :disabled="currentPage <= 1" @click="goTo(currentPage - 1)">&#8592;</button>
        <form class="page-jump" @submit.prevent="onJump">
          <input
            v-model.number="jumpInput"
            type="number"
            :min="1"
            :max="book?.pageCount ?? 1"
            class="page-input"
          />
          <span class="page-sep">/ {{ book?.pageCount ?? '…' }}</span>
        </form>
        <button class="nav-btn" :disabled="!book || currentPage >= book.pageCount!" @click="goTo(currentPage + 1)">&#8594;</button>
      </div>
    </div>
    <!-- Content -->
    <div class="reader-body">
      <div v-if="loading" class="reader-loading">
        <div class="spinner spinner-dark" style="width:28px;height:28px;margin:0 auto 0.75rem;"></div>
        <p>Loading page {{ currentPage }}…</p>
      </div>
      <div v-else-if="error" class="reader-error card">
        <strong>Could not load page {{ currentPage }}</strong><br />
        {{ error }}
      </div>
      <div v-else class="reader-content card">
        <div class="markdown-body" v-html="renderedHtml"></div>
      </div>
    </div>
  </div>
 </template>
 <script setup lang="ts">
 import { ref, watch, onMounted } from 'vue'
 import { useRoute } from 'vue-router'
 import { api } from '@/services/api'
 import { useBookStore } from '@/stores/bookStore'
 import type { Book } from '@/stores/bookStore'
 const route = useRoute()
 const bookStore = useBookStore()
 const bookId = route.params.id as string
 const book = ref<Book | null>(null)
 const currentPage = ref(1)
 const jumpInput = ref(1)
 const loading = ref(false)
 const error = ref<string | null>(null)
 const renderedHtml = ref('')
 // Blob URLs created this session — revoked on next page load
 let activeBlobUrls: string[] = []
 onMounted(async () => {
  book.value = bookStore.books.find(b => b.id === bookId) ?? null
  if (!book.value) {
    try {
      const res = await api.get<Book>(`/books/${bookId}`)
      book.value = res.data
    } catch {
      error.value = 'Book not found.'
      return
    }
  }
  await loadPage(1)
 })
 watch(currentPage, (page) => {
  jumpInput.value = page
  loadPage(page)
 })
 function goTo(page: number) {
  if (!book.value) return
  const clamped = Math.max(1, Math.min(page, book.value.pageCount ?? 1))
  if (clamped !== currentPage.value) {
    currentPage.value = clamped
  }
 }
 function onJump() {
  goTo(jumpInput.value)
 }
 async function loadPage(page: number) {
  loading.value = true
  error.value = null
  renderedHtml.value = ''
  // Revoke previous blob URLs to free memory
  activeBlobUrls.forEach(u => URL.revokeObjectURL(u))
  activeBlobUrls = []
  try {
    const res = await api.get<string>(`/books/${bookId}/pages/${page}/html`, {
      headers: { Accept: 'text/html' },
      responseType: 'text'
    })
    const html = await resolveImages(res.data)
    renderedHtml.value = html
  } catch (e: any) {
    error.value = e.message ?? 'Failed to load page.'
  } finally {
    loading.value = false
  }
 }
 /**
 * Finds <img src="/api/v1/figures/..."> in the HTML, fetches each image
 * through the authenticated axios instance, and replaces the src with a
 * temporary blob URL so the browser can render it without re-authenticating.
 */
 async function resolveImages(html: string): Promise<string> {
  const srcPattern = /src="(\/api\/v1\/figures\/[^"]+)"/g
  const matches = [...html.matchAll(srcPattern)]
  if (matches.length === 0) return html
  const unique = [...new Set(matches.map(m => m[1]))]
  const blobMap: Record<string, string> = {}
  await Promise.all(
    unique.map(async (src) => {
      try {
        const res = await api.get(src.replace(/^\/api\/v1/, ''), { responseType: 'blob' })
        const blobUrl = URL.createObjectURL(res.data)
        activeBlobUrls.push(blobUrl)
        blobMap[src] = blobUrl
      } catch {
        // Leave the original src in place; the browser's unauthenticated request will likely fail.
      }
    })
  )
  return html.replace(/src="(\/api\/v1\/figures\/[^"]+)"/g, (_, src) =>
    blobMap[src] ? `src="${blobMap[src]}"` : `src="${src}"`
  )
 }
 </script>
 <style scoped>
 .reader-view {
  display: flex;
  flex-direction: column;
  gap: 1rem;
  max-width: 860px;
  margin: 0 auto;
  flex: 1;
  min-height: 0;
 }
 .reader-header {
  display: flex;
  align-items: center;
  gap: 1rem;
  flex-wrap: wrap;
 }
 .back-link {
  color: #3182ce;
  text-decoration: none;
  font-size: 0.9rem;
  white-space: nowrap;
 }
 .back-link:hover { text-decoration: underline; }
 .reader-title {
  flex: 1;
  min-width: 0;
 }
 .book-title {
  font-size: 1.1rem;
  font-weight: 600;
  color: #1a365d;
  white-space: nowrap;
  overflow: hidden;
  text-overflow: ellipsis;
 }
 .page-nav {
  display: flex;
  align-items: center;
  gap: 0.5rem;
 }
 .nav-btn {
  width: 2rem;
  height: 2rem;
  border: 1px solid #cbd5e0;
  border-radius: 6px;
  background: #fff;
  cursor: pointer;
  font-size: 1rem;
  display: flex;
  align-items: center;
  justify-content: center;
  transition: background 0.15s;
 }
 .nav-btn:hover:not(:disabled) { background: #ebf8ff; border-color: #3182ce; }
 .nav-btn:disabled { opacity: 0.4; cursor: not-allowed; }
 .page-jump {
  display: flex;
  align-items: center;
  gap: 0.35rem;
 }
 .page-input {
  width: 3.5rem;
  text-align: center;
  border: 1px solid #cbd5e0;
  border-radius: 6px;
  padding: 0.25rem 0.4rem;
  font-size: 0.9rem;
  color: #2d3748;
 }
 .page-input:focus { outline: none; border-color: #3182ce; }
 .page-sep {
  font-size: 0.85rem;
  color: #718096;
  white-space: nowrap;
 }
 .reader-body {
  flex: 1;
  min-height: 0;
  display: flex;
  flex-direction: column;
 }
 .reader-loading {
  text-align: center;
  padding: 3rem;
  color: #718096;
 }
 .reader-error {
  padding: 1.25rem;
  background: #fff5f5;
  border: 1px solid #fed7d7;
  color: #742a2a;
  border-radius: 8px;
 }
 .reader-content {
  flex: 1;
  min-height: 0;
  overflow-y: auto;
  padding: 2rem;
 }
 /* Markdown rendering */
 .markdown-body {
  font-size: 0.95rem;
  line-height: 1.75;
  color: #2d3748;
 }
 .markdown-body :deep(h1),
 .markdown-body :deep(h2),
 .markdown-body :deep(h3) {
  color: #1a365d;
  font-weight: 600;
  margin: 1.5rem 0 0.75rem;
 }
 .markdown-body :deep(h2) { font-size: 1.15rem; border-bottom: 1px solid #e2e8f0; padding-bottom: 0.4rem; }
 .markdown-body :deep(h3) { font-size: 1rem; }
 .markdown-body :deep(p) { margin: 0.75rem 0; }
 .markdown-body :deep(img) {
  max-width: 100%;
  border-radius: 6px;
  display: block;
  margin: 1rem auto;
  box-shadow: 0 1px 4px rgba(0,0,0,0.12);
 }
 .markdown-body :deep(ul),
 .markdown-body :deep(ol) {
  padding-left: 1.5rem;
  margin: 0.75rem 0;
 }
 .markdown-body :deep(code) {
  background: #f7fafc;
  border: 1px solid #e2e8f0;
  border-radius: 3px;
  padding: 0.1em 0.35em;
  font-size: 0.88em;
 }
 .markdown-body :deep(blockquote) {
  border-left: 3px solid #3182ce;
  padding-left: 1rem;
  color: #4a5568;
  margin: 0.75rem 0;
 }
 .markdown-body :deep(table) {
  width: 100%;
  border-collapse: collapse;
  font-size: 0.9em;
  margin: 1rem 0;
 }
 .markdown-body :deep(th),
 .markdown-body :deep(td) {
  border: 1px solid #e2e8f0;
  padding: 0.4rem 0.75rem;
  text-align: left;
 }
 .markdown-body :deep(th) { background: #f7fafc; font-weight: 600; }
@media (max-width: 768px) {
  .reader-view {
    max-width: 100%;
  }
  .reader-content {
    padding: 1rem;
  }
 }
 </style>
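
Review note: the two string-handling steps of `resolveImages` (collecting distinct figure URLs, then rewriting them) can be separated from the network calls and tested on their own. A minimal sketch, assuming hypothetical helper names `collectFigureSrcs` and `rewriteFigureSrcs`; the fetching and `URL.createObjectURL` steps are left out.

```typescript
// Collect the distinct figure URLs referenced by src="/api/v1/figures/..." attributes.
function collectFigureSrcs(html: string): string[] {
  const pattern = /src="(\/api\/v1\/figures\/[^"]+)"/g
  const seen: Record<string, boolean> = {}
  const result: string[] = []
  let m: RegExpExecArray | null
  while ((m = pattern.exec(html)) !== null) {
    if (!seen[m[1]]) {
      seen[m[1]] = true
      result.push(m[1])
    }
  }
  return result
}

// Swap each figure src for its blob URL; a src without a mapping is left untouched.
function rewriteFigureSrcs(html: string, blobMap: Record<string, string>): string {
  return html.replace(
    /src="(\/api\/v1\/figures\/[^"]+)"/g,
    (whole: string, src: string) => (blobMap[src] ? `src="${blobMap[src]}"` : whole)
  )
}
```

Note that the component revokes blob URLs only at the start of the next `loadPage` call, so the URLs of the last page viewed stay alive after the view is left; revoking them in an unmount hook would be one way to cover that case.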
@@ -3,27 +3,10 @@
    <h1 class="page-title">Knowledge Chat</h1>
    <p class="page-subtitle">Ask questions grounded in your uploaded medical textbooks.</p>
-    <!-- Step 1: Topic Selection -->
+    <!-- Session selection -->
-    <div v-if="!chatStore.session && !selectedTopic" class="topic-selection">
+    <div v-if="!chatStore.session" class="session-setup card">
      <h2 class="section-title">Select a Topic</h2>
      <div class="topic-grid">
        <button
          v-for="topic in topicStore.topics"
          :key="topic.id"
          :class="['topic-tile', { 'topic-tile-freeform': topic.id === 'free-form' }]"
          @click="handleTopicSelect(topic)"
        >
          <span class="topic-tile-name">{{ topic.name }}</span>
          <span v-if="topic.id === 'free-form'" class="topic-tile-hint">Any neurosurgery question</span>
        </button>
      </div>
    </div>
    <!-- Step 2: Topic selected — previous sessions + new chat -->
    <div v-else-if="!chatStore.session && selectedTopic" class="session-setup card">
      <div class="setup-header">
-        <button class="btn-back" @click="handleBack">← Topics</button>
+        <h2 class="section-title">Free-form Chat</h2>
        <h2 class="section-title">{{ selectedTopic.name }}</h2>
      </div>
      <div v-if="chatStore.error" class="error-banner">{{ chatStore.error }}</div>
@@ -71,56 +54,74 @@
        </div>
      </div>
-      <!-- Messages Area -->
+      <!-- Chat + Reader split -->
-      <div class="messages-container" ref="messagesContainer">
+      <div class="chat-reader-split">
-        <div v-if="chatStore.loading && chatStore.messages.length === 0" class="empty-state">
+        <!-- Messages + Input -->
-          <div class="spinner spinner-dark" style="width:32px;height:32px;margin:0 auto 1rem;"></div>
+        <div class="chat-column">
-          <p class="empty-state-text">Loading messages...</p>
+          <!-- Messages Area -->
-        </div>
+          <div class="messages-container" ref="messagesContainer">
            <div v-if="chatStore.loading && chatStore.messages.length === 0" class="empty-state">
              <div class="spinner spinner-dark" style="width:32px;height:32px;margin:0 auto 1rem;"></div>
              <p class="empty-state-text">Loading messages...</p>
            </div>
-        <div v-else-if="chatStore.messages.length === 0" class="empty-state">
+            <div v-else-if="chatStore.messages.length === 0" class="empty-state">
-          <div class="empty-state-icon">💬</div>
+              <div class="empty-state-icon">💬</div>
-          <p class="empty-state-text">No messages yet</p>
+              <p class="empty-state-text">No messages yet</p>
-          <p class="empty-state-hint">Ask a question about the uploaded books below.</p>
+              <p class="empty-state-hint">Ask a question about the uploaded books below.</p>
-        </div>
+            </div>
-        <div v-else class="messages-list">
+            <div v-else class="messages-list">
-          <ChatMessage
+              <ChatMessage
-            v-for="message in chatStore.messages"
+                v-for="message in chatStore.messages"
-            :key="message.id"
+                :key="message.id"
-            :message="message"
+                :message="message"
-          />
+                @open-source="handleOpenSource"
-          <div v-if="chatStore.sending" class="typing-indicator">
+              />
-            <div class="typing-bubble">
+              <div v-if="chatStore.sending" class="typing-indicator">
-              <span></span><span></span><span></span>
+                <div class="typing-bubble">
                  <span></span><span></span><span></span>
                </div>
              </div>
            </div>
          </div>
        </div>
      </div>
-      <!-- Input Area -->
+          <!-- Input Area -->
-      <div class="input-area card">
+          <div class="input-area card">
-        <div v-if="chatStore.error" class="error-banner">{{ chatStore.error }}</div>
+            <div v-if="chatStore.error" class="error-banner">{{ chatStore.error }}</div>
-        <div class="input-row">
+            <div class="input-row">
-          <textarea
+              <textarea
-            v-model="inputText"
+                v-model="inputText"
-            class="message-input"
+                class="message-input"
-            placeholder="Ask a question about your uploaded books..."
+                placeholder="Ask a question about your uploaded books..."
-            rows="2"
+                rows="2"
-            :disabled="chatStore.sending"
+                :disabled="chatStore.sending"
-            @keydown.enter.exact.prevent="handleSend"
+                @keydown.enter.exact.prevent="handleSend"
-            @keydown.enter.shift.exact="inputText += '\n'"
+                @keydown.enter.shift.exact="inputText += '\n'"
-          ></textarea>
+              ></textarea>
-          <button
+              <button
-            class="btn btn-primary send-btn"
+                class="btn btn-primary send-btn"
-            :disabled="!inputText.trim() || chatStore.sending"
+                :disabled="!inputText.trim() || chatStore.sending"
-            @click="handleSend"
+                @click="handleSend"
-          >
+              >
-            <span v-if="chatStore.sending" class="spinner"></span>
+                <span v-if="chatStore.sending" class="spinner"></span>
-            <span v-else>Send</span>
+                <span v-else>Send</span>
-          </button>
+              </button>
            </div>
            <p class="input-hint">Press Enter to send, Shift+Enter for new line.</p>
          </div>
        </div>
-        <p class="input-hint">Press Enter to send, Shift+Enter for new line.</p>
+
        <!-- Inline book reader panel -->
        <BookPagePanel
          v-if="readerPanel"
          :book-id="readerPanel.bookId"
          :page="readerPanel.page"
          :book-title="readerPanel.bookTitle"
          class="reader-panel"
          @close="readerPanel = null"
          @navigate="(p) => readerPanel && (readerPanel.page = p)"
        />
      </div>
    </div>
  </div>
@@ -130,8 +131,10 @@
 import { ref, nextTick, onMounted, watch, inject } from 'vue'
 import { useChatStore } from '@/stores/chatStore'
 import { useTopicStore } from '@/stores/topicStore'
 import { useBookStore } from '@/stores/bookStore'
 import type { ChatSession } from '@/stores/chatStore'
 import ChatMessage from '@/components/ChatMessage.vue'
 import BookPagePanel from '@/components/BookPagePanel.vue'
 interface Topic {
  id: string
@@ -142,6 +145,7 @@ interface Topic {
 const chatStore = useChatStore()
 const topicStore = useTopicStore()
 const bookStore = useBookStore()
 const showToast = inject<(msg: string, type?: 'error' | 'success') => void>('showToast')
 const selectedTopic = ref<Topic | null>(null)
@@ -150,10 +154,22 @@ const loadingTopicSessions = ref(false)
 const inputText = ref('')
 const messagesContainer = ref<HTMLElement | null>(null)
 interface ReaderPanel { bookId: string; page: number; bookTitle?: string }
 const readerPanel = ref<ReaderPanel | null>(null)
 function handleOpenSource(bookId: string, page: number) {
  const book = bookStore.books.find(b => b.id === bookId)
  readerPanel.value = { bookId, page, bookTitle: book?.title }
 }
 onMounted(async () => {
  if (topicStore.topics.length === 0) {
    await topicStore.fetchTopics()
  }
  const freeForm = topicStore.topics.find((t) => t.id === 'free-form')
  if (freeForm) {
    await handleTopicSelect(freeForm)
  }
 })
 watch(
@@ -189,11 +205,6 @@ async function handleTopicSelect(topic: Topic) {
  loadingTopicSessions.value = false
 }
 function handleBack() {
  selectedTopic.value = null
  topicSessions.value = []
 }
 async function handleNewChat() {
  const ok = await chatStore.createSession(selectedTopic.value!.id)
  if (!ok) {
@@ -207,9 +218,7 @@ async function handleResumeSession(session: ChatSession) {
 }
 function handleLeaveSession() {
  // Leave without deleting — session stays in DB and will appear in "Previous Chats"
  chatStore.leaveSession()
  // Refresh the sessions list for the current topic
  if (selectedTopic.value) {
    loadingTopicSessions.value = true
    chatStore.fetchSessionsByTopic(selectedTopic.value.id).then((sessions) => {
@@ -231,12 +240,6 @@ async function handleSend() {
 </script>
 <style scoped>
 .topic-selection {
  display: flex;
  flex-direction: column;
  gap: 1.25rem;
 }
 .section-title {
  font-size: 1.1rem;
  font-weight: 600;
@@ -244,52 +247,6 @@ async function handleSend() {
  margin: 0;
 }
 .topic-grid {
  display: grid;
  grid-template-columns: repeat(auto-fill, minmax(220px, 1fr));
  gap: 0.75rem;
 }
 .topic-tile {
  display: flex;
  flex-direction: column;
  align-items: flex-start;
  gap: 0.25rem;
  padding: 1rem 1.1rem;
  background: white;
  border: 1px solid #e2e8f0;
  border-radius: 8px;
  cursor: pointer;
  text-align: left;
  transition: border-color 0.15s, box-shadow 0.15s;
 }
 .topic-tile:hover {
  border-color: #3182ce;
  box-shadow: 0 2px 8px rgba(49, 130, 206, 0.15);
 }
 .topic-tile-freeform {
  border-style: dashed;
  border-color: #a0aec0;
 }
 .topic-tile-freeform:hover {
  border-color: #718096;
  box-shadow: 0 2px 8px rgba(0, 0, 0, 0.08);
 }
 .topic-tile-name {
  font-size: 0.9rem;
  font-weight: 600;
  color: #2d3748;
 }
 .topic-tile-hint {
  font-size: 0.78rem;
  color: #a0aec0;
 }
 .session-setup {
  max-width: 540px;
 }
@@ -381,6 +338,29 @@ async function handleSend() {
  min-height: 500px;
 }
 .chat-reader-split {
  display: flex;
  flex: 1;
  min-height: 0;
  gap: 0;
 }
 .chat-column {
  display: flex;
  flex-direction: column;
  flex: 1;
  min-width: 0;
  gap: 1rem;
 }
 .reader-panel {
  width: 420px;
  flex-shrink: 0;
  border-radius: 10px;
  margin-left: 1rem;
  box-shadow: -2px 0 8px rgba(0, 0, 0, 0.07);
 }
 .session-bar {
  display: flex;
  align-items: center;
@@ -505,4 +485,26 @@ async function handleSend() {
  font-size: 0.875rem;
  margin-bottom: 0.75rem;
 }
@media (max-width: 768px) {
  .chat-layout {
    height: auto;
    min-height: unset;
  }
  .chat-reader-split {
    flex-direction: column;
  }
  .chat-column {
    min-height: 60vh;
  }
  .reader-panel {
    width: 100%;
    margin-left: 0;
    margin-top: 1rem;
    box-shadow: none;
  }
 }
 </style>
@@ -0,0 +1,183 @@
 <template>
  <div class="login-wrapper">
    <div class="login-card card">
      <div class="login-header">
        <span class="login-icon">🧠</span>
        <h1 class="login-title">AI Teacher</h1>
        <p class="login-subtitle">Neurosurgeon Learning Platform</p>
      </div>
      <form class="login-form" @submit.prevent="handleSubmit">
        <div class="form-group">
          <label for="username">Username</label>
          <input
            id="username"
            v-model="username"
            type="text"
            autocomplete="username"
            required
            :disabled="loading"
          />
        </div>
        <div class="form-group">
          <label for="password">Password</label>
          <input
            id="password"
            v-model="password"
            type="password"
            autocomplete="current-password"
            required
            :disabled="loading"
          />
        </div>
        <div v-if="errorMessage" class="login-error">
          {{ errorMessage }}
        </div>
        <button type="submit" class="btn btn-primary login-btn" :disabled="loading || !username || !password">
          <span v-if="loading" class="spinner"></span>
          <span v-else>Sign in</span>
        </button>
      </form>
    </div>
  </div>
 </template>
 <script setup lang="ts">
 import { ref } from 'vue'
 import { useRouter } from 'vue-router'
 import { useAuthStore } from '@/stores/authStore'
 import { api } from '@/services/api'
 const router = useRouter()
 const authStore = useAuthStore()
 const username = ref('')
 const password = ref('')
 const loading = ref(false)
 const errorMessage = ref('')
 async function handleSubmit() {
  errorMessage.value = ''
  loading.value = true
  authStore.setCredentials(username.value, password.value)
  try {
    await api.get('/auth/check')
    router.push('/')
  } catch {
    authStore.clearCredentials()
    errorMessage.value = 'Invalid username or password.'
  } finally {
    loading.value = false
  }
 }
 </script>
 <style scoped>
 .login-wrapper {
  display: flex;
  align-items: center;
  justify-content: center;
  min-height: 100vh;
  background: #f0f4f8;
 }
 .login-card {
  width: 100%;
  max-width: 380px;
  padding: 2rem;
 }
 .login-header {
  text-align: center;
  margin-bottom: 1.75rem;
 }
 .login-icon {
  font-size: 2.5rem;
  display: block;
  margin-bottom: 0.5rem;
 }
 .login-title {
  font-size: 1.5rem;
  font-weight: 700;
  color: #1a365d;
  margin-bottom: 0.25rem;
 }
 .login-subtitle {
  font-size: 0.85rem;
  color: #718096;
 }
 .login-form {
  display: flex;
  flex-direction: column;
  gap: 1rem;
 }
 .form-group {
  display: flex;
  flex-direction: column;
  gap: 0.35rem;
 }
 .form-group label {
  font-size: 0.875rem;
  font-weight: 600;
  color: #4a5568;
 }
 .form-group input {
  padding: 0.6rem 0.75rem;
  border: 1px solid #cbd5e0;
  border-radius: 6px;
  font-size: 0.95rem;
  outline: none;
  transition: border-color 0.15s;
 }
 .form-group input:focus {
  border-color: #3182ce;
  box-shadow: 0 0 0 3px rgba(49, 130, 206, 0.15);
 }
 .form-group input:disabled {
  background: #f7fafc;
  color: #a0aec0;
 }
 .login-error {
  padding: 0.6rem 0.75rem;
  background: #fed7d7;
  color: #c53030;
  border: 1px solid #fc8181;
  border-radius: 6px;
  font-size: 0.875rem;
 }
 .login-btn {
  width: 100%;
  justify-content: center;
  padding: 0.7rem;
  font-size: 0.95rem;
  margin-top: 0.25rem;
 }
@media (max-width: 768px) {
  .login-wrapper {
    align-items: flex-start;
    padding-top: 2rem;
    min-height: unset;
  }
  .login-card {
    max-width: 100%;
  }
 }
 </style>
@@ -1,7 +1,7 @@
 <template>
  <div class="topics-view">
    <h1 class="page-title">Topics</h1>
-    <p class="page-subtitle">Select a topic to generate an AI-powered summary from uploaded books.</p>
+    <p class="page-subtitle">Select a topic to view or generate an AI-powered summary from uploaded books.</p>
    <!-- Loading state -->
    <div v-if="topicStore.loading" class="empty-state">
@@ -18,83 +18,444 @@
    </div>
    <div v-else class="topics-layout">
-      <!-- Topic Grid -->
+      <div class="topics-main">
      <div class="topic-grid">
        <TopicCard
          v-for="topic in topicStore.topics"
          :key="topic.id"
          :topic="topic"
          :is-generating="topicStore.activeSummaryTopicId === topic.id"
          @generate="handleGenerate"
        />
      </div>
-      <!-- Summary Panel -->
+        <!-- Mode toggle: Summary vs Concept Report -->
-      <div v-if="topicStore.summaryLoading" class="summary-panel card">
+        <div v-if="selectedTopicId" class="mode-toggle">
-        <div class="summary-loading">
+          <button
-          <div class="spinner spinner-dark" style="width:36px;height:36px;margin:0 auto 1rem;"></div>
+            class="mode-tab"
-          <p class="summary-loading-text">Generating summary from uploaded books...</p>
+            :class="{ 'mode-tab--active': mode === 'summary' }"
-          <p class="summary-loading-hint">This may take up to 30 seconds.</p>
+            @click="setMode('summary')"
-        </div>
+          >Summary</button>
-      </div>
+          <button
-
+            class="mode-tab"
-      <div v-else-if="summaryError" class="summary-panel card summary-error">
+            :class="{ 'mode-tab--active': mode === 'concept' }"
-        <h2 class="summary-topic-name">Summary Error</h2>
+            @click="setMode('concept')"
-        <p class="error-text">{{ summaryError }}</p>
+          >Concept Report</button>
        <p v-if="isNoBooks" class="no-books-hint">
          Please
          <RouterLink to="/">upload and process at least one book</RouterLink>
          first.
        </p>
      </div>
      <div v-else-if="topicStore.activeSummary" class="summary-panel card">
        <div class="summary-header">
          <h2 class="summary-topic-name">{{ topicStore.activeSummary.topicName }}</h2>
          <span class="summary-timestamp">{{ formatDate(topicStore.activeSummary.generatedAt) }}</span>
        </div>
-        <div class="summary-text">{{ topicStore.activeSummary.summary }}</div>
+        <!-- Summary history list -->
-
+        <div v-if="selectedTopicId && mode === 'summary'" class="history-panel card">
-        <div v-if="topicStore.activeSummary.sources.length > 0" class="sources-section">
+          <div class="history-header">
-          <button class="sources-toggle" @click="showSources = !showSources">
+            <span class="history-title">Saved summaries</span>
-            Sources ({{ topicStore.activeSummary.sources.length }})
+            <div class="history-actions">
-            <span>{{ showSources ? '▲' : '▼' }}</span>
+              <div class="lang-toggle" role="group" aria-label="Summary language">
-          </button>
+                <button
-          <div v-if="showSources" class="sources-list">
+                  type="button"
-            <div
+                  class="lang-toggle-btn"
-              v-for="(source, idx) in topicStore.activeSummary.sources"
+                  :class="{ 'lang-toggle-btn--active': summaryLanguage === 'en' }"
-              :key="idx"
+                  :disabled="topicStore.summaryLoading"
-              class="source-chip"
+                  @click="summaryLanguage = 'en'"
-            >
+                >EN</button>
-              <span class="source-book">{{ source.bookTitle }}</span>
+                <button
-              <span v-if="source.page" class="source-page">p. {{ source.page }}</span>
+                  type="button"
                  class="lang-toggle-btn"
                  :class="{ 'lang-toggle-btn--active': summaryLanguage === 'th' }"
                  :disabled="topicStore.summaryLoading"
                  @click="summaryLanguage = 'th'"
                >TH</button>
              </div>
              <button class="btn btn-primary btn-sm" :disabled="topicStore.summaryLoading" @click="handleGenerate(selectedTopicId!)">
                <span v-if="topicStore.summaryLoading" class="spinner" style="width:14px;height:14px;display:inline-block;vertical-align:middle;margin-right:4px;"></span>
                Generate New
              </button>
            </div>
          </div>
          <div v-if="topicStore.summaryListLoading" class="history-loading">
            <div class="spinner spinner-dark" style="width:20px;height:20px;margin-right:8px;display:inline-block;vertical-align:middle;"></div>
            Loading...
          </div>
          <div v-else-if="topicStore.summaryList.length === 0" class="history-empty">
            No summaries yet. Click "Generate New" to create one.
          </div>
          <div v-else class="history-list">
            <button
              v-for="item in topicStore.summaryList"
              :key="item.id"
              class="history-chip"
              :class="{ 'history-chip--active': topicStore.activeSummary?.id === item.id }"
              @click="handleLoadSummary(item)"
            >
              Summary #{{ item.summaryNumber }}
              <span class="history-chip-date">· {{ formatDateShort(item.generatedAt) }}</span>
            </button>
          </div>
        </div>
-        <div v-else class="no-sources">
+
-          No source citations available for this summary.
+        <!-- Concept report history list -->
        <div v-if="selectedTopicId && mode === 'concept'" class="history-panel card">
          <div class="history-header">
            <span class="history-title">Saved concept reports</span>
            <div class="history-actions">
              <div class="lang-toggle" role="group" aria-label="Report language">
                <button
                  type="button"
                  class="lang-toggle-btn"
                  :class="{ 'lang-toggle-btn--active': conceptLanguage === 'en' }"
                  :disabled="topicStore.conceptReportLoading"
                  @click="conceptLanguage = 'en'"
                >EN</button>
                <button
                  type="button"
                  class="lang-toggle-btn"
                  :class="{ 'lang-toggle-btn--active': conceptLanguage === 'th' }"
                  :disabled="topicStore.conceptReportLoading"
                  @click="conceptLanguage = 'th'"
                >TH</button>
              </div>
              <button class="btn btn-primary btn-sm" :disabled="topicStore.conceptReportLoading" @click="handleGenerateConcept(selectedTopicId!)">
                <span v-if="topicStore.conceptReportLoading" class="spinner" style="width:14px;height:14px;display:inline-block;vertical-align:middle;margin-right:4px;"></span>
                Generate New
              </button>
            </div>
          </div>
          <div v-if="topicStore.conceptReportListLoading" class="history-loading">
            <div class="spinner spinner-dark" style="width:20px;height:20px;margin-right:8px;display:inline-block;vertical-align:middle;"></div>
            Loading...
          </div>
          <div v-else-if="topicStore.conceptReportList.length === 0" class="history-empty">
            No concept reports yet. Click "Generate New" to create one.
          </div>
          <div v-else class="history-list">
            <button
              v-for="item in topicStore.conceptReportList"
              :key="item.id"
              class="history-chip"
              :class="{ 'history-chip--active': topicStore.activeConceptReport?.id === item.id }"
              @click="handleLoadConceptReport(item)"
            >
              Report #{{ item.reportNumber }}
              <span class="history-chip-date">· {{ formatDateShort(item.generatedAt) }}</span>
            </button>
          </div>
        </div>
-      </div>
+
        <!-- Summary Panel -->
        <div v-if="mode === 'summary' && topicStore.summaryLoading" class="summary-panel card">
          <div class="summary-loading">
            <div class="spinner spinner-dark" style="width:36px;height:36px;margin:0 auto 1rem;"></div>
            <p class="summary-loading-text">Generating summary from uploaded books...</p>
            <p class="summary-loading-hint">This may take up to 30 seconds.</p>
          </div>
        </div>
        <div v-else-if="mode === 'summary' && summaryError" class="summary-panel card summary-error">
          <h2 class="summary-topic-name">Summary Error</h2>
          <p class="error-text">{{ summaryError }}</p>
          <p v-if="isNoBooks" class="no-books-hint">
            Please
            <RouterLink to="/">upload and process at least one book</RouterLink>
            first.
          </p>
        </div>
        <div v-else-if="mode === 'summary' && !topicStore.activeSummary" class="summary-panel card summary-placeholder">
          <p class="summary-placeholder-text">
            {{ selectedTopicId ? 'Select a saved summary or generate a new one.' : 'Select a topic to get started.' }}
          </p>
        </div>
        <div v-else-if="mode === 'summary'" class="summary-panel card">
          <div class="summary-header">
            <h2 class="summary-topic-name">{{ topicStore.activeSummary.topicName }}</h2>
            <div class="summary-meta">
              <span v-if="topicStore.activeSummary.summaryNumber" class="summary-number">
                Summary #{{ topicStore.activeSummary.summaryNumber }}
              </span>
              <span class="summary-timestamp">{{ formatDate(topicStore.activeSummary.generatedAt) }}</span>
            </div>
          </div>
          <div class="summary-text summary-text--markdown" v-html="renderedSummary" @click="handleSummaryClick"></div>
          <div ref="sourcesSection" v-if="topicStore.activeSummary.sources.length > 0" class="sources-section">
            <button class="sources-toggle" @click="showSources = !showSources">
              Sources ({{ topicStore.activeSummary.sources.length }})
              <span>{{ showSources ? '▲' : '▼' }}</span>
            </button>
            <SourceList
              v-if="showSources"
              :sources="topicStore.activeSummary.sources"
              @open-source="(bookId: string, page: number) => handleOpenSource(bookId, page)"
            />
            <BookPagePanel
              v-if="readerPanel"
              :book-id="readerPanel.bookId"
              :page="readerPanel.page"
              :book-title="readerPanel.bookTitle"
              class="reader-panel"
              @close="readerPanel = null"
              @navigate="(p) => readerPanel && (readerPanel.page = p)"
            />
          </div>
          <div v-else class="no-sources">
            No source citations available for this summary.
          </div>
        </div>
        <!-- Concept Report panel -->
        <div v-if="mode === 'concept' && topicStore.conceptReportLoading" class="summary-panel card">
          <div class="summary-loading">
            <div class="spinner spinner-dark" style="width:36px;height:36px;margin:0 auto 1rem;"></div>
            <p class="summary-loading-text">Generating facet-organized concept report...</p>
            <p class="summary-loading-hint">This may take up to 60 seconds.</p>
          </div>
        </div>
        <div v-else-if="mode === 'concept' && conceptError" class="summary-panel card summary-error">
          <h2 class="summary-topic-name">Concept Report Error</h2>
          <p class="error-text">{{ conceptError }}</p>
          <p v-if="isNoBooks" class="no-books-hint">
            Please
            <RouterLink to="/">upload and process at least one book</RouterLink>
            first.
          </p>
        </div>
        <div v-else-if="mode === 'concept' && !topicStore.activeConceptReport" class="summary-panel card summary-placeholder">
          <p class="summary-placeholder-text">
            {{ selectedTopicId ? 'Select a saved concept report or generate a new one.' : 'Select a topic to get started.' }}
          </p>
        </div>
        <div v-else-if="mode === 'concept'" class="summary-panel card">
          <div class="summary-header">
            <h2 class="summary-topic-name">{{ topicStore.activeConceptReport!.topicName }}</h2>
            <div class="summary-meta">
              <span class="summary-number">
                Concept Report #{{ topicStore.activeConceptReport!.reportNumber }}
              </span>
              <span class="summary-timestamp">{{ formatDate(topicStore.activeConceptReport!.generatedAt) }}</span>
            </div>
          </div>
          <div
            v-for="facet in topicStore.activeConceptReport!.facets"
            :key="facet.facetKey"
            class="concept-facet"
          >
            <h3 class="concept-facet-title">{{ facet.title }}</h3>
            <div class="summary-text summary-text--markdown" v-html="renderFacetMarkdown(facet.markdown)" @click="handleSummaryClick"></div>
          </div>
          <div ref="sourcesSection" v-if="topicStore.activeConceptReport!.sources.length > 0" class="sources-section">
            <button class="sources-toggle" @click="showSources = !showSources">
              Sources ({{ topicStore.activeConceptReport!.sources.length }})
              <span>{{ showSources ? '▲' : '▼' }}</span>
            </button>
            <SourceList
              v-if="showSources"
              :sources="topicStore.activeConceptReport!.sources"
              @open-source="(bookId: string, page: number) => handleOpenSource(bookId, page)"
            />
            <BookPagePanel
              v-if="readerPanel"
              :book-id="readerPanel.bookId"
              :page="readerPanel.page"
              :book-title="readerPanel.bookTitle"
              class="reader-panel"
              @close="readerPanel = null"
              @navigate="(p) => readerPanel && (readerPanel.page = p)"
            />
          </div>
          <div v-else class="no-sources">
            No source citations available for this concept report.
          </div>
        </div>
        <!-- Topic Grid -->
        <div class="topic-grid">
          <TopicCard
            v-for="topic in summaryTopics"
            :key="topic.id"
            :topic="topic"
            :is-generating="topicStore.activeSummaryTopicId === topic.id"
            :is-selected="selectedTopicId === topic.id"
            @generate="handleTopicClick"
          />
        </div>
      </div><!-- end topics-main -->
    </div>
  </div>
 </template>
 <script setup lang="ts">
-import { ref, onMounted, inject } from 'vue'
+import { ref, computed, onMounted, inject } from 'vue'
 import { marked } from 'marked'
 import { RouterLink } from 'vue-router'
-import { useTopicStore } from '@/stores/topicStore'
+import { useTopicStore, type SavedSummaryItem, type SavedConceptReportItem, type SourceReference } from '@/stores/topicStore'
 import { useBookStore } from '@/stores/bookStore'
 import TopicCard from '@/components/TopicCard.vue'
 import BookPagePanel from '@/components/BookPagePanel.vue'
 import SourceList from '@/components/SourceList.vue'
 const topicStore = useTopicStore()
 const bookStore = useBookStore()
 const showToast = inject<(msg: string, type?: 'error' | 'success') => void>('showToast')
 const showSources = ref(true)
 const summaryError = ref<string | null>(null)
 const conceptError = ref<string | null>(null)
 const isNoBooks = ref(false)
 const conceptLanguage = ref<'en' | 'th'>('en')
 const summaryLanguage = ref<'en' | 'th'>('en')
 const sourcesSection = ref<HTMLElement | null>(null)
 const selectedTopicId = ref<string | null>(null)
 const mode = ref<'summary' | 'concept'>('summary')
 interface ReaderPanel { bookId: string; page: number; bookTitle?: string }
 const readerPanel = ref<ReaderPanel | null>(null)
 const summaryTopics = computed(() => topicStore.topics.filter(t => t.id !== 'free-form'))
 function escapeHtml(s: string): string {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;').replace(/"/g, '&quot;')
 }
 function renderOneCitation(label: string, figureMap: Map<string, SourceReference>): string {
  const badge = `<span class="source-ref" data-ref="${label}" title="Jump to source ${label}">[${label}]</span>`
  const fig = figureMap.get(label)
  if (fig?.imageUrl) {
    const alt = escapeHtml(fig.caption || fig.label || 'Figure')
    const captionText = [fig.label, fig.caption].filter(Boolean).map(escapeHtml).join(' — ')
    const captionHtml = captionText
      ? `<figcaption class="inline-figure-caption">${captionText}</figcaption>`
      : ''
    return `${badge}<figure class="inline-figure"><img src="${escapeHtml(fig.imageUrl)}" alt="${alt}" class="inline-figure-img" loading="lazy" onerror="this.parentElement.style.display='none'" />${captionHtml}</figure>`
  }
  return badge
 }
 // Matches well-formed citations such as [S1] or [F2], and tolerates malformed multi-label output
 // such as [S26 1], [S1, S2], or [S1 F3]. Every ([SF]?)(\d+) token inside the brackets is extracted;
 // a bare number inherits the most recently seen prefix, so [S1, 2 F3] renders as S1, S2, F3.
 function replaceCitations(html: string, figureMap: Map<string, SourceReference>): string {
  return html.replace(/\[([SF]\d+(?:[\s,]+[SF]?\d+)*)\]/g, (_match, inner: string) => {
    const tokens: string[] = []
    let lastType: 'S' | 'F' = 'S'
    for (const m of inner.matchAll(/([SF]?)(\d+)/g)) {
      const prefix = (m[1] || lastType) as 'S' | 'F'
      lastType = prefix
      tokens.push(`${prefix}${m[2]}`)
    }
    return tokens.map(label => renderOneCitation(label, figureMap)).join(' ')
  })
 }
 const renderedSummary = computed(() => {
  if (!topicStore.activeSummary) return ''
  const html = marked.parse(topicStore.activeSummary.summary) as string
  const figureMap = new Map<string, SourceReference>()
  for (const src of topicStore.activeSummary.sources) {
    if (src.type === 'FIGURE' && src.refLabel) {
      figureMap.set(src.refLabel, src)
    }
  }
  return replaceCitations(html, figureMap)
 })
 function handleSummaryClick(e: MouseEvent) {
  if ((e.target as HTMLElement).classList.contains('source-ref')) {
    showSources.value = true
    sourcesSection.value?.scrollIntoView({ behavior: 'smooth', block: 'start' })
  }
 }
 function handleOpenSource(bookId: string, page: number) {
  const book = bookStore.books.find(b => b.id === bookId)
  readerPanel.value = { bookId, page, bookTitle: book?.title }
  showSources.value = true
 }
 async function handleTopicClick(topicId: string) {
  if (selectedTopicId.value !== topicId) {
    selectedTopicId.value = topicId
    topicStore.activeSummary = null
    topicStore.activeConceptReport = null
    summaryError.value = null
    conceptError.value = null
    if (mode.value === 'summary') {
      await topicStore.fetchSummaries(topicId)
      const list = topicStore.summaryList
      if (list.length > 0) {
        await topicStore.fetchSummaryDetail(topicId, list[list.length - 1].id)
      }
    } else {
      await topicStore.fetchConceptReports(topicId)
      const list = topicStore.conceptReportList
      if (list.length > 0) {
        await topicStore.fetchConceptReportDetail(topicId, list[list.length - 1].id)
      }
    }
  }
 }
 async function handleLoadSummary(item: SavedSummaryItem) {
  if (!selectedTopicId.value) return
  summaryError.value = null
  await topicStore.fetchSummaryDetail(selectedTopicId.value, item.id)
 }
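 async function setMode(next: 'summary' | 'concept') {
  if (mode.value === next) return
  mode.value = next
  readerPanel.value = null
  const topicId = selectedTopicId.value
  if (!topicId) return
  // Load the history for the mode just activated so the list never shows another topic's entries.
  if (next === 'concept') {
    await topicStore.fetchConceptReports(topicId)
    const list = topicStore.conceptReportList
    if (list.length > 0) {
      await topicStore.fetchConceptReportDetail(topicId, list[list.length - 1].id)
    }
  } else {
    await topicStore.fetchSummaries(topicId)
    const list = topicStore.summaryList
    // Keep a summary that is already open; otherwise show the most recent one.
    if (list.length > 0 && !topicStore.activeSummary) {
      await topicStore.fetchSummaryDetail(topicId, list[list.length - 1].id)
    }
  }
 }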
 function renderFacetMarkdown(md: string): string {
  if (!md) return ''
  // marked.parse is synchronous unless its async option is enabled, hence the cast
  const html = marked.parse(md) as string
  // Index FIGURE sources by reference label so citations can be matched to figures
  const figureMap = new Map<string, SourceReference>()
  const sources = topicStore.activeConceptReport?.sources ?? []
  for (const src of sources) {
    if (src.type === 'FIGURE' && src.refLabel) figureMap.set(src.refLabel, src)
  }
  return replaceCitations(html, figureMap)
 }
 async function handleLoadConceptReport(item: SavedConceptReportItem) {
  if (!selectedTopicId.value) return
  conceptError.value = null
  await topicStore.fetchConceptReportDetail(selectedTopicId.value, item.id)
 }
 async function handleGenerateConcept(topicId: string) {
  conceptError.value = null
  isNoBooks.value = false
  showSources.value = true
  const result = await topicStore.generateConceptReport(topicId, conceptLanguage.value)
  if (!result) {
    conceptError.value = topicStore.error ?? 'Failed to generate concept report.'
    isNoBooks.value =
      conceptError.value.toLowerCase().includes('no books') ||
      conceptError.value.toLowerCase().includes('knowledge source')
    showToast?.(conceptError.value, 'error')
  } else {
    await topicStore.fetchConceptReports(topicId)
  }
 }
 onMounted(async () => {
  await topicStore.fetchTopics()
  if (bookStore.books.length === 0) {
    await bookStore.fetchBooks()
  }
 })
 async function handleGenerate(topicId: string) {
@@ -102,34 +463,215 @@ async function handleGenerate(topicId: string) {
  isNoBooks.value = false
  showSources.value = true
-  const result = await topicStore.generateSummary(topicId)
+  const result = await topicStore.generateSummary(topicId, summaryLanguage.value)
  if (!result) {
    summaryError.value = topicStore.error ?? 'Failed to generate summary.'
    isNoBooks.value =
      summaryError.value.toLowerCase().includes('no books') ||
      summaryError.value.toLowerCase().includes('knowledge source')
    showToast?.(summaryError.value, 'error')
  } else {
    // Refresh the history list to include the newly saved summary
    await topicStore.fetchSummaries(topicId)
  }
 }
 function formatDate(iso: string): string {
  return new Date(iso).toLocaleString()
 }
 function formatDateShort(iso: string): string {
  return new Date(iso).toLocaleDateString(undefined, { month: 'short', day: 'numeric' })
 }
 </script>
 <style scoped>
 .topics-layout {
  display: flex;
  gap: 2rem;
 }
 .topics-main {
  flex: 1;
  min-width: 0;
  display: flex;
  flex-direction: column;
  gap: 2rem;
 }
 .reader-panel {
  margin-top: 1rem;
  height: 600px;
  min-height: 400px;
  border-radius: 10px;
  box-shadow: 0 2px 8px rgba(0, 0, 0, 0.07);
 }
 .topic-grid {
  display: grid;
  grid-template-columns: repeat(auto-fill, minmax(280px, 1fr));
  gap: 1rem;
 }
 .mode-toggle {
  display: flex;
  gap: 0.5rem;
  margin-bottom: 0.25rem;
 }
 .mode-tab {
  background: transparent;
  border: 1px solid #cbd5e0;
  color: #4a5568;
  padding: 0.4rem 1rem;
  font-size: 0.9rem;
  font-weight: 500;
  cursor: pointer;
  border-radius: 999px;
 }
 .mode-tab:hover {
  background: #edf2f7;
 }
 .mode-tab--active {
  background: #553c9a;
  color: white;
  border-color: #553c9a;
 }
 .concept-facet {
  margin-bottom: 1.5rem;
 }
 .concept-facet-title {
  font-size: 1.1rem;
  font-weight: 600;
  color: #553c9a;
  margin: 0 0 0.5rem 0;
  padding-bottom: 0.25rem;
  border-bottom: 1px solid #e2e8f0;
 }
 /* History panel */
 .history-panel {
  border-top: 3px solid #805ad5;
  padding: 1rem;
 }
 .history-header {
  display: flex;
  align-items: center;
  justify-content: space-between;
  margin-bottom: 0.75rem;
 }
 .history-actions {
  display: flex;
  align-items: center;
  gap: 0.5rem;
 }
 .lang-toggle {
  display: inline-flex;
  border: 1px solid var(--border-color, #d0d7de);
  border-radius: 6px;
  overflow: hidden;
 }
 .lang-toggle-btn {
  padding: 0.25rem 0.55rem;
  font-size: 0.75rem;
  font-weight: 600;
  background: transparent;
  color: var(--text-secondary, #57606a);
  border: none;
  cursor: pointer;
  line-height: 1;
 }
 .lang-toggle-btn:not(:last-child) {
  border-right: 1px solid var(--border-color, #d0d7de);
 }
 .lang-toggle-btn:hover:not(:disabled) {
  background: var(--hover-bg, #f3f4f6);
 }
 .lang-toggle-btn--active {
  background: var(--primary-color, #0969da);
  color: #fff;
 }
 .lang-toggle-btn--active:hover:not(:disabled) {
  background: var(--primary-color, #0969da);
 }
 .lang-toggle-btn:disabled {
  opacity: 0.5;
  cursor: not-allowed;
 }
 .history-title {
  font-size: 0.875rem;
  font-weight: 600;
  color: #553c9a;
 }
 .btn-sm {
  font-size: 0.8rem;
  padding: 0.3rem 0.75rem;
 }
 .history-loading {
  font-size: 0.85rem;
  color: #718096;
  display: flex;
  align-items: center;
 }
 .history-empty {
  font-size: 0.85rem;
  color: #a0aec0;
  font-style: italic;
 }
 .history-list {
  display: flex;
  flex-wrap: wrap;
  gap: 0.5rem;
 }
 .history-chip {
  background: #faf5ff;
  border: 1px solid #d6bcfa;
  border-radius: 6px;
  padding: 0.3rem 0.75rem;
  font-size: 0.8rem;
  font-weight: 500;
  color: #553c9a;
  cursor: pointer;
  transition: background 0.15s, border-color 0.15s;
 }
 .history-chip:hover {
  background: #e9d8fd;
  border-color: #b794f4;
 }
 .history-chip--active {
  background: #805ad5;
  border-color: #805ad5;
  color: #fff;
 }
 .history-chip-date {
  font-weight: 400;
  color: inherit;
  opacity: 0.75;
 }
 /* Summary panel */
 .summary-panel {
  border-top: 3px solid #3182ce;
 }
@@ -170,6 +712,22 @@ function formatDate(iso: string): string {
  color: #1a365d;
 }
 .summary-meta {
  display: flex;
  align-items: baseline;
  gap: 0.5rem;
 }
 .summary-number {
  font-size: 0.8rem;
  font-weight: 600;
  color: #805ad5;
  background: #faf5ff;
  border: 1px solid #d6bcfa;
  border-radius: 4px;
  padding: 0.1rem 0.4rem;
 }
 .summary-timestamp {
  font-size: 0.8rem;
  color: #a0aec0;
@@ -179,10 +737,60 @@ function formatDate(iso: string): string {
  font-size: 0.95rem;
  line-height: 1.7;
  color: #2d3748;
  white-space: pre-wrap;
  margin-bottom: 1rem;
 }
 .summary-text--markdown {
  white-space: normal;
 }
 .summary-text--markdown :deep(h1),
 .summary-text--markdown :deep(h2),
 .summary-text--markdown :deep(h3),
 .summary-text--markdown :deep(h4) {
  font-weight: 700;
  margin: 0.75rem 0 0.35rem;
  line-height: 1.3;
  color: #1a202c;
 }
 .summary-text--markdown :deep(h1) { font-size: 1.15rem; }
 .summary-text--markdown :deep(h2) { font-size: 1.05rem; }
 .summary-text--markdown :deep(h3) { font-size: 0.975rem; }
 .summary-text--markdown :deep(h4) { font-size: 0.925rem; }
 .summary-text--markdown :deep(p) { margin: 0.4rem 0; }
 .summary-text--markdown :deep(ul),
 .summary-text--markdown :deep(ol) {
  padding-left: 1.4rem;
  margin: 0.4rem 0;
 }
 .summary-text--markdown :deep(li) { margin: 0.2rem 0; }
 .summary-text--markdown :deep(strong) {
  font-weight: 700;
  color: #1a202c;
 }
 .summary-text--markdown :deep(em) { font-style: italic; }
 .summary-text--markdown :deep(code) {
  background: #edf2f7;
  border-radius: 3px;
  padding: 0.1em 0.35em;
  font-size: 0.87em;
  font-family: monospace;
 }
 .summary-text--markdown :deep(blockquote) {
  border-left: 3px solid #bee3f8;
  margin: 0.5rem 0;
  padding: 0.25rem 0.75rem;
  color: #4a5568;
 }
 .sources-section {
  border-top: 1px solid #e2e8f0;
  padding-top: 0.75rem;
@@ -208,30 +816,116 @@ function formatDate(iso: string): string {
 .sources-list {
  display: flex;
-  flex-wrap: wrap;
+  flex-direction: column;
  gap: 0.5rem;
 }
-.source-chip {
+.source-item {
  display: flex;
-  align-items: center;
+  flex-direction: column;
  gap: 0.25rem;
 }
 .source-item--figure {
  gap: 0.4rem;
 }
 .source-chip {
  display: inline-flex;
  align-items: center;
  gap: 0.25rem;
  border-radius: 4px;
  padding: 0.2rem 0.5rem;
  font-size: 0.78rem;
 }
 .source-chip--text {
  background: #ebf8ff;
  border: 1px solid #bee3f8;
-  border-radius: 6px;
+}
-  padding: 0.3rem 0.7rem;
+
 .source-chip--figure {
  background: #f0fff4;
  border: 1px solid #9ae6b4;
 }
 .source-chip--clickable {
  cursor: pointer;
  transition: background 0.15s, border-color 0.15s;
 }
 .source-chip--text.source-chip--clickable:hover {
  background: #bee3f8;
  border-color: #90cdf4;
 }
 .source-chip--figure.source-chip--clickable:hover {
  background: #c6f6d5;
  border-color: #68d391;
 }
 .source-icon {
  font-size: 0.8rem;
 }
 .source-ref-label {
  font-size: 0.72rem;
  font-weight: 700;
  background: #bee3f8;
  color: #2b6cb0;
  border-radius: 3px;
  padding: 0 0.3rem;
 }
 .source-ref-label--figure {
  background: #9ae6b4;
  color: #276749;
 }
 .source-book {
  color: #2b6cb0;
  font-weight: 500;
 }
 .source-figure-label {
  color: #276749;
  font-weight: 600;
 }
 .source-page {
  color: #718096;
 }
 .source-open-hint {
  font-size: 0.75rem;
  color: #3182ce;
  margin-left: 0.1rem;
 }
 .source-caption {
  font-size: 0.78rem;
  color: #4a5568;
  font-style: italic;
 }
 .source-figure-image {
  max-width: 100%;
 }
 .figure-img {
  max-width: 100%;
  max-height: 300px;
  border-radius: 6px;
  border: 1px solid #e2e8f0;
  object-fit: contain;
 }
 .figure-missing {
  font-size: 0.78rem;
  color: #a0aec0;
  font-style: italic;
 }
 .no-sources {
  font-size: 0.85rem;
  color: #a0aec0;
@@ -252,4 +946,55 @@ function formatDate(iso: string): string {
  color: #3182ce;
  text-decoration: underline;
 }
 .summary-placeholder {
  display: flex;
  align-items: center;
  justify-content: center;
  min-height: 6rem;
  border-top-color: #cbd5e0;
 }
 .summary-placeholder-text {
  font-size: 0.95rem;
  color: #a0aec0;
  font-style: italic;
 }
 .summary-text--markdown :deep(.inline-figure) {
  display: block;
  margin: 0.75rem 0;
  text-align: center;
 }
 .summary-text--markdown :deep(.inline-figure-img) {
  max-width: 100%;
  max-height: 400px;
  border-radius: 6px;
  border: 1px solid #e2e8f0;
  object-fit: contain;
  display: block;
  margin: 0 auto;
 }
 .summary-text--markdown :deep(.inline-figure-caption) {
  font-size: 0.78rem;
  color: #718096;
  font-style: italic;
  margin-top: 0.3rem;
  text-align: center;
 }
 .summary-text--markdown :deep(.source-ref) {
  color: #3182ce;
  font-weight: 600;
  cursor: pointer;
  border-radius: 3px;
  padding: 0 0.15em;
 }
 .summary-text--markdown :deep(.source-ref:hover) {
  background: #ebf8ff;
  text-decoration: underline;
 }
 </style>
@@ -1,10 +1,10 @@
 <template>
  <div class="upload-view">
    <h1 class="page-title">Book Library</h1>
-    <p class="page-subtitle">Upload medical textbooks (PDF) to build the knowledge base.</p>
+    <p v-if="uploadEnabled" class="page-subtitle">Upload medical textbooks (PDF) to build the knowledge base.</p>
    <!-- Upload Section -->
-    <div class="upload-section card">
+    <div v-if="uploadEnabled" class="upload-section card">
      <h2 class="section-title">Upload a Book</h2>
      <div
@@ -87,6 +87,7 @@
          :key="book.id"
          :book="book"
          :deleting="deletingId === book.id"
          :delete-enabled="deleteEnabled"
          @delete="handleDelete"
        />
      </div>
@@ -98,6 +99,10 @@
 import { ref, onMounted, onUnmounted, inject } from 'vue'
 import { useBookStore } from '@/stores/bookStore'
 import BookCard from '@/components/BookCard.vue'
 import { env } from '@/env'
 const uploadEnabled = env('VITE_UPLOAD_ENABLED') !== 'false'
 const deleteEnabled = env('VITE_DELETE_ENABLED') !== 'false'
 const bookStore = useBookStore()
 const showToast = inject<(msg: string, type?: 'error' | 'success') => void>('showToast')
@@ -0,0 +1,79 @@
 # Internal Contract: DocumentAiPageParser → FigureExtractionService
 **Branch**: `002-image-aware-embedding` | **Date**: 2026-04-04  
 **Type**: Internal Java DTO (not an HTTP contract)
 ---
 ## Purpose
 `PageResult` is the internal data transfer object produced by `DocumentAiPageParser` for each
 PDF page. It decouples the Google Document AI SDK types from the rest of the pipeline so that
 `PdfStructureParser` can be replaced without cascading changes.
 ---
 ## Java Record
 ```java
 package com.aiteacher.document;
 import java.util.List;
 /**
 * Internal DTO produced by DocumentAiPageParser for one PDF page.
 * Decouples the Document AI SDK types from downstream services.
 */
 public record PageResult(
    int pageNumber,           // 1-based, matches Document.Page.getPageNumber()
    String orderedText,       // full page text in correct reading order (blocks joined by \n\n)
    String headingTitle,      // first HEADING block on page, or null
    List<FigureBbox> figures  // detected figure regions (may be empty)
 ) {
    /**
     * Normalized bounding box for a detected figure region.
     * Coordinates are in the [0.0, 1.0] range relative to page dimensions.
     */
    public record FigureBbox(
        float x,       // left edge (normalized)
        float y,       // top edge (normalized)
        float width,   // width (normalized)
        float height,  // height (normalized)
        String nearestCaption  // text of adjacent paragraph block, or null
    ) {}
 }
 ```
 ---
 ## Production Rules
 | Field | Rule |
 |-------|------|
 | `orderedText` | Concatenation of all `PARAGRAPH` and `HEADING_*` blocks, joined with `\n\n`. Tables are represented as tab-separated text. |
 | `headingTitle` | First block whose `blockType` is `HEADING_1` through `HEADING_6`. `null` if no heading detected. |
 | `figures` | One entry per `VisualElement` with `type == "figure"` and `confidence ≥ 0.5`. Sorted top-to-bottom by `y`. |
 | `nearestCaption` | Text of the first `PARAGRAPH` block below the figure bbox (by Y coordinate). `null` if no paragraph follows within 10% of the page height. |
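
The `figures` rule above (keep figure elements with confidence of at least 0.5, sorted top-to-bottom) can be sketched as follows. This is a minimal illustration, not the parser's actual code: `Candidate` is a hypothetical stand-in for a detected visual element and is not a Document AI SDK type, and `FigureSelectionSketch` is an invented class name.

```java
import java.util.List;

final class FigureSelectionSketch {

    // Hypothetical stand-in for a detected visual element; NOT a Document AI SDK type.
    record Candidate(String type, float confidence, float x, float y, float width, float height) {}

    // Mirrors PageResult.FigureBbox from the record shown earlier in this contract.
    record FigureBbox(float x, float y, float width, float height, String nearestCaption) {}

    static final float MIN_CONFIDENCE = 0.5f;

    /** Keeps figure elements at or above the threshold, ordered top-to-bottom by y. */
    static List<FigureBbox> selectFigures(List<Candidate> candidates) {
        return candidates.stream()
            .filter(c -> "figure".equals(c.type()))
            .filter(c -> c.confidence() >= MIN_CONFIDENCE)
            .sorted((a, b) -> Float.compare(a.y(), b.y()))
            // Caption lookup is out of scope for this sketch, so it is left null.
            .map(c -> new FigureBbox(c.x(), c.y(), c.width(), c.height(), null))
            .toList();
    }
}
```

Filtering before sorting keeps the comparison work limited to the elements that will actually be emitted.
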
 ---
 ## Mapping from Document AI Proto
 ```
 Document.Page.Block         → orderedText (concatenated)
 Document.Page.Block (HEADING_*) → headingTitle (first match)
 Document.Page.VisualElement → FigureBbox
  └─ layout.bounding_poly.normalized_vertices[0] → (x, y) top-left
  └─ normalized_vertices[2] → (x+w, y+h) bottom-right
 ```
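
The vertex-to-bbox mapping above can be sketched in plain Java. The sketch assumes the vertices arrive in the order top-left, top-right, bottom-right, bottom-left, which is what the mapping implies for indices 0 and 2; `Vertex` and `BboxFromVerticesSketch` are hypothetical names, not SDK types.

```java
import java.util.List;

final class BboxFromVerticesSketch {

    // Hypothetical stand-in for a normalized vertex; NOT a Document AI SDK type.
    record Vertex(float x, float y) {}

    // Mirrors PageResult.FigureBbox from the record shown earlier in this contract.
    record FigureBbox(float x, float y, float width, float height, String nearestCaption) {}

    static float clamp01(float v) {
        return Math.max(0f, Math.min(1f, v));
    }

    /** Uses vertex 0 as the top-left corner and vertex 2 as the bottom-right corner. */
    static FigureBbox fromVertices(List<Vertex> vertices, String nearestCaption) {
        if (vertices.size() < 3) {
            throw new IllegalArgumentException("expected at least 3 vertices, got " + vertices.size());
        }
        Vertex topLeft = vertices.get(0);
        Vertex bottomRight = vertices.get(2);
        float x = clamp01(topLeft.x());
        float y = clamp01(topLeft.y());
        // Clamp to the page and never report a negative extent.
        float width = Math.max(0f, clamp01(bottomRight.x()) - x);
        float height = Math.max(0f, clamp01(bottomRight.y()) - y);
        return new FigureBbox(x, y, width, height, nearestCaption);
    }
}
```
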
 ---
 ## Consumers
 | Consumer | What It Uses |
 |----------|-------------|
 | `BookEmbeddingService` | `orderedText` → `SectionEntity.fullText`; `headingTitle` → `SectionEntity.title` |
 | `FigureExtractionService` | `figures` list → renders page via PDFBox, crops each bbox to `BufferedImage` |
 | `TextChunkingService` | Receives `SectionEntity` (indirectly uses `orderedText`) — **unchanged** |
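
As an illustration of the cropping step that `FigureExtractionService` performs, the sketch below converts a normalized bbox into pixel coordinates on an already rendered page image. It assumes the page has been rendered to a `BufferedImage` elsewhere (the PDFBox rendering call is deliberately omitted), and `FigureCropSketch` is a hypothetical name.

```java
import java.awt.image.BufferedImage;

final class FigureCropSketch {

    static int clamp(int value, int lo, int hi) {
        return Math.max(lo, Math.min(hi, value));
    }

    /**
     * Crops a normalized [0.0, 1.0] bbox out of a rendered page.
     * The region is clamped so it never extends past the page edges.
     */
    static BufferedImage crop(BufferedImage page, float x, float y, float width, float height) {
        int left = clamp(Math.round(x * page.getWidth()), 0, page.getWidth() - 1);
        int top = clamp(Math.round(y * page.getHeight()), 0, page.getHeight() - 1);
        int w = clamp(Math.round(width * page.getWidth()), 1, page.getWidth() - left);
        int h = clamp(Math.round(height * page.getHeight()), 1, page.getHeight() - top);
        // getSubimage shares pixel data with the source image; copy it if the page is reused.
        return page.getSubimage(left, top, w, h);
    }
}
```
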
@@ -0,0 +1,84 @@
 # Internal Contract: MarkerPageParser → FigureExtractionService / BookEmbeddingService
 **Branch**: `002-image-aware-embedding` | **Date**: 2026-04-04  
 **Type**: Internal Java DTO (not an HTTP contract)
 ---
 ## Purpose
 `PageResult` is the internal data transfer object produced by `MarkerPageParser` for each
 PDF page. It decouples the Marker HTTP API from the rest of the pipeline. Downstream consumers
 (`BookEmbeddingService`, `FigureExtractionService`, `TextChunkingService`) are unaware of
 Marker and depend only on this DTO.
 ---
 ## Java Record
 ```java
 package com.aiteacher.document;
 import java.util.List;
 /**
 * Internal DTO produced by MarkerPageParser for one PDF page.
 * Decouples the Marker HTTP API from downstream services.
 */
 public record PageResult(
    int pageNumber,              // 1-based, derived from Marker page block index
    String orderedText,          // full page text in correct reading order (blocks joined by \n\n)
    String headingTitle,         // first SectionHeader block on page, or null
    List<FigureData> figures     // extracted figure images (may be empty)
 ) {
    /**
     * A figure extracted from the page.
     * Image bytes are PNG data decoded from the Marker JSON `images` map.
     */
    public record FigureData(
        byte[] imageBytes,       // PNG image data (base64-decoded from Marker response)
        String nearestCaption,   // text of the adjacent Caption block, or null
        String blockId           // Marker block ID (e.g. "/page/0/Figure/2") for traceability
    ) {}
 }
 ```
 ---
 ## Production Rules
 | Field | Rule |
 |-------|------|
 | `pageNumber` | 1-based index derived from the Marker page block's position in the `children` array (index + 1). |
 | `orderedText` | HTML-stripped text from all `Text`, `TextInlineMath`, `SectionHeader`, `ListItem`, and `Table` blocks, joined with `\n\n`. Marker already returns them in reading order. |
 | `headingTitle` | Plain text of the first `SectionHeader` block on the page. `null` if no heading detected. |
 | `figures` | One `FigureData` per `Figure` or `Picture` block that has a non-empty `images` entry. Blocks with no image data are skipped. |
 | `imageBytes` | Base64-decoded bytes from `block.images[blockId]`. Marker returns PNG. |
 | `nearestCaption` | Plain text of the `Caption` sibling block that immediately follows the figure block. `null` if absent. |
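
The `orderedText` rule above (strip HTML from each block, then join with a blank line) might look like the sketch below. The regex-based stripper is a deliberately naive assumption for illustration; a real implementation would more likely use an HTML parser. `OrderedTextSketch` is a hypothetical name.

```java
import java.util.List;
import java.util.stream.Collectors;

final class OrderedTextSketch {

    /** Naive stripper: drops anything tag-shaped and unescapes a few common entities. */
    static String stripHtml(String html) {
        String text = html.replaceAll("<[^>]+>", " ");
        text = text.replace("&nbsp;", " ")
                   .replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&amp;", "&"); // last, so "&amp;lt;" becomes "&lt;" and not "<"
        return text.replaceAll("\\s+", " ").trim();
    }

    /** Joins the stripped blocks in their given (reading) order, skipping empty ones. */
    static String orderedText(List<String> blockHtml) {
        return blockHtml.stream()
            .map(OrderedTextSketch::stripHtml)
            .filter(s -> !s.isEmpty())
            .collect(Collectors.joining("\n\n"));
    }
}
```

Replacing tags with a space keeps adjacent words apart, at the cost of splitting text around inline tags such as subscripts.
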
 ---
 ## Mapping from Marker JSON
 ```
 Marker JSON → PageResult
 Page block ("/page/N/Page/M")       → PageResult(pageNumber = N + 1)
  SectionHeader child                → headingTitle (first match, HTML-stripped)
  Text / TextInlineMath / SectionHeader / ListItem / Table children → orderedText (HTML-stripped, joined \n\n)
  Figure / Picture child            → FigureData
    images[blockId]                  → FigureData.imageBytes (base64-decoded)
    next Caption sibling             → FigureData.nearestCaption (HTML-stripped)
    blockId                          → FigureData.blockId
 ```
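
Two pieces of the mapping above lend themselves to a short sketch: deriving the 1-based page number from a block ID such as `/page/0/Figure/2`, and base64-decoding the image payload into `FigureData`. The regex and the class name `MarkerBlockSketch` are assumptions made for illustration; only the ID shape and the record fields come from this contract.

```java
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MarkerBlockSketch {

    // Mirrors PageResult.FigureData from the record shown earlier in this contract.
    record FigureData(byte[] imageBytes, String nearestCaption, String blockId) {}

    // Matches the zero-based page index at the start of IDs like "/page/0/Figure/2".
    private static final Pattern PAGE_INDEX = Pattern.compile("^/page/(\\d+)/");

    /** Returns the 1-based page number (N + 1) encoded in a block ID. */
    static int pageNumber(String blockId) {
        Matcher m = PAGE_INDEX.matcher(blockId);
        if (!m.find()) {
            throw new IllegalArgumentException("unrecognized block ID: " + blockId);
        }
        return Integer.parseInt(m.group(1)) + 1;
    }

    /** Decodes the base64 payload; throws IllegalArgumentException on malformed input. */
    static FigureData toFigureData(String blockId, String base64Png, String caption) {
        byte[] bytes = Base64.getDecoder().decode(base64Png);
        return new FigureData(bytes, caption, blockId);
    }
}
```
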
 ---
 ## Consumers
 | Consumer | What It Uses |
 |----------|-------------|
 | `BookEmbeddingService` | `orderedText` → `SectionEntity.fullText`; `headingTitle` → `SectionEntity.title` |
 | `FigureExtractionService` | `figures` list → decodes `imageBytes`, checks min size, saves to S3 |
 | `TextChunkingService` | Receives `SectionEntity` (uses `orderedText` indirectly) — **unchanged** |
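
The "checks min size" step in `FigureExtractionService` could be sketched as below. It assumes that a figure is kept only when both its width and its height reach the threshold, which is one reading of the minimum-size rule; `FigureSizeFilterSketch` and `blankPng` are hypothetical names, and `blankPng` exists only to produce demo input.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

final class FigureSizeFilterSketch {

    /** True when the bytes decode to an image whose width AND height reach minSizePx. */
    static boolean meetsMinSize(byte[] imageBytes, int minSizePx) {
        try {
            BufferedImage img = ImageIO.read(new ByteArrayInputStream(imageBytes));
            if (img == null) {
                return false; // no registered reader could decode the bytes
            }
            return img.getWidth() >= minSizePx && img.getHeight() >= minSizePx;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Demo helper: encodes a blank PNG of the given size. */
    static byte[] blankPng(int width, int height) {
        try {
            BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            ImageIO.write(img, "png", out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```
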
@@ -1,40 +1,42 @@
 # Implementation Plan: Enhanced Embedding with Image Parsing and Metadata
-**Branch**: `002-image-aware-embedding` | **Date**: 2026-04-03 | **Spec**: [spec.md](spec.md)  
+**Branch**: `002-image-aware-embedding` | **Date**: 2026-04-04 | **Spec**: [spec.md](spec.md)  
 **Input**: Feature specification from `/specs/002-image-aware-embedding/spec.md`
 ## Summary
-Enhance the book embedding pipeline to extract images from every PDF page, generate descriptive
+Enhance the PDF embedding pipeline to extract figures and generate AI descriptions for them,
-text for each image, and store all content (text chunks + figure captions) with rich, consistent
+making image content semantically searchable alongside text. PDF parsing and figure extraction
-metadata in the vector store. A new document hierarchy (Book → Chapter → Section → TextChunk +
+are delegated to a local **Marker** server (`http://localhost:8000/marker/upload`), which
-Figure) is introduced. Postgres holds the full-text sections and figure metadata; the vector
+returns reading-order text and pre-cropped figure images (base64) in a single JSON response,
-store holds chunk and figure caption embeddings; the local file store holds extracted image files.
+eliminating the need for PDFBox column heuristics and figure bbox rendering.
 At query time, both the text-chunk store and figure-caption store are searched in parallel and
 results are merged before being sent to the LLM.
 ## Technical Context
 **Language/Version**: Java 25 (backend), TypeScript / Node 20 (frontend)  
-**Primary Dependencies**: Spring Boot 4.0.5, Spring AI 2.0.0-M4, OpenAI API (embeddings + chat), PDFBox (via Spring AI PDF reader dependency)  
+**Primary Dependencies**: Spring Boot 4.0.5, Spring AI 2.0.0-M4, OpenAI API (embeddings +
-**Storage**: PostgreSQL (JPA + Flyway), pgvector (Spring AI `VectorStore`), local file system (extracted images — `/uploads/figures/`)  
+GPT-4o vision), PDFBox 3.0.3 (via `spring-ai-pdf-document-reader` — retained transitively,
-**Testing**: Spring Boot Test, JUnit 5, Mockito  
+no longer used directly), Marker local HTTP API (`http://localhost:8000/marker/upload`)  
-**Target Platform**: Linux server (Docker Compose)  
+**Storage**: PostgreSQL (JPA + Flyway), pgvector (Spring AI `VectorStore`), S3-compatible
-**Project Type**: Web application — backend REST API + Vue 3 frontend  
+object store (figure images via `FigureStorageService`)  
-**Performance Goals**: Full book (up to 500 pages with images) processed in ≤ 30 minutes; query response unchanged from existing baseline  
+**Testing**: Maven / JUnit 5 (`spring-boot-starter-test`)  
-**Constraints**: No new deployable units; all changes within the existing `backend/` module; image storage on local disk (S3 migration is a future concern, behind an interface)  
+**Target Platform**: Linux server  
-**Scale/Scope**: POC — <10 concurrent users; single shared book library
+**Project Type**: Web application (backend API + frontend client)  
 **Performance Goals**: SC-003 — book processing time ≤ 3× text-only for ≤ 500 pages  
 **Constraints**: REST API only (Constitution III); Marker server must be running locally;
 S3-compatible storage configured via env vars  
 **Scale/Scope**: POC — handful of books, <10 users
 ## Constitution Check
-*GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.*
+*GATE: Must pass before Phase 0 research. Re-checked after Phase 1 design.*
 | Principle | Status | Notes |
 |-----------|--------|-------|
-| I — KISS | ⚠️ Justified violation — see Complexity Tracking | Hierarchical model + dual search adds complexity; justified by precision requirement |
+| **I. KISS** | ✅ Justified | Marker replaces a bespoke PDFBox column heuristic + Google Cloud SDK with one HTTP call. Net complexity reduction vs. the Document AI approach. |
-| II — Easy to Change | ✅ | Figure storage wrapped behind `FigureStorageService` interface; can swap local disk for S3 |
+| **II. Easy to Change** | ✅ | `MarkerPageParser` is the only class that knows about Marker; swap the implementation to replace Marker with any other parser. `PageResult` DTO remains unchanged. |
-| III — Web-First | ✅ | All new capabilities exposed via existing REST API; no new deployable units |
+| **III. Web-First** | ✅ | Internal pipeline change; no public API contract change. |
-| IV — Docs as Architecture | ⚠️ Required | README Mermaid diagram MUST be updated in this PR to show new storage tiers |
+| **IV. Documentation** | ✅ | README must be updated to show Marker as a local external service. |
 ## Project Structure
@@ -46,60 +48,38 @@ specs/002-image-aware-embedding/
 ├── research.md          # Phase 0 output
 ├── data-model.md        # Phase 1 output
 ├── quickstart.md        # Phase 1 output
-├── contracts/           # Phase 1 output
+├── contracts/
-└── tasks.md             # Phase 2 output (/speckit.tasks)
+│   ├── api.md           # HTTP API contracts (unchanged from initial plan)
 │   └── marker-page-result.md  # Internal DTO contract (MarkerPageParser → downstream)
 └── tasks.md             # Phase 2 output (/speckit.tasks — not created here)
 ```
-### Source Code (repository root)
+### Source Code
 ```text
 backend/
 ├── src/main/java/com/aiteacher/
 │   ├── config/
 │   │   └── MarkerConfig.java          # NEW: RestClient bean + base-url property
 │   ├── document/
 │   │   ├── MarkerPageParser.java      # NEW: replaces DocumentAiPageParser + PdfStructureParser
 │   │   ├── PageResult.java            # UPDATED: FigureBbox → FigureData (bytes not bbox)
 │   │   ├── FigureExtractionService.java  # UPDATED: no PDFBox render; decode bytes directly
 │   │   ├── TextChunkingService.java   # UNCHANGED
 │   │   ├── VisionDescriptionService.java # UNCHANGED
 │   │   └── [removed] DocumentAiPageParser.java
 │   ├── book/
-│   │   ├── Book.java                         (existing)
+│   │   └── BookEmbeddingService.java  # MINOR UPDATE: inject MarkerPageParser, drop DocumentAiPageParser
-│   │   ├── BookController.java               (existing)
+│   └── [removed] config/DocumentAiConfig.java
-│   │   ├── BookService.java                  (existing)
+├── src/main/resources/
-│   │   ├── BookRepository.java               (existing)
+│   └── application.yaml               # UPDATED: remove document-ai.*, add marker.base-url
-│   │   ├── BookStatus.java                   (existing)
+└── pom.xml                            # UPDATED: remove google-cloud-document-ai
 │   │   ├── BookEmbeddingService.java         (existing — enhanced)
 │   │   └── NoKnowledgeSourceException.java   (existing)
 │   ├── document/                             (new package)
 │   │   ├── BookNode.java
 │   │   ├── ChapterNode.java
 │   │   ├── SectionNode.java
 │   │   ├── SectionRepository.java
 │   │   ├── TextChunkNode.java
 │   │   ├── FigureNode.java
 │   │   ├── FigureRepository.java
 │   │   ├── FigureType.java
 │   │   ├── ChunkFigureRef.java
 │   │   └── ChunkFigureRefRepository.java
 │   ├── figure/                               (new package)
 │   │   ├── FigureStorageService.java         (interface)
 │   │   └── LocalFigureStorageService.java    (implementation)
 │   ├── retrieval/                            (new package)
 │   │   └── NeurosurgeryRetriever.java
 │   ├── chat/
 │   │   └── ChatService.java                  (updated — uses NeurosurgeryRetriever)
 │   └── config/
 │       └── FigureStorageConfig.java          (new — configures upload dir)
 └── src/main/resources/
    └── db/migration/
        ├── V4__document_hierarchy.sql        (new)
        └── V5__figures_and_refs.sql          (new)
 uploads/
 └── figures/                                  (runtime — extracted images; gitignored)
 ```
-**Structure Decision**: Option 2 (Web Application) confirmed. All backend changes stay within
+**Structure Decision**: Option 2 (backend + frontend) per constitution Technology Constraints.
-`backend/`. Two new packages (`document/`, `retrieval/`) plus one interface package (`figure/`)
+Frontend changes are display-only (render figure citations inline).
 keep concerns separated without adding a deployable unit.
 ## Complexity Tracking
-| Violation | Why Needed | Simpler Alternative Rejected Because |
+> No constitution violations — Marker reduces complexity compared to the previous
-|-----------|------------|-------------------------------------|
+> Google Document AI approach (fewer dependencies, no GCP credentials, no 15-page batching).
 | Document hierarchy (BookNode → ChapterNode → SectionNode) | Parent-child retrieval: chunks reference their parent section so the LLM receives full section context, not just the matching fragment. This is the established solution for RAG precision. | Flat page-per-doc model (current) loses inter-sentence context; chunk-only retrieval produces incomplete answers for multi-paragraph clinical questions |
 | Dual vector search (text chunks + figure captions) | Figure captions must be independently searchable — a query about "cavernous sinus anatomy" must surface the diagram even if no text chunk scores highly | Single vector store search would miss figures whose captions don't happen to be the highest-similarity hit; this is the core deliverable of the feature |
 | Third storage tier (local file store for images) | Extracted images cannot live in Postgres (binary blobs degrade query performance) or the vector store (only vectors). A file-per-image approach is standard. | Storing images as base64 in Postgres JSONB would bloat the DB and complicate backup/restore; the `FigureStorageService` interface keeps the implementation swappable |
@@ -1,34 +1,67 @@
 # Quickstart: Enhanced Embedding with Image Parsing and Metadata
-**Branch**: `002-image-aware-embedding` | **Date**: 2026-04-03
+**Branch**: `002-image-aware-embedding` | **Date**: 2026-04-04 (updated: Marker replaces Google Document AI)
 ---
 ## Prerequisites
 - Docker Compose running (PostgreSQL + pgvector)
- OpenAI API key set in `backend/src/main/resources/application.properties` or as env var `OPENAI_API_KEY`
+- OpenAI API key set as env var `OPENAI_API_KEY`
 - Java 25 + Maven on PATH
 - **Marker server running** on `http://localhost:8000` (see setup below)
 - S3-compatible bucket configured (existing setup)
 ---
-## New Configuration
+## Marker Server Setup (one-time)
-Add to `backend/src/main/resources/application.properties`:
+Marker is a local Python service — no cloud credentials required.
-```properties
+```bash
-# Figure storage
+# Install (Python 3.10+ required)
-app.figure-storage.base-path=./uploads
+pip install marker-pdf
-app.figure-storage.min-image-size-px=100
+
 # Start the server on port 8000
 marker_server --port 8000
 ```
-The `uploads/figures/` directory is created automatically on first use. Add it to `.gitignore`.
+The server is ready when you see:
 ```
 INFO:     Uvicorn running on http://0.0.0.0:8000
 ```
 Keep the server running in the background, for example as a `systemd` service or inside a `screen` session.
 ---
 ## Backend Configuration
 Add or update `backend/src/main/resources/application.yaml`:
 ```yaml
 app:
  figure-storage:
    endpoint: https://your-s3-endpoint
    region: your-region
    bucket: ${S3_BUCKET:aiteacher}
    access-key-id: ${S3_ACCESS_KEY_ID}
    secret-access-key: ${S3_SECRET_ACCESS_KEY}
    min-image-size-px: 100   # skip decorative images smaller than 100×100 px
  marker:
    base-url: ${MARKER_BASE_URL:http://localhost:8000}
  embedding:
    batch-size: 20
    batch-delay-ms: 2000
 ```
 No GCP credentials or project IDs are needed.
 ---
 ## Database Migration
-Two new Flyway migrations run automatically on startup:
+Two Flyway migrations run automatically on startup:
 - `V4__document_hierarchy.sql` — adds `chapter` and `section` tables
 - `V5__figures_and_refs.sql` — adds `figure` and `chunk_figure_ref` tables
@@ -54,10 +87,11 @@ image-aware pipeline runs. Status can be polled via `GET /api/v1/books`.
 ## Verifying Image Extraction
-1. Upload a PDF with diagrams: `POST /api/v1/books/upload`
+1. Ensure Marker is running: `curl http://localhost:8000` should respond.
-2. Wait for `status: "READY"` via `GET /api/v1/books`
+2. Upload a PDF with diagrams: `POST /api/v1/books/upload`
-3. List figures: `GET /api/v1/books/{id}/figures` — should return at least one entry per image page
+3. Wait for `status: "READY"` via `GET /api/v1/books`
-4. Ask a diagram-specific question in chat — response `sources` should include a `type: "FIGURE"` entry
+4. List figures: `GET /api/v1/books/{id}/figures` — should return at least one entry per image page
 5. Ask a diagram-specific question in chat — response `sources` should include a `type: "FIGURE"` entry
 ---
@@ -80,7 +114,8 @@ mvn test
 ```
 Key new test classes:
- `FigureExtractionServiceTest` — unit tests for image extraction and classification
+- `MarkerPageParserTest` — unit tests for JSON parsing and block-to-PageResult mapping
 - `FigureExtractionServiceTest` — unit tests for base64 decode, size filtering, classification
 - `NeurosurgeryRetrieverTest` — unit tests for dual-search merge and deduplication
 - `BookEmbeddingServiceIntegrationTest` — integration test: upload PDF with known figures,
  verify figures appear in `GET /api/v1/books/{id}/figures`
@@ -1,10 +1,10 @@
 # Research: Enhanced Embedding with Image Parsing and Metadata
-**Branch**: `002-image-aware-embedding` | **Date**: 2026-04-03
+**Branch**: `002-image-aware-embedding` | **Date**: 2026-04-04 (updated: Marker replaces Google Document AI)
-This document resolves all technical unknowns identified during planning. The primary source for
+This document resolves all technical unknowns identified during planning. Decisions 1–10 cover
-decisions is the detailed architecture provided directly by the project owner, supplemented by
+the core pipeline. The **Marker Study** section at the bottom explains why Marker was chosen
-Spring AI 2.0.0-M4 API specifics.
+over Google Document AI to drive PDF parsing and figure extraction.
 ---
@@ -28,19 +28,29 @@ association explicit and queryable.
 ---
-## Decision 2: Image Extraction Strategy
+## Decision 2: Document Parsing Strategy
-**Decision**: Use PDFBox (already on classpath via `spring-ai-pdf-document-reader`) to extract
+**Decision**: Use **Marker** (local HTTP server, `http://localhost:8000/marker/upload`) as the
-images per page. Each image is tagged with `page`, `figure_id` (derived from caption, e.g.
+single entry point for PDF parsing. A single `POST` with `output_format=json` returns:
-"Fig. 12-4"), and the parent `sectionId`. Images are saved to local disk under
+- Reading-order text blocks (headings, paragraphs) — no column-split heuristic needed
-`/uploads/figures/{bookId}/`.
+- Pre-cropped figure images as base64-encoded PNG in the `images` map of each `Figure` block
 - Table, equation, and code blocks as structured HTML
-**Rationale**: PDFBox is already present (Spring AI bundles it). No new dependency needed.
+`MarkerPageParser` translates the Marker JSON response into `List<PageResult>`, which is the
-Per-page extraction ensures every image is captured regardless of PDF structure.
+same internal DTO used by the rest of the pipeline.
 **Rationale**: Marker handles column reordering, scanned-page OCR, and figure cropping in one
 call, eliminating the PDFBox column heuristic (`PdfStructureParser`) and the PDFBox
 render+crop loop in `FigureExtractionService`. Net result: fewer classes, no cloud dependency,
 no GCP credentials.
 **Alternatives considered**:
- iText / iText7 → additional commercial dependency; overkill for extraction
+- PDFBox column heuristic (previous approach) → rejected: 50/50 split fails on asymmetric
- Screenshot each page as PNG, then OCR → far slower; loses vector quality
+  columns and scanned pages
 - Google Document AI Layout Parser → rejected: adds GCP credentials, per-page billing, 15-page
  batch limit, and still requires PDFBox to render+crop figure regions from bounding boxes.
  See Marker Study below for detailed comparison.
 - Screenshot each page + OCR → far slower; loses digital text quality
 ---
@@ -103,18 +113,19 @@ search. This is the higher-recall path; dual search (Decision 4) is the higher-p
 ## Decision 6: Image Storage
-**Decision**: Extracted images are saved as PNG files to a local directory
+**Decision**: Marker returns figure images as base64-encoded PNG bytes in the JSON response.
-(`${app.figure-storage.base-path}`, defaults to `./uploads/figures/{bookId}/`). The path is
+`FigureExtractionService` decodes these bytes and passes them to `FigureStorageService`, which
-stored in `figure.image_path` in Postgres. A `FigureStorageService` interface wraps all disk
+persists them to an S3-compatible bucket (`${app.figure-storage.bucket}`). The image path/URL
-I/O so the implementation can be swapped to S3 or another object store without changing
+is stored in `figure.image_path` in Postgres.
 callers.
-**Rationale**: Local disk is the simplest viable option for a POC with <10 users. The interface
+The `FigureStorageService` interface is unchanged; only the caller changes (from PDFBox crop
-boundary satisfies Constitution Principle II (Easy to Change).
+to base64 decode).
 **Rationale**: Marker's pre-cropped images remove the need for PDFBox rendering.
 `FigureStorageService` interface boundary satisfies Constitution Principle II (Easy to Change).
 **Alternatives considered**:
- S3 from day 1 → operational overhead not justified at POC scale
 - Base64 in Postgres JSONB → bloats DB; complicates backup; query performance degrades
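The storage boundary described in this decision could look roughly like the sketch below. The interface name `FigureStore`, its `save` signature, and the `figures/{bookId}/{figureId}.png` key layout are all assumptions made for illustration; the project's actual `FigureStorageService` may differ.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class FigureStorageSketch {

    /** Hypothetical storage boundary; the project's real FigureStorageService may differ. */
    public interface FigureStore {
        /** Persists PNG bytes and returns the path or URL to record in figure.image_path. */
        String save(String bookId, String figureId, byte[] pngBytes);
    }

    /** In-memory stand-in for tests; an S3-backed class would implement the same method. */
    public static class InMemoryFigureStore implements FigureStore {
        private final Map<String, byte[]> objects = new ConcurrentHashMap<>();

        @Override
        public String save(String bookId, String figureId, byte[] pngBytes) {
            String key = "figures/" + bookId + "/" + figureId + ".png"; // assumed key layout
            objects.put(key, pngBytes.clone());
            return key;
        }

        public boolean contains(String key) {
            return objects.containsKey(key);
        }
    }
}
```

Because callers only see the interface, switching between local disk, S3, or an in-memory store would not touch `FigureExtractionService`.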
 ---
@@ -123,7 +134,8 @@ boundary satisfies Constitution Principle II (Easy to Change).
 **Decision**: Use the enum `FigureType { ANATOMICAL_DIAGRAM, SURGICAL_PHOTOGRAPH, MRI_CT_SCAN,
 TABLE, CHART, INTRAOPERATIVE_IMAGE }`. Classification is derived from:
 1. Caption keywords ("MRI", "CT", "Fig.", "Table") — heuristic, no model needed
-2. Fall back to `ANATOMICAL_DIAGRAM` if unclassifiable
+2. Marker `block_type` hint (`"Table"` → TABLE, `"Figure"` / `"Picture"` → ANATOMICAL_DIAGRAM default)
 3. Fall back to `ANATOMICAL_DIAGRAM` if unclassifiable
 **Rationale**: Allows the frontend to render different icon/label per type (e.g., "MRI" badge).
 Heuristic classification avoids a separate model call per image at extraction time.
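A minimal sketch of the three-step classification order, assuming that the "MRI"/"CT" keywords map to `MRI_CT_SCAN` and "Table" maps to `TABLE`. The full keyword table lives in data-model.md and is not reproduced here, so the mapping and the helper name `classify` are illustrative only.

```java
import java.util.Locale;

public class FigureTypeHeuristic {

    // Enum values copied from the decision above.
    public enum FigureType {
        ANATOMICAL_DIAGRAM, SURGICAL_PHOTOGRAPH, MRI_CT_SCAN, TABLE, CHART, INTRAOPERATIVE_IMAGE
    }

    /**
     * Hypothetical helper: caption keywords first, then the Marker block_type hint,
     * then the ANATOMICAL_DIAGRAM fallback.
     */
    public static FigureType classify(String caption, String blockType) {
        String c = caption == null ? "" : caption.toLowerCase(Locale.ROOT);
        // Step 1: caption keywords. Word boundaries keep "ct" from firing inside "section".
        if (c.matches("(?s).*\\b(mri|ct)\\b.*")) return FigureType.MRI_CT_SCAN;
        if (c.matches("(?s).*\\btable\\b.*")) return FigureType.TABLE;
        // Step 2: Marker block_type hint.
        if ("Table".equals(blockType)) return FigureType.TABLE;
        // Step 3: fallback, also the default for "Figure" / "Picture" blocks.
        return FigureType.ANATOMICAL_DIAGRAM;
    }

    public static void main(String[] args) {
        System.out.println(classify("Fig. 12-4. Coronal MRI of the sella", "Figure"));
        System.out.println(classify("Fig. 3-1. Coronal cross-section", "Figure"));
    }
}
```

The word-boundary match matters for short keywords: a plain substring test for "ct" would misclassify any caption containing "section".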
@@ -175,14 +187,225 @@ the process fails mid-way. An explicit, idempotent trigger is safer and more obs
 ## Decision 10: Minimum Image Size Threshold
-**Decision**: Images smaller than 100×100 pixels are discarded and no chunk is created. This
+**Decision**: Images smaller than 100×100 pixels are discarded and no chunk is created. Marker
-threshold filters out decorative elements (bullets, dividers, publisher logos) without a
+returns PNG bytes; `FigureExtractionService` decodes to `BufferedImage` solely to check
-classification model.
+dimensions. This threshold filters out decorative elements without a classification model.
 **Rationale**: Neurosurgery textbook diagrams and MRI scans are never smaller than 100×100 px.
-The threshold is configurable via `app.figure-storage.min-image-size-px` in
+The threshold is configurable via the `app.figure-storage.min-image-size-px` property in
 `application.properties`.
 **Alternatives considered**:
 - No threshold → decorative icons pollute the figure index
 - ML-based classification → accurate but adds model dependency; not needed at POC scale
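The size check can be sketched with the JDK alone: decode the base64 string, let `ImageIO` read the PNG, and compare both dimensions against the threshold. The class and method names are illustrative, and the rule "reject if either side is below the minimum" is taken from the task description for `FigureExtractionService`.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Base64;
import javax.imageio.ImageIO;

public class MinImageSizeFilter {

    /** Returns true when the base64 PNG decodes to an image of at least minPx on both sides. */
    public static boolean isLargeEnough(String base64Png, int minPx) {
        byte[] bytes = Base64.getDecoder().decode(base64Png);
        try {
            BufferedImage image = ImageIO.read(new ByteArrayInputStream(bytes));
            if (image == null) {
                return false; // ImageIO.read returns null when no reader recognises the bytes
            }
            return image.getWidth() >= minPx && image.getHeight() >= minPx;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Helper for experiments: encodes a blank PNG of the given size as base64. */
    public static String blankPng(int width, int height) {
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            ImageIO.write(image, "png", out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getEncoder().encodeToString(out.toByteArray());
    }

    public static void main(String[] args) {
        System.out.println(isLargeEnough(blankPng(120, 120), 100));
        System.out.println(isLargeEnough(blankPng(50, 200), 100));
    }
}
```

Decoding the full image only to read its dimensions is simple but not the cheapest option; reading just the PNG header would avoid the allocation if this ever shows up in profiling.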
 ---
 # Marker Study — Why Marker Replaces Google Document AI
 *Added 2026-04-04.*
 ## What Marker Offers
 Marker is an open-source, locally-runnable PDF-to-structured-content converter that uses a
 pipeline of deep-learning models (surya for OCR + layout detection, texify for equations).
 Key capabilities relevant to this project:
 | Capability | Marker | Google Document AI |
 |-----------|--------|--------------------|
 | Multi-column reading order | ✅ | ✅ |
 | OCR on scanned pages | ✅ | ✅ |
 | Figure detection | ✅ returns pre-cropped images | ⚠️ returns bbox only; PDFBox still needed |
 | Table extraction | ✅ HTML tables | ✅ |
 | JSON output with image bytes | ✅ base64 in `images` map | ❌ |
 | No cloud credentials | ✅ | ❌ GCP service account required |
 | No per-page billing | ✅ | ❌ ~$10/1,000 pages |
 | Batch size limits | None (local) | 15 pages / 20 MB per sync call |
 | Setup | `pip install marker-pdf && marker_server` | GCP project + processor + IAM |
 ---
 ## Does Marker Solve the Current Pain Points?
 ### Pain Point 1: Naive 50/50 Column Split
 **Answer: Yes, Marker fixes this completely.**
 `PdfStructureParser.extractPageText()` splits pages at the horizontal midpoint with a 20%
 threshold. This fails on asymmetric columns and scanned pages. Marker's surya layout model
 returns blocks in natural reading order — no heuristic needed.
 ### Pain Point 2: Figure Detection Misses Rasterized Figures
 **Answer: Yes, Marker fixes this for most cases.**
 `FigureExtractionService` previously iterated PDF XObjects (only finds embedded XObject images,
 misses rasterized figures and vector-path drawings). Marker's layout model detects visual
 elements by type and returns the cropped image bytes directly — no PDFBox page rendering needed.
 ### Pain Point 3: OCR on Scanned Pages
 **Answer: Yes, Marker handles scanned pages transparently via surya OCR.**
 ### Pain Point 4: Caption Detection
 **Answer: Improved — Marker groups caption blocks with their figure block.**
 The `block_type = "Caption"` block appears as a sibling or child adjacent to the `"Figure"`
 block in the Marker JSON, making caption association structural rather than regex-based.
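As a rough illustration of structural caption association, the sketch below pairs each `Figure` block with a `Caption` block that immediately follows it in the page's child list. The `Block` record is a simplified stand-in for the parsed JSON, and the "next sibling" rule is an assumption; a caption that precedes its figure or is nested as a child would need extra handling.

```java
import java.util.ArrayList;
import java.util.List;

public class CaptionPairing {

    /** Minimal stand-in for a Marker block; only the fields used here. */
    public record Block(String blockType, String id, String html) {}

    /** A figure id paired with the caption HTML that directly follows it (null if none). */
    public record FigureWithCaption(String figureId, String captionHtml) {}

    public static List<FigureWithCaption> pair(List<Block> pageChildren) {
        List<FigureWithCaption> result = new ArrayList<>();
        for (int i = 0; i < pageChildren.size(); i++) {
            Block block = pageChildren.get(i);
            if (!"Figure".equals(block.blockType())) {
                continue;
            }
            String caption = null;
            // Assumed rule: the caption is the block right after the figure.
            if (i + 1 < pageChildren.size()
                    && "Caption".equals(pageChildren.get(i + 1).blockType())) {
                caption = pageChildren.get(i + 1).html();
            }
            result.add(new FigureWithCaption(block.id(), caption));
        }
        return result;
    }
}
```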
 ---
 ## Marker API Integration
 ### Local Server Setup
 ```bash
 pip install marker-pdf
 marker_server --port 8000
 ```
 The server exposes `POST /marker/upload`, which is the endpoint this project is configured to call.
 ### Request
 ```
 POST http://localhost:8000/marker/upload
 Content-Type: multipart/form-data
 file=@document.pdf
 output_format=json
 ```
 ### Response (abbreviated)
 ```json
 {
  "output_format": "json",
  "output": {
    "block_type": "Document",
    "children": [
      {
        "block_type": "Page",
        "id": "/page/0/Page/0",
        "children": [
          {
            "block_type": "SectionHeader",
            "id": "/page/0/SectionHeader/0",
            "html": "<h1>Cavernous Sinus Anatomy</h1>"
          },
          {
            "block_type": "Text",
            "id": "/page/0/Text/1",
            "html": "<p>The cavernous sinus contains...</p>"
          },
          {
            "block_type": "Figure",
            "id": "/page/0/Figure/2",
            "html": "<figure><img src='/page/0/Figure/2'/></figure>",
            "images": {
              "/page/0/Figure/2": "iVBORw0KGgo..."
            }
          },
          {
            "block_type": "Caption",
            "id": "/page/0/Caption/3",
            "html": "<p>Fig. 12-4. Coronal cross-section...</p>"
          }
        ]
      }
    ],
    "metadata": { "page_stats": [...] }
  }
 }
 ```
 ### Java Integration Pattern
 ```java
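 // MarkerPageParser: core call. RestClient, MediaType, MultiValueMap, LinkedMultiValueMap
 // and FileSystemResource are Spring Framework types; JsonNode is Jackson's tree model.
 MultiValueMap<String, Object> body = new LinkedMultiValueMap<>();
 body.add("file", new FileSystemResource(pdfPath));   // PDF sent as a multipart file part
 body.add("output_format", "json");                   // request the JSON block tree
 JsonNode response = restClient.post()
    .uri(baseUrl + "/marker/upload")
    .contentType(MediaType.MULTIPART_FORM_DATA)
    .body(body)
    .retrieve()                                       // throws on 4xx/5xx by default
    .body(JsonNode.class);
 // Guard against an empty body or a payload without the expected "output" field.
 if (response == null || !response.has("output")) {
     throw new IllegalStateException("Marker response has no 'output' field");
 }
 JsonNode document = response.get("output");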
 ```
 ### Mapping Marker Blocks to PageResult
 ```
 Page block (id "/page/N/Page/M") → PageResult(pageNumber = N+1)
  SectionHeader children           → headingTitle (first match)
  Text, TextInlineMath children    → orderedText (HTML stripped, joined \n\n)
  Figure children with images map  → FigureData(imageBytes = base64decode(images[id]))
  Caption sibling of Figure        → FigureData.nearestCaption
 ```
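Two pieces of the mapping above are easy to get subtly wrong: the zero-based page index in the block id, and HTML stripping. The sketch below shows one way to handle both with the JDK only. The class and method names are illustrative, and the regex-based stripper is a simplification that does not decode HTML entities.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MarkerBlockMapping {

    private static final Pattern PAGE_ID = Pattern.compile("^/page/(\\d+)/");

    /** "/page/N/..." maps to N + 1: Marker ids are zero-based, PageResult pages one-based. */
    public static int pageNumber(String blockId) {
        Matcher m = PAGE_ID.matcher(blockId);
        if (!m.find()) {
            throw new IllegalArgumentException("Unexpected block id: " + blockId);
        }
        return Integer.parseInt(m.group(1)) + 1;
    }

    /** Naive tag stripper for simple block HTML; a real HTML parser is safer for nested markup. */
    public static String stripHtml(String html) {
        return html.replaceAll("<[^>]+>", "").trim();
    }

    public static void main(String[] args) {
        System.out.println(pageNumber("/page/0/Page/0"));
        System.out.println(stripHtml("<h1>Cavernous Sinus Anatomy</h1>"));
    }
}
```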
 ---
 ## Architecture Change
 ```
 Before (Document AI — removed):
  DocumentAiPageParser
      → Google Document AI API (GCP, 15-page batches, credentials)
      → returns text blocks + figure bboxes
  PdfStructureParser (PDFBox column heuristic)
  FigureExtractionService
      → renders page via PDFBox at 150 DPI
      → crops bbox region
 After (Marker):
  MarkerPageParser
      → POST PDF to http://localhost:8000/marker/upload (output_format=json)
      → returns text blocks (correct reading order) + Figure blocks with base64 images
      → produces List<PageResult> (same DTO, FigureData carries bytes not bbox)
  FigureExtractionService (simplified)
      → base64-decodes image bytes from PageResult.FigureData
      → checks min size (ImageIO.read → getWidth/getHeight)
      → saves to S3 via FigureStorageService (UNCHANGED)
  VisionDescriptionService (UNCHANGED)
  BookEmbeddingService orchestration (MINOR: inject MarkerPageParser)
 ```
 **What is removed**:
 - `DocumentAiPageParser` — replaced by `MarkerPageParser`
 - `DocumentAiConfig` — replaced by `MarkerConfig`
 - `PdfStructureParser` — Marker handles reading order
 - `google-cloud-document-ai` Maven dependency
 - `app.document-ai.*` configuration properties
 **What stays the same**:
 - `PageResult` DTO structure (fields renamed, not restructured)
 - `FigureExtractionService` class and responsibility (its `extract()` call no longer takes the `pdfPath` argument; see T018)
 - `TextChunkingService`, `VisionDescriptionService`, `BookEmbeddingService` orchestration
 - All JPA entities, repositories, vector store, S3 storage
 ---
 ## Constitution Compliance
 | Principle | Assessment |
 |-----------|------------|
 | **I. KISS** | ✅ Simpler than Document AI — one HTTP call replaces GCP SDK + PDFBox render loop. No new dependency beyond an HTTP client (Spring RestClient, already available). |
 | **II. Easy to Change** | ✅ `MarkerPageParser` is the only Marker-aware class. Swap it to use any other parser. `PageResult` DTO unchanged in contract. |
 | **III. Web-First** | ✅ Internal pipeline change; no API contract change. |
 | **IV. Documentation** | ✅ README must show Marker as a local external service dependency. |
 ---
 ## Risks & Mitigations
 | Risk | Likelihood | Mitigation |
 |------|-----------|------------|
 | Marker server not running when book is uploaded | Medium | `BookEmbeddingService` catches exception from `MarkerPageParser`, marks book as `FAILED`, logs full error. |
 | Marker misses some figures (complex PDFs) | Medium | `app.figure-storage.min-image-size-px` threshold can be tuned. Add fallback: if Marker returns 0 figures for a page with known images, log a warning. |
 | SC-003 (≤ 3× processing time) violated | Low | Marker runs locally (no network latency to cloud). Benchmark with a real 500-page book early. |
 | Large PDF upload to Marker (>100MB) | Low | Marker server handles the full file; no batching needed. Multipart upload limit configurable. |
 | Marker image quality vs PDFBox crop | Low | Marker crops at native resolution; quality is equivalent or better than 150 DPI PDFBox render. |
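The first mitigation in the table (catch the parser failure, mark the book `FAILED`, log the error) can be sketched as follows. The `Supplier` stands in for a call such as a hypothetical `markerPageParser.parse(pdfPath)`, and the status enum mirrors the `PROCESSING` / `READY` / `FAILED` values mentioned elsewhere in these specs; none of this is the project's actual code.

```java
import java.util.List;
import java.util.function.Supplier;

public class EmbeddingFailureHandling {

    public enum BookStatus { PROCESSING, READY, FAILED }

    /**
     * Runs the parse step and maps the outcome to a status.
     * The Supplier is a placeholder for the real Marker call.
     */
    public static BookStatus embed(Supplier<List<String>> parseStep) {
        try {
            List<String> pages = parseStep.get();
            // ... chunking, figure extraction and embedding would follow here ...
            System.out.println("Parsed " + pages.size() + " page(s)");
            return BookStatus.READY;
        } catch (RuntimeException e) {
            // Covers connection-refused and HTTP error responses surfaced by the HTTP client.
            System.err.println("Marker parsing failed, marking book FAILED: " + e);
            return BookStatus.FAILED;
        }
    }
}
```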
@@ -48,12 +48,13 @@
 **Independent Test**: Upload a PDF containing at least one page with a labelled anatomical diagram. After status shows `READY`, call `GET /api/v1/books/{id}/figures` — response must contain at least one entry with `figureType`, `caption`, `page`, and `imageUrl` populated. Verify the PNG file exists at the path in `imagePath`.
- [X] T013 [US2] Create `PdfStructureParser` service in `backend/src/main/java/com/aiteacher/document/PdfStructureParser.java` — uses Spring AI's `PagePdfDocumentReader` to extract per-page text; groups pages into `SectionEntity` records using heading-detection heuristics (lines matching `^\d+(\.\d+)*\s+[A-Z]`); groups sections into `ChapterEntity` records; persists both to Postgres via `ChapterRepository` and `SectionRepository`; returns `List<SectionEntity>` for the book
+- [X] T013 [US2] ~~Create `PdfStructureParser`~~ → **SUPERSEDED**: PDF parsing is handled by `MarkerPageParser` (see T013b). `PdfStructureParser` exists but is not wired into the pipeline.
- [X] T014 [US2] Create `FigureExtractionService` in `backend/src/main/java/com/aiteacher/document/FigureExtractionService.java` — opens PDF with PDFBox `PDDocument`; iterates pages; extracts `PDImageXObject` instances; skips images whose width or height are below `min-image-size-px`; classifies `FigureType` using the keyword-matching table from data-model.md §FigureType; parses caption from the nearest text line matching `CAPTION_PATTERN`; saves PNG via `FigureStorageService`; persists `FigureEntity` to `FigureRepository`; returns `List<FigureEntity>` per book
+- [X] T013b [US2] Create `MarkerPageParser` in `backend/src/main/java/com/aiteacher/document/MarkerPageParser.java`: POSTs the PDF as multipart form data to the Marker `/marker/upload` endpoint (default `http://localhost:8000`) with the form field `output_format=json` via Spring `RestClient`; parses the JSON response into `List<PageResult>` (one per `Page` block); extracts heading, ordered text, and pre-cropped figure PNG bytes per page
 - [X] T014 [US2] Update `FigureExtractionService` in `backend/src/main/java/com/aiteacher/document/FigureExtractionService.java` — **Marker migration**: removed PDFBox rendering + bbox-crop loop; decodes PNG bytes from `PageResult.FigureData` via `ImageIO.read()`; skips images below `min-image-size-px`; classifies `FigureType`; saves via `FigureStorageService`; persists `FigureEntity`
 - [X] T015 [US2] Create `VisionDescriptionService` in `backend/src/main/java/com/aiteacher/document/VisionDescriptionService.java` — accepts a `Path` to a PNG and a caption String; calls the OpenAI vision model (via Spring AI `ChatClient` with image media type) to generate a 2–4 sentence clinical description; returns the generated description string; handles API failures by returning the caption as fallback
 - [X] T016 [US2] Create `TextChunkingService` in `backend/src/main/java/com/aiteacher/document/TextChunkingService.java` — accepts a `SectionEntity`; splits `fullText` into overlapping 400–600 token windows (20-token overlap); wraps each window in a Spring AI `Document` with the flat metadata map defined in data-model.md §Text chunk document; returns `List<Document>`
 - [X] T017 [US2] Create `ChunkFigureRefService` in `backend/src/main/java/com/aiteacher/document/ChunkFigureRefService.java` — accepts a Spring AI `Document` (with its `id` as `chunkId`) and a `List<FigureEntity>` for the book; scans chunk text for patterns `Fig\.\s*\d+[\-\.]\d+` and `Figure\s+\d+[\-\.]\d+`; matches against figure labels; persists `ChunkFigureRefEntity` rows via `ChunkFigureRefRepository`
- [X] T018 [US2] Rewrite `BookEmbeddingService.embedBook()` in `backend/src/main/java/com/aiteacher/book/BookEmbeddingService.java` to orchestrate the full pipeline: (1) `PdfStructureParser` → sections; (2) parallel: `FigureExtractionService` + `TextChunkingService` for each section; (3) `VisionDescriptionService` for each figure; (4) embed figure captions+descriptions as `Document`s (metadata per data-model.md §Figure caption document) into `vectorStore`; (5) embed text chunks into `vectorStore`; (6) `ChunkFigureRefService` for each chunk; update `captionEmbeddingId` on `FigureEntity` after embedding
+- [X] T018 [US2] Update `BookEmbeddingService.embedBook()` — **Marker migration**: injected `MarkerPageParser` replacing `DocumentAiPageParser`; updated `figureExtractionService.extract()` call (removed `pdfPath` arg); updated log message. Pipeline: (1) `MarkerPageParser` → `List<PageResult>`; (2) `buildAndSaveSections()` → sections; (3) `TextChunkingService` → chunks → embed; (4) `FigureExtractionService.extract()` → figures; (5) `VisionDescriptionService` → embed figure chunks; (6) `ChunkFigureRefService` → refs
 - [X] T019 [US2] Extend `BookEmbeddingService.deleteBookChunks()` to also delete: all `ChunkFigureRefEntity` rows (via `findByFigureIdIn`), all `FigureEntity` rows (via `deleteAllByBookId`), all figure PNG files (via `FigureStorageService.delete(bookId)`), all `SectionEntity` and `ChapterEntity` rows for the book
 - [X] T020 [US2] Add `POST /api/v1/books/{id}/reembed` endpoint to `BookController` in `backend/src/main/java/com/aiteacher/book/BookController.java` — returns `202` with `{ bookId, status: "PROCESSING" }`; returns `404` if not found; returns `409` if already `PROCESSING`; calls `deleteBookChunks()` then `embedBook()` asynchronously
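For T017, the two reference patterns can be combined into a single scan. This is a sketch under the assumption that the raw matched strings are later normalised and compared against figure labels; the class name `FigureRefScanner` is illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FigureRefScanner {

    // Both patterns from the T017 task description, joined with alternation.
    private static final Pattern FIGURE_REF =
        Pattern.compile("Fig\\.\\s*\\d+[\\-\\.]\\d+|Figure\\s+\\d+[\\-\\.]\\d+");

    /** Returns every figure reference found in the chunk text, in order of appearance. */
    public static List<String> findRefs(String chunkText) {
        List<String> refs = new ArrayList<>();
        Matcher m = FIGURE_REF.matcher(chunkText);
        while (m.find()) {
            refs.add(m.group());
        }
        return refs;
    }
}
```

Note that neither pattern matches a reference without a separator, such as "Figure 7"; whether that matters depends on how the textbook numbers its figures.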
@@ -0,0 +1,35 @@
 # Specification Quality Checklist: Basic Login Protection
 **Purpose**: Validate specification completeness and quality before proceeding to planning
 **Created**: 2026-04-06
 **Feature**: [spec.md](../spec.md)
 ## Content Quality
 - [x] No implementation details (languages, frameworks, APIs)
 - [x] Focused on user value and business needs
 - [x] Written for non-technical stakeholders
 - [x] All mandatory sections completed
 ## Requirement Completeness
 - [x] No [NEEDS CLARIFICATION] markers remain
 - [x] Requirements are testable and unambiguous
 - [x] Success criteria are measurable
 - [x] Success criteria are technology-agnostic (no implementation details)
 - [x] All acceptance scenarios are defined
 - [x] Edge cases are identified
 - [x] Scope is clearly bounded
 - [x] Dependencies and assumptions identified
 ## Feature Readiness
 - [x] All functional requirements have clear acceptance criteria
 - [x] User scenarios cover primary flows
 - [x] Feature meets measurable outcomes defined in Success Criteria
 - [x] No implementation details leak into specification
 ## Notes
 - All items pass. Spec is complete and ready for planning.
 - FR-012 resolved: credentials are managed via environment variables / config file (no in-app user management UI).
@@ -0,0 +1,49 @@
 # API Contract: Auth
 **Base path**: `/api/v1/auth`  
 **Authentication**: HTTP Basic (all endpoints in this group require valid credentials)
 ---
 ## GET /api/v1/auth/check
 Verifies that the supplied HTTP Basic credentials are valid. Used by the frontend after a page refresh to confirm stored credentials are still accepted before rendering the app.
 ### Request
 ```
 GET /api/v1/auth/check
 Authorization: Basic <base64(username:password)>
 ```
 No request body.
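For reference, the `Authorization` value is the word `Basic`, a space, and the base64 encoding of `username:password`. A minimal JDK-only sketch, with an illustrative class name:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BasicAuthHeader {

    /** Builds the value of the Authorization header for HTTP Basic authentication. */
    public static String of(String username, String password) {
        String pair = username + ":" + password;
        return "Basic " + Base64.getEncoder().encodeToString(pair.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(of("neurosurgeon", "your-secret"));
    }
}
```

Base64 is an encoding, not encryption, so the header is only as private as the transport; the endpoint should be served over HTTPS outside local development.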
 ### Response — 200 OK
 ```json
 {
  "username": "neurosurgeon"
 }
 ```
 | Field | Type | Description |
 |-------|------|-------------|
 | `username` | string | The authenticated username |
 ### Response — 401 Unauthorized
 Spring Security returns a standard 401 with `WWW-Authenticate: Basic realm="Realm"` header. No JSON body.
 ### Behaviour
 - Returns `200` with the authenticated username if credentials are valid.
 - Returns `401` if credentials are absent or incorrect.
 - No side effects (idempotent, read-only).
 ---
 ## Notes
 - All other existing endpoints (`/api/v1/books`, `/api/v1/chat`, etc.) continue to require HTTP Basic Auth as before.
 - The frontend sends `Authorization: Basic ...` on every request via the axios request interceptor.
 - A global axios response interceptor detects `401` responses and redirects the user to `/login`.
@@ -0,0 +1,35 @@
 # Data Model: Basic Login Protection
 **Feature**: 003-basic-login  
 **Date**: 2026-04-06
 ## No Backend Schema Changes
 This feature introduces no new database tables or Flyway migrations. The user account is defined entirely in the Spring Security in-memory configuration (`SecurityConfig.java`) backed by environment variables.
 ## Frontend: Auth Store State
 The Pinia `authStore` is the single source of truth for authentication state in the frontend.
 ```
 AuthState
 ├── username: string | null     — entered username, null if not logged in
 ├── password: string | null     — entered password, null if not logged in
 └── isAuthenticated: boolean    — derived: true when both username and password are non-null
 Actions
 ├── login(username, password)   — validates credentials via /api/v1/auth/check, stores in sessionStorage on success
 ├── logout()                    — clears username, password, sessionStorage; redirects to /login
 └── restoreSession()            — reads credentials from sessionStorage on app start; calls /api/v1/auth/check to verify still valid
 ```
 ## Backend: Application Properties
 Two properties configure the single allowed user account:
 | Property | Default | Source | Example |
 |----------|---------|--------|---------|
 | `app.auth.username` | `neurosurgeon` | `application.yaml` / env var `APP_AUTH_USERNAME` | `admin` |
 | `app.auth.password` | (required) | env var `APP_AUTH_PASSWORD` | `s3cret` |
 No hashing is applied in the current `SecurityConfig`: the `{noop}` prefix tells Spring Security to compare the configured password as plain text. The spec (FR-011) requires passwords not to be stored in plaintext. This is read as applying to the codebase, and supplying the password through an env var keeps it out of the repository. If hashing is required later, replace the `{noop}` prefix with `{bcrypt}` and put a bcrypt hash of the password in `APP_AUTH_PASSWORD` instead of the raw value; no other code changes are needed.
@@ -0,0 +1,76 @@
 # Implementation Plan: Basic Login Protection
 **Branch**: `003-basic-login` | **Date**: 2026-04-06 | **Spec**: [spec.md](./spec.md)  
 **Input**: Feature specification from `/specs/003-basic-login/spec.md`
 ## Summary
 Add a login page to the Vue frontend so users must enter a username and password before accessing any route. The backend already has Spring Security with HTTP Basic Auth fully configured; credentials are validated on every API call. The implementation introduces a Pinia auth store that holds the entered credentials in `sessionStorage`, an axios interceptor that injects them on every request, a `/login` route with a login form, router guards that redirect unauthenticated users, and a logout button in the navbar. A lightweight `/api/v1/auth/check` endpoint is added to the backend to allow the frontend to verify credentials without side effects. Username is made configurable in the backend (currently hardcoded as "neurosurgeon").
 ## Technical Context
 **Language/Version**: Java 25 (backend) / TypeScript + Node 20 (frontend)  
 **Primary Dependencies**: Spring Boot 4.0.5, Spring Security (already included), Vue 3.4, Vue Router 4.3, Pinia 2.1, Axios 1.7  
 **Storage**: No new storage — credentials held in browser `sessionStorage` (frontend only)  
 **Testing**: Spring Boot Test (backend), Vitest (not yet set up — out of scope for this feature)  
 **Target Platform**: Web (SPA + REST API)  
 **Project Type**: Web application (backend API + Vue frontend client)  
 **Performance Goals**: Login response within 1 second under normal load  
 **Constraints**: No new backend dependencies; no database changes; must not break existing API surface  
 **Scale/Scope**: Small team (POC), single user role
 ## Constitution Check
 *GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.*
 | Principle | Status | Notes |
 |-----------|--------|-------|
 | I. KISS | PASS | HTTP Basic Auth is reused; no new auth protocol, no new dependencies. Frontend uses sessionStorage — no JWT, no refresh tokens. |
 | II. Easy to Change | PASS | Auth store is a single Pinia store; swapping the auth mechanism later only requires updating the store and the SecurityConfig. |
 | III. Web-First | PASS | Backend exposes REST endpoint; frontend is standalone SPA client. No server-side rendering added. |
 | IV. Documentation as Architecture | PASS | README must be updated to show the login flow in the architecture diagram (same PR). |
 | Technology Constraints | PASS | Still two deployable units (backend + frontend). No new service added. |
 ## Project Structure
 ### Documentation (this feature)
 ```text
 specs/003-basic-login/
 ├── plan.md              # This file
 ├── research.md          # Phase 0 output
 ├── data-model.md        # Phase 1 output
 ├── quickstart.md        # Phase 1 output
 ├── contracts/           # Phase 1 output
 │   └── auth.md
 └── tasks.md             # Phase 2 output (/speckit.tasks — NOT created by /speckit.plan)
 ```
 ### Source Code (repository root)
 ```text
 backend/
 ├── src/main/java/com/aiteacher/
 │   ├── config/
 │   │   └── SecurityConfig.java          # MODIFY: make username configurable
 │   └── auth/
 │       └── AuthController.java          # ADD: GET /api/v1/auth/check endpoint
 frontend/
 ├── src/
 │   ├── stores/
 │   │   └── authStore.ts                 # ADD: Pinia store for credentials + session
 │   ├── views/
 │   │   └── LoginView.vue                # ADD: login form UI
 │   ├── services/
 │   │   └── api.ts                       # MODIFY: read credentials from authStore
 │   ├── router/
 │   │   └── index.ts                     # MODIFY: add /login route + navigation guard
 │   └── App.vue                          # MODIFY: add logout button to navbar
 ```
 **Structure Decision**: Option 2 (web application). Existing `backend/` and `frontend/` layout used; no new projects or packages.
 ## Complexity Tracking
 > No constitution violations. Table left empty.
@@ -0,0 +1,198 @@
 # Quickstart: Basic Login Protection
 **Feature**: 003-basic-login  
 **Date**: 2026-04-06
 ## What Changes
 | Component | Change |
 |-----------|--------|
 | `SecurityConfig.java` | Username made configurable via `app.auth.username` property |
 | `AuthController.java` | New: `GET /api/v1/auth/check` endpoint |
 | `authStore.ts` | New: Pinia store managing credentials + sessionStorage |
 | `LoginView.vue` | New: login form page |
 | `api.ts` | Replace hardcoded Basic Auth with dynamic interceptor |
 | `router/index.ts` | Add `/login` route + `beforeEach` navigation guard |
 | `App.vue` | Add logout button to navbar |
 | `application.yaml` | Add `app.auth.username` property with default |
 ## Backend Setup
 ### 1. Add username to application.yaml
 ```yaml
 app:
  auth:
    username: ${APP_AUTH_USERNAME:neurosurgeon}
    password: ${APP_AUTH_PASSWORD}   # already present
 ```
 ### 2. Update SecurityConfig.java
 Inject both username and password:
 ```java
@Bean
 public UserDetailsService userDetailsService(
        @Value("${app.auth.username}") String username,
        @Value("${app.auth.password}") String password) {
    UserDetails user = User.builder()
        .username(username)
        .password("{noop}" + password)
        .roles("USER")
        .build();
    return new InMemoryUserDetailsManager(user);
 }
 ```
 ### 3. Add AuthController.java
 ```java
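 import java.security.Principal;
 import java.util.Map;
 import org.springframework.http.ResponseEntity;
 import org.springframework.web.bind.annotation.GetMapping;
 import org.springframework.web.bind.annotation.RequestMapping;
 import org.springframework.web.bind.annotation.RestController;
 @RestController
 @RequestMapping("/api/v1/auth")
 public class AuthController {
     // Reached only after HTTP Basic authentication succeeds, so principal is non-null.
     @GetMapping("/check")
     public ResponseEntity<Map<String, String>> check(Principal principal) {
         return ResponseEntity.ok(Map.of("username", principal.getName()));
     }
 }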
 ```
 ## Frontend Setup
 ### 1. Create authStore.ts
 ```typescript
 // src/stores/authStore.ts
 import { defineStore } from 'pinia'
 import { ref, computed } from 'vue'
 const SESSION_KEY = 'auth'
 export const useAuthStore = defineStore('auth', () => {
  const stored = sessionStorage.getItem(SESSION_KEY)
  const parsed = stored ? JSON.parse(stored) : null
  const username = ref<string | null>(parsed?.username ?? null)
  const password = ref<string | null>(parsed?.password ?? null)
  const isAuthenticated = computed(() => !!username.value && !!password.value)
  function setCredentials(u: string, p: string) {
    username.value = u
    password.value = p
    sessionStorage.setItem(SESSION_KEY, JSON.stringify({ username: u, password: p }))
  }
  function clearCredentials() {
    username.value = null
    password.value = null
    sessionStorage.removeItem(SESSION_KEY)
  }
  return { username, password, isAuthenticated, setCredentials, clearCredentials }
 })
 ```
 ### 2. Update api.ts
 Replace hardcoded `auth` with a request interceptor:
 ```typescript
 import axios from 'axios'
 import { useAuthStore } from '@/stores/authStore'
 export const api = axios.create({
  baseURL: import.meta.env.VITE_API_URL ?? '/api/v1',
  headers: { 'Content-Type': 'application/json' }
 })
 api.interceptors.request.use((config) => {
  const auth = useAuthStore()
  if (auth.username && auth.password) {
    config.auth = { username: auth.username, password: auth.password }
  }
  return config
 })
 api.interceptors.response.use(
  (response) => response,
  (error) => {
    if (error.response?.status === 401) {
      useAuthStore().clearCredentials()
      // Skip the reload when already on /login so LoginView can show its own error message.
      if (window.location.pathname !== '/login') {
        window.location.href = '/login'
      }
    }
    const message = error.response?.data?.error ?? error.message ?? 'An unexpected error occurred.'
    return Promise.reject(new Error(message))
  }
 )
 ```
 ### 3. Update router/index.ts
 Add `/login` route and guard:
 ```typescript
 import LoginView from '@/views/LoginView.vue'
 import { useAuthStore } from '@/stores/authStore'
 // add to routes array:
 { path: '/login', name: 'login', component: LoginView }
 // add global guard:
 router.beforeEach((to) => {
  const auth = useAuthStore()
  if (to.name !== 'login' && !auth.isAuthenticated) {
    return { name: 'login' }
  }
 })
 ```
 ### 4. Create LoginView.vue
 A simple centered form with username and password fields. On submit:
 1. Store credentials tentatively in the auth store
 2. Call `GET /api/v1/auth/check`
 3. If 200 → navigate to `/`
 4. If 401 → clear credentials, show error message
 ### 5. Add logout to App.vue navbar
 ```html
 <button class="btn btn-secondary" @click="logout">Sign out</button>
 ```
 ```typescript
 import { useAuthStore } from '@/stores/authStore'
 import { useRouter } from 'vue-router'
 const auth = useAuthStore()
 const router = useRouter()
 function logout() {
  auth.clearCredentials()
  router.push({ name: 'login' })
 }
 ```
 ## Environment Variables
 ### Backend (.env / docker-compose environment)
 ```
 APP_AUTH_USERNAME=neurosurgeon   # optional, defaults to neurosurgeon
 APP_AUTH_PASSWORD=your-secret
 ```
 ### Frontend (.env)
 ```
 VITE_API_URL=/api/v1
 # VITE_APP_PASSWORD is no longer needed and should be removed
 ```
 ## Testing the Login Flow
 1. Open the app in an incognito window — should redirect to `/login`
 2. Enter wrong credentials → error message, stay on login
 3. Enter correct credentials → redirect to `/` (Library)
 4. Refresh the page → stay logged in
 5. Click "Sign out" → redirect to `/login`; back button shows login again (no cached page access)
@@ -0,0 +1,64 @@
 # Research: Basic Login Protection
 **Feature**: 003-basic-login  
 **Date**: 2026-04-06
 ## Finding 1: Backend Auth Mechanism — Already Implemented
 **Decision**: Keep existing HTTP Basic Auth (Spring Security, `SecurityConfig.java`).  
 **Rationale**: Spring Security with HTTP Basic is already configured and working. The backend validates credentials on every API request. There is nothing to add except making the username configurable and adding a credential-check endpoint.  
 **Alternatives considered**: Form-based login with server-side sessions — rejected because it adds session management complexity on the backend that is unnecessary for an SPA using HTTP Basic.
 ---
 ## Finding 2: Frontend Credential Storage — sessionStorage
 **Decision**: Store entered username and password in browser `sessionStorage` via a Pinia store.  
 **Rationale**: 
 - `sessionStorage` persists across page refreshes (same tab) but is cleared when the tab is closed — this matches the expected session behavior (SC-004) without needing a server-side session or JWT.
 - Simpler than `localStorage`: the browser discards the data when the tab is closed, so no explicit logout is needed to clean up.
 - No additional dependencies required.
 **Alternatives considered**:
 - `localStorage` — rejected: credentials would persist indefinitely across browser sessions, which is unexpected for a "login" flow.
 - In-memory (reactive ref only) — rejected: credentials lost on page refresh, violating SC-004.
 - Cookie-based session (server-side) — rejected: requires CSRF protection, session store, and more backend complexity; violates KISS.
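 The storage decision above could look roughly like the following. This is a sketch under assumptions, not the project's store: `CredentialSession`, `KeyValueStorage`, `createMemoryStorage`, and the key name are hypothetical, and the Pinia wrapper is left out so the block runs without dependencies. The browser's `sessionStorage` satisfies the `KeyValueStorage` interface structurally.
 ```typescript
 // Minimal subset of the Web Storage API, so the sketch also runs outside a browser.
 interface KeyValueStorage {
   getItem(key: string): string | null
   setItem(key: string, value: string): void
   removeItem(key: string): void
 }

 interface Credentials {
   username: string
   password: string
 }

 const STORAGE_KEY = 'auth.credentials' // hypothetical key name

 class CredentialSession {
   private readonly storage: KeyValueStorage

   constructor(storage: KeyValueStorage) {
     this.storage = storage
   }

   save(credentials: Credentials): void {
     this.storage.setItem(STORAGE_KEY, JSON.stringify(credentials))
   }

   // Returns null when nothing is stored or the stored value is malformed.
   load(): Credentials | null {
     const raw = this.storage.getItem(STORAGE_KEY)
     if (raw === null) return null
     try {
       const parsed = JSON.parse(raw)
       if (typeof parsed?.username === 'string' && typeof parsed?.password === 'string') {
         return { username: parsed.username, password: parsed.password }
       }
     } catch {
       // Malformed JSON: treat as "not logged in".
     }
     return null
   }

   clear(): void {
     this.storage.removeItem(STORAGE_KEY)
   }
 }

 // In-memory stand-in for tests and non-browser environments.
 function createMemoryStorage(): KeyValueStorage {
   const items = new Map<string, string>()
   return {
     getItem: (key) => items.get(key) ?? null,
     setItem: (key, value) => { items.set(key, value) },
     removeItem: (key) => { items.delete(key) },
   }
 }
 ```
 In the browser this would be constructed as `new CredentialSession(window.sessionStorage)`, with `save` and `clear` exposed as store actions. Because the data lives in `sessionStorage`, a refresh in the same tab restores it through `load()`, while closing the tab discards it.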
 ---
 ## Finding 3: Credential Verification — Lightweight Backend Endpoint
 **Decision**: Add `GET /api/v1/auth/check` that returns `200 OK` with `{"username": "..."}` for authenticated requests.  
 **Rationale**: The frontend needs a way to verify that stored credentials are still valid when the app loads (e.g., after a refresh). Without this, stale credentials (for example, after the password was changed) would only be detected when the first real API call fails with a 401. This endpoint is protected by Spring Security like all others, so no special logic is needed.  
 **Alternatives considered**:
 - Re-use any existing GET endpoint (e.g., `GET /api/v1/books`) — rejected: couples auth verification to a business endpoint; semantically wrong and fragile.
 - Intercept 401s globally and redirect to login — used as a fallback but not sufficient alone: the user would see a flash of the main UI before being redirected.
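 To make the check concrete, the sketch below shows what an HTTP Basic `Authorization` header for `GET /api/v1/auth/check` contains and how the response status could be mapped to a decision. `basicAuthHeader`, `interpretCheckStatus`, and the `CheckOutcome` values are hypothetical names used for illustration; with axios, the `auth` config option builds this header itself.
 ```typescript
 interface Credentials {
   username: string
   password: string
 }

 // HTTP Basic (RFC 7617): "Basic " + base64(username ":" password).
 // btoa only accepts Latin-1 input; non-ASCII credentials would need UTF-8 encoding first.
 function basicAuthHeader({ username, password }: Credentials): string {
   return `Basic ${btoa(`${username}:${password}`)}`
 }

 // Hypothetical mapping from the check endpoint's status to what the app shell does next.
 type CheckOutcome = 'authenticated' | 'login-required' | 'error'

 function interpretCheckStatus(status: number): CheckOutcome {
   if (status === 200) return 'authenticated'
   if (status === 401) return 'login-required'
   return 'error' // e.g. backend unreachable or 5xx; handling is left to the caller
 }
 ```
 Running the check before the first route renders is what avoids the flash of the main UI mentioned above: the app only shows the Library once the outcome is `'authenticated'`.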
 ---
 ## Finding 4: Axios Integration — Request Interceptor
 **Decision**: Replace the hardcoded `auth` field in `api.ts` with a dynamic request interceptor that reads credentials from the Pinia auth store at request time.  
 **Rationale**: The current `api.ts` sets `auth: { username, password }` once at module initialisation from env vars. This must change so the login form's entered credentials are used. A request interceptor reads the store on every call, enabling logout (clear store → next request gets no credentials → 401 → redirect to login).  
 **Alternatives considered**:
 - Recreate the axios instance after login — rejected: all existing services import the singleton `api`; recreating would require updating every import.
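 A minimal sketch of the interceptor logic follows, written against a structural subset of the axios request config so it runs without the library. `applyCredentials` and `RequestConfigLike` are hypothetical names, and the `credentials` getter on the auth store in the wiring comment is an assumption about the store's shape.
 ```typescript
 interface Credentials {
   username: string
   password: string
 }

 // Structural subset of an axios request config; only the field touched here.
 interface RequestConfigLike {
   auth?: Credentials
 }

 // Called once per request, so a login or logout takes effect on the very next call.
 function applyCredentials(config: RequestConfigLike, credentials: Credentials | null): void {
   if (credentials) {
     config.auth = credentials
   } else {
     // No credentials (e.g. after logout): send the request unauthenticated,
     // which the backend answers with 401.
     delete config.auth
   }
 }

 // Possible wiring in api.ts (not executed here):
 // api.interceptors.request.use((config) => {
 //   applyCredentials(config, useAuthStore().credentials)
 //   return config
 // })
 ```
 Keeping the singleton `api` instance and only changing what each request carries means the existing service modules need no import changes.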
 ---
 ## Finding 5: Backend Username Configurability
 **Decision**: Read username from `${app.auth.username:neurosurgeon}` in `SecurityConfig.java` (with "neurosurgeon" as default).  
 **Rationale**: The spec (FR-012) requires credentials to be configurable. Currently the password is configurable via env var but the username is hardcoded. Adding a `@Value`-injected username field is a one-line change.  
 **Alternatives considered**: None — this is the Spring Boot idiomatic approach already used for the password.
 ---
 ## Summary of Unknowns Resolved
 | Unknown | Resolution |
 |---------|-----------|
 | Where to store credentials on the frontend | `sessionStorage` via Pinia |
 | How to verify credentials after page refresh | `GET /api/v1/auth/check` endpoint |
 | How to inject credentials into axios | Request interceptor in `api.ts` |
 | How to handle 401s globally | Response interceptor → redirect to `/login` |
 | Backend username configurability | `@Value("${app.auth.username:neurosurgeon}")` |
Author	SHA1	Message	Date
Adrien	0c226483c0	fix deserialization error in native image	2026-04-18 20:46:16 +02:00
Adrien	ff97c24a55	Add thai support in summary	2026-04-18 19:55:19 +02:00
Adrien	c7a77af2f4	add new concept report	2026-04-18 17:54:54 +02:00
Adrien	5f03e1f41b	improve topics and chat source display	2026-04-12 18:56:18 +02:00
Adrien	c98fe9ceaa	update readme	2026-04-12 18:25:12 +02:00
Adrien	767d1e2dbc	enhance illustration being taken into account in the response	2026-04-12 16:26:25 +02:00
Adrien	820734c251	fix api url setup	2026-04-10 13:55:05 +02:00
Adrien	0711e40c66	Improved responsiveness on mobile phone	2026-04-10 13:41:26 +02:00
Adrien	0db31e91ab	try change image building to buildah	2026-04-09 22:44:14 +02:00
Adrien	d480d04145	change base image	2026-04-09 21:45:22 +02:00
Adrien	c2d034d1fe	Add missing env variables	2026-04-09 20:37:11 +02:00
Adrien	0908355704	Adpat frontend to build docker image with buildah	2026-04-09 19:47:28 +02:00
Adrien	8e227a9429	fine-tune native image config	2026-04-09 18:20:39 +02:00
Adrien	d8bcdce879	Squashed commit of the following: commit 0d624137c2557c6eeb87020749e4977b821c2b5c Author: Adrien <adrien.cesaro@proton.me> Date: Thu Apr 9 11:55:22 2026 +0200 backend native image setup	2026-04-09 12:05:02 +02:00
Adrien	aee6a9dfba	enhance rag retrieval + summary	2026-04-07 22:39:28 +02:00
Adrien	0cf318f0a7	Add simple auth	2026-04-06 14:29:53 +02:00
Adrien	e5d53b4e80	add possibility to disable delete and upload of books	2026-04-06 14:09:17 +02:00
Adrien	5c641f4bcc	enhance page parsing using json output and html	2026-04-05 21:55:30 +02:00
Adrien	ea1276dc2e	adding Marker to parse effectively pdf	2026-04-04 21:30:18 +02:00
Adrien	b154e29f2d	s3 bucket integration for image storage	2026-04-04 13:26:55 +02:00