add new concept report
@@ -35,11 +35,13 @@ graph TD
EP3["Vision describe → embed caption"]
EP4["Chunk text → embed chunks"]
EP5["Link chunks ↔ figures"]
EP6["LLM enrich chunk\n(entities, facet, summary)\n→ chunk_metadata"]
EP1 --> EP2
EP1 --> EP4
EP2 --> EP3
EP4 --> EP5
EP3 --> EP5
EP4 --> EP6
end

subgraph "Retrieval Pipeline (per chat query)"
@@ -65,6 +67,50 @@ graph TD
end
```

### Concept Retrieval Pipeline (per concept report)

Concept retrieval is an alternative to the semantic-similarity flow above. Rather than ranking
chunks by embedding similarity to a query, it uses the LLM-tagged `chunk_metadata` rows written
at indexing time to exhaustively gather every chunk that *concerns* a concept (e.g. "aneurysm"),
bucketed by facet. One LLM synthesis call per non-empty facet then produces one section of a
structured, multi-section report.
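
To make the "gather everything, then bucket by facet" step concrete, here is a minimal in-memory sketch. It assumes each `chunk_metadata` row carries a chunk id, a set of already-canonicalised entity strings and a single facet label. `ConceptGrouping`, `ChunkMeta`, `canonicalize` and `groupByFacet` are hypothetical names, and lower-casing as the canonical form is an assumption, not the project's confirmed rule.

```java
import java.util.*;

// Hypothetical sketch: ChunkMeta, canonicalize and groupByFacet are illustrative
// names, not the project's real classes. It mimics "WHERE entities @> [canonical]"
// in memory and then buckets the matching chunks by facet.
final class ConceptGrouping {

    // Assumed shape of one chunk_metadata row written by the LLM enrichment step.
    record ChunkMeta(String chunkId, Set<String> entities, String facet) {}

    // Assumption: entities are stored trimmed and lower-cased, so the concept is
    // normalised the same way before matching.
    static String canonicalize(String concept) {
        return concept.trim().toLowerCase(Locale.ROOT);
    }

    // Keeps every row whose entity set contains the canonical concept (no top-K cut-off)
    // and groups the chunk ids by facet, preserving first-seen facet order.
    static Map<String, List<String>> groupByFacet(String concept, List<ChunkMeta> rows) {
        String canonical = canonicalize(concept);
        Map<String, List<String>> byFacet = new LinkedHashMap<>();
        for (ChunkMeta row : rows) {
            if (row.entities().contains(canonical)) {
                byFacet.computeIfAbsent(row.facet(), f -> new ArrayList<>()).add(row.chunkId());
            }
        }
        return byFacet;
    }
}
```

Here is an example. Given rows tagged `{aneurysm, stroke}`/pathology, `{aneurysm}`/treatment and `{stroke}`/pathology, `groupByFacet("Aneurysm", rows)` keeps the first two rows and returns two buckets. That means two synthesis calls.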

```mermaid
sequenceDiagram
    participant User
    participant FE as Frontend
    participant BE as Backend (ConceptReportService)
    participant Retr as ConceptRetriever
    participant DB as chunk_metadata (GIN)
    participant Vec as vector_store
    participant LLM

    User->>FE: Click "Generate Concept Report" on topic
    FE->>BE: POST /api/v1/topics/{id}/concept-reports
    loop per READY book
        BE->>Retr: retrieveByConcept(topicName, bookId)
        Retr->>DB: WHERE entities @> [canonical]
        alt SQL hits found
            DB-->>Retr: chunks grouped by facet
        else no match (typo / synonym)
            Retr->>Vec: similaritySearch topK=30
            Vec-->>Retr: chunk ids
            Retr->>DB: findByChunkIdIn → group by facet
        end
    end
    BE->>BE: merge facets across books, assign global [S#]/[F#]
    loop per non-empty facet
        BE->>LLM: synthesize facet section (focused prompt)
        LLM-->>BE: facet markdown
    end
    BE->>BE: persist concept_report
    BE-->>FE: { facets[], sources[] }
    FE->>User: render facet-labelled report + inline figures
```
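
The per-book `alt`/`else` branch and the cross-book numbering step can be sketched as follows. This is a minimal sketch under stated assumptions, not `ConceptRetriever`'s actual code:

- The three lookup functions stand in for the `chunk_metadata` entity query, the `vector_store` search (topK=30 in the diagram) and `findByChunkIdIn`.
- `ConceptReportSketch`, `Chunk`, `Merged` and `mergeAndLabel` are hypothetical names.
- Keying sources and figures by `bookId/id` is an assumption.

```java
import java.util.*;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical sketch of the alt/else retrieval branch and the cross-book merge.
// The lookup functions replace the real chunk_metadata query and vector_store search.
final class ConceptReportSketch {

    // A retrieved chunk plus the figures linked to it at indexing time (assumed shape).
    record Chunk(String bookId, String chunkId, String facet, List<String> figureIds) {}

    // Exact entity match first; only if it returns nothing (typo, synonym) fall back to
    // vector similarity, then re-read metadata for the returned chunk ids.
    static List<Chunk> retrieveByConcept(
            String concept,
            String bookId,
            BiFunction<String, String, List<Chunk>> entityLookup,
            BiFunction<String, String, List<String>> vectorSearch,
            Function<List<String>, List<Chunk>> metadataByIds) {
        List<Chunk> hits = entityLookup.apply(concept, bookId);
        if (!hits.isEmpty()) {
            return hits;
        }
        return metadataByIds.apply(vectorSearch.apply(concept, bookId));
    }

    // Facet -> source labels, plus the global source and figure label tables.
    record Merged(Map<String, List<String>> facets,
                  Map<String, String> sourceLabels,
                  Map<String, String> figureLabels) {}

    // Buckets every book's chunks by facet and numbers sources [S1], [S2], ... and
    // figures [F1], [F2], ... once across all books. A repeated chunk is skipped and a
    // repeated figure keeps its first label.
    static Merged mergeAndLabel(List<List<Chunk>> perBook) {
        Map<String, List<String>> facets = new LinkedHashMap<>();
        Map<String, String> sources = new LinkedHashMap<>();
        Map<String, String> figures = new LinkedHashMap<>();
        for (List<Chunk> book : perBook) {
            for (Chunk c : book) {
                String key = c.bookId() + "/" + c.chunkId();
                if (sources.containsKey(key)) {
                    continue; // already cited under its first label
                }
                String label = "[S" + (sources.size() + 1) + "]";
                sources.put(key, label);
                facets.computeIfAbsent(c.facet(), f -> new ArrayList<>()).add(label);
                for (String fig : c.figureIds()) {
                    figures.putIfAbsent(c.bookId() + "/" + fig, "[F" + (figures.size() + 1) + "]");
                }
            }
        }
        return new Merged(facets, sources, figures);
    }
}
```

Numbering once over the merged result, rather than per book, is what keeps `[S#]`/`[F#]` labels unambiguous when several facet sections of the same report cite chunks from different books.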

Backfill path for already-embedded books:
`POST /api/v1/admin/books/{id}/enrich` scans `vector_store` for TEXT chunks that have no
`chunk_metadata` row yet and enriches them in place. Because only chunks without metadata are
selected, the endpoint is idempotent: re-running it after a completed backfill is a no-op.
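
The idempotency follows from the selection rule alone. Here is a minimal sketch:

- In-memory maps stand in for `vector_store` and `chunk_metadata`.
- A function stands in for the LLM enrichment call.
- `BackfillSketch`, `enrichMissing` and the `FIGURE` chunk type are hypothetical names, not the endpoint's real implementation.

```java
import java.util.*;
import java.util.function.Function;

// Hypothetical sketch of the idempotent backfill; the maps stand in for the
// vector_store and chunk_metadata tables.
final class BackfillSketch {

    // FIGURE is an assumed second chunk type; only TEXT chunks are enriched.
    enum ChunkType { TEXT, FIGURE }

    // Enriches every TEXT chunk that has no metadata yet and returns how many were
    // enriched on this run. Chunks that already have metadata are never touched.
    static int enrichMissing(Map<String, ChunkType> vectorStore,
                             Map<String, String> chunkMetadata,
                             Function<String, String> enrich) {
        int enriched = 0;
        for (Map.Entry<String, ChunkType> e : vectorStore.entrySet()) {
            String id = e.getKey();
            if (e.getValue() == ChunkType.TEXT && !chunkMetadata.containsKey(id)) {
                chunkMetadata.put(id, enrich.apply(id)); // LLM call in the real pipeline
                enriched++;
            }
        }
        return enriched;
    }
}
```

In this sketch, a second run over the same data enriches nothing. A run interrupted halfway simply picks up the rows that are still missing the next time it is called.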

## Marker API Response Structure

The PDF parsing pipeline calls a local [Marker](https://github.com/VikParuchuri/marker) server (`POST /marker/upload`).
