OntoRAG is a practical approach to Retrieval-Augmented Generation where the ontology becomes the contract: a governed model of your domain, linked to evidence, and callable by agents through tools.
The goal is simple: answers that are traceable, consistent, and operational — not just plausible.
RAG is great at “find text and paraphrase it”, but it struggles when the domain has structure, constraints, and operational semantics.
Treat your ontology as the center of gravity: a shared, explicit model that both humans and agents can rely on.
An LLM proposes classes, attributes, and relations — then aligns them to what already exists. The output is reviewable and diff-friendly.
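A minimal sketch of what "diff-friendly" can mean in practice, assuming a hypothetical normalized text rendering of schema proposals (the class and attribute names here are illustrative, not part of OntoRAG):

```python
import difflib

def render_schema(schema: dict) -> list[str]:
    """Serialize a schema dict to sorted, one-fact-per-line text,
    so that any two versions can be compared as a plain diff."""
    lines = []
    for cls in sorted(schema.get("classes", []), key=lambda c: c["name"]):
        lines.append(f"class {cls['name']}")
        for attr in sorted(cls.get("attributes", [])):
            lines.append(f"  attr {attr}")
    return lines

# Current ontology vs. an LLM-proposed extension (illustrative names).
current = {"classes": [{"name": "Pump", "attributes": ["serial_number"]}]}
proposed = {"classes": [{"name": "Pump",
                         "attributes": ["serial_number", "max_flow_rate"]}]}

diff = list(difflib.unified_diff(render_schema(current),
                                 render_schema(proposed),
                                 fromfile="ontology/current",
                                 tofile="ontology/proposed",
                                 lineterm=""))
print("\n".join(diff))
```

Because the rendering is deterministic and sorted, a reviewer sees exactly one `+` line per proposed addition, which is what makes the proposal reviewable rather than a blob regeneration.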
Entities and relations are extracted from documents with “where it came from” metadata: page, paragraph, offsets, snippets.
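One way to carry that provenance is to attach it to every extracted fact as structured data. The field and class names below are assumptions for illustration, not a fixed OntoRAG schema:

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class Provenance:
    """'Where it came from' metadata for one extracted fact."""
    doc_id: str
    page: int
    paragraph: int
    char_start: int
    char_end: int
    snippet: str

@dataclass(frozen=True)
class ExtractedRelation:
    subject: str
    predicate: str
    obj: str
    provenance: Provenance

# Illustrative example: a relation that can always be traced back to evidence.
rel = ExtractedRelation(
    subject="ex:Pump42",
    predicate="ex:installedIn",
    obj="ex:PlantNorth",
    provenance=Provenance(
        doc_id="maintenance-manual-v3",
        page=12,
        paragraph=4,
        char_start=210,
        char_end=262,
        snippet="Pump 42 was installed in the North plant in 2019.",
    ),
)
print(asdict(rel)["provenance"]["page"])  # → 12
```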
Agents translate questions into semantic plans and execute them via SPARQL-backed tools. Writes become proposals unless explicitly confirmed.
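The read/write split can be sketched as two tiny functions: reads compile a semantic plan into SPARQL, while writes only ever return a proposal object unless explicitly confirmed. Everything here (plan shape, predicate names) is an illustrative assumption:

```python
def plan_to_sparql(plan: dict) -> str:
    """Compile a toy semantic plan into a SPARQL SELECT query."""
    var = plan["select"]
    patterns = " . ".join(f"?{s} <{p}> ?{o}" for s, p, o in plan["where"])
    return f"SELECT ?{var} WHERE {{ {patterns} }}"

def write_tool(triple: tuple, confirmed: bool = False) -> dict:
    """Writes become proposals unless explicitly confirmed."""
    status = "applied" if confirmed else "proposed"
    return {"status": status, "triple": triple}

plan = {"select": "pump",
        "where": [("pump", "ex:installedIn", "plant")]}
print(plan_to_sparql(plan))
print(write_tool(("ex:Pump42", "ex:status", "decommissioned")))
```

The point of the sketch is the default: a write call with no confirmation flag can never mutate the graph, only produce something a steward can review.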
OntoRAG uses MCP to let agents call a semantic API. The twist: the API can be generated from the ontology itself (entity descriptors, relation descriptors, command descriptors).
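A hedged sketch of that generation step, assuming a simple in-memory descriptor table (the descriptor format and tool naming convention are illustrative, not the MCP specification itself):

```python
# Illustrative entity descriptors, as they might be derived from the ontology.
ENTITY_DESCRIPTORS = {
    "Pump": {"attributes": {"serial_number": "string",
                            "max_flow_rate": "number"}},
}

def to_tool_descriptor(entity: str, desc: dict) -> dict:
    """Map one entity descriptor to an MCP-style tool descriptor,
    so the semantic API stays in sync with the governed model."""
    return {
        "name": f"find_{entity.lower()}",
        "description": f"Look up {entity} instances in the knowledge graph.",
        "inputSchema": {
            "type": "object",
            "properties": {a: {"type": t}
                           for a, t in desc["attributes"].items()},
        },
    }

tools = [to_tool_descriptor(e, d) for e, d in ENTITY_DESCRIPTORS.items()]
print(tools[0]["name"])  # → find_pump
```

The design choice worth noting: when the ontology changes, the tool surface changes with it, instead of drifting as a hand-maintained API.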
This is intentionally minimal: a thin layer between agent intent and a governed knowledge graph.
Is OntoRAG a product? Not (yet). Think of it as a reference architecture and an open toolkit that can evolve into product(s). The focus is on reproducible patterns: schema cards, provenance, semantic diffs, and tool-driven agents.
Do I need a triplestore from day one? No. For small/medium setups, two Turtle files (ontology + instances) can be enough. As volume grows, you can switch to a SPARQL backend (Blazegraph, QLever, etc.) without changing the mental model.
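The "without changing the mental model" claim can be sketched as one query interface with swappable backends. The class and method names are assumptions, and the bodies are stubs standing in for real implementations (e.g. rdflib in-process vs. an HTTP endpoint):

```python
from typing import Protocol

class GraphStore(Protocol):
    def select(self, sparql: str) -> list[dict]: ...

class TurtleFileStore:
    """Small setups: load ontology.ttl + instances.ttl into memory."""
    def __init__(self, ontology_path: str, instances_path: str):
        self.paths = (ontology_path, instances_path)
    def select(self, sparql: str) -> list[dict]:
        # Stub: a real version would parse the files and run the query
        # in-process (e.g. with rdflib).
        return []

class SparqlEndpointStore:
    """Larger setups: delegate to Blazegraph, QLever, etc. over HTTP."""
    def __init__(self, endpoint_url: str):
        self.endpoint_url = endpoint_url
    def select(self, sparql: str) -> list[dict]:
        # Stub: a real version would POST the query to the endpoint
        # and parse the SPARQL JSON results.
        return []

def answer(store: GraphStore, sparql: str) -> list[dict]:
    # Agents and tools code against GraphStore; the backend is a detail.
    return store.select(sparql)
```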
How does this relate to GraphRAG? It’s adjacent. OntoRAG is about making the ontology explicit and governed, and using it to generate reliable tool calls. Graph retrieval is part of it, but the emphasis is on semantic contracts and stewardship.
What is the biggest risk? Alignment quality. If extraction drifts or mappings are sloppy, the graph becomes noisy. OntoRAG assumes governance: review loops, confidence thresholds, and “propose, don’t overwrite”.
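Those three safeguards can be combined into a single triage gate. The thresholds below are illustrative assumptions, not recommended values:

```python
# Illustrative governance gate: extractions below a confidence floor are
# dropped, mid-confidence ones are queued for human review, and even
# high-confidence ones become proposals, never direct writes.
REJECT_BELOW = 0.5
AUTO_QUEUE_ABOVE = 0.9

def triage(candidate: dict) -> str:
    c = candidate["confidence"]
    if c < REJECT_BELOW:
        return "rejected"
    if c >= AUTO_QUEUE_ABOVE:
        return "proposed"  # still a proposal, never an overwrite
    return "needs_review"

print(triage({"triple": ("ex:Pump42", "ex:status", "ok"),
              "confidence": 0.7}))  # → needs_review
```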