
Recent improvements in local LLM infrastructure have enabled developers to build Retrieval-Augmented Generation (RAG) pipelines that run fully on-premise while maintaining strong separation between deterministic logic and generative reasoning. In this issue, we explore a category-aware RAG architecture implemented using ASP.NET Core MVC, PostgreSQL + pgvector, and Ollama. The system is designed for enterprise environments where privacy, traceability, and policy alignment are essential.
This implementation emphasizes three design principles:
- Deterministic Retrieval through pgvector
- Strict Context-Bound Reasoning through Ollama
- Transparent Governance through category-aware metadata filtering and inspectable context
The result is a predictable, auditable RAG workflow suitable for HR, Legal, Engineering, and Operations knowledge domains.
System Overview
The system’s workflow consists of three core stages:
- Content Ingestion (Title + Category + Content)
- Semantic Retrieval + Optional Category Filtering
- Context-Bound Generation
Users add internal documentation through a web UI. The server generates embeddings through Ollama and stores all metadata and vectors in PostgreSQL.
User queries are embedded and compared using pgvector. Results are filtered by similarity and optionally by category.
The retrieved documents form a structured context block. Ollama generates a response only using that context.
If retrieval relevance is insufficient, the system returns:
“I don’t know. No relevant data found.”
The architecture maintains clean separation between deterministic components (database, embeddings, retrieval) and probabilistic components (LLM reasoning).
The screenshot below shows the application interface for adding content and querying the RAG system.
Architecture Diagram
The following diagram illustrates the flow of the Category-Aware Local RAG system, showing how content is ingested, retrieved, and used for context-bound generation.
- ASP.NET MVC UI: User interface for adding content and querying.
- Document Controller: Handles content ingestion and query processing.
- Embedding Generator (Ollama): Generates embeddings for documents and queries.
- PostgreSQL + pgvector: Stores documents, categories, and embeddings for semantic retrieval.
- RAG Service: Builds context and orchestrates strict prompting.
- Ollama (LLM): Performs context-bound generation.
The overall architecture of the system is shown below:
Embedding and Storage Layer
Embedding Generator Interface
The system defines an abstraction for embedding generation, enabling future model swaps without breaking the pipeline:
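A minimal sketch of such an abstraction (the interface and method names here are illustrative assumptions, not necessarily the repository's exact API):

```csharp
// Hypothetical abstraction over embedding backends.
// Swapping Ollama for another provider only requires a new implementation.
public interface IEmbeddingGenerator
{
    // Returns a dense vector representation of the input text.
    Task<float[]> GenerateEmbeddingAsync(string text);
}
```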
Ollama Embedding Generator
Embeddings are generated via a call to the local Ollama runtime:
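A hedged sketch of an Ollama-backed implementation is shown below. Class and parameter names are assumptions; the call targets Ollama's `/api/embeddings` endpoint on its default port 11434, which returns a JSON body of the form `{ "embedding": [ ... ] }`.

```csharp
using System.Linq;
using System.Net.Http.Json;
using System.Text.Json;

public sealed class OllamaEmbeddingGenerator : IEmbeddingGenerator
{
    private readonly HttpClient _http;
    private readonly string _model;

    public OllamaEmbeddingGenerator(HttpClient http, string model = "nomic-embed-text")
    {
        _http = http;
        _model = model;
    }

    public async Task<float[]> GenerateEmbeddingAsync(string text)
    {
        // POST /api/embeddings with { model, prompt } returns { "embedding": [...] }.
        var response = await _http.PostAsJsonAsync(
            "http://localhost:11434/api/embeddings",
            new { model = _model, prompt = text });
        response.EnsureSuccessStatusCode();

        using var doc = JsonDocument.Parse(await response.Content.ReadAsStringAsync());
        return doc.RootElement.GetProperty("embedding")
                  .EnumerateArray()
                  .Select(e => e.GetSingle())
                  .ToArray();
    }
}
```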
This design keeps embedding generation local, deterministic, and private.
Ollama models can be swapped in without architectural change: nomic-embed-text for embeddings, or chat models such as mistral and llama3:instruct for generation.
Storage and Semantic Retrieval (PostgreSQL + pgvector)
Table Schema
A typical pgvector-enabled table looks like this:
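A plausible schema sketch, assuming 768-dimensional embeddings (the size produced by nomic-embed-text); the table and column names are illustrative:

```sql
-- The vector dimension must match the configured embedding model.
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
    id        BIGSERIAL PRIMARY KEY,
    title     TEXT NOT NULL,
    category  TEXT NOT NULL,
    content   TEXT NOT NULL,
    embedding VECTOR(768) NOT NULL
);

-- Optional approximate index for large corpora; without it,
-- pgvector performs an exact (fully deterministic) scan.
CREATE INDEX ON documents USING ivfflat (embedding vector_l2_ops) WITH (lists = 100);
```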
Storing Documents
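A sketch of the ingestion path, assuming a hypothetical `DocumentStore` class backed by Npgsql and the Pgvector NuGet package, writing into a `documents` table with title, category, content, and a vector column:

```csharp
using Npgsql;
using Pgvector;

public sealed class DocumentStore
{
    private readonly NpgsqlDataSource _db;
    private readonly IEmbeddingGenerator _embedder;

    public DocumentStore(NpgsqlDataSource db, IEmbeddingGenerator embedder)
    {
        _db = db;
        _embedder = embedder;
    }

    public async Task AddAsync(string title, string category, string content)
    {
        // Embed once at ingestion time so each query only pays for its own embedding.
        var embedding = new Vector(await _embedder.GenerateEmbeddingAsync(content));

        await using var cmd = _db.CreateCommand(
            "INSERT INTO documents (title, category, content, embedding) " +
            "VALUES ($1, $2, $3, $4)");
        cmd.Parameters.AddWithValue(title);
        cmd.Parameters.AddWithValue(category);
        cmd.Parameters.AddWithValue(content);
        cmd.Parameters.AddWithValue(embedding);
        await cmd.ExecuteNonQueryAsync();
    }
}
```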
Semantic Query + Category Filtering
The method below takes a query (optionally restricted to a category), generates its embedding, retrieves the five most similar documents via pgvector, and falls back to a default “no relevant context” result when nothing qualifies.
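A sketch consistent with that description, written as if it lived on the same store class; the method signature, record type, and fallback handling are assumptions:

```csharp
public sealed record RetrievedDocument(string Title, string Category, string Content);

public async Task<IReadOnlyList<RetrievedDocument>> SearchAsync(
    string query, string? category = null)
{
    var queryEmbedding = new Vector(await _embedder.GenerateEmbeddingAsync(query));

    // <-> is pgvector's L2 distance operator; smaller distance = more similar.
    var sql = category is null
        ? "SELECT title, category, content FROM documents " +
          "ORDER BY embedding <-> $1 LIMIT 5"
        : "SELECT title, category, content FROM documents " +
          "WHERE category = $2 ORDER BY embedding <-> $1 LIMIT 5";

    await using var cmd = _db.CreateCommand(sql);
    cmd.Parameters.AddWithValue(queryEmbedding);
    if (category is not null)
        cmd.Parameters.AddWithValue(category);

    var results = new List<RetrievedDocument>();
    await using var reader = await cmd.ExecuteReaderAsync();
    while (await reader.ReadAsync())
        results.Add(new RetrievedDocument(
            reader.GetString(0), reader.GetString(1), reader.GetString(2)));
    return results;
}
```

A similarity threshold (e.g. rejecting results beyond a maximum distance) can be layered on top; here the empty-result fallback is left to the caller.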
pgvector’s <-> operator computes L2 (Euclidean) distance, so the similarity ordering is fully deterministic.
RAG Service: Strict Context-Bound Generation
Context Packaging and Strict Prompt
The RAG service enforces grounded LLM behavior:
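A sketch of that service, calling Ollama's `/api/generate` endpoint with `stream: false` (which returns `{ "response": "..." }`); class, model, and prompt wording are assumptions:

```csharp
using System.Net.Http.Json;
using System.Text;
using System.Text.Json;

public sealed class RagService
{
    private readonly HttpClient _http;
    private readonly string _model;

    public RagService(HttpClient http, string model = "mistral")
    {
        _http = http;
        _model = model;
    }

    public async Task<string> AnswerAsync(
        string question, IReadOnlyList<RetrievedDocument> context)
    {
        // Short-circuit before touching the LLM when retrieval found nothing.
        if (context.Count == 0)
            return "I don’t know. No relevant data found.";

        var sb = new StringBuilder();
        sb.AppendLine("Answer ONLY from the context below. If the answer is not " +
                      "in the context, reply exactly: " +
                      "\"I don’t know. No relevant data found.\"");
        sb.AppendLine("--- CONTEXT ---");
        foreach (var doc in context)
            sb.AppendLine($"[{doc.Category}] {doc.Title}\n{doc.Content}\n");
        sb.AppendLine("--- QUESTION ---");
        sb.AppendLine(question);

        var response = await _http.PostAsJsonAsync(
            "http://localhost:11434/api/generate",
            new { model = _model, prompt = sb.ToString(), stream = false });
        response.EnsureSuccessStatusCode();

        using var json = JsonDocument.Parse(await response.Content.ReadAsStringAsync());
        return json.RootElement.GetProperty("response").GetString() ?? string.Empty;
    }
}
```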
This hard constraint supports compliance and sharply reduces hallucination risk.
MVC Integration
Adding Content (Controller)
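A sketch of the ingestion action; the controller name matches the Document Controller from the architecture diagram, while the action name and binding are assumptions:

```csharp
using Microsoft.AspNetCore.Mvc;

public class DocumentController : Controller
{
    private readonly DocumentStore _store;

    public DocumentController(DocumentStore store) => _store = store;

    [HttpPost]
    public async Task<IActionResult> Add(string title, string category, string content)
    {
        // Embedding happens inside the store, keeping the controller thin.
        await _store.AddAsync(title, category, content);
        return RedirectToAction("Index");
    }
}
```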
Asking a Question
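The query action might look like the following, written as if inside the same controller with an injected `RagService` (field names and the view model shape are assumptions):

```csharp
[HttpPost]
public async Task<IActionResult> Ask(string question, string? category)
{
    var docs = await _store.SearchAsync(question, category);
    var answer = await _rag.AnswerAsync(question, docs);

    // Surface the retrieved context alongside the answer for auditability.
    return View("Answer", new { Question = question, Answer = answer, Context = docs });
}
```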
Example Razor View for Query Interface
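An illustrative form for the query interface; field names mirror the hypothetical `Ask` action's parameters, and the category list mirrors the domains named earlier:

```cshtml
@* Illustrative sketch; the action name and fields are assumptions. *@
<form asp-action="Ask" method="post">
    <input name="question" placeholder="Ask a question..." />
    <select name="category">
        <option value="">All categories</option>
        <option>HR</option>
        <option>Legal</option>
        <option>Engineering</option>
        <option>Operations</option>
    </select>
    <button type="submit">Ask</button>
</form>
```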
Application Initialization (Program.cs)
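Wiring this together in Program.cs could look as follows; the connection string name and the service registrations are assumptions, and `UseVector()` comes from the Pgvector.Npgsql package:

```csharp
using Npgsql;
using Pgvector.Npgsql;

var builder = WebApplication.CreateBuilder(args);
builder.Services.AddControllersWithViews();

// Register a vector-aware Npgsql data source.
var dataSourceBuilder = new NpgsqlDataSourceBuilder(
    builder.Configuration.GetConnectionString("Postgres"));
dataSourceBuilder.UseVector();
builder.Services.AddSingleton(dataSourceBuilder.Build());

// Factory lambdas keep the HttpClient wiring explicit.
builder.Services.AddSingleton<IEmbeddingGenerator>(
    _ => new OllamaEmbeddingGenerator(new HttpClient()));
builder.Services.AddSingleton<DocumentStore>();
builder.Services.AddSingleton(_ => new RagService(new HttpClient()));

var app = builder.Build();
app.MapDefaultControllerRoute();
app.Run();
```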
Architectural Insights
This architecture preserves modularity and clear separation of roles:
- Deterministic Layer
- PostgreSQL
- pgvector similarity
- Category filtering
- Embedding storage
- Generative Layer
- Ollama for embeddings
- Ollama for completion
- Strict prompt enforcing context-only reasoning
- Application Layer
- MVC controllers
- Document management UI
- Query interface
- RAG service orchestration
This separation enables robust governance, auditability, and enterprise adoption.
Key Advantages
- Fully Local & Private – Runs entirely on-premise, ideal for sensitive data.
- Context-Only Reasoning – Answers are grounded in retrieved documents, minimizing hallucinations.
- Category-Aware Retrieval – Filter by department or document type for precise results.
- Auditable & Transparent – Returned context is visible alongside the answer for traceability.
- Extensible Architecture – Easy to add new document types, preprocessing, or hybrid search methods.
Potential Enhancements
Future improvements could include:
- PDF ingestion + automatic chunking
- Sentence-level or embedding-level preprocessing
- Keyword + vector hybrid retrieval
- Multimodal inputs (images, diagrams)
- Admin dashboard for document management
- User permissions and role-based access control
- Progressive summarization for large documents
- Exportable API for downstream tools
Final Notes
This issue demonstrated a local, category-aware RAG system implemented using ASP.NET Core MVC, pgvector, and Ollama. By combining deterministic storage + retrieval with strict, context-bound LLM reasoning, the system provides accurate, traceable answers suitable for internal documentation environments.
Explore the source code at the GitHub repository.
See you in the next issue.
Stay curious.
Join the Newsletter
Subscribe for exclusive insights, strategies, and updates from Elliot One. No spam, just value.