Issue #4: Category-Aware Local RAG System using ASP.NET Core MVC, Ollama, and pgvector

8 min read | November 29, 2025

Recent improvements in local LLM infrastructure have enabled developers to build Retrieval-Augmented Generation (RAG) pipelines that run fully on-premise while maintaining strong separation between deterministic logic and generative reasoning. In this issue, we explore a category-aware RAG architecture implemented using ASP.NET Core MVC, PostgreSQL + pgvector, and Ollama. The system is designed for enterprise environments where privacy, traceability, and policy alignment are essential.

This implementation emphasizes three design principles:

Deterministic Retrieval through pgvector
Strict Context-Bound Reasoning through Ollama
Transparent Governance through category-aware metadata filtering and inspectable context

The result is a predictable, auditable RAG workflow suitable for HR, Legal, Engineering, and Operations knowledge domains.

System Overview

The system’s workflow consists of three core stages:

Content Ingestion (Title + Category + Content)
Semantic Retrieval + Optional Category Filtering
Context-Bound Generation

Users add internal documentation through a web UI. The server generates embeddings through Ollama and stores all metadata and vectors in PostgreSQL.

User queries are embedded and compared using pgvector. Results are filtered by similarity and optionally by category.

The retrieved documents form a structured context block. Ollama generates a response using only that context.

If retrieval relevance is insufficient, the system returns:

“I don’t know. No relevant data found.”
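
The fallback behavior can be sketched as a simple gate on retrieval distance. This is a minimal illustration in Python rather than the project's C# code; the `Hit` type, the function names, and the `max_distance` value of 0.8 are assumptions chosen for the example.

```python
from dataclasses import dataclass

FALLBACK = "I don't know. No relevant data found."


@dataclass
class Hit:
    title: str
    distance: float  # smaller distance means a closer match


def relevant_hits(hits: list[Hit], max_distance: float = 0.8) -> list[Hit]:
    """Keep only hits close enough to the query (the threshold is an assumed value)."""
    return [h for h in hits if h.distance <= max_distance]


def answer_or_fallback(hits: list[Hit], generate) -> str:
    """Call the generator only when at least one hit passes the relevance gate."""
    kept = relevant_hits(hits)
    if not kept:
        return FALLBACK
    return generate(kept)
```

Because the gate runs before the LLM is called, the fallback path never depends on model behavior.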

The architecture maintains clean separation between deterministic components (database, embeddings, retrieval) and probabilistic components (LLM reasoning).

The screenshot below shows the application interface for adding content and querying the RAG system.

Architecture Diagram

The following diagram illustrates the flow of the Category-Aware Local RAG system, showing how content is ingested, retrieved, and used for context-bound generation.

ASP.NET MVC UI: User interface for adding content and querying.
Document Controller: Handles content ingestion and query processing.
Embedding Generator (Ollama): Generates embeddings for documents and queries.
PostgreSQL + pgvector: Stores documents, categories, and embeddings for semantic retrieval.
RAG Service: Builds context and orchestrates strict prompting.
Ollama (LLM): Performs context-bound generation.

The overall architecture of the system is shown below:

Embedding and Storage Layer

Embedding Generator Interface

The system defines an abstraction for embedding generation, enabling future model swaps without breaking the pipeline:
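
One way to picture such an abstraction is sketched below in Python (not the project's C#). The names `EmbeddingGenerator`, `HashEmbedder`, and `index_document` are hypothetical, and the hash-based embedder is only a deterministic stand-in for tests, not a semantic model.

```python
import hashlib
from typing import Protocol


class EmbeddingGenerator(Protocol):
    """Anything that turns text into a fixed-length vector."""

    def embed(self, text: str) -> list[float]: ...


class HashEmbedder:
    """Toy stand-in for tests: deterministic, but not semantically meaningful."""

    def __init__(self, dimensions: int = 8) -> None:
        if not 1 <= dimensions <= 32:
            raise ValueError("dimensions must be between 1 and 32 (SHA-256 yields 32 bytes)")
        self.dimensions = dimensions

    def embed(self, text: str) -> list[float]:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        # Scale the first `dimensions` bytes into the range [0, 1].
        return [b / 255 for b in digest[: self.dimensions]]


def index_document(generator: EmbeddingGenerator, content: str) -> list[float]:
    """Callers depend only on the protocol, so implementations can be swapped."""
    return generator.embed(content)
```

Any class with a matching `embed` method satisfies the protocol, which is what allows a real Ollama-backed generator to replace the stand-in without changes to callers.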

Ollama Embedding Generator

Embeddings are generated via a call to the local Ollama runtime:
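
As a rough illustration of that call, the Python sketch below targets Ollama's `/api/embeddings` endpoint on its default local port. The function names and the choice of `nomic-embed-text` as the default model are assumptions; newer Ollama releases also expose an `/api/embed` endpoint with a slightly different payload.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address


def build_embedding_request(text: str, model: str = "nomic-embed-text") -> urllib.request.Request:
    """Build a POST request for Ollama's /api/embeddings endpoint."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embedding_response(body: bytes) -> list[float]:
    """Extract the vector from a response of the form {"embedding": [...]}."""
    return [float(x) for x in json.loads(body)["embedding"]]


def embed(text: str, model: str = "nomic-embed-text") -> list[float]:
    """Send the request to a running Ollama instance and return the vector."""
    with urllib.request.urlopen(build_embedding_request(text, model), timeout=30) as resp:
        return parse_embedding_response(resp.read())
```

Splitting request building and response parsing from the network call keeps both halves testable without a running Ollama instance.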

This design keeps embedding generation local, deterministic, and private.

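Ollama models such as mistral, nomic-embed-text, or llama3:instruct can be swapped in without changing the application code. Different models generally produce vectors of different dimensions, however, so the vector column size must match the chosen model, and existing documents must be re-embedded after a switch.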

Storage and Semantic Retrieval (PostgreSQL + pgvector)

Table Schema

A typical pgvector-enabled table looks like this:
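
A sketch of what such a schema might look like is given below, wrapped in a small Python helper so the vector dimension stays configurable. The table and column names are assumptions based on the Title + Category + Content fields described earlier, and the default of 768 assumes the output size of nomic-embed-text.

```python
def schema_sql(dimensions: int = 768) -> str:
    """Return DDL for a documents table with a pgvector column."""
    if dimensions <= 0:
        raise ValueError("dimensions must be positive")
    return f"""
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS documents (
    id        BIGSERIAL PRIMARY KEY,
    title     TEXT NOT NULL,
    category  TEXT NOT NULL,
    content   TEXT NOT NULL,
    embedding vector({dimensions}) NOT NULL
);
""".strip()
```

The dimension in `vector(n)` is fixed per column, which is why a change of embedding model usually means a schema change as well.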

Storing Documents
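
The storage step could look roughly like the Python sketch below, which formats an embedding in pgvector's text input form and pairs it with a parameterized INSERT. The `documents` table, its columns, and the helper names are assumptions; the `%s` placeholders follow the style used by drivers such as psycopg.

```python
def to_pgvector_literal(vector: list[float]) -> str:
    """Format a vector in pgvector's text input form, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(repr(float(x)) for x in vector) + "]"


# Parameterized statement; the final placeholder is cast to the vector type.
INSERT_SQL = (
    "INSERT INTO documents (title, category, content, embedding) "
    "VALUES (%s, %s, %s, %s::vector)"
)


def insert_params(title: str, category: str, content: str, vector: list[float]) -> tuple:
    """Build the parameter tuple that accompanies INSERT_SQL."""
    return (title.strip(), category.strip(), content, to_pgvector_literal(vector))
```

With a database cursor in hand, the call would be along the lines of `cursor.execute(INSERT_SQL, insert_params(...))`, which keeps user-supplied text out of the SQL string itself.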

Semantic Query + Category Filtering

The method below takes a query, optionally filtered by category, generates its embedding, retrieves the top 5 most similar documents using pgvector similarity, and returns them or a default “no relevant context” result if none are found.

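pgvector’s <-> operator computes Euclidean (L2) distance, so ordering by it in ascending order returns the nearest documents first. With an exact scan (no approximate index), the same query vector over the same data always produces the same ranking.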
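
The retrieval logic described in this section can be sketched as follows, in Python rather than the project's C#. `search_sql` shows one plausible shape for the query (`<->` is pgvector's Euclidean distance operator), and `search` is an in-memory equivalent that is handy for unit tests. The `Doc` type, the table name, and the column names are assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Doc:
    title: str
    category: str
    embedding: list[float]


def search_sql(with_category: bool, top_k: int = 5) -> str:
    """Build the SQL: nearest neighbors by L2 distance, optional category filter."""
    where = "WHERE category = %s " if with_category else ""
    return (
        "SELECT title, category, content FROM documents "
        f"{where}ORDER BY embedding <-> %s::vector LIMIT {int(top_k)}"
    )


def search(docs: list[Doc], query_vec: list[float], category=None, top_k: int = 5) -> list[Doc]:
    """In-memory version of the same query: filter, sort by distance, take top_k."""
    pool = [d for d in docs if category is None or d.category == category]
    return sorted(pool, key=lambda d: math.dist(d.embedding, query_vec))[:top_k]
```

Applying the category filter before ranking means the top 5 are always drawn from the requested category, rather than filtering an already truncated result.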

RAG Service: Strict Context-Bound Generation

Context Packaging and Strict Prompt

The RAG service enforces grounded LLM behavior:

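This constraint keeps answers grounded in the retrieved context and reduces hallucination risk, although a prompt alone cannot guarantee that the model never strays from the context.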
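
A minimal sketch of context packaging and a strict prompt is shown below, in Python rather than the project's C#. The prompt wording, the numbered block layout, and the function names are assumptions for illustration, not the article's actual template.

```python
FALLBACK = "I don't know. No relevant data found."


def build_context(docs: list[dict]) -> str:
    """Pack retrieved documents into numbered, inspectable blocks."""
    blocks = []
    for i, d in enumerate(docs, start=1):
        blocks.append(f"[{i}] {d['title']} ({d['category']})\n{d['content']}")
    return "\n\n".join(blocks)


def build_prompt(question: str, docs: list[dict]) -> str:
    """Combine the strict instruction, the context block, and the question."""
    return (
        "Answer using ONLY the context below. "
        "If the context does not contain the answer, reply exactly: "
        f"{FALLBACK}\n\n"
        f"Context:\n{build_context(docs)}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

The resulting string would then be sent to the model, for example as the `prompt` field of a request to Ollama's `/api/generate` endpoint. Since the context block is plain text, it can also be displayed next to the answer for auditing.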

MVC Integration

Adding Content (Controller)

Asking a Question

Example Razor View for Query Interface

Application Initialization (Program.cs)

Architectural Insights

This architecture preserves modularity and clear separation of roles:

Deterministic Layer
- PostgreSQL
- pgvector similarity
- Category filtering
- Embedding storage

Generative Layer
- Ollama for embeddings
- Ollama for completion
- Strict prompt enforcing context-only reasoning

Application Layer
- MVC controllers
- Document management UI
- Query interface
- RAG service orchestration

This separation enables robust governance, auditability, and enterprise adoption.

Key Advantages

Fully Local & Private – Runs entirely on-premise, ideal for sensitive data.
Context-Only Reasoning – Answers are grounded in retrieved documents, which reduces the risk of hallucination.
Category-Aware Retrieval – Filter by department or document type for precise results.
Auditable & Transparent – Returned context is visible alongside the answer for traceability.
Extensible Architecture – Easy to add new document types, preprocessing, or hybrid search methods.

Potential Enhancements

Future improvements could include:

PDF ingestion + automatic chunking
Sentence-level or embedding-level preprocessing
Keyword + vector hybrid retrieval
Multimodal inputs (images, diagrams)
Admin dashboard for document management
User permissions and role-based access control
Progressive summarization for large documents
Exportable API for downstream tools

Final Notes

This issue demonstrated a local, category-aware RAG system implemented using ASP.NET Core MVC, pgvector, and Ollama. By combining deterministic storage + retrieval with strict, context-bound LLM reasoning, the system provides accurate, traceable answers suitable for internal documentation environments.

Explore the source code at the GitHub repository.

See you in the next issue.

Stay curious.
