Elliot O
Issue #4: Category-Aware Local RAG System using ASP.NET Core MVC, Ollama, and pgvector
8 min read  |  November 29, 2025

Recent improvements in local LLM infrastructure have enabled developers to build Retrieval-Augmented Generation (RAG) pipelines that run fully on-premise while maintaining strong separation between deterministic logic and generative reasoning. In this issue, we explore a category-aware RAG architecture implemented using ASP.NET Core MVC, PostgreSQL + pgvector, and Ollama. The system is designed for enterprise environments where privacy, traceability, and policy alignment are essential.

This implementation emphasizes three design principles:

  • Deterministic Retrieval through pgvector
  • Strict Context-Bound Reasoning through Ollama
  • Transparent Governance through category-aware metadata filtering and inspectable context

The result is a predictable, auditable RAG workflow suitable for HR, Legal, Engineering, and Operations knowledge domains.


System Overview

The system's workflow consists of three core stages:

  • Content Ingestion (Title + Category + Content)
  • Semantic Retrieval + Optional Category Filtering
  • Context-Bound Generation

Users add internal documentation through a web UI. The server generates embeddings through Ollama and stores all metadata and vectors in PostgreSQL.

User queries are embedded and compared using pgvector. Results are filtered by similarity and optionally by category.

The retrieved documents form a structured context block. Ollama generates a response only using that context.

If retrieval relevance is insufficient, the system returns:

“I don’t know. No relevant data found.”

The architecture maintains clean separation between deterministic components (database, embeddings, retrieval) and probabilistic components (LLM reasoning).

The screenshot below shows the application interface for adding content and querying the RAG system.

Architecture Diagram

The following diagram illustrates the flow of the Category-Aware Local RAG system, showing how content is ingested, retrieved, and used for context-bound generation.

  • ASP.NET MVC UI: User interface for adding content and querying.
  • Document Controller: Handles content ingestion and query processing.
  • Embedding Generator (Ollama): Generates embeddings for documents and queries.
  • PostgreSQL + pgvector: Stores documents, categories, and embeddings for semantic retrieval.
  • RAG Service: Builds context and orchestrates strict prompting.
  • Ollama (LLM): Performs context-bound generation.

The overall architecture of the system is shown below:

Embedding and Storage Layer

Embedding Generator Interface

The system defines an abstraction for embedding generation, enabling future model swaps without breaking the pipeline:
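A minimal sketch of such an abstraction might look like the following (the interface and method names here are illustrative, not taken from the repository):

```csharp
public interface IEmbeddingGenerator
{
    // Returns a dense vector representation of the input text,
    // suitable for storage in a pgvector column.
    Task<float[]> GenerateEmbeddingAsync(string text);
}
```

Because both ingestion and querying depend only on this interface, the underlying model or runtime can change without touching the rest of the pipeline.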

Ollama Embedding Generator

Embeddings are generated via a call to the local Ollama runtime:
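A sketch of the generator, assuming Ollama's standard local `/api/embeddings` endpoint and `nomic-embed-text` as the default model (the class and record names are illustrative):

```csharp
using System.Net.Http.Json;
using System.Text.Json.Serialization;

public class OllamaEmbeddingGenerator : IEmbeddingGenerator
{
    private readonly HttpClient _http;
    private readonly string _model;

    public OllamaEmbeddingGenerator(HttpClient http, string model = "nomic-embed-text")
    {
        _http = http;
        _model = model;
    }

    public async Task<float[]> GenerateEmbeddingAsync(string text)
    {
        // POST to the local Ollama runtime; no data leaves the machine.
        var response = await _http.PostAsJsonAsync(
            "http://localhost:11434/api/embeddings",
            new { model = _model, prompt = text });
        response.EnsureSuccessStatusCode();

        var payload = await response.Content.ReadFromJsonAsync<EmbeddingResponse>();
        return payload!.Embedding;
    }

    private sealed record EmbeddingResponse(
        [property: JsonPropertyName("embedding")] float[] Embedding);
}
```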

This design keeps embedding generation local, deterministic, and private.

Ollama models can be swapped in without architectural change: nomic-embed-text for embeddings, or mistral and llama3:instruct for generation.

Storage and Semantic Retrieval (PostgreSQL + pgvector)

Table Schema

A typical pgvector-enabled table looks like this:
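A sketch of such a schema, assuming a 768-dimensional embedding model such as nomic-embed-text (the table and column names are illustrative):

```sql
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
    id        SERIAL PRIMARY KEY,
    title     TEXT NOT NULL,
    category  TEXT NOT NULL,
    content   TEXT NOT NULL,
    embedding VECTOR(768)  -- dimension must match the embedding model
);

-- Optional approximate index for faster similarity search on large tables.
CREATE INDEX ON documents USING hnsw (embedding vector_l2_ops);
```

The `VECTOR(768)` dimension is model-specific; changing embedding models usually means recreating the column and re-embedding the corpus.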

Storing Documents
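Ingestion embeds the content and writes the row in one step. A minimal sketch, assuming Npgsql with the Pgvector.Npgsql plugin and the schema above (field and service names are illustrative):

```csharp
public async Task AddDocumentAsync(string title, string category, string content)
{
    // Embed once at ingestion time; the vector is stored alongside the text.
    var embedding = await _embeddingGenerator.GenerateEmbeddingAsync(content);

    await using var conn = await _dataSource.OpenConnectionAsync();
    await using var cmd = new NpgsqlCommand(
        "INSERT INTO documents (title, category, content, embedding) " +
        "VALUES (@title, @category, @content, @embedding)", conn);

    cmd.Parameters.AddWithValue("title", title);
    cmd.Parameters.AddWithValue("category", category);
    cmd.Parameters.AddWithValue("content", content);
    cmd.Parameters.AddWithValue("embedding", new Pgvector.Vector(embedding));

    await cmd.ExecuteNonQueryAsync();
}
```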

Semantic Query + Category Filtering

The method below takes a query, optionally filtered by category, generates its embedding, retrieves the top 5 most similar documents using pgvector similarity, and returns them or a default “no relevant context” result if none are found.
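A sketch of that retrieval method, again assuming Npgsql plus Pgvector.Npgsql (names and the `DocumentResult` record are illustrative):

```csharp
public async Task<List<DocumentResult>> SearchAsync(string query, string? category = null)
{
    var queryVector = new Pgvector.Vector(
        await _embeddingGenerator.GenerateEmbeddingAsync(query));

    // <-> is pgvector's L2 distance operator; smaller distance = more similar.
    var sql = "SELECT title, category, content, embedding <-> @query AS distance " +
              "FROM documents " +
              (category is null ? "" : "WHERE category = @category ") +
              "ORDER BY embedding <-> @query LIMIT 5";

    await using var conn = await _dataSource.OpenConnectionAsync();
    await using var cmd = new NpgsqlCommand(sql, conn);
    cmd.Parameters.AddWithValue("query", queryVector);
    if (category is not null)
        cmd.Parameters.AddWithValue("category", category);

    var results = new List<DocumentResult>();
    await using var reader = await cmd.ExecuteReaderAsync();
    while (await reader.ReadAsync())
        results.Add(new DocumentResult(
            reader.GetString(0), reader.GetString(1),
            reader.GetString(2), reader.GetDouble(3)));

    // An empty list signals the caller to return the "no relevant context" fallback.
    return results;
}

public sealed record DocumentResult(
    string Title, string Category, string Content, double Distance);
```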

pgvector’s <-> operator ensures deterministic similarity ordering.

RAG Service: Strict Context-Bound Generation

Context Packaging and Strict Prompt

The RAG service enforces grounded LLM behavior:
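A sketch of that orchestration, assuming Ollama's local `/api/generate` endpoint with `stream = false` and a distance cutoff (`MaxDistance`, the prompt wording, and the service names are illustrative):

```csharp
using System.Net.Http.Json;
using System.Text.Json.Serialization;

public class RagService
{
    private const double MaxDistance = 1.0; // relevance cutoff; tune per model
    private readonly HttpClient _http;
    private readonly SearchService _search;

    public RagService(HttpClient http, SearchService search)
    {
        _http = http;
        _search = search;
    }

    public async Task<string> AskAsync(string question, string? category = null)
    {
        var docs = await _search.SearchAsync(question, category);

        // Deterministic guardrail: refuse before the LLM is ever invoked.
        if (docs.Count == 0 || docs[0].Distance > MaxDistance)
            return "I don't know. No relevant data found.";

        var context = string.Join("\n\n",
            docs.Select(d => $"[{d.Category}] {d.Title}\n{d.Content}"));

        var prompt =
            "Answer ONLY using the context below. If the context does not " +
            "contain the answer, reply exactly: " +
            "\"I don't know. No relevant data found.\"\n\n" +
            $"Context:\n{context}\n\nQuestion: {question}";

        var response = await _http.PostAsJsonAsync(
            "http://localhost:11434/api/generate",
            new { model = "mistral", prompt, stream = false });
        response.EnsureSuccessStatusCode();

        var payload = await response.Content.ReadFromJsonAsync<GenerateResponse>();
        return payload!.Response;
    }

    private sealed record GenerateResponse(
        [property: JsonPropertyName("response")] string Response);
}
```

Note that the refusal path runs before generation, so the fallback message is deterministic rather than model-produced.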

This hard constraint supports compliance and sharply reduces hallucination risk.

MVC Integration

Adding Content (Controller)
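A minimal controller action for ingestion might look like this (action, service, and view names are illustrative):

```csharp
[HttpPost]
public async Task<IActionResult> Add(string title, string category, string content)
{
    await _documentService.AddDocumentAsync(title, category, content);
    TempData["Message"] = "Document stored and embedded.";
    return RedirectToAction(nameof(Index));
}
```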

Asking a Question
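The query action delegates to the RAG service and surfaces the answer to the view (a sketch; names are illustrative):

```csharp
[HttpPost]
public async Task<IActionResult> Ask(string question, string? category)
{
    var answer = await _ragService.AskAsync(question, category);
    ViewBag.Question = question;
    ViewBag.Answer = answer;
    return View("Index");
}
```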

Example Razor View for Query Interface
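A minimal Razor sketch of the query form, posting to the `Ask` action above (markup and category options are illustrative):

```cshtml
<form asp-action="Ask" method="post">
    <input type="text" name="question" placeholder="Ask a question..." />
    <select name="category">
        <option value="">All categories</option>
        <option>HR</option>
        <option>Legal</option>
        <option>Engineering</option>
        <option>Operations</option>
    </select>
    <button type="submit">Ask</button>
</form>

@if (ViewBag.Answer != null)
{
    <h3>Answer</h3>
    <p>@ViewBag.Answer</p>
}
```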

Application Initialization (Program.cs)
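Wiring it together in `Program.cs` might look like this sketch, assuming the Pgvector.Npgsql plugin's `UseVector()` registration and a connection string named `Postgres` (service names are illustrative):

```csharp
var builder = WebApplication.CreateBuilder(args);

builder.Services.AddControllersWithViews();

// Typed HttpClient for the local Ollama runtime.
builder.Services.AddHttpClient<IEmbeddingGenerator, OllamaEmbeddingGenerator>();
builder.Services.AddHttpClient<RagService>();

// pgvector-aware Npgsql data source.
var dataSourceBuilder = new NpgsqlDataSourceBuilder(
    builder.Configuration.GetConnectionString("Postgres"));
dataSourceBuilder.UseVector();
builder.Services.AddSingleton(dataSourceBuilder.Build());

builder.Services.AddScoped<DocumentService>();
builder.Services.AddScoped<SearchService>();

var app = builder.Build();
app.MapDefaultControllerRoute();
app.Run();
```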

Architectural Insights

This architecture preserves modularity and clear separation of roles:

  • Deterministic Layer
    • PostgreSQL
    • pgvector similarity
    • Category filtering
    • Embedding storage
  • Generative Layer
    • Ollama for embeddings
    • Ollama for completion
    • Strict prompt enforcing context-only reasoning
  • Application Layer
    • MVC controllers
    • Document management UI
    • Query interface
    • RAG service orchestration

This separation enables robust governance, auditability, and enterprise adoption.

Key Advantages

  • Fully Local & Private – Runs entirely on-premise, ideal for sensitive data.
  • Context-Only Reasoning – Answers are grounded in retrieved documents, minimizing hallucinations.
  • Category-Aware Retrieval – Filter by department or document type for precise results.
  • Auditable & Transparent – Returned context is visible alongside the answer for traceability.
  • Extensible Architecture – Easy to add new document types, preprocessing, or hybrid search methods.

Potential Enhancements

Future improvements could include:

  • PDF ingestion + automatic chunking
  • Sentence-level or embedding-level preprocessing
  • Keyword + vector hybrid retrieval
  • Multimodal inputs (images, diagrams)
  • Admin dashboard for document management
  • User permissions and role-based access control
  • Progressive summarization for large documents
  • Exportable API for downstream tools

Final Notes

This issue demonstrated a local, category-aware RAG system implemented using ASP.NET Core MVC, pgvector, and Ollama. By combining deterministic storage + retrieval with strict, context-bound LLM reasoning, the system provides accurate, traceable answers suitable for internal documentation environments.

Explore the source code at the GitHub repository.

See you in the next issue.

Stay curious.

Share this article with your network.

Join the Newsletter

Subscribe for exclusive insights, strategies, and updates from Elliot One. No spam, just value.

Your information is safe. Unsubscribe anytime.