
Modern engineering organizations accumulate large volumes of operational knowledge. Incident reports, troubleshooting notes, and runbooks grow over time, but during an outage the hardest problem remains finding the right document quickly. Keyword-based search breaks down when terminology differs, symptoms are described imprecisely, or time pressure leaves no room for trial-and-error queries. Manual browsing does not scale, and the cost of delayed understanding is measured in downtime.
In this issue, we explore a production-oriented semantic runbook search system implemented in C#. Instead of matching words, the system retrieves documents based on meaning, using vector embeddings, deterministic similarity scoring, and explicit domain modeling. It runs entirely locally on top of Ollama and Microsoft.Extensions.AI, and the architecture cleanly separates AI integration from search logic and retrieval math, forming a minimal but powerful foundation for internal knowledge search, incident response tooling, and future RAG systems.
System Overview
The workflow consists of four core components:
- Knowledge Document Model representing runbooks and incidents
- Embedding Generator backed by Ollama
- Semantic Search Engine using cosine similarity
- Interactive Console Client for querying
At startup, internal engineering documents are embedded and indexed. When a user describes a problem in natural language, the query is embedded and compared against the indexed documents. The most semantically relevant runbooks are returned with similarity scores.
This architecture avoids agents, planners, and orchestration layers. It focuses on one thing only: reliable semantic retrieval.
Knowledge Model
The system begins with a simple, explicit domain model. Each runbook or incident is represented as a strongly typed object.
public sealed class KnowledgeDocument
{
    public string Id { get; }
    public string Title { get; }
    public string Body { get; }

    public KnowledgeDocument(string id, string title, string body)
    {
        Id = id;
        Title = title;
        Body = body;
    }

    public override string ToString() => Title;
}

Key aspects of this design:
- Titles are used for presentation
- Bodies are used for semantic embedding
- The model is immutable and intention-revealing
This mirrors traditional software design. AI does not replace domain modeling. It depends on it.
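For illustration, a small corpus might be seeded like this. The IDs, titles, and bodies below are invented examples for this article, not documents from the project:

```csharp
// Invented sample runbooks for demonstration purposes only.
var documents = new List<KnowledgeDocument>
{
    new("rb-001", "Database Connection Pool Exhaustion",
        "Symptoms: request timeouts under load. Check active connections, raise the pool ceiling, and recycle stale connections."),
    new("rb-002", "Kafka Consumer Lag Spike",
        "Symptoms: delayed downstream processing. Inspect partition assignments and consumer throughput before scaling."),
    new("rb-003", "Disk Pressure on Kubernetes Nodes",
        "Symptoms: evicted pods and failing writes. Clean up old images and logs, then review ephemeral storage requests."),
};
```

Note that only the bodies carry the symptom language engineers will actually search for; the titles exist purely for display.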
Embedding Generation with Ollama
Semantic search depends on embeddings. Instead of using a cloud service, this project integrates Ollama directly through a custom embedding generator that implements Microsoft.Extensions.AI abstractions.
using System.Text;
using System.Text.Json;
using Microsoft.Extensions.AI;

public sealed class OllamaEmbeddingGenerator : IEmbeddingGenerator<string, Embedding<float>>
{
    private readonly HttpClient _http = new();
    private readonly Uri _baseUri;
    private readonly string _model;

    public OllamaEmbeddingGenerator(Uri baseUri, string model = "all-minilm")
    {
        _baseUri = baseUri;
        _model = model;
    }

    // Calls Ollama's /api/embeddings endpoint and parses the returned vector.
    public async Task<Embedding<float>> GenerateEmbeddingAsync(string input)
    {
        var payload = new { model = _model, prompt = input };
        var response = await _http.PostAsync(
            new Uri(_baseUri, "/api/embeddings"),
            new StringContent(JsonSerializer.Serialize(payload),
                Encoding.UTF8,
                "application/json"));
        response.EnsureSuccessStatusCode();

        using var doc = JsonDocument.Parse(await response.Content.ReadAsStringAsync());
        var vector = doc.RootElement
            .GetProperty("embedding")
            .EnumerateArray()
            .Select(x => x.GetSingle())
            .ToArray();

        return new Embedding<float>(vector);
    }

    // Members required by the IEmbeddingGenerator<string, Embedding<float>> contract.
    public async Task<GeneratedEmbeddings<Embedding<float>>> GenerateAsync(
        IEnumerable<string> values,
        EmbeddingGenerationOptions? options = null,
        CancellationToken cancellationToken = default)
    {
        var embeddings = new List<Embedding<float>>();
        foreach (var value in values)
            embeddings.Add(await GenerateEmbeddingAsync(value));
        return new GeneratedEmbeddings<Embedding<float>>(embeddings);
    }

    public object? GetService(Type serviceType, object? serviceKey = null) =>
        serviceType.IsInstanceOfType(this) ? this : null;

    public void Dispose() => _http.Dispose();
}

Key design decisions:
- Runs fully locally using Ollama
- Uses all-minilm for fast, general-purpose embeddings
- Implements the Microsoft.Extensions.AI IEmbeddingGenerator abstraction for clean integration
- Avoids SDK lock-in by using raw HTTP
This keeps AI integration explicit, testable, and replaceable.
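Replaceability pays off immediately in tests. The stub below is a hypothetical helper, not part of the project: a stable hash seeds a PRNG so the same text always maps to the same vector, letting search logic be exercised without Ollama running at all.

```csharp
using System;

// Deterministic stand-in for an embedding generator in unit tests (hypothetical helper).
public sealed class FakeEmbeddingGenerator
{
    private readonly int _dimensions;

    public FakeEmbeddingGenerator(int dimensions = 8) => _dimensions = dimensions;

    public float[] GenerateEmbedding(string input)
    {
        // Stable hash (unlike string.GetHashCode, which varies between processes).
        int seed = 17;
        foreach (char c in input)
            seed = unchecked(seed * 31 + c);

        // Seeded Random yields the same sequence for the same seed, so tests are reproducible.
        var rng = new Random(seed);
        var vector = new float[_dimensions];
        for (int i = 0; i < vector.Length; i++)
            vector[i] = (float)rng.NextDouble();
        return vector;
    }
}
```

A stub like this cannot capture semantic closeness, of course; it only verifies plumbing and ranking mechanics.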
Semantic Search Engine Implementation
The semantic search engine is responsible for indexing documents and computing similarity scores. It is intentionally small and deterministic.
public sealed class SemanticSearchEngine
{
    private readonly List<(KnowledgeDocument Doc, Embedding<float> Embedding)> _index;

    public SemanticSearchEngine(
        IEnumerable<KnowledgeDocument> documents,
        OllamaEmbeddingGenerator embeddingGenerator)
    {
        // Embeddings are computed once, synchronously, at startup.
        // GetAwaiter().GetResult() blocks without wrapping exceptions in an
        // AggregateException; for large document sets an async factory is cleaner.
        _index = documents
            .Select(d => (
                d,
                embeddingGenerator.GenerateEmbeddingAsync(d.Body).GetAwaiter().GetResult()))
            .ToList();
    }

    public IEnumerable<(KnowledgeDocument Doc, float Score)> Search(
        Embedding<float> queryEmbedding,
        int topK = 3)
    {
        // Pure function: score every indexed document against the query and rank.
        return _index
            .Select(entry => (
                entry.Doc,
                Score: TensorPrimitives.CosineSimilarity(
                    entry.Embedding.Vector.Span,
                    queryEmbedding.Vector.Span)))
            .OrderByDescending(x => x.Score)
            .Take(topK);
    }
}

Important characteristics:
- Embeddings are computed once at startup
- Search is pure and side-effect free
- Cosine similarity is computed using System.Numerics.Tensors
- Results are ranked deterministically
There is no hidden state and no AI decision-making here. Just math.
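To make that math concrete: cosine similarity is the dot product of two vectors normalized by their magnitudes. The plain C# version below is for illustration only; the project itself uses the vectorized TensorPrimitives implementation.

```csharp
using System;

static class VectorMath
{
    // cos(θ) = (a · b) / (|a| |b|): 1.0 for identical directions, 0.0 for orthogonal ones.
    public static float CosineSimilarity(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
    {
        float dot = 0f, magA = 0f, magB = 0f;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            magA += a[i] * a[i];
            magB += b[i] * b[i];
        }
        return dot / (MathF.Sqrt(magA) * MathF.Sqrt(magB));
    }
}
```

Because the score depends only on direction, not magnitude, documents of very different lengths can still rank as close neighbors.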
Interactive Query Flow
The console application ties the system together. It loads internal documents, builds the semantic index, and allows engineers to describe problems in natural language.
Console.WriteLine("Describe the problem you're investigating:");
var query = Console.ReadLine();

// Guard against empty input; Console.ReadLine can also return null at end of stream.
if (string.IsNullOrWhiteSpace(query))
    return;

var queryEmbedding = await embeddingGenerator.GenerateEmbeddingAsync(query);
var results = searchEngine.Search(queryEmbedding);

foreach (var (doc, score) in results)
{
    Console.WriteLine($"• {doc.Title} (score: {score:F3})");
}

This interaction mirrors real operational behavior. Engineers describe symptoms, not document titles. Semantic search bridges that gap.
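Tying the pieces together, the composition root can stay tiny. The sketch below is an assumption about wiring rather than the project's exact Program.cs: LoadRunbooks() is a hypothetical stand-in for however your documents are sourced, and the base URI is Ollama's default local endpoint.

```csharp
// Hypothetical wiring; assumes Ollama is serving on its default local port.
var embeddingGenerator = new OllamaEmbeddingGenerator(new Uri("http://localhost:11434"));

// LoadRunbooks() is a placeholder for your document source (files, wiki export, etc.).
IEnumerable<KnowledgeDocument> documents = LoadRunbooks();

var searchEngine = new SemanticSearchEngine(documents, embeddingGenerator);
```

Everything after this point is the query loop shown above.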
Architecture Diagram
The system can be understood as three distinct layers:
- Semantic Index: stores document embeddings and performs similarity scoring.
- Embedding Layer: converts text into vectors using Ollama.
- User Interface: accepts natural-language input and displays ranked results.
This separation ensures:
- Modularity with clear responsibilities
- Deterministic retrieval behavior
- Traceability of results and scores
There is no generative reasoning involved, which makes the system easy to audit and trust.
The following flowchart illustrates the system architecture:
Key Advantages
- Fully Local and Private: all embeddings and queries run locally through Ollama.
- Deterministic Retrieval: similarity scoring is explicit and reproducible.
- Clear Separation of Concerns: domain, AI integration, and search logic are isolated.
- Production-Friendly: no cloud dependencies, no agents, no orchestration.
Potential Enhancements
This foundation can be extended incrementally:
- Persistent vector storage
- Incremental indexing for large document sets
- Hybrid keyword and semantic scoring
- RAG pipelines layered on top
- Chat-based exploration using retrieved documents
Crucially, none of these require changing the core search engine.
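As a taste of the hybrid direction, one possible sketch (an assumption of this article, not the project's code) blends the cosine score with naive keyword overlap. The weight is a tunable guess, not an empirically derived value:

```csharp
using System;
using System.Linq;

static class HybridScoring
{
    // Blend semantic similarity with simple keyword overlap.
    // keywordWeight is a tunable assumption, not an empirically derived value.
    public static float Score(
        float cosineScore, string query, string body, float keywordWeight = 0.3f)
    {
        var terms = query.ToLowerInvariant()
            .Split(' ', StringSplitOptions.RemoveEmptyEntries);
        if (terms.Length == 0)
            return cosineScore;

        // Fraction of query terms that appear verbatim in the document body.
        float overlap = terms.Count(t =>
            body.Contains(t, StringComparison.OrdinalIgnoreCase)) / (float)terms.Length;

        return (1f - keywordWeight) * cosineScore + keywordWeight * overlap;
    }
}
```

Because this wraps the existing cosine score rather than replacing it, the core search engine stays untouched.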
Final Notes
Semantic search is one of the highest-leverage AI capabilities for engineering teams, yet it does not require complex frameworks or autonomous agents. With clear abstractions, local models, and deterministic math, it can be implemented in a way that aligns naturally with traditional software engineering principles.
This project demonstrates how Microsoft.Extensions.AI and Ollama fit cleanly into a disciplined .NET architecture. Before systems can reason, plan, or act, they must retrieve the right knowledge reliably. Semantic runbook search is the correct place to start.
Explore the source code at the GitHub repository.
See you in the next issue.
Stay curious.
Join the Newsletter
Subscribe for AI engineering insights, system design strategies, and workflow tips.