
Modern engineering organizations accumulate large volumes of operational knowledge. Incident reports, troubleshooting notes, and runbooks grow over time, but during an outage the hardest problem remains finding the right document quickly. Keyword-based search breaks down when terminology differs, symptoms are described imprecisely, or time pressure leaves no room for trial-and-error queries. Manual browsing does not scale, and the cost of delayed understanding is measured in downtime.
In this issue, we explore a production-oriented semantic runbook search system implemented in C#. Instead of matching words, the system retrieves documents based on meaning, using vector embeddings, deterministic similarity scoring, and explicit domain modeling. It runs entirely locally on top of Ollama and Microsoft.Extensions.AI, and the architecture cleanly separates AI integration from search logic and retrieval math, forming a minimal but powerful foundation for internal knowledge search, incident response tooling, and future RAG systems.
System Overview
The workflow consists of four core components:
- Knowledge Document Model representing runbooks and incidents
- Embedding Generator backed by Ollama
- Semantic Search Engine using cosine similarity
- Interactive Console Client for querying
At startup, internal engineering documents are embedded and indexed. When a user describes a problem in natural language, the query is embedded and compared against the indexed documents. The most semantically relevant runbooks are returned with similarity scores.
This architecture avoids agents, planners, and orchestration layers. It focuses on one thing only: reliable semantic retrieval.
Knowledge Model
The system begins with a simple, explicit domain model. Each runbook or incident is represented as a strongly typed object.
public sealed class KnowledgeDocument
{
    public string Id { get; }
    public string Title { get; }
    public string Body { get; }

    public KnowledgeDocument(string id, string title, string body)
    {
        Id = id;
        Title = title;
        Body = body;
    }

    public override string ToString() => Title;
}

Key aspects of this design:
- Titles are used for presentation
- Bodies are used for semantic embedding
- The model is immutable and intention-revealing
This mirrors traditional software design. AI does not replace domain modeling. It depends on it.
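For illustration, a small corpus might be seeded like this. The IDs, titles, and bodies below are invented examples for this article, not documents from the project:

```csharp
// Invented sample runbooks for demonstration purposes only.
var documents = new List<KnowledgeDocument>
{
    new("rb-001", "Database Connection Pool Exhaustion",
        "Symptoms: request timeouts under load. Check active connections, raise the pool ceiling, and recycle stale connections."),
    new("rb-002", "Kafka Consumer Lag Spike",
        "Symptoms: delayed downstream processing. Inspect partition assignments and consumer throughput before scaling."),
    new("rb-003", "Disk Pressure on Kubernetes Nodes",
        "Symptoms: evicted pods and failing writes. Clean up old images and logs, then review ephemeral storage requests."),
};
```

Note that only the bodies carry the symptom language engineers will actually search for; the titles exist purely for display.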
Embedding Generation with Ollama
Semantic search depends on embeddings. Instead of using a cloud service, this project integrates Ollama directly through a custom embedding generator that implements Microsoft.Extensions.AI abstractions.
using System.Text;
using System.Text.Json;
using Microsoft.Extensions.AI;

public sealed class OllamaEmbeddingGenerator : IEmbeddingGenerator<string, Embedding<float>>
{
    private readonly HttpClient _http = new();
    private readonly Uri _baseUri;
    private readonly string _model;

    public OllamaEmbeddingGenerator(Uri baseUri, string model = "all-minilm")
    {
        _baseUri = baseUri;
        _model = model;
    }

    // Calls Ollama's /api/embeddings endpoint and parses the returned vector.
    public async Task<Embedding<float>> GenerateEmbeddingAsync(string input)
    {
        var payload = new { model = _model, prompt = input };
        var response = await _http.PostAsync(
            new Uri(_baseUri, "/api/embeddings"),
            new StringContent(JsonSerializer.Serialize(payload),
                Encoding.UTF8,
                "application/json"));
        response.EnsureSuccessStatusCode();

        using var doc = JsonDocument.Parse(await response.Content.ReadAsStringAsync());
        var vector = doc.RootElement
            .GetProperty("embedding")
            .EnumerateArray()
            .Select(x => x.GetSingle())
            .ToArray();

        return new Embedding<float>(vector);
    }

    // Members required by the IEmbeddingGenerator<string, Embedding<float>> contract.
    public async Task<GeneratedEmbeddings<Embedding<float>>> GenerateAsync(
        IEnumerable<string> values,
        EmbeddingGenerationOptions? options = null,
        CancellationToken cancellationToken = default)
    {
        var embeddings = new List<Embedding<float>>();
        foreach (var value in values)
            embeddings.Add(await GenerateEmbeddingAsync(value));
        return new GeneratedEmbeddings<Embedding<float>>(embeddings);
    }

    public object? GetService(Type serviceType, object? serviceKey = null) =>
        serviceType.IsInstanceOfType(this) ? this : null;

    public void Dispose() => _http.Dispose();
}

Key design decisions:
- Runs fully locally using Ollama
- Uses all-minilm for fast, general-purpose embeddings
- Implements the Microsoft.Extensions.AI IEmbeddingGenerator abstraction for clean integration
- Avoids SDK lock-in by using raw HTTP
This keeps AI integration explicit, testable, and replaceable.
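Replaceability pays off immediately in tests. The stub below is a hypothetical helper, not part of the project: a stable hash seeds a PRNG so the same text always maps to the same vector, letting search logic be exercised without Ollama running at all.

```csharp
using System;

// Deterministic stand-in for an embedding generator in unit tests (hypothetical helper).
public sealed class FakeEmbeddingGenerator
{
    private readonly int _dimensions;

    public FakeEmbeddingGenerator(int dimensions = 8) => _dimensions = dimensions;

    public float[] GenerateEmbedding(string input)
    {
        // Stable hash (unlike string.GetHashCode, which varies between processes).
        int seed = 17;
        foreach (char c in input)
            seed = unchecked(seed * 31 + c);

        // Seeded Random yields the same sequence for the same seed, so tests are reproducible.
        var rng = new Random(seed);
        var vector = new float[_dimensions];
        for (int i = 0; i < vector.Length; i++)
            vector[i] = (float)rng.NextDouble();
        return vector;
    }
}
```

A stub like this cannot capture semantic closeness, of course; it only verifies plumbing and ranking mechanics.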
Semantic Search Engine Implementation
The semantic search engine is responsible for indexing documents and computing similarity scores. It is intentionally small and deterministic.
public sealed class SemanticSearchEngine
{
    private readonly List<(KnowledgeDocument Doc, Embedding<float> Embedding)> _index;

    public SemanticSearchEngine(
        IEnumerable<KnowledgeDocument> documents,
        OllamaEmbeddingGenerator embeddingGenerator)
    {
        // Embeddings are computed once, synchronously, at startup.
        // GetAwaiter().GetResult() blocks without wrapping exceptions in an
        // AggregateException; for large document sets an async factory is cleaner.
        _index = documents
            .Select(d => (
                d,
                embeddingGenerator.GenerateEmbeddingAsync(d.Body).GetAwaiter().GetResult()))
            .ToList();
    }

    public IEnumerable<(KnowledgeDocument Doc, float Score)> Search(
        Embedding<float> queryEmbedding,
        int topK = 3)
    {
        // Pure function: score every indexed document against the query and rank.
        return _index
            .Select(entry => (
                entry.Doc,
                Score: TensorPrimitives.CosineSimilarity(
                    entry.Embedding.Vector.Span,
                    queryEmbedding.Vector.Span)))
            .OrderByDescending(x => x.Score)
            .Take(topK);
    }
}

Important characteristics:
- Embeddings are computed once at startup
- Search is pure and side-effect free
- Cosine similarity is computed using System.Numerics.Tensors
- Results are ranked deterministically
There is no hidden state and no AI decision-making here. Just math.
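To make that math concrete: cosine similarity is the dot product of two vectors normalized by their magnitudes. The plain C# version below is for illustration only; the project itself uses the vectorized TensorPrimitives implementation.

```csharp
using System;

static class VectorMath
{
    // cos(θ) = (a · b) / (|a| |b|): 1.0 for identical directions, 0.0 for orthogonal ones.
    public static float CosineSimilarity(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
    {
        float dot = 0f, magA = 0f, magB = 0f;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            magA += a[i] * a[i];
            magB += b[i] * b[i];
        }
        return dot / (MathF.Sqrt(magA) * MathF.Sqrt(magB));
    }
}
```

Because the score depends only on direction, not magnitude, documents of very different lengths can still rank as close neighbors.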
Interactive Query Flow
The console application ties the system together. It loads internal documents, builds the semantic index, and allows engineers to describe problems in natural language.
Console.WriteLine("Describe the problem you're investigating:");
var query = Console.ReadLine();

// Guard against empty input; Console.ReadLine can also return null at end of stream.
if (string.IsNullOrWhiteSpace(query))
    return;

var queryEmbedding = await embeddingGenerator.GenerateEmbeddingAsync(query);
var results = searchEngine.Search(queryEmbedding);

foreach (var (doc, score) in results)
{
    Console.WriteLine($"• {doc.Title} (score: {score:F3})");
}

This interaction mirrors real operational behavior. Engineers describe symptoms, not document titles. Semantic search bridges that gap.
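Tying the pieces together, the composition root can stay tiny. The sketch below is an assumption about wiring rather than the project's exact Program.cs: LoadRunbooks() is a hypothetical stand-in for however your documents are sourced, and the base URI is Ollama's default local endpoint.

```csharp
// Hypothetical wiring; assumes Ollama is serving on its default local port.
var embeddingGenerator = new OllamaEmbeddingGenerator(new Uri("http://localhost:11434"));

// LoadRunbooks() is a placeholder for your document source (files, wiki export, etc.).
IEnumerable<KnowledgeDocument> documents = LoadRunbooks();

var searchEngine = new SemanticSearchEngine(documents, embeddingGenerator);
```

Everything after this point is the query loop shown above.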
Architecture Diagram
The system can be understood as three distinct layers:
- Semantic Index: stores document embeddings and performs similarity scoring.
- Embedding Layer: converts text into vectors using Ollama.
- User Interface: accepts natural-language input and displays ranked results.
This separation ensures:
- Modularity with clear responsibilities
- Deterministic retrieval behavior
- Traceability of results and scores
There is no generative reasoning involved, which makes the system easy to audit and trust.
The following flowchart illustrates the system architecture:
Key Advantages
- Fully Local and Private: all embeddings and queries run locally through Ollama.
- Deterministic Retrieval: similarity scoring is explicit and reproducible.
- Clear Separation of Concerns: domain, AI integration, and search logic are isolated.
- Production-Friendly: no cloud dependencies, no agents, no orchestration.
Potential Enhancements
This foundation can be extended incrementally:
- Persistent vector storage
- Incremental indexing for large document sets
- Hybrid keyword and semantic scoring
- RAG pipelines layered on top
- Chat-based exploration using retrieved documents
Crucially, none of these require changing the core search engine.
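As a taste of the hybrid direction, one possible sketch (an assumption of this article, not the project's code) blends the cosine score with naive keyword overlap. The weight is a tunable guess, not an empirically derived value:

```csharp
using System;
using System.Linq;

static class HybridScoring
{
    // Blend semantic similarity with simple keyword overlap.
    // keywordWeight is a tunable assumption, not an empirically derived value.
    public static float Score(
        float cosineScore, string query, string body, float keywordWeight = 0.3f)
    {
        var terms = query.ToLowerInvariant()
            .Split(' ', StringSplitOptions.RemoveEmptyEntries);
        if (terms.Length == 0)
            return cosineScore;

        // Fraction of query terms that appear verbatim in the document body.
        float overlap = terms.Count(t =>
            body.Contains(t, StringComparison.OrdinalIgnoreCase)) / (float)terms.Length;

        return (1f - keywordWeight) * cosineScore + keywordWeight * overlap;
    }
}
```

Because this wraps the existing cosine score rather than replacing it, the core search engine stays untouched.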
Final Notes
Semantic search is one of the highest-leverage AI capabilities for engineering teams, yet it does not require complex frameworks or autonomous agents. With clear abstractions, local models, and deterministic math, it can be implemented in a way that aligns naturally with traditional software engineering principles.
This project demonstrates how Microsoft.Extensions.AI and Ollama fit cleanly into a disciplined .NET architecture. Before systems can reason, plan, or act, they must retrieve the right knowledge reliably. Semantic runbook search is the correct place to start.
Explore the source code at the GitHub repository.
See you in the next issue.
Stay curious.
Join the Newsletter
Subscribe for AI engineering insights, system design strategies, and workflow tips.