
Most AI discussions still overfocus on generation.
In production systems, the higher ROI pattern is often retrieval: mapping natural language to the right evidence quickly, consistently, and with inspectable scoring.
That is where embeddings and vector search matter. Not as "agent magic," but as a practical semantic layer you can control.
In this issue, we focus on the AI core: embedding representation, deterministic ranking, thresholded control, and clustering for signal extraction. Customer feedback triage is just the running example.
What You Are Building
A local-first .NET console app that demonstrates a reusable semantic retrieval pipeline:
- Load retrieval config with environment overrides
- Embed domain text into vector space using Ollama
- Run cosine similarity search with deterministic ranking
- Apply thresholds to block weak semantic matches
- Group related items into themes via centroid clustering
- Optionally run the same retrieval against Postgres pgvector
- Triage an incoming ticket by retrieving the most similar historical items
The example domain is customer feedback triage. The architecture applies to runbooks, incidents, support routing, and knowledge retrieval.
System Overview
The system is a deterministic shell around one probabilistic component.
- Data selection and filtering (deterministic)
- Embedding generation (probabilistic)
- Similarity ranking and thresholds (deterministic)
- Theme clustering and reporting (deterministic)
This separation is the key production idea. Embeddings introduce semantic capability. Deterministic code owns behavior.
The diagram below shows the controlled flow from runtime configuration and domain data, through embeddings as the semantic layer, and into deterministic retrieval, thresholding, and clustering (with an optional pgvector scale-out path).
Runtime Configurations
A retrieval system becomes reliable when it is tunable. These parameters control quality and behavior without code changes:
- WindowDays for recency and relevance
- TopK for candidate breadth
- MinSearchScore for semantic confidence cutoff
- ClusterThreshold for theme strictness
- EnablePostgresVectorSearch for persistence and scale
In the reference implementation, the app also supports:
- QueryText for the search demo query
- IncomingTicketText for the triage run (retrieve similar prior tickets)
public sealed class TriageAppConfig
{
public bool EnablePostgresVectorSearch { get; init; } = false;
public string PostgresConnectionString { get; init; } = string.Empty;
public string OllamaBaseUrl { get; init; } = "http://localhost:11434";
public string EmbeddingModel { get; init; } = "nomic-embed-text";
public int WindowDays { get; init; } = 7;
public int TopK { get; init; } = 8;
public int TopThemes { get; init; } = 5;
public float ClusterThreshold { get; init; } = 0.75f;
public float MinSearchScore { get; init; } = 0.50f;
public string QueryText { get; init; } = "users are being signed out all the time";
public string IncomingTicketText { get; init; } =
"Every few hours our team is forced to login again and unsaved edits are lost.";
}

Do not commit secrets. Use the TRIAGE_POSTGRES_CONNECTION_STRING environment variable for Postgres credentials.
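Environment overrides can be layered over these defaults in a few lines. A minimal sketch: TRIAGE_POSTGRES_CONNECTION_STRING comes from the note above, while the other variable names are illustrative assumptions, not part of the reference implementation.

```csharp
public static class TriageAppConfigLoader
{
    // Layer environment overrides over compiled-in defaults.
    // Only TRIAGE_POSTGRES_CONNECTION_STRING is from the article;
    // the other variable names are hypothetical.
    public static TriageAppConfig Load()
    {
        var defaults = new TriageAppConfig();

        return new TriageAppConfig
        {
            PostgresConnectionString =
                Environment.GetEnvironmentVariable("TRIAGE_POSTGRES_CONNECTION_STRING")
                ?? defaults.PostgresConnectionString,
            EnablePostgresVectorSearch =
                bool.TryParse(Environment.GetEnvironmentVariable("TRIAGE_ENABLE_PGVECTOR"), out var pg)
                    ? pg
                    : defaults.EnablePostgresVectorSearch,
            OllamaBaseUrl =
                Environment.GetEnvironmentVariable("TRIAGE_OLLAMA_BASE_URL")
                ?? defaults.OllamaBaseUrl,
        };
    }
}
```

Unset properties keep their init defaults, so the environment only needs to name what it changes.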
Step 1: Model domain text with operational metadata
The text is what you embed. Metadata is what makes output actionable.
public sealed class FeedbackItem
{
public string Id { get; }
public DateTimeOffset Timestamp { get; }
public string Source { get; }
public string UserSegment { get; }
public string Text { get; }
public FeedbackItem(string id, DateTimeOffset timestamp, string source, string userSegment, string text)
{
Id = id;
Timestamp = timestamp;
Source = source;
UserSegment = userSegment;
Text = text;
}
}

Even outside feedback, this pattern holds: embed content, preserve metadata, rank by vector similarity, then report with business context.
Step 2: Treat embeddings as a pluggable representation layer
Hide the provider behind an interface. This keeps the retrieval pipeline stable if the embedding model changes.
public interface IEmbeddingClient
{
Task<float[]> EmbedAsync(string text, CancellationToken cancellationToken = default);
}

A local-first Ollama implementation:
public sealed class OllamaEmbeddingClient : IEmbeddingClient
{
private readonly HttpClient _httpClient;
private readonly string _model;
public OllamaEmbeddingClient(HttpClient httpClient, string model)
{
_httpClient = httpClient;
_model = model;
}
public async Task<float[]> EmbedAsync(string text, CancellationToken cancellationToken = default)
{
var payload = new OllamaEmbeddingsRequest(_model, text);
using var response = await _httpClient.PostAsJsonAsync("/api/embeddings", payload, cancellationToken);
if (!response.IsSuccessStatusCode)
throw new InvalidOperationException("Embedding request failed. Ensure Ollama is running and the model is pulled.");
var result = await response.Content.ReadFromJsonAsync<OllamaEmbeddingsResponse>(cancellationToken);
if (result?.Embedding is null || result.Embedding.Count == 0)
throw new InvalidOperationException("Ollama embedding response did not contain an embedding vector.");
return result.Embedding.ToArray();
}
private sealed record OllamaEmbeddingsRequest(string Model, string Prompt);
private sealed class OllamaEmbeddingsResponse { public List<float> Embedding { get; init; } = new(); }
}

Embeddings are not answers. They are coordinates in semantic space.
Step 3: Retrieval is deterministic ranking over vector math
Given vectors, retrieval is deterministic: compute similarity, filter by threshold, sort, and take top K.
Two details make the results stable in practice:
- Deterministic ordering of the input set (time window plus stable sort)
- Deterministic tie-breaking after similarity sorting (for example, by Id)
public sealed class FeedbackVectorIndex
{
private readonly IReadOnlyList<EmbeddedFeedbackRow> _rows;
public FeedbackVectorIndex(IReadOnlyList<EmbeddedFeedbackRow> rows) => _rows = rows;
public IEnumerable<(FeedbackItem Item, float Score)> Search(float[] queryVector, int topK, float minScore = 0f)
{
return _rows
.Select(row => (row.Item, Score: VectorMath.CosineSimilarity(row.Vector, queryVector)))
.Where(hit => hit.Score >= minScore)
.OrderByDescending(hit => hit.Score)
.ThenBy(hit => hit.Item.Id, StringComparer.Ordinal)
.Take(topK);
}
}

Probabilistic encoding, deterministic retrieval. That combination makes systems inspectable.
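The VectorMath.CosineSimilarity helper used in the index above is not shown in the article; a minimal sketch, assuming both vectors share the same dimension:

```csharp
public static class VectorMath
{
    // Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1] for real vectors.
    public static float CosineSimilarity(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same dimension.");

        double dot = 0, normA = 0, normB = 0;
        for (var i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }

        if (normA == 0 || normB == 0)
            return 0f; // treat similarity with a zero vector as 0

        return (float)(dot / (Math.Sqrt(normA) * Math.Sqrt(normB)));
    }
}
```

Accumulating in double before casting back to float keeps the sum stable for high-dimensional embeddings.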
Step 4: Cluster for signal extraction, not model-generated summaries
Vector retrieval answers one query. Clustering reveals recurring structure in the dataset.
A practical incremental centroid strategy:
- Compare each vector against existing centroids
- Join the best cluster if score clears threshold
- Otherwise create a new cluster
- Keep representative examples nearest to centroid
This is transparent, cheap, and good enough for many production workflows before advanced clustering is needed.
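The steps above can be sketched as one small, self-contained routine. This is illustrative, not the reference implementation; the names and the running-mean centroid update are assumptions.

```csharp
public static class CentroidClustering
{
    // Incremental centroid clustering: each vector joins its best-matching
    // cluster if similarity clears the threshold, else it starts a new one.
    // Returns member indices per cluster, so callers can map back to items.
    public static List<List<int>> Cluster(IReadOnlyList<float[]> vectors, float threshold)
    {
        var centroids = new List<float[]>();
        var members = new List<List<int>>();

        for (var i = 0; i < vectors.Count; i++)
        {
            var v = vectors[i];
            var best = -1;
            var bestScore = float.MinValue;
            for (var c = 0; c < centroids.Count; c++)
            {
                var score = Cosine(centroids[c], v);
                if (score > bestScore) { bestScore = score; best = c; }
            }

            if (best >= 0 && bestScore >= threshold)
            {
                // Running mean keeps the centroid update O(dimension).
                var n = members[best].Count;
                var centroid = centroids[best];
                for (var d = 0; d < centroid.Length; d++)
                    centroid[d] = (centroid[d] * n + v[d]) / (n + 1);
                members[best].Add(i);
            }
            else
            {
                centroids.Add((float[])v.Clone());
                members.Add(new List<int> { i });
            }
        }

        return members;
    }

    private static float Cosine(float[] a, float[] b)
    {
        double dot = 0, normA = 0, normB = 0;
        for (var i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return normA == 0 || normB == 0
            ? 0f
            : (float)(dot / (Math.Sqrt(normA) * Math.Sqrt(normB)));
    }
}
```

Note the result depends on input order, which is why the deterministic ordering from Step 3 matters here too.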
Step 5: Calibrate confidence with thresholds and evaluation
A retrieval system without calibration drifts into plausible noise. Thresholding and measurement keep it trustworthy.
- MinSearchScore blocks weak neighbors
- TopK trades precision against recall
- WindowDays controls temporal relevance
Start simple: evaluate Recall@K on a small labeled set.
public static double RecallAtK(
IReadOnlyCollection<string> relevantIds,
IReadOnlyList<string> rankedIds,
int k)
{
var top = rankedIds.Take(k).ToHashSet(StringComparer.Ordinal);
var hits = relevantIds.Count(id => top.Contains(id));
return relevantIds.Count == 0 ? 0 : (double)hits / relevantIds.Count;
}

If you cannot measure retrieval quality, you cannot improve it safely.
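A quick usage sketch for RecallAtK, with hypothetical ticket ids:

```csharp
// Hypothetical labeled set and ranked retrieval output.
var relevant = new[] { "T-101", "T-204" };
var ranked = new List<string> { "T-101", "T-330", "T-204", "T-555" };

var recallAt3 = RecallAtK(relevant, ranked, k: 3); // both relevant ids in top 3 -> 1.0
var recallAt1 = RecallAtK(relevant, ranked, k: 1); // one of two relevant ids -> 0.5
```

Even a handful of labeled queries like this is enough to catch regressions when you swap embedding models.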
Optional: Scaling with Postgres pgvector
In-memory indexing is excellent for small sets and local development. Use Postgres pgvector when you need persistence, larger corpora, and indexed performance.
The core SQL shape remains deterministic:
SELECT
id,
occurred_at,
source,
user_segment,
text_content,
1 - (embedding <=> CAST(@queryEmbedding AS vector)) AS score
FROM feedback_vectors
WHERE occurred_at >= @fromUtc
ORDER BY embedding <=> CAST(@queryEmbedding AS vector)
LIMIT @topK;

Keep deterministic filters in SQL, then apply the same minimum similarity gate in application code.
Production note: at real scale, store embeddings with a fixed vector dimension and add a pgvector ANN index (HNSW or IVFFlat). Without one, results remain correct, but every query falls back to a sequential scan and latency grows with the corpus.
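A schema sketch for that note, assuming pgvector 0.5+ (for HNSW) and the 768-dimension nomic-embed-text model; table and index names are illustrative:

```sql
-- Fixed dimension so pgvector can index the column.
CREATE TABLE IF NOT EXISTS feedback_vectors (
    id            text PRIMARY KEY,
    occurred_at   timestamptz NOT NULL,
    source        text NOT NULL,
    user_segment  text NOT NULL,
    text_content  text NOT NULL,
    embedding     vector(768) NOT NULL
);

-- ANN index matching the cosine distance operator (<=>) used in the query.
CREATE INDEX IF NOT EXISTS feedback_vectors_embedding_hnsw
    ON feedback_vectors USING hnsw (embedding vector_cosine_ops);
```

The operator class must match the distance operator in the SELECT, or the planner will not use the index.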
Example Domain: Customer Feedback Triage
Feedback is a good demonstration domain because language is noisy and inconsistent.
Query example: "users are being signed out all the time"
Expected behavior:
- Retrieve semantically similar tickets despite wording differences
- Rank by cosine similarity
- Attach metadata (source, segment, timestamp) for triage
- Group into themes for prioritization
- For a new incoming ticket, retrieve the closest historical items as a starting point for routing
The same pipeline works for incidents, docs, support responses, and operations knowledge.
Why This Architecture Works
This pattern survives model and domain changes because it separates concerns cleanly:
- Embeddings provide semantic representation
- Deterministic ranking provides stable retrieval behavior
- Thresholds provide explicit confidence control
- Metadata keeps output operational
- Optional pgvector adds scale without changing core logic
The model remains useful, but it is not the control plane.
Potential Enhancements
You can extend this foundation without changing the control model:
- Add hybrid retrieval: keyword pre-filter plus vector ranking
- Persist embeddings and only embed deltas
- Add metadata filters before ranking
- Track Recall@K and MRR for ongoing quality
- Introduce advanced clustering (HDBSCAN) when scale requires it
Final Notes
Embeddings and vector search are not a niche feature. They are a core AI systems primitive for semantic retrieval.
This architecture does not depend on generation quality. It depends on measurable retrieval behavior.
That is what makes it production-friendly.
Explore the source code at the GitHub repository.
See you in the next issue.
Stay curious.