
Most AI discussions still overfocus on generation.
In production systems, the higher ROI pattern is often retrieval: mapping natural language to the right evidence quickly, consistently, and with inspectable scoring.
That is where embeddings and vector search matter. Not as "agent magic," but as a practical semantic layer you can control.
In this issue, we focus on the AI core: embedding representation, deterministic ranking, thresholded control, and clustering for signal extraction. Customer feedback triage is just the running example.
What You Are Building
A local-first .NET console app that demonstrates a reusable semantic retrieval pipeline:
- Load retrieval config with environment overrides
- Embed domain text into vector space using Ollama
- Run cosine similarity search with deterministic ranking
- Apply thresholds to block weak semantic matches
- Group related items into themes via centroid clustering
- Optionally run the same retrieval against Postgres pgvector
- Triage an incoming ticket by retrieving the most similar historical items
The example domain is customer feedback triage. The architecture applies to runbooks, incidents, support routing, and knowledge retrieval.
System Overview
The system is a deterministic shell around one probabilistic component.
- Data selection and filtering (deterministic)
- Embedding generation (probabilistic)
- Similarity ranking and thresholds (deterministic)
- Theme clustering and reporting (deterministic)
This separation is the key production idea. Embeddings introduce semantic capability. Deterministic code owns behavior.
The diagram below shows the controlled flow from runtime configuration and domain data, through embeddings as the semantic layer, and into deterministic retrieval, thresholding, and clustering (with an optional pgvector scale-out path).
Runtime Configurations
A retrieval system becomes reliable when it is tunable. These parameters control quality and behavior without code changes:
- WindowDays for recency and relevance
- TopK for candidate breadth
- MinSearchScore for semantic confidence cutoff
- ClusterThreshold for theme strictness
- EnablePostgresVectorSearch for persistence and scale
In the reference implementation, the app also supports:
- QueryText for the search demo query
- IncomingTicketText for the triage run (retrieve similar prior tickets)
public sealed class TriageAppConfig
{
public bool EnablePostgresVectorSearch { get; init; } = false;
public string PostgresConnectionString { get; init; } = string.Empty;
public string OllamaBaseUrl { get; init; } = "http://localhost:11434";
public string EmbeddingModel { get; init; } = "nomic-embed-text";
public int WindowDays { get; init; } = 7;
public int TopK { get; init; } = 8;
public int TopThemes { get; init; } = 5;
public float ClusterThreshold { get; init; } = 0.75f;
public float MinSearchScore { get; init; } = 0.50f;
public string QueryText { get; init; } = "users are being signed out all the time";
public string IncomingTicketText { get; init; } =
"Every few hours our team is forced to login again and unsaved edits are lost.";
}

Do not commit secrets. Use the TRIAGE_POSTGRES_CONNECTION_STRING environment variable for Postgres credentials.
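Environment overrides can be layered over these defaults in a few lines. A minimal sketch: TRIAGE_POSTGRES_CONNECTION_STRING comes from the note above, while the other variable names are illustrative assumptions, not part of the reference implementation.

```csharp
public static class TriageAppConfigLoader
{
    // Layer environment overrides over compiled-in defaults.
    // Only TRIAGE_POSTGRES_CONNECTION_STRING is from the article;
    // the other variable names are hypothetical.
    public static TriageAppConfig Load()
    {
        var defaults = new TriageAppConfig();

        return new TriageAppConfig
        {
            PostgresConnectionString =
                Environment.GetEnvironmentVariable("TRIAGE_POSTGRES_CONNECTION_STRING")
                ?? defaults.PostgresConnectionString,
            EnablePostgresVectorSearch =
                bool.TryParse(Environment.GetEnvironmentVariable("TRIAGE_ENABLE_PGVECTOR"), out var pg)
                    ? pg
                    : defaults.EnablePostgresVectorSearch,
            OllamaBaseUrl =
                Environment.GetEnvironmentVariable("TRIAGE_OLLAMA_BASE_URL")
                ?? defaults.OllamaBaseUrl,
        };
    }
}
```

Unset properties keep their init defaults, so the environment only needs to name what it changes.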
Step 1: Model domain text with operational metadata
The text is what you embed. Metadata is what makes output actionable.
public sealed class FeedbackItem
{
public string Id { get; }
public DateTimeOffset Timestamp { get; }
public string Source { get; }
public string UserSegment { get; }
public string Text { get; }
public FeedbackItem(string id, DateTimeOffset timestamp, string source, string userSegment, string text)
{
Id = id;
Timestamp = timestamp;
Source = source;
UserSegment = userSegment;
Text = text;
}
}

Even outside feedback, this pattern holds: embed content, preserve metadata, rank by vector similarity, then report with business context.
Step 2: Treat embeddings as a pluggable representation layer
Hide the provider behind an interface. This keeps the retrieval pipeline stable if the embedding model changes.
public interface IEmbeddingClient
{
Task<float[]> EmbedAsync(string text, CancellationToken cancellationToken = default);
}

A local-first Ollama implementation:
public sealed class OllamaEmbeddingClient : IEmbeddingClient
{
private readonly HttpClient _httpClient;
private readonly string _model;
public OllamaEmbeddingClient(HttpClient httpClient, string model)
{
_httpClient = httpClient;
_model = model;
}
public async Task<float[]> EmbedAsync(string text, CancellationToken cancellationToken = default)
{
var payload = new OllamaEmbeddingsRequest(_model, text);
using var response = await _httpClient.PostAsJsonAsync("/api/embeddings", payload, cancellationToken);
if (!response.IsSuccessStatusCode)
throw new InvalidOperationException("Embedding request failed. Ensure Ollama is running and the model is pulled.");
var result = await response.Content.ReadFromJsonAsync<OllamaEmbeddingsResponse>(cancellationToken);
if (result?.Embedding is null || result.Embedding.Count == 0)
throw new InvalidOperationException("Ollama embedding response did not contain an embedding vector.");
return result.Embedding.ToArray();
}
private sealed record OllamaEmbeddingsRequest(string Model, string Prompt);
private sealed class OllamaEmbeddingsResponse { public List<float> Embedding { get; init; } = new(); }
}

Embeddings are not answers. They are coordinates in semantic space.
Step 3: Retrieval is deterministic ranking over vector math
Given vectors, retrieval is deterministic: compute similarity, filter by threshold, sort, and take top K.
Two details make the results stable in practice:
- Deterministic ordering of the input set (time window plus stable sort)
- Deterministic tie-breaking after similarity sorting (for example, by Id)
public sealed class FeedbackVectorIndex
{
private readonly IReadOnlyList<EmbeddedFeedbackRow> _rows;
public FeedbackVectorIndex(IReadOnlyList<EmbeddedFeedbackRow> rows) => _rows = rows;
public IEnumerable<(FeedbackItem Item, float Score)> Search(float[] queryVector, int topK, float minScore = 0f)
{
return _rows
.Select(row => (row.Item, Score: VectorMath.CosineSimilarity(row.Vector, queryVector)))
.Where(hit => hit.Score >= minScore)
.OrderByDescending(hit => hit.Score)
.ThenBy(hit => hit.Item.Id, StringComparer.Ordinal)
.Take(topK);
}
}

Probabilistic encoding, deterministic retrieval. That combination makes systems inspectable.
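The VectorMath.CosineSimilarity helper used in the index above is not shown in the article; a minimal sketch, assuming both vectors share the same dimension:

```csharp
public static class VectorMath
{
    // Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1] for real vectors.
    public static float CosineSimilarity(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same dimension.");

        double dot = 0, normA = 0, normB = 0;
        for (var i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }

        if (normA == 0 || normB == 0)
            return 0f; // treat similarity with a zero vector as 0

        return (float)(dot / (Math.Sqrt(normA) * Math.Sqrt(normB)));
    }
}
```

Accumulating in double before casting back to float keeps the sum stable for high-dimensional embeddings.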
Step 4: Cluster for signal extraction, not model-generated summaries
Vector retrieval answers one query. Clustering reveals recurring structure in the dataset.
A practical incremental centroid strategy:
- Compare each vector against existing centroids
- Join the best cluster if score clears threshold
- Otherwise create a new cluster
- Keep representative examples nearest to centroid
This is transparent, cheap, and good enough for many production workflows before advanced clustering is needed.
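The steps above can be sketched as one small, self-contained routine. This is illustrative, not the reference implementation; the names and the running-mean centroid update are assumptions.

```csharp
public static class CentroidClustering
{
    // Incremental centroid clustering: each vector joins its best-matching
    // cluster if similarity clears the threshold, else it starts a new one.
    // Returns member indices per cluster, so callers can map back to items.
    public static List<List<int>> Cluster(IReadOnlyList<float[]> vectors, float threshold)
    {
        var centroids = new List<float[]>();
        var members = new List<List<int>>();

        for (var i = 0; i < vectors.Count; i++)
        {
            var v = vectors[i];
            var best = -1;
            var bestScore = float.MinValue;
            for (var c = 0; c < centroids.Count; c++)
            {
                var score = Cosine(centroids[c], v);
                if (score > bestScore) { bestScore = score; best = c; }
            }

            if (best >= 0 && bestScore >= threshold)
            {
                // Running mean keeps the centroid update O(dimension).
                var n = members[best].Count;
                var centroid = centroids[best];
                for (var d = 0; d < centroid.Length; d++)
                    centroid[d] = (centroid[d] * n + v[d]) / (n + 1);
                members[best].Add(i);
            }
            else
            {
                centroids.Add((float[])v.Clone());
                members.Add(new List<int> { i });
            }
        }

        return members;
    }

    private static float Cosine(float[] a, float[] b)
    {
        double dot = 0, normA = 0, normB = 0;
        for (var i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return normA == 0 || normB == 0
            ? 0f
            : (float)(dot / (Math.Sqrt(normA) * Math.Sqrt(normB)));
    }
}
```

Note the result depends on input order, which is why the deterministic ordering from Step 3 matters here too.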
Step 5: Calibrate confidence with thresholds and evaluation
A retrieval system without calibration drifts into plausible noise. Thresholding and measurement keep it trustworthy.
- MinSearchScore blocks weak neighbors
- TopK trades precision against recall
- WindowDays controls temporal relevance
Start simple: evaluate Recall@K on a small labeled set.
public static double RecallAtK(
IReadOnlyCollection<string> relevantIds,
IReadOnlyList<string> rankedIds,
int k)
{
var top = rankedIds.Take(k).ToHashSet(StringComparer.Ordinal);
var hits = relevantIds.Count(id => top.Contains(id));
return relevantIds.Count == 0 ? 0 : (double)hits / relevantIds.Count;
}

If you cannot measure retrieval quality, you cannot improve it safely.
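A quick usage sketch for RecallAtK, with hypothetical ticket ids:

```csharp
// Hypothetical labeled set and ranked retrieval output.
var relevant = new[] { "T-101", "T-204" };
var ranked = new List<string> { "T-101", "T-330", "T-204", "T-555" };

var recallAt3 = RecallAtK(relevant, ranked, k: 3); // both relevant ids in top 3 -> 1.0
var recallAt1 = RecallAtK(relevant, ranked, k: 1); // one of two relevant ids -> 0.5
```

Even a handful of labeled queries like this is enough to catch regressions when you swap embedding models.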
Optional: Scaling with Postgres pgvector
In-memory indexing is excellent for small sets and local development. Use Postgres pgvector when you need persistence, larger corpora, and indexed performance.
The core SQL shape remains deterministic:
SELECT
id,
occurred_at,
source,
user_segment,
text_content,
1 - (embedding <=> CAST(@queryEmbedding AS vector)) AS score
FROM feedback_vectors
WHERE occurred_at >= @fromUtc
ORDER BY embedding <=> CAST(@queryEmbedding AS vector)
LIMIT @topK;

Keep deterministic filters in SQL, then apply the same minimum similarity gate in application code.
Production note: at real scale, store embeddings with a fixed vector dimension and add a pgvector ANN index (HNSW or IVFFlat). Without one, results remain correct, but every query falls back to a sequential scan and latency grows with the corpus.
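A schema sketch for that note, assuming pgvector 0.5+ (for HNSW) and the 768-dimension nomic-embed-text model; table and index names are illustrative:

```sql
-- Fixed dimension so pgvector can index the column.
CREATE TABLE IF NOT EXISTS feedback_vectors (
    id            text PRIMARY KEY,
    occurred_at   timestamptz NOT NULL,
    source        text NOT NULL,
    user_segment  text NOT NULL,
    text_content  text NOT NULL,
    embedding     vector(768) NOT NULL
);

-- ANN index matching the cosine distance operator (<=>) used in the query.
CREATE INDEX IF NOT EXISTS feedback_vectors_embedding_hnsw
    ON feedback_vectors USING hnsw (embedding vector_cosine_ops);
```

The operator class must match the distance operator in the SELECT, or the planner will not use the index.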
Example Domain: Customer Feedback Triage
Feedback is a good demonstration domain because language is noisy and inconsistent.
Query example: "users are being signed out all the time"
Expected behavior:
- Retrieve semantically similar tickets despite wording differences
- Rank by cosine similarity
- Attach metadata (source, segment, timestamp) for triage
- Group into themes for prioritization
- For a new incoming ticket, retrieve the closest historical items as a starting point for routing
The same pipeline works for incidents, docs, support responses, and operations knowledge.
Why This Architecture Works
This pattern survives model and domain changes because it separates concerns cleanly:
- Embeddings provide semantic representation
- Deterministic ranking provides stable retrieval behavior
- Thresholds provide explicit confidence control
- Metadata keeps output operational
- Optional pgvector adds scale without changing core logic
The model remains useful, but it is not the control plane.
Potential Enhancements
You can extend this foundation without changing the control model:
- Add hybrid retrieval: keyword pre-filter plus vector ranking
- Persist embeddings and only embed deltas
- Add metadata filters before ranking
- Track Recall@K and MRR for ongoing quality
- Introduce advanced clustering (HDBSCAN) when scale requires it
Final Notes
Embeddings and vector search are not a niche feature. They are a core AI systems primitive for semantic retrieval.
This architecture does not depend on generation quality. It depends on measurable retrieval behavior.
That is what makes it production-friendly.
Explore the source code at the GitHub repository.
See you in the next issue.
Stay curious.