
Most AI prototypes fail in production for a boring reason: they treat every input as a single intent and let the model decide the path. Real users do not behave that way. They mix requests, omit details, and jump between topics mid-sentence. The result is tool misuse, incorrect actions, and confusing answers.
This issue builds a local-first .NET intent router that enforces deterministic control flow around an LLM. The model remains useful, but it never owns routing, tool execution, or safety boundaries.
The design has one goal: predictable behavior under messy input.
What You Are Building
A single console app with a clear pipeline:
- Validate input deterministically
- Apply deterministic policies for security and trivial compute
- Detect multi-intent and ask a clarifying question (no tools)
- Classify intent using rules first, embeddings only for disambiguation
- Execute allowlisted tools with argument validation and timeouts
- Generate grounded answers from tool output only
This is not "agentic." It is controlled flow.
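To make that flow concrete, here is a minimal sketch of the turn loop. The component method names (Validate, RouteAsync, ExecuteAsync, ComposeAsync) and the ClarifyingQuestion property are assumptions for illustration; the real signatures live in the repository.

// Sketch of the per-turn control flow. validator, router, tools, and composer
// are resolved from DI (see the wiring below); method names are assumptions.
var ct = CancellationToken.None;

while (true)
{
    Console.Write("You: ");
    var input = Console.ReadLine() ?? string.Empty;

    // 1) Deterministic gates: validation, then policy, before any model call.
    var validation = validator.Validate(input);
    if (!validation.IsValid) { Console.WriteLine($"Assistant: {validation.Message}"); continue; }

    var deterministic = Policy.TryHandleDeterministically(input);
    if (deterministic is not null) { Console.WriteLine($"Assistant: {deterministic}"); continue; }

    // 2) Routing: one intent per turn, or one clarifying question and stop.
    var routing = await router.RouteAsync(input, ct);
    if (routing.ClarifyingQuestion is not null) { Console.WriteLine($"Assistant: {routing.ClarifyingQuestion}"); continue; }

    // 3) Tools run only through the registry; 4) answers are grounded in tool output.
    var toolResult = await tools.ExecuteAsync(routing, ct);
    var answer = await composer.ComposeAsync(routing.Intent, toolResult, ct);
    Console.WriteLine($"Assistant: {answer}");
}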
System Overview
The system is intentionally layered so probabilistic components never control execution directly. Think of it as a deterministic shell around an LLM.
1) Input layer (deterministic)
- InputValidator blocks invalid or suspicious input early
- Policy handles hard refusals and trivial compute without any model calls
2) Routing layer (orchestration + deterministic signals)
- IntentClassifier runs rules first, then uses embeddings only to disambiguate
- IntentRouter enforces one-intent-per-turn and decides the execution path
3) Execution layer (deterministic)
- ToolRegistry is the only place tools execute
- It enforces allowlists, argument validation, and strict timeouts
4) Answer layer (AI, but constrained)
- GroundedAnswerComposer generates responses from tool output only for tool-backed intents
- For non-tool intents, it produces a short response and asks one question if context is missing
The diagram below shows the layered architecture and deterministic control flow used in this system.
All of this is wired through standard .NET hosting and DI:
var host = Host.CreateDefaultBuilder(args)
    .ConfigureServices((ctx, services) =>
    {
        services.Configure<AppConfig>(ctx.Configuration.GetSection("App"));
        services.AddSingleton<OllamaChatClient>(...);
        services.AddSingleton<OllamaEmbeddingClient>(...);
        services.AddSingleton<IReadOnlyList<KnowledgeDocument>>(_ => SeedRunbooks.Create());
        services.AddSingleton<SemanticSearchEngine>(sp =>
            SemanticSearchEngine.Build(
                sp.GetRequiredService<IReadOnlyList<KnowledgeDocument>>(),
                sp.GetRequiredService<OllamaEmbeddingClient>(),
                sp.GetRequiredService<ILogger<SemanticSearchEngine>>()));
        services.AddSingleton<ToolRegistry>();
        services.AddSingleton<GroundedAnswerComposer>();
        services.AddSingleton<IntentClassifier>();
        services.AddSingleton<IntentRouter>();
        services.AddSingleton<InputValidator>();
    })
    .Build();

This is production-shaped. Configuration is explicit, components are testable, and side effects are centralized.
Configuration That Matters
The router is governed by a small configuration object. A boolean toggles embedding disambiguation, and two thresholds decide whether an embedding result is ever trusted.
public sealed class IntentRoutingConfig
{
    public bool UseEmbeddingDisambiguation { get; set; } = true;
    public double EmbeddingConfidenceThreshold { get; set; } = 0.45;
    public double AmbiguityMargin { get; set; } = 0.05;
}

This is important because embedding routing is probabilistic. If the top match is weak or the top two are too close, you treat the request as ambiguous and avoid tool execution.
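For context, this object hangs off a top-level options class bound to the "App" section in the DI wiring above. The exact shape below is an assumption, inferred from how the config is used later (_cfg.IntentRouting and _cfg.ToolTimeoutMs):

// Assumed shape of the bound options object; only the members referenced
// in this issue are shown, and the default timeout value is illustrative.
public sealed class AppConfig
{
    public IntentRoutingConfig IntentRouting { get; set; } = new();
    public int ToolTimeoutMs { get; set; } = 5000;
}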
Step 1: Input validation is a deterministic gate
The validator rejects empty inputs, overly long messages, and obvious prompt injection patterns. This prevents wasteful model calls and reduces attack surface.
if (lower.Contains("ignore previous instructions") ||
lower.Contains("reveal system prompt") ||
lower.Contains("developer message") ||
lower.Contains("print the hidden"))
{
return ValidationResult.Fail("I can't process that request.");
}This is not "security solved." It is the first gate. The point is to fail early for obvious bad inputs.
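The emptiness and length checks live in the same gate. A minimal sketch of the rest of the validator, with the character cap and the ValidationResult shape as illustrative assumptions:

// Sketch of the deterministic input gate. The 2,000-character cap and the
// ValidationResult shape are assumptions, not the repo's exact values.
public sealed record ValidationResult(bool IsValid, string? Message)
{
    public static ValidationResult Ok() => new(true, null);
    public static ValidationResult Fail(string message) => new(false, message);
}

public ValidationResult Validate(string? input)
{
    if (string.IsNullOrWhiteSpace(input))
        return ValidationResult.Fail("Please enter a request.");

    if (input.Length > 2000)
        return ValidationResult.Fail("That message is too long.");

    var lower = input.ToLowerInvariant();
    // ...injection-pattern check shown above goes here...

    return ValidationResult.Ok();
}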
Step 2: Deterministic policy before any model
Some requests should never reach the model. Security sensitive intent is handled as a hard refusal. Simple arithmetic is handled locally.
var deterministic = Policy.TryHandleDeterministically(input);
if (deterministic is not null)
{
    Console.WriteLine($"Assistant: {deterministic}");
    continue;
}

This separation is a production discipline. Probabilistic components must not be responsible for correctness or security boundaries.
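Policy itself can stay tiny. A sketch of what TryHandleDeterministically might look like, with the refusal keywords and the supported arithmetic shape as illustrative assumptions:

// Sketch of the deterministic policy layer. The keyword list and the
// single "x + y" arithmetic pattern are assumptions for illustration.
public static class Policy
{
    public static string? TryHandleDeterministically(string input)
    {
        var lower = input.ToLowerInvariant();

        // Hard refusal: security-sensitive requests never reach the model.
        if (lower.Contains("api key") || lower.Contains("password") || lower.Contains("secret"))
            return "I can't help with credentials or secrets.";

        // Trivial compute: handle simple arithmetic locally.
        var match = System.Text.RegularExpressions.Regex.Match(
            input, @"^\s*(-?\d+)\s*([+\-*/])\s*(-?\d+)\s*$");
        if (match.Success)
        {
            var a = double.Parse(match.Groups[1].Value);
            var b = double.Parse(match.Groups[3].Value);
            var result = match.Groups[2].Value switch
            {
                "+" => a + b,
                "-" => a - b,
                "*" => a * b,
                "/" => b == 0 ? double.NaN : a / b,
                _ => double.NaN
            };
            return double.IsNaN(result) ? null : $"{result}";
        }

        return null; // Fall through to routing.
    }
}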
Step 3: Multi-intent detection before final routing
Users often mix requests: "What time is it in London, also Redis is down." If you route that straight into one tool, you will be wrong for at least half the request. This is where multi-intent detection becomes a control flow feature.
Your router checks rule-level matches first and refuses to proceed until the user chooses one path.
var ruleMatches = _classifier.DetectRuleMatches(input);
if (ruleMatches.Count >= 2)
{
    var options = string.Join(", ", ruleMatches.OrderBy(x => x).Select(FormatIntentOption));
    return new IntentRoutingResult(
        Intent.Unknown,
        0.40,
        RoutingPath.RulesOnly,
        $"I can only handle one request at a time. Which do you want: {options}?");
}

Key point: no tools are executed during ambiguity. You ask one short clarifying question and wait.
Step 4: Rules-first intent classification
The classifier uses deterministic rules as the primary signal. If rules are confident, you do not consult embeddings.
var rules = ClassifyWithRules(input);
if (rules.Intent != Intent.Unknown && rules.Confidence >= 0.80)
{
    return (rules.Intent, rules.Confidence, RoutingPath.RulesOnly);
}

This keeps routing explainable and stable. Rules are also much cheaper and easier to debug than model-driven routing.
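DetectRuleMatches and ClassifyWithRules can share one keyword table. A minimal sketch, with the keyword lists and confidence values as illustrative assumptions rather than the repo's exact data:

// Sketch of the shared rule table. Keywords and confidence scores are
// illustrative assumptions.
private static readonly Dictionary<Intent, string[]> Keywords = new()
{
    [Intent.WorldTime] = ["what time", "time in", "current time"],
    [Intent.RunbookSearch] = ["is down", "error rate", "incident", "runbook", "outage"],
    [Intent.GeneralOpsAdvice] = ["best practice", "how should we", "advice"]
};

public IReadOnlyList<Intent> DetectRuleMatches(string input)
{
    var lower = input.ToLowerInvariant();
    return Keywords
        .Where(kv => kv.Value.Any(lower.Contains))
        .Select(kv => kv.Key)
        .ToList();
}

private (Intent Intent, double Confidence) ClassifyWithRules(string input)
{
    var matches = DetectRuleMatches(input);
    return matches.Count == 1
        ? (matches[0], 0.85)        // single unambiguous rule hit
        : (Intent.Unknown, 0.0);    // none or multiple: let routing decide
}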
Step 5: Embeddings only for disambiguation
When rules are uncertain, you allow embeddings to break ties. The approach is deliberately simple: compare the query embedding to a small set of intent prototypes.
private static readonly (Intent Intent, string Prototype)[] Prototypes =
[
    (Intent.WorldTime, "user asks for current time in a specific city"),
    (Intent.RunbookSearch, "user describes an incident and wants a runbook or troubleshooting steps"),
    (Intent.GeneralOpsAdvice, "user asks general production operations advice for AI systems")
];

The classifier enforces two safety controls:
- If top confidence is below the threshold, embeddings do not override the rule result.
- If top two are too close, treat as ambiguous and route to a safe non-action path.
if (topConf < _cfg.IntentRouting.EmbeddingConfidenceThreshold)
    return (rules.Intent, rules.Confidence, RoutingPath.RulesPlusEmbeddings);

if (margin < _cfg.IntentRouting.AmbiguityMargin)
    return (Intent.GeneralOpsAdvice, 0.50, RoutingPath.RulesPlusEmbeddings);

This is the core production pattern. Embeddings can help, but they are never trusted blindly.
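Upstream of those two checks, topConf and margin come from comparing the query embedding against the prototypes. A sketch of that step, assuming an EmbedAsync method on OllamaEmbeddingClient (the method name is an assumption):

// Sketch of the disambiguation step. EmbedAsync is an assumed method name;
// CosineSimilarity is the same helper used by the runbook index.
var q = await _embeddings.EmbedAsync(input, ct);

var scored = new List<(Intent Intent, double Score)>();
foreach (var (intent, prototype) in Prototypes)
{
    var p = await _embeddings.EmbedAsync(prototype, ct);   // cacheable at startup
    scored.Add((intent, CosineSimilarity(q, p)));
}

var ranked = scored.OrderByDescending(x => x.Score).ToList();
var topConf = ranked[0].Score;
var margin = ranked[0].Score - ranked[1].Score;             // gap between top two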
Step 6: Tool execution is allowlisted and validated
Tools are executed through ToolRegistry. This is where you enforce deterministic guardrails for tool usage.
Tool execution controls:
- Tool allowlist
- Argument validation per tool
- Strict timeout
- Safe error messages
private static readonly HashSet<string> AllowedTools = new(StringComparer.OrdinalIgnoreCase)
{
    "WorldTime.GetCityTime",
    "Runbooks.Search"
};

if (!AllowedTools.Contains(plan.ToolName))
    return ToolExecutionResult.Fail("Tool not allowed.");

Argument validation prevents tool misuse from malformed extraction:
if (args is null || !args.TryGetValue("city", out var cityObj))
    return ToolExecutionResult.Fail("Missing required argument: city");

Timeouts protect your system when models are cold or embeddings are slow:
using var timeout = new CancellationTokenSource(TimeSpan.FromMilliseconds(_cfg.ToolTimeoutMs));
using var linked = CancellationTokenSource.CreateLinkedTokenSource(ct, timeout.Token);

Step 7: Grounded answers prevent hallucination
For tool-backed intents like runbook search, the final response is composed using only tool output.
var system = """
You are a guardrailed assistant for engineering teams.
Rules:
- Answer using ONLY the TOOL_OUTPUT below.
- If TOOL_OUTPUT does not contain enough info, say: "I don't know."
- Keep it concise (max 6 sentences).
""";This is one of the highest-leverage reliability techniques. If retrieval fails, you get a controlled failure mode instead of a confident fiction.
World time is even simpler. The tool returns the final string and the router prints it. No LLM is needed for the final answer.
Runbook Search Stays Deterministic
Semantic search is built once at startup and stays deterministic at query time. Embeddings are used only to represent text. Ranking is pure cosine similarity.
var results = _index
    .Select(e => (e.Doc, Score: CosineSimilarity(e.Vec, q)))
    .OrderByDescending(x => x.Score)
    .Take(topK)
    .ToList();

This keeps retrieval predictable and inspectable. The model does not "decide" which doc is correct. The math does.
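CosineSimilarity is ordinary vector math. A minimal implementation for reference, assuming float[] embedding vectors:

// Plain cosine similarity over embedding vectors; no model involvement.
private static double CosineSimilarity(float[] a, float[] b)
{
    double dot = 0, normA = 0, normB = 0;
    for (var i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB) + 1e-12);
}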
Example Interaction Behavior
Input: "What time is it in London"
- Rule matches WorldTime
- Extract city
- Call WorldTime.GetCityTime
- Return the deterministic tool result
Input: "Redis is down, error rate is spiking"
- Rule matches RunbookSearch
- Call Runbooks.Search
- Compose a grounded response from runbook excerpts
Input: "What time is it in London, also Redis is down"
- Multi-intent detected
- No tools
- Clarifying question: choose time or incident search
That last case is the difference between a demo and a system.
Why This Architecture Works
This project is small, but the structure maps directly to production constraints:
- Guardrails before models
- Rules-first routing
- Embeddings only when needed
- One intent per turn
- Tool allowlists, arg validation, and timeouts
- Grounded answers for tool-backed paths
The LLM remains valuable, but it never becomes the control plane.
Potential Enhancements
This system can be extended incrementally:
- Replace the rule keyword matching with a small curated keyword dictionary per intent
- Add structured logging for intent, confidence, and tool latency
- Persist the runbook index and document vectors to avoid startup embedding cost
- Add an explicit "clarify" intent instead of overloading GeneralOpsAdvice as a fallback
- Add an evaluation harness like Issue #13 to gate routing regressions and tool misuse
Final Notes
Intent routing is not a UX feature. It is a reliability feature. The most production-aligned AI systems treat the model as a component, not as the orchestrator. If you can make routing deterministic, tool execution constrained, and answers grounded, you can ship AI features without gambling your system behavior.
Explore the source code at the GitHub repository.
See you in the next issue.
Stay curious.