
Most LLM reliability failures are context failures. The model is asked to decide with incomplete evidence, irrelevant evidence, or too much evidence for the available token window.
In this issue, we build a local-first deterministic context budgeting pipeline in C#. The running example is an e-commerce refund and resolution assistant for a late and damaged delivery case. The design is simple: reserve output tokens, reserve fixed prompt tokens, deterministically pack business context blocks, and fail fast if required evidence does not fit.
What You Are Building
A production-shaped prompt assembly and budgeting pipeline:
- Load app config from `appsettings.json` plus `DCB_` environment overrides
- Validate context window, reserved output, and fixed prompt budgets before execution
- Model business evidence as typed `ContextBlock` objects with priority and required flags
- Compute available context tokens deterministically
- Pack blocks in stable order and track exclusions with explicit reasons
- Compose an inspectable prompt with system instructions, user task, and included blocks
- Optionally invoke a local OpenAI-compatible model endpoint
- Fail fast when required context overflows the budget
While the model proposes, the budgeter decides what context enters the prompt.
System Structure
This system uses a gate-based control flow: configuration and budget constraints are validated first, context blocks are packed deterministically, and execution stops immediately if required evidence does not fit. When the budget is valid, the app composes a bounded prompt preview and, if enabled, invokes the model and cleans response artifacts like `<think>` tags before printing the final output.
The diagram below shows this high-level control flow:
Context Budget Is a Runtime Contract
The available context budget is explicit and computed from three values:
```csharp
public int AvailableContextTokens =>
    ModelContextWindowTokens - ReservedOutputTokens - FixedPromptTokens;
```

The request object enforces this contract up front:

```csharp
if (AvailableContextTokens <= 0)
{
    throw new InvalidOperationException(
        "No context budget available. ReservedOutputTokens + FixedPromptTokens must be less than ModelContextWindowTokens.");
}
```

No budget means no run. This prevents hidden truncation and silent prompt drift.
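With the defaults shown later in this issue (an 850-token window, 300 tokens reserved for output, 220 for the fixed prompt), the contract leaves 330 tokens for business context. A minimal standalone sketch of the same arithmetic and guard:

```csharp
using System;

// Standalone sketch of the budget contract, using the article's default values.
int modelContextWindowTokens = 850;
int reservedOutputTokens = 300;
int fixedPromptTokens = 220;

int availableContextTokens =
    modelContextWindowTokens - reservedOutputTokens - fixedPromptTokens;

if (availableContextTokens <= 0)
{
    throw new InvalidOperationException(
        "No context budget available. ReservedOutputTokens + FixedPromptTokens must be less than ModelContextWindowTokens.");
}

Console.WriteLine(availableContextTokens); // 330
```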
Runtime Config First
Configuration is loaded from file plus environment variables, then validated before any model call:
```csharp
var configuration = new ConfigurationBuilder()
    .SetBasePath(AppContext.BaseDirectory)
    .AddJsonFile("appsettings.json", optional: true, reloadOnChange: false)
    .AddEnvironmentVariables(prefix: "DCB_")
    .Build();
```

The strongly typed app settings carry explicit defaults:

```csharp
public int ModelContextWindowTokens { get; init; } = 850;
public int ReservedOutputTokens { get; init; } = 300;
public int FixedPromptTokens { get; init; } = 220;
public string Provider { get; init; } = "lmstudio";
public string BaseUrl { get; init; } = "http://localhost:1234/v1";
public string ApiKey { get; init; } = "not-needed";
public string ModelId { get; init; } = "deepseek/deepseek-r1-0528-qwen3-8b";
public float Temperature { get; init; } = 0.0f;
public bool EnableModelCall { get; init; } = true;
```

This keeps runtime behavior explicit and reproducible across local runs, CI, and operations.
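For reference, a minimal `appsettings.json` consistent with these defaults might look like the following (the `App` section name is inferred from the `App:` keys referenced later in this issue, so treat the exact shape as an assumption):

```json
{
  "App": {
    "ModelContextWindowTokens": 850,
    "ReservedOutputTokens": 300,
    "FixedPromptTokens": 220,
    "Provider": "lmstudio",
    "BaseUrl": "http://localhost:1234/v1",
    "ApiKey": "not-needed",
    "ModelId": "deepseek/deepseek-r1-0528-qwen3-8b",
    "Temperature": 0.0,
    "EnableModelCall": true
  }
}
```

Any value can then be overridden per environment, e.g. `DCB_App__EnableModelCall=false`.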
Model Context as Typed Blocks
Business evidence is represented as typed domain objects with validation at construction time:
```csharp
public sealed class ContextBlock
{
    public ContextBlock(
        string blockId,
        string source,
        string content,
        int priority,
        DateTimeOffset observedAtUtc,
        bool isRequired = false)
    {
        if (priority is < 0 or > 100)
        {
            throw new ArgumentOutOfRangeException(nameof(priority), "Priority must be in range 0..100.");
        }

        // Remaining validation and property assignments omitted for brevity.
    }
}
```

The scenario data mixes customer message, order data, policy, history, and operations playbook as independent blocks with explicit priorities:
```csharp
new(
    blockId: "CUSTOMER_MESSAGE",
    source: "support-ticket",
    content: """
        Customer says the blender jar arrived cracked and delivery was 4 days late.
        Customer requests full refund and asks if return shipping is required.
        Order ID: ORD-884120.
        """,
    priority: 100,
    observedAtUtc: new DateTimeOffset(2026, 3, 13, 13, 18, 0, TimeSpan.Zero),
    isRequired: true),
new(
    blockId: "REFUND_POLICY",
    source: "policy-service",
    content: """
        Damaged-on-arrival items are eligible for full refund or replacement.
        Photo evidence is recommended but can be waived for first-time damage claims under $200.
        Late-delivery compensation is 10% store credit when delay exceeds 2 days.
        """,
    priority: 95,
    observedAtUtc: new DateTimeOffset(2026, 2, 21, 9, 0, 0, TimeSpan.Zero),
    isRequired: true),
```

This shape makes context auditable before it reaches the model.
Deterministic Packing Algorithm
Packing order is stable and explicit:
```csharp
var orderedCandidates = request.Candidates
    .OrderByDescending(static block => block.IsRequired)
    .ThenByDescending(static block => block.Priority)
    .ThenByDescending(static block => block.ObservedAtUtc)
    .ThenBy(static block => block.BlockId, StringComparer.Ordinal);
```

Each block is estimated, included if it fits, and excluded with a concrete reason if it does not:

```csharp
if (tokenCount <= remainingContextTokens)
{
    included.Add(new PackedContextBlock(block, tokenCount));
    remainingContextTokens -= tokenCount;
    continue;
}

var reason = block.IsRequired
    ? ExclusionReason.RequiredBlockTooLargeForBudget
    : ExclusionReason.ExceedsRemainingBudget;
excluded.Add(new ExcludedContextBlock(block, tokenCount, reason));
```

```csharp
public enum ExclusionReason
{
    ExceedsRemainingBudget = 1,
    RequiredBlockTooLargeForBudget = 2
}
```

Required evidence and optional evidence are treated differently, which is critical for safe behavior.
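To see why this ordering is input-order independent, here is a minimal sketch of the same ordering rule applied to plain tuples (the tuple shape is illustrative; the repo's `ContextBlock` carries more fields):

```csharp
using System;
using System.Linq;

// Illustrative candidate blocks from the refund scenario, deliberately out of order.
var blocks = new[]
{
    (BlockId: "ORDER_DATA", IsRequired: false, Priority: 90,
        ObservedAtUtc: new DateTimeOffset(2026, 3, 13, 12, 0, 0, TimeSpan.Zero)),
    (BlockId: "CUSTOMER_MESSAGE", IsRequired: true, Priority: 100,
        ObservedAtUtc: new DateTimeOffset(2026, 3, 13, 13, 18, 0, TimeSpan.Zero)),
    (BlockId: "REFUND_POLICY", IsRequired: true, Priority: 95,
        ObservedAtUtc: new DateTimeOffset(2026, 2, 21, 9, 0, 0, TimeSpan.Zero)),
};

string[] Order(IEnumerable<(string BlockId, bool IsRequired, int Priority, DateTimeOffset ObservedAtUtc)> candidates) =>
    candidates
        .OrderByDescending(b => b.IsRequired)       // required blocks first
        .ThenByDescending(b => b.Priority)          // then business priority
        .ThenByDescending(b => b.ObservedAtUtc)     // then recency
        .ThenBy(b => b.BlockId, StringComparer.Ordinal) // stable tie-break
        .Select(b => b.BlockId)
        .ToArray();

var ordered = Order(blocks);
var orderedFromReversed = Order(blocks.Reverse()); // same result regardless of input order

Console.WriteLine(string.Join(", ", ordered));
// CUSTOMER_MESSAGE, REFUND_POLICY, ORDER_DATA
```

Because every comparison key is deterministic and the final tie-break is an ordinal `BlockId` sort, shuffling the candidate list cannot change the packing order.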
Token Estimation Strategy
The estimator is intentionally heuristic and provider-agnostic.
It estimates tokens from character length and word count, adds a small newline penalty, then uses the larger estimate.
This is simple but deterministic, which keeps budgeting behavior stable and testable.
```csharp
public int EstimateTokens(string text)
{
    if (string.IsNullOrWhiteSpace(text))
    {
        return 0;
    }

    var normalized = text.Trim();
    var characterEstimate = (int)Math.Ceiling(normalized.Length / 4.0d);
    var wordEstimate = (int)Math.Ceiling(CountWords(normalized) * 1.2d);
    var newlinePenalty = CountNewlines(normalized);
    return Math.Max(1, Math.Max(characterEstimate, wordEstimate) + newlinePenalty);
}
```

The goal is consistency, not tokenizer-perfect precision.
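To make the heuristic concrete, here is a self-contained version of the same estimate. The repo's `CountWords` and `CountNewlines` helpers are not shown in the article, so whitespace-splitting and `'\n'` counting are assumptions for this sketch:

```csharp
using System;
using System.Linq;

// Assumption: words are whitespace-delimited.
static int CountWords(string s) =>
    s.Split((char[]?)null, StringSplitOptions.RemoveEmptyEntries).Length;

// Assumption: newlines are counted as '\n' characters.
static int CountNewlines(string s) =>
    s.Count(c => c == '\n');

static int EstimateTokens(string text)
{
    if (string.IsNullOrWhiteSpace(text))
    {
        return 0;
    }

    var normalized = text.Trim();
    var characterEstimate = (int)Math.Ceiling(normalized.Length / 4.0d);
    var wordEstimate = (int)Math.Ceiling(CountWords(normalized) * 1.2d);
    var newlinePenalty = CountNewlines(normalized);
    return Math.Max(1, Math.Max(characterEstimate, wordEstimate) + newlinePenalty);
}

Console.WriteLine(EstimateTokens(""));                      // 0
Console.WriteLine(EstimateTokens("hello world"));           // 3 (max of ceil(11/4)=3 and ceil(2*1.2)=3)
Console.WriteLine(EstimateTokens("Order ID: ORD-884120.")); // 6 (ceil(21/4)=6 beats ceil(3*1.2)=4)
```

Because both inputs always yield the same estimate, two runs over the same candidate set always pack the same blocks.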
Prompt Composition Is Inspectable
The composer builds one deterministic artifact with instructions, task, and each included block plus metadata:
```csharp
builder.AppendLine("Context Blocks:");

foreach (var packed in includedBlocks)
{
    var block = packed.Block;
    builder.AppendLine($"### Block {block.BlockId}");
    builder.AppendLine($"Source: {block.Source}");
    builder.AppendLine($"Priority: {block.Priority}");
    builder.AppendLine($"ObservedAtUtc: {block.ObservedAtUtc:O}");
    builder.AppendLine($"Required: {block.IsRequired}");
    builder.AppendLine($"EstimatedTokens: {packed.TokenCount}");
    builder.AppendLine("Content:");
    builder.AppendLine(block.Content);
    builder.AppendLine();
}
```

This makes prompt assembly inspectable and debuggable without guessing what the model actually received.
Execution Path and Failure Modes
The runtime explicitly stops when required context does not fit:
```csharp
if (!result.CanProceed)
{
    Console.Error.WriteLine();
    Console.Error.WriteLine("Budgeting failed: one or more required business context blocks do not fit.");
    Console.Error.WriteLine("Increase App:ModelContextWindowTokens, reduce App:ReservedOutputTokens, or shorten required blocks.");
    return;
}
```

Model invocation is also an explicit switch:

```csharp
if (!config.EnableModelCall)
{
    Console.WriteLine();
    Console.WriteLine("Model invocation disabled (`App:EnableModelCall=false`).");
    Console.WriteLine("Set `App:EnableModelCall` in appsettings.json or `DCB_App__EnableModelCall=true`.");
    return;
}
```

If a model returns hidden reasoning tags, the response is cleaned before printing:

```csharp
const string openTag = "<think>";
const string closeTag = "</think>";
```

The pipeline prefers explicit failure and explicit modes over silent behavior changes.
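The article shows only the tag constants, so here is one possible cleaning sketch built on them. It assumes at most one `<think>...</think>` span emitted before the final answer; the repo's actual implementation may differ:

```csharp
using System;

// Sketch: strip a <think>...</think> reasoning span from a model response.
static string CleanResponse(string response)
{
    const string openTag = "<think>";
    const string closeTag = "</think>";

    var start = response.IndexOf(openTag, StringComparison.Ordinal);
    if (start < 0)
    {
        return response.Trim(); // no reasoning tags: pass through
    }

    var end = response.IndexOf(closeTag, start, StringComparison.Ordinal);
    if (end < 0)
    {
        // Unterminated tag: drop everything from the open tag onward.
        return response[..start].Trim();
    }

    return (response[..start] + response[(end + closeTag.Length)..]).Trim();
}

var raw = "<think>reason about the refund policy</think>Full refund approved for ORD-884120.";
Console.WriteLine(CleanResponse(raw)); // Full refund approved for ORD-884120.
```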
Test Coverage That Locks Behavior
The tests target deterministic guarantees, not only happy paths:
- Stable selection when candidate input order changes
- Required overflow detection with `CanProceed` and `HasRequiredOverflow`
- Exact accounting for available, used, and remaining context tokens
- Stable tie-breaking with `BlockId` when priority and timestamp are equal
- Prompt composer preserving included block order

```csharp
Assert.Equal(new[] { "A", "D", "C" }, result1.IncludedBlocks.Select(static b => b.Block.BlockId).ToArray());
Assert.Equal(result1.IncludedBlocks.Select(static b => b.Block.BlockId), result2.IncludedBlocks.Select(static b => b.Block.BlockId));
```

Final Notes
Deterministic context budgeting is a practical reliability layer for LLM systems. It ensures the model sees the right evidence in a controlled order within a fixed budget.
When required context is guaranteed, optional context is bounded, and prompt assembly is inspectable, behavior becomes easier to trust and debug.
Explore the source code at the GitHub repository.
See you in the next issue.
Stay curious.